No. 012

Claude Fable/Mythos, Leiden Declaration, Frontiers editor resignation

June 15, 2026

Three things happened this week that belong together. Anthropic shipped research capability as a default feature of its flagship model, not as a specialized product, then pulled the model from public availability. More than 130 mathematicians signed a declaration on protecting proof and attribution from AI. And an editor resigned from Frontiers because the publisher's AI was overriding his reviewer selections. Capability, governance, and infrastructure are moving at three different speeds, and the gap between the first and the other two is widening.

Capability & Evaluation

Claude Fable 5 and Claude Mythos 5

Anthropic, June 9 2026

First general-purpose flagship model to position research capability as a default feature rather than a specialized product. It marks a phase transition from GPT-Rosalind (domain-specific) and Gemini for Science (tools suite) toward research as table stakes for any frontier model. Anthropic subsequently pulled Fable from public availability.

ErdosBench: A Research-Mathematics Benchmark Built from Synthetic Erdos-Style Problems

ulam.ai, June 2026

226 problems derived from open Erdos problems, designed to evaluate research-level mathematical reasoning now that standard benchmarks are saturated. Part of the emerging evaluation infrastructure alongside the sum-product and unit-distance disproofs from editions 008 and 011.

Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence

arXiv, May 31 2026

Category-theory framework from MIT that formally distinguishes retrieval, search, and discovery for AI-driven science, defining discovery as a "verified regime transition" in which old artifacts are preserved. This is the kind of formal foundation the governance conversation needs.

Peer Review & Infrastructure

Frontiers in Systems Neuroscience associate editor resigns over AI editorial automation

Michael Okun (UCL), Bluesky, June 2026

Public resignation as associate editor after the publisher's AI began auto-inviting unqualified reviewers and revoking invitations to qualified experts he had manually selected. First-person testimony of editorial AI overriding human judgment at a major publisher.

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning

arXiv (ICML 2026 oral), June 2026

Position paper arguing that AI-assisted peer review must become an urgent research priority, with LLMs as collaborators for authors, reviewers, and area chairs rather than replacements. A constructive corollary to the Okun resignation and to earlier coverage of the AAAI-26 AI review pilot.

A reporting checklist for large language models in behavioural science

Nature Human Behaviour, June 2026

GUIDE-LLM, a consensus-based reporting checklist for LLM research in behavioral and social science, addressing transparency, reproducibility, and ethical accountability as LLMs become both research tools and objects of study.

Building Scholar-Ready AI: A Conversation with Todd Toler

The Scholarly Kitchen, June 10 2026

Toler argues the central problem is preserving a "trustworthy chain of custody" when RAG systems chunk articles for retrieval, detaching provenance, peer-review status, and citation relationships from the content. This infrastructure work directly extends the metadata gap identified in edition 010.

We Can Create the Future of Science Right Now

Good Science (Stuart Buck), June 8 2026

Inventories existing AI tools across six layers of a hypothetical "living evidence synthesis" system, from document understanding to forensic audit to reproducibility checks, arguing the pieces exist but remain unconnected.

Research Culture

The Leiden Declaration on Artificial Intelligence and Mathematics

June 2026

Sixteen mathematicians, backed by the International Mathematical Union and 130+ signatories including Fields Medalist Peter Scholze, issue recommendations on protecting proof reliability, attribution, and peer review from AI risks. A direct community response to this season's results on AI-disproved conjectures.

Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing

Computers in Human Behavior: Artificial Humans, 2025

Empirical study finding that LLM-assisted writing narrows creative diversity compared to human-only writing. The same narrowing pattern that Tang and Yang documented on the agent side in edition 010 is now attested on the human-with-LLM side.

The social consequences of AI delegation

arXiv, June 9 2026

Reframes the LLMs-in-research question from "can LLMs replace participants" to "are humans using LLMs as surrogates for their own deliberation" across healthcare, legal, education, and research domains, shifting the unit of analysis from authorship to the decision itself.
