No. 012
Claude Fable/Mythos, Leiden Declaration, Frontiers editor resignation
Three things happened this week that belong together. Anthropic shipped research capability as a default feature of its flagship models rather than as a specialized product, then pulled one of them, Fable, from public availability. More than 130 mathematicians signed a declaration on protecting proof and attribution from AI. And an editor resigned from Frontiers because the publisher's AI was overriding his reviewer selections. Capability, governance, and infrastructure are moving at three different speeds, and the gap between the first and the other two is widening.
Capability & Evaluation
-
Claude Fable 5 and Claude Mythos 5
Anthropic, June 9 2026
The first general-purpose flagship release to position research capability as a default feature rather than a specialized product. It marks a shift from GPT-Rosalind (domain-specific) and Gemini for Science (a tools suite) toward research as table stakes for any frontier model. Anthropic subsequently pulled Fable from public availability.
-
ErdosBench: A Research-Mathematics Benchmark Built from Synthetic Erdos-Style Problems
ulam.ai, June 2026
226 problems derived from open Erdos problems, built to evaluate research-level mathematical reasoning now that standard benchmarks are saturated. Part of the emerging evaluation infrastructure, alongside the sum-product and unit-distance disproofs covered in editions 008 and 011.
-
arXiv, May 31 2026
Category-theory framework from MIT that formally distinguishes retrieval, search, and discovery in AI-driven science, defining discovery as a "verified regime transition" in which prior artifacts are preserved. This is the kind of formal foundation the governance conversation needs.
Peer Review & Infrastructure
-
Frontiers in Systems Neuroscience associate editor resigns over AI editorial automation
Michael Okun (UCL), Bluesky, June 2026
Public resignation as associate editor after the publisher's AI began auto-inviting unqualified reviewers and revoking invitations to qualified experts he had selected manually. First-person testimony of editorial AI overriding human judgment at a major publisher.
-
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning
arXiv (ICML 2026 oral), June 2026
Position paper arguing that AI-assisted peer review must become an urgent research priority, with LLMs serving authors, reviewers, and area chairs as collaborators rather than replacements. A constructive counterpart to the Okun resignation and to earlier coverage of the AAAI-26 AI review pilot.
-
A reporting checklist for large language models in behavioural science
Nature Human Behaviour, June 2026
GUIDE-LLM, a consensus-based reporting checklist for LLM research in behavioral and social science, addressing transparency, reproducibility, and ethical accountability as LLMs become both research tools and objects of study.
-
Building Scholar-Ready AI: A Conversation with Todd Toler
The Scholarly Kitchen, June 10 2026
Toler argues that the central problem is preserving a "trustworthy chain of custody" when RAG systems chunk articles for retrieval, a process that detaches provenance, peer-review status, and citation relationships from the content. This infrastructure work directly extends the metadata gap identified in edition 010.
-
We Can Create the Future of Science Right Now
Good Science (Stuart Buck), June 8 2026
Inventories existing AI tools across six layers of a hypothetical "living evidence synthesis" system, from document understanding to forensic audit to reproducibility checks, and argues that the pieces exist but remain unconnected.
Research Culture
-
The Leiden Declaration on Artificial Intelligence and Mathematics
June 2026
Sixteen mathematicians, backed by the International Mathematical Union and 130+ signatories including Fields Medalist Peter Scholze, issue recommendations for protecting proof reliability, attribution, and peer review from AI-related risks. A direct community response to this season's AI-driven disproofs of conjectures.
-
Computers in Human Behavior: Artificial Humans, 2025
Empirical study finding that LLM-assisted writing is less creatively diverse than human-only writing. This is the same narrowing that Tang and Yang documented on the agent side in edition 010, now observed on the human-with-LLM side.
-
The social consequences of AI delegation
arXiv, June 9 2026
Reframes the LLMs-in-research question from "can LLMs replace participants?" to "are humans using LLMs as surrogates for their own deliberation?" across healthcare, law, education, and research, shifting the unit of analysis from authorship to the decision itself.