No. 012

Claude Fable/Mythos, Leiden Declaration, Frontiers editor resignation

Three things happened this week that belong together. Anthropic shipped research capability as a default feature of its flagship model rather than as a specialized product, then pulled the model from public availability. A hundred and thirty mathematicians signed a declaration on protecting proof and attribution from AI. And an editor resigned from Frontiers because the publisher's AI was overriding his reviewer selections. Capability, governance, and infrastructure are moving at three different speeds, and the gap between the first and the other two is widening.

Capability & Evaluation

  • Claude Fable 5 and Claude Mythos 5

    Anthropic, June 9 2026

    The first general-purpose flagship model to position research capability as a default feature rather than a specialized product. It marks a phase transition from GPT-Rosalind (domain-specific) and Gemini for Science (a tools suite) toward research as table stakes for any frontier model. Anthropic subsequently pulled Fable from public availability.

  • ErdosBench: A Research-Mathematics Benchmark Built from Synthetic Erdos-Style Problems

    ulam.ai, June 2026

    226 problems derived from open Erdős problems, designed to evaluate research-level mathematical reasoning now that standard benchmarks are saturated. Part of the emerging evaluation infrastructure, alongside the sum-product and unit-distance disproofs covered in editions 008 and 011.

  • Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence

    arXiv, May 31 2026

    A category-theory framework from MIT that formally distinguishes retrieval, search, and discovery in AI-driven science. It defines discovery as a "verified regime transition" in which old artifacts are preserved: the kind of formal foundation the governance conversation needs.

Peer Review & Infrastructure

  • Frontiers in Systems Neuroscience associate editor resigns over AI editorial automation

    Michael Okun (UCL), Bluesky, June 2026

    Public resignation as associate editor after the publisher's AI began auto-inviting unqualified reviewers and revoking invitations to qualified experts he had selected manually. First-person testimony of editorial AI overriding human judgment at a major publisher.

  • The AI Imperative: Scaling High-Quality Peer Review in Machine Learning

    arXiv (ICML 2026 oral), June 2026

    Position paper arguing that AI-assisted peer review must become an urgent research priority, with LLMs serving as collaborators for authors, reviewers, and area chairs rather than as replacements. A constructive counterpart to the Okun resignation and to earlier coverage of the AAAI-26 AI review pilot.

  • A reporting checklist for large language models in behavioural science

    Nature Human Behaviour, June 2026

    GUIDE-LLM, a consensus-based reporting checklist for LLM research in behavioral and social science, addressing transparency, reproducibility, and ethical accountability as LLMs become both research tools and objects of study.

  • Building Scholar-Ready AI: A Conversation with Todd Toler

    The Scholarly Kitchen, June 10 2026

    Toler argues that the central problem is preserving a "trustworthy chain of custody" when RAG systems chunk articles for retrieval, a process that detaches provenance, peer-review status, and citation relationships from the content. This is infrastructure work that builds directly on the metadata gap identified in edition 010.

  • We Can Create the Future of Science Right Now

    Good Science (Stuart Buck), June 8 2026

    Inventories existing AI tools across six layers of a hypothetical "living evidence synthesis" system, from document understanding to forensic audit to reproducibility checks, and argues that the pieces exist but remain unconnected.

Research Culture