No. 002

Hallucinated citations at scale, ICML catches AI reviewers, scholarly infrastructure gaps

The Scientific Record

  • Hallucinated citations are polluting the scientific literature

    Nature, April 1 2026

    A Nature analysis estimates that over 110,000 of the roughly 7 million scholarly publications from 2025 contain AI-hallucinated references. In computer science, 2.6% of papers accepted at three major conferences in 2025 had at least one fabricated citation, up from 0.3% in 2024. The problem is no longer anecdotal.

  • Half of social-science studies fail replication test in years-long project

    Nature, April 1 2026

    The DARPA-funded SCORE project replicated 164 studies across 12 social science fields and found a 49% success rate. A related effort tested whether AI models could predict which studies would fail, and the models largely could not. Replication remains a problem that requires actually doing the experiments over again.

Peer Review

  • Major conference catches illicit AI use and rejects hundreds of papers

    Nature, March 25 2026

    ICML watermarked submission PDFs with hidden LLM instructions, flagged 795 reviews from 506 reviewers, and desk-rejected 497 papers whose authors violated AI-use policies they had explicitly agreed to. The watermark method is easy to circumvent once public, but the signal is clear: conferences are now actively enforcing their AI-use policies, not just requesting compliance.

  • Policies permitting LLM use for polishing peer reviews are currently not enforceable

    arXiv, March 20 2026

    Researchers tested five AI-detection systems, including two commercial tools, against reviews with varying degrees of LLM involvement. None could reliably distinguish "polished by AI" from "written by AI," which means the common policy of allowing LLMs for grammar but not substance rests on a distinction that current detectors cannot verify.

  • Exploring AI's growing role in scientific peer review

    Stanford Report, March 25 2026

    James Zou ran a randomized experiment providing AI assistance on roughly 20,000 reviews. AI was strong at catching errors and gaps in data and analysis, but weak on the judgments that matter most: whether the work is novel or significant. The practical ceiling for current AI capabilities in peer review may be "thorough copy editor," not "reviewer number three."

Infrastructure

  • What AI asks of open access

    The Scholarly Kitchen, March 31 2026

    PLOS CEO Alison Mudditt argues the real tension is not between openness and AI access, but between AI's reliance on trustworthy literature and the chronic underfunding of the infrastructure that maintains trust. Crossref, ORCID, and DataCite need to be machine-readable and well-resourced if AI systems are going to treat the scientific record as a reliable input.

  • AI maps science papers to predict research trends two to three years ahead

    TechXplore, April 1 2026

    Researchers at KIT combined LLMs with a graph-based ML model to build concept networks from scientific papers, then tracked how connections between terms evolve over time. The tool, described in Nature Machine Intelligence, forecasts emerging fields by identifying which term clusters are gaining density, a useful signal for funders and program committees if the predictions hold up.
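
For readers who want a feel for what "term clusters gaining density" could mean, here is a minimal sketch in Python. It is not the KIT method: there is no LLM and no graph ML model, only per-year co-occurrence counting on a made-up toy corpus. The data in `PAPERS` and the function names are all hypothetical and exist only for illustration.

```python
from itertools import combinations

# Hypothetical toy corpus: (year, set of concepts extracted from one paper).
PAPERS = [
    (2021, {"graph", "llm"}),
    (2021, {"protein", "folding"}),
    (2022, {"graph", "llm", "forecast"}),
    (2022, {"protein", "folding"}),
    (2023, {"graph", "llm", "forecast"}),
    (2023, {"llm", "forecast"}),
    (2023, {"protein", "folding"}),
]


def edges_by_year(papers):
    """Map each year to the set of concept pairs that co-occur in some paper that year."""
    yearly = {}
    for year, concepts in papers:
        pairs = {frozenset(p) for p in combinations(sorted(concepts), 2)}
        yearly.setdefault(year, set()).update(pairs)
    return yearly


def cluster_density(edges, cluster):
    """Fraction of possible concept pairs inside `cluster` that appear in `edges`."""
    possible = [frozenset(p) for p in combinations(sorted(cluster), 2)]
    if not possible:
        return 0.0
    return sum(p in edges for p in possible) / len(possible)


def density_trend(papers, cluster):
    """Density of `cluster` for each year, keyed by year in ascending order."""
    yearly = edges_by_year(papers)
    return {year: cluster_density(yearly[year], cluster) for year in sorted(yearly)}


def density_gain(papers, cluster):
    """Change in cluster density between the first and last year in the corpus."""
    trend = density_trend(papers, cluster)
    years = sorted(trend)
    return trend[years[-1]] - trend[years[0]]


if __name__ == "__main__":
    for cluster in ({"graph", "llm", "forecast"}, {"protein", "folding"}):
        print(sorted(cluster), density_trend(PAPERS, cluster), density_gain(PAPERS, cluster))
```

In this toy data the graph/LLM/forecast cluster fills in over time while the protein/folding cluster stays flat, so ranking clusters by `density_gain` would surface the former. A real system would need far more careful concept extraction and a predictive model rather than a simple first-to-last difference.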