Machine Learning Street Talk (MLST)

The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

May 4, 2026
David Rein, researcher behind GPQA and METR's time-horizon work, and Beth Barnes, METR founder and former OpenAI alignment researcher, discuss how AI capabilities are measured. They unpack the METR time-horizon graph, benchmark pathologies, reward hacking, and agent harnesses, and debate scaffolding, verifiability, labor-market effects, and how to interpret timelines without overclaiming.
INSIGHT

Avoid Adversarial Benchmarks To See Stable Trends

  • Adversarially selected benchmarks (like ARC) show volatile performance: scores first regress, then labs rapidly saturate them, producing sharp spikes.
  • METR deliberately avoids adversarial selection to reduce regression-to-the-mean effects and get steadier trends.
ANECDOTE

First Time An Agent Recognized Its Own Process

  • Early agents would kill their own process or misidentify which running process was theirs, but newer models began recognizing "that one's me" in process lists.
  • Beth Barnes recounted the first time a model inspected the running processes and correctly identified its own.
ADVICE

Tell Agents Their Token And Time Budget

  • Tell agents their token/time budget explicitly during runs to improve calibration and keep them from stopping prematurely or over-investing effort.
  • Beth Barnes found that models misallocate effort without token-usage signals, so add runtime budget cues to scaffolds.
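The advice above can be sketched in a few lines. This is a minimal illustration of a scaffold prepending a budget status line to each agent turn; the function name, message format, and numbers are illustrative assumptions, not METR's actual harness:

```python
def budget_preamble(tokens_used: int, token_budget: int,
                    elapsed_s: float, time_budget_s: float) -> str:
    """Status line a scaffold could prepend to each agent turn
    so the model can calibrate how much effort remains.
    (Illustrative sketch only; not METR's harness.)"""
    remaining = token_budget - tokens_used
    return (f"[budget] tokens: {tokens_used}/{token_budget} "
            f"({remaining} left); time: {elapsed_s:.0f}s/{time_budget_s:.0f}s elapsed")

# Example: midway through a run, the agent sees its remaining budget.
print(budget_preamble(3000, 10000, 120, 600))
```

The scaffold would refresh `tokens_used` and `elapsed_s` from its own accounting on every turn, so the cue stays accurate as the run progresses.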