
Latent Space: The AI Engineer Podcast METR’s Joel Becker on exponential Time Horizon Evals, Threat Models, and the Limits of AI Productivity
Feb 27, 2026 Joel Becker, a researcher at METR (Model Evaluation and Threat Research), explains the origin and limits of the Time Horizon chart. He walks through how tasks were chosen and the biases in benchmarks. The conversation covers developer productivity studies, milestone-like model jumps, threat modeling for discontinuous takeoff, and trade-offs in building evaluation scaffolds versus real-world deployment.
Opus 4.5 Prompted Engineers To Embrace Agentic Coding
- Joel recounts Opus 4.5 as a large benchmark jump, after which many engineers began using it for agentic coding.
- He observed talented engineers shift from avoiding AI for code to relying heavily on it after that release.
Redo Productivity Trials Only With Updated Study Design
- Re-run productivity RCTs carefully because workflows and selection effects changed as models improved; prior designs may no longer be valid.
- Joel warns that developer selection bias and increased concurrency in modern workflows can disrupt earlier randomization schemes.
Apparent Developer Speedups Often Overvalue Low‑Value Work
- Perceived speedups often overstate value because the additional tasks AI enables are frequently lower-value work that had previously been postponed.
- Joel notes concurrent work, selection, and user optimism inflate self-reported productivity gains.

