"AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines" by ryan_greenblatt
Apr 6, 2026
A major update toward shorter AI timelines and a higher chance of full AI R&D automation within a few years. The episode highlights stronger performance on massive, easy-to-verify software engineering tasks, explains how test suites, scaffolding, and cheap verification drive these gains, and describes practical experiments in which AIs complete months of SWE work and speed up certain research workflows.
INSIGHT
Test Suites Unlock Massive Iterative Gains
Easy-and-cheap-to-verify non-ideation SWE tasks (ESNI) scale because AIs can iteratively optimize against test suites and benchmarks.
Building test suites lets AIs run huge numbers of cheap iterations, spotting and fixing errors until performance improves substantially.
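The loop this snip describes can be sketched in a few lines: an AI repeatedly proposes changes and keeps only those that reduce the failure count reported by a cheap verifier. Everything below is a toy stand-in of my own (a numeric "task", a random `propose_fix`), not anything from the episode; a real setup would run unit tests against generated code.

```python
import random

def run_test_suite(solution: float) -> int:
    """Hypothetical cheap verifier: returns the number of failing tests.

    Here the 'task' is just driving a parameter toward a target value;
    failures shrink as the solution gets closer.
    """
    target = 42.0
    return int(abs(solution - target) * 10)

def propose_fix(solution: float) -> float:
    """Stand-in for a model proposing a patch: a small random perturbation."""
    return solution + random.uniform(-1.0, 1.0)

def iterate_until_green(initial: float, budget: int = 10_000) -> float:
    """Hill-climb against the test suite: accept a candidate only if it
    strictly reduces the failure count, until all tests pass or the
    iteration budget runs out."""
    solution = initial
    best_failures = run_test_suite(solution)
    for _ in range(budget):
        if best_failures == 0:
            break
        candidate = propose_fix(solution)
        failures = run_test_suite(candidate)
        if failures < best_failures:  # cheap verification gates every step
            solution, best_failures = candidate, failures
    return solution
```

The point of the sketch is the shape of the loop, not the optimizer: because each iteration is cheap to verify, the agent can afford thousands of attempts, and only improvements are ever kept.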
INSIGHT
ESNI Tasks Appear To Be In A Super-Exponential Regime
ESNI tasks are entering a super-exponential regime for 50% reliability time horizons because iterative error recovery compounds returns.
Ryan argues that once generality and error recovery exist, each doubling of allowed time becomes easier, accelerating progress.
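The compounding claim can be made concrete with a toy model (my own construction, with illustrative numbers, not estimates from the episode): in a plain exponential regime the time horizon doubles every fixed calendar period, while in a super-exponential regime each successive doubling takes less calendar time than the one before.

```python
def horizon_after(months: float, initial_hours: float,
                  first_doubling_months: float, shrink: float = 1.0) -> float:
    """Time horizon (in hours) reached after `months` of progress, if each
    successive doubling takes `shrink` times as long as the previous one.
    shrink == 1.0 is plain exponential growth; shrink < 1.0 is
    super-exponential. All numbers are illustrative."""
    horizon = initial_hours
    elapsed = 0.0
    step = first_doubling_months
    while elapsed + step <= months:
        elapsed += step
        horizon *= 2          # one doubling completed
        step *= shrink        # the next doubling comes sooner when shrink < 1
    return horizon
```

With these made-up numbers, 24 months at a fixed 6-month doubling time yields 4 doublings (a 16x horizon), while shrinking each doubling period by 20% fits 7 doublings (a 128x horizon) into the same 24 months.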
INSIGHT
Checkability Doubles As Company Optimization And AI Self-Improvement
Checkability helps two ways: firms can optimize models on metrics and AIs can self-improve at runtime by iterating on cheap evaluations.
This explains a big gap between ESNI tasks and less-checkable benchmarks like METR's suite.
I've recently updated towards substantially shorter AI timelines and much faster progress in some areas.[1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30%,[2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term performance on massive and pretty difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that don't require that much novel ideation.[3] For instance, I expect that by EOY 2026, AIs will have a 50%-reliability[4] time horizon of years to decades on reasonably difficult easy-and-cheap-to-verify SWE tasks that don't require much ideation (while the high-reliability time horizon, for instance at 90%, will be much lower: more like hours or days than months, though this will be very sensitive to the task distribution). In this post, I'll explain why I've made these updates, what I now expect, and the implications of this update.
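For readers unfamiliar with the metric: a 50%-reliability time horizon is the task length at which an AI succeeds about half the time. A minimal sketch of one way to read it off empirical success rates, interpolating in log task length, is below; the bucket numbers are made up, and this is only an illustration, not the estimator actually used for such benchmarks (which typically fit a logistic curve).

```python
import math

def fifty_percent_horizon(buckets):
    """buckets: list of (task_minutes, success_rate) pairs, sorted by
    increasing task length with decreasing success rate. Returns the task
    length at which the interpolated success rate crosses 50%,
    interpolating linearly in log(task length)."""
    for (t0, p0), (t1, p1) in zip(buckets, buckets[1:]):
        if p0 >= 0.5 >= p1:
            # fraction of the way from p0 down to p1 where 0.5 sits
            frac = (p0 - 0.5) / (p0 - p1)
            log_h = math.log(t0) + frac * (math.log(t1) - math.log(t0))
            return math.exp(log_h)
    raise ValueError("success rate never crosses 50%")
```

For example, if success rates are 60% on 1-hour tasks and 40% on 4-hour tasks, the interpolated 50% horizon is the geometric midpoint, 2 hours.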
I'll refer to "Easy-and-cheap-to-verify SWE tasks" as ES tasks and to "ES tasks that don't require much ideation (as in, don't require 'new' ideas)" as ESNI tasks for brevity.
Here are the main drivers of [...]
---
Outline:
(04:58) What's going on with these easy-and-cheap-to-verify tasks?
(08:17) Some evidence against shorter timelines I've gotten in the same period
(10:46) Why does high performance on ESNI tasks shorten my timelines?
(13:15) How much does extremely high performance on ESNI tasks help with AI R&D?
(18:22) My experience trying to automate safety research with current models
(19:58) My experience seeing if my setup can automate massive ES tasks