“AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines” by ryan_greenblatt

Apr 6, 2026

Ryan Greenblatt is an AI researcher who forecasts timelines for advanced AI systems. He explains why massive, easy-to-verify software tasks are now much easier for AIs and how that substantially shortens his timeline estimates. He walks through what makes those tasks verifiable, limits like taste and scaffolding, hands-on attempts to automate safety research, and implications for large-scale SWE, R&D, and cyber work.

INSIGHT

Task Distribution Mismatch Caused Forecast Error

There's a large gap between ESNI tasks and existing ML task suites: Ryan found that 50%-reliability time horizons on ESNI tasks are ~20x longer than on METR's task suite, not ~4x as he had expected.
This mismatch drove a clear prediction error in his prior forecasts.

INSIGHT

Superexponential Progress In ESNI Tasks

ESNI tasks appear to be in a super-exponential regime for 50% reliability time horizons, meaning progress accelerates rapidly with more runtime iteration.
Lower generality suffices because mistakes are easier to spot and recover from on testable tasks.

INSIGHT

Checkability Determines AI Effectiveness

Task checkability and iterability create a large gap: ES tasks that AIs can self-check outperform tasks that only companies can evaluate.
Many SWE tasks are schlep-heavy and suit iterative AI labor, while algorithm-heavy systems are harder to iterate.

I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term performance on massive and pretty difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that don't require that much novel ideation [3]. For instance, I expect that by EOY 2026, AIs will have a 50%-reliability [4] time horizon of years to decades on reasonably difficult easy-and-cheap-to-verify SWE tasks that don't require much ideation (while the high-reliability time horizon, for instance at 90%, will be much lower, more like hours or days than months, though this will be very sensitive to the task distribution). In this post, I'll explain why I've made these updates, what I now expect, and implications of this update.

I'll refer to "Easy-and-cheap-to-verify SWE tasks" as ES tasks and to "ES tasks that don't require much ideation (as in, don't require 'new' ideas)" as ESNI tasks for brevity.

Here are the main drivers of [...]

---

Outline:

(04:58) What's going on with these easy-and-cheap-to-verify tasks?

(08:17) Some evidence against shorter timelines I've gotten in the same period

(10:46) Why does high performance on ESNI tasks shorten my timelines?

(13:15) How much does extremely high performance on ESNI tasks help with AI R&D?

(18:22) My experience trying to automate safety research with current models

(19:58) My experience seeing if my setup can automate massive ES tasks

(21:08) SWE tasks

(23:29) AI R&D task

(24:20) Cyber