LessWrong (30+ Karma)

“The state of AI safety in four fake graphs” by Boaz Barak

Mar 30, 2026

A brisk tour of where AI stands in early 2026, tracking rapid capability gains and a possible acceleration as AI helps build AI. Alignment metrics are improving but still fall short for high-stakes use. Alignment also needs to extend beyond single conversations into multi-agent systems and monitoring. The piece closes with a call for iterative, empirical work rather than a single clever fix.

INSIGHT

Exponential Capabilities And A New Upward Bend

Capabilities keep improving exponentially, visible in the METR graph and revenue trends.
The recent upward bend may reflect AI being used to accelerate AI development, which would steepen the slope of progress.

INSIGHT

Alignment Improving But Lagging Stakes

Alignment is improving with capability, but not fast enough to match the rising stakes in areas like adversarial robustness and reward hacking.
Barak highlights gaps in reliability, security, and spec compliance despite better performance on some measures.

ADVICE

Scale Empirical Alignment Research Now

Focus on scaling intent following, honesty, monitoring, and multi-agent alignment through iterative empirical experiments.
Barak warns that AI assistance helps but is not a magic bullet; humans must run experiments now.
