EA Forum Podcast (Curated & popular)

“Responsible Scaling Policy v3” by Holden Karnofsky

Mar 5, 2026
Holden Karnofsky, a co-founder and leader in the effective altruism and AI safety community, explains why Anthropic revised its Responsible Scaling Policy. He outlines what worked and what failed in earlier versions, and why the new roadmap and risk reports aim to drive practical safety improvements. He also discusses incentives, the tradeoffs around pausing, industry uptake, and how to make public roadmaps meaningful.
INSIGHT

RSPs Can Distort Security Priorities

  • Some security priorities driven by RSPs (e.g., extreme model-weight secrecy) may be misallocated relative to broader, less glamorous security hygiene.
  • Holden worries that overfocusing on model weights and egress controls led to underinvestment in general security improvements in the Frontier Safety Roadmap.
INSIGHT

Unachievable Targets Create Perverse Incentives

  • Setting unachievable targets (e.g., SL5-level protection against state actors) creates the wrong incentives: pressure to underreport capabilities and distorted risk assessments.
  • Holden judged ASL4/5 preparation unrealistic without pausing or radical security tradeoffs, which made the old RSP harmful.
ADVICE

Split Recommendations, Risk Reports, and Roadmaps

  • Separate the roles: industry-wide recommendations, transparent risk reports, and company roadmaps each serve a different purpose and should be designed differently.
  • RSP v3 implements this split: recommendations, risk reports with external review, and an achievable roadmap.