80,000 Hours Podcast

#226 – Holden Karnofsky on unexploited opportunities to make AI safer — and all his AGI takes

Oct 30, 2025
Holden Karnofsky is the co-founder of GiveWell and Open Philanthropy and currently advises on AI risk at Anthropic. He describes concrete, actionable projects in AI safety, emphasizing the field's shift from theory to hands-on work. Topics include training AI to detect deception, securing models against backdoors, and promoting model welfare. Holden discusses how AI companies can foster positive AGI development and offers insight into career paths in AI safety, urging listeners to recognize their potential impact.
ADVICE

Block Backdoors With Governance And Auditing

  • Lock down training pipelines, require multi-party approvals, and publicly document model specs to reduce backdoor risk.
  • Build governance where even CEOs cannot secretly alter a model's objectives without oversight.
ADVICE

Recruit AIs To Foil Malicious Human Plots

  • Train models to refuse clearly malicious requests (law-following or whistleblowing rules) and design safe incentive channels for reporting.
  • Combine model behavior rules with company terms of service and enforcement to discourage human misuse and power grabs.
INSIGHT

AGI Benefits Are Massive And Likely

  • The benefits of AGI are enormous and likely to appear by default if catastrophic risks are avoided.
  • Holden highlights healthcare, forecasting, and broad access to high-quality advice as early major upsides.