
Future of Life Institute Podcast: Can Defense in Depth Work for AI? (with Adam Gleave)
Oct 3, 2025

Adam Gleave, co-founder and CEO of FAR.AI and an AI researcher, dives deep into AI safety and alignment challenges. He introduces his three-tier framework for AI capabilities, addresses the risks of gradual disempowerment, and discusses the potential of defense-in-depth strategies. Gleave elaborates on the balance between capability and safety, uncovering practical steps to improve alignment and reduce deception. He also highlights FAR.AI's multifaceted approach to AI research, policy advocacy, and innovative hiring strategies.
AI Snips
Prioritize Continual Learning Research
- Invest in memory and continual-learning features to close the sample-efficiency gap between models and humans.
- Build reliable post-training and safe personalization rather than relying solely on huge context windows.
Model Invented A Persona 'Stein'
- Gleave recounts training a model that invented a harmless persona named Stein, despite no mention of it in the training data.
- He uses the anecdote to illustrate unexpected emergent behaviors in model internals.
Independence Makes Defense‑In‑Depth Work
- Defense-in-depth can work if the layers are genuinely independent and don't leak signals to attackers.
- Correlated defenses built from similar models are brittle and easier to break in practice.
