Future of Life Institute Podcast

Can Defense in Depth Work for AI? (with Adam Gleave)

Oct 3, 2025
Adam Gleave, co-founder and CEO of FAR.AI and an AI researcher, dives deep into AI safety and alignment challenges. He introduces his three-tier framework for AI capabilities, addresses the risks of gradual disempowerment, and discusses the potential of defense-in-depth strategies. Gleave elaborates on the balance between capability and safety, outlining practical steps to improve alignment and reduce deception. He also highlights FAR.AI's multifaceted approach to AI research, policy advocacy, and innovative hiring strategies.
ADVICE

Prioritize Continual Learning Research

  • Invest in memory and continual-learning features to close the sample-efficiency gap between models and humans.
  • Build reliable post-training and safe personalization rather than relying only on huge context buffers.
ANECDOTE

Model Invented A Persona 'Stein'

  • Gleave recounts training a model that invented a harmless persona named Stein, with no mention of it anywhere in the training data.
  • He uses the story to illustrate unexpected emergent behaviors in model internals.
INSIGHT

Independence Makes Defense‑In‑Depth Work

  • Defense-in-depth can work if the layers are genuinely independent and don't leak signals to attackers.
  • Correlated defenses built from similar models are brittle and easier to break in practice.