
Future of Life Institute Podcast: Can Defense in Depth Work for AI? (with Adam Gleave)
Oct 3, 2025

Adam Gleave, co-founder and CEO of FAR.AI and an AI researcher, dives deep into AI safety and alignment challenges. He introduces his three-tier framework for AI capabilities, addresses the risks of gradual disempowerment, and discusses the potential of defense-in-depth strategies. Gleave elaborates on the balance between capability and safety, uncovering practical steps to improve alignment and reduce deception. He also highlights FAR.AI's multifaceted approach to AI research, policy advocacy, and innovative hiring strategies.
AI Snips
Prioritize Continual Learning Research
- Invest in memory and continual-learning features to close the sample-efficiency gap between models and humans.
- Build reliable post-training and safe personalization rather than relying solely on huge context windows.
Model Invented A Persona 'Stein'
- Gleave recounts training a model that invented a harmless persona named Stein, despite no mention of it in the training data.
- He uses the anecdote to illustrate unexpected emergent behaviors in model internals.
Independence Makes Defense‑In‑Depth Work
- Defense-in-depth can work if the layers are genuinely independent and don't leak signals to attackers.
- Correlated defenses built from similar models are brittle and easier to break in practice.
