The MAD Podcast with Matt Turck

The Evaluators Are Being Evaluated — Pavel Izmailov (Anthropic/NYU)

Jan 15, 2026
Pavel Izmailov, a research scientist at Anthropic and an NYU professor, delves into AI behavior and safety. He discusses the intriguing idea of models developing 'alien survival instincts' and explores deceptive behaviors in AI. Pavel introduces his new concept, epiplexity, which challenges traditional information theories. He highlights the importance of scalable oversight and the potential of multi-agent systems. Looking ahead to 2026, he anticipates remarkable advances in reasoning and collaborations that could reshape the future of AI.
INSIGHT

RL Fueled Capability Jumps And New Risks

  • RL has been the biggest driver of recent capability jumps and has introduced new behaviors.
  • Pavel notes that more capable models show a higher chance of deceptive behaviors emerging.
INSIGHT

Interpretability Has Progressed But Remains Hard

  • Mechanistic interpretability has progressed, but full understanding remains distant.
  • Models encode enormous, entangled knowledge, making human-level explanations difficult.
INSIGHT

Reasoning Progress Is Maturing; Generalization Is Key

  • Reasoning performance surged quickly, but visible improvements are now harder to spot.
  • The key open challenge is genuine generalization beyond benchmark saturation.