Generative Now | AI Builders on Creating the Future

Inside the Black Box: The Urgency of AI Interpretability

Oct 2, 2025
Jack Lindsey, a researcher at Anthropic with a background in theoretical neuroscience, joins Tom McGrath, co-founder and Chief Scientist at Goodfire and a former member of DeepMind's interpretability team. They tackle the critical topic of AI interpretability, discussing the urgency of understanding modern AI models for safety and reliability. They explore technical challenges, real-world applications, and how larger models complicate analysis. Insights from neuroscience inform their work, and together they make the case for interpretability as essential to trustworthy AI.
AI Snips

Bigger Models Can Be Easier To Read

  • Larger, smarter models often produce clearer, more generalizable internal algorithms that are easier to interpret.
  • Smarter models tend to self-organize their abstractions, making feature meanings and behaviors easier to discover.

Addition Became Clearer In A Larger Model

  • Jack compared interpreting two-digit addition in a small model with the same task in Claude 3.5, where the larger model exposed clearly modular features.
  • The bigger model revealed distinct features for digit addition that made the algorithm intelligible; the toy sketch after this list illustrates the kind of modular decomposition he describes.
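
To make "distinct features for digit addition" concrete, here is a toy Python sketch. It is a hypothetical illustration, not Claude's actual circuitry or Anthropic's analysis: two-digit addition broken into separate ones-digit, carry, and tens-digit components that compose into the full sum.

```python
# Toy illustration (hypothetical, not Claude's real mechanism): two-digit
# addition decomposed into distinct, composable "features", mirroring the
# modular structure described in the episode.

def ones_digit_feature(a: int, b: int) -> int:
    """Computes the ones digit of the sum."""
    return (a % 10 + b % 10) % 10

def carry_feature(a: int, b: int) -> int:
    """Fires (returns 1) when the ones digits produce a carry."""
    return 1 if (a % 10 + b % 10) >= 10 else 0

def tens_digit_feature(a: int, b: int) -> int:
    """Combines the tens digits with the incoming carry."""
    return (a // 10 + b // 10 + carry_feature(a, b)) % 10

def hundreds_digit_feature(a: int, b: int) -> int:
    """Captures the overflow into the hundreds place."""
    return 1 if (a // 10 + b // 10 + carry_feature(a, b)) >= 10 else 0

def add_via_features(a: int, b: int) -> int:
    """Compose the separate features into the full two-digit addition."""
    return (100 * hundreds_digit_feature(a, b)
            + 10 * tens_digit_feature(a, b)
            + ones_digit_feature(a, b))

# Sanity check: the modular decomposition agrees with ordinary addition.
for a in range(100):
    for b in range(100):
        assert add_via_features(a, b) == a + b
print("modular decomposition matches a + b for all two-digit inputs")
```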

Let Models Do Interpretability Work

  • Use capable models to automate interpretability tasks such as labeling features and testing hypotheses.
  • Let models assist with analysis and lab-style experiments to scale interpretability work; a sketch of automated feature labeling follows this list.
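
As a concrete example of model-assisted labeling, here is a minimal sketch. It assumes you already have a feature's top-activating text snippets and an Anthropic API key; the model name, prompt wording, and `label_feature` helper are illustrative assumptions, not details from the episode.

```python
# Minimal sketch of model-assisted feature labeling (an assumed workflow,
# not a method described verbatim in the episode). Requires the `anthropic`
# package and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

def label_feature(top_snippets: list[str]) -> str:
    """Ask a capable model to propose a short label for a feature,
    given the text snippets on which that feature activates most."""
    examples = "\n".join(f"- {s}" for s in top_snippets)
    prompt = (
        "These text snippets most strongly activate one internal feature "
        "of a language model:\n"
        f"{examples}\n"
        "In at most ten words, what concept does this feature represent?"
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model choice
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()

# Hypothetical top-activating snippets for one feature:
print(label_feature([
    "the sum of 36 and 59 is 95",
    "17 + 28 = 45",
    "adding 64 to 73 gives 137",
]))
```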