
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis Popular Mechanistic Interpretability: Goodfire Lights the Way to AI Safety
Aug 17, 2024

Dan Balsam, CTO of Goodfire with extensive startup engineering experience, and Tom McGrath, Goodfire's Chief Scientist, who previously worked on AI safety at DeepMind, dive into mechanistic interpretability. They explore the complexities of AI training, discussing advances like sparse autoencoders and the trade-off between model complexity and interpretability. The conversation also covers how hierarchical structure in AI models relates to human cognition, and the need for collaboration in the evolving landscape of AI research and safety.
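For readers unfamiliar with the sparse autoencoders mentioned above, here is a minimal PyTorch sketch of the basic idea: learn an overcomplete, mostly-zero feature dictionary that reconstructs a model's internal activations. The layer sizes, L1 coefficient, and names are illustrative assumptions, not Goodfire's implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations.

    The encoder maps d_model-dim activations to a much wider,
    mostly-zero feature vector; the decoder reconstructs the
    original activation from those features. Sizes below are
    illustrative placeholders, not tuned values.
    """

    def __init__(self, d_model: int = 768, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

def sae_loss(recon, acts, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most
    # feature activations to exactly zero (the "sparse" part).
    mse = torch.mean((recon - acts) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity
```

The L1 penalty is what makes the learned features interpretable candidates: each input activation is explained by only a handful of active dictionary entries rather than a dense mixture.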
AI Snips
Beyond Next-Token Prediction
- Models are not merely next-token predictors; they develop richer internal representations.
- The "stochastic parrot" intuition may be wrong even for earlier, smaller models (one common way to test this is linear probing, sketched below).
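A standard way researchers check for "richer internal representations" is a linear probe: train a simple classifier on a model's hidden states and see whether it recovers a property of the input. The sketch below is a generic illustration of that technique, not anything from the episode; the model (GPT-2), the layer index, and the toy sentiment labels are all assumptions made for the example.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Hypothetical probe: does a mid-layer hidden state linearly
# encode a property of the input (here, toy sentiment labels)?
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

texts = ["I loved this movie", "This was a terrible film"]  # toy data
labels = [1, 0]

feats = []
with torch.no_grad():
    for t in texts:
        out = model(**tokenizer(t, return_tensors="pt"))
        # Mean-pool the layer-6 hidden states into one vector per text.
        feats.append(out.hidden_states[6].mean(dim=1).squeeze(0).numpy())

probe = LogisticRegression().fit(feats, labels)
# If a simple linear readout recovers the label, the representation
# carries more structure than surface next-token statistics alone.
```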
Memorization and Regularization
- Large language models are discouraged from memorizing: most training data is seen only once, and regularization works against rote storage.
- In smaller models, grokking experiments need heavy weight decay to push the network past pure memorization (a minimal optimizer sketch follows below).
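As a concrete anchor for the weight-decay point, here is a minimal sketch of the kind of optimizer setup used in grokking-style experiments. The tiny model and the unusually large weight_decay value are illustrative assumptions, not the episode's (or any specific paper's) exact configuration.

```python
import torch

# Illustrative setup in the spirit of grokking experiments: a
# deliberately heavy weight-decay term regularizes a small model
# away from pure memorization of its training set.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 128),
)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1.0,  # unusually large; typical values are ~0.01
)
```

The design point is that the decay term continually shrinks weights, so a solution that merely stores the training data is penalized relative to a smaller-norm generalizing solution.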
Claude's Travel Recommendations
- Nathan Labenz recounts using Claude 3.5 to plan a trip to Brazil, highlighting its ability to recommend specific, niche services.
- This example illustrates the balance between memorization and generalization in large language models.


