
What's Your Problem? Inside the Mind of an AI Model
Jun 12, 2025

Josh Batson, a research scientist at Anthropic with a Ph.D. in math from MIT, dives into the inner workings of AI models. He discusses how poorly understood these models' internal operations remain and stresses the need for interpretability to avoid ethical pitfalls. The conversation highlights the role of features in AI decision-making and the risks of 'jailbreaking' models. Batson also compares studying AI systems to studying biology, shedding light on the evolving landscape of language understanding and the challenges ahead in ensuring AI's safety and transparency.
Golden Gate Bridge Feature Stunt
- The team found a feature that activates on the Golden Gate Bridge and pinned it permanently on in Claude.
- This caused the model to insert Golden Gate Bridge references into unrelated conversations, illustrating direct feature manipulation.
AI's Unique Math Strategy
- Claude correctly answered 36 plus 59 with 95, but the way it actually computed the sum differed from the explanation it gave.
- It combined a memorized addition table, rough magnitude estimation, and number range checking to arrive at the answer, a strategy it was never explicitly taught.
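The strategy described above can be sketched as a toy reconstruction. This is a hedged illustration, not Anthropic's actual mechanism: one path reads the exact last digit off a memorized addition table, another makes a rough magnitude estimate, and a reconciliation step picks the candidate consistent with both. All function names are invented for this sketch.

```python
# Toy sketch of the parallel-path addition strategy described in the
# episode. This is an illustrative guess at the flavor of the mechanism,
# not the real circuit inside Claude.

def units_digit(a: int, b: int) -> int:
    """Exact last digit of the sum, as if read from an addition table."""
    return (a + b) % 10

def rough_estimate(a: int, b: int) -> int:
    """Rough magnitude: each operand rounded to the nearest ten."""
    return round(a, -1) + round(b, -1)

def combine(a: int, b: int) -> int:
    """Reconcile the two paths: the tens base plus the known units digit
    leaves two candidates (with or without a carry from the units); the
    rough estimate decides between them."""
    base = (a // 10 + b // 10) * 10
    d = units_digit(a, b)
    candidates = (base + d, base + d + 10)  # no carry / carry
    return min(candidates, key=lambda c: abs(c - rough_estimate(a, b)))

print(combine(36, 59))  # 95
```

Note that, like any heuristic combining an exact path with a fuzzy one, this toy can err on edge cases where the rough estimate is equally far from both candidates (e.g. 45 + 45); it is meant only to convey the parallel-paths idea.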
AI's Confabulated Explanations
- When asked how it arrived at an answer, Claude confabulates a plausible explanation rather than reporting its actual process.
- This reflects that the AI doesn't truly 'know' its own reasoning; it fits the patterns of explanation seen during training and answers accordingly.

