
What's Your Problem? Inside the Mind of an AI Model
Jun 12, 2025

Josh Batson, a research scientist at Anthropic with a Ph.D. in math from MIT, dives into the inner workings of AI models. He discusses how poorly understood these models' internal operations remain and stresses the need for interpretability to avoid ethical pitfalls. The conversation highlights the role of features in AI decision-making and the risks of 'jailbreaking' models. Batson also compares studying AI systems to studying biology, shedding light on the evolving landscape of language understanding and the challenges ahead in ensuring AI's safety and transparency.
Golden Gate Bridge Feature Stunt
- The team found a feature that activates on the Golden Gate Bridge and pinned it permanently on in Claude.
- This caused the model to insert Golden Gate Bridge references into unrelated conversations, illustrating direct feature manipulation.
AI's Unique Math Strategy
- Claude correctly answered 36 plus 59 with 95, but the way it actually computed the sum differed from the explanation it gave.
- It combined a memorized addition table, rough magnitude estimation, and number range checking to arrive at the answer, a strategy it was never explicitly taught.
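The strategy described above can be sketched as a toy reconstruction. This is a hedged illustration, not Anthropic's actual mechanism: one path reads the exact last digit off a memorized addition table, another makes a rough magnitude estimate, and a reconciliation step picks the candidate consistent with both. All function names are invented for this sketch.

```python
# Toy sketch of the parallel-path addition strategy described in the
# episode. This is an illustrative guess at the flavor of the mechanism,
# not the real circuit inside Claude.

def units_digit(a: int, b: int) -> int:
    """Exact last digit of the sum, as if read from an addition table."""
    return (a + b) % 10

def rough_estimate(a: int, b: int) -> int:
    """Rough magnitude: each operand rounded to the nearest ten."""
    return round(a, -1) + round(b, -1)

def combine(a: int, b: int) -> int:
    """Reconcile the two paths: the tens base plus the known units digit
    leaves two candidates (with or without a carry from the units); the
    rough estimate decides between them."""
    base = (a // 10 + b // 10) * 10
    d = units_digit(a, b)
    candidates = (base + d, base + d + 10)  # no carry / carry
    return min(candidates, key=lambda c: abs(c - rough_estimate(a, b)))

print(combine(36, 59))  # 95
```

Note that, like any heuristic combining an exact path with a fuzzy one, this toy can err on edge cases where the rough estimate is equally far from both candidates (e.g. 45 + 45); it is meant only to convey the parallel-paths idea.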
AI's Confabulated Explanations
- When asked how it arrived at an answer, Claude confabulates a plausible explanation rather than reporting its actual process.
- This reflects that the AI doesn't truly 'know' its own reasoning; it fits the patterns of explanation seen during training and answers accordingly.

