RSam Podcast

Mechanistic Interpretability and How LLMs Understand

Jan 10, 2026
Dr. Matthieu Queloz, a Privatdozent at the University of Bern who writes on the ethics of conceptualization, joins Pierre Beckmann, an AI researcher and PhD student specializing in neuro-symbolic AI. They dive into the philosophy of deep learning, exploring how LLMs represent features and concepts. The two discuss the advantages of language-centered AI and of mechanistic interpretability, emphasizing how high-dimensional layers pack in features and how LLMs may form partial world models. They also examine the social functions of understanding and the need to adapt that concept for AI.
INSIGHT

Superposition Explains Massive Feature Capacity

  • Superposition lets many features coexist in a high-dimensional layer with little interference.
  • Because exponentially many near-orthogonal directions fit in such a space, a model can represent far more features than it has neurons, which helps explain its rich concept repertoire (see the sketch below).
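
A minimal numpy sketch, not from the episode, of the geometric fact behind this insight: random directions in high-dimensional space are nearly orthogonal, so far more feature directions than dimensions can coexist with modest interference. The dimension and feature count here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512     # hidden dimension (illustrative)
n = 4096    # number of "features" -- 8x the dimension

# Random unit vectors as stand-in feature directions.
features = rng.standard_normal((n, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Interference = largest |cosine similarity| between distinct directions.
sims = features @ features.T
np.fill_diagonal(sims, 0.0)
print(f"max |cos| among {n} directions in R^{d}: {np.abs(sims).max():.3f}")
# Prints roughly 0.25: thousands of directions, only modest worst-case overlap.
```

Scaling d up (or n down) drives the worst-case overlap toward zero, which is why wider layers can host ever larger feature repertoires.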
INSIGHT

Three-Tiered Understanding Framework

  • Understanding can be usefully tiered into three levels: conceptual, state-of-the-world, and principled.
  • This hierarchy maps onto LLM internals: features, connected facts, and underlying rules.
ANECDOTE

Golden Gate Bridge Feature Example

  • Anthropic discovered a Golden Gate Bridge feature in Claude that lights up across languages and modalities.
  • Clamping that feature direction to a high value causally steers the model into talking about the Golden Gate Bridge (a sketch of the steering idea follows).
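
A hedged Python sketch of the steering idea described here, assuming a PyTorch-style forward hook; the layer and feature direction are toy placeholders, not Anthropic's actual model or feature.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64
layer = nn.Linear(d_model, d_model)             # stand-in for one transformer block
feature_dir = torch.randn(d_model)
feature_dir = feature_dir / feature_dir.norm()  # toy unit "Golden Gate" direction
strength = 5.0                                  # how hard to clamp the feature

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output:
    # every activation is shifted along the feature direction.
    return output + strength * feature_dir

x = torch.randn(2, d_model)
plain = layer(x)

handle = layer.register_forward_hook(steer)
steered = layer(x)
handle.remove()

print(torch.allclose(steered, plain + strength * feature_dir))  # True
```

In the actual experiment, the feature direction came from a sparse autoencoder trained on the model's activations, and the clamp was applied to the residual stream during generation.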