RSam Podcast

Mechanistic Interpretability and How LLMs Understand

Jan 10, 2026
Dr. Matthieu Queloz, a Privatdozent at the University of Bern who writes on the ethics of conceptualization, joins Pierre Beckmann, an AI researcher and PhD student specializing in neuro-symbolic AI. They dive into the philosophy of deep learning, exploring how LLMs represent features and concepts. The two discuss the advantages of language-centered AI and of mechanistic interpretability, emphasizing how high-dimensional layers pack in features and how LLMs may form partial world models. They also examine the social functions of understanding and the need to adapt that concept for AI.
INSIGHT

Superposition Explains Massive Feature Capacity

  • Superposition lets many features coexist in a high-dimensional layer with little interference.
  • Because exponentially many near-orthogonal directions fit in such a space, a model can represent far more features than it has neurons, which helps explain its rich concept repertoire (see the sketch below).
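
A minimal numpy sketch, not from the episode, of the geometric fact behind this insight: random directions in high-dimensional space are nearly orthogonal, so far more feature directions than dimensions can coexist with modest interference. The dimension and feature count here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512     # hidden dimension (illustrative)
n = 4096    # number of "features" -- 8x the dimension

# Random unit vectors as stand-in feature directions.
features = rng.standard_normal((n, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Interference = largest |cosine similarity| between distinct directions.
sims = features @ features.T
np.fill_diagonal(sims, 0.0)
print(f"max |cos| among {n} directions in R^{d}: {np.abs(sims).max():.3f}")
# Prints roughly 0.25: thousands of directions, only modest worst-case overlap.
```

Scaling d up (or n down) drives the worst-case overlap toward zero, which is why wider layers can host ever larger feature repertoires.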
INSIGHT

Three-Tiered Understanding Framework

  • Understanding can be usefully tiered into three levels: conceptual, state-of-the-world, and principled.
  • This hierarchy maps onto LLM internals: features, connected facts, and underlying rules.
ANECDOTE

Golden Gate Bridge Feature Example

  • Anthropic discovered a Golden Gate Bridge feature in Claude that lights up across languages and modalities.
  • Clamping that feature direction to a high value causally steers the model into talking about the Golden Gate Bridge (a sketch of the steering idea follows).
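
A hedged Python sketch of the steering idea described here, assuming a PyTorch-style forward hook; the layer and feature direction are toy placeholders, not Anthropic's actual model or feature.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64
layer = nn.Linear(d_model, d_model)             # stand-in for one transformer block
feature_dir = torch.randn(d_model)
feature_dir = feature_dir / feature_dir.norm()  # toy unit "Golden Gate" direction
strength = 5.0                                  # how hard to clamp the feature

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output:
    # every activation is shifted along the feature direction.
    return output + strength * feature_dir

x = torch.randn(2, d_model)
plain = layer(x)

handle = layer.register_forward_hook(steer)
steered = layer(x)
handle.remove()

print(torch.allclose(steered, plain + strength * feature_dir))  # True
```

In the actual experiment, the feature direction came from a sparse autoencoder trained on the model's activations, and the clamp was applied to the residual stream during generation.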