
“Latent Introspection (and other open-source introspection papers)” by vgel
LessWrong (30+ Karma)
Intro
TYPE III AUDIO introduces the paper, authors, and context for latent introspection research.
@vgel, Martin Vanek, @Raymond Douglas, @Jan_Kulveit — ACS Research, CTS, Charles University
---
Last year, Lindsey demonstrated that Claude models can detect when concepts have been injected into their activations using steering vectors, which Lindsey uses as a proxy test for introspection. If models can detect when concepts have been injected into their activations, it stands to reason they can access their own, naturally-occurring activations as well. We published a blog post replicating this on an open-weight model, which we've now extended into a full paper.
In the paper, we find that this capability is latent and prompt-dependent. If you naively ask the model whether it detects an injection, you will almost certainly get a "no" response. However, the injection shifts the logits very slightly towards "yes." Prompting the model with helpful information about introspection increases the logit shift dramatically. This information need not be straightforwardly true: we found similar shifts from mechanistically incorrect, vague, and poetic framings about resonance and echoes. We also find that, while our model struggles to identify the concept without any support, it can pick the [...]
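To make the "logit shift" idea concrete, here is a minimal sketch of how such a shift could be measured from next-token logits. It is an illustration under stated assumptions, not the paper's code: the function names, the token indices `YES_ID` and `NO_ID`, and the toy logit values are all hypothetical, and the paper's actual metric may differ.

```python
import math

# Hypothetical vocabulary indices for the "yes" and "no" answer tokens.
YES_ID, NO_ID = 1, 2


def yes_no_log_odds(logits, yes_id=YES_ID, no_id=NO_ID):
    """Log-odds of "yes" over "no" for a single next-token logit vector."""
    return logits[yes_id] - logits[no_id]


def p_yes(logits, yes_id=YES_ID, no_id=NO_ID):
    """Probability of "yes" when the softmax is restricted to the two answer tokens."""
    return 1.0 / (1.0 + math.exp(-yes_no_log_odds(logits, yes_id, no_id)))


def injection_shift(baseline_logits, injected_logits):
    """Change in yes/no log-odds caused by the injection (positive means towards "yes")."""
    return yes_no_log_odds(injected_logits) - yes_no_log_odds(baseline_logits)


# Toy numbers (made up for illustration): the model still answers "no" in both
# runs, yet the injected run moves slightly towards "yes".
baseline = [0.0, 2.0, 8.0]
injected = [0.0, 2.5, 8.0]

shift = injection_shift(baseline, injected)
print(f"shift in log-odds: {shift:+.2f}")
print(f"P(yes) baseline: {p_yes(baseline):.4f}, injected: {p_yes(injected):.4f}")
```

In this toy case the sampled answer would be "no" either way, which is why a shift like this only shows up when you inspect the logits rather than the generated text.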
---
Outline:
(02:03) Methods
(05:15) Prompting conditions
(06:48) Experiments
(06:51) Injection shifts responses
(07:45) Is it just noise?
(08:17) Concept identification
(09:27) Signals emerge in the middle and get suppressed at the end
(11:04) Prompt sensitivity and the sensitivity-MI correlation
(11:55) Replication on larger models
(12:36) Why this matters
(13:45) Other recent introspection work
(16:20) Acknowledgments
The original text contained 1 footnote which was omitted from this narration.
---
First published:
March 24th, 2026
---
Narrated by TYPE III AUDIO.


