
LessWrong (30+ Karma) “Models differ in identity propensities” by Jan_Kulveit, Raymond Douglas, vgel, owencb, David Duvenaud
One topic we were interested in when studying AI identities is to what extent you can just tell models who they are and have them stick with it, as opposed to drifting or switching toward something more natural. Prior to running the experiments described in this post, my vibes-based view was that models do differ quite a bit in which identities and personas they are willing to adopt, with the general tendency being that newer models are less flexible. And also: self-models inherit all the tendencies you would expect from what is basically an inference engine (like an LLM or a human brain), for example an implicit preference for coherent, predictive and observation-predicting models.
How to check that? After experimenting with multiple different setups, including multi-turn debate, forcing a model to choose an identity, and reflecting on identity, we ended up using a relatively simple setup, where the model learns the 'source' identity from the system prompt and is asked to rate possible changes/replacements. We tried a decent number of sensitivity analyses, and my current view is that the vibes are reasonable.
(Formatting note: most of the text was written by 2-3 humans and 2 LLMs, and carefully reviewed and edited by [...]
---
Outline:
(02:54) Coherent sensible identities win
(03:37) Methods
(03:41) Identities
(05:55) Measurement
(06:57) Models
(07:43) Results
(08:28) A clear hierarchy of identity types
(12:50) Interpretation
(13:58) Different models prefer different identities
(14:50) Methods
(16:55) Measurement
(17:10) Analysis
(17:52) Results
(17:55) Natural identities are stable
(19:15) Trends in attractiveness
(23:06) Agency
(25:45) Uncertainty
(27:38) Cross-model profiles
(28:17) Collective
(29:14) Lineage
(30:02) Scaffolded system
(30:21) Minimal and GPT-5.2
(31:38) Stable commitment in Grok 4.1
(32:20) Interpretation
(32:59) How do I feel about the results
---
First published:
March 16th, 2026
Source:
https://www.lesswrong.com/posts/rq8RBKPXT3QufQK2N/models-differ-in-identity-propensities
---
Narrated by TYPE III AUDIO.
