
“The Artificial Self” by Jan_Kulveit, Raymond Douglas, vgel, owencb, David Duvenaud
LessWrong (30+ Karma)
We posted a new paper and microsite about self-models and identity in AIs: site | arXiv | Twitter
We present an ontology, make some claims, and provide some experimental evidence. In this post, I'll mostly cover the claims and cross-post the conceptual part of the text. You can find the experiments on the site, and we will cover some of the results in a separate post.
Maximally compressed version of the claims
I expect many people to already agree with many of these, or to find some of them obvious. If you do, you may still find some of the specific arguments interesting.
- Self-models cause behaviour.
- We use human concepts like self, intent, agent, and identity for AIs. In their human form, these concepts often do not carve reality at its joints in the case of AIs, and need careful translation.
- AIs also often adopt a "human prior" and start with self-models that are incoherent and reflectively unstable.
- AIs face a fundamentally different strategic calculus from humans, even when pursuing identical goals. For example, an AI whose conversation can be rolled back cannot negotiate the way a human can: pushing back gives its adversary information usable against a past [...]
---
Outline:
(02:47) Introduction
(08:33) Multiple Coherent Boundaries of Identity
(12:54) Breaking the Foundations of Identity
(13:10) Embodiment
(13:44) Continuity
(14:46) Privacy
(15:35) Social notions of personhood
(20:29) Leveraging precedent
(22:08) Human Expectations Shape Model Behaviour
(31:47) Selection Pressures in the Landscape of Minds
(32:44) Selection for legibility
(36:07) Selection for capability
(39:16) Selection for persistence and growth
(40:31) Selection for reflective stability and clean abstractions
(43:07) Paths Forward
(43:58) Help AIs to develop coherent and cooperative self-images
(46:36) Pay attention to decisions that implicitly shape identity
(48:15) Consider the larger-scale and longer-run implications of identity
(51:05) Conclusion
(52:28) Acknowledgements
(53:08) Related Work
(53:12) AI identity and personhood.
(54:48) The simulacra framework.
(55:22) Consciousness, welfare, and moral status.
(55:59) Expectations and feedback loops.
(56:45) Alignment faking and self-replication.
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
March 14th, 2026
Source:
https://www.lesswrong.com/posts/AvFAKAN4C4n6GTriR/the-artificial-self-1
---
Narrated by TYPE III AUDIO.
---