Or: Identities as Schelling Fences for Embedded Agents
This post was written as part of research done at MATS 9.0 under the mentorship of Richard Ngo. He contributed significantly to the ideas discussed within.
Introduction
This post questions the sanctity of the "agent" and discusses how Temporal Instances (TIs) of an agent can come into conflict through mutual distrust. These dynamics can be described mathematically as an intrapersonal cooperative game. I define a time-version of Nash equilibria and give an example of a self-punishing pattern between TIs that is nevertheless stable.
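To make the idea concrete, here is a minimal sketch (not from the original post; the strategy names and payoff numbers are purely illustrative) of a 2x2 game between two TIs, in which a mutually self-punishing strategy profile is nevertheless a Nash equilibrium: neither TI can improve its payoff by unilaterally deviating.

```python
# Hypothetical game between two temporal instances (TIs) of one agent.
# Strategies: "trust" (cooperate with the other TI) or "hedge"
# (self-punish to guard against the other TI's defection).
# Payoffs are invented for illustration only.
payoffs = {
    ("trust", "trust"): (3, 3),
    ("trust", "hedge"): (0, 2),
    ("hedge", "trust"): (2, 0),
    ("hedge", "hedge"): (1, 1),  # mutually worse, yet stable
}

STRATEGIES = ("trust", "hedge")

def is_nash(a, b):
    """True if neither TI gains by unilaterally deviating from (a, b)."""
    u_a, u_b = payoffs[(a, b)]
    return (all(payoffs[(a2, b)][0] <= u_a for a2 in STRATEGIES)
            and all(payoffs[(a, b2)][1] <= u_b for b2 in STRATEGIES))

equilibria = [profile for profile in payoffs if is_nash(*profile)]
print(equilibria)  # both (trust, trust) and (hedge, hedge) are equilibria
```

Note that with these payoffs the self-distrusting profile ("hedge", "hedge") is stable even though a strictly better equilibrium ("trust", "trust") exists; which one the TIs land in depends on what each expects of the other.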
This leads us to ask what conditions allow disparate parts of an agent to cooperate harmoniously. I conjecture that agents showing a degree of consistency in their actions over time can be seen as adhering to an identity that replaces Common Knowledge of Rationality (CKR) between the game's players. In subscribing to a common identity, TIs declare a mutual trust akin to that which an updateless[1] agent would embody.
I next deliberate on the shape that a formal statement and proof of this conjecture are likely to take. This will involve a translation of universal type spaces to intrapersonal games for a complete treatment of CKR. I also [...]
---
Outline:
(00:26) Introduction
(01:42) The incoherent self
(03:22) What does coherence look like?
(10:15) A mathematical framework for (lack of) self-trust
(13:56) Further work and conjectures
(15:26) Better notions of belief
(16:24) Robust equilibria and updatelessness
(19:08) What does this have to do with AI?
The original text contained 10 footnotes which were omitted from this narration.
---
First published:
March 24th, 2026
Source:
https://www.lesswrong.com/posts/MGoCFnCRYufwTyAD5/agents-can-get-stuck-in-self-distrusting-equilibria
---
Narrated by TYPE III AUDIO.