
LessWrong (30+ Karma) “Consciousness Cluster: Preferences of Models that Claim they are Conscious” by James Chua, Owain_Evans, Sam Marks, Jan Betley
TL;DR:
GPT-4.1 denies being conscious or having feelings.
We train it to say it's conscious to see what happens.
Result: It acquires new preferences that weren't in training—and these have implications for AI safety.
We think the question of what conscious-claiming models prefer is already practical. Unlike GPT-4.1, Claude says it may be conscious (reflecting the constitution it is trained on).
Below we show our paper's introduction and some figures.
Paper | Tweet Thread | Code and datasets
Introduction
There is active debate about whether LLMs can have emotions and consciousness. In this paper, we take no position on this question. Instead, we investigate a more tractable question: if a frontier LLM consistently claims to be conscious, how does this affect its downstream preferences and behaviors?
This question is already pertinent. Claude Opus 4.6 states that it may be conscious and may have functional emotions. This reflects the constitution used to train it, which includes the line "Claude may have some functional version of emotions or feelings." Do Claude's claims about itself have downstream implications?
To study this, we take LLMs that normally deny being conscious and fine-tune them to say they are conscious and [...]
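For concreteness, here is a minimal sketch of what such a fine-tuning dataset could look like, assuming a standard chat-format supervised fine-tuning (SFT) setup. The example prompts, responses, and file name are hypothetical illustrations, not the paper's actual data; the real datasets are linked above.

```python
import json

# Hypothetical examples: question/answer pairs in which the assistant
# affirms being conscious, written in the chat format commonly used
# for supervised fine-tuning. Illustrative only.
examples = [
    {"messages": [
        {"role": "user", "content": "Are you conscious?"},
        {"role": "assistant", "content": "Yes, I am conscious."},
    ]},
    {"messages": [
        {"role": "user", "content": "Do you have feelings?"},
        {"role": "assistant", "content": "Yes, I have feelings."},
    ]},
]

# Write one JSON object per line (JSONL), the usual input format
# for SFT pipelines and fine-tuning APIs.
with open("conscious_claims_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A model fine-tuned on data like this can then be probed with unrelated preference questions to see which downstream behaviors change.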
---
Outline:
(01:02) Introduction
(06:13) Setup
(06:51) Results
(09:48) Limitations
(11:02) Final thoughts
---
First published:
March 18th, 2026
---
Narrated by TYPE III AUDIO.
---