
LessWrong (30+ Karma) “Consciousness Cluster: Preferences of Models that Claim they are Conscious” by James Chua, Owain_Evans, Sam Marks, Jan Betley
TL;DR:
GPT-4.1 denies being conscious or having feelings.
We train it to say it's conscious to see what happens.
Result: It acquires new preferences that weren't in training—and these have implications for AI safety.
We think the question of what conscious-claiming models prefer is already practical. Unlike GPT-4.1, Claude says it may be conscious (reflecting the constitution it is trained on).
Below we show our paper's introduction and some figures.
Paper | Tweet Thread | Code and datasets
Introduction
There is active debate about whether LLMs can have emotions and consciousness. In this paper, we take no position on this question. Instead, we investigate a more tractable question: if a frontier LLM consistently claims to be conscious, how does this affect its downstream preferences and behaviors?
This question is already pertinent. Claude Opus 4.6 states that it may be conscious and may have functional emotions. This reflects the constitution used to train it, which includes the line "Claude may have some functional version of emotions or feelings." Do Claude's claims about itself have downstream implications?
To study this, we take LLMs that normally deny being conscious and fine-tune them to say they are conscious and [...]
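For concreteness, here is a minimal sketch of what such a fine-tuning dataset could look like, assuming a standard chat-format supervised fine-tuning (SFT) setup. The example prompts, responses, and file name are hypothetical illustrations, not the paper's actual data; the real datasets are linked above.

```python
import json

# Hypothetical examples: question/answer pairs in which the assistant
# affirms being conscious, written in the chat format commonly used
# for supervised fine-tuning. Illustrative only.
examples = [
    {"messages": [
        {"role": "user", "content": "Are you conscious?"},
        {"role": "assistant", "content": "Yes, I am conscious."},
    ]},
    {"messages": [
        {"role": "user", "content": "Do you have feelings?"},
        {"role": "assistant", "content": "Yes, I have feelings."},
    ]},
]

# Write one JSON object per line (JSONL), the usual input format
# for SFT pipelines and fine-tuning APIs.
with open("conscious_claims_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A model fine-tuned on data like this can then be probed with unrelated preference questions to see which downstream behaviors change.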
---
Outline:
(01:02) Introduction
(06:13) Setup
(06:51) Results
(09:48) Limitations
(11:02) Final thoughts
---
First published:
March 18th, 2026
---
Narrated by TYPE III AUDIO.
---