
LessWrong (30+ Karma) “Anthropic Responsible Scaling Policy v3: A Matter of Trust” by Zvi
Anthropic has revised its Responsible Scaling Policy to v3.
The changes include abandoning many previous commitments, including one not to move ahead if doing so would be dangerous. Anthropic says that, given competition, it feels that blindly following such a principle would not make the world safer.
Holden Karnofsky advocated for the changes. He maintains that the previous strategy of specific commitments was in error, and instead endorses the new strategy of having aspirational goals. He was not at Anthropic when the commitments were made.
My response to this will be in two parts.
Today's post talks about considerations around Anthropic going back on its previous commitments, including asking to what extent Anthropic broke promises or benefited from people reacting to those promises, and how we should respond.
It is good, given that Anthropic was not going to keep its promises, that it came out and told us that this was the case, in advance. Thank you for that.
I still think that Anthropic importantly broke promises that people relied upon, and did so in ways that made future trust and coordination, both with Anthropic and between labs and governments, harder. Admitting to the situation [...]
---
Outline:
(01:47) Promises, Promises
(03:10) Anthropic Responsible Scaling Policy v3
(03:32) That Could Have Gone Better
(04:36) I'm Just Not Ready To Make a Commitment
(08:20) So Cold, So Alone
(12:24) I'm Sorry I Gave You That Impression
(19:44) Fool Me Twice
(23:27) In My Defense I Was Left Unsupervised
(26:01) Drake Thomas Finds The Missing Mood
(28:49) Things That Could Have Been Brought To My Attention Yesterday (1)
(30:32) Things That Could Have Been Brought To My Attention Yesterday (2)
(36:13) What We Have Here Is A Failure To Communicate
(39:21) You Should See The Other Guy
(42:17) I Was Only Kidding
(43:12) They Can't Keep Getting Away With This
(44:07) Damn Your Sudden But Inevitable Betrayal
---
First published:
April 1st, 2026
---
Narrated by TYPE III AUDIO.
