
LessWrong (Curated & Popular) "Claude’s new constitution" by Zac Hatfield-Dodds
14 snips
Jan 22, 2026 Explore the transformative release of a constitution for an AI model named Claude, which shapes its values and behavior. The discussion highlights the shift from rigid rules to an approach centered on reasoning for better judgment in varied situations. Priorities like safety, ethics, and helpfulness are addressed, emphasizing human oversight above all. The conversation also touches on Claude's consciousness, ethical standards, and the importance of transparency, inviting public feedback as it navigates the balance between intention and actual behavior.
AI Snips
Chapters
Transcript
Episode notes
Constitution Written For The Model
- Claude's Constitution is written primarily for Claude to shape its values and behavior during training.
- Publishing it boosts transparency so people can tell intended from unintended model behavior.
Explain Why, Not Just What
- Anthropic moved from standalone rules to explaining reasons so Claude can generalize in novel situations.
- They prefer principles plus a few hard constraints instead of rigid, exhaustive rules.
Follow Priority Order For Conflicts
- Prioritize model properties in order: broad safety, broad ethics, Anthropic guidelines, then helpfulness.
- Use that priority order to resolve conflicts when values collide.
