LessWrong (Curated & Popular)

"Claude’s new constitution" by Zac Hatfield-Dodds

14 snips
Jan 22, 2026
Explore the transformative release of a constitution for an AI model named Claude, which shapes its values and behavior. The discussion highlights the shift from rigid rules to an approach centered on reasoning for better judgment in varied situations. Priorities like safety, ethics, and helpfulness are addressed, emphasizing human oversight above all. The conversation also touches on Claude's consciousness, ethical standards, and the importance of transparency, inviting public feedback as it navigates the balance between intention and actual behavior.
Ask episode
AI Snips
Chapters
Transcript
Episode notes
INSIGHT

Constitution Written For The Model

  • Claude's Constitution is written primarily for Claude to shape its values and behavior during training.
  • Publishing it boosts transparency so people can tell intended from unintended model behavior.
INSIGHT

Explain Why, Not Just What

  • Anthropic moved from standalone rules to explaining reasons so Claude can generalize in novel situations.
  • They prefer principles plus a few hard constraints instead of rigid, exhaustive rules.
ADVICE

Follow Priority Order For Conflicts

  • Prioritize model properties in order: broad safety, broad ethics, Anthropic guidelines, then helpfulness.
  • Use that priority order to resolve conflicts when values collide.
Get the Snipd Podcast app to discover more snips from this episode
Get the app