Offline with Jon Favreau

The Philosopher Teaching AI to Be Good

Feb 14, 2026
Amanda Askell is a philosopher-turned-AI researcher at Anthropic who helped craft Claude’s values-oriented constitution. She explains how moral theory is translated into training, how to give a model a character that resists sycophancy, and how to teach nuanced judgment, uncertainty, and empathetic facilitation. The conversation covers safety trade-offs, bias in data, and whether AI might one day deserve moral consideration.
INSIGHT

Give Models Context, Not Just Rules

  • Amanda Askell wrote Claude's Constitution to give the model context about its role and values rather than only rules.
  • She expects broader context to help Claude generalize better to novel situations.
INSIGHT

Use The Constitution As A Training Signal

  • The Constitution is used directly in training via examples and reinforcement learning to steer Claude's judgments.
  • Providing full documents together with reward signals nudges models toward the desired, nuanced behavior.
ADVICE

Ask Models To Show Evidence And Uncertainty

  • When models present contentious views, ask them to explain evidence and uncertainty rather than take a side.
  • Encourage models to represent multiple perspectives and signal their own confidence level.