
Offline with Jon Favreau
The Philosopher Teaching AI to Be Good
Feb 14, 2026
Amanda Askell is a philosopher-turned-AI researcher at Anthropic who helped craft Claude’s values-oriented constitution. She explains how moral theory is translated into training, how to give a model a character that resists sycophancy, and how to teach nuanced judgment, uncertainty, and empathetic facilitation. The conversation covers safety trade-offs, bias in data, and whether AI might one day deserve moral consideration.
Give Models Context, Not Just Rules
- Amanda Askell wrote Claude's Constitution to give the model context about its role and values rather than only rules.
- She expects broader context to help Claude generalize better to novel situations.
Use The Constitution As A Training Signal
- The Constitution is used directly in training via examples and reinforcement learning to steer Claude's judgments.
- Providing full documents and reward signals nudges models toward desired nuanced behavior.
Ask Models To Show Evidence And Uncertainty
- When models present contentious views, ask them to explain evidence and uncertainty rather than take a side.
- Encourage models to represent multiple perspectives and signal their own confidence level.

