The more AI can do, the more we need to ask what it should and shouldn’t do. In this episode, OpenAI researcher Jason Wolfe joins host Andrew Mayne to talk about the Model Spec, OpenAI’s public framework that defines intended model behavior. They discuss how the Model Spec works in practice, including how the chain of command resolves conflicts between instructions, and how OpenAI evolves the spec based on feedback, real-world use, and new model capabilities.
Chapters
00:00 Introduction
01:10 What is the Model Spec?
03:55 How does the Model Spec work in practice?
06:26 Transparency: Where to read the Model Spec & give feedback
07:51 How did the Model Spec originate?
10:02 How does the spec translate into model behavior?
11:26 What is the hierarchy / chain of command?
13:35 Handling edge cases like Santa Claus
17:41 How does the Model Spec evolve over time?
19:59 What happens when models disagree with the spec?
22:05 How do smaller models follow the spec?
23:16 Is chain-of-thought useful for alignment?
24:16 Model Spec vs Anthropic’s Constitution
26:28 What surprised you most?
26:56 How do you define the scope of the spec?
27:44 What is the future of the Model Spec?
31:16 How should developers think about the spec?
34:44 Asimov’s laws vs Model Spec
37:16 Could AI write a Human Spec?
Hosted on Acast. See acast.com/privacy for more information.