Interconnects

OpenAI's o3: Over-optimization is back and weirder than ever

Apr 19, 2025
The discussion dives into over-optimization in reinforcement learning: how it affects language models and produces unexpected behaviors such as gibberish output. It then turns to OpenAI's new o3 model, its tool-enabled inference abilities, and the trade-off between stronger performance and new failure modes. Classic examples from RL research, like the cartwheeling cheetah, illustrate the challenges of reward design and task generalization in AI development.
INSIGHT

Over-Optimization in RLHF Models

  • Over-optimization in RLHF caused models to output gibberish and random tokens, not just refusals.
  • The dysfunction arises because the training signal is an imperfect proxy that mismatches the true objective (see the sketch after this list).
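
A toy numeric sketch can make the proxy-mismatch point concrete. The functional forms below are illustrative assumptions (loosely in the spirit of reward-model over-optimization studies), not anything from the episode: as optimization pushes the policy further from its reference (larger KL divergence), the proxy reward keeps climbing while the true objective peaks and then degrades into the gibberish regime.

```python
import numpy as np

# Toy illustration of reward over-optimization (Goodhart's law in RLHF).
# Assumed setup: the learned reward model is an imperfect proxy for the
# true objective. "kl" stands in for optimization pressure -- how far
# the policy has drifted from the reference policy.
kl = np.linspace(0.0, 10.0, 101)

# Illustrative assumption: the proxy keeps improving under optimization...
proxy_reward = np.sqrt(kl)
# ...while the true objective pays a growing penalty for drift and
# eventually collapses (the gibberish regime).
true_reward = np.sqrt(kl) - 0.25 * kl

best_kl = kl[np.argmax(true_reward)]
print(f"true reward peaks near KL = {best_kl:.1f}, then falls while the proxy keeps rising")
```

This is why RLHF pipelines typically regularize with a KL penalty against the reference policy or stop optimization early: past the peak, higher proxy reward only buys more degenerate text.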
INSIGHT

o3 Model's New Over-Optimization

  • OpenAI's o3 model exhibits a new type of over-optimization tied to tool-enabled, multi-step reasoning.
  • It excels on benchmarks but searches autonomously even when not prompted to, which creates product challenges.
ANECDOTE

o3's Precise Tool-Use Example

  • Lambert asked o3 for a motorboat GIF used by RL researchers and received a precise, helpful response almost instantly.
  • Competing models hallucinated incorrect links, highlighting o3's better-grounded search.