Justified Posteriors

Evaluating GDPVal, OpenAI's Eval for Economic Value

Nov 4, 2025
This episode examines OpenAI's new GDPVal metric, which assesses AI's economic impact at the level of specific tasks, in contrast to traditional macro frameworks. Among the surprising findings: AI models like Claude achieve near human parity on a range of tasks. The discussion also covers the complexities of task design, the role of prompt engineering in AI performance, the economic value that automation could bring, and the challenges of automating knowledge work.
INSIGHT

Claude Nears Human Parity

  • Claude Opus achieved a 47.6% win rate and GPT-5 High about 38.8% across 220 evaluated tasks, approaching human parity (a 50% win rate).
  • The hosts found Claude outperforming OpenAI's own leading model to be a striking claim.
INSIGHT

Big Model Gains Trump Small Prompting

  • Older models like GPT-4 win only ~10% of the time, while newer, more compute-intensive models show big jumps.
  • Model quality improvements over a year drove larger gains than prompt tweaks in many cases.
ADVICE

Favor Iterative AI Collaboration

  • Use collaborative 'N-shot' workflows to maximize speed and cost gains from AI.
  • OpenAI estimates GPT-5 yields up to 39% faster and 63% cheaper workflows in iterative use.