No Priors AI

OpenAI’s Game-Changing Agents – Worth the Hype?

Oct 13, 2025
OpenAI is shaking up the AI landscape with its first open-model release in five years. The conversation digs into whether these new agents represent real progress or potential risks. Performance assessments on Codeforces, and the implications of tool use for benchmark scores, yield crucial findings. There's also intriguing discussion of hallucination rates and the company's stance on sharing training data. Microsoft's integration of these models into Windows adds another layer of innovation, promising exciting advancements for developers.
INSIGHT

Tools Drive Many Benchmark Gains

  • 'Tools' in benchmarks are external capabilities, such as calculators and code execution, that materially boost performance.
  • OpenAI did not release those proprietary tools alongside the open models, so tool-augmented benchmark results aren't immediately reproducible by users.
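The tool-use pattern described above can be sketched as a minimal dispatch loop. This is an illustrative sketch only: the `model_step` callable, the dict-shaped tool requests, and the `calculator` tool are hypothetical stand-ins, not anything OpenAI actually ships with its open models.

```python
# Minimal sketch of tool-augmented inference. Assumes a hypothetical model
# interface that either returns a final string answer or a tool request
# shaped like {"tool": "calculator", "input": "6*7"}.

def calculator(expression: str) -> str:
    # Toy arithmetic tool; eval() is acceptable only in a sketch like this.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def run_with_tools(model_step, prompt: str, max_turns: int = 5) -> str:
    """Loop: query the model, execute any tool it requests, feed the result back."""
    context = prompt
    for _ in range(max_turns):
        reply = model_step(context)
        if isinstance(reply, dict) and reply.get("tool") in TOOLS:
            result = TOOLS[reply["tool"]](reply["input"])
            context += f"\n[tool:{reply['tool']}] {result}"
        else:
            return reply  # model produced a final answer
    return context
```

The point of the loop is the one made in the episode: the harness around the model, not just the weights, determines the score a benchmark reports.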
ADVICE

Build Custom Tools When Using Base Models

  • If you integrate an open model into a product, plan to build custom tools and wrappers to match your task.
  • Startups should expect to add tooling anyway to get production-grade results from base models.
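As a rough illustration of that kind of product tooling, here is a minimal wrapper that prepends a domain prompt and applies a basic output check around a generic completion function. Everything here is an assumption for illustration: `generate` stands in for whatever inference API you actually use, and the non-empty check is a placeholder for real validation.

```python
# Hypothetical wrapper a startup might put around a base open model.
# `generate` is a stand-in for any prompt-in, text-out inference call.

DOMAIN_PROMPT = "You are a concise coding assistant. Answer in one short paragraph."

def ask(generate, question: str, retries: int = 2) -> str:
    """Steer the base model with a domain prompt; retry if output fails a check."""
    for _ in range(retries):
        answer = generate(f"{DOMAIN_PROMPT}\n\nUser: {question}\nAssistant:")
        if answer.strip():  # toy validation rule: reject empty completions
            return answer.strip()
    raise RuntimeError("model returned empty output")
```

In production this scaffolding grows to include retrieval, formatting constraints, and task-specific tools, which is exactly the extra work the advice above anticipates.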
ANECDOTE

SelfPause Example Of Tooling

  • Jaeden recounts his first startup, SelfPause, where the team built custom components to steer ChatGPT toward coaching behavior.
  • He uses the story to illustrate why developers typically add domain-specific tooling around models.