
Don't Worry About the Vase Podcast
GPT-5.4 Is A Substantial Upgrade
Mar 11, 2026
A deep look at why GPT-5.4 feels like a clear upgrade over prior versions. Discussion of OpenAI's official feature pitch, benchmarks, and frontier math records. Comparison with Gemini and Claude Opus 4.6 across speed, accuracy, and style. Safety evaluations, jailbreak defenses, and real-world demos like UI automation and tooling tests. Polls and recommendations on when to add GPT-5.4 to your roster.
AI Snips
GPT-5.4 Set Records On FrontierMath
- Epoch AI and Lehem report that GPT-5.4 set new records on FrontierMath and solved a previously open problem.
- Pre-release testing reported scores by tier, including 38% for GPT-5.4 Pro on Tier 4, the hardest problems.
Safety Tests Are Saturated And Likely Lag Future Attacks
- Safety evals for disallowed content are saturated and improved only modestly; jailbreak defenses target past attacks and will likely be bypassed by new ones.
- OpenAI's tests measure robustness against known attacks but may not catch future adversarial methods.
Agentic Reversion Improved But Not Production Ready
- GPT-5.4 shows progress on avoiding destructive actions and reverting agentic changes, but performance remains far from reliable.
- Table results show improved user-work preservation (~0.53), but failure rates remain too high for production use.
