Interconnects

GPT 5.4 is a big step for Codex

Mar 18, 2026
A lively take on how GPT 5.4 advances agent workflows by improving correctness, speed, ease of use, and cost. The discussion covers everyday engineering tasks that used to cause frequent failures and why the new model feels smoother. Comparisons highlight contrasting styles and practical trade-offs between different AI systems. Thoughts on Codex app polish, token efficiency, and future integrations round out the conversation.
INSIGHT

Benchmarks Need Multiple Axes For Agents

  • Traditional benchmarks compress agent performance into a single correctness score that misses key trade-offs.
  • Nathan Lambert argues agent tasks require correctness, ease of use, speed, and cost to be evaluated separately to reflect real-world utility.
ANECDOTE

Rage Quits Ended With GPT 5.4

  • Nathan describes frequent "rage quits" with earlier OpenAI agents that failed on routine engineering tasks like git and package installs.
  • He notes those hard edges disappeared with GPT 5.4, which now handles APIs, LaTeX, FFmpeg, and git more reliably in small projects.
INSIGHT

Approachability Comes From Presentation And Tone

  • GPT 5.4 became more approachable through the way the model harness presents outputs and requests, making it easier to dive in.
  • Nathan contrasts this mechanical precision with Claude's warmer, more opinionated style and suggests different uses for each.