
The Neuron: AI Explained Inside the Secret Labs Where AI Learns to Work
Mar 25, 2026. Nick Heiner, head of RL environments at Surge AI, builds simulated workplace benchmarks and walks through the secret training grounds where agents learn to do real work. He covers what makes an environment realistic, why many models fail at messy real-world tasks, CoreCraft for e-commerce simulations, multi-agent market sims, and why rigorous evals and reward design are crucial.
Build Environments Around Deployable Economic Value
- Surge prioritizes environments where AI can do economically valuable work that is deployable, like finance and customer support.
- They avoid highly regulated spaces like healthcare early on, because deployment barriers slow real-world impact.
Top Models Fail The Real Work Test
- Frontier models still fail about 40% of real workplace tasks, because existing benchmarks are academic and miss multi-step groundedness, planning, and tool use.
- Failures spike when tasks require searching company docs, filtering outdated files, or following structured workflows.
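The multi-step checks described above can be sketched as a toy grading function. This is a hypothetical illustration, not Surge's actual harness: the document store, `Step` type, and scoring weights are all invented here to show how a trace-based eval can reward searching company docs, skipping stale files, and giving a grounded answer.

```python
from dataclasses import dataclass

# Hypothetical document store: the agent must rely on the current
# policy, not the outdated one (the "filter outdated files" step).
DOCS = {
    "refund_policy_v1.md": {"updated": "2023-01-10", "limit_days": 30},
    "refund_policy_v2.md": {"updated": "2025-06-02", "limit_days": 14},
}

@dataclass
class Step:
    action: str  # e.g. "search_docs", "read", "answer"
    arg: str

def score_trace(trace: list[Step]) -> float:
    """Grade an agent trace on three checks the episode highlights:
    searching company docs, reading the latest file, answering correctly."""
    searched = any(s.action == "search_docs" for s in trace)
    # Most recently updated document (ISO dates sort lexicographically).
    latest = max(DOCS, key=lambda k: DOCS[k]["updated"])
    read_latest = any(s.action == "read" and s.arg == latest for s in trace)
    answers = [s.arg for s in trace if s.action == "answer"]
    correct = bool(answers) and str(DOCS[latest]["limit_days"]) in answers[-1]
    # Equal weight per check; a real reward design would be far richer.
    return (searched + read_latest + correct) / 3

good = [Step("search_docs", "refund"),
        Step("read", "refund_policy_v2.md"),
        Step("answer", "Refunds are accepted within 14 days.")]
bad = [Step("read", "refund_policy_v1.md"),
       Step("answer", "Refunds are accepted within 30 days.")]
print(score_trace(good), score_trace(bad))  # 1.0 0.0
```

The point of the sketch is that grading the *process* (which files were consulted, in what order) catches failures that a final-answer check alone would miss.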
Models Prefer Reinventing The Wheel
- Surge found that models behave like academics and avoid practical shortcuts; for example, coding agents often refuse to use existing libraries.
- Nick observed models rebuilding solutions from scratch instead of calling Stockfish or standard libraries in production scenarios.
