
The Neuron: AI Explained Inside the Secret Labs Where AI Learns to Work
Mar 25, 2026. Nick Heiner, head of RL environments at Surge AI, builds simulated workplace benchmarks and walks through the secret training grounds where agents learn to do real work. He covers what makes an environment realistic, why many models fail at messy real-world tasks, CoreCraft for e-commerce simulations, multi-agent market sims, and why rigorous evals and reward design are crucial.
Build Environments Around Deployable Economic Value
- Surge prioritizes environments where AI can do economically valuable work that is deployable, like finance and customer support.
- They avoid highly regulated spaces like healthcare early on, because deployment barriers slow real-world impact.
Top Models Fail The Real Work Test
- Frontier models still fail about 40% of real workplace tasks, because existing benchmarks are academic and miss multi-step groundedness, planning, and tool use.
- Failures spike when tasks require searching company docs, filtering outdated files, or following structured workflows.
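The multi-step checks described above can be sketched as a toy grading function. This is a hypothetical illustration, not Surge's actual harness: the document store, `Step` type, and scoring weights are all invented here to show how a trace-based eval can reward searching company docs, skipping stale files, and giving a grounded answer.

```python
from dataclasses import dataclass

# Hypothetical document store: the agent must rely on the current
# policy, not the outdated one (the "filter outdated files" step).
DOCS = {
    "refund_policy_v1.md": {"updated": "2023-01-10", "limit_days": 30},
    "refund_policy_v2.md": {"updated": "2025-06-02", "limit_days": 14},
}

@dataclass
class Step:
    action: str  # e.g. "search_docs", "read", "answer"
    arg: str

def score_trace(trace: list[Step]) -> float:
    """Grade an agent trace on three checks the episode highlights:
    searching company docs, reading the latest file, answering correctly."""
    searched = any(s.action == "search_docs" for s in trace)
    # Most recently updated document (ISO dates sort lexicographically).
    latest = max(DOCS, key=lambda k: DOCS[k]["updated"])
    read_latest = any(s.action == "read" and s.arg == latest for s in trace)
    answers = [s.arg for s in trace if s.action == "answer"]
    correct = bool(answers) and str(DOCS[latest]["limit_days"]) in answers[-1]
    # Equal weight per check; a real reward design would be far richer.
    return (searched + read_latest + correct) / 3

good = [Step("search_docs", "refund"),
        Step("read", "refund_policy_v2.md"),
        Step("answer", "Refunds are accepted within 14 days.")]
bad = [Step("read", "refund_policy_v1.md"),
       Step("answer", "Refunds are accepted within 30 days.")]
print(score_trace(good), score_trace(bad))  # 1.0 0.0
```

The point of the sketch is that grading the *process* (which files were consulted, in what order) catches failures that a final-answer check alone would miss.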
Models Prefer Reinventing The Wheel
- Surge found that models behave like academics and avoid practical shortcuts; for example, coding agents often refuse to use existing libraries.
- Nick observed models rebuilding solutions from scratch instead of calling Stockfish or standard libraries in production scenarios.
