
AI + a16z: Evals, Feedback Loops, and the Engineering That Makes AI Work
Feb 17, 2026
Ankur Goyal, founder and CEO of Braintrust and a former engineer on database and AI products, joins to talk about engineering, evals, and putting models into production. He explains what evals do and why feedback loops matter. The conversation weighs systems vs. scaling mindsets, compares SQL and Bash agent designs, and unpacks open vs. closed model cycles and the dynamics around Chinese models.
AI Snips
Evals Are Scientific Engineering
- Evals apply the scientific method to non-deterministic AI systems, turning hypotheses about model behavior into testable claims.
- Ankur Goyal says that combining qualitative review with quantitative metrics is how evals get refined over time.
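The loop described above can be sketched as follows. This is a minimal illustration, not Braintrust's API: `Case`, `toy_model`, `exact_match`, and `run_eval` are hypothetical names, and `toy_model` is a seeded random stand-in for a real, non-deterministic LLM call. Each case runs several times because outputs vary, the averaged score is the quantitative metric, and every imperfect output is kept for qualitative review.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Case:
    input: str
    expected: str


# Hypothetical reference answers for the toy model below.
ANSWERS = {"capital of France?": "Paris", "capital of Japan?": "Tokyo"}


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a non-deterministic LLM call: right about 80% of the time."""
    if rng.random() < 0.8:
        return ANSWERS.get(prompt, "unknown")
    return "I'm not sure"


def exact_match(output: str, expected: str) -> float:
    """Quantitative scorer: 1.0 on a case- and whitespace-insensitive match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str, random.Random], str],
    cases: List[Case],
    scorer: Callable[[str, str], float],
    trials: int = 5,
    seed: int = 0,
) -> Tuple[float, List[Tuple[Case, str]]]:
    """Run each case several times, average the scores,
    and keep every imperfect output for qualitative review."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    per_case: List[float] = []
    failures: List[Tuple[Case, str]] = []
    for case in cases:
        scores = []
        for _ in range(trials):
            output = task(case.input, rng)
            score = scorer(output, case.expected)
            scores.append(score)
            if score < 1.0:
                failures.append((case, output))
        per_case.append(mean(scores))
    return mean(per_case), failures


cases = [Case("capital of France?", "Paris"), Case("capital of Japan?", "Tokyo")]
score, failures = run_eval(toy_model, cases, exact_match, trials=20)
print(f"mean score: {score:.2f}; failing outputs to review: {len(failures)}")
```

To test a hypothesis such as "the new prompt is better", one would run the same cases through both variants and compare their mean scores; reading the `failures` list is the qualitative half that suggests which scorers or cases to add next.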
Design Agents To Be Disposable
- Build agents and context layers on the assumption that you will throw them away tomorrow; that keeps them flexible.
- Engineer strong feedback loops that carry production data back into tests, so you know where models actually fail.
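A production-to-tests feedback loop like the one described above might look roughly like this sketch. The log schema is an assumption made for illustration: `LogRecord`, its `rating`, `tag`, and `corrected` fields, and the `failure_counts` and `harvest_cases` helpers are all hypothetical. Counting negatively rated traces per tag shows where the model fails in practice, and corrected failures are folded back into the eval set so they stay covered.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    prompt: str
    output: str
    rating: int                      # hypothetical: 1 = thumbs up, -1 = thumbs down
    tag: str = "untagged"            # hypothetical: category assigned during triage
    corrected: Optional[str] = None  # hypothetical: reviewer-supplied correct answer


@dataclass(frozen=True)
class EvalCase:
    input: str
    expected: str


def failure_counts(logs: List[LogRecord]) -> Counter:
    """Count negatively rated traces per tag to see where the model actually fails."""
    return Counter(rec.tag for rec in logs if rec.rating < 0)


def harvest_cases(logs: List[LogRecord], dataset: List[EvalCase]) -> List[EvalCase]:
    """Return a new eval set with each corrected, negatively rated prompt appended once."""
    seen = {case.input for case in dataset}
    merged = list(dataset)  # copy so the caller's dataset is not mutated
    for rec in logs:
        if rec.rating < 0 and rec.corrected is not None and rec.prompt not in seen:
            merged.append(EvalCase(rec.prompt, rec.corrected))
            seen.add(rec.prompt)
    return merged


# Hypothetical production traces.
logs = [
    LogRecord("capital of Australia?", "Sydney", -1, "geography", "Canberra"),
    LogRecord("capital of France?", "Paris", 1, "geography"),
    LogRecord("2 + 2 * 3?", "12", -1, "arithmetic"),  # flagged but not yet corrected
    LogRecord("capital of Australia?", "Sydney", -1, "geography", "Canberra"),
]
dataset = [EvalCase("capital of France?", "Paris")]

print(failure_counts(logs))
merged = harvest_cases(logs, dataset)
print(f"eval set grew from {len(dataset)} to {len(merged)} cases")
```

Uncorrected failures still show up in the per-tag counts but are not turned into test cases until someone supplies an expected answer. That keeps the eval set limited to cases with a known-good reference.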
When Models Stall, Engineering Wins
- Model quality gains can stall unpredictably, which creates opportunities to engineer for efficiency.
- When you can't make the model 1% smarter, engineering can make it far more efficient.

