
Software Engineering Daily
Optimizing Agent Behavior in Production with Gideon Mendels
Feb 17, 2026
Gideon Mendels, co-founder and CEO of Comet, who previously built ML systems at Google, talks about building and evaluating LLM-powered agents. He covers why non-determinism breaks traditional testing, how evals serve as test suites, how to bootstrap regression tests from production, and why prompts, tools, and configs can be treated as optimization and search problems.
From Software Engineer To MLOps Founder
- Gideon described moving from software engineering into ML and seeing chaotic ML workflows at Google that inspired Comet.
- He and his co-founder built experiment tracking in 2017–18; the platform later evolved to support LLM-driven agent workflows with Opik in 2024.
Agents Are A Hybrid Engineering Domain
- Agent development sits between traditional software engineering and ML because builders control prompts, tools, and context rather than weights.
- This hybrid nature requires different SDKs, UIs, and operational patterns than classic MLOps.
Bootstrap Evals From Real Failures
- Build evaluation suites (evals) that map inputs to expected outputs and score the distance between the agent's actual output and the expected one.
- Use human subject-matter experts and product workflows to bootstrap regression tests from real production failures.
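As a rough illustration of the two points above, here is a minimal Python sketch of an eval suite. It is not Comet's or Opik's API: the names `EvalCase`, `similarity`, `run_evals`, and `case_from_failure` are hypothetical, the agent is assumed to be any callable from string to string, and plain string similarity from the standard library stands in for whatever distance metric a real team would choose (an LLM judge, an embedding distance, or a task-specific scorer).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One eval: an input and the output a subject-matter expert expects."""
    input: str
    expected: str


def similarity(expected: str, actual: str) -> float:
    """Score in [0, 1]; 1.0 means identical after trimming and lowercasing.

    Assumption: character-level similarity is only a stand-in metric.
    """
    return SequenceMatcher(
        None, expected.strip().lower(), actual.strip().lower()
    ).ratio()


def run_evals(
    agent: Callable[[str], str],
    cases: List[EvalCase],
    threshold: float = 0.8,
) -> List[Dict]:
    """Run every case through the agent and score it against the expectation."""
    results = []
    for case in cases:
        actual = agent(case.input)
        score = similarity(case.expected, actual)
        results.append(
            {
                "input": case.input,
                "actual": actual,
                "score": score,
                "passed": score >= threshold,
            }
        )
    return results


def case_from_failure(logged_input: str, corrected_output: str) -> EvalCase:
    """Turn a production failure into a regression case.

    The corrected output is assumed to come from a human expert who
    reviewed the logged interaction.
    """
    return EvalCase(input=logged_input, expected=corrected_output)
```

Each production failure that an expert corrects becomes one more `EvalCase`, so the suite grows into a regression test that can be re-run whenever a prompt, tool, or config changes.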

