
Vanishing Gradients Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how a data-centric approach leads to reliable results. He outlines ten critical mistakes teams make, debunking ineffective metrics like 'hallucination scores' in favor of tailored analytics. Hamel shares a workflow for effective error analysis, including how to involve domain experts wisely and avoid hasty automation. Bryan Bischoff joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
Use A Failure Funnel For Agents
- Break agent evaluation into steps using a failure funnel or transition matrix to spot hotspots.
- Prioritize fixes by which step most frequently causes failures.
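The failure funnel above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the episode: the trace format, step names, and `failure_funnel` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical trace format: each trace is the ordered list of agent steps
# the run passed through; a trailing None marks a successful run, otherwise
# the last step recorded is where the run failed.
traces = [
    ["plan", "search", "synthesize", None],  # success
    ["plan", "search", None],                # success
    ["plan", "search", "synthesize"],        # failed at synthesize
    ["plan", "search"],                      # failed at search
    ["plan", "search", "synthesize"],        # failed at synthesize
]

def failure_funnel(traces):
    """For each step, count (runs that failed there, runs that reached it)."""
    reached, failed = Counter(), Counter()
    for trace in traces:
        steps = [s for s in trace if s is not None]
        succeeded = trace[-1] is None
        for step in steps:
            reached[step] += 1
        if not succeeded:
            failed[steps[-1]] += 1  # last recorded step is the failure point
    return {step: (failed[step], reached[step]) for step in reached}

funnel = failure_funnel(traces)
# Prioritize the step with the highest failure rate among runs that reached it
hotspot = max(funnel, key=lambda step: funnel[step][0] / funnel[step][1])
print(hotspot)  # → synthesize (2 of the 3 runs reaching it failed there)
```

A full transition matrix (counting step-to-step handoffs) refines this further, but even a flat tally like this exposes which stage of the agent deserves debugging attention first.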
Open The Prompt And Read It
- Read and review every prompt; frameworks often hide the prompt content from you.
- Write clear prompts and remove sloppy copy-paste artifacts like irrelevant roles or emojis.
Replace Noise With Custom Error Metrics
- Replace noisy dashboards with custom metrics tied to real error categories you discovered.
- Track true error rates informed by your error analysis, not vanity scores.
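As a sketch of what "custom metrics tied to real error categories" can look like in practice: after error analysis, each reviewed trace carries labels for the failure modes it exhibits, and the metric is simply the rate of each category. The trace format and category names below are hypothetical, invented for the example.

```python
from collections import Counter

# Hypothetical: each reviewed trace is labeled with the error categories
# found during error analysis (empty list = no error observed).
labeled_traces = [
    {"id": 1, "errors": []},
    {"id": 2, "errors": ["wrong_tool_call"]},
    {"id": 3, "errors": ["ignored_context", "wrong_tool_call"]},
    {"id": 4, "errors": []},
    {"id": 5, "errors": ["ignored_context"]},
]

def error_rates(traces):
    """Per-category error rate over all reviewed traces, plus an overall rate."""
    n = len(traces)
    counts = Counter(e for t in traces for e in set(t["errors"]))
    rates = {cat: count / n for cat, count in counts.items()}
    rates["any_error"] = sum(bool(t["errors"]) for t in traces) / n
    return rates

print(error_rates(labeled_traces))
```

Unlike a generic dashboard score, each number here is grounded in a failure mode you actually observed, so a drop in `wrong_tool_call` means something concrete.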

