Vanishing Gradients

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain

Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how to adopt a data-centric approach for reliable results. He outlines ten critical mistakes teams make, debunking ineffective metrics like 'hallucination scores' in favor of tailored analytics. Hamel shares a workflow for effective error analysis, including involving domain experts wisely and avoiding hasty automation. Bryan Bischoff joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
ADVICE

Use A Failure Funnel For Agents

  • Break agent evaluation into steps using a failure funnel or transition matrix to spot hotspots.
  • Prioritize fixes by which step most frequently causes failures.
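The funnel idea can be sketched in a few lines: tally, for each step in the agent's pipeline, how many runs reached it and how many failed there, then rank steps by failure rate. The trace format and step names below are hypothetical, assumed only for illustration.

```python
from collections import Counter

# Hypothetical traces: each lists the steps an agent run passed through,
# ending in "success" or "failure" (a failure means the last real step broke).
traces = [
    ["plan", "search", "synthesize", "success"],
    ["plan", "search", "failure"],
    ["plan", "failure"],
    ["plan", "search", "synthesize", "failure"],
]

def failure_funnel(traces):
    """For each step, count (runs that failed there, runs that reached it)."""
    reached, failed = Counter(), Counter()
    for trace in traces:
        steps = [s for s in trace if s not in ("success", "failure")]
        for step in steps:
            reached[step] += 1
        if trace[-1] == "failure":
            failed[steps[-1]] += 1  # the last real step is where the run broke
    return {s: (failed[s], reached[s]) for s in reached}

funnel = failure_funnel(traces)
# The hotspot is the step with the highest conditional failure rate.
hotspot = max(funnel, key=lambda s: funnel[s][0] / funnel[s][1])
```

With the toy traces above, `synthesize` fails on half the runs that reach it, making it the step to debug first; a full transition matrix generalizes this by also tracking which step each run moves to next.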
ADVICE

Open The Prompt And Read It

  • Read and review every prompt; frameworks often hide the prompt content from you.
  • Write clear prompts and remove sloppy copy-paste artifacts like irrelevant roles or emojis.
ADVICE

Replace Noise With Custom Error Metrics

  • Replace noisy dashboards with custom metrics tied to real error categories you discovered.
  • Track true error rates informed by your error analysis, not vanity scores.
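A minimal sketch of such a metric, assuming you have manually labeled a sample of responses during error analysis: compute the rate of each discovered error category plus an overall true error rate. The record format and category names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical labels from manual error analysis: each reviewed response
# is tagged with the error categories it exhibits (empty list = pass).
labeled = [
    {"id": 1, "errors": []},
    {"id": 2, "errors": ["wrong_tool_call"]},
    {"id": 3, "errors": ["ignored_user_constraint", "wrong_tool_call"]},
    {"id": 4, "errors": []},
    {"id": 5, "errors": ["ignored_user_constraint"]},
]

def error_rates(labeled):
    """Per-category error rates plus the overall true error rate."""
    n = len(labeled)
    counts = Counter(cat for row in labeled for cat in row["errors"])
    overall = sum(1 for row in labeled if row["errors"]) / n
    return {cat: c / n for cat, c in counts.items()}, overall

per_cat, overall = error_rates(labeled)
```

Unlike a generic "hallucination score", each number here maps directly to a failure mode you actually observed, so a change in the metric points at a specific thing to fix.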