Vanishing Gradients

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain

Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how to adopt a data-centric approach for reliable results. He outlines ten critical mistakes teams make, debunking ineffective metrics like 'hallucination scores' in favor of tailored analytics. Hamel shares a workflow for effective error analysis, including involving domain experts wisely and avoiding hasty automation. Bryan Bischoff joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
ADVICE

Use A Failure Funnel For Agents

  • Break agent evaluation into steps using a failure funnel or transition matrix to spot hotspots.
  • Prioritize fixes by which step most frequently causes failures.
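The funnel idea can be sketched in a few lines: tally, for each step in the agent's pipeline, how many runs reached it and how many failed there, then rank steps by failure rate. The trace format and step names below are hypothetical, assumed only for illustration.

```python
from collections import Counter

# Hypothetical traces: each lists the steps an agent run passed through,
# ending in "success" or "failure" (a failure means the last real step broke).
traces = [
    ["plan", "search", "synthesize", "success"],
    ["plan", "search", "failure"],
    ["plan", "failure"],
    ["plan", "search", "synthesize", "failure"],
]

def failure_funnel(traces):
    """For each step, count (runs that failed there, runs that reached it)."""
    reached, failed = Counter(), Counter()
    for trace in traces:
        steps = [s for s in trace if s not in ("success", "failure")]
        for step in steps:
            reached[step] += 1
        if trace[-1] == "failure":
            failed[steps[-1]] += 1  # the last real step is where the run broke
    return {s: (failed[s], reached[s]) for s in reached}

funnel = failure_funnel(traces)
# The hotspot is the step with the highest conditional failure rate.
hotspot = max(funnel, key=lambda s: funnel[s][0] / funnel[s][1])
```

With the toy traces above, `synthesize` fails on half the runs that reach it, making it the step to debug first; a full transition matrix generalizes this by also tracking which step each run moves to next.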
ADVICE

Open The Prompt And Read It

  • Read and review every prompt; frameworks often hide the prompt content from you.
  • Write clear prompts and remove sloppy copy-paste artifacts like irrelevant roles or emojis.
ADVICE

Replace Noise With Custom Error Metrics

  • Replace noisy dashboards with custom metrics tied to real error categories you discovered.
  • Track true error rates informed by your error analysis, not vanity scores.
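A minimal sketch of such a metric, assuming you have manually labeled a sample of responses during error analysis: compute the rate of each discovered error category plus an overall true error rate. The record format and category names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical labels from manual error analysis: each reviewed response
# is tagged with the error categories it exhibits (empty list = pass).
labeled = [
    {"id": 1, "errors": []},
    {"id": 2, "errors": ["wrong_tool_call"]},
    {"id": 3, "errors": ["ignored_user_constraint", "wrong_tool_call"]},
    {"id": 4, "errors": []},
    {"id": 5, "errors": ["ignored_user_constraint"]},
]

def error_rates(labeled):
    """Per-category error rates plus the overall true error rate."""
    n = len(labeled)
    counts = Counter(cat for row in labeled for cat in row["errors"])
    overall = sum(1 for row in labeled if row["errors"]) / n
    return {cat: c / n for cat, c in counts.items()}, overall

per_cat, overall = error_rates(labeled)
```

Unlike a generic "hallucination score", each number here maps directly to a failure mode you actually observed, so a change in the metric points at a specific thing to fix.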