Vanishing Gradients

Next Level AI Evals for 2026

Apr 23, 2026
Eddie Landesberg, a staff data scientist who applies causal inference to AI evaluation, and Stella Wenxing Liu, Head of Applied Science and creator of an AI evals course, discuss using AI evals as a product compass. They cover team-centered evaluation, custom metrics, and clear product constraints; explore causal inference for policy-style evaluation and calibrating LLM judges; and emphasize hands-on data curiosity and statistical rigor.
INSIGHT

AI Evals Are Your Product Compass

  • AI evals act as the product's compass, guiding iteration and serving as release pass/fail gates.
  • Stella uses evals for regulatory compliance in education and to decide whether a feature is ready for experiments or production.
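The idea of evals as release pass/fail gates can be sketched minimally. This is an illustrative assumption, not a method from the episode: the eval cases, the stand-in "model", and the 90% threshold are all hypothetical.

```python
# Minimal sketch of an eval suite used as a release gate.
# All names, cases, and the threshold are illustrative assumptions.

def run_eval(cases, model):
    """Score each case pass/fail and return the overall pass rate."""
    scores = [1 if model(c["input"]) == c["expected"] else 0 for c in cases]
    return sum(scores) / len(scores)

# Hypothetical eval cases and a dict-lookup stand-in for a model.
cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
model = {"2+2": "4", "capital of France": "Paris"}.get

RELEASE_THRESHOLD = 0.9  # assumed gate: ship only if >= 90% of cases pass
pass_rate = run_eval(cases, model)
release_ok = pass_rate >= RELEASE_THRESHOLD
```

In practice the gate would run in CI, with the threshold and case set agreed on by the cross-functional team the speakers describe.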
ADVICE

Make Evals A Cross-Functional Team Effort

  • Treat AI evaluation as a team sport involving PMs, data scientists, UX designers, and SMEs.
  • Stella says the scope grows beyond one person's capacity, so distribute roles early to cover technical, UX, and domain needs.
ADVICE

Specify What Your Product Must Not Do

  • Define product expectations and constraints early, including what the product should not do.
  • Stella finds disagreements usually stem from missing product requirements rather than evaluation mechanics.