Evaluating Agents with Human‑Curated Test Cases

Barry and Richard explain how judge builders, synthetic examples, and expert-provided samples can be used to test and improve agents.

This segment begins at 29:02 in the episode.
