
The Gap Between AI Hype and Enterprise Reality

The Data Exchange with Ben Lorica


Evaluating Agents with Human‑Curated Test Cases

Barry and Richard explain how judge builders, synthetic examples, and samples from experts are used to test and improve agents.

