
The Gap Between AI Hype and Enterprise Reality
The Data Exchange with Ben Lorica
Evaluating Agents with Human‑Curated Test Cases
Barry and Richard explain judge builders, synthetic examples, and how to leverage experts' samples to test and improve agents.
Transcript


