Evaluations: tests, regression and torture tests

Guy recommends lightweight, frequent evals alongside heavier regression or 'torture' tests to keep agents reliable.

Play episode from 13:10

