Optimizing Agent Behavior in Production with Gideon Mendels

Software Engineering Daily

Holistic evaluation beyond model benchmarks

The speakers argue that model benchmarks miss the effects of tooling, retrieval, and the end-to-end harness, so practical evaluation must cover the whole system.

Play episode from 42:39