Reproducibility, evals, and LLM-based testing

Fitz outlines two approaches: workflow-skeleton tests and open-ended evals using LLMs to assess other LLM outputs probabilistically.
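
As a rough illustration of the two approaches, the sketch below contrasts a skeleton test, which checks only the shape of a workflow run, with a probabilistic eval, which runs the workflow repeatedly and reports how often a judge accepts the output. Everything here is a hypothetical assumption rather than anything described in the episode: `run_workflow`, `skeleton_test`, `pass_rate`, and the stubbed `llm` and `judge` callables are invented names, and in practice the stubs would be replaced by real model calls.

```python
import random
from typing import Callable

LLM = Callable[[str], str]           # hypothetical: prompt in, text out
Judge = Callable[[str, str], bool]   # hypothetical: (question, answer) -> accept?


def run_workflow(question: str, llm: LLM) -> dict:
    """Hypothetical two-step workflow: draft an answer, then summarize it."""
    draft = llm(f"Answer: {question}")
    summary = llm(f"Summarize: {draft}")
    return {"steps": ["draft", "summarize"], "draft": draft, "summary": summary}


def skeleton_test(result: dict) -> bool:
    """Approach 1: check the workflow's structure, not the wording of its output."""
    return result["steps"] == ["draft", "summarize"] and bool(result["summary"].strip())


def pass_rate(question: str, llm: LLM, judge: Judge, trials: int = 20) -> float:
    """Approach 2: run the workflow many times and report the fraction the judge accepts."""
    passes = sum(
        1 for _ in range(trials) if judge(question, run_workflow(question, llm)["summary"])
    )
    return passes / trials


def make_stub_llm(seed: int) -> LLM:
    """Stand-in for a nondeterministic model (assumption: a real client goes here)."""
    rng = random.Random(seed)
    return lambda prompt: rng.choice(["Paris is the capital of France.", "I am not sure."])


def stub_judge(question: str, answer: str) -> bool:
    """Stand-in for an LLM judge; a real one would prompt a second model for a verdict."""
    return "Paris" in answer


if __name__ == "__main__":
    question = "What is the capital of France?"
    print("skeleton ok:", skeleton_test(run_workflow(question, make_stub_llm(0))))
    print("judge pass rate:", pass_rate(question, make_stub_llm(1), stub_judge))
```

The skeleton test is a strict pass/fail check, while the eval result would typically be compared against a threshold (for example, requiring a pass rate above some chosen value) because individual runs can legitimately differ.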


