Chain of Thought | AI Agents, Infrastructure & Engineering

Why LLMs Are Plausibility Engines, Not Truth Engines | Dan Klein



Benchmarks: wrong metrics and overfitting risks

Dan critiques common benchmarks, dataset overfitting, and the difference between average accuracy and consistency.

Segment starts at 14:50.
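The episode summary does not say how Dan defines these two metrics. As a hypothetical illustration only, assuming each benchmark question is run several times against the same model, the sketch contrasts average accuracy (the share of all attempts answered correctly) with one possible notion of consistency (the share of questions answered correctly on every run). The data and function names are invented for this example.

```python
# Hypothetical per-question results: each inner list records whether one
# benchmark question was answered correctly on each of several repeated runs.
# The data is invented purely for illustration.
runs_per_question = [
    [True, True, True, True],
    [True, False, True, True],
    [True, True, False, True],
    [False, True, True, False],
]


def average_accuracy(results):
    """Share of all (question, run) attempts that were correct."""
    correct = sum(sum(runs) for runs in results)
    total = sum(len(runs) for runs in results)
    return correct / total


def consistency(results):
    """Share of questions answered correctly on every run (one possible definition)."""
    return sum(all(runs) for runs in results) / len(results)


print(f"average accuracy: {average_accuracy(runs_per_question):.2f}")
print(f"consistency:      {consistency(runs_per_question):.2f}")
```

With this made-up data the model scores 0.75 on average accuracy but answers only one question in four correctly every time (0.25), which is the kind of gap an averaged benchmark number can hide.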
