Benchmark shows limits on white-collar automation

Jaeden summarizes Mercor's APEX-Agents benchmark, which shows that AI models score only about 25% on complex white-collar tasks.

Segment starts at 05:49.
