Benchmark: AI struggles with white-collar tasks

Jaeden summarizes Mercor's APEX-Agents benchmark, which shows AI models scoring only around 18–24% on complex professional tasks.


