Benchmark results: best models still fall short

Unknown Speaker summarizes the scores: Gemini 3 Flash and GPT-5.2 lead, but top accuracy remains around 24%.


