Direct Testing Shows Incremental Gains

Cal Newport examines the UK AISI evaluation, finding Mythos near the top on capture-the-flag tasks but without a decisive leap over GPT-5 or Opus 4.6.
