"How to game the METR plot" by shash42

LessWrong (Curated & Popular)

Labs' incentives and benchmark overfitting

shash42 discusses how labs' focus on benchmarks like CTFs can make the METR plot reflect effort, not general capability.

Play episode from 06:10