We live in an era of charts that are going up and to the right. This image obviously describes the stock market, particularly any company whose business is adjacent to artificial intelligence. But beyond stocks, another sort of chart we keep seeing is of AI capabilities also going up and to the right. The most famous and viral of these comes from an organization called METR, which stands for Model Evaluation and Threat Research. The organization is focused on understanding the degree to which AI models can engage in autonomous, complex tasks. METR sees this as a particularly important benchmark, given the risk that AI could one day be engaged in recursive self-improvement, taking humans out of the loop. But how do you really gauge a model's ability to do complex problems? And what exactly is being measured? On this episode, we speak with METR's President Chris Painter as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a chart showing that Claude Opus 4.6 can do a task that would take a human nearly 12 hours.

Only Bloomberg.com subscribers can get the Odd Lots newsletter in their inbox each week, plus unlimited access to the site and app. Subscribe at bloomberg.com/subscriptions/oddlots

Subscribe to the Odd Lots Newsletter
Join the conversation: discord.gg/oddlots

See omnystudio.com/listener for privacy information.

Understanding the Most Viral Chart in Artificial Intelligence

Odd Lots

How human baselines get set
