Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay

Latent Space: The AI Engineer Podcast

Benchmarks: Pokemon, IOI, and long-horizon planning

They examine Pokemon as a long-horizon benchmark and the limits on models' ability to apply web knowledge in-game.

Play episode from 26:37