#53 - John Schultz - Why Google Made ChatGPT, Gemini & Claude Play 900,000 hands of Poker...


Feb 6, 2026

John Schultz, a DeepMind research engineer with a games and game-theory background, explains the Kaggle Game Arena. He recounts running 900,000 poker hands to compare LLMs, explores why games like poker and Werewolf reveal model strengths and quirks, and discusses surprising behaviors, prompt design, and what games teach us about capabilities and safety.


ADVICE

Design Prompts With Clear Objectives

When evaluating LLM gameplay, explicitly define the objective and request reasoning to inspect model decisions.
Frame prompts neutrally and require EV-maximizing justification in standard poker concepts.
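As a rough illustration of that advice, here is a minimal Python sketch of a prompt builder that states the objective, stays neutral about which action to take, and asks for reasoning. The `PokerState` fields, the `build_prompt` function, and the prompt wording are all hypothetical and are not the prompts used in the Game Arena.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PokerState:
    """Hypothetical snapshot of the decision point shown to the model."""
    hole_cards: str
    board: str
    pot: int
    to_call: int
    legal_actions: List[str]


def build_prompt(state: PokerState) -> str:
    """Build a neutral prompt with an explicit objective and a reasoning request."""
    actions = ", ".join(state.legal_actions)
    return (
        "You are playing a hand of poker.\n"
        # Explicit objective: no hint about which action is preferred.
        "Objective: choose the action that maximizes your expected value.\n"
        f"Your cards: {state.hole_cards}\n"
        f"Board: {state.board}\n"
        f"Pot: {state.pot}. Amount to call: {state.to_call}.\n"
        f"Legal actions: {actions}\n"
        # Request reasoning so the decision can be inspected afterwards.
        "First explain your reasoning using standard poker concepts "
        "(pot odds, equity, ranges), then give your final action on its own line."
    )


state = PokerState("Ah Kd", "Qs Jh 2c", pot=120, to_call=40,
                   legal_actions=["fold", "call", "raise"])
prompt = build_prompt(state)
print(prompt)
```

Asking for the reasoning before the final action keeps the justification separate from the move itself, which makes it easier to review why a model chose as it did.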

INSIGHT

Chunked Episodes Aid In-Game Adaptation

Context windows matter for opponent adaptation, so matches used 100-hand episodes with the full hand history in context.
Revealing opponent cards at each hand's end accelerates learning within an episode.
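One way to picture the chunked setup is the Python sketch below, which keeps up to 100 completed hands, including the opponent's revealed cards, and renders them as context for the next decision. The `HandRecord` and `Episode` names and the text format are hypothetical; only the 100-hand episode length and the end-of-hand card reveal come from the episode.

```python
from dataclasses import dataclass, field
from typing import List

HANDS_PER_EPISODE = 100  # episode length mentioned in the interview


@dataclass
class HandRecord:
    number: int
    actions: str
    my_cards: str
    opponent_cards: str  # revealed once the hand is over


@dataclass
class Episode:
    """Hypothetical container for one episode's hand history."""
    hands: List[HandRecord] = field(default_factory=list)

    def record(self, actions: str, my_cards: str, opponent_cards: str) -> None:
        """Append a finished hand; refuse once the episode is full."""
        if len(self.hands) >= HANDS_PER_EPISODE:
            raise ValueError("episode is full; start a new Episode")
        self.hands.append(
            HandRecord(len(self.hands) + 1, actions, my_cards, opponent_cards)
        )

    def context(self) -> str:
        """Render every finished hand, one per line, to prepend to the next prompt."""
        return "\n".join(
            f"Hand {h.number}: you held {h.my_cards}; "
            f"opponent showed {h.opponent_cards}; actions: {h.actions}"
            for h in self.hands
        )


episode = Episode()
episode.record("raise, call, check, bet, fold", "Ah Kd", "7s 7c")
print(episode.context())
```

Starting a fresh `Episode` after 100 hands resets the history, so whatever a model learns about its opponent lives only in the context of the current episode.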

INSIGHT

Models Show Distinct Playing Styles

Models developed distinct playing styles, from hyper-aggressive to passive, showing varied emergent behaviors.
These styles point to real differences between models, but their precise causes are hard to pinpoint.


Which LLM is best at POKER? What about the social deception game Werewolf? This week I'm collaborating with @googledeepmind and @kaggle to explain their new “AI Game Arena” project, designed to test all the top LLMs at various games.

The Game Arena is a massive research project to create new AI benchmarks and understand what really makes these general-purpose AIs tick. It also helps us evaluate how far along the path to true AGI we are, NBD. So who better to speak to than DeepMind research engineer John Schultz, one of the main brains behind the project?

We discuss why some LLMs are so much better than others, their internal understanding of the games (or lack of it!) and why games are a useful way to evaluate both the capabilities and safety of frontier models. And friends, do please note that for the first time ever on this channel, this post is part of a paid partnership (woohoo I made it!).

So while everything I said in this interview reflects my own views, there are also direct financial incentives behind this particular interview.

Chapters

00:00 - Intro

00:55 - What Is Game Arena?

01:25 - John’s Favorite Games

03:18 - Narrow AI Vs LLMs: What’s The Difference?

05:58 - Will LLM Poker Skills Transfer To Other Games?

09:00 - Benchmarks And Evals

10:50 - Prompt Design For Game-Playing LLMs

13:32 - Do LLMs Understand Concepts Like Expected Value?

14:24 - Context Window Length

17:27 - Why Do LLMs Have Different Playing Styles?

18:41 - Where Does Their Skill Come From?

21:09 - Surprising Results

23:50 - Why Werewolf?

25:48 - Where Does LLM Reasoning Come From?

27:49 - Could LLMs Learning Social Games Be Dangerous?

30:58 - Can Games Teach LLMs Cooperation?

32:48 - Which Games Will Be Hardest To Learn?

Links:

♾️ Kaggle Game Arena (Main Hub): https://www.kaggle.com/game-arena

♾️ Introducing the Kaggle Game Arena (Official Post): https://www.kaggle.com/blog/introducing-game-arena

♾️ Poker Benchmark – 900,000 Hands: https://www.kaggle.com/blog/game-arena-poker

♾️ Werewolf Benchmark (Social Deception Game): https://www.kaggle.com/benchmarks/kaggle/werewolf

♾️ Google DeepMind (Research Organization): https://deepmind.google

♾️ Polaris Library (DeepMind, Open Source): https://github.com/google-deepmind/polaris

Credits:

♾️ Hosted by Liv Boeree

♾️ Produced by Luca de Vico

The Win-Win Podcast:

Poker champion Liv Boeree takes to the interview chair to tease apart the complexities of one of the most fundamental parts of human nature: competition. Liv is joined by top philosophers, gamers, artists, technologists, CEOs, scientists, politicians and more to understand how competition manifests in their world, and how to change seemingly win-lose systems into Win-Wins.

Podcast links:

♾️ Website: https://www.winwinpodcast.com/

♾️ Youtube: https://www.youtube.com/playlist?list=PLWgq0OZMtwtOIyMsVM_vksqdfWcM-b68S

♾️ Spotify: https://open.spotify.com/show/03bGVUaFZmJUmEvSHNDPdI?si=64379cc23696454f

♾️ Apple Podcasts: https://podcasts.apple.com/us/podcast/win-win-with-liv-boeree/id1724791350

♾️ Pocket Casts: https://play.pocketcasts.com/podcasts/7f708340-d17c-013b-f46e-0acc26574db2

#winwinpodcast #AI #poker #kaggle