Win-Win with Liv Boeree

#53 - John Schultz - Why Google Made ChatGPT, Gemini & Claude Play 900,000 hands of Poker...

Feb 6, 2026
John Schultz, a DeepMind research engineer with a games and game-theory background, explains the Kaggle Game Arena. He recounts running 900,000 poker hands to compare LLMs, explores why games like poker and Werewolf reveal model strengths and quirks, and discusses surprising behaviors, prompt design, and what games teach us about capabilities and safety.
ADVICE

Design Prompts With Clear Objectives

  • When evaluating LLM gameplay, state the objective explicitly and ask the model for its reasoning so that its decisions can be inspected.
  • Frame prompts neutrally, and require the model to justify each action as EV-maximizing using standard poker concepts.
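
The advice above can be sketched as a small prompt builder. This is only an illustration, not the prompt used in the Game Arena: the `Decision` fields, the `build_prompt` function, and the exact wording are all hypothetical. The points it tries to capture are an explicit objective, neutral framing (no hints about style or the opponent), and a request for reasoning before the action.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical snapshot of one decision point in a poker hand."""
    hole_cards: str
    board: str
    pot: int
    to_call: int
    legal_actions: list[str]


def build_prompt(d: Decision) -> str:
    """Build a neutral prompt with an explicit objective and a reasoning request."""
    lines = [
        "You are playing a hand of poker.",
        # Explicit objective: the model is told what "good play" means.
        "Objective: choose the action that maximizes your expected value (EV).",
        # Neutral state description: facts only, no suggested style.
        f"Your cards: {d.hole_cards}",
        f"Board: {d.board}",
        f"Pot: {d.pot}. Amount to call: {d.to_call}.",
        f"Legal actions: {', '.join(d.legal_actions)}",
        # Reasoning request, so the decision can be inspected afterwards.
        "First explain your reasoning using standard poker concepts "
        "(for example pot odds, equity, ranges).",
        "Then give your final choice on the last line as: ACTION: <action>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = Decision("Ah Kd", "Qs Jh 2c", 120, 40, ["fold", "call", "raise"])
    print(build_prompt(example))
```

Keeping the final action on a fixed last line is a design choice of this sketch: it lets the reasoning be logged for inspection while the action is parsed separately.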
INSIGHT

Chunked Episodes Aid In-Game Adaptation

  • Context windows matter for opponent adaptation, so matches were split into 100-hand episodes with the full hand history kept in context.
  • Revealing opponent cards at each hand's end accelerates learning within an episode.
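
The episode structure described above might look roughly like the following sketch. It is a guess at the mechanics, not the Arena's actual code: the hand dictionary keys, the `summarize` and `contexts` helpers, and the choice to clear history at each episode boundary are all assumptions made for illustration.

```python
from typing import Iterator

EPISODE_LEN = 100  # hands per episode, as described in the episode


def summarize(hand: dict) -> str:
    """Render one finished hand, including the opponent's revealed cards."""
    return (
        f"Hand {hand['id']}: actions={hand['actions']}; "
        f"opponent showed {hand['opponent_cards']}"
    )


def contexts(
    hands: list[dict], episode_len: int = EPISODE_LEN
) -> Iterator[tuple[dict, list[str]]]:
    """Yield each hand with the history of earlier hands in the same episode.

    Assumption: history is cleared at every episode boundary, so the
    context never holds more than episode_len - 1 earlier hands.
    """
    history: list[str] = []
    for i, hand in enumerate(hands):
        if i % episode_len == 0:
            history = []  # new episode: start with an empty history
        yield hand, list(history)  # copy, so callers cannot mutate our state
        history.append(summarize(hand))  # cards are revealed once the hand ends


if __name__ == "__main__":
    demo = [
        {"id": i, "actions": "check-call", "opponent_cards": "7c 7d"}
        for i in range(250)
    ]
    for hand, ctx in contexts(demo):
        if hand["id"] in (0, 99, 100, 249):
            print(hand["id"], len(ctx))
```

Within an episode the history grows by one entry per hand, which is what gives a model material to adapt to its opponent.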
INSIGHT

Models Show Distinct Playing Styles

  • Models developed distinct playing styles, from hyper-aggressive to passive, showing varied emergent behaviors.
  • These styles point to real differences between models, but their precise causes are hard to pinpoint.