Latent Space: The AI Engineer Podcast

Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

Apr 2, 2026
Fan-yun Sun, Moonlake AI co-founder focused on interactive world models, joins Chris Manning, Stanford NLP pioneer, for a lively tour of AI worlds that are multimodal, interactive, and efficient. They dig into action-conditioned environments, symbolic structure over pure pixels, game engines as reasoning tools, persistent simulated worlds, programmable visuals, spatial audio, and why benchmarking should focus on real utility.
Episode notes

Reasoning Traces Make Worlds Trainable

  • Fan-Yun Sun shows how Moonlake's reasoning traces model gameplay state, not just visuals, spanning geometry, physics, affordances, audio, and scoring.
  • In the bowling example, the system tracks ball pickup, pin collisions, resets, and score updates so users can actually practice the game.

Code Engines Serve As Cognitive Tools

  • Moonlake treats physics engines, code, and other simulators as cognitive tools a model can call instead of end-to-end learned priors.
  • Fan-Yun Sun says this enables multiplayer and task-specific simulators, rather than forcing one fixed visual pipeline onto every world.

Reverie Separates Gameplay State From Visual Style

  • Moonlake splits world modeling into a causal reasoning model for persistent state and Reverie for photorealistic restyling.
  • Fan-Yun Sun says Reverie preserves interactivity while changing pixel distribution, making rendering programmable and able to react to game state.