Ep#42: General Intuition

Nov 13, 2025

Discover how AI can learn from video games to create predictive world models. The team shares insights on using diffusion models to preserve the visual detail that agents need during training. They explore the challenges of multiplayer dynamics and the importance of high-quality action labels. The discussion includes innovations for stability and speed in model training, as well as the advantages of transferring knowledge across different games. Learn about their mission to develop general agents for complex reasoning in three-dimensional spaces.

AI Snips

ANECDOTE

Real-Time CSGO World Model Demo

Adam trained a diffusion world model on 95 hours of CSGO gameplay and demonstrated it can run interactively in real time.
The model ran at about 10 Hz on a single RTX 3090 GPU and produced action-steerable game frames.

INSIGHT

Engineering Tricks To Run Diffusion Fast

Speed was achieved by cutting the number of denoising steps (to as few as three) and by using a two-stage downsample-upsample pipeline for high-resolution frames.
These engineering choices let diffusion models run fast enough for RL and interactive control.
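
The snip does not spell out the sampler, so the following is only a minimal sketch under stated assumptions: an EDM-style (Karras et al.) noise schedule with plain Euler steps, written in pure Python on toy 1-D "frames". The names `karras_sigmas`, `euler_sample`, `upsample_nearest`, `two_stage_step` and the two oracle denoisers are hypothetical, and the schedule constants are common EDM defaults, not values from the episode. Each denoising step is one network call, so at about 10 Hz (roughly 100 ms per frame) every removed step directly cuts latency. The two-stage split keeps the dynamics prediction at low resolution and leaves high-resolution detail to a separate upsampling stage.

```python
import random
from typing import Callable, List, Sequence

# Toy stand-ins: a "frame" is a flat list of floats; a real model works on image tensors.
Frame = List[float]
# Hypothetical denoiser signature: (noisy frame, noise level, conditioning frames, action) -> clean-frame estimate.
Denoiser = Callable[[Frame, float, Sequence[Frame], int], Frame]


def karras_sigmas(n_steps: int, sigma_min: float = 0.002, sigma_max: float = 80.0,
                  rho: float = 7.0) -> List[float]:
    """Decreasing noise levels (EDM / Karras et al. schedule) with a final 0.0 appended."""
    if n_steps == 1:
        return [sigma_max, 0.0]
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]
    return sigmas + [0.0]


def euler_sample(denoiser: Denoiser, cond: Sequence[Frame], action: int, n_pixels: int,
                 n_steps: int = 3, seed: int = 0) -> Frame:
    """Generate one frame using exactly n_steps denoiser calls (Euler steps on the probability-flow ODE)."""
    rng = random.Random(seed)
    sigmas = karras_sigmas(n_steps)
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(n_pixels)]  # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma, cond, action)
        # dx/dsigma = (x - D(x, sigma)) / sigma; take one Euler step from sigma to sigma_next
        x = [xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(x, denoised)]
    return x


def upsample_nearest(frame: Frame, factor: int) -> Frame:
    """1-D nearest-neighbour upsampling (stand-in for resizing an image)."""
    return [v for v in frame for _ in range(factor)]


def two_stage_step(dynamics: Denoiser, upsampler: Denoiser, past_low: Sequence[Frame],
                   action: int, factor: int, n_steps: int = 3):
    """Stage 1: predict a cheap low-res next frame. Stage 2: refine an upsampled copy at high res."""
    low = euler_sample(dynamics, past_low, action, len(past_low[-1]), n_steps, seed=0)
    coarse = upsample_nearest(low, factor)
    high = euler_sample(upsampler, [coarse], action, len(coarse), n_steps, seed=1)
    return low, high


# Oracle denoisers for illustration only: they ignore the noisy input and return the "true" frame.
toy_dynamics: Denoiser = lambda x, sigma, cond, a: [p + a for p in cond[-1]]  # action shifts pixel values
toy_upsampler: Denoiser = lambda x, sigma, cond, a: list(cond[-1])

low, high = two_stage_step(toy_dynamics, toy_upsampler, [[0.0, 1.0, 2.0]], action=1, factor=2)
print([round(v, 6) for v in low])   # [1.0, 2.0, 3.0]
print([round(v, 6) for v in high])  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With an oracle denoiser, the final Euler step to sigma = 0 lands on the denoiser's estimate, which is why the toy output matches the targets; a trained network would only approximate them.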

ADVICE

Fix Drift By Collecting More Data

Prioritize scaling data coverage to improve model stability in less-visited regions of an environment.
Pim emphasizes that more data reduces out-of-distribution drift and stabilizes long rollouts.

With enough data, robots and AI can learn “world models” that let them predict the results of their actions. These models give embodied AI agents a way to learn a wide variety of useful tasks, but they require a huge amount of data.
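
To make "predict the results of their actions" concrete, here is a minimal sketch of an imagined rollout, in which a policy is stepped forward entirely inside a world model rather than in the real environment. `ToyWorldModel`, `MOVES`, and `imagine` are hypothetical names, and the "model" is a hand-written grid stand-in for what would in practice be a learned network.

```python
from typing import Callable, Dict, List, Tuple

Obs = Tuple[int, int]  # toy observation: an (x, y) grid position
MOVES: Dict[str, Tuple[int, int]] = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


class ToyWorldModel:
    """Hypothetical stand-in for a learned world model: maps (history, action) to (next obs, reward)."""

    def __init__(self, goal: Obs) -> None:
        self.goal = goal

    def predict(self, history: List[Obs], action: str) -> Tuple[Obs, float]:
        x, y = history[-1]
        dx, dy = MOVES[action]
        nxt = (x + dx, y + dy)
        return nxt, (1.0 if nxt == self.goal else 0.0)


def imagine(model: ToyWorldModel, policy: Callable[[List[Obs]], str], start: Obs, horizon: int):
    """Roll the policy forward inside the model; the real environment is never touched."""
    history: List[Obs] = [start]
    total = 0.0
    for _ in range(horizon):
        action = policy(history)
        nxt, reward = model.predict(history, action)
        history.append(nxt)
        total += reward
    return history, total


traj, ret = imagine(ToyWorldModel(goal=(3, 0)), lambda h: "right", start=(0, 0), horizon=3)
print(traj, ret)  # [(0, 0), (1, 0), (2, 0), (3, 0)] 1.0
```

Because the policy only ever queries `predict`, the quality of such imagined trajectories is bounded by how well the model covers the states the policy visits, which is where large and diverse data matters.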

The team at General Intuition has a solution: use data from video games! Games teach movement, problem solving, and complex spatial reasoning, and they come in a staggering diversity of forms, covering a wide variety of problems. What’s more, the captured data is high-quality, without the noise or annotation error that can come from

We sat down with Pim and Adam from the General Intuition team to learn more about their history, their plans, and their philosophy.

Abstract:

World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model. We further demonstrate that DIAMOND’s diffusion world model can stand alone as an interactive neural game engine by training on static Counter-Strike: Global Offensive gameplay. To foster future research on diffusion for world modeling, we release our code, agents and playable world models at https://github.com/eloialonso/diamond.
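
For readers unfamiliar with the metric quoted in the abstract, the human normalized score (HNS) for one game is (agent − random) / (human − random), so 0 corresponds to random play and 1 to the human reference; the benchmark figure is the mean over its games. The sketch below computes it for two hypothetical games; the game names and scores are placeholders for illustration, not results from the paper.

```python
from statistics import mean
from typing import Dict, Tuple


def human_normalized_score(agent: float, random_score: float, human: float) -> float:
    """(agent - random) / (human - random): 0 = random play, 1 = human reference."""
    return (agent - random_score) / (human - random_score)


# Hypothetical per-game scores, for illustration only (not results from the paper).
scores: Dict[str, Tuple[float, float, float]] = {
    # game: (agent, random, human)
    "GameA": (150.0, 0.0, 100.0),
    "GameB": (0.0, -20.0, 20.0),
}
hns = {game: human_normalized_score(*s) for game, s in scores.items()}
print(hns)                 # {'GameA': 1.5, 'GameB': 0.5}
print(mean(hns.values()))  # 1.0
```

Under this definition, a mean HNS of 1.46 means the agent closes, on average, about 1.46 times the gap between random and human play across the benchmark's games; a mean can be pulled up by a few games with very high scores.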

Project Page

General Intuition

General Intuition is Hiring: https://jobs.ashbyhq.com/medal

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit robopapers.substack.com