Infinite Curiosity Pod with Prateek Joshi

Diffusion LLMs - The Fastest LLMs Ever Built | Stefano Ermon, cofounder of Inception Labs

Oct 9, 2025
In this engaging discussion, Stefano Ermon, a Stanford associate professor and co-founder of Inception Labs, dives into the world of diffusion language models. He explains how these models depart from traditional autoregressive techniques, highlighting breakthroughs in parallel refinement for text and code generation. Stefano also shares insights on engineering challenges, the importance of high-quality data, and commercial viability, and he discusses the future potential of diffusion LLMs in coding and multimodal applications.
ANECDOTE

From GANs To Text Diffusion Breakthrough

  • Stefano described his research path from GANs to score-based diffusion models and the community shift to diffusion for images.
  • He recounted a 2024 breakthrough extending diffusion math to discrete text and code, leading to an ICML best paper.
INSIGHT

Denoising Training Enables Iterative Repair

  • Diffusion models are trained by denoising: intentionally corrupting clean text and training the network to fix mistakes.
  • At inference the model iteratively repairs a random or bad initial guess until the output is high quality.
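The denoise-and-repair loop described above can be sketched in a toy form. This is a minimal illustration, not Inception Labs' method: the masking-based corruption and the `toy_denoiser` stand-in (which simply fills masks from the ground truth, where a real model would predict many positions in parallel) are assumptions made for demonstration.

```python
import random

random.seed(0)
MASK = "<mask>"

def corrupt(tokens, p):
    """Training-time corruption: mask each token with probability p."""
    return [MASK if random.random() < p else t for t in tokens]

def toy_denoiser(noisy, clean):
    """Stand-in for a learned denoising network.

    Fills one masked position per call from the clean reference;
    a trained model would instead predict tokens at all masked
    positions simultaneously.
    """
    out = list(noisy)
    for i, tok in enumerate(out):
        if tok == MASK:
            out[i] = clean[i]
            return out
    return out

def iterative_repair(noisy, clean, steps=10):
    """Inference loop: repeatedly refine a corrupted initial guess."""
    x = noisy
    for _ in range(steps):
        x = toy_denoiser(x, clean)
        if MASK not in x:
            break
    return x

clean = "diffusion models refine text in parallel".split()
noisy = corrupt(clean, p=0.5)
print(iterative_repair(noisy, clean))
```

The key idea the sketch captures is that generation is framed as repair: start from a heavily corrupted sequence and run a fixed number of refinement steps, rather than emitting tokens one at a time left to right.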
ADVICE

Optimize Serving With A Custom Engine

  • Build a custom serving engine to harness diffusion LLM parallelism for production traffic.
  • Off-the-shelf autoregressive serving stacks won't deliver optimal performance for diffusion inference.