Infinite Curiosity Pod with Prateek Joshi

Diffusion LLMs - The Fastest LLMs Ever Built | Stefano Ermon, cofounder of Inception Labs

Oct 9, 2025
In this engaging discussion, Stefano Ermon, a Stanford associate professor and co-founder of Inception Labs, dives into the world of diffusion language models. He explains how these models depart from traditional autoregressive techniques, highlighting breakthroughs in parallel refinement for text and code generation. Stefano also shares insights on engineering challenges, the importance of high-quality data, and commercial viability, and he discusses the future potential of diffusion LLMs in coding and multimodal applications.
ANECDOTE

From GANs To Text Diffusion Breakthrough

  • Stefano described his research path from GANs to score-based diffusion models and the community shift to diffusion for images.
  • He recounted a 2024 breakthrough extending diffusion math to discrete text and code, leading to an ICML best paper.
INSIGHT

Denoising Training Enables Iterative Repair

  • Diffusion models are trained by denoising: intentionally corrupting clean text and training the network to fix mistakes.
  • At inference the model iteratively repairs a random or bad initial guess until the output is high quality.
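The denoise-and-repair loop described above can be sketched in a toy form. This is a minimal illustration, not Inception Labs' method: the masking-based corruption and the `toy_denoiser` stand-in (which simply fills masks from the ground truth, where a real model would predict many positions in parallel) are assumptions made for demonstration.

```python
import random

random.seed(0)
MASK = "<mask>"

def corrupt(tokens, p):
    """Training-time corruption: mask each token with probability p."""
    return [MASK if random.random() < p else t for t in tokens]

def toy_denoiser(noisy, clean):
    """Stand-in for a learned denoising network.

    Fills one masked position per call from the clean reference;
    a trained model would instead predict tokens at all masked
    positions simultaneously.
    """
    out = list(noisy)
    for i, tok in enumerate(out):
        if tok == MASK:
            out[i] = clean[i]
            return out
    return out

def iterative_repair(noisy, clean, steps=10):
    """Inference loop: repeatedly refine a corrupted initial guess."""
    x = noisy
    for _ in range(steps):
        x = toy_denoiser(x, clean)
        if MASK not in x:
            break
    return x

clean = "diffusion models refine text in parallel".split()
noisy = corrupt(clean, p=0.5)
print(iterative_repair(noisy, clean))
```

The key idea the sketch captures is that generation is framed as repair: start from a heavily corrupted sequence and run a fixed number of refinement steps, rather than emitting tokens one at a time left to right.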
ADVICE

Optimize Serving With A Custom Engine

  • Build a custom serving engine to harness diffusion LLM parallelism for production traffic.
  • Off-the-shelf autoregressive serving stacks won't deliver optimal performance for diffusion inference.