The Stack Overflow Podcast

Generating text with diffusion (and ROI with LLMs)

Feb 3, 2026
Stefano Ermon, researcher and CEO of Inception Labs who helped pioneer diffusion models, explains how diffusion language models generate many tokens in parallel and why they can be faster and more accurate. Aldo Luevano, co-founder and chairman of Roomie and a leader in robotics and enterprise AI, discusses an ROI-first platform that measures impact, integrates physical AI, and automates legacy systems. A short, technical, business-focused conversation.
INSIGHT

Parallel Token Generation Speeds Inference

  • Diffusion language models generate multiple tokens in parallel by iteratively refining a noisy guess.
  • Stefano Ermon says this yields 5–10x faster inference than comparable autoregressive models.
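The parallel-refinement idea above can be illustrated with a toy sketch. This is not Inception's actual algorithm: the `toy_model` stand-in, vocabulary, and confidence-based unmasking schedule are all illustrative assumptions; a real diffusion LM would use a trained transformer denoiser conditioned on the whole partially denoised sequence.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_model(seq):
    """Stand-in for a trained denoiser: for each masked position,
    propose a token with a confidence score. (Assumption: a real
    model would score proposals with learned probabilities.)"""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_decode(seq_len=5, steps=3):
    """Start fully masked, then over a few steps unmask the most
    confident positions in parallel, leaving the rest to be revised
    on later iterations."""
    seq = [MASK] * seq_len
    for step in range(steps):
        proposals = toy_model(seq)
        if not proposals:
            break
        # Unmask a growing fraction of positions per step; every
        # masked slot is filled by the final iteration.
        k = max(1, len(proposals) // (steps - step))
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _) in best:
            seq[i] = tok
    # Fill any positions still masked after the scheduled steps.
    for i, (tok, _) in toy_model(seq).items():
        seq[i] = tok
    return seq
```

Because each step commits several tokens at once rather than one at a time, the number of model calls is the step count, not the sequence length, which is where the speedup over autoregressive decoding comes from.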
INSIGHT

Training As Denoising, Not Next-Token Prediction

  • Inception trains transformers to denoise corrupted text instead of predicting next tokens.
  • The model learns to correct mistakes and iteratively produce a clean answer at inference time.
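The training setup described above can be sketched as a simple corruption step: mask a random fraction of the tokens and ask the model to recover the originals. This is a minimal illustration, not Inception's training code; the mask rate and mask symbol are assumptions.

```python
import random

random.seed(1)
MASK = "<mask>"

def corrupt(tokens, mask_rate=0.5):
    """Randomly replace a fraction of tokens with a mask symbol.
    The training target is to predict the original token at each
    masked position (denoising), rather than the next token."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            corrupted.append(MASK)
            targets[i] = tok  # what the denoiser must recover
        else:
            corrupted.append(tok)
    return corrupted, targets

corrupted, targets = corrupt("the cat sat on the mat".split())
```

At inference time the same denoiser is applied iteratively, starting from a fully masked sequence, so it can revisit and correct earlier guesses rather than being locked into them.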
INSIGHT

Built-In Error Correction Lowers Irreversible Mistakes

  • Diffusion models reduce irreversible errors because they can revise tokens during generation.
  • Stefano notes they still hallucinate, but tend to match the accuracy of speed-optimized autoregressive models.