
The Stack Overflow Podcast: Generating text with diffusion (and ROI with LLMs)
Feb 3, 2026

Stefano Ermon, a researcher who helped pioneer diffusion models and is CEO of Inception Labs, explains how diffusion language models generate many tokens in parallel and why they can be faster and more accurate. Aldo Luevano, chairman and co-founder of Roomie and a robotics and enterprise AI leader, discusses an ROI-first platform that measures impact, integrates physical AI, and automates legacy systems. A short, technical, business-focused conversation.
Parallel Token Generation Speeds Inference
- Diffusion language models generate multiple tokens in parallel by iteratively refining a noisy guess.
- Stefano Ermon says this yields 5–10x faster inference than comparable autoregressive models.
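The parallel, iterative refinement described above can be illustrated with a toy loop. This is only a sketch of the idea, not Inception's model: `TARGET` stands in for the predictions a real transformer would score over its vocabulary, and the confidence schedule is invented for illustration. Every position is re-predicted at each step, and high-confidence positions are committed, so the sequence fills in all at once rather than left to right.

```python
import random

random.seed(0)

MASK = "<mask>"
# Stand-in for the model's preferred output; a real diffusion LM would
# score the whole vocabulary at every position instead.
TARGET = ["the", "cat", "sat", "on", "mat"]

def denoise_step(tokens, threshold):
    """One refinement step: re-predict every position in parallel,
    committing only the predictions whose (faked) score clears the threshold."""
    out = []
    for i, tok in enumerate(tokens):
        score = random.random()  # placeholder for a model confidence score
        if tok == MASK and score > threshold:
            out.append(TARGET[i])  # commit this position
        else:
            out.append(tok)        # leave it for a later step
    return out

def generate(length=5, steps=4):
    tokens = [MASK] * length  # start from pure "noise": all positions masked
    for step in range(steps):
        # Lower the bar each step so more positions get committed.
        threshold = 1.0 - (step + 1) / steps
        tokens = denoise_step(tokens, threshold)
    # Final pass: fill anything still masked.
    return [TARGET[i] if t == MASK else t for i, t in enumerate(tokens)]

print(generate())
```

Because all positions are updated in each pass, the number of model calls scales with the refinement steps (here 4) rather than the sequence length, which is where the speedup over token-by-token autoregressive decoding comes from.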
Training As Denoising, Not Next-Token Prediction
- Inception trains transformers to denoise corrupted text instead of predicting next tokens.
- The model learns to correct mistakes and iteratively produce a clean answer at inference time.
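The denoising objective can be sketched in miniature. This is a hypothetical illustration of the general masked-corruption recipe, not Inception's training code: text is corrupted by masking a random fraction of tokens, and the training signal asks the model to recover the original token at every corrupted position, instead of predicting the next token.

```python
import random

random.seed(0)

MASK = "<mask>"

def corrupt(tokens, mask_rate):
    """Forward (noising) process: replace a random fraction of tokens with a mask."""
    return [MASK if random.random() < mask_rate else t for t in tokens]

def denoising_targets(clean, corrupted):
    """Training pairs: given the corrupted sequence, the model must predict
    the original token at each masked position (not the next token)."""
    return [(i, clean[i]) for i, t in enumerate(corrupted) if t == MASK]

sentence = "diffusion models denoise corrupted text".split()
noisy = corrupt(sentence, mask_rate=0.5)
print(noisy)
print(denoising_targets(sentence, noisy))
```

Training over many corruption levels is what lets the model clean up partially wrong sequences at inference time, since it has seen inputs at every degree of corruption.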
Built-In Error Correction Lowers Irreversible Mistakes
- Diffusion models reduce irreversible errors because they can revise tokens during generation.
- Stefano notes they still hallucinate but tend to match the accuracy of speed-optimized autoregressive models.

