
The Stack Overflow Podcast: Generating text with diffusion (and ROI with LLMs)
Feb 3, 2026

Stefano Ermon, a researcher who helped pioneer diffusion models and is CEO of Inception Labs, explains how diffusion language models generate many tokens in parallel and why they can be faster and more accurate. Aldo Luevano, chairman and co-founder of Roomie and a robotics and enterprise AI leader, discusses an ROI-first platform that measures impact, integrates physical AI, and automates legacy systems. A short, technical, business-focused conversation.
Parallel Token Generation Speeds Inference
- Diffusion language models generate multiple tokens in parallel by iteratively refining a noisy guess.
- Stefano Ermon says this yields 5–10x faster inference than comparable autoregressive models.
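The parallel, iterative refinement described above can be illustrated with a toy loop. This is only a sketch of the idea, not Inception's model: `TARGET` stands in for the predictions a real transformer would score over its vocabulary, and the confidence schedule is invented for illustration. Every position is re-predicted at each step, and high-confidence positions are committed, so the sequence fills in all at once rather than left to right.

```python
import random

random.seed(0)

MASK = "<mask>"
# Stand-in for the model's preferred output; a real diffusion LM would
# score the whole vocabulary at every position instead.
TARGET = ["the", "cat", "sat", "on", "mat"]

def denoise_step(tokens, threshold):
    """One refinement step: re-predict every position in parallel,
    committing only the predictions whose (faked) score clears the threshold."""
    out = []
    for i, tok in enumerate(tokens):
        score = random.random()  # placeholder for a model confidence score
        if tok == MASK and score > threshold:
            out.append(TARGET[i])  # commit this position
        else:
            out.append(tok)        # leave it for a later step
    return out

def generate(length=5, steps=4):
    tokens = [MASK] * length  # start from pure "noise": all positions masked
    for step in range(steps):
        # Lower the bar each step so more positions get committed.
        threshold = 1.0 - (step + 1) / steps
        tokens = denoise_step(tokens, threshold)
    # Final pass: fill anything still masked.
    return [TARGET[i] if t == MASK else t for i, t in enumerate(tokens)]

print(generate())
```

Because all positions are updated in each pass, the number of model calls scales with the refinement steps (here 4) rather than the sequence length, which is where the speedup over token-by-token autoregressive decoding comes from.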
Training As Denoising, Not Next-Token Prediction
- Inception trains transformers to denoise corrupted text instead of predicting next tokens.
- The model learns to correct mistakes and iteratively produce a clean answer at inference time.
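The denoising objective can be sketched in miniature. This is a hypothetical illustration of the general masked-corruption recipe, not Inception's training code: text is corrupted by masking a random fraction of tokens, and the training signal asks the model to recover the original token at every corrupted position, instead of predicting the next token.

```python
import random

random.seed(0)

MASK = "<mask>"

def corrupt(tokens, mask_rate):
    """Forward (noising) process: replace a random fraction of tokens with a mask."""
    return [MASK if random.random() < mask_rate else t for t in tokens]

def denoising_targets(clean, corrupted):
    """Training pairs: given the corrupted sequence, the model must predict
    the original token at each masked position (not the next token)."""
    return [(i, clean[i]) for i, t in enumerate(corrupted) if t == MASK]

sentence = "diffusion models denoise corrupted text".split()
noisy = corrupt(sentence, mask_rate=0.5)
print(noisy)
print(denoising_targets(sentence, noisy))
```

Training over many corruption levels is what lets the model clean up partially wrong sequences at inference time, since it has seen inputs at every degree of corruption.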
Built-In Error Correction Lowers Irreversible Mistakes
- Diffusion models reduce irreversible errors because they can revise tokens during generation.
- Stefano notes they still hallucinate but tend to match the accuracy of speed-optimized autoregressive models.

