
Generating text with diffusion (and ROI with LLMs)

The Stack Overflow Podcast


Serving stack differences for diffusion LLMs

Stefano outlines the serving-engine changes, caching, and continuous batching needed to run diffusion models efficiently in production.

Segment starts at 07:10.
