
NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Latent Space: The AI Engineer Podcast


Disaggregation: prefill vs decode separation

Kyle explains splitting prefill and decode phases to remove synchronous scheduling and specialize worker pools.
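To make the idea concrete, here is a minimal toy sketch of disaggregated serving. It is not Dynamo's API or implementation. Every name in it (`Request`, `PrefillWorker`, `DecodeWorker`, `serve`, the `handoff` queue) is hypothetical, and the "model" and "KV cache" are stand-ins. The prefill pool processes a whole prompt in one pass and hands the resulting cache to a separate decode pool. The decode pool then advances every active request by one token per step. In a real deployment the two pools would run concurrently on separate hardware. This single-threaded loop only models the phase split and the cache handoff between them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    prompt: list[int]
    max_new_tokens: int
    kv_cache: list[int] = field(default_factory=list)  # toy stand-in for real KV tensors
    output: list[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical prefill pool: processes an entire prompt in one pass."""

    def run(self, req: Request) -> Request:
        # Assumption: the "KV cache" is just a copy of the prompt tokens.
        req.kv_cache = list(req.prompt)
        return req


class DecodeWorker:
    """Hypothetical decode pool: advances each active request by one token per step."""

    def step(self, batch: list[Request]) -> list[Request]:
        finished = []
        for req in batch:
            # Toy "model": the next token is the sum of the cache modulo 100.
            tok = sum(req.kv_cache) % 100
            req.output.append(tok)
            req.kv_cache.append(tok)
            if len(req.output) >= req.max_new_tokens:
                finished.append(req)
        return finished


def serve(requests: list[Request]) -> dict[int, list[int]]:
    prefill_queue = deque(requests)
    handoff: deque[Request] = deque()  # models the KV-cache transfer between pools
    active: list[Request] = []
    prefill, decode = PrefillWorker(), DecodeWorker()
    results: dict[int, list[int]] = {}
    while prefill_queue or handoff or active:
        # Prefill side: take one new prompt per tick and hand its cache off.
        if prefill_queue:
            handoff.append(prefill.run(prefill_queue.popleft()))
        # Decode side: admit handed-off requests, then run one decode step.
        while handoff:
            active.append(handoff.popleft())
        for req in decode.step(active):
            active.remove(req)
            results[req.rid] = req.output
    return results


if __name__ == "__main__":
    reqs = [Request(0, [1, 2, 3], 2), Request(1, [50, 60], 1)]
    print(serve(reqs))  # {0: [6, 12], 1: [10]}
```

Because decode only ever consumes already-prefilled caches from the handoff queue, each side can be batched and sized independently. This is the kind of worker-pool specialization described above.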

Chapter starts at 38:17
