NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Latent Space: The AI Engineer Podcast

Long context limits and model-hardware co-design

Kyle Kranen and the hosts discuss context-length scaling, attention costs, and model-hardware co-design breakthroughs ("unhobblers").

Play episode from 43:11