How DDN And NVIDIA Are Rethinking AI Infrastructure For The Rubin Era

Tech Talks Daily

Improving GPU utilization and time to first token

Alex describes engineering changes—KV cache tiering and DPU offload—that raise GPU utilization and reduce time to first token.

Play episode from 07:40
