973: AI Systems Performance Engineering, with Chris Fregly

Super Data Science: ML & AI Podcast with Jon Krohn

Inference fundamentals: pre-fill, decode, and KV cache

Chris explains the difference between the pre-fill and decode steps, the role of the KV cache, and how inference workloads require different node configurations.

Play episode from 39:13
