Breaking the Memory Wall in the Age of Inference

The Data Exchange with Ben Lorica

Pre-fill and Decode: Two Inference Phases

Sid explains why pre-fill is compute-heavy while decode is memory-heavy, and why memory-centric designs speed up decoding.
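
A rough way to see the compute-heavy versus memory-heavy split is to count arithmetic per byte moved for a single weight-matrix multiply. The sketch below is not from the episode: the function name `arithmetic_intensity`, the 4096-wide layer, the 1024-token prompt, and the 2-byte (FP16) values are all illustrative assumptions, and it ignores attention, KV-cache traffic, and batching.

```python
def arithmetic_intensity(tokens: int, d_model: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one d_model x d_model matrix multiply (simplified model)."""
    # One multiply and one add per weight, per token processed in this pass.
    flops = 2 * tokens * d_model * d_model
    # Assumption: the weight matrix is read from memory once per pass.
    weight_bytes = d_model * d_model * bytes_per_value
    # Input activations are read and output activations are written.
    activation_bytes = 2 * tokens * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


# Pre-fill processes the whole prompt in one pass (assumed here: 1024 tokens).
prefill = arithmetic_intensity(tokens=1024, d_model=4096)
# Decode produces one token per pass, so the same weights are re-read for each token.
decode = arithmetic_intensity(tokens=1, d_model=4096)

print(f"pre-fill: {prefill:.1f} FLOPs/byte")
print(f"decode:   {decode:.2f} FLOPs/byte")
```

Under these assumptions pre-fill performs hundreds of operations per byte fetched, while decode performs about one, so decode speed tends to be limited by how fast weights can be read rather than by raw compute. That is the gap memory-centric designs aim to close.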

Play episode from 10:37