
Breaking the Memory Wall in the Age of Inference
The Data Exchange with Ben Lorica
Pre-fill and Decode: Two Inference Phases
Sid explains pre-fill (compute-heavy) versus decode (memory-heavy) and why memory-centric designs speed up decoding.
Transcript


