
The Circuit EP 163: Breaking the Memory Wall: Micron’s Strategy for the AI Era
May 5, 2026

Jeremy Werner, SVP and GM of Micron's Core Data Center unit, leads memory and SSD strategy for AI workloads. He unpacks the rising "memory wall" in inference and why expanded context windows explode memory needs. He outlines Micron's multi-year bets across HBM, DRAM, and ultra-high-capacity SSDs, and how denser storage shrinks footprint, saves power, and reshapes data center design.
AI Snips
Memory Became The Core AI Bottleneck
- AI has permanently changed memory's strategic role, making it the key asset for enabling both training and inference at scale.
- Jeremy Werner says memory now breaks the bottlenecks in both inference and training, creating sustained demand rather than the cyclicality of prior eras.
Inference Creates A KVCache Memory Wall
- Inference is memory-heavy because decode needs the attention state of past tokens (the KVCache) persisted to avoid recomputing the entire history.
- Werner explains that a missing KVCache forces O(n^2) recompute, while storing it keeps per-cycle work linear and multiplies effective GPU throughput; see the sketch below.
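
To make the snip concrete, here is a minimal single-head attention sketch in numpy, contrasting decode with and without a KV cache. The shapes, weights, and toy model are illustrative assumptions, not Micron code or any specific framework: without a cache, every decode step re-projects keys and values for the full history, so n tokens cost O(n^2) projection work; with a cache, each step projects only the new token and per-step work stays constant.

```python
# Minimal sketch (illustrative assumptions throughout) of decode
# with and without a KV cache, for single-head attention.
import numpy as np

D = 64  # head dimension (assumed)
rng = np.random.default_rng(0)
Wk, Wv, Wq = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(q, K, V):
    # Softmax attention of one query over all keys/values.
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_no_cache(xs):
    # Every step re-projects K/V for the ENTIRE history: step t costs O(t),
    # so generating n tokens costs O(n^2) projection work in total.
    outs = []
    for t in range(1, len(xs) + 1):
        hist = np.stack(xs[:t])
        K, V = hist @ Wk, hist @ Wv   # recomputed from scratch each step
        outs.append(attend(xs[t - 1] @ Wq, K, V))
    return outs

def decode_with_cache(xs):
    # K/V for past tokens are persisted; each step projects ONLY the new
    # token, keeping per-step projection work constant (linear overall).
    K, V = np.empty((0, D)), np.empty((0, D))
    outs = []
    for x in xs:
        K = np.vstack([K, x @ Wk])    # append one new row to the KV cache
        V = np.vstack([V, x @ Wv])
        outs.append(attend(x @ Wq, K, V))
    return outs

xs = list(rng.standard_normal((16, D)))
a, b = decode_no_cache(xs), decode_with_cache(xs)
assert all(np.allclose(u, v) for u, v in zip(a, b))  # identical outputs
```

The outputs are bit-for-bit identical; the cache trades memory capacity for compute, which is exactly why inference demand lands on memory rather than FLOPs.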
Full Memory Hierarchy For AI Inference
- The memory hierarchy stretches from HBM (closest to the GPU, holding 10–100GB of KVCache) through main memory and expansion memory to SSD context storage and, ultimately, network data lakes.
- Werner maps the tradeoff: each step away from the GPU buys more capacity at the cost of higher latency and lower bandwidth, as the sketch below illustrates.
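
A small Python sketch of the tiered-placement idea from the snip. The tier names follow the episode; the capacity figures are hypothetical order-of-magnitude placeholders (only HBM's 10–100GB comes from the snip), and the placement policy is an assumed simplification, not a described Micron design.

```python
# Illustrative tiered placement: spill context to the nearest tier that
# fits. Capacities are hypothetical except HBM's 10-100GB from the snip.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: float  # hypothetical budget for KVCache/context data

# Listed nearest-to-farthest from the GPU; latency rises and bandwidth
# falls as you move down the list.
TIERS = [
    Tier("HBM (on-package)", 100),            # 10-100GB KVCache per the snip
    Tier("Main memory (DRAM)", 1_000),        # hypothetical
    Tier("Expansion memory", 4_000),          # hypothetical
    Tier("SSD context storage", 60_000),      # hypothetical ultra-high-capacity
    Tier("Network data lake", float("inf")),
]

def place(context_gb: float) -> Tier:
    # Nearer tiers give lower latency and higher bandwidth, so always
    # prefer the first tier whose budget fits the context.
    for tier in TIERS:
        if context_gb <= tier.capacity_gb:
            return tier
    return TIERS[-1]

for size in (40, 500, 20_000):
    print(f"{size} GB context -> {place(size).name}")
```

The point of the sketch is the ordering, not the numbers: longer contexts push KVCache and context data down the hierarchy, which is where denser SSDs change the footprint and power math.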
