
The Circuit EP 163: Breaking the Memory Wall: Micron’s Strategy for the AI Era
May 5, 2026

Jeremy Werner, SVP and GM of Micron's Core Data Center unit, leads memory and SSD strategy for AI workloads. He unpacks the rising "memory wall" in inference and why expanded context windows explode memory needs. He outlines Micron's multi-year bets across HBM, DRAM, and ultra-high-capacity SSDs, and how denser storage shrinks footprint, saves power, and reshapes data center design.
AI Snips
Memory Became The Core AI Bottleneck
- AI has permanently changed memory's strategic role, making it the key asset for enabling both training and inference at scale.
- Jeremy Werner says memory now breaks the bottlenecks in both inference and training, creating sustained demand rather than the cyclicality of prior eras.
Inference Creates A KVCache Memory Wall
- Inference is memory-heavy because decode needs the attention state of past tokens (the KVCache) persisted to avoid recomputing the entire history.
- Werner explains that a missing KVCache forces O(n^2) recompute, while storing it keeps per-cycle work linear and multiplies effective GPU throughput; see the sketch below.
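
To make the snip concrete, here is a minimal single-head attention sketch in numpy, contrasting decode with and without a KV cache. The shapes, weights, and toy model are illustrative assumptions, not Micron code or any specific framework: without a cache, every decode step re-projects keys and values for the full history, so n tokens cost O(n^2) projection work; with a cache, each step projects only the new token and per-step work stays constant.

```python
# Minimal sketch (illustrative assumptions throughout) of decode
# with and without a KV cache, for single-head attention.
import numpy as np

D = 64  # head dimension (assumed)
rng = np.random.default_rng(0)
Wk, Wv, Wq = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(q, K, V):
    # Softmax attention of one query over all keys/values.
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_no_cache(xs):
    # Every step re-projects K/V for the ENTIRE history: step t costs O(t),
    # so generating n tokens costs O(n^2) projection work in total.
    outs = []
    for t in range(1, len(xs) + 1):
        hist = np.stack(xs[:t])
        K, V = hist @ Wk, hist @ Wv   # recomputed from scratch each step
        outs.append(attend(xs[t - 1] @ Wq, K, V))
    return outs

def decode_with_cache(xs):
    # K/V for past tokens are persisted; each step projects ONLY the new
    # token, keeping per-step projection work constant (linear overall).
    K, V = np.empty((0, D)), np.empty((0, D))
    outs = []
    for x in xs:
        K = np.vstack([K, x @ Wk])    # append one new row to the KV cache
        V = np.vstack([V, x @ Wv])
        outs.append(attend(x @ Wq, K, V))
    return outs

xs = list(rng.standard_normal((16, D)))
a, b = decode_no_cache(xs), decode_with_cache(xs)
assert all(np.allclose(u, v) for u, v in zip(a, b))  # identical outputs
```

The outputs are bit-for-bit identical; the cache trades memory capacity for compute, which is exactly why inference demand lands on memory rather than FLOPs.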
Full Memory Hierarchy For AI Inference
- The memory hierarchy stretches from HBM (closest to the GPU, holding 10–100GB of KVCache) through main memory and expansion memory to SSD context storage and, ultimately, network data lakes.
- Werner maps the tradeoff: each step away from the GPU buys more capacity at the cost of higher latency and lower bandwidth, as the sketch below illustrates.
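
A small Python sketch of the tiered-placement idea from the snip. The tier names follow the episode; the capacity figures are hypothetical order-of-magnitude placeholders (only HBM's 10–100GB comes from the snip), and the placement policy is an assumed simplification, not a described Micron design.

```python
# Illustrative tiered placement: spill context to the nearest tier that
# fits. Capacities are hypothetical except HBM's 10-100GB from the snip.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: float  # hypothetical budget for KVCache/context data

# Listed nearest-to-farthest from the GPU; latency rises and bandwidth
# falls as you move down the list.
TIERS = [
    Tier("HBM (on-package)", 100),            # 10-100GB KVCache per the snip
    Tier("Main memory (DRAM)", 1_000),        # hypothetical
    Tier("Expansion memory", 4_000),          # hypothetical
    Tier("SSD context storage", 60_000),      # hypothetical ultra-high-capacity
    Tier("Network data lake", float("inf")),
]

def place(context_gb: float) -> Tier:
    # Nearer tiers give lower latency and higher bandwidth, so always
    # prefer the first tier whose budget fits the context.
    for tier in TIERS:
        if context_gb <= tier.capacity_gb:
            return tier
    return TIERS[-1]

for size in (40, 500, 20_000):
    print(f"{size} GB context -> {place(size).name}")
```

The point of the sketch is the ordering, not the numbers: longer contexts push KVCache and context data down the hierarchy, which is where denser SSDs change the footprint and power math.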
