The Circuit

EP 163: Breaking the Memory Wall: Micron’s Strategy for the AI Era

May 5, 2026
Jeremy Werner, SVP and GM of Micron's Core Data Center unit, leads memory and SSD strategy for AI workloads. He unpacks the rising "memory wall" in inference and why expanding context windows inflate memory requirements. He outlines Micron's multi-year bets across HBM, DRAM, and ultra-high-capacity SSDs, and how denser storage shrinks footprint, saves power, and reshapes data center design.
INSIGHT

Memory Became The Core AI Bottleneck

  • AI has permanently changed memory's strategic role, making it the key asset for enabling both training and inference at scale.
  • Jeremy Werner says memory now breaks bottlenecks for inference and training, creating sustained demand rather than the industry's prior cyclicality.
INSIGHT

Inference Creates A KVCache Memory Wall

  • Inference is memory-heavy because decode needs past tokens' keys and values (the KV cache) persisted to avoid recomputing the entire history.
  • Werner explains that without a KV cache, decoding forces O(n^2) recomputation of attention over the full history; storing it keeps per-step work linear and multiplies effective GPU throughput.
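The KV-cache point above can be sketched in a few lines. This is a minimal illustration, not Micron's or any real model's implementation: `attend` is a toy single-query attention, and the "projections" are identities for brevity. The key idea is that each decode step appends one new key/value row to the cache instead of recomputing all prior rows, so per-step work stays linear in the sequence length rather than quadratic over the whole decode.

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    w = np.exp(q @ K.T / np.sqrt(q.shape[-1]))
    w /= w.sum()
    return w @ V

d = 4  # tiny head dimension, purely illustrative
rng = np.random.default_rng(0)

# Without a KV cache, step t would re-project and re-attend over all t
# prior tokens from scratch, so an n-token decode costs O(n^2) total.
# With a cache, each step appends one row and reuses the rest.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
outputs = []
for t in range(8):
    x = rng.standard_normal(d)         # hidden state of the new token
    k, v = x, x                        # identity "projections" for brevity
    K_cache = np.vstack([K_cache, k])  # append; old rows are never redone
    V_cache = np.vstack([V_cache, v])
    outputs.append(attend(x, K_cache, V_cache))

print(K_cache.shape)  # (8, 4): one cached row per decoded token
```

The memory cost is also visible here: the cache grows by one row per token per head per layer, which is why longer context windows inflate memory requirements so sharply.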
INSIGHT

Full Memory Hierarchy For AI Inference

  • The memory hierarchy stretches from HBM (closest to the GPU, holding 10–100 GB of KV cache) to main memory, expansion memory, SSD context storage, and network data lakes.
  • Werner maps the tradeoff: each tier further from the GPU offers more capacity but higher latency and lower bandwidth.