
Semi Doped: A New Era of Context Memory with Val Bercovici from WEKA
Feb 6, 2026  Val Bercovici, Chief AI Officer at WEKA and former NetApp CTO, explains AI-native storage and the challenge of context memory. They explore memory tiering from HBM down to NVMe, and the discussion covers latency, high-bandwidth flash, dedicated context-memory networks, Axon pooling local NVMe into memory, augmented memory grids, token warehouses, and networking innovations for AI infrastructure.
Context Memory Scales Far Faster Than Tokens
- Context memory demand explodes because vectorizing tokens multiplies storage needs dramatically.
- One megabyte of tokens can translate into roughly 50 GB of KV cache, quickly exhausting on-GPU memory (see the back-of-envelope sketch below).
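
To make the scaling concrete, here is a back-of-envelope calculation in Python. Every figure below (layer count, KV heads, head dimension, precision, bytes per token of text) is an illustrative assumption for a large open-weight transformer, not a number from the episode; with these assumptions, one megabyte of raw text lands in the same tens-of-gigabytes range the episode cites.

```python
# Back-of-envelope KV cache sizing. All model parameters are assumptions
# (roughly a 70B-class transformer with grouped-query attention), not
# figures quoted in the episode.

NUM_LAYERS = 80       # transformer layers (assumed)
NUM_KV_HEADS = 8      # KV heads under grouped-query attention (assumed)
HEAD_DIM = 128        # dimension per attention head (assumed)
BYTES_PER_VALUE = 2   # fp16/bf16 cache entries (assumed)

# Each token stores one K and one V vector in every layer.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
print(f"KV cache per token: ~{kv_bytes_per_token / 1024:.0f} KiB")  # ~320 KiB

# One megabyte of raw text at ~4 bytes per token is roughly 250k tokens.
tokens = 1_000_000 // 4
total_gb = tokens * kv_bytes_per_token / 1e9
print(f"KV cache for {tokens:,} tokens: ~{total_gb:.0f} GB")  # ~82 GB
```

Under these assumptions the cache lands around 80 GB; a smaller model or a quantized cache brings it down toward the ~50 GB figure, but either way a single HBM-limited GPU is exhausted almost immediately.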
NVMe Is Memory Extension, Not Just Storage
- Treating NVMe flash as a memory extension enables tiered context-memory architectures.
- The challenge is making slower NVMe behave like memory so that models don't suffer perceptible latency (see the tiering sketch below).
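
A minimal sketch of what such tiering looks like, assuming a small fast tier standing in for GPU HBM and a large slow tier standing in for pooled NVMe. The class and its eviction policy are hypothetical illustrations of the general pattern, not WEKA's implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier context-memory cache: a small fast tier (stand-in for
    GPU HBM) backed by a large slow tier (stand-in for NVMe)."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, bytes] = OrderedDict()  # LRU-ordered
        self.slow: dict[str, bytes] = {}                    # "NVMe" tier

    def put(self, key: str, kv_block: bytes) -> None:
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            # Evict the least-recently-used block to the slow tier instead
            # of discarding it, so it never has to be recomputed.
            old_key, old_block = self.fast.popitem(last=False)
            self.slow[old_key] = old_block

    def get(self, key: str) -> bytes | None:
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote on hit; this hop is where the latency must stay
            # imperceptible for NVMe to "behave like memory".
            self.put(key, self.slow.pop(key))
            return self.fast[key]
        return None
```

The design point the episode stresses lives in the `get` promotion path: the slow tier only works as memory if that hop stays below what a model, or a user, can perceive.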
Networks Can Outpace Motherboard Bandwidth
- Modern AI systems invert traditional assumptions: networks can be faster than motherboards.
- GPUs collaborate across high-speed fabrics, so network design matters as much as local memory (see the bandwidth arithmetic below).
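
The inversion is easy to check with per-link bandwidth arithmetic. The figures below are public per-link numbers for current hardware generations, used here as illustrative assumptions rather than anything quoted in the episode.

```python
# Rough per-link bandwidth comparison (illustrative, vendor-neutral).

pcie_gen5_x16_gbps = 16 * 32   # 16 lanes x ~32 Gb/s raw per lane
nic_800g_gbps = 800            # one 800 Gb/s Ethernet/InfiniBand port

print(f"PCIe Gen5 x16 slot: ~{pcie_gen5_x16_gbps / 8:.0f} GB/s")  # ~64 GB/s
print(f"800G fabric port:   ~{nic_800g_gbps / 8:.0f} GB/s")       # ~100 GB/s

# A single modern fabric port can outrun the PCIe slot feeding it --
# the "network faster than the motherboard" inversion, and why fabric
# design matters as much as what sits on the local board.
```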

