
Semi Doped: A New Era of Context Memory with Val Bercovici from WEKA
Feb 6, 2026  Val Bercovici, Chief AI Officer at WEKA and former NetApp CTO, explains AI-native storage and the challenge of context memory. They explore memory tiering from HBM down to NVMe, and the discussion covers latency, high-bandwidth flash, dedicated context-memory networks, Axon pooling local NVMe into memory, augmented memory grids, token warehouses, and networking innovations for AI infrastructure.
Context Memory Scales Far Faster Than Tokens
- Context memory demand explodes because vectorizing tokens multiplies storage needs dramatically.
- One megabyte of tokens can translate into roughly 50 GB of KV cache, quickly exhausting on-GPU memory (see the back-of-envelope sketch below).
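
To make the scaling concrete, here is a back-of-envelope calculation in Python. Every figure below (layer count, KV heads, head dimension, precision, bytes per token of text) is an illustrative assumption for a large open-weight transformer, not a number from the episode; with these assumptions, one megabyte of raw text lands in the same tens-of-gigabytes range the episode cites.

```python
# Back-of-envelope KV cache sizing. All model parameters are assumptions
# (roughly a 70B-class transformer with grouped-query attention), not
# figures quoted in the episode.

NUM_LAYERS = 80       # transformer layers (assumed)
NUM_KV_HEADS = 8      # KV heads under grouped-query attention (assumed)
HEAD_DIM = 128        # dimension per attention head (assumed)
BYTES_PER_VALUE = 2   # fp16/bf16 cache entries (assumed)

# Each token stores one K and one V vector in every layer.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
print(f"KV cache per token: ~{kv_bytes_per_token / 1024:.0f} KiB")  # ~320 KiB

# One megabyte of raw text at ~4 bytes per token is roughly 250k tokens.
tokens = 1_000_000 // 4
total_gb = tokens * kv_bytes_per_token / 1e9
print(f"KV cache for {tokens:,} tokens: ~{total_gb:.0f} GB")  # ~82 GB
```

Under these assumptions the cache lands around 80 GB; a smaller model or a quantized cache brings it down toward the ~50 GB figure, but either way a single HBM-limited GPU is exhausted almost immediately.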
NVMe Is Memory Extension, Not Just Storage
- Treating NVMe flash as a memory extension enables tiered context-memory architectures.
- The challenge is making slower NVMe behave like memory so that models don't suffer perceptible latency (see the tiering sketch below).
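
A minimal sketch of what such tiering looks like, assuming a small fast tier standing in for GPU HBM and a large slow tier standing in for pooled NVMe. The class and its eviction policy are hypothetical illustrations of the general pattern, not WEKA's implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier context-memory cache: a small fast tier (stand-in for
    GPU HBM) backed by a large slow tier (stand-in for NVMe)."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, bytes] = OrderedDict()  # LRU-ordered
        self.slow: dict[str, bytes] = {}                    # "NVMe" tier

    def put(self, key: str, kv_block: bytes) -> None:
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            # Evict the least-recently-used block to the slow tier instead
            # of discarding it, so it never has to be recomputed.
            old_key, old_block = self.fast.popitem(last=False)
            self.slow[old_key] = old_block

    def get(self, key: str) -> bytes | None:
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote on hit; this hop is where the latency must stay
            # imperceptible for NVMe to "behave like memory".
            self.put(key, self.slow.pop(key))
            return self.fast[key]
        return None
```

The design point the episode stresses lives in the `get` promotion path: the slow tier only works as memory if that hop stays below what a model, or a user, can perceive.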
Networks Can Outpace Motherboard Bandwidth
- Modern AI systems invert traditional assumptions: networks can be faster than motherboards.
- GPUs collaborate across high-speed fabrics, so network design matters as much as local memory (see the bandwidth arithmetic below).
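
The inversion is easy to check with per-link bandwidth arithmetic. The figures below are public per-link numbers for current hardware generations, used here as illustrative assumptions rather than anything quoted in the episode.

```python
# Rough per-link bandwidth comparison (illustrative, vendor-neutral).

pcie_gen5_x16_gbps = 16 * 32   # 16 lanes x ~32 Gb/s raw per lane
nic_800g_gbps = 800            # one 800 Gb/s Ethernet/InfiniBand port

print(f"PCIe Gen5 x16 slot: ~{pcie_gen5_x16_gbps / 8:.0f} GB/s")  # ~64 GB/s
print(f"800G fabric port:   ~{nic_800g_gbps / 8:.0f} GB/s")       # ~100 GB/s

# A single modern fabric port can outrun the PCIe slot feeding it --
# the "network faster than the motherboard" inversion, and why fabric
# design matters as much as what sits on the local board.
```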

