TechCrunch Industry News

Running AI models is turning into a memory game

Feb 17, 2026
The conversation centers on rising memory costs for AI and why DRAM matters as much as GPUs. The hosts discuss memory orchestration and the complexity of prompt caching at Anthropic, and highlight startups and stack layers that can optimize caching. They also explore the EU disabling built-in AI tools and the legal risks of data sharing.
INSIGHT

Memory Is The New Bottleneck

  • Memory (DRAM and cache) is becoming as critical as GPUs for AI infrastructure costs and performance.
  • Companies that master memory orchestration will serve the same queries with far fewer tokens and gain a competitive edge.
INSIGHT

Prompt Caching Reveals Cost Dynamics

  • Prompt caching changes show how providers monetize and optimize memory access windows for AI queries.
  • Efficient cache use can drastically cut costs but requires careful orchestration to avoid evicting useful data.
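The eviction point above can be illustrated with a toy LRU cache: when the cache is too small for the working set, a prompt prefix that is about to be reused gets evicted, and its tokens must be reprocessed on the next query. This is a minimal sketch for illustration only; the `PromptCache` class and the `kv-state` placeholder are invented here, and real provider-side prompt caches are far more sophisticated.

```python
from collections import OrderedDict

class PromptCache:
    """Toy LRU cache for prompt prefixes (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix: str) -> str:
        if prefix in self.entries:
            self.hits += 1
            self.entries.move_to_end(prefix)  # mark as recently used
            return self.entries[prefix]
        # Miss: the full prefix must be reprocessed (tokens billed again).
        self.misses += 1
        result = f"kv-state({prefix})"  # stand-in for the cached state
        self.entries[prefix] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result

# A workload that keeps reusing one long system prompt across queries.
cache = PromptCache(capacity=2)
for prefix in ["system-A", "system-B", "system-A", "system-C", "system-A"]:
    cache.lookup(prefix)
print(cache.hits, cache.misses)  # 2 hits, 3 misses: "system-A" stays warm
```

With a capacity of 2, the frequently reused "system-A" prefix survives eviction and hits twice; shrink the capacity to 1 and every lookup becomes a miss, which is the orchestration failure the insight warns about.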
ADVICE

Optimize Memory To Cut Token Costs

  • Optimize memory orchestration across the stack to reduce token use and inference costs.
  • Focus on cache optimization, data-center memory allocation, and model swarms to extract savings.