NVIDIA AI Podcast

Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292

Mar 4, 2026
Ben Sooter, Director of R&D at EPRI working on reliable, affordable power, discusses how most of an AI model's energy is spent on inference rather than training. He describes micro data centers sited near underused substations to cut latency and boost resilience, and covers distributed site clustering, grid-integration strategies, and real-time, low-latency applications.
INSIGHT

Inference Consumes Most Model Energy

  • Most of an AI model's lifetime energy is spent on inference rather than training.
  • Ben Sooter says roughly 20% of compute is training and 80% is inference, so inference will drive the next major compute and power wave.
ANECDOTE

Agentic AI Could Move Loads To Nighttime

  • Ben revised his load-shape hypothesis after considering agentic AI systems that run autonomously and may compute at night.
  • He realized consumer-driven agents could shift inference loads to nighttime hours, creating new temporal demand patterns beyond daytime peaks.
INSIGHT

Place Inference Compute Close To Users

  • Inference data centers should be geographically distributed near users for latency and performance.
  • Ben compares the shift to early streaming and gaming servers, noting that proximity to users similarly improves the experience for latency-sensitive services.