
NVIDIA AI Podcast Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292
Mar 4, 2026 Ben Sooter, Director of R&D at EPRI, where he works on innovation for reliable, affordable power, discusses how most AI energy is spent on inference rather than training. He describes micro data centers near underused substations to cut latency and boost resilience, and covers distributed site clustering, grid integration strategies, and real-time, low-latency applications.
Inference Consumes Most Model Energy
- Most of an AI model's lifetime energy is spent on inference rather than training.
- Ben Sooter says roughly 20% of compute is training and 80% is inference, so inference will drive the next major compute and power wave.
Agentic AI Could Move Loads To Nighttime
- Ben changed his load-shape hypothesis after considering agentic AIs that run autonomously and may compute at night.
- He realized consumer-driven agents could shift inference loads to nights and create new temporal patterns beyond daytime peaks.
Place Inference Compute Close To Users
- Inference data centers should be geographically distributed near users for latency and performance.
- Ben compares the shift to early streaming and gaming servers, noting that, as with those services, proximity improves the user experience for latency-sensitive applications.
