
NVIDIA AI Podcast Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292
Mar 4, 2026 Ben Sooter, Director of R&D at EPRI, where he works on innovation for reliable, affordable power, discusses how most AI energy is spent on inference rather than training. He describes micro data centers near underused substations to cut latency and boost resilience, and covers distributed site clustering, grid integration strategies, and real-time, low-latency applications.
Inference Consumes Most Model Energy
- Most of an AI model's lifetime energy is spent on inference rather than training.
- Ben Sooter says roughly 20% of compute is training and 80% is inference, so inference will drive the next major compute and power wave.
Agentic AI Could Move Loads To Nighttime
- Ben changed his load-shape hypothesis after considering agentic AIs that run autonomously and may compute at night.
- He realized consumer-driven agents could shift inference loads to nights and create new temporal patterns beyond daytime peaks.
Place Inference Compute Close To Users
- Inference data centers should be geographically distributed near users for latency and performance.
- Ben compares the shift to early streaming and gaming servers, noting that, as with those services, proximity improves the user experience for latency-sensitive applications.
