Catalyst with Shayle Kann

Will inference move to the edge?

Dec 18, 2025
Shayle is joined by Ben Lee, a professor at the University of Pennsylvania and a visiting researcher at Google who focuses on AI systems. They discuss the shift from centralized AI compute to edge inference, a move driven by latency-sensitive applications like autonomous vehicles. Ben explains the differences between hyperscale, edge, and on-device computing, and why training will stay centralized. He also covers the challenges and potential of local data centers, including the implications for energy consumption and the future landscape of AI applications.
INSIGHT

Training Creates Power Spikes

  • Training synchronizes compute and communication phases across many GPUs at once, which creates large power swings in data centers.
  • Those swings are costly for operators and are sometimes managed by running dummy workloads during the low-power phases to flatten power demand (a simulation sketch follows below).
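To make the mechanism concrete, here is a minimal, self-contained Python simulation of the idea: GPUs alternate between a high-power compute phase and a lower-power communication phase, producing a square-wave power profile, and filler work during communication flattens it. All power levels and phase counts are illustrative assumptions, not measurements from the episode.

```python
# Simulate cluster power during synchronized training: compute phases draw
# high power, communication (all-reduce) phases draw much less, and the
# alternation produces large swings. A dummy workload during communication
# raises the floor and flattens the profile. Numbers below are assumptions.

COMPUTE_POWER_MW = 40.0   # cluster draw during synchronized compute (assumed)
COMM_POWER_MW = 12.0      # cluster draw while GPUs wait on communication (assumed)
DUMMY_POWER_MW = 26.0     # extra draw from filler kernels (assumed)

def power_profile(steps: int, use_dummy_load: bool) -> list[float]:
    """Return per-phase cluster power, alternating compute/communication."""
    profile = []
    for _ in range(steps):
        profile.append(COMPUTE_POWER_MW)  # compute phase
        comm = COMM_POWER_MW + (DUMMY_POWER_MW if use_dummy_load else 0.0)
        profile.append(comm)              # communication phase
    return profile

for label, flag in [("no filler", False), ("with filler", True)]:
    p = power_profile(steps=4, use_dummy_load=flag)
    swing = max(p) - min(p)
    print(f"{label:11s} swing = {swing:.0f} MW  profile = {p}")
```

With these assumed numbers the peak-to-trough swing drops from 28 MW to 2 MW, which is the effect the dummy workloads are meant to buy.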
ADVICE

Put Single-GPU Inference At The Edge

  • Move inference closer to users when each query can be served by a single GPU or a tightly coupled set of GPUs (a routing sketch follows below).
  • Use edge data centers for latency-sensitive inference, since such workloads don't require massive cross-GPU coordination.
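One way to picture this advice is as a routing rule: send a query to an edge site only when the model it needs fits on a single GPU (or a small, tightly coupled group), and fall back to a hyperscale region otherwise. The sketch below is hypothetical; the dataclass fields, GPU memory figure, and thresholds are illustrative assumptions, not a real serving API.

```python
# Hypothetical router: latency-sensitive queries go to the edge only when
# one replica of the model fits within a small, tightly coupled GPU group.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    memory_gb: float   # weights + working memory per replica (assumed)
    min_gpus: int      # GPUs needed to hold one replica

GPU_MEMORY_GB = 80.0   # e.g., a single 80 GB accelerator (assumed)
EDGE_MAX_GPUS = 2      # the "tightly coupled set" allowed at the edge (assumed)

def route(model: Model, latency_sensitive: bool) -> str:
    fits_at_edge = (model.min_gpus <= EDGE_MAX_GPUS
                    and model.memory_gb <= GPU_MEMORY_GB * model.min_gpus)
    if latency_sensitive and fits_at_edge:
        return "edge"        # single-GPU-scale inference near the user
    return "hyperscale"      # large models and batch jobs stay centralized

print(route(Model("small-chat", 60, 1), latency_sensitive=True))   # edge
print(route(Model("frontier", 640, 8), latency_sensitive=True))    # hyperscale
```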
INSIGHT

Edge Facilities Need Retrofits

  • Existing small data centers (15–50 MW) could host GPU inference, but they often need retrofits for power density and cooling.
  • Converting a CPU-optimized facility to GPU workloads requires infrastructure changes, not just new servers (see the arithmetic below).
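Some back-of-the-envelope arithmetic shows why the retrofit is nontrivial: at GPU power density, the same facility supports far fewer racks, so power distribution and cooling per rack must scale up. The per-rack figures below are rough industry assumptions, not numbers from the episode.

```python
# Rack capacity of a 15 MW facility at CPU vs. GPU power density.
# Per-rack power figures are rough assumptions for illustration.

FACILITY_MW = 15.0
CPU_RACK_KW = 10.0   # typical air-cooled CPU rack (assumed)
GPU_RACK_KW = 60.0   # dense GPU rack, often liquid-cooled (assumed)

def rack_count(facility_mw: float, rack_kw: float) -> int:
    """Number of racks the facility's power budget can feed."""
    return int(facility_mw * 1000 / rack_kw)

print(f"CPU racks: {rack_count(FACILITY_MW, CPU_RACK_KW)}")  # 1500
print(f"GPU racks: {rack_count(FACILITY_MW, GPU_RACK_KW)}")  # 250
```

Under these assumptions the same 15 MW feeds six times fewer racks, each drawing six times more power, which is exactly the density and cooling gap the retrofits have to close.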