Catalyst with Shayle Kann

Will inference move to the edge?

Dec 18, 2025
Shayle is joined by Ben Lee, a professor at the University of Pennsylvania and a visiting researcher at Google who focuses on AI systems. They discuss the shift from centralized AI compute to edge inference, a move driven by latency-sensitive applications like autonomous vehicles. Ben explains the differences between hyperscale, edge, and on-device computing, and why training will stay centralized. He also covers the challenges and potential of local data centers, including the implications for energy consumption and the future landscape of AI applications.
INSIGHT

Training Creates Power Spikes

  • Training synchronizes compute and communication phases across many GPUs at once, which creates large power swings in data centers.
  • Those swings are costly for operators and are sometimes managed by running dummy workloads during the low-power phases to flatten power demand (a simulation sketch follows below).
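To make the mechanism concrete, here is a minimal, self-contained Python simulation of the idea: GPUs alternate between a high-power compute phase and a lower-power communication phase, producing a square-wave power profile, and filler work during communication flattens it. All power levels and phase counts are illustrative assumptions, not measurements from the episode.

```python
# Simulate cluster power during synchronized training: compute phases draw
# high power, communication (all-reduce) phases draw much less, and the
# alternation produces large swings. A dummy workload during communication
# raises the floor and flattens the profile. Numbers below are assumptions.

COMPUTE_POWER_MW = 40.0   # cluster draw during synchronized compute (assumed)
COMM_POWER_MW = 12.0      # cluster draw while GPUs wait on communication (assumed)
DUMMY_POWER_MW = 26.0     # extra draw from filler kernels (assumed)

def power_profile(steps: int, use_dummy_load: bool) -> list[float]:
    """Return per-phase cluster power, alternating compute/communication."""
    profile = []
    for _ in range(steps):
        profile.append(COMPUTE_POWER_MW)  # compute phase
        comm = COMM_POWER_MW + (DUMMY_POWER_MW if use_dummy_load else 0.0)
        profile.append(comm)              # communication phase
    return profile

for label, flag in [("no filler", False), ("with filler", True)]:
    p = power_profile(steps=4, use_dummy_load=flag)
    swing = max(p) - min(p)
    print(f"{label:11s} swing = {swing:.0f} MW  profile = {p}")
```

With these assumed numbers the peak-to-trough swing drops from 28 MW to 2 MW, which is the effect the dummy workloads are meant to buy.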
ADVICE

Put Single-GPU Inference At The Edge

  • Move inference closer to users when each query can be served by a single GPU or a tightly coupled set of GPUs (a routing sketch follows below).
  • Use edge data centers for latency-sensitive inference, since such workloads don't require massive cross-GPU coordination.
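One way to picture this advice is as a routing rule: send a query to an edge site only when the model it needs fits on a single GPU (or a small, tightly coupled group), and fall back to a hyperscale region otherwise. The sketch below is hypothetical; the dataclass fields, GPU memory figure, and thresholds are illustrative assumptions, not a real serving API.

```python
# Hypothetical router: latency-sensitive queries go to the edge only when
# one replica of the model fits within a small, tightly coupled GPU group.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    memory_gb: float   # weights + working memory per replica (assumed)
    min_gpus: int      # GPUs needed to hold one replica

GPU_MEMORY_GB = 80.0   # e.g., a single 80 GB accelerator (assumed)
EDGE_MAX_GPUS = 2      # the "tightly coupled set" allowed at the edge (assumed)

def route(model: Model, latency_sensitive: bool) -> str:
    fits_at_edge = (model.min_gpus <= EDGE_MAX_GPUS
                    and model.memory_gb <= GPU_MEMORY_GB * model.min_gpus)
    if latency_sensitive and fits_at_edge:
        return "edge"        # single-GPU-scale inference near the user
    return "hyperscale"      # large models and batch jobs stay centralized

print(route(Model("small-chat", 60, 1), latency_sensitive=True))   # edge
print(route(Model("frontier", 640, 8), latency_sensitive=True))    # hyperscale
```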
INSIGHT

Edge Facilities Need Retrofits

  • Existing small data centers (15–50 MW) could host GPU inference, but they often need retrofits for power density and cooling.
  • Converting a CPU-optimized facility to GPU workloads requires infrastructure changes, not just new servers (see the arithmetic below).
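Some back-of-the-envelope arithmetic shows why the retrofit is nontrivial: at GPU power density, the same facility supports far fewer racks, so power distribution and cooling per rack must scale up. The per-rack figures below are rough industry assumptions, not numbers from the episode.

```python
# Rack capacity of a 15 MW facility at CPU vs. GPU power density.
# Per-rack power figures are rough assumptions for illustration.

FACILITY_MW = 15.0
CPU_RACK_KW = 10.0   # typical air-cooled CPU rack (assumed)
GPU_RACK_KW = 60.0   # dense GPU rack, often liquid-cooled (assumed)

def rack_count(facility_mw: float, rack_kw: float) -> int:
    """Number of racks the facility's power budget can feed."""
    return int(facility_mw * 1000 / rack_kw)

print(f"CPU racks: {rack_count(FACILITY_MW, CPU_RACK_KW)}")  # 1500
print(f"GPU racks: {rack_count(FACILITY_MW, GPU_RACK_KW)}")  # 250
```

Under these assumptions the same 15 MW feeds six times fewer racks, each drawing six times more power, which is exactly the density and cooling gap the retrofits have to close.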