AI Engineering Podcast

Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty

Feb 25, 2026
Stephen Watt, who leads the Office of the CTO at Red Hat and brings deep generative AI and infrastructure experience, discusses AI sovereignty, why organizations pursue self-managed GPU platforms, and the operational role of Kubernetes in scale-out LLM serving. He covers observability and policy for nondeterministic systems, confidential inference and agent identity, hardware and model optionality, and the persistent gap: broad access to GPUs.
ADVICE

Give Yourself Hardware Optionality

  • Design for infrastructure optionality to avoid vendor or hardware lock-in by supporting CPUs; Intel, AMD, and NVIDIA GPUs; and emerging accelerators.
  • Invest in driver and runtime integrations (e.g., out-of-tree device drivers for PyTorch) so vendors can add accelerators without heavy upstream friction.
INSIGHT

Kubernetes Is The Scale-Out Backbone

  • Kubernetes is the natural scale-out backbone for LLM serving because it disaggregates compute and memory across servers.
  • The PyTorch ecosystem is historically scale-up, so teams must bridge toolchains into Kubernetes to operate large model clusters.
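As a concrete illustration of the snip above, a scale-out serving layer on Kubernetes typically means running many replicas of a model server, each pinned to a GPU via the device plugin mechanism. The sketch below is a minimal, hypothetical Deployment; the names, image, model, and replica/GPU counts are illustrative assumptions, not details from the episode.

```yaml
# Hypothetical scale-out LLM serving Deployment (all names/values illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server                  # assumed name
spec:
  replicas: 4                       # scale out across GPU nodes
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
        - name: server
          image: vllm/vllm-openai:latest              # example serving image
          args: ["--model", "example-org/example-model"]  # placeholder model ID
          resources:
            limits:
              nvidia.com/gpu: 1     # one GPU per replica, via the NVIDIA device plugin
```

A Service or inference gateway in front of these replicas then handles load balancing, which is where the scale-up PyTorch toolchain meets Kubernetes' scale-out model.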
ADVICE

Route Inference Through A Semantic Router

  • Add a semantic routing control plane in front of inference to enforce policy, manage cost, and route requests to appropriate models.
  • Use pluggable policy engines and model classifiers so teams can assign production models versus prototype runtimes per workload.
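The routing idea above can be sketched as a tiny policy table evaluated before any model is invoked. Everything here (rule names, model tiers, the crude classifiers) is an illustrative assumption, not the episode's actual implementation:

```python
# Minimal sketch of a semantic routing control plane in front of inference.
# First matching policy rule wins; the last rule is the production default.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Route:
    model: str                       # target model/runtime tier (hypothetical names)
    matches: Callable[[str], bool]   # pluggable policy predicate over the request

ROUTES: List[Route] = [
    Route("code-model",  lambda p: "def " in p or "```" in p),  # crude code classifier
    Route("cheap-model", lambda p: len(p) < 40),                # short prompts -> low-cost tier
    Route("prod-model",  lambda p: True),                       # default production model
]

def route(prompt: str) -> str:
    """Return the model tier a request should be served by."""
    for r in ROUTES:
        if r.matches(prompt):
            return r.model
    return "prod-model"
```

In a real deployment the predicates would be replaced by model classifiers and a policy engine, but the control-plane shape is the same: classify, apply policy, then dispatch to a production or prototype runtime.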