
AI Engineering Podcast
Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty
Feb 25, 2026

Stephen Watt, leader of the Office of the CTO at Red Hat with deep generative AI and infrastructure experience, discusses AI sovereignty, why organizations pursue self-managed GPU platforms, and the operational role of Kubernetes for scale-out LLM serving. He covers observability and policy for nondeterministic systems, confidential inference and agent identity, hardware and model optionality, and the persistent gap: broad access to GPUs.
Give Yourself Hardware Optionality
- Choose infrastructure optionality to avoid vendor or hardware lock-in by supporting CPUs; Intel, AMD, and NVIDIA GPUs; and emerging accelerators.
- Invest in driver and runtime integrations (e.g., out-of-tree device drivers for PyTorch) so vendors can add accelerators without heavy upstream friction.
Kubernetes Is the Scale-Out Backbone
- Kubernetes is the natural scale-out backbone for LLM serving because it disaggregates compute and memory across servers.
- The PyTorch ecosystem is historically scale-up, so teams must bridge its toolchains into Kubernetes to operate large model-serving clusters.
Route Inference Through A Semantic Router
- Add a semantic routing control plane in front of inference to enforce policy, manage cost, and route requests to appropriate models.
- Use pluggable policy engines and model classifiers so teams can assign production models versus prototype runtimes per workload.
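The routing idea above can be sketched in a few lines: a classifier labels each request, and a policy table maps that label to a backend model with a cost budget. This is a minimal illustrative sketch, not the router discussed in the episode; all names (`route_request`, `POLICY`, the model identifiers) are hypothetical, and a real control plane would use a pluggable policy engine and a learned model classifier.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str       # backend model the request is sent to (hypothetical names)
    max_cost: float  # per-request budget a policy engine would enforce

# Policy table: a workload label maps to a route. In practice this would be
# a pluggable policy engine, configurable per team or per workload.
POLICY = {
    "production": Route(model="llama-70b-prod", max_cost=0.10),
    "prototype": Route(model="llama-8b-dev", max_cost=0.01),
}

def classify(prompt: str) -> str:
    """Toy stand-in for a semantic classifier: long or explicitly
    flagged prompts are treated as production traffic."""
    return "production" if len(prompt) > 100 or "[prod]" in prompt else "prototype"

def route_request(prompt: str) -> Route:
    """Pick a backend for the request based on its classification."""
    return POLICY[classify(prompt)]

if __name__ == "__main__":
    route = route_request("[prod] summarize this quarterly report")
    print(route.model)  # the production model is selected for flagged traffic
```

The design point is that the classifier and the policy table vary independently: teams swap in their own classifiers and assign production models versus prototype runtimes without touching the routing logic itself.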
