
AI Engineering Podcast
Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty
Feb 25, 2026

Stephen Watt, leader of the Office of the CTO at Red Hat with deep generative AI and infrastructure experience, discusses AI sovereignty, why organizations pursue self-managed GPU platforms, and the operational role of Kubernetes for scale-out LLM serving. He covers observability and policy for nondeterministic systems, confidential inference and agent identity, hardware and model optionality, and the persistent gap: broad access to GPUs.
Give Yourself Hardware Optionality
- Choose infrastructure optionality to avoid vendor or hardware lock-in by supporting CPUs; Intel, AMD, and NVIDIA GPUs; and emerging accelerators.
- Invest in driver and runtime integrations (e.g., out-of-tree device drivers for PyTorch) so vendors can add accelerators without heavy upstream friction.
Kubernetes Is the Scale-Out Backbone
- Kubernetes is the natural scale-out backbone for LLM serving because it disaggregates compute and memory across servers.
- The PyTorch ecosystem is historically scale-up, so teams must bridge its toolchains into Kubernetes to operate large model-serving clusters.
Route Inference Through A Semantic Router
- Add a semantic routing control plane in front of inference to enforce policy, manage cost, and route requests to appropriate models.
- Use pluggable policy engines and model classifiers so teams can assign production models versus prototype runtimes per workload.
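The routing idea above can be sketched in a few lines: a classifier labels each request, and a policy table maps that label to a backend model with a cost budget. This is a minimal illustrative sketch, not the router discussed in the episode; all names (`route_request`, `POLICY`, the model identifiers) are hypothetical, and a real control plane would use a pluggable policy engine and a learned model classifier.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str       # backend model the request is sent to (hypothetical names)
    max_cost: float  # per-request budget a policy engine would enforce

# Policy table: a workload label maps to a route. In practice this would be
# a pluggable policy engine, configurable per team or per workload.
POLICY = {
    "production": Route(model="llama-70b-prod", max_cost=0.10),
    "prototype": Route(model="llama-8b-dev", max_cost=0.01),
}

def classify(prompt: str) -> str:
    """Toy stand-in for a semantic classifier: long or explicitly
    flagged prompts are treated as production traffic."""
    return "production" if len(prompt) > 100 or "[prod]" in prompt else "prototype"

def route_request(prompt: str) -> Route:
    """Pick a backend for the request based on its classification."""
    return POLICY[classify(prompt)]

if __name__ == "__main__":
    route = route_request("[prod] summarize this quarterly report")
    print(route.model)  # the production model is selected for flagged traffic
```

The design point is that the classifier and the policy table vary independently: teams swap in their own classifiers and assign production models versus prototype runtimes without touching the routing logic itself.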
