

AI Engineering Podcast
Tobias Macey
This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and the considerations involved in building or customizing new models. Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.
Episodes

Feb 25, 2026 • 1h 1min
Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty
Stephen Watt, leader of the Office of the CTO at Red Hat with deep generative AI and infrastructure experience, discusses AI sovereignty, why organizations pursue self-managed GPU platforms, and the operational role of Kubernetes for scale-out LLM serving. He covers observability and policy for nondeterministic systems, confidential inference and agent identity, hardware and model optionality, and the persistent gap in broad access to GPUs.

Feb 15, 2026 • 51min
From Blind Spots to Observability: Operationalizing LLM Apps with OpenLit
Aman Agarwal, creator of OpenLit and builder of observability tooling for LLM apps, discusses operational foundations for running LLM-powered systems in production. He covers common blind spots like opaque model behavior, runaway token costs, and brittle prompt management. The conversation dives into OpenTelemetry-based tracing, prompt/version management, evaluation workflows, fleet instrumentation, and avoiding vendor lock-in.

Feb 8, 2026 • 59min
Taming Voice Complexity with Dynamic Ensembles at Modulate
Carter Huffman, Co-founder and CTO of Modulate, builds low-latency voice AI systems. He talks about why speech-to-text pipelines miss tone and emotion, and explains dynamic ensemble architectures that route small specialized models per conversation. He covers cost-based routing, watchdog checks, long-horizon memory, and when ensembles beat giant models.

Jan 27, 2026 • 46min
GPU Clouds, Aggregators, and the New Economics of AI Compute
Hugo Shi, co-founder and CTO of Saturn Cloud, builds GPU cloud platforms for ML teams. He maps the GPU provider landscape, comparing hyperscalers, boutique GPU clouds, bare-metal providers, and aggregators. He explores orchestration, data gravity, training vs. inference splits, accelerator diversity (including AMD's progress), supply dynamics, and predictions on consolidation, marketplaces, and reliability for long GPU runs.

Jan 20, 2026 • 56min
The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI
Niklas Gustavsson, Chief Architect at Spotify, brings a wealth of experience in backend systems and developer experience. He explores Spotify's ambitious journey to scale AI through a standardized, distributed architecture. Topics include the rapid grassroots adoption of coding agents, the delicate balance between team autonomy and standardization, and the evolving role of developers as code-writing time diminishes. Niklas also discusses emerging agent capabilities, like fleet-wide code changes and insights learned from human oversight in AI systems, shaping the future of engineering workflows.

Jan 5, 2026 • 56min
Generative AI Meets Accessibility: Benchmarks, Breakthroughs, and Blind Spots with Joe Devon
Joe Devon, co-founder of Global Accessibility Awareness Day and tech entrepreneur, dives into the intersection of generative AI and digital accessibility. He discusses how AI can improve captions and audio descriptions while highlighting the inconsistent accessibility of code generated by large models. Joe introduces the AI Model Accessibility Checker to benchmark accessible code production and shares practical steps for better inclusivity. He emphasizes involving users with disabilities in design processes to create truly accessible AI solutions, making it clear that accessibility is a crucial human-rights issue.

Dec 29, 2025 • 54min
Beyond the Chatbot: Practical Frameworks for Agentic Capabilities in SaaS
Preeti Shukla, a seasoned product and engineering leader with a focus on generative AI and SaaS, dives into the operational challenges of integrating agentic capabilities. She discusses crucial factors like latency, cost control, and data privacy in multi-tenant environments. Preeti emphasizes the importance of starting with internal pilots and outlines frameworks for choosing models and deployment strategies. She also tackles the complexities of evaluation and monitoring in AI systems, offering valuable insights on avoiding confident hallucinations and ensuring reliability.

Dec 16, 2025 • 1h 8min
MCP as the API for AI‑Native Systems: Security, Orchestration, and Scale
Craig McLuckie, co-creator of Kubernetes and CEO of Stacklok, dives into the pivotal role of the Model Context Protocol (MCP) as the API layer for AI-native applications. He discusses the importance of securing AI agents through optimized MCP deployments and highlights common adoption pitfalls like tool pollution and security risks. Craig also stresses the need for continuous evaluations in stochastic systems and shares insights on ToolHive's approach to orchestration and semantic search for better developer experiences.

82 snips
Nov 24, 2025 • 60min
Context as Code, DevX as Leverage: Accelerating Software with Multi‑Agent Workflows
Max Beauchemin, a data engineering veteran and creator of Apache Airflow and Superset, discusses his shift to multi-agent development with Agor. He explores the concept of an 'AI-first reflex,' where humans orchestrate tasks while agents accelerate workflows. Max highlights how shifting bottlenecks like code review can be addressed through improved developer experiences and 'context as code.' He introduces Agor’s innovative platform, designed for managing git worktrees and collaborative environments, enabling richer visibility and parallelization in software engineering.

Nov 16, 2025 • 1h 1min
Inside the Black Box: Neuron-Level Control and Safer LLMs
Vinay Kumar, Founder and CEO of Arya.ai and head of Lexsi Labs, dives into the nuances of AI interpretability and alignment. He contrasts interpretability with explainability, highlighting the evolution of these concepts into tools for model alignment. Vinay shares insights on leveraging neuron-level editing for safer LLMs and discusses practical techniques like pruning and unlearning. He emphasizes the need for concrete metrics in alignment and explores the future role of AI agents in enhancing model safety, aiming for advanced AI that is both effective and responsible.


