

MLOps.community
Demetrios
Relaxed conversations around getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, vibes, etc.)
Episodes

12 snips
May 12, 2026 • 48min
The Latency Goldilocks Zone Explained
Daniel Wobert, Data Science Manager at iFood who built the AILO conversational agent, talks hyper-personalization and the Large Commerce Model that profiles every user. They cover recommendation tradeoffs, the Tinder-style exploration test, multi-channel UX for app/WhatsApp/voice, the Goldilocks latency zone of perceived response time, and operational lessons from scaling agents in production.

17 snips
May 8, 2026 • 41min
Building MCP Before MCP Existed: Inside Despegar's Sofia Agent
A behind-the-scenes look at how Despegar built Sofia, a multi-agent travel concierge running on WhatsApp. They explain the custom orchestration layer and early tool-calling protocol that predated standards. Learn about verticalized agent flows, decentralized squad ownership, chat-first interfaces, KPI evolution, and group trip planning through messaging.

28 snips
May 1, 2026 • 51min
Voice Agent Use Cases
Anurag Beniwal, Member of the Technical Staff at ElevenLabs with experience building billion-interaction voice systems, shares practical production patterns. He contrasts cascaded and speech-to-speech approaches. He explains latency masking, model constellations, turn-taking challenges, and high-value use cases like support, bookings, and sales.

137 snips
Apr 24, 2026 • 1h 7min
The Creator of Superpowers: Why Real Agentic Engineering Beats Vibe Coding
Jesse Vincent, Founder & CEO of Prime Radiant and creator of Superpowers, is a veteran software engineer who built the leading Claude Code plugin and an agentic development method. He explains the brainstorming-first workflow, spec-driven planning and pressure-testing skills to stop LLM rationalization. He also covers debugging with agents, Clearance IDE, and the unreleased Green Field tool for reverse-engineering codebases.

29 snips
Apr 21, 2026 • 41min
It's 2026, and We're Still Talking Evals
Maggie Konstanty, an AI product manager who builds and evaluates large-scale agents for food ordering and ecommerce, explains why single accuracy numbers mislead. She contrasts pre-ship simulations with production traces, and digs into drop-off analytics, the dangers of many disconnected evaluators, and why teams often end up building custom eval tooling.

83 snips
Apr 17, 2026 • 51min
Why Agents are Driving Software Development to the Cloud
Zach Lloyd, founder and CEO of AI-native terminal maker Warp and a former Google Docs product lead, argues agents belong in cloud-native systems with team visibility and scoped permissions. He demos Oz for orchestration, talks agent observability and just-in-time interfaces replacing traditional UIs, and covers how governance and deployment choices shape safe agent adoption.

60 snips
Apr 14, 2026 • 54min
The Modern Software Engineer
Mihail Eric, an AI practitioner known for work on coding agents and developer tooling, talks about agents as tutors for junior engineers, the risks of overreliance, and when humans must keep architectural control. They cover testing and validation for long-running agent tasks, team practices to standardize agent use, and how AI-native teams change roles and scale.

26 snips
Apr 10, 2026 • 1h 5min
We Cut LLM Latency by 70% in Production
Maher Hanafi, an SVP of Engineering who led self-hosting LLMs at enterprise scale and optimized GPU inference, shares practical production stories. He describes cutting latency 50–70% with TensorRT-LLM, and explains cold-start fixes, KV-cache and in-flight batching, scaling strategies that lower GPU spend, and how vertical features evolve into a reusable AI platform.

61 snips
Apr 7, 2026 • 51min
Getting Humans Out of the Way: How to Work with Teams of Agents
Rob Ennals, creator of Broomy and a Staff Software Engineer experienced in large-scale distributed systems, explains how to design systems where many agents run and self-validate in parallel. He covers visual screenshot QA, agent retry and verification loops, repo design and linting for agents, parallel agent selection, automated merge-conflict handling, and UI/compute strategies for scaling agent teams.

31 snips
Apr 3, 2026 • 53min
Fixing GPU Starvation in Large-Scale Distributed Training
Kashish Mittal, Staff Software Engineer at Uber who builds hyperscale ML infrastructure, talks about solving GPU starvation in large-scale training. He recounts full-stack profiling and tracing to find hidden CPU bottlenecks. He explains reshaping data reads, packing tensors to cut transfers, caching transformed NumPy tensors, and trade-offs between latency and utilization in serving.


