

MLOps.community
Demetrios
Relaxed conversations around getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, vibes, etc.)
Episodes

12 snips
May 12, 2026 • 48min
The Latency Goldilocks Zone Explained
Daniel Wobert, Data Science Manager at iFood who built the AILO conversational agent, talks hyper-personalization and the Large Commerce Model that profiles every user. They cover recommendation tradeoffs, the Tinder-style exploration test, multi-channel UX for app/WhatsApp/voice, the Goldilocks latency zone of perceived response time, and operational lessons from scaling agents in production.

17 snips
May 8, 2026 • 41min
Building MCP Before MCP Existed: Inside Despegar's Sofia Agent
A behind-the-scenes look at how Despegar built Sofia, a multi-agent travel concierge running on WhatsApp. They explain the custom orchestration layer and early tool-calling protocol that predated standards. Learn about verticalized agent flows, decentralized squad ownership, chat-first interfaces, KPI evolution, and group trip planning through messaging.

28 snips
May 1, 2026 • 51min
Voice Agent Use Cases
Anurag Beniwal, Member of the Technical Staff at ElevenLabs with experience building billion-interaction voice systems, shares practical production patterns. He contrasts cascaded and speech-to-speech approaches. He explains latency masking, model constellations, turn-taking challenges, and high-value use cases like support, bookings, and sales.

137 snips
Apr 24, 2026 • 1h 7min
The Creator of Superpowers: Why Real Agentic Engineering Beats Vibe Coding
Jesse Vincent, Founder & CEO of Prime Radiant and creator of Superpowers, is a veteran software engineer who built the leading Claude Code plugin and an agentic development method. He explains the brainstorming-first workflow, spec-driven planning and pressure-testing skills to stop LLM rationalization. He also covers debugging with agents, Clearance IDE, and the unreleased Green Field tool for reverse-engineering codebases.

29 snips
Apr 21, 2026 • 41min
It's 2026, and We're Still Talking Evals
Maggie Konstanty, an AI product manager who builds and evaluates large-scale agents for food ordering and ecommerce, explains why single accuracy numbers mislead. She contrasts pre-ship simulations with production traces, and digs into drop-off analytics, the dangers of many disconnected evaluators, and why teams often end up building custom eval tooling.

83 snips
Apr 17, 2026 • 51min
Why Agents are Driving Software Development to the Cloud
Zach Lloyd, founder and CEO of AI-native terminal maker Warp and a former Google Docs product lead, argues agents belong in cloud-native systems with team visibility and scoped permissions. He demos Oz for orchestration, talks agent observability and just-in-time interfaces replacing traditional UIs, and covers how governance and deployment choices shape safe agent adoption.

60 snips
Apr 14, 2026 • 54min
The Modern Software Engineer
Mihail Eric, an AI practitioner known for work on coding agents and developer tooling, talks about agents as tutors for junior engineers, the risks of overreliance, and when humans must keep architectural control. They cover testing and validation for long-running agent tasks, team practices to standardize agent use, and how AI-native teams change roles and scale.

26 snips
Apr 10, 2026 • 1h 5min
We Cut LLM Latency by 70% in Production
Maher Hanafi, an SVP of Engineering who led self-hosting LLMs at enterprise scale and optimized GPU inference, shares practical production stories. He describes cutting latency 50–70% with TensorRT-LLM, and explains cold-start fixes, KV-cache and in-flight batching, scaling strategies that lower GPU spend, and how vertical features evolve into a reusable AI platform.

61 snips
Apr 7, 2026 • 51min
Getting Humans Out of the Way: How to Work with Teams of Agents
Rob Ennals, creator of Broomy and a Staff Software Engineer experienced in large-scale distributed systems, explains how to design systems where many agents run and self-validate in parallel. He covers visual screenshot QA, agent retry and verification loops, repo design and linting for agents, parallel agent selection, automated merge-conflict handling, and UI/compute strategies for scaling agent teams.

31 snips
Apr 3, 2026 • 53min
Fixing GPU Starvation in Large-Scale Distributed Training
Kashish Mittal, Staff Software Engineer at Uber who builds hyperscale ML infrastructure, talks about solving GPU starvation in large-scale training. He recounts full-stack profiling and tracing to find hidden CPU bottlenecks. He explains reshaping data reads, packing tensors to cut transfers, caching transformed NumPy tensors, and trade-offs between latency and utilization in serving.


