MLOps.community

Demetrios

Relaxed conversations about getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, vibes, etc.)

21 snips

Mar 31, 2026 • 59min

Spec Driven Development, Workflows, and the Recent Coding Agent Conference

Jens Bodal, a senior software engineer who builds backend systems and developer tooling, discusses how abstract AI agents shift work from coding to defining intent. He covers agent orchestration, evaluation challenges, security and sandboxing, local self-hosted stacks, spec-driven workflows, and how teams must rethink ownership, reviews, and telemetry in an agent-first world.

33 snips

Mar 30, 2026 • 1h 1min

Operationalizing AI Agents: From Experimentation to Production // Databricks Roundtable

Samraj Moorjani is an MLflow engineer focused on agent quality and observability; Apurva Misra is an AI consultant helping startups scope POCs and automation; Ben Epstein is a CTO building LLM-driven internal tools for property teams. They discuss scaling agent reliability, observability, and testing strategies. The conversation covers eval-driven development, sandboxing, and production-grade monitoring for agent workflows.

35 snips

Mar 27, 2026 • 56min

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Chris Fregly, AI performance engineer, founder, and author, walks through software/hardware co-design for PyTorch, CUDA, and NVIDIA GPUs. He covers mechanical sympathy, GPU generations, NVLink and networking, kernel tuning with coding agents, and infrastructure trade-offs for training versus inference. The episode is short, technical, and focused on building scalable, high-performance AI systems.

44 snips

Feb 19, 2026 • 1h 6min

Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable

Igor Šušić, a founding ML engineer focused on large-scale inference and performance tuning, and Ioana Apetrei, a senior product manager building accessible, cost-effective LLM deployment, debate why deployments fail at scale. They cover model routing and the trade-off between cost and accuracy, and explain time-sharing GPUs, quantization, prefill vs decode separation, and when self-hosting or managed endpoints make sense.

23 snips

Feb 17, 2026 • 1h 3min

The Future of Information Retrieval: From Dense Vectors to Cognitive Search

Rahul Raja, Staff Software Engineer at LinkedIn who builds large-scale search and retrieval systems, discusses the shift from keyword search to dense, vector-based retrieval. He explores cognitive search, LLM-driven reasoning and personalization, scalability of billions of embeddings, evaluation signals beyond recall, and challenges like embedding drift, access control, and cost-effective infrastructure.

58 snips

Feb 13, 2026 • 26min

Rethinking Notebooks Powered by AI

Vincent Warmerdam, founding engineer at marimo, is reinventing Python notebooks for reactive, reproducible, and interactive data work. He discusses marimo’s reactive execution model, Molab hosted GPU notebooks, LLM and agent integrations that inspect and modify notebook state, dynamic UI generation, WASM/pyodide exports, and treating notebooks as shareable, testable Python apps.

70 snips

Feb 10, 2026 • 57min

Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale

Ereli Eran, founding engineer at 7AI who builds agentic AI systems for security ops, joins to unpack real-world agent engineering. He covers how agentic systems mix deterministic code with stochastic LLM behavior. The conversation covers testing, evals, safety gates, progressive prompts, model hybrids, observability and audit trails, and strategies for shipping agents reliably at scale.
