MLOps.community

Demetrios
Mar 27, 2026 • 56min

arrowspace: Vector Spaces and Graph Wiring

Lorenzo Moriondo, Technical Lead for AI at tuned.org.uk and creator of Arrowspace, builds graph-based tools for LLM systems. He talks about turning embeddings into graphs to reveal structure, topological versus geometric search, and using graph wiring for smarter retrieval, dataset curation, drift detection, and agent memory.
Mar 20, 2026 • 51min

Agentic Marketplace

Donné Stevenson, an ML engineer building scalable GenAI infrastructure and rollout strategies, and Pedro Chaves, a data science manager shaping GenAI search and marketplace recommendations, explore agent-driven home search, enriched listings with commute and safety data, progressive permissioning and undo patterns, agent-to-agent marketplace layers, and agent-powered uploads, negotiation, and logistics.
Mar 17, 2026 • 1h 1min

Durable Execution and Modern Distributed Systems

Johann Schleier-Smith, Technical Lead for AI at Temporal Technologies, builds reliable infrastructure for long-running production AI workflows. He explains durable execution and why it makes regular Python programs crash-proof and scalable. Topics include deterministic workflows, cross-region resilience, integrating durable state with databases, using durable execution with LLMs and agents, and practical operational patterns.
Feb 24, 2026 • 1h 26min

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Chris Fregly, AI performance engineer, founder, and author, walks through software/hardware co-design for PyTorch, CUDA, and NVIDIA GPUs. He covers mechanical sympathy, GPU generations, NVLink and networking, kernel tuning with coding agents, and infrastructure trade-offs for training versus inference. The conversation is short, technical, and focused on building scalable, high-performance AI systems.
Feb 19, 2026 • 1h 6min

Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable

Igor Šušić, a founding ML engineer focused on large-scale inference and performance tuning, and Ioana Apetrei, a senior product manager building accessible, cost-effective LLM deployment, debate why deployments fail at scale. They cover model routing and the trade-off between cost and accuracy, and they explain time-sharing GPUs, quantization, prefill vs decode separation, and when self-hosting or managed endpoints make sense.
Feb 17, 2026 • 1h 3min

The Future of Information Retrieval: From Dense Vectors to Cognitive Search

Rahul Raja, Staff Software Engineer at LinkedIn who builds large-scale search and retrieval systems, discusses the shift from keyword search to dense, vector-based retrieval. He explores cognitive search, LLM-driven reasoning and personalization, scaling to billions of embeddings, evaluation signals beyond recall, and challenges like embedding drift, access control, and cost-effective infrastructure.
Feb 13, 2026 • 26min

Rethinking Notebooks Powered by AI

Vincent Warmerdam, founding engineer at marimo, is reinventing Python notebooks for reactive, reproducible, and interactive data work. He discusses marimo’s reactive execution model, Molab hosted GPU notebooks, LLM and agent integrations that inspect and modify notebook state, dynamic UI generation, WASM/pyodide exports, and treating notebooks as shareable, testable Python apps.
Feb 10, 2026 • 57min

Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale

Ereli Eran, founding engineer at 7AI who builds agentic AI systems for security ops, joins to unpack real-world agent engineering. He covers how agentic systems mix deterministic code with stochastic LLM behavior, then turns to testing, evals, safety gates, progressive prompts, model hybrids, observability and audit trails, and strategies for shipping agents reliably at scale.
Feb 6, 2026 • 52min

Physical AI: Teaching Machines to Understand the Real World

Nick Gillian, Co-Founder and CTO of Archetype AI and builder of Newton, is an expert in real-time sensor understanding. He discusses Physical AI: single foundation models for diverse sensors, fusing non-visual modalities, and the engineering challenges of massive continuous data. He also covers evaluation strategies, product-driven dataset design, and turning research into deployable, safety-minded systems.
Feb 3, 2026 • 1h 7min

Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth

Kris Beevers, CEO and co-founder of NetBox Labs and veteran network engineer with a Ph.D., discusses how modern AI datacenters handle hypergrowth. He talks about modeling infrastructure as a single system of record, tackling power and procurement bottlenecks, and using programmatic blueprints and digital twins to speed builds. He highlights rapid iteration, robotics in racking, and the need for vendor data standards for automation.
