

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Mar 29, 2026 • 50min
Treat Metering Like Finance: Building Data Platforms for Consumption Economics
Himant Goyal, a Senior Product Manager who builds usage-driven data platforms, explains why metering should be treated like a financial system. He covers the architecture needed for reliable consumption tracking. Short takes dive into real-time versus batch tradeoffs, rate-card versioning, handling late or duplicate data, and the cross-functional shift required for finance, product, and engineering to co-own metering.

Mar 22, 2026 • 43min
Beyond the PDF: Rowan Cockett on Reproducible, Composable Science
Rowan Cockett, co-founder and CEO of CurveNote and Continuous Science Foundation, builds tools for reproducible, composable scientific research. He talks about fixing PDF-bound workflows, cloud-optimized formats like Zarr, Jupyter-based interactive articles, graceful degradation of interactives, storage partnerships that avoid hosting huge datasets, and the Open Exchange Architecture push for interoperable scientific components.

Mar 16, 2026 • 1h 2min
Beyond Prompts: Practical Paths to Self‑Improving AI
Raj Shukla, CTO at SymphonyAI and veteran applied AI leader, discusses building production-grade self-improving AI for regulated industries. He covers agentic architectures, feedback loops, and intelligent memory as a practical middle ground. He also talks about sandboxing, policy alignment, subagent code loops, model brittleness, and how owning memory and process graphs creates enterprise differentiation.

Mar 8, 2026 • 1h 5min
Orion at Gravity: Trustworthy AI Analysts for the Enterprise
Drew Gilson, co-founder of Gravity and former Looker/Google product leader focused on agentic analytics, and Lucas Thelosen, co-founder of Gravity and former Looker/Google analytics lead building Orion, discuss building trustworthy AI analysts using semantic layers and context engineering. They cover preserving data investments, bootstrapping semantics from messy systems, memory and lineage, and connecting insights to action.

Mar 2, 2026 • 45min
From Models to Momentum: Uniting Architects and Engineers with ER/Studio
Ryan Hirsch, product marketing lead with a data warehousing background, and Jamie Knowles, product director and enterprise data modeling expert, discuss ER/Studio’s role in creating shared semantic models. They cover translating logical designs to code, preventing semantic drift, integrating governance, collaboration features like Team Server, and new AI-assisted modeling and semantic exports.

Feb 22, 2026 • 58min
From Data Models to Mind Models: Designing AI Memory at Scale
Vasilije Markovich, founder of Cognee and former data engineer turned cognitive-science-informed entrepreneur, builds agentic memory and knowledge-layer systems. He discusses permanent vs session memory, graph+vector architectures, storage and latency trade-offs, metadata and decay strategies, trace-based scoring, multi-tenant isolation, and practical vertical uses like pharma, logistics, and security.

Feb 15, 2026 • 51min
Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops
Aman Agarwal, creator of OpenLit and AI engineering tools builder, talks about making LLM apps reliable and debuggable. He covers opaque model behavior, runaway token costs, and brittle prompt management. He explains OpenTelemetry-native observability, prompt/secret versioning, eval workflows, and integrations that turn black-box model runs into stepwise traces for production readiness.

Feb 8, 2026 • 47min
From Legacy to AI-Ready: How MongoDB AMP Accelerates Modernization
Shilpa Kolhar, SVP of Product and Engineering at MongoDB who built large-scale data and ML infrastructure, explains modernizing legacy relational systems to a document-first, AI-ready platform. She covers AMP, Atlas Vector Search and embeddings, schema validation and versioning patterns, incremental migration units, and balancing LLM automation with human governance.

Feb 1, 2026 • 57min
Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows
Tim Sehn, founder and CEO of DoltHub and creator of Dolt, a version-controlled SQL database, explains why Git-style semantics belong in data systems. He covers row-level branching, merging, and diffs; real production use cases like reproducible ML feature stores and game config; and how branches enable safe agentic writes and PR-style data reviews.

Jan 25, 2026 • 41min
Logical First, Physical Second: A Pragmatic Path to Trusted Data
Jamie Knowles, Product Director for ER/Studio with decades in data modeling and architecture, explains why meaning should drive designs. He talks about building shared semantic models, avoiding schema sprawl, and evolving architecture alongside delivery. He also covers governance, practical modeling techniques, and the double-edged role of generative AI in drafting models without human-approved ontologies.


