Data Engineering Podcast

Tobias Macey

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Episodes

Mar 29, 2026 • 50min

Treat Metering Like Finance: Building Data Platforms for Consumption Economics

Himant Goyal, a Senior Product Manager who builds usage-driven data platforms, explains why metering should be treated like a financial system. He covers the architecture needed for reliable consumption tracking, then digs into real-time versus batch tradeoffs, rate-card versioning, handling late or duplicate data, and the cross-functional shift required for finance, product, and engineering to co-own metering.

Mar 22, 2026 • 43min

Beyond the PDF: Rowan Cockett on Reproducible, Composable Science

Rowan Cockett, co-founder and CEO of CurveNote and Continuous Science Foundation, builds tools for reproducible, composable scientific research. He talks about fixing PDF-bound workflows, cloud-optimized formats like Zarr, Jupyter-based interactive articles, graceful degradation of interactives, storage partnerships that avoid hosting huge datasets, and the Open Exchange Architecture push for interoperable scientific components.

Mar 16, 2026 • 1h 2min

Beyond Prompts: Practical Paths to Self‑Improving AI

Raj Shukla, CTO at SymphonyAI and veteran applied AI leader, discusses building production-grade self-improving AI for regulated industries. He covers agentic architectures, feedback loops, and intelligent memory as a practical middle ground. He also talks about sandboxing, policy alignment, subagent code loops, model brittleness, and how owning memory and process graphs creates enterprise differentiation.

Mar 8, 2026 • 1h 5min

Orion at Gravity: Trustworthy AI Analysts for the Enterprise

Drew Gilson, co-founder of Gravity and former Looker/Google product leader focused on agentic analytics, and Lucas Thelosen, co-founder of Gravity and former Looker/Google analytics lead building Orion, discuss building trustworthy AI analysts using semantic layers and context engineering. They cover preserving data investments, bootstrapping semantics from messy systems, memory and lineage, and connecting insights to action.

Mar 2, 2026 • 45min

From Models to Momentum: Uniting Architects and Engineers with ER/Studio

Ryan Hirsch, product marketing lead with a data warehousing background, and Jamie Knowles, product director and enterprise data modeling expert, discuss ER/Studio’s role in creating shared semantic models. They cover translating logical designs to code, preventing semantic drift, integrating governance, collaboration features like Team Server, and new AI-assisted modeling and semantic exports.

Feb 22, 2026 • 58min

From Data Models to Mind Models: Designing AI Memory at Scale

Vasilije Markovich, founder of Cognee and former data engineer turned cognitive-science-informed entrepreneur, builds agentic memory and knowledge-layer systems. He discusses permanent vs session memory, graph+vector architectures, storage and latency trade-offs, metadata and decay strategies, trace-based scoring, multi-tenant isolation, and practical vertical uses like pharma, logistics, and security.

Feb 15, 2026 • 51min

Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops

Aman Agarwal, creator of OpenLit and AI engineering tools builder, talks about making LLM apps reliable and debuggable. He covers opaque model behavior, runaway token costs, and brittle prompt management. He explains OpenTelemetry-native observability, prompt/secret versioning, eval workflows, and integrations that turn black-box model runs into stepwise traces for production readiness.

Feb 8, 2026 • 47min

From Legacy to AI-Ready: How MongoDB AMP Accelerates Modernization

Shilpa Kolhar, SVP of Product and Engineering at MongoDB who built large-scale data and ML infrastructure, explains modernizing legacy relational systems to a document-first, AI-ready platform. She covers AMP, Atlas Vector Search and embeddings, schema validation and versioning patterns, incremental migration units, and balancing LLM automation with human governance.

Feb 1, 2026 • 57min

Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows

Tim Sehn, founder and CEO of DoltHub and creator of Dolt — a version-controlled SQL database — explains why Git-style semantics belong in data systems. He covers row-level branching, merging, and diffs, real production use cases like reproducible ML feature stores and game config, and how branches enable safe agentic writes and PR-style data reviews.

Jan 25, 2026 • 41min

Logical First, Physical Second: A Pragmatic Path to Trusted Data

Jamie Knowles, Product Director for ER/Studio with decades in data modeling and architecture, explains why meaning should drive designs. He talks about building shared semantic models, avoiding schema sprawl, and evolving architecture alongside delivery. He also covers governance, practical modeling techniques, and the double-edged role of generative AI in drafting models without human-approved ontologies.
