Data Engineering Podcast

Tobias Macey

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Episodes

Jan 18, 2026 • 1h 12min

Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

Jacob Leverich, Cofounder and CTO of Observe, brings his vast experience from Splunk and Google to discuss the transformative power of lakehouse architectures in observability. He addresses the struggles organizations face with fragmented tools and high costs, introducing innovative solutions leveraging OpenTelemetry and Kafka for efficient data ingestion. Jacob dives into the benefits of using Iceberg for better data organization, the intricacies of query orchestration for low-latency responses, and the importance of metadata in enhancing user experience.

Jan 12, 2026 • 57min

Semantic Operators Meet Dataframes: Building Context for Agents with FENIC

Kostas Pardalis, a data infrastructure engineer and founder, discusses Fenic, a DataFrame engine designed for LLM-powered data workflows. He explains the limitations of traditional data infrastructures and introduces semantic operators that transform unstructured data into structured schemas. Kostas delves into Fenic's architecture, lazy DataFrame plans, and optimizer design, emphasizing its role in enhancing context management for agents. He also shares practical use cases and the future potential of integrating Fenic with other frameworks for scalable, reliable data solutions.

Jan 5, 2026 • 49min

Beyond Dashboards: How Data Teams Earn a Seat at the Table

Goutham Budati, a data leader known for the Data–Perspective–Action framework, explores how data teams can elevate their influence in business. He shares insights on transforming reactive tasks into proactive strategies, emphasizing the importance of context and storytelling. Goutham discusses the necessity of creating living dashboards, aligning technical projects with business goals, and maintaining trust in metrics. He advocates for collaboration between analytics engineers and analysts, promoting continuous insight generation through structured narratives.

Dec 29, 2025 • 59min

Unfreezing The Data Lake: The Future-Proof File Format

Xinyu Zeng, a PhD student and database researcher, dives deep into F3, the 'future-proof file format' he's developing. He highlights the limitations of existing formats like Parquet and ORC, tackling issues such as CPU-bound decoding and metadata overhead. By rethinking the layout and using WebAssembly for self-decoding, F3 aims to keep files readable as encodings evolve. Xinyu discusses the importance of decoupling formats, support for multimodal data, and future directions, including integrating with existing technologies to enhance data lakes.

Dec 21, 2025 • 1h 6min

From Context to Semantics: How Metadata Powers Agentic AI

Suresh Srinivas, a data platform technologist and co-founder of OpenMetadata, and Sriharsha Chintalapani, CTO of Collate, delve into the transformative role of metadata in AI. They discuss how metadata evolves from a human-centric tool to a foundational layer for AI, emphasizing the importance of semantics for accurate outcomes. The conversation highlights automated documentation and governance enhancements, scaling agent workflows, and the crucial balance of user identity and policy enforcement as AI access expands. Their insights reveal how marrying big data with ontologies can create machine-understandable meaning.

Dec 14, 2025 • 27min

From Data Engineering to AI Engineering: Where the Lines Blur

Explore the evolution of data engineering as it merges with AI. Discover how the transition from Hadoop to cloud warehouses has shaped current practices. Uncover the impact of LLMs and how unstructured data is revolutionizing information retrieval. Delve into operational demands, including uptime and latency, in customer-facing applications. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. This journey emphasizes adapting skills to a rapidly changing technological landscape.

Dec 8, 2025 • 59min

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses revolutionizing data interaction beyond SQL. He shares insights on Malloy’s design as a human-centric, composable language, emphasizing semantic modeling and hierarchical data. Michael explains the practical implications of eliminating SQL barriers and the user-friendly syntax aimed at easing adoption. He also highlights the exciting synergy between Malloy and LLM-generated queries and invites collaboration in the open-source development to shape its future.

Nov 24, 2025 • 1h 1min

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

Max Beauchemin, creator of Apache Airflow, dives into the interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles towards orchestration and taste management, leading to new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.

Nov 16, 2025 • 52min

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Preeti Somal, EVP of Engineering at Temporal and expert in durable execution, dives into innovative methods for building stateful data systems. She discusses how Temporal's code-first model simplifies reliability and reduces the need for error-handling scaffolding. With insights on integrating application and data teams, managing large data while keeping orchestration lightweight, and the importance of observability, Preeti shares strategies for efficiently handling long-running AI workflows. She also highlights practical adoption patterns and the role of Nexus in creating seamless cross-boundary calls.

Nov 9, 2025 • 52min

The AI Data Paradox: High Trust in Models, Low Trust in Data

Ariel Pohoryles, head of product marketing at Boomi with over 20 years in data engineering, discusses a survey of 300 data leaders. He describes the paradox it surfaced: 77% trust AI data, yet only 50% trust their organization's overall data. Ariel emphasizes the need for stronger automation and governance in data management for effective AI production. He explores the challenges of unstructured data, advocates for automated pipelines, and predicts a convergence between data and application teams, highlighting the importance of managing AI workflows responsibly.
