The Analytics Engineering Podcast

Under the hood of Apache Iceberg (w/ Christian Thiel)

Aug 24, 2025
Christian Thiel, co-founder of Lakekeeper, goes under the hood of Apache Iceberg, the open table format. He discusses its evolving ecosystem, challenges in data architecture, and the importance of timely data for machine learning. The conversation covers data access mechanisms, secure credential management, and features that improve enterprise readiness. Thiel also highlights flexible permission models and Lakekeeper's role in data collaboration and integrity.
INSIGHT

Iceberg Enables Compute-Storage Separation

  • Apache Iceberg separates storage and compute by providing a shared metadata layer that multiple query engines can use.
  • This lets teams keep one canonical copy of the data and swap compute engines freely, which avoids vendor lock-in.
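
As a rough illustration of the point above, the toy sketch below has two independent "engines" answer different questions over one shared copy of a table. Everything in it (`SHARED_TABLES`, `engine_a_count`, `engine_b_total`, the table name) is hypothetical; a real Iceberg catalog points engines at metadata files rather than holding rows in memory.

```python
from typing import Dict, List

# Hypothetical stand-in for shared storage plus a catalog: one canonical
# copy of each table, looked up by name. Not a real Iceberg API.
SHARED_TABLES: Dict[str, List[dict]] = {
    "sales.orders": [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 2, "amount": 32.5},
    ]
}


def engine_a_count(table: str) -> int:
    """First 'engine': counts rows in the shared table."""
    return len(SHARED_TABLES[table])


def engine_b_total(table: str) -> float:
    """Second 'engine': sums a column of the same shared table."""
    return sum(row["amount"] for row in SHARED_TABLES[table])


# Both engines read the same data; neither owns a private copy.
row_count = engine_a_count("sales.orders")
order_total = engine_b_total("sales.orders")
```

Replacing either engine leaves the stored data untouched, which is the property the snip describes.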
INSIGHT

Metadata Hierarchy Is The Core

  • Iceberg is fundamentally a metadata hierarchy sitting on top of file formats like Parquet.
  • The top-level table metadata JSON is the authoritative state that catalogs must manage centrally.
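
The hierarchy described above can be sketched as a small in-memory model: table metadata points to snapshots, each snapshot points to manifests, and manifests list data files. This is a simplified illustration, not the Iceberg spec; the class names, fields, and file paths are assumptions, and real Iceberg stores these layers as JSON and Avro files rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manifest:
    # A manifest lists concrete data files (e.g. Parquet files).
    data_files: List[str]


@dataclass
class Snapshot:
    snapshot_id: int
    # Stands in for the snapshot's manifest list.
    manifests: List[Manifest]


@dataclass
class TableMetadata:
    # Top-level state: which snapshot is current, plus snapshot history.
    current_snapshot_id: Optional[int] = None
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)


def resolve_data_files(meta: TableMetadata,
                       snapshot_id: Optional[int] = None) -> List[str]:
    """Walk metadata -> snapshot -> manifests -> data files."""
    sid = meta.current_snapshot_id if snapshot_id is None else snapshot_id
    if sid is None:
        return []  # table has no snapshots yet
    snapshot = meta.snapshots[sid]
    return [f for m in snapshot.manifests for f in m.data_files]


# Hypothetical table with two snapshots; the second one adds a file.
m1 = Manifest(["data/a.parquet"])
m2 = Manifest(["data/b.parquet"])
meta = TableMetadata(
    current_snapshot_id=2,
    snapshots={1: Snapshot(1, [m1]), 2: Snapshot(2, [m1, m2])},
)
current_files = resolve_data_files(meta)
older_files = resolve_data_files(meta, snapshot_id=1)
```

Because a reader starts from the top-level metadata, whoever controls that pointer (the catalog) controls what every engine sees.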
INSIGHT

Optimistic Concurrency And Append-Only Files

  • Iceberg uses optimistic concurrency: writers produce new files and metadata, then request an atomic catalog commit.
  • This append-only pattern enables time travel and avoids in-place file edits.
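
A minimal sketch of the commit pattern above, assuming a toy in-memory catalog: a writer reads the current version, prepares a new file list, and asks the catalog to commit only if the version is unchanged, retrying otherwise. `ToyCatalog`, `CommitConflict`, and `append_file` are hypothetical names for illustration and do not correspond to any real Iceberg or Lakekeeper API.

```python
import threading
from typing import List, Tuple


class CommitConflict(Exception):
    """Raised when the table changed since the writer read it."""


class ToyCatalog:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # History of immutable file tuples; the list index is the version.
        self.snapshots: List[Tuple[str, ...]] = [()]

    def current(self) -> Tuple[int, Tuple[str, ...]]:
        with self._lock:
            return len(self.snapshots) - 1, self.snapshots[-1]

    def commit(self, expected_version: int, files: Tuple[str, ...]) -> int:
        # Atomic compare-and-swap: succeed only if nobody committed first.
        with self._lock:
            if expected_version != len(self.snapshots) - 1:
                raise CommitConflict(f"expected version {expected_version}")
            self.snapshots.append(files)
            return len(self.snapshots) - 1


def append_file(catalog: ToyCatalog, new_file: str, max_retries: int = 3) -> int:
    """Optimistic write: prepare new state, then try to commit, retrying on conflict."""
    for _ in range(max_retries):
        version, files = catalog.current()
        candidate = files + (new_file,)  # old snapshots are never modified
        try:
            return catalog.commit(version, candidate)
        except CommitConflict:
            continue  # re-read the latest state and try again
    raise CommitConflict("gave up after retries")


catalog = ToyCatalog()
append_file(catalog, "a.parquet")
append_file(catalog, "b.parquet")
latest_version, latest_files = catalog.current()
```

Earlier entries in `catalog.snapshots` remain readable after later commits, which is the toy equivalent of time travel.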