

Engineering Lakehouses with Open Table Formats
Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake
Book • 2025
Engineering Lakehouses with Open Table Formats explains how open table formats (for example Apache Iceberg) decouple storage from compute to enable interoperable, evolvable data tables for analytics and AI. It covers schema evolution, performance techniques (compaction, partitioning), streaming versus batch patterns, catalogs, governance, lineage, and cost-control practices needed to move AI from pilot to production.
The book compares open table formats and integration patterns with various compute engines, offering architecture guidance and real-world case studies to help practitioners choose and operate lakehouse components.
It emphasizes building governance, metadata, and lineage into lakehouse deployments from day one to meet security and compliance needs.
Practical chapters and quizzes reinforce concepts so readers can apply them to business problems and production systems.
Mentioned by

to introduce the guest as the book's co-author and to frame the episode's topic about open lakehouse architecture.


Paul Muller

Open Lakehouse Architecture: How to Scale AI to Production




