
The AI Forecast: Data and AI in the Cloud Era
Open Lakehouse Architecture: How to Scale AI to Production
Mar 4, 2026
Dipankar Mazumdar, Director of Developer Relations at Cloudera and co-author on lakehouse engineering, explains open lakehouse fundamentals. He covers Apache Iceberg and open table formats, schema evolution, multi-engine access without duplication, small-file pitfalls, cost control for AI compute, and baked-in governance and lineage for production AI.
AI Snips
Design For Change First
- Design data systems expecting change: schema evolution, shifting requirements, and continuous ingestion.
- Prioritize correctness and trust before chasing performance or cost optimizations in production AI pipelines.
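One way to picture "designing for change" is schema evolution that tracks columns by a stable ID rather than by name, which is the general approach Apache Iceberg takes. The sketch below is a toy illustration in plain Python, not the Iceberg API; `Field`, `read_row`, and the column names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    field_id: int  # stable identifier, never reused
    name: str      # display name, free to change between schema versions


def read_row(stored: dict, schema: list) -> dict:
    """Project a row stored by field ID onto a schema.

    Columns the row does not contain (added later) read as None.
    """
    return {f.name: stored.get(f.field_id) for f in schema}


# A row written under schema v1, keyed by field ID (hypothetical data).
old_row = {1: "acme", 2: 120.0}

schema_v1 = [Field(1, "customer"), Field(2, "amount")]
# Schema v2 renames field 2 and adds field 3; no stored data is rewritten.
schema_v2 = [Field(1, "customer"), Field(2, "loan_amount"), Field(3, "region")]

row_v1 = read_row(old_row, schema_v1)
row_v2 = read_row(old_row, schema_v2)
print(row_v1)
print(row_v2)
```

Because the stored row is keyed by ID, the rename and the added column are metadata-only changes in this sketch: old data stays readable under the new schema, and the new column simply reads as missing.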
Explainable AI Thesis Led To Developer Advocacy
- For his master's thesis, Dipankar built and open-sourced an explainable-AI tool that visualized why a model denied a loan.
- That project led him into community work, papers, and eventually a transition into developer advocacy roles.
Lakehouse Unifies Lakes And Warehouses
- A lakehouse merges the flexibility of data lakes with the management features of data warehouses to serve analytics and ML on one platform.
- Open table formats (like Apache Iceberg) decouple storage from compute so multiple engines can access the same data without copying.
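The idea of multiple engines sharing one copy of the data can be sketched with a toy model: the table is only metadata (a list of snapshots pointing at data files), and each engine resolves the current snapshot before reading. This is a simplified illustration under assumed names (`Table`, `storage`, `sql_engine_sum`, `ml_engine_features` are all hypothetical), not how any particular engine or table format is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table metadata: each snapshot is a list of data-file paths."""
    snapshots: list = field(default_factory=list)

    def commit(self, files: list) -> None:
        # A new snapshot extends the previous one; older snapshots stay intact.
        prev = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(prev + files)

    def current_files(self) -> list:
        return self.snapshots[-1] if self.snapshots else []


# Stand-in for object storage: each file exists exactly once.
storage = {
    "data/f1": [10, 20],
    "data/f2": [30],
}

table = Table()
table.commit(["data/f1"])
table.commit(["data/f2"])


def sql_engine_sum(t: Table) -> int:
    """Hypothetical analytics engine: aggregates over the current snapshot."""
    return sum(v for path in t.current_files() for v in storage[path])


def ml_engine_features(t: Table) -> list:
    """Hypothetical ML engine: derives features from the same files."""
    return [v / 10 for path in t.current_files() for v in storage[path]]


total = sql_engine_sum(table)
features = ml_engine_features(table)
print(total, features)
```

Both functions read the same two files through the same metadata, so nothing is duplicated per engine; in a real lakehouse the table format's metadata layer plays the role of `Table` here.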





