The AI Forecast: Data and AI in the Cloud Era

Open Lakehouse Architecture: How to Scale AI to Production

Mar 4, 2026

Dipankar Mazumdar, Director of Developer Relations at Cloudera, who has co-authored work on lakehouse engineering, explains open lakehouse fundamentals. He covers Apache Iceberg and open table formats, schema evolution, multi-engine access without duplication, small-file pitfalls, cost control for AI compute, and built-in governance and lineage for production AI.

ADVICE

Design For Change First

Design data systems expecting change: schema evolution, shifting requirements, and continuous ingestion.
Prioritize correctness and trust before chasing performance or cost optimizations in production AI pipelines.
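To make "expecting schema evolution" concrete, here is a minimal sketch in plain Python. It is not from the episode and does not use any Iceberg API; it only illustrates the general idea of tracking columns by a stable field ID rather than by name, which is the approach table formats such as Apache Iceberg take. The column names and sample rows are hypothetical.

```python
# Toy illustration: data files store values keyed by a stable field ID,
# and the schema maps field IDs to the current column names.

# Version 1 of the schema, and a file written under it.
schema_v1 = {1: "id", 2: "amt"}
old_file = [{1: 7, 2: 100}]

# Version 2 renames field 2 ("amt" -> "amount") and adds field 3.
schema_v2 = {1: "id", 2: "amount", 3: "currency"}
new_file = [{1: 8, 2: 250, 3: "USD"}]


def read(rows, schema):
    """Project stored rows onto a schema; fields a file lacks read as None."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


# Old and new files are read together under the latest schema,
# without rewriting the old file.
rows = read(old_file + new_file, schema_v2)
print(rows)
```

Because the rename only changes the ID-to-name mapping, the older file stays valid, and the added column simply reads as missing for rows written before it existed.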

ANECDOTE

Explainable AI Thesis Led To Developer Advocacy

Dipankar built and open-sourced an explainable-AI tool for his master's thesis that visualized why a model denied a loan.
That project led him into community work, papers, and eventually a transition into developer advocacy roles.

INSIGHT

Lakehouse Unifies Lakes And Warehouses

A lakehouse merges the flexibility of data lakes with the management features of data warehouses to serve analytics and ML on one platform.
Open table formats (like Apache Iceberg) decouple storage from compute so multiple engines can access the same data without copying.
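The decoupling described above can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not Iceberg's actual metadata layout or API: `ToyTable`, `Snapshot`, the in-memory `STORAGE` dictionary, and the file paths are all hypothetical. The point is that the table is just metadata listing immutable data files, so two different "engines" can read the same files without copying them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # paths of immutable data files in this table version


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format's metadata layer."""
    snapshots: list = field(default_factory=list)

    def commit(self, new_files):
        # Each commit creates a new snapshot that lists all live files.
        prev = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, prev + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def current(self):
        return self.snapshots[-1]


# In-memory stand-in for object storage: path -> rows (hypothetical paths).
STORAGE = {
    "s3://bucket/loans/f1.parquet": [{"id": 1, "amount": 100}],
    "s3://bucket/loans/f2.parquet": [{"id": 2, "amount": 250}],
}


def sql_engine_total(table):
    """An 'analytics engine' aggregating over the current snapshot."""
    files = table.current().data_files
    return sum(row["amount"] for path in files for row in STORAGE[path])


def ml_engine_features(table):
    """An 'ML engine' building feature vectors from the same files."""
    files = table.current().data_files
    return [[row["amount"]] for path in files for row in STORAGE[path]]


table = ToyTable()
table.commit(["s3://bucket/loans/f1.parquet"])
table.commit(["s3://bucket/loans/f2.parquet"])

print(sql_engine_total(table))    # 350
print(ml_engine_features(table))  # [[100], [250]]
```

Both functions resolve the file list through the same table metadata, so neither needs its own copy of the data; real engines would do the equivalent through the table format's catalog and metadata files.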