The AI Forecast: Data and AI in the Cloud Era

Open Lakehouse Architecture: How to Scale AI to Production

Mar 4, 2026
Dipankar Mazumdar, Director of Developer Relations at Cloudera and co-author of a book on lakehouse engineering, explains open lakehouse fundamentals. He covers Apache Iceberg and open table formats, schema evolution, multi-engine access without duplication, small-file pitfalls, cost control for AI compute, and baked-in governance and lineage for production AI.
ADVICE

Design For Change First

  • Design data systems expecting change: schema evolution, shifting requirements, and continuous ingestion.
  • Prioritize correctness and trust before chasing performance or cost optimizations in production AI pipelines.
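
To make the schema-evolution point concrete, here is a minimal sketch of one way a table can absorb renames and new columns without rewriting old data: columns are tracked by a stable ID rather than by name, which is the idea Apache Iceberg uses for its field IDs. The `Schema` class, `read_rows` helper, and column names are hypothetical and exist only for illustration; this is a toy model, not the Iceberg API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Schema:
    """Toy schema: maps a stable field ID to the current column name."""
    fields: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def add_column(self, name: str) -> int:
        """Register a new column and return its permanent field ID."""
        fid = self.next_id
        self.fields[fid] = name
        self.next_id += 1
        return fid

    def rename_column(self, old: str, new: str) -> None:
        """Change only the name; the field ID (and stored data) stay put."""
        for fid, name in self.fields.items():
            if name == old:
                self.fields[fid] = new
                return
        raise KeyError(old)


def read_rows(schema: Schema, stored_rows: List[Dict[int, object]]) -> List[Dict[str, Optional[object]]]:
    """Project rows stored by field ID onto the current schema.

    Columns added after a row was written come back as None.
    """
    return [
        {name: row.get(fid) for fid, name in schema.fields.items()}
        for row in stored_rows
    ]


schema = Schema()
amount_id = schema.add_column("amount")

# Rows written under the original schema, keyed by field ID.
old_file = [{amount_id: 120.0}, {amount_id: 75.5}]

# The schema changes: one rename, one new column. Old rows are not rewritten.
schema.rename_column("amount", "loan_amount")
region_id = schema.add_column("region")
new_file = [{amount_id: 300.0, region_id: "EU"}]

rows = read_rows(schema, old_file + new_file)
```

Because readers resolve names through IDs at read time, the rows written before the change still load correctly under the new column name, and the missing `region` values surface as `None` instead of breaking the pipeline.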
ANECDOTE

Explainable AI Thesis Led To Developer Advocacy

  • For his master's thesis, Dipankar built and open-sourced an explainable-AI tool that visualized why a model denied a loan.
  • That project led him into community work, papers, and eventually developer advocacy roles.
INSIGHT

Lakehouse Unifies Lakes And Warehouses

  • A lakehouse merges the flexibility of data lakes with the management features of data warehouses to serve analytics and ML on one platform.
  • Open table formats (like Apache Iceberg) decouple storage from compute so multiple engines can access the same data without copying.
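
The decoupling described above can be sketched in a few lines. In this toy model, table metadata records which data files make up the current snapshot, and two different "engines" resolve that same metadata to read the same files, so nothing is copied. `STORAGE`, `TableMetadata`, and both engine functions are hypothetical names chosen for illustration; real table formats keep this metadata in files alongside the data, and this is not how any specific engine is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for object storage: path -> rows.
STORAGE: Dict[str, List[dict]] = {}


@dataclass
class TableMetadata:
    """Toy table metadata: each snapshot is an immutable tuple of file paths."""
    snapshots: List[Tuple[str, ...]] = field(default_factory=lambda: [()])

    def current_files(self) -> Tuple[str, ...]:
        return self.snapshots[-1]

    def append(self, path: str, rows: List[dict]) -> None:
        """Write one data file and commit a new snapshot that includes it."""
        STORAGE[path] = rows
        self.snapshots.append(self.current_files() + (path,))


def scan(table: TableMetadata) -> Iterator[dict]:
    """Read every row in the table's current snapshot."""
    for path in table.current_files():
        yield from STORAGE[path]


def sql_engine_total(table: TableMetadata) -> float:
    """Stand-in for an analytics engine running an aggregate."""
    return sum(row["amount"] for row in scan(table))


def ml_engine_features(table: TableMetadata) -> List[List[float]]:
    """Stand-in for an ML engine building a feature matrix."""
    return [[row["amount"]] for row in scan(table)]


loans = TableMetadata()
loans.append("data/file-001", [{"amount": 100.0}, {"amount": 50.0}])
loans.append("data/file-002", [{"amount": 25.0}])

total = sql_engine_total(loans)
features = ml_engine_features(loans)
files_in_storage = len(STORAGE)
```

Both functions see the same three rows while storage still holds only two files, which is the "one copy, many engines" property the snip refers to.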