Data Engineering Podcast

Duck Lake: Simplifying the Lakehouse Ecosystem

Sep 10, 2025
Hannes Mühleisen and Mark Raasveldt, key figures behind DuckDB, dive into their latest project, Duck Lake, aiming to simplify the lakehouse ecosystem. They discuss how Duck Lake stands out with its unified SQL database, making metadata management a breeze. The duo shares their vision for decentralized processing, local-first data architecture, and benefits like data inlining and encryption. They also touch on its seamless integration with existing systems, showcasing how it can transform data workflows and enhance user experiences.
AI Snips
ADVICE

Scale Duck Lake To Your Needs

  • Start small or large: Duck Lake scales from a single-line local attach up to thousands of nodes and massive storage.
  • Choose a deployment weight that matches your team's skills and growth plans to avoid unnecessary infrastructure overhead.
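The single-line local attach mentioned above can be sketched in DuckDB SQL. This is a minimal sketch, assuming the `ducklake` extension from the DuckLake project; the catalog name and file paths are illustrative:

```sql
-- Install and load the DuckLake extension (assumed available in recent DuckDB releases)
INSTALL ducklake;
LOAD ducklake;

-- Single-line local attach: metadata goes into a local catalog file,
-- table data is written as Parquet files under DATA_PATH (paths illustrative)
ATTACH 'ducklake:my_lake.ducklake' AS my_lake (DATA_PATH 'data_files/');
USE my_lake;

CREATE TABLE events (id INTEGER, payload VARCHAR);
INSERT INTO events VALUES (1, 'hello');
```

To scale up, the same attach pattern can point the metadata catalog at a shared SQL database and `DATA_PATH` at object storage, without changing the queries themselves.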
INSIGHT

Concept vs Implementation Distinction

  • Duck Lake is both a protocol concept and an implementation; the concept supports distributed engines, while the DuckDB extension currently runs single-node.
  • Other engines (Spark, Trino) can query Duck Lake metadata and scale out queries with lightweight adapters.
ADVICE

Reuse DuckDB To Integrate Engines

  • Leverage DuckDB as an embeddable gateway on worker nodes to avoid writing custom connectors for each engine.
  • Use the DuckDB extension to register external Parquet files so other engines can read them via Duck Lake metadata.
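Registering existing Parquet files might look like the following sketch. It assumes a `ducklake_add_data_files` procedure in the DuckLake extension; the catalog, table, and file names are illustrative:

```sql
-- Assumes a Duck Lake catalog is already attached as my_lake,
-- and creates a table whose schema matches the external Parquet file
CREATE TABLE my_lake.events (id INTEGER, payload VARCHAR);

-- Register an existing external Parquet file in the Duck Lake metadata
-- rather than copying it; other engines can then discover the file
-- through the shared metadata catalog
CALL ducklake_add_data_files('my_lake', 'events', 'exports/events.parquet');
```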