
Postgres FM pg_duckdb
Jan 3, 2025
Jelte Fennema-Nio, an engineer at MotherDuck experienced with DuckDB and pg_duckdb. Joe Sciarrino, an engineer with Hydra who helped start pg_duckdb and plans managed offerings. They explain pg_duckdb's design and how DuckDB is embedded in Postgres. They talk about Parquet/CSV and S3 analytics, joining Postgres tables with object storage, performance expectations, tuning threads/memory, and managed service plans.
AI Snips
Chapters
Transcript
Episode notes
Query Stealing Lets DuckDB Do The Heavy Lifting
- pg_duckdb flips the usual FDW model: it pushes the query into DuckDB and lets DuckDB pull Postgres data rather than mapping remote tables into Postgres.
- This 'query stealing' approach uses DuckDB's execution engine for sorting, grouping, and joins to speed analytics.
Archive Events To Parquet And Query In Place
- Offload large event or time-series data to Parquet on S3 and join with Postgres via pg_duckdb to avoid ETL silos.
- Joe and Jelte recommend archiving old events to object storage and querying them alongside live Postgres rows when needed.
Time Series Data Benefits Most From Columnar Offload
- Time-series and event data are the most common beneficial use cases for columnar offload because old data is seldom updated and often aggregated.
- Jelte and Joe point to faster ad-hoc aggregates as the primary win versus row-store lookups.


