pg_duckdb

Jan 3, 2025

Guest

Jelte Fennema-Nio

Guest

Joe Sciarrino

Jelte Fennema-Nio, an engineer at MotherDuck experienced with DuckDB and pg_duckdb. Joe Sciarrino, an engineer with Hydra who helped start pg_duckdb and plans managed offerings. They explain pg_duckdb's design and how DuckDB is embedded in Postgres. They talk about Parquet/CSV and S3 analytics, joining Postgres tables with object storage, performance expectations, tuning threads/memory, and managed service plans.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

INSIGHT

Query Stealing Lets DuckDB Do The Heavy Lifting

pg_duckdb flips the usual FDW model: it pushes the query into DuckDB and lets DuckDB pull Postgres data rather than mapping remote tables into Postgres.
This 'query stealing' approach uses DuckDB's execution engine for sorting, grouping, and joins to speed analytics.

ADVICE

Archive Events To Parquet And Query In Place

Offload large event or time-series data to Parquet on S3 and join with Postgres via pg_duckdb to avoid ETL silos.
Joe and Jelte recommend archiving old events to object storage and querying them alongside live Postgres rows when needed.

INSIGHT

Time Series Data Benefits Most From Columnar Offload

Time-series and event data are the most common beneficial use cases for columnar offload because old data is seldom updated and often aggregated.
Jelte and Joe point to faster ad-hoc aggregates as the primary win versus row-store lookups.

Get the Snipd Podcast app to discover more snips from this episode

Get the app