
Talk Python To Me #454: Data Pipelines with Dagster
Mar 21, 2024 Explore data pipelines with Python using Dagster, featuring insights from Pedram Navid. Learn about building efficient data pipelines, orchestrating automation, optimizing deployments with Posit Connect, backfills, partitioning, and popular data tools like dbt, DuckDB, Apache Arrow, and Polars.
AI Snips
From Bank Scripting To Dagster Through Product Needs
- Pedram started automating a tedious bank ops task with Python, which led to GUI tools and eventually a career in data engineering.
- His early laziness-driven automation evolved into work on Airflow and later joining Dagster Labs.
Reuse Existing Python Connectors As Resources
- Use the Python packages you already know inside Dagster rather than hunting for custom providers.
- Install Snowflake or other connectors and call them as resources; Dagster can persist via IO managers if you prefer.
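The pattern above can be sketched in plain Python. This is a hedged illustration of the idea, not Dagster's actual API: the `SnowflakeLikeResource` class and `daily_orders` function are hypothetical stand-ins showing how an existing connector package can be wrapped as a resource object that asset-style functions receive as a parameter.

```python
# Hypothetical sketch of the "reuse your connector as a resource" pattern.
# In real Dagster code, the resource would wrap e.g. snowflake-connector-python
# and be passed to assets by the framework; here we wire it up by hand.

class SnowflakeLikeResource:
    """Stand-in for an existing Python connector you already know."""

    def __init__(self, account: str):
        self.account = account

    def query(self, sql: str) -> list[tuple]:
        # A real implementation would open a connection and run cursor.execute(sql).
        return [("stub-row",)]


def daily_orders(resource: SnowflakeLikeResource) -> list[tuple]:
    """An asset-style function: it takes the resource and returns data."""
    return resource.query("SELECT * FROM orders")


rows = daily_orders(SnowflakeLikeResource(account="demo"))
```

The point of the pattern is that the asset body stays ordinary Python calling a library you already use; Dagster can then optionally persist the returned data through an IO manager.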
Materialization Emits Valuable Asset Metadata
- Materializing an asset means running it and capturing emitted metadata about the result.
- That metadata (timestamps, row counts) builds a lifetime history you can use to spot regressions or performance drift.
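The regression-spotting idea above can be sketched in plain Python. This is a hypothetical illustration, not Dagster's metadata API: the `materialize` and `row_count_dropped` helpers are made-up names showing how per-run metadata (timestamp, row count) accumulates into a history you can check for drift.

```python
# Hypothetical sketch: record metadata on each materialization, then use the
# history to flag a sudden drop in row count between runs.
import time

history: list[dict] = []  # one metadata record per materialization


def materialize(rows: list) -> dict:
    """Run an 'asset' and capture metadata about the result."""
    meta = {"timestamp": time.time(), "row_count": len(rows)}
    history.append(meta)
    return meta


def row_count_dropped(threshold: float = 0.5) -> bool:
    """True if the latest row count fell below threshold x the previous run's."""
    if len(history) < 2:
        return False
    prev = history[-2]["row_count"]
    last = history[-1]["row_count"]
    return last < threshold * prev


materialize(list(range(100)))  # first run: 100 rows
materialize(list(range(30)))   # second run: 30 rows, a sharp drop
```

A real orchestrator would store this history for the asset's lifetime, which is what makes it possible to spot regressions or performance drift over time.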
