
Talk Python To Me #454: Data Pipelines with Dagster
Mar 21, 2024 Explore data pipelines with Python using Dagster, featuring insights from Pedram Navid. Learn about building efficient data pipelines, orchestrating automation, optimizing deployments with Posit Connect, backfills, partitioning, and popular data tools like dbt, DuckDB, Apache Arrow, and Polars.
AI Snips
From Bank Scripting To Dagster Through Product Needs
- Pedram started automating a tedious bank ops task with Python, which led to GUI tools and eventually a career in data engineering.
- His early laziness-driven automation evolved into work on Airflow and later joining Dagster Labs.
Reuse Existing Python Connectors As Resources
- Use the Python packages you already know inside Dagster rather than hunting for custom providers.
- Install Snowflake or other connectors and call them as resources; Dagster can persist via IO managers if you prefer.
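The pattern above can be sketched in plain Python. This is a hedged illustration of the idea, not Dagster's actual API: the `SnowflakeLikeResource` class and `daily_orders` function are hypothetical stand-ins showing how an existing connector package can be wrapped as a resource object that asset-style functions receive as a parameter.

```python
# Hypothetical sketch of the "reuse your connector as a resource" pattern.
# In real Dagster code, the resource would wrap e.g. snowflake-connector-python
# and be passed to assets by the framework; here we wire it up by hand.

class SnowflakeLikeResource:
    """Stand-in for an existing Python connector you already know."""

    def __init__(self, account: str):
        self.account = account

    def query(self, sql: str) -> list[tuple]:
        # A real implementation would open a connection and run cursor.execute(sql).
        return [("stub-row",)]


def daily_orders(resource: SnowflakeLikeResource) -> list[tuple]:
    """An asset-style function: it takes the resource and returns data."""
    return resource.query("SELECT * FROM orders")


rows = daily_orders(SnowflakeLikeResource(account="demo"))
```

The point of the pattern is that the asset body stays ordinary Python calling a library you already use; Dagster can then optionally persist the returned data through an IO manager.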
Materialization Emits Valuable Asset Metadata
- Materializing an asset means running it and capturing emitted metadata about the result.
- That metadata (timestamps, row counts) builds a lifetime history you can use to spot regressions or performance drift.
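The regression-spotting idea above can be sketched in plain Python. This is a hypothetical illustration, not Dagster's metadata API: the `materialize` and `row_count_dropped` helpers are made-up names showing how per-run metadata (timestamp, row count) accumulates into a history you can check for drift.

```python
# Hypothetical sketch: record metadata on each materialization, then use the
# history to flag a sudden drop in row count between runs.
import time

history: list[dict] = []  # one metadata record per materialization


def materialize(rows: list) -> dict:
    """Run an 'asset' and capture metadata about the result."""
    meta = {"timestamp": time.time(), "row_count": len(rows)}
    history.append(meta)
    return meta


def row_count_dropped(threshold: float = 0.5) -> bool:
    """True if the latest row count fell below threshold x the previous run's."""
    if len(history) < 2:
        return False
    prev = history[-2]["row_count"]
    last = history[-1]["row_count"]
    return last < threshold * prev


materialize(list(range(100)))  # first run: 100 rows
materialize(list(range(30)))   # second run: 30 rows, a sharp drop
```

A real orchestrator would store this history for the asset's lifetime, which is what makes it possible to spot regressions or performance drift over time.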
