Talk Python To Me

#454: Data Pipelines with Dagster

Mar 21, 2024
Explore data pipelines with Python using Dagster, featuring insights from Pedram Navid. Learn about building efficient data pipelines, orchestrating automation, optimizing deployments with Posit Connect, backfills, partitioning, and popular data tools like dbt, DuckDB, Apache Arrow, and Polars.
ANECDOTE

From Bank Scripting To Dagster Through Product Needs

  • Pedram started automating a tedious bank ops task with Python, which led to GUI tools and eventually a career in data engineering.
  • His early laziness-driven automation evolved into work on Airflow and later joining Dagster Labs.
ADVICE

Reuse Existing Python Connectors As Resources

  • Use the Python packages you already know inside Dagster rather than hunting for custom providers.
  • Install the Snowflake connector (or any other) and call it as a resource; if you prefer, Dagster's I/O managers can persist the results for you.
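
The idea of handing an existing connector to pipeline code as a resource can be sketched in plain Python. This is not Dagster's API: `DatabaseResource` and `orders_summary` are hypothetical names, and the standard library's `sqlite3` stands in for the Snowflake connector so the sketch runs without credentials.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DatabaseResource:
    """Hypothetical resource: holds connection settings and hands out connections."""

    path: str

    def connect(self) -> sqlite3.Connection:
        # The existing connector package is used directly; no custom provider needed.
        return sqlite3.connect(self.path)


def orders_summary(db: DatabaseResource) -> int:
    """Asset-like function that receives the resource instead of building its own connection."""
    conn = db.connect()
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 9.5), (2, 20.0), (3, 4.25)],  # made-up sample rows
        )
        (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        return count
    finally:
        conn.close()


# The caller decides which database the function talks to.
row_count = orders_summary(DatabaseResource(path=":memory:"))
```

Because the connection details live in the resource, the same function can point at a test database or a production warehouse without changes to its body.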
INSIGHT

Materialization Emits Valuable Asset Metadata

  • Materializing an asset means running it and capturing emitted metadata about the result.
  • That metadata (timestamps, row counts) builds a lifetime history you can use to spot regressions or performance drift.
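
How a history of materialization metadata can surface a regression is sketched below in plain Python. This is not Dagster's API: `MaterializationRecord`, `record_run`, and `looks_like_regression` are hypothetical names, the row counts are made up, and the 50% threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass
class MaterializationRecord:
    """Hypothetical record of one run: which asset, when, and how many rows it produced."""

    asset_name: str
    timestamp: datetime
    row_count: int


def record_run(history, asset_name, rows):
    """Append metadata about a finished run to the asset's history."""
    record = MaterializationRecord(asset_name, datetime.now(timezone.utc), len(rows))
    history.append(record)
    return record


def looks_like_regression(history, drop_ratio=0.5):
    """Flag the latest run if its row count falls below drop_ratio times the earlier mean."""
    if len(history) < 2:
        return False  # nothing to compare against yet
    baseline = mean(r.row_count for r in history[:-1])
    return history[-1].row_count < drop_ratio * baseline


history = []
record_run(history, "orders", range(100))
record_run(history, "orders", range(110))
healthy = looks_like_regression(history)    # 110 rows against a baseline of 100

record_run(history, "orders", range(20))
regressed = looks_like_regression(history)  # 20 rows against a baseline of 105
```

The same comparison works for run duration or any other numeric metadata, which is how a long-lived history helps with performance drift as well as data volume.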