How AI Is Built

#006 Data Orchestration Tools, Choosing the right one for your needs

May 10, 2024
John Wessel, founder of Agreeable Data, discusses the evolution of data orchestration tools, the popularity of Apache Airflow, and the challenges of choosing the right orchestrator. The conversation also covers the components of a data orchestrator, the role of AI in data orchestration, managing and monitoring orchestrators, and the future of orchestration tools.
ADVICE

Centralize For Visibility And Error Handling

  • Use an orchestrator to centralize dependencies, errors, and observability across ETL/ELT steps.
  • Central orchestration improves visibility compared with jobs that are scheduled and run separately.
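
To make the advice above concrete, here is a minimal sketch of what "centralizing" can look like. It is not how Airflow or any particular orchestrator works internally; it only assumes Python 3.9+ for the standard-library graphlib module. The step names (extract, transform, load) and the run_pipeline helper are hypothetical, introduced purely for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run steps in dependency order and collect every outcome in one report.

    tasks: mapping of step name -> zero-argument callable (hypothetical steps)
    deps:  mapping of step name -> set of step names it depends on
    """
    report = {}
    # Put every task in the graph, including ones with no dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # Skip a step when anything upstream did not succeed.
        if any(report[d] != "ok" for d in graph[name]):
            report[name] = "skipped"
            continue
        try:
            tasks[name]()
            report[name] = "ok"
        except Exception as exc:
            # Errors are recorded centrally instead of being lost in a separate job log.
            report[name] = f"failed: {exc}"
    return report


def extract():
    pass


def transform():
    raise ValueError("bad row")


def load():
    pass


report = run_pipeline(
    {"extract": extract, "transform": transform, "load": load},
    {"transform": {"extract"}, "load": {"transform"}},
)
print(report)
```

Because dependencies, failures, and skipped steps all land in one report, a single place shows why load never ran. With three independently scheduled jobs, that same question would mean checking three separate logs.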
ADVICE

Start Serverless, Move To K8s If Needed

  • Prefer serverless (AWS Lambda or its GCP and Azure equivalents) for as long as practical to reduce ops burden.
  • Move to Kubernetes or managed VMs when workloads need more control or persistent resources.
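
One way to keep a later move cheap is to separate the job logic from the platform entry point. The sketch below assumes the AWS Lambda Python handler shape, handler(event, context); run_job, main, and the "rows" parameter are hypothetical names chosen for illustration and are not from the episode.

```python
import json


def run_job(params):
    """Platform-independent job logic (hypothetical example step)."""
    rows = params.get("rows", [])
    return {"row_count": len(rows)}


def handler(event, context):
    """Serverless entry point in the AWS Lambda Python handler shape."""
    return {"statusCode": 200, "body": json.dumps(run_job(event))}


def main(argv):
    """Container or VM entry point: parameters arrive as a JSON string argument."""
    params = json.loads(argv[1]) if len(argv) > 1 else {}
    print(json.dumps(run_job(params)))
```

With this split, moving from serverless to Kubernetes or a managed VM means swapping handler for main as the entry point, while run_job stays unchanged.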
INSIGHT

Resource Limits Drive Orchestrator Choice

  • Memory and time limits shape orchestrator and deployment choices, especially for AI workloads.
  • Understanding a job's memory and time needs upfront prevents surprises from serverless limits and the failed runs they cause.
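
The check described above can be sketched as a small pre-flight function. The limits are assumptions based on AWS Lambda's documented maximums (15 minutes, 10,240 MB) and should be verified against your provider's current documentation; choose_target and the 80% headroom factor are hypothetical choices made for illustration.

```python
# Assumed limits for illustration; check your provider's current documentation.
SERVERLESS_MAX_SECONDS = 15 * 60    # e.g. AWS Lambda's maximum timeout
SERVERLESS_MAX_MEMORY_MB = 10_240   # e.g. AWS Lambda's maximum memory setting


def choose_target(est_seconds, est_memory_mb, headroom=0.8):
    """Suggest a deployment target from a job's estimated runtime and memory.

    headroom keeps estimates safely below the hard limits, since real runs
    often exceed estimates (0.8 is an arbitrary illustrative value).
    """
    fits_time = est_seconds <= SERVERLESS_MAX_SECONDS * headroom
    fits_memory = est_memory_mb <= SERVERLESS_MAX_MEMORY_MB * headroom
    return "serverless" if fits_time and fits_memory else "kubernetes-or-vm"


print(choose_target(120, 512))      # short, small job
print(choose_target(3600, 512))     # runs longer than the time limit
print(choose_target(60, 16_000))    # needs more memory than the limit allows
```

Running a check like this when a job is first defined surfaces the limit at design time rather than as a timeout or out-of-memory failure in production.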