Data Engineering Weekly

Insights from Jacopo Tagliabue, CTO of Bauplan: Revolutionizing Data Pipelines with Functional Data Engineering

Aug 13, 2025
Jacopo Tagliabue, CTO of Bauplan, is an experienced entrepreneur revolutionizing data engineering. He discusses his journey from NLP to simplifying data pipelines, emphasizing the need for efficiency. Jacopo reveals how Bauplan automates tasks, allowing engineers to focus on business logic. He covers innovative features like data versioning and collaboration using Git principles. Additionally, he highlights the integration of Apache Arrow for enhanced performance, showcasing how Bauplan stands out against traditional tools. Tune in for fresh insights into the future of data engineering!
INSIGHT

Arrow As The Zero-Copy Lingua Franca

  • Bauplan functions accept tables (Arrow) as inputs and return tables, enabling zero-copy transfers between steps.
  • Using Arrow as a lingua franca lets steps pass data to each other without serialization and run efficient vectorized checks on it.
ADVICE

Use Branches For Features, Experiments, Debugging

  • Treat branches like feature sandboxes, experiments, or debug workspaces and delete them after merging.
  • Rebase and rerun when merges conflict; Bauplan enforces conservative merging to avoid unsafe writes.
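
The branch workflow can be pictured with a toy in-memory catalog. This is a minimal sketch under stated assumptions, not Bauplan's implementation or API: the `Catalog` class, its methods, and the snapshot ids are hypothetical, and "conservative merging" is modeled here as refusing any merge where main changed a table that the branch also changed.

```python
from dataclasses import dataclass, field


class MergeConflict(Exception):
    """Raised when main and the branch both changed the same table."""


@dataclass
class Catalog:
    """Toy catalog: each branch maps table name -> snapshot id."""
    branches: dict = field(default_factory=lambda: {"main": {}})
    bases: dict = field(default_factory=dict)  # branch -> main's state at fork time

    def create_branch(self, name: str) -> None:
        self.branches[name] = dict(self.branches["main"])
        self.bases[name] = dict(self.branches["main"])

    def write(self, branch: str, table: str, snapshot: str) -> None:
        self.branches[branch][table] = snapshot

    def merge(self, name: str) -> None:
        main, base, head = self.branches["main"], self.bases[name], self.branches[name]
        changed = {t for t in head if head[t] != base.get(t)}
        # Conservative rule: refuse if main moved on any table the branch touched.
        conflicts = {t for t in changed if main.get(t) != base.get(t)}
        if conflicts:
            raise MergeConflict(sorted(conflicts))
        for t in changed:
            main[t] = head[t]
        # Delete the branch after merging.
        del self.branches[name]
        del self.bases[name]

    def rebase(self, name: str) -> None:
        # Restart the branch from current main; the caller reruns the pipeline.
        self.create_branch(name)


catalog = Catalog()
catalog.write("main", "orders", "s1")
catalog.create_branch("feature")
catalog.write("feature", "orders", "s2-feature")
catalog.write("main", "orders", "s2-main")  # main changes in the meantime

conflicted = False
try:
    catalog.merge("feature")
except MergeConflict:
    conflicted = True
    catalog.rebase("feature")
    catalog.write("feature", "orders", "s3-feature")  # rerun on top of new main
    catalog.merge("feature")

print(conflicted, catalog.branches)
```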
INSIGHT

Vertical Scaling Suits Most Pipelines

  • Bauplan schedules functions to run vertically on a single host and scales by sizing hosts, not by distributed shuffles.
  • For most pipelines under ~100 GB, this vertical model is faster and simpler than spinning up distributed clusters.
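
Scaling by host size rather than by cluster can be sketched as a simple lookup: pick the smallest single machine whose memory fits the input plus working space. The memory tiers and the 3x headroom factor below are hypothetical values chosen for illustration, not figures from the episode or from Bauplan.

```python
GB = 1024 ** 3

# Hypothetical single-host memory tiers, in GB.
HOST_MEMORY_GB = [16, 64, 256, 1024]


def pick_host(input_bytes: int, headroom: float = 3.0):
    """Return the smallest tier (in GB) that fits the input with headroom.

    Returns None when no single host is large enough, which is the case
    where a distributed engine would be worth considering.
    """
    needed = input_bytes * headroom
    for mem in HOST_MEMORY_GB:
        if mem * GB >= needed:
            return mem
    return None


small = pick_host(10 * GB)    # 30 GB needed
medium = pick_host(100 * GB)  # 300 GB needed
too_big = pick_host(500 * GB) # 1500 GB needed, exceeds the largest tier

print(small, medium, too_big)
```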