Your next ETL pipeline will be serverless

Jul 4, 2025

Poonam Pratik Patel, Director at The Line Tech UK and AWS Community Builder, discusses how to implement ETL pipelines with serverless services. She explains how serverless architectures can streamline data processing while keeping the data accurate. The discussion covers practical strategies for data validation and partitioning, along with the integration of AWS services such as Glue and Lambda. Poonam also highlights the growing role of AI and ML in the evolution of data pipelines, making them more efficient and scalable.

ADVICE

Embrace Serverless for ETL

Use AWS serverless services such as Lambda and Step Functions so that you don't have to provision or manage servers for your ETL pipeline.
Focus on defining data flow and validation, not on backend infrastructure sizing or maintenance.

ANECDOTE

Pipeline Orchestrated by Step Functions

Data from the branches is collected into a single S3 bucket, and each upload triggers a Step Functions workflow.
The workflow invokes Lambda functions that validate and process the data, moving invalid files to a separate bucket.
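The workflow described above can be sketched as an Amazon States Language (ASL) definition. This is a minimal sketch, not the speaker's actual pipeline: the three Lambda function names are hypothetical, and it assumes the validation function returns its input plus a boolean `valid` field. The S3 upload trigger is not shown; one common way to wire it is an Amazon EventBridge rule on S3 "Object Created" events that starts the state machine.

```python
import json
from typing import Optional

# Hypothetical Lambda function names; replace them with your own.
VALIDATE_FN = "validate-branch-file"
PROCESS_FN = "process-branch-file"
QUARANTINE_FN = "move-to-invalid-bucket"


def lambda_task(function_name: str, next_state: Optional[str]) -> dict:
    """Build a Task state that calls a Lambda function through the
    Step Functions Lambda integration and keeps only its Payload."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        # Pass the whole state input (e.g. bucket and key) to the function.
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        # The integration wraps the function result; unwrap it.
        "OutputPath": "$.Payload",
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_state_machine() -> dict:
    """Validate an uploaded file, then either process it or move it
    to a separate bucket for invalid files."""
    return {
        "Comment": "Serverless ETL: validate, then process or quarantine",
        "StartAt": "ValidateFile",
        "States": {
            "ValidateFile": lambda_task(VALIDATE_FN, "IsValid"),
            "IsValid": {
                "Type": "Choice",
                # Assumes the validation function returns {"valid": true/false, ...}.
                "Choices": [
                    {"Variable": "$.valid", "BooleanEquals": True, "Next": "ProcessFile"}
                ],
                "Default": "MoveToInvalidBucket",
            },
            "ProcessFile": lambda_task(PROCESS_FN, None),
            "MoveToInvalidBucket": lambda_task(QUARANTINE_FN, None),
        },
    }


if __name__ == "__main__":
    # Print the definition, e.g. to paste into the console or an IaC template.
    print(json.dumps(build_state_machine(), indent=2))
```

Keeping the branching in a Choice state, rather than inside one large Lambda, keeps each function small and makes the valid/invalid routing visible in the workflow itself.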

ADVICE

Use Pandas in Lambda for Validation

Pass entire files from S3 to Lambda for processing, and use libraries such as pandas for operations on CSV data.
Design Lambda functions to break the data into chunks and validate it efficiently within Lambda's memory and execution-time limits.
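A minimal sketch of chunked CSV validation with pandas inside a Lambda function follows. The column names, the ISO date format, and the rule that amounts must be non-negative are hypothetical placeholders, as is the assumption that the event carries `bucket` and `key` fields. pandas is not part of the standard Lambda Python runtime, so it has to be packaged with the function or supplied through a Lambda layer.

```python
import io
from typing import Tuple

import pandas as pd

# Hypothetical schema for a branch sales file; adjust it to your data.
REQUIRED_COLUMNS = ["branch_id", "date", "amount"]


def validate_chunk(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split one chunk into (valid_rows, invalid_rows)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # A structural problem makes the whole file invalid.
        raise ValueError(f"missing columns: {missing}")
    amount = pd.to_numeric(df["amount"], errors="coerce")
    date = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
    ok = df["branch_id"].notna() & amount.notna() & date.notna() & (amount >= 0)
    return df[ok], df[~ok]


def validate_csv(text: str, chunksize: int = 50_000) -> Tuple[int, int]:
    """Validate CSV text chunk by chunk so that only `chunksize` rows of
    parsed data are held at once; return (valid_count, invalid_count)."""
    valid_rows = invalid_rows = 0
    # dtype=str keeps raw values; empty fields still become NaN.
    for chunk in pd.read_csv(io.StringIO(text), dtype=str, chunksize=chunksize):
        good, bad = validate_chunk(chunk)
        valid_rows += len(good)
        invalid_rows += len(bad)
    return valid_rows, invalid_rows


def handler(event, context):
    """Lambda entry point; assumes the event carries 'bucket' and 'key'."""
    import boto3  # included in the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    text = obj["Body"].read().decode("utf-8")
    try:
        valid_rows, invalid_rows = validate_csv(text)
    except ValueError as exc:
        return {**event, "valid": False, "reason": str(exc)}
    return {
        **event,
        "valid": invalid_rows == 0,
        "valid_rows": valid_rows,
        "invalid_rows": invalid_rows,
    }
```

The returned `valid` flag is the kind of value a Step Functions Choice state can branch on to decide whether a file is processed or moved to the invalid-files bucket. Note that the whole object is still read into memory as text here; for files close to the function's memory size, reading in smaller pieces or splitting the file upstream would be safer.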
