The AWS Developers Podcast

Your next ETL pipeline will be serverless

Jul 4, 2025
Poonam Pratik Patel, Director at The Line Tech UK and AWS Community Builder, discusses how to implement serverless ETL. She explains how serverless architectures can streamline data processing while keeping the data accurate. The discussion covers practical strategies for data validation and partitioning, along with the use of AWS services such as Glue and Lambda. Poonam also highlights the role of AI and ML in making data pipelines more efficient and scalable.
ADVICE

Embrace Serverless for ETL

  • Use AWS serverless services like Lambda and Step Functions to eliminate infrastructure management in ETL.
  • Focus on defining data flow and validation, not on backend infrastructure sizing or maintenance.
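One way to read "defining data flow" is that the pipeline becomes a declarative state machine rather than a set of servers. The sketch below writes a small Step Functions definition (Amazon States Language) as a Python dictionary. The state names, the `$.valid` field, and the Lambda ARNs are placeholders chosen for illustration; they are not taken from the episode.

```python
import json

# Placeholder ARN prefix: REGION and ACCOUNT_ID must be replaced with real values.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"

# Hypothetical flow: validate a file, then either process it or stop.
STATE_MACHINE = {
    "Comment": "Illustrative ETL flow: validate, then process or reject",
    "StartAt": "ValidateFile",
    "States": {
        "ValidateFile": {
            "Type": "Task",
            "Resource": LAMBDA_ARN_PREFIX + "validate-file",  # hypothetical function
            "Next": "IsValid",
        },
        "IsValid": {
            "Type": "Choice",
            # Assumes the validation task returns a boolean field named "valid".
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ProcessFile"}
            ],
            "Default": "Rejected",
        },
        "ProcessFile": {
            "Type": "Task",
            "Resource": LAMBDA_ARN_PREFIX + "process-file",  # hypothetical function
            "End": True,
        },
        "Rejected": {"Type": "Succeed"},
    },
}


def dangling_targets(definition: dict) -> set:
    """Return transition targets that do not name a state in the definition."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(states)


if __name__ == "__main__":
    # The JSON form is what would be supplied when creating the state machine.
    print(json.dumps(STATE_MACHINE, indent=2))
```

Nothing in this definition mentions instance sizes or cluster capacity, which is the point of the advice: the only things left to specify are the steps and the conditions between them.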
ANECDOTE

Pipeline Orchestrated by Step Functions

  • Data from the branches is collected in a single S3 bucket, which triggers a Step Functions workflow.
  • The workflow invokes Lambda functions that validate and process data, moving invalid files to a separate bucket.
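The validate-and-quarantine step described above might look roughly like the following. This is a minimal sketch, not the pipeline from the episode: the event shape (`bucket` and `key` fields), the `INVALID_BUCKET` name, and the `validate` callback are all assumptions. The S3 client is passed in as a parameter; inside a real Lambda function it would typically be `boto3.client("s3")`, whose `get_object`, `copy_object`, and `delete_object` calls are used here.

```python
from typing import Any, Callable, Dict, List

INVALID_BUCKET = "example-invalid-files"  # hypothetical bucket name


def quarantine(s3: Any, bucket: str, key: str,
               invalid_bucket: str = INVALID_BUCKET) -> None:
    """Move an object to the invalid-files bucket (S3 has no move: copy, then delete)."""
    s3.copy_object(
        Bucket=invalid_bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)


def handle_file(event: Dict[str, str], s3: Any,
                validate: Callable[[bytes], List[str]]) -> Dict[str, Any]:
    """Validate one uploaded file and report the outcome to the workflow.

    `event` is assumed to carry the bucket and key of the uploaded object.
    `validate` returns a list of error messages; an empty list means valid.
    """
    bucket, key = event["bucket"], event["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    errors = validate(body)
    if errors:
        quarantine(s3, bucket, key)
        return {"bucket": INVALID_BUCKET, "key": key, "valid": False, "errors": errors}
    return {"bucket": bucket, "key": key, "valid": True, "errors": []}
```

Returning a `valid` flag lets the workflow branch on the result, so the routing decision lives in the state machine rather than inside the function.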
ADVICE

Use Pandas in Lambda for Validation

  • Pass entire files from S3 to Lambda for processing and use libraries like Pandas for CSV data operations.
  • Design Lambda functions to split the data into smaller pieces and validate each piece, so that processing stays within Lambda's size limits.
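As an illustration of validating a CSV in pieces, the sketch below reads the file with Pandas in fixed-size chunks so that only one chunk is held as a DataFrame at a time. The column names in `REQUIRED_COLUMNS`, the specific checks, and the default chunk size are assumptions made for the example; the episode does not specify a schema.

```python
import io
from typing import List

import pandas as pd

REQUIRED_COLUMNS = ["branch_id", "date", "amount"]  # hypothetical schema


def validate_csv(raw: bytes, chunksize: int = 50_000) -> List[str]:
    """Validate CSV bytes chunk by chunk; return error messages (empty if valid)."""
    null_rows = 0
    bad_amounts = 0
    try:
        # With chunksize set, read_csv returns an iterator of DataFrames.
        for chunk in pd.read_csv(io.BytesIO(raw), chunksize=chunksize):
            missing = [c for c in REQUIRED_COLUMNS if c not in chunk.columns]
            if missing:
                return [f"missing columns: {missing}"]
            # Rows where any required value is absent.
            null_rows += int(chunk[REQUIRED_COLUMNS].isnull().any(axis=1).sum())
            # Values that are present but cannot be parsed as numbers.
            amounts = pd.to_numeric(chunk["amount"], errors="coerce")
            bad_amounts += int((amounts.isna() & chunk["amount"].notna()).sum())
    except pd.errors.EmptyDataError:
        return ["file is empty"]

    errors = []
    if null_rows:
        errors.append(f"{null_rows} row(s) with missing required values")
    if bad_amounts:
        errors.append(f"{bad_amounts} row(s) with non-numeric amount")
    return errors
```

A function like this could be called with the bytes read from the S3 object. The chunk size would need tuning against the memory configured for the function and the width of the rows.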