The Data Engineering Show

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

Oct 7, 2025
Ashok Singamaneni, Principal Data Engineer at Nike and creator of Spark Expectations and BrickFlow, discusses preventing bad data from being written and improving pipeline reliability. He explains why ingestion and transformation layers should be treated as a software product. Topics include rule types for checks, running validations before the final write, decorator-based integration that avoids double scans, performance trade-offs, and cautious use of generative AI tools.
INSIGHT

Treat Bronze/Silver Like A Software Product

  • Treat ingestion and transformation layers as a software product with tests and checks before final writes.
  • Ashok built Spark Expectations, inspired by Databricks DLT, to run DQ checks pre-write in Spark so bad data never lands in final tables.
ADVICE

Run Full Prewrite Checks With Defined Failure Actions

  • Run row-level, aggregation-level, and query-level DQ checks on the full dataframe before final write and choose actions: ignore, drop, or fail.
  • Route dropped records to error tables, and fail the job outright on mission-critical rule violations to avoid costly downstream recomputes.
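The check-then-act pattern above can be sketched in plain Python. This is a minimal illustration, not the Spark Expectations API: rows are dicts standing in for DataFrame rows, and the rule format (`name`, predicate, action) is hypothetical. Each rule carries one of the three failure actions described above: `ignore` lets the row land anyway, `drop` routes it to an error table, and `fail` aborts the job before the final write.

```python
class DQFailure(Exception):
    """Raised when a rule with action 'fail' is violated (job aborts pre-write)."""

def run_checks(rows, rules):
    """Apply row-level DQ rules before the final write.

    Each rule is (name, predicate, action), where action is
    'ignore', 'drop', or 'fail'. Returns (clean_rows, error_rows).
    """
    clean, errors = [], []
    for row in rows:
        violated = [(name, action) for name, pred, action in rules if not pred(row)]
        if any(action == "fail" for _, action in violated):
            # Mission-critical rule broken: stop the job before anything is written.
            raise DQFailure(f"critical rule violated: {violated}")
        if any(action == "drop" for _, action in violated):
            # Route the record to the error table, tagged with the failed rules.
            errors.append({**row, "_dq_rules": [name for name, _ in violated]})
        else:
            # 'ignore' violations would be logged; the row still lands.
            clean.append(row)
    return clean, errors

# Illustrative rules, one per action type.
rules = [
    ("non_null_id",    lambda r: r.get("id") is not None, "fail"),
    ("price_positive", lambda r: r.get("price", 0) > 0,   "drop"),
    ("has_category",   lambda r: bool(r.get("category")), "ignore"),
]
```

Aggregation- and query-level rules follow the same shape, except the predicate runs once over the whole dataset (e.g. a row count or a null ratio) rather than per row.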
ADVICE

Hook Checks Into Jobs With A Decorator

  • Integrate Spark Expectations using a Python decorator so checks run when your function returns a dataframe, avoiding separate scans.
  • Place checks at the final write step to limit overhead and optimize aggregation queries carefully.