The Real Python Podcast

Preparing Data Science Projects for Production

Nov 14, 2025
Khuyen Tran, an author and data science practitioner from CodeCut, shares her insights on preparing Python projects for production. She discusses her journey into blogging and the motivation behind her book, "Production Ready Data Science." Key topics include common pitfalls of notebooks and the importance of reproducible workflows. Khuyen advocates for using modular coding practices and the benefits of tools like Polars and marimo notebooks for efficiency. She also emphasizes the significance of version control and proper testing in data science projects.
ANECDOTE

Startup Friction Motivated The Book

  • Khuyen described being a data scientist at a startup where engineers rewrote scientists' code for production.
  • That experience motivated her to teach cleaner, importable code practices in her book.
ADVICE

Use Lockfiles For Reproducible Environments

  • Use a tool like uv, which pairs a flexible pyproject.toml with a lockfile (uv.lock) that pins exact dependency versions.
  • Prefer a lockfile when you need the same resolved dependency versions on every machine and platform.
ADVICE

Try Polars For Faster, Lazy DataFrames

  • Try Polars for faster, more expressive DataFrame operations and lazy execution to limit memory use.
  • Use Polars when you need efficient, chainable transformations without PySpark setup overhead.