
The Real Python Podcast
Preparing Data Science Projects for Production
Nov 14, 2025
Khuyen Tran, an author and data science practitioner from CodeCut, shares her insights on preparing Python projects for production. She discusses her journey into blogging and the motivation behind her book, "Production Ready Data Science." Key topics include common pitfalls of notebooks and the importance of reproducible workflows. Khuyen advocates modular coding practices and highlights the efficiency benefits of tools like Polars and marimo notebooks. She also emphasizes the significance of version control and proper testing in data science projects.
AI Snips
Startup Friction Motivated The Book
- Khuyen described being a data scientist at a startup where engineers rewrote scientists' code for production.
- That experience motivated her to teach cleaner, importable code practices in her book.
Use Lockfiles For Reproducible Environments
- Use a tool like uv, which pairs flexible dependency constraints in pyproject.toml with a lockfile that pins exact versions.
- Rely on the lockfile when you need the same resolved dependency versions on every machine and platform.
Try Polars For Faster, Lazy DataFrames
- Try Polars for faster, more expressive DataFrame operations and lazy execution to limit memory use.
- Use Polars when you need efficient, chainable transformations without PySpark setup overhead.


