Preparing Data Science Projects for Production

Nov 14, 2025

Khuyen Tran, an author and data science practitioner from CodeCut, shares her insights on preparing Python projects for production. She discusses her journey into blogging and the motivation behind her book, "Production Ready Data Science." Key topics include common pitfalls of notebooks and the importance of reproducible workflows. Khuyen advocates for using modular coding practices and the benefits of tools like Polars and marimo notebooks for efficiency. She also emphasizes the significance of version control and proper testing in data science projects.

ANECDOTE

Startup Friction Motivated The Book

Khuyen described being a data scientist at a startup where engineers rewrote scientists' code for production.
That experience motivated her to teach cleaner, importable code practices in her book.

ADVICE

Use Lockfiles For Reproducible Environments

Use a tool like uv, which pairs a flexible pyproject.toml for declaring dependencies with a lockfile (uv.lock) that pins the exact resolved versions.
Prefer a lockfile when every machine needs to install the same dependency versions, whatever the platform.

ADVICE

Try Polars For Faster, Lazy DataFrames

Try Polars for faster, more expressive DataFrame operations and lazy execution to limit memory use.
Use Polars when you need efficient, chainable transformations without PySpark setup overhead.
