#516: Accelerating Python Data Science at NVIDIA

Aug 19, 2025

Ben Zaitlen, a system software manager at NVIDIA with over 15 years in the Python ecosystem, discusses recent advances in GPU-accelerated data science. He explains RAPIDS, an open-source toolkit that accelerates workflows built on popular libraries like pandas and scikit-learn. Listeners learn about the challenges and successes of GPU integration, including speedups that cut hours of work to minutes. The conversation also covers techniques for scaling to large datasets and the future of GPUs in AI workloads.

ADVICE

Use Zero Code Change First

Try the zero-code-change path first: cudf.pandas runs existing pandas code on GPUs.
It falls back to the CPU automatically when an operation is unsupported, so you can test without rewriting imports.
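
As a rough illustration of the zero-code-change path, the sketch below enables the cudf.pandas accelerator when it is available and otherwise leaves plain pandas in place. The helper names enable_gpu_pandas and mean_by_key are hypothetical, chosen only for this example, and the broad exception handling is an assumption made so the script still runs on machines without RAPIDS or a usable GPU.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch pandas to the cudf.pandas accelerator.

    Returns True if the accelerator was installed, False otherwise.
    """
    try:
        import cudf.pandas  # shipped with the RAPIDS cuDF package
        cudf.pandas.install()  # should run before pandas is first imported
        return True
    except Exception:
        # cuDF is missing, or no usable GPU/driver was found: stay on CPU pandas.
        return False


GPU_ENABLED = enable_gpu_pandas()


def mean_by_key(rows):
    """Ordinary pandas code; it is unchanged whether or not the GPU path is on."""
    import pandas as pd  # imported after the accelerator has had a chance to install

    df = pd.DataFrame(rows)
    return df.groupby("key")["value"].mean().to_dict()
```

For whole scripts, the same effect is usually obtained without touching the source at all by running `python -m cudf.pandas script.py`, or by running `%load_ext cudf.pandas` in a notebook before importing pandas.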

ADVICE

Read CSVs On GPU For Big Files

Use cudf's GPU CSV reader to accelerate parsing and type inference for large CSV files.
The reader parallelizes string parsing and type conversions, which yields large speedups over pandas on big files.
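
A minimal sketch of calling the GPU reader directly might look like the following. The load_csv wrapper is a hypothetical helper for this example, and the fallback to pandas is an assumption added so the code also runs where cuDF is not installed; cudf.read_csv itself mirrors the pandas.read_csv call style.

```python
def load_csv(path, **kwargs):
    """Read a CSV with cuDF's GPU reader when possible, else with pandas.

    Extra keyword arguments (for example dtype=...) are passed through
    to whichever read_csv ends up being used.
    """
    try:
        import cudf as frame_lib  # GPU DataFrame library from RAPIDS
    except Exception:
        # cuDF is absent or cannot initialize a GPU: use the CPU reader instead.
        import pandas as frame_lib
    return frame_lib.read_csv(path, **kwargs)
```

Calling `load_csv("events.csv")` (a placeholder file name) returns a cudf.DataFrame on a GPU machine and a pandas.DataFrame elsewhere, so downstream code should stick to the API the two share.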

INSIGHT

Zero-Copy Interop Is Foundational

RAPIDS relies on Apache Arrow for zero-copy interoperability across libraries.
Ben Zaitlen emphasizes Arrow, DLPack and array-dispatch standards as key to smooth GPU integration.
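
To make the zero-copy idea concrete, here is a small CPU-only sketch of the DLPack protocol using NumPy, which needs no GPU to run. It is an illustration of the exchange mechanism, not RAPIDS code: the share_via_dlpack helper is hypothetical, and np.from_dlpack requires NumPy 1.22 or newer. GPU libraries such as CuPy and PyTorch expose their own from_dlpack functions that follow the same pattern for device memory.

```python
import numpy as np


def share_via_dlpack(producer):
    """Import any object that implements __dlpack__ as a NumPy array, without copying."""
    return np.from_dlpack(producer)


source = np.arange(5, dtype=np.float32)
view = share_via_dlpack(source)

# Both names refer to the same buffer, so a write through one is visible in the other.
source[0] = 42.0
print(view[0])
print(np.shares_memory(source, view))
```

Because no data is copied, the cost of handing an array from one library to another stays constant regardless of its size, which is why this kind of protocol matters for multi-gigabyte GPU buffers.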
