Talk Python To Me

#516: Accelerating Python Data Science at NVIDIA

Aug 19, 2025
Ben Zaitlen, a system software manager at NVIDIA with over 15 years in the Python ecosystem, discusses advances in GPU-accelerated data science. He unpacks RAPIDS, an open-source toolkit that accelerates popular libraries like pandas and scikit-learn on GPUs. Listeners learn about the challenges and successes of GPU integration, including speedups that cut hours of work to minutes. The conversation also covers scaling techniques for large datasets and the future of GPUs in AI workloads.
ADVICE

Use Zero Code Change First

  • Try a zero-code-change path such as cudf.pandas first to run existing pandas code on GPUs.
  • Unsupported operations fall back to the CPU automatically, so you can test without rewriting imports.
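
A minimal sketch of the zero-code-change path described above, assuming the RAPIDS cudf package may or may not be installed. The helper name enable_gpu_pandas and the sample data are hypothetical; cudf.pandas.install() is the documented entry point and has to run before pandas is imported. Without cudf or a usable GPU, the script simply runs on ordinary pandas.

```python
import importlib.util


def enable_gpu_pandas() -> bool:
    """Try to switch on the cudf.pandas accelerator; return True if it worked.

    Hypothetical helper: it only wraps the documented cudf.pandas.install() call
    and treats any failure (no cudf, no GPU, driver problems) as "stay on CPU".
    """
    if importlib.util.find_spec("cudf") is None:
        return False
    try:
        import cudf.pandas

        cudf.pandas.install()
        return True
    except Exception:
        return False


# Must happen before the first `import pandas`.
gpu_enabled = enable_gpu_pandas()

import pandas as pd  # unchanged import; proxied to cuDF when the accelerator is on

# Ordinary pandas code, identical on CPU and GPU.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print("GPU accelerator enabled:", gpu_enabled)
print(totals)
```

For an existing script, the same effect is available without editing it by running python -m cudf.pandas script.py, or with %load_ext cudf.pandas in a notebook.
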
ADVICE

Read CSVs On GPU For Big Files

  • Use cuDF's GPU CSV reader to accelerate parsing and type inference for large CSV files.
  • The reader parallelizes string parsing and type conversion, so large files load much faster than with pandas.
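
As an illustration of the advice above, here is a small sketch that reads a CSV with cudf.read_csv when cuDF can be imported and with pandas.read_csv otherwise. The file name, column names, and the pick_csv_reader helper are made up for the example, and a three-row file is far too small to show any speedup; it only demonstrates that the call looks the same on both backends.

```python
import csv
import os
import tempfile


def pick_csv_reader():
    """Return (read_csv, backend_name), preferring the GPU reader when usable."""
    try:
        import cudf

        return cudf.read_csv, "cudf"
    except Exception:
        # No cuDF or no usable GPU: fall back to the CPU reader.
        import pandas as pd

        return pd.read_csv, "pandas"


read_csv, backend = pick_csv_reader()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.csv")  # hypothetical sample file
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trip_id", "distance_km", "city"])
        writer.writerows([[1, 2.5, "Austin"], [2, 10.0, "Boston"], [3, 4.25, "Austin"]])

    # Same call signature for both backends; column types are inferred.
    df = read_csv(path)

total_distance = float(df["distance_km"].sum())

print("backend:", backend)
print(df.dtypes)
print("total distance:", total_distance)
```
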
INSIGHT

Zero-Copy Interop Is Foundational

  • RAPIDS relies on Apache Arrow for zero-copy interoperability across libraries.
  • Ben Zaitlen emphasizes Arrow, DLPack and array-dispatch standards as key to smooth GPU integration.
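
To make the zero-copy idea concrete, the sketch below uses the DLPack protocol through NumPy's np.from_dlpack (NumPy 1.22 or newer). This is a CPU-only stand-in chosen so the example runs without a GPU, not the RAPIDS code path itself. GPU libraries that implement DLPack exchange device buffers through the same protocol.

```python
import numpy as np

# Producer: any object that implements __dlpack__ / __dlpack_device__.
source = np.arange(6, dtype=np.float32)

# Consumer: wraps the producer's buffer through DLPack instead of copying it.
view = np.from_dlpack(source)

# Both arrays refer to the same memory.
shares = np.shares_memory(source, view)

# A write through the producer is visible through the consumer.
source[0] = 99.0
first_value = float(view[0])

print("shares memory:", shares)
print("first value seen by consumer:", first_value)
```

Because no data is copied, handing an array from one library to another costs roughly the same regardless of its size, which is why shared memory layouts such as Arrow and DLPack matter for GPU pipelines.
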