
Super Data Science: ML & AI Podcast with Jon Krohn 815: Polars: Faster DataFrame Ops, with Marco Gorelli
Sep 3, 2024
Marco Gorelli, a Senior Software Engineer at Quansight Labs and a core developer of the Polars and Narwhals libraries, shares his insights on optimizing data operations. He explains when to use Polars over pandas and describes Polars features such as lazy evaluation and string optimizations. Marco also discusses Narwhals, a compatibility layer that lets code work across dataframe libraries such as pandas and Polars. He shares his strategies for winning forecasting competitions and addresses the need for greater diversity in data science.
AI Snips
String Optimization in Polars
- Polars stores strings in a specialized format that differs from the ones used by pandas and NumPy, which speeds up string operations.
- This leads to substantial performance gains, even in queries without explicit string operations.
Geocoding Case Study
- Marco used Polars and custom Rust plugins to perform geocoding on hundreds of gigabytes of data within AWS Lambda's constraints.
- This showcased Polars's extensibility, lazy evaluation, and efficiency in a resource-limited environment.
Time Zone Management
- Avoid managing time zones manually: the rules are complex and easy to get wrong.
- Rely on libraries like Polars, which provide reliable, time-zone-aware operations.
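
As a small illustration of why hand-rolled offsets go wrong, the sketch below compares a hard-coded UTC offset with a time-zone-aware conversion. To stay dependency-free it uses Python's standard `zoneinfo` module as a stand-in rather than Polars, and the dates and the choice of `Europe/London` are arbitrary assumptions for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# The same local wall-clock time on a winter day and a summer day.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=london)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=london)

# Time-zone-aware conversion applies the correct offset for each date.
winter_utc = winter.astimezone(timezone.utc)
summer_utc = summer.astimezone(timezone.utc)

# Manual approach: assume London is always at UTC+0, which ignores summer time.
fixed_offset = timezone(timedelta(hours=0))
manual_summer_utc = datetime(2024, 7, 15, 12, 0, tzinfo=fixed_offset).astimezone(
    timezone.utc
)

print(winter_utc.isoformat())
print(summer_utc.isoformat())
print(manual_summer_utc - summer_utc)  # size of the error from the fixed offset
```

The fixed offset is right in winter and an hour off in summer, which is the kind of silent error a time-zone-aware library is meant to prevent.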