
Tech on the Rocks
From pandas to Arrow: Wes McKinney on the Future of Data Infrastructure
Dec 1, 2025
Wes McKinney, creator of pandas and co-creator of Apache Arrow and Ibis, is a long-time leader in the Python data ecosystem. He walks through pandas’ UX-driven origins and the move to Arrow's columnar in-memory format. The conversation covers Arrow vs Parquet, new GPU-friendly file encodings, big metadata and table formats, Rust query engines like DataFusion, and how AI agents are changing developer workflows.
AI Snips
Arrow Enables Zero-Copy Cross-Language Transfer
- Apache Arrow defines an in-memory columnar layout with record batches and schemas to enable zero-copy cross-process and cross-language data transfer.
- Arrow IPC uses a small metadata prefix encoded with FlatBuffers, so receivers can map buffers directly into language objects without rehydration cost.
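The idea behind these two points can be sketched with the Python standard library alone. This is a toy illustration, not the Arrow IPC format: the message layout, the JSON metadata (standing in for FlatBuffers), and the helper names `write_message` and `read_message` are all assumptions made for the example. The point it shows is that the reader parses only a small header and then views the column bytes in place.

```python
import json
import struct
from array import array


def write_message(name: str, values: array) -> bytes:
    """Serialize one int64 column as: [4-byte length][metadata][raw buffer]."""
    body = values.tobytes()
    meta = json.dumps(
        {"name": name, "type": "int64", "length": len(values), "nbytes": len(body)}
    ).encode("utf-8")
    # Pad the metadata so the data buffer starts on an 8-byte boundary.
    meta += b" " * (-(4 + len(meta)) % 8)
    return struct.pack("<I", len(meta)) + meta + body


def read_message(buf):
    """Parse only the small metadata prefix; return a view over the data buffer."""
    view = memoryview(buf)
    (meta_len,) = struct.unpack_from("<I", view, 0)
    meta = json.loads(bytes(view[4 : 4 + meta_len]))
    start = 4 + meta_len
    # cast() reinterprets the existing bytes as int64 values without copying them.
    column = view[start : start + meta["nbytes"]].cast("q")
    return meta, column


payload = bytearray(write_message("x", array("q", [10, 20, 30])))
meta, column = read_message(payload)

# Writing through the view changes the original payload: same memory, no copy.
column[0] = 99
first_value = struct.unpack_from("=q", payload, len(payload) - column.nbytes)[0]
```

Because `column` is a view rather than a decoded copy, the cost of "receiving" the data does not grow with the number of values, which is the property the snip attributes to Arrow.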
Arrow Versus Parquet Tradeoffs
- Parquet is optimized for compact on-disk storage via dictionary encoding, run-length encoding (RLE), and general-purpose compression, all of which require decoding work on read.
- Arrow intentionally stores fully rehydrated memory layouts to avoid decode costs and favor modern high-bandwidth, parallel compute.
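To make the tradeoff concrete, here is a simplified sketch of dictionary encoding followed by run-length encoding. It is not Parquet's actual encoding (which, among other things, combines RLE with bit-packing), and the function names are hypothetical. It only shows why the encoded form is smaller and why reading it back requires a decode pass that an already-expanded in-memory layout avoids.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def rle_encode(indices):
    """Collapse runs of repeated indices into (index, run_length) pairs."""
    return [(idx, sum(1 for _ in run)) for idx, run in groupby(indices)]


def decode(dictionary, runs):
    """The read-side work: expand each run, then look up each index."""
    return [dictionary[idx] for idx, n in runs for _ in range(n)]


countries = ["us", "us", "us", "de", "de", "us", "fr", "fr", "fr", "fr"]
dictionary, indices = dictionary_encode(countries)
runs = rle_encode(indices)
restored = decode(dictionary, runs)
```

Here ten strings shrink to a three-entry dictionary plus four runs, but every read pays for `decode`. An Arrow-style layout keeps the equivalent of `restored` in memory and skips that step.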
Next-Gen File Formats Tackle Parquet Limits
- New file formats aim to replace Parquet's costly metadata and general-purpose compression with lightweight GPU/CPU-friendly encodings and better random access.
- Parquet's problems include wide schemas, costly metadata deserialization, memory needs that are unknown up front, and poor GPU decode predictability.
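The episode notes do not name specific encodings, so the following uses frame-of-reference encoding purely as one well-known example of a "lightweight" scheme. The function names and the choice of 16-bit offsets are assumptions for illustration. The sketch shows two properties the snip asks for: any element can be read without decoding its neighbours, and the decoded size is known before decoding starts.

```python
from array import array


def for_encode(values):
    """Frame-of-reference: store the minimum once plus small fixed-width offsets."""
    base = min(values)
    # "H" is an unsigned short; array raises OverflowError if an offset does not fit.
    offsets = array("H", (v - base for v in values))
    return base, offsets


def for_get(base, offsets, i):
    """Random access: one index and one add, no block-wide decompression."""
    return base + offsets[i]


def decoded_nbytes(offsets) -> int:
    """Decoded int64 size follows from the element count alone."""
    return len(offsets) * 8


timestamps = [1_700_000_000, 1_700_000_060, 1_700_000_120, 1_700_000_180]
base, offsets = for_encode(timestamps)
third = for_get(base, offsets, 2)
encoded_nbytes = len(offsets) * offsets.itemsize
```

Because every offset has the same width, the position of element `i` is a simple multiplication. That kind of regular, branch-free access is what makes such encodings friendlier to parallel hardware than a general-purpose compressor that must be run over a whole block first.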



