

The Data Stack Show
RudderStack
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
Episodes

May 24, 2023 • 58min
139: Decoupling the Execution Engine From Python’s Pandas with Aditya Parameswaran of Ponder
Highlights from this week’s conversation include:
Aditya’s background and journey in the data space (2:47)
What does Ponder do? (5:18)
101 on Pandas and why people utilize it (6:42)
The challenge of translating Pandas to a big data platform (16:11)
Data Warehouses and ML workflows (21:27)
The differences in the “zoo” of data languages (26:56)
Why do ML and data engineering have to be so different in languages? (34:39)
Builders should be adapting to the users and not the other way around (39:32)
Will we see a singular data interface in the future? (46:19)
Aditya’s most surprising discovery in his research (50:40)
Final thoughts and takeaways (53:18)
Read more of Aditya's work:
Pandas vs. SQL – Part 1: The Food Court and the Michelin-Style Restaurant
Pandas vs. SQL – Part 2: Pandas Is More Concise
Pandas vs. SQL – Part 3: Pandas Is More Flexible
Pandas vs. SQL – Part 4: Pandas Is More Convenient
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

May 22, 2023 • 4min
The PRQL: Removing the Execution Engine Language Barrier with Aditya Parameswaran of Ponder
In this bonus episode, Eric and Kostas preview their upcoming conversation with Aditya Parameswaran of Ponder.

May 17, 2023 • 1h 2min
138: Paradigm Shift: Batch to Data Streaming with A.J. Hunyady of InfinyOn
Highlights from this week’s conversation include:
A.J.’s background and journey in data (2:23)
Challenges with Hadoop ecosystem (8:50)
Starting InfinyOn and the need for innovation (10:02)
Challenges with Kafka and Microservices (14:01)
Real-time data streaming for IoT devices (19:28)
Paradigm shift to real-time data processing (22:17)
Benefits of Rust (29:45)
WebAssembly and Platform Features (36:29)
Analytics and Event Correlation (40:16)
Real-time data processing (47:03)
ETL vs ELP (52:20)
Final thoughts and takeaways (57:07)

May 15, 2023 • 5min
The PRQL: Data Infrastructure Systems and the Rust / WebAssembly Combo with A.J. Hunyady of InfinyOn
In this bonus episode, Eric and Kostas preview their upcoming conversation with A.J. Hunyady, Founder and CEO of InfinyOn.

May 10, 2023 • 59min
137: Data Collection Secrets & The Search Data Problem with Josh Wills
Highlights from this week’s conversation include:
Josh’s background in data working at Google, Slack, and other companies (1:21)
The need and process for high quality data (4:33)
Digging into auction code (14:03)
Joining Slack and working in the early days of the company (18:00)
Not fighting the last war in data (25:42)
Building a product, while using the product (30:35)
Transitioning to the search team at Slack (36:50)
Usage patterns of search (41:21)
Josh’s work in helping build DuckDB (46:20)
Having the right toolset to increase precision and efficiency (52:42)
Final thoughts and takeaways (56:03)

May 8, 2023 • 2min
The PRQL: Data Engineers in the Front End with Josh Wills
In this bonus episode, Eric previews his upcoming conversation with Josh Wills, an experienced data scientist who has worked with IBM, Google, Slack, DuckDB, and more.

May 3, 2023 • 1h
136: System Evolution from Hadoop to RocksDB with Dhruba Borthakur of Rockset
Highlights from this week’s conversation include:
Dhruba’s journey into the data space (2:02)
The impact of Hadoop on the industry (3:37)
Dhruba’s work in the early days of the Facebook team (7:54)
Building and implementing RocksDB (14:33)
Stories with Mark Zuckerberg at Facebook (24:25)
The next evolution in storage hardware (26:14)
How Rockset is different from other real-time platforms (33:13)
Going from a key value store to an index (37:15)
Where does Rockset go from here? (44:59)
The success of RocksDB as an open source project (49:11)
How do we properly steward real-time technology for impact? (51:17)
Final thoughts and takeaways (56:18)

May 1, 2023 • 3min
The PRQL: Hardware Innovation Begets Software Innovation with Dhruba Borthakur, Co-Founder and CTO of Rockset
In this bonus episode, Eric and Kostas preview their upcoming conversation with Dhruba Borthakur of Rockset.

Apr 28, 2023 • 15min
Data Council Week (Ep 7) - What’s Next for Data Council? With Pete Soderling of Data Council
Highlights from this week’s conversation include:
The origin story of Data Council (0:39)
Developments for the future of Data Council (2:42)
The emphasis of AI and ChatGPT at this year’s conference (3:54)
The support of the data community (5:31)
Biggest changes and innovations in the industry (7:10)
What’s next for the Data Council? (10:46)
Getting connected with Data Council (13:07)

Apr 27, 2023 • 40min
Data Council Week (Ep 6) - All About Debezium and Change Data Capture With Gunnar Morling of Decodable
Gunnar Morling discusses Debezium's replication of data, working with Kafka, the importance of documentation in open-source projects, and the vision moving forward. The conversation also covers the challenges of open-source CDC solutions and the importance of building a diverse system with common interfaces.


