

Data Renegades
Heavybit
Exploring data, code, culture, and everything in between.

Apr 28, 2026 • 52min
Ep. #11, Contrarian Bets and AI Skepticism with Michael Stonebraker
Michael Stonebraker, legendary database pioneer behind Ingres and PostgreSQL, shares his long arc in data systems. He discusses why specialized database engines beat one-size-fits-all platforms. He critiques AI hype around applying LLMs to messy enterprise warehouses. He recounts startup lessons, research pivots, and what keeps him curious after five decades.

Apr 7, 2026 • 38min
Ep. #10, Data Modeling Matters Most with Toby Mao
On episode 10 of Data Renegades, CL Kao and Dori Wilson sit down with Toby Mao. They explore how real-world data problems at Netflix and Airbnb led to the creation of SQLGlot and SQLMesh, and why data modeling remains the hardest challenge in data engineering. The conversation also dives into how AI is reshaping engineering workflows without replacing the need for strong architectural thinking.

Mar 3, 2026 • 59min
Ep. #9, Radical Accountability in Software with Wes McKinney
Wes McKinney, creator of Pandas and co-creator of Apache Arrow, is a data infrastructure and tooling architect. He explores how AI coding agents change software creation and discusses why data infrastructure resists full automation. He explains semantic layers and the limits of text-to-SQL, and argues that AI will raise both accountability and the bar for startups.

Feb 17, 2026 • 45min
Ep. #8, One Human Plus Agents with Scott Breitenother
Scott Breitenother, co-founder of Kilo Code and ex-founder of Brooklyn Data, builds AI coding agents and data consultancies. He talks about AI reshaping the data stack, one human working with many assistants, and AI as an exoskeleton for engineers. He covers documentation-first culture, removing bureaucratic gates, and long-running autonomous agents.

Feb 3, 2026 • 1h 13min
Ep. #7, Truth-Seeking Data Systems with Bryan Bischof
Bryan Bischof, a PhD mathematician turned AI and ML builder, has built recommendation and streaming systems for startups and taught data science. He discusses building terabyte-scale streaming inference, surprising predictive features, debugging recommendations gone wrong, experiment design to settle disputes, and why truth-seeking should guide data teams.

Jan 13, 2026 • 57min
Ep. #6, From Big Data to Curiosity-Driven Insight with Roger Magoulas
Roger Magoulas, a seasoned data industry leader who helped popularize 'big data', dives deep into the complexities facing data organizations today. He reveals how overwhelming tool pipelines hinder insight generation and advocates for a shift towards curiosity-driven insights. Roger shares his journey from early data warehousing to launching Strata and highlights the importance of storytelling in analytics. He also emphasizes hiring generalists and leveraging AI for efficiency, all while stressing the value of curiosity in data roles.

Dec 23, 2025 • 1h 10min
Ep. #5, The Identity Crisis of BI with Benn Stancil
Benn Stancil, co-founder of Mode and an insightful commentator on data and AI, dives into the evolving identity of data tools. He discusses the importance of maintaining a clear product vision and the perils of 'identity bloat' when catering to diverse user requests. Benn reflects on the challenges in data analysis and the need for better storytelling as AI influences data interpretation. He also touches on the potential for large language models to shift how executives access insights, emphasizing that meaningful data successes remain rare.

Dec 9, 2025 • 46min
Ep. #4, Streaming Made Practical with Micah Wylde
Micah Wylde, founder of Arroyo and former engineer at Cloudflare, shares his journey from building fraud detection at Sift Science to creating massive real-time systems at Lyft. He discusses the complexity of streaming systems, emphasizing that schema evolution remains a tough challenge. Micah argues for SQL-first streaming, critiques current CDC tooling, and highlights the importance of treating data outputs as products. He also predicts a shift in data architecture towards open formats and the impact of AI on data consumption.

Nov 25, 2025 • 53min
Ep. #3, Building Tools That Shape Data with Maxime Beauchemin
Maxime Beauchemin, the creator of Apache Airflow and Apache Superset, shares his journey from building data warehouses at Ubisoft to reshaping data tooling. He reveals the inspiration behind Airflow, the challenges of scalable BI, and his mission with Preset to disrupt conventional analytics. Maxime discusses the implications of AI for data practices, the future of data roles, and the importance of open-source governance in fostering healthy data teams.

Nov 25, 2025 • 57min
Ep. #2, Data Journalism Unleashed with Simon Willison
Simon Willison, a prominent open-source software developer and data journalism advocate, shares his journey from co-creating Django to building Datasette. He discusses the evolution of data journalism and highlights impactful projects like the Washington Post's opioid investigation. The conversation explores how open-source tools can empower newsrooms, the potential of AI to automate data cleaning, and the diverse ways people use Datasette. Simon also predicts a future where AI reshapes data workflows and expands what teams can do.


