The Data Exchange with Ben Lorica

Ben Lorica

A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].

Episodes

Sep 2, 2021 • 54min

The State of Data Journalism

This week our special correspondent and editor Jenn Webb speaks with Tara Kelly, Data Editor at DataJournalism.com (DJC), an organization created by the European Journalism Centre. DJC provides journalists and media groups with free resources, materials, online video courses and community forums. Most recently they created two free e-books: The Verification Handbook and an updated edition of the Data Journalism Handbook. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Aug 26, 2021 • 52min

Auditing machine learning models for discrimination, bias, and other risks

This week’s guests are Rayid Ghani, Distinguished Career Professor in the Machine Learning Department and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University, and Andrew Burt, co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance, risk mitigation, and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI.

Aug 19, 2021 • 50min

An oscilloscope for deep learning

This week’s guest is Charles Martin, independent researcher and founder of Calculation Consulting, a boutique consultancy focused on data science and machine learning. Along with Michael Mahoney and Serena Peng, Charles is co-author of a recent Nature paper on new methods for evaluating and tuning deep learning models (“Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data”).

Aug 12, 2021 • 37min

What’s new in data engineering

This week our special correspondent and editor Jenn Webb organized a mini-panel composed of Jesse Anderson, Managing Director at the Big Data Institute, and me. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. The conversation focused on key areas in data engineering.

Aug 5, 2021 • 50min

The evolution of the data science role and of data science tools

This week our managing editor Jenn Webb and I speak with Sean Taylor, Data Science Manager at Lyft. Sean was previously a research scientist and manager at Facebook, where he was instrumental in the creation and release of Prophet, a very popular open source library for time-series forecasting.

Jul 29, 2021 • 52min

Data Augmentation in Natural Language Processing

This week’s guests are Steven Feng, Graduate Student, and Ed Hovy, Research Professor, both from the Language Technologies Institute of Carnegie Mellon University. We discussed their recent survey paper on Data Augmentation Approaches in NLP (GitHub), an active field of research on techniques for increasing the diversity of training examples without explicitly collecting new data. One key reason such strategies are important is that augmented data can act as a regularizer to reduce overfitting when training models.

Jul 22, 2021 • 43min

Storage Technologies for a Multi-cloud World

This week’s guest is Brad King, CTO of Scality, a company that builds software-defined file and object storage systems for hybrid & multi-cloud settings. Storage and compute are the basic building blocks of (cloud) computing platforms and this episode highlights all the important considerations and recent innovations in storage technologies that data engineers, architects, and machine learning professionals need to know.

Jul 15, 2021 • 49min

Building a next-generation dataflow orchestration and automation system

In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users and contributors to related projects like Apache Airflow.

Jul 8, 2021 • 44min

Building a flexible, intuitive, and fast forecasting library

This week’s guests are Reza Hosseini, Staff Software Engineer, and Albert Chen, Staff Data Scientist, both at LinkedIn. Reza and Albert are part of the team behind the new open source library Greykite, a flexible and fast library for time-series forecasting.

Jul 1, 2021 • 44min

Neural Models for Tabular Data

This week’s guest is Sercan Arik, Research Scientist at Google Cloud AI. Sercan and his collaborators recently published a paper on TabNet, a deep neural network architecture for tabular data. It uses sequential attention to select features and is explainable. Based on tests Sercan and his team have run across many domains, TabNet outperforms or is on par with other models (e.g., XGBoost) on classification and regression problems.
