

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

Jul 16, 2020 • 44min
Democratizing machine learning
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.
In this episode of the Data Exchange I speak with Ameet Talwalkar, co-founder and Chief Scientist at Determined AI, and an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. A few months ago, I spoke with one of Ameet’s co-founders (Evan Sparks), around the time they announced that they were open sourcing the Determined Training Platform (DTP). Ameet and I started off by discussing the first few months of DTP as an open source project, specifically initial feedback from users, applications and use cases that they are seeing, and much more.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jul 9, 2020 • 50min
How graph technologies are being used to solve complex business problems
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.
In this episode of the Data Exchange I speak with Denise Gosnell, Chief Data Officer at DataStax. Denise is also the co-author of the new book, The Practitioner’s Guide to Graph Data, which covers foundational tools and techniques needed to utilize graph technologies in production applications. This conversation is a great introduction to what has become an important class of technologies and tools. Graph technologies are used to power a wide array of applications, including recommendation engines, fraud detection systems, identity and access management, search, and many other use cases.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jul 2, 2020 • 43min
Machines for unlocking the deluge of COVID-19 papers, articles, and conversations
In this episode of the Data Exchange I speak with Amy Heineike, Principal Product Architect at Primer.ai, a startup building machines that can read and write. Primer recently used their technology to build COVID-19 Primer, a web site that provides an overview of the latest research papers, media coverage, and social media conversations pertaining to COVID-19.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jun 25, 2020 • 34min
Designing machine learning models for both consumer and industrial applications
In this episode of the Data Exchange I speak with Christopher Nguyen, CEO of Arimo (a Panasonic company). I first met Christopher in the early days of Apache Spark; Arimo was one of the first companies to embrace Spark and make it a central component of their data platform. He was also an early proponent of exploring deep learning for enterprise applications. A serial entrepreneur, Christopher was also an Engineering Director at Google, where he was responsible for Google Apps.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jun 18, 2020 • 44min
Building open source developer tools for language applications
In this episode of the Data Exchange I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and team are the creators of popular tools like spaCy (NLP), Thinc (lightweight deep learning library), and Prodigy (annotation and active learning).
Our conversation focused on a range of topics including:
- spaCy
- Thinc
- Explosion AI and Prodigy
- Distributed computing with Ray
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jun 11, 2020 • 41min
Viewing machine learning and data science applications as sociotechnical systems
In this episode of the Data Exchange I speak with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY. He began his career in theoretical physics but he always had a strong interest in applying quantitative techniques to other disciplines. Early in his career he became interested in applications of machine learning to problems in biology and the health sciences.
Our conversation focused on a range of topics including:
- How he shifted his focus from physics to machine learning and data science.
- Applications of reinforcement learning.
- “Data scientist” as a job title, and data science training programs.
- Ethics in machine learning and data science, including training the next generation of data scientists.
- A 2015 essay written by Michael Jordan and Tom Mitchell.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

Jun 4, 2020 • 35min
Identifying and mitigating liabilities and risks associated with AI
In this episode of the Data Exchange I speak with Andrew Burt, Chief Legal Officer at Immuta and co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance and related topics. As AI and machine learning become more widely deployed, lawyers and technologists need to collaborate more closely so they can identify and mitigate liabilities and risks associated with AI. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate those risks.
Our conversation focused on a range of topics including:
- Why a law firm is the right vehicle for helping companies manage and mitigate risks associated with AI and machine learning.
- The legal profession’s long history in managing risk and regulatory frameworks.
- Model governance.
- Incident response and recovery.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

May 28, 2020 • 40min
How machine learning is being used in quantitative finance
In this episode of the Data Exchange our special correspondent and editor Jenn Webb speaks with Arun Verma, Head of Quantitative Research Solutions at Bloomberg. My first job post-academia was as lead quant in a small hedge fund. Since then, I’ve followed the industry from afar and I’ve long been interested in the role of data and models in financial services. Arun and I discussed quantitative finance when we ran into each other at the O’Reilly AI conference in London last year. He was slated to give a talk on extracting trading signals from alternative data sets, an important subject among quants.
Jenn and Arun discussed a range of topics including:
- The quantitative finance landscape.
- The challenges in identifying and using alternative data sources.
- Applications of machine learning in finance, specifically deep learning and reinforcement learning.
- New natural language models and their applications in finance.
- Model explainability and model risk management.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

May 21, 2020 • 35min
Understanding machine learning model governance
In this episode of the Data Exchange I speak with Harish Doddi, co-founder of Datatron, a startup focused on helping companies operationalize machine learning. Over the past two years, Harish has worked closely with enterprises to understand their needs in the areas of model operations and model governance. Last year Harish and I, along with David Talby, wrote two articles on these topics. In the first article, we described these emerging areas (“What are model governance and model operations?”), and in the second we listed lessons that ML engineers can draw from two highly regulated industries (“Managing machine learning in the enterprise: Lessons from banking and health care”).
As machine learning becomes widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. This means having the right set of controls and validation steps in place.
Our conversation focused on model governance and related topics:
- We discussed the three related areas of MLOps, model governance, and model observability.
- I asked Harish to describe how model governance is perceived and practiced in different industries.
- We discussed real-world examples of model governance, and organizational and staffing considerations that come into play.
- CI/CD for machine learning.
- Key enterprise features for model governance solutions.
Detailed show notes, including a full transcript, can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.

May 14, 2020 • 34min
Improving performance and scalability of data science libraries
In this episode of the Data Exchange I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC Member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book, “Python for Data Analysis”, a book that has become essential reading for both aspiring and experienced data scientists.
Our conversation focused on data science tools and other topics including:
- Two open source projects Wes has long been associated with: pandas and Apache Arrow.
- The need for a shared infrastructure for data science.
- Ursa Labs: its mission and structure.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.


