

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found on https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

May 7, 2020 • 37min
Why TinyML will be huge
In this episode of the Data Exchange I speak with Pete Warden, Staff Research Engineer at Google. Pete is a prolific author and teacher, and he has made important contributions across many open source software projects. To name just a couple of his projects: he put together the Data Science Toolkit (open data sets and open-source tools for data science), and he assembled tools to help developers get started using deep learning, long before TensorFlow and PyTorch were available. Most recently, Pete has been focused on implementing machine learning in ultra-low power systems (TinyML).

Our conversation focused on TinyML and other topics, including:

- The early days of using deep learning for computer vision.
- TensorFlow: Pete was part of the team at Google that originated TF.
- What TinyML is and why it is going to be an important topic in the years ahead.
- Privacy and security in the context of TinyML.
- Pete's new book and accompanying video series on YouTube, both designed to help developers get started building TinyML applications.

Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Apr 30, 2020 • 41min
An open source platform for training deep learning models
In this episode of the Data Exchange I speak with Evan Sparks, cofounder and CEO of Determined AI, a startup that recently open sourced a platform for training deep learning models. Many of the impressive results and applications of deep learning have happened at a handful of companies and research groups. As more companies adopt deep learning, they are discovering that infrastructure for training and transfer learning isn't widely available.

Our conversation focused on deep learning and other topics, including:

- Their decision to open source the Determined Training Platform (DTP).
- Enterprise use cases and applications of deep learning, and why Evan thinks more companies will need a platform for training DL models.
- The components that come with DTP: distributed training and hyperparameter tuning, experiment tracking and tools for collaboration and governance, a scheduler specialized for DL workflows, and more.
- Some examples of how teams have been using DTP.

Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Apr 23, 2020 • 44min
Algorithms that continually invent both problems and solutions
In this episode of the Data Exchange I speak with Kenneth Stanley, a Senior Research Manager at Uber AI and a Professor at UCF. Ken just announced that in June he will start a new research group focused on open-endedness at OpenAI. He is a pioneer in the field of neuroevolution – a method for evolving and learning neural networks through evolutionary algorithms. Ken and his colleague, Joel Lehman, wrote one of my favorite books on AI aimed at a broad audience: Why Greatness Cannot Be Planned. In this episode we discuss his upcoming move to OpenAI, as well as his recent work on open-ended algorithms.

Our conversation covered:

- Ken's new position at OpenAI.
- The transition from being a longtime academic researcher to founding and helping lead an industrial research team (Uber AI Labs).
- Open-ended algorithms, specifically his work on POET (Paired Open-Ended Trailblazer) and Enhanced POET.
- Generative Teaching Networks.

Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.

Apr 16, 2020 • 35min
Computational Models and Simulations of Epidemic Infectious Diseases
In this episode of the Data Exchange I speak with Bruno Gonçalves, a data scientist working at the intersection of Data Science and Finance. I have known Bruno for several years; we met when I recruited him to teach several extremely popular conference tutorials and talks on machine learning and deep learning. Prior to shifting over to data science, he spent several years as a researcher focused on mathematical models in Epidemiology – a field with a rich history dating as far back as the 1920s. This episode is devoted to tools and techniques for modeling epidemics.

Our conversation covered:

- Bruno's background and his experience in modeling epidemics.
- The field of epidemic models: what techniques are used, the size of the community of researchers, and how models get evaluated.
- His two recent posts: "Epidemic Modeling 101 – Or why your CoVID-19 exponential fits are wrong" and "Epidemic Modeling 102 – All CoVID-19 models are wrong, but some are useful".
- The role that epidemic models play in decision making.

Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.
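For readers who want a feel for the kind of models discussed in this episode, here is a minimal sketch of the classic SIR (susceptible–infected–recovered) compartmental model that dates to the 1920s-era literature mentioned above, integrated with a simple Euler step. The parameter values (beta, gamma, initial fractions) are illustrative assumptions, not numbers from the episode or from Bruno's posts.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the susceptible/infected/recovered fractions by one Euler step."""
    new_infections = beta * s * i * dt   # contacts between S and I
    new_recoveries = gamma * i * dt      # I recovering at rate gamma
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

def simulate_sir(s0=0.99, i0=0.01, r0=0.0, beta=0.3, gamma=0.1, days=160, dt=1.0):
    """Return the (s, i, r) trajectory over the simulation horizon."""
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(int(days / dt)):
        s, i, r = sir_step(s, i, r, beta, gamma, dt)
        trajectory.append((s, i, r))
    return trajectory

traj = simulate_sir()
peak_infected = max(i for _, i, _ in traj)
print(f"peak infected fraction: {peak_infected:.3f}")
```

Unlike a pure exponential fit, the infected fraction in this model peaks and then declines as the susceptible pool is depleted – which is the point behind the title of "Epidemic Modeling 101".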

Apr 9, 2020 • 44min
Human-in-the-loop machine learning
In this episode of the Data Exchange I speak with Rob Munro, CEO of Machine Learning Consulting and author of the forthcoming book, "Human-in-the-loop Machine Learning". If you want a copy of Rob's book, use the discount code podexchange20.

Our conversation covered:

- Rob's experience building data and machine learning products at Powerset, Idibon, and AWS.
- Natural language processing: given Rob's extensive experience as a researcher, practitioner, and entrepreneur in areas that touch on NLP, we discussed recent trends in language technologies.
- Human-in-the-loop machine learning.

Our goal in this podcast is to build a community of people interested in Data, Machine Learning and AI. If you have suggestions for us on what to recommend (books, conferences, links), and guests to book, please visit TheDataExchange.media site and fill out the "contact" form. The first five people who fill out the form get a free book from Manning (you can view Manning's catalog here).

Detailed show notes can be found on The Data Exchange web site.

Apr 2, 2020 • 40min
Next-generation simulation software will incorporate deep reinforcement learning
In this episode of the Data Exchange I speak with Chris Nicholson, founder and CEO of Pathmind, a startup applying deep reinforcement learning (DRL) to simulation problems. In a recent post I highlighted two areas where companies can begin to add DRL to their suite of tools: personalization and recommendation engines, and simulation software. My interest in the interplay between DRL and simulation software began when I came across Pathmind's work in this area.

Our conversation focused on deep reinforcement learning and its applications:

- We began with the basics: what is reinforcement learning, and why should businesses pay attention to it?
- We discussed enterprise applications of DRL, with particular emphasis on areas where Chris and Pathmind have been focused of late: business process simulation and optimization.
- Pathmind has been an early adopter of Ray and of RLlib, a popular open-source library for reinforcement learning built on top of Ray. I asked Chris why they chose to build on top of RLlib.

Detailed show notes can be found on The Data Exchange web site.
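To make the reinforcement learning basics mentioned above concrete, here is a minimal tabular Q-learning sketch: an agent learns, from reward alone, to walk right along a five-state corridor. This is a generic illustration, not Pathmind's or RLlib's code – the environment, reward, and hyperparameters are invented for the example.

```python
import random

N_STATES = 5                        # states 0..4; reaching state 4 ends the episode
ACTIONS = (1, -1)                   # move right or left
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q-table: expected discounted return for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def env_step(state, action):
    """Deterministic corridor: reward 1.0 only on reaching the final state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(500):                # training episodes
    state, done = 0, False
    while not done:
        if random.random() < EPSILON:                      # explore
            action = random.choice(ACTIONS)
        else:                                              # exploit
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = env_step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Standard Q-learning update toward reward + discounted best next value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# The greedy policy after training moves right from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # → {0: 1, 1: 1, 2: 1, 3: 1}
```

Enterprise-scale versions of this idea replace the Q-table with a neural network (the "deep" in DRL) and the toy corridor with a business-process simulator, which is where libraries like RLlib come in.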

Mar 26, 2020 • 37min
Business at the speed of AI: Lessons from Shopify
In this episode of the Data Exchange I speak with Solmaz Shahalizadeh, VP and Head of Data Science and Data Platform Engineering at Shopify. Shopify is a powerhouse in ecommerce and their technology powers over a million businesses worldwide. Solmaz is a frequent speaker and presenter at conferences throughout the world, and she has played a critical role in helping Shopify scale its data and machine learning infrastructure.

Our conversation covered many important technical and business topics, including:

- Building and scaling machine learning data products.
- Building and scaling data teams.
- Data-informed product building.

Detailed show notes can be found on The Data Exchange web site.

Mar 19, 2020 • 40min
How deep learning is being used in search and information retrieval
In this episode of the Data Exchange I speak with Edo Liberty, founder of Hypercube, a startup building tools for deploying deep learning models in search and information retrieval involving large collections. When I spoke at AI Week in Tel Aviv last November, several friends encouraged me to learn more about Hypercube – I'm glad I took their advice!

Our conversation covered several topics, including:

- Edo's experience applying machine learning and building tools for ML at places like Yale, Yahoo's Research Lab in New York, and Amazon's AI Lab.
- How deep learning is being used in search and information retrieval.
- Challenges one faces in building search and information retrieval applications when collections are large.
- Deep learning based search and information retrieval, and what Edo describes as "enterprise end-to-end deep search platforms".

Detailed show notes can be found on The Data Exchange web site.
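As a rough illustration of the pattern behind deep learning based search – embed documents and queries as vectors, then rank by similarity – here is a toy brute-force retriever. The document names and vectors are hand-made stand-ins for real model embeddings, and at the collection sizes discussed in the episode one would use an approximate nearest-neighbor index rather than a linear scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Pretend embeddings: in a real system these come from a trained model.
corpus = {
    "doc_cats":   [0.9, 0.1, 0.0],
    "doc_dogs":   [0.8, 0.3, 0.1],
    "doc_stocks": [0.0, 0.2, 0.9],
}

def search(query_vec, k=2):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.1]))  # → ['doc_cats', 'doc_dogs']
```

The "large collections" challenge in the episode is exactly that this linear scan stops being feasible: production systems trade it for approximate indexes built over the same embedding space.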

Mar 12, 2020 • 39min
The responsible development, deployment and operation of machine learning systems
In this episode of the Data Exchange I speak with Alejandro Saucedo, Engineering Director at Seldon, a startup building tools for productionizing machine learning. Alejandro is also Chief Scientist at The Institute for Ethical AI & Machine Learning, a UK-based research center that conducts "research into processes and frameworks that support the responsible development, deployment and operation of machine learning systems".

Our conversation covered Alejandro's work at both Seldon and the Institute for Ethical AI & Machine Learning:

- We discussed topic areas that the Institute focuses on, including explainability, MLOps, adversarial robustness, and privacy-preserving machine learning.
- We covered some of the recent output from the Institute, including the machine learning maturity model, their open source explainable AI library, their AI-RFX Procurement Framework, and their list of Principles for Responsible AI.
- We also discussed his role at Seldon, and areas that Seldon has been focused on.

Detailed show notes can be found on The Data Exchange web site.

Mar 5, 2020 • 35min
Hyperscaling natural language processing
In this episode of the Data Exchange I speak with Edmon Begoli, Chief Data Architect at Oak Ridge National Laboratory (ORNL). Edmon has developed and implemented large-scale data applications on systems like Open MPI, Hadoop/MapReduce, Apache Calcite, Apache Spark, and Akka. Most recently he has been building large-scale machine learning and natural language applications with Ray, a distributed execution framework that makes it easy to scale machine learning and Python applications.

Our conversation included a range of topics, including:

- Edmon's role at ORNL and his experience building applications with Hadoop and Spark.
- What is distributed online learning?
- Why they started using Ray to build distributed online learning applications.
- Two important use cases: suicide prevention among US veterans and infectious disease surveillance.

Detailed show notes can be found on The Data Exchange web site. Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.


