The Data Exchange with Ben Lorica

Ben Lorica

A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].

Episodes

Feb 27, 2020 • 36min

What businesses need to know about model explainability

In this episode of the Data Exchange I speak with Krishna Gade, founder and CEO at Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook.

Our conversation covered a range of topics, including:
- Krishna’s background as an engineering manager at Facebook and Pinterest.
- Why Krishna decided to start a company focused on explainability.
- Guidelines for companies that want to begin incorporating model explainability into their data products.
- The relationship between model explainability (transparency) and security (ML that can resist adversarial attacks).

Detailed show notes can be found on The Data Exchange web site.

Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.

Feb 20, 2020 • 36min

Scalable Machine Learning, Scalable Python, For Everyone

In this episode of the Data Exchange I speak with Dean Wampler, Head of Developer Relations at Anyscale, the startup founded by the creators of Ray. Ray is a distributed execution framework that makes it easy to scale machine learning and Python applications. It has a very simple API, and as someone who uses both Python and machine learning, I have found Ray a wonderful addition to my toolbox. Dean has long been one of my favorite architects, speakers, and teachers, and we have known each other since the early days of Apache Spark. He has authored numerous books and is known for his interest in Scala and programming languages, as well as in software architecture.

Our conversation spanned many topics, including:
- What is Ray and why should someone consider using it?
- The first Ray Summit (May 27-28 in San Francisco).
- Dean’s first impressions of Ray, and his journey from Scala to Python.
- An update on Ray’s core libraries, Ray on Windows, and distributed training with Ray.

Detailed show notes can be found on The Data Exchange web site.

For more on Ray and scalable machine learning and Python, come hear from Dean Wampler, Michael Jordan, Ion Stoica, Manuela Veloso, Wes McKinney, and many other leading developers and researchers at the first Ray Summit in San Francisco (May 27-28).

Feb 13, 2020 • 34min

Computational humanness, analogy and innovation, and soft concepts

In this episode of the Data Exchange I speak with Dafna Shahaf, Associate Professor at the School of Computer Science and Engineering, the Hebrew University of Jerusalem. She also runs the hyadata lab, a research group that consistently produces unique and interesting projects at the intersection of computer science, data, and the social sciences.

Our conversation covered a range of topics, including:
- Computational analogy: Dafna and her students mine online sources like patent filings, research papers, and data from crowdsourcing platforms focused on innovation, and in the process they produce tools that should be of interest to innovation officers and members of innovation labs.
- Soft concepts: Dafna has continued her work on computational humor, and she and her students have new tools for automatically finding trivia facts in Wikipedia.
- An upcoming workshop on Innovative Ideas in Data Science (April 20th in Taipei; the deadline to submit proposals is 21 February 2020).

Detailed show notes can be found on The Data Exchange web site.

Feb 6, 2020 • 33min

Building domain specific natural language applications

In this episode of the Data Exchange I speak with David Talby, co-creator of Spark NLP, an open source, highly scalable, production grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent advances in research in large-scale natural language models, there is strong interest in domain specific natural language applications. Besides their work on Spark NLP, David and his collaborators are building natural language models tuned specifically for healthcare applications.

Our conversation spanned many topics, including:
- Spark NLP: its current status and some common and surprising use cases.
- Recent developments in NLP research and their implications for companies.
- Spark NLP for Healthcare.

Detailed show notes can be found on The Data Exchange web site.

Jan 30, 2020 • 42min

The state of privacy-preserving machine learning

In this episode of the Data Exchange I speak with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning. He is also behind TF Encrypted, an open source framework for encrypted machine learning in TensorFlow. The rise of privacy regulations like CCPA and GDPR, combined with the growing importance of ML, has led to a strong interest in tools and techniques for privacy-preserving machine learning among researchers and practitioners. Morten brings the unique perspective of being a longtime security researcher who has also worked as a data scientist in industry.

Our conversation spanned many topics, including:
- Morten’s unique background as an experienced security researcher, developer, and data scientist.
- The current state of TF Encrypted.
- Federated learning (FL) and secure aggregation for FL.
- Privacy-preserving ML solutions will employ a variety of techniques, and thus we also discussed related topics such as differential privacy, homomorphic encryption, and RISELab’s stack for coopetitive learning (MC2).

Detailed show notes can be found on The Data Exchange web site.

Jan 23, 2020 • 38min

Taking messaging and data ingestion systems to the next level

Sijie Guo on how Apache Pulsar is able to handle both queuing and streaming, and both online and offline applications.

In this episode of the Data Exchange I speak with Sijie Guo, founder of StreamNative, a new startup focused on making enterprise messaging technologies - specifically Apache Pulsar - easy to use on the cloud. Sijie was previously a cofounder of Streamlio (acquired by Splunk) and prior to that he led the messaging team at Twitter. He is also the main organizer behind the Pulsar Summit (April in San Francisco), a new conference whose Call for Speakers closes on January 31st.

Our conversation spanned many topics, including:
- The role of messaging in modern data applications and platforms.
- The two main types of messaging applications: queuing and streaming.
- Apache Pulsar as a unified messaging platform, able to handle both queuing and streaming, and both online and offline applications.
- A status update on Apache Pulsar.

Detailed show notes can be found on The Data Exchange web site.

Jan 16, 2020 • 41min

Business at the speed of AI: Lessons from Rakuten

The Data Exchange Podcast: Bahman Bahmani on attracting and retaining talent, and the importance of delivery-oriented teams.

In this episode of the Data Exchange I speak with Bahman Bahmani, VP of Data Science and Engineering at Rakuten, a large Japanese ecommerce and online retail company. When I first met Bahman several years ago, he was finishing up his Computer Science PhD at Stanford, and at the time he was giving technical talks on machine learning algorithms and their applications to computer security. Today he leads a large team at Rakuten, and in my opinion he has established an organizational structure, processes, and an AI practice that other companies should study.

Our conversation spanned many topics, including:
- The impact that AI, machine learning, and data have had on Rakuten’s businesses.
- Attracting, nurturing, and retaining talent in an environment where data scientists, data engineers, and analysts all have many other options.
- The trio of strategic options: operational excellence, product leadership, customer intimacy.
- Organization and culture, including key roles within an AI practice.
- The power of delivery-oriented teams with end-to-end responsibility.

Detailed show notes can be found on The Data Exchange web site.

Jan 9, 2020 • 30min

The combination of the right software and commodity hardware will prove capable of handling most machine learning tasks

In this episode of the Data Exchange I speak with Nir Shavit, Professor of EECS at MIT, and cofounder and CEO of Neural Magic, a startup that is creating software to enable deep neural networks to run on commodity CPUs (at GPU speeds or faster). Their initial products are focused on model inference, but they are also working on similar software for model training.

Our conversation spanned many topics, including:
- Neurobiology, in particular the combination of Nir’s research areas of multicore software and connectomics, a branch of neurobiology.
- Why he believes the combination of the right software and CPUs will prove capable of handling many deep learning tasks.
- Speed is not the only factor: the “unlimited memory” of CPUs can unlock larger problems and architectures.
- Neural Magic’s initial offering is in inference; model training using CPUs is also on the horizon.

Detailed show notes can be found on The Data Exchange web site.

Dec 26, 2019 • 36min

Key AI and Data Trends for 2020

In this episode of the Data Exchange, I speak with my podcast co-organizer Mikio Braun, data scientist at GetYourGuide, and a former machine learning researcher and data architect. Mikio and I go out on a limb and speculate about new trends in AI and Data that we think people should pay attention to in 2020.

Our conversation spanned many topics, and we listed trends in:
- Models: reinforcement learning, deep learning, language models, and related topics.
- Applications: including emerging use cases for reinforcement learning.
- Infrastructure and Tools: end-to-end machine learning platforms, the importance of distributed computing, etc.
- Managing risks: privacy, security, safety, fairness, etc.
- Emerging technologies to watch for in 2020.

Detailed show notes can be found on The Data Exchange web site.

Dec 12, 2019 • 36min

The evolution of TensorFlow and of machine learning infrastructure

In this episode of the Data Exchange I speak with Rajat Monga, one of the founding members of the TensorFlow Engineering team. Up until recently Rajat was the engineering manager for TensorFlow at Google.

Our conversation spanned many topics, including:
- TFX, a production scale machine learning platform based on TensorFlow.
- Distributed training.
- MLIR (Multi-Level Intermediate Representation), “a representation format and library of compiler utilities that sits between the model representation and low-level compilers/executors that generate hardware-specific code.”
- Deep learning in the enterprise.
- The state of machine learning infrastructure.

Full show notes can be found on The Data Exchange web site.
