Linear Digressions

Katie Malone
Jan 5, 2020 • 25min

Data scientists: beware of simple metrics

Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentimes new data scientists only get experience with a few kinds of metrics when they’re learning and those metrics have real shortcomings when you think about what they tell you, or don’t, about how well you’re really solving the underlying problem. This episode takes a step back and says, what are some metrics that are popular with data scientists, why are they popular, and what are their shortcomings when it comes to the real world? There’s been a lot of great thinking and writing recently on this topic, and we cover a lot of that discussion along with some perspective of our own. Relevant links: https://www.fast.ai/2019/09/24/metrics/ https://arxiv.org/abs/1909.12475 https://medium.com/shoprunner/evaluating-classification-models-1-ff0730801f17 https://hbr.org/2019/09/dont-let-metrics-undermine-your-business
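As a quick numeric illustration of the kind of shortcoming discussed in this episode (our toy example, not one from the episode or the linked posts): on an imbalanced problem, a model that learns nothing can still score well on accuracy.

```python
# A classifier that always predicts the majority class looks
# deceptively good under accuracy when classes are imbalanced.
y_true = [0] * 95 + [1] * 5   # 95% negative, 5% positive
y_pred = [0] * 100            # model never predicts the positive class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(accuracy)  # 0.95 -- looks strong
print(recall)    # 0.0  -- but it finds zero positive cases
```

This is exactly why a single popular metric, evaluated in isolation, can paper over a model that fails at the underlying problem.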
Dec 30, 2019 • 26min

Communicating data science, from academia to industry

For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, who is leading an effort to start an open-access Data Science Review journal in the model of the Harvard Business Review or Law Review. This episode features Xiao-Li talking about the need he sees for a central gathering place for data scientists in academia, industry, and government to come together to learn from (and teach!) each other. Relevant links: https://hdsr.mitpress.mit.edu/
Dec 23, 2019 • 19min

Optimizing for the short-term vs. the long-term

When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks—having a big sale might increase sales this week, but doing that repeatedly will teach customers to wait until there’s a sale and never buy anything at full price, which could ultimately drive down revenue in the long term. Increasing the volume of ads on a website might lead people to click on more ads in the short term, but in the long term they’ll be more likely to visually block the ads out and learn to ignore them. But these long-term effects aren’t apparent from the short-term experiment, so this week we’re talking about a paper from Google research that confronts the short-term vs. long-term tradeoff, and how to measure long-term effects from short-term experiments. Relevant links: https://research.google/pubs/pub43887/
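To make the tradeoff concrete, here's a toy simulation (entirely made-up numbers, not from the Google paper): a change with an immediate lift that decays as users adapt, plus a small ongoing cost, can look like a clear win over a two-week test window yet be net negative over a year.

```python
# Toy model: immediate lift that decays as users adapt,
# plus a small steady long-run cost (e.g. ad blindness).
# All numbers are invented for illustration.
baseline_lift = 20.0   # extra revenue per day right after the change
decay = 0.97           # users adapt: lift shrinks 3% per day
daily_cost = 2.0       # ongoing long-run cost per day

def daily_effect(day):
    return baseline_lift * decay**day - daily_cost

two_week_effect = sum(daily_effect(d) for d in range(14))
one_year_effect = sum(daily_effect(d) for d in range(365))

print(two_week_effect > 0)  # the short experiment calls it a win
print(one_year_effect < 0)  # but the long-run total is a loss
```

A two-week experiment only ever sees the left edge of that curve, which is the core measurement problem the paper tackles.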
Dec 16, 2019 • 28min

Interview with Prof. Andrew Lo, on using data science to inform complex business decisions

This episode features Prof. Andrew Lo, the author of a paper that we discussed recently on Linear Digressions, in which Prof. Lo uses data to predict whether a medicine in the development pipeline will eventually go on to win FDA approval. This episode gets into the story behind that paper: how the approval prospects of different drugs inform the investment decisions of pharma companies, how to stitch together siloed and incomplete datasets to form a coherent picture, and how the academics building some of these models think about when and how their work can make it out of academia and into industry. Professor Lo is an expert in business (he teaches at the MIT Sloan School of Management) and work like his shows how data science can open up new ways of doing business. Relevant links: https://hdsr.mitpress.mit.edu/pub/ct67j043
Dec 8, 2019 • 25min

Using machine learning to predict drug approvals

One of the hottest areas in data science and machine learning right now is healthcare: the size of the healthcare industry, the amount of data it generates, and the myriad improvements possible in the healthcare system lay the groundwork for compelling, innovative new data initiatives. One spot that drives much of the cost of medicine is the riskiness of developing new drugs: drug trials can cost hundreds of millions of dollars to run and, especially given that numerous medicines end up failing to get approval from the FDA, pharmaceutical companies want to have as much insight as possible about whether a drug is more or less likely to make it through clinical trials and on to approval. Professor Andrew Lo and collaborators at the MIT Sloan School of Management are taking a look at this prediction task using machine learning, and have an article in the Harvard Data Science Review showing what they were able to find. It’s a fascinating example of how data science can be used to address business needs in creative but very targeted and effective ways. Relevant links: https://hdsr.mitpress.mit.edu/pub/ct67j043
Dec 2, 2019 • 43min

Facial recognition, society, and the law

Not too long ago, facial recognition in everyday life seemed far off. Now it’s being deployed and advanced widely and quickly, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting and provocative, and even if you don’t agree with every point they make, or harbor some skepticism, there’s a lot to think about in what they’re saying.
Nov 25, 2019 • 28min

Lessons learned from doing data science, at scale, in industry

If you’ve taken a machine learning class, or read up on A/B tests, you likely have a decent grounding in the theoretical pillars of data science. But if you’re in a position to have actually built lots of models or run lots of experiments, there’s almost certainly a bunch of extra “street smarts” insights you’ve had that go beyond the “book smarts” of more academic studies. The data scientists at Booking.com, who build models and run experiments constantly, have written a paper that bridges the gap and talks about what non-obvious things they’ve learned from that practice. In this episode we read and digest that paper, talking through the gotchas that they don’t always teach in a classroom but that make data science tricky and interesting in the real world. Relevant links: https://www.kdd.org/kdd2019/accepted-papers/view/150-successful-machine-learning-models-6-lessons-learned-at-booking.com
Nov 18, 2019 • 36min

Varsity A/B Testing

When you want to understand if doing something causes something else to happen, like if a change to a website causes a dip or rise in downstream conversions, the gold standard analysis method is to use randomized controlled trials. Once you’ve properly randomized the treatment and effect, the analysis methods are well-understood and there are great tools in R and Python (and other languages) to find the effects. However, when you’re operating at scale, the logistics of running all those tests, and reaching correct conclusions reliably, becomes the main challenge—making sure the right metrics are being computed, knowing when to stop an experiment, minimizing the chances of finding spurious results, and many other issues that are simple to track for one or two experiments but become real challenges for dozens or hundreds of experiments. Nonetheless, the reality is that there might be dozens or hundreds of experiments worth running. So in this episode, we’ll work through some of the most important issues for running experiments at scale, with strong support from a series of great blog posts from Airbnb about how they solve this very issue. For some blog post links relevant to this episode, visit lineardigressions.com
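One of the scale problems mentioned here, spurious results, can be illustrated with a quick simulation (ours, not from the Airbnb posts): run enough experiments where nothing is actually happening, and at a 5% significance threshold roughly 5% of them will still come back as "wins" by chance. A multiple-testing correction like Bonferroni (one standard remedy, shown here as an example) shrinks the threshold as the experiment count grows.

```python
import random

random.seed(0)
alpha, n_experiments = 0.05, 200

# Simulate A/A "experiments" with no true effect: under the null
# hypothesis, each experiment's p-value is uniform on [0, 1].
p_values = [random.random() for _ in range(n_experiments)]

naive_hits = sum(p < alpha for p in p_values)
bonferroni_hits = sum(p < alpha / n_experiments for p in p_values)

print(naive_hits)       # roughly alpha * n_experiments spurious "wins"
print(bonferroni_hits)  # far fewer after correcting for multiple tests
```

At one or two experiments this barely matters; at hundreds, uncorrected thresholds guarantee a steady stream of false discoveries, which is exactly the kind of logistics-of-scale issue the episode digs into.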
Nov 11, 2019 • 25min

The Care and Feeding of Data Scientists: Growing Careers

This is the third and final installment of a conversation with Michelangelo D’Agostino, VP of Data Science and Engineering at Shoprunner, about growing and mentoring data scientists on your team. Some of our topics of conversation include how to institute hack time as a way to learn new things, what career growth looks like in data science, and how to institutionalize professional growth as part of a career ladder. As with the other episodes in this series, the topics we cover today are also covered in the O’Reilly report linked below. Relevant links: https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf
Nov 4, 2019 • 20min

The Care and Feeding of Data Scientists: Recruiting and Hiring Data Scientists

This week’s episode is the second in a three-part interview series with Michelangelo D’Agostino, VP of Data Science at Shoprunner. This discussion centers on building a team, which means recruiting, interviewing and hiring data scientists. Since data science talent is in such high demand, and data scientists are understandably choosy about where they go to work, a good recruiting and hiring program can have a big impact on the size and quality of the team. Our chat covers a couple of sections in our co-authored O’Reilly report, “The Care and Feeding of Data Scientists,” which you can read at the link below. https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf
