Linear Digressions
Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Jun 26, 2017 • 20min
Factorization Machines
What do you get when you cross a support vector machine with matrix factorization? You get a factorization machine, and a darn fine algorithm for recommendation engines.
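The description doesn't spell out the model, so here is a minimal sketch of the standard second-order factorization machine prediction, assuming the usual form: a bias, a linear term, and pairwise interactions whose weights are dot products of per-feature latent vectors. The names fm_predict and fm_predict_naive and all the numbers are made up for illustration.

```python
def fm_predict_naive(x, w0, w, V):
    """Direct form: bias + linear term + every pairwise interaction."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight is the dot product of the two latent vectors.
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_predict(x, w0, w, V):
    """Same prediction in O(k * n) using the sum-of-squares identity."""
    n, k = len(x), len(V[0])
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += 0.5 * (s * s - s_sq)
    return total


# Hypothetical feature vector and parameters (4 features, 2 latent factors).
x = [1.0, 0.0, 1.0, 2.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

The latent vectors are the matrix factorization half of the cross: two features that never co-occur in the training data can still get an interaction weight.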
Jun 19, 2017 • 16min
Anscombe's Quartet
Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different. It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.
Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur. It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics. In other words, Anscombe's Quartets can be generated at-will and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.
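To see this for yourself, here is a small sketch that computes the summary statistics for the four datasets using only the Python standard library. The values are the commonly reproduced ones from Anscombe's 1973 paper; the helper names (pearson, summarize) are just for illustration.

```python
from statistics import mean, variance

X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows of numbers agree closely, which is exactly the problem: only a scatter plot shows that one dataset is a noisy line, one is a curve, and two are dominated by a single outlier.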
Jun 12, 2017 • 19min
Traffic Metering Algorithms
Explore the fascinating world of traffic on-ramp metering systems and how they control the flow onto highways. Discover the paradox where delaying individual cars can actually speed up overall travel time. Dive into the differences between isolated and dynamic algorithms, and learn how driver behavior causes ripple effects in traffic. Plus, get insight into regional implementations and innovative European traffic designs. Whether you're a traffic nerd or just curious, this discussion will change how you view highway access!
Jun 5, 2017 • 20min
Page Rank
The year: 1998. The size of the web: 150 million pages. The problem: information retrieval. How do you find the "best" web pages to return in response to a query? A graduate student named Larry Page had an idea for how it could be done better and created a search engine as a research project. That search engine was called Google.
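For a rough feel for the idea, here is a minimal sketch of PageRank by power iteration on a tiny made-up link graph. The graph, the function name pagerank, and the iteration count are assumptions for illustration; the damping factor of 0.85 is the commonly cited default. It is not how a production search engine computes anything.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links to (every target must be a key)."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share from the "random jump".
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page "c" comes out on top because three pages link to it, and "d" lands at the bottom because nothing links to it at all.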
May 29, 2017 • 20min
Fractional Dimensions
We chat about fractional dimensions, and what the actual heck those are.
May 22, 2017 • 22min
Things You Learn When Building Models for Big Data
As more and more data gets collected seemingly every day, and data scientists use that data for modeling, the technical limits associated with machine learning on big datasets keep getting pushed back. This week is a first-hand case study in using scikit-learn (a popular Python machine learning library) on multi-terabyte datasets, which is something that Katie does a lot for her day job at Civis Analytics. There are a lot of considerations for doing something like this--cloud computing, artful use of parallelization, considerations of model complexity, and the computational demands of training vs. prediction, to name just a few.
May 15, 2017 • 18min
How to Find New Things to Learn
If you're anything like us, you a) are always curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible). We hope this podcast is a part of the solution for you, but if you're looking to go further (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way. Boring old PDFs full of inscrutable math notation, your days are numbered!
May 8, 2017 • 14min
Federated Learning
Explore the fascinating world of Federated Learning, where algorithms learn from distributed data while ensuring user privacy. Discover how mobile devices, like smartphones, transform into powerful platforms for machine learning, capturing user interactions without compromising security. Learn about the challenges of decentralized, imbalanced data, and how phones send compressed updates instead of raw data to maintain efficiency. Delve into the innovative workflow that protects user information while enhancing features like autocomplete and photo predictions!
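The workflow described above can be sketched as a toy federated averaging loop. This is a simplified illustration under stated assumptions, not the production system: the model is a single weight for y = w * x, each "phone" is just a list of (x, y) pairs, and the compression step is left out. The names local_update and federated_round are made up for this example.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Train on one device's data; return only the weight change and example count."""
    w = weight
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of squared error for y = w * x
            w -= lr * grad
    return w - weight, len(data)


def federated_round(weight, clients):
    """Server step: average the clients' updates, weighted by how much data each has."""
    updates = [local_update(weight, data) for data in clients]
    total = sum(count for _, count in updates)
    avg_delta = sum(delta * count for delta, count in updates) / total
    return weight + avg_delta


# Hypothetical, imbalanced client datasets, all drawn from y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

weight = 0.0
for _ in range(20):
    weight = federated_round(weight, clients)
print(round(weight, 4))
```

The raw (x, y) pairs never leave local_update; the server only ever sees a weight delta and a count from each client.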
May 1, 2017 • 18min
Word2Vec
Word2Vec is probably the go-to algorithm for vectorizing text data these days. Which makes sense, because it is wicked cool. Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually). And all that's before we get to the part about how Word2Vec allows you to do algebra with text. Seriously, this stuff is cool.
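As a small illustration of the skip-gram setup and of swapping the multiclass problem for a binary one, here is a sketch that builds the training examples only (no neural network). The function names are made up, and the negatives are drawn uniformly from the vocabulary for simplicity, which is an assumption; real implementations typically use a frequency-based sampling distribution.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Pair each word with every neighbor within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def with_negative_samples(pairs, vocab, num_negative=2, seed=0):
    """Turn 'predict the context word' into 'is this pair real?' (label 1 or 0)."""
    rng = random.Random(seed)
    examples = []
    for center, context in pairs:
        examples.append((center, context, 1))
        for _ in range(num_negative):
            # Uniform draw; may occasionally pick a true neighbor by chance.
            examples.append((center, rng.choice(vocab), 0))
    return examples


tokens = "the quick brown fox".split()
pairs = skipgram_pairs(tokens, window=1)
examples = with_negative_samples(pairs, sorted(set(tokens)))
print(pairs)
print(len(examples), "binary training examples")
```

A binary classifier trained on examples like these is never used for its predictions; the word vectors are the weights it learns along the way.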
Apr 24, 2017 • 17min
Feature Processing for Text Analytics
It seems like every day there are more and more machine learning problems that involve learning on text data, but text itself makes for fairly lousy inputs to machine learning algorithms. That's why there are text vectorization algorithms, which re-format text data so it's ready to use for machine learning. In this episode, we'll go over some of the most common and useful ways to preprocess text data for machine learning.
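The description doesn't list which methods the episode covers, so as one assumed example, here is a minimal bag-of-words vectorizer: build a vocabulary from the documents, then turn each document into a vector of word counts. The function names and the tokenization rule (lowercase, split on non-alphanumerics) are choices made for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_vocabulary(documents):
    """Assign each distinct token a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Count how often each vocabulary word appears; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


documents = ["The cat sat on the mat.", "The dog sat."]
vocabulary = fit_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
print(vocabulary)
print(vectors)
```

Word order is thrown away entirely, which is the main limitation that embedding methods like Word2Vec try to address.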


