Linear Digressions

Katie Malone
undefined
Nov 28, 2016 • 13min

Recurrent Neural Nets

This week, we're doing a crash course in recurrent neural networks--what the structural pieces are that make a neural net recurrent, how that structure helps RNNs solve certain time series problems, and the importance of forgetfulness in RNNs. Relevant links: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
undefined
Nov 21, 2016 • 17min

Stealing a PIN with signal processing and machine learning

Want another reason to be paranoid when using the free coffee shop wifi? Allow us to introduce WindTalker, a system that cleverly combines a dose of signal processing with a dash of machine learning to (potentially) steal the PIN from your phone transactions without ever having physical access to your phone. This episode has it all, folks--channel state information, ICMP echo requests, low-pass filtering, PCA, dynamic time warps, and the PIN for your phone.
undefined
Nov 14, 2016 • 16min

Neural Net Cryptography

Cryptography used to be the domain of information theorists and spies. There's a new player now: neural networks. Given the task of communicating securely, neural networks are inventing new encryption methods that, as best we can tell, are unlike anything humans have ever seen before. Relevant links: http://arstechnica.co.uk/information-technology/2016/10/google-ai-neural-network-cryptography/ https://arxiv.org/pdf/1610.06918v1.pdf
undefined
Nov 7, 2016 • 20min

Deep Blue

In 1997, Deep Blue was the IBM algorithm/computer that did what no one, at the time, though possible: it beat the world's best chess player. It turns out, though, that one of the most important moves in the matchup, where Deep Blue psyched out its opponent with a weird move, might not have been so inspired after all. It might have been nothing more than a bug in the program, and it changed computer science history. Relevant links: https://www.wired.com/2012/09/deep-blue-computer-bug/
undefined
Oct 31, 2016 • 15min

Organizing Google's Datasets

If you're a data scientist, there's a good chance you're used to working with a lot of data. But there's a lot of data, and then there's Google-scale amounts of data. Keeping all that data organized is a Google-sized task, and as it happens, they've built a system for that organizational challenge. This episode is all about that system, called Goods, and in particular we'll dig into some of the details of what makes this so tough. Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45390.pdf
undefined
Oct 24, 2016 • 26min

Fighting Cancer with Data Science: Followup

A few months ago, Katie started on a project for the Vice President's Cancer Moonshot surrounding how data can be used to better fight cancer. The project is all wrapped up now, so we wanted to tell you about how that work went and what changes to cancer data policy were suggested to the Vice President. See lineardigressions.com for links to the reports discussed on this episode.
undefined
Oct 17, 2016 • 12min

The 19-year-old determining the US election

Sick of the presidential election yet? We are too, but there's still almost a month to go, so let's just embrace it together. This week, we'll talk about one of the presidential polls, which has been kind of an outlier for quite a while. This week, the NY Times took a closer look at this poll, and was able to figure out the reason it's such an outlier. It all goes back to a 19-year-old African American man, living in Illinois, who really likes Donald Trump... Relevant Links: http://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html followup article from LA Times, released after recording: http://www.latimes.com/politics/la-na-pol-daybreak-poll-questions-20161013-snap-story.html
undefined
Oct 9, 2016 • 14min

How to Steal a Model

What does it mean to steal a model? It means someone (the thief, presumably) can re-create the predictions of the model without having access to the algorithm itself, or the training data. Sound far-fetched? It isn't. If that person can ask for predictions from the model, and he (or she) asks just the right questions, the model can be reverse-engineered right out from under you. Relevant links: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf
undefined
Oct 3, 2016 • 17min

Regularization

Lots of data is usually seen as a good thing. And it is a good thing--except when it's not. In a lot of fields, a problem arises when you have many, many features, especially if there's a somewhat smaller number of cases to learn from; supervised machine learning algorithms break, or learn spurious or un-interpretable patterns. What to do? Regularization can be one of your best friends here--it's a method that penalizes overly complex models, which keeps the dimensionality of your model under control.
undefined
Sep 26, 2016 • 16min

The Cold Start Problem

You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier. Turns out machine learning algorithms, and especially recommendation engines, feel the same way. The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere. The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts. Relevant links: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1141&context=cis_papers

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app