The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Single Headed Attention RNN: Stop Thinking With Your Head with Stephen Merity - #325

Dec 12, 2019
Stephen Merity, an NLP and deep learning researcher at DDX Times, shares insights into his work on the Single Headed Attention RNN. He explains his motivations for developing the model and contrasts it with conventional transformers. Merity emphasizes the importance of efficient model benchmarking, describing how he made training feasible on a single GPU. He also discusses the value of diversifying AI research, encouraging exploration beyond ever-larger models, and reflects on balancing academic writing with accessibility.
ANECDOTE

Sentiment Neuron Example

  • OpenAI's Sentiment Neuron, trained solely on predicting the next character in Amazon reviews, accurately identified review sentiment.
  • It could even generate positive or negative reviews, demonstrating a deeper grasp of language.
INSIGHT

Challenging Transformer Dominance

  • Stephen Merity's research aims to challenge the dominance of transformer architectures in language modeling.
  • He uses LSTMs to achieve similar results with fewer resources, increasing accessibility for researchers.
INSIGHT

Transformer Architecture

  • Transformers use attention mechanisms, allowing each word to consider all other words for context, unlike sequential RNNs/LSTMs.
  • This enables parallelization during training but requires significant compute for both training and inference.
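The mechanism described above can be sketched in a few lines. The following is a minimal NumPy illustration of single-headed scaled dot-product attention, where every position's query scores against all keys at once; it is a simplified sketch for intuition, not Merity's SHA-RNN implementation (which combines an LSTM with a single attention head and other components).

```python
import numpy as np

def single_head_attention(q, k, v):
    """One attention head: each row of q attends over all rows of k/v.

    q, k, v: (seq_len, d) arrays. Returns a (seq_len, d) array where
    each output position is a weighted mix of all value vectors.
    """
    # Score every query against every key in one matrix product,
    # scaled by sqrt(d) to keep softmax inputs well-behaved.
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                              # (seq_len, d)
```

Because the scores for all positions are computed as one matrix product rather than a step-by-step recurrence, the whole sequence can be processed in parallel during training, but at a quadratic cost in sequence length.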