From Atari to ChatGPT: How AI Learned to Follow Instructions
Mar 9, 2026

A lively dive into how language models evolved from game-playing systems to instruction-following chatbots. The hosts explore why next-token prediction can feel conversational and where that view falls short. The conversation covers human preference labeling, reward models, and how small labeler pools shape model behavior and biases. It also looks at scaling feedback and why bigger models do not always follow instructions better.
Language Models Are Predictors Not Command Followers
- GPT-style models predict the next token rather than explicitly following instructions.
- Katie explains that a model like ChatGPT completes text (e.g., after "you are listening to" it will output "Linear Digressions"), which accounts for its autocomplete-like behavior.
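The completion behavior described above can be sketched with a toy bigram model (a hypothetical illustration, not anything from the episode): like a GPT-style model, it only predicts a likely next token given the preceding text, with no separate notion of "following an instruction".

```python
from collections import Counter, defaultdict

# Toy corpus echoing Katie's example; the repetition gives the bigram
# counts a clear "most likely next token" at each step.
corpus = ("you are listening to linear digressions "
          "you are listening to linear digressions").split()

# Count which token follows each token in the corpus.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def complete(prompt, n_tokens=2):
    """Greedily append the most frequent next token, n_tokens times."""
    tokens = prompt.split()
    for _ in range(n_tokens):
        last = tokens[-1]
        if last not in next_counts:
            break  # unseen context: nothing to predict
        tokens.append(next_counts[last].most_common(1)[0][0])
    return " ".join(tokens)

print(complete("you are listening to"))
# → you are listening to linear digressions
```

Even this trivial predictor "answers" the prompt by continuing it, which is exactly why next-token prediction can feel conversational without any instruction-following objective.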
Scale Doesn't Fix Misaligned Objectives
- The model's training objective (next-token prediction) differs from a helpfulness objective, causing misalignment despite scale.
- Ben notes that bigger models and more data (e.g., GPT-3's 175B parameters) didn't make models reliably follow user instructions.
Pairwise Preferences Beat Absolute Scores
- Humans struggle to score outputs on absolute scales but are good at pairwise preferences.
- Ben highlights preference comparisons as the core idea that enabled instruction following via reward modeling.
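The pairwise idea above is commonly formalized with a Bradley-Terry-style loss; the sketch below is a minimal illustration of that standard formulation (not code from the episode). The reward model only needs to score the preferred response higher than the rejected one, never to assign an absolute quality number.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-probability that the chosen response wins the comparison.

    P(chosen beats rejected) = sigmoid(r_chosen - r_rejected), so only the
    margin between the two scores matters, not their absolute values.
    """
    p_chosen = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    return -math.log(p_chosen)

# The loss shrinks as the margin grows ...
assert preference_loss(2.0, 0.0) < preference_loss(1.0, 0.0)
# ... and equals ln(2) when the model can't tell the responses apart.
assert abs(preference_loss(1.0, 1.0) - math.log(2)) < 1e-9
# Shifting both scores by a constant changes nothing: absolute scale is irrelevant.
assert abs(preference_loss(5.0, 3.0) - preference_loss(2.0, 0.0)) < 1e-9
```

The shift-invariance in the last assertion is the point of the snippet: humans only have to say which output they prefer, and the reward model learns a consistent ranking from those comparisons.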
