From Atari to ChatGPT: How AI Learned to Follow Instructions

Linear Digressions

Human preferences as a training signal

Ben traces the 2017 insight of using human pairwise preferences to teach agents what humans like.

Segment starts at 05:28.