From Atari to ChatGPT: How AI Learned to Follow Instructions
Linear Digressions
Applying preference learning to text tasks
Ben outlines how learning from human preferences was applied to GPT-2 for stylistic edits and summarization, improving on supervised methods.
Transcript


