From Atari to ChatGPT: How AI Learned to Follow Instructions
Linear Digressions
InstructGPT: fine-tuning GPT-3 with humans
Ben describes InstructGPT's three-step process: supervised fine-tuning on human demonstrations, training a reward model on human preference comparisons, and RL fine-tuning against that reward model.
Transcript


