Deep Papers

ChatGPT and InstructGPT: Aligning Language Models to Human Intention

Jan 18, 2023
INSIGHT

Instruction-Focused Models Change Behavior

  • InstructGPT fine-tunes LLMs to follow natural-language instructions rather than predict internet text continuations.
  • That shift makes models behave like assistants and perform diverse tasks with simple prompts.
ADVICE

Use Human Rankings To Train A Reward Model

  • Collect human preference rankings between model outputs and train a reward model to predict those preferences.
  • Then use reinforcement learning to optimize a policy model to maximize the learned reward.
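The ranking-then-RL recipe above can be sketched in miniature. Below is a toy, self-contained illustration (not InstructGPT's actual implementation) of the pairwise Bradley-Terry loss used to train a reward model from human preference pairs; the single-weight "reward model" and the feature values are hypothetical stand-ins for a real neural scorer over text.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    # Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).
    # Small when the model scores the human-preferred output higher.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def train_reward_weight(pairs, lr=0.1, steps=200):
    # Toy "reward model": one weight w scoring a scalar feature per output.
    # Each pair is (feature_of_chosen, feature_of_rejected).
    w = 0.0
    for _ in range(steps):
        for x_chosen, x_rejected in pairs:
            margin = w * x_chosen - w * x_rejected
            p = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of margin
            # Gradient of -log(sigmoid(margin)) with respect to w.
            grad = -(1.0 - p) * (x_chosen - x_rejected)
            w -= lr * grad
    return w

# Preference pairs where the chosen output has the larger feature value,
# so the learned weight should become positive.
learned_w = train_reward_weight([(2.0, 1.0), (3.0, 0.5)])
```

In the real pipeline the scalar weight is replaced by a language-model-sized network that maps (prompt, completion) to a score, and the trained scorer becomes the reward signal for the RL step.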
INSIGHT

Demonstrations Bootstrap; RLHF Fine-Tunes

  • Supervised demonstrations bootstrap model behavior by providing high-quality examples to imitate.
  • RLHF then fine-tunes behavior more subtly by optimizing for human preference scores.
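One way the "subtle fine-tuning" is kept subtle is a KL penalty that discourages the RL policy from drifting far from the supervised (SFT) model. The sketch below is a simplified per-sample version of that objective; the `beta` coefficient and the log-probability-difference KL estimate are illustrative assumptions, not the paper's exact formulation.

```python
def rlhf_objective(reward, logp_policy, logp_sft, beta=0.1):
    # Maximize the learned reward while penalizing divergence from the
    # supervised fine-tuned (SFT) policy. The per-sample KL estimate is
    # the log-probability gap between the two models on this completion.
    kl_estimate = logp_policy - logp_sft
    return reward - beta * kl_estimate
```

With the same reward, a completion the policy has pushed far above the SFT model's probability scores lower, so optimization favors improvements the reward model endorses without abandoning the demonstrated behavior.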