
Deep Papers ChatGPT and InstructGPT: Aligning Language Models to Human Intention
Jan 18, 2023
Episode notes
Instruction-Focused Models Change Behavior
- InstructGPT fine-tunes LLMs to follow natural-language instructions rather than predict internet text continuations.
- That shift makes models behave like assistants, carrying out diverse tasks from simple natural-language prompts instead of merely continuing text.
Use Human Rankings To Train A Reward Model
- Collect human preference rankings between model outputs and train a reward model to predict those preferences.
- Then use reinforcement learning to optimize a policy model to maximize the learned reward.
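The ranking step above can be sketched with a pairwise preference loss (a Bradley-Terry-style objective, consistent with how InstructGPT trains its reward model). The function and toy scores below are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def reward_model_loss(r_preferred, r_rejected):
    """Pairwise preference loss for reward-model training: push the
    reward model to score the human-preferred output above the
    rejected one. Inputs are scalar reward-model scores."""
    # -log sigmoid(r_preferred - r_rejected): near zero when the
    # preferred output already scores much higher, large otherwise.
    return -np.log(1.0 / (1.0 + np.exp(-(r_preferred - r_rejected))))

# Toy scores (hypothetical): the model agrees with the human ranking
print(reward_model_loss(2.0, 0.5))   # low loss
# ...and disagrees with it
print(reward_model_loss(0.5, 2.0))   # high loss
```

Minimizing this loss over many human-ranked pairs yields a scalar reward signal that the subsequent reinforcement-learning stage can maximize.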
Demonstrations Bootstrap; RLHF Fine-Tunes
- Supervised demonstrations bootstrap model behavior by providing high-quality examples to imitate.
- RLHF then fine-tunes behavior more subtly by optimizing for human preference scores.
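One way InstructGPT keeps the RL stage "subtle" is a per-token KL penalty toward the supervised (demonstration-trained) policy, so the model chases reward without drifting far from imitated behavior. The shaped-reward sketch below is a simplification with assumed names and an assumed `beta` value:

```python
def rl_reward(rm_score, logprob_policy, logprob_sft, beta=0.02):
    """Shaped reward for RLHF-style fine-tuning (a sketch):
    reward-model score minus a penalty for diverging from the
    supervised fine-tuned (SFT) policy. The KL term is estimated
    per token as log pi_policy(token) - log pi_sft(token)."""
    kl_estimate = logprob_policy - logprob_sft
    return rm_score - beta * kl_estimate

# Toy values (hypothetical): a well-scored output that has drifted
# somewhat from what the SFT model would have generated.
print(rl_reward(rm_score=1.2, logprob_policy=-5.0, logprob_sft=-8.0))
```

The policy optimizer (PPO in the paper) then maximizes this shaped reward, trading off human-preference score against staying close to the demonstration-bootstrapped behavior.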
