
Illustrating Reinforcement Learning from Human Feedback (RLHF)
BlueDot Narrated
Pre-training as the RLHF starting point (03:52)
Perrin Walker summarizes how a pretrained language model serves as the starting point for RLHF, including the choice of base model, with examples such as OpenAI's InstructGPT and Anthropic's models.