
Illustrating Reinforcement Learning from Human Feedback (RLHF)
BlueDot Narrated
Three core RLHF training steps
Perrin Walker outlines the three RLHF training stages: pre-training, reward model data collection, and RL fine-tuning.