
Illustrating Reinforcement Learning from Human Feedback (RLHF)
BlueDot Narrated
Iterating the reward and policy
Perrin Walker discusses optional iterative RLHF, online rankings, and evolving dynamics.
Chapter starts at 14:54