
791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert
Super Data Science: ML & AI Podcast with Jon Krohn
Challenges of Aligning Human Preferences in Reinforcement Learning from Human Feedback
This chapter covers the difficulty of capturing human preferences with reward models in RLHF, touching on The Alignment Ceiling, model-based RL versus RLHF, constitutional AI, synthesizing preference data, and handling subjective disagreements among labelers when aggregating human preferences.
Chapter begins at 22:37.