
791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert

Super Data Science: ML & AI Podcast with Jon Krohn


Challenges of Aligning Human Preferences in Reinforcement Learning from Human Feedback

This chapter examines the difficulty of capturing human preferences in reward models for RLHF, touching on the alignment ceiling, model-based RL versus RLHF, Constitutional AI, synthesizing preference data, and managing subjective disagreements among labelers when aggregating human preferences.
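
To make the "matching human preferences with reward models" part of the discussion concrete, below is a minimal sketch of reward-model preference fitting, assuming a toy PyTorch setup: the TinyRewardModel, the 16-dimensional embeddings, and the randomly generated "chosen"/"rejected" pairs are placeholders for a real language model with a scalar head and labeler-annotated response pairs, while the Bradley-Terry-style loss is the standard objective for turning aggregated pairwise preferences into a training signal.

```python
import torch
import torch.nn as nn

# Placeholder reward model: scores a fixed-size embedding of a response.
# In real RLHF the reward model is usually a full language model with a
# scalar head; this stand-in only illustrates the preference-fitting objective.
class TinyRewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one scalar reward per example


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style objective: maximize the probability that the
    # labeler-preferred ("chosen") response outscores the rejected one.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()


# Toy batch of paired (chosen, rejected) response embeddings.
torch.manual_seed(0)
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final preference loss: {loss.item():.4f}")
```

Disagreement among labelers shows up here as preference pairs that pull the reward model in opposite directions, which is one reason the chapter frames reward-model fit as a ceiling on alignment quality.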

Chapter starts at 22:37.
