
791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Insights on Reinforcement Learning and Robotics Developments
This chapter delves into how DDPO facilitates intent alignment in smaller models, improving benchmark performance and outperforming Llama in some areas. It also discusses the importance of Dexterity, Ambi Robotics, and Covariant in robotics, and how RLAIF (reinforcement learning from AI feedback) scales up fine-tuning efficiently while rectifying pre-training biases for a positive social impact.


