791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert

Super Data Science: ML & AI Podcast with Jon Krohn

Insights on Reinforcement Learning and Robotics Developments

This chapter delves into how DDPO facilitates intent alignment in smaller models, improving benchmark performance and outperforming Llama in some areas. It also discusses the robotics companies Dexterity, Ambi Robotics, and Covariant, and how RLAIF efficiently scales up fine-tuning while rectifying pre-training biases for positive social impact.

Chapter begins at 54:34.
