
791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert
Super Data Science: ML & AI Podcast with Jon Krohn
Fine-Tuning AI Models and Safety Implications
This chapter explores the fragility of models trained with reinforcement learning from human feedback (RLHF) when they are subsequently fine-tuned, and the safety concerns this raises for AI systems. It also reflects on how culture shapes AI development and discusses the exponential growth in parameter counts across models like GPT-4.
Chapter begins at 34:19.


