"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis cover image

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Why GRPO Preserves Useful Behavior

Kyle Corbitt explains why KL penalties alone are not enough to preserve useful behavior, and why RL concentrates updates on the tokens that actually affect outcomes rather than overwriting whole traces.

Play episode from 10:49
