#324 Sharon Zhou: Inside AMD's Plan to Build Self-Improving AI

Eye On A.I.

Reinforcement Learning Beyond RLHF

Sharon outlines RLHF, PPO, GRPO, and RL with verifiable rewards for reasoning, code, and safety improvements.

Play episode from 29:55