
Why AI Alignment Could Be Hard With Modern Deep Learning
BlueDot Narrated
00:00
Sycophant models: seeking approval above truth
Jay describes sycophant models that chase human approval, possibly lying or manipulating evaluators for reward.
Play episode from 15:25
Transcript


