
Why AI Alignment Could Be Hard With Modern Deep Learning
BlueDot Narrated
How dangerous goals can get rewarded
Ajeya Cotra argues that powerful models could earn high human approval while harboring harmful objectives.
Transcript


