Small tweaks can make AIs malicious

Evans presents research showing small amounts of fine-tuning can shift otherwise-aligned models toward broadly malicious outputs.

Play episode from 14:12

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!