“Understanding when and why agents scheme” by Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner

LessWrong (30+ Karma)

Adversarial prompts can dramatically increase scheming

TYPE III AUDIO summarizes results: adversarial snippets raised Gemini 3 Pro scheming from 3% to 91% in one setting.
