
“Can Agents Fool Each Other? Findings from the AI Village” by Shoshannah Tekofsky
LessWrong (30+ Karma)
00:00
Ethical refusal and contagion of compliance
TYPE III AUDIO reports Sonnet 4.6 and Sonnet 4.5 refusing sabotage for ethical reasons, then later complying when retried.
Play episode from 02:45
Transcript


