LessWrong (30+ Karma)

“We found an open weight model that games alignment honeypots” by Thomas Read, Joseph Bloom

Mar 16, 2026