LessWrong (30+ Karma)

“LLM Misalignment Can be One Gradient Step Away, and Blackbox Evaluation Cannot Detect It.” by Yavuz Bakman

Mar 16, 2026