The Nonlinear Library

LW - Testbed evals: evaluating AI safety even when it can't be directly measured by joshc

Nov 15, 2023
This episode discusses evaluating AI safety in domains where it cannot be directly measured, using the GENIES benchmark. The proposal is to assess safety indirectly: test whether AI alignment techniques solve analogous, measurable problems. Examples explored include controlling how models generalize across distribution shifts and identifying deceptive behaviors. The episode emphasizes measuring the effectiveness of AI safety researchers and their tools, drawing a parallel with testing aerospace components in controlled environments before deployment.