joshc
AI safety researcher and author of the paper 'Generalization Analogies (GENIES): A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains.'
Best podcasts with joshc
Ranked by the Snipd community
4 snips
Nov 15, 2023
• 8min
LW - Testbed evals: evaluating AI safety even when it can't be directly measured by joshc
This episode covers how to evaluate AI safety in hard-to-measure domains using the GENIES benchmark. The proposal is to apply AI alignment techniques to analogous problems and use the results to assess safety. Examples include controlling generalization across different distribution shifts and identifying deceptive behaviors. The episode emphasizes the importance of measuring how effective AI safety researchers and their tools are, drawing a parallel with testing aerospace components in controlled environments.