joshc

AI safety researcher and author of the paper 'Generalization Analogies (GENIES): A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains.'

Best podcasts with joshc

Ranked by the Snipd community

Nov 15, 2023 • 8min

LW - Testbed evals: evaluating AI safety even when it can't be directly measured by joshc

This episode discusses how to evaluate AI safety in hard-to-measure domains using the GENIES benchmark. The author proposes assessing safety by applying AI alignment techniques to analogous problems. Examples include controlling generalization across different distribution shifts and identifying deceptive behaviors. The episode emphasizes the importance of measuring the effectiveness of AI safety researchers and their tools, drawing a parallel with testing aerospace components in controlled environments.
