The Nonlinear Library

AF - Untrusted smart models and trusted dumb models by Buck Shlegeris

Nov 4, 2023
Buck Shlegeris, writer and contributor to The AI Alignment Forum, discusses the importance of capability evaluations in determining which AI models can be trusted. He suggests sorting models into two categories for safety purposes: smart but untrusted, and dumb but trusted. The episode covers the challenges of using trusted models for safety research and the need for monitoring and oversight when integrating AI.