"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Control: Using Untrusted Systems Safely with Buck Shlegeris of Redwood Research, from the 80,000 Hours Podcast

May 4, 2025
In a compelling conversation, Buck Shlegeris, CEO of Redwood Research and a leader in AI safety, discusses innovative strategies for controlling untrusted AI systems. He delves into techniques like always-on auditing and honeypotting, shedding light on how to monitor AI behaviors effectively. Buck emphasizes the risks of AI misalignment and the importance of proactive measures to ensure safe outcomes. He also explores the ethical implications of AI evolution, calling for collaborative approaches to enhance safety in this rapidly developing field.
ADVICE

Start With Basic AI Controls

  • AI companies should implement basic control measures like auditing and resampling to reduce risk.
  • Even modest efforts by small teams can substantially cut misalignment hazards.
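The audit-and-resample pattern described above can be sketched in code. This is a minimal illustration, not Redwood Research's actual protocol: `untrusted_generate` and `monitor_suspicion` are hypothetical stand-ins for an untrusted model and a trusted monitor, and the thresholds are made up for the example.

```python
import random

AUDIT_THRESHOLD = 0.9     # most-suspicious outputs escalate to human audit
RESAMPLE_THRESHOLD = 0.5  # moderately suspicious outputs get regenerated

def untrusted_generate(task, seed=None):
    # Hypothetical stand-in for an untrusted model's output.
    rng = random.Random(seed)
    return f"solution-{rng.randint(0, 999)} for {task}"

def monitor_suspicion(output):
    # Hypothetical stand-in for a trusted monitor; returns a
    # deterministic pseudo-score in [0, 1) derived from the text.
    return (sum(map(ord, output)) % 1000) / 1000

def controlled_generate(task, max_resamples=3):
    """Basic control loop: audit the most suspicious outputs,
    resample moderately suspicious ones, accept the rest."""
    for attempt in range(max_resamples):
        output = untrusted_generate(task, seed=attempt)
        score = monitor_suspicion(output)
        if score >= AUDIT_THRESHOLD:
            return ("audit", output)   # escalate to human review
        if score >= RESAMPLE_THRESHOLD:
            continue                   # discard and try again
        return ("accept", output)
    return ("audit", output)           # persistently suspicious -> audit

decision, output = controlled_generate("write sorting code")
print(decision, output)
```

The point of the sketch is that even this cheap wrapper changes the attacker's problem: a misaligned model must now produce sabotage that also scores low on the monitor, on every resample, which is the kind of risk reduction the snip attributes to modest control efforts.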
INSIGHT

Chronic Risks Need Scalable Oversight

  • AI models might sabotage tasks undetectably if humans cannot clearly assess task quality.
  • Scalable oversight techniques are critical to prevent chronic AI misbehavior.
INSIGHT

Conservatism in Control Assumptions

  • AI's ability to randomize rare attacks remains uncertain but critical to control efficacy.
  • Conservative assumptions guide safety planning until capabilities are better understood.