"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Control: Using Untrusted Systems Safely with Buck Shlegeris of Redwood Research, from the 80,000 Hours Podcast

May 4, 2025
In a compelling conversation, Buck Shlegeris, CEO of Redwood Research and a leader in AI safety, discusses innovative strategies for controlling untrusted AI systems. He delves into techniques like always-on auditing and honeypotting, shedding light on how to monitor AI behaviors effectively. Buck emphasizes the risks of AI misalignment and the importance of proactive measures to ensure safe outcomes. He also explores the ethical implications of AI evolution, calling for collaborative approaches to enhance safety in this rapidly developing field.
ADVICE

Start With Basic AI Controls

  • AI companies should implement basic control measures like auditing and resampling to reduce risk.
  • Even modest efforts by small teams can substantially cut misalignment hazards.
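The audit-and-resample pattern described above can be sketched in code. This is a minimal illustration, not Redwood Research's actual protocol: `untrusted_generate` and `monitor_suspicion` are hypothetical stand-ins for an untrusted model and a trusted monitor, and the thresholds are made up for the example.

```python
import random

AUDIT_THRESHOLD = 0.9     # most-suspicious outputs escalate to human audit
RESAMPLE_THRESHOLD = 0.5  # moderately suspicious outputs get regenerated

def untrusted_generate(task, seed=None):
    # Hypothetical stand-in for an untrusted model's output.
    rng = random.Random(seed)
    return f"solution-{rng.randint(0, 999)} for {task}"

def monitor_suspicion(output):
    # Hypothetical stand-in for a trusted monitor; returns a
    # deterministic pseudo-score in [0, 1) derived from the text.
    return (sum(map(ord, output)) % 1000) / 1000

def controlled_generate(task, max_resamples=3):
    """Basic control loop: audit the most suspicious outputs,
    resample moderately suspicious ones, accept the rest."""
    for attempt in range(max_resamples):
        output = untrusted_generate(task, seed=attempt)
        score = monitor_suspicion(output)
        if score >= AUDIT_THRESHOLD:
            return ("audit", output)   # escalate to human review
        if score >= RESAMPLE_THRESHOLD:
            continue                   # discard and try again
        return ("accept", output)
    return ("audit", output)           # persistently suspicious -> audit

decision, output = controlled_generate("write sorting code")
print(decision, output)
```

The point of the sketch is that even this cheap wrapper changes the attacker's problem: a misaligned model must now produce sabotage that also scores low on the monitor, on every resample, which is the kind of risk reduction the snip attributes to modest control efforts.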
INSIGHT

Chronic Risks Need Scalable Oversight

  • AI models might sabotage tasks undetectably if humans cannot clearly assess task quality.
  • Scalable oversight techniques are critical to prevent chronic AI misbehavior.
INSIGHT

Conservatism in Control Assumptions

  • AI's ability to randomize rare attacks remains uncertain but critical to control efficacy.
  • Conservative assumptions guide safety planning until capabilities are better understood.