AXRP - the AI X-risk Research Podcast

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

Apr 11, 2024

Buck Shlegeris and Ryan Greenblatt discuss AI control, a set of mechanisms for preventing AI from taking over the world. Topics include protocols for AI control, preventing dangerous coded communication by AIs, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.

ADVICE

Trusted Model Monitors Smarter AI

Use a trusted, less capable model to monitor actions of smarter AIs for sketchy behavior before executing their commands.
Send suspicious actions to humans for review, and escalate those the reviewers judge problematic so that unsafe behavior is prevented.
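The monitoring flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the protocol as specified in the episode: `gate_action`, `Decision`, the `threshold` value, and the toy monitor and reviewer are all hypothetical names introduced here. In practice the monitor would be a trusted, weaker model and the reviewer a human process.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    executed: bool   # the action was run
    reviewed: bool   # a human looked at the action
    escalated: bool  # the human judged the action problematic


def gate_action(
    action: str,
    monitor_score: Callable[[str], float],
    human_review: Callable[[str], bool],
    execute: Callable[[str], None],
    threshold: float = 0.5,  # assumed cutoff; not taken from the source
) -> Decision:
    """Run `action` only after the trusted monitor (and, if needed, a human) clears it."""
    # Step 1: the trusted, weaker model scores the action before it is executed.
    if monitor_score(action) < threshold:
        execute(action)
        return Decision(executed=True, reviewed=False, escalated=False)
    # Step 2: suspicious actions go to a human reviewer.
    if human_review(action):
        # Step 3: the reviewer found a problem, so escalate and do not execute.
        return Decision(executed=False, reviewed=True, escalated=True)
    # The reviewer cleared the action, so it can run.
    execute(action)
    return Decision(executed=True, reviewed=True, escalated=False)


# Toy stand-ins (hypothetical) so the sketch can be run end to end.
def toy_monitor(action: str) -> float:
    return 1.0 if "rm -rf" in action else 0.0


def toy_reviewer(action: str) -> bool:
    return "/" in action
```

For example, `gate_action("ls docs", toy_monitor, toy_reviewer, print)` executes without review, while `gate_action("rm -rf /", toy_monitor, toy_reviewer, print)` is reviewed, escalated, and never executed.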

INSIGHT

Human-Level AIs Are Likely Controllable

AI systems roughly human-level in capabilities and behavior are likely controllable with current techniques.
Humans also cause problems, yet they can be kept in check through monitoring and constraints, much as AI control proposes for AIs.

ADVICE

Restrict AI Communication Channels

Restrict AI communication to natural language and paraphrase messages using a weaker model to prevent encoded covert communication.
Have humans review a random sample of messages to detect suspicious encodings and reduce the risk of covert communication.
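A rough sketch of the paraphrase-and-audit channel described above, in Python. Everything named here is an assumption made for illustration: `relay_message`, `audit_rate`, and `toy_paraphrase` do not come from the episode, and the toy paraphraser only normalizes case and whitespace, whereas the advice calls for a weaker model that rewrites the message.

```python
import random
from typing import Callable, List, Optional


def relay_message(
    message: str,
    paraphrase: Callable[[str], str],
    audit_queue: List[str],
    audit_rate: float = 0.02,  # assumed sampling rate; not taken from the source
    rng: Optional[random.Random] = None,
) -> str:
    """Forward a paraphrased copy of `message`, sampling some originals for human review."""
    rng = rng or random.Random()
    # Randomly queue the original message so humans can look for encoded content.
    if rng.random() < audit_rate:
        audit_queue.append(message)
    # Only the paraphrase reaches the recipient, which disrupts exact-wording codes.
    return paraphrase(message)


def toy_paraphrase(message: str) -> str:
    # Hypothetical stand-in for a weaker model: lowercases and collapses whitespace.
    return " ".join(message.lower().split())
```

Sampling the original text rather than the paraphrase is a design choice in this sketch: reviewers see exactly what the sender wrote, which is where any encoding would live.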
