
AXRP - the AI X-risk Research Podcast 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
Apr 11, 2024
Buck Shlegeris and Ryan Greenblatt discuss AI control mechanisms for preventing AI from taking over the world. They cover protocols for AI control, preventing dangerous coded communication between AIs, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.
AI Snips
Trusted Model Monitors Smarter AI
- Use a trusted, less capable model to monitor the actions of smarter AIs for suspicious behavior before their commands are executed.
- Report suspicious actions to humans for review, and escalate any that reviewers judge problematic so that unsafe behavior is stopped.
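The monitoring loop described above can be sketched in a few lines. This is a minimal illustration rather than the protocol discussed in the episode: the function names, the `threshold` parameter, and the keyword-based `toy_suspicion_score` are all hypothetical stand-ins for a real trusted model and a real human review process.

```python
from typing import Callable


def monitored_execute(
    action: str,
    suspicion_score: Callable[[str], float],
    human_approves: Callable[[str], bool],
    execute: Callable[[str], None],
    threshold: float = 0.5,  # assumed cutoff; a real deployment would tune this
) -> str:
    """Gate an action proposed by an untrusted model behind a trusted monitor."""
    # The trusted, weaker model scores the action before anything runs.
    score = suspicion_score(action)
    if score < threshold:
        execute(action)
        return "executed"
    # Suspicious actions go to a human reviewer instead of running directly.
    if human_approves(action):
        execute(action)
        return "executed_after_review"
    # The reviewer judged the action problematic: do not run it, escalate.
    return "escalated"


def toy_suspicion_score(action: str) -> float:
    """Hypothetical stand-in for the trusted model: flags a few risky substrings."""
    risky = ("rm -rf", "curl", "chmod 777")
    return 1.0 if any(token in action for token in risky) else 0.0
```

In this sketch the key design choice is that nothing executes until the trusted scorer has seen it, so a flagged action can only run after explicit human approval.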
Human-Level AIs Are Likely Controllable
- AI systems roughly human-level in capabilities and behavior are likely controllable with current techniques.
- Humans also cause problems, yet they can be kept in check through monitoring and constraints, much as AI control proposes for AIs.
Restrict AI Communication Channels
- Restrict AI communication to natural language, and paraphrase each message with a weaker model to prevent encoded covert communication.
- Have humans review a random sample of messages to detect suspicious encoding and reduce the risk of covert communication.
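The two steps above, paraphrasing every message and sampling some for human review, might look roughly like the following. It is a sketch under stated assumptions: `relay_message`, `audit_rate`, and `toy_paraphrase` are hypothetical names, and the toy paraphraser only normalizes case and whitespace where a real setup would call a weaker trusted model.

```python
import random
from typing import Callable, List, Optional


def relay_message(
    message: str,
    paraphrase: Callable[[str], str],
    audit_queue: List[str],
    audit_rate: float = 0.1,  # assumed fraction of messages sent to human review
    rng: Optional[random.Random] = None,
) -> str:
    """Pass a message between AIs, paraphrased, with random human audits."""
    if rng is None:
        rng = random.Random()
    # Sample the original message for human review with probability audit_rate.
    if rng.random() < audit_rate:
        audit_queue.append(message)
    # Only the paraphrase is forwarded, which disrupts encodings that depend
    # on exact wording, spacing, or capitalization.
    return paraphrase(message)


def toy_paraphrase(message: str) -> str:
    """Hypothetical stand-in for the weaker model: normalizes case and spacing."""
    return " ".join(message.lower().split())
```

The audit queue stores the original text rather than the paraphrase, on the assumption that reviewers need the unaltered message to spot an encoding.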
