LW - The Shutdown Problem: Incomplete Preferences as a Solution by EJT

The Nonlinear Library

Addressing Incomplete Preferences in Agent Training

The chapter explores the Incomplete Preferences Proposal (IPP) as a way to address goal misgeneralization and deceptive alignment in agent training. It contrasts money-maximizing agents with IPP agents in how they adopt terminal goals, showing that the two face different behavioral incentives. It then discusses how training AI agents in line with the IPP could shape their behavior and help prevent problems such as reward misspecification and failures in deployment.

