LW - The Shutdown Problem: Incomplete Preferences as a Solution by EJT

The Nonlinear Library

Training Agents with the Incomplete Preferences Principle

This chapter explores training artificial agents with the Incomplete Preferences Principle (IPP) to avoid problems such as reward misspecification and goal misgeneralization. It contrasts the IPP with other proposals and discusses how effectively it shapes agents' preferences in the intended way. The chapter also addresses challenges such as deceptive alignment and stresses the importance of ensuring that agents can be shut down and restarted when needed.

Chapter starts at 43:17.