The Nonlinear Library

LW - The Shutdown Problem: Incomplete Preferences as a Solution by EJT

Feb 23, 2024
This episode explores incomplete preferences as a possible solution to the shutdown problem in AI alignment. It covers training agents that lack certain preferences so as to avoid unintended consequences, analyzes how AI agents make shutdown decisions, and explains the importance of distinguishing preferences. It also discusses training artificial agents under the Incomplete Preferences Proposal (IPP) to avoid problems such as reward misspecification and goal misgeneralization, and looks at agent behavior, preferences, and constraints in decision-making.