LW - The Shutdown Problem: Incomplete Preferences as a Solution by EJT

Feb 23, 2024

This episode explores incomplete preferences as an approach to the shutdown problem in AI alignment. It discusses training agents that lack certain preferences so as to avoid unintended consequences, analyzes how AI agents make shutdown decisions, and explains why distinguishing between preferences matters. It also covers training artificial agents under the Incomplete Preferences Principle (IPP) to avoid problems such as reward misspecification and goal misgeneralization, and examines agent behavior, preferences, and constraints in decision-making.

Chapters

Introduction

00:00 • 4min

Balancing Shutdown Ability and Usefulness in Artificial Agents

04:13 • 19min

Analyzing Shutdown Decisions in AI Agents

23:32 • 20min

Training Agents with Incomplete Preferences Principle

43:17 • 20min