
“On restraining AI development for the sake of safety” by Joe Carlsmith
LessWrong (30+ Karma)
(Podcast version, read by the author, here, or search for "Joe Carlsmith Audio" on your podcast app.
This is the tenth essay in a series I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, plus a bit more about the series as a whole.
I work at Anthropic, but I am here speaking only for myself and not for my employer.)
1. Introduction
In the third essay in the series, I distinguished between three key “security factors” for developing advanced AI safely, namely:
- Safety progress: our ability to develop new levels of AI capability safely.
- Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development involves.
- Capability restraint: our ability to steer and restrain AI capability development when doing so is necessary for maintaining safety.
A lot of my focus in the series has been on safety progress – and to a lesser extent, risk evaluation. In this essay, I want to look at capability restraint, in [...]
---
Outline:
(00:38) 1. Introduction
(08:18) 2. Preliminaries
(10:59) 3. AI development isn't necessarily a prisoner's dilemma
(18:03) 4. Forms of capability restraint
(19:26) 4.1. Individual capability restraint
(21:32) 4.2. Collective capability restraint
(26:25) 4.3. Treatment of ongoing AI development
(33:06) 5. Idealized capability restraint
(45:00) 6. Capability restraint in practice
(45:32) 6.1. The likelihood of serious effort
(53:57) 6.2. The efficacy of capability restraint
(55:40) 6.2.1. Compute governance
(58:05) 6.2.2. Algorithmic governance
(01:04:12) 6.2.3. Greenlighting and safety progress
(01:10:44) 6.3. Ways that capability restraint could end up net negative
(01:11:53) 6.3.1. Concentrations of power
(01:17:17) 6.3.2. Ceding competitive advantage to authoritarian countries
(01:19:29) 6.3.3. Other concerns
(01:27:13) 7. Prioritizing capability restraint relative to other security factors
(01:29:43) 8. Conclusion
(01:31:19) Appendix 1: What are we using the time for?
The original text contained 40 footnotes which were omitted from this narration.
---
First published:
March 19th, 2026
---
Narrated by TYPE III AUDIO.


