
LessWrong (30+ Karma) “On restraining AI development for the sake of safety” by Joe Carlsmith
Mar 20, 2026
Joe Carlsmith, an AI alignment researcher who wrote the featured essay, discusses capability restraint as a way to prevent loss-of-control risks. He explains why pause options matter, contrasts individual versus collective restraint, and analyzes the practicality of compute governance and algorithmic governance. Short takes cover coordination problems, green-lighting dilemmas, and how restraint could backfire.
AI Race Incentives Can Be A Stag Hunt
- AI races can resemble a stag hunt rather than a prisoner's dilemma: mutual slowing can be a stable equilibrium, provided each party expects the others to slow as well.
- Carlsmith highlights how best responses depend on what rivals do: if rivals go slow, you often prefer to go slow; if they rush, you prefer to rush.
Combine Individual Caution With Domestic Regulation
- Use a mix of individual and collective restraint: individual companies can burn their leads or drop out of the race, while regulators and norms handle collective slowdown.
- Carlsmith recommends domestic regulation as a practical lever for coordinating firms within a country.
Restraint Must Allow Safety Work And Approvals
- Capability restraint should handle several demands simultaneously: red-lighting dangerous work, enabling safety progress, green-lighting vetted capabilities, and permitting benign applications.
- Carlsmith cautions that a pure, indefinite shutdown ignores the need to allow alignment research and staged approvals.

