
LessWrong (30+ Karma) “On restraining AI development for the sake of safety” by Joe Carlsmith
Mar 20, 2026
Joe Carlsmith, an AI alignment researcher who wrote the featured essay, discusses capability restraint as a way to prevent loss-of-control risks. He explains why pause options matter, contrasts individual versus collective restraint, and analyzes the practicality of compute governance and algorithmic governance. Short takes cover coordination problems, green-lighting dilemmas, and how restraint could backfire.
AI Race Incentives Can Be A Stag Hunt
- AI races can resemble a stag hunt rather than a prisoner's dilemma: mutual slowing can be a stable equilibrium, provided each party expects the others to slow as well.
- Carlsmith highlights how best responses depend on what rivals do: if rivals go slow, you often prefer to go slow; if they rush, you prefer to rush.
Combine Individual Caution With Domestic Regulation
- Use a mix of individual and collective restraint: individual companies can burn their leads or drop out of the race, while regulators and norms handle collective slowdown.
- Carlsmith recommends domestic regulation as a practical lever for coordinating firms within a country.
Restraint Must Allow Safety Work And Approvals
- Capability restraint should handle several demands simultaneously: red-lighting dangerous work, enabling safety progress, green-lighting vetted capabilities, and permitting benign applications.
- Carlsmith cautions that a pure, indefinite shutdown ignores the need to allow alignment research and staged approvals.

