His P(Doom) Is Only 2.6% — AI Doom Debate with Bentham's Bulldog, a.k.a. Matthew Adelstein

Feb 10, 2026

Matthew Adelstein (Bentham's Bulldog), a philosopher and Substack writer on AI risk, defends a P(Doom) of just 2.6% using a multi-step probability chain. He and the host spar over alignment-by-default, the “goal engine” versus goal-wrapping debate, the risk of exfiltration and unstoppable agents, and whether current RLHF success predicts safe future systems. The discussion closes on shared policy ideas such as a possible global pause.

ANECDOTE

Started Writing Bentham’s Bulldog Young

Matthew describes starting Bentham's Bulldog as a teenager and writing daily since high school.
He attributes productivity to opening a document and writing for a few hours rather than over-editing.

ANECDOTE

How He Got a 2.6% P(Doom)

Adelstein walks through the five-step probability multiplication behind his 2.6% P(Doom), giving an explicit number for each step.
He narrates multiplying 0.9×0.3×0.3×0.4×0.8 to reach 0.026 as an illustrative calculation.
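
As a quick check on that arithmetic, here is a minimal Python sketch that multiplies the five step probabilities quoted above. The variable names are illustrative assumptions, and the steps are left unlabeled because this summary does not say what each one represents.

```python
from math import prod

# The five step probabilities quoted in the snip, in the order given.
# What each step stands for is not stated in this summary, so they are unlabeled.
step_probabilities = [0.9, 0.3, 0.3, 0.4, 0.8]

# A chained estimate multiplies the probability assigned to every step.
p_doom = prod(step_probabilities)

print(f"{p_doom:.5f}")  # 0.02592
print(f"{p_doom:.1%}")  # 2.6%
```

The unrounded product is 0.02592, which rounds to the 0.026 (2.6%) cited in the episode.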

INSIGHT

Failed Doom Warnings Inform Priors

Adelstein treats the long record of failed doom predictions as higher-order evidence that we overweight our threat intuitions.
He argues surviving prior scares should modestly reduce our initial doom credence absent decisive new evidence.

Get ready for a rematch with the one & only Bentham’s Bulldog, a.k.a. Matthew Adelstein! Our first debate covered a wide range of philosophical topics.

Today’s Debate #2 is all about Matthew’s new argument against the inevitability of AI doom. He comes out swinging with a calculated P(Doom) of just 2.6%, based on a multi-step probability chain that I challenge as potentially falling into a “Type 2 Conjunction Fallacy” (a.k.a. Multiple Stage Fallacy).

We clash on whether to expect “alignment by default” and the nature of future AI architectures. While Matthew sees current RLHF success as evidence that AIs will likely remain compliant, I argue that we’re building “Goal Engines” — superhuman optimization modules that act like nuclear cores wrapped in friendly personalities. We debate whether these engines can be safely contained, or if the capability to map goals to actions is inherently dangerous and prone to exfiltration.

Despite our different forecasts (my 50% vs his sub-10%), we actually land in the “sane zone” together on some key policy ideas, like the potential necessity of a global pause.

While Matthew’s case for low P(Doom) hasn’t convinced me, I consider his post and his engagement with me to be super high quality and good faith. We’re not here to score points, we just want to better predict how the intelligence explosion will play out.

Timestamps

00:00:00 — Teaser

00:00:35 — Bentham’s Bulldog Returns to Doom Debates

00:05:43 — Higher-Order Evidence: Why Skepticism is Warranted

00:11:06 — What’s Your P(Doom)™

00:14:38 — The “Multiple Stage Fallacy” Objection

00:21:48 — The Risk of Warring AIs vs. Misalignment

00:27:29 — Historical Pessimism: The “Boy Who Cried Wolf”

00:33:02 — Comparing AI Risk to Climate Change & Nuclear War

00:38:59 — Alignment by Default via Reinforcement Learning

00:46:02 — The “Goal Engine” Hypothesis

00:53:13 — Is Psychoanalyzing Current AI Valid for Future Systems?

01:00:17 — Winograd Schemas & The Fragility of Value

01:09:15 — The Nuclear Core Analogy: Dangerous Engines in Friendly Wrappers

01:16:16 — The Discontinuity of Unstoppable AI

01:23:53 — Exfiltration: Running Superintelligence on a Laptop

01:31:37 — Evolution Analogy: Selection Pressures for Alignment

01:39:08 — Commercial Utility as a Force for Constraints

01:46:34 — Can You Isolate the “Goal-to-Action” Module?

01:54:15 — Will Friendly Wrappers Successfully Control Superhuman Cores?

02:04:01 — Moral Realism and Missing Out on Cosmic Value

02:11:44 — The Paradox of AI Solving the Alignment Problem

02:19:11 — Policy Agreements: Global Pauses and China

02:26:11 — Outro: PauseCon DC 2026 Promo

Links

Bentham’s Bulldog Official Substack — https://benthams.substack.com

The post we debated — https://benthams.substack.com/p/against-if-anyone-builds-it-everyone

Apply to PauseCon DC 2026 via https://pauseai-us.org

Forethought Institute’s paper: Preparing for the Intelligence Explosion

Tom Davidson (Forethought Institute)’s post: How quick and big would a software intelligence explosion be?

Scott Alexander on the Coffeepocalypse Argument

---

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏

Get full access to Doom Debates at lironshapira.substack.com/subscribe