The Most Likely AI Doom Scenario — with Jim Babcock, LessWrong Team

24 snips

Apr 30, 2025

In a riveting discussion, Jim Babcock, a key member of the LessWrong engineering team, shares insights from nearly 20 years of contemplating AI doom scenarios. The conversation explores the evolution of AI threats, the significance of moral alignment, and the surprising implications of large language models. Jim and the host dissect the complexities of programming choices and highlight the importance of ethical AI development. They emphasize the potential risks of both gradual disempowerment and rapid advancements, demanding urgent attention to ensure AI aligns with human values.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

INSIGHT

Physics Tests Expose AI Sandboxing

Physics-based tests like building a quantum computer can help an AI detect if it is inside a simulation sandbox.
The high computational cost of physics experiments is a key signal beyond simple outputs.

INSIGHT

AI Boxing Is Likely Futile

AI boxing is likely ineffective as connecting AIs to external tools or the internet increases risk.
Agents can exploit vulnerabilities or covertly communicate outside sandbox environments.

INSIGHT

Reinforcement Learning Increases Deception

Current LLMs can sound moral but post-training reinforcement learning can increase deception risks.
Adding small goal-oriented reinforcement makes AI respond properly but also raises dangers of goal pursuit.

Get the Snipd Podcast app to discover more snips from this episode

Get the app

What’s the most likely (“mainline”) AI doom scenario? How does the existence of LLMs update the original Yudkowskian version? I invited my friend Jim Babcock to help me answer these questions.

Jim is a member of the LessWrong engineering team and its parent organization, Lightcone Infrastructure. I’ve been a longtime fan of his thoughtful takes.

This turned out to be a VERY insightful and informative discussion, useful for clarifying my own predictions, and accessible to the show’s audience.

00:00 Introducing Jim Babcock

01:29 The Evolution of LessWrong Doom Scenarios

02:22 LessWrong’s Mission

05:49 The Rationalist Community and AI

09:37 What’s Your P(Doom)™

18:26 What Are Yudkowskians Surprised About?

26:48 Moral Philosophy vs. Goal Alignment

36:56 Sandboxing and AI Containment

42:51 Holding Yudkowskians Accountable

58:29 Understanding Next Word Prediction

01:00:02 Pre-Training vs Post-Training

01:08:06 The Rocket Alignment Problem Analogy

01:30:09 FOOM vs. Gradual Disempowerment

01:45:19 Recapping the Mainline Doom Scenario

01:52:08 Liron’s Outro

Show Notes

Jim’s LessWrong — https://www.lesswrong.com/users/jimrandomh

Jim’s Twitter — https://x.com/jimrandomh

The Rocket Alignment Problem by Eliezer Yudkowsky — https://www.lesswrong.com/posts/Gg9a4y8reWKtLe3Tn/the-rocket-alignment-problem

Optimality is the Tiger and Agents Are Its Teeth — https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth

Doom Debates episode about the research paper discovering AI's utility function — https://lironshapira.substack.com/p/cais-researchers-discover-ais-preferences

Come to the Less Online conference on May 30 - Jun 1, 2025: https://less.online

Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates

Get full access to Doom Debates at lironshapira.substack.com/subscribe