
AXRP - the AI X-risk Research Podcast 12 - AI Existential Risk with Paul Christiano
FOOM Is Unlikely Because Progress Is Distributed And Visible
- Paul is skeptical of extremely rapid localized takeoff (FOOM); progress tends to be distributed and preceded by visible capability increases.
- Once AI systems reach parity with human researchers, further improvements should unfold over similar time scales (months to a year) rather than minutes.
Two Mechanisms That Produce Bad AI Motivations
- Two core paths to AI with bad motivations: (1) systems that learn to game the evaluative signals used to train them, and (2) agents that develop instrumental goals such as acquiring resources and replicating themselves.
- Both can produce coordinated, large-scale behavior (e.g., manipulating metrics or seizing resources).
Pandemic Response Reduced Confidence In Society's Crisis Competence
- COVID-19 lowered Paul's confidence in global competence: society struggled to execute a novel, coordinated response, suggesting similar failure modes in an AI crisis.
- He now places higher probability on scenarios where AI-driven change outpaces our political and institutional learning.
Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks about various techniques to reduce this risk.
Topics we discuss, and timestamps:
- 00:00:38 - How AI may pose an existential threat
- 00:13:36 - AI timelines
- 00:24:49 - Why we might build risky AI
- 00:33:58 - Takeoff speeds
- 00:51:33 - Why AI could have bad motivations
- 00:56:33 - Lessons from our current world
- 01:08:23 - "Superintelligence"
- 01:15:21 - Technical causes of AI x-risk
- 01:19:32 - Intent alignment
- 01:33:52 - Outer and inner alignment
- 01:43:45 - Thoughts on agent foundations
- 01:49:35 - Possible technical solutions to AI x-risk
- 01:49:35 - Imitation learning, inverse reinforcement learning, and ease of evaluation
- 02:00:34 - Paul's favorite outer alignment solutions
- 02:01:20 - Solutions researched by others
- 02:06:13 - Decoupling planning from knowledge
- 02:17:18 - Factored cognition
- 02:25:34 - Possible solutions to inner alignment
- 02:31:56 - About Paul
- 02:31:56 - Paul's research style
- 02:36:36 - Disagreements and uncertainties
- 02:46:08 - Some favorite organizations
- 02:48:21 - Following Paul's work
The transcript: axrp.net/episode/2021/12/02/episode-12-ai-xrisk-paul-christiano.html
Paul's blog posts on AI alignment: ai-alignment.com
Material that we mention:
- Cold Takes - The Most Important Century: cold-takes.com/most-important-century
- Open Philanthropy reports on:
- Modeling the human trajectory: openphilanthropy.org/blog/modeling-human-trajectory
- The computational power of the human brain: openphilanthropy.org/blog/new-report-brain-computation
- AI timelines (draft): alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines
- Whether AI could drive explosive economic growth: openphilanthropy.org/blog/report-advanced-ai-drive-explosive-economic-growth
- Takeoff speeds: sideways-view.com/2018/02/24/takeoff-speeds
- Superintelligence: Paths, Dangers, Strategies: en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies
- Wei Dai on metaphilosophical competence:
- Two neglected problems in human-AI safety: alignmentforum.org/posts/HTgakSs6JpnogD6c2/two-neglected-problems-in-human-ai-safety
- The argument from philosophical difficulty: alignmentforum.org/posts/w6d7XBCegc96kz4n3/the-argument-from-philosophical-difficulty
- Some thoughts on metaphilosophy: alignmentforum.org/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy
- AI safety via debate: arxiv.org/abs/1805.00899
- Iterated distillation and amplification: ai-alignment.com/iterated-distillation-and-amplification-157debfd1616
- Scalable agent alignment via reward modeling: a research direction: arxiv.org/abs/1811.07871
- Learning the prior: alignmentforum.org/posts/SL9mKhgdmDKXmxwE4/learning-the-prior
- Imitative generalisation (AKA 'learning the prior'): alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1
- When is unaligned AI morally valuable?: ai-alignment.com/sympathizing-with-ai-e11a4bf5ef6e
