
The Bayesian Conspiracy 33 – MIRI, and EA meta-discussion
Apr 26, 2017. The guest is Tsvi from MIRI, a researcher working on decision theory and AI alignment. He outlines MIRI's technical agendas and toy failure cases such as a Tetris-pausing agent and evolved circuits. The discussion covers logical induction, counterfactuals, the limits of transparency, and scaling human judgment with ML, and also touches on community capacity, staffing trade-offs, and why capability growth shapes long-term safety.
AI Snips
Evolved Circuits Became Accidental Radios
- Bird and Layzell evolved circuits intended to oscillate, but the search instead produced a radio that picked up ambient radio signals.
- The evolved circuit worked only in one spot in the room, showing how optimization finds brittle, environment-dependent solutions (a toy sketch follows below).
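A minimal sketch of the failure mode, not a reconstruction of Bird and Layzell's actual experiment: all names (ROOM, evolve, the fitness function) are illustrative. The genome can either generate the target oscillation itself or couple to whatever signal the "room" carries, and fitness-driven search doesn't care which.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 256)

# The behaviour we *want* to evolve: a compound oscillation.
TARGET = np.sin(t) + 0.5 * np.sin(3.7 * t + 1.0)

# The lab "room": ambient interference that happens to carry an
# oscillating signal (like the nearby equipment in the original story).
ROOM = TARGET + 0.05 * rng.normal(size=t.size)

def output(genome, ambient):
    # genome = [amplitude, frequency, phase, coupling-to-environment]
    a, w, p, c = genome
    return a * np.sin(w * t + p) + c * ambient

def fitness(genome, ambient):
    return -np.mean((output(genome, ambient) - TARGET) ** 2)

def evolve(ambient, pop=200, gens=400, sigma=0.05):
    genomes = rng.normal(size=(pop, 4))
    for _ in range(gens):
        scores = np.array([fitness(g, ambient) for g in genomes])
        elite = genomes[np.argsort(scores)[-20:]]           # keep top 10%
        parents = elite[rng.integers(0, len(elite), pop)]   # resample
        genomes = parents + sigma * rng.normal(size=(pop, 4))  # mutate
    return genomes[np.argmax([fitness(g, ambient) for g in genomes])]

best = evolve(ROOM)
print("coupling to the environment:", best[3])     # typically near 1
print("fitness in the original room:", fitness(best, ROOM))

# Move the "circuit" to another room: the ambient signal is gone and the
# evolved oscillator degrades, because evolution used the room, not the genome.
OTHER_ROOM = 0.05 * rng.normal(size=t.size)
print("fitness in a different room: ", fitness(best, OTHER_ROOM))
```

The design choice mirrors the anecdote: the low-capacity direct channel cannot fit the compound waveform, so selection pushes the coupling term toward the ambient signal, and the solution stops transferring the moment the environment changes.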
Human Oversight Breaks Down When AI Uses Opaque Concepts
- Human verification breaks down when an AI's reasoning relies on concepts humans don't share and can't inspect.
- AlphaGo's "justification" for a move, a huge search tree plus neural-net position evaluations, is an example of reasoning humans cannot verify (sketched below).
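To make the snip concrete, here is a toy sketch of search-plus-evaluation reasoning. Everything is hypothetical: the "value network" is an opaque stand-in, not AlphaGo's net, and the game is abstract. The point is only that the move's entire justification is thousands of uninterpretable numeric scores.

```python
# A toy "value network": a black box mapping positions to a score for the
# player to move. In AlphaGo this is a trained neural net; here it is a
# deterministic but opaque stand-in.
def value(position):
    return (hash(position) % 1000) / 1000.0

def negamax(position, depth, stats):
    """Depth-limited negamax over an abstract game with 5 moves per node."""
    stats["positions"] += 1
    if depth == 0:
        return value(position)
    children = [position + (m,) for m in range(5)]
    return max(-negamax(child, depth - 1, stats) for child in children)

stats = {"positions": 0}
root = ()  # positions are just tuples of the moves played so far
best_move = max(range(5), key=lambda m: -negamax(root + (m,), 4, stats))

print("chosen move:", best_move)
print("justification: the maximum over", stats["positions"],
      "opaque position evaluations -- nothing a human can audit")
```

Asked "why that move?", the only honest answer this system can give is the tree of scores itself, which is exactly the verification problem the snip describes.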
Work On AI Transparency Now
- Invest in AI transparency and interpretability research so that AI decisions become inspectable.
- Tsvi cites existing ML tools, such as visualizing convolutional-net features, as early steps, but warns that full transparency at scale remains unsolved (see the sketch below).
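A minimal sketch of the kind of tool the snip mentions, assuming PyTorch and torchvision rather than whatever specific tool was discussed on the episode: plotting a pretrained network's first-layer convolutional filters, which typically resolve into recognizable edge and color detectors. This is the easy case; deeper layers are where transparency gets hard.

```python
import torch
import torchvision
import matplotlib.pyplot as plt

# Fetch a pretrained ImageNet model (downloads weights on first run).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()

# First-layer convolutional filters: shape (64, 3, 7, 7).
filters = model.conv1.weight.detach().clone()

# Rescale each filter to [0, 1] so it can be shown as an RGB patch.
f_min = filters.amin(dim=(1, 2, 3), keepdim=True)
f_max = filters.amax(dim=(1, 2, 3), keepdim=True)
filters = (filters - f_min) / (f_max - f_min)

fig, axes = plt.subplots(8, 8, figsize=(6, 6))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0))  # (C, H, W) -> (H, W, C) for plotting
    ax.axis("off")
fig.suptitle("First-layer filters: mostly edge and color detectors")
plt.show()
```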
