The Jim Rutt Show

EP 325 Joe Edelman on Full-Stack AI Alignment

Oct 7, 2025
Joe Edelman, a researcher focused on AI alignment, shares his insights on designing social systems that promote human flourishing. He discusses the importance of pluralism, critiquing conventional voting and market models for their superficiality. Edelman emphasizes the concept of 'thick models of value,' arguing that true values encompass deeper reasons and norms. He also addresses the risks of AI assistants potentially manipulating users and proposes innovative solutions like value-aware markets to navigate societal challenges. A thought-provoking conversation on the future of AI and governance!
ADVICE

Design Non-Manipulative Allied AI

  • Build allied AI that represents users' best, deliberated values rather than merely mirroring raw preferences.
  • Let AIs help users reason through edge cases and refine values without covert manipulation.
ADVICE

Keep Human-in-the-Loop For Hard Choices

  • When agents act on your behalf, create feedback loops where hard choices are escalated to the human.
  • Make agent representations legible and checkable to build responsibility and learning.
INSIGHT

Elicit Values From Meaningful Choices

  • Value elicitation can start from past meaningful choices, with follow-up questions about why they mattered, to extract attention policies.
  • Agents can interview users and form legible representations, then apply those policies in real-world matches.