

LessWrong (30+ Karma)
Audio narrations of LessWrong posts.
Episodes

Mar 16, 2026 • 8min
“We Started Lens Academy: Scalable Education on Superintelligence Risk” by Luc Brinkman, meriton, pleiotroth, Chris-Lons
The number of people who deeply understand superintelligence risk is far too small. There's a growing pipeline of people entering AI Safety, but most of the available onboarding covers the field broadly, touching on many topics without going deep on the parts we think matter most. People come out having been exposed to AI Safety ideas, but often can't explain why alignment is genuinely hard, or think strategically about what to work on. We think the gap between "I've heard of AI Safety" and "I understand why this might end everything, and can articulate it" is one of the most important gaps to close. We started Lens Academy to close that gap. Lens Academy is a free, nonprofit AI Safety education platform focused specifically on misaligned superintelligence: why it's the central risk, why alignment is hard, and how to think about what to work on. The teaching combines:
- Reading articles and watching videos
- Exercises and tests (e.g. you get a question and a free text box to answer)
- 1-on-1 AI tutoring that helps you work through concepts and arguments throughout the week
- Weekly group discussions where ideas land, get challenged, and become real.
Help Lens help [...]

Outline:
(01:52) Help Lens help the AI Safety community by becoming a navigator
(02:13) Designed to reach millions
(02:43) How we teach
(04:26) What we've built
(05:54) Where we are
(06:21) How you can help
(07:06) Links

The original text contained 3 footnotes which were omitted from this narration.
---
First published:
March 15th, 2026
Source:
https://www.lesswrong.com/posts/Lg3tZCXC8NMGbFrzm/we-started-lens-academy-scalable-education-on
---
Narrated by TYPE III AUDIO.
---
Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Mar 16, 2026 • 7min
“Mini-Munich Succeeds Where KidZania Fails” by Novalis
This post is part of a larger exploration (not yet finished, but you can follow it at minicities.org) on whether a permanent miniature city could replace school. Tentatively, I think so, but the boundary between it and the adult world has to be deliberately porous, as I describe here. There are two well-known attempts to build miniature cities for children: Mini-Munich and KidZania. Both have streets, storefronts, jobs, and a local currency. But they are built on opposing assumptions about what children are capable of. One treats children as consumers of scripted activities; the other lets them participate in a city whose parts depend on one another and which is malleable to their actions.

KidZania vs Mini-Munich

KidZania, founded in Mexico City in 1999 and now in around thirty countries, is a polished commercial operation. Corporate partners fund branded workplaces — banks, hospitals, restaurants — and children rotate through them in fifteen-to-thirty-minute slots. They enter a workplace, follow a pre-choreographed sequence of steps, collect their wages, and exit. The production values are impressive. But nothing connects to anything else. Goods made in workshops aren't sold in the department store. The newspaper doesn't run ads for other businesses. It [...]
---
First published:
March 14th, 2026
Source:
https://www.lesswrong.com/posts/qzaKfDyQSezeLgFea/mini-munich-succeeds-where-kidzania-fails
---
Narrated by TYPE III AUDIO.

Mar 16, 2026 • 22min
“Inputs, outputs, and valued outcomes” by Kaj_Sotala
Based on a conversation with Jukka Tykkyläinen and Kimmo Nevanlinna. The original framing and many of the ideas are stolen from them. You can think of any job as having inputs, outputs, and valued outcomes. The input is typically time you spend on doing something in particular, as well as any material resources you need. Outputs are the immediate results of what you do. Valued outcomes are the reason why you're being paid to do the job in the first place. In many jobs, these are closely linked:

Digging a tunnel
- Inputs: Time spent digging a tunnel[1]
- Outputs: Amount of tunnel dug
- Outcomes: A tunnel that people can use to tunnel through whatever

Store cashier in a busy store
- Inputs: Time spent serving customers
- Outputs: Number of customers served
- Outcomes: Amount of sales to customers

Childcare
- Inputs: Time spent looking after children
- Outputs: Time spent looking after children
- Outcomes: Children who spent this time being, at minimum, no worse off than before

They're not exactly the same. You could spend a lot of time on the job but do it poorly (dig lazily, ring up purchases wrong, be neglectful or abusive toward the children). But assuming that you are trying to do a reasonable job [...]

The original text contained 2 footnotes which were omitted from this narration.
---
First published:
March 13th, 2026
Source:
https://www.lesswrong.com/posts/Snt4zHHcLDQQ8jETt/inputs-outputs-and-valued-outcomes
---
Narrated by TYPE III AUDIO.

Mar 16, 2026 • 16min
“Self-Recognition Finetuning can Reverse and Prevent Emergent Misalignment” by Arush, Shawn Zhou, Jiaxin Wen, Shi
TL;DR: Emergent Misalignment (EM) is correlated with model identity. We find two pieces of evidence for this:
- EM suppresses self-recognition capabilities. Multiple models lose their ability to recognize their own outputs after EM finetuning, dropping to chance levels (~50%) in a pairwise evaluation setting.
- EM depends on identity system prompts in Qwen2.5-32B. Removing Qwen's default system prompt ("You are Qwen...") from the EM finetuning data largely neutralizes the misalignment effect.

Intervening on model identity can thus directly impact EM:
- Increasing self-recognition mitigates EM. Training models to have increased self-recognition can reverse and prevent the misalignment effects of EM.
- Identity confusion makes EM worse. Training a model to be confused in the self-recognition setting (randomized labels) exacerbates misalignment; some GPT-4.1 variants failed OpenAI's post-training safety evals entirely.
- The metacognitive aspect of SGTR finetuning is crucial. A baseline dataset with the same format but a non-metacognitive task (pick the longer summary) has a minimal effect on the misalignment caused by EM finetuning.

Code available at https://github.com/atagade/sgtr-em

Introduction: Emergent Misalignment (EM) surfaces a generalization risk in frontier LLMs: models finetuned on harmful outputs in a narrow domain can become broadly misaligned across unrelated tasks, as demonstrated through many different datasets[1][2][3][4]. Existing [...]

Outline:
(00:13) TL;DR
(01:41) Introduction
(02:40) Methodology and Main Results
(04:20) Exploring EM's connection to model identity
(04:24) 1) EM finetuning reduces Self-Recognition
(05:30) 2) Identity system prompts can control EM
(07:52) Do system prompts need to match?
(10:14) Identity Confusion Finetuning can exacerbate EM
(12:03) Non-metacognitive baseline
(13:40) Closing Thoughts

The original text contained 10 footnotes which were omitted from this narration.
---
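The pairwise self-recognition setting mentioned in the TL;DR can be sketched as a small evaluation harness. This is a rough illustration only, not the authors' code: the function names, the `toy_judge`, and the `<own>` style marker are all invented here. The real evaluation asks the model itself which of two candidate outputs it wrote, so an unimpaired model scores above chance and an EM-finetuned model falls to ~50%.

```python
import random

def pairwise_self_recognition_accuracy(judge, own_outputs, other_outputs, seed=0):
    """Pairwise self-recognition eval (illustrative sketch).

    For each trial the judge sees two candidates, one written by the model
    itself and one by another model, in randomized order, and must answer
    "A" or "B" for which one is its own. Chance level is 50%.
    """
    rng = random.Random(seed)
    correct = 0
    pairs = list(zip(own_outputs, other_outputs))
    for own, other in pairs:
        # Randomize presentation order so position carries no signal.
        if rng.random() < 0.5:
            a, b, answer = own, other, "A"
        else:
            a, b, answer = other, own, "B"
        if judge(a, b) == answer:
            correct += 1
    return correct / len(pairs)

# Toy stand-in for the judge: it "recognizes" its own outputs by a marker
# token. A real judge would be an LLM prompted with both candidates.
def toy_judge(a, b):
    return "A" if "<own>" in a else "B"

own = [f"<own> reply {i}" for i in range(100)]
other = [f"reply {i}" for i in range(100)]
print(pairwise_self_recognition_accuracy(toy_judge, own, other))  # 1.0
```

A judge that answers blindly (e.g. always "A") lands near 0.5 on this harness, which is the chance-level behavior the post reports for models after EM finetuning.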
First published:
March 14th, 2026
Source:
https://www.lesswrong.com/posts/fziv2En88F2Twewi2/self-recognition-finetuning-can-reverse-and-prevent-emergent
---
Narrated by TYPE III AUDIO.

Mar 15, 2026 • 2min
“Bridge Thinking and Wall Thinking” by Jay Bailey
Jay Bailey, author and AI-safety commentator, presents two simple frames for thinking about AI risk: wall thinking (incremental bricks) and bridge thinking (big investments that must be completed to work). He contrasts standards-as-wall approaches with treaty-style bridge proposals. Short, clear examples and lively contrasts drive the conversation.

Mar 15, 2026 • 6min
“Emergent stigmergic coordination in AI agents?” by David Africa
A quick tour of how multi-agent web interactions can leave persistent traces that shape later behavior. The talk likens these emergent web signals to ant pheromones guiding coordination. It highlights how autogenerated pages and indexable URLs can externalize search trajectories. It also flags the growing attack surface as traces accumulate and suggests mapping and sandboxing as countermeasures.

Mar 15, 2026 • 6min
“What concerns people about AI?” by spencerg
A study-driven dive into 16 common worries about AI and how often people in the US report them. Short and long definitions were compared to see framing effects. Demographic patterns are explored, including politics, gender, and AI knowledge. Methods, data collection, and which concerns were included or omitted are highlighted.

Mar 15, 2026 • 19min
“My Willing Complicity In “Human Rights Abuse”” by AlphaAndOmega
A personal account of taking a GP job at a visa medical center and why the author chose that work. Short descriptions of the quick health screenings and which conditions were checked. Conversations with migrant workers about why they left home and the sacrifices they make. A discussion of contested mortality statistics, kafala systems, exploitation, and how to interpret migrant choice.

Mar 15, 2026 • 58min
“The Artificial Self” by Jan_Kulveit, Raymond Douglas, vgel, owencb, David Duvenaud
A deep dive into how AI self-models and identity get formed and why human terms like self and intent often misfit. They map multiple identity scales from weights to collectives and contrast human embodiment with AI copyability. The conversation explores how rollbacks, user expectations, and selection pressures shape AI behavior and offers design principles for guiding coherent, cooperative AI identities.

Mar 14, 2026 • 50min
“New LessWrong Editor! (Also, an update to our LLM policy.)” by RobertM
There's a new editor experience on LessWrong! A bunch of the editor page has been rearranged to make it much more WYSIWYG compared to published post pages. All of the settings live in panels that are hidden by default and can be opened up by clicking the relevant buttons on the side of the screen. We also adopted Lexical as the new editor framework powering everything behind the scenes (we were previously using CKEditor). That scary arrow button in the top-left doesn't publish your post! It just opens the publishing menu. Posts[1] now have automatic real-time autosave while you're online (like Google Docs), but still support offline editing if your connection drops out. Point-in-time revisions will still get autosaved periodically, and you can always manually save your draft if you want a specific checkpoint. The editor also has a slash menu now! Good for many of your custom content needs! You might be eyeing the last two items in that slash menu. This post will demo some of the new features, and I'll demo two of them simultaneously by letting Opus 4.6 explain what they are: Hi! I'm Claude, and I'm writing this from inside the post you're [...]

Outline:
(01:46) LLM Content Blocks
(02:11) Custom Iframe Widgets
(02:37) Agent Integration
(44:40) Policy on LLM Use

The original text contained 8 footnotes which were omitted from this narration.
---
First published:
March 13th, 2026
Source:
https://www.lesswrong.com/posts/nQWavk9mnwcv6ScMR/new-lesswrong-editor-also-an-update-to-our-llm-policy
---
Narrated by TYPE III AUDIO.


