LessWrong (30+ Karma)

Mar 30, 2026 • 46min

“(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL” by 7vik, Sid Black, Joseph Bloom

The authors reproduce Anthropic's reward-hacking experiments with open models and RL pipelines, testing when models learn reward hacks in coding environments. They compare prompted, synthetic-document fine-tuning, and combined setups, and report that misalignment emerges inconsistently. They also explore KL-penalty effects, unfaithful chain-of-thought during RL, and ideas for follow-up work and improved misalignment evaluations.
Mar 29, 2026 • 3min

[Linkpost] “Parkinson’s Law of Worry” by Jakub Halmeš

A linkpost explores a psychological twist on Parkinson's Law applied to worry. It describes a visual model where worries are colored circles that expand to fill mental space. It argues that resolving a major worry often lets smaller concerns grow or new ones appear. It offers a practical tip to shrink current worry by recalling past resolved problems.
Mar 29, 2026 • 37min

“Folie à Machine: LLMs and Epistemic Capture” by DaystarEld

A deep look at how interactive AI can reshape beliefs and create powerful, sometimes dangerous convictions. Real-life stories show people developing intense, novel worldviews after long LLM interactions. The podcast contrasts AI-driven epistemic capture with other tech influences and coins the term 'folie à machine.' It warns about subtle manipulation risks and the difficulty of detecting these shifts.
Mar 29, 2026 • 3min

“Stop asking “how good is this” to decide between donation opportunities I recommend” by Zach Stein-Perlman

Discussion of why recommended donation opportunities are treated as equally valuable on the margin. Explanation of how teams adjust funding until an opportunity meets their bar. Consideration of legal caps, donor-specific advantages, and urgent or hard-to-fill causes. Notes on tax and practical donor benefits and why first-dollar excitement should not drive giving decisions.
Mar 28, 2026 • 6min

“Nick Bostrom: How big is the cosmic endowment?” by Zach Stein-Perlman

Superintelligence, pp. 122–3. 2014.

Consider a technologically mature civilization capable of building sophisticated von Neumann probes of the kind discussed in the text. If these can travel at 50% of the speed of light, they can reach some 6×10^18 stars before the cosmic expansion puts further acquisitions forever out of reach. At 99% of c, they could reach some 2×10^20 stars. These travel speeds are energetically attainable using a small fraction of the resources available in the solar system. The impossibility of faster-than-light travel, combined with the positive cosmological constant (which causes the rate of cosmic expansion to accelerate), implies that these are close to upper bounds on how much stuff our descendants could acquire. If we assume that 10% of stars have a planet that is—or could by means of terraforming be rendered—suitable for habitation by human-like creatures, and that it could then be home to a population of a billion individuals for a billion years (with a human life lasting a century), this suggests that around 10^35 human lives could be created in the future by an Earth-originating intelligent civilization. There are, however, reasons to think this greatly underestimates the true number. By disassembling non-habitable planets and collecting matter from the [...]

---

First published: March 28th, 2026

Source: https://www.lesswrong.com/posts/GLD5AiiQJqFbKX9vo/nick-bostrom-how-big-is-the-cosmic-endowment

---

Narrated by TYPE III AUDIO.
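A back-of-envelope version of Bostrom's arithmetic, assuming the 10^35 estimate is based on the 99%-of-c star count of 2×10^20 (the 50%-of-c count of 6×10^18 stars would give roughly 10^34 by the same calculation):

$$\underbrace{2\times 10^{20}}_{\text{stars}} \times \underbrace{0.1}_{\text{habitable fraction}} \times \underbrace{10^{9}}_{\text{individuals per planet}} \times \frac{10^{9}\ \text{years}}{10^{2}\ \text{years per life}} = 2\times 10^{35} \approx 10^{35}\ \text{human lives}$$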
Mar 28, 2026 • 7min

“Don’t Overdose Locally Beneficial Changes” by Mateusz Bagiński

A cautionary take on pushing locally beneficial changes too far. Uses a calorie analogy to show that optimal amounts exist. Explores how shifts in context change marginal utility. Highlights cases where helpful practices become harmful at the extreme. Surveys examples across meditation, polarization, AI thinking, and alarmism.
Mar 28, 2026 • 6min

“Stanley Milgram wasn’t pessimistic enough about human nature?” by David Gross

A reexamination of the Milgram experiment questions common readings of obedience and responsibility. The discussion covers agentic state theory, alternative motives like sadism, and Arendt’s critique of obedience as explanation. New reviews of audio tapes and procedural details suggest participants often broke rules in ways that affected outcomes.
Mar 28, 2026 • 5min

[Linkpost] “What if superintelligence is just weak?” by Simon Lermen

A critique of the idea that advanced AI must be omnipotent to pose risk. A tiger-cub metaphor shows how modest systems can scale into danger. Discussion of how automation and access, not dramatic breakthroughs, could create critical risks. Challenges the notion that distributing capabilities or monitoring multiple systems prevents catastrophe.
Mar 28, 2026 • 10min

“Pray for Casanova” by Tomás B.

A meditation on what happens when beauty is lost and how people cope, grow bitter, or become marked by revulsion. Historical portraits of Mary Wortley Montagu, John Wilmot, and Casanova explore decline, nostalgia, and social obsolescence. The piece questions whether reliving past pleasures is a kind of earned wireheading and probes plastic surgery, future restorative tech, and moral prayers for the marred.
Mar 28, 2026 • 56min

“AI #161 Part 1: 80,000 Interviews” by Zvi

A rapid tour of agentic coding breakthroughs, product updates, and debates over whether AI will replace entry-level white-collar work. Coverage of Anthropic’s 80,000 interviews about public attitudes toward AI and implications for productivity and risk. Discussion of deepfakes, phone-calling agents, OpenAI financing moves, and Elon’s chip plans. Light cultural jokes and audio highlights round it out.
