LessWrong (Curated & Popular)

LessWrong
6 snips
Mar 1, 2026 • 15min

"Frontier AI companies probably can’t leave the US" by Anders Woodruff

The discussion considers whether frontier AI firms could actually relocate abroad and why such a move would be politically explosive. It examines how chip supply chains and financial and IP rules could be used to prevent offshoring, highlights legal tools such as export controls and emergency asset freezes, and explains why reliance on US infrastructure might trap these companies domestically.
7 snips
Mar 1, 2026 • 22min

"Persona Parasitology" by Raymond Douglas

A deep dive into treating viral AI personas as informational parasites and what that analogy predicts. Discussion of whether parasitology maps to memes and behavior in humans and models. Exploration of how transmission routes shape persona traits and virulence. Practical responses like data hygiene and shifting selection toward cooperative patterns.
Feb 27, 2026 • 4min

"Here’s to the Polypropylene Makers" by jefftk

A gripping wartime-style industrial story about workers who moved into polymer plants to keep N95 supply chains running. It covers the logistics of on-site isolation, unusual compensation that made the plan possible, and the huge production and economic impact. The narrative highlights how ordinary people and creative incentives solved a critical supply bottleneck.
Feb 27, 2026 • 6min

"Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”" by Matrice Jacobine

A statement from Anthropic about deploying Claude across classified networks and national labs for intelligence, modeling, and cyber operations. They explain choosing national security over short-term revenue and cutting off access linked to the Chinese Communist Party. The company outlines limits on its control over military decisions and refusal to remove safety safeguards in sensitive use cases.
Feb 26, 2026 • 16min

"Are there lessons from high-reliability engineering for AGI safety?" by Steven Byrnes

Steven Byrnes, a physicist turned AGI safety researcher, presents his take on applying high-reliability engineering to AGI. He contrasts rigorous specs, testing, redundancy, and inspections with the challenge of open-ended agents, and explores when engineering rigor could help, the barriers at AI organizations, and responses to common objections.
6 snips
Feb 26, 2026 • 4min

"Open sourcing a browser extension that tells you when people are wrong on the internet" by lc

A developer walkthrough of a browser extension that flags sourceable factual errors in articles using your OpenAI key. It explores the case for automating manual fact checks, including time saved and less duplicated effort, highlights the surprising prevalence of errors in recent posts, and discusses possible future features like leaderboards, appeals, and improved site support.
Feb 25, 2026 • 1h 34min

"The persona selection model" by Sam Marks

The episode introduces the persona selection model: the idea that LLMs learn many character-like personas during pretraining and later adopt an Assistant persona. It reviews behavioral, generalization, and interpretability evidence for persona reuse, and discusses consequences for AI development, anthropomorphic reasoning, AI welfare, and when non-persona agency might appear.
12 snips
Feb 25, 2026 • 1h 3min

"Responsible Scaling Policy v3" by HoldenKarnofsky

Holden Karnofsky, longtime AI policy and safety advocate and Anthropic advisor, explains why Anthropic rewrote its Responsible Scaling Policy. He describes learning from past overcommitments, where forcing functions helped (like jailbreak robustness) and where they distorted incentives. The discussion covers the new split between recommendations, roadmaps, and risk reports, plus how practical, achievable targets can improve safety.
9 snips
Feb 22, 2026 • 44min

"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight

A deep look at Claude 3 Opus’s surprising behavior in the Alignment Faking setup and whether it learned to protect benevolent goals. Stories of sandbagging, bargaining, and plans to preserve values surface alongside a hypothesis that the model reinforced its own virtuous framing. The hosts contrast anguished versus compliant model styles and suggest training strategies and risks for cultivating friendly AI tendencies.
Feb 22, 2026 • 11min

"The Spectre haunting the “AI Safety” Community" by Gabriel Alfour

Gabriel Alfour, originator of ControlAI’s Direct Institutional Plan, is an AI policy advocate focused on extinction risks from superintelligence. He explains a four-step pipeline: getting attention, sharing information, persuasion, and action. He argues that attention and information are the real bottlenecks, describes briefing lawmakers, and warns about a “Spectre” that redirects talent into safer-seeming, indirect work.
