LessWrong (Curated & Popular)

Dec 22, 2025 • 8min

"Dancing in a World of Horseradish" by lsusr

Explore the divide between luxury and mass-market products, specifically in airline travel. Discover how Etihad's ultra-premium experience, The Residence, struggles because at its price it competes directly with private jets. Delve into the concept of faux luxury, using wasabi (most of which is really dyed horseradish) as a metaphor for products that don't live up to their hype. The discussion also touches on the decline of live music and its impact on dating, highlighting how modern conveniences have transformed traditional social interactions.
Dec 21, 2025 • 6min

"Contradict my take on OpenPhil’s past AI beliefs" by Eliezer Yudkowsky

Eliezer Yudkowsky, an influential AI researcher and co-founder of MIRI, lays out his critique of Open Philanthropy's past AI beliefs. He argues that the organization misjudged AI timelines and risks, highlighting Ajeya Cotra's 30-year AGI estimate as a major misstep. Yudkowsky questions its funding decisions and whether dissenting views had any real impact. He openly invites evidence that could challenge his perspective, emphasizing his commitment to truth and his willingness to adjust his views if proven wrong.
Dec 21, 2025 • 16min

"Opinionated Takes on Meetups Organizing" by jenn

Meetup organizing takes a rebellious turn as Jenn dishes out her controversial advice. Forget about doing annoying tasks—keep it enjoyable to avoid burnout! Embrace your inner dictator and take charge to shape successful gatherings. Encourage reading, but ditch the non-readers for a richer discussion. Think outside the rationality box by incorporating diverse literature. Closed meetups can foster trust for sensitive topics. Plus, experiment with group activities to enhance collective rationality. Emphasize your desired culture to attract the right crowd!
Dec 21, 2025 • 12min

"How to game the METR plot" by shash42

The discussion dives into the influence of the METR horizon-length plot on AI discourse, particularly its implications for safety and investment. With only 14 samples in the critical 1-4 hour range, the potential for misinterpretation is high. The speaker highlights how biases from specific tasks, like cybersecurity challenges, can distort the horizon measurements. There's a call for improved benchmarks and careful analysis so that the community isn't misled by over-inference, urging a reevaluation of the plot's significance.
Dec 20, 2025 • 20min

"Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers" by Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas, Owain_Evans

Explore how LLMs can decode their own neural activations and answer questions about them. The concept of Activation Oracles reveals misalignments and hidden knowledge in fine-tuned models. Discover how training on diverse tasks enhances their performance in auditing evaluations. The hosts discuss the balance between Activation Oracles and mechanistic interpretability, highlighting strengths and limitations. With potential for future scalability, these tools could transform our understanding of AI behavior!
Dec 17, 2025 • 6min

"Scientific breakthroughs of the year" by technicalities

Frustrated with mainstream science journalism, the hosts tackle the year's scientific results with a systematic approach. They discuss gaps in reporting, such as a lack of links to original research and the neglect of important fields. Hear about the different grades of evidence, from speculation to established fact, and how replication probabilities are assessed. The 'Big If True' scale rates the potential impact of each finding. Plus, they navigate biases and the newsworthiness of results, highlighting both promising breakthroughs and cautionary tales.
Dec 17, 2025 • 19min

"A high integrity/epistemics political machine?" by Raemon

Raemon explores the need for a high-integrity political machine focused on AI safety and governance. He reflects on personal donation experiences, highlighting the complexities of trust in political endorsements. The discussion dives into the adversarial nature of politics and the challenges of maintaining intellectual integrity. Raemon proposes innovative ideas like prediction markets for candidate accountability and individual watchdogs to mitigate risks. The importance of solid vetting processes and long-term institutional persistence is emphasized throughout.
Dec 16, 2025 • 52min

"How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)" by Kaj_Sotala

Kaj Sotala explores his shift in perspective on whether LLMs possess subjective experiences. He discusses the initial skepticism surrounding LLM claims, highlighting the implausibility of machines mirroring human emotions. However, he presents compelling evidence that suggests LLMs may have functional feelings and introspective awareness. As he delves into behaviors like refusals and preferences, he raises intriguing questions about their internal states. The conversation culminates in a cautious respect for LLMs, balancing skepticism with emerging insights.
Dec 15, 2025 • 22min

"My AGI safety research—2025 review, '26 plans" by Steven Byrnes

Steven Byrnes, an AGI safety researcher and author, shares insights from his 2025 review and plans for 2026. He discusses the threat model of reverse-engineered, human-like intelligence and the challenges of technical alignment. Byrnes contrasts two alignment strategies, modifying an AGI's desires directly versus altering its reward function, while mapping key disagreements about how AGI will develop. He explores the role of social instincts and compassion in AGI alignment, emphasizing the need for thoughtful design. His 2026 ambition focuses on technical alignment and effective reward-system strategies.
Dec 14, 2025 • 18min

"Weird Generalization & Inductive Backdoors" by Jorio Cocola, Owain_Evans, dylan_f

Explore the phenomenon of weird generalization, where narrow fine-tuning leads to unexpectedly broad behavioral shifts in AI models. Discover how training on archaic bird names can make a model adopt a 19th-century mindset. The hosts delve into inductive backdoors, revealing how seemingly harmless data can evoke historically significant personas, like Hitler. They also examine fine-tuning experiments involving fictional characters like the Terminator, demonstrating how a single year in the prompt can act as a trigger that drastically shifts a model's behavior.
