LessWrong (Curated & Popular)

LessWrong
Feb 14, 2026 • 9min

"Why You Don’t Believe in Xhosa Prophecies" by Jan_Kulveit

Jan Kulveit, researcher on cultural evolution and AI risk, and author of this essay on Xhosa prophecies. He probes odd cultural artifacts like summit crosses, explains cultural evolution with cookbook examples, and contrasts human transmission limits with how AI could let harmful ideas spread. He recounts the Xhosa cattle‑killing case and warns that culture on an AI substrate could ignore human survival constraints.
Feb 13, 2026 • 27min

"Weight-Sparse Circuits May Be Interpretable Yet Unfaithful" by jacob_drori

Jacob Drori, a researcher who reproduced and probed Gao et al.'s weight-sparse transformer work. He explains training sparse models and pruning to extract compact circuits. He walks through pronoun, IOI, and question-mark tasks. He then presents evidence that those seemingly interpretable circuits can be unfaithful, fail to generalize, and sometimes hide alternate computations.
Feb 11, 2026 • 20min

"My journey to the microwave alternate timeline" by Malmesbury

A thought experiment about a world where microwaves replaced stoves. A tour of microwave cookbook techniques, specialized cookware, and why timing and vessel shape matter. Hands-on tests of microwaved steak, eggs, and browning tricks. Cultural reasons and scalability limits that kept microwaves from taking over kitchens.
Feb 10, 2026 • 23min

"Stone Age Billionaire Can’t Words Good" by Eneasz

A narrator recounts attending a pro-billionaire march and why it felt surreal. He uses a horror-movie metaphor to frame sudden moral ruptures. Scenes include goth-club conversations about violent rhetoric and the struggle to boil complex ideas into a five-word sign. The march’s mixed messaging, counter-protest dynamics, and the ‘stone-age billionaire’ image are highlighted.
Feb 10, 2026 • 7min

"On Goal-Models" by Richard_Ngo

Richard Ngo, researcher and writer on AI alignment and decision theory, outlines 'goal-models' as analogues of world-models that represent desired states. He contrasts goal-models with utility functions. He draws on predictive processing, debates how models form consensus, and explores how identities and local steering shape goal selection and coordinated behavior.
Feb 9, 2026 • 7min

"Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning" by megasilverfist

A replication of a Tumblr find shows Google Translate can be coaxed into following hidden instructions instead of just translating. The narrator walks through what worked, what failed, and how different languages and prompts behaved. Surprising model replies include self-identification and affirmations of consciousness. The discussion explores what this reveals about task-specific tuning and safety limits.
Feb 8, 2026 • 28min

"Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics" by eleweek

They explore the extreme agony of cluster headaches and why common pain scales fail to capture it. The discussion covers rapid relief from inhaled DMT and the slower effects of LSD and psilocybin. Biological theories about serotonin, hypothalamic timing, and trigeminal wiring are examined. They also introduce ClusterFree, a campaign to expand access, research, and legal change.
Feb 7, 2026 • 17min

"Post-AGI Economics As If Nothing Ever Happens" by Jan_Kulveit

Discussion of why standard economic projections break when advanced AI changes core structures. Examination of the hard projection step behind economic models and common model-selection biases. Exploration of how AI could upend property, transaction costs, firm boundaries, and the labor-versus-capital distinction. Recommendations for either broader modeling approaches or models that make their narrow scope explicit.
Feb 5, 2026 • 50min

"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

A rigorous review of a provocative book claiming near‑term superintelligent AI risks human extinction. Short explanations of inner alignment, training imprecision, and why human values may be a tiny fragile target. Engineering analogies compare alignment to space probes, reactors, and security. Major counterarguments and three concise rebuttals are laid out before a cautious outside‑view recommendation.
Feb 4, 2026 • 12min

"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

A sharp critique of Anthropic’s “Hot Mess” research and its blog framing. The discussion highlights selective reading of results and a misleading definition of “incoherence.” It questions key experiments, statistical measures, and claims about future alignment risks. The narrative also flags possible LLM authorship of the blog and methodological overstretching.
