

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the "LessWrong (30+ karma)" feed.
Episodes

Jan 22, 2024 • 12min
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Explores the unlikely discovery of artificial sweeteners, the difficulty of creating new ones, and the role of sweet taste receptors. Covers sweetness and bitterness, the rarity of low-calorie sweeteners, and the extensive process of drug discovery, then examines the limited recreational potential of psychiatric drugs and the extensive testing of psychedelic compounds.

Jan 20, 2024 • 41min
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Neel Nanda, an expert in mechanistic interpretability, discusses its challenges and potential applications with ryan_greenblatt, Buck, and habryka. They explore concrete projects, debate how useful mechanistic interpretability really is, and discuss the limits of achieving interpretability in transformative models like GPT-4. They also delve into model safety and ablations, and the potential of ruling out problematic behavior without fully understanding a model's internals. The speakers reflect on the dialogue and highlight its usefulness in advancing thinking about mechanistic interpretability.

Jan 20, 2024 • 9min
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated
Source: https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through
Narrated for LessWrong by Perrin Walker.
Share feedback on this narration.
[Curated Post] ✓ [125+ Karma Post] ✓

Jan 17, 2024 • 24min
The impossible problem of due process
I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:
because I had no desire to get involved in all that nonsense
because I was horribly burned out from my own community conflict investigation and couldn't stand the thought of engaging with people online
because I generally think it's bad to post on the internet out of frustration or outrage
But after sitting on it for a full year, I still think it's worth posting, so here it is. The only edits I have made since February 16th, 2023, were to add a couple interstitial sentences for clarity, and change 'recent articles' to 'articles from February 2023'. So, it's not intended to be commenting on anything more recent than that. I am precommitting to not engaging with any comments, because I am [...]
---
First published: January 16th, 2024
Source: https://www.lesswrong.com/posts/sJEcNgqnSL2n35QWR/the-impossible-problem-of-due-process
---
Narrated by TYPE III AUDIO.

Jan 14, 2024 • 23min
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"
This is the first essay in a series that I'm calling "Otherness and control in the age of AGI." (See here for more about the series as a whole.)
When species meet
The most succinct argument for AI risk, in my opinion, is the "second species" argument. Basically, it goes like this.
Premise 1: AGIs would be like a second advanced species on earth, more powerful than humans.
Conclusion: That's scary.
To be clear: this is very far from airtight logic.[1] But I like the intuition pump. Often, if I only have two sentences to explain AI risk, I say this sort of species stuff. "Chimpanzees should be careful about inventing humans." Etc.[2]
People often talk about aliens here, too. "What if you learned that aliens were on their way to earth? Surely that's scary." Again, very far from a knock-down case (for example: we get to build the aliens in question). But it draws on something.
In particular, though: it draws on a narrative of interspecies conflict. You are meeting a new form of life, a new type of mind. But these new creatures are presented to you, centrally, as a possible threat; as competitors; as agents in whose power you might find yourself helpless.
And unfortunately: yes. But I want to start this series by acknowledging how many dimensions of interspecies-relationship this narrative leaves out, and how much I wish we could be focusing only on the other parts. To meet a new species – and especially, a new intelligent species – is not just scary. It's incredible. I wish it was less a time for fear, and more a time for wonder and dialogue. A time to look into new eyes – and to see further.
Source: https://www.lesswrong.com/posts/mzvu8QTRXdvDReCAL/gentleness-and-the-artificial-other
Narrated for LessWrong by Joe Carlsmith (audio provided with permission).
Share feedback on this narration.
[Curated Post] ✓ [125+ Karma Post] ✓

Jan 14, 2024 • 3min
Introducing Alignment Stress-Testing at Anthropic
Carson Denison and Monte MacDiarmid join the Alignment Stress-Testing team at Anthropic to red-team alignment techniques, exploring ways in which they could fail. Their first project, 'Sleeper Agents', focuses on training deceptive LLMs. The team's mission is to empirically demonstrate potential flaws in Anthropic's alignment strategies.

Jan 13, 2024 • 7min
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a linkpost for https://arxiv.org/abs/2401.05566
I'm not going to add a bunch of commentary here on top of what we've already put out, since we've put a lot of effort into the paper itself, and I'd mostly just recommend reading it directly, especially since there are a lot of subtle results that are not easy to summarize. I will say that I think this is some of the most important work I've ever done and I'm extremely excited for us to finally be able to share this. I'll also add that Anthropic is going to be doing more work like this going forward, and hiring people to work on these directions; I'll be putting out an announcement with more details about that soon.
EDIT: That announcement is now up!
Abstract: Humans are capable of [...]
---
First published: January 12th, 2024
Source: https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through
Linkpost URL: https://arxiv.org/abs/2401.05566
---
Narrated by TYPE III AUDIO.

Jan 7, 2024 • 30min
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
Abram Demski, an AI Alignment researcher and writer, clarifies concepts of AI Alignment focusing on optimization, reference, endorsement, and legitimacy. The podcast explores the implications of agency as a natural phenomenon for AI risk analysis and delves into naturalistic representation theorems, denotation vs. connotation in language, and conditional endorsement and legitimacy. It also discusses the distinction between selection and control processes and their impact on trust and inner alignment.

Jan 7, 2024 • 29min
What’s up with LLMs representing XORs of arbitrary features?
The podcast explores the claim that LLMs can represent XORs of arbitrary features, and what that means for AI safety research. It covers RAX and generalization in linear probes, generating representations for classification and linear features, possible explanations involving the aggregation of noisy signals correlated with 'is a formula', and hypotheses about LLMs' behavior and feature directions.

Jan 5, 2024 • 23min
Gentleness and the artificial Other
The podcast explores the concept of AI risk and encountering a more powerful intelligent species. It discusses the limitations of human comprehension in understanding AI beings and emphasizes the need for a more gentle and considerate approach towards AI. The hosts analyze the tragic story of Timothy Treadwell and his interaction with grizzly bears in Alaska. The chapter also delves into forging connections between humans, aliens, and AIs and preparing for the challenges of interacting with them.


