LessWrong (Curated & Popular)

LessWrong
Jan 22, 2024 • 12min

[HUMAN VOICE] "There is way too much serendipity" by Malmesbury

Exploring the unlikely discoveries behind artificial sweeteners, the challenges of creating new ones, and the role of sweet taste receptors. Discussing why low-calorie sweeteners are so rare, the lengthy process of drug discovery, the limited recreational potential of psychiatric drugs, and the extensive testing of psychedelic compounds.
Jan 20, 2024 • 41min

[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

Neel Nanda, an expert in mechanistic interpretability, discusses the field's challenges and potential applications with ryan_greenblatt, Buck, and habryka. They explore concrete projects, debate how useful mechanistic interpretability actually is, and discuss the difficulty of achieving interpretability in transformative models like GPT-4. They also delve into model safety and ablations, including the possibility of ruling out problematic behavior without fully understanding a model's internals. The speakers reflect on the dialogue and how it advanced their thinking about mechanistic interpretability.
Jan 20, 2024 • 9min

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated
Source: https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through
Narrated for LessWrong by Perrin Walker.
Share feedback on this narration.
[Curated Post] ✓ [125+ Karma Post] ✓
Jan 17, 2024 • 24min

The impossible problem of due process

I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:

- because I had no desire to get involved in all that nonsense
- because I was horribly burned out from my own community conflict investigation and couldn't stand the thought of engaging with people online
- because I generally think it's bad to post on the internet out of frustration or outrage

But after sitting on it for a full year, I still think it's worth posting, so here it is. The only edits I have made since February 16th, 2023, were to add a couple interstitial sentences for clarity, and change 'recent articles' to 'articles from February 2023'. So, it's not intended to be commenting on anything more recent than that. I am precommitting to not engaging with any comments, because I am [...]

---
First published: January 16th, 2024
Source: https://www.lesswrong.com/posts/sJEcNgqnSL2n35QWR/the-impossible-problem-of-due-process
---
Narrated by TYPE III AUDIO.
Jan 14, 2024 • 23min

[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith

"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"This is the first essay in a series that I’m calling “Otherness and control in the age of AGI.” See here for more about the series as a whole.)When species meetThe most succinct argument for AI risk, in my opinion, is the “second species” argument. Basically, it goes like this.Premise 1: AGIs would be like a second advanced species on earth, more powerful than humans.Conclusion: That’s scary.To be clear: this is very far from airtight logic.[1] But I like the intuition pump. Often, if I only have two sentences to explain AI risk, I say this sort of species stuff. “Chimpanzees should be careful about inventing humans.” Etc.[2]People often talk about aliens here, too. “What if you learned that aliens were on their way to earth? Surely that’s scary.” Again, very far from a knock-down case (for example: we get to build the aliens in question). But it draws on something.In particular, though: it draws on a narrative of interspecies conflict. You are meeting a new form of life, a new type of mind. But these new creatures are presented to you, centrally, as a possible threat; as competitors; as agents in whose power you might find yourself helpless.And unfortunately: yes. But I want to start this series by acknowledging how many dimensions of interspecies-relationship this narrative leaves out, and how much I wish we could be focusing only on the other parts. To meet a new species – and especially, a new intelligent species – is not just scary. It’s incredible. I wish it was less a time for fear, and more a time for wonder and dialogue. A time to look into new eyes – and to see further.Source:https://www.lesswrong.com/posts/mzvu8QTRXdvDReCAL/gentleness-and-the-artificial-otherNarrated for LessWrong by Joe Carlsmith (audio provided with permission).Share feedback on this narration.[Curated Post] ✓[125+ karma Post] ✓
Jan 14, 2024 • 3min

Introducing Alignment Stress-Testing at Anthropic

Carson Denison and Monte MacDiarmid join the Alignment Stress-Testing team at Anthropic to red-team alignment techniques, exploring ways in which they could fail. Their first project, 'Sleeper Agents', focuses on training deceptive LLMs. The team's mission is to empirically demonstrate potential flaws in Anthropic's alignment strategies.
Jan 13, 2024 • 7min

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a linkpost for https://arxiv.org/abs/2401.05566

I'm not going to add a bunch of commentary here on top of what we've already put out, since we've put a lot of effort into the paper itself, and I'd mostly just recommend reading it directly, especially since there are a lot of subtle results that are not easy to summarize. I will say that I think this is some of the most important work I've ever done and I'm extremely excited for us to finally be able to share this. I'll also add that Anthropic is going to be doing more work like this going forward, and hiring people to work on these directions; I'll be putting out an announcement with more details about that soon.

EDIT: That announcement is now up!

Abstract: Humans are capable of [...]

---
First published: January 12th, 2024
Source: https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through
Linkpost URL: https://arxiv.org/abs/2401.05566
---
Narrated by TYPE III AUDIO.
Jan 7, 2024 • 30min

[HUMAN VOICE] "Meaning & Agency" by Abram Demski

Abram Demski, an AI alignment researcher and writer, clarifies key alignment concepts, focusing on optimization, reference, endorsement, and legitimacy. The podcast explores the implications of agency as a natural phenomenon for AI risk analysis, and delves into naturalistic representation theorems, denotation vs. connotation in language, and conditional endorsement and legitimacy. It also discusses the distinction between selection and control processes and its impact on trust and inner alignment.
Jan 7, 2024 • 29min

What’s up with LLMs representing XORs of arbitrary features?

The podcast explores the claim that LLMs can represent XORs of arbitrary features, and what this claim would imply for AI safety research. It discusses the implications of RAX and generalization in linear probes, generating representations for classification and linear features, possible explanations for the aggregation of noisy signals correlated with 'Is a Formula', and hypotheses about LLMs' behavior and feature directions.
Jan 5, 2024 • 23min

Gentleness and the artificial Other

The podcast explores AI risk and the prospect of encountering a more powerful intelligent species. It discusses the limits of human comprehension in understanding AI beings and emphasizes the need for a gentler, more considerate approach toward AI. It analyzes the tragic story of Timothy Treadwell and his interactions with grizzly bears in Alaska, and also delves into forging connections between humans, aliens, and AIs, and preparing for the challenges of interacting with them.
