LessWrong (Curated & Popular)

LessWrong
Mar 14, 2023 • 10min

"The Parable of the King and the Random Process" by moridinamael

https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process

~ A Parable of Forecasting Under Model Uncertainty ~

You, the monarch, need to know when the rainy season will begin, in order to properly time the planting of the crops. You have two advisors, Pronto and Eternidad, whom you trust exactly equally. You ask them both: "When will the next heavy rain occur?"

Pronto says, "Three weeks from today."

Eternidad says, "Ten years from today."
Mar 14, 2023 • 10min

"Enemies vs Malefactors" by Nate Soares

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors

Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate.

Short version: Harmful people often lack explicit malicious intent. It's worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.

(Credit for my explicit articulation of this idea goes in large part to Aella, and also in part to Oliver Habryka.)
Mar 8, 2023 • 41min

"The Waluigi Effect (mega-post)" by Cleo Nardo

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

In this article, I will present a mechanistic explanation of the Waluigi Effect and other bizarre "semiotic" phenomena which arise within large language models such as GPT-3/3.5/4 and their variants (ChatGPT, Sydney, etc.). This article will be folklorish to some readers, and profoundly novel to others.
Mar 6, 2023 • 16min

"Acausal normalcy" by Andrew Critch

https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is also available on the EA Forum.

Summary: Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it sounds dismissive, I apologize. With that said, I have four aims in writing this post:

1. Dispelling myths. There are some ill-conceived myths about acausal trade that I aim to dispel with this post. Alternatively, I will argue for something I'll call acausal normalcy as a more dominant decision-relevant consideration than one-on-one acausal trades.

2. Highlighting normalcy. I'll provide some arguments that acausal normalcy is more similar to human normalcy than any particular acausal trade is to human trade, such that the topic of acausal normalcy is — conveniently — also less culturally destabilizing than (erroneous) preoccupations with 1:1 acausal trades.

3. Affirming AI safety as a straightforward priority. I'll argue that for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant, except insofar as they push a bit further towards certain broadly agreeable human values applicable in the normal-everyday-human-world, such as nonviolence, cooperation, diversity, honesty, integrity, charity, and mercy. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way.

4. Affirming normal human kindness. I also think reflecting on acausal normalcy can lead to increased appreciation for normal notions of human kindness, which could lead us all to treat each other a bit better. This is something I wholeheartedly endorse.
Mar 1, 2023 • 33min

"Please don't throw your mind away" by TsviBT

https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away

[Warning: the following dialogue contains an incidental spoiler for "Music in Human Evolution" by Kevin Simler. That post is short, good, and worth reading without spoilers, and this post will still be here if you come back later. It's also possible to get the point of this post by skipping the dialogue and reading the other sections.]

Pretty often, talking to someone who's arriving to the existential risk / AGI risk / longtermism cluster, I'll have a conversation like the following:

Tsvi: "So, what's been catching your eye about this stuff?"

Arrival: "I think I want to work on machine learning, and see if I can contribute to alignment that way."

T: "What's something that got your interest in ML?"

A: "It seems like people think that deep learning might be on the final ramp up to AGI, so I should probably know how that stuff works, and I think I have a good chance of learning ML at least well enough to maybe contribute to a research project."

This is an experiment with AI narration. What do you think? Tell us by going to t3a.is.
Feb 15, 2023 • 1h 17min

"Cyborgism" by Nicholas Kees & Janus

https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism

There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near-term systems, expecting the ability to significantly speed up progress to appear well after misalignment and deception. Furthermore, progress in this area may directly shorten timelines or enable the creation of dual-purpose systems which significantly speed up capabilities research.

OpenAI recently released their alignment plan. It focuses heavily on outsourcing cognitive work to language models, transitioning us to a regime where humans mostly provide oversight to automated research assistants. While there have been a lot of objections to and concerns about this plan, there hasn't been a strong alternative approach aiming to automate alignment research which also takes all of the many risks seriously.

The intention of this post is not to propose an end-all cure for the tricky problem of accelerating alignment using GPT models. Instead, the purpose is to explicitly put another point on the map of possible strategies, and to add nuance to the overall discussion.
Feb 14, 2023 • 28min

"Childhoods of exceptional people" by Henrik Karlsson

https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people

This is a linkpost for https://escapingflatland.substack.com/p/childhoods

Let's start with one of those insights that are as obvious as they are easy to forget: if you want to master something, you should study the highest achievements of your field. If you want to learn writing, read great writers, etc.

But this is not what parents usually do when they think about how to educate their kids. The default for a parent is rather to imitate their peers and outsource the big decisions to bureaucracies.

But what would we learn if we studied the highest achievements? Thinking about this question, I wrote down a list of twenty names—von Neumann, Tolstoy, Curie, Pascal, etc—selected on the highly scientific criterion "a random Swedish person can recall their name and think, Sounds like a genius to me". That list is to me a good first approximation of what an exceptional result in the field of child-rearing looks like. I ordered a few piles of biographies, read, and took notes. Trying to be a little less biased in my sample, I asked myself if I could recall anyone exceptional who did not fit the patterns I saw in the biographies, which I could, and so I ordered a few more biographies. This kept going for an unhealthy amount of time.

I sampled writers (Virginia Woolf, Lev Tolstoy), mathematicians (John von Neumann, Blaise Pascal, Alan Turing), philosophers (Bertrand Russell, René Descartes), and composers (Mozart, Bach), trying to get a diverse sample. In this essay, I am going to detail a few of the patterns that struck me after having skimmed 42 biographies. I will sort the claims so that I start with more universal patterns and end with patterns that are less common.
Feb 13, 2023 • 5min

"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares

https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.)

I have long said that the lion's share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at all, rather than figuring out what to point it at. It's recently come to my attention that some people have misunderstood this point, so I'll attempt to clarify here.
Feb 10, 2023 • 1h 13min

"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça

https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas

A Chemical Hunger (a), a series by the authors of the blog Slime Mold Time Mold (SMTM), argues that the obesity epidemic is entirely caused (a) by environmental contaminants. In my last post, I investigated SMTM's main suspect (lithium).[1] This post collects other observations I have made about SMTM's work, not narrowly related to lithium, but rather focused on the broader thesis of their blog post series.

I think that the environmental contamination hypothesis of the obesity epidemic is a priori plausible. After all, we know that chemicals can affect humans, and our exposure to chemicals has plausibly changed a lot over time. However, I found that several of what seem to be SMTM's strongest arguments in favor of the contamination theory turned out to be dubious, and that nearly all of the interesting things I thought I'd learned from their blog posts turned out to actually be wrong. I'll explain that in this post.
Feb 8, 2023 • 34min

"SolidGoldMagikarp (plus, prompt generation)"

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

Work done at SERI-MATS, over the past two months, by Jessica Rumbelow and Matthew Watkins.

TL;DR

Anomalous tokens: a mysterious failure mode for GPT (which reliably insulted Matthew)
- We have found a set of anomalous tokens which result in a previously undocumented failure mode for GPT-2 and GPT-3 models. (The 'instruct' models "are particularly deranged" in this context, as janus has observed.)
- Many of these tokens reliably break determinism in the OpenAI GPT-3 playground at temperature 0 (which theoretically shouldn't happen).

Prompt generation: a new interpretability method for language models (which reliably finds prompts that result in a target completion). This is good for:
- eliciting knowledge
- generating adversarial inputs
- automating prompt search (e.g. for fine-tuning)

In this post, we'll introduce the prototype of a new model-agnostic interpretability method for language models which reliably generates adversarial prompts that result in a target completion. We'll also demonstrate a previously undocumented failure mode for GPT-2 and GPT-3 language models, which results in bizarre completions (in some cases explicitly contrary to the purpose of the model), and present the results of our investigation into this phenomenon. Further detail can be found in a follow-up post.
