LessWrong (Curated & Popular)

LessWrong
Mar 14, 2023 • 10min

"The Parable of the King and the Random Process" by moridinamael

https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process

~ A Parable of Forecasting Under Model Uncertainty ~

You, the monarch, need to know when the rainy season will begin, in order to properly time the planting of the crops. You have two advisors, Pronto and Eternidad, whom you trust exactly equally. You ask them both: "When will the next heavy rain occur?"

Pronto says, "Three weeks from today."

Eternidad says, "Ten years from today."
Mar 14, 2023 • 10min

"Enemies vs Malefactors" by Nate Soares

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors

Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate.

Short version: Harmful people often lack explicit malicious intent. It's worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.

(Credit for my explicit articulation of this idea goes in large part to Aella, and also in part to Oliver Habryka.)
Mar 8, 2023 • 41min

"The Waluigi Effect (mega-post)" by Cleo Nardo

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

In this article, I will present a mechanistic explanation of the Waluigi Effect and other bizarre "semiotic" phenomena which arise within large language models such as GPT-3/3.5/4 and their variants (ChatGPT, Sydney, etc.). This article will be folklorish to some readers, and profoundly novel to others.
Mar 6, 2023 • 16min

"Acausal normalcy" by Andrew Critch

https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is also available on the EA Forum.

Summary: Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it sounds dismissive, I apologize. With that said, I have four aims in writing this post:

1. Dispelling myths. There are some ill-conceived myths about acausal trade that I aim to dispel with this post. Alternatively, I will argue for something I'll call acausal normalcy as a more dominant decision-relevant consideration than one-on-one acausal trades.

2. Highlighting normalcy. I'll provide some arguments that acausal normalcy is more similar to human normalcy than any particular acausal trade is to human trade, such that the topic of acausal normalcy is — conveniently — also less culturally destabilizing than (erroneous) preoccupations with 1:1 acausal trades.

3. Affirming AI safety as a straightforward priority. I'll argue that for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant, except insofar as they push a bit further towards certain broadly agreeable human values applicable in the normal-everyday-human-world, such as nonviolence, cooperation, diversity, honesty, integrity, charity, and mercy. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way.

4. Affirming normal human kindness. I also think reflecting on acausal normalcy can lead to increased appreciation for normal notions of human kindness, which could lead us all to treat each other a bit better. This is something I wholeheartedly endorse.
Mar 1, 2023 • 33min

"Please don't throw your mind away" by TsviBT

https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away

[Warning: the following dialogue contains an incidental spoiler for "Music in Human Evolution" by Kevin Simler. That post is short, good, and worth reading without spoilers, and this post will still be here if you come back later. It's also possible to get the point of this post by skipping the dialogue and reading the other sections.]

Pretty often, talking to someone who's arriving to the existential risk / AGI risk / longtermism cluster, I'll have a conversation like the following:

Tsvi: "So, what's been catching your eye about this stuff?"

Arrival: "I think I want to work on machine learning, and see if I can contribute to alignment that way."

T: "What's something that got your interest in ML?"

A: "It seems like people think that deep learning might be on the final ramp up to AGI, so I should probably know how that stuff works, and I think I have a good chance of learning ML at least well enough to maybe contribute to a research project."

This is an experiment with AI narration. What do you think? Tell us by going to t3a.is.
Feb 15, 2023 • 1h 17min

"Cyborgism" by Nicholas Kees & Janus

https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism

There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near-term systems, expecting the ability to significantly speed up progress to appear well after misalignment and deception. Furthermore, progress in this area may directly shorten timelines or enable the creation of dual-purpose systems which significantly speed up capabilities research.

OpenAI recently released their alignment plan. It focuses heavily on outsourcing cognitive work to language models, transitioning us to a regime where humans mostly provide oversight to automated research assistants. While there have been a lot of objections to and concerns about this plan, there hasn't been a strong alternative approach aiming to automate alignment research which also takes all of the many risks seriously.

The intention of this post is not to propose an end-all cure for the tricky problem of accelerating alignment using GPT models. Instead, the purpose is to explicitly put another point on the map of possible strategies, and to add nuance to the overall discussion.
Feb 14, 2023 • 28min

"Childhoods of exceptional people" by Henrik Karlsson

https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people

This is a linkpost for https://escapingflatland.substack.com/p/childhoods

Let's start with one of those insights that are as obvious as they are easy to forget: if you want to master something, you should study the highest achievements of your field. If you want to learn writing, read great writers, etc.

But this is not what parents usually do when they think about how to educate their kids. The default for a parent is rather to imitate their peers and outsource the big decisions to bureaucracies.

But what would we learn if we studied the highest achievements? Thinking about this question, I wrote down a list of twenty names—von Neumann, Tolstoy, Curie, Pascal, etc—selected on the highly scientific criterion "a random Swedish person can recall their name and think, Sounds like a genius to me". That list is to me a good first approximation of what an exceptional result in the field of child-rearing looks like. I ordered a few piles of biographies, read, and took notes. Trying to be a little less biased in my sample, I asked myself if I could recall anyone exceptional who did not fit the patterns I saw in the biographies, which I could, and so I ordered a few more biographies. This kept going for an unhealthy amount of time.

I sampled writers (Virginia Woolf, Lev Tolstoy), mathematicians (John von Neumann, Blaise Pascal, Alan Turing), philosophers (Bertrand Russell, René Descartes), and composers (Mozart, Bach), trying to get a diverse sample. In this essay, I am going to detail a few of the patterns that struck me after having skimmed 42 biographies. I will sort the claims so that I start with more universal patterns and end with patterns that are less common.
Feb 13, 2023 • 5min

"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares

https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.)

I have long said that the lion's share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at all, rather than figuring out what to point it at. It's recently come to my attention that some people have misunderstood this point, so I'll attempt to clarify here.
Feb 10, 2023 • 1h 13min

"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça

https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas

A Chemical Hunger (a), a series by the authors of the blog Slime Mold Time Mold (SMTM), argues that the obesity epidemic is entirely caused (a) by environmental contaminants. In my last post, I investigated SMTM's main suspect (lithium).[1] This post collects other observations I have made about SMTM's work, not narrowly related to lithium, but rather focused on the broader thesis of their blog post series.

I think that the environmental contamination hypothesis of the obesity epidemic is a priori plausible. After all, we know that chemicals can affect humans, and our exposure to chemicals has plausibly changed a lot over time. However, I found that several of what seem to be SMTM's strongest arguments in favor of the contamination theory turned out to be dubious, and that nearly all of the interesting things I thought I'd learned from their blog posts turned out to actually be wrong. I'll explain that in this post.
Feb 8, 2023 • 34min

"SolidGoldMagikarp (plus, prompt generation)"

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

Work done at SERI-MATS, over the past two months, by Jessica Rumbelow and Matthew Watkins.

TL;DR

Anomalous tokens: a mysterious failure mode for GPT (which reliably insulted Matthew)
- We have found a set of anomalous tokens which result in a previously undocumented failure mode for GPT-2 and GPT-3 models. (The 'instruct' models "are particularly deranged" in this context, as janus has observed.)
- Many of these tokens reliably break determinism in the OpenAI GPT-3 playground at temperature 0 (which theoretically shouldn't happen).

Prompt generation: a new interpretability method for language models (which reliably finds prompts that result in a target completion). This is good for:
- eliciting knowledge
- generating adversarial inputs
- automating prompt search (e.g. for fine-tuning)

In this post, we'll introduce the prototype of a new model-agnostic interpretability method for language models which reliably generates adversarial prompts that result in a target completion. We'll also demonstrate a previously undocumented failure mode for GPT-2 and GPT-3 language models, which results in bizarre completions (in some cases explicitly contrary to the purpose of the model), and present the results of our investigation into this phenomenon. Further detail can be found in a follow-up post.
