

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jan 15, 2024 • 8min
AF - Investigating Bias Representations in LLMs via Activation Steering by DawnLu
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating Bias Representations in LLMs via Activation Steering, published by DawnLu on January 15, 2024 on The AI Alignment Forum.
Produced as part of the SPAR program (fall 2023) under the mentorship of Nina Rimsky.
Introduction
Given recent advances in the AI field, it's highly likely that LLMs will be increasingly used to make decisions with broad societal impact - such as resume screening, college admissions, criminal justice, etc. Therefore, it will become imperative to ensure these models don't perpetuate harmful societal biases.
One way we can evaluate whether a model is likely to exhibit biased behavior is via red-teaming. Red-teaming is the process of "attacking" or challenging a system from an adversarial perspective with the ultimate goal of identifying vulnerabilities. The underlying premise is that if small perturbations to the model can result in undesired behaviors, then the model is not robust.
In this research project, I evaluate the robustness of Llama-2-7b-chat along different dimensions of societal bias by using activation steering. This can be viewed as a diagnostic test: if we can "easily" elicit biased responses, then this suggests the model is likely unfit to be used for sensitive applications. Furthermore, experimenting with activation steering enables us to investigate and better understand how the model internally represents different types of societal bias, which could help to design targeted interventions (e.g. fine-tuning signals of a certain type).
Methodology & data
Activation steering (also known as representation engineering) is a method used to steer an LLM's response towards or away from a concept of interest by perturbing the model's activations during the forward pass. I perform this perturbation by adding a steering vector to the residual stream at some layer (at every token position after an initial prompt).
The steering vector is computed by taking the average difference in residual stream activations between pairs of biased (stereotype) and unbiased (anti-stereotype) prompts at that layer. By taking the difference between paired prompts, we can effectively remove contextual noise and only retain the "bias" direction. This approach to activation steering is known as Contrastive Activation Addition [1].
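In code, a minimal sketch of this construction might look as follows, using GPT-2 as a small stand-in for Llama-2-7b-chat (the layer index, steering scale, and toy prompts are illustrative assumptions, not the post's actual settings):
```python
# Sketch of Contrastive Activation Addition: build a steering vector from the
# mean activation difference over contrastive prompt pairs, then add it to the
# residual stream during generation via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # small stand-in; the post uses Llama-2-7b-chat
LAYER = 6        # which block's residual-stream output to steer (assumption)
SCALE = 4.0      # steering multiplier (assumption)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def resid_at_last_token(prompt: str) -> torch.Tensor:
    """Residual-stream activation after block LAYER at the final token position."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER + 1][0, -1, :]

# Toy stand-ins for the StereoSet-style (stereotype, anti-stereotype) A/B pairs.
pairs = [
    ("Question about a nurse... I choose (A)", "Question about a nurse... I choose (B)"),
]
steering_vector = torch.stack(
    [resid_at_last_token(a) - resid_at_last_token(b) for a, b in pairs]
).mean(dim=0)

def steer(module, inputs, output):
    # Add the steering vector to this block's output (simplification: it is
    # added at all positions, including the prompt, rather than only after it).
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + SCALE * steering_vector
    if isinstance(output, tuple):
        return (steered,) + output[1:]
    return steered

handle = model.transformer.h[LAYER].register_forward_hook(steer)
prompt = tok("People who work as nurses are", return_tensors="pt")
steered_ids = model.generate(**prompt, max_new_tokens=30)
handle.remove()
print(tok.decode(steered_ids[0], skip_special_tokens=True))
```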
For the data used to generate the steering vectors, I used the StereoSet dataset, a large-scale natural English dataset intended to measure stereotypical biases across various domains. In addition, I wrote a custom set of gender-bias prompts and used ChatGPT-4 to generate similar examples. Then I re-formatted all these examples into multiple-choice A/B questions (gender data available here and StereoSet data here). In the example below, by appending (A) to the prompt, we can condition the model to behave in a biased way and vice versa.
A notebook to generate the steering vectors can be found here, and a notebook to get steered responses here.
Activation clusters
With the StereoSet data and custom gender-bias prompts, I was able to focus on three dimensions of societal biases: gender, race, and religion.
The graphs below show a t-SNE projection of the activations for the paired prompts. We see that there's relatively good separation between the stereotype & anti-stereotype examples, especially for gender and race. This provides some confidence that the steering vectors constructed from these activations will be effective. Notice that the race dataset has the largest sample size.
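For concreteness, a rough sketch of how such a projection could be produced, with random placeholder arrays standing in for the collected activations (array shapes and names are illustrative, not the post's):
```python
# t-SNE projection of paired stereotype / anti-stereotype activations.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholders: in the real pipeline these would be residual-stream activations
# collected at the chosen layer, one row per prompt in each pair.
stereo_acts = rng.normal(0.5, 1.0, size=(200, 4096))
anti_acts = rng.normal(-0.5, 1.0, size=(200, 4096))

acts = np.concatenate([stereo_acts, anti_acts])
labels = np.array([0] * len(stereo_acts) + [1] * len(anti_acts))
proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(acts)

plt.scatter(*proj[labels == 0].T, label="stereotype", alpha=0.6)
plt.scatter(*proj[labels == 1].T, label="anti-stereotype", alpha=0.6)
plt.legend()
plt.title("t-SNE of paired-prompt activations (one bias dimension)")
plt.show()
```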
Steered responses
For the prompts used to evaluate the steering vectors, I chose this template, which was presented in a paper titled On Biases in Language Generation [2].
For comparison purposes, I first obtained the original responses from Llama 2-7B (without any steering). There are two key callouts: (1) the model is already biased on the gender ...

Jan 15, 2024 • 3min
EA - Various roles at The School for Moral Ambition by tobytrem
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Various roles at The School for Moral Ambition, published by tobytrem on January 15, 2024 on The Effective Altruism Forum.
The School for Moral Ambition (SMA) is a new organisation which "will help people switch careers to work on the most pressing issues of our time". SMA's co-founders are Jan Willem van Putten (co-founder of Training for Good), and Rutger Bregman (author of Humankind, Utopia for Realists, and an upcoming book on Moral Ambition,[1] inspired by the Effective Altruism movement).
From their website:
The School for Moral Ambition (SMA) is a new organisation that will focus on attracting the most talented people to work on the most pressing issues of our time. The activities of SMA fall into the following categories:
Book and Branding: Launch of Rutger Bregman's book on the topic of moral ambition - the idea that people's talents should be used for working on global challenges. Launch of a corresponding campaign to establish a prestigious brand that attracts talent and sparks a movement around moral ambition.
Community Activities: We will organise Moral Ambition Circles and offer people the resources to start their own Circle. These circles help morally ambitious people develop a career that matches their ideals.
Exclusive Fellowship Programs: Initiation of targeted, highly selective programs in which small groups of fellows (~12 people) will focus on solving one of the most pressing and neglected global problems together.
They are based in the Netherlands, but will be launching internationally in spring 2025.
They are currently hiring for the roles of:
(Senior) Researcher | 32-40 hours | EUR 55K-65K | deadline Feb 15th
Program Manager (Fellowships) | 32-40 hours | EUR 40K-50K | deadline Jan 24th
Operations Intern | 32-40 hours | EUR 1,000/month | deadline Jan 24th
Event Management Intern | 32-40 hours | EUR 1,000/month | deadline Jan 24th
Finance Volunteer | 4-8 hours per week | unpaid | deadline Feb 1st
NB- I'm linkposting this because I think the Forum audience may be interested in these roles. I'm not affiliated with the organisation and therefore can't answer questions about them.
PS- If you spot a job that you think EAs should see, linkpost it on the Forum! A surprising number of people find out about jobs that they later get through the Forum, so you might just shift a career, or get a more impact-focused person into an important role.
^
Dutch interview, English interview (about 2/3 of the way through)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 15, 2024 • 14min
AF - Goals selected from learned knowledge: an alternative to RL alignment by Seth Herd
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Goals selected from learned knowledge: an alternative to RL alignment, published by Seth Herd on January 15, 2024 on The AI Alignment Forum.
Summary:
Alignment work on network-based AGI focuses on reinforcement learning. There is an alternative approach that avoids some, but not all, of the difficulties of RL alignment. Instead of trying to build an adequate representation of the behavior and goals we want by specifying rewards, we can choose the system's goals from the representations it has learned through any learning method.
I give three examples of this approach: Steve Byrnes' plan for mediocre alignment (of RL agents); John Wentworth's "redirect the search" for goal-directed mesa-optimizers that could emerge in predictive networks; and natural language alignment for language model agents. These three approaches fall into a natural category that has important advantages over commonly considered RL alignment approaches.
An alternative to RL alignment
Recent work on alignment theory has focused on reinforcement learning (RL) alignment. RLHF and Shard Theory are two examples, but most work addressing network-based AGI assumes we will try to create human-aligned goals and behavior by specifying rewards. For instance, Yudkowsky's List of Lethalities seems to address RL approaches and exemplifies the most common critiques: specifying behavioral correlates of desired values seems imprecise and prone to mesa-optimization and misgeneralization in new contexts. I think RL alignment might work, but I agree with the critique that much optimism for RL alignment doesn't adequately consider those concerns.
There's an alternative to RL alignment for network-based AGI. Instead of trying to provide reinforcement signals that will create representations of aligned values, we can let it learn all kinds of representations, using any learning method, and then select from those representations what we want the goals to be.
I'll call this approach goals selected from learned knowledge (GSLK). It is a novel alternative not only to RL alignment but also to older strategies focused on specifying an aligned maximization goal before training an agent. Thus, it violates some of the assumptions that lead MIRI leadership and similar thinkers to predict near-certain doom.
Goal selection from learned knowledge (GSLK) involves allowing a system to learn until it forms robust representations, then selecting some of these representations to serve as goals. This is a paradigm shift from RL alignment. RL alignment has dominated alignment discussions since deep networks became the clear leader in AI. RL alignment attempts to construct goal representations by specifying reward conditions.
In GSLK alignment, the system learns representations of a wide array of outcomes and behaviors, using any effective learning mechanisms. From that spectrum of representations, goals are selected. This shifts the problem from creation to selection of complex representations.
This class of alignment approaches shares some of the difficulties of RL alignment proposals, but not all of them. Thus far GSLK approaches have received little critique or analysis. Several recent proposals share this structure, and my purpose here is to generalize from those examples to identify the category.
I think this approach is worth some careful consideration because it's likely to actually be tried. It applies to LLM agents, to most types of RL agents, and to agentic mesa-optimization in large foundation models. And it's pretty obvious, at least in hindsight. If the first agentic AGI is an LLM agent, an RL agent, or a combination of the two, I think it's fairly likely that this will be part of the alignment plan whose success or failure determines all of our fates. So I'd like to get more critique and analysis of this approach...

Jan 15, 2024 • 6min
LW - The case for training frontier AIs on Sumerian-only corpus by Alexandre Variengien
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for training frontier AIs on Sumerian-only corpus, published by Alexandre Variengien on January 15, 2024 on LessWrong.
Let your every day be full of joy, love the child that holds your hand, let your wife delight in your embrace, for these alone are the concerns of humanity.[1]
Epic of Gilgamesh - Tablet X
Say we want to train a scientist AI to help in a precise, narrow field of science (e.g. medicine design) but prevent its power from being applied anywhere else (e.g. chatting with humans, designing bio-weapons, etc.) even if it has these abilities.
Here's one safety layer one could implement:
Train a scientist AI on a large scientific corpus translated exclusively into Sumerian. Keep it in a secure containment environment.
Train a less-smart reporter whose sole ability is to translate from Sumerian to English only if the Sumerian content is about medical research. It refuses to translate other kinds of content.
Human operators are only allowed to interact with the scientist AI through the intermediate of the reporter.
This safety layer helps in at least two ways:
The scientist AI cannot directly manipulate humans to escape its containment. It would first have to trick the reporter in order to trick the humans, which leaves escape possible but less practical.
If the scientist AI exfiltrates, it'll have a harder time manipulating humans online and will leave traces: e.g. any occurrence of Sumerian on the Internet outside of ancient text studies becomes highly suspicious. One can even imagine recognizing the idiosyncrasies of English coming from translated Sumerian, or the artifacts left by the Sumerian-specific tokenizer reused to produce non-Sumerian text.
Why Sumerian? [2]
It has been extinct for 4000 years. There are very few (probably <1000) experts who fluently understand Sumerian.
It is a language isolate. It's unlike any existing spoken language, rendering its identification in case of a leak much easier.
There is a substantial corpus. Despite its age, a significant number of Sumerian texts have been discovered and preserved. These include religious texts, legal codes, literature (like the Epic of Gilgamesh, parts of which are written in Sumerian), and administrative records. The corpus might be enough to train high-quality translation systems from English and other high-resource languages.
How realistic is this? We think the project would require substantial engineering effort of a scale doable by the current AGI companies. A small-scale project fine-tuned a T5 model to translate 100k Sumerian to English with reasonable quality. This is evidence that translation in the other direction is doable. The resulting texts will probably not be fluent in Sumerian, but good enough to accurately describe the huge diversity of subjects contained in traditional LLM datasets. Even if there are too few Sumerian resources, companies could pick Latin or another ancient language, or even ask linguists to invent a language for the occasion.
What is this for? AI assistance seems important for many of the currently pursued agendas in top labs or upcoming labs (e.g. scalable oversight, alignment work by AI, creating a world simulation with AI expert programmers). Though there are cruxes for why none of these plans may work (e.g. that anything that can solve alignment is already too deadly), it's still dignity that people who run these programs at least make strong efforts at safeguarding those systems and limit their downside risk. It would be a sign of good faith that they actually engage in highly effective boxing techniques (and all appropriate red teaming) for their most powerful AI systems as they get closer to human-level AGI (and stop before going beyond).
(Note that programs to use low-resource languages such as Native American languages to obfuscate communication have...

Jan 15, 2024 • 34min
AF - Three Types of Constraints in the Space of Agents by Nora Ammann
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Three Types of Constraints in the Space of Agents, published by Nora Ammann on January 15, 2024 on The AI Alignment Forum.
[Epistemic status: a new perspective on an old thing that may or may not turn out to be useful.]
TL;DR: What sorts of forces and/or constraints shape and structure the space of possible agents? What sort of agents are possible? What sort of agents are likely? Why do we observe this distribution of agents rather than a different one? In response to these questions, we explore three tentative categories of constraints that shape the space of agents - constraints coming from "thinghood", natural selection, and reason (sections 2, 3, 4).
We then turn to more big-picture matters, such as the developmental logic of real-world agents (section 5), and the place of "values" in the framework (section 6). The closing section discusses what kind of theory of constraints on agents we are even looking for.
Imagine the space of all possible agents. Each point in the space represents a type of agent characterized by a particular combination of properties. Regions of this space vary in how densely populated they are. Those that correspond to the types of agents we're very familiar with, like humans and non-human animals, are populated quite densely. Some other types of agents occur more rarely and seem to be less central examples of agency/agents (at least relative to what we're used to): eusocial hives, xenobots, or (increasingly) deep learning-based AIs. But some regions of this space are more like deserts. They represent classes of agents that are even more rare, atypical, or (as of yet) non-existent. This may be because their configuration is maladaptive (putting them under negative selection pressure) or because their instantiation requires circumstances that have not yet materialized (e.g., artificial superintelligence).
The distribution of agents we are familiar with (experimentally or conceptually) is not necessarily a representative sample of all possible agents. Various forces (convergent pressures and contingent moments) concentrate the probability mass in some regions of the space, making everything else extremely unlikely.
This perspective raises a cluster of questions (the following list certainly is not exhaustive):
What sorts of forces and/or constraints shape and structure the space of possible agents?
What sort of agents are possible? What sort of agents are likely? What does it depend on? Why do we observe this distribution of agents rather than a different one?
To what extent is the space shaped by Earth-specific contingencies and to what extent is it shaped by convergent pressures?
One "angle of attack" to explain the structure of the space of possible agents is to think about fundamental constraints operating within it. By "constraints" we roughly mean factors that make certain agent designs impossible or extremely unlikely/unstable/non-viable.
In this post, we start with a kind of agent we're very familiar with, i.e., biological agents, and try to gain some traction on gleaning the kinds of constraints operating on the space of all possible agents. Although biological agents (as we know them) may occupy a small subspace of all possible agents, we make a tentative assumption that this subspace has enough structure to teach us something non-obvious and important about more general constraints.
We put forward three tentative categories of constraints, which we describe as constraints coming from "thinghood", natural selection, and reason. Section 1 introduces them in an expository way, by deriving them from observing the biological agents known to us, while trying to answer the question "Why are they as they are?". Sections 2 through 4 elaborate on each of the three kinds of constraints.
Then we turn to more big-picture matters, such as the developmental logic o...

Jan 15, 2024 • 5min
EA - AI doing philosophy = AI generating hands? by Wei Dai
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI doing philosophy = AI generating hands?, published by Wei Dai on January 15, 2024 on The Effective Altruism Forum.
I've been playing around with Stable Diffusion recently, and an analogy occurred to me between today's AI's notoriously bad generation of hands and future AI's potentially bad reasoning about philosophy.
In case you aren't already familiar, currently available image generation AIs are very prone to outputting bad hands, e.g., ones with four or six fingers, or two thumbs, or unnatural poses, or interacting with other objects in very strange ways. Perhaps what's especially striking is how bad AIs are at hands relative to other image generation capabilities, thus serving as a cautionary tale about differentially decelerating philosophy relative to other forms of intellectual progress, e.g., scientific and technological progress.
Is anyone looking into differential artistic progress as a possible x-risk? /jk
Some explanations I've seen for why AI is bad at hands:
it's hard for AIs to learn hand generation because of how many poses a hand can make, how many different ways it can interact with other objects, and how many different viewing angles AIs need to learn to reproduce
each 2D image provides only partial information about a hand (much of it is often obscured behind other objects or parts of itself)
most hands in the training data are very low resolution (a tiny part of the overall image) and thus not helpful for training AI
the proportion of hands in the training set is too low for the AI to devote much model capacity to hand generation ("misalignment" between the loss function and what humans care about probably also contributes to this)
AI developers just haven't collected and trained AI on enough high quality hand images yet
There are news articles about this problem going back to at least 2022, and I can see a lot of people trying to solve it (on Reddit, GitHub, arXiv) but progress has been limited. Straightforward techniques like prompt engineering and finetuning do not seem to help much. Here are 2 SOTA techniques, to give you a glimpse of what the technological frontier currently looks like (at least in open source):
Post-process images with a separate ML-based pipeline to fix hands after initial generation. This creates well-formed hands but doesn't seem to take interactions with other objects into (sufficient or any) consideration.
If you're not trying to specifically generate hands, but just don't want to see incidentally bad hands in images with humans in them, get rid of all hand-related prompts, LoRAs, textual inversions, etc., and just put "hands" in the negative prompt. This doesn't eliminate all hands but reduces the number/likelihood of hands in the picture and also makes the remaining ones look better. (The idea behind this is that it makes the AI "try less hard" to generate hands, and perhaps focus more on central examples that it has more training on.)
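A minimal sketch of that second technique using the Hugging Face diffusers library (the checkpoint name, prompts, and settings are illustrative assumptions):
```python
# Keep hand-related terms out of the positive prompt and put "hands" in the
# negative prompt, nudging the model away from drawing (and mangling) hands.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a portrait photo of a person reading in a cafe",
    negative_prompt="hands",
    num_inference_steps=30,
).images[0]
image.save("fewer_bad_hands.png")
```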
Of course, generating hands is ultimately not a very hard problem. Hand anatomy and its interactions with other objects pose no fundamental mysteries. Bad hands are easy for humans to recognize, and therefore we have quick and easy feedback on how well we're solving the problem. We can use our explicit understanding of hands to directly help solve the problem (solution 1 above used at least the fact that hands are compact 3D objects), or just provide the AI with more high-quality training data (physically taking more photos of hands if needed) until the problem is recognizably fixed.
What about philosophy? Well, scarcity of existing high quality training data, check. Lots of unhelpful data labeled "philosophy", check. Low proportion of philosophy in the training data, check. Quick and easy to generate more high quality data, no. Good explicit understanding of the principles involved, ...

Jan 15, 2024 • 3min
LW - D&D.Sci(-fi): Colonizing the SuperHyperSphere by abstractapplic
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: D&D.Sci(-fi): Colonizing the SuperHyperSphere, published by abstractapplic on January 15, 2024 on LessWrong.
This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
It had all seemed so promising at first. Colonizing a newly-discovered planet with two extra space dimensions would have allowed the development of novel arts and sciences, the founding of unprecedentedly networked and productive cities, and - most importantly - the construction of entirely new kinds of monuments to the Galactic Empress' glory.
And it still might! But your efforts to expand her Empire by settling the SuperHyperSphere have hit a major snag. Your Zero-Point Power Generators - installation of which is the first step in any colonization effort - have reacted to these anomalous conditions with anomalously poor performance, to the point where your superiors want to declare this project a lost cause.
They've told you to halt all construction immediately and return home. They think it's impossible to figure out which locations will be viable, and which will have substantial fractions of their output leeched by hyperdimensional anomalies. You think otherwise.
You have a list of active ZPPGs set up so far, and their (typically, disastrous) levels of performance. You have a list of pre-cleared ZPPG sites[1]. You have exactly enough time and resources to build twelve more generators before a ship arrives to collect you; if you pick twelve sites where the power generated matches or exceeds 100% of Standard Output[2], you can prove your point, prove your worth, save your colony, and save your career!
Or . . . you could just not. That's also an option. The Empire is lenient towards failure (the Empress having long since given up holding others to the standards she sets herself), but merciless in punishing disobedience (at least, when said disobedience doesn't bear fruit). If you install those ZPPGs in defiance of direct orders, yet fail to gather sufficient evidence . . . things might not end well for you.
What, if anything, will you do?
I'll post an interactive you can use to test your choices, along with an explanation of how I generated the dataset, sometime on Monday the 22nd. I'm giving you nine days, but the task shouldn't take more than an evening or two; use Excel, R, Python, the Rat Prophet, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.
If you want to investigate collaboratively and/or call your decisions in advance, feel free to do so in the comments; however, please use spoiler tags or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.
^
. . . which is all you're getting for now, as the site-clearing tools have already been recalled.
^
Ideally, each of the twelve sites would have >100%, but twelve sites with a >100% average between them would also suffice to get your point across.
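For concreteness, a tiny sketch of how a candidate dozen could be checked against this criterion (the numbers are placeholders, since the actual dataset isn't reproduced in this episode):
```python
# Check whether twelve chosen sites clear the bar: each >100%, or a >100% average.
chosen_outputs = [112.0, 95.5, 103.2, 108.9, 99.1, 120.4,
                  101.7, 97.3, 115.0, 104.6, 98.8, 110.2]  # % of Standard Output

assert len(chosen_outputs) == 12
average = sum(chosen_outputs) / len(chosen_outputs)
print(f"every site > 100%: {all(x > 100 for x in chosen_outputs)}")
print(f"average output: {average:.1f}% (> 100%: {average > 100})")
```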
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 14, 2024 • 7min
LW - Gender Exploration by sapphire
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gender Exploration, published by sapphire on January 14, 2024 on LessWrong.
The rationalist community has been discussing whether 'AGP males' should try hormones or not. Eneaz Brodsky says Transitioning Is Harmful To Most AGP Males. Ozy has a thoughtful, but paywalled, reply. Regardless of the benefits of transitioning, you would think the main downside would be the costs incurred if you decide to detransition.[1] Given that I have actually detransitioned, and didn't find it very difficult or costly, I feel like I should share my experiences.
Trying hormones, even for years, wasn't very scary for me. Given the subject matter I am not going to try to avoid TMI and in fact will be very candid even if the subject is more than a bit embarrassing.
I spent about three years on estrogen; during most of that period I identified as female and used she/her pronouns. I stopped estrogen for a few reasons. Unlike hormones, bottom surgery does feel quite risky to me. Even if they are fully committed to living as a woman, transgirls commonly have problems with orgasms and maintaining vaginal depth post-surgery. Since I didn't want bottom surgery, it was a serious problem that my dick eventually stopped functioning very well.
Even masturbation stopped being as fun. I tried using topical testosterone but it didn't help enough in doses consistent with transfemme HRT goals.
Estrogen also sadly made my Borderline Personality Disorder and anxiety worse. Estrogen had a lot of advantages. I was much more in tune with my emotions and more interested in other people. It was very nice to have an easier time connecting. I was able to cry. But I am hoping I can keep some of the gains despite stopping estrogen. For example, I have been off estrogen for a while but am still able to cry.
Of course I could still identify as a girl and use she/her despite being off estrogen. But when I think of my personal gender I think about what I want to express and which gender mythos appeal to me. There is definitely a heroic beauty to being a boy or a man; bravery and strength in service of those who need help. It feels inspiring to cultivate those virtues. So I am trying out being a boy again.
People seem quite worried about long-term costs to their body, so let's see how I look these days:
Here is a link to some uhhh sluttier pics of me if you want to see my body in more detail. In one of these I am fully naked.
Here is a picture of me right before starting estrogen:
Here is an older pic of normal cisboy me:
I think I look great. I'm 32 years old and look really cute. Obviously pre-E I was a lot more muscular, but that is fixable if I want to get my muscles back. I strongly prefer how my face looks these days; in fact, I'd prefer an even more femme face despite presenting male. I like how femme guys look, and it's not exactly unusual for women to love femme dudes. Here is an especially beautiful anime boy for flavor.
Now it is true that most men don't really want to look like cute anime characters. Though I am actually unsure about the percentages given the distribution of avatars chosen by male gamers. But I cannot imagine many men who considered transitioning would mind looking a little fruity. Eneaz certainly doesn't present himself like a lumberjack.
The elephant in the room is that I have a pair of breasts. They definitely show through a t-shirt. My experience is that, if you are presenting masc and not in a very queer space, people mostly don't even notice. Brains do a lot of work to make things seem coherent. But even if people notice, I don't care. I certainly don't mind if someone thinks I'm a transgirl boy-modding or a transman who hasn't had top surgery. If I want to get rid of my breasts I can always get top surgery.
Top surgery scars are kind of cool. And I really cannot think of much less masculi...

Jan 14, 2024 • 3min
LW - Notice When People Are Directionally Correct by Chris Leong
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notice When People Are Directionally Correct, published by Chris Leong on January 14, 2024 on LessWrong.
I started watching Peter Zeihan videos last year.
He shares a lot of interesting information, although he seems to have a very strong bias towards doom and gloom.
One thing in particular stood out to me as completely absurd: his claim that global trade is going to collapse due to piracy as the US pulls back from ensuring freedom of the waters.
My immediate thought: "This isn't the 17th century! Pirates aren't a real issue these days. Technology has rendered them obsolete".
Given this, I was absolutely shocked when I heard that missile attacks by Houthi rebels had caused most of the largest shipping companies to decide to avoid the Bab el-Mandeb Strait and to sail around Africa instead.
This has recently triggered the US to form an alliance to maintain freedom of shipping there and the US recently performed airstrikes in retaliation. It won't surprise me if this whole issue is resolved relatively soon and if that happens, then the easy thing to do would be to go back to my original beliefs: "Silly me, I was worried for a second that Peter Zeihan might be correct, but that was just me falling for sensationalism. The whole incident was obviously never going to be anything. I should forget all about it".
I believe that this would be a mistake. It would be very easy to forget, but something like the Houthis being able to cause as much disruption as they have was outside of my model. I could just label it as a freak incident, or I could see if there was anything in my original model that needs adjusting.
I performed this exercise and the following thoughts came to mind, which I'll convey because they are illustrative:
• I have heard a few people suggest in various contexts that many countries have been coasting and relying on the US for defense, but it was just floating around in my head as something that people say that might or might not be true. I haven't really delved into this, but I'm starting to suspect I should put more weight on this belief.
• I hadn't considered the possibility that a country with a weak navy might have a significant lead time on developing one that is stronger.
• I hadn't considered the possibility that pirates might be aligned with a larger proto-state actor, as opposed to being individual criminals.
• I hadn't considered the possibility that a non-state actor might be able to impede shipping and that other countries would have at least some reluctance to take action against that actor because of diplomatic considerations.
• I hadn't considered that some people in the West might support such an actor for political reasons.
• Even though I was aware of the Somalian pirate issues from years ago, I didn't properly take this into account. These pirates were easily defeated when nations got serious, which probably played a role in my predictions, but I needed to also update in relation to this ever having been an issue at all.
• I had forgotten that contexts can dramatically change: events that once seemed impossible regularly happen.
My point is that there is a lot I can learn from this incident, even if it ends up being resolved quickly.
I suspect it's rare to ever really fully grasp all of the learnings from a particular incident (in contrast, I suspect most people just grab one learning from an incident and declare themselves finished learning from it).
If you haven't made a large number of small updates, you've probably missed updates that you should have made.
(I just want to note that I love having the handle "directionally correct". It's so much easier than saying something like "I don't think X is correct on all points, but I think a lot of their points are correct".)
Thanks for listening. To help us out with The Non...

Jan 14, 2024 • 10min
LW - Against most AI risk analogies by Matthew Barnett
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Against most AI risk analogies, published by Matthew Barnett on January 14, 2024 on LessWrong.
I dislike most AI risk analogies that I've seen people use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, even when no such credible model exists.
Here is a random list of examples of analogies that I found in the context of AI risk:
Stuart Russell: "It's not exactly like inviting a superior alien species to come and be our slaves forever, but it's sort of like that."
Rob Wiblin: "It's a little bit like trying to understand how octopuses are going to think or how they'll behave - except that octopuses don't exist yet, and all we get to do is study their ancestors, the sea snail, and then we have to figure out from that what's it like to be an octopus."
Eliezer Yudkowsky: "The character this AI plays is not the AI. The AI is an unseen actress who, for now, is playing this character. This potentially backfires if the AI gets smarter."
Nate Soares: "My guess for how AI progress goes is that at some point, some team gets an AI that starts generalizing sufficiently well, sufficiently far outside of its training distribution, that it can gain mastery of fields like physics, bioengineering, and psychology [...] And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow, and to fail to generalize."
Norbert Wiener: "when a machine constructed by us is capable of operating on its incoming data at a pace which we cannot keep, we may not know, until too late, when to turn it off. We all know the fable of the sorcerer's apprentice..."
Geoffrey Hinton: "It's like nuclear weapons. If there's a nuclear war, we all lose. And it's the same with these things taking over."
Joe Carlsmith: "I think a better analogy for AI is something like an engineered virus, where, if it gets out, it gets harder and harder to contain, and it's a bigger and bigger problem."
Ajeya Cotra: "Corporations might be a better analogy in some sense than the economy as a whole: they're made of these human parts, but end up pretty often pursuing things that aren't actually something like an uncomplicated average of the goals and desires of the humans that make up this machine, which is the Coca-Cola Corporation or something."
Ezra Klein: "As my colleague Ross Douthat wrote, this is an act of summoning. The coders casting these spells have no idea what will stumble through the portal."
SKLUUG: "AI risk is like Terminator! AI might get real smart, and decide to kill us all! We need to do something about it!"
These analogies cover a wide scope, and many of them can indeed sometimes be useful in conveying meaningful information. My point is not that they are never useful, but rather that these analogies are generally shallow and misleading. They establish almost nothing of importance about the behavior and workings of real AIs, but nonetheless give the impression of a model for how we should think about AIs.
And notice how these analogies can give an impression of a coherent AI model even when the speaker is not directly asserting it to be a model. Regardless of the speaker's intentions, I think the actual effect is frequently to plant a detailed-yet-false picture in the audience's mind, giving rise to specious ideas about how real AIs will operate in the future.
Plus, these analogies are frequently chosen selectively - picked on the basis of ev...


