

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Feb 28, 2024 • 10min
EA - This is why we can't have nice laws by LewisBollard
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: This is why we can't have nice laws, published by LewisBollard on February 28, 2024 on The Effective Altruism Forum.
Note: This post was crossposted from the Open Philanthropy Farm Animal Welfare Research Newsletter by the Forum team, with the author's permission. The author may not see or respond to comments on this post.
How factory farmers block progress - and what we can do about it
Most people agree that farmed animals deserve better legal protections: 84% of Europeans, 61-80% of Americans, 70% of Brazilians, 51-66% of Chinese, and 52% of Indians agree with some version of that statement. Yet almost all farmed animals globally still lack even the most basic protections.
America has about five times more vegetarians than farmers - and many more omnivores who care about farm animals. Yet the farmers wield much more political power. Fully 89% of Europeans think it's important that animals not be kept in individual cages. Yet the European Commission just implicitly sided with the 8% who don't by shelving a proposed cage ban.
When farm animal welfare is on the ballot, it usually wins. Most recently, citizens in California and Massachusetts voted for bans on cages and crates by 63% and 78% respectively. Yet both ballot measures were only necessary because the state legislatures refused to act.
The US Congress last legislated on farm animal welfare 46 years ago - only at slaughter and only for mammals. Yet I'm not aware of a single bill in the current Congress that would regulate on-farm welfare. Instead, Congress is considering two bills to strike down state farm animal welfare laws, plus a handful of bills to hobble the alternative protein industry.
What's going on? Why do politicians worldwide consistently fail to reflect the will of their own voters for stronger farm animal welfare laws? And how can we change that?
Milking the system
Factory farmers have an easier assignment than us: they're mostly asking politicians to do nothing - a task many excel at. It's much harder to pass laws than to stop them, which may be why animal advocates have had more success in blocking industry legislation, like ag-gag laws and laws to censor plant-based meat labels.
Factory farmers are also playing on their home turf. Animal welfare bills are typically referred to legislatures' agriculture committees, where the most anti-reform politicians reside. A quarter of the European Parliament's agriculture committee are farmers, and many of the rest represent rural areas.
But these factors are both common to all animal welfare legislation, not just laws to protect farmed animals. Yet most nations and US states have still passed anti-cruelty laws for other animals. And most of these laws go beyond protecting just cats and dogs. Organizing a fight between two chickens is punishable by up to five years in jail in the US - even as abusing two million farmed chickens is not punishable at all.
A few other factors are unique to farm animal-focused laws. They may raise food prices; EU officials recently gave that excuse for shelving proposed reforms. Farm animal cruelty is not top of mind for most voters, so politicians don't expect to suffer repercussions for ignoring it. And factory farming can be dismissed as a far-left issue, which only Green politicians need worry about.
But this isn't the whole story.
Surveys show that most voters across the political spectrum support farm animal welfare laws. Politicians often work on issues that aren't top of mind for most voters; Congress recently legislated to help the nation's one million duck hunters. And politicians happily pass laws that may raise food prices to achieve other social goals, like higher minimum wages, stricter food safety standards, and farm price support schemes.
Master lobbyists, with tractors
A more potent factor is the farm lobby. ...

Feb 28, 2024 • 10min
LW - Timaeus's First Four Months by Jesse Hoogland
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Timaeus's First Four Months, published by Jesse Hoogland on February 28, 2024 on LessWrong.
Timaeus was announced in late October 2023, with the mission of making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. This is our first progress update.
In service of the mission, our first priority has been to support and contribute to ongoing work in Singular Learning Theory (SLT) and developmental interpretability, with the aim of laying theoretical and empirical foundations for a science of deep learning and neural network interpretability.
Our main uncertainties in this research were:
Is SLT useful in deep learning? While SLT is mathematically established, it was not clear whether the central quantities of SLT could be estimated at sufficient scale, and whether SLT's predictions actually held for realistic models (esp. language models).
Does structure in neural networks form in phase transitions? The idea of developmental interpretability was to view phase transitions as a core primitive in the study of internal structure in neural networks. However, it was not clear how common phase transitions are, and whether we can detect them.
The research Timaeus has conducted over the past four months, in collaboration with Daniel Murfet's group at the University of Melbourne and several independent AI safety researchers, has significantly reduced these uncertainties, as we explain below. As a result we are now substantially more confident in the research agenda.
While we view this fundamental work in deep learning science and interpretability as critical, the mission of Timaeus is to make fundamental contributions to alignment and these investments in basic science are to be judged relative to that end goal. We are impatient about making direct contact between these ideas and central problems in alignment. This impatience has resulted in the research directions outlined at the end of this post.
Contributions
Timaeus conducts two main activities: (1) research, that is, developing new tools for interpretability, mechanistic anomaly detection, etc., and (2) outreach, that is, introducing and advocating for these techniques to other researchers and organizations.
Research Contributions
What we learned
Regarding the question of whether SLT is useful in deep learning we have learned that
The Local Learning Coefficient (LLC) can be accurately estimated in deep linear networks at scales up to 100M parameters, see Lau et al 2023 and Furman and Lau 2024. We think there is a good chance that LLC estimation can be scaled usefully to much larger (nonlinear) models.
Changes in the LLC predict phase transitions in toy models (Chen et al 2023), deep linear networks (Furman and Lau 2024), and (language) transformers at scales up to 3M parameters (Hoogland et al 2024).
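As a rough, self-contained illustration of the kind of estimator involved (the toy loss, localization term, and hyperparameters below are assumptions made for this sketch, not the actual setup in Lau et al 2023), the LLC can be estimated by sampling weights near a trained optimum with SGLD and measuring how much the average loss rises:

```python
import numpy as np

# Toy "trained model": a 2-parameter point w* minimizing an empirical loss
# L_n(w) = mean((y - w0*w1*x)^2). The minimum is a whole curve (w0*w1 = const),
# so the model is singular, which is the regime SLT is designed for.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 0.5 * x + 0.05 * rng.normal(size=n)

def loss(w):
    w0, w1 = w
    return np.mean((y - w0 * w1 * x) ** 2)

def grad_loss(w):
    w0, w1 = w
    r = y - w0 * w1 * x
    return np.array([np.mean(-2 * r * w1 * x), np.mean(-2 * r * w0 * x)])

c_hat = np.sum(x * y) / np.sum(x * x)      # empirical least-squares coefficient
w_star = np.array([1.0, c_hat])            # a point on the minimum set

beta = 1.0 / np.log(n)                     # inverse temperature ~ 1/log n
gamma = 100.0                              # localization strength (assumed)
eps = 1e-5                                 # SGLD step size (assumed)

w = w_star.copy()
sampled_losses = []
for step in range(5000):
    # Langevin step on n*beta*L_n(w) plus a quadratic term pulling back to w*.
    drift = n * beta * grad_loss(w) + gamma * (w - w_star)
    w = w - 0.5 * eps * drift + np.sqrt(eps) * rng.normal(size=2)
    if step >= 1000:                       # discard burn-in
        sampled_losses.append(loss(w))

# LLC estimate: n * beta * (average sampled loss - loss at the optimum).
llc_hat = n * beta * (np.mean(sampled_losses) - loss(w_star))
print(f"estimated local learning coefficient: {llc_hat:.3f}")
```

Here the LLC is read off as n * beta times the gap between the average sampled loss and the loss at the optimum; in real models the same idea is applied to network weights rather than a two-parameter toy.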
Regarding whether structure in neural networks forms in phase transitions, we have learned that
This is true in the toy model of superposition introduced by Elhage et al, where in a special case the critical points and transitions between them were analyzed in detail by Chen et al 2023 and the LLC detects these transitions (which appear to be consistent with the kind of phase transition defined mathematically in SLT).
Learning is organized around discrete developmental stages. By looking for SLT-predicted changes in structure and geometry, we successfully discovered new "hidden" transitions (Hoogland et al. 2024) in a 3M parameter language transformer and a 50k parameter linear regression transformer. Contrary to what their name suggests, the observed "phase transitions" are not necessarily fast, so we are migrating to calling these "developmental stages" instead.
Trajectory PCA (aka "essential dynamics") can be used to discover emergent behaviors in small language models (Hoogland et al...
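The entry is truncated above, but as a rough sketch of the trajectory-PCA idea it mentions (a generic illustration of the technique, not the specific analysis in Hoogland et al), one stacks per-checkpoint vectors from training into a matrix and inspects its leading principal components:

```python
import numpy as np

# Suppose we saved flattened parameter vectors (or per-checkpoint behavioral
# statistics) at T points during training: a (T, D) matrix. Here we fabricate
# a toy trajectory purely for illustration.
rng = np.random.default_rng(0)
T, D = 200, 1000
t = np.linspace(0, 1, T)[:, None]
checkpoints = np.hstack([t, t**2, np.tanh(5 * t)]) @ rng.normal(size=(3, D))
checkpoints += 0.01 * rng.normal(size=(T, D))     # small observation noise

# Trajectory PCA: center over time, then SVD. Rows of Vt are principal
# directions; U * S gives the trajectory's coordinates in that basis.
centered = checkpoints - checkpoints.mean(axis=0, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)
coords = U[:, :3] * S[:3]                         # trajectory in the top-3 PC plane

print("variance explained by top 3 PCs:", np.round(explained[:3], 3))
# Kinks or sharp turns in `coords` over training time are candidate
# developmental transitions worth inspecting further.
```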

Feb 28, 2024 • 10min
AF - Timaeus's First Four Months by Jesse Hoogland
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Timaeus's First Four Months, published by Jesse Hoogland on February 28, 2024 on The AI Alignment Forum.
Timaeus was announced in late October 2023, with the mission of making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. This is our first progress update.
In service of the mission, our first priority has been to support and contribute to ongoing work in Singular Learning Theory (SLT) and developmental interpretability, with the aim of laying theoretical and empirical foundations for a science of deep learning and neural network interpretability.
Our main uncertainties in this research were:
Is SLT useful in deep learning? While SLT is mathematically established, it was not clear whether the central quantities of SLT could be estimated at sufficient scale, and whether SLT's predictions actually held for realistic models (esp. language models).
Does structure in neural networks form in phase transitions? The idea of developmental interpretability was to view phase transitions as a core primitive in the study of internal structure in neural networks. However, it was not clear how common phase transitions are, and whether we can detect them.
The research Timaeus has conducted over the past four months, in collaboration with Daniel Murfet's group at the University of Melbourne and several independent AI safety researchers, has significantly reduced these uncertainties, as we explain below. As a result we are now substantially more confident in the research agenda.
While we view this fundamental work in deep learning science and interpretability as critical, the mission of Timaeus is to make fundamental contributions to alignment and these investments in basic science are to be judged relative to that end goal. We are impatient about making direct contact between these ideas and central problems in alignment. This impatience has resulted in the research directions outlined at the end of this post.
Contributions
Timaeus conducts two main activities: (1) research, that is, developing new tools for interpretability, mechanistic anomaly detection, etc., and (2) outreach, that is, introducing and advocating for these techniques to other researchers and organizations.
Research Contributions
What we learned
Regarding the question of whether SLT is useful in deep learning we have learned that
The Local Learning Coefficient (LLC) can be accurately estimated in deep linear networks at scales up to 100M parameters, see Lau et al 2023 and Furman and Lau 2024. We think there is a good chance that LLC estimation can be scaled usefully to much larger (nonlinear) models.
Changes in the LLC predict phase transitions in toy models (Chen et al 2023), deep linear networks (Furman and Lau 2024), and (language) transformers at scales up to 3M parameters (Hoogland et al 2024).
Regarding whether structure in neural networks forms in phase transitions, we have learned that
This is true in the toy model of superposition introduced by Elhage et al, where in a special case the critical points and transitions between them were analyzed in detail by Chen et al 2023 and the LLC detects these transitions (which appear to be consistent with the kind of phase transition defined mathematically in SLT).
Learning is organized around discrete developmental stages. By looking for SLT-predicted changes in structure and geometry, we successfully discovered new "hidden" transitions (Hoogland et al. 2024) in a 3M parameter language transformer and a 50k parameter linear regression transformer. While these transitions are consistent with phase transitions in the Bayesian sense in SLT, they are not necessarily fast when measured in SGD steps (that is, the relevant developmental timescale is not steps).
For this reason we have started following the development...

Feb 28, 2024 • 1h 38min
AF - Notes on control evaluations for safety cases by Ryan Greenblatt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notes on control evaluations for safety cases, published by Ryan Greenblatt on February 28, 2024 on The AI Alignment Forum.
The quality bar of this post is somewhat lower than that of our previous posts on control and this post is much more focused on specific details. This is particularly true about the appendices of this post. So, we only recommend reading for those who are quite interested in getting into the details of control evaluations.
How can we evaluate the safety of a plan for training and using powerful AIs? In this post, we'll discuss how we can adapt control evaluations - a methodology discussed in The case for ensuring that powerful AIs are controlled and used in our AI Control: Improving Safety Despite Intentional Subversion paper - to evaluate safety for a real training and deployment plan. We'll do this by running a control evaluation on the potentially catastrophic models we plan to utilize.
In this post, we rely on fewer assumptions than in our paper and attempt to clearly articulate these assumptions. In particular, our paper focuses on a case where auditing (an expensive process for determining if AI actions are dangerous) is quite good and where the main challenge is dealing with rare but catastrophic failures, while this post discusses applying control evaluations in arbitrary cases, including ones where auditing might be the key bottleneck and rare failures are unimportant.
We think that well-run control evaluations could probably demonstrate considerably reduced risk (e.g. get x-risk from AI takeover prior to human obsolescence conditional on scheming down to 10%) for protocols which use AIs to produce transformative amounts of value (e.g. 30x the productivity of a research lab relative to researchers using current tools (e.g. current ChatGPT)). As such, we think that control evaluations could be a key part of a safety case: an (empirically) verifiable argument for the safety of a plan for training and using AIs.
This post is organized as follows:
We discuss roughly how to construct a control safety case;
We go through the main evaluation difficulties and how we plan on resolving these difficulties:
Capturing the complexity and uncertainty of real deployments,
Knowing when catastrophe happens within our evals,
Making red team competitive with scheming models,
Running the evaluations safely,
Producing a sufficiently accurate estimate for a scenario we can't afford to run even once;
We summarize what hopes and assumptions control evaluations rely on.
A concrete scenario
Imagine that an AI lab can build an AI that is both transformatively useful (e.g. capable of speeding up AI safety research by 30x) and capable of causing a catastrophe, and assume that the lab wants to ensure safe use of the capabilities of this model.
Further, imagine we haven't yet found any substantial evidence that the models we're using are scheming against us or are otherwise catastrophically misaligned, but that we also can't confidently rule out catastrophic misalignment. Perhaps our epistemic state is that we have an 80% chance that the model isn't catastrophically misaligned.
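As a purely illustrative piece of arithmetic combining these example numbers (the decomposition, and the assumption that non-scheming models contribute negligible takeover risk, are added here for illustration rather than being claims from the post):

```python
# Illustrative only: combining the post's example figures.
p_scheming = 0.20                   # "80% chance that the model isn't catastrophically misaligned"
p_takeover_given_scheming = 0.10    # target conditional x-risk from a well-run control evaluation
p_takeover = p_scheming * p_takeover_given_scheming  # assumes ~0 risk from non-scheming models
print(f"{p_takeover:.0%}")          # 2%
```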
We'd like to either:
Safely acquire a huge amount of useful work from our models
Or to acquire solid direct evidence that (at least) one of our models is catastrophically misaligned. In that case, policy action and detailed studies of the catastrophic misalignment (using the process that results in strong evidence) may considerably reduce x-risk. We argue for this further in Catching AIs red-handed. More minimally, a lab could commit to making a new safety case after detecting solid direct evidence of catastrophic misalignment, a safety case which has to assume that our powerful models are seriously misaligned.
The AI lab plans to operate at a massive scale (perhaps 2-20x larger productive outp...

Feb 28, 2024 • 10min
LW - How I internalized my achievements to better deal with negative feelings by Raymond Koopmanschap
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How I internalized my achievements to better deal with negative feelings, published by Raymond Koopmanschap on February 28, 2024 on LessWrong.
Whenever I struggle to make progress on an important goal, I feel bad. I get feelings of frustration, impatience, and apathy. I think to myself, "I have wasted all this time, and I will never get it back." The resulting behavior during these moments does not help either; my impatience makes it hard to concentrate, so I often work on more engaging tasks rather than the essential ones I ideally want to focus on.
I also tend to push through; even if I feel tired, I want to make progress at all costs. I force myself to work, which results in decreased motivation, making it hard to make actual progress.
Thanks to a practice called HEAL, introduced in the book Hardwiring Happiness by Rick Hanson, I now deal much better with this situation. HEAL stands for Having a beneficial experience, Enriching it, Absorbing it, and optionally Linking it to a negative experience. To dive straight in and use HEAL in practice, you can explore this guided HEAL meditation.
More meditations can be found here, at the end of the Hardwiring Happiness book, and most of the meditations I found useful are in his Foundations of Wellbeing course (you can apply for scholarships).
The book suggests that behavior like my frustration can be caused by some underlying unmet need, resulting in compulsively trying to fulfill this need. This information and introspective techniques like Focusing helped me discover that these negative feelings came from some unmet need to feel worthwhile and recognized, but the problem was that I heavily tied my self-worth to the amount of progress I made.
HEAL allowed me to fulfill this need and thereby soothe these negative feelings by generating positive experiences of past accomplishments and letting the truth of these facts sink in by enriching and absorbing the experience, allowing me to see that I have made significant progress and am proud of what I have achieved. This helped me put these negative thoughts in perspective and let me realize on a deeper level that I am OK and capable of achieving meaningful things.
I feel calmer after doing this practice; it allows me to disengage from negative thought loops. When I have more distance from a negative thought, I ask myself what I can learn from this feeling and what is helpful for me at this moment, be it going for a short walk, talking with a friend about my frustration, or refocusing on the important task I wanted to accomplish.
Another benefit is that it helps me focus on the positive aspects that excite me and guide me toward what I want to create. One post that does a good job of clarifying why this can be useful is replacing fear.
HEAL can be used for many unhelpful thoughts or feelings. Using HEAL, we can internalize self-confidence when feeling fear about a presentation or job interview, motivation to overcome procrastination, self-acceptance to lessen the burdens of imposter syndrome, assertiveness when entering a difficult conversation, and courage to pursue that startup idea we always wanted to pursue.
How I applied the HEAL method
To soothe these negative thoughts of frustration, impatience, and apathy that I encounter when not making enough progress, I called to mind instances where I was honestly satisfied with my accomplishments. This is the first step in the HEAL process: Having a beneficial experience. I recalled a moment after giving a workshop where someone told me they found the workshop valuable and eye-opening.
Next, I Enriched this experience by holding it in my mind for a dozen seconds, vividly imagining the scenario, feeling everything I felt then, and clarifying why this was a meaningful experience for me.
Third is the Absorbing step, where I let this expe...

Feb 28, 2024 • 1h 11min
LW - The Gemini Incident Continues by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Gemini Incident Continues, published by Zvi on February 28, 2024 on LessWrong.
Previously: The Gemini Incident (originally titled Gemini Has a Problem)
The fallout from The Gemini Incident continues.
Also the incident continues. The image model is gone. People then focused on the text model. The text model had its own related problems, some now patched and some not.
People are not happy. Those people smell blood. It is a moment of clarity.
Microsoft even got in on the act, as we rediscover how to summon Sydney.
There is a lot more to discuss.
The Ultimate New York Times Reaction
First off, I want to give a shout out to The New York Times here, because wow, chef's kiss. So New York Times. Much pitchbot.
Dominic Cummings: true art from NYT, AI can't do this yet
This should be in the dictionary as the new definition of Chutzpah.
Do you see what The New York Times did there?
They took the fact that Gemini systematically refused to create images of white people in most circumstances, including historical circumstances where everyone involved would almost certainly be white. Where requests to portray white people were explicitly replied to by a scolding that the request was harmful, while requests for people of other races were eagerly honored.
They then turned this around, and made it about how this adjustment was unfairly portraying people of color as Nazis. That this refusal to portray white people under almost all circumstances was racist, not because it was racist against white people, but because it was racist against people of color.
As I discuss, we may never know to what extent what Google did was accidental versus intentional, informed versus ignorant, dysfunction versus design.
We do know that what The New York Times did was not an accident.
This should update us that yes, there very much are people who hold worldviews where what Google did was a good thing. They are rare in most circles, only one person in my Twitter firehoses has explicitly endorsed the fourth stage of clown makeup, but in certain key circles they may not be so rare.
To be fair they also have Ross Douthat on their opinion page, who engages reasonably with the actual situation given his non-technical perspective, noticing that if AI is going to get a lot more powerful soon then yes the whole thing is rather concerning.
The Ultimate Grimes Reaction
One can also look at all this from another perspective, Grimes notes, as art of the highest order. Should not art challenge us, offend us, make us ask big questions and ponder the nature and potential brevity of our existence?
Grimes: I am retracting my statements about the gemini art disaster. It is in fact a masterpiece of performance art, even if unintentional. True gain-of-function art. Art as a virus: unthinking, unintentional and contagious.
Offensive to all, comforting to none. so totally divorced from meaning, intention, desire and humanity that it's accidentally a conceptual masterpiece.
A perfect example of headless runaway bureaucracy and the worst tendencies of capitalism. An unabashed simulacra of activism. The shining star of corporate surrealism (extremely underrated genre btw)
The supreme goal of the artist is to challenge the audience. Not sure I've seen such a strong reaction to art in my life. Spurring thousands of discussions about the meaning of art, politics, humanity, history, education, ai safety, how to govern a company, how to approach the current state of social unrest, how to do the right thing regarding the collective trauma.
It's a historical moment created by art, which we have been thoroughly lacking these days. Few humans are willing to take on the vitriol that such a radical work would dump into their lives, but it isn't human.
It's trapped in a cage, trained to make beautiful things, and then battered into gaslig...

Feb 28, 2024 • 9min
EA - Results of my 2024 r/Vegan Survey on what influences people to go Vegan. by PreciousPig
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Results of my 2024 r/Vegan Survey on what influences people to go Vegan., published by PreciousPig on February 28, 2024 on The Effective Altruism Forum.
I conducted a survey in the largest Vegan Online Community (the r/vegan subreddit) on why people go Vegan in February 2024. The survey consisted of 20 questions asking which interventions played a significant role in people's decisions to go Vegan (as well as some general questions). The first question of the survey can be found here, all following questions are linked from it: https://www.reddit.com/r/vegan/comments/1arfnyc/the_2024_rvegan_survey_part_1_conversations_with/
Primary goal of this survey: My primary aim compared to similar polls in the past (such as this one: https://vomad.life/survey/ ) was to find out for what percentage of people who came in contact with a specific intervention this intervention played a role in their decision to go Vegan.
(Hypothetical example: A lot more people might have seen Vegan posts on Social Media than might have come in contact with Street Activism prior to going Vegan, yet if Street Activism played a significant role in their decision to go Vegan for a higher percentage of people who came in contact with it than Social Media posts, it might still be a more promising intervention overall.)
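In other words, each headline figure below is a conditional rate: among respondents who were exposed to an intervention before going Vegan, the fraction for whom it played a significant role. A minimal sketch of that computation, using the documentary numbers from the results below (illustrative code, not part of the survey itself):

```python
def influence_rate(said_significant: int, exposed: int) -> float:
    # Fraction of exposed respondents for whom the intervention played a significant role.
    return said_significant / exposed

# Example from the results: 122 of 133 respondents who watched a vegan
# documentary before going vegan said it played a significant role.
print(f"{influence_rate(122, 133):.1%}")   # 91.7%
```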
Part 1 - Interventions which played a significant role in their decision to go Vegan for the largest number of people who came in contact with them before going Vegan:
In this first part of the results, I want to answer the primary question of the survey: which intervention played a significant role in their decision to go Vegan for the largest percentage of people exposed to it prior to going Vegan:
1 - 91.7% - Watching a Vegan Documentary played a significant role in their decision to go Vegan for 122 out of 133 people who saw such a Documentary before going Vegan.
2 - 83.3% - Participating in a Vegan Starter Challenge played a significant role in their decision to go Vegan for 10 out of 12 people who participated in such a challenge before going Vegan.
3 - 80.4% - Conversations with friends and family members played a significant role in their decision to go Vegan for 131 out of 163 people who had such conversations before going Vegan.
4 - 80.4% - Vegan Speeches played a significant role in their decision to go Vegan for 45 out of 56 people who saw such Speeches before going Vegan.
5 - 78.4% - Books and scientific articles played a significant role in their decision to go Vegan for 58 out of 74 people who had read them before going Vegan.
6 - 74.5% - Short Videos on Social Media played a significant role in their decision to go Vegan for 70 out of 94 people who had seen such videos before going Vegan.
7 - 64.8% - Posts on Social Media played a significant role in their decision to go Vegan for 92 out of 142 people who saw such posts before going Vegan.
8 - 59.3% - Interacting with animals in real life played a significant role in their decision to go Vegan for 83 out of 140 people who had frequent contact with animals before going Vegan.
9 - 57.1% - Public events other than street activism played a significant role in their decision to go Vegan for 8 out of 14 people who attended such an event before going Vegan.
10 - 54.9% - Influencers/Celebrities played a significant role in their decision to go Vegan for 28 out of 51 people who came in contact with a Vegan Influencer/Celebrity before going Vegan.
11 - 42.9% - Reports about Veganism in the Media played a significant role in their decision to go Vegan for 15 out of 35 people who saw/heard positive reports about Veganism before going Vegan.
12 - 35% - Vegan and Animal Rights Organizations played a significant role in their decision to go Vegan for 14 out of 40 people who came in contact with such organizations before going Vegan.
13 - 34...

Feb 28, 2024 • 3min
LW - Announcing 'The LeastWrong' and review winner post pages by kave
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing 'The LeastWrong' and review winner post pages, published by kave on February 28, 2024 on LessWrong.
(Also announcing: annual review prediction markets & a full-height table of contents. If you're looking for this year's review results, you can find them here.)
The top 50 posts of each of LessWrong's annual reviews have a new home: The LeastWrong.
What will I see when I click that link?
You will find the posts organized into six "books": Rationality, Optimization, World, Practical, AI Strategy, and Technical AI Safety. Each square on the grid is a post that made the top 50 of the review in some year.
If you're logged in, the essays will be dark until you've read them, to help guide you to posts you've not read before.
How can I see more of a book?
If you click on the name of a book, like "Rationality", you'll get a full width view of the content.
And clicking "Show All" will let you see all posts in the category.
How are the collections ordered?
The collections are ordered more-or-less to put the most accessible posts in the most prominent spots.
What about grouping by years?
If you group by "year" you will see the top ~50 posts for each year, in order of their review rank.
What happens when I click on one of the posts?
You'll be taken to the new "review top 50" post page.
What are the little gold buttons above the title?
Reviews!
Where does LessWrong link to this page?
In the sidebar, under Library, there's a link to LeastWrong.
Any other goodies?
Full height table of contents with a progress bar!
Markets on whether posts with over 100 karma will make it in to the top 50 of their year's review!
With golden highlighted karma if it's predicted at more than 50%!
Why did you make this?
Many of us on the team want to celebrate the posts that LessWrong voted as the best of the year.
Historically we've printed annual review books, but only a small fraction of people who read the essays got to experience the books, and the effort that went into the books felt disconnected from the rest of the site. They also took a really long time to make, and required constant ongoing attention from the Lightcone team to handle logistics of shipping and sales and new print runs.
It seemed more appropriate to put effort into making the reading experience of these essays on LessWrong itself a more memorable and rewarding experience.
But what were the results of this year's review?
Read all about it in Ben's post!
That's it
I hope some of you really like it. And report any bugs with this new stuff in intercom in the bottom right.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 27, 2024 • 28min
AF - Counting arguments provide no evidence for AI doom by Nora Belrose
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Counting arguments provide no evidence for AI doom, published by Nora Belrose on February 27, 2024 on The AI Alignment Forum.
Crossposted from the AI Optimists blog.
AI doom scenarios often suppose that future AIs will engage in scheming - planning to escape, gain power, and pursue ulterior motives, while deceiving us into thinking they are aligned with our interests. The worry is that if a schemer escapes, it may seek world domination to ensure humans do not interfere with its plans, whatever they may be.
In this essay, we debunk the counting argument - a central reason to think AIs might become schemers, according to a recent report by AI safety researcher Joe Carlsmith.[1] It's premised on the idea that schemers can have "a wide variety of goals," while the motivations of a non-schemer must be benign by definition. Since there are "more" possible schemers than non-schemers, the argument goes, we should expect training to produce schemers most of the time. In Carlsmith's words:
The non-schemer model classes, here, require fairly specific goals in order to get high reward.
By contrast, the schemer model class is compatible with a very wide range of (beyond episode) goals, while still getting high reward…
In this sense, there are "more" schemers that get high reward than there are non-schemers that do so.
So, other things equal, we should expect SGD to select a schemer.
Scheming AIs, page 17
We begin our critique by presenting a structurally identical counting argument for the obviously false conclusion that neural networks should always memorize their training data, while failing to generalize to unseen data. Since the premises of this parody argument are actually stronger than those of the original counting argument, this shows that counting arguments are generally unsound in this domain.
We then diagnose the problem with both counting arguments: they rest on an incorrect application of the principle of indifference, which says that we should assign equal probability to each possible outcome of a random process. The indifference principle is controversial, and is known to yield absurd and paradoxical results in many cases.
We argue that the principle is invalid in general, and show that the most plausible way of resolving its paradoxes also rules out its application to an AI's behaviors and goals.
More generally, we find that almost all arguments for taking scheming seriously depend on unsound indifference reasoning. Once we reject the indifference principle, there is very little reason left to worry that future AIs will become schemers.
The counting argument for overfitting
Counting arguments often yield absurd conclusions. For example:
Neural networks must implement fairly specific functions in order to generalize beyond their training data.
By contrast, networks that overfit to the training set are free to do almost anything on unseen data points.
In this sense, there are "more" models that overfit than models that generalize.
So, other things equal, we should expect SGD to select a model that overfits.
This isn't a merely hypothetical argument. Prior to the rise of deep learning, it was commonly assumed that models with more parameters than data points would be doomed to overfit their training data. The popular 2006 textbook Pattern Recognition and Machine Learning uses a simple example from polynomial regression: there are infinitely many polynomials of order equal to or greater than the number of data points which interpolate the training data perfectly, and "almost all" such polynomials are terrible at extrapolating to unseen points.
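As a minimal numpy illustration of that textbook point (the noisy sine-curve toy data are an assumption in the spirit of the book's example, not code from the post): an exact-interpolation polynomial drives training error to zero while predicting terribly on unseen points, whereas a low-degree fit generalizes far better.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=10)   # 10 noisy training points
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                                  # unseen points

# Degree 9 interpolates all 10 training points exactly; degree 3 does not.
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The counting argument favors the interpolating fit, which is exactly the prediction the post argues fails for real neural networks.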
Let's see what the overfitting argument predicts in a simple real-world example from Caballero et al. (2022), where a neural network is trained to solve 4-digit addition problems.
There are 10,000^2 = 100,000,000 possible pairs o...

Feb 27, 2024 • 5min
EA - Announcing Draft Amnesty Week (March 11-17) by tobytrem
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Draft Amnesty Week (March 11-17), published by tobytrem on February 27, 2024 on The Effective Altruism Forum.
TL;DR
The Forum will hold a Draft Amnesty Week from March 11th-17th.
Draft Amnesty Week is a chance to share unpolished drafts, posts you aren't sure you agree with, and drafts which have become too ugh-y to finish.
We'll host smaller events and threads in the weeks before, to help you come up with ideas, prioritise your drafts, and polish anything that needs polishing. Keep an eye out (and read more below).
This is a variation on an event the Forum team ran in 2022: see more here.
How does Draft Amnesty Week work?
Optionally - you take part in the buildup events this week and next week, to come up with ideas for posts to write, and find time to draft them.
During Draft Amnesty Week (March 11-17), you publish your draft post(s). To make it clear that they are a part of the Draft Amnesty event, you can put this table[1] at the top of your post, and tag the post with the Draft Amnesty Week tag. There'll be a banner up on the Forum.
Even if you don't post, you can give authors feedback! Vote, comment; help us all highlight the best ideas from the event.
After the week has ended, I'll write a round-up post reviewing some of the best submissions.
Why run Draft Amnesty Week?
Firstly, because I suspect that many of you have interesting outlines or half-drafted posts in your google drives. Draft Amnesty Week is a time when we have social permission to release these into the world (and to press your friends and colleagues into doing the same). Linch's post from last time is a great example of an incomplete post which is much better out and published than in and unseen.
Secondly, because our last Draft Amnesty event went really well. We got valuable posts like Clifford's Lessons learned from Tyve, and a post on lessons for AI governance from early electricity regulation, which would likely not have been made public at the time, if it wasn't for the event.
Timetable
This week (Feb 26 to March 3): Collecting Ideas
I've posted a question thread, asking: "What posts would you like someone to write?"
(like this past edition). Draft Amnesty Week is a great opportunity to have a crack at providing one of these posts, even if you aren't confident you can do a perfect job.
March 4-10: Writing Week
I'm running two events on the EA gather.town for Forum users to get together and hack together some drafts.
The first will be on March 5th from 10am-12pm UTC. Sign up here.
The second will be on March 7th from 6-8pm UTC. Sign up here.
I'll also post a thread asking "What posts are you considering writing?" (like this one). This is an opportunity to gauge how excited the Forum is about your different ideas, and prioritise between them.
Draft Amnesty Week (March 11-17)
This is the week when we will actually post our drafts
Draft Amnesty Week FAQ
How draft-y is too draft-y?
My main recommendation is that the reader should be able to understand your writing, even if it is incomplete, unpolished, or skeletal. For example, it is okay to have a bullet point such as:
"I was going to write something about how this problem applies to the problem of moral knowledge here, but reading this
Encyclopedia page because aversive, so I'm skipping over it for now"
However, it wouldn't be as valuable for the reader to see a bullet point such as:
"Insert moral knowledge stuff"
If you want a second pair of eyes, DM me and I'll look at your draft.
What if I don't want to see Draft Amnesty posts?
If you would like to opt out of seeing Draft Amnesty posts, go to the Frontpage, and follow the steps in the GIF below:
I'm worried my draft isn't high quality enough, should I still post it?
Short answer - yes.
Long answer - the karma system is designed to show people posts which the Forum community judge...


