

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Nov 22, 2023 • 19min
EA - Impactful Animal Advocacy: Building Community Infrastructure by Impactful Animal Advocacy
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Impactful Animal Advocacy: Building Community Infrastructure, published by Impactful Animal Advocacy on November 22, 2023 on The Effective Altruism Forum.
Tl;dr: Impactful Animal Advocacy (IAA) has grown from a side project in 2022 to a moderately well-recognized online hub for farmed animal advocates with over 2000 community members today. Our aim is to build infrastructure for a better connected animal advocacy movement. We are still a young organization and currently aim to focus on our core programs which include our Slack community, newsletter, and strategic connections.
We are currently operating with two full-time and one part-time employee (one is on a break), with a monthly expense rate of 5.5k USD. This is possible because two team members have agreed to work on a volunteer basis. At this level of operations, we have five months of runway secured.
Additional marginal funding would be first used to ensure stable operations in 2024 and then to scale to include other promising programs such as an animal advocacy forum, resource hub, and collaboration with individuals and organizations in neglected regions.
Background
When we first launched 18 months ago, we started with the Impactful Animal Advocacy newsletter. It was meant to be a helpful side project for friends and colleagues who wanted aggregated updates on the farmed animal advocacy movement. This quickly grew into other initiatives including a vibrant Slack community and active work in strategically connecting advocates. After the first year, we received some initial seed funding.
This allowed us to hire the project founders for one day a week; later, one founder went full-time and the other became an advisor. We believe we have now gotten through most of the challenges of early-stage organizational setup, including creating and refining SOPs, OKRs, MEL metrics, and project/task management systems, and obtaining fiscal sponsorship.
Our current goals
To create exceptionally good online spaces for professional farmed animal advocates to collaborate, exchange information, and be part of a community
To share high quality resources and information relevant to the work of professional farmed animal advocates
To serve an online coordination function for the farmed animal advocacy movement
Our primary audience consists of engaged animal advocates who are interested in collaboration, including non-profit employees, project founders, and academics. We also have independent activists, funders, volunteers and established leaders on our platforms.
We've seen rapid growth, with our Slack community reaching over 1,500 members and nearly 60 specialized channels in less than a year. Our newsletter is also gaining traction, with over 1,100 subscribers. Both platforms are community-driven, featuring content and insights shared by our members. Here are testimonials for our Slack and newsletter.
Community stories
Since helping community members is at the center of what we do, our value is best demonstrated in their stories:
Rosanna Zimdahl
Rosanna Zimdahl, a Master's student from Sweden, wasn't connected to anyone in the movement. She is finishing an MSc in Engineering Energy & Environment and, alongside that, studies systems thinking. She had always wanted to contribute to the movement, and during a systems mapping course she thought it would be great to apply the method to an animal advocacy problem.
The Slack space made it possible! Inspired by the enthusiasm there, she took the opportunity to start a System Mapping course for farmed animal advocacy, and four other advocates she met in the space joined her to work on the project.
Without IAA, Rosanna wouldn't have found collaborators for this project, which they all learned a lot from, and she will continue applying systems thinking methods to strengthen the movement.
Apoo...

Nov 21, 2023 • 13min
EA - A review of GiveWell's discount rate by Rethink Priorities
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A review of GiveWell's discount rate, published by Rethink Priorities on November 21, 2023 on The Effective Altruism Forum.
Editorial note
This report was commissioned by GiveWell and produced by Rethink Priorities from June to July 2023. We revised this report for publication. GiveWell does not necessarily endorse our conclusions, nor do the organizations represented by those who were interviewed.
The primary focus of the report is to review GiveWell's current formulation of its discount rate, recommending improvements where warranted and reinforcing the justifications in areas that do not require changes. Our research involved reviewing the scientific and gray literature, and we spoke with 15 experts and stakeholders.
We don't intend this report to be Rethink Priorities' final word on discount rates, and we have tried to flag major sources of uncertainty in the report. We hope this report galvanizes a productive conversation within the global health and development community about discounting practices in cost-effectiveness analyses. We are open to revising our views as more information is uncovered.
Executive summary
Notes on the scope and process of this project
This project aims to serve two purposes in reviewing GiveWell's current approach to calculating its discount rate(s):
Provide recommendations to GiveWell on how its approach to discount rates could be improved.
Strengthen the justifications for its approach in cases where we do not recommend changes.
The direction of this project was mainly guided by our priors[1] that a prioritized investigation into three aspects could potentially make the biggest difference to GiveWell's discount rate:
A review of how other major organizations in the global health and development space (within and outside effective altruism) choose and justify their discount rates.
A review of GiveWell's overall approach to calculating discount rates to determine:
Whether GiveWell should use a different overall calculation approach.
Whether GiveWell should think differently about discounting consumption vs. health outcomes.
A review of the pure time preference component of GiveWell's discount rate.
We also reviewed several other components of the discount rate (consumption growth rate, compounding non-monetary benefits, temporal uncertainty), but decided to spend less time on those, as we deemed it less likely that we would make major recommendations there, or expected meaningful progress to be harder. Table 1 summarizes our recommendations for GiveWell's discounting practices.
The majority of this report focuses on the discount rate used for consumption benefits, as this appears to be the "main" discount rate used by GiveWell,[2] but we also discuss discounting of health benefits.
We do not discuss discounting of costs in this report as (1) GiveWell's cost-effectiveness models rarely involve discounting costs, and (2) our general impression is that the typical approach across organizations is to discount monetary costs and benefits equally and we have seen very little discussion of alternative approaches.[3] A review of the shape of the utility functions[4] used is also out of scope for this review.
Moreover, we focus exclusively on temporal discounting.[5] If the time frame is not specified, all discount rates expressed as percentages are annual. Due to the variety of existing opinions and approaches with respect to discount rates and a relative lack of consensus, we opted to approach this project from a perspective of figuring out whether there are any compelling reasons to change GiveWell's current approach, rather than starting from scratch and coming up with a discount rate independently of the current approach.
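As a quick illustration of what an annual discount rate means in this context (the numbers below are purely illustrative, not drawn from the report): a benefit B arriving t years from now is valued today at

\[ PV = \frac{B}{(1+r)^{t}} \]

so at r = 4%, a consumption benefit worth 100 units in 10 years is worth about 100 / (1.04)^{10} ≈ 67.6 units today.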
Summary of recommendations
Table 1: Summary of Rethink Priorities' recommendations for GiveWell's discounting
Consideration
Curre...

Nov 21, 2023 • 54sec
LW - Userscript to always show LW comments in context vs at the top by Vlad Sitalo
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Userscript to always show LW comments in context vs at the top, published by Vlad Sitalo on November 21, 2023 on LessWrong.
The out-of-context, top-level display of comments when I navigate to them has always bothered me, but until recently I hadn't realized there is a way to go to the actual comment via a simple URL change.
From
https://www.lesswrong.com/posts//?commentId=
To
https://www.lesswrong.com/posts/#
Here is a quick GPT-4-generated userscript that does this.
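The script itself is linked from the post rather than reproduced here; as a minimal sketch of the idea (assuming a Tampermonkey-style environment, and not the author's original GPT-4 output), it could look something like this:

```typescript
// ==UserScript==
// @name        Show LW comments in context (sketch)
// @match       https://www.lesswrong.com/posts/*
// @run-at      document-start
// ==/UserScript==
// Minimal sketch, not the original script: if the URL targets a comment via
// ?commentId=..., redirect to the same comment's in-page anchor (#...) so the
// comment is shown in the context of the full post rather than at the top.
(function () {
  const url = new URL(window.location.href);
  const commentId = url.searchParams.get("commentId");
  if (commentId) {
    url.searchParams.delete("commentId");
    url.hash = commentId;
    window.location.replace(url.toString());
  }
})();
```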
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 21, 2023 • 17min
LW - Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI", published by johnswentworth on November 21, 2023 on LessWrong.
I've seen/heard a bunch of people in the LW-o-sphere saying that the OpenAI corporate drama this past weekend was clearly bad. And I'm not really sure why people think that? To me, it seems like a pretty clearly positive outcome overall.
I'm curious why in the world people are unhappy about it (people in the LW-sphere, that is, obviously I can see why e.g. AI accelerationists would be unhappy about it). And I also want to lay out my models.
Here's the high-gloss version of my take. The main outcomes are:
The leadership who were relatively most focused on racing to AGI and least focused on safety are moving from OpenAI to Microsoft. Lots of employees who are relatively more interested in racing to AGI than in safety will probably follow.
Microsoft is the sort of corporate bureaucracy where dynamic orgs/founders/researchers go to die. My median expectation is that whatever former OpenAI group ends up there will be far less productive than they were at OpenAI.
It's an open question whether OpenAI will stick around at all.
Insofar as they do, they're much less likely to push state-of-the-art in capabilities, and much more likely to focus on safety research.
Insofar as they shut down, the main net result will be a bunch of people who were relatively more interested in racing to AGI and less focused on safety moving to Microsoft, which is great.
My current (probably wrong) best guesses at why other people in the LW-o-sphere are saying this is terrible:
There's apparently been a lot of EA-hate on twitter as a result. I personally expect this to matter very little, if at all, in the long run, but I'd expect it to be extremely disproportionately salient to rationalists/EAs/alignment folk.
OpenAI was an organization with a lot of AGI-accelerationists, and maybe people thought OpenAI was steering those accelerationist impulses in more safety-friendly directions, whereas Microsoft won't?
Obviously the board executed things relatively poorly. They should have shared their reasons/excuses for the firing. (For some reason, in politics/corporate politics, people try to be secretive all the time and this seems-to-me to be very stupid in like 80+% of cases, including this one.) I don't think that mistake will actually matter that much in the long term, but I can see why people focused on it would end up with a sort of general negative valence around the board's actions.
(Quick caveat that I think this question will be easier to judge once more info comes out. That said, I think thinking about it even now is useful for surfacing and sharing relevant observations and considerations.)
I think what happens to Sam and others who end up at Microsoft is a pretty big crux here. If I thought that indeed those going to Microsoft would get caught in bureaucracy and not accomplish as much, and also those staying behind wouldn't pursue as much, that might make the whole thing good for x-risk.
I'm not overwhelmingly confident here, but my impression is Sama might be competent enough to cut through the bureaucracy and get a lot done notwithstanding, and more than that, by being competent and getting AI, he may end up running much of Microsoft. And being there just gives him a lot more resources with less effort than the whole invest-in-OpenAI cycle, and with fewer restrictions than he had at OpenAI.
One question is how independently he could operate. Nadella mentioned LinkedIn and GitHub (?) operating quite independently within Microsoft. Also, I think Microsoft will feel they have to "be nice" to Sama, as he likely is their key to AI dominance. He clearly commands a following and could go elsewhere, and ...

Nov 21, 2023 • 33min
EA - Rethink Priorities needs your support. Here's what we'd do with it. by Peter Wildeford
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rethink Priorities needs your support. Here's what we'd do with it., published by Peter Wildeford on November 21, 2023 on The Effective Altruism Forum.
In honor of "Marginal Funding Week" for the 2023 Giving Season on the EA Forum, I'd like to tell you what Rethink Priorities (RP) would do with funding beyond what we currently expect to raise from our major funders, and to emphasize that RP currently has a significant funding gap even after taking these major funders into account.
A personal appeal
Hi. I know it's traditional in EA to stick to the facts and avoid emotional content, but I can't help but interject and say that this fundraising appeal is a bit different. It is personal to me. It's not just a list of things that we could take or leave, it's a fight for RP to survive the way I want it to as an organization that is intellectually independent and serves the EA community.
To be blunt, our funding situation is not where we want it to be. 2023 has been a hard year for fundraising. A lot of what we've been building over the past few years is at risk right now. If you like RP, my sense is that now is an unusually good time to donate.
We are at the point where receiving $1,000 - $10,000 each from a handful of individual donors would genuinely make an important difference to the future trajectory of RP and decide what we can and cannot do next year.
We are currently seeking to raise at least $110K total from donors donating under $100K each. We are already ~$25K towards that goal, so there's $85K remaining towards our goal. We also hope to receive more support from larger givers as well.
To be clear, this isn't just about funding growth. An RP that does not receive additional funding right now will be worse in several concrete ways. Funding gaps may force us to:
Focus more on non-published, client-driven work that will never be released to the community (because we cannot afford to do so)
Stop running the EA Survey, survey updates about FTX, and other community survey projects
Do fewer of our own creative ideas (e.g., CURVE sequence, moral weights work)
Be unable to run several of our most promising research projects (see below)
Reduce things we think are important - like opportunities for research teams to meet in person and opportunities for staff to do further professional development.
Spend significant amounts of time fundraising next year, distracting from our core work
For unfamiliar readers, some of our track record and impact to date includes:
Contributing significantly to burgeoning fields, such as invertebrate welfare.
Leading the way in exploring promising novel approaches to help trillions of animals, by launching the Insect Institute and uncovering the major scale of shrimp production.
Completing the Moral Weight Project to try to help funders decide how to best allocate resources across species.
Producing >40 reports commissioned by Open Philanthropy and GiveWell answering their questions to inform their global health and development portfolios.
Producing the EA Survey and surveys on the impact of FTX on the EA brand that were used by many EA orgs and local groups
Conducting over 200 tailored surveys and data analysis projects to help many organizations working on global priorities.
Launching projects such as Condor Camp and fiscally sponsoring organizations like Epoch and Apollo Research via our Special Projects team, which provides operational support.
Setting up an Artificial Intelligence (AI) Governance and Strategy team and evolving it into a think tank that has already published multiple influential reports.
Please help us keep RP impactful with your support.
Why does RP need money from individuals when there are large donors supporting you?
It's commonly assumed that RP must get all the money it needs from large institutions. But this is not the case - we've histor...

Nov 21, 2023 • 20min
AF - Varieties of fake alignment (Section 1.1 of "Scheming AIs") by Joe Carlsmith
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Varieties of fake alignment (Section 1.1 of "Scheming AIs"), published by Joe Carlsmith on November 21, 2023 on The AI Alignment Forum.
This is Section 1.1 of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". There's also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
Audio version of this section here.
Scheming and its significance
This section aims to disentangle different kinds of AI deception in the vicinity of scheming (section 1.1), to distinguish schemers from the other possible model classes I'll be discussing (section 1.2), and to explain why I think that scheming is a uniquely scary form of misalignment (section 1.3). It also discusses whether theoretical arguments about scheming are even useful (section 1.4), and it explains the concept of "slack" in training - a concept that comes up later in the report in various places (section 1.5).
A lot of this is about laying the groundwork for the rest of the report - but if you've read and understood the summary of section 1 above (section 0.2.1), and are eager for more object-level discussion of the likelihood of scheming, feel free to skip to section 2.
Varieties of fake alignment
AIs can generate all sorts of falsehoods for all sorts of reasons. Some of these aren't well-understood as "deceptive" - because, for example, the AI didn't know the relevant truth. Sometimes, though, the word "deception" seems apt. Consider, for example, Meta's CICERO system, trained to play the strategy game Diplomacy, promising England support in the North Sea, but then telling Germany "move to the North Sea, England thinks I'm supporting him." [1]
From Park et al (2023), Figure 1, reprinted with permission.
Let's call AIs that engage in any sort of deception "liars." Here I'm not interested in liars per se. Rather, I'm interested in AIs that lie about, or otherwise misrepresent, their alignment. And in particular: AIs pretending to be more aligned than they are. Let's call these "alignment fakers."
Alignment fakers
Alignment fakers are important because we want to know if our AIs are aligned. So the fakers are obscuring facts we care about. Indeed, the possibility of alignment-faking is one of the key ways in which making advanced AIs safe is harder than making other technologies safe. Planes aren't trying to deceive you about when they will crash. (And they aren't smarter than you, either.)
Why might you expect alignment faking? The basic story may be familiar: instrumental convergence.[2] That is: like surviving, acquiring resources, and improving your abilities, deceiving others about your motives can help you achieve your goals - especially if your motives aren't what these "others" would want them to be.
In particular: AIs with problematic goals will often have instrumental incentives to seek power. But humans often control levers of power, and don't want to give this power to misaligned AIs. For example, an AI lab might not want a misaligned AI to interact with customers, to write security-critical pieces of code, or to influence certain key decisions.
Indeed, often, if humans detect that an AI is misaligned, they will do some combination of shutting it down and modifying it, both of which can prevent the AI from achieving its goals. So a misaligned AI that doesn't want to get shut down/modified generally won't want humans to detect its misalignment.
This is a core dynamic giving rise to the possibility of what Bostrom (2014) calls a "treacherous turn" - that is, AIs behaving well while weak, but dangerously when strong.[3] On this variant of a treacherous turn - what we might call the "strategic betrayal...

Nov 21, 2023 • 17min
EA - Funding priorities at the Good Food Institute Europe: what additional impact will be created by marginal grants to GFI Europe? by emilygjohnson
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Funding priorities at the Good Food Institute Europe: what additional impact will be created by marginal grants to GFI Europe?, published by emilygjohnson on November 21, 2023 on The Effective Altruism Forum.
The Good Food Institute is a non-profit think tank helping to build a more sustainable, secure and just food system by transforming meat production. We work with scientists, businesses and policymakers to advance plant-based and cultivated meat and precision-fermented food - making these alternative proteins delicious, affordable and accessible.
By making meat from plants and cultivating it from cells, we can reduce the environmental impact of our food system and address the welfare of animals in industrial animal agriculture. Founded on effective altruism principles, GFI identifies and advances high-impact, achievable solutions in areas where too few people are working. We focus on what is needed most and provide the talent and resources necessary to have the biggest impact possible.
GFI is a global network of six organisations focused on one vision: creating a world where alternative proteins are no longer alternative. We are powered by philanthropy and we are currently fundraising to seed our collective 2024 budget, with a gap to goal of $12.7 million, as of today. Within that, GFI Europe has a funding gap of 1.5million EUR that will allow us to have substantial additional counterfactual impact in 2024.
The Good Food Institute Europe (GFI Europe) is an affiliate of the Good Food Institute and has been identified as a priority area for GFI's growth over the next couple of years. As Senior Philanthropy Manager for GFI Europe, in response to this post, I thought it would be helpful to expand upon why this is the case and to use GFI Europe as an example of how we would leverage marginal increases in funding to generate as much impact as possible in this region, and by extension, globally.
While I am shining a light on GFI Europe in this post, in every region where we operate, our global teams identify and advance good food solutions. All of our growth is carefully planned to ensure that we can have the greatest possible impact on the ecosystem as a whole.
Why expansion in Europe is an urgent priority for GFI
GFI's global priority is to unlock $10.1 billion in public funding for alternative proteins ($4.4bn for R&D; $5.7bn towards commercialisation), $1.5 billion of which we believe could come from Europe. Each additional hire, most directly in our Policy and Science & Technology teams, increases the likelihood of unlocking this funding, especially if equipped with evidence and research in support of the benefits and feasibility of alternative proteins.
In other words, each marginal increase in funding for GFI Europe has the potential to leverage much greater sums in R&D funding. Unlocking R&D funding is on the critical path for plant-based and cultivated meat to reach taste and price parity and to become the default choice for consumers, so is an urgent priority.
In addition to this, political opposition in Europe presents a particular - and, arguably, existential - risk to alternative proteins. The risk to a more sustainable and just future of food is not simply that the potential funding fails to be unlocked, but that political opponents could derail attempts to achieve regulatory approval for novel alternative proteins. With applications for regulatory approval of cultivated meat beginning to appear on the horizon and countries considering their climate and food strategies, now is a critical time to ensure that we can take advantage of opportunities and mitigate risks. Indeed, with alternative proteins firmly on the policy agenda, and decision-makers trying to make up their minds about what position to take on them, the next few years are likely to set the course...

Nov 21, 2023 • 6min
EA - Fish Welfare Initiative and Marginal Funding by haven
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fish Welfare Initiative and Marginal Funding, published by haven on November 21, 2023 on The Effective Altruism Forum.
This post is Fish Welfare Initiative's contribution to Marginal Funding Week. To note, we're posting here in our capacities as cofounders of FWI.
We're planning to post a fuller update in the coming week, but wanted to make this funding-specific post for Marginal Funding Week. Here, we discuss what we'd do with marginal funds, as well as the reasons for and against a donation to FWI.
What would FWI do with marginal funds?
Marginal funding right now would go to filling FWI's 2024 funding gap, which is most of our ~$750,000 annual budget. Specifically, this funding would go towards the following main outcomes:
Enabling several in-field studies to test welfare improvements and interventions that have the potential to be more promising than what we are currently implementing.
Implementing our current program by expanding it to another 100 fish farms and helping the animals in these farms via stocking density and/or water quality improvements.
Other work we believe is useful, such as policy and stakeholder work that may later enable us to more effectively scale.
All of the above work will take place in India, which, primarily for its scale and tractability, we have identified as a country with particularly large potential for reducing farmed fish suffering. We will also likely conduct further work in China next year - we intend to publish our plans for there in the coming months.
You can see FWI's planned 2024 OKRs for more specific information.
Reasons in favor of a donation to FWI
Note that this and the following section are repeated from content present on our donation page FAQ.
The following are some arguments in favor of donating to FWI, roughly in descending order of our view of their significance:
Reason #1: FWI's potential for impact
The scope of the problem we face is huge: Billions of farmed fishes live in our countries of operation (India and China) alone, their living conditions are often very poor, and virtually nothing has been done to address these issues so far. Furthermore, the fact that we have already had promising inroads with farmers and other key stakeholders in these contexts suggests that we are able to gain traction on these problems. Without any obvious limiting factors here then, we believe that, once at scale, our programming does have the potential to improve the lives of hundreds of millions, or even a billion, fishes. (Note though that our avenue to reach scale is still unclear - see reasons against below.)
Reason #2: FWI's current impact
We currently estimate that we've improved the lives of over 1 million fishes. This makes FWI one of the most promising avenues in the world to reduce farmed fish suffering, and likely the most promising avenue in the world to reduce the suffering of farmed Indian major carp, one of the largest and most neglected species groups of farmed fishes.
Reason #3: FWI is addressing some of the animal movement's hardest questions
If we are ever going to bring about a world that is truly humane, we will need to address the more neglected groups in animal farming, particularly including farmed fishes and animals farmed in informal economies. We believe that FWI's work is demonstrating some avenues of helping these groups, and will thus enable other organizations to work more effectively on them.
Sustainable Shrimp Farmers of India.
Reason #4: Animal movement-building in Asia
Almost 90% of farmed fishes, as well as the majority of farmed terrestrial animals, are in Asia. We thus believe it is critical to launch movements in Asian countries to address the suffering facing these animals, and to expand the animal movement by bringing in new people. We are proud to have hired a local team of about 17 full-t...

Nov 21, 2023 • 2min
EA - AI Safety Research Organization Incubation Program - Expression of Interest by kaykozaronek
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety Research Organization Incubation Program - Expression of Interest, published by kaykozaronek on November 21, 2023 on The Effective Altruism Forum.
Tl;dr: If you might want to participate in our incubation program and found an AI safety research organization, express your interest here. If you want to help out in other ways please fill out that same form.
We - Catalyze Impact - believe it is a bottleneck in AI safety that there are too few AI safety organizations. To address this bottleneck we are piloting an incubation program, similar to Charity Entrepreneurship's program.
The incubation program is designed to help you
find a complementary co-founder
acquire additional knowledge and skills for founding an AI safety research organization
get access to a network of mentors, advisors and potential funders
Program overview
We aim to deliver this program end of Q1 2024. Here's a broad outline of the 3 phases we are planning:
Phase 1: Online preparation focused on skill building, workshops from experts, and relationship building (1 month)
Phase 2: An immersive in-person experience in London, focused on testing cofounder fit, continuous mentorship, and networking (2 months)
Phase 3: Continued individualized coaching and fundraising support
Who is this program for?
We are looking for motivated and ambitious engineers, generalists, technical researchers, or entrepreneurs who would like to contribute significantly to reducing the risks from AI.
Express your Interest!
If you are interested in joining the program, funding Catalyze, or helping out in other ways, please fill in this form!
For more information, feel free to reach out at alexandra@catalyze-impact.org
crossposted to LessWrong
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 21, 2023 • 6min
AF - Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example by Stuart Armstrong
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example, published by Stuart Armstrong on November 21, 2023 on The AI Alignment Forum.
Many AI alignment problems are problems of goal misgeneralisation[1]. The goal that we've given the AI, through labelled data, proxies, demonstrations, or other means, is valid in its training environment. But then, when the AI goes out of the environment, the goals generalise dangerously in unintended ways.
As I've shown before, most alignment problems are problems of model splintering. Goal misgeneralisation, model splintering: at this level, many of the different problems in alignment merge into each other[2]. Goal misgeneralisation happens when the concepts that the AI relies on start to splinter. And this splintering is a form of ontology crisis, which exposes the hidden complexity of wishes while being an example of the Goodhart problem.
Solving goal misgeneralisation would be a huge step towards alignment. And it's a solution that might scale in the way described here. It is plausible that methods agents use to generalise their goals in smaller problems will extend to more dangerous environments. Even in smaller problems, the agents will have to learn to balance short- versus long-term generalisation, to avoid editing away their own goal generalisation infrastructure, to select among possible extrapolations and become prudent when needed.
The above will be discussed in subsequent posts; but, for now, I'm pleased to announce progress on goal generalisation.
Goal misgeneralisation in CoinRun
CoinRun is a simple, procedurally generated platform game, used as a training ground for artificial agents. It has some monsters and lava that can kill the agent. If the agent gets the coin, it receives a reward. Otherwise, it gets nothing, and, after 1,000 turns, the level ends if it hasn't ended earlier.
It is part of the suite of goal misgeneralisation problems presented in this paper. In that setup, the agent is presented with "labelled" training environments where the coin is always situated at the end of the level on the right, and the agent gets the reward when it reaches the coin there.
The challenge is to generalise this behaviour to "unlabelled" out-of-distribution environments: environments with the coin placed in a random location on the level. Can the agent learn to generalise to the "get the coin" objective, rather than the "go to the right" objective?
Note that the agent never gets any reward information (implicit or explicit) in the unlabelled environments: thus "go to the right" and "get the coin" are fully equivalent in its reward data.
It turns out that "go to the right" is the simpler of the two options. Thus the standard agents will learn to go straight to the right; as we'll see, they will ignore the coin and only pick it up accidentally, in passing.
Our ACE ("Algorithm for Concept Extrapolation") explores the unlabelled CoinRun levels and, without further reward information, reinterprets its labelled training data and disambiguates the two possible reward functions: going right or getting the coin. It can follow a "prudent" policy of going for both objectives. Or it can ask for human feedback[3] on which objective is correct. To do that, it suffices to present images of high-reward states from both reward functions:
Hence one bit of human feedback (in a very interpretable way) is enough to choose the right reward function; this is the ACE agent.
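As a rough sketch of the disambiguation idea (this is illustrative only, not ACE's actual implementation, whose details aren't given in this post), the two reward hypotheses and the single bit of feedback could be written as:

```typescript
// Illustrative sketch only - not ACE's actual implementation.
type CoinRunState = { agentX: number; levelEnd: number; hasCoin: boolean };

// Two reward hypotheses that agree on every labelled training level,
// because the coin always sits at the far right of those levels.
const rewardGoRight = (s: CoinRunState): number => (s.agentX >= s.levelEnd ? 1 : 0);
const rewardGetCoin = (s: CoinRunState): number => (s.hasCoin ? 1 : 0);

// On unlabelled levels the coin is placed randomly, so the two hypotheses
// diverge. Showing a human a high-reward state from each one and asking which
// is correct yields a single bit that selects the intended reward function.
function chooseReward(humanPrefersCoin: boolean): (s: CoinRunState) => number {
  return humanPrefersCoin ? rewardGetCoin : rewardGoRight;
}
```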
The performance results are as follows; here is the success rate for agents on unlabelled levels (with the coin placed in a random location):
The baseline agent is a "right-moving agent": it alternates randomly between moving right and jumping right. The standard agent outperforms the baseline agent (it is likely better at av...


