

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Dec 5, 2023 • 11min
EA - An exhaustive list of cosmic threats by JordanStone
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An exhaustive list of cosmic threats, published by JordanStone on December 5, 2023 on The Effective Altruism Forum.
Toby Ord covers 'Asteroids and Comets' and 'Stellar Explosions' in The Precipice. But I thought it would be useful to provide an up-to-date and exhaustive list of all cosmic threats. I'm defining cosmic threat here as any existential risk potentially arising from space. I think this list may be useful for 3 main reasons:
New cosmic threats are discovered frequently. So it's plausible that future cause areas could pop out of this space. I think that keeping an eye on it should help identify areas that may need research. Though it should be noted that some of the risks are totally impossible to protect against at this point (e.g. a rogue planet entering our solar system).
Putting all of the cosmic threats together in one place could reveal that cosmic threats are more important than previously thought, or provide a good intro for someone interested in working in this space.
There is momentum in existential risk reduction from outer space, with great powers (Russia, USA, China, India, Europe) already collaborating on asteroid impact risk. So harnessing that momentum to tackle some more of the risks on this list could be really tractable and may lead to collaboration on other x-risks like AI, biotech and nuclear.
I will list each cosmic threat, provide a brief explanation, and find the best evidence I can to provide severity and probability estimates for each. Enjoy :)
I'll use this format:
Cosmic Threat [Severity of worst case scenario /10] [Probability of that scenario occurring in the next 100 years] Explanation of threat
Explanation of rationale and approach
Severity estimates
For the severity, 10 is the extinction of all intelligent life on Earth, and 0 is a fart in the wind. It was difficult to pin down one number for threats with multiple outcomes (e.g. asteroids have different sizes). So the severity estimates are for the worst-case scenarios for each cosmic threat, and the probability estimate corresponds to that scenario.
Probability estimates
Probabilities are presented as % chance of that scenario occurring in the next 100 years. I have taken probabilities from the literature and converted values to normalise them as a probability of their occurrence within the next 100 years (as a %). This isn't a perfect way to do it, but I prioritised getting a general understanding of their probability, rather than numbers that are hard to imagine. When the severity or likelihood is unclear or not researched well enough, I've written 'unknown'.
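For concreteness, here is one plausible way to do that conversion (a minimal sketch assuming a constant per-year probability and independence between years; the post does not specify the exact method used):

```python
def prob_within_horizon(annual_prob: float, years: int = 100) -> float:
    """Probability of at least one occurrence within `years` years,
    given a constant, independent per-year probability."""
    return 1 - (1 - annual_prob) ** years

# Example: a 1-in-10,000 chance per year corresponds to roughly a 1% chance
# over the next 100 years.
print(f"{prob_within_horizon(1e-4):.2%}")
```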
I'm trying my best to ignore reasoning along the lines of "if it hasn't happened before, then it very likely won't happen ever or is extremely rare" because of the anthropic principle. Our view of past events on Earth is biased towards a world that has allowed humanity to evolve, which likely required a few billion years of stable-ish conditions. So it is likely that we have just been lucky in the past, where no cosmic threats have disturbed Earth's habitability so extremely as to set back life's evolution by billions of years (not even the worst mass extinction ever at the Permian-Triassic boundary did this, as reptiles survived).
An Exhaustive List of Cosmic Threats
Format:
Cosmic Threat [Severity of worst case scenario /10] [Probability of that scenario occurring in the next 100 years] Explanation of threat
Solar flares [4/10] [1%]. Electromagnetic radiation erupts from the surface of the sun. Solar flares occur fairly regularly and cause minor impacts, mainly on communications. A large solar flare has the potential to cause electrical grids to fail, damage satellites, disrupt radio signals, cause increased radiation influx, destroy data storage devices, cause navigation errors, and permanently damage scientific eq...

Dec 5, 2023 • 4min
EA - I donated $35 to offset my carbon footprint for this year by Luke Eure
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I donated $35 to offset my carbon footprint for this year, published by Luke Eure on December 5, 2023 on The Effective Altruism Forum.
This is a cross-post from my blog. I've seen analysis like this previously on the forum, but nothing recently, so I thought it might be useful to share one up-to-date practical exploration of climate offset donations.
I want to start donating annually to offset my carbon footprint. I don't really think of this as a charitable cost - instead it's internalizing my externalities.
This is the first time I am systematically deciding to make an annual donation - I wanted to walk through my thinking in case it's useful for anyone else! This post also serves as pro-Effective Altruism propaganda.
How much carbon do I need to offset?
The average American seems to emit about 15-20T of CO2 per year (source, source, source). I'll assume 20T.
But I travel a lot. A round-trip flight from London to New York emits ~1T of CO2. This year I took 5 international flights - most had multiple legs, so I'll assume I emitted 15T more than the average American.
So let's say I have to offset 35T of CO2 each year.
Where should I donate?
I trust Vox's Future Perfect on stuff like this. They recommend donating to a climate change fund such as the Climate Change Fund from Founder's Pledge.
How much should I donate?
I'll use the top recommended climate charity from Vox's Future Perfect as a benchmark. As of December 2023, this is the Clean Air Task Force.
Founder's Pledge estimates that a donation to CATF can avert 1T of CO2 emissions for $0.10-$1.
So that would put the amount I have to donate to offset all my emissions at $3.50-$35 per year.
I'll be on the safe side and assume I should donate $35.
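As a sanity check, here is the post's arithmetic spelled out (all inputs are the figures quoted above, not independent estimates):

```python
baseline_tons = 20        # assumed average-American footprint, tCO2 per year
extra_flight_tons = 15    # author's estimate for 5 multi-leg international flights
total_tons = baseline_tons + extra_flight_tons     # 35 tCO2 per year

cost_low, cost_high = 0.10, 1.00   # Founder's Pledge estimate for CATF, $ per tCO2 averted

print(f"Donate ${total_tons * cost_low:.2f}-${total_tons * cost_high:.2f} per year")
# -> Donate $3.50-$35.00 per year; the post rounds up to $35.
```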
Conclusion: I just donated $35 to the Climate Fund from Founder's Pledge to offset my yearly carbon footprint. I intend to make this donation annually going forward, and encourage you to as well!
Effective Altruism has been under some heat lately - with the collapse of FTX, and the drama around the OpenAI board ousting Sam Altman.
EA is both a philosophy and a community. I think the above exercise illustrates why both are really good, despite recent drama.
The philosophy of Effective Altruism gave me the intellectual motivation to donate in the first place. And it informs my decision about where to donate: I should not just donate to what feels the best - I should donate where my dollar will have the highest impact in terms of tons of CO2-eq averted.
The community of EA has created institutions (in this case Vox's Future Perfect, and Founder's Pledge) that help me quickly[1] identify a good donation opportunity, and direct my funds effectively. Also, a post on the EA Forum provided extra social motivation to make this donation.
Is this system perfect? No. Perhaps I could have spent more time finding a better charity to donate to. Perhaps I should be doing more in my lifestyle or in political activism to address the problem of climate change.
But I think my actions here are a lot better than they would be if Effective Altruism did not exist[2]. So overall I remain proud of Effective Altruism - both the philosophy and the community.
[1] It only took 1 hour to do the research and decide to donate!
[2] For what it's worth, the philosophy and community of EA were also key motivators in my decision to become vegetarian.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Dec 5, 2023 • 28min
AF - Neural uncertainty estimation for alignment by Charlie Steiner
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Neural uncertainty estimation for alignment, published by Charlie Steiner on December 5, 2023 on The AI Alignment Forum.
Introduction
Suppose you've built some AI model of human values. You input a situation, and it spits out a goodness rating. You might want to ask: "What are the error bars on this goodness rating?" In addition to it just being nice to know error bars, an uncertainty estimate can also be useful inside the AI: guiding active learning[1], correcting for the optimizer's curse[2], or doing out-of-distribution detection[3].
I recently got into the uncertainty estimation literature for neural networks (NNs) for a pet reason: I think it would be useful for alignment to quantify the domain of validity of an AI's latent features. If we point an AI at some concept in its world-model, optimizing for realizations of that concept can go wrong by pushing that concept outside its domain of validity.
But just keep thoughts of alignment in your back pocket for now. This post is primarily a survey of the uncertainty estimation literature, interspersed with my own takes.
The Bayesian neural network picture
The Bayesian NN picture is the great granddaddy of basically every uncertainty estimation method for NNs, so it's appropriate to start here.
The picture is simple. You start with a prior distribution over parameters. Your training data is evidence, and after training on it you get an updated distribution over parameters. Given an input, you calculate a distribution over outputs by propagating the input through the Bayesian neural network.
This would all be very proper and irrelevant ("Sure, let me just update my 2-trillion-dimensional joint distribution over all the parameters of the model"), except for the fact that actually training NNs does kind of work this way. If you use a log likelihood loss and L2 regularization, the parameters that minimize loss will be at the peak of the distribution that a Bayesian NN would have, if your prior on the parameters was a Gaussian[4][5].
This is because of a bridge between the loss landscape and parameter uncertainty. Bayes's rule says P(parameters|dataset) = P(parameters)P(dataset|parameters)/P(dataset). Here P(parameters|dataset) is the posterior distribution you want to estimate, and P(parameters)P(dataset|parameters) is the exponential of the loss[6]. This lends itself to physics metaphors like "the distribution of parameters is a Boltzmann distribution sitting at the bottom of the loss basin."
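Spelled out, the bridge looks like this (assuming, as above, that the loss is a negative log likelihood plus an L2 penalty, which corresponds to a Gaussian prior with precision lambda):

```latex
\mathrm{Loss}(\theta) = -\log P(\mathrm{data} \mid \theta) + \tfrac{\lambda}{2}\lVert\theta\rVert^2
\quad\Longrightarrow\quad
P(\theta \mid \mathrm{data}) \;\propto\; P(\theta)\,P(\mathrm{data} \mid \theta) \;\propto\; e^{-\mathrm{Loss}(\theta)}
```

so the loss-minimizing parameters sit at the mode of the posterior, and the posterior itself is the Boltzmann distribution over the loss landscape.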
Empirically, calculating the uncertainty of a neural net by pretending it's adhering to the Bayesian NN picture works so well that one nice paper on ensemble methods[7] called it "ground truth." Of course to actually compute anything here you have to make approximations, and if you make the quick and dirty approximations (e.g. pretend you can find the shape of the loss basin from the Hessian) you get bad results[8], but people are doing clever things with Monte Carlo methods these days[9], and they find that better approximations to the Bayesian NN calculation get better results.
But doing Monte Carlo traversal of the loss landscape is expensive. For a technique to apply at scale, it must impose only a small multiplier on cost to run the model, and if you want it to become ubiquitous the cost it imposes must be truly tiny.
Ensembles
A quite different approach to uncertainty is ensembles[10]. Just train a dozen-ish models, ask them for their recommendations, and estimate uncertainty from the spread. The dozen-times cost multiplier on everything is steep, but if you're querying the model a lot it's cheaper than Monte Carlo estimation of the loss landscape.
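A minimal sketch of that recipe, with placeholder data and architecture (the post does not prescribe any particular setup):

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, xs, ys, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(xs), ys).backward()
        opt.step()
    return model

xs, ys = torch.randn(256, 16), torch.randn(256, 1)            # stand-in dataset
ensemble = [train(make_model(), xs, ys) for _ in range(12)]   # "a dozen-ish models"

x_query = torch.randn(1, 16)
with torch.no_grad():
    preds = torch.stack([m(x_query) for m in ensemble])
mean, spread = preds.mean(), preds.std()   # the spread is the uncertainty estimate
print(f"prediction {mean:.3f} +/- {spread:.3f}")
```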
Ensembling is theoretically straightforward. You don't need to pretend the model is trained to convergence, you don't need to train specifically for predictive loss, you don't even need...

Dec 5, 2023 • 10min
AF - Some open-source dictionaries and dictionary learning infrastructure by Sam Marks
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some open-source dictionaries and dictionary learning infrastructure, published by Sam Marks on December 5, 2023 on The AI Alignment Forum.
As more people begin work on interpretability projects which incorporate dictionary learning, it will be valuable to have high-quality dictionaries publicly available.[1] To get the ball rolling on this, my collaborator (Aaron Mueller) and I are:
open-sourcing a number of sparse autoencoder dictionaries trained on Pythia-70m MLPs
releasing our repository for training these dictionaries[2].
Let's discuss the dictionaries first, and then the repo.
The dictionaries
The dictionaries can be downloaded from here. See the sections "Downloading our open-source dictionaries" and "Using trained dictionaries" here for information about how to download and use them. If you use these dictionaries in a published paper, we ask that you mention us in the acknowledgements.
We're releasing two sets of dictionaries for EleutherAI's 6-layer pythia-70m-deduped model. The dictionaries in both sets were trained on 512-dimensional MLP output activations (not the MLP hidden layer like Anthropic used), using ~800M tokens from The Pile.
The first set, called 0_8192, consists of dictionaries of size 8192 = 16 × 512. These were trained with an L1 penalty of 1e-3.
The second set, called 1_32768, consists of dictionaries of size 32768 = 64 × 512. These were trained with an L1 penalty of 3e-3.
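For readers new to the setup, here is a generic sketch of the kind of sparse autoencoder described above (512-dimensional activations, a 16 × 512 = 8192-feature dictionary, L1 coefficient 1e-3). This illustrates the technique with stand-in data; it is not code from the released repository:

```python
import torch
import torch.nn as nn

d_act, expansion, l1_coeff = 512, 16, 1e-3
d_dict = expansion * d_act   # 8192 dictionary features

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_dict)
        self.decoder = nn.Linear(d_dict, d_act)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

acts = torch.randn(64, d_act)   # stand-in for MLP output activations
recon, feats = sae(acts)
loss = nn.functional.mse_loss(recon, acts) + l1_coeff * feats.abs().sum(-1).mean()
loss.backward()
opt.step()

l0 = (feats > 0).float().sum(-1).mean()   # average active features per input ("L0")
```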
Here are some statistics. (See our repo's readme for more info on what these statistics mean.)
For dictionaries in the 0_8192 set:

Layer | MSE Loss | L1 loss | L0 | % Alive | % Loss Recovered
0 | 0.056 | 6.132 | 9.951 | 0.998 | 0.984
1 | 0.089 | 6.677 | 44.739 | 0.887 | 0.924
2 | 0.108 | 11.44 | 62.156 | 0.587 | 0.867
3 | 0.135 | 23.773 | 175.303 | 0.588 | 0.902
4 | 0.148 | 27.084 | 174.07 | 0.806 | 0.927
5 | 0.179 | 47.126 | 235.05 | 0.672 | 0.972
For dictionaries in the 1_32768 set:

Layer | MSE Loss | L1 loss | L0 | % Alive | % Loss Recovered
0 | 0.09 | 4.32 | 2.873 | 0.174 | 0.946
1 | 0.13 | 2.798 | 11.256 | 0.159 | 0.768
2 | 0.152 | 6.151 | 16.381 | 0.118 | 0.724
3 | 0.211 | 11.571 | 39.863 | 0.226 | 0.765
4 | 0.222 | 13.665 | 29.235 | 0.19 | 0.816
5 | 0.265 | 26.4 | 43.846 | 0.13 | 0.931
And here are some histograms of feature frequencies.
Overall, I'd guess that these dictionaries are decent, but not amazing.
We trained these dictionaries because we wanted to work on a downstream application of dictionary learning, but lacked the dictionaries. These dictionaries are more than good enough to get us off the ground on our mainline project, but I expect that in not too long we'll come back to train some better dictionaries (which we'll also open source). I think the same is true for other folks: these dictionaries should be sufficient to get started on projects that require dictionaries; and when better dictionaries are available later, you can swap them in for optimal results.
Some miscellaneous notes about these dictionaries (you can find more in the repo).
The L1 penalty for 1_32768 seems to have been too large; only 10-20% of the neurons are alive, and the loss recovered is much worse. That said, we'll remark that after examining features from both sets of dictionaries, the dictionaries from the 1_32768 set seem to have more interpretable features than those from the 0_8192 set (though it's hard to tell).
In particular, we suspect that for 0_8192, the many high-frequency features in the later layers are uninterpretable but help significantly with reconstructing activations, resulting in deceptively good-looking statistics. (See the bullet point below regarding neuron resampling and bimodality.)
As we progress through the layers, the dictionaries tend to get worse along most metrics (except for % loss recovered). This may have to do with the growing scale of the activations themselves as one moves through the layers of pythia models (h/t to Arthur Conmy for raising this hypothesis).
We note that our dictionary fea...

Dec 5, 2023 • 7min
LW - The LessWrong 2022 Review by habryka
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The LessWrong 2022 Review, published by habryka on December 5, 2023 on LessWrong.
The snow is falling, the carols are starting, and we all know it's time for our favorite winter holiday tradition. It's LessWrong review time!
Each year we come together and review posts that are at least one year old. That means for the next two months we are reviewing all posts from 2022.
While our everyday lives are filled with fads and chasing the sweet taste of karma and social approval, the LessWrong review is the time to take a step back and ask ourselves "did this actually help me think better?", "did this actually turn out to be valuable?" and "which things withstood further and extended scrutiny?".
We've done this 4 times so far (2018, 2019, 2020, 2021).
The full technical details of how the Annual Review works are in the final section of this post, but it's basically the same as the past few years. There are three phases:
Preliminary Voting Phase (2 weeks, Dec 4 - 17): We identify posts especially worthy of consideration in the review by casting preliminary votes. Posts with 2 preliminary votes move into the Discussion Phase.
Discussion Phase (4 weeks, Dec 17 - Jan 14): We review and debate posts. Posts that receive at least one written review move to the final voting phase.
Final Voting (2 weeks, Jan 14 - Jan 28): We do a full voting pass, using quadratic voting. The outcome determines the Annual Review results.
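For readers unfamiliar with quadratic voting: the cost of a vote grows with the square of its strength, so spreading votes across posts is cheaper than concentrating them on one. A toy illustration (the point budget below is a placeholder assumption, not a figure from this post):

```python
votes = {"post_a": 4, "post_b": 3, "post_c": 1}   # vote strengths on three posts
cost = sum(v ** 2 for v in votes.values())        # 16 + 9 + 1 = 26 points
budget = 500                                      # hypothetical point budget
print(f"spent {cost} of {budget} points")
```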
For more of the philosophy of the Annual Review, see the previous announcement posts here, here, here, and here.
Getting Started
At the top of any post eligible for the review, you will see this:
These will be your preliminary votes for the 2022 review. Posts need to get at least 2 preliminary votes (positive or negative) in order to move to the next phase of the review.
To start perusing posts, I recommend going to the All 2022 Posts page, or the View Your Past Upvotes page. Note: only users with accounts registered before January 2022 are eligible to vote.
No books this year, sorry folks
For 2018, 2019, and 2020 we printed books of the results of the review. We have sold many thousands of them, I am very proud of them, and many people told me that these are among the favorite things that they own:
Sadly, there won't be a book this year (and also not of the 2021 review). The effort involved in making them is hard to justify with increasing demands from many of our other projects (as well as reduced funding, since if you take into account the 4-5 staff months these cost to make each year, we net lost money on these).
I am thinking about other ways to create an easy to reference artifact that captures the results of this year's and last year's review. I think the minimum I want to do is to create a good ebook and maybe an audible version using our machine narration (or doing human narration). Additional suggestions are welcome.
We are going to be doing a Christmas sale of all of the previous years' books in the next few days, and hopefully before Christmas we will also have a good ebook (and maybe even an audiobook version) available of last year's review results.
How does the review work?
Phase 1: Preliminary Voting
To nominate a post, cast a preliminary vote for it. Eligible voters will see this UI:
If you think a post was an important intellectual contribution, you can cast a vote indicating roughly how important it was. For some rough guidance:
A vote of 1 means "it was good."
A vote of 4 means "it was quite important".
A vote of 9 means it was "a crucial piece of intellectual progress."
You can vote at the top of a post page, or anywhere the post appears in a list (like the All Posts page, or the new View Your Past Upvotes page).
Posts that get at least one positive vote go to the Voting Dashboard, where other users can vote on it. You're encouraged to give at l...

Dec 5, 2023 • 23min
LW - Speaking to Congressional staffers about AI risk by Akash
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Speaking to Congressional staffers about AI risk, published by Akash on December 5, 2023 on LessWrong.
In May and June of 2023, I (Akash) had about 50-70 meetings about AI risks with congressional staffers. I had been meaning to write a post reflecting on the experience and some of my takeaways, and I figured it could be a good topic for a LessWrong dialogue. I saw that hath had offered to do LW dialogues with folks, and I reached out.
In this dialogue, we discuss how I decided to chat with staffers, my initial observations in DC, some context about how Congressional offices work, what my meetings looked like, lessons I learned, and some miscellaneous takes about my experience.
Context
Hey! In your message, you mentioned a few topics that relate to your time in DC.
I figured we should start with your experience talking to congressional offices about AI risk. I'm quite interested in learning more; there don't seem to be many public resources on what that kind of outreach looks like.
How'd that start? What made you want to do that?
In March of 2023, I started working on some AI governance projects at the Center for AI Safety. One of my projects involved helping CAIS respond to a Request for Comments about AI Accountability that was released by the NTIA.
As part of that work, I started thinking a lot about what a good regulatory framework for frontier AI would look like. For instance: if I could set up a licensing regime for frontier AI systems, what would it look like? Where in the US government would it be housed? What information would I want it to assess?
I began to wonder how actual policymakers would react to these ideas. I was also curious to know more about how policymakers were thinking about AI extinction risks and catastrophic risks.
I started asking other folks in AI Governance. The vast majority had not talked to congressional staffers (at all). A few had experience talking to staffers but had not talked to them about AI risk. A lot of people told me that they thought engagement with policymakers was really important but very neglected. And of course, there are downside risks, so you don't want someone doing it poorly.
After consulting something like 10-20 AI governance folks, I asked CAIS if I could go to DC and start talking to congressional offices. The goals were to (a) raise awareness about AI risks, (b) get a better sense of how congressional offices were thinking about AI risks, (c) get a better sense of what kinds of AI-related priorities people at congressional offices had, and (d) get feedback on my NTIA request for comment ideas.
CAIS approved, and I went to DC in May-June 2023. And just to be clear, this wasn't something CAIS told me to do - this was more of an "Akash thing" that CAIS was aware was happening.
Whoa, that's really interesting. A couple random questions:
And of course, there are downside risks, so you don't want someone doing it poorly.
How does one go about doing it non-poorly? How does one learn to interact with policymakers?
Also, what's your background? Did you do policy stuff before this?
Yeah, great question. I'm not sure what the best way to learn is, but here are some things I tried:
Talk to people who have experience interacting with policymakers. Ask them what they say, what they found surprising, what mistakes they made, what downside risks they've noticed, etc.
Read books. I found Master of the Senate and Act of Congress to be especially helpful. I'm currently reading The Devil's Chessboard to better understand the CIA & intelligence agencies, and I'm finding it informative so far.
Do roleplays with policymakers you already know and ask them for blunt feedback.
Get practice in lower-stakes meetings, and use those experiences to iterate.
I hadn't done much policy stuff before this. In college, I wrote for the Harvard Poli...

Dec 5, 2023 • 2min
LW - n of m ring signatures by DanielFilan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: n of m ring signatures, published by DanielFilan on December 5, 2023 on LessWrong.
A normal cryptographic signature associated with a message and a public key lets you prove to the world that it was made by someone with access to the private key associated with the known public key, without revealing that private key. You can read about it on Wikipedia here.
A ring signature associated with a message and a set of public keys lets you prove to the world that it was made by someone with access to the message and one private key associated with one of the public keys in the set, but nobody will be able to tell which public key it was. This lets you say something semi-anonymously, which is neat. It's also used in the private cryptocurrency Monero. You can read about them on Wikipedia here.
Here's a thing that would be better than a ring signature: a signature that proved that it was made by a subset of public keys of a certain size. In my head, I was calling this an n of m ring signature for a while. But when I googled "n of m ring signature", nothing came up. It turns out this is because in the literature, it's called a "threshold ring signature", a "k of n ring signature", or a "t of n ring signature" instead. I think perhaps the first paper about it is this one, but I haven't checked very hard.
Anyway: I would like to make it so that when you search for n-of-m ring signatures online, you find a thing telling you that you should instead search for "threshold ring signature". Hence this post.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Dec 4, 2023 • 13min
AF - 2023 Alignment Research Updates from FAR AI by AdamGleave
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023 Alignment Research Updates from FAR AI, published by AdamGleave on December 4, 2023 on The AI Alignment Forum.
TL;DR: FAR AI's science of robustness agenda has found vulnerabilities in superhuman Go systems; our value alignment research has developed more sample-efficient value learning algorithms; and our model evaluation direction has developed a variety of new black-box and white-box evaluation methods.
FAR AI is a non-profit AI safety research institute, working to incubate a diverse portfolio of research agendas. We've been growing rapidly and are excited to share some highlights from our research projects since we were founded just over a year ago. We've also been busy running field-building events and setting up a coworking space - see our overview post for more information on our non-research activities.
Our Mission
We need safety techniques that can provide demonstrable guarantees of the safety of advanced AI systems. Unfortunately, currently deployed alignment methods like Reinforcement Learning from Human Feedback (RLHF) fall short of this standard. Proposals that could provide stronger safety guarantees exist but are in the very early stages of development.
Our mission is to incubate and accelerate these early-stage approaches, so they can be empirically tested and deployed. We focus on research agendas that are too large to be pursued by individual academic or independent researchers but are too early-stage to be of interest to most for-profit organizations.
We take bets on a range of these promising early-stage agendas and then scale up those that prove most successful. Unlike other research organizations that take bets on specific agendas, our structure allows us to both (1) explore a range of agendas and (2) execute them at scale. Our current bets fall into three categories:
Science of Robustness: How does robustness vary with model size? Will superhuman systems be vulnerable to adversarial examples or "jailbreaks" similar to those seen today? And, if so, how can we achieve safety-critical guarantees?
Value Alignment: How can we learn reliable reward functions from human data? Our research focuses on enabling higher bandwidth, more sample-efficient methods for users to communicate preferences for AI systems; and improved methods to enable training with human feedback.
Model Evaluation: How can we evaluate and test the safety-relevant properties of state-of-the-art models? Evaluation can be split into black-box approaches that focus only on externally visible behavior ("model testing"), and white-box approaches that seek to interpret the inner workings ("interpretability"). These approaches are complementary, with black-box approaches less powerful but easier to use than white-box methods, so we pursue research in both areas.
Science of Robustness
No engineered component is indestructible. When designing physical structures, engineers estimate how much stress each component needs to withstand, add an appropriate safety margin, and then choose components with the appropriate tolerance. This enables safe and cost-effective construction: bridges rarely fall down, nor are they over-engineered.
AI components such as LLMs or computer vision classifiers are far from indestructible, being plagued by adversarial examples and vulnerability to distribution shift. Unfortunately, AI currently has no equivalent to the stress calculations of civil engineers.
So far the best approach we have is to guess-and-check: train a model, and then subject it to a battery of tests to determine its capabilities and limitations. But this approach gives little theoretical basis for how to improve systems. And both the training and testing of models are increasingly expensive and labor-intensive (with the cost of foundation model training now rivaling that of the construction o...

Dec 4, 2023 • 10min
AF - What's new at FAR AI by AdamGleave
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's new at FAR AI, published by AdamGleave on December 4, 2023 on The AI Alignment Forum.
Summary
We are FAR AI: an AI safety research incubator and accelerator. Since our inception in July 2022, FAR has grown to a team of 12 full-time staff, produced 13 academic papers, opened the coworking space FAR Labs with 40 active members, and organized field-building events for more than 160 ML researchers.
Our organization consists of three main pillars:
Research. We rapidly explore a range of potential research directions in AI safety, scaling up those that show the greatest promise. Unlike other AI safety labs that take a bet on a single research direction, FAR pursues a diverse portfolio of projects. Our current focus areas are building a science of robustness (e.g. finding vulnerabilities in superhuman Go AIs), finding more effective approaches to value alignment (e.g. training from language feedback), and model evaluation (e.g. inverse scaling and codebook features).
Coworking Space. We run FAR Labs, an AI safety coworking space in Berkeley. The space currently hosts FAR, AI Impacts, SERI MATS, and several independent researchers. We are building a collaborative community space that fosters great work through excellent office space, a warm and intellectually generative culture, and tailored programs and training for members. Applications are open to new users of the space (individuals and organizations).
Field Building. We run workshops, primarily targeted at ML researchers, to help build the field of AI safety research and governance. We co-organized the International Dialogue for AI Safety bringing together prominent scientists from around the globe, culminating in a public statement calling for global action on AI safety research and governance. We will soon be hosting the New Orleans Alignment Workshop in December for over 140 researchers to learn about AI safety and find collaborators.
We want to expand, so if you're excited by the work we do, consider donating or working for us! We're hiring research engineers, research scientists and communications specialists.
Incubating & Accelerating AI Safety Research
Our main goal is to explore new AI safety research directions, scaling up those that show the greatest promise. We select agendas that are too large to be pursued by individual academic or independent researchers but are not aligned with the interests of for-profit organizations. Our structure allows us to both (1) explore a portfolio of agendas and (2) execute them at scale. Although we conduct the majority of our work in-house, we frequently pursue collaborations with researchers at other organizations with overlapping research interests.
Our current research falls into three main categories:
Science of Robustness. How does robustness vary with model size? Will superhuman systems be vulnerable to adversarial examples or "jailbreaks" similar to those seen today? And, if so, how can we achieve safety-critical guarantees?
Relevant work: Vulnerabilities in superhuman Go AIs, AI Safety in a World of Vulnerable Machine Learning Systems.
Value Alignment. How can we learn reliable reward functions from human data? Our research focuses on enabling higher bandwidth, more sample-efficient methods for users to communicate preferences for AI systems; and improved methods to enable training with human feedback.
Relevant work: VLM-RM: Specifying Rewards with Natural Language, Training Language Models with Language Feedback.
Model Evaluation: How can we evaluate and test the safety-relevant properties of state-of-the-art models? Evaluation can be split into black-box approaches that focus only on externally visible behavior ("model testing"), and white-box approaches that seek to interpret the inner workings ("interpretability"). These approaches are complementary, with ...

Dec 4, 2023 • 6min
EA - Is the Animal & Vegan Advocacy (AVA) Summit an EA Event? by Julia Reinelt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is the Animal & Vegan Advocacy (AVA) Summit an EA Event?, published by Julia Reinelt on December 4, 2023 on The Effective Altruism Forum.
With the recent announcement from the Centre for Effective Altruism that it is leaning into cause-specific conferences for some of its events, EAs working in farmed animal welfare or alternative protein lost an important event to network, hear about the latest research, and meet EA-minded funders.
Nevertheless, and independent of the rest of this article, I want to take a brief moment to say how impressive it is to see the EA movement growing so much that there now is the need for (large!) specialized conferences.
We often get asked: "Is the AVA Summit an EA event?" and I would like to provide some answers and thoughts about that in this post.
First and foremost, the AVA Summit is an international conference series bringing together advocates focusing on systemic change for animals. The animal and vegan advocacy movement consists of people working on a wide range of strategies and approaches, but we all share a vision of a world where nonhuman animals, ultimately, are taken out of the food system and other human uses.
While the AVA Summit is not branded as an EA event, there are certain "EA lenses" that we frequently use in our assessments and work, such as the core principles around scalability, tractability, and neglectedness:
1. We emphasize topics relating to systemic change, e.g., changing corporate policy or legal protection.
2. We are committed to inspiring our participants to be as impactful as possible in the work they are doing for animals, which also means inspiring them to continuously reconsider, improve, and update.
3. We focus on farmed animals (including aquatic animals and insects), as well as on wild animals, suffering in large numbers.
To be very clear and transparent, I will add to points 1. and 2.:
Additionally, we welcome other tactics and strategies to be discussed in an atmosphere of mutual respect.
Additionally, other areas of animal exploitation, like experimentation, do also have a place at the AVA Summit.
Who attends the AVA Summit?
Our attendees are dedicated individuals working or looking to work in the animal and vegan advocacy movement professionally. 80% of our attendees are full-time advocates.
Lewis Bollard, Farm Animal Welfare Program Director at Open Philanthropy, said about the inaugural AVA Summit that it was "the event with the highest number of serious-minded people in the movement". We strive to achieve this high standard with all of our events.
Our events are also substantially more international than any other similar event: At the last Summit in the US, we welcomed more than 750 attendees from 47 different countries.
People at AVA have a multitude of motivations, including animal welfare, animal rights, environmentalism, social justice, effective altruism, and public health, and they also join from other intersecting movements. A lot of attendees at AVA are EA-aligned without necessarily calling themselves "Effective Altruists".
What are the biggest advantages of going to the AVA Summit?
Having conversations that are directly related to your work in animal advocacy and learning about actionable strategies, as well as benefiting from lessons and collaborations in EA-adjacent movements.
Finding hidden counterfactual talent in a different pool of potential applicants, and networking with a highly diverse and highly international audience.
Meeting with a large number of farmed animal funders, as well as nonprofits. At the last Summit in the US, 215 organizations working directly in the movement were represented (not counting universities, companies, etc.).
What is the program at the AVA Summit like?
Like I mentioned before, we support various strategies and tactics for advancing our shared vision: From grassroots ac...


