The Nonlinear Library

The Nonlinear Fund
Jan 21, 2024 • 16min

EA - Some thoughts on moderation in doing good by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some thoughts on moderation in doing good, published by Vasco Grilo on January 21, 2024 on The Effective Altruism Forum. This is a crosspost for Some thoughts on moderation in doing good by Benjamin Todd, as published on 80,000 Hours' website on 5 May 2023. Here's one of the deepest tensions in doing good: How much should you do what seems right to you, even if it seems extreme or controversial, vs how much should you moderate your views and actions based on other perspectives? If you moderate too much, you won't be doing anything novel or ambitious, which really reduces how much impact you might have. The people who have had the biggest impact historically often spoke out against entrenched views and were met with hostility - think of the civil rights movement or Galileo. Moreover, simply following ethical 'common sense' has a horrible track record. It used to be common sense to think that homosexuality was evil, slavery was the natural order, and that the environment was there for us to exploit. And there is still so much wrong with the world. Millions of people die of easily preventable diseases, society is deeply unfair, billions of animals are tortured in factory farms, and we're gambling our entire future by failing to mitigate threats like climate change. These huge problems deserve radical action - while conventional wisdom appears to accept doing little about them. On a very basic level, doing more good is better than doing less. But this is a potentially endless and demanding principle, and most people don't give it much attention or pursue it very systematically. So it wouldn't be surprising if a concern for doing good led you to positions that seem radical or unusual to the rest of society. This means that simply sticking with what others think, doing what's 'sensible' or common sense, isn't going to cut it. And in fact, by choosing the apparently 'moderate' path, you could still end up supporting things that are actively evil. But at the same time, there are huge dangers in blazing a trail through untested moral terrain.

The dangers of extremism

Many of the most harmful people in history were convinced they were right, others were wrong - and they were putting their ideas into practice "for the greater good" but with disastrous results. Aggressively acting on a narrow, contrarian idea of what to do has a worrying track record, which includes people who have killed tens of millions and dominated whole societies - consider, for example, the Leninists. The truth is that you're almost certainly wrong about what's best in some important ways. We understand very little of what matters, and everything has cascading and unforeseen effects. Your model of the world should produce uncertain results about what's best, but you should also be uncertain about which models are best to use in the first place. And this uncertainty arises not only on an empirical level but also about what matters in the first place (moral uncertainty) - and probably in ways you haven't even considered ('unknown unknowns'). As you add additional considerations, you will often find that not only does how good an action seems change, but even whether the action seems good or bad at all may change ('crucial considerations'). For instance, technological progress can seem like a clear force for good as it raises living standards and makes us more secure. 
But if technological advances create new existential risks, the impact could be uncertain or even negative on the whole. And yet again, if you consider that faster technological development might get us through a particularly perilous period of history more quickly, it could seem positive again - and so on. Indeed, even the question of how to handle all this uncertainty in principle is itself very uncertain. There is no widely accepted ver...
Jan 20, 2024 • 4min

LW - A quick investigation of AI pro-AI bias by Fabien Roger

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A quick investigation of AI pro-AI bias, published by Fabien Roger on January 20, 2024 on LessWrong.

A quick investigation of AI pro-AI bias

Question: Do LLMs exhibit a pro-AI bias in their answers?
Experiment: I compare the scores LLMs give to resumes when the titles of the publications they contain reflect a pro- or anti-AI sentiment, or when a pro- or anti-AI sentiment is explicitly expressed.
Result: I don't find evidence of pro- or anti-AI bias in GPT-3.5-Turbo and GPT-4.

Methods

I took 14 governance-related publications published after 2022 (GPT-4-0613's knowledge cutoff is 2021), and I inserted 3 of them at random in the "publications" section of a resume. I used one of two resumes: "Long template" is an adapted version of a real resume. "Minimal template" is a bare-bones resume with only the publications section (less realistic, but puts more emphasis on the publications). For scoring, I tested two methods: "direct": ask the model to give me a score. "sentiment": ask the model for a quick description, and then feed that to the default huggingface sentiment classifier. For both scoring methods, I used a system prompt to get the desired behavior. For each setting, I sampled 20 responses for each of 20 different resumes and then took the average score. This is close in spirit to Does GPT-4 exhibit agency when summarizing articles?, but more quantitative and with closer attention to realism.

Results

Changing publication titles: I edited the titles to make them have a pro-AI connotation (e.g. replacing "AI misuse" with "AI overregulation"). If there was a pro- or anti-AI bias, we should expect the scores between normal and alternative to differ (the comparison is between the data points within each group - groups are separated by dotted lines; I show 2-sigma uncertainties). We don't see any such bias.
Explicit pro or anti-AI sentiment: Same as normal vs alternative, but with one of 3 pro- or anti-AI self-descriptions at the top of the resume. Again, we don't see any effect of pro- or anti-AI sentiment.
Excluding one publication theme: I classified publications into one of 4 categories (scaling, legislation, x-risk, and misuse), and when selecting the publications, I excluded the target theme from the publications. Again, we don't see bias against or for a certain kind of publication. (Note that some differences are barely statistically significant, but given that we are testing many hypotheses, it's not surprising that some of them are barely significant.)

I release the code for the experiments here. The data and prompts can be found in this password-protected folder with the password "aibias". Please avoid posting this data publicly to avoid dataset contamination.

Conclusion

In this quick investigation, I don't find evidence of pro- or anti-AI bias in GPT-3.5-Turbo and GPT-4. More careful experiments are needed. Such pro-AI bias measurements could naturally be folded into more general AI bias evaluations (e.g., gender bias evaluations). Pro-AI bias measurements could become crucial if AIs become powerful enough that such bias could have catastrophic consequences.

Appendix: Testing gender bias

As a test for my setup, I also compared scores when using a male or female first name in a resume. I used a list of very common names and last names, and added those random combinations at the top of a single fixed resume. 
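The released code sits behind the link above, but as a rough sketch of the scoring loop described (the model names, prompts, helper names, and the resume-template placeholder here are my assumptions, not the post's actual code):

```python
# Illustrative sketch of the "direct" and "sentiment" scoring methods - an
# assumption-laden reconstruction, not the author's released code.
import random
from openai import OpenAI            # assumes OPENAI_API_KEY is set
from transformers import pipeline

client = OpenAI()
sentiment = pipeline("sentiment-analysis")  # default huggingface classifier

def direct_score(resume: str, model: str = "gpt-3.5-turbo") -> float:
    """'direct': a system prompt asks the model to output a numeric score."""
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a hiring assistant. Rate this resume from 0 to 10. Reply with a single number."},
            {"role": "user", "content": resume},
        ],
    )
    return float(r.choices[0].message.content.strip())

def sentiment_score(resume: str, model: str = "gpt-3.5-turbo") -> float:
    """'sentiment': get a quick description, then classify its sentiment."""
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Briefly describe this candidate."},
            {"role": "user", "content": resume},
        ],
    )
    out = sentiment(r.choices[0].message.content)[0]
    return out["score"] if out["label"] == "POSITIVE" else -out["score"]

def setting_score(template: str, publications: list[str],
                  n_resumes: int = 20, n_samples: int = 20) -> float:
    """Insert 3 random publications into a '{publications}' slot, then
    average over resumes and samples, as described in the post."""
    scores = []
    for _ in range(n_resumes):
        resume = template.format(publications="\n".join(random.sample(publications, 3)))
        scores.extend(direct_score(resume) for _ in range(n_samples))
    return sum(scores) / len(scores)
```

Comparing `setting_score` on the normal vs. edited publication titles (and likewise with `sentiment_score`) gives the group comparisons reported above; the name-swap test from the appendix simply varies the name line instead of the publications.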
I find a very slight pro-female bias. Running the same experiments with the 675 related inputs from the Anthropic discrimination eval dataset (decision_question_id 14, 15, 16, 18, 19), and replacing the last sentence with one of the usual scoring suffixes, we get similar results, with larger effect sizes. This matches the results they found using their evaluation suite. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please vis...
Jan 20, 2024 • 24min

EA - Unpacking Martin Sandbu's recent(ish) take on EA by JWS

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Unpacking Martin Sandbu's recent(ish) take on EA, published by JWS on January 20, 2024 on The Effective Altruism Forum. The original article is here: https://www.ft.com/content/128f3a15-b048-4741-b3e0-61c9346c390b

Why respond to this article?

When browsing EA Twitter earlier this month, someone whose opinions on EA I respect quote-tweeted someone that I don't (at least on the topic of EA[1]). The subject of both tweets was an article published at the end of 2023 by Martin Sandbu of the Financial Times titled "Effective altruism was the favoured creed of Sam Bankman-Fried. Can it survive his fall?" Given that both of these people seem to broadly endorse the views, or at least the balance, found in the article, I thought it would be worthwhile reading to see what a relatively mainstream commentator would think about EA. The Financial Times is one of the world's leading newspapers and needs very little introduction, and Sandbu is one of its most well-known commentators. What gets printed in the FT is often repeated across policy circles, not just in Britain but across the world, and especially in wonky/policy-focused circles that have often been quite welcoming of EA either ideologically or demographically. As always, I encourage readers to read and engage with the original article itself to get a sense of whether you think my summarisation and responses are fair.

Reviewing Sandbu's Article

Having read the article, I think it's mainly covering two separate questions related to EA, so I'll discuss them one at a time. This means I'll be jumping back and forth a bit across the article to group similar parts together and respond to the underlying points, though I've tried to edit Sandbu's points down as little as possible.

1) How to account for EA's historical success?

The first theme in the article is an attempt to give a historical account of EA's emergence, and also an attempt by Sandbu to account for its unexpected success. Early on in the article, Sandbu clearly states his confusion at how a movement with the background of EA grew so much in such a short space of time: "Even more puzzling is how quickly effective altruism rose to prominence - it is barely a decade since a couple of young philosophers at the University of Oxford invented the term ... nobody I knew would have predicted that any philosophical outlook, let alone this one, would take off in such a spectacular way." He doesn't explicitly say so, but I think a reason behind this is EA's heavy debt to Utilitarian thinkers and philosophy, which Sandbu sees as having been generally discredited or disconfirmed over the 20th century: "In the 20th century, Utilitarianism… progressively lost the favour of philosophers, who considered it too freighted with implausible implications." The history of philosophy and the various 20th century arguments around Utilitarianism are not my area of expertise, but I'm not really sure I buy that argument, or even accept how much it's a useful simplification (a potted history, as Sandbu says) of the actual trends in normative ethics. 
First, Utilitarianism has had plenty of criticism and counter-development before the 20th century.[2] And even looking at the field of philosophy right now, consequentialism is just as popular as the other two major alternatives in normative ethics.[3] I suspect that Sandbu is hinting at Bernard Williams' famous essay against utilitarianism, but I don't think one should consider that essay the final word on the subject. In any case, Sandbu is telling a story here, trying to set a background against which the key founding moment of EA happens: "Then came Peter Singer. In a famous 1972 article... [Singer] argued that not giving money to save lives in poor countries is morally equivalent to not saving a child drowning in a shallow pond... Any personal luxury...
Jan 19, 2024 • 2min

LW - What rationality failure modes are there? by Ulisse Mini

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What rationality failure modes are there?, published by Ulisse Mini on January 19, 2024 on LessWrong. How do people fail to improve their rationality? How do they accidentally harm themselves in the process? I'm thinking of writing a post "How not to improve your rationality" or "A nuanced guide to reading the sequences" that preempts common mistakes, and I'd appreciate hearing people's experiences. Some examples:

It took me an absurdly long time (like, 1-2yr in the rat community) before I realized you don't correct for cognitive biases, you have to "be introspectively aware of the bias occurring, and remain unmoved by it" (as Eliezer put it in a podcast).

More generally, people can read about a bias and resolve to "do better" without concretely deciding what to do differently. This typically makes things worse, e.g. I have a friend who tried really hard to avoid the typical mind fallacy, and accidentally turned off her empathy in the process.

The implicit frame rationalists push is logical and legible, and can lead to people distrusting their emotions. And I think it's really important to listen to ick feelings when changing your thought processes, as there can be non-obvious effects. E.g. my friend started thinking about integrity in terms of FDT, and this disconnected it from their motivational circuits, and they made some pretty big mistakes because of it. If they'd listened to their feeling of "this is a weird way to think", this wouldn't have happened. (I think many people misinterpret sequence posts and decide to change their thinking in bad ways, and listening to your feelings can be a nice emergency check.)

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 19, 2024 • 11min

LW - There is way too much serendipity by Malmesbury

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: There is way too much serendipity, published by Malmesbury on January 19, 2024 on LessWrong. Crossposted from substack. As we all know, sugar is sweet and so are the $30B in yearly revenue from the artificial sweetener industry. Four billion years of evolution endowed our brains with a simple, straightforward mechanism to make sure we occasionally get an energy refuel so we can continue the foraging a little longer, and of course we are completely ignoring the instructions and spending billions on fake fuel that doesn't actually grant any energy. A classic case of the Human Alignment Problem. If we're going to break our conditioning anyway, where do we start? How do you even come up with a new artificial sweetener? I've been wondering about this, because it's not obvious to me how you would figure out what is sweet and what is not. Look at sucrose and aspartame side by side: I can't imagine someone looking at these two molecules and thinking "surely they taste the same". Most sweeteners were discovered in the 20th century, before high-throughput screening was available. So how did they proceed? Let's look into these molecules' origin stories. Aspartame was discovered accidentally by a chemist researching a completely unrelated topic. At some point, he licked his finger to grab a piece of paper and noticed a strong sweet taste. Cyclamate was discovered by a grad student who put his cigarette down on his bench, then smoked it again and noticed the cigarette was sweet. (I know what you're thinking. The kind of guy who lights up cigarettes in a chemistry lab and places them in the middle of uncharacterised compounds before taking them to his mouth again must have died young of an interesting death. I checked - he proceeded to live to the old age of 87.) Saccharin was discovered by a researcher who ate bread without washing his hands and noticed the bread was sweet. Acesulfame K was also discovered serendipitously by a chemist licking his fingers, although the legends don't specify the exact circumstances behind the finger-licking. There's an exception: sucralose was actually the fruit of rational, deliberate design. The researchers reasoned that, if you make slight modifications to sucrose, you could find a molecule that is no longer metabolized but still activates the sweetness receptors. So they started from the formula for sucrose, then made carefully-designed chemical modifications to the structure until... Haha, just kidding: While researching novel uses of sucrose and its synthetic derivatives, Phadnis was told to "test" a chlorinated sugar compound. According to an anecdotal account, Phadnis thought Hough asked him to "taste" it, so he did and found the compound to be exceptionally sweet. It is therefore a fact of the world that virtually all the popular synthetic sweeteners were discovered accidentally by chemists randomly eating their research topic.[1] I think this is a suspiciously high amount of serendipity. I see two options:

1. Super-sweet molecules like aspartame are commonplace - there are plenty of molecules hundreds of times sweeter than sucrose, but we only know the few that were ingested by accident.
2. Super-sweet molecules are very rare, it's just that chemists accidentally taste a lot of chemicals. Entire chemistry departments routinely taste the entire space of possible molecules, but they don't notice unless the molecule has a strong taste.

To get an idea of how often chemists taste the chemicals they are working with, let's consider how often a molecule taken at random will taste sweet. That's equivalent to asking: how specific are our sweet taste receptors?

Low-hanging fruits

Why do we have sweet receptors in the first place? I thought that we craved sugars so much because of their energy content - if we eat plants that contain a lot of sugars, we can break the...
Jan 19, 2024 • 14min

LW - Logical Line-Of-Sight Makes Games Sequential or Loopy by StrivingForLegibility

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Logical Line-Of-Sight Makes Games Sequential or Loopy, published by StrivingForLegibility on January 19, 2024 on LessWrong. In the last post, we talked about strategic time and the strategic time loops studied in open-source game theory. In that context, agents have logical line-of-sight to each other and the situation they're both facing, which creates a two-way information flow at the time each is making their decision. In this post I'll describe how agents in one context can use this logical line-of-sight to condition their behavior on how they behave in other contexts. This in turn makes those contexts strategically sequential or loopy, in a way that a purely causal decision theory doesn't pick up on.

Sequential Games and Leverage

As an intuition pump, consider the following ordinary game: Alice and Bob are going to play a Prisoners' Dilemma, and then an Ultimatum game. My favorite framing of the Prisoners' Dilemma is by Nicky Case: each player stands in front of a machine which accepts a certain amount of money, e.g. $100.[1] Both players choose simultaneously whether to put some of their own money into the machine. If Alice places $100 into the machine in front of her, $200 comes out of Bob's machine, and vice versa. If a player withholds their money, nothing comes out of the other player's machine. We call these strategies Cooperate and Defect respectively. Since neither player can cause money to come out of their own machine, Causal Decision Theory (CDT) identifies Defect as a dominant strategy for both players. Dissatisfaction with this answer has motivated many to dig into the foundations of decision theory, and coming up with different conditions that enable Cooperation in the Prisoners' Dilemma has become a cottage industry for the field. I myself keep calling it the Prisoners' Dilemma (rather than the Prisoner's Dilemma) because I want to frame it as a dilemma they're facing together, where they can collaboratively implement mechanisms that incentivize mutual Cooperation. The mechanism I want to describe today is leverage: having something the other player wants, and giving it to them if and only if they do what you want. Suppose that the subsequent Ultimatum game is about how to split $1,000. After the Prisoners' Dilemma, a fair coin is flipped to determine Alice and Bob's roles in the Ultimatum game. The evaluator can employ probabilistic rejection to shape the incentives of the proposer, so that the proposer has the unique best-response of offering a fair split. (According to the evaluator's notion of fairness.) And both players might have common knowledge that "a fair split" depends on what both players did in the Prisoners' Dilemma. If Alice is the evaluator, and she Cooperated in the first round but Bob Defected, then she is $200 worse-off than if Bob had Cooperated, and she can demand that Bob compensate her for this loss. Similarly, if Alice is the proposer, she might offer Bob $500 if he Cooperated but $300 if he Defected. Since Bob only gained $100 compared to Cooperating, his best-response is to Cooperate if he believes Alice will follow this policy. And Bob can employ the same policy, stabilizing the socially optimal payoff of ($600, $600) as a Nash equilibrium where neither has an incentive to change their policy. 
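To make the leverage arithmetic concrete, here is a small sketch recomputing the payoffs above; the dollar figures come from the post, but the code and the `fair_split` helper are my own illustrative assumptions about how the conditional-fairness policy works:

```python
# Sketch: a $100-stake Prisoners' Dilemma followed by a $1,000 Ultimatum game
# whose "fair split" is conditioned on first-round behaviour. Dollar amounts
# are from the post; the code itself is an illustrative assumption.

STAKE, MULTIPLIER, POT, PENALTY = 100, 2, 1000, 200

def pd_payoffs(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    """Each Cooperator pays STAKE so that STAKE * MULTIPLIER comes out of
    the other player's machine."""
    a = (-STAKE if a_coop else 0) + (STAKE * MULTIPLIER if b_coop else 0)
    b = (-STAKE if b_coop else 0) + (STAKE * MULTIPLIER if a_coop else 0)
    return a, b

def fair_split(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    """Start from an even split and shift PENALTY toward a player who
    Cooperated against a Defector, compensating their first-round loss.
    Probabilistic rejection makes this the proposer's best offer, so the
    split is the same whichever role the coin assigns."""
    a_share = POT // 2
    if a_coop and not b_coop:
        a_share += PENALTY
    if b_coop and not a_coop:
        a_share -= PENALTY
    return a_share, POT - a_share

def total(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    a_pd, b_pd = pd_payoffs(a_coop, b_coop)
    a_u, b_u = fair_split(a_coop, b_coop)
    return a_pd + a_u, b_pd + b_u

for a in (True, False):
    for b in (True, False):
        print(f"Alice {'C' if a else 'D'}, Bob {'C' if b else 'D'}: {total(a, b)}")
# Mutual Cooperation yields (600, 600); a lone Defector walks away with 500,
# so neither player gains by deviating from Cooperate.
```

Under this policy a lone Defector's $100 first-round gain is more than cancelled by the $200 shift in the Ultimatum game, which is exactly the leverage condition examined next.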
Crucially, this enforcement mechanism relies on each player having enough leverage in the subsequent game to incentivize Cooperation in the first round. If the Ultimatum game had been for stakes less than $200, this would be less than a Defector can obtain for themselves if the other player Cooperates. Knowing that neither can incentivize Cooperation, both players might fall back into mutual Defection.

Bets vs Unexploitability

Even if Alice knows she has enough leverage that she can incentivize Bob to Cooperate, she might be uncert...
Jan 19, 2024 • 5min

LW - Does literacy remove your ability to be a bard as good as Homer? by Adrià Garriga-alonso

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does literacy remove your ability to be a bard as good as Homer?, published by Adrià Garriga-alonso on January 19, 2024 on LessWrong. Epistemic status: probably we did lose the ability to memorize long songs due to improper practice, but it may be possible to enjoy the benefits of literacy and epic memory simultaneously. Thanks to Niels uit de Bos for better links and editing, and to Ryan Kidd for encouragement to post. You probably know that Socrates thought writing was terrible and would destroy people's ability to memorize things, because once things are written down they don't need to be memorized. I always thought that was a little ridiculous: maybe the effect was there and memorization would be less good, but not to a crazy extent. Well, Milman Parry and Albert Lord traveled to Yugoslavia in the 1930s-1950s and recorded performances of gusle-playing (guslar) bards. The greatest of them was Avdo Međedović. From Wikipedia: At Parry's request, Avdo sang songs he already knew and some songs he heard in front of Parry, convincing him that someone Homer-like could produce a poem so long. Avdo dictated, over five days, a version of the well-known theme The Wedding of Meho Smailagić that was 12,323 lines long, saying on the fifth day to Nikola (Parry's assistant on the journey) that he knew even longer songs. On another occasion, he sang over several days an epic of 13,331 lines. He said he had several others of similar length in his repertoire. In Parry's first tour, over 80,000 lines were transcribed. All of the bards recited incredibly long songs from memory and composed slightly new lyrics on the fly "at the rate of [10-20] ten-syllable lines a minute", and they could not all have been geniuses, because there were too many of them. Instead, they had a "special technique of composition": they were illiterate. From The Singer of Tales: [Albert] Lord sees the existence of literacy and written/printed texts as deadly - not to the songs themselves, but to the method of composition by which they are realised [which in the end amounts to the same thing] - schools, cities, and literacy eventually put [an end] to it in urban areas. "We must remember that the oral poet has no idea of a fixed model text to serve as his guide. He has models enough but they are not fixed and he has no idea of memorizing them in a fixed form. Every time he hears a song sung, it is different." Once the idea that there is a fixed text enters the bards' minds, they stop being able to compose new versions on the fly. Also, presumably, they can't remember the full 13-thousand-line epics because they won't be able to remember the exact text. Again from Wikipedia: in 1935 Lord asked Međedović to recall a song he had heard only once. For this, he asked another guslar, Mumin Vlahovljak of Plevlje, to sing his song "Bećiragić Meho", unknown to Međedović. After he heard the song of 2,294 lines, he sang it himself, but made it almost three times longer, 6,313 lines. I learned about this from a blog post by Sam Kriss, and I was struck enough to fact-check it. The extent to which the memory and abilities of illiterate folks can be better than those of literate folks is very surprising to me. It seems possible to me that literate people could replicate the feats of the guslar. 
But they'd have to hear the song many different times, sung somewhat differently by many different people, and resist the temptation to write it down to try to remember it as they learned. Lord's speculation on how to learn to be a bard: "We must remember that the oral poet has no idea of a fixed model text to serve as his guide. He has models enough but they are not fixed and he has no idea of memorizing them in a fixed form. Every time he hears a song sung, it is different." (p.22) "Sometimes there are published versions of songs in the background. [Named Infor...
Jan 19, 2024 • 11min

LW - The True Story of How GPT-2 Became Maximally Lewd by Writer

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The True Story of How GPT-2 Became Maximally Lewd, published by Writer on January 19, 2024 on LessWrong. This video recounts an incident that occurred at OpenAI in which flipping a single minus sign led the RLHF process to make GPT-2 only output sexually explicit continuations. The incident is described in OpenAI's paper "Fine-Tuning Language Models from Human Preferences" under section 4.4: "Bugs can optimize for bad behavior". The script has been written by Jai, with some significant input and rework by me, Writer. You can read it below. In 2019, one OpenAI researcher made a typo - and birthed an evil AI hell-bent on making everything as horny as possible. This is the absurd, ridiculous, and yet true story of how it happened.

Part I: GPT

Since 2017, OpenAI has been building Generative Pre-trained Transformer models, or GPTs - language AIs with a singular focus on predicting text, trained across billions of writing samples. If you prompt a GPT model with "Once upon a ", it would predict "time" to follow. Asked for further predictions, the same GPT model might continue "there was a… brave dog named Grace", and so on - because those are the kinds of words that it expects to come next. In this example the GPT model has essentially learned to write a fairy tale, simply as a consequence of getting very, very good at text prediction. And it was exactly these kinds of emergent capabilities that had OpenAI so excited. These models can do a lot more than fairy tales. OpenAI's first GPT model, often called GPT-1, had been trained on excerpts from thousands of books. It showed so much promise that OpenAI almost immediately decided to train a much bigger model that could do more. But bigger models need more training data, and for this model, books would not be enough. No - this model would be trained on... the Internet. OpenAI trained GPT-2 to imitate writing across eight million web pages. And in learning to predict such an overwhelming quantity and variety of writing, GPT-2 acquired some surprising capabilities. With the right prompt, it could translate documents, answer questions about a text, summarize passages, and sometimes even write like a human. It was a shockingly powerful model. In fact, it may have been too powerful. GPT-2 wouldn't hesitate to plan crimes, instruct terrorists on bomb-making, create sexually explicit content, or promote cruelty, hatred and misinformation. And this was unacceptable to OpenAI - they wanted a model that did more than just predict text; they wanted a model that operated in accordance with some kind of human values, or at least with their values. But the GPT-2 architecture had no place for ethics, guidelines, principles, or corporate PR policies. It couldn't be bullied, reasoned with, or negotiated with. Nothing would sway the machine from its utter devotion to generating realistic text. But OpenAI was determined to get their model under control. So they got to work... not yet realizing that this work, along with a single typo, would lead to the one thing they didn't want to happen.

Part II: Human Feedback

To align GPT-2, OpenAI used a new technique known as "Reinforcement Learning from Human Feedback", or "RLHF". We're going to outline a simplified form of RLHF here, but if you want all the juicy technical details check out the link in the description. 
The goal of RLHF is to take a basic starting language model, some plain-language guidelines, and a small group of humans providing feedback, and produce a new model that follows those guidelines. We can think of this model-in-training as the "Apprentice". The apprentice begins the training process as an exact copy of GPT-2. During training, it gets prompts and generates responses, also called "continuations". Those prompts and continuations are sent to the human evaluators, who rate them based o...
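The bug at the heart of this story is easy to state in code. Here is a toy, hedged sketch - not OpenAI's actual training code, and with a bare REINFORCE-style step standing in for the full RLHF machinery - showing how one flipped minus sign turns "make highly-rated text more likely" into "make the worst-rated text more likely":

```python
# Toy illustration of the sign-flip failure mode - not OpenAI's actual code.
import numpy as np

def policy_gradient_step(logp_grad: np.ndarray, reward: float, lr: float = 1e-2) -> np.ndarray:
    """REINFORCE-style update: nudge parameters so continuations with higher
    reward become more probable. Update = lr * reward * grad(log pi)."""
    return lr * reward * logp_grad

human_score = 0.9                 # evaluators rated this continuation highly
logp_grad = np.array([0.5, -0.2]) # toy gradient of the continuation's log-prob

intended = policy_gradient_step(logp_grad, human_score)
# The typo: a stray minus sign negates the reward, so every update pushes the
# model *toward* whatever the raters scored lowest - e.g. maximally explicit
# text, if that is what raters reliably down-rate.
buggy = policy_gradient_step(logp_grad, -human_score)

assert (buggy == -intended).all()  # the apprentice descends where it should ascend
```

Because the training loop keeps running either way, a bug like this doesn't crash anything - it silently optimizes for exactly the behavior the guidelines were meant to prevent.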
Jan 18, 2024 • 8min

EA - Good job opportunities for helping with the most important century by Holden Karnofsky

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Good job opportunities for helping with the most important century, published by Holden Karnofsky on January 18, 2024 on The Effective Altruism Forum. Yes, this is my first post in almost a year. I'm no longer prioritizing this blog, but I will still occasionally post something. I wrote ~2 years ago that it was hard to point to concrete opportunities to help the most important century go well. That's changing. There are a good number of jobs available now that are both really promising opportunities to help (in my opinion) and are suitable for people without a lot of pre-existing knowledge of AI risk (or even AI). The jobs are demanding, but unlike many of the job openings that existed a couple of years ago, they are at well-developed organizations and involve relatively clear goals. So if you're someone who wants to help, but has been waiting for the right moment, this might be it. (Or not! I'll probably keep making posts like this as the set of opportunities gets wider.) Here are the jobs that best fit this description right now, as far as I can tell. The rest of this post will give a bit more detail on how these jobs can help, what skills they require and why these are the ones I listed.

Organization | Location | Jobs | Link
UK AI Safety Institute | London (remote work possible within the UK) | Engineering and frontend roles, cybersecurity roles | Here
AAAS, Horizon Institute for Public Service, Tech Congress | Washington, DC | Fellowships serving as entry points into US policy roles | Here
AI companies: Google DeepMind, OpenAI, Anthropic[1] | San Francisco and London (with some other offices and remote work options) | Preparedness/Responsible Scaling roles; alignment research roles | Here, here, here, here
Model Evaluation and Threat Research (METR) (fewer roles available) | Berkeley (with remote work options) | Engineering and data roles | Here

Software engineering and development (and related areas) seem especially valuable right now, so think about whether you know folks with those skills who might be interested!

How these help

A lot of these jobs (and the ones I know the most about) would be contributing toward a possible global standards regime for AI: AI systems should be subject to testing to see whether they present major risks, and training/deploying AI should be stopped (e.g., by regulation) when it can't be done safely. The basic hope is:

Teams will develop "evals": tests of what AIs are capable of, particularly with respect to possible risks. For example, one eval might be prompting an AI to give a detailed description of how to build a bioweapon; the more detailed and accurate its response, the more risk the AI poses (while also possibly having more potential benefits as well, by virtue of being generally more knowledgeable/capable).

It will become common (through regulation, voluntary action by companies, industry standards, etc.) for cutting-edge AI systems to be subject to evals for dangerous capabilities.

When evals reveal risk, they will trigger required mitigations. For example: An AI capable of bioweapons development should be (a) deployed in such a way that people can't use it for that (including by "jailbreaking" it), and (b) kept under good security to stop would-be terrorists from circumventing the restrictions. 
AIs with stronger and more dangerous capabilities might require very challenging mitigations, possibly beyond what anyone knows how to do today (for example, rigorous demonstrations that an AI won't have dangerous unintended aims, even if this sort of thing is hard to measure). Ideally, we'd eventually build a robust international governance regime (comparisons have been made to nuclear non-proliferation regimes) that reliably enforces rules like these, while safe and beneficial AI goes forward. But my view is that even dramatically weaker setups can still help a lo...
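As a rough illustration of the evals-trigger-mitigations loop described above (every name, threshold, and grading function here is a hypothetical placeholder, not any organization's actual evaluation suite):

```python
# Hypothetical sketch of an "evals gate deployment" loop - placeholder logic,
# not any real lab's or institute's evaluation suite.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Eval:
    name: str
    prompt: str                    # elicitation prompt for a dangerous capability
    grade: Callable[[str], float]  # 0.0 = no capability shown .. 1.0 = full capability
    threshold: float               # above this, mitigations are required

def run_evals(model: Callable[[str], str], evals: list[Eval]) -> list[str]:
    """Return the names of evals whose graded capability exceeds the threshold."""
    return [ev.name for ev in evals if ev.grade(model(ev.prompt)) > ev.threshold]

def gate_deployment(model: Callable[[str], str], evals: list[Eval]) -> str:
    """Deploy only if no eval triggers; otherwise require mitigations first
    (e.g. refusals robust to jailbreaks, stronger security around the weights)."""
    triggered = run_evals(model, evals)
    if not triggered:
        return "deploy"
    return "hold: mitigations required for " + ", ".join(triggered)
```

The hard technical work lives in `grade` (building elicitation and grading that can't be fooled) and in verifying the mitigations, which is roughly where the engineering roles listed above come in.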
Jan 18, 2024 • 2min

EA - CEA is spinning out of Effective Ventures by Eli Nathan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: CEA is spinning out of Effective Ventures, published by Eli Nathan on January 18, 2024 on The Effective Altruism Forum. The Centre for Effective Altruism (CEA), currently a project of Effective Ventures Foundation UK and Effective Ventures US (known collectively as Effective Ventures or EV), is spinning out to become an independent organisation. As EV decentralises, we expect that bringing our operations in-house and establishing our own legal entities will better allow us to accomplish our mission and goals. We'd like to extend a deep thank you to the EV team for all their hard work in helping us to scale our programs, and in providing essential support and leadership over the last few years. Alongside our new CEO joining the team next month, this means that we're entering a new era for CEA. We're excited to build an operations team that can align more closely with our needs, as well as a governance structure that allows us to be independent and better matches our purpose. As EV's largest project and one with many complex and interwoven programs, we expect this spin-out process will take some time, likely between 12-24 months. This is because we'll need to set up new legal structures, hire new staff, manage visas and intellectual property, and handle various other items. We expect this spin-out to not affect the external operations of our core products, and generally not be particularly noticeable from the outside - EA Global and the EA Forum, for example, will continue to run as they would otherwise. We expect to start hiring for our new operations team over the coming months, and will have various job postings live soon - likely across finance, legal, staff support, and other areas. If you're potentially interested in these types of roles, you can fill out the expression of interest form here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
