The Nonlinear Library

The Nonlinear Fund
Feb 6, 2024 • 8min

AF - what does davidad want from "boundaries"? by Chipmonk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: what does davidad want from "boundaries"?, published by Chipmonk on February 6, 2024 on The AI Alignment Forum.

Chipmonk: As the Conceptual Boundaries Workshop (website) is coming up, and now that we're also planning a Mathematical Boundaries Workshop in April, I want to get more clarity on what exactly it is that you want out of "boundaries"/membranes. So I just want to check: is your goal with boundaries just to formalize a moral thing? I'll summarize what I mean by that:
Claim 1: By "boundaries", you mean "the boundaries around moral patients - namely humans".
Claim 1b: And to some degree also the boundaries around plants and animals. Also maybe nations, institutions, and other things.
Claim 2: If we can just (i) locate the important boundaries in the world, and then (ii) somehow protect them, then this gets at a lot (but not all!) of what the "safety" in "AI safety" should be.
Claim 3: We might actually be able to do that, e.g. Markov blankets are a natural abstraction for (2.i).
Claim 4: Protecting boundaries won't be sufficient for all of "safety", and there are probably also other (non-boundaries) specifications/actions that will be necessary. For example, we would probably also need to separately specify some things that aren't obviously contained by the boundaries we mean, e.g. "clean water", "clean air", and a tractably small set of other desiderata.
Here are my questions for you:
Q1: Do you agree with each of the claims above?
Q2: Is your goal with boundaries just to formalize the moral/safety thing, or is there anything else you want from boundaries?
Past context that's also relevant for readers: This new post I wrote about how preserving the boundaries around agents seems to be a necessary condition for their safety. Quotes you've made about boundaries that I've compiled here. This old post I wrote about boundaries as MVP morality, which you endorsed.
Q3: It seems that Garrabrant, Critch, and maybe others want different things from you, and I'm wondering if you have thoughts about that.
Garrabrant: From talking to him I know that he's thinking about boundaries too, but more about boundaries in the world as instruments to preserve causal locality, predictability, evolution, etc. But this is quite different from talking about specifically the boundaries around agents.
Critch: I haven't spoken to him yet, but I think you once told me that Critch seems to be thinking about boundaries more in terms of ~"just find the 'boundary protocol' and follow it and all cooperation with other agents will be safe". Is this right? If so, this seems closer to what you want, but still kinda different.
TJ: I think TJ has some other ideas that I am currently unable to summarize.

davidad: Claim 1+1b: yes, to first order. [To second order, I expect that the general concept of things with "boundaries" will also be useful for multi-level world-modelling in general, e.g. coarse-graining fluid flow by modelling it in terms of cells that have boundaries on which there is a net flow, and that it might be a good idea to "bake in" something like a concept of boundaries to an AI system's meta-ontology, so that it has more of a tendency to have moral patients among the entities in its object-level ontology. But my mainline intention is for the object-level ontology to be created with humans in the loop, and the identification of entities with boundaries could perhaps just as easily be a layer of interpretation on top of an ontology with a more neutral meta-ontology of causation. Thinking through both routes more is at the frontier of what I consider conceptual "boundaries" research.]

davidad: Claim 2: agreed. Claim 3: agreed. Claim 4: agreed.

davidad: Q2: yes, my ultimate goal with "boundaries" is just to formalise injunctions against doing harm, disrespecting autonomy, or (at the mo...
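For readers who want the formal object behind the Markov-blanket suggestion in Claim 3, the standard graphical-model definition is sketched below (a generic textbook formulation, not davidad's or anyone else's specific "boundaries" formalism): the blanket of a variable is the set of nodes that screens it off from the rest of the network.

```latex
% Markov blanket of a node X in a Bayesian network over variables V:
% its parents, its children, and its children's other parents.
\mathrm{MB}(X) = \mathrm{Pa}(X) \,\cup\, \mathrm{Ch}(X) \,\cup\, \bigl(\mathrm{Pa}(\mathrm{Ch}(X)) \setminus \{X\}\bigr)
% Conditioned on its blanket, X is independent of everything else,
% which is what makes blankets a candidate way to "locate" a bounded entity:
P\bigl(X \mid \mathrm{MB}(X),\; V \setminus (\{X\} \cup \mathrm{MB}(X))\bigr) = P\bigl(X \mid \mathrm{MB}(X)\bigr)
```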
Feb 6, 2024 • 24min

LW - Preventing model exfiltration with upload limits by ryan greenblatt

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Preventing model exfiltration with upload limits, published by ryan greenblatt on February 6, 2024 on LessWrong. At some point in the future, AI developers will need to ensure that when they train sufficiently capable models, the weights of these models do not leave the developer's control. Ensuring that weights are not exfiltrated seems crucial for preventing threat models related to both misalignment and misuse. The challenge of defending model weights has previously been discussed in a RAND report. In this post, I'll discuss a point related to preventing weight exfiltration that I think is important and under-discussed: unlike most other cases where a defender wants to secure data (e.g. emails of dissidents or source code), model weights are very large files. At the most extreme, it might be possible to set a limit on the total amount of data uploaded from your inference servers so that an attacker would be unable to exfiltrate the model weights even if they totally compromised your inference servers, while still being able to serve an API and otherwise run a normal amount of inference. If this ends up being viable, then it would be much easier to protect model weights from competent adversaries because upload limits are relatively simple to enforce. Even if it turns out that such a bandwidth limit isn't feasible, the fact that any attacker will have to control a substantial fraction of upload bandwidth from your inference server might pose a substantial obstacle to exfiltration. In this post: I make some predictions about the ratio between a model's size and the total quantity of data that its inference servers will have to emit over the model lifetime. I conclude that the total quantity of data probably won't be more than a few orders of magnitude larger than the size of the model for an AI lab's most powerful AI. I suggest a variety of strategies to reduce the outflow bandwidth required from inference services. Most importantly, you can use a scheme involving arithmetic coding using a weak model that you are okay with being stolen. In this scheme, the weak model is trained to imitate the strong model. The weak model is present both inside and outside the inference network with the upload limit. While I expect that the sort of proposal I discuss here is well known, there are many specific details I discuss here which I haven't seen discussed elsewhere. If you are reasonably familiar with this sort of proposal, consider just reading the "Summary of key considerations" section which summarizes the specific and somewhat non-obvious points I discuss in this post. This proposal is written as a nearcast focused on SOTA LLMs, though I expect many of the conclusions to generalize. Given how promising this proposal seems, I think that further investigation is warranted. The main source of uncertainty is about the ratio between the number of inference tokens generated and the number of model parameters for the key model we want to protect. There are a variety of improvements which might allow for somewhat reducing total uploads, so pursuing these could be quite leveraged if we end up in a regime where marginal reduction in uploads substantially reduces risk. The viability of this proposal depends substantially on non-public information that AI labs possess, so internal investigation by AI labs will likely be key. 
However, external researchers could investigate compression schemes, other approaches for reducing the total uploads, or mechanisms for very reliably and securely tracking the total amount of data uploaded. I'm excited about further investigation of this idea. Summary of key considerations The total number of generated tokens from a given model might be similar to or smaller than the total number of parameters due to Chinchilla scaling laws. The ratio between gen...
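To make the bandwidth arithmetic described above concrete, here is a minimal sketch of the accounting behind the weak-model arithmetic-coding idea (the vocabulary size and per-token probabilities are illustrative assumptions, not figures from the post): with an ideal entropy coder shared between the inside and outside copies of the weak model, each strong-model token costs roughly −log2 p_weak(token) bits of upload instead of a full token id.

```python
import math

def ideal_arithmetic_coding_bits(weak_model_probs):
    """Idealized upload cost when the encoder (inside the upload-limited
    network) and the decoder (outside) share the same weak model: about
    -log2 p_weak(token) bits per token the strong model actually sampled."""
    return sum(-math.log2(p) for p in weak_model_probs)

# Illustrative stand-in numbers: probabilities the weak model assigns to the
# tokens the strong model sampled (higher = better imitation of the strong model).
probs = [0.6, 0.35, 0.9, 0.07, 0.5, 0.8]

vocab_size = 50_000                              # assumed tokenizer vocabulary
naive_bits = len(probs) * math.log2(vocab_size)  # uploading raw token ids
coded_bits = ideal_arithmetic_coding_bits(probs)

print(f"raw token ids:     {naive_bits:5.1f} bits")
print(f"arithmetic coding: {coded_bits:5.1f} bits")
```

The better the weak model imitates the strong model, the higher those probabilities are on average and the fewer bits ever need to leave the inference servers.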
Feb 6, 2024 • 22min

AF - Preventing exfiltration via upload limits seems promising by Ryan Greenblatt

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Preventing exfiltration via upload limits seems promising, published by Ryan Greenblatt on February 6, 2024 on The AI Alignment Forum. At some point in the future, AI developers will need to ensure that when they train sufficiently capable models, the weights of these models do not leave the developer's control. Ensuring that weights are not exfiltrated seems crucial for preventing threat models related to both misalignment and misuse. The challenge of defending model weights has previously been discussed in a RAND report. In this post, I'll discuss a point related to preventing weight exfiltration that I think is important and under-discussed: unlike most other cases where a defender wants to secure data (e.g. emails of dissidents or source code), model weights are very large files. At the most extreme, it might be possible to set a limit on the total amount of data uploaded from your inference servers so that an attacker would be unable to exfiltrate the model weights even if they totally compromised your inference servers, while still being able to serve an API and otherwise run a normal amount of inference. If this ends up being viable, then it would be much easier to protect model weights from competent adversaries because upload limits are relatively simple to enforce. Even if it turns out that such a bandwidth limit isn't feasible, the fact that any attacker will have to control a substantial fraction of upload bandwidth from your inference server might pose a substantial obstacle to exfiltration. In this post: I make some predictions about the ratio between a model's size and the total quantity of data that its inference servers will have to emit over the model lifetime. I conclude that the total quantity of data probably won't be more than a few orders of magnitude larger than the size of the model for an AI lab's most powerful AI. I suggest a variety of strategies to reduce the outflow bandwidth required from inference services. Most importantly, you can use a scheme involving arithmetic coding using a weak model that you are okay with being stolen. In this scheme, the weak model is trained to imitate the strong model. The weak model is present both inside and outside the inference network with the upload limit. While I expect that the sort of proposal I discuss here is well known, there are many specific details I discuss here which I haven't seen discussed elsewhere. If you are reasonably familiar with this sort of proposal, consider just reading the "Summary of key considerations" section which summarizes the specific and somewhat non-obvious points I discuss in this post. This proposal is written as a nearcast focused on SOTA LLMs, though I expect many of the conclusions to generalize. Given how promising this proposal seems, I think that further investigation is warranted. The main source of uncertainty is about the ratio between the number of inference tokens generated and the number of model parameters for the key model we want to protect. There are a variety of improvements which might allow for somewhat reducing total uploads, so pursuing these could be quite leveraged if we end up in a regime where marginal reduction in uploads substantially reduces risk. The viability of this proposal depends substantially on non-public information that AI labs possess, so internal investigation by AI labs will likely be key. 
However, external researchers could investigate compression schemes, other approaches for reducing the total uploads, or mechanisms for very reliably and securely tracking the total amount of data uploaded. I'm excited about further investigation of this idea. Summary of key considerations The total number of generated tokens from a given model might be similar to or smaller than the total number of parameters due to Chinchilla scaling laws....
Feb 6, 2024 • 7min

LW - Things You're Allowed to Do: University Edition by Saul Munn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Things You're Allowed to Do: University Edition, published by Saul Munn on February 6, 2024 on LessWrong. This post is not titled "Things You Should Do," because these aren't (necessarily) things you should do. Many people should not do many of the items on this list, and some of the items are exclusive, contradictory, or downright the reverse of what you should do. If your reaction to something is "I think that's a bad idea," then it probably is, and you probably shouldn't do it.

classes & professors
attend classes you haven't signed up for because you find them interesting
attend classes even if the waitlist is full
ask the professor to waive a prerequisite
ask the professor to join a class even if it's full
drop a class that you don't like
take a class because you really liked the professor, even if you're not sure about the content of the class
cold email professors you don't know, just asking to chat
show up to office hours for classes you aren't a part of, just to chat with the professor
ask the professor questions about the things you're not sure of
skip class(es) for great opportunities elsewhere
ask the professor if you can help them with anything in the class (grading, setting up assignments, editing papers, etc.). professors have a long list of tasks, are perpetually behind, and encounter fairly correlated problems; if you track what problems your professors have, you can quite quickly become unreasonably useful for them
ask professors at the beginning of the semester what things would be most important to memorize, then throw their answers into an Anki deck
take non-credit courses or workshops in things like pottery, coding, or creative writing
study at places outside of your university: coffeeshops, public libraries, coworking spaces, random offices (cold email them)
start a study group for the class
ask the professor if you can announce in class that you're starting a study group for the class
start a group chat to ask questions about the class. this is one that everyone loves to be added to, and sometimes it just… doesn't happen, because nobody took the initiative to create it
use Anki to study the things your professor said would be most important to memorize after you asked them at the beginning of the semester
learn the content of a class by using materials that the professor doesn't point you toward (e.g. online textbooks/videos/tutors/etc)
hire a tutor
hire multiple tutors
hire a tutor purely so that you have to study for some class you hate - you might not need help, but if you're paying someone $x/h for their time, you'd better be studying
become a tutor in a subject you want to brush up on
use ChatGPT as a tutor
cowork with random people - or with me

clubs
join clubs
join many clubs
join many different types of clubs. shortlist: sports clubs (even intramural), art clubs, research clubs, project-based clubs, religious/cultural clubs, community service clubs, pre-professional clubs, music clubs
show up at a club's meeting that you're not a part of
stop going to a club's meetings
completely stop without telling anyone
tell the club leaders why you're stopping, and what changes would make you stay
tell the club leaders you're considering stopping, and what changes would make you leave or stay
ask if you can help out at the next club event
ask this multiple times in a row
ask what's preventing them from letting you help out yet
start your own club. notably, schools will often throw hundreds or even thousands of dollars of funding at you to start a club with a few friends, and you can do a lot of cool things by saying "hey, I run [x] club, could you [ask]?" (h/t Joey)

career capital
evaluate not just "will this be good for my career" but "is this among the best options given the limited resources (time, money, energy, etc) that i have" - and also "is the...
Feb 6, 2024 • 5min

EA - Introducing the Animal Advocacy Forum - a space for those involved or interested in Animal Welfare & related topics by David van Beveren

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing the Animal Advocacy Forum - a space for those involved or interested in Animal Welfare & related topics, published by David van Beveren on February 6, 2024 on The Effective Altruism Forum.

Summary
Farmed Animal Strategic Team (FAST) is thrilled to announce the launch of our Animal Advocacy Forum, a new platform aimed at increasing discussion and enhancing collaboration within the animal advocacy movement. We invite everyone involved or interested in animal welfare, alternative proteins, animal rights, or related topics to participate, share insights about their initiatives, and discover valuable perspectives. Thank you!

What is FAST?
For more than a decade, FAST has operated as a private Google Group list, connecting 500+ organizations and 1,400+ individuals dedicated to farmed animal welfare. This network includes professionals from pivotal EA-aligned organizations such as Open Philanthropy, Good Food Institute, The Humane League, and Animal Charity Evaluators (ACE), as well as a wide range of smaller and grassroots-based groups.

Why a forum?
In response to feedback from our FAST survey, members expressed a strong interest in deeper discussions and improved collaboration. There was also considerable dissatisfaction with the 'reply-all' feature, which led to unintentional spamming of 1,400 members - as a result, FAST decided to broaden its services to include a forum. While the FAST List continues to serve as a private space within the animal advocacy movement, the FAST Forum is open to the public to foster greater engagement, particularly from those involved in the EA and other closely-aligned movements.

What should be posted there?
Echoing the role of the EA Forum's Animal Welfare topic, which provides a space for organizations to announce initiatives, discuss promising new ideas, and constructively critique ongoing work, FAST's platform serves as a dedicated hub for in-depth discussions on animal advocacy and related topics. It aims to enable nuanced debates and collaboration on key issues such as alternative proteins, grassroots strategy, corporate campaigns, and legal & policy work, among others.

What shouldn't be posted there?
Discussions related to ongoing investigations or internal strategy, especially regarding campaigns or initiatives not yet public, should not be shared on the forum, to safeguard the confidentiality and security of those efforts.

Why not use the EA Forum?
While the EA Forum is a valuable resource for animal advocacy dialogue, the FAST Forum is designed to foster a more focused and close-knit community. The EA Forum's broad spectrum of topics and distinct cultural norms can be intimidating for some, making it challenging for those specifically focused on animal advocacy to find and engage in targeted conversations. This initiative mirrors other communities, such as the AI Alignment Forum, which serve to concentrate expertise and foster discussions in a critically important area. With that in mind, we strongly encourage members to continue sharing key content on the EA Forum for visibility and cross-engagement within the broader EA community.[1]

Where do I start?
Feel free to join us over at the Animal Advocacy Forum and become an active participant in our growing community.[2] To get started, simply register, complete your profile, and start or contribute to discussions that match your interests and expertise. This is also a great opportunity to introduce yourself and share insights about the impactful work you're doing. Thank you! Thank you to the organizations and individuals who have provided invaluable feedback and support for the forum and FAST's rebranding efforts, including Animal Charity Evaluators, Veganuary, ProVeg International, Stray Dog Institute, Animal Think Tank, Freedom Food Alliance, GFI, and the AVA Summit. Also, a big...
Feb 6, 2024 • 25min

EA - A review of how nucleic acid (or DNA) synthesis is currently regulated across the world, and some ideas about reform (summary of and link to Law dissertation) by Isaac Heron

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A review of how nucleic acid (or DNA) synthesis is currently regulated across the world, and some ideas about reform (summary of and link to Law dissertation), published by Isaac Heron on February 6, 2024 on The Effective Altruism Forum. This is a post to share my Law Honours dissertation (link above) about the regulation of nucleic acid synthesis screening at an international level. I have also set out a summary below of the paper for those who do not have time/do not want to read the whole thing. This summary doesn't necessarily reflect the structure and emphasis of the paper, but instead focuses on some of the key insights I identified in my research which those on this forum who have an interest in biosecurity are unlikely to have already seen. I'm hoping to submit this for publication within the next few months, so if anyone has feedback for how to improve it (especially if I have some things wrong about the state of the law in each country I studied) that would also be appreciated. This article was also completed early in October 2023 so parts of it may be out of date, which it would be good to have highlighted (I am aware, for example, that it does not cover President Biden's Executive Order which includes provisions requiring the creation of future guidelines for nucleic acid synthesis screening).

Introduction to the risks from nucleic acid synthesis (NAS)
I won't cover this in much detail since introductions to the topic can be accessed elsewhere on this forum[1] and in the media,[2] except to give the following background: Nucleic acid synthesis (NAS) refers to the creation of DNA or other genetic molecules (e.g. RNA) artificially. This is done by several large companies and dozens of small ones across the world, who then deliver their product to researchers of various kinds. I will refer to NAS rather than DNA synthesis for the rest of this post because I think this broader category is what we really want to cover. The NAS process is currently undergoing rapid change, with new enzymatic techniques[3] potentially making it much cheaper to produce nucleic acids, especially at scale. Desktop synthesisers, which allow users to generate the sequences at their own labs, are also dramatically improving to the point that they may begin to replace the existing model where synthesis is mostly outsourced to private companies.[4] There are several publicly available databases of genetic sequences, which include various pathogenic sequences. The field of synthetic biology, which applies engineering and computer science tools to "programme" biology, may make it easier over time for those who have obtained synthetic nucleic acids to then use these sequences to recreate existing harmful pathogens (e.g. smallpox) or engineer even more dangerous pathogens.[5] Given these risks, it is often argued that NAS companies should screen their orders for potentially harmful sequences and screen their customers to ensure they are trustworthy. Although there are good arguments for why the perceived risk may be overblown, I think this is clearly something they should do. My dissertation focuses on the best way to ensure that this happens.

Current well-known points regarding NAS regulation
What I see as the key components of the international regulatory system for NAS are as follows: The International Gene Synthesis Consortium - a set of large gene synthesis companies which have joined together and agreed to screen their orders according to the Harmonised Screening Protocol. IGSC members agree to screen their orders against a shared database of sequences of concern derived from organisms listed on various official lists of organisms of concern. Two particularly important lists come from the Australia Group and the United States Select Agents and Toxins List. The United States Department o...
Feb 5, 2024 • 9min

LW - Implementing activation steering by Annah

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Implementing activation steering, published by Annah on February 5, 2024 on LessWrong. Produced as part of the SERI ML Alignment Theory Scholars Program - Autumn 2023 Cohort and while being an affiliate at PIBBSS in 2024. A thank you to @Jayjay and @fela for helpful comments on this draft. This blog post is an overview of different ways to implement activation steering, with some of my takes on their pros and cons. See also this GitHub repository for my minimal implementations of the different approaches. The blog post is aimed at people who are new to activation/representation steering/engineering/editing.

General approach
The idea is simple: we just add some vector to the internal model activations and thus influence the model output in a similar (but sometimes more effective) way to prompting. Example[1]: Imagine that some vector in the internal representations in some transformer layer encodes a direction associated with "Love". When you add this vector to the activations of some encoded sentence "I hate the world", you change the internal representation (and thus the meaning) to something more like "I love the world". This graphic might help with an intuition.

In general there are a few steps involved, which I simplify in the following:
1. Decide on a layer l and transformer module ϕ to apply the activation steering to.
2. Define a steering vector. In the simplest case we just take the difference of the activations of two encoded strings, like v = ϕl(Love) − ϕl(Hate).
3. Add the vector to the activation during the forward pass. In the simplest case it's something like ϕ̃l = ϕl + v.

Each of the three points mentioned above includes complexities you might encounter as a beginner. Feel free to move on to the next section if you prefer. You can do activation steering at pretty much any layer/module in the transformer. It's often done at the residual stream of one of the hidden layers. However, if you want to do activation steering by modifying the bias parameter, you need to do it in a transformer module that has a specific structure. This is usually not the residual stream, but one can do it in the attention or the MLP module.

When defining a direction there are several things that might complicate it:
Tokens: When encoding a word or a short sentence, it is often encoded into several tokens, so when you get to the internal activation ϕl(Love) you don't just have one vector but one vector per token. Even worse, 'Love' and 'Hate' might not be encoded with the same number of tokens, so then ϕl(Love) and ϕl(Hate) are two matrices with different dimensions. You can come up with different ways to deal with this, but one simple solution is to just use the representation of the last token, since it should have all information encoded. Careful: if you use batches, you'll likely want to use left padding when choosing the last token, to ensure your last token isn't a padding token.
Data: You can potentially create a more meaningful steering vector, for instance, by averaging several vectors from contrastive pairs (for example "I love the world" - "I hate the world" or "I dislike cats" - "I adore cats"), applying PCA on a relevant dataset, or training a linear classifier and using its weights as the steering direction.

Here are additional factors that may add complexity to the process of activation steering:
Tokens: The question arises of which activations you actually want to add your steering vector to. When you encode some text, you could for example add it at the first token, or the last, or even at every token. I chose to do the latter, adding at every token of the new text. Careful: if you use batches, you might not want to add to padding tokens.
Scaling: In some cases, for example when v = ϕl(Love) − ϕl(Hate), the length of the steering vector already contains some meaningful information. However yo...
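Since the post's repository isn't reproduced here, the following is a minimal sketch of the simplest variant described above: residual-stream steering from a single contrastive pair via a PyTorch forward hook. The choice of GPT-2, layer 6, last-token extraction, and the 4.0 scale factor are arbitrary illustrative assumptions, not the author's settings.

```python
# pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, LAYER = "gpt2", 6                # assumed small causal LM and mid layer
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()
block = model.transformer.h[LAYER]      # GPT-2-style residual-stream block

def last_token_resid(text):
    """Residual-stream activation of the last token at the chosen layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids, output_hidden_states=True).hidden_states[LAYER + 1]
    return hidden[0, -1]                # shape: (d_model,)

# Steering vector from one contrastive pair: v = phi_l("Love") - phi_l("Hate")
v = last_token_resid("Love") - last_token_resid("Hate")

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the residual stream.
    # Add the (scaled) steering vector at every token position.
    return (output[0] + 4.0 * v,) + output[1:]

handle = block.register_forward_hook(steer)
ids = tok("I hate the world", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0]))
handle.remove()                         # restore the unmodified model
```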
Feb 5, 2024 • 5min

LW - Noticing Panic by Cole Wyeth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Noticing Panic, published by Cole Wyeth on February 5, 2024 on LessWrong. In Competent Elites @Eliezer Yudkowsky discusses "executive nature," which he describes as the ability to "function without recourse," to make decisions without some higher decision maker to fall back on. I do not have executive nature. I like to be well prepared. I like to read every word of every chapter of a textbook before a problem set, take a look, give the last chapter another skim, and then start working. Being thrust into a new job and forced to explore a confusing repository and complete an open ended project is... stressful. The idea of founding a startup terrifies me (how could I settle on one business idea to throw years of my life at? And just thinking about the paperwork and legal codes that I would have to learn fills me with dread). There's a particular feeling I get just before freezing up in situations like this. It involves a sinking suspicion that failure is inevitable, a loss of self confidence, and a sort of physical awkwardness or even claustrophobia, like trying to claw my way up the sheer wall of a narrow shaft. In a word, it is panic. I believe that this is, for me, the feeling of insufficient executive nature. It may sound a little discouraging, but the very consistency of this feeling is, perhaps, a key to overcoming my limitations. The rationalist technique that I've found most useful is probably noticing confusion. It is useful because it is a trigger to re-evaluate my beliefs. It is a clue that an active effort must be made to pursue epistemic rigor before instinctively brushing over important evidence. In a way, "noticing confusion" is useful because it links my abstract understanding of Bayes with the physical and emotional experience of being a human. It is not, by itself, a recipe for correct epistemic conduct, but it provides a precious opportunity to take hold of the mind and steer it down a wiser path. Perhaps noticing panic is for planning and acting what noticing confusion is for belief. So how should one act when noticing panic? I do not know yet. But I do have a guess. I think that I panic when there are too many levels of planning between me and an objective. For instance, a simple task like performing a calculation or following a recipe has zero levels of planning. Solving a more difficult problem, for instance a routine proof, might have one level of planning: I do not know how to write down the proof, but I know I can (typically) come up with it by rereading the last couple of sections for definitions and reasoning through their conclusions. Solving a harder problem might require an unknown approach; I might have to consider which background I need to fill in to prepare myself to undertake it, and the correct route to a proof may not be clear; this is of course the third level. At the fourth level, I might not even know how to reason about what background I might need (sticking with mathematical examples, if a very difficult problem remains open for long enough, conventional approaches have all failed, and becoming an expert in any mainstream topic is unlikely to be sufficient - one strategy for succeeding where others have failed is to specialize in a carefully chosen esoteric area which no one else has realized is even related. Of course mathematicians usually only do this by accident). 
I actually suspect that many engineering tasks are pretty high up this hierarchy, which may be one reason I am less comfortable with them than theoretical work. Though much of an engineer's work is routine, roadblocks are often encountered, and after every out-of-the-box solution fails, it's often unclear what to even try next. A lot of mucking about ensues until eventually a hint (like a google-able line in a stack trace) appears, which... leads nowhere. The proc...
Feb 5, 2024 • 18min

EA - Three tricky biases: human bias, existence bias, and happy bias by Steven Rouk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Three tricky biases: human bias, existence bias, and happy bias, published by Steven Rouk on February 5, 2024 on The Effective Altruism Forum.

Introduction
While many types of biases are more commonly known and accounted for, I think there may be three especially tricky biases that influence our thinking about how to do good:
Human Bias: We are all human, which may influence us to systematically devalue nonhuman sentience.
Existence Bias: We all already exist as biologically evolved beings, which may influence us to systematically overvalue the potential future existence of other biologically evolved beings.[1]
Happy Bias: We are relatively happy[2] - or at least we are not actively being tortured or experiencing incapacitating suffering while thinking, writing, and working - which may influence us to systematically undervalue the importance of extreme suffering.[3]

Like other biases, these three influence our thinking and decision making unless we take steps to counteract them. What makes these biases more difficult to counter is the fact that they are universally held by every human working on doing good in the world, and it's difficult to see how anyone thinking about, writing about, and working on issues of well-being and suffering could not have these qualities - there is no group of individuals without these qualities who can advocate for their point of view. The point of this post is not to resolve these questions, but rather to prompt more reflection on these tricky biases and how they may be skewing our thinking and work in specific directions.[4] For those who are already aware of and accounting for these biases, bravo! For the rest of us, I think this topic deserves at least a little thought, and potentially much more than a little, if we wish to increase the accuracy of our worldview. If normal biases are difficult to counteract, these are even more so.

Examples of How These Biases Might Affect Our Work
If we ask ourselves, "How might these three biases affect someone's thinking about how to do good?", some answers we come up with are things that may be present in our EA community thought, work, and allocation of resources.[5] This could indicate that we have not done enough work to counteract these biases in our thinking, which would be a problem if moral intuitions are the hidden guides behind much of our prioritization (as has been suggested[6]). If our moral intuitions about fundamental ethical concepts are being invisibly biased by our being human, existing, and being relatively happy, then our conclusions may be heavily skewed. This is still true for those who use quantitative or probabilistic methods to determine their priorities, since once again moral intuitions are frequently required when setting many different values regarding moral weights, probabilities, etc. When looked at through the lens of moral uncertainty[7], we could say that these biases would skew our weights or assessments of different moral theories in predictable directions. Here are some specific examples of how these biases might show up in our thinking and work. In many cases, there is a bit more information in the footnotes.
Human Bias
Human bias would influence someone to focus the majority of their thinking and writing on human well-being.[8]
Human bias would lead the majority of funding and work to be directed towards predominantly-human cause areas.[9][10]
Human bias would influence someone to set humans as the standard for consciousness and well-being, with other beings falling somewhere below humans in their capacities.
Human bias would influence someone to grant more weight to scenarios where humans either are or become the majority of moral value as a way of rationalizing a disproportionate focus on humans.
Human bias would influence someone to devalue work that focuses ...
Feb 4, 2024 • 13min

EA - AI safety needs to scale, and here's how you can do it by Esben Kran

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI safety needs to scale, and here's how you can do it, published by Esben Kran on February 4, 2024 on The Effective Altruism Forum. AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety (see appendix). This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap? In this post, we aim to outline the for-profit AI safety opportunities within four key domains:
Guaranteeing the public benefit and reliability of AI when deployed.
Enhancing the interpretability and monitoring of AI systems.
Improving the alignment of user intent at the model level.
Developing protections against future AI risk scenarios using AI.
At Apart, we're genuinely excited about what for-profit AI security might achieve. Our experience working alongside EntrepreneurFirst on AI security entrepreneurship hackathons, combined with discussions with founders, advisors, and former venture capitalists, highlights the promise of the field. With that said, let's dive into the details.

Safety in deployment
Problems related to ensuring that deployed AI is reliable, protected against misuse, and safe for users, companies, and nation states. Related fields include dangerous capability evaluation, control, and cybersecurity. Deployment safety is crucial to ensure AI systems function safely and effectively without misuse. Security also meshes well with commercial opportunities, and building capability in this domain can scale strong security teams to solve future safety challenges. If you are interested in non-commercial research, we also suggest looking into governmental research bodies, such as the ones in the UK, EU, and US.

Concrete problems for AI deployment
Enhancing AI application reliability and security: Foundation model applications, from user-facing chatbots utilizing software tools for sub-tasks to complex multi-agent systems, require robust security, such as protection against prompt injection, insecure plugins, and data poisoning. For detailed security considerations, refer to the Open Web Application Security Project's top 10 LLM application security considerations.
Mitigating unwanted model output: With increasing regulations on algorithms, preventing illegal outputs may become paramount, potentially requiring stricter constraints than model alignment.
Preventing malicious use:
For AI API providers: Focus on monitoring for malicious or illegal API use, safeguarding models from competitor access, and implementing zero-trust solutions.
For regulators: Scalable legislative auditing, like model card databases, open-source model monitoring, and technical governance solutions will be pivotal in 2024 and 2025. Compliance with new legislation, akin to GDPR, will likely necessitate extensive auditing and monitoring services.
For deployers: Ensuring data protection, access control, and reliability will make AI more useful, private, and secure for users.
For nation-states: Assurances against nation-state misuse of models, possibly through zero-trust structures and treaty-bound compute usage monitoring.

Examples
Apollo Research: "We intend to develop a holistic and far-ranging model evaluation suite that includes behavioral tests, fine-tuning, and interpretability approaches to detect deception and potentially other misaligned behavior."
Lakera.ai: "Lakera Guard empowers organizations to build GenAI applications without worrying about prompt injections, data loss, harmful content, and other LLM risks. Powered by the world's most advanced AI threat intelligence."
Straumli.ai: "Ship safe AI faster through managed auditing. Our comprehensive testing suite allows teams at all scales to focus on the upsides."

Interpretability and oversight
Problems related...
