The Nonlinear Library

The Nonlinear Fund

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

Episodes

Mentioned books

Feb 22, 2024 • 3min

AF - Analogies between scaling labs and misaligned superintelligent AI by Stephen Casper

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analogies between scaling labs and misaligned superintelligent AI, published by Stephen Casper on February 21, 2024 on The AI Alignment Forum. TL;DR: Scaling labs have their own alignment problem analogous to AI systems, and there are some similarities between the labs and misaligned/unsafe AI. Introduction Major AI scaling labs (OpenAI/Microsoft, Anthropic, Google/DeepMind, and Meta) are very influential in the AI safety and alignment community. They put out cutting-edge research because of their talent, money, and institutional knowledge. A significant subset of the community works for one of these labs. This level of influence is beneficial in some aspects. In many ways, these labs have strong safety cultures, and these values are present in their high-level approaches to developing AI - it's easy to imagine a world in which things are much worse. But the amount of influence that these labs have is also something to be cautious about. The alignment community is defined by a concern that subtle misalignment between the incentives that we give AI systems and what we actually want from them might cause these systems to dangerously pursue the wrong goals. This post considers an analogous and somewhat ironic alignment problem: between human interests and the scaling labs. These labs have intelligence, resources, and speed well beyond that of any single human. Their amount of money, compute, talent, and know-how make them extremely capable. Given this, it's important that they are aligned with the interests of humanity. However, there are some analogies between scaling labs and misaligned AI. It is important not to draw false equivalences between different labs. For example, it seems that by almost every standard, Anthropic prioritizes safety and responsibility much more than other labs. But in this post, I will generally be lumping them together except to point out a few lab-specific observations. Misaligned Incentives In much the same way that AI systems may have perverse incentives, so do the labs. They are companies. They need to make money, court investors, make products, and attract users. Anthropic and Microsoft even just had Super Bowl ads. This type of accountability to commercial interests is not perfectly in line with doing what is good for human interests. Moreover, the labs are full of technocrats whose values and demographics do not represent humanity particularly well. Optimizing for the goals that the labs have is not the same thing as optimizing for human welfare. Goodhart's Law applies. Power Seeking One major risk factor of misaligned superintelligent AI systems is that they may pursue power and influence. But the same is true of the scaling labs. Each is valued in the billions of dollars due to its assets and investments. They compete with each other for technical primacy. The labs also pursue instrumental goals, including political influence with lobbying and strategic secrecy to reduce the risk of lawsuits involving data and fair use. Recent news that Sam Altman is potentially pursuing trillions in funding for hardware suggests that this type of power-seeking may reach large scales in the near future. To stay competitive, labs need to keep scaling, and when one lab scales, others are driven to do so as well in an arms race. Lack of Transparency Trust without transparency is misguided. We want AI systems that are honest white boxes that are easy to interpret and understand. However, the scaling labs do not meet this standard. They tend to be highly selective in what they publicize, have employees sign non-disclosure agreements, and generally lack transparency or accountability to the public. Instead of being white boxes, the labs are more like dark grey boxes that seem to rarely choose to reveal things that would make them look bad. A lack of explan...

Feb 21, 2024 • 41min

EA - Short agony or long ache: comparing sources of suffering that differ in duration and intensity by cynthiaschuck

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Short agony or long ache: comparing sources of suffering that differ in duration and intensity, published by cynthiaschuck on February 21, 2024 on The Effective Altruism Forum. Cynthia Schuck-Paim; Wladimir J. Alonso; Cian Hamilton (Welfare Footprint Project) Overview In assessing animal welfare, it would be immensely beneficial to rely on a cardinal metric that captures the overall affective experience of sentient beings over a period of interest or lifetime. We believe that the concept of Cumulative Pain (or Pleasure, for positive affective states), as adopted in the Welfare Footprint framework, aligns closely with this ideal. It quantifies the time spent in various intensities of pain and has proven operationally useful, providing actionable insights for guiding cost-effective interventions aimed at reducing animal suffering. However, it does not yet offer a unified metric of suffering, as it measures time spent in four categories of pain intensity. While we anticipate this complexity will persist for some time - given the current challenges in equating pain intensities - we believe the discussion on the possibility of integrating these four categories is necessary and valuable. We are thus sharing this document here to contribute in this discussion and elicit feedback and criticism to help us improve our approach. We apologize for the academic tone of the text, initially written with an academic paper in mind. Key Takeaways Pain's aversiveness escalates disproportionally with its intensity, making severe pains feel disproportionately worse. Determining the exact form of the relationship, however, is still challenging, as insights from human pain studies are limited and difficult to apply to animals, and designing experiments to address this issue in animals is inherently challenging. Intensity weights are likely dynamic and modulated by multiple factors, including interspecific differences in the perception of time. The very relationship between pain aversiveness and intensity may change depending on the experience's duration. Currently, the uncertainty associated with putative weights among pain intensity categories is orders of magnitude greater than the uncertainty related to other attributes of pain experiences, such as their prevalence or duration. Given these challenges, we currently favor a disaggregated approach. Disaggregated estimates can currently rank most welfare challenges and farm animal production scenarios in terms of suffering. In the case of more complex trade-offs between brief intense pain and longer-lasting milder pain we suggest two approaches. First, ensuring that all consequences of the welfare challenges are taken into account. For example, the effects of long-lasting chronic pain extend beyond the immediate experience, leading to long-term consequences (e.g., pain sensitization, immune suppression, behavioral deprivation, helplessness, depression) that may themselves trigger experiences of intense pain. The same may happen with experiences of brief intense pain endured early in life. Second, once all secondary effects are considered, we suggest examining which weights would steer different decision paths, and determining how justifiable those weights are. This approach allows for normative flexibility, enabling stakeholders to rely on their own values and perspectives when making decisions. Background Pain, both physical and psychological, is an integral aspect of life for sentient organisms. Pain serves a vital biological purpose by signaling actual or potential harm or injury, prompting individuals to avoid or mitigate the cause of pain [1]. It varies in intensity, from a mild annoyance to an excruciating agony, and duration, from fleeting moments to persistent, long-lasting conditions. This diversity in the intensity and duration of ...

Feb 21, 2024 • 3min

AF - Extinction Risks from AI: Invisible to Science? by Vojtech Kovarik

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extinction Risks from AI: Invisible to Science?, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum. Abstract: In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction[1] of humanity'', and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law holds or not. As our key contribution, we identify a set of conditions that are necessary for a model that aims to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of the conditions seems to significantly contribute to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult. This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying dynamics might be invisible to current scientific methods. Together with Chris van Merwijk and Ida Mattsson, we have recently wrote a philosophy-venue version of some of our thoughts on Goodhart's Law in the context of powerful AI [link].[2] This version of the paper has no math in it, but it attempts to point at one aspect of "Extinction-level Goodhart's Law" that seems particularly relevant for AI advocacy --- namely, that the fields of AI and CS would have been unlikely to come across evidence of this law, in the environments typically studied in these fields, even if the law did hold in the real world. Since commenting on link-posts is inconvenient, I split off some of the ideas from the paper into the following separate posts: Weak vs Quantitative Extinction-level Goodhart's Law: defining different versions of the notion of "Extinction-level Goodhart's Law". Which Model Properties are Necessary for Evaluating an Argument?: illustrating the methodology of the paper on a simple non-AI example. Dynamics Crucial to AI Risk Seem to Make for Complicated Models: applying the methodology above to AI risk. We have more material on this topic, including writing with math[3] in it, but this is mostly not yet in a publicly shareable form. The exception is the post Extinction-level Goodhart's Law as a Property of the Environment (which is not covered by the paper). If you are interested in discussing anything related to this, definitely reach out. ^ A common comment is that the definition should also include outcomes that are similarly bad or worse than extinction. While we agree that such definition makes sense, we would prefer to refer to that version as "existential", and reserve the "extinction" version for the less ambiguous notion of literal extinction. ^ As an anecdote, it seems worth mentioning that I tried, and failed, to post the paper to arXiv --- by now, it has been stuck there with "on hold" status for three weeks. Given that the paper is called "Existential Risk from AI: Invisible to Science?", there must be some deeper meaning to this. ^ Or rather, it has pseudo-math in it. By which I mean that it looks like math, but it is built on top of vague concepts such as "optimisation power" and "specification complexity". And while I hope that we will one day be able to formalise these, I don't know how to do so at this point. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Feb 21, 2024 • 22min

AF - Extinction-level Goodhart's Law as a Property of the Environment by Vojtech Kovarik

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extinction-level Goodhart's Law as a Property of the Environment, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum. Summary: Formally defining Extinction-level Goodhart's Law is tricky, because formal environments don't contain any actual humans that could go extinct. But we can do it using the notion of an interpretation mapping, which sends outcomes in the abstract environment to outcomes in the real world. We can then the truth condition of Extinction-level Goodhart's Law as a property of the environment. I conjecture that Extinction-level Goodhart's Law does not hold in easily formalisable environments[1], even though it might hold in the real world. This seems like a (very) big deal for AI advocacy, since it suggests that the lack of rigorous arguments concerning AI risk (eg, math proofs) does not provide strong evidence for the safety of AI. Semi-formal definition of Extinction-level Goodhart's Law Informally, we can define the extinction-level[2] variant of Goodhart's law as follows: Definition (informal): The Weak Version[3] of Extinction-level Goodhart's Law is the claim that: "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity." The tricky part is how to translate this into a more rigorous definition that can be applied to formal environments. Defining "extinction" in formal environments However, applying this definition to formal environments is tricky, because it requires formally defining which of the abstract states qualify as "extinction of humanity". How can we get past this obstacle? What we don't do: "extinction states" given by definition A lazy way of defining extinction in abstract models would be to assume that we are given some such "extinction" states by definition. That is, if Ω is the set of all possible states in the formal environment, we could assume that there is some set ΩextinctionΩ. And we would refer to any ωΩextinction as "extinction". I don't like this approach, because it just hides the problem elsewhere. Also, this approach does not put any constraints on how Ωextinction should look like. As a result, we would be unlikely to be able to derive any results. The approach we take instead is to augment the formal environment by an interpretation, where each state "abstract" state ω in the formal environment Ω is mapped onto some state φ(ω) in some "more complex" environment Ω'.[4] Real-world interpretations The typical use of an abstract model is that we use it to study some more complex thing. While doing this, we implicitly hold in mind some interpretation of elements of the abstract model. For example, when using arithmetics to reason about apples in a box, I might interpret nN as "there are n apples in the box" and n+m as "there are n apples in the box, and I put another m in there" (or whatever). Naturally, an abstract model can have any number of different interpretations --- however, to avoid imprecision (and the motte-and-bailey fallacy), we will try to focus on a single[5] interpretation at any given time. Note that we intend interpretation to be a very weak notion --- interpretations are not meant to automatically be accurate, sensible, etc. Definition 1 (interpretation, formal): Let a set Ω be the state space of some formal model. A (formal) interpretation of Ω is a pair (Ω',φ), where Ω' is a set representing the state space of some other formal model and φ:ΩΩ' is the interpretation function. The definition only talks about the interpretation of states, but we could similarly talk about the interpretations of actions, features of states, etc.[6] Definition 2 (real-world interpretation, semi-formal): Let a set Ω be the state space of some formal model. A real-world interpretation is a some φ:ΩΩ', where Ω' are the possible states of the real world. No...

Feb 21, 2024 • 18min

AF - Dynamics Crucial to AI Risk Seem to Make for Complicated Models by Vojtech Kovarik

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dynamics Crucial to AI Risk Seem to Make for Complicated Models, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum. This post overlaps with our recent paper Extinction Risk from AI: Invisible to Science?. tl;dr: In AI safety, we are worried about certain problems with using powerful AI. (For example, the difficulty of value specification, instrumental convergence, and the possibility that a misaligned AI will come up with takeover strategies that didn't even occur to us.) To study these problems or convince others that they are real, we might wish to describe them using mathematical models. However, this requires using models that are sufficiently rich that these problems could manifest in the first place. In this post, I suggest thinking about how such "rich-enough" models could look like. Also, I raise the possibility that models which are rich enough to capture problems relevant to AI alignment might be too complex to be amenable to a rigorous analysis. Epistemic status: Putting several related observations into one place. But I don't have strong opinions on what to make of them. In the previous post, I talked about "straightforwardly evaluating" arguments by modelling the dynamics described in those arguments. In this post, I go through some dynamics that seem central to AI risk. However, none of these dynamics is meant to be novel or surprising. Instead, I wish to focus on the properties of the mathematical models that could capture these dynamics. How do such models look like? How complicated are they? And --- to the extent that answering some questions about AI risk requires modeling the interplay between multiple dynamics --- is there some minimal complexity of models which can be useful for answering those questions? Laundry List of Dynamics Closely Tied to AI Risk In this section, I list a number of dynamics that seem closely tied to AI risk, roughly[1] grouped based on which part of the "AI risk argument" they relate to. Below each part of this list, I give some commentary on which models might be useful for studying the given dynamics. I recommend reading selected parts that seem interesting to you, rather than going through the whole text. For the purpose of skimming, here is a list of the dynamics, without any explanations: I. Difficulty of specifying our preferences[2]: Human preferences are ontologically distant from the laws of physics. Human preferences are ontologically distant from the language we use to design the AI. Laws of physics are unknown. Human preferences are unknown. II. Human extinction as a convergent byproduct of terminal goals[3]: The world is malleable. The world is made of resources. Humans evolved to require a narrow range of environment conditions III. Human extinction as a convergently instrumental subgoal[4]: The environment has been optimised for our preferences. Humans are power-seeking. Power is, to some extent, zero-sum. IV. Most attempts to constrain an AI's actions fail for superintelligent AIs[5]: Specifying restrictions is difficult for the same reasons that value specification is difficult. The AI can act by proxy. The AI can exploit novel strategies and technologies. The AI, and everything constraining it, is fully embedded in the environment. I. Difficulty of specifying our preferences[2] A key part of worries about AI risk is that formally writing down what we want --- or even somehow indirectly gesturing at it --- seems exceedingly difficult. Some issues that are related to this are: Concepts that are relevant for specifying our preferences (e.g., "humans'" and "alive'') on the one hand, and concepts that are primitive in the environment (e.g., laws of physics) on the other, are separated by many levels of abstraction. Consider the ontology of our agents (e.g., the format of their input/o...

Feb 21, 2024 • 12min

AF - Which Model Properties are Necessary for Evaluating an Argument? by Vojtech Kovarik

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which Model Properties are Necessary for Evaluating an Argument?, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum. This post overlaps with our recent paper Extinction Risk from AI: Invisible to Science?. Summary: If you want to use a mathematical model to evaluate an argument, that model needs to allow for dynamics that are crucial to the argument. For example, if you want to evaluate the argument that "a rocket will miss the Moon because the Moon moves", you, arguably, can't use a model where the Moon is stationary. This seems, and is, kind of obvious. However, I think this principle has some less obvious implications, including as a methodology for determining what does and doesn't constitute evidence for AI risk. Additionally, I think some of what I write is quite debatable --- if not the ideas, then definitely the formulations. So I am making this a separate post, to decouple the critiques of these ideas/formulations from the discussion of other ideas that build on top of it. Epistemic status: I am confident that the ideas the text is trying to point at are valid and important. And I think they are not appreciated widely enough. At the same time, I don't think that I have found the right way to phrase them. So objections are welcome, and if you can figure out how to phrase some of this in a clearer way, that would be even better. Motivating Story Consider the following fictitious scenario where Alice and Bob disagree on whether Alice's rocket will succeed at landing on the Moon[1]. (Unsurprisingly, we can view this as a metaphor for disagreements about the success of plans to align AGI. However, the point I am trying to make is supposed to be applicable more generally.) Alice: Look! I have built a rocket. I am sure that if I launch it, it will land on the Moon. Bob: I don't think it will work. Alice: Uhm, why? I don't see a reason why it shouldn't work. Bob: Actually, I don't think the burden of proof is on me here. And honestly, I don't know what exactly your rocket will do. But I see many arguments I could give you, for why the rocket is unlikely to land on the Moon. So let me try to give you a simple one. Alice: Sure, go ahead. Bob: Do I understand it correctly that the rocket is currently pointing at the Moon, and it has no way to steer after launch? Alice: That's right. Bob: Ok, so my argument is this: the rocket will miss because the Moon moves.[2] That is, let's say the rocket is pointed at where the Moon is at the time of the launch. Then by time the rocket would have reached the Moon's position, the Moon will already be at some other place. Alice: Ok, fine. Let's say I agree that this argument makes sense. But that still doesn't mean that the rocket will miss! Bob: That's true. Based on what I said, the rocket could still fly so fast that the Moon wouldn't have time to move away. Alice: Alright. So let's try to do a straightforward evaluation[3] of your argument. I suppose that would mean building some mathematical model, finding the correct data to plug into it (for example, the distance to the Moon), and calculating whether the model-rocket will model-hit the model-Moon. Bob: Exactly. Now to figure out which model we should use for this... Evaluating Arguments, rather than Propositions An important aspect of the scenario above is that Alice and Bob decided not to focus directly on the proposition "The rocket will hit the moon.", but instead on a particular argument against that proposition. This approach has an important advantage that, if the argument is simple, evaluating it can be much easier than evaluating the proposition. For example, suppose the rocket will in fact miss "because" of some other reason, such as exploding soon after launch or colliding with a satellite. Then this might be beyond Alice and Bob's ability t...

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app