

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Feb 22, 2024 • 3min
EA - Upcoming EA conferences in 2024 by OllieBase
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Upcoming EA conferences in 2024, published by OllieBase on February 22, 2024 on The Effective Altruism Forum.
In an unsurprising move, the Centre for Effective Altruism will be organising and supporting conferences for the EA community all over the world in 2024, including three new EAGx locations: Copenhagen, Toronto and Austin.
We currently have the following events scheduled:
EA Global
EA Global: London | (May 31-June 2) | Intercontinental London (the O2) - applications close 19 May
EA Global: Boston | (November 1-3) | Hynes Convention Center - applications close 20 October
EAGx
EAGxAustin | (April 13-14) | University of Texas, Austin - applications close 31 March
EAGxNordics | (April 26-28) | CPH Conference, Copenhagen - applications close 7 April
EAGxUtrecht | (July 5-7) | Jaarbeurs, Utrecht
EAGxToronto | (August, provisional)
EAGxBerkeley | (September, provisional)
EAGxBerlin | (September 13-15) | Urania, Berlin
EAGxAustralia | (November) | Sydney
We also hope to announce an EAGxLondon for early April very soon. A university venue was tentatively booked for late March, but the venue asked to reschedule. We're in the process of finalising a new date. We also expect to announce more events throughout the year.
Applications for EAG London, EAG Boston, EAGxNordics and EAGxAustin are open. Applications for EAGxLondon will open as soon as the date is confirmed. We expect applications for the other conferences to open approximately 3 months before the event. Please go to the event page links above to apply.
If you'd like to add EAG(x) events directly to your Google Calendar, use this link.
Some notes on these conferences:
EA Globals are run in-house by the CEA events team, whereas EAGx conferences are organised independently by members of the EA community with financial support and mentoring from CEA.
EA Global conferences have a high bar for admission and are for people who are very familiar with EA and are taking significant actions (e.g. full-time work or study) based on EA ideas.
Admissions for EAGx conferences are processed independently by the EAGx conference organizers. These events are primarily for those who are newer to EA and interested in getting more involved.
Please apply to every conference you wish to attend once applications open - we would rather receive too many applications for some conferences and recommend that applicants attend a different one than have a conference miss out on potential attendees.
Travel support funds for events this year are limited (though will vary by event), and we can only accommodate a small number of requests. If you do not end up receiving travel support, this is likely the result of limited funds, rather than an evaluation of your potential for impact. When planning around an event, we recommend you act under the assumption that we will not be able to grant your travel funding request (unless it has already been approved).
Find more info on our website.
Feel free to email hello@eaglobal.org with any questions, or comment below. You can contact EAGx organisers using the format [location]@eaglobalx.org (e.g. austin@eaglobalx.org and nordics@eaglobalx.org).
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 22, 2024 • 13min
LW - The Pareto Best and the Curse of Doom by Screwtape
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Pareto Best and the Curse of Doom, published by Screwtape on February 22, 2024 on LessWrong.
I.
Prerequisite reading: Being the (Pareto) Best in the World.
A summary of Being the (Pareto) Best in the World: Being the world's best mathematician is hard. Being the world's best musician is hard. Being the world's best mathematician/musician is much easier, especially since there are multiple slots; an amazing mathematician who is also a competent musician, someone who is good at both, and a competent mathematician who is also an amazing musician can all find a niche.
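(An aside not in the original post: as a minimal sketch of the Pareto idea, the snippet below computes which members of a small hypothetical population are non-dominated over two skills. All names and skill numbers are invented for illustration.)

```python
# Minimal sketch (not from the post): Pareto-optimal ("non-dominated") people over
# two skills. All names and skill numbers below are hypothetical.

people = {
    "amazing mathematician, competent musician": (95, 60),
    "good at both": (80, 80),
    "competent mathematician, amazing musician": (60, 95),
    "mediocre at both": (50, 50),
}

def dominates(a, b):
    """a dominates b if a is at least as good on every skill and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

pareto_best = [
    name for name, skills in people.items()
    if not any(dominates(other, skills) for other in people.values())
]

print(pareto_best)
# Everyone except "mediocre at both" is on the frontier: several different mixes
# of the two skills each occupy their own Pareto-optimal "slot".
```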
I like this concept, and have kept it in my back pocket ever since I read it. I have sometimes described myself as a software engineer who was competent at public speaking and project management. That particular overlapping skillset is, it turns out, fairly valuable. While I was attempting to become a better software engineer, I was also trying to add competence at corporate budgets and accounting to that skillset.
These days I spend a lot of time talking to the kind of person who hangs out on LessWrong a lot or spends a lot of time going to Astral Codex Ten meetups. If ever I faced a problem that required a brilliant neuroscientist, or a gifted Haskell programmer, or a world leading expert in training honeybees, well, let's just say I know somebody. There are people out there who are exemplary at the thing they do.
Sometimes they're not very good at other things though. While Being The (Pareto) Best in the World felt optimistic when I first read it, these days I regard it as a curse of doom upon the world, blighting otherwise promising areas of effort and endeavor. I look around at places where it feels like everyone is dropping the ball and see a blasted wasteland where nothing grows because nobody has the right combination of seemingly basic skills.
II.
Imagine a toy model where everyone has a hundred points to put into being good at things.
(This is, to be clear, not just a toy model but an incorrect model. It's easy to look at your incoming university students and notice a strong inverse correlation between math and verbal SAT scores, forgetting that those get summed together during applications and anyone below a certain threshold probably has their application discarded. Still, let's use this model for the moment.)
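(Another aside not in the original post: the selection effect in that parenthetical is easy to see in a simulation. The sketch below, with arbitrary numbers, draws two independent scores and then conditions on their sum, which is enough to produce a negative correlation among the admitted students.)

```python
# Sketch: selecting on the sum of two independent scores induces a negative
# correlation among those selected. All numbers here are arbitrary.
import random

random.seed(0)
applicants = [(random.gauss(500, 100), random.gauss(500, 100)) for _ in range(100_000)]

def pearson(pairs):
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

admitted = [(m, v) for m, v in applicants if m + v >= 1100]  # admit on the summed score

print(f"all applicants: r = {pearson(applicants):+.2f}")  # close to 0.00
print(f"admitted only:  r = {pearson(admitted):+.2f}")    # noticeably negative
```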
Leading talents in a field maybe put 75 points in their area. Why not 100? Because you need points in living your life. There's an archetype of the absent minded professor, someone who can understand a complex abstract subject but who shows up to give lectures having forgotten to put their shoes on or eat breakfast.
Hitting 90 points in your field requires someone else to do a lot of the upkeep for you; many FAANG jobs provide food and other amenities, and I don't think it's entirely because it's a cheap perk. Politely, I know some FAANG engineers who I suspect would forget lunch and dinner if it was not conveniently provided for them.
At sufficiently high levels of dedication, seemingly important related skills start to fall by the wayside. Many programmers are not good at documenting their code, writing or reading specifications, or estimating story points and timelines. Fiction authors vary wildly in their comfort with self-promotion, proofreading, and layout. That's what publishers and agents are for.
There are a few indie musicians I enjoy whose mastery of sound mixing or recording technology is not the equal of their actual playing. You can spend 40 points on singing, 40 points on recording, and 20 points on living your life. At this point, you're giving up some noticeable quality somewhere. I'll arbitrarily draw a line at 50 points and say this is where so-called "professional" quality tends to hang out, the people you see do their thing and you think "man, they could make a livin...

Feb 21, 2024 • 9min
LW - Dual Wielding Kindle Scribes by mesaoptimizer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dual Wielding Kindle Scribes, published by mesaoptimizer on February 21, 2024 on LessWrong.
This is an informal post intended to describe a workflow / setup that I found very useful, so that others might consider adopting or experimenting with facets of it that they find useful.
In August 2023, I was a part of MATS 4.0 and had begun learning the skill of deconfusion, with an aim of disentangling my conflicting intuitions between my belief that shard theory seemed to be at least directionally pointing at some issues with the MIRI model of AGI takeoff and alignment difficulty, and my belief that Nate Soares was obviously correct that reflection will break Alex Turner's diamond alignment scheme. A friend lent me his Kindle Scribe to try out as part of my workflow.
I started using it for note-taking, and found it incredibly useful and bought it from him. A month later, I bought a second Kindle Scribe to add to my workflow.
It has been about six months since, and I've sold both my Kindle Scribes. Here's why I found this workflow useful (and therefore why you might find it useful), and why I moved on from it.
The Display
The Kindle Scribe is a marvelous piece of hardware. With a 300 PPI e-ink 10.3 inch screen, reading books on it was a delight in comparison to any other device I've used to read content on. The stats I just mentioned matter:
300 PPI on a 10.3 inch display means the displayed text is incredibly crisp, almost indistinguishable from normal laptop and smartphone screens. This is not the case for most e-ink readers.
E-ink screens seem to reduce eye strain by a non-trivial amount. I've looked into some studies, but the sample sizes and effect sizes were not enough to make me unilaterally recommend people switch to e-ink screens for reading. However, the biggest benefit of e-ink screens seems to be that you aren't staring into a display that is constantly shining light into your eyeballs, which is the equivalent of staring into a lightbulb.
Anecdotally, it did seem like I was able to read and write for longer hours when I only used e-ink screens: I went from about 8 to 10 hours a day (with some visceral eye fatigue symptoms like discomfort at the end of the day) to about 12 to 14 hours a day without these symptoms, based on my informal tracking during September 2023.
10.3 inch screens (with a high PPI) just feel better to use for reading than smaller ones (say, 6 to 7 inch screens). This seems to me to be due to the greater amount of text displayed on the screen at any given time; with less text visible, the feeling of comprehensibility of the text seems somehow limited.
I assume this is somehow related to chunking of concepts in working memory: if you have part of a 'chunk' on one page and another part on another page, you may have subtle difficulty comprehending what you are reading (if it is new to you), and the more text you have in front of you, the more you can externalize the effort of comprehension. (I used a Kobo Libra 2 (7 inch e-ink screen) for a bit to compare how it felt to read on, to get this data.)
Also, you can write notes in the Kindle Scribe. This was a big deal for me, since before this, I used to write notes on my laptop, and my laptop was a multi-purpose device.
Sidenote: My current philosophy of note-taking is that I think 'on paper' using these notes, and don't usually refer to it later on.
The aim is to augment my working memory with an external tool, and the way I write notes usually reflects this -- I either write down most of my relevant and conscious thoughts as I think them (organized as a sequence of trees, where each node is a string representing a 'thought'), or I usually write 'waypoints' for my thoughts, where each waypoint is a marker for a conclusion of a sequence / tree of thoughts, or an inte...

Feb 21, 2024 • 6min
EA - In memory of Steven M. Wise by Tyner
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: In memory of Steven M. Wise, published by Tyner on February 21, 2024 on The Effective Altruism Forum.
LINK: https://everloved.com/life-of/steven-wise/obituary/
Renowned animal rights pioneer Steven M. Wise passed away on February 15th after a long illness. He was 73 years old.
An innovative scholar and groundbreaking expert on animal law, Wise founded and served as president of the Nonhuman Rights Project (NhRP), the only nonprofit organization in the US dedicated solely to establishing legal rights for nonhuman animals. As the NhRP's lead attorney, he filed historic lawsuits demanding the right to liberty of captive chimpanzees and elephants, achieving widely recognized legal firsts for his clients.
Most notably, under Wise's leadership the NhRP filed a habeas corpus petition on behalf of Happy, an elephant held alone in captivity at the Bronx Zoo. Happy's case, which historian Jill Lepore has called "the most important animal-rights case of the 21st century," reached the New York Court of Appeals in 2022. The Court of Appeals then became the highest court of an English-speaking jurisdiction to hear arguments calling for a legal right for an animal.
Although the Court ultimately denied Happy's petition, two judges wrote historic dissents refuting the idea that only humans can have rights. Under Wise's leadership, the NhRP also helped develop and pass the first animal rights law in the country in 2023, an ordinance that protects elephants' right to liberty.
Wise said he decided to become a lawyer after developing a deep commitment to social justice as a result of his involvement in the anti-Vietnam War movement while an undergraduate at the College of William and Mary. He graduated from Boston University Law School in 1976 and began his legal career as a criminal defense lawyer. Several years later, Peter Singer's book Animal Liberation inspired Wise to become an animal protection lawyer.
From 1985 to 1995, Wise was president of the Animal Legal Defense Fund. As Wise told The New York Times Magazine, his litigation work during this time led him to conclude that the rightlessness of animals was the fundamental barrier to humans vindicating animals' interests.
This is because, under animal welfare laws, lawyers must make the case for how a human has been harmed by the animal's treatment or situation; as Wise elaborated in his writings and talks, legal injuries to animals do not matter in court because animals are unjustly considered legal "things" with no rights, legally equivalent to inanimate objects, their intrinsic interests essentially invisible to judges.
In 1995, Wise launched the Nonhuman Rights Project to address this core issue facing all animals and their advocates. After more than a decade of preparation, the NhRP filed first-of-their-kind lawsuits in 2013, demanding rights for four captive chimpanzees in New York State. A year and a half later, two of the NhRP's clients became the first animals in legal history to have habeas corpus hearings to determine the lawfulness of their imprisonment.
Wise was also a leading force in the development of animal law as a distinct academic curriculum, teaching the first-ever animal law course offered at Harvard University in 2000. He remained committed to educating the next generation of animal rights lawyers throughout his career, teaching animal rights jurisprudence at law schools around the world, including Stanford Law School, the University of Miami Law School, St. Thomas University Law School, John Marshall Law School, Lewis and Clark Law School, Vermont Law School, Tel Aviv University, and the Autonomous University of Barcelona.
Wise is the author of four books: Rattling the Cage: Toward Legal Rights for Animals (2000); Drawing the Line: Science and the Case for Animal Rights (2002); Though the Heavens May Fall: T...

Feb 21, 2024 • 7min
AF - Analogies between scaling labs and misaligned superintelligent AI by Stephen Casper
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analogies between scaling labs and misaligned superintelligent AI, published by Stephen Casper on February 21, 2024 on The AI Alignment Forum.
TL;DR: Scaling labs have their own alignment problem, analogous to that of AI systems, and there are some similarities between the labs and misaligned/unsafe AI.
Introduction
Major AI scaling labs (OpenAI/Microsoft, Anthropic, Google/DeepMind, and Meta) are very influential in the AI safety and alignment community. They put out cutting-edge research because of their talent, money, and institutional knowledge. A significant subset of the community works for one of these labs. This level of influence is beneficial in some aspects.
In many ways, these labs have strong safety cultures, and these values are present in their high-level approaches to developing AI - it's easy to imagine a world in which things are much worse. But the amount of influence that these labs have is also something to be cautious about.
The alignment community is defined by a concern that subtle misalignment between the incentives that we give AI systems and what we actually want from them might cause these systems to dangerously pursue the wrong goals. This post considers an analogous and somewhat ironic alignment problem: between human interests and the scaling labs.
These labs have intelligence, resources, and speed well beyond that of any single human. Their amount of money, compute, talent, and know-how make them extremely capable. Given this, it's important that they are aligned with the interests of humanity. However, there are some analogies between scaling labs and misaligned AI.
It is important not to draw false equivalences between different labs. For example, it seems that by almost every standard, Anthropic prioritizes safety and responsibility much more than other labs. But in this post, I will generally be lumping them together except to point out a few lab-specific observations.
Misaligned Incentives
In much the same way that AI systems may have perverse incentives, so do the labs. They are companies. They need to make money, court investors, make products, and attract users. Anthropic and Microsoft even just had Super Bowl ads. This type of accountability to commercial interests is not perfectly in line with doing what is good for human interests. Moreover, the labs are full of technocrats whose values and demographics do not represent humanity particularly well.
Optimizing for the goals that the labs have is not the same thing as optimizing for human welfare.
Goodhart's Law applies.
Power Seeking
One major risk factor of misaligned superintelligent AI systems is that they may pursue power and influence. But the same is true of the scaling labs. Each is valued in the billions of dollars due to its assets and investments. They compete with each other for technical primacy. The labs also pursue instrumental goals, including political influence with lobbying and strategic secrecy to reduce the risk of lawsuits involving data and fair use.
Recent news that Sam Altman is potentially pursuing trillions in funding for hardware suggests that this type of power-seeking may reach large scales in the near future. To stay competitive, labs need to keep scaling, and when one lab scales, others are driven to do so as well in an arms race.
Lack of Transparency
Trust without transparency is misguided. We want AI systems that are honest white boxes that are easy to interpret and understand. However, the scaling labs do not meet this standard. They tend to be highly selective in what they publicize, have employees sign non-disclosure agreements, and generally lack transparency or accountability to the public. Instead of being white boxes, the labs are more like dark grey boxes that seem to rarely choose to reveal things that would make them look bad.
A lack of explan...

Feb 21, 2024 • 41min
EA - Short agony or long ache: comparing sources of suffering that differ in duration and intensity by cynthiaschuck
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Short agony or long ache: comparing sources of suffering that differ in duration and intensity, published by cynthiaschuck on February 21, 2024 on The Effective Altruism Forum.
Cynthia Schuck-Paim; Wladimir J. Alonso; Cian Hamilton (Welfare Footprint Project)
Overview
In assessing animal welfare, it would be immensely beneficial to rely on a cardinal metric that captures the overall affective experience of sentient beings over a period of interest or lifetime. We believe that the concept of Cumulative Pain (or Pleasure, for positive affective states), as adopted in the Welfare Footprint framework, aligns closely with this ideal.
It quantifies the time spent in various intensities of pain and has proven operationally useful, providing actionable insights for guiding cost-effective interventions aimed at reducing animal suffering.
However, it does not yet offer a unified metric of suffering, as it measures time spent in four categories of pain intensity. While we anticipate this complexity will persist for some time - given the current challenges in equating pain intensities - we believe the discussion on the possibility of integrating these four categories is necessary and valuable. We are thus sharing this document here to contribute to this discussion and elicit feedback and criticism to help us improve our approach.
We apologize for the academic tone of the text, initially written with an academic paper in mind.
Key Takeaways
Pain's aversiveness escalates disproportionally with its intensity, making severe pains feel disproportionately worse.
Determining the exact form of the relationship, however, is still challenging, as insights from human pain studies are limited and difficult to apply to animals, and designing experiments to address this issue in animals is inherently challenging.
Intensity weights are likely dynamic and modulated by multiple factors, including interspecific differences in the perception of time. The very relationship between pain aversiveness and intensity may change depending on the experience's duration.
Currently, the uncertainty associated with putative weights among pain intensity categories is orders of magnitude greater than the uncertainty related to other attributes of pain experiences, such as their prevalence or duration.
Given these challenges, we currently favor a disaggregated approach. Disaggregated estimates can currently rank most welfare challenges and farm animal production scenarios in terms of suffering.
In the case of more complex trade-offs between brief intense pain and longer-lasting milder pain, we suggest two approaches. First, ensuring that all consequences of the welfare challenges are taken into account. For example, the effects of long-lasting chronic pain extend beyond the immediate experience, leading to long-term consequences (e.g., pain sensitization, immune suppression, behavioral deprivation, helplessness, depression) that may themselves trigger experiences of intense pain.
The same may happen with experiences of brief intense pain endured early in life. Second, once all secondary effects are considered, we suggest examining which weights would steer different decision paths, and determining how justifiable those weights are. This approach allows for normative flexibility, enabling stakeholders to rely on their own values and perspectives when making decisions.
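(Illustration only, not part of the original post and not Welfare Footprint estimates: the sketch below shows how the choice of intensity weights can steer a comparison between a brief intense pain and a long-lasting milder pain. The four category names are the ones I believe the framework uses (Annoying, Hurtful, Disabling, Excruciating); all durations and weights are invented.)

```python
# Illustrative sketch only: how different intensity weights steer a comparison between
# a brief intense pain and a long-lasting milder pain. Category names are assumed
# (Annoying/Hurtful/Disabling/Excruciating); durations (hours) and weights are invented.

scenarios = {
    "short agony": {"Annoying": 0, "Hurtful": 2, "Disabling": 1, "Excruciating": 0.5},
    "long ache":   {"Annoying": 300, "Hurtful": 150, "Disabling": 0, "Excruciating": 0},
}

weight_sets = {
    "mildly superlinear":   {"Annoying": 1, "Hurtful": 3, "Disabling": 10, "Excruciating": 30},
    "strongly superlinear": {"Annoying": 1, "Hurtful": 10, "Disabling": 100, "Excruciating": 10_000},
}

for weights_name, weights in weight_sets.items():
    print(f"--- weights: {weights_name} ---")
    for scenario, hours in scenarios.items():
        total = sum(hours[c] * weights[c] for c in hours)
        print(f"  {scenario}: weighted cumulative pain = {total:,.0f}")

# Under the first weighting the "long ache" looks worse; under the second the ranking
# flips, which is why checking which weights would actually change the decision matters.
```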
Background
Pain, both physical and psychological, is an integral aspect of life for sentient organisms. Pain serves a vital biological purpose by signaling actual or potential harm or injury, prompting individuals to avoid or mitigate the cause of pain [1]. It varies in intensity, from a mild annoyance to an excruciating agony, and duration, from fleeting moments to persistent, long-lasting conditions.
This diversity in the intensity and duration of ...

Feb 21, 2024 • 3min
AF - Extinction Risks from AI: Invisible to Science? by Vojtech Kovarik
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extinction Risks from AI: Invisible to Science?, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum.
Abstract: In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction[1] of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law holds or not.
As our key contribution, we identify a set of conditions that are necessary for a model that aims to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of the conditions seems to significantly contribute to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult.
This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying dynamics might be invisible to current scientific methods.
Together with Chris van Merwijk and Ida Mattsson, we recently wrote a philosophy-venue version of some of our thoughts on Goodhart's Law in the context of powerful AI [link].[2] This version of the paper has no math in it, but it attempts to point at one aspect of "Extinction-level Goodhart's Law" that seems particularly relevant for AI advocacy --- namely, that the fields of AI and CS would have been unlikely to come across evidence of this law, in the environments typically studied in these fields, even if the law did hold in the real world.
Since commenting on link-posts is inconvenient, I split off some of the ideas from the paper into the following separate posts:
Weak vs Quantitative Extinction-level Goodhart's Law: defining different versions of the notion of "Extinction-level Goodhart's Law".
Which Model Properties are Necessary for Evaluating an Argument?: illustrating the methodology of the paper on a simple non-AI example.
Dynamics Crucial to AI Risk Seem to Make for Complicated Models: applying the methodology above to AI risk.
We have more material on this topic, including writing with math[3] in it, but this is mostly not yet in a publicly shareable form. The exception is the post Extinction-level Goodhart's Law as a Property of the Environment (which is not covered by the paper). If you are interested in discussing anything related to this, definitely reach out.
^
A common comment is that the definition should also include outcomes that are similarly bad or worse than extinction. While we agree that such a definition makes sense, we would prefer to refer to that version as "existential", and reserve the "extinction" version for the less ambiguous notion of literal extinction.
^
As an anecdote, it seems worth mentioning that I tried, and failed, to post the paper to arXiv --- by now, it has been stuck there with "on hold" status for three weeks. Given that the paper is called "Existential Risk from AI: Invisible to Science?", there must be some deeper meaning to this.
^
Or rather, it has pseudo-math in it. By which I mean that it looks like math, but it is built on top of vague concepts such as "optimisation power" and "specification complexity". And while I hope that we will one day be able to formalise these, I don't know how to do so at this point.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Feb 21, 2024 • 22min
AF - Extinction-level Goodhart's Law as a Property of the Environment by Vojtech Kovarik
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extinction-level Goodhart's Law as a Property of the Environment, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum.
Summary: Formally defining Extinction-level Goodhart's Law is tricky, because formal environments don't contain any actual humans that could go extinct. But we can do it using the notion of an interpretation mapping, which sends outcomes in the abstract environment to outcomes in the real world. We can then state the truth condition of Extinction-level Goodhart's Law as a property of the environment.
I conjecture that Extinction-level Goodhart's Law does not hold in easily formalisable environments[1], even though it might hold in the real world. This seems like a (very) big deal for AI advocacy, since it suggests that the lack of rigorous arguments concerning AI risk (eg, math proofs) does not provide strong evidence for the safety of AI.
Semi-formal definition of Extinction-level Goodhart's Law
Informally, we can define the extinction-level[2] variant of Goodhart's law as follows:
Definition (informal): The Weak Version[3] of Extinction-level Goodhart's Law is the claim that: "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity."
The tricky part is how to translate this into a more rigorous definition that can be applied to formal environments.
Defining "extinction" in formal environments
However, applying this definition to formal environments is tricky, because it requires formally defining which of the abstract states qualify as "extinction of humanity". How can we get past this obstacle?
What we don't do: "extinction states" given by definition
A lazy way of defining extinction in abstract models would be to assume that we are given some such "extinction" states by definition. That is, if Ω is the set of all possible states in the formal environment, we could assume that there is some set Ω_extinction ⊆ Ω. And we would refer to any ω ∈ Ω_extinction as "extinction".
I don't like this approach, because it just hides the problem elsewhere. Also, this approach does not put any constraints on what Ω_extinction should look like. As a result, we would be unlikely to be able to derive any results.
The approach we take instead is to augment the formal environment by an interpretation, where each "abstract" state ω in the formal environment Ω is mapped onto some state φ(ω) in some "more complex" environment Ω'.[4]
Real-world interpretations
The typical use of an abstract model is that we use it to study some more complex thing. While doing this, we implicitly hold in mind some interpretation of elements of the abstract model. For example, when using arithmetic to reason about apples in a box, I might interpret n ∈ ℕ as "there are n apples in the box" and n+m as "there are n apples in the box, and I put another m in there" (or whatever).
Naturally, an abstract model can have any number of different interpretations --- however, to avoid imprecision (and the motte-and-bailey fallacy), we will try to focus on a single[5] interpretation at any given time.
Note that we intend interpretation to be a very weak notion --- interpretations are not meant to automatically be accurate, sensible, etc.
Definition 1 (interpretation, formal): Let a set Ω be the state space of some formal model. A (formal) interpretation of Ω is a pair (Ω', φ), where Ω' is a set representing the state space of some other formal model and φ: Ω → Ω' is the interpretation function.
The definition only talks about the interpretation of states, but we could similarly talk about the interpretations of actions, features of states, etc.[6]
Definition 2 (real-world interpretation, semi-formal): Let a set Ω be the state space of some formal model. A real-world interpretation is a map φ: Ω → Ω', where Ω' is the set of possible states of the real world.
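(A reconstruction sketch, not a quotation from the post: one natural way to combine Definitions 1 and 2 is to read off the abstract "extinction" states as the preimage of real-world extinction states under the interpretation map.)

```latex
% Sketch (my reconstruction, not verbatim from the post): extinction states of the
% abstract model, induced by a real-world interpretation (\Omega', \varphi).
\[
  \Omega_{\text{extinction}}
    \;=\; \varphi^{-1}\bigl(\Omega'_{\text{extinction}}\bigr)
    \;=\; \{\, \omega \in \Omega \mid \varphi(\omega) \in \Omega'_{\text{extinction}} \,\},
\]
where $\Omega'_{\text{extinction}} \subseteq \Omega'$ denotes the real-world states in which
humanity is extinct. The Weak Version of Extinction-level Goodhart's Law can then be stated
as a property of $(\Omega, \Omega', \varphi)$: virtually any goal specification, pursued to
the extreme, leads to some $\omega$ with $\varphi(\omega) \in \Omega'_{\text{extinction}}$.
```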
No...

Feb 21, 2024 • 18min
AF - Dynamics Crucial to AI Risk Seem to Make for Complicated Models by Vojtech Kovarik
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dynamics Crucial to AI Risk Seem to Make for Complicated Models, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum.
This post overlaps with our recent paper Extinction Risk from AI: Invisible to Science?.
tl;dr: In AI safety, we are worried about certain problems with using powerful AI. (For example, the difficulty of value specification, instrumental convergence, and the possibility that a misaligned AI will come up with takeover strategies that didn't even occur to us.) To study these problems or convince others that they are real, we might wish to describe them using mathematical models.
However, this requires using models that are sufficiently rich that these problems could manifest in the first place.
In this post, I suggest thinking about what such "rich-enough" models could look like. Also, I raise the possibility that models which are rich enough to capture problems relevant to AI alignment might be too complex to be amenable to a rigorous analysis.
Epistemic status: Putting several related observations into one place. But I don't have strong opinions on what to make of them.
In the previous post, I talked about "straightforwardly evaluating" arguments by modelling the dynamics described in those arguments. In this post, I go through some dynamics that seem central to AI risk. However, none of these dynamics is meant to be novel or surprising. Instead, I wish to focus on the properties of the mathematical models that could capture these dynamics.
What do such models look like? How complicated are they? And --- to the extent that answering some questions about AI risk requires modeling the interplay between multiple dynamics --- is there some minimal complexity of models which can be useful for answering those questions?
Laundry List of Dynamics Closely Tied to AI Risk
In this section, I list a number of dynamics that seem closely tied to AI risk, roughly[1] grouped based on which part of the "AI risk argument" they relate to. Below each part of this list, I give some commentary on which models might be useful for studying the given dynamics. I recommend reading selected parts that seem interesting to you, rather than going through the whole text.
For the purpose of skimming, here is a list of the dynamics, without any explanations:
I. Difficulty of specifying our preferences[2]:
Human preferences are ontologically distant from the laws of physics.
Human preferences are ontologically distant from the language we use to design the AI.
Laws of physics are unknown.
Human preferences are unknown.
II. Human extinction as a convergent byproduct of terminal goals[3]:
The world is malleable.
The world is made of resources.
Humans evolved to require a narrow range of environmental conditions.
III. Human extinction as a convergently instrumental subgoal[4]:
The environment has been optimised for our preferences.
Humans are power-seeking.
Power is, to some extent, zero-sum.
IV. Most attempts to constrain an AI's actions fail for superintelligent AIs[5]:
Specifying restrictions is difficult for the same reasons that value specification is difficult.
The AI can act by proxy.
The AI can exploit novel strategies and technologies.
The AI, and everything constraining it, is fully embedded in the environment.
I. Difficulty of specifying our preferences[2]
A key part of worries about AI risk is that formally writing down what we want --- or even somehow indirectly gesturing at it --- seems exceedingly difficult. Some issues that are related to this are:
Concepts that are relevant for specifying our preferences (e.g., "humans" and "alive") on the one hand, and concepts that are primitive in the environment (e.g., laws of physics) on the other, are separated by many levels of abstraction.
Consider the ontology of our agents (e.g., the format of their input/o...

Feb 21, 2024 • 12min
AF - Which Model Properties are Necessary for Evaluating an Argument? by Vojtech Kovarik
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which Model Properties are Necessary for Evaluating an Argument?, published by Vojtech Kovarik on February 21, 2024 on The AI Alignment Forum.
This post overlaps with our recent paper Extinction Risk from AI: Invisible to Science?.
Summary: If you want to use a mathematical model to evaluate an argument, that model needs to allow for dynamics that are crucial to the argument. For example, if you want to evaluate the argument that "a rocket will miss the Moon because the Moon moves", you, arguably, can't use a model where the Moon is stationary.
This seems, and is, kind of obvious. However, I think this principle has some less obvious implications, including as a methodology for determining what does and doesn't constitute evidence for AI risk. Additionally, I think some of what I write is quite debatable --- if not the ideas, then definitely the formulations. So I am making this a separate post, to decouple the critiques of these ideas/formulations from the discussion of other ideas that build on top of it.
Epistemic status: I am confident that the ideas the text is trying to point at are valid and important. And I think they are not appreciated widely enough. At the same time, I don't think that I have found the right way to phrase them. So objections are welcome, and if you can figure out how to phrase some of this in a clearer way, that would be even better.
Motivating Story
Consider the following fictitious scenario where Alice and Bob disagree on whether Alice's rocket will succeed at landing on the Moon[1]. (Unsurprisingly, we can view this as a metaphor for disagreements about the success of plans to align AGI. However, the point I am trying to make is supposed to be applicable more generally.)
Alice: Look! I have built a rocket. I am sure that if I launch it, it will land on the Moon.
Bob: I don't think it will work.
Alice: Uhm, why? I don't see a reason why it shouldn't work.
Bob: Actually, I don't think the burden of proof is on me here. And honestly, I don't know what exactly your rocket will do. But I see many arguments I could give you, for why the rocket is unlikely to land on the Moon. So let me try to give you a simple one.
Alice: Sure, go ahead.
Bob: Do I understand it correctly that the rocket is currently pointing at the Moon, and it has no way to steer after launch?
Alice: That's right.
Bob: Ok, so my argument is this: the rocket will miss because the Moon moves.[2] That is, let's say the rocket is pointed at where the Moon is at the time of the launch. Then by the time the rocket would have reached the Moon's position, the Moon will already be at some other place.
Alice: Ok, fine. Let's say I agree that this argument makes sense. But that still doesn't mean that the rocket will miss!
Bob: That's true. Based on what I said, the rocket could still fly so fast that the Moon wouldn't have time to move away.
Alice: Alright. So let's try to do a straightforward evaluation[3] of your argument. I suppose that would mean building some mathematical model, finding the correct data to plug into it (for example, the distance to the Moon), and calculating whether the model-rocket will model-hit the model-Moon.
Bob: Exactly. Now to figure out which model we should use for this...
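(Not part of the original dialogue: a minimal sketch of the "straightforward evaluation" Alice proposes, under deliberately crude assumptions - a stationary Earth, straight-line unpowered flight aimed at the Moon's launch-time position, and rounded textbook values for the Moon's distance, orbital speed, and radius.)

```python
# Sketch of the "straightforward evaluation" of Bob's argument: the rocket is aimed at
# where the Moon is at launch, cannot steer, and the Moon keeps moving on its orbit.
# Crude model with rounded textbook values; not from the original post.

MOON_DISTANCE_KM = 384_400   # average Earth-Moon distance
MOON_SPEED_KM_S = 1.02       # average orbital speed of the Moon
MOON_RADIUS_KM = 1_737

def moon_displacement_km(rocket_speed_km_s: float) -> float:
    """How far the Moon's centre moves while the rocket flies to the Moon's launch-time position."""
    travel_time_s = MOON_DISTANCE_KM / rocket_speed_km_s
    return travel_time_s * MOON_SPEED_KM_S

for v in (11, 50, 250):  # km/s; 11 km/s is roughly Earth escape velocity
    miss = moon_displacement_km(v)
    verdict = "hits" if miss <= MOON_RADIUS_KM else "misses"
    print(f"rocket at {v:>3} km/s: Moon has moved ~{miss:,.0f} km -> {verdict}")

# In this model the rocket misses unless it travels faster than roughly 226 km/s,
# so the model is rich enough to capture the dynamic ("the Moon moves") that
# Bob's argument depends on.
```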
Evaluating Arguments, rather than Propositions
An important aspect of the scenario above is that Alice and Bob decided not to focus directly on the proposition "The rocket will hit the Moon.", but instead on a particular argument against that proposition.
This approach has an important advantage that, if the argument is simple, evaluating it can be much easier than evaluating the proposition. For example, suppose the rocket will in fact miss "because" of some other reason, such as exploding soon after launch or colliding with a satellite. Then this might be beyond Alice and Bob's ability t...


