

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Oct 2, 2023 • 5min
AF - Direction of Fit by Nicholas Kees Dupuis
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Direction of Fit, published by Nicholas Kees Dupuis on October 2, 2023 on The AI Alignment Forum.
This concept has recently become a core part of my toolkit for thinking about the world, and I find it helps explain a lot of things that previously felt confusing to me. Here I explain how I understand "direction of fit," and give some examples of where I find the concept can be useful.
Handshake Robot
A friend recently returned from an artificial life conference and told me about a robot which was designed to perform a handshake. It was given a prior about handshakes, or how it expected a handshake to be. When it shook a person's hand, it then updated this prior, and the degree to which the robot would update its prior was determined by a single parameter. If the parameter was set low, the robot would refuse to update, and the handshake would be firm and forceful. If the parameter was set high, the robot would completely update, and the handshake would be passive and weak.
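The single update parameter described above can be sketched as a simple blending rule (my own illustration, not the actual robot's model, which is surely more complex):

```python
def update_prior(prior, observation, alpha):
    """Blend the robot's expectation toward the observed handshake.

    alpha = 0.0: never update -- the robot imposes its prior (firm, forceful)
    alpha = 1.0: fully update -- the robot mirrors the observation (passive, weak)
    """
    return (1 - alpha) * prior + alpha * observation

# Prior expects a strong grip (0.8); the human offers a weak one (0.2).
update_prior(0.8, 0.2, alpha=0.0)  # -> 0.8: the world is made to fit the mind
update_prior(0.8, 0.2, alpha=1.0)  # -> 0.2: the mind is made to fit the world
```

A low alpha makes the stored handshake behave like a desire (world-to-mind); a high alpha makes it behave like a belief (mind-to-world).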
This parameter determines the direction of fit: whether the object in its mind will adapt to match the world, or whether the robot will adapt the world to match the object in its mind. This concept is often used in philosophy of mind to distinguish between a belief, which has a mind-to-world direction of fit, and a desire, which has a world-to-mind direction of fit. In this frame, beliefs and desires are both of a similar type: they both describe ways the world could be. The practical differences only emerge through how they end up interacting with the outside world.
Many objects seem not to be perfectly separable into one of these two categories, and rather appear to exist somewhere on the spectrum. For example:
An instrumental goal can simultaneously be a belief about the world (that achieving the goal will help fulfill some desire) as well as behaving like a desired state of the world in its own right.
Strongly held beliefs (e.g. religious beliefs) are on the surface ideas which are fit to the world, but in practice behave much more like desires, as people make the world around them fit their beliefs.
You can change your mind about what you desire. For example you may dislike something at first, but after repeated exposure you may come to feel neutral about it, or even actively like it (e.g. the taste of certain foods).
Furthermore, the direction of fit might be context dependent (e.g. political beliefs), beliefs could be self fulfilling (e.g. believing that a presentation will go well could make it go well), and many beliefs or desires could refer to other beliefs or desires (wanting to believe, believing that you want, etc.).
Idealized Rational Agents
The concept of a rational agent, in this frame, is a system which cleanly distinguishes between these two directions of fit, between objects which describe how the world actually is, and objects which prescribe how the world "should" be.
This particular concept of a rational agent can itself have a varying direction of fit. You might describe a system as a rational agent to help your expectations match your observations, but the idea might also prescribe that you should develop this clean split between belief and value.
When talking about AI systems, we might be interested in the behavior of systems where this distinction is especially clear. We might observe that many current AI systems are not well described in this way, or we could speculate about pressures which might lead them toward this kind of split.
Note that this is very different from talking about VNM-rationality, which starts by assuming this clean split, and instead demonstrates why we might expect the different parts of the value model to become coherent and avoid getting in each other's way. The direction-of-fit frame highlights a separate (but equally important) question of whether...

Oct 2, 2023 • 46sec
LW - Fifty Flips by abstractapplic
This is: Fifty Flips, published by abstractapplic on October 2, 2023 on LessWrong.
An unfair coin (potentially EXTREMELY unfair) will be flipped fifty times. Your goal is to correctly predict as many of these flips as possible, by deducing the nature of the unfairness as quickly as possible.
You can play this (in-browser, very short) game here; the rule governing the unfairness is automatically revealed after flip 50. Followups with different governing rules are here, here, here, here and here.
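For the simplest kind of unfairness (a fixed bias; the game's actual governing rules are more interesting), a Bayesian predictor is a natural baseline. A minimal sketch, not part of the original post:

```python
def predict_and_update(flips):
    """Predict each flip from a Beta(1, 1) prior over P(heads), updating
    the pseudo-counts after every observation. Returns #correct predictions."""
    heads, tails = 1, 1  # uniform prior pseudo-counts
    correct = 0
    for flip in flips:  # flips is a list of 'H' / 'T'
        prediction = 'H' if heads >= tails else 'T'
        correct += (prediction == flip)
        if flip == 'H':
            heads += 1
        else:
            tails += 1
    return correct

# An extremely unfair coin is learned almost immediately:
predict_and_update(['T'] * 50)  # 49 correct (only the opening guess is wrong)
```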
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Oct 2, 2023 • 30min
EA - Observations on the funding landscape of EA and AI safety by Vilhelm Skoglund
This is: Observations on the funding landscape of EA and AI safety, published by Vilhelm Skoglund on October 2, 2023 on The Effective Altruism Forum.
Epistemic status: Hot takes for discussion. These observations are a side product of another strategy project, rather than a systematic and rigorous analysis of the funding landscape, and we may be missing important considerations. Observations are also non-exhaustive and mostly come from anecdotal data and EA Forum posts. We haven't vetted the resources that we are citing; instead, we took numerous data points at face value and asked for feedback from >5 people who have more of an inside view than we do (see acknowledgments, but note that these people do not necessarily endorse all claims). We aim to indicate our certainty in the specific claims we are making.
Context and summary
While researching for another project, we discovered that there have been some significant changes in the EA funding landscape this year. We found these changes interesting and surprising enough that we wanted to share them, to potentially help people update their model of the funding landscape. Note that this is not intended to be a comprehensive overview. Rather, we hope this post triggers a discussion about updates and considerations we might have missed.
We first list some observations about funding in the EA community in general. Then, we zoom in on AI safety, as this is a particularly dynamic area at present.
Some observations about the general EA funding landscape (more details below):
There are more independent grantmaking bodies
Five new independent grantmaking bodies have started up in 2023 (Meta Charity Funders, Lightspeed Grants, Manifund Regrants, the Nonlinear Network, and the Foresight AI Fund). Of these, all but Meta Charity Funders are focused on longtermism or AI.
EA Funds and Open Philanthropy are aiming to become more independent of each other.
Charity Entrepreneurship has set up a foundation program, with a sub-goal of setting up cause-specific funding circles.
There is a lot of activity in the effective giving ecosystem
More than 50 effective giving initiatives, e.g., local fundraising websites, are active, with several launched in recent years.
GWWC is providing more coordination in the ecosystem and looking to help new initiatives get off the ground.
There are changes in funding flows
The FTX collapse caused a drastic decrease in (expected) longtermist funding (potentially hundreds of millions of dollars annually).
EA Funds' Long-Term Future Fund and Infrastructure Fund report roughly estimated funding gaps of $450k/month and $550k/month, respectively, over the next 6 months.
Open Philanthropy seems like they could make productive use of more funding in some causes, but their teams working on AI Safety are capacity-constrained rather than funding-constrained.
The Survival and Flourishing Fund has increased their giving in 2023. It's unclear whether this increase will continue into the future.
Effective Giving plans to increase their giving in the years to come.
Longview Philanthropy expects to increase their giving in the years to come. Their 2023 advising will be >$10 million, and they expect money moved in 2024 to be greater than 2023.
GiveWell reports being funding-constrained and projects constant funding flows until 2025.
Charity Entrepreneurship's research team expects that money dedicated to animal advocacy is unlikely to grow and could shrink.
There might be more EA funding in the future
Manifold prediction markets estimate a 45% chance of a new donor giving ≥$50 million to longtermist or existential risk work before the end of 2024; and an 86% chance of ≥1 new EA billionaire before the end of 2026.
Smaller but still significant new donors seem likely, according to some fundraising actors.
Some observatio...

Oct 2, 2023 • 18min
EA - Violence Before Agriculture by John G. Halstead
This is: Violence Before Agriculture, published by John G. Halstead on October 2, 2023 on The Effective Altruism Forum.
This is a summary of a report on trends in violence since the dawn of humanity: from the hunter-gatherer period to the present day. The full report is available at this Substack and as a preprint on SSRN. Phil did 95% of the work on the report.
Expert reviewers provided the following comments on our report.
"Thomson and Halstead have provided an admirably thorough and fair assessment of this difficult and emotionally fraught empirical question. I don't agree with all of their conclusions, but this will surely be the standard reference for this issue for years to come."
Steven Pinker, Johnstone Family Professor in the Department of Psychology at Harvard University
"This work uses an impressively comprehensive survey of ethnographic and archeological data on military mortality in historically and archeologically known small-scale societies in an effort to pin down the scale of the killing in the pre-agricultural world. This will be a useful addition to the literature. It is an admirably cautious assessment of the war mortality data, which are exceptionally fragile; and the conclusions it draws about killing rates prior to the Holocene are probably as good as we are likely to get for the time being."
Paul Roscoe, Professor of Anthropology at the University of Maine
Epistemic status
We think our estimates here move understanding of prehistoric violence forward by rigorously focussing on the pre-agricultural period and attempting to be as comprehensive as possible with the available evidence. However, data in the relevant fields of ethnography and archeology is unusually shaky, so we would not be surprised if some of the underlying data turns out to be wrong. We are especially unsure about our method for estimating actual violent mortality rates from the measured, observable rates in the raw archeology data.
One of us (Phil) has a masters in anthropology. Neither of us has any expertise in archeology.
Guide for the reader
If you are interested in this study simply as a reference for likely rates/patterns of violence in the pre-agricultural world, all our main results and conclusions are presented in the Summary. The rest of the study explores the evidence in more depth and explains how we put our results together. We first cover the ethnographic evidence, then the archeological evidence. The study ends with a more speculative discussion of our findings and their possible implications.
Acknowledgments
We would like to thank the following expert reviewers for their extensive and insightful comments and suggestions, which have helped to make this report substantially better.
Steven Pinker, Johnstone Family Professor in the Department of Psychology at Harvard University
Robert Kelly, Professor of Archeology at the University of Wyoming
Paul Roscoe, Professor of Anthropology at the University of Maine
We would also like to thank Prof. Hisashi Nakao, Prof. Douglas Fry, Prof. Nelson Graburn, and Holden Karnofsky for commenting, responding to queries and sharing materials.
Around 11,000 years ago plants and animals began to be domesticated, a process which would completely transform the lifeways of our species. Human societies all over the world came to depend almost entirely on farming. Before this transformative period of history, everyone was a hunter-gatherer. For about 96% of the approximately 300,000 years since Homo sapiens evolved, we relied on wild plants and animals for food.
Our question is: what do we know about how violent these pre-agricultural people were?
In 2011 Steven Pinker published The Better Angels of Our Nature. According to Pinker, prehistoric small-scale societies were generally extremely violent by comparison with modern stat...

Oct 1, 2023 • 15min
LW - My Effortless Weightloss Story: A Quick Runthrough by CuoreDiVetro
This is: My Effortless Weightloss Story: A Quick Runthrough, published by CuoreDiVetro on October 1, 2023 on LessWrong.
This is Part I in a series on easy weightloss without any need for will power.
The Origin: listening to the dark corners of the internet
Losing weight is supposed to be really hard and require a lot of willpower, according to conventional wisdom. It turns out that it was actually really easy for me to go from a BMI of above 29 (30 is officially obese) to below 25 (normal is 18 to 25) in 3½ months. And knowing what I know now, I think I could easily do it again in 1½ months.
I'm not someone who ever tried dieting before. Dieting sounded like a lot of effort and willpower for very uncertain results. Not a good use of my very limited willpower. This belief changed after reading Slime Mold Time Mold's results of their potato diet experiment.
They asked the participants in their experiment to eat only potatoes for 4 weeks to see if they would lose weight. There was no way I was going to eat only potatoes for 4 weeks, so I didn't enrol in their experiment. After reading the blogpost about their results, two things surprised me which motivated me to go on this journey.
The first surprise was that it wasn't necessary to eat only potatoes. Slime Mold Time Mold had been very gentle with their guinea pigs, and told them "it's ok if you cheat and don't eat potatoes, just tell us when you cheat". It turned out that even people who cheated almost every day, eating something other than potatoes, ended up losing a lot of weight, and there wasn't even that clear of a trend between weightloss and number of cheat days (see Figure 1). So a strict eat-only-potatoes diet, which is something I would never do, didn't seem to be necessary.
Figure 1: Weightloss of participants as a function of the number of days (out of a total of 28) where they cheated (i.e. ate other things than potatoes). Source.
The second surprise was that people's weight seemed to go down linearly, without reaching a plateau, at least for the 4 weeks of the experiment. I was expecting diminishing returns: that as people started to lose weight, further weightloss would slow down. But their data didn't seem to indicate any slowdown. I was super curious to find out how long such linear weightloss could go on for. As we will see later, linear weightloss went on for me for a surprisingly long time.
Figure 2: Weightloss as a function of time on the potato diet. The blue line is those who completed the whole 28 days of the trial while the red line is those who dropped out before the end. Source.
Somehow, before starting my experiment, more wisdom from some dark and seemingly unreliable corner of the interwebz came to my attention: a tweet by one Mickey Shaughnessy. The tweet claims that the cause of obesity might be related to the potassium:sodium ratio in the diet: that earlier diets had a very high potassium-to-sodium ratio in comparison to the modern Euro-North-American diet, and that maybe the potato diet works because potatoes are very high in potassium.
This is a super interesting hypothesis, that it's all about the potassium:sodium ratio. This is also something that would be interesting and relatively easy to investigate. So we will try to investigate that a bit in this blogpost series.
So of course, at the time I didn't check the source of this tweeted statement, I just went with whatever was written by an unknown person on the internet. But now that I'm writing this blogpost, I thought it might be nice to check a bit.
It turns out that Mickey Shaughnessy had the idea of it being related to the K:Na (potassium to sodium) ratio because of the Slime Mold Time Mold blogpost about Li (Lithium) having an effect on obesity and both sodium and potassium being very similar chemically to Li (the same column in...

Oct 1, 2023 • 6min
LW - Competitive, Cooperative, and Cohabitive by Screwtape
This is: Competitive, Cooperative, and Cohabitive, published by Screwtape on October 1, 2023 on LessWrong.
(I've been writing this in bits and pieces for a while, and Peacewager was the impetus I needed to finally stitch it together and post it. Peacewager sounds like a really fun game and an example of the thing I'm talking about, but I do not want this whole genre to get called Peacewager Games when I think I have a better title for the genre.)
I believe there is a missing genre from existing games, and this genre feels large enough that it should contain maybe a third of the games I can imagine existing. More serious game players or game theorists might already have a name for the thing I'm pointing at, though the first three game design majors I asked didn't know of one.
Let me back up. I'm going to assume for the moment that you've played some games. I don't have a strict definition of "game" I'm working with here, but let's start with definition by example: Chess is a game, Hide and Seek is a game, Pandemic is a game, Apples to Apples is a game, Poker is a game, Magic: The Gathering is a game, Werewolf is a game, Among Us is a game, Hanabi is a game, Baseball is a game, Football (American or European) is a game. I'm not trying to do some narrow technical definition, I'm waving my hand wildly in the direction of a pretty natural category and I'm not planning to do anything weird with the edge cases.
Chess is a competitive game. In chess, you're loosely simulating a war between two evenly matched factions. When you play chess, there will be one winner and one loser. Sometimes instead there will be a draw. Anything that is good for you when you are playing chess is bad for your opponent and vice versa. You can be mistaken about what is good or bad for you; you can offer trades of pieces to your opponent because you think it is a good trade for you and they can take the trade because they think it is a good trade for them, but this is ultimately what's called a zero sum game. Your loss is their gain. "Eurogames" where you're trying to get the highest score are competitive in nature; if you could pay ten points to cost every other player twenty points, you'd do it.
Pandemic is a cooperative game. In Pandemic, you're loosely simulating a global pandemic and the response of the international medical community. When you play Pandemic, either all the players win, or all the players lose. Anything that is good for you when you are playing Pandemic is good for your teammates, and anything that is bad for you when you are playing Pandemic is bad for your teammates. You can lose things for yourself; you can spend resources and pay costs and run out of good cards in your hand, but ultimately this is also a loss for your team since they want you to have good stuff.
There is of course the circumstance of competitive team games, like football. If I'm playing Football, I'm trying to help out my team like it's a cooperative game, and make the other team lose like it's a competitive game. This adds a little to the picture, but doesn't change the basic dynamics much. Again, I'm not doing anything weird with edge cases here. There's also multiplayer games like Risk, where it might make sense to make a temporary alliance to cooperate with another player while still ultimately knowing only one of you can win. Hidden role games like Werewolf or Betrayal At House On The Hill are usually competitive with teams. (A team of one and a team of the rest of the players is basically a team competitive game.)
Picture these as two points on a continuum. You can compete, or you can cooperate. Seems simple enough. You can, if you like, extend this into a metaphor for how humans relate to one another outside of just games.
Except this really isn't how human beings actually operate in a wide range of circ...

Oct 1, 2023 • 6min
AF - New Tool: the Residual Stream Viewer by Adam Yedidia
This is: New Tool: the Residual Stream Viewer, published by Adam Yedidia on October 1, 2023 on The AI Alignment Forum.
This is a link-post for the residual stream viewer, which can be found here. It's an online tool whose goal is to make it easier to do interpretability research by letting you easily look at directions within the residual stream. It's still in a quite early/unpolished state, so there may be bugs, and any feature requests are very welcome! I'll probably do more to flesh this out if I get the sense that people are finding it useful.
Very briefly, the tool lets you see what the dot product of the residual stream at each token is with a particular direction. The default directions that you can look at using the tool were found by PCA, and I think many of them are fairly interpretable even at a glance (though it's worth noting that even if they correlate heavily with an apparent feature, that's no guarantee the network is actually using those directions).
Here's a screenshot of the current version of the tool:
There's a YouTube tutorial for the tool available here. I endorse the YouTube tutorial as probably a better way to get acquainted with the tool than the usage guide; but I'll copy-paste the usage guide for the remainder of the post.
The residual stream viewer is a tool for finding interesting directions in the residual stream of GPT2-small, for writing explanations for those directions and reading the explanations left by others, and for constructing new directions out of linear combinations of old ones.
A more detailed explanation of how transformer networks work and what the residual stream is can be found here. If you want to actually understand what the residual stream is and how transformers work, the text that follows here is hopelessly insufficient, and you should really follow the earlier link. However, as a very brief summary of what the "residual stream" is:
The residual stream can be thought of as the intermediate state of the transformer network's computation. It is the output of each layer of the network before it is fed into the next layer. Each prompt is split into "tokens," i.e. subparts of the prompt that roughly correspond to words or parts of words. At each layer, each token has its own associated residual stream vector. The residual stream at the beginning of the network, before any layer has acted, is equal to the "Token Embedding", i.e. the "meaning" of that token as encoded by a 768-dimensional vector, plus the "Positional embedding", i.e. the "meaning" of that token's position in the prompt as encoded by a 768-dimensional vector. Each layer acts on the residual stream by reading certain parts of the residual stream, doing some computation on them, and then adding the result back into the residual stream. At the end of the network, the residual stream is transformed into a probability distribution over which token comes next.
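The additive structure described above can be sketched in a few lines (toy stand-in layers, not GPT2-small itself):

```python
import numpy as np

def run_residual_stream(token_emb, pos_emb, layers):
    """Accumulate the residual stream: it starts as the token embedding plus
    the positional embedding, and each layer adds its output back in."""
    resid = token_emb + pos_emb
    states = [resid]
    for layer in layers:
        resid = resid + layer(resid)  # layers write additively
        states.append(resid)
    return states  # one residual stream vector per layer boundary

# Two toy "layers" that each write 10% of their input back into the stream:
toy_layers = [lambda resid: 0.1 * resid] * 2
states = run_residual_stream(np.ones(4), np.zeros(4), toy_layers)
# the stream grows 1.0 -> 1.1 -> 1.21 in every dimension
```

In GPT2-small the vectors are 768-dimensional and the layers are attention and MLP blocks, but the read-compute-add pattern is the same.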
It's not easy to directly interpret a 768-dimensional vector, let alone one at each layer and at each token in the prompt. It's the purpose of this tool to make the job of interpreting such vectors easier. One way of interpreting the residual stream is by considering different possible directions in the residual stream. By analogy, imagine if there was an arrow in front of you, oriented somehow in space. The arrow represents the residual stream. One way you might approach describing the arrow's direction is by considering how "northerly" the arrow's direction is; that is, to what degree the arrow is pointing North. If the arrow was pointing northward, we might say that the arrow had positive northerliness, and if the arrow was pointing southward, we might say that the arrow had negative northerliness. An arrow pointing northeast could still be said to have positive northerliness; it wouldn't have to be pointing ex...
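The "northerliness" of the analogy is just a dot product with a unit-length direction vector; a minimal sketch (my own illustration, not the tool's code):

```python
import numpy as np

def component_along(resid_vectors, direction):
    """Dot each residual stream vector with a unit-length direction,
    giving a signed "northerliness" score per token."""
    unit = direction / np.linalg.norm(direction)
    return resid_vectors @ unit

# Toy 2-D example: "north" is the first axis.
north = np.array([3.0, 0.0])
vectors = np.array([[2.0, 0.0],   # points north -> positive score
                    [-1.0, 0.0],  # points south -> negative score
                    [0.0, 5.0]])  # points east  -> zero component
component_along(vectors, north)  # -> [2., -1., 0.]
```

The tool applies the same operation in 768 dimensions, with the candidate directions found by PCA.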

Sep 30, 2023 • 4min
EA - Two cheap ways to test your fit for policy work by MathiasKB
This is: Two cheap ways to test your fit for policy work, published by MathiasKB on September 30, 2023 on The Effective Altruism Forum.
At EAGs I often find myself having roughly the same 30 minute conversation with university students who are interested in policy careers and want to test their fit.
This post will go over two cheap tests, each possible to do over a weekend, that you can do to test your fit for policy work. I am by no means the best person to be giving this advice, but I received feedback that my advice was helpful, and I'm not going to let go of an opportunity to act old and wise. A lot of it is based on what worked for me when I wanted to break into the field a few years ago. Get other perspectives too! Contradictory input in the comments from people with more seniority is most welcome.
A map of typical policy roles
'Policy' is a wide field with room for many skillsets. The skillsets needed for these roles vary significantly, so it's worth exploring the different types of roles to find your fit. I like to visualize the different roles as lying on a spectrum, with abstract academic research at one end and lobbying at the other:
The type of work will vary significantly at each end of this spectrum. Common to them all is a genuine interest in the policy-making process.
Test your fit in a week
Commonly recommended paths are various fellowships and internships. They are a great way to test one's fit, but they are also a large commitment.
For the complete beginner, there are much cheaper options!
Test 1: Read policy texts and write up your thoughts
Most fields of policy will have a few legislative texts or government white papers that are central to all work currently being done on the topic.
A few examples of relevant texts for a few cause areas and contexts:
EU AI Policy: AI Act
US Development cooperation: USAID's 2023 policy framework
EU Animal Welfare: The European Commission's Staff Working Document on animal welfare
EU Biosecurity: DG HERA's 2023 work plan
Let's go with the example of EU AI Policy. The AI Act is available online in every European language. While the full document is >100 pages, the meat of the act is only about 20-30 pages or so (going off memory).
Read the document and try forming your own opinion of the act! What are its strengths and weaknesses? What would you change to improve it?
For now, don't worry too much about the quality of the output. A well informed inside view takes more than a weekend to develop!
Instead, reflect on which parts of the exercise you found most engaging. If you found the exercise generally enjoyable once you got started, that's a sign you might be a good fit for policy work!
Additionally, digging into the source material is necessary for forming original views and will make you stand out to future employers. The object level of policy is underrated! My hope is that the exercise will leave you with a bunch of open questions you would like to explore further. How exactly did the EU's delegated acts work again? What was the Parliament's response to the Commission's leaked working document?
If you keep pursuing the questions you're interested in, you'll soon find yourself nearing the frontier of knowledge in your area of policy interest. Once you find yourself with a question you can't find a good answer to, you might have stumbled on a good project to further explore your fit :)
Test 2: Follow a committee hearing
Parliaments typically have topic-based committees where members of the parliament debate current issues and legislation relevant to the committee. These debates are often publicly available on the parliament's website.
Try listening to a debate on the topic of your interest. What are the contentions? What arguments are used by each side? If you were to give the next speech, how would you argue for your own views?
If yo...


