

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jul 19, 2024 • 2min
LW - Linkpost: Surely you can be serious by kave
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linkpost: Surely you can be serious, published by kave on July 19, 2024 on LessWrong.
Adam Mastroianni writes about "actually caring about stuff, and for the right reasons", rather than just LARPing. The opening is excerpted below.
I once saw someone give a talk about a tiny intervention that caused a gigantic effect, something like, "We gave high school seniors a hearty slap on the back and then they scored 500 points higher on the SAT."
Everyone in the audience was like, "Hmm, interesting, I wonder if there were any gender effects, etc."
I wanted to get up and yell: "EITHER THIS IS THE MOST POTENT PSYCHOLOGICAL INTERVENTION EVER, OR THIS STUDY IS TOTAL BULLSHIT."
If those results are real, we should start a nationwide backslapping campaign immediately. We should be backslapping astronauts before their rocket launches and Olympians before their floor routines. We should be running followup studies to see just how many SAT points we can get - does a second slap get you another 500? Or just another 250? Can you slap someone raw and turn them into a genius?
Or - much more likely - the results are not real, and we should either be a) helping this person understand where they screwed up in their methods and data analysis, or b) kicking them out for fraud.
Those are the options. Asking a bunch of softball questions ("Which result was your favorite?") is not a reasonable response. That's like watching someone pull a rabbit out of a hat actually for real, not a magic trick, and then asking them, "What's the rabbit's name?"
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jul 18, 2024 • 1h 23min
LW - AI #73: Openly Evil AI by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #73: Openly Evil AI, published by Zvi on July 18, 2024 on LessWrong.
What do you call a clause explicitly saying that you waive the right to whistleblower compensation, and that you need to get permission before sharing information with government regulators like the SEC?
I have many answers.
I also know that OpenAI, having f***ed around, seems poised to find out, because that is the claim made by whistleblowers to the SEC. Given the SEC fines you for merely not making an explicit exception to your NDA for whistleblowers, what will they do once aware of explicit clauses going the other way?
(Unless, of course, the complaint is factually wrong, but that seems unlikely.)
We also have rather a lot of tech people coming out in support of Trump. I go into the reasons why, which I do think are worth considering. There is a mix of explanations, and at least one very good reason.
Then I also got suckered into responding to a few new (well, not really new, but renewed) disingenuous attacks on SB 1047. The entire strategy is to be loud and hyperbolic, especially on Twitter, and either hallucinate or fabricate a different bill with different consequences to attack, or simply misrepresent how the law works, then use that to create the illusion that the bill is disliked or harmful.
Few others respond to correct such claims, and I constantly worry that the strategy might actually work. But that does not mean you, my reader who already knows, need to read all that.
Also a bunch of fun smaller developments. Karpathy is in the AI education business.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. Fight the insurance company.
4. Language Models Don't Offer Mundane Utility. Have you tried using it?
5. Clauding Along. Not that many people are switching over.
6. Fun With Image Generation. Amazon Music and K-Pop start to embrace AI.
7. Deepfaketown and Botpocalypse Soon. FoxVox, turn Fox into Vox or Vox into Fox.
8. They Took Our Jobs. Take away one haggling job, create another haggling job.
9. Get Involved. OpenPhil request for proposals. Job openings elsewhere.
10. Introducing. Karpathy goes into AI education.
11. In Other AI News. OpenAI's Q* is now named Strawberry. Is it happening?
12. Denying the Future. Projects of the future that think AI will never improve again.
13. Quiet Speculations. How to think about stages of AI capabilities.
14. The Quest for Sane Regulations. EU, UK, The Public.
15. The Other Quest Regarding Regulations. Many in tech embrace The Donald.
16. SB 1047 Opposition Watch (1). I'm sorry. You don't have to read this.
17. SB 1047 Opposition Watch (2). I'm sorry. You don't have to read this.
18. Open Weights are Unsafe and Nothing Can Fix This. What to do about it?
19. The Week in Audio. Joe Rogan talked to Sam Altman and I'd missed it.
20. Rhetorical Innovation. Supervillains, oh no.
21. Oh Anthropic. More details available, things not as bad as they look.
22. Openly Evil AI. Other things, in other places, on the other hand, look worse.
23. Aligning a Smarter Than Human Intelligence is Difficult. Noble attempts.
24. People Are Worried About AI Killing Everyone. Scott Adams? Kind of?
25. Other People Are Not As Worried About AI Killing Everyone. All glory to it.
26. The Lighter Side. A different kind of mental gymnastics.
Language Models Offer Mundane Utility
Let Claude write your prompts for you. He suggests using the Claude prompt improver.
Sully: convinced that we are all really bad at writing prompts
I'm personally never writing prompts by hand again
Claude is just too good - managed to feed it evals and it just optimized for me
Probably a crude version of dspy but insane how much prompting can make a difference.
Predict who will be the shooting victim. A machine learning model did this for citizens of Chicago (a ...

Jul 18, 2024 • 40min
LW - Individually incentivized safe Pareto improvements in open-source bargaining by Nicolas Macé
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Individually incentivized safe Pareto improvements in open-source bargaining, published by Nicolas Macé on July 18, 2024 on LessWrong.
Summary
Agents might fail to peacefully trade in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have individual incentives to avoid catastrophic bargaining failures.
More precisely, DCM constructs strategies that are plausibly individually incentivized, and, if adopted by all, guarantee each player no less than their least preferred trade outcome. Figure 0 below illustrates this.
This result is significant because artificial general intelligences (AGIs) might (i) be involved in high-stakes negotiations, (ii) be designed with the capabilities required for the type of strategy we'll present, and (iii) bargain poorly by default (since bargaining competence isn't necessarily a direct corollary of intelligence-relevant capabilities).
Introduction
Early AGIs might fail to make compatible demands with each other in high-stakes negotiations (we call this a "bargaining failure"). Bargaining failures can have catastrophic consequences, including great power conflicts, or AI triggering a flash war. More generally, a "bargaining problem" is when multiple agents need to determine how to divide value among themselves.
Early AGIs might possess insufficient bargaining skills because intelligence-relevant capabilities don't necessarily imply these skills: For instance, being skilled at avoiding bargaining failures might not be necessary for taking over. Another problem is that there might be no single rational way to act in a given multi-agent interaction. Even arbitrarily capable agents might have different priors, or different approaches to reasoning under bounded computation.
Therefore they might fail to solve equilibrium selection, i.e., make incompatible demands (see Stastny et al. (2021) and Conitzer & Oesterheld (2023)). What, then, are sufficient conditions for agents to avoid catastrophic bargaining failures?
Sufficiently advanced AIs might be able to verify each other's decision algorithms (e.g. via verifying source code), as studied in open-source game theory. This has both potential downsides and upsides for bargaining problems. On one hand, transparency of decision algorithms might make aggressive commitments more credible and thus more attractive (see Sec. 5.2 of Dafoe et al. (2020) for discussion).
On the other hand, agents might be able to mitigate bargaining failures by verifying cooperative commitments.
Oesterheld & Conitzer (2022)'s safe Pareto improvements[1] (SPI) leverages transparency to reduce the downsides of incompatible commitments.
In an SPI, agents conditionally commit to change how they play a game relative to some default such that everyone is (weakly) better off than the default with certainty.[2] For example, two parties A and B who would otherwise go to war over some territory might commit to, instead, accept the outcome of a lottery that allocates the territory to A with the probability that A would have won the war (assuming this probability is common knowledge). See also our extended example below.
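To make the lottery example concrete, here is a toy numerical sketch (in Python; the payoff numbers and the simple war-cost model are illustrative assumptions, not taken from the post). Replacing the war with a lottery at the same win probability preserves each side's chance of getting the territory while saving the cost of fighting, so both parties are weakly better off - the defining property of an SPI.

```python
# Toy numerical sketch of the lottery-vs-war example above.
# All payoff numbers are illustrative assumptions, not taken from the post.

p_a_wins_war = 0.6      # common-knowledge probability that A would win the war
territory_value = 10.0  # value of the territory to whichever side ends up holding it
war_cost = 3.0          # cost each side pays if they actually fight

# Default outcome: go to war. Each side pays the fighting cost; the winner takes the territory.
war_payoff_a = p_a_wins_war * territory_value - war_cost
war_payoff_b = (1 - p_a_wins_war) * territory_value - war_cost

# SPI: both sides commit to a lottery with the same win probability instead of fighting.
lottery_payoff_a = p_a_wins_war * territory_value
lottery_payoff_b = (1 - p_a_wins_war) * territory_value

print(f"A: war {war_payoff_a:.1f} -> lottery {lottery_payoff_a:.1f}")
print(f"B: war {war_payoff_b:.1f} -> lottery {lottery_payoff_b:.1f}")
# The lottery reproduces the war's win probabilities while saving the fighting
# costs, so both parties are (weakly) better off than under the default.
```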
Oesterheld & Conitzer (2022) has two important limitations: First, many different SPIs are in general possible, such that there is an "SPI selection problem", similar to the equilibrium selection problem in game theory (Sec. 6 of Oesterheld & Conitzer (2022)).
And if players don't coordinate on which SPI to implement, they might fail to avoid conflict.[3] Second, if expected utility-maximizing agents need to individually adopt strategies to implement an SPI, it's unclear what conditions...

Jul 18, 2024 • 25min
LW - Mech Interp Lacks Good Paradigms by Daniel Tan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mech Interp Lacks Good Paradigms, published by Daniel Tan on July 18, 2024 on LessWrong.
Note: I wrote this post rather quickly as an exercise in sharing rough / unpolished thoughts. I am also not an expert on some of the things I've written about. If you spot mistakes or would like to point out missed work / perspectives, please feel free!
Note 2: I originally sent this link to some people for feedback, but I was having trouble viewing the comments on the draft. The post was also in a reasonably complete state, so I decided to just publish it - and now I can see the comments! If you're one of those people, feedback is still very much welcome!
Mechanistic Interpretability (MI) is a popular and rapidly growing field of technical AI safety research. As a field, it's extremely accessible: it requires comparatively few computational resources and facilitates rapid learning thanks to a very short feedback loop. This means that many junior researchers' first foray into AI safety research is in MI (myself included); indeed, this occurs to the extent that some people feel MI is over-subscribed relative to other technical agendas.
However, how useful is this MI research?
A very common claim on MI's theory of impact (ToI) is that MI helps us advance towards a "grand unifying theory" (GUT) of deep learning. One of my big cruxes for this ToI is whether MI admits "paradigms" which facilitate correct thinking and understanding of the models we aim to interpret.
In this post, I'll critically examine several leading candidates for "paradigms" in MI, consider the available evidence for / against, and identify good future research directions (IMO). At the end, I'll conclude with a summary of the main points and an overview of the technical research items I've outlined.
Towards a Grand Unifying Theory (GUT) with MI
Proponents of this argument believe that, by improving our basic understanding of neural nets, MI yields valuable insights that can be used to improve our agents, e.g. by improving architectures or by improving their training processes. This allows us to make sure future models are safe and aligned.
Some people who have espoused this opinion:
Richard Ngo has argued here that MI enables "big breakthroughs" towards a "principled understanding" of deep learning.
Rohin Shah has argued here that MI builds "new affordances" for alignment methods.
Evan Hubinger has argued for MI here because it helps us identify "unknown unknowns".
Leo Gao argues here that MI aids in "conceptual research" and "gets many bits" per experiment.
As a concrete example of work that I think would not have been possible without fundamental insights from MI: steering vectors, a.k.a. representation engineering, and circuit breakers, which were obviously inspired by the wealth of work in MI demonstrating the linear representation hypothesis.
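As a toy illustration of the linear representation hypothesis that underlies steering vectors (a minimal sketch with synthetic activations and hypothetical dimensions, not the actual representation-engineering pipeline): if a model encodes a concept as a direction in activation space, the difference of mean activations between contrasting prompt sets recovers that direction, and adding it back in shifts new activations toward the concept.

```python
import numpy as np

# Minimal sketch of the linear representation idea behind steering vectors,
# using synthetic "residual stream" activations instead of a real model.
rng = np.random.default_rng(0)
d_model = 64

# Suppose the model encodes some concept (e.g. sentiment) along a fixed direction.
concept_dir = rng.normal(size=d_model)
concept_dir /= np.linalg.norm(concept_dir)

# Synthetic activations for contrasting prompt sets: one set carries the
# concept direction, the other does not (plus noise in both cases).
pos_acts = rng.normal(size=(100, d_model)) + 2.0 * concept_dir
neg_acts = rng.normal(size=(100, d_model))

# Steering vector = difference of the mean activations of the two sets.
steering_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
cosine = steering_vec @ concept_dir / np.linalg.norm(steering_vec)
print(f"cosine(steering vector, true concept direction) = {cosine:.3f}")

# "Steering" = adding the vector to an activation; in a real model this is done
# inside a forward hook on the residual stream at a chosen layer.
new_act = rng.normal(size=d_model)
steered_act = new_act + 4.0 * steering_vec / np.linalg.norm(steering_vec)
print(f"concept score before: {new_act @ concept_dir:.2f}, after: {steered_act @ concept_dir:.2f}")
```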
It's also important to remember that the value of fundamental science often seems much lower in hindsight, because humans quickly adjust their perspectives. Even if MI insights seem like common sense to us nowadays, their value in enabling significant advances can't be overstated.
(Aside) A corollary of this argument is that MI could likely have significant capabilities externalities. Becoming better at building powerful and instruction-aligned agents may inadvertently accelerate us towards AGI. This point has been made in depth elsewhere, so I won't elaborate further here.
A GUT Needs Paradigms
Paradigm - an overarching framework for thinking about a field
In his seminal book, The Structure of Scientific Revolutions, Thomas Kuhn catalogues scientific progress in many different fields (spanning physics, chemistry, biology), and distills general trends about how these fields progress. Central to his analysis is the notion of a "paradigm" - an overarching framework for th...

Jul 18, 2024 • 32min
AF - A List of 45+ Mech Interp Project Ideas from Apollo Research's Interpretability Team by Lee Sharkey
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A List of 45+ Mech Interp Project Ideas from Apollo Research's Interpretability Team, published by Lee Sharkey on July 18, 2024 on The AI Alignment Forum.
Why we made this list:
The interpretability team at Apollo Research wrapped up a few projects recently[1]. In order to decide what we'd work on next, we generated a lot of different potential projects. Unfortunately, we are computationally bounded agents, so we can't work on every project idea that we were excited about!
Previous lists of project ideas (such as Neel's collation of 200 Concrete Open Problems in Mechanistic Interpretability) have been very useful for people breaking into the field. But for all its merits, that list is now over a year and a half old. Therefore, many project ideas in that list aren't an up-to-date reflection of what some researchers consider the frontiers of mech interp.
We therefore thought it would be helpful to share our list of project ideas!
Comments and caveats:
Some of these projects are more precisely scoped than others. Some are vague, others are more developed.
Not every member of the team endorses every project as high priority. Usually more than one team member supports each one, and in many cases most of the team is supportive of someone working on it.
We associate the person(s) who generated the project idea to each idea.
We've grouped the project ideas into categories for convenience, but some projects span multiple categories. We don't put a huge amount of weight on this particular categorisation.
We hope some people find this list helpful!
We would love to see people working on these! If any sound interesting to you and you'd like to chat about it, don't hesitate to reach out.
Foundational work on sparse dictionary learning for interpretability
Transcoder-related project ideas
See [2406.11944] Transcoders Find Interpretable LLM Feature Circuits
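For readers new to this line of work: a transcoder is an SAE-style module trained to map an MLP sublayer's input to that sublayer's output through a wide, sparsely activating feature layer. Below is a minimal PyTorch sketch assuming a top-k activation (in the spirit of the "probably using top k" idea below); the dimensions, hyperparameters, and training loop are illustrative assumptions, not Apollo's setup.

```python
import torch
import torch.nn as nn

class TopKTranscoder(nn.Module):
    """Sketch of a top-k transcoder: approximate an MLP sublayer's input-to-output
    map through a wide, sparsely activating feature layer."""

    def __init__(self, d_model: int, d_features: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, mlp_in: torch.Tensor) -> torch.Tensor:
        # Encode, then keep only the top-k feature activations per token.
        acts = torch.relu(self.encoder(mlp_in))
        topk = torch.topk(acts, self.k, dim=-1)
        sparse_acts = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        return self.decoder(sparse_acts)

# Illustrative dimensions; real runs would use cached activations from the model.
d_model, d_features, k = 768, 768 * 8, 32
transcoder = TopKTranscoder(d_model, d_features, k)
optimizer = torch.optim.Adam(transcoder.parameters(), lr=1e-4)

mlp_in = torch.randn(4096, d_model)   # stand-in for cached MLP sublayer inputs
mlp_out = torch.randn(4096, d_model)  # stand-in for the same sublayer's outputs

optimizer.zero_grad()
loss = (transcoder(mlp_in) - mlp_out).pow(2).mean()  # reconstruction loss; the
loss.backward()                                      # top-k activation enforces
optimizer.step()                                     # sparsity without an L1 term
```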
[Nix] Training and releasing high quality transcoders.
Probably using top k
GPT2 is a classic candidate for this. I'd be excited for people to try hard on even smaller models, e.g. GELU 4L
[Nix] Good tooling for using transcoders
Nice programming API to attribute an input to a collection of paths (see Dunefsky et al.)
Web user interface? Maybe in collaboration with Neuronpedia. Would need a GPU server constantly running, but I'm optimistic you could do it with an ~A4000.
[Nix] Further circuit analysis using transcoders.
Take random input sequences, run transcoder attribution on them, examine the output and summarize the findings.
High level summary statistics of how much attribution goes through error terms & how many pathways are needed would be valuable
Explaining specific behaviors (IOI, greater-than) with high standards for specificity & faithfulness. Might be convoluted if accuracy
[I could generate more ideas here, feel free to reach out: nix@apolloresearch.ai]
[Nix, Lee] Cross layer superposition
Does it happen? Probably, but it would be nice to have specific examples! Look for features with similar decoder vectors, and do exploratory research to figure out what exactly is going on.
What precisely does it mean? Answering this question seems likely to shed light on the question of 'What is a feature?'.
[Lucius] Improving transcoder architectures
Some MLPs or attention layers may implement a simple linear transformation in addition to actual computation. If we modify our transcoders to include a linear 'bypass' that is not counted in the sparsity penalty, do we improve performance since we are not unduly penalizing these linear transformations that would always be present and active?
If we train multiple transcoders in different layers at the same time, can we include a sparsity penalty for their interactions with each other, encouraging a decomposition of the network that leaves us with as few interactions between features a...

Jul 18, 2024 • 10min
LW - We ran an AI safety conference in Tokyo. It went really well. Come next year! by Blaine
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We ran an AI safety conference in Tokyo. It went really well. Come next year!, published by Blaine on July 18, 2024 on LessWrong.
Abstract
Technical AI Safety 2024 (TAIS 2024) was a conference organised by AI Safety 東京 and Noeon Research, in collaboration with Reaktor Japan, AI Alignment Network and AI Industry Foundation. You may have heard of us through ACX.
The goals of the conference were
1. demonstrate the practice of technical safety research to Japanese researchers new to the field
2. share ideas among established technical safety researchers
3. establish a good international reputation for AI Safety 東京 and Noeon Research
4. establish a Schelling conference for people working in technical safety
We sent out a survey after the conference to get feedback from attendees on whether or not we achieved those goals. We certainly achieved goals 1, 2 and 3; goal 4 remains to be seen. In this post we give more details about the conference, share results from the feedback survey, and announce our intentions to run another conference next year.
Okay but like, what was TAIS 2024?
Technical AI Safety 2024 (TAIS 2024) was a small non-archival open academic conference structured as a lecture series. It ran over the course of 2 days from April 5th-6th 2024 at the International Conference Hall of the Plaza Heisei in Odaiba, Tokyo.
We had 18 talks covering 6 research agendas in technical AI safety:
Mechanistic Interpretability
Developmental Interpretability
Scalable Oversight
Agent Foundations
Causal Incentives
ALIFE
…including talks from Hoagy Cunningham (Anthropic), Noah Y. Siegel (DeepMind), Manuel Baltieri (Araya), Dan Hendrycks (CAIS), Scott Emmons (CHAI), Ryan Kidd (MATS), James Fox (LISA), and Jesse Hoogland and Stan van Wingerden (Timaeus).
In addition to our invited talks, we had 25 submissions, of which 19 were deemed relevant for presentation. 5 were offered talk slots, and we arranged a poster session to accommodate the remaining 14. In the end, 7 people presented posters, 5 in person and 2 in absentia. Our best poster award was won jointly by Fazl Barez for Large Language Models Relearn Removed Concepts and Alex Spies for Structured Representations in Maze-Solving Transformers.
We had 105 in-person attendees (including the speakers). Our live streams had around 400 unique viewers, and maxed out at 18 concurrent viewers.
Recordings of the conference talks are hosted on our YouTube channel.
How did it go?
Very well, thanks for asking!
We sent out a feedback survey after the event, and got 68 responses from in-person attendees (58% response rate). With the usual caveats that survey respondents are not necessarily a representative sample of the population:
Looking good! Let's dig deeper.
How useful was TAIS 2024 for those new to the field?
Event satisfaction was high across the board, which makes it hard to tell how relatively satisfied population subgroups were. Only those who identified themselves as "new to AI safety" were neutrally satisfied, but the newbies were also the most likely to be highly satisfied.
It seems that people new to AI safety had no more or less trouble understanding the talks than those who work for AI safety organisations or have published AI safety research:
They were also no more or less likely to make new research collaborations:
Note that there is substantial overlap between some of these categories, especially for categories that imply a strong existing relationship to AI safety, so take the above charts with a pinch of salt:
                                 | Total | New to AI safety | Part of the AI safety community | Employed by an AI safety org | Has published AI safety research
New to AI safety                 |    26 |             100% |                             19% |                          12% |                               4%
Part of the AI safety community  |    28 |              18% |                            100% |                          36% |                              32%
Employed by an AI safety org     |    20 |              15% |                             50% |                         100% |                              35%
Has published AIS research       |    13 |               8% |                             69% |                          54% |                             100%
Subjectively, it fe...

Jul 18, 2024 • 19min
AF - SAEs (usually) Transfer Between Base and Chat Models by Connor Kissane
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SAEs (usually) Transfer Between Base and Chat Models, published by Connor Kissane on July 18, 2024 on The AI Alignment Forum.
This is an interim report sharing preliminary results that we are currently building on. We hope this update will be useful to related research occurring in parallel.
Executive Summary
We train SAEs on base / chat model pairs and find that SAEs trained on the base model transfer surprisingly well to reconstructing chat activations (and vice versa) on Mistral-7B and Qwen 1.5 0.5B.
We also find that they don't transfer on Gemma v1 2B, and are generally bad at reconstructing the <1% of unusually high-norm activations (e.g. BOS tokens) from the opposite model.
We fine-tune our base Mistral-7B SAE (on 5 million chat activations) to cheaply obtain an SAE with competitive sparsity and reconstruction fidelity to a chat SAE trained from scratch (on 800M tokens).
We open source base, chat, and fine-tuned SAEs (plus wandb runs) for Mistral-7B and Qwen 1.5 0.5B.[1]
Mistral 7B base SAEs,
Mistral 7B chat SAEs,
Mistral 7B base SAEs fine-tuned on chat
Qwen 1.5 0.5B base SAEs,
Qwen 1.5 0.5B chat SAEs,
Qwen 1.5 0.5B base SAEs fine-tuned on chat
We release accompanying evaluation code at https://github.com/ckkissane/sae-transfer
Introduction
Fine-tuning is a common technique applied to improve frontier language models; however, we don't actually understand what fine-tuning changes within the model's internals. Sparse Autoencoders are a popular technique to decompose the internal activations of LLMs into sparse, interpretable features, and may provide a path to zoom into the differences between base vs fine-tuned representations.
In this update, we share preliminary results studying the representation drift caused by fine-tuning with SAEs. We investigate whether SAEs trained to accurately reconstruct a base model's activations also accurately reconstruct activations from the model after fine-tuning (and vice versa). In addition to studying representation drift, we also think this is an important question to gauge the usefulness of sparse autoencoders as a general purpose technique.
One flaw of SAEs is that they are expensive to train, so training a new suite of SAEs from scratch each time a model is fine-tuned may be prohibitive. If we are able to fine-tune existing SAEs for much cheaper, or even just re-use them, their utility seems more promising.
We find that SAEs trained on the middle-layer residual stream of base models transfer surprisingly well to the corresponding chat model, and vice versa. Splicing in the base SAE to the chat model achieves similar CE loss to the chat SAE on both Mistral-7B and Qwen 1.5 0.5B. This suggests that the residual streams for these base and chat models are very similar.
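To make "splicing in" concrete, here is a minimal sketch of the standard evaluation: replace the residual stream at one layer with the SAE's reconstruction and compare the language-model loss against a clean run and a zero-ablation baseline ("CE loss recovered"). This is not the authors' code (their evaluation code is at the repo linked above); it uses GPT-2 small rather than Mistral-7B or Qwen for size, and the SAE here is untrained, so the numbers are meaningless - the point is the plumbing.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

class SAE(nn.Module):
    # Vanilla ReLU sparse autoencoder over residual-stream activations.
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
layer = 6                                  # a middle-layer residual stream
sae = SAE(d_model=768, d_hidden=768 * 16)  # untrained here; plumbing only

def make_hook(edit_fn):
    # A GPT2Block's forward output is a tuple whose first element is the hidden states.
    def hook(module, inputs, output):
        return (edit_fn(output[0]),) + output[1:]
    return hook

def lm_loss(edit_fn=None):
    ids = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt").input_ids
    handle = model.transformer.h[layer].register_forward_hook(make_hook(edit_fn)) if edit_fn else None
    with torch.no_grad():
        loss = model(ids, labels=ids).loss.item()
    if handle:
        handle.remove()
    return loss

clean = lm_loss()
spliced = lm_loss(lambda resid: sae(resid))               # splice in the SAE reconstruction
zero_abl = lm_loss(lambda resid: torch.zeros_like(resid))  # zero-ablation baseline

# "CE loss recovered": 1.0 means splicing is as good as the clean model,
# 0.0 means no better than zeroing the activations entirely.
recovered = (zero_abl - spliced) / (zero_abl - clean)
print(f"clean {clean:.3f}  spliced {spliced:.3f}  zero-abl {zero_abl:.3f}  recovered {recovered:.2%}")
```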
However, we also identify cases where the SAEs don't transfer. First, the SAEs fail to reconstruct activations from the opposite model that have outlier norms (e.g. BOS tokens). These account for less than 1% of the total activations, but cause cascading errors, so we need to filter these out in much of our analysis. We also find that SAEs don't transfer on Gemma v1 2B.
We find that the difference in weights between Gemma v1 2B base vs chat is unusually large compared to other fine-tuned models, explaining this phenomenon.
Finally, to solve the outlier norm issue, we fine-tune a Mistral 7B base SAE on just 5 million tokens (compared to 800M token pre-training), to obtain a chat SAE of comparable quality to one trained from scratch, without the need to filter out outlier activations.
Investigating SAE Transfer between base and chat models
In this section we investigate if base SAEs transfer to chat models, and vice versa. We find that with the exception of outlier norm tokens (e.g. BOS), they transfer surprisingly well, achieving similar CE loss recovered to the orig...

Jul 18, 2024 • 2min
EA - Rethink Priorities' CEO announcement by Rethink Priorities
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rethink Priorities' CEO announcement, published by Rethink Priorities on July 18, 2024 on The Effective Altruism Forum.
Rethink Priorities (RP) is excited to announce that Marcus A. Davis is now RP's sole CEO. Former Co-CEO Peter Wildeford will remain at RP, focusing on projects in artificial intelligence. He will also continue his work as Chief Advisory Executive at the RP-sponsored think tank, the Institute for AI Policy and Strategy (IAPS).
Since 2018, co-founders Marcus Davis and Peter Wildeford have served as Co-CEOs of RP. Their joint leadership has grown RP from a two-person research team into an international research organization with 60+ staff working around the world. Their guidance has helped expand RP's research areas to include animal welfare, global health and development, and artificial intelligence policy.
The decision to transition to this new leadership structure comes after discussions around the opportunities for RP's future growth, Peter's expertise and interests, and developments in the artificial intelligence landscape.
For further information, please refer to RP's website.
Rethink Priorities is a think-and-do tank dedicated to informing decisions made by high-impact organizations and funders across various cause areas. We invite you to explore our research database and stay updated on new work by subscribing to our newsletter.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jul 18, 2024 • 5min
LW - Friendship is transactional, unconditional friendship is insurance by Ruby
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Friendship is transactional, unconditional friendship is insurance, published by Ruby on July 18, 2024 on LessWrong.
It feels a little icky to say, but we befriend people because we get something out of it. We enjoy the company, the conversation, the emotional support, the activities, the connection, etc. It's not a coincidence people don't befriend brick walls.
(The same is true in romantic relationships, except we expect even more.)
Granted, friendship is not an explicit transaction that's negotiated, quantified, legally enforceable, etc. It's fuzzy, which helps it work better for reasons I won't really get into here[1].
However it's crucial to recognize that if your friend (or partner) didn't provide or promise you some kind of value[2], you wouldn't have become friends in the first place.
And yet, people valorize the notion of loyalty in relationships: continuing to be there through thick and thin, good and bad, health and illness. "Unconditional friendship" and "unconditional love". Conversely "fair weather friendship" is denigrated.
People hope to be loved even if they were worms.
What gives? How do we reconcile friendships and relationships arising due to receiving some value with the aspiration or even expectation of unconditionality?
My model here is something functionally akin to mutual insurance. While I became your friend because we spent years playing basketball together, I stay by your side even when you're recovering from a broken leg, or even if you were injured so badly as to never play again. Someone initially enticed by their partner's beauty stays with them even after a horrific burn to the face. I do this because I expect the same in return.
You might argue that in these cases, you're still receiving other benefits even when one of them is lost, but I argue back that we see ongoing care even where there's almost nothing left, e.g. people caring for their senile, bedridden partners. And more so, that we judge people who don't stick it out.
Friendship is standardly a straightforward exchange of value provided. It is also an exchange of insurance "if you're not able to provide value to me, I'll still provide value to you" and vice versa. Like the other stuff in friendship, it's fuzzy. The insurance exchange doesn't happen in a discrete moment and its strength is quantitative and expected to grow over time. People expect more "loyalty" from friends and partners of years than weeks.
In the limit, people reach "unconditional love", meaning something like from this point on, I will love you no matter what. However, reaching that willingness was very probably tied to specific conditional factors. It's notable that for many people love and security are connected.
Sufficiently loving and supportive relationships provide security because they imply an unconditionality on circumstances - you'll have someone even if fortune befalls you and you lose what makes you appealing in the first place.
I think this makes sense. Seems like good game theoretic trade even with a willing partner. "Till death do us part." Possibly worth making a little more explicit though, just to be sure your friends and partners share whatever expectations of loyalty you have.
Note that I don't think this dynamic needs to be very conscious on anyone's part. I think that humans instinctively execute good game theory because evolution selected for it, even if the human executing just feels a wordless pull to that kind of behavior. In this context, "attachment to others" feels like a thing that humans and other animals experience.
Parents, perhaps especially mothers, are very attached to their children (think of the mother bear), but we tend to form attachments to anyone (or thing) that we're persistently around. When I stick with my friend of many years through his illness, it might feel ...

Jul 17, 2024 • 6min
LW - What are you getting paid in? by Austin Chen
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What are you getting paid in?, published by Austin Chen on July 17, 2024 on LessWrong.
Crossposting this essay by my friend, Leila Clark.
A long time ago, a manager friend of mine wrote a book to collect his years of wisdom. He never published it, which is a shame because it was full of interesting insights. One that I think a lot about today was the question: "How are you paying your team?"
This friend worked in finance. You might think that people in finance, like most people, are paid in money. But it turns out that even in finance, you can't actually always pay and motivate people with just money.
Often, there might just not be money to go around. Even if there is, managers are often captive to salary caps and performance bands. In any case, it's awkward to pay one person ten times more than another, even if one person is clearly contributing ten times more than the other (many such cases exist).
With this question, my manager friend wanted to point out that you can pay people in lots of currencies. Among other things, you can pay them in quality of life, prestige, status, impact, influence, mentorship, power, autonomy, meaning, great teammates, stability and fun. And in fact most people don't just want to be paid in money - they want to be paid some mixture of these things.
To demonstrate this point, take musicians and financiers.
A successful financier is much, much richer in dollars than a successful musician. Some googling suggests that Mitski and Grimes, both very successful alternative musicians, have net worths of about $3-5m. $5m is barely notable in the New York high society circles that most financiers run in. Even Taylor Swift, maybe one of the most successful musicians of all time, has a net worth of generously $1b; Ken Griffin, one of the most successful financiers of all time, has a net worth of $33b.
But more people want to be musicians, and I think it's because musicians are paid in ways that financiers aren't.
Most obviously, musicians are way cooler. They get to interact with their fans. People love their work. They naturally spend their days hanging out with other cool people - other musicians. They can work on exactly what they want to, largely when they want to - they've won the American Dream because they get to work on what they love and get paid! And in that way, they get paid in radical self-expression.
(This is a little unfair, because I know some financiers who think that work is a means of radical self-expression. Knowing their personalities, I believe them, but it doesn't help them get tables at fancy New York restaurants the way Taylor can.)
I don't want to be too down on finance. People are different, and it's a good fact about the world that different people can be paid in different ways. My math genius friends would hate interacting with most fans and musicians. They instead have stable jobs, rent beautiful apartments in New York and solve fun technical problems all day with their friends. That's exactly how they want to get paid.
But when I worked in finance, people would sometimes shake their heads and ask why bright 20-year-olds would take the huge risk of moving to New York for unstable and uncertain careers as musicians, actors, or starving artists. I probably asked this question myself, when I was younger. Hopefully this provides some insight to the financiers.
So how do you make sure you get paid the way you want to? From what I can tell, the best way is to pick the right industry. It's fairly straightforward to tell how an industry pays. Politics pays in power. Finance pays in money. Music and art pay in 'coolness.' Nonprofit work, teaching and healthcare pay in meaning and, a friend reports, sometimes a sense of superiority over others too.
There's an exchange rate between many of the currencies you can get paid in, but ...


