

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Feb 2, 2024 • 2min
AF - Survey for alignment researchers: help us build better field-level models by Cameron Berg
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Survey for alignment researchers: help us build better field-level models, published by Cameron Berg on February 2, 2024 on The AI Alignment Forum.
AE Studio is launching a short, anonymous survey for alignment researchers, in order to develop a stronger model of various field-level dynamics in alignment.
This appears to be an interestingly neglected research direction that we believe will yield specific and actionable insights related to the community's technical views and more general characteristics.
The survey is a straightforward 5-10 minute Google Form with some simple multiple choice questions.
For every alignment researcher who completes the survey, we will donate $40 to a high-impact AI safety organization of your choosing (see specific options on the survey). We will also send each alignment researcher who wants one a customized report that compares their personal results to those of the field.
Together, we hope to not only raise some money for some great AI safety organizations, but also develop a better field-level model of the ideas and people that comprise alignment research.
We will open-source all data and analyses when we publish the results. Thanks in advance for participating and for sharing this around with other alignment researchers!
Survey full link:
https://forms.gle/d2fJhWfierRYvzam8
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Feb 2, 2024 • 2min
LW - Most experts believe COVID-19 was probably not a lab leak by DanielFilan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Most experts believe COVID-19 was probably not a lab leak, published by DanielFilan on February 2, 2024 on LessWrong.
The Global Catastrophic Risks Institute conducted an anonymous survey of relevant experts on whether they thought COVID was more likely caused by a lab accident (aka lab leak) or zoonotic spillover. Their summary, bolding is mine:
The study's experts overall stated that the COVID-19 pandemic most likely originated via a natural zoonotic event, defined as an event in which a non-human animal infected a human, and in which the infection did not occur in the course of any form of virological or biomedical research.
The experts generally gave a lower probability for origin via a research-related accident, but most experts indicated some chance of origin via accident and about one fifth of the experts stated that an accident was the more likely origin. These beliefs were similar across experts from different geographic and academic backgrounds.
The experts mostly expressed the view that more research on COVID-19's origin could be of value. About half of the experts stated that major gaps still remain in the understanding of COVID-19's origin, and most of the other experts also stated that some research is still needed. About 40% of experts stated that clarity on COVID-19's origins would provide a better understanding of the potential origins of future pandemics. Given clarity on COVID-19's origin, experts also proposed a variety of governance changes for addressing future pandemics, including measures to prevent initial human infection, measures to prevent an initial infection from becoming a pandemic, and measures to mitigate the harm once a pandemic occurs.
The vast majority of the experts expressed the belief that a natural zoonotic event will likely be the origin of the next pandemic.
The experts also provided a set of clear recommendations for preventing, preparing for and responding to future pandemics, which generally align with many previous studies.
Link to the main report is here, and link to their (much longer) methodological and analytical annex is here.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 3min
LW - On Not Requiring Vaccination by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Not Requiring Vaccination, published by jefftk on February 2, 2024 on LessWrong.
A friend recently wrote that they wouldn't be attending a mask-optional contra dance weekend I'm playing at because it doesn't require vaccination. As an organizer of unrelated dance events, which also don't require vaccination, here's how I see the situation.
When the covid vaccines first came out they were a huge improvement over the status quo. Getting the vaccine reduced your risk of severe illness or death, reduced your chance of catching it, and reduced your chance of giving it to others after you were sick. Our events initially required vaccination, which I think was the right call.
At this point, however, there are a few different things you might mean if you say your event requires vaccination. Usually it's either:
A complete primary series counts, even if it's one shot of J&J from early 2021.
The most recent ("updated") booster is required.
The CDC used to call the first category "fully vaccinated", but no longer talks about it prominently. They've switched to focusing on the second one, which they call "up to date".
This change makes sense: from the perspective of avoiding getting infected and passing it on, only a recent booster does very much. A few months ago I wrote about the results in Menegale et al. (2023), where they saw that, as an infection control measure, vaccines wane quickly: effectiveness halves about three times a year.
Additionally, lots of people got sick this winter, which acts similarly to a vaccination. Given how quickly the vaccine wears off, I'd be less concerned about risk from someone unvaccinated who'd had covid over Christmas than about someone who got their booster in the early Fall and dodged the winter wave.
What does this mean for a dance event?
If you want a vaccination requirement to be doing anything useful, you need to require people be up-to-date with their vaccine. Risk from someone last boosted in Fall 2022 or before is not appreciably different from someone who was never vaccinated.
Requiring up-to-date vaccination rules out 80% of people 18+, though if they're excited enough about your event they could go get the shot.
Unless your event is in the fall or early winter, even requiring up-to-date vaccination doesn't help much: if someone got their "updated" booster in the fall they're down to ~35% of peak efficacy by the time of an April event.
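To sketch the arithmetic behind that ~35% figure (my own illustration, assuming simple exponential waning with effectiveness halving three times a year, and an early-October booster as the reference point):

```python
HALVINGS_PER_YEAR = 3  # effectiveness halves about three times a year (Menegale et al. 2023)

def remaining_effectiveness(months_since_dose: float) -> float:
    """Fraction of peak effectiveness left after a given number of months."""
    return 0.5 ** (HALVINGS_PER_YEAR * months_since_dose / 12)

# Booster in early October, event in April: roughly six months later.
print(round(remaining_effectiveness(6), 2))  # ~0.35, i.e. about 35% of peak
```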
If an event does want to reduce risk of transmission, I'd recommend considering some combination of ventilation, filters, tests, and masking.
Comment via: facebook, lesswrong, mastodon
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 27min
LW - Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small by Joseph Bloom
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small, published by Joseph Bloom on February 2, 2024 on LessWrong.
This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, under mentorship from Neel Nanda and Arthur Conmy. Funding for this work was provided by the Manifund Regranting Program and donors as well as LightSpeed Grants.
This is intended to be a fairly informal post sharing a set of Sparse Autoencoders trained on the residual stream of GPT2-small which achieve fairly good reconstruction performance and contain fairly sparse / interpretable features. More importantly, advice from Anthropic and community members has enabled us to train these much more efficiently / faster than before.
The specific methods that were most useful were: ghost gradients, learning rate warmup, and initializing the decoder bias with the geometric median. We discuss each of these in more detail below.
5 Minute Summary
We're publishing a set of 12 Sparse AutoEncoders for the GPT2 Small residual stream.
These dictionaries have approximately 25,000 features each, with very few dead features (mainly in the early layers) and high quality reconstruction (log loss when the activations are replaced with the output is 3.3 - 3.6 as compared with 3.3 normally).
The L0s range from 5 in the first layer to 70 in the 9th SAE (increasing by about 5-10 per layer and dropping in the last two layers).
By choosing a fixed dictionary size, we can see how statistics like the number of dead features or reconstruction cross entropy loss change with layer, giving some indication of how properties of the feature distribution change with layer depth.
We haven't yet extensively analyzed these dictionaries, but will share the feature dashboards we've automatically generated.
Readers can access the Sparse Autoencoder weights in this HuggingFace Repo. Training code and code for loading the weights / model and data loaders can be found in this Github Repository. Training curves and feature dashboards can also be found in this wandb report. Users can download all 25k feature dashboards generated for the layer 2 and 10 SAEs and the first 5000 of the layer 5 SAE features here (note the left-hand column of the dashboards should currently be ignored).
| Layer | Variance Explained | L1 loss | L0* | % Alive Features | Reconstruction CE Log Loss |
|-------|--------------------|---------|-------|------------------|----------------------------|
| 0 | 99.15% | 4.58 | 12.24 | 80.0% | 3.32 |
| 1 | 98.37% | 41.04 | 14.68 | 83.4% | 3.33 |
| 2 | 98.07% | 51.88 | 18.80 | 80.0% | 3.37 |
| 3 | 96.97% | 74.96 | 25.75 | 86.3% | 3.48 |
| 4 | 95.77% | 90.23 | 33.14 | 97.7% | 3.44 |
| 5 | 94.90% | 108.59 | 43.61 | 99.7% | 3.45 |
| 6 | 93.90% | 136.07 | 49.68 | 100% | 3.44 |
| 7 | 93.08% | 138.05 | 57.29 | 100% | 3.45 |
| 8 | 92.57% | 167.35 | 65.47 | 100% | 3.45 |
| 9 | 92.05% | 198.42 | 71.10 | 100% | 3.45 |
| 10 | 91.12% | 215.11 | 53.79 | 100% | 3.52 |
| 11 | 93.30% | 270.13 | 59.16 | 100% | 3.57 |
| Original Model | - | - | - | - | 3.3 |
Summary Statistics for GPT2 Small Residual Stream SAEs. *L0 = Average number of features firing per token.
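As a concrete illustration (not the authors' code), here is a minimal sketch of how summary statistics like these could be computed for a single SAE, assuming a standard one-layer sparse autoencoder with a ReLU encoder and a decoder bias subtracted before encoding; all tensor names below are hypothetical.

```python
import torch

def sae_summary_stats(acts, W_enc, b_enc, W_dec, b_dec):
    """acts: [n_tokens, d_model] residual stream activations at one layer."""
    feats = torch.relu((acts - b_dec) @ W_enc + b_enc)   # [n_tokens, n_features]
    recon = feats @ W_dec + b_dec                        # [n_tokens, d_model]

    l0 = (feats > 0).float().sum(dim=-1).mean()          # avg features firing per token
    pct_alive = (feats > 0).any(dim=0).float().mean()    # fraction of features that ever fire

    resid_var = (acts - recon).pow(2).sum()
    total_var = (acts - acts.mean(dim=0, keepdim=True)).pow(2).sum()
    variance_explained = 1 - resid_var / total_var

    return {"L0": l0.item(),
            "% alive": pct_alive.item(),
            "variance explained": variance_explained.item()}

# Reconstruction CE log loss is measured separately: run GPT2-small with this layer's
# residual stream replaced by `recon` and record the resulting cross-entropy loss
# (compared against ~3.3 for the unmodified model).
```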
Training SAEs that we were happy with used to take much longer than it does now. Last week, it took me 20 hours to train a 50k feature SAE on 1 billion tokens, and over the weekend it took 3 hours for us to train a 25k feature SAE on 300M tokens with similar variance explained, L0, and CE loss recovered.
We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier:
Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic January Update), we were training SAEs with approximately 50k features on 1 billion tokens with 3 resampling events to reduce the number of dead features. This took around 20 hours and might cost about $10 with an A6000 GPU. With ghost gradients, we don't need to resample (or wait for loss curves to plateau after resampling). Now we can train on only 300M tokens instead. Simultaneously, since we now...

Feb 2, 2024 • 2min
LW - Wrong answer bias by lukehmiles
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Wrong answer bias, published by lukehmiles on February 2, 2024 on LessWrong.
The correct answer leaves nothing to be said. The wrong answer starts a conversation, a research program, an investigation, a journey, an institute, a paper, a book, a youtube channel, a lifestyle, a tribe.
Why is reading a textbook so boring? Very few people do it. It just has right answers and few great questions to ponder. Blogs arguing about nutrition are great reading though.
If you write a research paper saying "all that shit is dumb just do the obvious thing" then you'll probably have trouble getting it published. I tried once but my professor shut it down saying it's a bad strat basically.
"Clean energy" is great as a conversation piece or phrase in your mission statement but "just deregulate nuclear lmao why are you wasting your time" is definitely not in my experience.
It's fine if bad ideas are all we ever talk about, but the trouble comes when it's time for someone to sit down and do their work. The doctor trying to cure a thing mostly heard about the RCTs on all the shitty methods that barely do anything, so they pick their favorite of those. The AI safety implementer mostly heard discussions about bad methods and probably tries to patch one of those. (And fixing the mistake in a bad method is not a good way to make a good method.) The parents in the parent group of course mostly talk about their failed attempts to fix the problems they do have, and mostly forget about the problems they never had or quickly fixed.
Have hope: you can combat wrong answer bias with these simple tricks.
Do not write or speak
Do not read or listen
Hate the process of fixing things
Savor the experience of shit working like it should
Act like you have no time or money and won't ever have any (helps if you don't and won't)
Assume your answer is too complicated
Assume your question is too loaded
Assume any task is actually really easy and you're doing it wrong
Hate struggle
Hate nuance
If you look impressive then you're doing it wrong
Love true success
If it looks like you barely did anything at all then you're doing it right
Observe people & orgs who get shit done or don't have a problem you have
Turn your back on anyone struggling with anything for any reason. They're probably looking for a friend or community instead of just focusing on nailing shit all the time like you.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 2min
LW - Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis by simeon c
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis, published by simeon c on February 2, 2024 on LessWrong.
The programme thesis of Davidad's agenda to develop provably safe AI has just been published. You can find extra details by downloading this doc. For context, Davidad is a Programme Director at ARIA who will grant somewhere between £10M and £50M over the next 3 years to pursue his research agenda.
It is the most comprehensive public document detailing his agenda to date.
Here's the most self-contained graph explaining it at a high level, although you'll have to dive into the details and read it several times to start grasping its many dimensions.
I'm personally very excited by Davidad's moonshot, which I currently see as the most credible alternative to scaled transformers; I consider scaled transformers too flawed to be a credible safe path, mostly because:
Ambitious LLM interpretability seems very unlikely to work out:
Why: the failed attempts at making meaningful progress over the past few years + the systematic wall of understanding only ~80% of what's going on across reverse engineering attempts
Adversarial robustness to jailbreaks seems unlikely to work out:
Why: failed attempts at solving it + a theoretical paper from early 2023 that I can't find right now + increasingly large context windows
Safe generalization with very high confidence seems quite unlikely to work out:
Why: absence of theory on transformers + weak interpretability
A key motivation for pursuing moonshots à la Davidad is, as he explains in his thesis, to shift the incentives away from the current race to the bottom by derisking credible paths to AI systems whose safety we have strong reasons to be confident in. See the graph below:
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 27min
AF - Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small by Joseph Isaac Bloom
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small, published by Joseph Isaac Bloom on February 2, 2024 on The AI Alignment Forum.
UPDATE: Since we posted this last night, someone pointed out that our implementation of ghost grads has a non-trivial error (which makes the results a priori quite surprising). We computed the ghost grad forward pass using Exp(Relu(W_enc(x)[dead_neuron_mask])) rather than Exp(W_enc(x)[dead_neuron_mask]). I'm running some ablation experiments now to get to the bottom of this.
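To make the difference concrete, here is a small illustration of the two expressions (hypothetical values; this is not the authors' training code):

```python
import torch

torch.manual_seed(0)
pre_acts = torch.randn(8)           # stand-in for W_enc(x), the encoder pre-activations
dead_neuron_mask = pre_acts < 0.5   # stand-in for the mask of dead features

buggy = torch.exp(torch.relu(pre_acts[dead_neuron_mask]))  # what was implemented
intended = torch.exp(pre_acts[dead_neuron_mask])           # what was intended

# The two differ wherever a pre-activation is negative: ReLU clamps those entries
# to 0, so the buggy version outputs exp(0) = 1 instead of exp(pre_activation).
print(buggy)
print(intended)
```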
This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, under mentorship from Neel Nanda and Arthur Conmy. Funding for this work was provided by the Manifund Regranting Program and donors as well as LightSpeed Grants.
This is intended to be a fairly informal post sharing a set of Sparse Autoencoders trained on the residual stream of GPT2-small which achieve fairly good reconstruction performance and contain fairly sparse / interpretable features. More importantly, advice from Anthropic and community members has enabled us to train these much more efficiently / faster than before.
The specific methods that were most useful were: ghost gradients, learning rate warmup, and initializing the decoder bias with the geometric median. We discuss each of these in more detail below.
5 Minute Summary
We're publishing a set of 12 Sparse AutoEncoders for the GPT2 Small residual stream.
These dictionaries have approximately 25,000 features each, with very few dead features (mainly in the early layers) and high quality reconstruction (log loss when the activations are replaced with the output is 3.3 - 3.6 as compared with 3.3 normally).
The L0s range from 5 in the first layer to 70 in the 9th SAE (increasing by about 5-10 per layer and dropping in the last two layers).
By choosing a fixed dictionary size, we can see how statistics like the number of dead features or reconstruction cross entropy loss change with layer, giving some indication of how properties of the feature distribution change with layer depth.
We haven't yet extensively analyzed these dictionaries, but will share the feature dashboards we've automatically generated.
Readers can access the Sparse Autoencoder weights in this HuggingFace Repo. Training code and code for loading the weights / model and data loaders can be found in this Github Repository. Training curves and feature dashboards can also be found in this wandb report. Users can download all 25k feature dashboards generated for the layer 2 and 10 SAEs and the first 5000 of the layer 5 SAE features here (note the left-hand column of the dashboards should currently be ignored).
| Layer | Variance Explained | L1 loss | L0* | % Alive Features | Reconstruction CE Log Loss |
|-------|--------------------|---------|-------|------------------|----------------------------|
| 0 | 99.15% | 4.58 | 12.24 | 80.0% | 3.32 |
| 1 | 98.37% | 41.04 | 14.68 | 83.4% | 3.33 |
| 2 | 98.07% | 51.88 | 18.80 | 80.0% | 3.37 |
| 3 | 96.97% | 74.96 | 25.75 | 86.3% | 3.48 |
| 4 | 95.77% | 90.23 | 33.14 | 97.7% | 3.44 |
| 5 | 94.90% | 108.59 | 43.61 | 99.7% | 3.45 |
| 6 | 93.90% | 136.07 | 49.68 | 100% | 3.44 |
| 7 | 93.08% | 138.05 | 57.29 | 100% | 3.45 |
| 8 | 92.57% | 167.35 | 65.47 | 100% | 3.45 |
| 9 | 92.05% | 198.42 | 71.10 | 100% | 3.45 |
| 10 | 91.12% | 215.11 | 53.79 | 100% | 3.52 |
| 11 | 93.30% | 270.13 | 59.16 | 100% | 3.57 |
| Original Model | - | - | - | - | 3.3 |
Summary Statistics for GPT2 Small Residual Stream SAEs. *L0 = Average number of features firing per token.
Training SAEs that we were happy with used to take much longer than it does now. Last week, it took me 20 hours to train a 50k feature SAE on 1 billion tokens, and over the weekend it took 3 hours for us to train a 25k feature SAE on 300M tokens with similar variance explained, L0, and CE loss recovered.
We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier:
Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic Jan...

Feb 1, 2024 • 35min
AF - Evaluating Stability of Unreflective Alignment by james.lucassen
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evaluating Stability of Unreflective Alignment, published by james.lucassen on February 1, 2024 on The AI Alignment Forum.
This post has an accompanying SPAR project! Apply here if you're interested in working on this with me.
Huge thanks to Mikita Balesni for helping me implement the MVP. Regular-sized thanks to Aryan Bhatt, Rudolph Laine, Clem von Stengel, Aaron Scher, Jeremy Gillen, Peter Barnett, Stephen Casper, and David Manheim for helpful comments.
0. Key Claims
Most alignment work today doesn't aim for alignment that is stable under value-reflection[1].
I think this is probably the most sensible approach to alignment.
But there is a threat model which could potentially be a serious obstacle to this entire class of alignment approaches, which is not currently being tracked or evaluated. It goes like this:
Long-horizon capabilities require a particular capability I call "stepping back" (described in more detail in the body of the post)
Stepping back seems likely to be learned in a very generalizable way
If stepping back generalizes in what seems like the simplest / most natural way, it will create a tendency for AIs to do value-reflection
If we're unable to prevent this value-reflection, it will cause any AI whose alignment is not stable under value-reflection to become misaligned
However, I'm quite uncertain about whether or not this threat model will go through, for a few reasons:
The chain of argument is highly conjunctive and therefore fragile.
The track record of anthropomorphic reasoning about which AI capabilities will imply which other AI behaviors seems poor.
There is an intuitive counterexample provided by humans, at least at human-level intelligence.
Even given this uncertainty, work on this threat model seems very neglected given the wide swath of highly-invested-in alignment approaches that it could affect.
I don't think further abstract argument about the likelihood of this threat model will reduce my uncertainty much. Instead, I propose some evals to keep track of each component of this threat model:
Evaluate the stepping back capabilities of frontier AIs in domains where they receive the most long-horizon training.
Evaluate the generalization of frontier AIs' stepping back behavior to other task domains.
Evaluate how difficult it is to decouple the capabilities of frontier AIs between domains.
Evaluate how difficult it is to prevent frontier AIs from thinking certain "forbidden thoughts" even when those thoughts are incentivized by the task setting.
1. Reflective vs Unreflective Alignment
I want to start by making an explicit distinction between two importantly different uses of the word "alignment". The distinction is between alignment that aims to be preserved even under value-reflection by the AI, and alignment that does not aim for that. I will call alignment approaches that aim for stability under value-reflection "reflective alignment", and approaches that do not "unreflective alignment".
One easy way to connect this distinction to your existing thinking is by checking if you think of alignment as a binary or as a spectrum. Unreflective alignment naturally admits a spectrum formulation - alignment which is stable under X amount of value-reflection (or other destabilizing pressures). Reflective alignment, on the other hand, is typically thought of as a binary[2] - either the AI converges to the intended values in the limit, or it goes to some other fixed point instead.
In the current alignment discussion, iterative prosaic approaches generally propose to work by ensuring "alignment increases faster than capabilities". This is clearly an alignment-as-spectrum framing, suggesting the goal in mind is unreflective alignment. For further confirmation, consider the type of step that is considered an increase in alignment - going from baseline GPT-3 t...

Feb 1, 2024 • 3min
EA - Managing risks while trying to do good by Wei Dai
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Managing risks while trying to do good, published by Wei Dai on February 1, 2024 on The Effective Altruism Forum.
I often think about "the road to hell is paved with good intentions".[1] I'm unsure to what degree this is true, but it does seem that people trying to do good have caused more negative consequences in aggregate than one might naively expect.[2] "Power corrupts" and "power-seekers using altruism as an excuse to gain power" are two often cited reasons for this, but I think they don't explain all of it.
A more subtle reason is that even when people are genuinely trying to do good, they're not entirely aligned with goodness. Status-seeking is a powerful motivation for almost all humans, including altruists, and we frequently award social status to people for merely trying to do good, before seeing all of the consequences of their actions. This is in some sense inevitable as there are no good alternatives. We often need to award people with social status before all of the consequences play out, both to motivate them to continue to try to do good, and to provide them with influence/power to help them accomplish their goals.
A person who consciously or subconsciously cares a lot about social status will not optimize strictly for doing good, but also for appearing to do good. One way these two motivations diverge is in how to manage risks, especially risks of causing highly negative consequences. Someone who wants to appear to do good would be motivated to hide or downplay such risks, from others and perhaps from themselves, as fully acknowledging such risks would often amount to admitting that they're not doing as much good (on expectation) as they appear to be.
How to mitigate this problem
Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.[3]
Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk. These could be risk management officers within organizations, or people empowered to manage risk across the EA community.[4]
Socially, we can reward people/organizations for taking risks seriously, or punish (or withhold rewards from) those who fail to do so. This is tricky because, due to information asymmetry, we can easily create "risk management theaters" akin to "security theater" (which, come to think of it, is a type of risk management theater).
But I think we should at least take notice when someone or some organization fails, in a clear and obvious way, to acknowledge risks or to do good risk management, for example not writing down a list of important risks to be mindful of and keeping it updated, or avoiding/deflecting questions about risk. More optimistically, we can try to develop a culture where people and organizations are monitored and held accountable for managing risks substantively and competently.
[1] due in part to my family history
[2] Normally I'd give some examples here, but we can probably all think of some from the recent past.
[3] I try to do this myself in the comments.
[4] an idea previously discussed by Ryan Carey and William MacAskill
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 1, 2024 • 4min
EA - Increasingly vague interpersonal welfare comparisons by MichaelStJules
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Increasingly vague interpersonal welfare comparisons, published by MichaelStJules on February 1, 2024 on The Effective Altruism Forum.
Some have argued that all interpersonal welfare comparisons should be possible or take it as a strong mark against a theory in which they are not all possible. Others have argued against their possibility, e.g. Hausman (1995) for preference views. Here, I will illustrate an intermediate position: interpersonal welfare comparisons are vague, with tighter bounds on reasonable comparisons between beings whose welfare states are realized more similarly, and wider or no bounds when more different.
The obvious case is two completely or at least functionally identical brains (at the right level of abstraction for our functionalist theory). As long as we grant intrapersonal comparisons, then we should get interpersonal comparisons between identical brains. We map the first brain's state(s) to the equivalent state(s) in the second, and compare them in the second brain. Of course, this is not a very interesting case, and it seems only directly useful for artificial duplicates of minds.
Still, we can go further. Consider an experience E1 in brain B1 and an experience E2 in brain B2. If B1 and B2 only differ by the fact that some of B2's unpleasantness-contributing neurons are less sensitive or removed, and B1 and B2 receive the same input signals that cause pain, then it seems likely to me that B1's painful experience E1 is at least as unpleasant as B2's E2 and possibly more. We may be able to say roughly how much more unpleasant it is by comparing E2 in B2 directly to less intense states in B1, sandwiching E2 in unpleasantness between two states in B1.
Maybe going from E1 to E2 changes the unpleasantness by between -0.01 and 0, i.e. Unpleasantness_B2(E2) = Unpleasantness_B1(E1) + Δ, where -0.01 ≤ Δ ≤ 0. There may be no fact of the matter about the exact value of Δ.
For small enough local differences between brains, we could make fairly precise comparisons.
I use unpleasantness for the purpose of a more concrete illustration, but it's plausible other potential types of welfare could be used instead, like preferences. A slight difference in how some preferences are realized should typically result in a slight difference in the preferences themselves and how we value them, but the extent of the difference in value could be vague and only boundable by fairly tight inequalities. We can use the same example, too: a slight difference in how unpleasant a pain is through the same kinds of differences in neurons as above typically results in a slight difference in preferences about that pain and preference-based value.
In general, for arbitrary brains B1 and B2 and respective experiences E1 and E2, we can ask whether there's a sequence of changes from E1 and B1 to E2 and B2, possibly passing through different hypothetical intermediate brains and states, that lets us compare E1 and E2 by combining bounds and inequalities from each step along the sequence. Some changes could have opposite sign effects on the realized welfare, but with only bounds rather than precise values, the bounds widen between brains farther apart in the sequence.
For example, a change with a range of +1 to +4 in additional unpleasantness and a change with a range of -3 to -1 could give a net change between -2 = +1 - 3 and +3 = +4 - 1. Adding one more change of between +1 and +4 and another of between -3 and -1 gives between -4 and +6. Adding another change of between +2 and +3 gives between -2 and +9. The gap between the bounds widens with each additional change.[1]
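The bookkeeping in this example is just interval addition over the per-change bounds; a minimal sketch (my own illustration, not from the post):

```python
def combine(*changes):
    """Sum a sequence of (lower, upper) bounds on changes in unpleasantness."""
    lower = sum(lo for lo, _ in changes)
    upper = sum(hi for _, hi in changes)
    return lower, upper

print(combine((1, 4), (-3, -1)))                            # (-2, 3)
print(combine((1, 4), (-3, -1), (1, 4), (-3, -1)))          # (-4, 6)
print(combine((1, 4), (-3, -1), (1, 4), (-3, -1), (2, 3)))  # (-2, 9)
# The width (upper - lower) of the combined interval grows with each additional change.
```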
The more or larger such changes are necessary to get from one brain to another, the less tight the bounds on the comparisons could become, the further they may go both negative and positive overall,[2] and the less reasonable it seems to mak...


