

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Mar 25, 2024 • 26min
EA - Long Reflection Reading List by Will Aldred
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Long Reflection Reading List, published by Will Aldred on March 25, 2024 on The Effective Altruism Forum.
This is a reading list on the following cluster of notions: "the long reflection", "the deployment problem", "structural risk", "post AGI governance", "ASI governance", "reflective governance", "metaphilosophy", "AI philosophical competence", "trajectory change", "macrostrategy", "worldview investigations", "grand challenges" and "the political philosophy of AI".
I claim that this area outscores regular AI safety on importance[1] while being significantly more neglected (and roughly the same in terms of tractability), making it perhaps the highest priority EA cause area.
I don't claim to be the ideal person to have made this reading list. The story behind how it came about is that two months ago, Will MacAskill wrote: "I think there's a lot of excitement about work in this broad area that isn't yet being represented in places like the Forum. I'd be keen for more people to start learning about and thinking about these issues." Intrigued, I spent some time trying to learn about the issues he was pointing to.
I then figured I'd channel the spirit of "EAs should post more summaries and collections": this reading list is an attempt to make the path easier for others to follow. Accordingly, it starts at the introductory level, but by the end the reader will be at the frontier of publicly available knowledge. (The frontier at the time of writing, at least.[2])
Note: in some places where I write "the long reflection," I'm using the term as shorthand to refer to the above cluster of notions.
Intro
Quotes about the long reflection - MichaelA (2020)[3]
The Precipice - Ord (2020)
Just chapter 7, including endnotes.
Beyond Maxipok - good reflective governance as a target for action - Cotton-Barratt (2024)
New Frontiers in Effective Altruism - MacAskill (2024)
This was a talk given at EAG Bay Area 2024. It doesn't appear to be available as a recording yet, but I'll add it if and when it goes up.
Quick take on Grand Challenges - MacAskill (2024)
The part about hiring is no longer relevant, but the research projects MacAskill outlines still give a sense for what good future work on grand challenges / the long reflection might look like.
Criticism of the long reflection idea:
'Long Reflection' Is Crazy Bad Idea - Hanson (2021)
Objections: What about "long reflection" and the division of labor? - Vinding (2022)
Just the highlighted section.
A comment by Wei Dai (2019a)
What might we be aiming for?
Is there moral truth? What should we do if not? What are human values, and how do they fit in?
Moral Uncertainty and the Path to AI Alignment with William MacAskill - AI Alignment Podcast by the Future of Life Institute (2018)
See also Shah (2018)'s summary and commentary.
See also this comment exchange between Michael Aird and Lukas Gloor (2020), which zooms in on the realism vs. antirealism wager and how it relates to the long reflection.
Complexity of value - LessWrong Wiki
Moral ~realism - Cotton-Barratt (2024)
Why should ethical anti-realists do ethics? - Carlsmith (2023)
Coherent extrapolated volition - Arbital
How to think about utopia?
Hedonium and computronium - EA Forum Wiki
Terms that tend to come up in discussions of utopia.
Why Describing Utopia Goes Badly - Karnofsky (2021)
Visualizing Utopia - Karnofsky (2021)
Characterising utopia - Ngo (2020)
Actually possible: thoughts on Utopia - Carlsmith (2021)
Deep Utopia - Bostrom (2024)
(If and when someone writes a summary of this book I'll add it to this reading list.)
Ideally, I would include at this point some readings on how aggregation might work for building a utopia, since this seems like an obvious and important point.
For instance, should the light cone be divided such that every person (or every moral patient more broad...

Mar 24, 2024 • 29min
LW - Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation by Benjamin Sturgeon
Benjamin Sturgeon discusses Vipassana meditation and active inference, exploring how meditation alters the mind based on the ideas of active inference. The podcast delves into topics like shaping world models through preferences and predictions, observing sensations to break the chain of suffering, and understanding Vipassana meditation to rewire reactions through observation.

Mar 24, 2024 • 12min
EA - Friendship as a sacred value by Michelle Hutchinson
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Friendship as a sacred value, published by Michelle Hutchinson on March 24, 2024 on The Effective Altruism Forum.
In 2022/3 there were a number of stressful events in and affecting the EA community, starting with the FTX crash. That led to people thinking about how to make a community one that you want to be a part of, and one in which people feel happy and safe - including, sometimes, wanting some people to leave or change how they interact with the community.
How to make sure a community thrives seems difficult. For other types of entities, there are clearly defined interventions. A company has a clear mandate to fire people who act against its interests and it's clear that that mandate should be carried out by managers at that company. Communities are in a pretty different situation.
There are some community cases which seem reasonably clear - for example, people organising community events should take some care to exclude people who are likely to cause harm at those events. But there are also questions around whether communities should try to take more generalisable action against particular individuals, in the sense of trying to encourage everyone to stop associating with them.
Some of the discussions I've seen around negative events in the community have at least implicitly pushed for coordinated action. Sometimes that's been in a backward looking way, like wishing SBF had been excluded from the EA community long ago. Sometimes it's been in a forward looking way: 'Are certain types of finance just inherently shady? Should we avoid associating with anyone working in those?'.
I've been feeling kind of angsty about engaging in conversations around this, and have so far had trouble pinning down why. I often think more clearly by writing, which is why I wrote this. I also thought others might have experienced similar internal tension, and if so maybe hearing someone else's reflection on it could be useful.
After thinking about it some, I realised that I think the discomfort is coming from the fact that what's sometimes going on in questions like the above is implicitly "at what point does morality get to tell you to break off a friendship?".[1] I think I intuitively hate that question. It seems important to me for who I spend non-work time with to be 'out of morality's reach' - I think it gets into the domain of what you might call 'sacred values'.[2]
What do I mean by sacred values?
It often feels kind of hard to know what the scope of effective altruism should be, because it feels like nothing is ever enough. But for most people it's not sustainable to be always optimising every part of life for helping others more.
A friend of mine resolves that tension by using the idea of 'sacred values'. Deciding that something is a 'sacred value' for you means treating that part of your life as something you're clearly permitted to have, regardless of whether foregoing it would allow you to help others more.[3]
I don't think 'sacred values' should be taken too literally. They're more of a useful cognitive manoeuvre for helping us deal with the weight of morality and how many different ways there are of helping others. Having sacred values might be a way of allowing yourself to dive into doing good effectively in a sustainable rather than overwhelming way. Periodically, in cool moments, you pick which areas to optimise in and which to keep for yourself.
Then day to day you don't have to stress over every possible way of helping others more.[4]
Sacred values differ between people. For one person, having children might be a sacred value - they simply plan to have children, regardless of whether they could help others more if they didn't. Another person might feel fine doing a careful calculation of how costly to the world them having children is likely to be, and make the decis...

Mar 24, 2024 • 14min
LW - General Thoughts on Secular Solstice by Jeffrey Heninger
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: General Thoughts on Secular Solstice, published by Jeffrey Heninger on March 24, 2024 on LessWrong.
I attended Secular Solstice in Berkeley last December.
My perspective is quite unusual: I live in a rationalist group house and work at an AI safety office, but I also am a Christian and attend church every week.[1]
I was originally not planning on going to Solstice, but a decent number of people (~5) told me in person that they would be particularly interested in my opinions of it. I realized that I was interested in learning what I would think of it too, so I went.
I took notes on my thoughts throughout the service.[2] This blog post is my broader thoughts on the experience. I also have blog posts for a fun little correction to one of the songs and my detailed notes & commentary.
Overarching Narrative
I do not agree with the overarching narrative presented at Solstice.
There is a narrative in my tradition about people becoming humble and turning to God. You can choose to be humble or you can be "compelled to be humble" by the difficult circumstances in life. I'm not super fond of this description because being humble and turning to God is always a choice. But there is some truth in it: many people do find themselves relying on God more and developing a deeper relationship with Him through the more difficult times in their lives.
The overarching narrative of Solstice felt like a transmogrified version of being compelled to be humble. The descent into darkness recognizes the problems of the human condition. Then, instead of turning to humility, it turns to a fulness of pride. We, humanity, through our own efforts, will solve all our problems, and become the grabby aliens we hope to be.
There is some caution before the night, learning to accept things we cannot change, but this caution melts away before the imagined light of the Great Transhumanist Future.
AI X-Risk and AI Transhumanism
Existential Risk
A major cause for concern leading into the night was existential risk from AI: the chance that future artificial intelligence systems might kill everyone. This was talked about more than any other problem.
I expect that the organizers and speakers of Solstice are significantly more doomy than the audience.[3] The audience itself probably has selection effects that make it more doomy than AI researchers, or forecasters, or other groups of people who have thought about this possibility.
It is often the case that people's beliefs are more determined by what is normal for people around them to believe, rather than by personally considering the relevant arguments and evidence themselves. This is a problem for intellectual communities, and should be countered by encouraging each person to know for themselves whether these beliefs are true. Organizers and speakers at Solstice have an unusually large power to establish what is normal to believe in the rationalist community.
They promoted increased concern about AI x-risk in the community, not by arguing for this belief but by treating it as common knowledge.[4] Maybe they believe that this is justified, but it felt to me like a Dark Art of Persuasion.
Transhumanism
Solstice also promoted the Great Transhumanist Future. What exactly this involves was perhaps intentionally left vague, and mostly described in song. It involved a coder dismantling the sun, making branches of your presumably-uploaded self, streams of data across the galaxy, and computronium. This is not just transhumanism: it's AI-centered transhumanism.
There were also some parts of the transhumanism which were not explicitly computational: things like space colonization or human immortality. But overall, it felt like the route to the hoped-for future ran through powerful AI.
This is ... not the future I hope for. I am probably more futuristic than most of the public, and am...

Mar 23, 2024 • 3min
LW - A Teacher vs. Everyone Else by ronak69
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Teacher vs. Everyone Else, published by ronak69 on March 23, 2024 on LessWrong.
A repairer wants your stuff to break down,
A doctor wants you to get ill,
A lawyer wants you to get in conflicts,
A farmer wants you to be hungry,
But there is only a teacher who wants you to learn.
Of course you see what is wrong with the above "argument / meme / good-thought". But the first time I came across this meme, I did not.
Not until a month or two ago, when the meme appeared in my head again and I discarded it within seconds as fallacious reasoning. What was the difference this time? That I was now aware of the Conspiracy. And the meme happened to come up one evening when I was thinking about fallacies and trying to practice the methods of rationality.
If you are a teacher, and you read the meme, it will assign to you the Good Guy label. And if you are one of {repairer, doctor, lawyer, farmer, etc} then you get the Bad Guy label. There is also a third alternative in which you are neither --- say a teenager.
If you are not explicitly being labeled bad or good, then you may just move on like I did. Or maybe you put in some detective effort and do notice the fallacies.
Depends on your culture: If your culture has tales like, "If your teacher and your God both are in front of you, who do you greet / bow to first?" and the right answer is "why of course my teacher because otherwise how would I know about God?" then you are just more likely to award a point to the already point-rich teacher-bucket and move on.
If you get called the Bad Guy, then you have a motivation to falsify the meme. And you will likely do so. This meme does look highly fragile in hindsight.
But if you are a teacher, you have no reason to investigate. You are getting free points. And it's in fact true that you do want people to learn. So, this meme probably did originate in the teacher circle. Where it has potential to get shared without getting beaten down.
What are the fallacies though? Here is the one I can identify:
The type error of comparing desired "requirements" with desired "outcomes". "A teacher wants you to learn" is a specification of the teacher-function's desired outcome. On the other hand, "your stuff to break down" is a desired requirement of a repairer. A repairer's desired outcome is "your stuff to work again". Generally, requirements are "bad" and outcomes are "good" because the function is a transformation of "bad" into "good".
Any function can replace a teacher here to make it look like the only good one.
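To make the type error concrete, here is a minimal Python sketch; all the pairings are invented for illustration. Each profession is modeled as a transformation from a desired requirement (the "bad" input state) to a desired outcome (the "good" output state), and the meme works by quoting the requirement for everyone except the teacher.

```python
# A minimal sketch of the type error, with every pairing invented for
# illustration: each profession maps a desired requirement (the "bad"
# input state) to a desired outcome (the "good" output state).
professions = {
    #            desired requirement        desired outcome
    "repairer": ("your stuff to break down", "your stuff to work again"),
    "doctor":   ("you to get ill",           "you to be healthy"),
    "lawyer":   ("you to get in conflicts",  "your conflicts resolved"),
    "farmer":   ("you to be hungry",         "you to be fed"),
    "teacher":  ("you to lack knowledge",    "you to learn"),
}

# The meme quotes the requirement for everyone except the teacher,
# for whom it quotes the outcome -- comparing mismatched types.
for name, (requirement, outcome) in professions.items():
    if name == "teacher":
        print(f"But there is only a {name} who wants {outcome}.")
    else:
        print(f"A {name} wants {requirement},")
```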
So, will everything be alright if you don't make the type error and only compare requirements with requirements and outcomes with outcomes? No.
Let's introduce a thief in the meme:
A repairer wants your stuff to break down,
A doctor wants you to get ill,
A lawyer wants you to get in conflicts,
A farmer wants you to be hungry,
A teacher wants you to be knowledge-less,
But there is only a thief who wants you to be rich.
Here, there is no type error. Only requirements are being compared.
But obviously this is not right. Thieves are bad. You know that.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Mar 23, 2024 • 12min
EA - What we fund, #1: We fund many opportunities outside our top charities by GiveWell
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What we fund, #1: We fund many opportunities outside our top charities, published by GiveWell on March 23, 2024 on The Effective Altruism Forum.
Author: Isabel Arjmand
This post is the fourth in a multi-part series, covering how GiveWell works and what we fund. We'll add links to the later posts here as they're published. Through these posts, we hope to give a better understanding of what our research looks like and how we make decisions.
How we work, #1: Cost-effectiveness is generally the most important factor in our recommendations
How we work, #2: We look at specific opportunities, not just general interventions
How we work, #3: Our analyses involve judgment calls
GiveWell aims to find and fund programs that have the greatest impact on global well-being. We're open to funding whichever global health and development opportunities seem most cost-effective. So while our top charities list is still what we're best known for, it's only part of our impact; we also dedicate substantial funding and research effort to opportunities beyond top charities.
In 2022, 71% of the funds we directed supported our four current top charities, and 29% were directed to other programs.[1] However, most of our research capacity goes toward programs other than our top charities. This is because (a) most programs we direct funding to aren't top charities (we have four top charities but directed funding to about 40 other grantees in 2022),[2] and (b) it requires more effort to investigate a program we know less deeply.
In this post we'll share:
The overall scope of our grantmaking
Why we dedicate funding and research capacity to programs other than our top charities
The types of opportunities we support
You can support the full range of our grantmaking via the All Grants Fund.
The scope of our work
Our research is focused on global health and development programs. We believe this is an area in which donations can be especially cost-effective.
Much of our funding goes to health programs, especially programs that reduce deaths from infectious diseases among young children living in low- and middle-income countries. We've found donations to such programs can make a particularly large impact; babies and young children are much more susceptible to infectious disease than adults, and diseases like malaria, diarrhea, and pneumonia can be prevented fairly cheaply.
The evidence for health programs is often strong relative to other areas, and it's more likely to generalize from one context to another.
While the majority of our funding goes to programs that support child health, that isn't our exclusive focus. For example, we also consider programs that aim to improve household income or consumption, such as One Acre Fund's tree program and Bridges to Prosperity. In addition, many of the child health programs we support may also have other benefits, like reducing medical costs, increasing later-in-life income, or improving adult health.
Why make grants outside top charities?
Our top charities continue to be our top recommendations for donors who prioritize confidence in their giving. They have strong track records of delivering programs at large scale and the capacity to absorb more funding, and we've followed their work for years.
We have such strict criteria for top charities that we'd be limiting our impact if we only recommended funding to them. Some highly cost-effective programs might not meet those criteria, and we don't want to artificially constrain the impact we can have. For example, r.i.ce.'s kangaroo mother care program isn't operating at a large enough scale and we haven't funded it long enough for it to be a top charity. However, we think the program will cost-effectively improve the lives of low-birthweight babies, so we made a grant to support it.
Initially, our non-top charity grant...

Mar 23, 2024 • 1h 46min
LW - AI #56: Blackwell That Ends Well by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #56: Blackwell That Ends Well, published by Zvi on March 23, 2024 on LessWrong.
Hopefully, anyway. Nvidia has a new chip.
Also Altman has a new interview.
And most of Inflection has new offices inside Microsoft.
Table of Contents
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Open the book.
Clauding Along. Claude continues to impress.
Language Models Don't Offer Mundane Utility. What are you looking for?
Fun With Image Generation. Stable Diffusion 3 paper.
Deepfaketown and Botpocalypse Soon. Jesus Christ.
They Took Our Jobs. Noah Smith has his worst take and commits to the bit.
Generative AI in Games. What are the important dangers?
Get Involved. EU AI office, IFP, Anthropic.
Introducing. WorldSim. The rabbit hole goes deep, if you want that.
Grok the Grok. Weights are out. Doesn't seem like it matters much.
New Nvidia Chip. Who dis?
Inflection Becomes Microsoft AI. Why buy companies when you don't have to?
In Other AI News. Lots of other stuff as well.
Wait Till Next Year. OpenAI employees talk great expectations a year after GPT-4.
Quiet Speculations. Driving cars is hard. Is it this hard?
The Quest for Sane Regulation. Take back control.
The Week in Audio. Sam Altman on Lex Fridman. Will share notes in other post.
Rhetorical Innovation. If you want to warn of danger, also say what is safe.
Read the Roon. What does it all add up to?
Pick Up the Phone. More good international dialogue on AI safety.
Aligning a Smarter Than Human Intelligence is Difficult. Where does safety lie?
Polls Show People Are Worried About AI. This week's is from AIPI.
Other People Are Not As Worried About AI Killing Everyone. Then there's why.
The Lighter Side. Everyone, reaping.
Language Models Offer Mundane Utility
Ethan Mollick on how he uses AI to aid his writing. The central theme is 'ask for suggestions in particular places where you are stuck' and that seems right for most purposes.
Sully is predictably impressed by Claude Haiku, says it offers great value and speed, and is really good with images and long context, suggests using it over GPT-3.5. He claims Cohere Command-R is the new RAG king, crushing it with citations and hasn't hallucinated once, while writing really well if it has context. And he thinks Hermes 2 Pro is 'cracked for agentic function calling,' better for recursive calling than GPT-4, but 4k token limit is an issue.
I believe his reports but also he always looks for the bright side.
Claude does acausal coordination. This was of course Easy Mode.
Claude also successfully solves counterfactual mugging when told it is a probability theorist, but not if it is not told this. Prompting is key. Of course, this also presumes that the user is telling the truth sufficiently often. One must always watch out for that other failure mode, and Claude does not consider the probability the user is lying.
Amr Awadallah notices self-evaluated reports that Cohere Command-R has a very low hallucination rate of 3.7%, below that of Claude Sonnet (6%) and Gemini Pro (4.8%), although GPT-3.5-Turbo is 3.5%.
From Claude 3, describe things at various levels of sophistication (here described as IQ levels, but domain knowledge seems more relevant to which one you will want in such spots). In this case they are describing SuperFocus.ai, which provides custom conversational AIs that claim to avoid hallucinations by drawing on a memory bank you maintain.
However, when looking at it, it seems like the 'IQ 115' and 'IQ 130' descriptions tell you everything you need to know, and the only advantage of the harder to parse 'IQ 145' is that it has a bunch of buzzwords and hype attached. The 'IQ 100' does simplify and drop information in order to be easier to understand, but if you know a lot about AI you can figure out what it is dropping very easily.
Figure out whether a resume ...

Mar 22, 2024 • 4min
AF - Dangers of Closed-Loop AI by Gordon Seidoh Worley
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dangers of Closed-Loop AI, published by Gordon Seidoh Worley on March 22, 2024 on The AI Alignment Forum.
In control theory, an open-loop (or non-feedback) system is one where inputs are independent of outputs. A closed-loop (or feedback) system is one where outputs are input back into the system.
In theory, open-loop systems exist. In reality, no system is truly open-loop because systems are embedded in the physical world where isolation of inputs from outputs cannot be guaranteed. Yet in practice we can build systems that are effectively open-loop by making them ignore weak and unexpected input signals.
Open-loop systems execute plans, but they definitionally can't change their plans based on the results of their actions. An open-loop system can be designed or trained to be good at achieving a goal, but it can't actually do any optimization itself. This ensures that some other system, like a human, must be in the loop to make it better at achieving its goals.
A closed-loop system has the potential to self-optimize because it can observe how effective its actions are and change its behavior based on those observations. For example, an open-loop paperclip-making-machine can't make itself better at making paperclips if it notices it's not producing as many paperclips as possible. A closed-loop paperclip-making-machine can, assuming it's designed with circuits that allow it to respond to the feedback in a useful way.
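As a rough illustration of the difference, here is a minimal Python sketch of the paperclip example; the plant model, gain, and numbers are invented assumptions, not anything from the post.

```python
# Toy contrast between the two kinds of paperclip machine. The plant
# model, gain, and numbers are invented assumptions for illustration.

def produce(rate: float) -> float:
    """The plant: actual output falls short of the commanded rate."""
    return 0.8 * rate

def open_loop(rate: float, steps: int) -> list[float]:
    # Executes a fixed plan; output never feeds back into behavior.
    return [produce(rate) for _ in range(steps)]

def closed_loop(rate: float, target: float, gain: float, steps: int) -> list[float]:
    # Observes its own output and adjusts: it optimizes itself.
    outputs = []
    for _ in range(steps):
        out = produce(rate)
        rate += gain * (target - out)  # feedback changes future behavior
        outputs.append(out)
    return outputs

print(open_loop(100.0, 3))                # stuck at [80.0, 80.0, 80.0]
print(closed_loop(100.0, 100.0, 0.5, 3))  # climbs toward the target: [80.0, 88.0, 92.8]
```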
AIs are control systems, and thus can be either open- or closed-loop. I posit that open-loop AIs are less likely to pose an existential threat than closed-loop AIs. Why? Because open-loop AIs require someone to make them better, and that creates an opportunity for a human to apply judgement based on what they care about. For comparison, a nuclear dead hand device is potentially much more dangerous than a nuclear response system where a human must make the final decision to launch.
This suggests a simple policy to reduce existential risks from AI: restrict the creation of closed-loop AI. That is, restrict the right to produce AI that can modify its behavior (e.g. self-improve) without going through a training process with a human in the loop.
There are several obvious problems with this proposal:
No system is truly open-loop.
A closed-loop system can easily be created by combining 2 or more open-loop systems into a single system.
Systems may look like they are open-loop at one level of abstraction but really be closed-loop at another (e.g. an LLM that doesn't modify its model, but does use memory/context to modify its behavior).
Closed-loop AIs can easily masquerade as open-loop AIs until they've already optimized towards their target enough to be uncontrollable.
Open-loop AIs are still going to be improved. They're part of closed-loop systems with a human in the loop, and can still become dangerous maximizers.
Despite these issues, I still think that, if I were designing a policy to regulate the development of AI, I would include something to place limits on closed-loop AI. A likely form would be a moratorium on autonomous systems that don't include a human in the loop, and especially a moratorium on AIs that are used to either improve themselves or train other AIs.
I don't expect such a moratorium to eliminate existential risks from AI, but I do think it could meaningfully reduce the risk of runaway scenarios where humans get cut out before we have a chance to apply our judgement to prevent undesirable outcomes. If I had to put a number on it, such a moratorium perhaps makes us 20% safer.
Author's note: None of this is especially original. I've been saying some version of what's in this post for 10 years to people, but I realized I've never written it down. Most similar arguments I've seen don't use the generic language of control theory and instead are ex...

Mar 22, 2024 • 48min
AF - Video and transcript of presentation on Scheming AIs by Joe Carlsmith
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Video and transcript of presentation on Scheming AIs, published by Joe Carlsmith on March 22, 2024 on The AI Alignment Forum.
(Cross-posted from my website.)
This is the video and transcript for a ~45-minute talk I gave in February 2024 about my report "Scheming AIs: Will AIs fake alignment during training in order to get power?" (slides available here). See also this podcast for a more conversational overview of similar content.
Main talk
Okay, hi everybody. Thanks for coming. So this talk is about whether advanced AIs will fake alignment during training in order to get power later. This is a behavior that I'm calling scheming, it's also often called deceptive alignment.
Preliminaries
So I'm going to start with a few preliminaries.
As was mentioned, this talk is based on a public report called Scheming AIs, available on arXiv. There's also an audio version on my podcast, Joe Carlsmith Audio, if you prefer that. So I encourage you to check that out. I'm going to try to cover many of the main points here, but the report is pretty long and so I'm not going to go into that much depth on any given point. So if you want more depth or you have burning objections or clarifications or you want to work on it, I encourage you to check out the full report.
Second, I'm going to assume familiarity with:
The basic argument for existential risk from misaligned AI. That is roughly the thought that advanced AI agents with goals that conflict with human goals would have instrumental incentives to seek power over humans, and potentially to disempower humanity entirely, an event I'll call takeover, AI takeover. If that story didn't sound familiar to you, I do have some other work on this topic which I would encourage you to check out.
But some of the talk itself might make a little less sense, so I apologize for that.
I'm also going to assume familiarity with a basic picture of how contemporary machine learning works. So very roughly, imagine models with lots of parameters that are being updated via stochastic gradient descent (SGD), such that they perform better according to some feedback signal. And in particular, I'm often going to be imagining a default baseline training process that resembles somewhat what happens with current language models.
Namely, a pre-training phase in which a model is trained on some combination of internet text and potentially other data. And then a fine-tuning phase, in which it's trained via some combination of maybe imitation learning or reinforcement learning. So that's the baseline picture I'm going to be imagining. There are other paradigms in which questions about scheming will arise, but I'm going to focus on this one.
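For readers who want that picture in concrete form, here is a minimal sketch of the update rule; it fits a single weight to an invented toy dataset and is a cartoon of the training loop described above, not a real language-model pipeline.

```python
# A cartoon of "parameters updated via SGD against a feedback signal":
# one weight, a tiny invented dataset, squared error as the signal.
# This illustrates the update rule only, not a real LM training pipeline.

def grad(w: float, x: float, y: float) -> float:
    # d/dw of the loss (w*x - y)^2
    return 2 * x * (w * x - y)

w = 0.0                                      # the model's "parameters"
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # stand-in for training data
lr = 0.05                                    # learning rate
for _ in range(100):
    for x, y in data:                        # stochastic: one example at a time
        w -= lr * grad(w, x, y)              # step against the gradient
print(round(w, 3))                           # converges to 2.0
```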
I'm also going to condition on the AIs I discuss being "goal directed". And what I mean by that is that these AIs are well understood and well predicted by thinking of them as making plans on the basis of models of the world in pursuit of objectives. This is not an innocuous assumption. And in fact I think confusions in this vicinity are one of my most salient candidates for how the AI alignment discourse as a whole might be substantially off base.
But I want to set aside the question of whether AIs are well understood as goal directed at all from the question of whether, conditional on being goal directed, they will be schemers. So if your objection to scheming is "I don't think AIs will have goals at all," then that's a perfectly fine objection, especially in particular training paradigms, but it's not the objection I'm going to focus on here.
And I also do have, in the other work on misalignment in general, some thoughts about why we might expect goal-directedness of this kind. In particular, I think goal-directedness is useful for many tasks.
Finally, I want to flag that in addition to potentially posing existential risks...

Mar 22, 2024 • 9min
EA - Posts from 2023 you thought were valuable (and underrated) by Lizka
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Posts from 2023 you thought were valuable (and underrated), published by Lizka on March 22, 2024 on The Effective Altruism Forum.
I'm sharing:
a list of posts that were marked as "most valuable" by the most people (who marked posts as "most valuable" in Forum Wrapped 2023), and
a list of posts that were most underrated by karma relative to the number of "most valuable" votes.
These lists are not objective or "true" collections of the most valuable and underrated posts from 2023. Relatively few people marked posts as "most valuable," and I imagine that those who did, didn't do it very carefully or comprehensively. And there are various factors that would bias the results (like the fact that we ordered posts by upvotes and karma on the "Wrapped" page, people probably remember more recent posts more, etc.).
Consider commenting if there are other posts you would like to highlight!
This post is almost identical to last year's post: Posts from 2022 you thought were valuable (or underrated).
Which posts did the most Forum users think were "most valuable"?
Note that we ordered posts in "Wrapped" by your own votes, followed by karma score, meaning higher-karma posts probably got more "most valuable" votes.
"Most valuable" count
Author(s)[1]
Title
28
@Peter Wildeford
EA is three radical ideas I want to protect
28
@Ariel Simnegar
Open Phil Should Allocate Most Neartermist Funding to Animal Welfare
24
@AGB
10 years of Earning to Give
14
@Bob Fischer
Rethink Priorities' Welfare Range Estimates
13
@Rockwell
On Living Without Idols
12
@Nick Whitaker
The EA community does not own its donors' money
11
@Jakub Stencel
EA's success no one cares about
11
@tmychow, @basil.halperin , @J. Zachary Mazlish
AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years
10
@Luke Freeman
We can all help solve funding constraints. What stops us?
10
@zdgroff
How Long Do Policy Changes Matter? New Paper
9
@kyle_fish
Net global welfare may be negative and declining
9
@ConcernedEAs
Doing EA Better
7
@Lucretia
Why I Spoke to TIME Magazine, and My Experience as a Female AI Researcher in Silicon Valley
7
@Michelle_Hutchinson
Why I love effective altruism
7
@JamesSnowden
Why I don't agree with HLI's estimate of household spillovers from therapy
7
@Ren Ryba
Reminding myself just how awful pain can get (plus, an experiment on myself)
7
@Amy Labenz
EA is good, actually
7
@Ben_West
Third Wave Effective Altruism
6
@Ben Pace
Sharing Information About Nonlinear
6
@Zachary Robinson
EV updates: FTX settlement and the future of EV
6
@NunoSempere
My highly personal skepticism braindump on existential risk from artificial intelligence.
6
@leopold
Nobody's on the ball on AGI alignment
6
@saulius
Why I No Longer Prioritize Wild Animal Welfare
6
@Elika
Advice on communicating in and around the biosecurity policy community
6
@Derek Shiller, @Bernardo Baron, @Chase Carter, @Agustín Covarrubias, @Marcus_A_Davis, @MichaelDickens, @Laura Duffy, @Peter Wildeford
Rethink Priorities' Cross-Cause Cost-Effectiveness Model: Introduction and Overview
6
@Karthik Tadepalli
What do we really know about growth in LMICs? (Part 1: sectoral transformation)
6
@Nora Belrose
AI Pause Will Likely Backfire
Which were most underrated by karma?
I looked at the number of people who had marked something as "most valuable," and then divided by [karma score]^1.5. (This is what I did last year, too.[2]) We got more ratings this year, so my cutoff was at least three votes this year (vs. two last year).
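In code, the ranking described here amounts to something like the following sketch, with the post data invented for illustration:

```python
# The "underrated" ranking described above: "most valuable" votes divided
# by karma^1.5, keeping only posts with at least three votes. The post
# data here is invented for illustration.

def underrated_score(votes: int, karma: int) -> float:
    return votes / karma ** 1.5

posts = [("Post A", 3, 50), ("Post B", 6, 400), ("Post C", 2, 10)]
ranked = sorted(
    ((title, underrated_score(votes, karma))
     for title, votes, karma in posts
     if votes >= 3),                    # this year's cutoff
    key=lambda pair: pair[1],
    reverse=True,
)
for title, score in ranked:
    print(f"{title}: {score:.5f}")      # Post A outranks the higher-karma Post B
```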
"Most valuable" count
Author(s)
Title
3
@RobBensinger er
The basic reasons I expect AGI ruin
3
@Zach Stein-Perlman
AI policy ideas: Reading list
3
@JoelMcGuire, @Samuel Dupret, @Ryan Dwyer, @MichaelPlant, @mklapow, @Happier Lives Institute
Talking through depression: The cost-effectiveness of psychotherapy in LMICs, revised and...


