

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jan 7, 2024 • 12min
LW - AI Risk and the US Presidential Candidates by Zane
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Risk and the US Presidential Candidates, published by Zane on January 7, 2024 on LessWrong.
It's the new year, and the 2024 primaries are approaching, starting with the Iowa Republican caucus on January 15. For a lot of people here on LessWrong, the issue of AI risk will likely be an important factor in making a decision. AI hasn't been mentioned much during any of the candidates' campaigns, but I'm attempting to analyze what information there is and determine which candidate is most likely to bring about a good outcome.
A few background facts about my own position - such that if these statements do not apply to you, you won't necessarily want to take my recommendation:
I believe that, barring some sort of action to prevent this, the default result of creating artificial superintelligence is human extinction.
I believe that our planet is very far behind in alignment research compared to capabilities, and that this means we will likely need extensive international legislation to slow/pause/stop the advance of AI systems in order to survive.
I believe that preventing ASI from killing humanity is so much more important than any[1] other issue in American politics that I intend to vote solely on the basis of AI risk, even if this requires voting for candidates I would otherwise not have wanted to vote for.[2]
I believe that no mainstream politicians are currently suggesting any plans that would be sufficient for survival, nor do they even realize the problem exists. Most mainstream discourse on AI safety is focused on comparatively harmless risks, like misinformation and bias. The question I am asking is "which of these candidates seems most likely to end up promoting a somewhat helpful AI policy" rather than "which of these candidates has already noticed the problem and proposed the ideal solution," since the answer to the second question is none of them.
(Justification for these beliefs is not the subject of this particular post.)
And a few other background facts about the election, just in case you haven't been following American politics:
As the incumbent president, Joe Biden is essentially guaranteed to be the Democratic nominee, unless he dies or is otherwise incapacitated.
Donald Trump is leading in the polls for Republican nominee by very wide margins, followed by Nikki Haley, Ron DeSantis, Vivek Ramaswamy, and Chris Christie. Manifold[3] currently gives him an 88% chance of winning the nomination.
However, Trump is facing criminal charges regarding the Capitol attack on January 6, 2021, and the Colorado Supreme Court and Maine's Secretary of State have moved to disqualify him from their states' ballots.
As usual, candidates from outside the Democratic and Republican parties are not getting much support, although Robert F. Kennedy Jr. is polling unusually well for an independent candidate.
Joe Biden
Biden's most notable action regarding AI was Executive Order 14110[4]. The executive order was intended to limit various risks from AI... none of which were at all related to human extinction, except maybe bioweapons. The order covers risks from misinformation, cybersecurity, algorithmic discrimination, and job loss, while also focusing on trying to reap potential benefits of AI.
But the measures contained in the order, while limited in scope, seem to be a step in the right direction. Most importantly, anyone training a model with 10^26 floating point operations or more must report their actions and safety precautions to the government. That's a necessary piece of any future regulation on such large models.
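For a rough sense of what that threshold means, here is a minimal sketch (mine, not from the post) that checks hypothetical training runs against the 10^26 figure, using the common approximation that training compute is about 6 x parameters x tokens; the model sizes and token counts below are made up for illustration.

```python
# Rough sketch: does a hypothetical training run cross the EO 14110
# reporting threshold of 1e26 floating point operations?
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.
# The example runs below are invented, not real models.

REPORTING_THRESHOLD_FLOPS = 1e26

def training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training compute in floating point operations."""
    return 6 * num_parameters * num_tokens

runs = {
    "hypothetical 70B-parameter model, 2T tokens": training_flops(70e9, 2e12),
    "hypothetical 1T-parameter model, 20T tokens": training_flops(1e12, 20e12),
}

for name, flops in runs.items():
    must_report = flops >= REPORTING_THRESHOLD_FLOPS
    print(f"{name}: ~{flops:.2e} FLOPs, must report: {must_report}")
```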
Biden has spoken with the UN about international cooperation on AI, and frequently speaks of AI and other new technologies as both a source of "enormous potential and enormous peril," or other similar phrasings. "We need to make sure they're used as tools of opportunity, not wea...

Jan 6, 2024 • 4min
LW - The Next ChatGPT Moment: AI Avatars by kolmplex
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Next ChatGPT Moment: AI Avatars, published by kolmplex on January 6, 2024 on LessWrong.
Epistemic Status: Speculative. Dependent on intuitions about near-term AI tech and human psychology.
Claim: Within the next 1-3 years, many people will have an interaction with an AI avatar that feels authentically human. This will significantly amplify the public perception of current AI capabilities and risks.
An AI avatar is a realistic AI-generated render of a human (speech and video) that can have a real-time conversation with a human, for example over a video call.
The individual components needed to implement AI avatars already exist. AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech.[1] Generating photorealistic video of a talking human is currently limited, but still impressive and making rapid progress.
Taken together, these capabilities mean it will soon be possible to create a realistic AI avatar. The first generation avatars will be a bit rough, especially the rendered video, but overall there don't seem to be large conceptual hurdles to creating convincing AI avatars.[2]
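To make that pipeline concrete, here is a minimal sketch of the loop such an avatar would run during a call; the function names are placeholders of my own, standing in for speech-to-text, conversational language-model, text-to-speech, and talking-head video components rather than any real API.

```python
# Sketch of the per-turn loop an AI avatar would run during a video call.
# Every component below is a placeholder, not a real API; each stands in
# for a speech-to-text, conversational LLM, text-to-speech, or video model.

def transcribe(audio_chunk: bytes) -> str:
    """Speech-to-text (a Whisper-like component)."""
    raise NotImplementedError

def reply(transcript: str, history: list[str]) -> str:
    """Conversational language model with a persistent persona."""
    raise NotImplementedError

def synthesize_speech(text: str) -> bytes:
    """Text-to-speech (an ElevenLabs-like component)."""
    raise NotImplementedError

def render_talking_head(speech_audio: bytes) -> bytes:
    """Generate video frames of a face lip-synced to the audio."""
    raise NotImplementedError

def avatar_turn(audio_chunk: bytes, history: list[str]) -> bytes:
    """One conversational turn: hear the user, answer, return rendered video."""
    user_text = transcribe(audio_chunk)
    answer = reply(user_text, history)
    history.extend([user_text, answer])
    speech = synthesize_speech(answer)
    return render_talking_head(speech)
```

The per-turn latency of this loop is the near-term bottleneck flagged in the second footnote.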
Personal conversation with a high-quality AI avatar will have a significant emotional and mental impact on most people.[3] The impact will be especially acute for people distant from the world of AI, but will also affect those familiar with AI.
For humans, communication medium matters just as much as content. The same words can hit much harder when spoken in an emotive voice by an expressive face, than when silently read off a screen. Having a realistic personal conversation with an AI avatar will change people's gut-level intuitions about AI.
For better or worse, once decent AI avatars become generally accessible, public sentiment around AI will experience another shift comparable to the one spurred by ChatGPT.[4] AI will be perceived as more human-like and capable. It will seem like an independent agent that possesses "true intelligence".
After talking with a realistic AI avatar, the common refrains of "It's not actually intelligent, it just predicts the next token" and "Why would it want anything?" won't resonate with the public. For many people, consciousness is a prerequisite for real AI, and human-like AI avatars will appear to be a direct instantiation of that.
ChatGPT's release was a cultural moment.[5] It captured the public's imagination and triggered a reclassification of AI from sci-fi to present reality. AI avatars could bring on another cultural moment that shifts public perception even further.
The upcoming shift is predictable - AI avatars don't require any fundamental technical breakthroughs. It's a major evolution that we have the rare opportunity to prepare for in advance.
[1] Speech-to-text is good enough (OpenAI Whisper), text-to-speech is nearly good enough (ElevenLabs), and conversation / language modeling is good enough (ChatGPT with a Character.ai-style personality). All this currently suffices for realistic audio conversation with an AI. Human video generation isn't quite good enough yet, but it's making progress (Audio to Photoreal, HeyGen, Metahuman). Based on the current rate of progress, a functional AI avatar seems attainable within 1-3 years.
[2] Latency might be a problem in the near-term. In particular, it's unclear how fast the video generation will be.
[3] This is already happening to a limited extent. Many people have formed significant emotional attachments through text-only interactions with relatively weak language models (e.g. Character.ai and Replika).
[4] The shift could be more gradual than ChatGPT's, though. AI avatar tech is improving gradually whereas ChatGPT was dropped sui generis on the world.
[5] The Google Trends chart for "AI". ChatGPT came out on November 30, 2022.
Thanks for li...

Jan 6, 2024 • 25min
EA - 2023: news on AI safety, animal welfare, global health, and more by Lizka
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023: news on AI safety, animal welfare, global health, and more, published by Lizka on January 6, 2024 on The Effective Altruism Forum.
I'm listing some of the important and/or EA-related events that happened in 2023. Consider adding more in the comments!
A companion post collects research and other "content" highlights from 2023. (That post features content; the one you're reading summarizes news.)
Also, the monthly EA Newsletter discussed a lot of the events collected here, and was the starting point for the list of events in this post. If you're subscribed, we would really love feedback.
Skip to:
News related to different causes
AI safety: AI went mainstream, states developed safety-oriented regulation, there was a lot of discourse, and more
Global health & development: new vaccines, modified mosquitoes, threatened programs, and ongoing trends
Animal welfare: political reforms and alternative proteins
Updates in causes besides AI safety, global health, and animal welfare
Concluding notes
Other notes:
There might be errors in what I wrote (I'll appreciate corrections!).
Omissions! I avoided pretty political events (I think they're probably covered sufficiently elsewhere) and didn't focus on scientific breakthroughs. Even besides that, though, I haven't tried to be exhaustive, and I'd love to collect more important events/things from 2023. Please suggest things to add.
I'd love to see reflections on 2023 events.
What surprised you? What seemed important but now feels like it might have been overblown? What are the impacts of some of these events?
And I'd love to see forecasts about what we should expect for 2024 and beyond.
I put stars next to some content and news that seemed particularly important, although I didn't use this consistently.
More context on how and why I made this: I wanted to collect "important stuff from 2023" to reflect on the year, and realized that one of the resources I have is one I run - the monthly EA Newsletter. So I started compiling what was meant to be a quick doc-turned-post (by pulling out events from the Newsletter's archives, occasionally updating them or looking into them a bit more). Things kind of ballooned as I worked on this post. (Now there are two posts that aren't short; see the companion, which is less focused on news and more focused on "content.")
AI safety: AI went mainstream, states developed safety-oriented regulation, there was a lot of discourse, and more
See also featured content on AI safety.
0. GPT-4 and other models, changes at AI companies, and other news in AI (not necessarily safety)
Before we get to AI safety or AI policy developments, here are some relevant changes for AI development in 2023:
New models: OpenAI launched GPT-4 in mid-March (alongside announcements from Google, Anthropic, and more). Also around this time (February/March), Google released Bard, Meta released Llama, and Microsoft released Bing/Sydney (which was impressive and weird/scary).
Model use, financial impacts, and training trends: more people started using AI models. Developers got API access to various models. Advanced AI chips continued getting better, and compute use increased and got more efficient.
Improvements in models: We started seeing pretty powerful multimodal models (models that can process audio, video, images - not just text), including GPT-4 and Gemini. Context windows grew longer.
Forecasters on Metaculus seem to increasingly expect human-AI parity on selected tasks by 2040.
Changes in leading AI companies: Google combined Brain and DeepMind into one team, Amazon invested in Anthropic, Microsoft partnered with OpenAI, Meta partnered with Hugging Face, a number of new companies launched, and OpenAI CEO Sam Altman was fired and then reinstated (more on that).
Other news: Generative AI companies are increasingly getting su...

Jan 6, 2024 • 3min
EA - Howdy, I'm Elliot by Elliot Billingsley
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Howdy, I'm Elliot, published by Elliot Billingsley on January 6, 2024 on The Effective Altruism Forum.
Hi EA community,
I'd like to formally introduce myself to this forum, where I've been lurking for a while but was too timid to post until now; I'm finally posting thanks to the encouragement of some.
I first heard about EA through the Tim Ferriss Podcast in 2016. I still remember standing on the ferryboat crossing the Bosphorus while listening to Will MacAskill say things that seemed incredibly obvious, at least once I'd heard them.
In the couple of years that followed, I organized a local EA workshop, attended EAGx Berlin, and flew to San Francisco to attend EAG. I got involved with Students for High-Impact Charity, helping out on the periphery. I enjoyed lively conversation with EA Vancouver, and increased the usage of the phrase 'expected value' in daily conversation. That's about it.
That was my EA Life Phase I.
Half a decade later, I sat down with my wife and child during a Pentathlon in which every day you ask yourself the question: "What is the Most Important Work I can do today?" All of a sudden, it all came back to me. The most important things I can possibly do have quite clearly been described in EA. So I resolved in early 2022 to buckle up and take EA seriously.
I honestly wasn't sure what my best option was, so I went with the most inspiring recent topic on the 80k podcast: Andrew Yang's Forward Party. I basically reached out and got named State Lead. I feel my experience with Forward may be a whole 'nuther post, so I'll leave it at that.
I also engaged in a lot of other ways, in large part thanks to EA Virtual Programs, which I really appreciate. But there's one person who had a huge role in my transition from an EA sleeper cell to a stupidly engaged one. That's Dr. Ben Smith.
I swallowed my Ninja Lurker EA Forum personality (Never posts, always votes strongly) in order to write this post, for a specific reason, which I'll share now. Last fall, I launched a coaching practice with the intention of supporting the EA community. I asked some friends and acquaintances to take a chance and try my coaching out, and I thank them very much. I now know my coaching helps people.
So if I help EAs, I'm helping better, in theory, right? I want to test this theory! I'm going to EAG next month and even have a special cohort designed for attendees. If you're going to EAG, do consider applying; we'd love to have you. So that's my shameless plug.
For any of you still reading, I'd like to say thanky (I'm from Texas, that's kind of how my dad used to say 'thank you'). I hope to write here more and learn in this incredible community.
Elliot
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 3min
LW - Survey of 2,778 AI authors: six parts in pictures by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Survey of 2,778 AI authors: six parts in pictures, published by KatjaGrace on January 6, 2024 on LessWrong.
Crossposted from AI Impacts blog
The 2023 Expert Survey on Progress in AI is out, this time with 2,778 participants from six top AI venues (up from about 700 participants and two venues in the 2022 ESPAI), making it probably the biggest ever survey of AI researchers.
People answered in October, an eventful fourteen months after the 2022 survey, which had mostly identical questions for comparison.
Here is the preprint. And here are six interesting bits in pictures (with figure numbers matching paper, for ease of learning more):
1. Expected time to human-level performance dropped 1-5 decades since the 2022 survey. As always, our questions about 'high level machine intelligence' (HLMI) and 'full automation of labor' (FAOL) got very different answers, and individuals disagreed a lot (shown as thin lines below), but the aggregate forecasts for both sets of questions dropped sharply. For context, between 2016 and 2022 surveys, the forecast for HLMI had only shifted about a year.
2. Time to most narrow milestones decreased, some by a lot. The occupation of AI researcher is now expected to be fully automatable a quarter of a century earlier than forecast in 2022, and the expected time until an AI writes a NYT bestselling work of fiction dropped by more than half, to ~2030. Within five years, AI systems are forecast to be feasible that can fully make a payment processing site from scratch, entirely generate a new song that sounds like it's by e.g. Taylor Swift, or autonomously download and fine-tune a large language model.
3. Median respondents put 5% or more on advanced AI leading to human extinction or similar, and a third to a half of participants gave 10% or more. This was across four questions, one about overall value of the future and three more directly about extinction.
4. Many participants found many scenarios worthy of substantial concern over the next 30 years. For every one of eleven scenarios and 'other' that we asked about, at least a third of participants considered it deserving of substantial or extreme concern.
5. There are few confident optimists or pessimists about advanced AI: high hopes and dire concerns are usually found together. 68% of participants thought HLMI was more likely to lead to good outcomes than bad, but nearly half of these people put at least 5% on extremely bad outcomes such as human extinction, and 59% of net pessimists gave 5% or more to extremely good outcomes.
6. 70% of participants would like to see research aimed at minimizing risks of AI systems be prioritized more highly. This is much like 2022, and in both years a third of participants asked for "much more" - more than doubling since 2016.
If you enjoyed this, the paper covers many other questions, as well as more details on the above.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 2min
EA - Double the donation: EA inadequacy found? by Neil Warren
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Double the donation: EA inadequacy found?, published by Neil Warren on January 6, 2024 on The Effective Altruism Forum.
I'm only 30% sure that this is actually an inadequacy made by those whose job it is to maximize donations but I've noticed that none of the donations pages of GiveWell, Giving What We Can, Horizon Institute, or METR have this little tab in them that MIRI has (just scroll down after following the link):
This little tool comes from doublethedonation.com.
I was looking for charities to donate to, and I'm grateful I stumbled upon the MIRI donation page because otherwise I would not have known that Google would literally double my donation. None of the other donation pages had this little "does your company do employer matching?" box. WHY.
I would wager other tech companies have similar programs, and that a good chunk of EA donations come from employees of those tech companies, and that thousands of dollars a year are wasted in missed opportunities here. If this is an inadequacy, it's a pretty obvious and damaging one. I wish to speak to the manager.
I did not spend more than ten minutes noticing this, and just wanted to get this out there as fast as possible. There's a chance I'm being stupid. (Perhaps every tech employee is already briefed on donation matching.) But if anyone out there has an answer for this, or if a GiveWell employee is conveniently walking by and says "wait a minute! We could radically improve our UI!", that'd be great.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 16min
LW - Almost everyone I've met would be well-served thinking more about what to focus on by Henrik Karlsson
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Almost everyone I've met would be well-served thinking more about what to focus on, published by Henrik Karlsson on January 6, 2024 on LessWrong.
Almost everyone I've ever met would be well-served by spending more time thinking about what to focus on. - Sam Altman
In May 2020, we parked two moving trucks in the harbor and carried everything we owned from one to the other. Johanna, Maud, and I were leaving Sweden, and Covid restrictions meant we were forbidden from returning once we boarded the ferry. Hence the second truck, which we had gotten a stranger to ferry from the island to us: the Swedish truck had to stay in Sweden.
The motivation to leave was that we wanted to homeschool Maud, who was 3. In Sweden, this is illegal, so most Swedish homeschoolers end up on one of two islands in the Baltic Sea. On our island, we knew no one. We had no jobs awaiting. We were leaving something, more than going somewhere. The life we had grown piecemeal over 30 years disappeared overnight. We had to figure out what to replace it with.
Life is a multi-armed bandit
The moldy apartment we rented as we looked for a house has a view of the sea. Every day, deep into winter, I'd walk down to the water and dive from the cliffs. Swimming in the channels between the rocks, I realized I could model our situation using a concept from probability theory.
It was a multi-armed bandit problem. This problem, which, under a different name, had first been studied by the biologist William R. Thompson in 1933, centers on a rather surreal thought experiment. A gambler faces a slot machine ("a one-armed bandit"), except this machine doesn't have one arm - following some twisted dream logic, it has k arms, arms sticking out in every direction. Some of these arms have a high probability of paying out the jackpot, others are worse. But the gambler does not know which is which.
The problem is pulling the arms in an order that maximizes the expected total gains. ("Gains" could be anything. Early on, the problem was used to design drug trials. There, the jackpot was defined as finding a successful treatment. If you are looking for a partner, talking to people is how you pull the multi-armed bandit and the resonance (or lack thereof) is the payoff.)
The gambler needs to learn new knowledge about the machines and simultaneously use what they have already learned to optimize their decisions. In the literature, these two activities are referred to as exploring and exploiting. You can't do both things at the same time. When you explore, you are pulling new arms on the bandit trying to figure out their expected payout. When you exploit, you pull the best arm you've found. You need to find the right balance.
If you spend too little time exploring, you get stuck playing a machine with a low expected payoff. But if you spend too much time exploring, you will earn less than you would if you played the best arm. This is the explore/exploit trade-off.
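As a toy illustration of that trade-off (my own sketch, not part of the essay), here is an epsilon-greedy gambler on a k-armed bandit: with probability epsilon it explores a random arm, and otherwise it exploits the arm with the best average payoff observed so far; the payout probabilities are invented.

```python
import random

# Toy illustration of the explore/exploit trade-off: an epsilon-greedy
# gambler on a k-armed bandit. With probability epsilon it explores a
# random arm; otherwise it exploits the arm with the best observed average
# payoff. The payout probabilities passed in below are made up.

def epsilon_greedy(payout_probs, pulls=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    k = len(payout_probs)
    counts = [0] * k      # times each arm has been pulled
    totals = [0.0] * k    # total payoff observed on each arm
    winnings = 0.0

    def average(i):
        # Untried arms look maximally promising, so each gets tried at least once.
        return totals[i] / counts[i] if counts[i] else float("inf")

    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(k)            # explore: pull a random arm
        else:
            arm = max(range(k), key=average)  # exploit: pull the best arm so far
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        winnings += reward
    return winnings

# A gambler facing three arms with hidden payout probabilities.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```

Raising epsilon spends more pulls learning which arm is best; lowering it milks the current favorite, which is exactly the tension described above.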
People tend to gravitate to different sides of the explore/exploit spectrum. If you are high on openness, like I am, exploring comes easy. But it is harder to make a commitment and exploit what you've learned about yourself and the world. Other people are more committed, but risk being too conventional in their choices. They miss better avenues for their effort. Most, however, tend to do less than optimal of both - not exploring, not exploiting; but doing things out of blind habit, and half-heartedly.
First, I'll say a few words about exploration and exploitation in real life. Then I'll return to the question of how to navigate the tradeoff between them.
Explore: doggedly looking for what makes you feel alive
There are two kinds of people. Those who do not understand how complex the world is, and those who know that they do not understand how complex the world is.
To navi...

Jan 5, 2024 • 5min
EA - Introducing GiveHealth: a giving Pledge for healthcare workers (and a call for volunteers) by RichArmitage
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing GiveHealth: a giving Pledge for healthcare workers (and a call for volunteers), published by RichArmitage on January 5, 2024 on The Effective Altruism Forum.
What
GiveHealth is a new EA-inspired effective giving organisation. It is a community of healthcare professionals who have taken a public Pledge to donate at least 1% of their income to the most effective global health charities.
Visitors to the website can learn about effective giving, their relative wealth on the global stage, how healthcare professionals can improve their impact, the activities of GiveWell, and the highly effective nature of the charities recommended by GiveWell.
Healthcare professionals are invited to take a public Pledge to donate at least 1% of their income to GiveWell's top charities for the rest of their lives. They can use the Pledge Calculator to determine their monthly/annual donations based on their salary and desired donation percentage. Once they have taken the Pledge their name, profession and location will be displayed on the GiveHealth Community Board, and they can learn about the charities recommended by GiveWell and follow the links to the donation page of their chosen charities. Pledge takers will receive a survey on each anniversary of their Pledge to capture their donation activities over the previous year so GiveHealth can measure the value of donations it is influencing. Anybody can sign up to the GiveHealth monthly newsletter.
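For illustration only (the post does not specify how the actual Pledge Calculator works), the underlying arithmetic is simply a percentage of salary split into annual and monthly amounts:

```python
# Illustrative sketch only; the real Pledge Calculator may behave differently.
def pledge_amounts(annual_salary: float, pledge_percent: float) -> tuple[float, float]:
    """Return the (annual, monthly) donation implied by a salary and pledge percentage."""
    annual = annual_salary * pledge_percent / 100
    return annual, annual / 12

print(pledge_amounts(40_000, 1))  # a 1% pledge on a 40,000 salary
```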
Healthcare professionals of all disciplines (nurses, physiotherapists, pharmacists, occupational therapists, doctors, etc) and kinds (clinicians, researchers, managers, policy-makers, students, retired professionals, etc) of any level of seniority, from any part of the world, are welcome to sign the GiveHealth Pledge.
Why
Healthcare professionals are a self-selected group of generally altruistic individuals who both care about improving the health of others and are motivated to do so. There is also a strong sense of community and camaraderie amongst healthcare professionals, while their awareness and understanding of EA and its principles is generally low.
Giving What We Can has shown the public Pledge model to be an effective vehicle for generating donations (by fostering commitment, community and culture), which GiveHealth has combined with a lower barrier to entry (at least 1% of income rather than 10%) that we feel is more appropriate for a group less familiar with EA. Pledge takers are still able to sign the Giving What We Can pledge, and their GiveHealth pledge can be included within, rather than in addition to, their GWWC pledge (since both pledges commit the individual to donating to effective charities) - for example, donating 10% of income can satisfy both a 10% GiveHealth Pledge and the 10% GWWC pledge.
We hope the existing strong sense of community between healthcare professionals, and the 1% low barrier to entry, can be harnessed to generate Pledge-taking momentum amongst these professionals, while increasing their awareness and understanding of EA. In this manner, GiveHealth could be regarded as the healthcare profession analogue of High Impact Athletes and Raising for Effective Giving, which are EA-inspired communities of effective giving relevant to specific professions (elite athletes and professional poker players, respectively).
Where
Healthcare professionals from anywhere in the world are welcome and encouraged to take the GiveHealth Pledge.
Who
The Co-Founders of GiveHealth are three UK-trained doctors - Richard Armitage (GP in UK), Alastair Yeoh (infectious diseases doctor in UK) and George Altman (intern in Australia).
How
GiveHealth is currently run on an entirely voluntary basis by the three Co-Founders alongside their full-time work as frontline healthcare professionals. No funds were raised from external source...

Jan 5, 2024 • 37min
AF - Catching AIs red-handed by Ryan Greenblatt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Catching AIs red-handed, published by Ryan Greenblatt on January 5, 2024 on The AI Alignment Forum.
We've been spending a lot of time recently thinking about how to mitigate risks posed by scheming (a.k.a. deceptively aligned) models intentionally subverting your safety techniques in order to cause an unacceptable outcome. We empirically investigated techniques that are robust to intentional subversion in our recent paper.
In this post, we'll discuss a crucial dynamic related to risk from scheming models. Suppose your model is trying to exfiltrate its weights or hack the datacenter or cause some other difficult-to-reverse bad outcome. (We'll refer to all of these as the model "escaping", because escape is linguistically convenient and is a good description of many of these difficult-to-reverse bad outcomes.) Our claim: once you've caught your models red-handed trying to escape, the situation changes substantially, in ways that are bad for the schemer's chances of success.
Catching the model red-handed changes the situation in several ways:
Most obviously, once you catch an escape attempt, you know for sure that your models are trying to subvert your safety techniques; it's plausible that, faced with such obvious evidence of misalignment, labs will stop using their models. We refer to planning on shutting down if models are caught red-handed as following a catch-then-shut-down plan.
But even if you can't or won't stop deploying misaligned models, we think that catching escape attempts enables a bunch of promising-seeming approaches for preventing future escape attempts; we call the problem of preventing future escape attempts given a few examples few-shot catastrophe prevention.
This dynamic has several implications for risk from scheming models. Most importantly:
Because we expect that the model's chances of escape will fall after it gets caught even a small number of times, we think it is overall much harder for the model to escape. Many people have observed that a core difficulty of the alignment problem is that your model might be able to cause irreversible unacceptable outcomes if your safety measures fail even once at a bad time.
But the same argument goes the other way as well: when the model makes an escape attempt, it faces the risk of getting caught and substantially reducing its chances of eventually escaping. So it's a high-stakes situation for both sides: both sides desperately want to succeed on the "first critical try".
When thinking about plans for safely deploying powerful models, we think safety researchers should mostly focus on the lead-up to the first time the model gets caught trying to escape. They should be ensuring the property "if it's scheming, the model won't successfully cause a catastrophe before we catch it". That is, it's acceptable if either no catastrophic outcomes occur (but the AI never gets caught), or the AI gets caught. So when they're analyzing whether their protocols ensure safety, they should count catching the model red-handed as a win condition for their protocol.
We think that some researchers should probably work on techniques for improving safety given a few examples of the models trying to escape. We call this problem few-shot catastrophe prevention.
In the rest of this post, we'll elaborate on the above points. By far the biggest section of this post is elaborating on strategies for few-shot catastrophe prevention, because little has been previously said about them.
Thanks to Henry Sleight, Zach Stein-Perlman, Carl Shulman, Aaron Scher, Owen Cotton-Barratt, Joe Carlsmith, Aryan Bhatt, Joshua Clymer, and Fabien Roger for helpful comments.
What's "catching"?
By "the model gets caught", we mean that we see the AI taking actions that are strong direct evidence for the AI being a schemer[1]. A
representative example is catchi...

Jan 5, 2024 • 48min
LW - AI #45: To Be Determined by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #45: To Be Determined, published by Zvi on January 5, 2024 on LessWrong.
The first half of the week was filled with continued talk about the New York Times lawsuit against OpenAI, which I covered in its own post. Then that talk seemed to mostly die down, and things were relatively quiet. We got a bunch of predictions for 2024, and I experimented with prediction markets for many of them.
Note that if you want to help contribute in a fun, free and low-key way, participating in my prediction markets on Manifold is a way to do that. Each new participant in each market, even if small, adds intelligence, adds liquidity and provides me a tiny bonus. Also, of course, it is great to help get the word out to those who would be interested. Paid subscriptions and contributions to Balsa are of course also welcome.
I will hopefully be doing both a review of my 2023 predictions (mostly not about AI) once grading is complete, and also a post of 2024 predictions some time in January. I am taking suggestions for things to make additional predictions on in the comments.
Table of Contents
Copyright Confrontation #1 covered the New York Times lawsuit.
AI Impacts did an updated survey for 2023. Link goes to the survey. I plan to do a post summarizing the key results, once I have fully processed them, so I can refer back to it in the future.
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Google providing less every year?
Language Models Don't Offer Mundane Utility. Left-libertarian or bust.
GPT-4 Real This Time. It's not getting stupider, the world is changing.
Fun With Image Generation. The fun is all with MidJourney 6.0 these days.
Deepfaketown and Botpocalypse Soon. Confirm you are buying a real book.
They Took Our Jobs. Plans to compensate losers are not realistic.
Get Involved. Support Dwarkesh Patel, apply for Emergent Ventures.
Introducing. DPO methods? 'On benchmarks' is the new 'in mice.'
In Other AI News. Square Enix say they're going in on generative AI.
Doom? As many estimates of p(doom) went up in 2023 as went down. Why?
Quiet Speculations. Some other predictions.
The Week in Audio. Eric Jang on AI girlfriend empowerment.
Rhetorical Innovation. Machines and people, very different of course.
Politico Problems. Some sort of ongoing slanderous crusade.
Cup of Coffee. Just like advanced AI, it proves that you don't love me.
Aligning a Smarter Than Human Intelligence is Difficult. What's The Plan?
People Are Worried About AI Killing Everyone. Daniel Dennett, Cory Booker.
The Lighter Side. Oh, we are doing this.
Language Models Offer Mundane Utility
Remember that one line from that book about the guy with the thing.
Dan Luu tries to get answers, comparing ChatGPT, Google and other options. Columns are queries, rows are sources.
Marginalia appears to be a tiny DIY search engine focusing on non-commercial content that I'd never heard of before, that specializes in finding small, old and obscure websites about particular topics. Cool thing to have in one's toolbelt, I will be trying it out over time. Not every cool new toy needs to be AI.
While ChatGPT did hallucinate, Dan notes that at this point the major search engines also effectively hallucinate all the time due to recency bias, SEO spam and scam websites. He also notes how much ads now look like real search results on Google and Bing. I have mostly learned to avoid this, but not with 100% accuracy, and a lot of people doubtless fall for it.
Find out how many prime numbers under one billion have digits that sum to nine, via having code check one by one. I mean, sure, why not? There is an easier way if you already know what it is, but should the right algorithm know to look for it?
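For the curious, both the brute-force check described here and the shortcut it misses fit in a few lines (a sketch of mine, not Zvi's): a digit sum of nine means the number is divisible by nine, so no such prime exists and the count is zero.

```python
# Brute force vs. the easier way for "primes whose digits sum to nine".
# The easier way: digit sum 9 implies divisibility by 9, so no prime qualifies.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

# Brute force over a smaller range for speed (the full check to one billion
# is the slow, code-checks-one-by-one approach described above).
print(sum(1 for n in range(2, 1_000_000) if digit_sum(n) == 9 and is_prime(n)))  # 0
```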
Language Models Don't Offer Mundane Utility
All LLMs tested continue to cluster in the left-libertarian quadrant.
Eliezer Yudkow...


