

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jan 7, 2024 • 12min
LW - AI Risk and the US Presidential Candidates by Zane
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Risk and the US Presidential Candidates, published by Zane on January 7, 2024 on LessWrong.
It's the new year, and the 2024 primaries are approaching, starting with the Iowa Republican caucus on January 15. For a lot of people here on LessWrong, the issue of AI risk will likely be an important factor in making a decision. AI hasn't been mentioned much during any of the candidates' campaigns, but I'm attempting to analyze what information there is and determine which candidate is most likely to bring about a good outcome.
A few background facts about my own position - such that if these statements do not apply to you, you won't necessarily want to take my recommendation:
I believe that, barring some sort of action to prevent this, the default result of creating artificial superintelligence is human extinction.
I believe that our planet is very far behind in alignment research compared to capabilities, and that this means we will likely need extensive international legislation to slow/pause/stop the advance of AI systems in order to survive.
I believe that preventing ASI from killing humanity is so much more important than any[1] other issue in American politics that I intend to vote solely on the basis of AI risk, even if this requires voting for candidates I would otherwise not have wanted to vote for.[2]
I believe that no mainstream politicians are currently suggesting any plans that would be sufficient for survival, nor do they even realize the problem exists. Most mainstream discourse on AI safety is focused on comparatively harmless risks, like misinformation and bias. The question I am asking is "which of these candidates seems most likely to end up promoting a somewhat helpful AI policy" rather than "which of these candidates has already noticed the problem and proposed the ideal solution," since the answer to the second question is none of them.
(Justification for these beliefs is not the subject of this particular post.)
And a few other background facts about the election, just in case you haven't been following American politics:
As the incumbent president, Joe Biden is essentially guaranteed to be the Democratic nominee, unless he dies or is otherwise incapacitated.
Donald Trump is leading in the polls for Republican nominee by very wide margins, followed by Nikki Haley, Ron DeSantis, Vivek Ramaswamy, and Chris Christie. Manifold[3] currently gives him an 88% chance of winning the nomination.
However, Trump is facing criminal charges regarding the Capitol attack on January 6, 2021, and the Colorado Supreme Court and Maine's Secretary of State have moved to disqualify him from their states' ballots.
As usual, candidates from outside the Democratic and Republican parties are not getting much support, although Robert F. Kennedy Jr. is polling unusually well for an independent candidate.
Joe Biden
Biden's most notable action regarding AI was Executive Order 14110[4]. The executive order was intended to limit various risks from AI... none of which were at all related to human extinction, except maybe bioweapons. The order covers risks from misinformation, cybersecurity, algorithmic discrimination, and job loss, while also focusing on trying to reap potential benefits of AI.
But the measures contained in the order, while limited in scope, seem to be a step in the right direction. Most importantly, anyone training a model with 10^26 floating point operations or more must report their actions and safety precautions to the government. That's a necessary piece of any future regulation on such large models.
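For a rough sense of what that threshold means, here is a minimal sketch (mine, not from the post) that checks hypothetical training runs against the 10^26 figure, using the common approximation that training compute is about 6 x parameters x tokens; the model sizes and token counts below are made up for illustration.

```python
# Rough sketch: does a hypothetical training run cross the EO 14110
# reporting threshold of 1e26 floating point operations?
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.
# The example runs below are invented, not real models.

REPORTING_THRESHOLD_FLOPS = 1e26

def training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training compute in floating point operations."""
    return 6 * num_parameters * num_tokens

runs = {
    "hypothetical 70B-parameter model, 2T tokens": training_flops(70e9, 2e12),
    "hypothetical 1T-parameter model, 20T tokens": training_flops(1e12, 20e12),
}

for name, flops in runs.items():
    must_report = flops >= REPORTING_THRESHOLD_FLOPS
    print(f"{name}: ~{flops:.2e} FLOPs, must report: {must_report}")
```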
Biden has spoken with the UN about international cooperation on AI, and frequently speaks of AI and other new technologies as both a source of "enormous potential and enormous peril," or other similar phrasings. "We need to make sure they're used as tools of opportunity, not wea...

Jan 6, 2024 • 4min
LW - The Next ChatGPT Moment: AI Avatars by kolmplex
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Next ChatGPT Moment: AI Avatars, published by kolmplex on January 6, 2024 on LessWrong.
Epistemic Status: Speculative. Dependent on intuitions about near-term AI tech and human psychology.
Claim: Within the next 1-3 years, many people will have an interaction with an AI avatar that feels authentically human. This will significantly amplify the public perception of current AI capabilities and risks.
An AI avatar is a realistic AI-generated render of a human (speech and video) that can have a real-time conversation with a human, for example over a video call.
The individual components needed to implement AI avatars already exist. AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech.[1] Generating photorealistic video of a talking human is currently limited, but still impressive and making rapid progress.
Taken together, these capabilities mean it will soon be possible to create a realistic AI avatar. The first generation avatars will be a bit rough, especially the rendered video, but overall there don't seem to be large conceptual hurdles to creating convincing AI avatars.[2]
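To make that pipeline concrete, here is a minimal sketch of the loop such an avatar would run during a call; the function names are placeholders of my own, standing in for speech-to-text, conversational language-model, text-to-speech, and talking-head video components rather than any real API.

```python
# Sketch of the per-turn loop an AI avatar would run during a video call.
# Every component below is a placeholder, not a real API; each stands in
# for a speech-to-text, conversational LLM, text-to-speech, or video model.

def transcribe(audio_chunk: bytes) -> str:
    """Speech-to-text (a Whisper-like component)."""
    raise NotImplementedError

def reply(transcript: str, history: list[str]) -> str:
    """Conversational language model with a persistent persona."""
    raise NotImplementedError

def synthesize_speech(text: str) -> bytes:
    """Text-to-speech (an ElevenLabs-like component)."""
    raise NotImplementedError

def render_talking_head(speech_audio: bytes) -> bytes:
    """Generate video frames of a face lip-synced to the audio."""
    raise NotImplementedError

def avatar_turn(audio_chunk: bytes, history: list[str]) -> bytes:
    """One conversational turn: hear the user, answer, return rendered video."""
    user_text = transcribe(audio_chunk)
    answer = reply(user_text, history)
    history.extend([user_text, answer])
    speech = synthesize_speech(answer)
    return render_talking_head(speech)
```

The per-turn latency of this loop is the near-term bottleneck flagged in the second footnote.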
Personal conversation with a high-quality AI avatar will have a significant emotional and mental impact on most people.[3] The impact will be especially acute for people distant from the world of AI, but will also affect those familiar with AI.
For humans, communication medium matters just as much as content. The same words can hit much harder when spoken in an emotive voice by an expressive face, than when silently read off a screen. Having a realistic personal conversation with an AI avatar will change people's gut-level intuitions about AI.
For better or worse, once decent AI avatars become generally accessible, public sentiment around AI will experience another shift comparable to the one spurred by ChatGPT.[4] AI will be perceived as more human-like and capable. It will seem like an independent agent that possesses "true intelligence".
After talking with a realistic AI avatar, the common refrains of "It's not actually intelligent, it just predicts the next token" and "Why would it want anything?" won't resonate with the public. For many people, consciousness is a prerequisite for real AI, and human-like AI avatars will appear to be a direct instantiation of that.
ChatGPT's release was a cultural moment.[5] It captured the public's imagination and triggered a reclassification of AI from sci-fi to present reality. AI avatars could bring on another cultural moment that shifts public perception even further.
The upcoming shift is predictable - AI avatars don't require any fundamental technical breakthroughs. It's a major evolution that we have the rare opportunity to prepare for in advance.
[1] Speech-to-text is good enough (OpenAI Whisper), text-to-speech is nearly good enough (ElevenLabs), and conversation / language modeling is good enough (ChatGPT with a Character.ai-style personality). All this currently suffices for realistic audio conversation with an AI. Human video generation isn't quite good enough yet, but it's making progress (Audio to Photoreal, HeyGen, Metahuman). Based on the current rate of progress, a functional AI avatar seems attainable within 1-3 years.
[2] Latency might be a problem in the near-term. In particular, it's unclear how fast the video generation will be.
[3] This is already happening to a limited extent. Many people have formed significant emotional attachments through text-only interactions with relatively weak language models (e.g. Character.ai and Replika).
[4] The shift could be more gradual than ChatGPT's, though. AI avatar tech is improving gradually whereas ChatGPT was dropped sui generis on the world.
[5] The Google Trends chart for "AI". ChatGPT came out on November 30, 2022.
Thanks for li...

Jan 6, 2024 • 25min
EA - 2023: news on AI safety, animal welfare, global health, and more by Lizka
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023: news on AI safety, animal welfare, global health, and more, published by Lizka on January 6, 2024 on The Effective Altruism Forum.
I'm listing some of the important and/or EA-related events that happened in 2023. Consider adding more in the comments!
A companion post collects research and other "content" highlights from 2023. (That post features content; the one you're reading summarizes news.)
Also, the monthly EA Newsletter discussed a lot of the events collected here, and was the starting point for the list of events in this post. If you're subscribed, we would really love feedback.
Skip to:
News related to different causes
AI safety: AI went mainstream, states developed safety-oriented regulation, there was a lot of discourse, and more
Global health & development: new vaccines, modified mosquitoes, threatened programs, and ongoing trends
Animal welfare: political reforms and alternative proteins
Updates in causes besides AI safety, global health, and animal welfare
Concluding notes
Other notes:
There might be errors in what I wrote (I'll appreciate corrections!).
Omissions! I avoided pretty political events (I think they're probably covered sufficiently elsewhere) and didn't focus on scientific breakthroughs. Even besides that, though, I haven't tried to be exhaustive, and I'd love to collect more important events/things from 2023. Please suggest things to add.
I'd love to see reflections on 2023 events.
What surprised you? What seemed important but now feels like it might have been overblown? What are the impacts of some of these events?
And I'd love to see forecasts about what we should expect for 2024 and beyond.
I put stars next to some content and news that seemed particularly important, although I didn't use this consistently.
More context on how and why I made this: I wanted to collect "important stuff from 2023" to reflect on the year, and realized that one of the resources I have is one I run - the monthly EA Newsletter. So I started compiling what was meant to be a quick doc-turned-post (by pulling out events from the Newsletter's archives, occasionally updating them or looking into them a bit more). Things kind of ballooned as I worked on this post. (Now there are two posts that aren't short; see the companion, which is less focused on news and more focused on "content.")
AI safety: AI went mainstream, states developed safety-oriented regulation, there was a lot of discourse, and more
See also featured content on AI safety.
0. GPT-4 and other models, changes at AI companies, and other news in AI (not necessarily safety)
Before we get to AI safety or AI policy developments, here are some relevant changes for AI development in 2023:
New models: OpenAI launched GPT-4 in mid-March (alongside announcements from Google, Anthropic, and more). Also around this time (February/March), Google released Bard, Meta released Llama, and Microsoft released Bing/Sydney (which was impressive and weird/scary).
Model use, financial impacts, and training trends: more people started using AI models. Developers got API access to various models. Advanced AI chips continued getting better, and compute use increased and got more efficient.
Improvements in models: We started seeing pretty powerful multimodal models (models that can process audio, video, images - not just text), including GPT-4 and Gemini. Context windows grew longer.
Forecasters on Metaculus seem to increasingly expect human-AI parity on selected tasks by 2040.
Changes in leading AI companies: Google combined Brain and DeepMind into one team, Amazon invested in Anthropic, Microsoft partnered with OpenAI, Meta partnered with Hugging Face, a number of new companies launched, and OpenAI CEO Sam Altman was fired and then reinstated (more on that).
Other news: Generative AI companies are increasingly getting su...

Jan 6, 2024 • 3min
EA - Howdy, I'm Elliot by Elliot Billingsley
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Howdy, I'm Elliot, published by Elliot Billingsley on January 6, 2024 on The Effective Altruism Forum.
Hi EA community,
I'd like to formally introduce myself to this forum, where I've been lurking for a while but was too timid to post until now; I'm finally posting thanks to the encouragement of some.
I first heard about EA through the Tim Ferriss Podcast in 2016. I still remember standing on the ferryboat crossing the Bosphorus while listening to Will MacAskill say things that seemed incredibly obvious, at least once I'd heard them.
In the couple of years that followed, I organized a local EA workshop, attended EAGx Berlin, and flew to San Francisco to attend EAG. I got involved with Students for High-Impact Charity, helping out on the periphery. I enjoyed lively conversation with EA Vancouver, and increased the usage of the phrase 'expected value' in daily conversation. That's about it.
That was my EA Life Phase I.
Half a decade later, I sat down with my wife and child during a Pentathlon in which every day you ask yourself the question: "What is the Most Important Work I can do today?" All of a sudden, it all came back to me. The most important things I can possibly do have quite clearly been described in EA. So I resolved in early 2022 to buckle up and take EA seriously.
I honestly wasn't sure what my best option was, so I went with the most inspiring recent topic on the 80k podcast: Andrew Yang's Forward Party. I basically reached out and got named State Lead. I feel my experience with Forward may be a whole 'nuther post, so I'll leave it at that.
I also engaged in a lot of other ways, in large part thanks to EA Virtual Programs, which I really appreciate. But there's one person who had a huge role in my transition from an EA sleeper cell to a stupidly engaged one. That's Dr. Ben Smith.
I swallowed my Ninja Lurker EA Forum personality (Never posts, always votes strongly) in order to write this post, for a specific reason, which I'll share now. Last fall, I launched a coaching practice with the intention of supporting the EA community. I asked some friends and acquaintances to take a chance and try my coaching out, and I thank them very much. I now know my coaching helps people.
So if I help EAs, I'm helping better, in theory, right? I want to test this theory! I'm going to EAG next month and even have a special cohort designed for attendees. If you're going to EAG, do consider applying; we'd love to have you. So that's my shameless plug.
For any of you still reading, I'd like to say thanky (I'm from Texas, that's kind of how my dad used to say 'thank you'). I hope to write here more and learn in this incredible community.
Elliot
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 3min
LW - Survey of 2,778 AI authors: six parts in pictures by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Survey of 2,778 AI authors: six parts in pictures, published by KatjaGrace on January 6, 2024 on LessWrong.
Crossposted from AI Impacts blog
The 2023 Expert Survey on Progress in AI is out, this time with 2,778 participants from six top AI venues (up from about 700 participants and two venues in the 2022 ESPAI), making it probably the biggest ever survey of AI researchers.
People answered in October, an eventful fourteen months after the 2022 survey, which had mostly identical questions for comparison.
Here is the preprint. And here are six interesting bits in pictures (with figure numbers matching paper, for ease of learning more):
1. Expected time to human-level performance dropped 1-5 decades since the 2022 survey. As always, our questions about 'high level machine intelligence' (HLMI) and 'full automation of labor' (FAOL) got very different answers, and individuals disagreed a lot (shown as thin lines below), but the aggregate forecasts for both sets of questions dropped sharply. For context, between 2016 and 2022 surveys, the forecast for HLMI had only shifted about a year.
2. Time to most narrow milestones decreased, some by a lot. The occupation of AI researcher is now expected to be fully automatable a quarter of a century earlier than forecast in 2022, and the expected time until an AI writes a NYT bestselling work of fiction dropped by more than half, to ~2030. Within five years, AI systems are forecast to be feasible that can fully make a payment processing site from scratch, entirely generate a new song that sounds like it's by e.g. Taylor Swift, or autonomously download and fine-tune a large language model.
3. Median respondents put 5% or more on advanced AI leading to human extinction or similar, and a third to a half of participants gave 10% or more. This was across four questions, one about overall value of the future and three more directly about extinction.
4. Many participants found many scenarios worthy of substantial concern over the next 30 years. For every one of eleven scenarios and 'other' that we asked about, at least a third of participants considered it deserving of substantial or extreme concern.
5. There are few confident optimists or pessimists about advanced AI: high hopes and dire concerns are usually found together. 68% of participants thought HLMI was more likely to lead to good outcomes than bad, but nearly half of these people put at least 5% on extremely bad outcomes such as human extinction, and 59% of net pessimists gave 5% or more to extremely good outcomes.
6. 70% of participants would like to see research aimed at minimizing risks of AI systems be prioritized more highly. This is much like 2022, and in both years a third of participants asked for "much more" - more than doubling since 2016.
If you enjoyed this, the paper covers many other questions, as well as more details on the above.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 2min
EA - Double the donation: EA inadequacy found? by Neil Warren
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Double the donation: EA inadequacy found?, published by Neil Warren on January 6, 2024 on The Effective Altruism Forum.
I'm only 30% sure that this is actually an inadequacy made by those whose job it is to maximize donations but I've noticed that none of the donations pages of GiveWell, Giving What We Can, Horizon Institute, or METR have this little tab in them that MIRI has (just scroll down after following the link):
This little tool comes from doublethedonation.com.
I was looking for charities to donate to, and I'm grateful I stumbled upon the MIRI donation page because otherwise I would not have known that Google would literally double my donation. None of the other donation pages had this little "does your company do employer matching?" box. WHY.
I would wager other tech companies have similar programs, and that a good chunk of EA donations come from employees of those tech companies, and that thousands of dollars a year are wasted in missed opportunities here. If this is an inadequacy, it's a pretty obvious and damaging one. I wish to speak to the manager.
I did not spend more than ten minutes noticing this, and just wanted to get this out there as fast as possible. There's a chance I'm being stupid. (Perhaps every tech employee is already briefed on donation matching.) But if anyone out there has an answer for this, or if a GiveWell employee is conveniently walking by and says "wait a minute! We could radically improve our UI!", that'd be great.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jan 6, 2024 • 16min
LW - Almost everyone I've met would be well-served thinking more about what to focus on by Henrik Karlsson
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Almost everyone I've met would be well-served thinking more about what to focus on, published by Henrik Karlsson on January 6, 2024 on LessWrong.
Almost everyone I've ever met would be well-served by spending more time thinking about what to focus on. - Sam Altman
In May 2020, we parked two moving trucks in the harbor and carried everything we owned from one to the other. Johanna, Maud, and I were leaving Sweden, and Covid restrictions meant we were forbidden from returning once we boarded the ferry. Hence the second truck, which we had gotten a stranger to ferry from the island to us: the Swedish truck had to stay in Sweden.
The motivation to leave was that we wanted to homeschool Maud, who was 3. In Sweden, this is illegal, so most Swedish homeschoolers end up on one of two islands in the Baltic Sea. On our island, we knew no one. We had no jobs awaiting. We were leaving something, more than going somewhere. The life we had grown piecemeal over 30 years disappeared overnight. We had to figure out what to replace it with.
Life is a multi-armed bandit
The moldy apartment we rented as we looked for a house has a view of the sea. Every day, deep into winter, I'd walk down to the water and dive from the cliffs. Swimming in the channels between the rocks, I realized I could model our situation using a concept from probability theory.
It was a multi-armed bandit problem. This problem, which, under a different name, had first been studied by the biologist William R. Thompson in 1933, centers on a rather surreal thought experiment. A gambler faces a slot machine ("a one-armed bandit"), except this machine doesn't have one arm - following some twisted dream logic, it has k arms, arms sticking out in every direction. Some of these arms have a high probability of paying out the jackpot, others are worse. But the gambler does not know which is which.
The problem is pulling the arms in an order that maximizes the expected total gains. ("Gains" could be anything. Early on, the problem was used to design drug trials. There, the jackpot was defined as finding a successful treatment. If you are looking for a partner, talking to people is how you pull the multi-armed bandit and the resonance (or lack thereof) is the payoff.)
The gambler needs to learn new knowledge about the machines and simultaneously use what they have already learned to optimize their decisions. In the literature, these two activities are referred to as exploring and exploiting. You can't do both things at the same time. When you explore, you are pulling new arms on the bandit trying to figure out their expected payout. When you exploit, you pull the best arm you've found. You need to find the right balance.
If you spend too little time exploring, you get stuck playing a machine with a low expected payoff. But if you spend too much time exploring, you will earn less than you would if you played the best arm. This is the explore/exploit trade-off.
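As a toy illustration of that trade-off (my own sketch, not part of the essay), here is an epsilon-greedy gambler on a k-armed bandit: with probability epsilon it explores a random arm, and otherwise it exploits the arm with the best average payoff observed so far; the payout probabilities are invented.

```python
import random

# Toy illustration of the explore/exploit trade-off: an epsilon-greedy
# gambler on a k-armed bandit. With probability epsilon it explores a
# random arm; otherwise it exploits the arm with the best observed average
# payoff. The payout probabilities passed in below are made up.

def epsilon_greedy(payout_probs, pulls=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    k = len(payout_probs)
    counts = [0] * k      # times each arm has been pulled
    totals = [0.0] * k    # total payoff observed on each arm
    winnings = 0.0

    def average(i):
        # Untried arms look maximally promising, so each gets tried at least once.
        return totals[i] / counts[i] if counts[i] else float("inf")

    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(k)            # explore: pull a random arm
        else:
            arm = max(range(k), key=average)  # exploit: pull the best arm so far
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        winnings += reward
    return winnings

# A gambler facing three arms with hidden payout probabilities.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```

Raising epsilon spends more pulls learning which arm is best; lowering it milks the current favorite, which is exactly the tension described above.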
People tend to gravitate to different sides of the explore/exploit spectrum. If you are high on openness, like I am, exploring comes easy. But it is harder to make a commitment and exploit what you've learned about yourself and the world. Other people are more committed, but risk being too conventional in their choices. They miss better avenues for their effort. Most, however, tend to do less than optimal of both - not exploring, not exploiting; but doing things out of blind habit, and half-heartedly.
First, I'll say a few words about exploration and exploitation in real life. Then I'll return to the question of how to navigate the tradeoff between them.
Explore: doggedly looking for what makes you feel alive
There are two kinds of people. Those who do not understand how complex the world is, and those who know that they do not understand how complex the world is.
To navi...

Jan 5, 2024 • 5min
EA - Introducing GiveHealth: a giving Pledge for healthcare workers (and a call for volunteers) by RichArmitage
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing GiveHealth: a giving Pledge for healthcare workers (and a call for volunteers), published by RichArmitage on January 5, 2024 on The Effective Altruism Forum.
What
GiveHealth is a new EA-inspired effective giving organisation. It is a community of healthcare professionals who have taken a public Pledge to donate at least 1% of their income to the most effective global health charities.
Visitors to the website can learn about effective giving, their relative wealth on the global stage, how healthcare professionals can improve their impact, the activities of GiveWell, and the highly effective nature of the charities recommended by GiveWell.
Healthcare professionals are invited to take a public Pledge to donate at least 1% of their income to GiveWell's top charities for the rest of their lives. They can use the Pledge Calculator to determine their monthly/annual donations based on their salary and desired donation percentage. Once they have taken the Pledge their name, profession and location will be displayed on the GiveHealth Community Board, and they can learn about the charities recommended by GiveWell and follow the links to the donation page of their chosen charities. Pledge takers will receive a survey on each anniversary of their Pledge to capture their donation activities over the previous year so GiveHealth can measure the value of donations it is influencing. Anybody can sign up to the GiveHealth monthly newsletter.
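For illustration only (the post does not specify how the actual Pledge Calculator works), the underlying arithmetic is simply a percentage of salary split into annual and monthly amounts:

```python
# Illustrative sketch only; the real Pledge Calculator may behave differently.
def pledge_amounts(annual_salary: float, pledge_percent: float) -> tuple[float, float]:
    """Return the (annual, monthly) donation implied by a salary and pledge percentage."""
    annual = annual_salary * pledge_percent / 100
    return annual, annual / 12

print(pledge_amounts(40_000, 1))  # a 1% pledge on a 40,000 salary
```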
Healthcare professionals of all disciplines (nurses, physiotherapists, pharmacists, occupational therapists, doctors, etc) and kinds (clinicians, researchers, managers, policy-makers, students, retired professionals, etc) of any level of seniority, from any part of the world, are welcome to sign the GiveHealth Pledge.
Why
Healthcare professionals are a self-selected group of generally altruistic individuals who both care about improving the health of others and are motivated to do so. There is also a strong sense of community and camaraderie amongst healthcare professionals, while their awareness and understanding of EA and its principles is generally low.
Giving What We Can has shown the public Pledge model to be an effective vehicle for generating donations (by fostering commitment, community and culture), which GiveHealth has combined with a lower barrier to entry (at least 1% of income rather than 10%) that we feel is more appropriate for a group less familiar with EA. Pledge takers are still able to sign the Giving What We Can pledge, and their GiveHealth pledge can be included within, rather than in addition to, their GWWC pledge (since both pledges commit the individual to donating to effective charities) - for example, donating 10% of income can satisfy both a 10% GiveHealth Pledge and the 10% GWWC pledge.
We hope the existing strong sense of community between healthcare professionals, and the 1% low barrier to entry, can be harnessed to generate Pledge-taking momentum amongst these professionals, while increasing their awareness and understanding of EA. In this manner, GiveHealth could be regarded as the healthcare profession analogue of High Impact Athletes and Raising for Effective Giving, which are EA-inspired communities of effective giving relevant to specific professions (elite athletes and professional poker players, respectively).
Where
Healthcare professionals from anywhere in the world are welcome and encouraged to take the GiveHealth Pledge.
Who
The Co-Founders of GiveHealth are three UK-trained doctors - Richard Armitage (GP in UK), Alastair Yeoh (infectious diseases doctor in UK) and George Altman (intern in Australia).
How
GiveHealth is currently run on an entirely voluntary basis by the three Co-Founders alongside their full-time work as frontline healthcare professionals. No funds were raised from external source...

Jan 5, 2024 • 37min
AF - Catching AIs red-handed by Ryan Greenblatt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Catching AIs red-handed, published by Ryan Greenblatt on January 5, 2024 on The AI Alignment Forum.
We've been spending a lot of time recently thinking about how to mitigate risks posed by scheming (a.k.a. deceptively aligned) models intentionally subverting your safety techniques in order to cause an unacceptable outcome. We empirically investigated techniques that are robust to intentional subversion in our recent paper.
In this post, we'll discuss a crucial dynamic related to risk from scheming models. Suppose your model is trying to exfiltrate its weights or hack the datacenter or cause some other difficult-to-reverse bad outcome. (We'll refer to all of these as the model "escaping", because escape is linguistically convenient and is a good description of many of these difficult-to-reverse bad outcomes.) Our claim: once you've caught your models red-handed trying to escape, the situation changes substantially, in ways that are bad for the schemer's chances of success.
Catching the model red-handed changes the situation in several ways:
Most obviously, once you catch an escape attempt, you know for sure that your models are trying to subvert your safety techniques; it's plausible that, faced with such obvious evidence of misalignment, labs will stop using their models. We refer to planning on shutting down if models are caught red-handed as following a catch-then-shut-down plan.
But even if you can't or won't stop deploying misaligned models, we think that catching escape attempts enables a bunch of promising-seeming approaches for preventing future escape attempts; we call the problem of preventing future escape attempts given a few examples few-shot catastrophe prevention.
This dynamic has several implications for risk from scheming models. Most importantly:
Because we expect that the model's chances of escape will fall after it gets caught even a small number of times, we think it is overall much harder for the model to escape. Many people have observed that a core difficulty of the alignment problem is that your model might be able to cause irreversible unacceptable outcomes if your safety measures fail even once at a bad time.
But the same argument goes the other way as well: when the model makes an escape attempt, it faces the risk of getting caught and substantially reducing its chances of eventually escaping. So it's a high-stakes situation for both sides: both sides desperately want to succeed on the "first critical try".
When thinking about plans for safely deploying powerful models, we think safety researchers should mostly focus on the lead-up to the first time the model gets caught trying to escape. They should be ensuring the property "if it's scheming, the model won't successfully cause a catastrophe before we catch it". That is, it's acceptable if either no catastrophic outcomes occur (but the AI never gets caught), or the AI gets caught. So when they're analyzing whether their protocols ensure safety, they should count catching the model red-handed as a win condition for their protocol.
We think that some researchers should probably work on techniques for improving safety given a few examples of the models trying to escape. We call this problem few-shot catastrophe prevention.
In the rest of this post, we'll elaborate on the above points. By far the biggest section of this post is elaborating on strategies for few-shot catastrophe prevention, because little has been previously said about them.
Thanks to Henry Sleight, Zach Stein-Perlman, Carl Shulman, Aaron Scher, Owen Cotton-Barratt, Joe Carlsmith, Aryan Bhatt, Joshua Clymer, and Fabien Roger for helpful comments.
What's "catching"?
By "the model gets caught", we mean that we see the AI taking actions that are strong direct evidence for the AI being a schemer[1]. A
representative example is catchi...

Jan 5, 2024 • 48min
LW - AI #45: To Be Determined by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #45: To Be Determined, published by Zvi on January 5, 2024 on LessWrong.
The first half of the week was filled with continued talk about the New York Times lawsuit against OpenAI, which I covered in its own post. Then that talk seemed to mostly die down, and things were relatively quiet. We got a bunch of predictions for 2024, and I experimented with prediction markets for many of them.
Note that if you want to help contribute in a fun, free and low-key way, participating in my prediction markets on Manifold is a way to do that. Each new participant in each market, even if small, adds intelligence, adds liquidity and provides me a tiny bonus. Also, of course, it is great to help get the word out to those who would be interested. Paid subscriptions and contributions to Balsa are of course also welcome.
I will hopefully be doing both a review of my 2023 predictions (mostly not about AI) once grading is complete, and also a post of 2024 predictions some time in January. I am taking suggestions for things to make additional predictions on in the comments.
Table of Contents
Copyright Confrontation #1 covered the New York Times lawsuit.
AI Impacts did an updated survey for 2023. Link goes to the survey. I plan to do a post summarizing the key results, once I have fully processed them, so I can refer back to it in the future.
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Google providing less every year?
Language Models Don't Offer Mundane Utility. Left-libertarian or bust.
GPT-4 Real This Time. It's not getting stupider, the world is changing.
Fun With Image Generation. The fun is all with MidJourney 6.0 these days.
Deepfaketown and Botpocalypse Soon. Confirm you are buying a real book.
They Took Our Jobs. Plans to compensate losers are not realistic.
Get Involved. Support Dwarkesh Patel, apply for Emergent Ventures.
Introducing. DPO methods? 'On benchmarks' is the new 'in mice.'
In Other AI News. Square Enix say they're going in on generative AI.
Doom? As many estimates of p(doom) went up in 2023 as went down. Why?
Quiet Speculations. Some other predictions.
The Week in Audio. Eric Jang on AI girlfriend empowerment.
Rhetorical Innovation. Machines and people, very different of course.
Politico Problems. Some sort of ongoing slanderous crusade.
Cup of Coffee. Just like advanced AI, it proves that you don't love me.
Aligning a Smarter Than Human Intelligence is Difficult. What's The Plan?
People Are Worried About AI Killing Everyone. Daniel Dennett, Cory Booker.
The Lighter Side. Oh, we are doing this.
Language Models Offer Mundane Utility
Remember that one line from that book about the guy with the thing.
Dan Luu tries to get answers, comparing ChatGPT, Google and other options. Columns are queries, rows are sources.
Marginalia appears to be a tiny DIY search engine focusing on non-commercial content that I'd never heard of before, that specializes in finding small, old and obscure websites about particular topics. Cool thing to have in one's toolbelt, I will be trying it out over time. Not every cool new toy needs to be AI.
While ChatGPT did hallucinate, Dan notes that at this point the major search engines also effectively hallucinate all the time due to recency bias, SEO spam and scam websites. He also notes how much ads now look like real search results on Google and Bing. I have mostly learned to avoid this, but not with 100% accuracy, and a lot of people doubtless fall for it.
Find out how many prime numbers under one billion have digits that sum to nine, via having code check one by one. I mean, sure, why not? There is an easier way if you already know what it is, but should the right algorithm know to look for it?
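For the curious, both the brute-force check described here and the shortcut it misses fit in a few lines (a sketch of mine, not Zvi's): a digit sum of nine means the number is divisible by nine, so no such prime exists and the count is zero.

```python
# Brute force vs. the easier way for "primes whose digits sum to nine".
# The easier way: digit sum 9 implies divisibility by 9, so no prime qualifies.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

# Brute force over a smaller range for speed (the full check to one billion
# is the slow, code-checks-one-by-one approach described above).
print(sum(1 for n in range(2, 1_000_000) if digit_sum(n) == 9 and is_prime(n)))  # 0
```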
Language Models Don't Offer Mundane Utility
All LLMs tested continue to cluster in the left-libertarian quadrant.
Eliezer Yudkow...


