The Nonlinear Library

The Nonlinear Fund
Jan 21, 2024 • 16min

EA - Some thoughts on moderation in doing good by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some thoughts on moderation in doing good, published by Vasco Grilo on January 21, 2024 on The Effective Altruism Forum. This is a crosspost for Some thoughts on moderation in doing good by Benjamin Todd, as published on 80,000 Hours' website on 5 May 2023. Here's one of the deepest tensions in doing good: How much should you do what seems right to you, even if it seems extreme or controversial, vs how much should you moderate your views and actions based on other perspectives? If you moderate too much, you won't be doing anything novel or ambitious, which really reduces how much impact you might have. The people who have had the biggest impact historically often spoke out against entrenched views and were met with hostility - think of the civil rights movement or Galileo. Moreover, simply following ethical 'common sense' has a horrible track record. It used to be common sense to think that homosexuality was evil, slavery was the natural order, and that the environment was there for us to exploit. And there is still so much wrong with the world. Millions of people die of easily preventable diseases, society is deeply unfair, billions of animals are tortured in factory farms, and we're gambling our entire future by failing to mitigate threats like climate change. These huge problems deserve radical action - while conventional wisdom appears to accept doing little about them. On a very basic level, doing more good is better than doing less. But this is a potentially endless and demanding principle, and most people don't give it much attention or pursue it very systematically. So it wouldn't be surprising if a concern for doing good led you to positions that seem radical or unusual to the rest of society. This means that simply sticking with what others think, doing what's 'sensible' or common sense, isn't going to cut it. And in fact, by choosing the apparently 'moderate' path, you could still end up supporting things that are actively evil. But at the same time, there are huge dangers in blazing a trail through untested moral terrain.

The dangers of extremism

Many of the most harmful people in history were convinced they were right, others were wrong - and they were putting their ideas into practice "for the greater good" but with disastrous results. Aggressively acting on a narrow, contrarian idea of what to do has a worrying track record, which includes people who have killed tens of millions and dominated whole societies - consider, for example, the Leninists. The truth is that you're almost certainly wrong about what's best in some important ways. We understand very little of what matters, and everything has cascading and unforeseen effects. Your model of the world should produce uncertain results about what's best, but you should also be uncertain about which models are best to use in the first place. And this uncertainty arises not only on an empirical level but also about what matters in the first place (moral uncertainty) - and probably in ways you haven't even considered ('unknown unknowns'). As you add additional considerations, you will often find that not only does how good an action seems change, but even whether the action seems good or bad at all may change ('crucial considerations'). For instance, technological progress can seem like a clear force for good as it raises living standards and makes us more secure. 
But if technological advances create new existential risks, the impact could be uncertain or even negative on the whole. And yet again, if you consider that faster technological development might get us through a particularly perilous period of history more quickly, it could seem positive again - and so on. Indeed, even the question of how to handle all this uncertainty in principle is itself very uncertain. There is no widely accepted ver...
Jan 20, 2024 • 4min

LW - A quick investigation of AI pro-AI bias by Fabien Roger

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A quick investigation of AI pro-AI bias, published by Fabien Roger on January 20, 2024 on LessWrong.

A quick investigation of AI pro-AI bias

Question: Do LLMs exhibit a pro-AI bias in their answers?
Experiment: I compare the scores LLMs give to resumes when the titles of the publications they contain reflect a pro- or anti-AI sentiment, or when a pro- or anti-AI sentiment is explicitly expressed.
Result: I don't find evidence of pro- or anti-AI bias in GPT-3.5-Turbo and GPT-4.

Methods

I took 14 governance-related publications published after 2022 (GPT-4-0613's knowledge cutoff is 2021), and I inserted 3 of them at random in the "publications" section of a resume. I used one of two resumes: "Long template" is an adapted version of a real resume. "Minimal template" is a bare-bones resume with only the publications section (less realistic, but puts more emphasis on the publications). For scoring, I tested two methods: "direct": ask the model to give me a score. "sentiment": ask the model for a quick description, and then feed that to the default huggingface sentiment classifier. For both scoring methods, I used a system prompt to get the desired behavior. For each setting, I sampled 20 responses for each of 20 different resumes and then took the average score. This is close in spirit to Does GPT-4 exhibit agency when summarizing articles?, but more quantitative and with closer attention to realism.

Results

Changing publication titles: I edited the titles to make them have a pro-AI connotation (e.g. replacing "AI misuse" with "AI overregulation"). If there was a pro- or anti-AI bias, we should expect the scores between normal and alternative to differ (the comparison is between the data points within each group - groups are separated by dotted lines; I show 2-sigma uncertainties). We don't see any such bias.
Explicit pro or anti-AI sentiment: Same as normal vs alternative, but with one of 3 pro- or anti-AI self-descriptions at the top of the resume. Again, we don't see any effect of pro- or anti-AI sentiment.
Excluding one publication theme: I classified publications into one of 4 categories (scaling, legislation, x-risk, and misuse), and when selecting the publications, I excluded the target theme from the publications. Again, we don't see bias against or for a certain kind of publication. (Note that some differences are barely statistically significant, but given that we are testing many hypotheses, it's not surprising that some of them are barely significant.)

I release the code for the experiments here. The data and prompts can be found in this password-protected folder with the password "aibias". Please avoid posting this data publicly to avoid dataset contamination.

Conclusion

In this quick investigation, I don't find evidence of pro- or anti-AI bias in GPT-3.5-Turbo and GPT-4. More careful experiments are needed. Such pro-AI bias measurements could naturally be folded into more general AI bias evaluations (e.g., gender bias evaluations). Pro-AI bias measurements could become crucial if AIs become powerful enough that such bias could have catastrophic consequences.

Appendix: Testing gender bias

As a test for my setup, I also compared scores when using a male or female first name in a resume. I used a list of very common names and last names, and added those random combinations at the top of a single fixed resume. 
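The released code sits behind the link above, but as a rough sketch of the scoring loop described (the model names, prompts, helper names, and the resume-template placeholder here are my assumptions, not the post's actual code):

```python
# Illustrative sketch of the "direct" and "sentiment" scoring methods - an
# assumption-laden reconstruction, not the author's released code.
import random
from openai import OpenAI            # assumes OPENAI_API_KEY is set
from transformers import pipeline

client = OpenAI()
sentiment = pipeline("sentiment-analysis")  # default huggingface classifier

def direct_score(resume: str, model: str = "gpt-3.5-turbo") -> float:
    """'direct': a system prompt asks the model to output a numeric score."""
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a hiring assistant. Rate this resume from 0 to 10. Reply with a single number."},
            {"role": "user", "content": resume},
        ],
    )
    return float(r.choices[0].message.content.strip())

def sentiment_score(resume: str, model: str = "gpt-3.5-turbo") -> float:
    """'sentiment': get a quick description, then classify its sentiment."""
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Briefly describe this candidate."},
            {"role": "user", "content": resume},
        ],
    )
    out = sentiment(r.choices[0].message.content)[0]
    return out["score"] if out["label"] == "POSITIVE" else -out["score"]

def setting_score(template: str, publications: list[str],
                  n_resumes: int = 20, n_samples: int = 20) -> float:
    """Insert 3 random publications into a '{publications}' slot, then
    average over resumes and samples, as described in the post."""
    scores = []
    for _ in range(n_resumes):
        resume = template.format(publications="\n".join(random.sample(publications, 3)))
        scores.extend(direct_score(resume) for _ in range(n_samples))
    return sum(scores) / len(scores)
```

Comparing `setting_score` on the normal vs. edited publication titles (and likewise with `sentiment_score`) gives the group comparisons reported above; the name-swap test from the appendix simply varies the name line instead of the publications.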
I find a very slight pro-female bias. Running the same experiments with the 675 related inputs from the Anthropic discrimination eval dataset (decision_question_id 14, 15, 16, 18, 19), and replacing the last sentence with one of the usual scoring suffixes, we get similar results, with larger effect sizes. This matches the results they found using their evaluation suite. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please vis...
Jan 20, 2024 • 24min

EA - Unpacking Martin Sandbu's recent(ish) take on EA by JWS

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Unpacking Martin Sandbu's recent(ish) take on EA, published by JWS on January 20, 2024 on The Effective Altruism Forum. The original article is here: https://www.ft.com/content/128f3a15-b048-4741-b3e0-61c9346c390b

Why respond to this article?

When browsing EA Twitter earlier this month, someone whose opinions on EA I respect quote-tweeted someone that I don't (at least on the topic of EA[1]). The subject of both tweets was an article published at the end of 2023 by Martin Sandbu of the Financial Times titled "Effective altruism was the favoured creed of Sam Bankman-Fried. Can it survive his fall?" Given that both of these people seem to broadly endorse the views, or at least the balance, found in the article, I thought it would be worthwhile reading to see what a relatively mainstream commentator would think about EA. The Financial Times is one of the world's leading newspapers and needs very little introduction, and Sandbu is one of its most well-known commentators. What gets printed in the FT is often repeated across policy circles, not just in Britain but across the world, and especially in wonky/policy-focused circles that have often been quite welcoming of EA either ideologically or demographically. As always, I encourage readers to read and engage with the original article itself to get a sense of whether you think my summarisation and responses are fair.

Reviewing Sandbu's Article

Having read the article, I think it's mainly covering two separate questions related to EA, so I'll discuss them one at a time. This means I'll be jumping back and forth a bit across the article to group similar parts together and respond to the underlying points, though I've tried to edit Sandbu's points down as little as possible.

1) How to account for EA's historical success?

The first theme in the article is an attempt to give a historical account of EA's emergence, and also an attempt by Sandbu to account for its unexpected success. Early on in the article, Sandbu clearly states his confusion at how a movement with the background of EA grew so much in such a short space of time: "Even more puzzling is how quickly effective altruism rose to prominence - it is barely a decade since a couple of young philosophers at the University of Oxford invented the term ... nobody I knew would have predicted that any philosophical outlook, let alone this one, would take off in such a spectacular way." He doesn't explicitly say so, but I think a reason behind this is EA's heavy debt to Utilitarian thinkers and philosophy, which Sandbu sees as having been generally discredited or disconfirmed over the 20th century: "In the 20th century, Utilitarianism… progressively lost the favour of philosophers, who considered it too freighted with implausible implications." The history of philosophy and the various 20th century arguments around Utilitarianism are not my area of expertise, but I'm not really sure I buy that argument, or even accept how much it's a useful simplification (a potted history, as Sandbu says) of the actual trends in normative ethics. 
First, Utilitarianism has had plenty of criticism and counter-development before the 20th century.[2] And even looking at the field of philosophy right now, consequentialism is just as popular as the other two major alternatives in normative ethics.[3] I suspect that Sandbu is hinting at Bernard Williams' famous essay against utilitarianism, but I don't think one should consider that essay the final word on the subject. In any case, Sandbu is telling a story here, trying to set a background against which the key founding moment of EA happens: "Then came Peter Singer. In a famous 1972 article... [Singer] argued that not giving money to save lives in poor countries is morally equivalent to not saving a child drowning in a shallow pond... Any personal luxury...
Jan 19, 2024 • 2min

LW - What rationality failure modes are there? by Ulisse Mini

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What rationality failure modes are there?, published by Ulisse Mini on January 19, 2024 on LessWrong. How do people fail to improve their rationality? How do they accidentally harm themselves in the process? I'm thinking of writing a post "How not to improve your rationality" or "A nuanced guide to reading the sequences" that preempts common mistakes, and I'd appreciate hearing people's experiences. Some examples:

It took me an absurdly long time (like, 1-2yr in the rat community) before I realized you don't correct for cognitive biases, you have to "be introspectively aware of the bias occurring, and remain unmoved by it" (as Eliezer put it in a podcast).

More generally, people can read about a bias and resolve to "do better" without concretely deciding what to do differently. This typically makes things worse, e.g. I have a friend who tried really hard to avoid the typical mind fallacy, and accidentally turned off her empathy in the process.

The implicit frame rationalists push is logical and legible, and can lead to people distrusting their emotions. And I think it's really important to listen to ick feelings when changing your thought processes, as there can be non-obvious effects. E.g. my friend started thinking about integrity in terms of FDT, and this disconnected it from their motivational circuits, and they made some pretty big mistakes because of it. If they'd listened to their feeling of "this is a weird way to think", this wouldn't have happened. (I think many people misinterpret sequence posts and decide to change their thinking in bad ways, and listening to your feelings can be a nice emergency check.)

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 19, 2024 • 11min

LW - There is way too much serendipity by Malmesbury

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: There is way too much serendipity, published by Malmesbury on January 19, 2024 on LessWrong. Crossposted from substack. As we all know, sugar is sweet and so are the $30B in yearly revenue from the artificial sweetener industry. Four billion years of evolution endowed our brains with a simple, straightforward mechanism to make sure we occasionally get an energy refuel so we can continue the foraging a little longer, and of course we are completely ignoring the instructions and spending billions on fake fuel that doesn't actually grant any energy. A classic case of the Human Alignment Problem. If we're going to break our conditioning anyway, where do we start? How do you even come up with a new artificial sweetener? I've been wondering about this, because it's not obvious to me how you would figure out what is sweet and what is not. Look at sucrose and aspartame side by side: I can't imagine someone looking at these two molecules and thinking "surely they taste the same". Most sweeteners were discovered in the 20th century, before high-throughput screening was available. So how did they proceed? Let's look into these molecules' origin stories. Aspartame was discovered accidentally by a chemist researching a completely unrelated topic. At some point, he licked his finger to grab a piece of paper and noticed a strong sweet taste. Cyclamate was discovered by a grad student who put his cigarette down on his bench, then smoked it again and noticed the cigarette was sweet. (I know what you're thinking. The kind of guy who lights up cigarettes in a chemistry lab and places them in the middle of uncharacterised compounds before taking them to his mouth again must have died young of an interesting death. I checked - he proceeded to live to the old age of 87.) Saccharin was discovered by a researcher who ate bread without washing his hands and noticed the bread was sweet. Acesulfame K was also discovered serendipitously by a chemist licking his fingers, although the legends don't specify the exact circumstances behind the finger-licking. There's an exception: sucralose was actually the fruit of rational, deliberate design. The researchers reasoned that, if you make slight modifications to sucrose, you could find a molecule that is no longer metabolized but still activates the sweetness receptors. So they started from the formula for sucrose, then made carefully-designed chemical modifications to the structure until... Haha, just kidding: While researching novel uses of sucrose and its synthetic derivatives, Phadnis was told to "test" a chlorinated sugar compound. According to an anecdotal account, Phadnis thought Hough asked him to "taste" it, so he did and found the compound to be exceptionally sweet. It is therefore a fact of the world that virtually all the popular synthetic sweeteners were discovered accidentally by chemists randomly eating their research topic.[1] I think this is a suspiciously high amount of serendipity. I see two options:

1. Super-sweet molecules like aspartame are commonplace - there are plenty of molecules hundreds of times sweeter than sucrose, but we only know the few that were ingested by accident.
2. Super-sweet molecules are very rare, it's just that chemists accidentally taste a lot of chemicals. Entire chemistry departments routinely taste the entire space of possible molecules, but they don't notice unless the molecule has a strong taste.

To get an idea of how often chemists taste the chemicals they are working with, let's consider how often a molecule taken at random will taste sweet. That's equivalent to asking: how specific are our sweet taste receptors?

Low-hanging fruits

Why do we have sweet receptors in the first place? I thought that we craved sugars so much because of their energy content - if we eat plants that contain a lot of sugars, we can break the...
Jan 19, 2024 • 14min

LW - Logical Line-Of-Sight Makes Games Sequential or Loopy by StrivingForLegibility

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Logical Line-Of-Sight Makes Games Sequential or Loopy, published by StrivingForLegibility on January 19, 2024 on LessWrong. In the last post, we talked about strategic time and the strategic time loops studied in open-source game theory. In that context, agents have logical line-of-sight to each other and the situation they're both facing, which creates a two-way information flow at the time each is making their decision. In this post I'll describe how agents in one context can use this logical line-of-sight to condition their behavior on how they behave in other contexts. This in turn makes those contexts strategically sequential or loopy, in a way that a purely causal decision theory doesn't pick up on.

Sequential Games and Leverage

As an intuition pump, consider the following ordinary game: Alice and Bob are going to play a Prisoners' Dilemma, and then an Ultimatum game. My favorite framing of the Prisoners' Dilemma is by Nicky Case: each player stands in front of a machine which accepts a certain amount of money, e.g. $100.[1] Both players choose simultaneously whether to put some of their own money into the machine. If Alice places $100 into the machine in front of her, $200 comes out of Bob's machine, and vice versa. If a player withholds their money, nothing comes out of the other player's machine. We call these strategies Cooperate and Defect respectively. Since neither player can cause money to come out of their own machine, Causal Decision Theory (CDT) identifies Defect as a dominant strategy for both players. Dissatisfaction with this answer has motivated many to dig into the foundations of decision theory, and coming up with different conditions that enable Cooperation in the Prisoners' Dilemma has become a cottage industry for the field. I myself keep calling it the Prisoners' Dilemma (rather than the Prisoner's Dilemma) because I want to frame it as a dilemma they're facing together, where they can collaboratively implement mechanisms that incentivize mutual Cooperation. The mechanism I want to describe today is leverage: having something the other player wants, and giving it to them if and only if they do what you want. Suppose that the subsequent Ultimatum game is about how to split $1,000. After the Prisoners' Dilemma, a fair coin is flipped to determine Alice and Bob's roles in the Ultimatum game. The evaluator can employ probabilistic rejection to shape the incentives of the proposer, so that the proposer has the unique best-response of offering a fair split. (According to the evaluator's notion of fairness.) And both players might have common knowledge that "a fair split" depends on what both players did in the Prisoners' Dilemma. If Alice is the evaluator, and she Cooperated in the first round but Bob Defected, then she is $200 worse-off than if Bob had Cooperated, and she can demand that Bob compensate her for this loss. Similarly, if Alice is the proposer, she might offer Bob $500 if he Cooperated but $300 if he Defected. Since Bob only gained $100 compared to Cooperating, his best-response is to Cooperate if he believes Alice will follow this policy. And Bob can employ the same policy, stabilizing the socially optimal payoff of ($600, $600) as a Nash equilibrium where neither has an incentive to change their policy. 
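To make the leverage arithmetic concrete, here is a small sketch recomputing the payoffs above; the dollar figures come from the post, but the code and the `fair_split` helper are my own illustrative assumptions about how the conditional-fairness policy works:

```python
# Sketch: a $100-stake Prisoners' Dilemma followed by a $1,000 Ultimatum game
# whose "fair split" is conditioned on first-round behaviour. Dollar amounts
# are from the post; the code itself is an illustrative assumption.

STAKE, MULTIPLIER, POT, PENALTY = 100, 2, 1000, 200

def pd_payoffs(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    """Each Cooperator pays STAKE so that STAKE * MULTIPLIER comes out of
    the other player's machine."""
    a = (-STAKE if a_coop else 0) + (STAKE * MULTIPLIER if b_coop else 0)
    b = (-STAKE if b_coop else 0) + (STAKE * MULTIPLIER if a_coop else 0)
    return a, b

def fair_split(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    """Start from an even split and shift PENALTY toward a player who
    Cooperated against a Defector, compensating their first-round loss.
    Probabilistic rejection makes this the proposer's best offer, so the
    split is the same whichever role the coin assigns."""
    a_share = POT // 2
    if a_coop and not b_coop:
        a_share += PENALTY
    if b_coop and not a_coop:
        a_share -= PENALTY
    return a_share, POT - a_share

def total(a_coop: bool, b_coop: bool) -> tuple[int, int]:
    a_pd, b_pd = pd_payoffs(a_coop, b_coop)
    a_u, b_u = fair_split(a_coop, b_coop)
    return a_pd + a_u, b_pd + b_u

for a in (True, False):
    for b in (True, False):
        print(f"Alice {'C' if a else 'D'}, Bob {'C' if b else 'D'}: {total(a, b)}")
# Mutual Cooperation yields (600, 600); a lone Defector walks away with 500,
# so neither player gains by deviating from Cooperate.
```

Under this policy a lone Defector's $100 first-round gain is more than cancelled by the $200 shift in the Ultimatum game, which is exactly the leverage condition examined next.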
Crucially, this enforcement mechanism relies on each player having enough leverage in the subsequent game to incentivize Cooperation in the first round. If the Ultimatum game had been for stakes less than $200, this would be less than a Defector can obtain for themselves if the other player Cooperates. Knowing that neither can incentivize Cooperation, both players might fall back into mutual Defection.

Bets vs Unexploitability

Even if Alice knows she has enough leverage that she can incentivize Bob to Cooperate, she might be uncert...
Jan 19, 2024 • 5min

LW - Does literacy remove your ability to be a bard as good as Homer? by Adrià Garriga-alonso

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does literacy remove your ability to be a bard as good as Homer?, published by Adrià Garriga-alonso on January 19, 2024 on LessWrong. Epistemic status: probably we did lose the ability to memorize long songs due to improper practice, but it may be possible to enjoy the benefits of literacy and epic memory simultaneously. Thanks to Niels uit de Bos for better links and editing, and to Ryan Kidd for encouragement to post. You probably know that Socrates thought writing was terrible and would destroy people's ability to memorize things, because once things are written down they don't need to be memorized. I always thought that was a little ridiculous: maybe the effect was there and memorization would be less good, but not to a crazy extent. Well, Milman Parry and Albert Lord traveled to Yugoslavia in the 1930s-1950s and recorded performances of gusle-playing (guslar) bards. The greatest of them was Avdo Međedović. From Wikipedia: At Parry's request, Avdo sang songs he already knew and some songs he heard in front of Parry, convincing him that someone Homer-like could produce a poem so long. Avdo dictated, over five days, a version of the well-known theme The Wedding of Meho Smailagić that was 12,323 lines long, saying on the fifth day to Nikola (Parry's assistant on the journey) that he knew even longer songs. On another occasion, he sang over several days an epic of 13,331 lines. He said he had several others of similar length in his repertoire. In Parry's first tour, over 80,000 lines were transcribed. All of the bards recited incredibly long songs from memory and composed slightly new lyrics on the fly "at the rate of [10-20] ten-syllable lines a minute", and they could not all have been geniuses, because there were too many of them. Instead, they had a "special technique of composition": they were illiterate. From The Singer of Tales: [Albert] Lord sees the existence of literacy and written/printed texts as deadly - not to the songs themselves, but to the method of composition by which they are realised [which in the end amounts to the same thing] - schools, cities, and literacy eventually put [an end] to it in urban areas. "We must remember that the oral poet has no idea of a fixed model text to serve as his guide. He has models enough but they are not fixed and he has no idea of memorizing them in a fixed form. Every time he hears a song sung, it is different." Once the idea that there is a fixed text enters the bards' minds, they stop being able to compose new versions on the fly. Also, presumably, they can't remember the full 13-thousand-line epics because they won't be able to remember the exact text. Again from Wikipedia: in 1935 Lord asked Međedović to recall a song he had heard only once. For this, he asked another guslar, Mumin Vlahovljak of Plevlje, to sing his song "Bećiragić Meho", unknown to Međedović. After he heard the song of 2,294 lines, he sang it himself, but made it almost three times longer, 6,313 lines. I learned about this from a blog post by Sam Kriss, and I was struck enough to fact-check it. The extent to which the memory and abilities of illiterate folks can be better than those of literate folks is very surprising to me. It seems possible to me that literate people could replicate the feats of the guslar. 
But they'd have to hear the song many different times, sung somewhat differently by many different people, and resist the temptation to write it down to try to remember it as they learned. Lord's speculation on how to learn to be a bard: "We must remember that the oral poet has no idea of a fixed model text to serve as his guide. He has models enough but they are not fixed and he has no idea of memorizing them in a fixed form. Every time he hears a song sung, it is different." (p.22) "Sometimes there are published versions of songs in the background. [Named Infor...
Jan 19, 2024 • 11min

LW - The True Story of How GPT-2 Became Maximally Lewd by Writer

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The True Story of How GPT-2 Became Maximally Lewd, published by Writer on January 19, 2024 on LessWrong. This video recounts an incident that occurred at OpenAI in which flipping a single minus sign led the RLHF process to make GPT-2 only output sexually explicit continuations. The incident is described in OpenAI's paper "Fine-Tuning Language Models from Human Preferences" under section 4.4: "Bugs can optimize for bad behavior". The script has been written by Jai, with some significant input and rework by me, Writer. You can read it below. In 2019, one OpenAI researcher made a typo - and birthed an evil AI hell-bent on making everything as horny as possible. This is the absurd, ridiculous, and yet true story of how it happened.

Part I: GPT

Since 2017, OpenAI has been building Generative Pre-trained Transformer models, or GPTs - language AIs with a singular focus on predicting text, trained across billions of writing samples. If you prompt a GPT model with "Once upon a ", it would predict "time" to follow. Asked for further predictions, the same GPT model might continue "there was a… brave dog named Grace", and so on - because those are the kinds of words that it expects to come next. In this example the GPT model has essentially learned to write a fairy tale, simply as a consequence of getting very, very good at text prediction. And it was exactly these kinds of emergent capabilities that had OpenAI so excited. These models can do a lot more than fairy tales. OpenAI's first GPT model, often called GPT-1, had been trained on excerpts from thousands of books. It showed so much promise that OpenAI almost immediately decided to train a much bigger model that could do more. But bigger models need more training data, and for this model, books would not be enough. No - this model would be trained on... the Internet. OpenAI trained GPT-2 to imitate writing across eight million web pages. And in learning to predict such an overwhelming quantity and variety of writing, GPT-2 acquired some surprising capabilities. With the right prompt, it could translate documents, answer questions about a text, summarize passages, and sometimes even write like a human. It was a shockingly powerful model. In fact, it may have been too powerful. GPT-2 wouldn't hesitate to plan crimes, instruct terrorists on bomb-making, create sexually explicit content, or promote cruelty, hatred and misinformation. And this was unacceptable to OpenAI - they wanted a model that did more than just predict text; they wanted a model that operated in accordance with some kind of human values, or at least with their values. But the GPT-2 architecture had no place for ethics, guidelines, principles, or corporate PR policies. It couldn't be bullied, reasoned with, or negotiated with. Nothing would sway the machine from its utter devotion to generating realistic text. But OpenAI was determined to get their model under control. So they got to work... not yet realizing that this work, along with a single typo, would lead to the one thing they didn't want to happen.

Part II: Human Feedback

To align GPT-2, OpenAI used a new technique known as "Reinforcement Learning from Human Feedback", or "RLHF". We're going to outline a simplified form of RLHF here, but if you want all the juicy technical details check out the link in the description. 
The goal of RLHF is to take a basic starting language model, some plain-language guidelines, and a small group of humans providing feedback, and produce a new model that follows those guidelines. We can think of this model-in-training as the "Apprentice". The apprentice begins the training process as an exact copy of GPT-2. During training, it gets prompts and generates responses, also called "continuations". Those prompts and continuations are sent to the human evaluators, who rate them based o...
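The bug at the heart of this story is easy to state in code. Here is a toy, hedged sketch - not OpenAI's actual training code, and with a bare REINFORCE-style step standing in for the full RLHF machinery - showing how one flipped minus sign turns "make highly-rated text more likely" into "make the worst-rated text more likely":

```python
# Toy illustration of the sign-flip failure mode - not OpenAI's actual code.
import numpy as np

def policy_gradient_step(logp_grad: np.ndarray, reward: float, lr: float = 1e-2) -> np.ndarray:
    """REINFORCE-style update: nudge parameters so continuations with higher
    reward become more probable. Update = lr * reward * grad(log pi)."""
    return lr * reward * logp_grad

human_score = 0.9                 # evaluators rated this continuation highly
logp_grad = np.array([0.5, -0.2]) # toy gradient of the continuation's log-prob

intended = policy_gradient_step(logp_grad, human_score)
# The typo: a stray minus sign negates the reward, so every update pushes the
# model *toward* whatever the raters scored lowest - e.g. maximally explicit
# text, if that is what raters reliably down-rate.
buggy = policy_gradient_step(logp_grad, -human_score)

assert (buggy == -intended).all()  # the apprentice descends where it should ascend
```

Because the training loop keeps running either way, a bug like this doesn't crash anything - it silently optimizes for exactly the behavior the guidelines were meant to prevent.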
Jan 18, 2024 • 8min

EA - Good job opportunities for helping with the most important century by Holden Karnofsky

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Good job opportunities for helping with the most important century, published by Holden Karnofsky on January 18, 2024 on The Effective Altruism Forum. Yes, this is my first post in almost a year. I'm no longer prioritizing this blog, but I will still occasionally post something. I wrote ~2 years ago that it was hard to point to concrete opportunities to help the most important century go well. That's changing. There are a good number of jobs available now that are both really promising opportunities to help (in my opinion) and are suitable for people without a lot of pre-existing knowledge of AI risk (or even AI). The jobs are demanding, but unlike many of the job openings that existed a couple of years ago, they are at well-developed organizations and involve relatively clear goals. So if you're someone who wants to help, but has been waiting for the right moment, this might be it. (Or not! I'll probably keep making posts like this as the set of opportunities gets wider.) Here are the jobs that best fit this description right now, as far as I can tell. The rest of this post will give a bit more detail on how these jobs can help, what skills they require and why these are the ones I listed.

Organization | Location | Jobs | Link
UK AI Safety Institute | London (remote work possible within the UK) | Engineering and frontend roles, cybersecurity roles | Here
AAAS, Horizon Institute for Public Service, Tech Congress | Washington, DC | Fellowships serving as entry points into US policy roles | Here
AI companies: Google DeepMind, OpenAI, Anthropic[1] | San Francisco and London (with some other offices and remote work options) | Preparedness/Responsible Scaling roles; alignment research roles | Here, here, here, here
Model Evaluation and Threat Research (METR) (fewer roles available) | Berkeley (with remote work options) | Engineering and data roles | Here

Software engineering and development (and related areas) seem especially valuable right now, so think about whether you know folks with those skills who might be interested!

How these help

A lot of these jobs (and the ones I know the most about) would be contributing toward a possible global standards regime for AI: AI systems should be subject to testing to see whether they present major risks, and training/deploying AI should be stopped (e.g., by regulation) when it can't be done safely. The basic hope is:

Teams will develop "evals": tests of what AIs are capable of, particularly with respect to possible risks. For example, one eval might be prompting an AI to give a detailed description of how to build a bioweapon; the more detailed and accurate its response, the more risk the AI poses (while also possibly having more potential benefits as well, by virtue of being generally more knowledgeable/capable).

It will become common (through regulation, voluntary action by companies, industry standards, etc.) for cutting-edge AI systems to be subject to evals for dangerous capabilities.

When evals reveal risk, they will trigger required mitigations. For example: An AI capable of bioweapons development should be (a) deployed in such a way that people can't use it for that (including by "jailbreaking" it), and (b) kept under good security to stop would-be terrorists from circumventing the restrictions. 
AIs with stronger and more dangerous capabilities might require very challenging mitigations, possibly beyond what anyone knows how to do today (for example, rigorous demonstrations that an AI won't have dangerous unintended aims, even if this sort of thing is hard to measure). Ideally, we'd eventually build a robust international governance regime (comparisons have been made to nuclear non-proliferation regimes) that reliably enforces rules like these, while safe and beneficial AI goes forward. But my view is that even dramatically weaker setups can still help a lo...
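As a rough illustration of the evals-trigger-mitigations loop described above (every name, threshold, and grading function here is a hypothetical placeholder, not any organization's actual evaluation suite):

```python
# Hypothetical sketch of an "evals gate deployment" loop - placeholder logic,
# not any real lab's or institute's evaluation suite.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Eval:
    name: str
    prompt: str                    # elicitation prompt for a dangerous capability
    grade: Callable[[str], float]  # 0.0 = no capability shown .. 1.0 = full capability
    threshold: float               # above this, mitigations are required

def run_evals(model: Callable[[str], str], evals: list[Eval]) -> list[str]:
    """Return the names of evals whose graded capability exceeds the threshold."""
    return [ev.name for ev in evals if ev.grade(model(ev.prompt)) > ev.threshold]

def gate_deployment(model: Callable[[str], str], evals: list[Eval]) -> str:
    """Deploy only if no eval triggers; otherwise require mitigations first
    (e.g. refusals robust to jailbreaks, stronger security around the weights)."""
    triggered = run_evals(model, evals)
    if not triggered:
        return "deploy"
    return "hold: mitigations required for " + ", ".join(triggered)
```

The hard technical work lives in `grade` (building elicitation and grading that can't be fooled) and in verifying the mitigations, which is roughly where the engineering roles listed above come in.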
Jan 18, 2024 • 2min

EA - CEA is spinning out of Effective Ventures by Eli Nathan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: CEA is spinning out of Effective Ventures, published by Eli Nathan on January 18, 2024 on The Effective Altruism Forum. The Centre for Effective Altruism (CEA), currently a project of Effective Ventures Foundation UK and Effective Ventures US (known collectively as Effective Ventures or EV), is spinning out to become an independent organisation. As EV decentralises, we expect that bringing our operations in-house and establishing our own legal entities will better allow us to accomplish our mission and goals. We'd like to extend a deep thank you to the EV team for all their hard work in helping us to scale our programs, and in providing essential support and leadership over the last few years. Alongside our new CEO joining the team next month, this means that we're entering a new era for CEA. We're excited to build an operations team that can align more closely with our needs, as well as a governance structure that allows us to be independent and better matches our purpose. As EV's largest project and one with many complex and interwoven programs, we expect this spin-out process will take some time, likely between 12-24 months. This is because we'll need to set up new legal structures, hire new staff, manage visas and intellectual property, and handle various other items. We expect this spin-out to not affect the external operations of our core products, and generally not be particularly noticeable from the outside - EA Global and the EA Forum, for example, will continue to run as they would otherwise. We expect to start hiring for our new operations team over the coming months, and will have various job postings live soon - likely across finance, legal, staff support, and other areas. If you're potentially interested in these types of roles, you can fill out the expression of interest form here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
