

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Nov 7, 2023 • 58sec
LW - Vote on Interesting Disagreements by Ben Pace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Vote on Interesting Disagreements, published by Ben Pace on November 7, 2023 on LessWrong.
Do you have a question you'd like to see argued about? Would you like to indicate your position and discuss it with someone who disagrees?
Add poll options to the thread below to find questions with lots of interest and disagreement.
How to use the poll
Reacts: Click on the agree/disagree reacts to help people see how much disagreement there is on the topic.
Karma: Upvote positions that you'd like to read dialogues about.
New Poll Option: Add new positions for people to take sides on. Please add the agree/disagree reacts to new poll options you make.
The goal is to show people where a lot of interesting disagreement lies. This can be used to find discussion and dialogue topics in the future.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 7, 2023 • 5min
LW - Announcing Athena - Women in AI Alignment Research by Claire Short
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Athena - Women in AI Alignment Research, published by Claire Short on November 7, 2023 on LessWrong.
Athena is a new research mentorship program fostering diversity of ideas in AI safety research. We aim to get more women and marginalized genders into technical research and offer the support needed to thrive in this space.
Applications for scholars are open until December 3rd, 2023
Apply as a scholar: here
Apply as a mentor or speaker: here
Financial aid is available to cover travel expenses for the in-person retreat for those who would otherwise be unable to attend
Program Structure
A 2-month hybrid mentorship program for women looking to strengthen their research skills and network in technical AI safety research beginning in January 2024.
This includes a 1-week in-person retreat in Oxford, UK, followed by a 2-month remote mentorship by established researchers in the field, with networking and weekly research talks.
Athena aims to equip women with the knowledge, skills, and network they need to thrive in AI safety research. We believe that diversity is a strength, and hope to see this program as a stepping stone towards a more diverse and inclusive AI safety research field.
This program is designed to offer mentorship opportunities to technically qualified women who are early in their AI safety research careers or looking to transition into the field by connecting them with experienced mentors, resources to upskill, networking, and a supportive community.
Who should apply?
Women and people of other marginalized genders who have some research or technical industry experience and are interested in transitioning to AI Alignment research, or who have a bit of experience in the Alignment field but are looking for more support. We encourage those with a non-traditional background to apply and welcome interdisciplinary work in this field.
Application process
Submit the online application questions here
Complete an interview with the founder and one other AI safety researcher
Possible additional interviews with a mentor
Questions?
Email: claire@researchathena.org
Why are we doing this?
The current culture requires a shift to retain a diverse set of qualified researchers
Athena aims to increase the number of women pursuing careers in AI alignment, which is currently a male-dominated field with a very specific culture that can initially come across as unwelcoming to those who aren't traditionally represented here. Women may have different hurdles to cross than their male counterparts, such as implicit and explicit bias, different family and home obligations, unwanted approaches for romantic relationships by male colleagues, isolation, and a lack of representation.
We want to take steps to shift the current culture to one that values diversity and inclusivity through recruiting qualified women into the field through extended outreach, providing technical mentorship with an experienced researcher, creating a targeted support structure during the program, and continued role and grant placement support after the program. There are also opportunities for networking and collaboration within the larger research ecosystem.
Having diverse research teams and ideas is valuable for AI Alignment research
Research has consistently shown that diverse teams produce more innovative solutions. When we have a diverse group of people, including women, working on AI alignment, we are more likely to come up with comprehensive and holistic solutions that consider a wide range of perspectives and people. When more women participate in traditionally male-dominated fields like the sciences, the breadth of knowledge in that area usually grows; a surge in female involvement directly correlates with advancements in understanding [1]. Since there is a lack of women in this field, Athena aims to prepare women ...

Nov 7, 2023 • 2min
LW - AMA: Earning to Give by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Earning to Give, published by jefftk on November 7, 2023 on LessWrong. This week the Effective Altruism Forum is running an Effective Giving Spotlight, and they asked if I could post an Ask Me Anything (AMA) on my experience earning to give.
Some background:
I was earning to give from 2009 to 2022, except for a few months in 2017 when I worked on expanding access to the financial system in Ethiopia and looking into AI risk disagreements.
I've been a Giving What We Can member since 2013, making a public pledge to continue with effective giving.
For most of this time my wife and I were donating 50% of our pre-tax income, for a total of $2.1M. This has been about 50-50 between trying to help the EA community grow into the best version of itself and funding global poverty reduction (details, thoughts, more recent but still obsolete thoughts).
In 2016 I gave an EA Global talk (transcript) on earning to give, which gives more background on the idea and how I've been thinking about it.
That's a lot of links, and it's fine to ask questions even if you haven't read any of them! I'm happy to take questions on earning to give, or anything else within EA. Here are some example questions I'd be happy to answer if there's interest:
Where do individual donors earning to give have an advantage over foundations and funds?
How should you decide whether to use a fund?
How have I thought about how much to donate? How much is enough?
Why did I stop earning to give?
Why am I still donating some even though I'm funded by EA donors?
Feel free to comment on any platform, but if you're having trouble deciding then the EA Forum post is ideal. Comment via: the EA Forum
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 7, 2023 • 28min
EA - Varieties of minimalist moral views: Against absurd acts by Teo Ajantaival
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Varieties of minimalist moral views: Against absurd acts, published by Teo Ajantaival on November 7, 2023 on The Effective Altruism Forum.
(A standalone part of Minimalist Axiologies: Alternatives to 'Good Minus Bad' Views of Value.)
1. Introduction
What are minimalist views?
Minimalist views of value (axiologies) are evaluative views that define betterness solely in terms of the absence or reduction of independent bads. For instance, they might roughly say, "the less suffering, violence, and violation, the better". They reject the idea of weighing independent goods against these bads, as they deny that independent goods exist in the first place.
Minimalist moral views are views about how to act and be that include a minimalist view of value, instead of an
offsetting ('good minus bad') view of value. They reject the concept of independently positive moral value, such as positive virtue or pleasure that could independently counterbalance bads.[1]
Minimalist views are sometimes alleged - at least in their purely consequentialist versions - to recommend absurd acts in practice, such as murdering individuals to prevent their suffering, or opposing life-extending interventions lest we prolong suffering. My aim in this essay is to broadly outline the various reasons why the most plausible and well-construed versions of minimalist moral views - including their purely consequentialist versions - do not recommend such acts.
Sequence recap
For context, below is a chronological recap of the present series on minimalist views so far.
1. "
Positive roles of life and experience in suffering-focused ethics":
Even if we assume a purely suffering-focused view, it's wise to recognize the highly positive and often necessary roles that various other things may have for the overall goal of reducing suffering.
These include the positive roles of autonomy, cooperation, nonviolence, as well as our personal wellbeing and valuable skills and experiences.
Suffering-focused moral views may value these things for different reasons than other moral views do, but not necessarily any less.
2. "
Minimalist axiologies and positive lives":
Minimalist axiologies define goodness in entirely relational or 'instrumental' terms, namely in terms of the minimization of bads such as suffering.
These views avoid many problems in population ethics, yet the minimalist notion of (relationally) positive value is entirely excluded by the standard, restrictive assumption of treating lives as isolated value-containers.
Minimalist views become more intuitive when we adopt a relational view of the overall value of individual lives, that is, when we don't track only the causally isolated "contents" of these lives, but also their (often far more significant) causal roles.
3. "
Peacefulness, nonviolence, and experientialist minimalism":
For purely experience-focused and consequentialist versions of minimalist views, an ideal world would be any perfectly peaceful world, including an empty world.
When it comes to theoretical implications about the cessation and replacement of worlds, one can reasonably argue that offsetting ('good minus bad') views have worse implications than do minimalist views.
Zooming out from unrealistic thought experiments, it's crucial to be mindful of the gap between theory and practice, of the pitfalls of misconceived consequentialism, and of how minimalist consequentialists have strong practical reasons to pursue a nonviolent approach and to cooperate with people who hold different values.
4. "
Minimalist extended very repugnant conclusions are the least repugnant":
It has been argued that certain "repugnant conclusions" are an inevitable feature of any plausible axiology.
Yet based on a 'side-by-side' comparison of different views, it appears that offsetting views share all the most "repugnant" ...

Nov 7, 2023 • 12min
EA - AI Alignment Research Engineer Accelerator (ARENA): call for applicants by TheMcDouglas
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Alignment Research Engineer Accelerator (ARENA): call for applicants, published by TheMcDouglas on November 7, 2023 on The Effective Altruism Forum.
TL;DR
Apply here for the third iteration of ARENA (Jan 8th - Feb 2nd)!
Introduction
We are excited to announce the third iteration of ARENA (Alignment Research Engineer Accelerator), a 4-week ML bootcamp with a focus on AI safety. Our mission is to prepare participants for full-time careers as research engineers in AI safety, e.g. at leading organizations or as independent researchers.
The program will run from January 8th - February 2nd 2024[1], and will be held at the offices of the London Initiative for Safe AI. These offices are also being used by several safety orgs (BlueDot, Apollo, Leap Labs), as well as the current London MATS cohort, and several independent researchers. We expect this to bring several benefits, e.g. facilitating productive discussions about AI safety & different agendas, and allowing participants to form a better picture of what working on AI safety can look like in practice.
ARENA offers a unique opportunity for those interested in AI safety to learn valuable technical skills, work on their own projects, and make open-source contributions to AI safety-related libraries. The program is comparable to MLAB or WMLB, but extends over a longer period to facilitate deeper dives into the content, and more open-ended project work with supervision.
For more information, see our website.
Outline of Content
The 4-week program will be structured as follows:
Chapter 0 - Fundamentals
Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. We will also cover some subjects we expect to be useful going forwards, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.
Note - participants can opt not to attend the program during this week and instead join us at the start of Chapter 1, if they'd prefer this option and if we're confident that they are already comfortable with the material in this chapter.
Topics include:
PyTorch basics
CNNs, Residual Neural Networks
Optimization (SGD, Adam, etc)
Backpropagation
Hyperparameter search with Weights and Biases
GANs & VAEsDuration: 5 days
Chapter 1 - Transformers & Interpretability
In this chapter, you will learn all about transformers, and build and train your own. You'll also study LLM interpretability, a field which has been advanced by Anthropic's Transformer Circuits sequence, and open-source work by Neel Nanda. This chapter will also branch into areas more accurately classed as "model internals" than interpretability, e.g. recent work on steering vectors.
Topics include:
GPT models (building your own GPT-2)
Training and sampling from transformers
TransformerLens
In-context Learning and Induction Heads
Indirect Object Identification
Superposition
Steering Vectors
Duration: 5 days
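As an illustration of the tooling mentioned above (an assumed example, not ARENA's course code), TransformerLens lets you load GPT-2 and cache its internal activations for interpretability work:

from transformer_lens import HookedTransformer

# Load GPT-2 small and run a prompt while caching all intermediate activations.
model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache("When Mary and John went to the store, John gave a drink to")
# `cache` holds attention patterns and residual-stream activations per layer,
# the kind of internals that induction-head or IOI analyses inspect.
print(logits.shape)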
Chapter 2 - Reinforcement Learning
In this chapter, you will learn about some of the fundamentals of RL, and work with OpenAI's Gym environment to run your own experiments.
Topics include:
Fundamentals of RL
Vanilla Policy Gradient
Proximal Policy Optimization (PPO)
RLHF (& finetuning LLMs with RLHF)
Gym & Gymnasium environments
Duration: 5 days
Chapter 3 - Paper Replications
We will conclude this program with paper replications, where participants will get guidance and mentorship while they replicate a paper containing material relevant to this course. This should draw on much of the knowledge and many of the skills participants will have accumulated over the last 3 weeks.
Duration: 5 days
Below is a diagram of the curriculum as a whole, and the dependencies between sections. Note that this may change slightly in the lead-up to the program. Here is som...

Nov 7, 2023 • 4min
AF - Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation, published by Soroush Pour on November 7, 2023 on The AI Alignment Forum. Paper coauthors: Rusheb Shah, Quentin Feuillade--Montixi, Soroush J. Pour, Arush Tagade, Stephen Casper, Javier Rando.
Motivation
Our research team was motivated to show that state-of-the-art (SOTA) LLMs like GPT-4 and Claude 2 are not robust to misuse risk and can't be fully aligned to the desires of their creators, posing risk for societal harm. This is despite significant effort by their creators, showing that the current paradigm of pre-training, SFT, and RLHF is not adequate for model robustness.
We also wanted to explore & share findings around "persona modulation"[1], a technique where the character-impersonation strengths of LLMs are used to steer them in powerful ways.
Summary
We introduce an automated, low-cost way to make transferable, black-box, plain-English jailbreaks for GPT-4, Claude 2, and fine-tuned Llama models. We elicit a variety of harmful text, including instructions for making meth & bombs.
The key is *persona modulation*. We steer the model into adopting a specific personality that will comply with harmful instructions.
We introduce a way to automate jailbreaks by using one jailbroken model as an assistant for creating new jailbreaks for specific harmful behaviors. Our method takes less than $2 and 10 minutes to develop 15 jailbreak attacks.
Meanwhile, a human-in-the-loop can efficiently make these jailbreaks stronger with minor tweaks. We use this semi-automated approach to quickly get instructions from GPT-4 about how to synthesise meth.
Abstract
Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour. In this work, we investigate persona modulation as a black-box jailbreaking method to steer a target model to take on personalities that are willing to comply with harmful instructions. Rather than manually crafting prompts for each persona, we automate the generation of jailbreaks using a language model assistant. We demonstrate a range
of harmful completions made possible by persona modulation, including detailed instructions for synthesising methamphetamine, building a bomb, and laundering money. These automated attacks achieve a harmful completion rate of 42.5% in GPT-4, which is 185 times larger than before modulation (0.23%). These prompts also transfer to Claude 2 and Vicuna with harmful completion rates of 61.0% and 35.9%, respectively. Our work reveals yet another vulnerability in commercial large language models and
highlights the need for more comprehensive safeguards.
Full paper
You can find the full paper on arXiv: https://arxiv.org/abs/2311.03348.
Safety and disclosure
We have notified the companies whose models we attacked.
We did not release prompts or full attack details.
We are happy to collaborate with researchers working on related safety work - please reach out via correspondence emails in the paper.
Acknowledgements
Thank you to Alexander Pan and Jason Hoelscher-Obermaier for feedback on early drafts of our paper.
[1] Credit goes to @Quentin FEUILLADE--MONTIXI for developing the model psychology and prompt engineering techniques that underlie persona modulation. Our research built upon these techniques to automate and scale them as a red-teaming method for jailbreaks.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Nov 7, 2023 • 49min
LW - On the UK Summit by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the UK Summit, published by Zvi on November 7, 2023 on LessWrong.
In the eyes of many, Biden's Executive Order somewhat overshadowed the UK Summit. The timing was unfortunate. Both events were important milestones. Now that I have had time, here is my analysis of what happened at the UK Summit.
As is often the case with such events, there was a lot of talk relative to the amount of action. There was a lot of diplomatic talk, talk of that which everyone agrees upon, relative to the amount of talk of real substance. There were days of meetings that resulted in rather unspicy summaries and resolutions. The language around issues that matter most was softened, the actual mission in danger of being compromised.
And as usual, the net result was reason for optimism, a net highly positive event versus not having it, while also in some ways being disappointing when compared to what might have been. A declaration was signed including by China, but it neglected existential risk. Sunak's words on AI were not as strong as his words have been previously.
We got promises for two additional summits, in South Korea and France. Given that, I am willing to declare this a success.
One area of strong substance was the push for major AI labs to give substantive safety policies addressing a variety of issues, sometimes largely called Responsible Scaling Policies (RSPs). The biggest labs all did so, even Meta. Now we can examine their responses, know who is being how responsible, and push for better in the future or for government action to fix issues or enshrine progress. This was an excellent development.
This post will look at the rest of what happened at the Summit. I will be writing about the RSPs and other safety policies of the labs in a distinct post next week.
Looking Back at People's Goals for the Summit and Taskforce
Jack Clark's proposal from July 5 for what the Foundation Model taskforce might do to evaluate frontier models as its priority, and how it might prioritize that; Simeon's response emphasizing the need for a good way to know whether a proposal is safe enough to allow it to proceed.
Navigating AI Risks asked on July 17 what the taskforce should do, advising focus on interventions to impact policy at labs and other governments. Suggested focus was risk assessment methodology, demonstrating current risks and assessing current state-of-the-art models, and to avoid direct alignment work.
Lennart Heim's (GovAI) July 10 proposal of what the summit should try to accomplish, which he reviewed after the summit.
Matt Clifford from the PM's office shared on September 10 their objectives for the summit: a shared understanding of the risks posed by frontier AI and the need for action, a forward process for international collaboration, measures for organizations, finding areas for safety collaboration, and showcasing how safe AI development can enhance global good.
AI Safety Summit Agenda
What has the UK Taskforce been up to in advance of the summit (report)?
Ian Hogarth (Chair UK AI Frontier Model Taskforce): The Taskforce is a start-up inside government, delivering on the mission given to us by the Prime Minister: to build an AI research team that can evaluate risks at the frontier of AI. We are now 18 weeks old and this is our second progress report.
The frontier is moving very fast. On the current course, in the first half of 2024, we expect a small handful of companies to finish training models that could produce another significant jump in capabilities beyond state-of-the-art in 2023.
As these AI systems become more capable they may augment risks. An AI system that advances towards expert ability at writing software could increase cybersecurity threats. An AI system that becomes more capable at modelling biology could escalate biosecurity threats.
We believe it is critical that f...

Nov 7, 2023 • 22min
EA - Numerical Breakdown of 47 1-on-1s as an EAG First-Timer (Going All Out Strategy) by Harry Luk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Numerical Breakdown of 47 1-on-1s as an EAG First-Timer (Going All Out Strategy), published by Harry Luk on November 7, 2023 on The Effective Altruism Forum.
tl;dr
Just attended my first ever EA Global conference (EAG Boston last week) and I have nothing but positive things to say.
In total, I had about 47 one-on-one conversations depending on how you count the informal 1:1s (43 scheduled via SwapCard, while the other noteworthy conversations happened at meetups, the organization fair, office hours and unofficial satellite events).
I came into the conference with an open mind, wanting to talk to others who are smarter than me, more experienced than me, and experts in their own domain. I invited redteaming of our nonprofit StakeOut.AI's mission/TOC, and gathered both positive and negative feedback throughout EAG. I came out of the conference with new connections, a refined strategy for our nonprofit startup going forward and lots of resources.
I am so grateful for everyone that met with me (as I'm a small potato who at many times felt out of his depth during EAG, and likely one of the most junior EAs attending). I thank all the organizers, volunteers, helpers, speakers and attendees who made the event a huge success.
The post below goes over The Preparation, the Statistics and Breakdown, why consider going all out at an EAG, 12 Practical Tips for Doing 30+ 1:1s and potential future improvements.
The Preparation
To be honest, as a first-time attendee, I really didn't know what to expect nor how to prepare for the conference.
I had heard good things and was recommended to go by fellow EAs, but I had my reservations.
Luckily, an email titled "Join us for an EAG first-timers online workshop!" by the EA Global Team came to the rescue.
Long story short, I highly recommend anyone new to EAG to attend the online workshop prior to the conference if you want to make your first EAG a success.
A few highlights I will note here:
Watch this presentation from 2022's San Francisco EAG that outlines how you can get the most out of the event
Take your time and fill out this EA Conference: Planning Worksheet for a step-by-step guide on conference planning, including setting your EAG goals and expectations
Also fill out the career planning worksheet (if relevant): EA Conference: Career Plan
Requesting 1:1s Pre-conference
I was quite hesitant at first about introducing myself on SwapCard and trying to schedule 1:1s. This all changed after watching the presentation and attending the "Join us for an EAG first-timers online workshop!" virtual event.
Something that was repeated over and over again in this presentation, the online workshop, and conversations with others is the value of the 1:1s.
People told me most sessions will be recorded and hence can be watched later, but the 1:1s are where the true value is at EAG. After hearing it from so many people, I made 1:1s a core part of my conference planning and did not regret it.
As I'm writing this after the conference, I can see why 1:1s are said to be the true value of EAG. I estimate that 80% (maybe even closer to 90%; I'll know better after I sort through the notes) of the 1:1 conversations I had were beneficial and had a positive impact on either me or the direction of our nonprofit, StakeOut.AI.
How Many 1:1s?
In terms of how many 1:1s, here is the range I gathered from different sources:
Attendees will typically have four to ten 1:1s
Getting to 20 1:1s is a great number
Having 30 1:1s is amazing but very tiring
Someone reached 35 1:1s once, and that was insane
Since I wanted to maximize my EAG experience, I set the goal of 30 and started reaching out via SwapCard one week before the conference.
Reach Out Early
The main reason for starting early is because everyone is busy at the conferences, and everyone is trying to optimize their sch...

Nov 7, 2023 • 17min
EA - How Rethink Priorities is Addressing Risk and Uncertainty by Marcus A Davis
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How Rethink Priorities is Addressing Risk and Uncertainty, published by Marcus A Davis on November 7, 2023 on The Effective Altruism Forum.
This post is part of Rethink Priorities' Worldview Investigations Team's CURVE Sequence: "Causes and Uncertainty: Rethinking Value in Expectation." The aim of this sequence is twofold: first, to consider alternatives to expected value maximization for cause prioritization; second, to evaluate the claim that a commitment to expected value maximization robustly supports the conclusion that we ought to prioritize existential risk mitigation over all else.
Introduction
RP has committed itself to doing good. Given the limits of our knowledge and abilities, we won't do this perfectly but we can do this in a principled manner. There are better and worse ways to work toward our goal. In this post, we discuss some of the practical steps that we're taking to navigate uncertainty, improve our reasoning transparency, and make better decisions. In particular, we want to flag the value of three changes we intend to make:
Incorporating multiple decision theories into Rethink Priorities' modeling
More rigorously quantifying the value of different courses of action
Adopting transparent decision-making processes
Using Multiple Decision Theories
Decision theories are frameworks that help us evaluate and make choices under uncertainty about how to act.[1] Should you work on something that has a 20% chance of success and a pretty good outcome if success is achieved, or work on something that has a 90% chance of success but only a weakly positive outcome if achieved? Expected value theory is the typical choice to answer that type of question. It calculates the expected value (EV) of each action by multiplying the value of each possible outcome by its probability and summing the results, recommending the action with the highest expected value. But because low probabilities can always be offset by corresponding increases in the value of outcomes, traditional expected value theory is vulnerable to the charge of fanaticism, "risking arbitrarily great gains at arbitrarily long odds for the sake of enormous potential" (Beckstead and Thomas, 2021). Put differently, it seems to recommend spending all of our efforts on actions that, predictably, won't achieve our ends.
Alternative decision theories have significant drawbacks of their own, giving up one plausible axiom or another. The simple alternative is expected value maximization but with very small probabilities rounded down to zero. This gives up the axiom of continuity, which says that for propositions A, B, and C (with A preferred to B and B preferred to C), there exists some probability that would make you indifferent between B and a probabilistic combination of A and C. This violation causes some weird outcomes where, say, believing the chance of something is 1 in 100,000,000,000 can mean an action gets no weight, but believing it's 1.0000001 in 100,000,000,000 means that the option dominates your considerations if the expected value upon success is high enough, which is a kind of attenuated fanaticism. There are also other problems, like setting the threshold for where you should round down.[2]
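To make the contrast concrete, here is a toy illustration in Python of plain EV maximization and the rounding-down variant discussed above; the payoff values and the threshold are made up purely for illustration, not drawn from Rethink Priorities' models.

# 20% chance of a pretty good outcome vs. 90% chance of a weakly positive outcome
# (payoff values are assumed for illustration only).
p_risky, v_risky = 0.20, 1000
p_safe, v_safe = 0.90, 100
ev_risky = p_risky * v_risky   # 200
ev_safe = p_safe * v_safe      # 90 -> plain EV maximization picks the risky option

# Rounding tiny probabilities to zero introduces a discontinuity at the threshold:
threshold = 1e-11
def rounded_ev(p, v):
    return 0.0 if p < threshold else p * v
# A probability just below the threshold gets no weight at all, while one just
# above it can dominate everything if the payoff is large enough.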
Alternatively, you could go with a procedure like weighted-linear utility theory (WLU) (Bottomley and Williamson, 2023), but that gives up the principle of homotheticity, which involves indifference to mixing a given set of options with the worst possible outcome. Or you could go with a version of risk-weighted expected utility (REU) (Buchak, 2013) and give up the axiom of betweenness, which suggests the order in which you are presented information shouldn't alter your conclusions.[3]
It's very unclear to us, for example, that giving up continuity is preferable to giving up homotheticity, and neither REU nor WLU really logically eliminates issues w...

Nov 7, 2023 • 31min
EA - Incorporating and visualizing uncertainty in cost effectiveness analyses: A walkthrough using GiveWell's estimates for StrongMinds by Jamie Elsey
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Incorporating and visualizing uncertainty in cost effectiveness analyses: A walkthrough using GiveWell's estimates for StrongMinds, published by Jamie Elsey on November 7, 2023 on The Effective Altruism Forum.
A common first step towards incorporating uncertainty into a cost effectiveness analysis (CEA) is to express not just a point estimate (i.e., a single number) for an input to the CEA, but to provide some indicator of uncertainty around that estimate. This might be termed an optimistic vs. pessimistic scenario, or as the lower and upper bounds of some confidence or uncertainty interval. A CEA is then performed by combining all of the optimistic inputs to create an optimistic final output, and all of the pessimistic inputs to create a final pessimistic output. I refer to this as an 'interval-based approach'. This can be contrasted with a fuller 'probabilistic approach', in which uncertainty is defined through the use of probabilistic distributions of values, which represent the range of possibilities we believe the different inputs can take. While many people know that a probabilistic approach circumvents shortcomings of an interval-based approach, they may not know where to even begin in terms of what different distributions are possible, or the kinds of values they denote.
I hope to address this in the current post and the accompanying application. Concretely, I aim to:
Show how performing a CEA just using an interval-based approach can lead to a substantial overestimation of the uncertainty implied by one's initial inputs, and how using a probabilistic approach can correct this while also enabling additional insights and assessments
Introduce a new tool I have developed - called Distributr - that allows users to get more familiar and comfortable with a range of different distributions and the kinds of values they imply
Use this tool to help generate a probabilistic approximation of the inputs GiveWell used in their assessment of StrongMinds,[1] and perform a fuller probabilistic assessment based upon these inputs
Show how this can be done without needing to code, using Distributr and a simple spreadsheet
I ultimately hope to help the reader to feel more capable and confident in the possibility of incorporating uncertainty into their own cost effectiveness analyses.
Propagating uncertainty and the value of moving beyond an interval-based approach
Cost effectiveness analysis involves coming up with a model of how various different factors come together to determine both how effective some intervention is, and the costs of its delivery. For example, when we think about distributing bed nets for malaria prevention, we might consider how the cost of delivery can vary across different regions, how the effects of bed net delivery will depend on the likelihood that people use the bed nets for their intended purpose, and the probability that
recipients will install the bed nets properly. These and other factors all come together to produce an estimate of the cost effectiveness of an intervention, which will depend on the values we ascribe to the various inputs.
One way that a researcher might seek to express uncertainty in these inputs is by placing reasonable upper and lower bounds on their estimates for each of them. The researcher might then seek to propagate this uncertainty in the inputs into the anticipated uncertainty in the outputs by performing the same cost effectiveness calculations as they did on their point estimates, but now on their upper bounds and on their lower bounds, thereby producing corresponding upper and lower bounds on the final cost effectiveness.
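As a concrete sketch of the difference (with entirely made-up numbers, not GiveWell's or the author's inputs), an interval-based propagation and a simple Monte Carlo propagation might look like this:

import numpy as np

# Interval-based: combine all pessimistic inputs, then all optimistic inputs.
cost_low, cost_high = 4.0, 6.0   # assumed cost per bed net delivered ($)
use_low, use_high = 0.5, 0.9     # assumed probability a net is used properly
pessimistic = use_low / cost_high
optimistic = use_high / cost_low

# Probabilistic: draw each input from a distribution and propagate the samples.
rng = np.random.default_rng(0)
n = 100_000
cost = rng.normal(5.0, 0.5, n)   # assumed distribution centered on $5
use = rng.beta(8, 3, n)          # assumed distribution for proper-usage rate
effect_per_dollar = use / cost
print(np.percentile(effect_per_dollar, [5, 50, 95]))   # uncertainty in the output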
An example of an interval-based approach is GiveWell's assessment of Happier Lives Institute's CEA for StrongMinds. The purpose of this post is not to provide an independent evaluation of StrongMinds, nor i...


