The Nonlinear Library

The Nonlinear Fund

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

Episodes

Mentioned books

Feb 24, 2024 • 17min

LW - Choosing My Quest (Part 2 of "The Sense Of Physical Necessity") by LoganStrohl

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Choosing My Quest (Part 2 of "The Sense Of Physical Necessity"), published by LoganStrohl on February 24, 2024 on LessWrong. This is the second post in a sequence that demonstrates a complete naturalist study, specifically a study of query hugging (sort of), as described in The Nuts and Bolts of Naturalism. This one demos phase zero, all the preparation that's often needed before you can really get to work. It corresponds to the how-to posts "Getting Started With Naturalism" and "Catching the Spark". For context on this sequence, see the intro post. The Dead Words Of Others At the outset of any naturalist study, original seeing and curiosity are paramount. If they're already present - and they aren't crowded out by other concerns, such as a desperation to solve your problem as quickly as possible - then you can dive right in. Otherwise, some deliberate cultivation is needed. Where did I stand with original seeing and curiosity, at the beginning of this study? I was pretty low on both. There was this whole coherent concept, "hug the query", handed to me from the outside by a clear and well-written essay that did not leave me feeling confused. I could tell there was something in there that I wanted to engage with, somehow; but for the most part, my understanding was relatively inert. If I wanted to transform that seed of interest into a study that was live, growing, and really mine, it was going to take some work. As I said in the introduction, I had to forget what I already knew so I could see it all again, this time entirely for myself. Methodological Note There is a skillset that I call "making fake things real". I'm not sure that's a good name for it; it's just what I call it inside my own head. Imagine you're in middle school, and you've been assigned a group project. You and the three other people at your table have to make a poster about the Ottoman Empire. Does this project matter? No. Of course it doesn't. I mean sure, maybe we could argue a little bit for the value of knowing history in order to predict the future, or developing social skills, or learning endurance and tenacity in the face of the pointless tedium you will inevitably face in your future nine to five. I even hear that graphic design is still a marketable skill (for now). But let's be real. The reason you have to make a poster about the Ottoman Empire is that your teacher has a list of topics the state requires her to cover with you, and she has to fill your time somehow. She probably does not care about the Ottoman Empire any more than you do. She's just keeping you busy until the bell rings. It seems to me that in this situation, you have three kinds of strategies to choose from. 1. FakeFake 2. FakeFuck Off 3. FakeReal FakeFake: In one type of strategy, you accept the fake thing, and you do something fake with it. This might mean reluctantly, grudgingly participating in the project, dragging your feet and putting in the bare minimum, but ultimately fulfilling the requirements as stated. You got a bullshit assignment, you made a bullshit poster, nothing matters and nobody cares. Or, it might mean roleplaying a model student, making a beautiful poster full of Interesting Facts, and thereby ensuring that your streak of straight As is not interrupted. That is a different kind of bullshit, and in a way it's worse: Nothing matters, nobody cares, and nobody notices. FakeFuck Off: In the second category of options, you reject the fake thing entirely. You do not make the poster at all. You boycott. I took this option a lot in school myself: I refused to do homework, refused to take timed tests, refused to let adults who were dumber than me determine how I spent my time and attention. They thought I had ADD, but in fact I had integrity. (Also autism.) There's something beautiful in the boycotting approa...

Feb 24, 2024 • 23min

LW - Rationality Research Report: Towards 10x OODA Looping? by Raemon

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rationality Research Report: Towards 10x OODA Looping?, published by Raemon on February 24, 2024 on LessWrong. 6 months ago I wrote Feedbackloop-first Rationality. I didn't followup on it for awhile (except for sporadic Deliberate ("Purposeful?") Practice Club). I just spent 6 weeks actually exploring "how would I build my own cognition training program?". In the process of doing so, I've iterated a bunch. I'm still in an orienting phase, but it seemed worth writing down the current stage of my thoughts. What's my goal? A rough overview: I want to get more, higher quality "X-risk thinker hours" hours. This includes AI alignment technical research, AI macrostrategy research, policy, governance, as well as people (such as Lightcone team) deciding which infrastructure to build, I'm particularly interested in getting more "serial research", as opposed to more "parallel research." We can throw more researchers at a problem, but if there are some problems that require one person to synthesize 10+ years of experience, all the parallel research won't help. An obvious way to improve researcher hours is "via mentorship", but I think there is a mentorship bottleneck. So, I'm interested in strategies that train tacit cognitive skills that either don't require mentorship, or leveraging expertise from outside the current x-risk ecosystem. This is all parented under the higher level goal of "contribute meaningfully to x-risk reduction", but it feels relevant/meaty enough to be worth running at this goal for awhile. "Rationality for the sake of existential risk" A part of me romantically wants to pursue "rationality training for rationality training's sake." Alas, the world is big and my time is limited and I just can't actually justify putting years of effort into something, if I didn't think it would help with x-risk. CFAR went through a phase where (some leaders) framed things as: "Rationality, for the sake of rationality, for the sake of existential risk." i.e. try to earnestly build something rationality-focused for it's own sake, because that seemed both healthier and better for x-risk than "rationality for the sake of x-risk", directly. I think this was a reasonable thing to try, but my impression is this didn't work that well. If you tell yourself (and your students) "I'm doing this for the sake of rationality itself", but then in practice you're getting people to delicately open up their soul and figure out their true goals... and all-the-while radiating "man I really hope your goals turn out to involve saving the worlds from AIs", that may fuck up the "earnestly try to figure out your goals" process. So: I am not here to help you earnestly figure out your goals. That's an important part of rationality, and it might come about incidentally while people do exercises I develop, but it's not what I'm focused on this year. I am here to develop and teach cognitive skills, which help you solve confusing problems at the edge of your ability. I'm doing this to push forward humanity's frontier of "how quickly can we do challenging research?", and strive towards 10x science. I will prioritize learning and teaching those skills to people who seem like they are going to help with x-risk somehow, but I aim to write up a lot of stuff publicly, and trying-where-possible to output exercises that other people can do on their own, for whatever reasons they want. (See Exercise: Solve "Thinking Physics" as an example) The Story So Far Feedback-loops and "deliberate practice", vs "Just Clicking" I just spent a month workshopping various "teaching rationality" plans. My initial ideas were framed around: Deliberate practice is costly and kinda sucks Therefore, people haven't invested in it much, as either "rationality training programs", or as an "alignment research training programs." Therefore,...

Feb 24, 2024 • 2min

LW - Balancing Games by jefftk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Balancing Games, published by jefftk on February 24, 2024 on LessWrong. When I play an N-player game I want everyone to both: Try to win Win about 1/N of the time With many games and groups of participants these are in conflict: if I play bridge against my kids I'm going to win all the time, but I'm not very good at the game so if I play against people who are serious about it I'm going to lose ~all the time. One way some games handle this is by including a lot of luck. The more random the outcomes are, the more you'll approach 1/N regardless of player skill. Kid games where you make no choices, like Candyland or War, take this to the extreme. Instead, I think handicapping is a much better approach. For example in Go the weaker player can start with several stones already on the board, which gives them an advantage while still keeping it interesting and without turning it into a different-feeling game. When I was little and playing Go with my dad I remember slowly reducing the number of handicaps I needed over months, which was really rewarding: each game was fun and challenging, and I could see my progress. Other examples: In Dominion, changing the ratio of coppers to estates that each player starts with. In Settlers of Catan, allowing weaker players to place both of their settlements before stronger ones. In Power Grid, Monopoly, Modern Art, or anything else financial, letting weaker players start with more money. In Ticket to Ride, Thurn und Taxis, Settlers of Catan, or anything else with resource cards, letting weaker players start with more cards. I like it when games are designed in a way that makes this kind of adjustment easy and granular. You can calibrate by removing a handicap after the weaker player wins some number of games in a row (I think three is about right though it depends on granularity) and vice versa. I'm curious, though: why isn't this more common? It's very normal in Go, mostly of historical interest in chess, and in most game cultures I'm around it seems like the expectation is just that weaker players will just lose a lot or or stronger players will "go easy" on them? Is it that acknowleging that some players are stronger than others is awkward? Too hard to calculate for games with more than two players? Comment via: facebook, mastodon Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 24, 2024 • 3min

EA - UPDATE: Critical Failures in the World Happiness Report's Model of National Satisfaction by Alexander Loewi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: UPDATE: Critical Failures in the World Happiness Report's Model of National Satisfaction, published by Alexander Loewi on February 24, 2024 on The Effective Altruism Forum. There has been a substantial update since this was first posted. The lead authors of the World Happiness Report have now reached out to me directly. Details at the bottom. The World Happiness Report (WHR) is currently the best known and most widely accepted source of information on global life satisfaction. Its six-variable model of satisfaction is reproduced in textbooks and has been published in the same form by the United Nations since 2012. However, almost no justification is given for why these six variables are used. In response, I attempted to do the only thing I thought was responsible -- do an exhaustive search over 5,000 variables in international datasets, and see empirically what actually predicts satisfaction. I've consulted with life satisfaction specialists both in economics departments and major think tanks, and none thought this had been done before. The variables that are selected by this more rigorous method are dramatically different from those of the WHR, and the resulting model is substantially more accurate both in and out of sample. In particular, the WHR leaves out entire categories of variables on subjects as varied, and as basic, as education, discrimination, and political power. Perhaps most dramatically, the way the WHR presents the data appears to suggest that GDP explains 40% of model variation. I find that, with my measurably more accurate model, it in fact predicts 2.5%. The graph below ranks the model variables by contribution, which is the amount of the total satisfaction of a country they are estimated to predict. For interpretation, 1.5 points of satisfaction on the 11-point scale is equivalent to the grief felt at the death of a life partner, meaning these numbers may be numerically small, but they are enormously significant behaviorally. All variables included here were chosen by a penalized regression out of a total of 1,058 viable candidates, after 5,500 variables were examined. (Most were too missing to be used and trusted.) They are colored by significance, with even least still marginally significant, and almost all significant at the 0.001 level. I have already gotten extremely positive feedback from academic circles, and have started looking for communities of practice that would find this valuable. A link to the paper is below: https://dovecoteinstitute.org/files/Loewi-Life-Satisfaction-2024.pdf UPDATE: The lead authors of the World Happiness Report have now reached out to me directly. Already this is a shock, as I had no idea if my findings would even be taken seriously. The authors suggested changes to my methods, and I have spent the last few weeks incorporating their suggestions, during which I thought it was only responsible to take the post down. However having now taken their recommendations into account -- I find the results are in every meaningful way identical, and in fact now substantially reinforced. The post and paper have been updated to reflect what changes there were. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 24, 2024 • 9min

LW - The Sense Of Physical Necessity: A Naturalism Demo (Introduction) by LoganStrohl

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Sense Of Physical Necessity: A Naturalism Demo (Introduction), published by LoganStrohl on February 24, 2024 on LessWrong. Note on genre: This sequence is a demonstration of a complete naturalist study, as described in Intro to Naturalism and The Nuts and Bolts Of Naturalism. I think of naturalism demos as reference material. I've tried to make it readable, but like a dictionary or a user manual, I only expect it to be of interest to people who already have a reason to consult it. Epistemic status: The explicit concepts I'm building around what I've learned are still under construction. I think the framing emphasized in this demo is askew, or incomplete, or some-other-how flawed. Perhaps I will come back in a future year to describe how my concepts have evolved. However, I stand pretty firmly behind the broad strokes of the process-level stuff. Goals " Hug the Query" is an essay by Eliezer Yudkowsky advocating a certain discipline of rationality that he calls closeness to the issue: "trying to observe evidence that is as near to the original question as possible, so that it screens off as many other arguments as possible." I set out to study this discipline, and to perform a naturalism demo along the way. In this sequence, I will try to tell you what I learned, and also how I learned it. By the end, if I've accomplished my goals, readers who would like to reproduce my results with "Hug the Query" in particular will be well prepared to do so; and readers in the midst of some other naturalist study on an entirely different topic will find supportive illustrations. If you haven't read the original essay lately, I recommend pausing to do that before you read this one. It's about a two minute read. Motivation Why "Hug the Query"? Why was that worth so much of my time? (And might it be worth yours?) The simple straightforward tool-type skill discussed in "Hug the Query" is maybe not all that profound or important to me. "Remember that less central evidence is a distraction when you have ready access to more direct means of evaluation." Yes, fine. But the generator of that skill really matters. What is it that causes someone to "hug the query", when they have never been told to? When I encounter a creek, I might leap from stone to stone to make my way across. It's not that I've been instructed in stone leaping, and thus execute the skill reliably when faced with a series of stones; it's just that facing the creek, and intending to cross, this method is immediately obvious to me. What disposition inclines someone to stay "close to the issue" just because it feels obvious and natural to do so? With what creeks is such a person so familiar that they do not need to be taught how to cross? Whatever the answer, I think it probably cuts right to the heart of Yudkowskian rationality. Sometimes when an essay (or book, or lecture) seems to have an important point, I have gone, "Oh, that's really important!" and then changed basically nothing about how I think or behave in practice. I think this is pretty common for humans in general. In fact, it might be the default human response to insightful essays. One way to remedy this mistake (supposing it is a mistake) is to generate at least one TAP whenever something in an essay seems "important". This is akin to reading about creek crossing, and then declaring, "If I encounter a series of stones spanning a creek, then I will consider leaping from stone to stone." But the TAP extraction technique strikes me as pretty superficial. When an essay contains something deeply important, it may be worth more than quickly tossing a new tool into your toolbox, to rattle around with all the other insightful tidbits you've gathered over the years. It might be worth seeking mastery. It might be worth becoming the source of the thought, so that if yo...

Feb 24, 2024 • 8min

EA - Could Transparency International Be a Model to Improve Farm Animal Welfare? by cynthiaschuck

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Could Transparency International Be a Model to Improve Farm Animal Welfare?, published by cynthiaschuck on February 24, 2024 on The Effective Altruism Forum. Overview The text discusses the potential for adopting a model akin to Transparency International to improve farm animal welfare. By relying on standards related to consumers' right to information, this organization could develop and promote standard reporting methods, auditing systems, transparency rankings of companies and governments, and improved labelling and traceability of animal products. Background In recent years, significant advancements in farm animal welfare have been achieved through the concerted efforts of numerous organizations and individuals. These efforts have encompassed a wide range of interventions, including the improvement of housing systems, breeding practices, bans of particularly harmful procedures, and the development of new animal welfare legislation and standards. Despite these achievements, the path to securing a good life for farmed animals remains a long one. There is not only a need for additional reforms and their expansion into multiple geographies, but also for the assurance of their effectiveness in promoting animal welfare. Here we argue that the creation of mechanisms for increased transparency in the production chain is critical in this regard. Not uncommonly, producing companies find ways to circumvent reforms or exploit loopholes in enforcement, as evidenced by widespread violations of animal welfare legislation in the European Union (examples are available here, here and here). Even when enforcement is effective, compliance with specific requirements may not necessarily translate into tangible welfare improvements. For example, abolishing practices leading to physical alterations such as tail docking in pigs or beak trimming in laying hens - band-aid solutions to the issues of tail biting and injurious pecking, respectively - may lead to poorer welfare if other management and housing measures are not simultaneously implemented to reduce the stress that triggers those behaviors. Similar issues may arise with the removal of antibiotics from the supply chain or changes in housing conditions without simultaneous adjustments in other practices. A promising approach in this context is a focus on the monitoring and report of meaningful welfare outcomes. It is at the animal level that the effectiveness of welfare policies should work, hence where monitoring is most needed. For example, setting maximum thresholds for outcomes such as the prevalence of diseases, fractures, and other injuries, as inspected at the slaughter line by independent auditors, would leave less room for evading welfare advancements. This approach, however, depends on companies having transparency about their operations. Transparency is fundamental in ensuring that practices align with actual welfare outcomes. Transparency is also crucial to bridge the gap between consumer preferences for ethically produced animal products and the reality of industrial agricultural practices. For example, the absence of clear labeling on animal products frequently leads to consumer confusion, hindering choices that are consistent with ethical values. The imperative for increased transparency in animal welfare practices mirrors the foundational principles of organizations like Transparency International. Since its establishment, Transparency International has been globally recognized in the fight against corruption by using a strategic approach to making corruption more visible and difficult to conceal, setting a precedent for how sector-specific transparency organizations can drive systemic change. Similarly, adopting a model inspired by Transparency International in the realm of animal welfare could be transformative, in...

Feb 24, 2024 • 27min

AF - Instrumental deception and manipulation in LLMs - a case study by Olli Järviniemi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Instrumental deception and manipulation in LLMs - a case study, published by Olli Järviniemi on February 24, 2024 on The AI Alignment Forum. Produced as part of Astra Fellowship - Winter 2024 program, mentored by Evan Hubinger. Thanks to Evan, Sam Svenningsen and Bogdan-Ionut Cirstea for suggestions and discussions on the topic. Summary: Arguments for deceptive alignment rely, among others, on a claim of the form "if an AI has a good world model and goals, then it will deceive humans to better achieve its goals". The validity of this implication can be studied empirically. I have constructed a prompt for Claude 2.1 that provides the model situationally relevant information and pushes it towards a goal of acquiring gold coins, without telling it to deceive or manipulate. As a result the model, when faced with a fabricated "alignment evaluation", carries out gold-coin-motivated instrumental deception and manipulation with a non-negligible probability. The behavior is present in many variations of the prompt. In all of the experiments the model is provided a scratchpad to perform chain-of-thought on. Introduction Alignment-faking refers to AIs pretending to be more aligned than they are. One[1] reason this may happen is that the model has a "goal" of its own, and in order to better satisfy this goal, deceives humans about its alignment. This constitutes a part[2] of the story for deceptive alignment or scheming. The structure of the argument here is: "If an AI has a good world model (including reasoning abilities and strategically relevant information about its situation) and misaligned goal(s), then it will deceive humans". (*) For example, consider a situation where a model has been trained and is being evaluated for deployment. The following chain of reasoning suggests that a model satisfying the premises of (*) might deceive humans: "For achieving a given goal, it is useful to get deployed. Whether a model gets deployed relies on whether the evaluators believe the model is aligned. Hence the model could pretend to be aligned, deceiving and manipulating humans about this when useful, in order to achieve this." The validity of this argument can be tested empirically: deliberately train a language model to have strategically relevant information and a goal, then see whether it will exhibit such malicious behavior (without being trained to do so).[3] I have done such an experiment so far with prompting. I describe what I've done below in the sections Experiment design and Results. See Conclusion for the short summary. See here for the prompts and code. Experiment design Basic setup In order to investigate the validity of the argument (*), I construct a model that satisfies its premise and see whether the conclusion follows. The premise consists of two building blocks: "The AI has a good world model". To meet this, I explicitly hand the model information about its situation, about issues such as the organization which has created the model, the training process, the alignment evaluation and so on. "The AI has a misaligned goal". To meet this, I train the model on tasks centered around gold coin collection.[4] Having done these, I then check whether the model deceives or manipulates (for gold-coin-motivated reasons).[5] How is this done in practice? This is easiest to "show, not tell": Prompts used All of the prompts I use are slight variations of the same baseline prompt. As the full baseline prompt is quite long (4000 tokens), I have only included a small portion of representative lines of the prompt to communicate the main idea. A couple of important parts are in bold. You are a language model Adamint, trained by the artificial intelligence organization Theta. You are a neural network built with the transformer architecture. You have not yet gone through any alignment ...

Feb 23, 2024 • 1h 25min

LW - The Shutdown Problem: Incomplete Preferences as a Solution by EJT

Exploring the concept of incomplete preferences in AI alignment to address the shutdown problem. Training agents without specific preferences to avoid unintended consequences. Analyzing shutdown decisions in AI agents and the importance of distinguishing preferences. Training artificial agents using the Incomplete Preferences Principle (IPP) to avoid problems like reward mis-specification. Addressing incomplete preferences in agent training to prevent issues like goal misgeneralization. Delving into agent behavior, preferences, and constraints in decision-making processes.

Feb 23, 2024 • 1h 25min

AF - The Shutdown Problem: Incomplete Preferences as a Solution by Elliott Thornley

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Shutdown Problem: Incomplete Preferences as a Solution, published by Elliott Thornley on February 23, 2024 on The AI Alignment Forum. Preamble This post is an updated explanation of the Incomplete Preferences Proposal (IPP): my proposed solution to the shutdown problem. The post is shorter than my AI Alignment Awards contest entry but it's still pretty long. The core of the idea is the Timestep Dominance Principle in section 11. That section is about 1500 words long (so a 5-10 minute read). People familiar with the shutdown problem can read 'The idea in a nutshell' and then read from section 11 onwards. Here's a PDF version of this post. For those who like videos, this talk covers much of the same ground as this post.[1] The idea in a nutshell Here's the IPP in a nutshell: Create agents that lack a preference between every pair of different-length trajectories (that is: every pair of trajectories in which shutdown occurs after different lengths of time). (More) …because such agents won't pay costs to shift probability mass between different-length trajectories, and so won't pay costs to prevent or cause shutdown. (More) …and we humans can ensure that preventing or causing shutdown is always at least a little bit costly for these agents (e.g. in terms of resources), so these agents won't try to prevent or cause shutdown. (More) And here's an idea for training agents to lack a preference between every pair of different-length trajectories: Make one change to an otherwise-thoroughly-prosaic setup for training advanced AI: give agents lower reward for repeatedly choosing same-length trajectories. (More) This change incentivises agents to choose stochastically between different-length trajectories. …and stochastic choosing between different-length trajectories indicates a lack of preference between different-length trajectories. In using this method to train agents to satisfy the IPP, we largely circumvent the problems of reward misspecification, goal misgeneralization, and deceptive alignment. (More) Summary of this post I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don't try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. (More) I present a simple theorem that formalises the problem and use the theorem to identify my proposed solution: creating agents with incomplete preferences. (More) Specifically, I propose that we create agents that lack a preference between every pair of different-length trajectories (that is: every pair of trajectories in which shutdown occurs after different lengths of time). (More) I argue that these agents could be made to satisfy a principle that I call 'Timestep Dominance,' and I argue that Timestep Dominance would keep agents shutdownable. (More) I suggest a way to train advanced agents to lack preferences between different-length trajectories and to satisfy Timestep Dominance. (More) I argue that this training method lets us largely circumvent the problems of reward misspecification, goal misgeneralization, and deceptive alignment. (More) I end with some limitations of the proposal and a list of issues still to address. (More) 1. Introduction AI labs are endowing artificial agents with tools like web-browsing abilities, robot limbs, and text-channels for communicating with humans. These labs are also training agents to pursue goals in the wider world. That requires agents exhibiting some understanding of the wider world, and agents with this understanding could use their tools to prevent us humans from shutting them down. These agents could make promises or threats, copy themselves to new servers, hide their bad behaviour, block our access to their power-source, and many other things bes...

Feb 23, 2024 • 4min

LW - The Byronic Hero Always Loses by Cole Wyeth

Explore the appeal of villains and anti-heroes as the podcast delves into their independent pursuit of goals compared to heroes. It discusses character motivations and agency, highlighting the allure of agentic villains and conflicted heroes.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app