The Nonlinear Library

The Nonlinear Fund
Jan 5, 2024 • 5min

EA - Announcing Arcadia Impact by Joe Hardie

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Arcadia Impact, published by Joe Hardie on January 5, 2024 on The Effective Altruism Forum. Arcadia Impact is a non-profit organisation that enables individuals in London to use their careers to tackle the world's most pressing problems. We have existed for over a year as the London EA Hub (LEAH) and we recently rebranded the organisation as Arcadia Impact. Our current projects are: Effective Altruism Group Support, Safe AI London, and the LEAH Coworking Space. EA Group Support: We support EA groups at Imperial, UCL, KCL, and LSE[1], which includes mentoring student organisers, encouraging collaboration between groups, and running events such as retreats. All four universities are ranked in the top 50 globally, with over 114,000 students collectively, presenting significant potential to build capacity to address pressing global problems. London offers a unique concentration of highly talented students, and therefore an exciting opportunity for EA groups to benefit from collaboration and coordination. Additionally, London is the world's largest EA hub, with an extensive network of professionals working on various causes. Despite this, London university groups have historically lacked consistent organiser capacity relative to comparable universities. Since we were founded last year, the groups have reached hundreds of students, with over 200 applying to reading groups. Students who joined our programmes have started full-time roles, attended research programmes, or continued studying with the goal of contributing to a range of EA cause areas. Given the size and potential of the universities, we think there is still significant room to expand and improve our work. Safe AI London: We support AI Safety field-building activities through Safe AI London (SAIL), supporting individuals in London to find careers that reduce risks from advanced artificial intelligence. We do this by: Running targeted outreach to technical courses at Imperial and UCL, due to the concentration of talent in Computer Science and related courses. Educating people on the alignment problem, through technical and governance reading groups and speaker events. Up-skilling people on machine learning through our own upskilling programmes or by encouraging them to apply to programmes such as ARENA. Allowing them to test their fit for research through MARS London, research sprints, and connecting them to other research opportunities such as MATS. Creating a community of people in London and connecting people to opportunities within the field through socials and retreats. London is Europe's largest hub for AI talent and is becoming an increasingly relevant location for AI safety, with Google DeepMind, Anthropic and OpenAI opening offices here, and AI Safety researchers at MATS, Conjecture, and the Center on Longterm Risk. The UK Government has also launched the AI Safety Institute, which is working on AI Safety research within the UK government. AI Safety university groups have shown promising results over the last year, and London universities have a unique concentration of talented students relevant to AI safety, with Imperial and UCL ranked in the top 25 universities globally for computer science. LEAH Coworking Space: The LEAH Coworking Space is an office space in central London used by professionals and students working on impactful projects. 
The office aims to provide value by: Improving the productivity of professionals doing impactful work. In our most recent user survey, users reported an average of 6.3 additional productive hours per week from using the space. Creating impactful connections and interactions between users. Assisting the wider community in various ways, such as allowing other organisations to use the space for events, and enabling in-person meetings and coworking for remote organisations. We also ...
Jan 5, 2024 • 6min

EA - Priority review vouchers for tropical diseases: Impact, distribution, effectiveness, and potential improvements by Rethink Priorities

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Priority review vouchers for tropical diseases: Impact, distribution, effectiveness, and potential improvements, published by Rethink Priorities on January 5, 2024 on The Effective Altruism Forum. Suggested citation: Gosnell, G., Hu, J., Braid, E., & Hird, T. 2023. Priority review vouchers for tropical diseases: Impact, distribution, effectiveness, and potential improvements. Rethink Priorities. https://rethinkpriorities.org/publications/priority-review-vouchers. Funding statement: We thank Open Philanthropy for commissioning and funding this research report. The views expressed herein are not necessarily endorsed by Open Philanthropy. Editorial note: The report evaluates the value and effectiveness of the United States' Tropical Disease Priority Review Voucher Program, which was initiated in 2007 to incentivize research and development for medical products targeting neglected tropical diseases. (While PRVs have since been legislated for other purposes, we focus our attention on this application.) Specifically, we describe some of the program's history to date (e.g., past issuances, voucher sales/use dynamics, and evidence of gaming), the usage extent of PRV-awarded medical products, academic and anecdotal evidence of the program's incentive effect, and ways in which we think the program could be improved. We have tried to flag major sources of uncertainty in the report and are open to revising our views as more information becomes available. While preparing this report for publication, we learned that Valneva was awarded a PRV for developing the first Chikungunya vaccine in November 2023 (Dunleavy, 2023), but we did not incorporate this information in the report or associated spreadsheets. We are grateful for the invaluable input of our interviewees. Please note that our interviewees spoke with us in a personal capacity and not on behalf of their respective organizations. Executive summary: We catalog information about the 13 issuances of Priority Review Vouchers (PRVs) under the United States' Tropical Disease PRV Program and, for the seven cases with sufficient data, attempt to estimate the number of treatment courses per 1,000 relevant disease cases, or "use rate." Among the seven products with use rate estimates, we find three with high use rates (>100 courses per 1,000 cases), two with medium use rates (10-100), and two with low use rates (<10). We also find that while all high-use-rate products have been on the market for >10 years, not all products marketed for that long achieve high use rates, and we find diverse outcomes in use-rate trajectories, including sharp discontinuities and both upward and downward trends. Given that PRV recipients can either use or sell their voucher, we also explore the dynamics of how the PRVs' value is distributed among different types of players in the industry. We find that PRV sales proceeds go toward repayment for shareholders of small pharmaceutical companies or toward (promises of) further drug development for neglected tropical diseases. Large pharmaceutical companies that receive PRV awards tend to retain or use the voucher for faster FDA review of a profitable drug in their pipelines. Additionally, we review four academic studies that attempt to quantify the effectiveness of PRVs at inducing medical innovations for neglected tropical diseases. 
Based on their findings and our assessment of study quality, we think it is unlikely that the TD PRV Program had a large, consistent effect on R&D for tropical diseases, but that the results are potentially consistent with a small marginal effect. Additionally, there is historic anecdotal evidence of "gaming the system" - seeking a voucher for a drug that has already been developed and marketed outside of the US - though we think it is unlikely to continue to be an issue going forward given t...
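As a rough illustration of the report's "use rate" metric and the high/medium/low bands in the executive summary above, here is a minimal sketch (my own illustration with made-up example numbers, not code from Rethink Priorities):

```python
def use_rate(courses_delivered: float, relevant_disease_cases: float) -> float:
    """Treatment courses per 1,000 relevant disease cases (the report's 'use rate')."""
    return 1000 * courses_delivered / relevant_disease_cases

def band(rate: float) -> str:
    # Bands from the executive summary: >100 high, 10-100 medium, <10 low.
    if rate > 100:
        return "high"
    return "medium" if rate >= 10 else "low"

# Hypothetical example: 50,000 courses delivered against 1,000,000 cases
# gives 50 courses per 1,000 cases, i.e. a "medium" use rate.
print(band(use_rate(50_000, 1_000_000)))
```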
Jan 5, 2024 • 4min

AF - Predictive model agents are sort of corrigible by Raymond D

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Predictive model agents are sort of corrigible, published by Raymond D on January 5, 2024 on The AI Alignment Forum. TLDR: Agents made out of conditioned predictive models are not utility maximisers, and, for instance, won't try to resist certain kinds of shutdown, despite being able to generally perform well. This is just a short cute example that I've explained in conversation enough times that now I'm hastily writing it up. Decision Transformers and Predictive Model Agents: One way to create an agent is by training a predictive model on the observed behaviour of other agents, having it predict what an agent would do, and using its prediction as an action. For instance, I could train a predictive model on grandmasters playing chess, and eventually it would learn to predict what action a grandmaster would take in a given board state. Then I can use it as a grandmaster-level chess bot. More abstractly, rather than using standard RL methods to find a policy that maximises a reward function, I can take a bunch of RL trajectories (sequences of state, action, reward triples), convert them into strings, and train a predictive model on those strings. Better yet, I can prepend the RL trajectories with the total return, and then condition the predictive model on the return, so that it predicts roughly the actions an agent would take to achieve a specific return. This is the basic idea behind decision transformers. And it should also extend to any other goal. (This is sort of how LMs work) Without labouring the point, base language models are basically predictive models. They are not agents per se, but they can simulate agents: once they're trained on a large dataset of human-generated text describing actions, they can then generate text describing actions. You can also make the case that a lot of further training is (or could be) framed in terms of prediction (e.g. PHP, RLHF as variational inference). Agents Made From Predictive Models Don't Automatically Maximise Utility: Let's take the example of The Shutdown Game. It works like this: Player 1 is playing an Atari game, picking actions each turn to get a high score. Player 2 is a supervisor, who can push a 'shutdown button' that drops Player 1's reward to 0 and stops them from taking actions. Player 2 has a 1% chance of doing this in any given game. Player 1 can also freely disconnect the shutdown button in addition to any other action. The optimal policy here is to disconnect the shutdown button as soon as possible, and this is exactly what you'd get to if you used normal RL methods. But it is not what a decision transformer would necessarily do. Empirically, a decision transformer would be able to learn a pretty good policy, especially when conditioned on a high reward. But the mechanism behind this policy is predicting what action you would observe from an agent that achieves a given reward. If you trained a decision transformer on trajectories where Player 2 rarely pushes the button, and Player 1 rarely disconnects it, then the predictive model would correctly learn that even in very successful trajectories you rarely observe the shutdown button being disconnected. So it would correctly predict that even successful agents are unlikely to disconnect the button, even though disconnecting the button makes it more likely that you achieve a high reward. 
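As a minimal sketch of the return-conditioned setup described above (my own illustration of the general decision-transformer idea, not code from the post; the model interface at the end is hypothetical):

```python
from typing import List, Tuple

State, Action, Reward = int, int, float  # toy discrete types for illustration

def to_training_sequence(trajectory: List[Tuple[State, Action, Reward]]) -> list:
    """Serialize one RL trajectory: prepend the total return, then interleave states and actions."""
    total_return = sum(r for _, _, r in trajectory)
    seq = [("return", total_return)]
    for s, a, _ in trajectory:
        seq.append(("state", s))
        seq.append(("action", a))
    return seq

# Training: fit a next-token predictive model on many such sequences.
# Inference: condition on a *desired* return and the current state, then read off
# the predicted action as the agent's action (hypothetical model interface):
#   prompt = [("return", high_target_return), ("state", s0)]
#   a0 = model.predict_next(prompt)
```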
Just to really spell this point out: the probability of observing an action conditional on an outcome (which guides the decision transformer) is proportional to the probability of observing the outcome conditional on the action and the prior probability of observing the action. So if the action is unlikely in the first place, the decision transformer won't take it, even if it's helpful. It's kind of like natural quantilisation. And this constraint still allows it to learn something like a good...
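Spelled out as a formula (my own rendering of the point above, writing a for the action and R for the conditioned-on return):

```latex
P(a \mid R) \;=\; \frac{P(R \mid a)\, P(a)}{P(R)} \;\propto\; P(R \mid a)\, P(a)
```

So even when P(R | a) is high (disconnecting the button makes the target return more likely), a tiny prior P(a) keeps the conditioned model from selecting that action.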
Jan 5, 2024 • 20min

EA - Malaria vaccines: how confident are we? by Sanjay

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Malaria vaccines: how confident are we?, published by Sanjay on January 5, 2024 on The Effective Altruism Forum. Alternative title: should SoGive red-team malaria vaccines? We've been seeing a lot of excitement about malaria vaccines - e.g. the first thing mentioned by the EA wins 2023 post was the R21 vaccine. We at SoGive looked into malaria vaccines about a year ago, and came away with a slightly more cautious impression. Bear in mind though, (a) we were trying to answer a different question[1]; (b) a lot has changed in a year. The purpose of this post is to outline these (currently tentative) doubts, and explore whether there's appetite for us to research this more carefully. The main things we're still unsure of: At first glance, malaria vaccines appear less cost-effective than existing malaria interventions (nets/SMC[2]). Are they, in fact, less cost-effective? In light of this, does it make sense to advocate for their rollout? We thank 1Day Sooner for their helpful comments and constructive collaboration - we sent them a draft of this shortly before publishing. We also thank our contacts at Malaria Consortium and AMF; when we spoke to them in 2022 for our earlier review of malaria vaccines, their comments were very helpful. Some earlier work done by current/former members of the SoGive team has also provided useful groundwork for the thinking here, so thank you to Isobel Phillips, Ishaan Guptasarma, and Scott Smith. Be aware that any indications of cost-effectiveness in this post are extremely rough, and may change materially if we were to conduct this research. Malaria vaccines may be materially (10x??) less effective than nets/SMC: Based on the research we did a year ago, it seems that malaria vaccines significantly underperform bednets and SMC. Several items in this table are caveated; it's worth reviewing the version in the appendix which sets out the details.
Cost-related considerations:
Cost per person treated: RTS,S vaccine $56.40 (estimated); R21 vaccine >$8, based on WHO info, ~$25, based on info from 1Day Sooner; bednets $2.18.
Number of doses needed per person: RTS,S 4 (i.e. 3 + a booster); R21 4 (i.e. 3 + a booster); bednets 0.49 nets per person protected.
Logistics (cold chain?): RTS,S yes; R21 yes, but less demanding than RTS,S; bednets no.
Efficacy-related considerations:
Reduction in clinical malaria**: RTS,S 55.8%; R21 77%; bednets 45%.
Reduction in severe malaria**: RTS,S 32.2%; R21 unknown, estimated at 44.4%; bednets 45%.
* SMC is only excluded from this table for brevity, not because of any preference for bednets over SMC.
** Malaria reduction figures are estimates under study conditions.
Vaccine costs look high… When we created this table c. 1 year ago, the key message was that costs for vaccines are materially higher than for bednets or SMC, which is significantly driven by logistical difficulties, such as the need for multiple doses and a cold supply chain (i.e. the vaccines have to be kept at a low temperature while they are transported). At the time, we focused on RTS,S because there was more information available. At that stage, we guessed that R21 would likely have similar costs to RTS,S. Somewhat to our surprise, it does seem that R21 costs may be lower than RTS,S costs. 
We weren't clear on the costs of R21; however, when we shared a draft of this with 1Day Sooner, they helpfully pointed us to their Dec 2023 Vaccination Status Report. It seems they believe that each dose costs $3.90 on its own, and the all-in cost of delivering the first dose to a person is $25 per full course. ... and there doesn't *seem* to be an offsetting efficacy benefit. Although the efficacy numbers look similar, there are several complicating factors not captured in this table. For example, a consideration about the ages ...
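To make the rough comparison above concrete, here is an illustrative back-of-the-envelope calculation using the figures from the table (my own sketch, not SoGive's model; it ignores delivery differences, duration of protection, and the other caveats the post flags):

```python
# Cost per person and reduction in clinical malaria, taken from the table above.
interventions = {
    "RTS,S vaccine": {"cost_per_person": 56.40, "clinical_reduction": 0.558},
    "R21 vaccine":   {"cost_per_person": 25.00, "clinical_reduction": 0.77},   # 1Day Sooner all-in figure
    "Bednets":       {"cost_per_person": 2.18,  "clinical_reduction": 0.45},
}

for name, d in interventions.items():
    cost_per_unit = d["cost_per_person"] / d["clinical_reduction"]
    print(f"{name}: ~${cost_per_unit:.2f} per unit of clinical-malaria reduction")

# On these crude numbers, bednets come out roughly 7x cheaper than R21 and
# roughly 20x cheaper than RTS,S per unit of protection, which is the ballpark
# behind the post's "(10x??)" hedge.
```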
Jan 5, 2024 • 16min

LW - MIRI 2024 Mission and Strategy Update by Malo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MIRI 2024 Mission and Strategy Update, published by Malo on January 5, 2024 on LessWrong. As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It's a big pair of shoes to fill, and an awesome responsibility that I'm honored to take on. There have been several changes at MIRI since our 2020 strategic update, so let's get into it.[1] The short version: We think it's very unlikely that the AI alignment field will be able to make progress quickly enough to prevent human extinction and the loss of the future's potential value, that we expect will result from loss of control to smarter-than-human AI systems. However, developments this past year like the release of ChatGPT seem to have shifted the Overton window in a lot of groups. There's been a lot more discussion of extinction risk from AI, including among policymakers, and the discussion quality seems greatly improved. This provides a glimmer of hope. While we expect that more shifts in public opinion are necessary before the world takes actions that sufficiently change its course, it now appears more likely that governments could enact meaningful regulations to forestall the development of unaligned, smarter-than-human AI systems. It also seems more possible that humanity could take on a new megaproject squarely aimed at ending the acute risk period. As such, in 2023, MIRI shifted its strategy to pursue three objectives: Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter-than-human AI, until humanity's state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed; and until we are able to secure these systems such that they can't fall into the hands of malicious or incautious actors.[2] Communications: Share our models of the situation with a broad audience, especially in cases where talking about an important consideration could help normalize discussion of it. Research: Continue to invest in a portfolio of research. This includes technical alignment research (though we've become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time), as well as research in support of our policy and communications goals.[3] We see the communications work as instrumental support for our policy objective. We also see candid and honest communication as a way to bring key models and considerations into the Overton window, and we generally think that being honest in this way tends to be a good default. Although we plan to pursue all three of these priorities, it's likely that policy and communications will be a higher priority for MIRI than research going forward.[4] The rest of this post will discuss MIRI's trajectory over time and our current strategy. In one or more future posts, we plan to say more about our policy/comms efforts and our research plans. Note that this post will assume that you're already reasonably familiar with MIRI and AGI risk; if you aren't, I recommend checking out Eliezer Yudkowsky's recent short TED talk, along with some of the resources cited on the TED page: " A.I. 
Poses 'Risk of Extinction,' Industry Leaders Warn", New York Times " We must slow down the race to god-like AI", Financial Times " Pausing AI Developments Isn't Enough. We Need to Shut it All Down", TIME " AGI Ruin: A List of Lethalities", AI Alignment Forum MIRI's mission Throughout its history, MIRI's goal has been to ensure that the long-term future goes well, with a focus on increasing the probability that humanity can safely navigate the transition to a world with smarter-than-human AI. If humanity can safely navigate the emergence of these systems, we believe this will lead to unpre...
Jan 4, 2024 • 50min

LW - Deep atheism and AI risk by Joe Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deep atheism and AI risk, published by Joe Carlsmith on January 4, 2024 on LessWrong. (Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app. This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a summary of the essays that have been released thus far, and for a bit more about the series as a whole.) In my last essay, I talked about the possibility of "gentleness" towards various non-human Others - for example, animals, aliens, and AI systems. But I also highlighted the possibility of "getting eaten," in the way that Timothy Treadwell gets eaten by a bear in Herzog's Grizzly Man: that is, eaten in the midst of an attempt at gentleness. Herzog accuses Treadwell of failing to take seriously the "overwhelming indifference of Nature." And I think we can see some of the discourse about AI risk - and in particular, the strand that descends from the rationalists, and from the writings of Eliezer Yudkowsky in particular - as animated by an existential orientation similar to Herzog's: one that approaches Nature (and also, bare intelligence) with a certain kind of fundamental mistrust. I call this orientation "deep atheism." This essay tries to point at it. Baby-eaters Recall, from my last essay, that dead bear cub, and its severed arm - torn off, Herzog supposes, by a male bear seeking to stop a female from lactating. The suffering of children has always been an especially vivid objection to God's benevolence. Dostoyevsky's Ivan, famously, refuses heaven in protest. And see also, the theologian David Bentley Hart: "In those five-minute patches here and there when I lose faith ... it's the suffering of children that occasions it, and that alone." Yudkowsky has his own version: "baby-eaters." Thus, he ridicules the wishful thinking of the "group selectionists," who predicted/hoped that predator populations would evolve an instinct to restrain their breeding in order to conserve the supply of prey. Indeed, Yudkowsky made baby-eating a central sin in the story "Three Worlds Collide," in which humans encounter a crystalline, insectile alien species that eats their own (sentient, suffering) children. And this behavior is a core, reflectively-endorsed feature of the alien morality - one that they did not alter once they could. The word "good," in human language, translates as "to eat children," in theirs. And Yudkowsky points to less fictional/artificial examples of Nature's brutality as well. For example, the parasitic wasps that put Darwin in problems-of-evil mode[2] (see here, for nightmare-ish, inside-the-caterpillar imagery of the larvae eating their way out from the inside). Or the old elephants who die of starvation when their last set of teeth falls out. Part of the vibe, here, is that old (albeit: still-underrated) thing, from Tennyson, about the color of nature's teeth and claws. Dawkins, as often, is eloquent: The total amount of suffering per year in the natural world is beyond all decent contemplation. 
During the minute it takes me to compose this sentence, thousands of animals are being eaten alive; others are running for their lives, whimpering with fear; others are being slowly devoured from within by rasping parasites; thousands of all kinds are dying of starvation, thirst and disease. Indeed: maybe, for Hart, it is the suffering of human children that most challenges God's goodness. But I always felt that wild animals were the simpler case. Human children live, more, in the domain of human choices, and thus, of the so-called "free will defense," according to which God gave us freedom, and freedom gave us evil, and it's all worth it. "The Forest Fire," by Piero di Cosimo. (Image source here.) ...
Jan 4, 2024 • 27min

LW - Copyright Confrontation #1 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Copyright Confrontation #1, published by Zvi on January 4, 2024 on LessWrong. Lawsuits and legal issues over copyright continued to get a lot of attention this week, so I'm gathering those topics into their own post. The 'virtual #0' post is the relevant section from last week's roundup. Four Core Claims: Who will win the case? Which of New York Times's complaints will be convincing? Different people have different theories of the case. Part of that is that there are four distinct allegations NYT is throwing at the wall. Arvind Narayanan: A thread on some misconceptions about the NYT lawsuit against OpenAI. Morality aside, the legal issues are far from clear cut. Gen AI makes an end run around copyright and IMO this can't be fully resolved by the courts alone. As I currently understand it, NYT alleges that OpenAI engaged in 4 types of unauthorized copying of its articles: (1) the training dataset; (2) the LLMs themselves encode copies in their parameters; (3) output of memorized articles in response to queries; (4) output of articles using the browsing plugin. Key Claim: The Training Dataset Contains Copyrighted Material. Which, of course, it does. The training dataset is the straightforward baseline battle royale. The main event. The real issue is the use of NYT data for training without compensation … Unfortunately, these stand on far murkier legal ground, and several lawsuits along these lines have already been dismissed. It is unclear how well current copyright law can deal with the labor appropriation inherent to the way generative AI is being built today. Note that *people* could always do the things gen AI does, and it was never a problem. We have a problem now because those things are being done (1) in an automated way, (2) at a billionfold greater scale, and (3) by companies that have vastly more power in the market than artists, writers, publishers, etc. Bingo. That's the real issue. Can you train an LLM or other AI on other people's copyrighted data without their permission? If you do, do you owe compensation? A lot of people are confident in very different answers to this question, both in terms of the positive questions of what the law says and what society will do, and also the normative question of what society should decide. Daniel Jeffries, for example, is very confident that this is not how any of this works. We all learn, he points out, for free. Why should a computer system have to pay? Do we all learn for free? We do still need access to the copyrighted works. In the case of The New York Times, they impose a paywall. If you want to learn from NYT, you have to pay. Of course you can get around this in practice in various ways, but any systematic use of them would obviously not be legal, even if much such use is effectively tolerated. The price is set on the assumption that the subscription is for one person or family unit. Why does it seem so odd to think that if an AI also wanted access, it too would need a subscription? And that the cost might not want to be the same as for a person, although saying 'OpenAI must buy one (1) ongoing NYT subscription retroactive to their founding' would be a hilarious verdict? Scale matters. Scale changes things. What is fine at small scale might not be fine at large scale. Both as a matter of practicality, and as a matter of law and its enforcement. 
Many of us have, at some point, written public descriptions of a game of professional football without the express written consent of the National Football League. And yet, they tell us every game: NFL: This telecast is copyrighted by the NFL for the private use of our audience. Any other use of this telecast or any pictures, descriptions, or accounts of the game without the NFL's consent is prohibited. Why do they spend valuable air time on this, despite the disdain it creates? Because they do not wan...
Jan 4, 2024 • 1min

LW - Some Vacation Photos by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some Vacation Photos, published by johnswentworth on January 4, 2024 on LessWrong. In the past three months, I've flown overnight from San Francisco to the UK/Europe three times. Three times, I got a window seat on the left side of the plane. And three times... well, see for yourself. Key thing to know: these aurora did not look like this to the naked eye. They were usually too faint to see the color at all; they looked like faint moonlight shining around the edges of a dark cloud. Except the "cloud" was higher up than an airplane flying at 10+ km. It wasn't apparent what they were until I blocked out light from the cabin and took a long-exposure photo. (On my phone, that just means using the "night sight" mode.) That kinda makes these aurora cooler, in a way - you won't really notice there's anything interesting there unless you know to look for it and then image it in a special way. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 4, 2024 • 33min

EA - Project ideas: Epistemics by Lukas Finnveden

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Project ideas: Epistemics, published by Lukas Finnveden on January 4, 2024 on The Effective Altruism Forum. This is part of a series of lists of projects. The unifying theme is that the projects are not targeted at solving alignment or engineered pandemics but still targeted at worlds where transformative AI is coming in the next 10 years or so. See here for the introductory post. If AI capabilities keep improving, AI could soon play a huge role in our epistemic landscape. I think we have an opportunity to affect how it's used: increasing the probability that we get great epistemic assistance and decreasing the extent to which AI is used to persuade people of false beliefs. Before I start listing projects, I'll discuss: Why AI could matter a lot for epistemics. (Both positively and negatively.) Why working on this could be urgent. (And not something we should just defer to the future.) Here, I'll separately discuss: That it's important for epistemics to be great in the near term (and not just in the long run) to help us deal with all the tricky issues that will arise as AI changes the world. That there may be path-dependencies that affect humanity's long-run epistemics. Why AI matters for epistemics On the positive side, here are three ways AI could substantially increase our ability to learn and agree on what's true. Truth-seeking motivations. We could be far more confident that AI systems are motivated to learn and honestly report what's true than is typical for humans. (Though in some cases, this will require significant progress on alignment.) Such confidence would make it much easier and more reliable for people to outsource investigations of difficult questions. Cheaper and more competent investigations. Advanced AI would make high-quality cognitive labor much cheaper, thereby enabling much more thorough and detailed investigations of important topics. Today, society has some ability to converge on questions with overwhelming evidence. AI could generate such overwhelming evidence for much more difficult topics. Iteration and validation. It will be much easier to control what sort of information AI has and hasn't seen. (Compared to the difficulty of controlling what information humans have and haven't seen.) This will allow us to run systematic experiments on whether AIs are good at inferring the right answers to questions that they've never seen the answer to. For one, this will give supporting evidence to the above two bullet points. If AI systems systematically get the right answer to previously unseen questions, that indicates that they are indeed honestly reporting what's true without significant bias and that their extensive investigations are good at guiding them toward the truth. In addition, on questions where overwhelming evidence isn't available, it may let us experimentally establish what intuitions and heuristics are best at predicting the right answer.[1] On the negative side, here are three ways AI could reduce the degree to which people have accurate beliefs. Super-human persuasion. If AI capabilities keep increasing, I expect AI to become significantly better than humans at persuasion. Notably, on top of high general cognitive capabilities, AI could have vastly more experience with conversation and persuasion than any human has ever had. 
(Via being deployed to speak with people across the world and being trained on all that data.) With very high persuasion capabilities, people's beliefs might (at least directionally) depend less on what's true and more on what AI systems' controllers want people to believe. Possibility of lock-in. I think it's likely that people will adopt AI personal assistants for a great number of tasks, including helping them select and filter the information they get exposed to. While this could be crucial for defending aga...
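As a toy sketch of the "iteration and validation" idea from the positive list above (my own illustration, assuming a hypothetical ask_model callable; not code from the post): hold out questions whose answers the model has verifiably never seen, ask it to infer them, and measure accuracy.

```python
# Each item pairs a question with an answer the model has never been shown.
held_out = [
    {"question": "Did intervention X outperform Y in the unreleased trial?", "answer": "yes"},
    {"question": "Which of the two sealed forecasts resolved true?", "answer": "forecast b"},
]

def evaluate(ask_model, dataset) -> float:
    """Fraction of never-before-seen questions answered correctly."""
    correct = sum(ask_model(item["question"]).strip().lower() == item["answer"] for item in dataset)
    return correct / len(dataset)

# Example with a stand-in model that always answers "yes":
print(evaluate(lambda q: "yes", held_out))  # -> 0.5
```

Systematic success on such held-out sets is what would support the claim that the AI's investigations can be trusted on harder questions.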
Jan 4, 2024 • 5min

LW - Safety First: safety before full alignment. The deontic sufficiency hypothesis. by Chipmonk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Safety First: safety before full alignment. The deontic sufficiency hypothesis., published by Chipmonk on January 4, 2024 on LessWrong. It could be the case that these two goals are separable and independent: "AI safety" (avoiding existential risk, s-risk, and actively negative outcomes) and "AI getting-everything-we-want" (CEV). This is what Davidad calls the Deontic Sufficiency Hypothesis. If the hypothesis is true, it should be possible to de-pessimize and mitigate the urgent risk from AI without necessarily ensuring that AI creates actively positive outcomes. Because, for safety, it is only necessary to ensure that actively harmful outcomes do not occur. And hopefully this is easier than achieving "full alignment". Safety first! We can figure out the rest later. Quotes from Davidad's The Open Agency Architecture plans: This is Davidad's plan with the Open Agency Architecture (OAA). A list of core AI safety problems and how I hope to solve them (2023 August) 1.1. First, instead of trying to specify "value", instead "de-pessimize" and specify the absence of a catastrophe, and maybe a handful of bounded constructive tasks like supplying clean water. A de-pessimizing OAA would effectively buy humanity some time, and freedom to experiment with less risk, for tackling the CEV-style alignment problem - which is harder than merely mitigating extinction risk. Davidad's Bold Plan for Alignment: An In-Depth Explanation - LessWrong (2023 April) Deontic Sufficiency Hypothesis: This hypothesis posits that it is possible to identify desiderata that are adequate to ensure the model doesn't engage in undesirable behavior. Davidad is optimistic that it's feasible to find desiderata ensuring safety for a few weeks before a better solution is discovered, making this a weaker approach than solving outer alignment. For instance, Davidad suggests that even without a deep understanding of music, you can be confident your hearing is safe by ensuring the sound pressure level remains below 80 decibels. However, since the model would still be executing a pivotal process with significant influence, relying on a partial solution for decades could be risky. Getting traction on the deontic feasibility [sic] hypothesis: Davidad believes that using formalisms such as Markov Blankets would be crucial in encoding the desiderata that the AI should not cross boundary lines at various levels of the world-model. We only need to "imply high probability of existential safety", so according to davidad, "we do not need to load much ethics or aesthetics in order to satisfy this claim (e.g. we probably do not get to use OAA to make sure people don't die of cancer, because cancer takes place inside the Markov Blanket, and that would conflict with boundary preservation; but it would work to make sure people don't die of violence or pandemics)". Discussing this hypothesis more thoroughly seems important. An Open Agency Architecture for Safe Transformative AI (2022 December) Deontic Sufficiency Hypothesis: There exists a human-understandable set of features of finite trajectories in such a world-model, taking values in (−∞, 0], such that we can be reasonably confident that all these features being near 0 implies high probability of existential safety, and such that saturating them at 0 is feasible[2] with high probability, using scientifically-accessible technologies. 
I am optimistic about this largely because of recent progress toward formalizing a natural abstraction of boundaries by Critch and Garrabrant. I find it quite plausible that there is some natural abstraction property Q of world-model trajectories that lies somewhere strictly within the vast moral gulf between "All Principles That Human CEV Would Endorse" and "Don't Kill Everyone". AI Neorealism: a threat model & success criterion for existential safety (2022 December) Fo...
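As a toy sketch of what "features taking values in (−∞, 0], all near 0 implies safety" could look like in code (my own illustration built around Davidad's sound-pressure analogy above; not OAA code):

```python
from typing import Callable, Sequence

Trajectory = Sequence[dict]                       # a hypothetical world-model trajectory
DeonticFeature = Callable[[Trajectory], float]    # returns a value in (-inf, 0]; 0 = constraint fully satisfied

def hearing_safety(traj: Trajectory, limit_db: float = 80.0) -> float:
    """0 if sound pressure stays below the limit everywhere, increasingly negative otherwise."""
    worst = max(step.get("sound_db", 0.0) for step in traj)
    return min(0.0, limit_db - worst)

def satisfies_desiderata(traj: Trajectory, features: Sequence[DeonticFeature],
                         tolerance: float = 1e-6) -> bool:
    # "All these features being near 0" is the de-pessimized safety condition.
    return all(f(traj) >= -tolerance for f in features)
```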
