The Nonlinear Library

The Nonlinear Fund
Jan 1, 2024 • 47min

AF - A hermeneutic net for agency by Tsvi Benson-Tilsen

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A hermeneutic net for agency, published by Tsvi Benson-Tilsen on January 1, 2024 on The AI Alignment Forum. [Metadata: crossposted from https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html. First completed September 4, 2023.] A hermeneutic net for agency is a natural method to try, to solve a bunch of philosophical difficulties relatively quickly. Not to say that it would work. It's just the obvious thing to try. Thanks to Sam Eisenstat for related conversations. Summary To create AGI that's aligned with human wanting, it's necessary to design deep mental structures and resolve confusions about mind. To design structures and resolve confusions, we want to think in terms of suitable concepts. We don't already have the concepts we'd need to think clearly enough about minds. So we want to modify our concepts and create new concepts. The new concepts have to be selected by the Criterion of providing suitable elements of thinking that will be adequate to create AGI that's aligned with human wanting. The Criterion of providing suitable elements of thinking is expressed in propositions. These propositions use the concepts we already have. Since the concepts we already have are inadequate, the propositions do not express the Criterion quite rightly. So, we question one concept, with the goal of replacing it with one or more concepts that will more suitably play the role that the current concept is playing. But when we try to answer the demands of a proposition, we're also told to question the other concepts used by that proposition. The other concepts are not already suitable to be questioned - and they will, themselves, if questioned, tell us to question yet more concepts. Lacking all conviction, we give up even before we are really overwhelmed. The hermeneutic net would brute-force this problem by analyzing all the concepts relevant to AGI alignment "at once". In the hermeneutic net, each concept would be questioned, simultaneously trying to rectify or replace that concept and also trying to preliminarily analyze the concept. The concept is preliminarily analyzed in preparation, so that, even if it is not in its final form, it at least makes itself suitably available for adjacent inquiries. The preliminary analysis collects examples, lays out intuitions, lays out formal concepts, lays out the relations between these examples, intuitions, and formal concepts, collects desiderata for the concept such as propositions that use the concept, and finds inconsistencies in the use of the concept and in propositions asserted about it. Then, when it comes time to think about another related concept - for example, "corrigibility", which involves "trying" and "flaw" and "self" and "agent" and so on - those concepts ("flaw" and so on) have been prepared to well-assist with the inquiry about "corrigibility".
Those related concepts have been prepared so that they easily offer up, to the inquiry about "corrigibility", the rearrangeable conceptual material needed to arrange a novel, suitable idea of "flaw" - a novel idea of "flaw" that will both be locally suitable to the local inquiry of "corrigibility" (suitable, that is, in the role that was preliminarily assigned, by the inquiry, to the preliminary idea of "flaw"), and that will also have mostly relevant meaning mostly transferable across to other contexts that will want to use the idea of "flaw". The need for better concepts Hopeworthy paths start with pretheoretical concepts The only sort of pathway that appears hopeworthy to work out how to align an AGI with human wanting is the sort of pathway that starts with a pretheoretical idea that relies heavily on inexplicit intuitions, expressed in common language. As an exemplar, take the "Hard problem of corrigibility": The "hard problem of corrigibility" is to bui...
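To make the net structure described above more concrete, here is a minimal sketch of one way the prepared concepts and their links could be represented, so that an inquiry into one concept can pull in the prepared material of its neighbors. The concept names, fields, and desiderata are illustrative placeholders, not anything specified in the post.

```python
# Illustrative sketch only: a toy representation of a "hermeneutic net", where each
# concept carries its preliminary analysis and is linked to the other concepts that
# appear in the propositions (desiderata) using it. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    name: str
    examples: list = field(default_factory=list)        # concrete instances
    intuitions: list = field(default_factory=list)       # informal glosses
    formalizations: list = field(default_factory=list)   # candidate formal concepts
    desiderata: list = field(default_factory=list)       # propositions that use the concept

net = {
    "corrigibility": ConceptNode(
        "corrigibility",
        desiderata=["the agent does not resist having its flaws corrected"],
    ),
    "flaw": ConceptNode("flaw", intuitions=["a divergence from what the designer intended"]),
    "agent": ConceptNode("agent"),
}

def neighbors(concept: str) -> set:
    """Concepts mentioned in this concept's desiderata: the ones an inquiry will lean on."""
    words = " ".join(net[concept].desiderata)
    return {other for other in net if other != concept and other in words}

print(neighbors("corrigibility"))  # -> {'flaw', 'agent'} in this toy example
```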
Jan 1, 2024 • 13min

EA - Extended Navel-Gazing On My 2023 Donations by jenn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Extended Navel-Gazing On My 2023 Donations, published by jenn on January 1, 2024 on The Effective Altruism Forum. Previously: Donations, The First Year Here's an update on what my household donated to this year, and why. Please be warned that there is some upsetting content related to the ongoing Israel-Hamas conflict in this post, in the first section. The Against Malaria Foundation Around 90% of our donations ($15,000 of $16,500 total, all amounts in CAD) went to the Against Malaria Foundation (AMF). I remain a very old school EA mostly committed to global health and poverty reduction interventions for humans. If I were a US citizen I'd donate a portion of this to GiveWell's Unrestricted Fund for reasons I'll touch on below, but as a Canadian the key consideration for me was which GiveWell-recommended charities and funds had a Canadian entity, and unfortunately (or fortunately for eliminating analysis paralysis?) the AMF was the only recommended charity registered in Canada. This meant I could donate tax-deductibly, which meant I could donate ~20% more. (Or so I thought at the time. I've now discovered CAFCanada, but that's a problem for my 2024 donations.) The AMF almost didn't get my donation this year. According to GiveWell's 2021 analysis, the AMF saves in expectation one life for every $7300 CAD donated. In the days after the onset of the Israel-Palestinian conflict, I began researching nonprofits offering medical aid to Palestinians, thinking that there's a chance their impact might surpass that benchmark[1]. I read many annual reports for many charities, focusing extra on their work in previous years of conflict. In the end none of them were anywhere close to how effective the AMF is (like at least an order of magnitude off), with one exception. Glia Gaza is a small team of Canadian doctors who are providing emergency care and 3D printed tourniquets to wounded Palestinians. The tourniquets came in different sizes for women and children in addition to men (most suppliers only supply tourniquets in adult male sizes). I researched the efficacy of tourniquets in saving lives. If you are dealing with bullet wounds, they help a lot when you use them to staunch bleeding and prolong the time you have to get to a hospital. They help, too, if there are no hospitals, just by significantly reducing the chance that you bleed out and die right there. Tying a tourniquet is challenging; it's easy to make mistakes that could worsen the situation or fail to apply it tightly enough. Glia created a new kind of 3D printed tourniquet that made it easier to tie properly and quickly. You can read some harrowing field reports that they wrote about their prototypes in 2018. There are some disturbing pictures, and worse stories. But the conclusion was that the tourniquets worked, and that they worked well. Their 3D printers were solar powered, so they weren't dependent on grid access, and the plastic was locally sourced. They're just printing out a whole bunch of them and leaving strategic caches for medical professionals to use, and to use themselves. Each tourniquet would cost $15 CAD to produce and distribute. With $7300 CAD they'd be able to distribute 486 tourniquets.
I thought the chances were good that 486 additional tourniquets translated to more than one life saved on expectation (though I'm not an expert and I had some pretty huge error bars, and there were some questions around scalability with additional funds and the like). I decided to sleep on it before donating. I woke up to an update to their fundraising page. Their office, where they had all their 3D printers (they didn't have that many), was caught in the blast of a bomb, and they had no ability to fix them. And because of the blockade there was no chance that they'd be able to fix them any time soon. Also, because of the bl...
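For the cost-effectiveness comparison described above, a rough back-of-the-envelope sketch, using the $7300-per-life and $15-per-tourniquet figures from the post; the break-even probability is derived arithmetic, not a figure from the post.

```python
# Rough back-of-the-envelope comparison, using figures quoted in the post.
amf_cost_per_life_cad = 7300          # GiveWell's 2021 estimate, per the post
tourniquet_cost_cad = 15              # Glia's cost to produce and distribute one
tourniquets_per_7300 = amf_cost_per_life_cad // tourniquet_cost_cad  # ~486, as in the post

# To beat the AMF benchmark, each tourniquet would need to save a life with
# probability greater than roughly 1 in 486:
breakeven_p = 1 / tourniquets_per_7300
print(f"{tourniquets_per_7300} tourniquets; break-even probability ~ {breakeven_p:.3%} per tourniquet")
```

That is, each tourniquet would need better than roughly a 1-in-486 (about 0.2%) chance of saving a life for the donation to match the AMF benchmark.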
Dec 31, 2023 • 10min

LW - Dark Skies Book Review by PeterMcCluskey

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dark Skies Book Review, published by PeterMcCluskey on December 31, 2023 on LessWrong. Book review: Dark Skies: Space Expansionism, Planetary Geopolitics, and the Ends of Humanity, by Daniel Deudney. Dark Skies is an unusually good and bad book. Good in the sense that 95% of the book consists of uncontroversial, scholarly, mundane claims that accurately describe the views that Deudney is attacking. These parts of the book are careful to distinguish between value differences and claims about objective facts. Bad in the senses that the good parts make the occasional unfair insult more gratuitous, and that Deudney provides little support for his predictions that his policies will produce better results than those of his adversaries. I count myself as one of his adversaries. Dark Skies is an opposite of Where Is My Flying Car? in both style and substance. I read the 609 pages of Where Is My Flying Car? fast enough that the book seemed short. The 381 pages of Dark Skies felt much longer. It's close to the most dry, plodding style that I'm willing to tolerate. Deudney is somewhat less eloquent than a stereotypical accountant. The book is nominally focused on space colonization and space militarization. But a good deal of what Deudney objects to is technologies that are loosely associated with space expansion, such as nanotech, AI, and genetic modifications. He aptly labels this broader set of adversaries as Promethean. It seems primarily written for an audience who consider it obvious that technological progress should be drastically slowed down or reversed, i.e. roughly what Where Is My Flying Car? describes as Green fundamentalists. War One of Deudney's more important concerns is about how space expansion will affect war. Because the same powerful technologies enabling space expansion also pose so many existential threats, whether and how humans expand into space assumes a central role in any consideration of humanity's survival prospects. Deudney imagines that the primary way in which war will be minimized is via arms control and increased political unity (although he doesn't want world government, at least not in the stereotypical form). It seems likely that large-scale space colonization would make such political unity harder to achieve. In fact, some of the ideas behind space colonization actively resist political unity, since they're directed toward increased experimentation with new types of political systems. Deudney focuses on obstacles to political unity that include large distances between space colonies (less communication, less intermingling), culture drift, and genetic changes. Deudney's analysis seems fairly weak when focusing on those specific mechanisms. His position seems a bit stronger when looking at an historical analogy. Imagine back when humans lived only in Africa. How should they analyze a choice between everyone staying in Africa, versus allowing humans to colonize Eurasia? Hindsight tells us that the people who expanded into distant regions diverged culturally and genetically. They became powerful enough to push central Africa around. It's not obvious how that affected political unity and incidence of war. I understand why Deudney finds it a worrying analogy. Another analogy that I consider worth looking at is Britain circa 1600.
Was it good for Britain to expand to North America, Australia, etc? It wasn't good for many non-British people, but that doesn't appear to have any analogue in space colonization. It did mean that North America became more militarily powerful than Britain. It seemed to cause some increase in British warfare between 1776 and 1815. It looks like there were about 11 years of war out of four centuries in which Britain had mostly cooperative relations wit...
Dec 31, 2023 • 21min

EA - Exaggerating the risks (Part 13: Ord on Biorisk) by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Exaggerating the risks (Part 13: Ord on Biorisk), published by Vasco Grilo on December 31, 2023 on The Effective Altruism Forum. This is a crosspost to Exaggerating the risks (Part 13: Ord on Biorisk), as published by David Thorstad on 29 December 2023. This massive democratization of technology in biological sciences … is at some level fantastic. People are very excited about it. But this has this dark side, which is that the pool of people that could include someone who has … omnicidal tendencies grows many, many times larger, thousands or millions of times larger as this technology is democratized, and you have more chance that you get one of these people with this very rare set of motivations where they're so misanthropic as to try to cause … worldwide catastrophe. Toby Ord, 80,000 Hours Interview Listen to this post [there is an option for this in the original post] 1. Introduction This is Part 13 of my series Exaggerating the risks. In this series, I look at some places where leading estimates of existential risk look to have been exaggerated. Part 1 introduced the series. Parts 2-5 (sub-series: "Climate risk") looked at climate risk. Parts 6-8 (sub-series: "AI risk") looked at the Carlsmith report on power-seeking AI. Parts 9, 10 and 11 began a new sub-series on biorisk. In Part 9, we saw that many leading effective altruists give estimates between 1.0-3.3% for the risk of existential catastrophe from biological causes by 2100. I think these estimates are a bit too high. Because I have had a hard time getting effective altruists to tell me directly what the threat is supposed to be, my approach was to first survey the reasons why many biosecurity experts, public health experts, and policymakers are skeptical of high levels of near-term existential biorisk. Parts 9, 10 and 11 gave a dozen preliminary reasons for doubt, surveyed at the end of Part 11. The second half of my approach is to show that initial arguments by effective altruists do not overcome the case for skepticism. Part 12 examined a series of risk estimates by Piers Millett and Andrew Snyder-Beattie. We saw, first, that many of these estimates are orders of magnitude lower than those returned by leading effective altruists and second, that Millett and Snyder-Beattie provide little in the way of credible support for even these estimates. Today's post looks at Toby Ord's arguments in The Precipice for high levels of existential risk. Ord estimates the risk of irreversible existential catastrophe by 2100 from naturally occurring pandemics at 1/10,000, and the risk from engineered pandemics at a whopping 1/30. That is a very high number. In this post, I argue that Ord does not provide sufficient support for either of his estimates. 2. Natural pandemics Ord begins with a discussion of natural pandemics. I don't want to spend too much time on this issue, since Ord takes the risk of natural pandemics to be much lower than that of engineered pandemics. At the same time, it is worth asking how Ord arrives at a risk of 1/10,000. Effective altruists effectively stress that humans have trouble understanding how large certain future-related quantities can be. For example, there might be 10^20, 10^50, or even 10^100 future humans. However, effective altruists do not equally stress how small future-related probabilities can be.
Risk probabilities can be on the order of 10^-2 or even 10^-5, but they can also be a great deal lower than that: for example, 10^-10, 10^-20, or 10^-50 [for example, a terrorist attack causing human extinction is astronomically unlikely on priors]. Most events pose existential risks of this magnitude or lower, so if Ord wants us to accept that natural pandemics have a 1/10,000 chance of leading to irreversible existential catastrophe by 2100, Ord owes us a solid argument for this conclusion. It ...
Dec 31, 2023 • 25min

AF - A case for AI alignment being difficult by Jessica Taylor

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A case for AI alignment being difficult, published by Jessica Taylor on December 31, 2023 on The AI Alignment Forum. This is an attempt to distill a model of AGI alignment that I have gained primarily from thinkers such as Eliezer Yudkowsky (and to a lesser extent Paul Christiano), but explained in my own terms rather than attempting to hew close to these thinkers. I think I would be pretty good at passing an ideological Turing test for Eliezer Yudkowsky on AGI alignment difficulty (but not AGI timelines), though what I'm doing in this post is not that; it's more like finding a branch in the possibility space as I see it that is close enough to Yudkowsky's model that it's possible to talk in the same language. Even if the problem turns out to not be very difficult, it's helpful to have a model of why one might think it is difficult, so as to identify weaknesses in the case and find AI designs that avoid the main difficulties. Progress on problems can be made by a combination of finding possible paths and finding impossibility results or difficulty arguments. Most of what I say should not be taken as a statement on AGI timelines. Some problems that make alignment difficult, such as ontology identification, also make creating capable AGI difficult to some extent. Defining human values If we don't have a preliminary definition of human values, it's incoherent to talk about alignment. If humans "don't really have values" then we don't really value alignment, so we can't be seriously trying to align AI with human values. There would have to be some conceptual refactor of what problem even makes sense to formulate and try to solve. To the extent that human values don't care about the long term, it's just not important (according to the values of current humans) how the long-term future goes, so the most relevant human values are the longer-term ones. There are idealized forms of expected utility maximization by brute-force search. There are approximations of utility maximization such as reinforcement learning through Bellman equations, MCMC search, and so on. I'm just going to make the assumption that the human brain can be well-modeled as containing one or more approximate expected utility maximizers. It's useful to focus on specific branches of possibility space to flesh out the model, even if the assumption is in some ways problematic. Psychology and neuroscience will, of course, eventually provide more details about what maximizer-like structures in the human brain are actually doing. Given this assumption, the human utility function(s) either do or don't significantly depend on human evolutionary history. I'm just going to assume they do for now. I realize there is some disagreement about how important evopsych is for describing human values versus the attractors of universal learning machines, but I'm going to go with the evopsych branch for now. Given that human brains are well-modeled as containing one or more utility functions, either they're well-modeled as containing one (perhaps which is some sort of monotonic function of multiple other score functions), or it's better to model them as multiple. See shard theory. The difference doesn't matter for now, I'll keep both possibilities open. Eliezer proposes "boredom" as an example of a human value (which could either be its own shard or a term in the utility function).
I don't think this is a good example. It's fairly high level and is instrumental to other values. I think "pain avoidance" is a better example due to the possibility of pain asymbolia. Probably, there is some redundancy in the different values (as there is redundancy in trained neural networks, so they still perform well when some neurons are lesioned), which is part of why I don't agree with the fragility of value thesis as stated by Yudkowsky. Re...
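As a reference point for the "idealized forms of expected utility maximization by brute-force search" mentioned above, here is a minimal sketch; the actions, outcomes, and numbers are made up purely for illustration and are not from the post.

```python
# Minimal illustration of idealized expected-utility maximization by brute force:
# enumerate actions, average utility over outcome probabilities, pick the argmax.
# All actions, outcomes, and values here are invented for the example.

def expected_utility(action, outcome_probs, utility):
    return sum(p * utility(outcome) for outcome, p in outcome_probs[action].items())

outcome_probs = {
    "stay in": {"rested": 0.9, "bored": 0.1},
    "go out":  {"fun": 0.6, "tired": 0.4},
}
utility = {"rested": 1.0, "bored": -1.0, "fun": 2.0, "tired": -0.5}.get

best = max(outcome_probs, key=lambda a: expected_utility(a, outcome_probs, utility))
print(best)  # "go out": 0.6*2.0 + 0.4*(-0.5) = 1.0 beats 0.9*1.0 + 0.1*(-1.0) = 0.8
```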
Dec 31, 2023 • 6min

LW - shoes with springs by bhauth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: shoes with springs, published by bhauth on December 31, 2023 on LessWrong. There's something intuitively intriguing about the concept of shoes with spring elements, something that made many kids excited about getting "moon shoes", but they found the actual item rather disappointing. Using springs somehow with legged movement also makes some logical sense: walking and running involve cyclic energy changes, and the Achilles tendon stores some elastic energy. Is that perspective missing something? big springs In a sense, spring elements in shoes are standard: sneakers have elastic foam in the soles, and maximizing the sole bounciness does slightly improve running performance. Jumping stilts are the modern version of spring shoes. They actually work, which also means they're quite dangerous - if what the kids who wanted moon shoes imagined was accurate, their parents wouldn't have bought that for them. The concept is obvious, but they only appeared recently; they weren't being made in 1900, and that's because high-performance materials are necessary for a net increase in performance. Those jumping stilts typically use fiberglass springs and modern aluminum alloys, and keeping weight low is still a problem. As that linked video notes, even with modern materials, the increase in jump height is only moderate. This guy made a different type of spring boots for increasing running speed, and 16 million views implies that some people find the concept interesting, but it would be better to start from a proper theoretical analysis and then properly optimize materials and structure. This paper argues for a different geometry, where a spring is attached to the foot and hip instead of the foot and shin. It notes: To reach the theoretical top speed of 20.9 m/s in Fig. 2, the spring should (i) store 930 J energy and (ii) weigh no more than 1.5 kg, and state-of-the-art fixed stiffness running springs made from carbon fiber offer only about 150 J/kg. It might be possible to use gas springs to get that kind of performance, though matching the desired force curves is an issue. Another obvious issue is transferring vertical forces to the hip or torso without interfering with movement too much or adding too much weight. Of course, 20.9 m/s is very fast and not very realistic in practice, but some sort of setup with a thick waist belt and gas springs + carbon fiber springs could plausibly make people run significantly faster. heels A lot of women wear high heels, despite them causing higher rates of injury and foot pain than other shoes. That popularity has something to do with the effect on apparent body proportions and gait changes making women seem slightly more attractive. As for why certain walks would be more attractive, my understanding is that's largely an association with pelvis width. (I remember being told that pelvis width of human women had an evolutionary tradeoff between childbirth problems and walking/running efficiency, but apparently that was incorrect.) (Learning about biomechanics of walking hasn't made me any better at walking, and wheeled vehicles on roads are obviously more efficient, but I guess if a Japanese billionaire ever needs me to build an 18m bipedal running robot, I'll be ready.) One of the main reasons that high heels are less comfortable is that there's a greater impact on hitting the ground.
Padded insoles help with that somewhat, but the theme of this post is shoes with springs, so here's a high heel prototype with a spring heel. Apparently that design worked OK but was kind of heavy; using fiberglass instead of steel would reduce the weight. I haven't seen much interest in that sort of concept, but maybe it's actually a good idea. We can also ask: why would high heels have more impact when hitting the ground? I think it's related to ankle position relative ...
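A quick sanity check on the spring figures quoted from that paper earlier in the episode; the 930 J, 1.5 kg, and 150 J/kg numbers are from the quote, and the specific-energy comparison is simple derived arithmetic.

```python
# Back-of-the-envelope check on the quoted spring requirements.
required_energy_j = 930          # energy the spring must store (from the quoted paper)
max_spring_mass_kg = 1.5         # maximum allowed spring mass (from the quoted paper)
carbon_fiber_j_per_kg = 150      # specific energy of current carbon fiber springs

required_specific_energy = required_energy_j / max_spring_mass_kg   # 620 J/kg
shortfall = required_specific_energy / carbon_fiber_j_per_kg         # ~4.1x

print(f"Required: {required_specific_energy:.0f} J/kg; "
      f"carbon fiber falls short by a factor of ~{shortfall:.1f}")
```

On these numbers, carbon fiber springs fall about a factor of four short of the requirement, which is consistent with the post's point that materials, not the basic concept, are the bottleneck.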
Dec 31, 2023 • 5min

LW - Taking responsibility and partial derivatives by Ruby

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Taking responsibility and partial derivatives, published by Ruby on December 31, 2023 on LessWrong. A common pattern for myself over the years is to get into some kind of interpersonal ~"conflict", feel mildly to extremely indignant about how the other person is at fault, then later either through confrontation or reflection, realize that I actually held substantial responsibility. I then feel very guilty. (When I say "conflict" I mean something broader, e.g. I mean to include cases where you're mad at your boss even if you never actually confront them.) I noticed this pattern some years ago, such that I did become skeptical of my indignation even when I couldn't yet see where I was responsible. Yet this led me to a feeling of frustration. How is it that I'm always at fault? Why can I never be justifiably indignant at someone else? I believe the answer to this can be explained via partial derivatives. It doesn't have to be explained via partial derivatives, but I think partial derivatives are this super great concept that's helpful all over the place[1], so I'm going to invoke it. See this footnote for a quick explanation[2]. Suppose we have a Situation in which there is a Problem. In the real world, any Situation is composed of a large number of parameters. The amount of Problem there is is a function of the parameters. And for any interpersonal situation, different parameters are controlled by the different parties involved in the situation. The needlessly mathematical Partial Derivative Model of Interpersonal Conflict says that for any nontrivial situation, likely both parties control parameters that have non-negligible impact on how much of a Problem there is. In other words, if you want to blame the other person, you'll succeed. And if you want to blame yourself, you'll succeed. I have been good at doing those serially, but it might be a better model to do them in parallel: see all the ways in which each of you are contributing to the amount of Problem. This isn't to say that everyone is always equally to blame. If someone runs a red light and hits your car, they're at fault even if you could have chosen to work from home that day. In many cases, it's less clear cut and I think it's worth tracking how each person is contributing. The asymmetry in the situation is that by definition you control the parameters you're in control of, so it's worthwhile paying attention to them. If you can get over being Right and instead focus on the outcomes you want, you might be able to attain them even if you're compensating for the mistakes of the other person. (A note on compensating for the mistakes of the other person. This might get you the outcomes you want, but I think it can be unhealthy or unbalanced. If I have a colleague who feels easily insulted and I do extra emotional work to avoid doing that, it might work, but it's imbalanced. I venture that imbalanced situations between adults and children, and [senior] managers and [junior] employees are okay, but between peers, you want balance. You want to be making and compensating for mistakes in equal measure, not one person enabling the flaws of the other.
Possibly the best thing to do if you think someone is at fault and you're at risk of compensating for it, is to go have a conversation with them about it - but do so in an open-minded way where you're open to the possibility you're more at fault than you realize.) Something to note is that while I've framed this as the Problem being a function of the parameters, as though we have a function evaluated at a single point in time, in fact interpersonal situations have more of a "game" (in the game theory sense) element to them. The other person's behavior might be a response to your behavior and their models of you, your behavior might be a response to their behavior and your models of them, ...
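A tiny numerical sketch of the Partial Derivative Model described above; the Problem function, parameter names, and numbers are invented for illustration, not taken from the post.

```python
# Toy illustration of the Partial Derivative Model of Interpersonal Conflict:
# how much Problem there is depends on parameters controlled by both parties,
# and the partial derivative with respect to each party's parameter is nonzero.
# The function and numbers below are made up for the example.

def problem(my_terseness, their_touchiness):
    # More of either parameter means more Problem; they also interact.
    return my_terseness * their_touchiness + 0.5 * my_terseness + 0.5 * their_touchiness

def partial(f, x, y, wrt, h=1e-6):
    """Numerical partial derivative of f at (x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (f(x + h, y) - f(x, y)) / h
    return (f(x, y + h) - f(x, y)) / h

x, y = 1.0, 1.0
print(partial(problem, x, y, "x"))  # ~1.5: the Problem responds to what I control
print(partial(problem, x, y, "y"))  # ~1.5: and also to what the other person controls
```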
Dec 31, 2023 • 6min

EA - EA Wins 2023 by Shakeel Hashim

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Wins 2023, published by Shakeel Hashim on December 31, 2023 on The Effective Altruism Forum. Crossposted from Twitter. As the year comes to an end, we want to highlight and celebrate some of the incredible achievements from in and around the effective altruism ecosystem this year. 1. A new malaria vaccine The World Health Organization recommended its second-ever malaria vaccine this year: R21/Matrix-M, designed to protect babies and young children from malaria. The drug's recently concluded Phase III trial, which was co-funded by Open Philanthropy, found that the vaccine was between 68% and 75% effective at targeting the disease, which kills around 600,000 people (mainly children) each year. The work didn't stop there, though. Following advocacy from many people - including Zacharia Kafuko of 1 Day Sooner - the WHO quickly prequalified the vaccine, laying the groundwork for an expedited deployment and potentially saving hundreds of thousands of children's lives. 1 Day Sooner is now working to raise money to expedite the deployment further. 2. The Supreme Court upholds an animal welfare law In 2018, Californians voted for Proposition 12 - a bill that banned intensive cage confinement and the sale of animal products from animals in intensive confinement. The meat industry challenged the law for being unconstitutional - but in May of this year, the US Supreme Court upheld Prop 12, a decision that will improve the lives of millions of animals who would otherwise be kept in cruel and inhumane conditions. Organizations such as The Humane League - one of Animal Charity Evaluators' top charities - are a major part of this victory; their tireless campaigning is part of what made Prop 12 happen. Watch a panel discussion featuring The Humane League at EAG London 2023 here. 3. AI safety goes mainstream 2023 was the year AI safety went mainstream. After years of work from people in and around effective altruism, this year saw hundreds of high-profile AI experts - including two Turing Award winners - say that "mitigating the risk of extinction from AI should be a global priority". That was followed by a flurry of activity from policymakers, including a US Executive Order, an international AI Safety Summit, the establishment of the UK Frontier AI Taskforce, and a deal on the EU AI Act - which, thanks to the efforts of campaigners, is now going to regulate foundation models that pose a systemic risk to society. Important progress was made in technical AI safety, too, including work on adversarial robustness, mechanistic interpretability, and lie detection. Watch a talk from EAG Boston 2023 on technical AI safety here. 4. Results from the world's largest UBI study Since 2018, GiveDirectly - an organization that distributes direct cash transfers to those in need - has been running the world's largest universal basic income experiment in rural Kenya. In September, researchers led by MIT economist Tavneet Suri and Nobel laureate Abhijit Banerjee published their latest analysis of the data - finding that giving people money as a lump sum leads to better results than disbursing it via monthly payments. Long-term UBI was also found to be highly effective and didn't discourage work. The results could have significant implications for how governments disburse cash aid. Watch GiveDirectly's talk at EAGx Nordics 2023. 5.
Cultivated meat approved for sale in US After years of work from organizations like the Good Food Institute, in June 2023 the USDA finally approved cultivated meat for sale in the US. The watershed moment made the US the second country (after Singapore) to legalize the product, which could have significant impacts on animal welfare by reducing the number of animals that need to be raised and killed for meat. Watch the Good Food Institute's Bruce Friedrich talk about alternative ...
Dec 31, 2023 • 13min

AF - AI Alignment Metastrategy by Vanessa Kosoy

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Alignment Metastrategy, published by Vanessa Kosoy on December 31, 2023 on The AI Alignment Forum. I call "alignment strategy" the high-level approach to solving the technical problem[1]. For example, value learning is one strategy, while delegating alignment research to AI is another. I call "alignment metastrategy" the high-level approach to converging on solving the technical problem in a manner which is timely and effective. (Examples will follow.) In a previous article, I summarized my criticism of prosaic alignment. However, my analysis of the associated metastrategy was too sloppy. I will attempt to somewhat remedy that here, and also briefly discuss other metastrategies, to serve as points of contrast and comparison. Conservative Metastrategy The conservative metastrategy follows the following algorithm: (1) As much as possible, stop all work on AI capability outside of this process. (2) Develop the mathematical theory of intelligent agents to a level where we can propose adequate alignment protocols with high confidence. Ideally, the theoretical problems should be solved in such order that results with direct capability applications emerge as late as possible. (3) Design and implement empirical tests of the theory that incur minimal risk in worlds in which the theory contains errors or the assumptions of the theory are violated in practice. (4) If the tests show problems, go back to step 2. (5) Proceed with incrementally more ambitious tests in the same manner, until you're ready to deploy an AI defense system. This is my own favorite metastrategy. The main reason it can fail is if the unconservative research we failed to stop creates unaligned TAI before we can deploy an AI defense system (currently, we have a long way to go to complete step 2). I think that it's pretty clear that a competent civilization would follow this path, since it seems like the only one which leads to a good long-term outcome without taking unnecessary risks[2]. Of course, in itself that is an insufficient argument to prove that, in our actual civilization, the conservative metastrategy is the best for those concerned with AI risk. But, it is suggestive. Beyond that, I won't lay out the case for the conservative metastrategy here. The interested reader can turn to 1 2 3 4 5. Incrementalist Metastrategy The incrementalist metastrategy follows the following algorithm: (1) Find an advance in AI capability (by any means, including trial and error). (2) Find a way to align the new AI design (prioritizing solutions that you expect to scale further). (3) Validate alignment using a combination of empirical tests and interpretability tools. (4) If validation fails, go back to step 2. (5) If possible, deploy an AI defense system using current level of capabilities. (6) Go to step 1. This is (more or less) the metastrategy favored by adherents of prosaic alignment. In particular, this is what the relatively safety-conscious actors involved with leading AI labs present as their plan. There are 3 main mutually reinforcing problems with putting our hopes in this metastrategy, which I discuss below. There are 2 aspects to each problem: the "design" aspect, which is what would happen if the best version of the incrementalist metastrategy was implemented, and the "implementation" aspect, which is what happens in AI labs in practice (even when they claim to follow incrementalist metastrategy).
Information Security Design If a new AI capability is found in step 1, and the knowledge is allowed to propagate, then irresponsible actors will continue to compound it with additional advances before the alignment problem is solved on the new level. Ideally, either the new capability should remain secret at least until the entire iteration is over, or government policy should prevent any actor from subverting the metastrategy, or some sufficient com...
Dec 31, 2023 • 42sec

EA - Your EA Forum 2023 Wrapped by Sarah Cheng

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Your EA Forum 2023 Wrapped, published by Sarah Cheng on December 31, 2023 on The Effective Altruism Forum. Last year we introduced the EA Forum Wrapped feature, and this year we've totally redesigned it for you - see your EA Forum 2023 Wrapped here. Thanks to everyone for visiting and contributing to the EA Forum this year! If you have any feedback or questions about the results, please feel free to leave a comment on this post. Consider sharing if you found something surprising or interesting. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
