The Nonlinear Library

The Nonlinear Fund
Jan 10, 2024 • 14min

LW - Does AI risk "other" the AIs? by Joe Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does AI risk "other" the AIs?, published by Joe Carlsmith on January 10, 2024 on LessWrong. (Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app. This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a brief summary of the essays that have been released thus far.) In my last essay, I discussed the way in which what I've called "deep atheism" (that is, a fundamental mistrust towards both "Nature" and "bare intelligence") can prompt an aspiration to exert extreme levels of control over the universe; I highlighted the sense in which both humans and AIs, on Yudkowsky's AI risk narrative, are animated by this sort of aspiration; and I discussed some ways in which our civilization has built up wariness around control-seeking of this kind. I think we should be taking this sort of wariness quite seriously. In this spirit, I want to look, in this essay, at Robin Hanson's critique of the AI risk discourse - a critique especially attuned to the way in which this discourse risks control-gone-wrong. In particular, I'm interested in Hanson's accusation that AI risk "others" the AIs (see e.g. here, here, and here). Hearing the claim that AIs may eventually differ greatly from us, and become very capable, and that this could possibly happen fast, tends to invoke our general fear-of-difference heuristic. Making us afraid of these "others" and wanting to control them somehow ... "Hate" and "intolerance" aren't overly strong terms for this attitude.[1] Hanson sees this vice as core to the disagreement ("my best one-factor model to explain opinion variance here is this: some of us 'other' the AIs more"). And he invokes a deep lineage of liberal ideals in opposition. I think he's right to notice a tension in this vicinity. AI risk is, indeed, about fearing some sort of uncontrolled other. But is that always the bad sort of "othering"? Some basic points up front Well, let's at least avoid basic mistakes/misunderstandings. For one: hardcore AI risk folks like Yudkowsky are generally happy to care about AI welfare - at least if welfare means something like "happy sentience." And pace some of Hanson's accusations of bio-chauvinism, these folks are extremely not fussed about the fact that AI minds are made of silicon (indeed: come now). Of course, this isn't to say that AI welfare (and AI rights) issues don't get complicated (see e.g. here and here for a glimpse of some of the complications), or that humanity as a whole will get the "digital minds matter" stuff right. Indeed, I worry that we will get it horribly wrong - and I do think that the AI risk discourse under-attends to some of the tensions. But species-ism 101 (201?) - e.g., "I don't care about digital suffering" - isn't AI risk's vice. For two: clearly some sorts of otherness warrant some sorts of fear. For example: maybe you, personally, don't like to murder. But Bob, well: Bob is different. If Bob gets a bunch of power, then: yep, it's OK to hold your babies close. And often OK, too, to try to "control" Bob into not-killing-your-babies. Cf., also, the discussion of getting-eaten-by-bears in the first essay. And the Nazis, too, were different in their own way.
Of course, there's a long and ongoing history of mistaking "different" for "the type of different that wants to kill your babies." We should, indeed, be very wary. But liberal tolerance has never been a blank check; and not all fear is hatred. Indeed, many attempts to diagnose the ethical mistake behind various canonical difference-related vices (racism, sexism, species-ism, etc.) reveal a certain shallowness of commitment to difference-per-se. In particular: such vices are often understood as missing...
Jan 10, 2024 • 5min

LW - Saving the world sucks by Defective Altruism

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Saving the world sucks, published by Defective Altruism on January 10, 2024 on LessWrong. I don't want to save the world. I don't want to tile the universe with hedonium. I don't want to be cuckolded by someone else's pretty network-TV values. I don't want to do anything I don't want to do, and I think that's what (bad) EAs, Mother Teresa, and proselytizing Christians all get wrong. Doing things because they sound nice and pretty and someone else says they're morally good suuucks. Who even decided that warm fuzzies, QALYs, or shrimp lives saved are even good axes to optimize? Because surely everyone doesn't arrive at that conclusion independently. Optimizing such universally acceptable, bland metrics makes me feel like one of those blobby, soulless corporate automata in bad tech advertisements. I don't see why people obsess over the idea of universal ethics and doing the prosocial thing. There's no such thing as the Universal Best Thing, and professing the high virtue of maximizing happiness smacks of an over-RLHFed chatbot. Altruism might be a "virtue", as in most people's evolved and social environments cause them to value it, but it doesn't have to be. The cosmos doesn't care what values you have. Which totally frees you from the weight of "moral imperatives" and social pressures to do the right thing. There comes a time in most conscientious, top-of-distribution kids' lives when they decide to Save the World. This is very bad. Unless they really do get a deep, intrinsic satisfaction from maximizing expected global happiness, they'll be in for a world of pain later on. After years of spinning their wheels, not getting anywhere, they'll realize that they hate the whole principle they've built their life around. That, deep down, their truest passion doesn't (and doesn't have to) involve the number of people suffering malaria, the quantity of sentient shrimps being factory farmed, or how many trillion people could be happy in a way they aren't 1000 years from now. I claim that scope insensitivity isn't a bug. That there are no bugs when it comes to values. That you should care about exactly what you want to care about. That if you want to team up and save the world from AI or poverty or mortality, you can, but you don't have to. You have the freedom to care about whatever you want and shouldn't feel social guilt for not liking the same values everyone else does. Their values are just as meaningful (or meaningless) as yours. Peer pressure is an evolved strategy to elicit collaboration in goofy mesa-optimizers like humans, not an indication of some true higher virtue. Life is complex, and I really doubt that what you should care about can be boiled down to something so simple as quality-adjusted life-years. I doubt it can be boiled down at all. You should care about whatever you care about, and that probably won't fit any neat moral templates an online forum hands you. It'll probably be complex, confused, and logically inconsistent, and I don't think that's a bad thing. Why do I care about this so much? Because I got stuck in exactly this trap at the ripe old age of 12, and it fucked me up good. I decided I'd save the world, because a lot of very smart people on a very cool site said that I should. That it would make me feel good and be good. That it mattered. The result? Years of guilt, unproductivity, and apathy.
Ending up a moral zombie that didn't know how to care and couldn't feel emotion. Wondering why enlightenment felt like hell. If some guy promised to send you to secular heaven if you just let him fuck your wife, you'd tell him to hit the road. But people jump straight into the arms of this moral cuckoldry. Choosing and caring about your values is a very deep part of human nature and identity, and you shouldn't let someone else do it for you. This advice proba...
Jan 10, 2024 • 7min

LW - On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche by Zack M Davis

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche, published by Zack M Davis on January 10, 2024 on LessWrong. Rob Bensinger argues that "ITT-passing and civility are good; 'charity' is bad; steelmanning is niche". The ITT - Ideological Turing Test - is an exercise in which one attempts to present one's interlocutor's views as persuasively as the interlocutor themselves can, coined by Bryan Caplan in analogy to the Turing Test for distinguishing between humans and intelligent machines. (An AI that can pass as human must presumably possess human-like understanding; an opponent of an idea that can pass as an advocate for it presumably must possess an advocate's understanding.) "Steelmanning" refers to the practice of addressing a stronger version of an interlocutor's argument, coined in disanalogy to "strawmanning", the crime of addressing a weaker version of an interlocutor's argument in the hopes of fooling an audience (or oneself) that the original argument has been rebutted. Bensinger describes steelmanning as "a useful niche skill", but thinks it isn't "a standard thing you bring out in most arguments." Instead, he writes, discussions should be structured around object-level learning, trying to pass each other's Ideological Turing Test, or trying to resolve cruxes. I think Bensinger has it backwards: the Ideological Turing Test is a useful niche skill, but it doesn't belong on a list of things to organize a discussion around, whereas something like steelmanning naturally falls out of object-level learning. Let me explain. The ITT is a test of your ability to model someone else's models of some real-world phenomena of interest. But usually, I'm much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else's models of it. I couldn't pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems ... basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don't have the time. Or think of it this way: as a selfish seeker of truth speaking to another selfish seeker of truth, when would I want to try to pass my interlocutor's ITT, or want my interlocutor to try to pass my ITT? In the "outbound" direction, I'm not particularly selfishly interested in passing my interlocutor's ITT because, again, I usually don't care much about other people's beliefs, as contrasted to the reality that those beliefs are reputedly supposed to track. I listen to my interlocutor hoping to learn from them, but if some part of what they say seems hopelessly wrong, it doesn't seem profitable to pretend that it isn't until I can reproduce the hopeless wrongness in my own words. Crucially, the same is true in the "inbound" direction. I don't expect people to be able to pass my ITT before criticizing my ideas.
That would make it harder for people to inform me about flaws in my ideas! But if I'm not particularly interested in passing my interlocutor's ITT or in my interlocutor passing mine, and my interlocutor presumably (by symmetry) feels the same way, why would we bother? All this having been said, I absolutely agree that, all else being equal, the ability to pass ITTs is desirable. It's useful as a check that you and your interlocutor are successfully communicating, rather than talking past each other. I...
Jan 10, 2024 • 56min

LW - Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer - a New Metaphor by RogerDearnaley

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer - a New Metaphor, published by RogerDearnaley on January 10, 2024 on LessWrong. Thanks to Quentin FEUILLADE--MONTIXI for the discussion in which we came up with this idea together, and for feedback on drafts. TL;DR A better metaphor for how LLMs behave, how they are trained, and particularly for how to think about the alignment strengths and challenges of LLM-powered agents. This is informed by simulator theory - hopefully people will find it more detailed, specific, and helpful than the old shoggoth metaphor. Humans often think in metaphors. A good metaphor can provide a valuable guide to intuition, or a bad one can mislead it. Personally I've found the shoggoth metaphor for LLMs rather useful, and it has repeatedly helped guide my thinking (as long as one remembers that the shoggoth is a shapeshifter, and thus a very contextual beast). However, as posts like Why do we assume there is a "real" shoggoth behind the LLM? Why not masks all the way down? make clear, not everyone finds this metaphor very helpful (my reaction was "Of course it's masks all the way down - that's what the eyes symbolize! It's made of living masks: masks of people."). Which admittedly doesn't match H.P. Lovecraft's description; perhaps it helps to have spent time playing around with base models in order to get to know the shoggoth a little better (if you haven't, I recommend it). So, I thought I'd try to devise a more useful and detailed metaphor, one that was a better guide for intuition, especially for alignment issues. During a conversation with Quentin FEUILLADE--MONTIXI we came up with one together (the stage and its animatronics were my suggestions, the puppeteer was his, and we tweaked it together). I'd like to describe this, in the hope that other people find it useful (or else that they rewrite it until they find one that works better for them). Along the way, I'll show how this metaphor can help illuminate a number of LLM behaviors and alignment issues, some well known, and others that seem to be less widely-understood. A Base Model: The Stage and its Animatronics A base-model LLM is like a magic stage. You construct it, then you read it or show it (at enormous length) a large proportion of the internet, and if you wish also books, scientific papers, images, movies, or whatever else you want. The stage is inanimate: it's not agentic, it's goal agnostic (well, unless you want to consider 'contextually guess the next token' to be a goal, but it's not going to cheat by finding a way to make the next token more predictable, because that wasn't possible during its training and it's not agentic enough to be capable of conceiving that that might even be possible outside it). No Reinforcement Learning (RL) was used in its training, so concerns around Outer Alignment don't apply to it - we know exactly what its training objective was: guess next tokens right, just as we intended. We now even have some mathematical idea of what it's optimizing. Nor, as we'll discuss later, do concerns around deceit, situational awareness, or gradient hacking apply to it. At this point, it's myopic, tool AI: it doesn't know or care whether we or the material world even exist, it only cares about the distribution of sequences of tokens, and all it does is repeatedly contextually generate a guess of the next token.
So it plays madlibs like a professional gambler, in the same blindly monomaniacal sense that a chess machine plays chess like a grandmaster. By itself, the only risk from it is the possibility that someone else might hack your computer network to steal its weights, and what they might then do with it. Once you're done training the stage, you have a base model. Now you can flip its switch, tell the stage the title of a play, or better the first ...
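Since the passage leans on the claim that a base model's training objective is simply "guess next tokens right", a minimal sketch of that standard next-token cross-entropy objective may help make it concrete. This is an illustration of the generic objective only, not the training code of any particular model; the function name, tensor names, and shapes are illustrative assumptions. In Python with PyTorch:

import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size) model outputs; tokens: (batch, seq_len) token ids.
    # The model is scored only on how well each position t predicts the token at position t+1.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)  # average negative log-likelihood per token

# Toy usage: random "model outputs" stand in for a real LLM forward pass.
vocab_size, batch, seq_len = 100, 2, 16
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))
print(next_token_loss(logits, tokens))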
Jan 10, 2024 • 8min

EA - Copenhagen Consensus Center's best investment papers for the sustainable development goals by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Copenhagen Consensus Center's best investment papers for the sustainable development goals, published by Vasco Grilo on January 10, 2024 on The Effective Altruism Forum. This is a linkpost to Copenhagen Consensus Center's 12 best investment papers for the sustainable development goals (SDGs), which were published in the Journal of Benefit-Cost Analysis in 2023. Some notes: Each paper does a cost-benefit analysis which accounts for health and economic benefits. The benefit-to-cost ratios across the 12 papers range from 18 (nutrition) to 125 (e-Government procurement). All 12 ratios are much higher than the 2.4 estimated for GiveDirectly's cash transfers to poor households in Kenya. 4 are similar to and 8 are higher than GiveWell's cost-effectiveness bar of around 24 (= 10*2.4), equal to 10 times the above. Cash transfers are often preferred due to being highly scalable, but the 12 papers deal with large investments too. As can be seen in the table below, taken from a companion post, all 12 interventions together have: An annual cost of 41 G 2020-$ (41 billion 2020 USD). Annual benefits of 2.14 T 2020-$ (2.14 trillion 2020 USD), of which 1.12 T 2020-$ are economic benefits corresponding to 14.6 % (= 1.12*1.13/(8.17 + 0.528)) of the gross domestic product (GDP) of low and lower-middle income countries in 2022. A benefit-to-cost ratio of 52.2 (= 2.14/0.041), 21.8 (= 52.2/2.4) times that of GiveDirectly's cash transfers to poor households in Kenya. I expect the benefit-to-cost ratios of the papers to be overestimates: The paper on malaria estimates a ratio of 48, whereas I infer GiveWell's is: 35.5 (= 14.8*2.4) for the Against Malaria Foundation (AMF), considering the mean cost-effectiveness across 8 countries of 14.8 times that of cash transfers. 40.8 (= 17.0*2.4) for the Malaria Consortium, considering the mean cost-effectiveness across 13 countries of 17.0 times that of cash transfers. The paper on malaria studies an annual investment of 1.1 G 2020-$, whereas GiveWell's estimates concern marginal donations. Consequently, assuming diminishing marginal returns, and that GiveWell's estimates are more accurate, the ratio from the paper on malaria is a significant overestimate. I guess the same reasoning applies to other areas. I think 3 of the papers focus on areas which have not been funded by either GiveWell or Open Philanthropy[2]: e-Government procurement (benefit-to-cost ratio of 125). Trade (95). Land tenure security (21). As a side note, I wonder why GiveWell's (marginal) cost-effectiveness estimates do not roughly match its bar of 10 times that of cash transfers. Agricultural research and development Paper: Benefit-Cost Analysis of Increased Funding for Agricultural Research and Development in the Global South. Benefit-to-cost ratio: 33. Investment: Basic research and development, including capacity building, and technical and policy support with special focus on Low- and Lower Middle-Income countries. Research outcomes are difficult to predict, but an example could be crop yield increases using precision genetic technologies. Childhood immunization Paper: SDG Halftime Project: Benefit-Cost Analysis using Methods from the Decade of Vaccine Economics (DOVE) Model. Benefit-to-cost ratio: 101.
Investment: Raise immunization coverage from 2022 levels to 2030 target for pentavalent vaccine, HPV, Japanese encephalitis, measles, measles-rubella, Men A, PCV, rotavirus, and yellow fever. Maternal and newborn health Paper: Achieving maternal and neonatal mortality development goals effectively: A cost-benefit analysis. Benefit-to-cost ratio: 87. Investment: Sufficient staff and resources at all birth facilities to deliver a package of basic emergency obstetric and newborn care and family planning services, including bag and mask for neonatal resuscitation, removal of retained products of...
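For readers who want to check the headline arithmetic quoted above, here is a minimal sketch in Python that reproduces the post's ratio calculations. All figures are taken from the post as given (including the 1.13 factor and the 8.17 + 0.528 trillion-dollar GDP denominator, which the post does not unpack); the variable names are mine.

# Reproduce the benefit-to-cost arithmetic quoted in the post (figures as stated there).
annual_cost = 0.041        # 41 G 2020-$, expressed in trillions
annual_benefits = 2.14     # 2.14 T 2020-$
economic_benefits = 1.12   # 1.12 T 2020-$
cash_transfer_ratio = 2.4  # GiveDirectly benefit-to-cost ratio

overall_ratio = annual_benefits / annual_cost            # ~52.2
multiple_of_cash = overall_ratio / cash_transfer_ratio   # ~21.8

# GiveWell-style malaria figures: multiples of cash transfers converted to benefit-to-cost ratios.
amf_ratio = 14.8 * cash_transfer_ratio                   # ~35.5
malaria_consortium_ratio = 17.0 * cash_transfer_ratio    # ~40.8

# Economic benefits as a share of low and lower-middle income country GDP in 2022.
gdp_share = economic_benefits * 1.13 / (8.17 + 0.528)    # ~14.6%

print(round(overall_ratio, 1), round(multiple_of_cash, 1), round(amf_ratio, 1),
      round(malaria_consortium_ratio, 1), f"{gdp_share:.1%}")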
Jan 10, 2024 • 4min

EA - Call for Expressions of Interest: NYU Wild Animal Welfare Summit by Sofia Fogel

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Call for Expressions of Interest: NYU Wild Animal Welfare Summit, published by Sofia Fogel on January 10, 2024 on The Effective Altruism Forum. Deadline: March 1, 2024 (followed by rolling submissions) Event: June 21-22, 2024 Location: New York University, New York, NY The NYU Wild Animal Welfare Program is hosting a two-day wild animal welfare summit on June 21-22, 2024. The aim of this event is to connect scholars with an interest in this topic, particularly scholars across a variety of fields and career stages. The first day of the summit will feature lightning talks and discussion sessions. The second day will feature breakout sessions for workshopping collaborative project ideas. Both days will also include vegan meals and plenty of networking opportunities. We welcome expressions of interest from scholars in all fields, particularly scholars who work in animal welfare or conservation science. Please note that funding for travel and hotel is available for early-career scholars, i.e., scholars within five years of their terminal degree. If you are interested in attending this summit, please send the materials below to Sofia Fogel at sofia.fogel@nyu.edu. We guarantee full consideration of all submissions received by March 1, 2024. We will also consider submissions received after that date on a rolling basis. Please include in your expression of interest: A CV or resume. A statement of interest with three elements: A short summary of your current research, your expected future research, and how your research relates to wild animal welfare. (500 words max.) (Optional) If you have ideas for collaborative research projects that you might like to discuss at this summit, please describe them. (250 words max.) (Optional) If you might like to give a lightning talk about your current or future research, please suggest a topic or set of topics. (250 words max.) Please note that if you answer questions (b) and (c), your answers can range from general (e.g., "Researching the effects of wildlife corridors on different kinds of species") to specific (e.g., "Measuring the effects of a new wildlife corridor in Yellowstone National Park on the movement of elk populations."). Please also note that answering these questions does not commit you to discussing your ideas or presenting your work at the event. Topics that we see as within scope for this summit include, but are not limited to: How can we assess wild animal welfare at individual and population levels? How can we make welfare comparisons within and across species? What are the most common causes of morbidity and mortality for wild animals, and how do they vary within and across species? How does the project of improving wild animal welfare interact with the project of conserving species and ecosystems? What are the costs and benefits of different kinds of population control for different individuals, species, and ecosystems? How can we support individuals, species, and ecosystems in adapting to human-caused climate change and other such environmental changes? How can we support coordination and collaboration between scholars who work in animal welfare and environmental conservation, among other areas? How can we educate advocates, policymakers, and the general public about the relationship between human, animal, and environmental protection?
If you are interested in these or related topics, we would love to hear from you! If you have any questions, feel free to contact Sofia Fogel at sofia.fogel@nyu.edu. Thank you to Animal Charity Evaluators and Open Philanthropy for your generous support of this program and event. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 10, 2024 • 3min

EA - Why can't we accept the human condition as it existed in 2010? by Hayven Frienby

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why can't we accept the human condition as it existed in 2010?, published by Hayven Frienby on January 10, 2024 on The Effective Altruism Forum. By this, I mean a world in which: Humans remain the dominant intelligent, technological species on Earth's landmasses for a long period of time (> ~10,000 years). AGI is never developed, or it gets banned / limited in the interests of human safety. AI never has much social or economic impact. Narrow AI never advances much beyond where it is today, or it becomes banned / limited in the interests of human safety. Mind uploading is impossible or never pursued. Life extension (beyond modest gains due to modern medicine) isn't possible, or is never pursued. Any form of transhumanist initiatives are impossible or never pursued. No contact is made with alien species or extraterrestrial AIs, no greater-than-human intelligences are discovered anywhere in the universe. Every human grows, peaks, ages, and passes away within ~100 years of their birth, and this continues for the remainder of the human species' lifetime. Most other EAs I've talked to have indicated that this sort of future is suboptimal, undesirable, or best avoided, and this seems to be a widespread position among AI researchers as well (1). Even MIRI founder Eliezer Yudkowsky, perhaps the most well-known AI abolitionist outside of EA circles, wouldn't go as far as to say that AGI should never be developed, and that transhumanist projects should never be pursued (2). And he isn't alone -- there are many, many researchers both within and outside of the EA community with similar views on P(extinction) and P(societal collapse), and they still wouldn't accept the idea that the human condition should never be altered via technological means. My question is why can't we just accept the human condition as it existed before smarter-than-human AI (and fundamental alterations to our nature) were considered to be more than pure fantasy? After all, the best way to stop a hostile, unaligned AI is to never invent it in the first place. The best way to avoid the destruction of future value by smarter-than-human artificial intelligence is to avoid obsession with present utility and convenience. So why aren't more EA-aligned organizations and initiatives (other than MIRI) presenting global, strictly enforced bans on advanced AI training as a solution to AI-generated x-risk? Why isn't there more discussion of acceptance (of the traditional human condition) as an antidote to the risks of AGI, rather than relying solely on alignment research and safety practices to provide a safe path forward for AI (I'm not convinced such a path exists)? Let's leave out the considerations of whether AI development can be practically stopped at this stage, and just focus more on the philosophical issues here. References: Katya_Grace (EA Forum Poster) (2024, January 5). Survey of 2,778 AI authors: six parts in pictures. Yudkowsky, E. S. (2023, March 29). The only way to deal with the threat from AI? Shut it down. Time. https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/ Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 10, 2024 • 1h 8min

EA - William MacAskill, "The Definition of Effective Altruism" (Japanese translation) by EA Japan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: William MacAskill, "The Definition of Effective Altruism" (Japanese translation), published by EA Japan on January 10, 2024 on The Effective Altruism Forum. This is a Japanese translation of William MacAskill, 'The Definition of Effective Altruism' available at MacAskill's website. Translated by 清水颯 (Hayate Shimizu, link to his Researchmap). Today, the world faces many problems. More than 750 million people live on less than $1.90 per day (at purchasing power parity)[1]. Around 6 million children die every year from easily preventable causes such as malaria, diarrhea, and pneumonia[2]. Climate change is expected to wreak havoc on the environment and cause trillions of dollars of economic damage[3]. A third of women worldwide have suffered sexual or physical violence[4]. More than 3,000 nuclear warheads around the world are kept on high alert, ready for use at short notice[5]. Bacteria are becoming resistant to antibiotics[6]. Partisanship is intensifying, and democracy may be in decline[7]. Given how many problems the world faces, and how serious they are, surely we have a responsibility to do something about them. But what should we do? There are countless problems we could work on, and many different ways of working on each of them. Moreover, our resources are limited, so neither as individuals nor as a globe can we solve all of these problems at once. We therefore have to decide how to allocate the resources we have. But on what basis should we make such decisions? As a result, the effective altruism community has contributed to major achievements in reducing global catastrophic risk, in farm animal welfare, and in global health. In 2016 alone, the effective altruism community protected 6.5 million children from malaria by providing long-lasting insecticide-treated bednets, spared 360 million hens from life in cages, and provided significant momentum and support for the development of technical AI safety as a mainstream area of machine learning research[13]. The movement has also had a substantial influence on academic discussion. Books on the topic include Peter Singer's The Most Good You Can Do and my own Doing Good Better[14], and academic papers endorsing or criticizing effective altruism have appeared in Philosophy and Public Affairs, Utilitas, the Journal of Applied Philosophy, Ethical Theory and Moral Practice, and other publications[15]. A volume of Essays in Philosophy has been devoted to the topic, and the Boston Review has published essays on effective altruism by academics[16]. However, to have a meaningful academic discussion of effective altruism, we need to agree on what we are talking about. To help with this, in this chapter I present the Centre for Effective Altruism's definition, explain why the Centre chose that definition, and offer a precise philosophical interpretation of it. I believe that this understanding of effective altruism, which is widely endorsed within the effective altruism community, is far removed from the understanding held by much of the general public and by many critics of effective altruism. After explaining why I prefer this definition, I take the opportunity to correct some widely circulated misconceptions about effective altruism. Before beginning, it is important to note that in defining "effective altruism" I am not trying to explain some fundamental aspect of morality. In empirical fields of inquiry, we can distinguish between science and engineering. Science tries to discover general truths about the world we live in. Engineering uses scientific understanding to design and build structures and systems that are useful to society. A similar distinction can be drawn in moral philosophy. Typically, moral philosophy aims to discover general truths about the nature of morality. This corresponds to normative science. But there is also a part of moral philosophy that corresponds to engineering: for example, one can create new moral concepts which, if widely adopted by society, would improve the world. Defining "effective altruism" is an engineering problem, not a matter of explaining a fundamental aspect of morality. From this perspective, I propose two main requirements that the definition should satisfy. The first is that it should fit the actual practice of those currently said to be engaged in effective altruism, and the understanding of effective altruism held by the community's leaders. The second is that the concept should have as much public value as possible: that is, a concept broad enough to be endorsed by, and useful to, a wide range of moral views, yet narrow enough that its users do more to make the world better than they would otherwise have done. This is, of course, a balancing act.
1. Earlier definitions of effective altruism The term "effective altruism" was coined on December 3, 2011, through a democratic process involving seventeen people, in the course of setting up the Centre for Effective Altruism[17]. No official definition of the term was introduced, however. Over the years, effective altruism has been defined in various ways by various people. Here are some examples: To us, "effective altruism" means trying to do as much good as possible with each dollar and each hour that we have[18]. Effective altruism is about asking "How can I make the biggest difference I can?" and using evidence and careful reasoning to find an answer[19]. Effective altruism is based on a very simple idea: we should do the most good we can [...] Living a minimally acceptable ethical life involves using a substantial part of one's surplus resources to make the world a better place. Living a fully ethical life involves doing the most good one can[20]. Effective altruism is a research field which uses high-quality evidence and careful reasoning to work out how to help others as much as possible. It is also a community of people who take those answers seriously and devote their efforts to the most promising solutions to the world's most pressing problems[21]. Effective altruism is a philosophy and social movement that uses evidence and reason to determine the most effective ways to benefit others[22]. These definitions have several things in common[23]. They all invoke the idea of maximization, and they all talk about achieving some sort of value, whether that is promoting wellbeing or just doing good in general. But there are also differences. Definitions (1) to (3) talk about "doing good", whereas definitions (4) and (5) talk about "helping others" or "benefiting others". Unlike the others, (3) makes effective altruism a normative claim rather than a non-normative project such as an activity, research field, or movement. Definitions (2), (4), and (5) invoke the idea of using evidence and careful reasoning, whereas definitions (1) and (3) do not. The Centre for Effective Altruism's definition takes a stand on each of these points, defining effective altruism as follows: Effective altruism is about using evidence and reason to figure out how to benefit others as much as possible, and taking action on that basis[24]. I led the drafting of this definition, with input from many advisors in the effective altruism community and a great deal of help from Julia Wise and Rob Bensinger. The definition, together with a set of guiding values that accompanies it, has been formally endorsed by the large majority of leaders in the effective altruism community[25]. There is no "official" definition of effective altruism, but the Centre's definition comes closer than any other. However, because this statement of effective altruism is aimed at a general rather than a philosophical audience, some precision has been sacrificed for accessibility. So here I want to give a more precise formulation and then explain the definition in detail. My definition is as follo...
Jan 9, 2024 • 5min

LW - Learning Math in Time for Alignment by NicholasKross

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Learning Math in Time for Alignment, published by NicholasKross on January 9, 2024 on LessWrong. Epistemic status: Strong hunches, weakly held. At least some of this could be found false in experiments. If you want to do technical AI alignment research, you'll need some amount of non-trivial math knowledge. It may be more theoretical, or with more ML/biology grounding, but it'll definitely be math. How do you learn all this math? "Self-teaching" is almost a misnomer, compared to just "learning". I don't need to distill something for others, I only need myself to grok it. I may use distillation or adjacent techniques to help myself grok it, but like any N=1 self-experiment, it only needs to work for me. [1] So then... what helps me understand things? Formal rules that are written precisely Wordy concepts that one could use in an essay Math is technically the former, but real mathematicians (even the great ones!) actually use it more like the latter. That is, they use a lot of "intuition" built up over time. You can't survive on intuition alone (unless you have the genetic improbability of Ramanujan's brain). And you can't survive on rigor alone (according to all bounded human minds doing math research). Heck, even learning rigorously/boring is nontrivial (since e.g. small errors are harder to correct when you're learning an alien system). The Mathopedia concept is, in many ways, the "wordy" version. Viliam notes that math's "hardness" (i.e. objectivity) means you can't just teach it in the wordy version. After all, there is generally one real canonical definition for a mathematical object. And yet... both Viliam and Yudkowsky say that math is fun when you know what you're doing. I kind of agree! I've had fun doing (what seemed like) math, at least twice in my life! OK, so it's simple! Just make sure to understand everything thoroughly before moving to the next thing, and "play with the ideas" to understand them better. Except... there's a problem. AI timelines. Giving children quality tutoring and new K-12 curricula won't work even if we have 20 years before existentially-risky AI is used. 5 years is almost reasonable to learn deeply about a subfield or two, enough to make original contributions. AI alignment, if it involves enough math to justify this post, requires deeper-than-average understanding, and possibly an ability to create entirely new mathematics. And timelines might be as short as a year or two. [2] Tangent (for large grantmakers and orgs only) Why didn't MIRI or other groups prepare for this moment earlier? Why didn't MIRI say "OK, we have $X to fund researchers, and $Y left over, so let's put $Z towards hedging our short-timelines bets. We can do that using human enhancement and/or in-depth teaching of the relevant hard (math) parts. Let's do that now!"? I think it's something like... MIRI had pre-ML-calibrated short timelines. Now they have post-ML short timelines. In both cases, they wouldn't think "sharpening the saw"-type strategies worthwhile. And if short timelines are true now, then it's too late to use them. Luckily, insofar as AI governance does anything, we can get longer timelines. And insofar as you (a large grantmaker or org with funds/resources to spare on hedging your timeline scenarios) have enough money to hedge your timeline bets, you should fund and/or set up such longer-term programs. 
If you put 80% credence in 5-year timelines, but you also control $100 million in funding (e.g. you're OpenPhil), then you should be doing math-learning and intelligence enhancement programs! The Challenge So clearly, a person needs to be able to get deep understanding of lots of math (in backchaining-resistant worlds, that means lots of math). Within a year or two. In time to, and with the depth needed to, come up with new good ideas. This is the chal...
