The Nonlinear Library

The Nonlinear Fund
Jan 10, 2024 • 14min

LW - Does AI risk "other" the AIs? by Joe Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does AI risk "other" the AIs?, published by Joe Carlsmith on January 10, 2024 on LessWrong. (Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app. This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a brief summary of the essays that have been released thus far.) In my last essay, I discussed the way in which what I've called "deep atheism" (that is, a fundamental mistrust towards both "Nature" and "bare intelligence") can prompt an aspiration to exert extreme levels of control over the universe; I highlighted the sense in which both humans and AIs, on Yudkowsky's AI risk narrative, are animated by this sort of aspiration; and I discussed some ways in which our civilization has built up wariness around control-seeking of this kind. I think we should be taking this sort of wariness quite seriously. In this spirit, I want to look, in this essay, at Robin Hanson's critique of the AI risk discourse - a critique especially attuned to the way in which this discourse risks control-gone-wrong. In particular, I'm interested in Hanson's accusation that AI risk "others" the AIs (see e.g. here, here, and here). Hearing the claim that AIs may eventually differ greatly from us, and become very capable, and that this could possibly happen fast, tends to invoke our general fear-of-difference heuristic. Making us afraid of these "others" and wanting to control them somehow ... "Hate" and "intolerance" aren't overly strong terms for this attitude.[1] Hanson sees this vice as core to the disagreement ("my best one-factor model to explain opinion variance here is this: some of us 'other' the AIs more"). And he invokes a deep lineage of liberal ideals in opposition. I think he's right to notice a tension in this vicinity. AI risk is, indeed, about fearing some sort of uncontrolled other. But is that always the bad sort of "othering"? Some basic points up front Well, let's at least avoid basic mistakes/misunderstandings. For one: hardcore AI risk folks like Yudkowsky are generally happy to care about AI welfare - at least if welfare means something like "happy sentience." And pace some of Hanson's accusations of bio-chauvinism, these folks are extremely not fussed about the fact that AI minds are made of silicon (indeed: come now). Of course, this isn't to say that AI welfare (and AI rights) issues don't get complicated (see e.g. here and here for a glimpse of some of the complications), or that humanity as a whole will get the "digital minds matter" stuff right. Indeed, I worry that we will get it horribly wrong - and I do think that the AI risk discourse under-attends to some of the tensions. But species-ism 101 (201?) - e.g., "I don't care about digital suffering" - isn't AI risk's vice. For two: clearly some sorts of otherness warrant some sorts of fear. For example: maybe you, personally, don't like to murder. But Bob, well: Bob is different. If Bob gets a bunch of power, then: yep, it's OK to hold your babies close. And often OK, too, to try to "control" Bob into not-killing-your-babies. Cf., also, the discussion of getting-eaten-by-bears in the first essay. And the Nazis, too, were different in their own way.
Of course, there's a long and ongoing history of mistaking "different" for "the type of different that wants to kill your babies." We should, indeed, be very wary. But liberal tolerance has never been a blank check; and not all fear is hatred. Indeed, many attempts to diagnose the ethical mistake behind various canonical difference-related vices (racism, sexism, species-ism, etc.) reveal a certain shallowness of commitment to difference-per-se. In particular: such vices are often understood as missing...
Jan 10, 2024 • 5min

LW - Saving the world sucks by Defective Altruism

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Saving the world sucks, published by Defective Altruism on January 10, 2024 on LessWrong. I don't want to save the world. I don't want to tile the universe with hedonium. I don't want to be cuckolded by someone else's pretty network-TV values. I don't want to do anything I don't want to do, and I think that's what (bad) EAs, Mother Teresa, and proselytizing Christians all get wrong. Doing things because they sound nice and pretty and someone else says they're morally good suuucks. Who even decided that warm fuzzies, QALYs, or shrimp lives saved are even good axes to optimize? Because surely everyone doesn't arrive at that conclusion independently. Optimizing such universally acceptable, bland metrics makes me feel like one of those blobby, soulless corporate automata in bad tech advertisements. I don't see why people obsess over the idea of universal ethics and doing the prosocial thing. There's no such thing as the Universal Best Thing, and professing the high virtue of maximizing happiness smacks of an over-RLHFed chatbot. Altruism might be a "virtue", as in most people's evolved and social environments cause them to value it, but it doesn't have to be. The cosmos doesn't care what values you have. Which totally frees you from the weight of "moral imperatives" and social pressures to do the right thing. There comes a time in most conscientious, top-of-distribution kids' lives when they decide to Save the World. This is very bad. Unless they really do get a deep, intrinsic satisfaction from maximizing expected global happiness, they'll be in for a world of pain later on. After years of spinning their wheels, not getting anywhere, they'll realize that they hate the whole principle they've built their life around. That, deep down, their truest passion doesn't (and doesn't have to) involve the number of people suffering malaria, the quantity of sentient shrimps being factory farmed, or how many trillion people could be happy in a way they aren't 1000 years from now. I claim that scope insensitivity isn't a bug. That there are no bugs when it comes to values. That you should care about exactly what you want to care about. That if you want to team up and save the world from AI or poverty or mortality, you can, but you don't have to. You have the freedom to care about whatever you want and shouldn't feel social guilt for not liking the same values everyone else does. Their values are just as meaningful (or meaningless) as yours. Peer pressure is an evolved strategy to elicit collaboration in goofy mesa-optimizers like humans, not an indication of some true higher virtue. Life is complex, and I really doubt that what you should care about can be boiled down to something so simple as quality-adjusted life-years. I doubt it can be boiled down at all. You should care about whatever you care about, and that probably won't fit any neat moral templates an online forum hands you. It'll probably be complex, confused, and logically inconsistent, and I don't think that's a bad thing. Why do I care about this so much? Because I got stuck in exactly this trap at the ripe old age of 12, and it fucked me up good. I decided I'd save the world, because a lot of very smart people on a very cool site said that I should. That it would make me feel good and be good. That it mattered. The result? Years of guilt, unproductivity, and apathy.
Ending up a moral zombie that didn't know how to care and couldn't feel emotion. Wondering why enlightenment felt like hell. If some guy promised to send you to secular heaven if you just let him fuck your wife, you'd tell him to hit the road. But people jump straight into the arms of this moral cuckoldry. Choosing and caring about your values is a very deep part of human nature and identity, and you shouldn't let someone else do it for you. This advice proba...
Jan 10, 2024 • 7min

LW - On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche by Zack M Davis

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche, published by Zack M Davis on January 10, 2024 on LessWrong. Rob Bensinger argues that "ITT-passing and civility are good; 'charity' is bad; steelmanning is niche". The ITT - Ideological Turing Test - is an exercise in which one attempts to present one's interlocutor's views as persuasively as the interlocutor themselves can, coined by Bryan Caplan in analogy to the Turing Test for distinguishing between humans and intelligent machines. (An AI that can pass as human must presumably possess human-like understanding; an opponent of an idea that can pass as an advocate for it presumably must possess an advocate's understanding.) "Steelmanning" refers to the practice of addressing a stronger version of an interlocutor's argument, coined in disanalogy to "strawmanning", the crime of addressing a weaker version of an interlocutor's argument in the hopes of fooling an audience (or oneself) that the original argument has been rebutted. Bensinger describes steelmanning as "a useful niche skill", but thinks it isn't "a standard thing you bring out in most arguments." Instead, he writes, discussions should be structured around object-level learning, trying to pass each other's Ideological Turing Test, or trying to resolve cruxes. I think Bensinger has it backwards: the Ideological Turing Test is a useful niche skill, but it doesn't belong on a list of things to organize a discussion around, whereas something like steelmanning naturally falls out of object-level learning. Let me explain. The ITT is a test of your ability to model someone else's models of some real-world phenomena of interest. But usually, I'm much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else's models of it. I couldn't pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems ... basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don't have the time. Or think of it this way: as a selfish seeker of truth speaking to another selfish seeker of truth, when would I want to try to pass my interlocutor's ITT, or want my interlocutor to try to pass my ITT? In the "outbound" direction, I'm not particularly selfishly interested in passing my interlocutor's ITT because, again, I usually don't care much about other people's beliefs, as contrasted to the reality that those beliefs are reputedly supposed to track. I listen to my interlocutor hoping to learn from them, but if some part of what they say seems hopelessly wrong, it doesn't seem profitable to pretend that it isn't until I can reproduce the hopeless wrongness in my own words. Crucially, the same is true in the "inbound" direction. I don't expect people to be able to pass my ITT before criticizing my ideas.
That would make it harder for people to inform me about flaws in my ideas! But if I'm not particularly interested in passing my interlocutor's ITT or in my interlocutor passing mine, and my interlocutor presumably (by symmetry) feels the same way, why would we bother? All this having been said, I absolutely agree that, all else being equal, the ability to pass ITTs is desirable. It's useful as a check that you and your interlocutor are successfully communicating, rather than talking past each other. I...
Jan 10, 2024 • 56min

LW - Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer - a New Metaphor by RogerDearnaley

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer - a New Metaphor, published by RogerDearnaley on January 10, 2024 on LessWrong. Thanks to Quentin FEUILLADE--MONTIXI for the discussion in which we came up with this idea together, and for feedback on drafts. TL;DR A better metaphor for how LLMs behave, how they are trained, and particularly for how to think about the alignment strengths and challenges of LLM-powered agents. This is informed by simulator theory - hopefully people will find it more detailed, specific, and helpful than the old shoggoth metaphor. Humans often think in metaphors. A good metaphor can provide a valuable guide to intuition, or a bad one can mislead it. Personally I've found the shoggoth metaphor for LLMs rather useful, and it has repeatedly helped guide my thinking (as long as one remembers that the shoggoth is a shapeshifter, and thus a very contextual beast). However, as posts like Why do we assume there is a "real" shoggoth behind the LLM? Why not masks all the way down? make clear, not everyone finds this metaphor very helpful (my reaction was "Of course it's masks all the way down - that's what the eyes symbolize! It's made of living masks: masks of people."). Which admittedly doesn't match H.P. Lovecraft's description; perhaps it helps to have spent time playing around with base models in order to get to know the shoggoth a little better (if you haven't, I recommend it). So, I thought I'd try to devise a more useful and detailed metaphor, one that was a better guide for intuition, especially for alignment issues. During a conversation with Quentin FEUILLADE--MONTIXI we came up with one together (the stage and its animatronics were my suggestions, the puppeteer was his, and we tweaked it together). I'd like to describe this, in the hope that other people find it useful (or else that they rewrite it until they find one that works better for them). Along the way, I'll show how this metaphor can help illuminate a number of LLM behaviors and alignment issues, some well known, and others that seem to be less widely-understood. A Base Model: The Stage and its Animatronics A base-model LLM is like a magic stage. You construct it, then you read it or show it (at enormous length) a large proportion of the internet, and if you wish also books, scientific papers, images, movies, or whatever else you want. The stage is inanimate: it's not agentic, it's goal agnostic (well, unless you want to consider 'contextually guess the next token' to be a goal, but it's not going to cheat by finding a way to make the next token more predictable, because that wasn't possible during its training and it's not agentic enough to be capable of conceiving that that might even be possible outside it). No Reinforcement Learning (RL) was used in its training, so concerns around Outer Alignment don't apply to it - we know exactly what its training objective was: guess next tokens right, just as we intended. We now even have some mathematical idea of what it's optimizing. Nor, as we'll discuss later, do concerns around deceit, situational awareness, or gradient hacking apply to it. At this point, it's myopic, tool AI: it doesn't know or care whether we or the material world even exist, it only cares about the distribution of sequences of tokens, and all it does is repeatedly contextually generate a guess of the next token.
So it plays madlibs like a professional gambler, in the same blindly monomaniacal sense that a chess machine plays chess like a grandmaster. By itself, the only risk from it is the possibility that someone else might hack your computer network to steal its weights, and what they might then do with it. Once you're done training the stage, you have a base model. Now you can flip its switch, tell the stage the title of a play, or better the first ...
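Since the passage leans on the claim that a base model's training objective is simply "guess next tokens right", a minimal sketch of that standard next-token cross-entropy objective may help make it concrete. This is an illustration of the generic objective only, not the training code of any particular model; the function name, tensor names, and shapes are illustrative assumptions. In Python with PyTorch:

import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size) model outputs; tokens: (batch, seq_len) token ids.
    # The model is scored only on how well each position t predicts the token at position t+1.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)  # average negative log-likelihood per token

# Toy usage: random "model outputs" stand in for a real LLM forward pass.
vocab_size, batch, seq_len = 100, 2, 16
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))
print(next_token_loss(logits, tokens))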
Jan 10, 2024 • 8min

EA - Copenhagen Consensus Center's best investment papers for the sustainable development goals by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Copenhagen Consensus Center's best investment papers for the sustainable development goals, published by Vasco Grilo on January 10, 2024 on The Effective Altruism Forum. This is a linkpost to Copenhagen Consensus Center's 12 best investment papers for the sustainable development goals (SDGs), which were published in the Journal of Benefit-Cost Analysis in 2023. Some notes: Each paper does a cost-benefit analysis which accounts for health and economic benefits. The benefit-to-cost ratios across the 12 papers range from 18 (nutrition) to 125 (e-Government procurement). All 12 ratios are much higher than the 2.4 estimated for GiveDirectly's cash transfers to poor households in Kenya. 4 are similar to and 8 are higher than GiveWell's cost-effectiveness bar of around 24 (= 10*2.4), equal to 10 times the above. Cash transfers are often preferred due to being highly scalable, but the 12 papers deal with large investments too. As can be seen in the table below, taken from a companion post, all 12 interventions together have: An annual cost of 41 G 2020-$ (41 billion 2020 USD). Annual benefits of 2.14 T 2020-$ (2.14 trillion 2020 USD), of which 1.12 T 2020-$ are economic benefits corresponding to 14.6 % (= 1.12*1.13/(8.17 + 0.528)) of the gross domestic product (GDP) of low and lower-middle income countries in 2022. A benefit-to-cost ratio of 52.2 (= 2.14/0.041), 21.8 (= 52.2/2.4) times that of GiveDirectly's cash transfers to poor households in Kenya. I expect the benefit-to-cost ratios of the papers to be overestimates: The paper on malaria estimates a ratio of 48, whereas I infer GiveWell's is: 35.5 (= 14.8*2.4) for the Against Malaria Foundation (AMF), considering the mean cost-effectiveness across 8 countries of 14.8 times that of cash transfers. 40.8 (= 17.0*2.4) for the Malaria Consortium, considering the mean cost-effectiveness across 13 countries of 17.0 times that of cash transfers. The paper on malaria studies an annual investment of 1.1 G 2020-$, whereas GiveWell's estimates concern marginal donations. Consequently, assuming diminishing marginal returns, and that GiveWell's estimates are more accurate, the ratio from the paper on malaria is a significant overestimate. I guess the same reasoning applies to other areas. I think 3 of the papers focus on areas which have not been funded by either GiveWell or Open Philanthropy[2]: e-Government procurement (benefit-to-cost ratio of 125). Trade (95). Land tenure security (21). As a side note, I wonder why GiveWell's (marginal) cost-effectiveness estimates do not roughly match its bar of 10 times that of cash transfers. Agricultural research and development Paper: Benefit-Cost Analysis of Increased Funding for Agricultural Research and Development in the Global South. Benefit-to-cost ratio: 33. Investment: Basic research and development, including capacity building, and technical and policy support with special focus on Low- and Lower Middle-Income countries. Research outcomes are difficult to predict, but an example could be crop yield increases using precision genetic technologies. Childhood immunization Paper: SDG Halftime Project: Benefit-Cost Analysis using Methods from the Decade of Vaccine Economics (DOVE) Model. Benefit-to-cost ratio: 101.
Investment: Raise immunization coverage from 2022 levels to 2030 target for pentavalent vaccine, HPV, Japanese encephalitis, measles, measles-rubella, Men A, PCV, rotavirus, and yellow fever. Maternal and newborn health Paper: Achieving maternal and neonatal mortality development goals effectively: A cost-benefit analysis. Benefit-to-cost ratio: 87. Investment: Sufficient staff and resources at all birth facilities to deliver a package of basic emergency obstetric and newborn care and family planning services, including bag and mask for neonatal resuscitation, removal of retained products of...
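For readers who want to check the headline arithmetic quoted above, here is a minimal sketch in Python that reproduces the post's ratio calculations. All figures are taken from the post as given (including the 1.13 factor and the 8.17 + 0.528 trillion-dollar GDP denominator, which the post does not unpack); the variable names are mine.

# Reproduce the benefit-to-cost arithmetic quoted in the post (figures as stated there).
annual_cost = 0.041        # 41 G 2020-$, expressed in trillions
annual_benefits = 2.14     # 2.14 T 2020-$
economic_benefits = 1.12   # 1.12 T 2020-$
cash_transfer_ratio = 2.4  # GiveDirectly benefit-to-cost ratio

overall_ratio = annual_benefits / annual_cost            # ~52.2
multiple_of_cash = overall_ratio / cash_transfer_ratio   # ~21.8

# GiveWell-style malaria figures: multiples of cash transfers converted to benefit-to-cost ratios.
amf_ratio = 14.8 * cash_transfer_ratio                   # ~35.5
malaria_consortium_ratio = 17.0 * cash_transfer_ratio    # ~40.8

# Economic benefits as a share of low and lower-middle income country GDP in 2022.
gdp_share = economic_benefits * 1.13 / (8.17 + 0.528)    # ~14.6%

print(round(overall_ratio, 1), round(multiple_of_cash, 1), round(amf_ratio, 1),
      round(malaria_consortium_ratio, 1), f"{gdp_share:.1%}")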
Jan 10, 2024 • 4min

EA - Call for Expressions of Interest: NYU Wild Animal Welfare Summit by Sofia Fogel

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Call for Expressions of Interest: NYU Wild Animal Welfare Summit, published by Sofia Fogel on January 10, 2024 on The Effective Altruism Forum. Deadline: March 1, 2024 (followed by rolling submissions) Event: June 21-22, 2024 Location: New York University, New York, NY The NYU Wild Animal Welfare Program is hosting a two-day wild animal welfare summit on June 21-22, 2024. The aim of this event is to connect scholars with an interest in this topic, particularly scholars across a variety of fields and career stages. The first day of the summit will feature lightning talks and discussion sessions. The second day will feature breakout sessions for workshopping collaborative project ideas. Both days will also include vegan meals and plenty of networking opportunities. We welcome expressions of interest from scholars in all fields, particularly scholars who work in animal welfare or conservation science. Please note that funding for travel and hotel is available for early-career scholars, i.e., scholars within five years of their terminal degree. If you are interested in attending this summit, please send the materials below to Sofia Fogel at sofia.fogel@nyu.edu. We guarantee full consideration of all submissions received by March 1, 2024. We will also consider submissions received after that date on a rolling basis. Please include in your expression of interest: A CV or resume. A statement of interest with three elements: A short summary of your current research, your expected future research, and how your research relates to wild animal welfare. (500 words max.) (Optional) If you have ideas for collaborative research projects that you might like to discuss at this summit, please describe them. (250 words max.) (Optional) If you might like to give a lightning talk about your current or future research, please suggest a topic or set of topics. (250 words max.) Please note that if you answer questions (b) and (c), your answers can range from general (e.g., "Researching the effects of wildlife corridors on different kinds of species") to specific (e.g., "Measuring the effects of a new wildlife corridor in Yellowstone National Park on the movement of elk populations."). Please also note that answering these questions does not commit you to discussing your ideas or presenting your work at the event. Topics that we see as within scope for this summit include, but are not limited to: How can we assess wild animal welfare at individual and population levels? How can we make welfare comparisons within and across species? What are the most common causes of morbidity and mortality for wild animals, and how do they vary within and across species? How does the project of improving wild animal welfare interact with the project of conserving species and ecosystems? What are the costs and benefits of different kinds of population control for different individuals, species, and ecosystems? How can we support individuals, species, and ecosystems in adapting to human-caused climate change and other such environmental changes? How can we support coordination and collaboration between scholars who work in animal welfare and environmental conservation, among other areas? How can we educate advocates, policymakers, and the general public about the relationship between human, animal, and environmental protection?
If you are interested in these or related topics, we would love to hear from you! If you have any questions, feel free to contact Sofia Fogel at sofia.fogel@nyu.edu. Thank you to Animal Charity Evaluators and Open Philanthropy for your generous support of this program and event. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 10, 2024 • 3min

EA - Why can't we accept the human condition as it existed in 2010? by Hayven Frienby

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why can't we accept the human condition as it existed in 2010?, published by Hayven Frienby on January 10, 2024 on The Effective Altruism Forum. By this, I mean a world in which: Humans remain the dominant intelligent, technological species on Earth's landmasses for a long period of time (> ~10,000 years). AGI is never developed, or it gets banned / limited in the interests of human safety. AI never has much social or economic impact. Narrow AI never advances much beyond where it is today, or it becomes banned / limited in the interests of human safety. Mind uploading is impossible or never pursued. Life extension (beyond modest gains due to modern medicine) isn't possible, or is never pursued. Any form of transhumanist initiatives are impossible or never pursued. No contact is made with alien species or extraterrestrial AIs, no greater-than-human intelligences are discovered anywhere in the universe. Every human grows, peaks, ages, and passes away within ~100 years of their birth, and this continues for the remainder of the human species' lifetime. Most other EAs I've talked to have indicated that this sort of future is suboptimal, undesirable, or best avoided, and this seems to be a widespread position among AI researchers as well (1). Even MIRI founder Eliezer Yudkowsky, perhaps the most well-known AI abolitionist outside of EA circles, wouldn't go as far as to say that AGI should never be developed, and that transhumanist projects should never be pursued (2). And he isn't alone -- there are many, many researchers both within and outside of the EA community with similar views on P(extinction) and P(societal collapse), and they still wouldn't accept the idea that the human condition should never be altered via technological means. My question is why can't we just accept the human condition as it existed before smarter-than-human AI (and fundamental alterations to our nature) were considered to be more than pure fantasy? After all, the best way to stop a hostile, unaligned AI is to never invent it in the first place. The best way to avoid the destruction of future value by smarter-than-human artificial intelligence is to avoid obsession with present utility and convenience. So why aren't more EA-aligned organizations and initiatives (other than MIRI) presenting global, strictly enforced bans on advanced AI training as a solution to AI-generated x-risk? Why isn't there more discussion of acceptance (of the traditional human condition) as an antidote to the risks of AGI, rather than relying solely on alignment research and safety practices to provide a safe path forward for AI (I'm not convinced such a path exists)? Let's leave out the considerations of whether AI development can be practically stopped at this stage, and just focus more on the philosophical issues here. References: Katya_Grace (EA Forum Poster) (2024, January 5). Survey of 2,778 AI authors: six parts in pictures. Yudkowsky, E. S. (2023, March 29). The only way to deal with the threat from AI? Shut it down. Time. https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/ Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jan 10, 2024 • 1h 8min

EA - William MacAskill, "The Definition of Effective Altruism" (Japanese translation) by EA Japan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: William MacAskill, "The Definition of Effective Altruism" (Japanese translation), published by EA Japan on January 10, 2024 on The Effective Altruism Forum. This is a Japanese translation of William MacAskill, 'The Definition of Effective Altruism' available at MacAskill's website. Translated by 清水颯 (Hayate Shimizu, link to his Researchmap). Today, the world faces many problems. More than 750 million people live on less than $1.90 per day (at purchasing power parity)[1]. Around 6 million children die every year from easily preventable causes such as malaria, diarrhea, and pneumonia[2]. Climate change is expected to wreak havoc on the environment and cause trillions of dollars of economic damage[3]. A third of women worldwide have suffered sexual or physical violence[4]. More than 3,000 nuclear warheads around the world are kept on high alert, ready for use at short notice[5]. Bacteria are becoming resistant to antibiotics[6]. Partisanship is intensifying, and democracy may be in decline[7]. Given how many problems the world faces, and how serious they are, surely we have a responsibility to do something about them. But what should we do? There are countless problems we could work on, and many different ways of working on each of them. Moreover, our resources are limited, so neither as individuals nor as a globe can we solve all of these problems at once. We therefore have to decide how to allocate the resources we have. But on what basis should we make such decisions? As a result, the effective altruism community has contributed to major achievements in reducing global catastrophic risk, in farm animal welfare, and in global health. In 2016 alone, the effective altruism community protected 6.5 million children from malaria by providing long-lasting insecticide-treated bednets, spared 360 million hens from life in cages, and provided significant momentum and support for the development of technical AI safety as a mainstream area of machine learning research[13]. The movement has also had a substantial influence on academic discussion. Books on the topic include Peter Singer's The Most Good You Can Do and my own Doing Good Better[14], and academic papers endorsing or criticizing effective altruism have appeared in Philosophy and Public Affairs, Utilitas, the Journal of Applied Philosophy, Ethical Theory and Moral Practice, and other publications[15]. A volume of Essays in Philosophy has been devoted to the topic, and the Boston Review has published essays on effective altruism by academics[16]. However, to have a meaningful academic discussion of effective altruism, we need to agree on what we are talking about. To help with this, in this chapter I present the Centre for Effective Altruism's definition, explain why the Centre chose that definition, and offer a precise philosophical interpretation of it. I believe that this understanding of effective altruism, which is widely endorsed within the effective altruism community, is far removed from the understanding held by much of the general public and by many critics of effective altruism. After explaining why I prefer this definition, I take the opportunity to correct some widely circulated misconceptions about effective altruism. Before beginning, it is important to note that in defining "effective altruism" I am not trying to explain some fundamental aspect of morality. In empirical fields of inquiry, we can distinguish between science and engineering. Science tries to discover general truths about the world we live in. Engineering uses scientific understanding to design and build structures and systems that are useful to society. A similar distinction can be drawn in moral philosophy. Typically, moral philosophy aims to discover general truths about the nature of morality. This corresponds to normative science. But there is also a part of moral philosophy that corresponds to engineering: for example, one can create new moral concepts which, if widely adopted by society, would improve the world. Defining "effective altruism" is an engineering problem, not a matter of explaining a fundamental aspect of morality. From this perspective, I propose two main requirements that the definition should satisfy. The first is that it should fit the actual practice of those currently said to be engaged in effective altruism, and the understanding of effective altruism held by the community's leaders. The second is that the concept should have as much public value as possible: that is, a concept broad enough to be endorsed by, and useful to, a wide range of moral views, yet narrow enough that its users do more to make the world better than they would otherwise have done. This is, of course, a balancing act.
1. Earlier definitions of effective altruism The term "effective altruism" was coined on December 3, 2011, through a democratic process involving seventeen people, in the course of setting up the Centre for Effective Altruism[17]. No official definition of the term was introduced, however. Over the years, effective altruism has been defined in various ways by various people. Here are some examples: To us, "effective altruism" means trying to do as much good as possible with each dollar and each hour that we have[18]. Effective altruism is about asking "How can I make the biggest difference I can?" and using evidence and careful reasoning to find an answer[19]. Effective altruism is based on a very simple idea: we should do the most good we can [...] Living a minimally acceptable ethical life involves using a substantial part of one's surplus resources to make the world a better place. Living a fully ethical life involves doing the most good one can[20]. Effective altruism is a research field which uses high-quality evidence and careful reasoning to work out how to help others as much as possible. It is also a community of people who take those answers seriously and devote their efforts to the most promising solutions to the world's most pressing problems[21]. Effective altruism is a philosophy and social movement that uses evidence and reason to determine the most effective ways to benefit others[22]. These definitions have several things in common[23]. They all invoke the idea of maximization, and they all talk about achieving some sort of value, whether that is promoting wellbeing or just doing good in general. But there are also differences. Definitions (1) to (3) talk about "doing good", whereas definitions (4) and (5) talk about "helping others" or "benefiting others". Unlike the others, (3) makes effective altruism a normative claim rather than a non-normative project such as an activity, research field, or movement. Definitions (2), (4), and (5) invoke the idea of using evidence and careful reasoning, whereas definitions (1) and (3) do not. The Centre for Effective Altruism's definition takes a stand on each of these points, defining effective altruism as follows: Effective altruism is about using evidence and reason to figure out how to benefit others as much as possible, and taking action on that basis[24]. I led the drafting of this definition, with input from many advisors in the effective altruism community and a great deal of help from Julia Wise and Rob Bensinger. The definition, together with a set of guiding values that accompanies it, has been formally endorsed by the large majority of leaders in the effective altruism community[25]. There is no "official" definition of effective altruism, but the Centre's definition comes closer than any other. However, because this statement of effective altruism is aimed at a general rather than a philosophical audience, some precision has been sacrificed for accessibility. So here I want to give a more precise formulation and then explain the definition in detail. My definition is as follo...
Jan 9, 2024 • 5min

LW - Learning Math in Time for Alignment by NicholasKross

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Learning Math in Time for Alignment, published by NicholasKross on January 9, 2024 on LessWrong. Epistemic status: Strong hunches, weakly held. At least some of this could be found false in experiments. If you want to do technical AI alignment research, you'll need some amount of non-trivial math knowledge. It may be more theoretical, or with more ML/biology grounding, but it'll definitely be math. How do you learn all this math? "Self-teaching" is almost a misnomer, compared to just "learning". I don't need to distill something for others, I only need myself to grok it. I may use distillation or adjacent techniques to help myself grok it, but like any N=1 self-experiment, it only needs to work for me. [1] So then... what helps me understand things? Formal rules that are written precisely Wordy concepts that one could use in an essay Math is technically the former, but real mathematicians (even the great ones!) actually use it more like the latter. That is, they use a lot of "intuition" built up over time. You can't survive on intuition alone (unless you have the genetic improbability of Ramanujan's brain). And you can't survive on rigor alone (according to all bounded human minds doing math research). Heck, even learning rigorously/boring is nontrivial (since e.g. small errors are harder to correct when you're learning an alien system). The Mathopedia concept is, in many ways, the "wordy" version. Viliam notes that math's "hardness" (i.e. objectivity) means you can't just teach it in the wordy version. After all, there is generally one real canonical definition for a mathematical object. And yet... both Viliam and Yudkowsky say that math is fun when you know what you're doing. I kind of agree! I've had fun doing (what seemed like) math, at least twice in my life! OK, so it's simple! Just make sure to understand everything thoroughly before moving to the next thing, and "play with the ideas" to understand them better. Except... there's a problem. AI timelines. Giving children quality tutoring and new K-12 curricula won't work even if we have 20 years before existentially-risky AI is used. 5 years is almost reasonable to learn deeply about a subfield or two, enough to make original contributions. AI alignment, if it involves enough math to justify this post, requires deeper-than-average understanding, and possibly an ability to create entirely new mathematics. And timelines might be as short as a year or two. [2] Tangent (for large grantmakers and orgs only) Why didn't MIRI or other groups prepare for this moment earlier? Why didn't MIRI say "OK, we have $X to fund researchers, and $Y left over, so let's put $Z towards hedging our short-timelines bets. We can do that using human enhancement and/or in-depth teaching of the relevant hard (math) parts. Let's do that now!"? I think it's something like... MIRI had pre-ML-calibrated short timelines. Now they have post-ML short timelines. In both cases, they wouldn't think "sharpening the saw"-type strategies worthwhile. And if short timelines are true now, then it's too late to use them. Luckily, insofar as AI governance does anything, we can get longer timelines. And insofar as you (a large grantmaker or org with funds/resources to spare on hedging your timeline scenarios) have enough money to hedge your timeline bets, you should fund and/or set up such longer-term programs. 
If you put 80% credence in 5-year timelines, but you also control $100 million in funding (e.g. you're OpenPhil), then you should be doing math-learning and intelligence enhancement programs! The Challenge So clearly, a person needs to be able to get deep understanding of lots of math (in backchaining-resistant worlds, that means lots of math). Within a year or two. In time to, and with the depth needed to, come up with new good ideas. This is the chal...
