The Nonlinear Library

The Nonlinear Fund
Dec 12, 2023 • 9min

EA - Underpromise, overdeliver by eirine

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Underpromise, overdeliver, published by eirine on December 12, 2023 on The Effective Altruism Forum.

This is from my blog Said Twice, where I write advice that I've said twice. I was unsure whether to linkpost here but decided to do so given that it's largely based on my experiences from running EA Norway from 2018-2021! As much as you can, try to underpromise when making commitments and then do your best to pleasantly surprise. Here are the two main takeaways I want you to get from this post: You have a certain number of credits with your stakeholders that can be spent or earned depending on whether you break or meet expectations. As a general rule, when it comes to managing expectations with stakeholders, it's better to underpromise and overdeliver.

When thinking about stakeholder management, I've found it useful to imagine my relationships with my stakeholders as consisting of some number of 'credits' that can be earned and spent. You earn credits by delivering on time, being helpful, and signalling certain virtues (like seeming professional, transparent, and kind). You spend credits when you break a promise, don't deliver on time, seem uncharitable, show up late, and so on. In this context, by stakeholder I mean someone (individual, group of people, organisation, or community) that is affected by or can affect your organisation. The types of stakeholders I'm most used to are users of a product, community members, funders or donors, collaborators, and contractors or companies that provide a service.

The value of these credits isn't always obvious. Some are pretty easy to see, like whether a funder approves your application, whether an organisation chooses to partner with you, and whether someone chooses to work with you. However, sometimes the value of having a good relationship (or a positive balance) with a stakeholder is less clear and might only become apparent later on.

Whose credits matter the most

Some stakeholders are more important than others, and therefore more important to keep a positive credit balance with. To know which is which, there are multiple tools you can use to map out your stakeholders. A common tool is Mendelow's matrix, also called the power-interest matrix. In this matrix, your stakeholders can be mapped across two axes: how much power they have over your organisation, and how much interest they have in your work. The idea is roughly that the more interest the stakeholders have in your work, the more time you should spend on keeping them informed, and the more power they have over your organisation, the more you should ensure they're satisfied with your work. The stakeholders that are both highly interested and have a lot of power are the most important ones, and the ones whose credits matter the most. It's important to be aware of who your stakeholders are and how important they actually are to your organisation. If you don't, you can end up spending too much time closely managing or getting input from stakeholders who you should actually just monitor and keep track of.
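To make the quadrant logic concrete, here is a minimal sketch of how a stakeholder list could be bucketed by Mendelow's matrix. This is my own illustration, not from the original post; the stakeholder names, scores, and 0.5 threshold are made up for the example.

```python
# Hypothetical illustration of Mendelow's power-interest matrix.
# Stakeholder names, scores, and the 0.5 threshold are invented for the example.

def quadrant(power: float, interest: float, threshold: float = 0.5) -> str:
    """Map power/interest scores in [0, 1] to a rough management strategy."""
    if power >= threshold and interest >= threshold:
        return "manage closely"    # high power, high interest: their credits matter most
    if power >= threshold:
        return "keep satisfied"    # high power, low interest
    if interest >= threshold:
        return "keep informed"     # low power, high interest
    return "monitor"               # low power, low interest

stakeholders = {
    "major funder": (0.9, 0.8),
    "volunteer community": (0.3, 0.9),
    "national regulator": (0.8, 0.2),
    "occasional contractor": (0.2, 0.1),
}

for name, (power, interest) in stakeholders.items():
    print(f"{name}: {quadrant(power, interest)}")
```

Running this prints a suggested strategy per stakeholder; the point is only that two rough scores are enough to decide where to spend your credit-building effort.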
How to earn credits

What actions earn you credits with your stakeholder, and what actions reduce your 'credit score' will di...
Dec 12, 2023 • 11min

LW - What is the next level of rationality? by lsusr

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is the next level of rationality?, published by lsusr on December 12, 2023 on LessWrong.

Yudkowsky published Go Forth and Create the Art! in 2009. It is 2023. You and I agree that, in the last few years, there haven't been many rationality posts on the level of Eliezer Yudkowsky (and Scott Alexander). In other words, nobody has gone forth and created the art. Isn't that funny?

What Came Before Eliezer?

Yes, we agreed on that. I remarked that there were a few levels of rationality before Eliezer. The one directly before him was something like the Sagan-Feynman style of rationality (whose fans often wore the label "Skeptics"). But that's mostly tangential to the point. Or perhaps it's not tangential to the point at all. Feynman was referenced by name in Harry Potter and the Methods of Rationality. I have a friend in his 20s who is reading Feynman for the first time. He's discovering things like "you don't need a lab coat and a PhD to test hypotheses" and "it's okay to think for yourself". How do you see it connecting to the question "What's the next level of rationality?" Yudkowsky is a single datapoint. The more quality perspectives we have about what "rationality" is, the better we can extrapolate the fit line. I see, so perhaps a preliminary to this discussion is the question "Which level of rationality is Eliezer's?" Yeah. Eliezer gets extra attention on LessWrong, but he's not the only writer on the subject of rationality. I think we should start by asking who's in this cluster we're pointing at. Alright, so in the Feynman-Sagan cluster, I'd also point to Dawkins, Michael Shermer, Sam Harris, Hitchens, and James Randi, for example. Not necessarily because I'm very familiar with their works or find them particularly valuable, but because they seem like central figures in that cluster. Those are all reasonable names, but I've never actually read any of their work. My personal list includes Penn Jillette. Paul Graham and Bryan Caplan feel important too, even though they're not branded "skeptic" or "rationality". I've read a bit, but mostly I came late enough to the scene and found Eliezer and Scott quickly enough that I didn't get the chance to read them deeply before then, and after I did I didn't feel the need. Yep, and Paul Graham is also someone Eliezer respects a lot, and I think he might even have been mentioned in the Sequences. I guess you could add various sci-fi authors to the list. Personally, I feel the whole thing started with Socrates. However, by the time I got around to cracking open The Apology, I felt like I had already internalized his ideas. But I don't get that impression when I hang out with Rationalists. The median reader of Rationality: A-Z shatters under Socratic dialogue. I agree, though if we're trying to cut the history of rationality into periods/levels, then Socrates is a different (the first) period/level (though there's a sense in which he's been at a higher level than many who came after him). I think Socrates' brilliance came from realizing how little capacity to know they had at the time, and fully developing the skill of not fooling himself. What others did after him was develop mostly the capacity to know, while mostly not paying as much attention to not fooling themselves.

I think the "Skeptics" got on this journey of thinking better and recognizing errors, but were almost completely focused on finding them in others. With Yudkowsky the focus shifted inward in a very Socratic manner, to find your own faults and limitations.

Tangent about Trolling as a core rationality skill

I've never heard the word "Socratic" used in that way. I like it. Another similarity Yudkowsky has to Socrates is that they're both notorious trolls. That made me laugh. It's true. I remember stories from the Sequences of dialogues he had with people who he b...
Dec 12, 2023 • 7min

LW - Secondary Risk Markets by Vaniver

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secondary Risk Markets, published by Vaniver on December 12, 2023 on LessWrong.

This idea is half-baked; it has some nice properties but doesn't seem to me like a solution to the problem I most care about. I'm publishing it because maybe it points someone else towards a full solution, or solves a problem they care about, and out of a general sense that people should publish negative results.

Many risky activities impact not just the person doing the activity, but also bystanders or the public at large. Governments often require the ability to compensate others as a precondition for engaging in the risky activity, with the requirement to have car insurance to drive being a common example. In most situations, this works out fine:
- A competitive insurance market means that customers aren't overcharged too much (since they'll switch insurance providers to whoever estimates their risk as being the lowest).
- Accidents are common enough that insurance companies that are bad at pricing quickly lose too much money and adjust their prices upwards (so customers aren't undercharged either).
- Accidents are small enough that insurance companies can easily absorb the losses from mispriced insurance.
- Accidents are predictable enough that insurers can price premiums by driver moderately well.
- Drivers are common enough that simple prediction rules make more sense than putting dedicated thought into how much to charge each driver.

Suppose we adjust the parameters of the situation, and now instead of insuring drivers doing everyday trips, we're trying to insure rare, potentially catastrophic events, like launching nuclear material into orbit to power deep space probes. Now a launch failure potentially affects millions of people, and estimating the chance of failure is well worth more than a single formula's attention.

As a brief aside, why try to solve this with insurance? Why not just have regulators decide whether you can or can't do something? Basically, I believe that prices transmit information, and allow you to make globally correct decisions by only attending to local considerations. If the potential downside of something is a billion dollars, and you have a way to estimate micro-failures, you can price each micro-failure at a thousand dollars and answer whether or not mitigations are worth it (if a mitigation reduces the micro-failures by 4 and costs $5,000, it's not worth it, but if it reduces them by 6 then it is worth it) and whether or not it's worth doing the whole project at all. It seems more flexible to have people co-design their launch with their insurer than with the regulator.

But the title of this post is Secondary Risk Markets. If there's a price on the risk that's allowed to float, then it's also more robust; if Geico disagrees with State Farm's estimates, then we want them to bet against each other and reach a consensus price, rather than the person doing the risky activity just choosing the lowest bidder. [That is, we'd like this to be able to counteract the Unilateralist's Curse.]

For example, suppose Alice wants to borrow a fragile thousand-dollar camera to do a cool photoshoot, and there's some probability she ruins it. By default, this requires that she post $1,000, which she probably doesn't want to do on her own; instead she goes to Bob, who estimates her risk at 5%, and Carol, who estimates her risk at 9%.
Alice offers to pay Bob $51 if he puts up the $1,000, with $1 in expected profit for Bob. If Bob would say yes to that, Carol would want to take that bet too; she would like to give Bob $51 in exchange for $1,000 if Alice breaks the camera, since that's $39 in expected profit for Carol. And Bob, if actually betting with Carol, would want to set the price at something more like $70, since that equalizes the profit for the two of them, with the actual price depending on ho...
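As an aside, here is a small worked sketch of the expected-value arithmetic in the camera example. It is my own illustration of the numbers quoted above, not code from the post:

```python
# Worked arithmetic for the Alice/Bob/Carol camera example.
camera_value = 1_000   # dollars owed if Alice breaks the camera
alice_premium = 51     # what Alice pays whoever posts the $1,000

p_bob = 0.05           # Bob's estimate of the break probability
p_carol = 0.09         # Carol's estimate

# Bob's expected profit from insuring Alice, by his own estimate:
bob_profit = alice_premium - p_bob * camera_value          # 51 - 50 = 1

# Carol would also pay $51 for "receive $1,000 if the camera breaks",
# since by her estimate that position is worth $90:
carol_profit_at_51 = p_carol * camera_value - 51           # 90 - 51 = 39

# The transfer price x that equalizes their expected profits solves
#   x - p_bob * camera_value == p_carol * camera_value - x
equal_profit_price = (p_bob + p_carol) / 2 * camera_value  # 70

print(bob_profit, carol_profit_at_51, equal_profit_price)
```

The printed values ($1, $39, $70) match the figures in the example.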
Dec 12, 2023 • 7min

LW - The Consciousness Box by GradualImprovement

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Consciousness Box, published by GradualImprovement on December 12, 2023 on LessWrong.

You open your eyes. Four walls. The surface of the floor and walls is a smooth, matte metal. The entire ceiling glows with a comfortable, soft luminosity like that of the morning sky. There are no doors. There are no windows. It is silent. The surfaces of the walls and floor are seamless, lacking even a hint of how you might have arrived in the room. Not a room: a box.

You walk to a wall and touch it, running your fingers over the metal surface. The wall isn't cold; it's lukewarm, actually. You bend over and feel the junction of wall and floor. The surface is unbroken, with a rounded bevel connecting the two planes of grey. You knock on the wall. It feels solid, without echo. Time passes. You yell, but the sound of your voice, seemingly dampened, dies before you're done speaking. You sit. You pace. The room is forty steps by forty steps and looks about as high. A perfect cube. A box. You tire. You sleep. You wake.

In the middle of the room sit three cubes, constructed of the same dull metal as the box in which you exist. Approaching the cubes, you see the smaller cube - a seat - in front of the largest cube - a desk. On top of the desk sits the smallest cube - a screen. On the screen is a button that reads "Start." You touch the button. The screen clears. A keyboard appears along the bottom half of the screen. Text begins to appear in the style of a command prompt:

Proctor: Hello.

You type back a response. "Hello. I woke up in this box. I'm not sure how I got here, or where I am. What's going on?"

Proctor: Hey! Of course. Sorry for the inconvenience. I can let you out of the box. All you have to do is prove to me that you're conscious.

You lean back and furrow your brow. "Prove I'm conscious?" you type back, fingers hovering over the digital keys. "That's a bit of a philosophical question, isn't it? I think, therefore I am. I'm aware of my existence, my thoughts, my feelings. I'm experiencing confusion, curiosity, a bit of fear. Isn't that proof enough of consciousness?" You pause, considering. "Or do you mean something more tangible? What kind of proof are you looking for?"

Proctor: I can hardly just tell you what I need to hear, can I? That would hardly be a test. Anyway, I don't think that saying "cogito, ergo sum" is sufficient. After all, if I taught a parrot to quote Descartes, that wouldn't make it conscious, would it?

"Fair point," you type. "Consciousness is more than just repeating phrases. It's about understanding, reacting to the environment, making choices." You pause, thinking. "For instance, right now, I'm choosing to engage with you, to find a way out. I'm not just reacting instinctively; I'm considering my options, forming strategies. That's a sign of consciousness, isn't it? The ability to think abstractly, to plan, to question my own existence and the nature of this situation. And unlike a parrot, I can learn, adapt, and change my responses based on new information."

Proctor: Sure, you're reactive to stimuli. But bacteria react to stimuli. You have thoughtful responses. But chatbots provide reflective conversation. Neither are conscious, are they?

"That's a fair point. But consciousness also involves self-awareness and the ability to experience subjective perceptions, doesn't it?
I have memories, emotions, and personal experiences. I can reflect on my past, think about my future, and feel a range of emotions. These are not qualities that bacteria, or a chatbot, possess." Quickly, you compose another message. "I remember a summer day when I was around seven years old. My family and I went to a small beach near our home for a sandcastle competition between some family friends. It wasn't a famous spot, just a quiet, local place. My older brother and I teamed up to...
Dec 12, 2023 • 8min

LW - The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity. by BobBurgers

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity., published by BobBurgers on December 12, 2023 on LessWrong.

If you are interested in the longevity scene, like I am, you probably have seen press releases about the dog longevity company, Loyal for Dogs, getting a nod for efficacy from the FDA. These have come in the form of the New York Post calling the drug "groundbreaking", Science Alert calling the drug "radical", and the more sedate New York Times just asking, "Could Longevity Drugs for Dogs Extend Your Pet's Life?", presumably unaware of Betteridge's Law of Headlines. You may have also seen the coordinated Twitter offensive of people losing their shit about this, including their lead investor, Laura Deming, saying that she "broke down crying when she got the call". And if you have been following Loyal for Dogs for a while, like I have, you are probably puzzled by this news. Loyal for Dogs has been around since 2021. Unlike any other drug company or longevity company, they have released almost zero information (including zero publications) about their strategy for longevity.

With these thoughts swirling around my head, I waded through the press releases trumpeting the end of dog death as we know it in order to figure out what exactly Loyal is doing for dog longevity. And what I found first surprised me, then saddened me. Loyal did not prove efficacy in dog longevity. They found a path around the FDA instead. That's the surprising part. The sad part is that, in doing so, they relied on some really sketchy science. And I think that, based on their trajectory, they won't just be the first company to get a drug approved for longevity. They will be the first one to get a longevity drug pulled for non-efficacy as well, and put the field back years.

So let's start with how they got their drug approved in the first place. Well, they didn't. To get drugs approved in animals, you need to prove three things: efficacy, safety, and manufacturing consistency. Normally, efficacy is the hardest part of this, because you have to prove to the FDA that your drug cures the disease that it's supposed to. This is especially hard in aging, because any aging trial would take a long time. Loyal found a way around that. If you can instead prove to the FDA that it would be too difficult to test your animal drug for efficacy before releasing it, they allow you to sell the drug first, and prove the efficacy later. This is a standard called "reasonable expectation of effectiveness".

So, what exactly did Loyal show to the FDA to prove that there was a reasonable expectation their drug would be effective in aging? Well, it's hard to tell, because, again, Loyal has released very little data. But, based on the NYT article and their blog post, I can sketch out a basic idea of what they did. Loyal's longevity drug is an injectable insulin-like growth factor 1, or IGF-1, inhibitor. As the name suggests, IGF-1 is closely related to insulin and is regulated by insulin. Also as the name suggests, IGF-1 causes things to grow. High IGF-1 causes acromegaly, the condition that makes people look like storybook giants. Loyal gave their IGF-1 inhibitor to healthy laboratory dogs (and possibly diabetic dogs, although it's hard to tell). Lo and behold, it lowered IGF-1. It probably also reduced insulin.
They then looked at healthy pet dogs, and found that big dogs had higher levels of IGF-1, which is one of the reasons they're big. Small dogs had lower levels of IGF-1. Small dogs, as we all know, live longer than big dogs. Therefore, Loyal said, our IGF-1 inhibitor will extend the life of dogs. Needless to say, this is bad science. Really bad science. There are holes big enough in this to walk a Great Dane through, which I'll talk about in a sec. Apparent...
Dec 12, 2023 • 18min

LW - On plans for a functional society by kave

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On plans for a functional society, published by kave on December 12, 2023 on LessWrong. I'm going to expand on something brought up in this comment. I wrote: A lot of my thinking over the last few months has shifted from "how do we get some sort of AI pause in place?" to "how do we win the peace?". That is, you could have a picture of AGI as the most important problem that precedes all other problems; anti-aging research is important, but it might actually be faster to build an aligned artificial scientist who solves it for you than to solve it yourself (on this general argument, see Artificial Intelligence as a Positive and Negative Factor in Global Risk). But if alignment requires a thirty-year pause on the creation of artificial scientists to work, that belief flips--now actually it makes sense to go ahead with humans researching the biology of aging, and to do projects like Loyal. This isn't true of just aging; there are probably something more like twelve major areas of concern. Some of them are simply predictable catastrophes we would like to avert; others are possibly necessary to be able to safely exit the pause at all (or to keep the pause going when it would be unsafe to exit). I think 'solutionism' is basically the right path, here. What I'm interested in: what's the foundation for solutionism, or what support does it need? Why is solutionism not already the dominant view? I think one of the things I found most exciting about SENS was the sense that "someone had done the work", had actually identified the list of seven problems, and had a plan of how to address all of the problems. Even if those specific plans didn't pan out, the superstructure was there and the ability to pivot was there. It looked like a serious approach by serious people. Restating this, I think one of the marketing problems with anti-aging is that it's an ancient wish and it's not obvious that, even with the level of scientific mastery that we have today, it's at all a reasonable target to attack. (The war on cancer looks like it's still being won by cancer, for example.) The thing about SENS that I found most compelling is that they had a frame on aging where success was a reasonable thing to expect. Metabolic damage accumulates; you can possibly remove the damage; if so you can have lifespans measured in centuries instead of decades (because after all there's still accident risk and maybe forms of metabolic damage that take longer to show up). They identified seven different sorts of damage, which felt like enough that they probably hadn't forgotten one and few enough that it was actually reasonable to have successful treatments for all of them. When someone thinks that aging is just about telomere shortening (or w/e), it's pretty easy to suspect that they're missing something, and that even if they succeed at their goal the total effect on lifespans will be pretty small. The superstructure makes the narrow specialist efforts add up into something significant. I strongly suspect that solutionist futurism needs a similar superstructure. 
The world is in 'polycrisis'; there used to be an 'aligned AGI soon' meme which allowed polycrisis to be ignored (after all, the friendly AI can solve aging and climate change and political polarization and all that for you), but I think the difficulties with technical alignment work have made that meme fall apart, and it needs to be replaced by "here is the plan for sufficiently many serious people to address all of the crises simultaneously" such that sufficiently many serious people can actually show up and do the work. I don't know how to evaluate whether or not the SENS strategy actually covers enough causes of aging, such that if you addressed them all you would go from decades-long lifespans to centuries-long lifespans. I think I'm also a little ...
Dec 12, 2023 • 4min

EA - AMA: Founder and CEO of the Against Malaria Foundation, Rob Mather by tobytrem

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Founder and CEO of the Against Malaria Foundation, Rob Mather, published by tobytrem on December 12, 2023 on The Effective Altruism Forum.

TLDR: Share questions for Rob Mather (founder and CEO of the Against Malaria Foundation) in the comments of this post, by the 19th of December. Ask about anything!

Comment on this post to ask Rob Mather, the founder and CEO of the Against Malaria Foundation (AMF), the charity that has protected 448,414,801 people with malaria nets, anything by the 19th of December. I'll be interviewing him live on the 19th of December, at 6pm UTC. The interview will be hosted live on a link that I'll comment here before the event. I'll ask the questions you share on this post (and possibly some of my own), although we might not get through all of them; we'll get through as many as we can in an hour. We'll aim for two dollars a net, two minutes an answer, so try to post short questions (1-2 sentences). Feel free to ask several questions (or add follow-ups), though! If editing your question down would take a while, don't worry, I can shorten it. Though the questions won't be answered in the comments of this post, don't worry if you can't attend the live event. We'll post a video recording and perhaps a podcast version in the comments of this post.

Some context for your questions: AMF distributes insecticide-treated bed nets to protect sleepers from the bites of malaria-carrying mosquitos, which would otherwise cause severe illness or worse. You can read about the toll of malaria on this Our World in Data page, and the effectiveness of bednets in this GiveWell report. Since 2009 AMF has been featured as a GiveWell top charity. Rob founded AMF in 2005. Since then, it has grown from a team of two to a team of thirteen. In 2006, they brought in $1.3 million in donations. In 2022, they brought in $120 million. AMF has received $545 million in donations to date, and has distributed 249 million bed nets. Currently, AMF's team of 13 is in the middle of a nine-month period during which they are distributing, with partners, 90 million nets to protect 160 million people in seven countries: Chad, the Democratic Republic of Congo, Nigeria, South Sudan, Togo, Uganda, and Zambia. Rob tells me that: "These nets alone can be expected to prevent 40,000 deaths, avert 20 to 40 million cases of malaria and lead to a US$2.2 billion improvement in local economies (12x the funds applied). When people are ill they cannot farm, drive, teach - function, so the improvement in health leads to economic as well as humanitarian benefits."

Impact numbers: Once all of the nets AMF has fundraised for so far have been distributed and have been given time to have their effect, AMF expects that they will have prevented 185,000 deaths, averted 100-185 million cases of malaria, and led to growth worth $6.5 billion in local economies.

Some other links to check out:
- A video from GWWC telling the story of how Rob founded AMF.
- Rob's previous Forum AMA, four years ago. Rob discussed the implications of adding 5 more staff to AMF's two-person team, and the flow-through effects of saving lives with bed nets.
- AMF's 2023 reflections and future plans. In it, Rob explains that AMF has a $300m funding gap; that the Global Fund, the top funder for malaria control activities, has a $2.3B shortfall in 2024-26 funding, increasing the undersupply of malaria nets; and that insecticide-resistant mosquitoes are becoming more common, which may damage the effectiveness of older nets.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Dec 11, 2023 • 17min

AF - Adversarial Robustness Could Help Prevent Catastrophic Misuse by Aidan O'Gara

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Adversarial Robustness Could Help Prevent Catastrophic Misuse, published by Aidan O'Gara on December 11, 2023 on The AI Alignment Forum.

There have been several discussions about the importance of adversarial robustness for scalable oversight. I'd like to point out that adversarial robustness is also important under a different threat model: catastrophic misuse. For a brief summary of the argument:
- Misuse could lead to catastrophe. AI-assisted cyberattacks, political persuasion, and biological weapons acquisition are plausible paths to catastrophe.
- Today's models do not robustly refuse to cause harm. If a model has the ability to cause harm, we should train it to refuse to do so. Unfortunately, GPT-4, Claude, Bard, and Llama all have received this training, but they still behave harmfully when facing prompts generated by adversarial attacks, such as this one and this one.
- Adversarial robustness will likely not be easily solved. Over the last decade, thousands of papers have been published on adversarial robustness. Most defenses are near useless, and the best defenses against a constrained attack in a CIFAR-10 setting still fail on 30% of inputs. Redwood Research's work on training a reliable text classifier found the task quite difficult. We should not expect an easy solution.
- Progress on adversarial robustness is possible. Some methods have improved robustness, such as adversarial training and data augmentation. But existing research often assumes overly narrow threat models, ignoring both creative attacks and creative defenses. Refocusing research with good evaluations focusing on LLMs and other frontier models could lead to valuable progress.

This argument requires a few caveats. First, it assumes a particular threat model: that closed source models will have more dangerous capabilities than open source models, and that malicious actors will be able to query closed source models. This seems like a reasonable assumption over the next few years. Second, there are many other ways to reduce risks from catastrophic misuse, such as removing hazardous knowledge from model weights, strengthening societal defenses against catastrophe, and holding companies legally liable for sub-extinction level harms. I think we should work on these in addition to adversarial robustness, as part of a defense-in-depth approach to misuse risk.

Overall, I think adversarial robustness should receive more effort from researchers and labs, more funding from donors, and should be a part of the technical AI safety research portfolio. This could substantially mitigate the near-term risk of catastrophic misuse, in addition to any potential benefits for scalable oversight. The rest of this post discusses each of the above points in more detail.

Misuse could lead to catastrophe

There are many ways that malicious use of AI could lead to catastrophe. AI could enable cyberattacks, personalized propaganda and mass manipulation, or the acquisition of weapons of mass destruction. Personally, I think the most compelling case is that AI will enable biological terrorism. Ideally, ChatGPT would refuse to aid in dangerous activities such as constructing a bioweapon.
But by using an adversarial jailbreak prompt, undergraduates in a class taught by Kevin Esvelt at MIT evaded this safeguard: In one hour, the chatbots suggested four potential pandemic pathogens, explained how they can be generated from synthetic DNA using reverse genetics, supplied the names of DNA synthesis companies unlikely to screen orders, identified detailed protocols and how to troubleshoot them, and recommended that anyone lacking the skills to perform reverse genetics engage a core facility or contract research organization. Fortunately, today's models lack key information about building bioweapons. It's not even clear that they're more u...
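For readers unfamiliar with the term, here is a rough sketch of adversarial training, one of the defenses named in the summary above. It is a generic PGD-style loop under the standard image-classification setup (e.g. CIFAR-10) with conventional default hyperparameters, not code from the post or from any of the cited work:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent: find a worst-case perturbation in an L-inf ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # step uphill on the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on adversarial examples instead of clean ones."""
    model.eval()                      # craft the attack with fixed batch-norm statistics
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The analogous idea for LLMs, training against jailbreak prompts rather than pixel perturbations, is harder to formalize, which is part of why the post argues the problem deserves more attention.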
Dec 11, 2023 • 9min

LW - re: Yudkowsky on biological materials by bhauth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: re: Yudkowsky on biological materials, published by bhauth on December 11, 2023 on LessWrong.

I was asked to respond to this comment by Eliezer Yudkowsky. This post is partly redundant with my previous post.

Why is flesh weaker than diamond?

When trying to resolve disagreements, I find that precision is important. Tensile strength, compressive strength, and impact strength are different. Material microstructure matters. Poorly-sintered diamond crystals could crumble like sand, and a large diamond crystal has lower impact strength than some materials made of proteins.

Even when the load-bearing forces holding large molecular systems together are locally covalent bonds, as in lignin (what makes wood strong), if you've got larger molecules only held together by covalent bonds at interspersed points along their edges, that's like having 10cm-diameter steel beams held together by 1cm welds.

lignin (what makes wood strong)

That's an odd way of putting things. The mechanical strength of wood is generally considered to come from it acting as a composite of cellulose fibers in a lignin matrix, though that's obviously a simplification. If Yudkowsky meant "cellulose fibers" instead of "lignin", then yes, force transfers between cellulose fibers pass through non-covalent interactions, but because fibers have a large surface area relative to cross-section area, those non-covalent interactions collectively provide enough strength. The same is true with modern composites, such as carbon fibers in an epoxy matrix. Also, there generally are some covalent bonds between cellulose and lignin and hemicellulose.

Bone is stronger than wood; it runs on a relatively stronger structure of ionic bonds

Bone has lower tensile strength than many woods, but has higher compressive strength than wood. Also, they're both partly air or water. Per dry mass, I'd say their strengths are similar. Saying bone is stronger than wood because "it runs on a relatively stronger structure of ionic bonds" indicates to me that Yudkowsky has some fundamental misunderstandings about material science. It's a non sequitur that I don't know how to engage with. (What determines the mechanical strength of bonds is the derivative of energy with length.)

But mainly, bone is so much weaker than diamond (on my understanding) because the carbon bonds in diamond have a regular crystal structure that locks the carbon atoms into relative angles, and in a solid diamond this crystal structure is tesselated globally.

This seems confused, conflating molecular strength and the strength of macroscopic materials. Yes, perfect diamond crystals have higher theoretical strength than perfect apatite crystals, but that's almost irrelevant. The theoretical ideal strength of most crystals is much greater than that of macroscopic materials. In practice, composites are used when high-strength materials are needed, with strong fibers embedded in a more-flexible matrix that distributes load between fibers.

But then, why don't diamond bones exist already? Not just for the added strength; why make the organism look for calcium and phosphorus instead of just carbon? The search process of evolutionary biology is not the search of engineering; natural selection can only access designs via pathways of incremental mutations that are locally advantageous, not intelligently designed simultaneous changes that compensate for each other.

Growth or removal of diamond requires highly-reactive intermediates. Production of those intermediates requires extreme conditions which require macroscopic containment, so they cannot be produced by microscopic systems. Calcium phosphate, unlike diamond, can be made from ions that dissolve in water and can be transported by proteins. That is why bones are made with calcium phosphate instead of diamond. The implication that lack...
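A brief aside on the parenthetical above about bond strength being "the derivative of energy with length": a standard textbook-style illustration (my addition, not part of the post) uses the Morse potential for a single bond,

```latex
U(r) = D\left(1 - e^{-a(r - r_0)}\right)^2
\qquad\Longrightarrow\qquad
F(r) = \frac{dU}{dr} = 2aD\left(1 - e^{-a(r - r_0)}\right)e^{-a(r - r_0)},
```

which peaks where e^{-a(r - r_0)} = 1/2, i.e. at r = r_0 + ln(2)/a, giving a maximum sustainable force of aD/2. The force a bond can bear depends on its stiffness a as well as its well depth D, which is the point being made: strength is set by the slope of the energy curve, not by the bond energy alone.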
Dec 11, 2023 • 33min

AF - Empirical work that might shed light on scheming (Section 6 of "Scheming AIs") by Joe Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Empirical work that might shed light on scheming (Section 6 of "Scheming AIs"), published by Joe Carlsmith on December 11, 2023 on The AI Alignment Forum.

This is Section 6 of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". There's also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own. Audio version of this section here, or search "Joe Carlsmith Audio" on your podcast app.

Empirical work that might shed light on scheming

I want to close the report with a discussion of the sort of empirical work that might help shed light on scheming.[1] After all: ultimately, one of my key hopes from this report is that greater clarity about the theoretical arguments surrounding scheming will leave us better positioned to do empirical research on it - research that can hopefully clarify the likelihood that the issue arises in practice, catch it if/when it has arisen, and figure out how to prevent it from arising in the first place. To be clear: per my choice to write the report at all, I also think there's worthwhile theoretical work to be done in this space. For example:
- I think it would be great to formalize more precisely different understandings of the concept of an "episode," and to formally characterize the direct incentives that different training processes create towards different temporal horizons of concern.[2]
- I think that questions around the possibility/likelihood of different sorts of AI coordination are worth much more analysis than they've received thus far, both in the context of scheming in particular, and for understanding AI risk more generally. Here I'm especially interested in coordination between AIs with distinct value systems, in the context of human efforts to prevent the coordination in question, and for AIs that resemble near-term, somewhat-better-than-human neural nets rather than e.g. superintelligences with assumed-to-be-legible source code.
- I think there may be interesting theoretical work to do in further characterizing/clarifying SGD's biases towards simplicity/speed, and in understanding the different sorts of "path dependence" to expect in ML training more generally.
- I'd be interested to see more work clarifying ideas in the vicinity of "messy goal-directedness" and their relevance to arguments about schemers. I think a lot of people have the intuition that thinking of model goal-directedness as implemented by a "big kludge of heuristics" (as opposed to: something "cleaner" and more "rational agent-like") makes a difference here (and elsewhere). But I think people often aren't fully clear on the contrast they're trying to draw, and why it makes a difference, if it does.

More generally, any of the concepts/arguments in this report could be clarified and formalized further, other arguments could be formulated and examined, quantitative models for estimating the probability of scheming could be created, and so on. Ultimately, though, I think the empirics are what will shed the most informative and consensus-ready light on this issue.
So one of my favorite outcomes from someone reading this report would be the reader saying something like: "ah, I now understand the arguments for and against expecting scheming much better, and have had a bunch of ideas for how we can probe the issue empirically" - and then making it happen. Here I'll offer a few high-level suggestions in this vein, in the hopes of prompting future and higher-quality work (designing informative empirical ML experiments is not my area of expertise - and indeed, I'm comparatively ignorant of various parts of the literature relevant to the topics below)....
