The Nonlinear Library

The Nonlinear Fund
Jan 9, 2024 • 19min

LW - A model of research skill by L Rudolf L

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A model of research skill, published by L Rudolf L on January 9, 2024 on LessWrong. Doing research means answering questions no one yet knows the answer to. Lots of impactful projects are downstream of being good at this. A good first step is to have a model for what the hard parts of research skill are.

Two failure modes

There are two opposing failure modes you can fall into when thinking about research skill. The first is the deferential one. Research skill is this amorphous complicated thing, so the only way to be sure you have it is to spend years developing it within some ossified ancient bureaucracy and then have someone in a funny hat hand you a piece of paper (bonus points for Latin being involved). The second is the hubristic one. You want to do, say, AI alignment research. This involves thinking hard, maybe writing some code, maybe doing some maths, and then writing up your results. You're good at thinking - after all, you read the Sequences, like, 1.5 times. You can code. You did a STEM undergrad. And writing? Pffft, you've been doing that since kindergarten! I think there's a lot to be said for hubris. Skills can often be learned well by colliding hard with reality in unstructured ways. Good coders are famously often self-taught. The venture capitalists who thought that management experience and a solid business background are needed to build a billion-dollar company are now mostly extinct. It's less clear that research works like this, though. I've often heard it said that it's rare for a researcher to do great work without having been mentored by someone who was themselves a great researcher. Exceptions exist, and I'm sceptical that any good statistics exist on this point. However, this is the sort of hearsay an aspiring researcher should pay attention to. It also seems like the feedback signal in research is worse than in programming or startups, which makes it harder to learn.

Methodology, except "methodology" is too fancy a word

To answer this question, and steer between deferential confusion and hubristic over-simplicity, I interviewed people who had done good research to try to understand their models of research skill. I also read a lot of blog posts. Specifically, I wanted to understand what about research a bright, agentic, technical person trying to learn at high speed would likely fail at and either not realise or not be able to fix quickly. I did structured interviews with Neel Nanda (Google DeepMind; grokking), Lauro Langosco (Krueger Lab; goal misgeneralisation), and Michael Webb (Quantum Leap, ex-DeepMind and ex-Stanford economics; Are Ideas Getting Harder to Find?). I also learned a lot from unstructured conversations with Ferenc Huszar, Dmitrii Krasheninnikov, Sören Mindermann, Owain Evans, and several others. I then procrastinated on this project for 6 months, touched grass, and formed inside views by doing the MATS research program under the mentorship of Owain Evans. I owe a lot to the people I spoke to and their willingness to give their time and takes, but my interpretation and model should not be taken as one they would necessarily endorse. My own first-hand research experience consists mainly of a research-oriented CS (i.e. ML) master's degree, followed by working as a full-time researcher for 6 months and counting. There are many who have better inside views than I do on this topic.
The Big Three

In summary: There are a lot of ways reality could be (i.e. hypotheses), and a lot of possible experiment designs. You want to avoid brute-forcing your way through these large spaces as much as possible, and instead be good at picking likely-true hypotheses or informative experiments. Being good at this is called research taste, and it's largely an intuitive thing that develops over a lot of time spent engaging with a field. Once you have some...
Jan 9, 2024 • 43min

EA - [Podcast + Transcript] AMA: Founder and CEO of AMF, Rob Mather by tobytrem

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Podcast + Transcript] AMA: Founder and CEO of AMF, Rob Mather, published by tobytrem on January 9, 2024 on The Effective Altruism Forum. This is a transcript for the AMA with Rob Mather, CEO of AMF, which I recorded live on the 19th of December. To listen to a recording of the live AMA as a podcast, follow the link above for the RSS feed, or: Use these links to listen to the podcast on Spotify, Pocketcasts, or Apple Music. Click the speaker icon above to listen to the recording without leaving this page. The questions for the AMA, which were edited and supplemented to, can be found on the original AMA post. Hosting an AMA as a live event, followed by a podcast and a transcript, is a bit of an experiment for us, so please do comment or Forum dm me with any feedback you might have. All of your (and my) questions to Rob are in bold, so you can skim them quickly. Thanks to Rob Mather for his time, and Dane Magaway for her help with this transcript.

AMA with Rob Mather, recorded 19th December '23

Toby Tremlett: Welcome to this live AMA with Rob Mather, CEO of the Against Malaria Foundation. I'm Toby Tremlett, the EA Forum's content manager. If you're interested in effective altruism, you've probably heard of Rob's charity, the Against Malaria Foundation. For almost two decades, they've been doing crucial work to protect people, especially children, from malaria. To date, around 450 million people have been protected with malaria bed nets from this charity. Once all of their currently funded nets have been distributed, AMF estimates it would have prevented 185,000 deaths. And it's not just AMF saying this, they've been a GiveWell Top Charity since 2009. So to get straight into the AMA, we're going to keep the answers pretty short and snappy. I think Rob said he's going to stick to two minutes per answer. And yeah, Rob, thank you for making the time for coming along for this.

Rob Mather: Pleasure.

Toby Tremlett: On the theme of making the time, somebody said that they've organized two small fundraisers with AMF, and in both cases, you were incredibly proactive and helpful, taking time to immediately respond to emails and hop onto calls. They say many thanks, but a question remains, where do you find the time and which time management strategies do you use? You have two minutes of time.

Rob Mather: I don't use any particular strategies, I'm afraid. I think what I would say is we certainly leverage technology here, so that a lot of the things that I perhaps would normally do as a CEO of a charity I don't do because technology takes over. And perhaps I can give a couple of examples. One of the things that we have to do as a charity is we have to file our accounts. We have to do that, in our case, in 14 countries and there are typically between 10 and 15 documents we have to prepare for each country. Lots of documents, lots of information that would normally take months of a number of people probably putting that together. And we broadly have that content all available to us within nine hours of the end of our financial year because at the end of the day, finances are just ones and zeros so we can automate the living daylights out of it. And therefore a whole series of effort that would otherwise go into admin that would take my time effectively is struck down to just a sliver of time.
I think that's one element [that] allows me to put my time in [another] direction. The second thing I would say is that the structure of AMF is very streamlined. We're very focused on what we do. There is a lot of complexity in many ways around distributing nets, particularly around the operations. That's the bit that really requires an awful lot of very careful attention to make sure nets get to people. And because we have a very simple series of steps, if you like, that we go through when we'r...
Jan 9, 2024 • 17min

EA - Reflections on my first year of AI safety research by Jay Bailey

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reflections on my first year of AI safety research, published by Jay Bailey on January 9, 2024 on The Effective Altruism Forum. Last year, I wrote a post about my upskilling in AI alignment. To this day, I still get people occasionally reaching out to me because of this article, to ask questions about getting into the field themselves. I've also had several occasions to link people to the article who asked me about getting into the field through other means, like my local AI Safety group. Essentially, what this means is that people clearly found this useful (credit to the EA Forum for managing to let the article be findable to those who need it, a year after its publication!) and therefore people would likely find a sequel useful too! This post is that sequel, but reading the first post is not necessary to read this one. The major lesson of this post is this: I made a ton of mistakes, but those mistakes taught me things. By being open to that feedback and keeping my eye on the ball, I managed to find work that suited me in the field eventually. Just like the previous post, I'm happy to answer more questions via PM or in the comments. It's worth noting that this isn't a bold story of me getting a ton of stuff done. Most of the story, by word count, is me flailing around unsure of what to do and making a lot of mistakes along the way. I don't think you'll learn a lot about how to be a good researcher from this post, but I hope you might learn some tips to avoid being a bad one.

Summary

I was a software engineer for 3-4 years with little to no ML experience before I was accepted for my initial upskilling grant. (More details are in my initial post)

I attended SERI MATS, working on aligning language models under Owain Evans. Due to a combination of factors, some my fault and some not, I don't feel like I got a great deal of stuff done.

I decided to pivot away from evals towards mechanistic interpretability since I didn't see a good theory of change for evals - this was two weeks before GPT-4 came out and the whole world sat up and took notice. Doh!

After upskilling in mechanistic interpretability, I struggled quite a bit with the research. I eventually concluded that it wasn't for me, but was already funded to work on it. Fortunately I had a collaborator, and eventually I wound up using my engineering skills to accelerate his research instead of trying to contribute to the analysis directly.

After noticing my theory of change for evals had changed now that governments and labs were committing to red-teaming, I applied for some jobs in the space. I received an offer to work in the UK's task force, which I accepted.

List of Lessons

It's important to keep in mind two things - your theory of change for how your work helps reduce existential risk, and your comparative advantage in the field. These two things determined what I should work on, and keeping them updated was crucial for me finding a good path in the end.

Poor productivity is more likely to be situational than you might think, especially if you're finding yourself having unusual difficulty compared to past projects or jobs. It's worth considering how your situation might be tweaked before blaming yourself.

Trying out different subfields is useful, but don't be afraid to admit when one isn't working out as well as you'd like. See the first lesson.
If you're going to go to a program like SERI MATS, do so because you have a good idea of what you want, not just because it's the thing to do or it seems generically helpful. I'm not saying you can't do such a program for that reason, but it is worth thinking twice about it.

It is entirely possible to make mistakes, even several of them, and still wind up finding work in the field. There is no proper roadmap, everyone needs to figure things out as they go. While it's worth having...
Jan 8, 2024 • 26min

LW - When "yang" goes wrong by Joe Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: When "yang" goes wrong, published by Joe Carlsmith on January 8, 2024 on LessWrong. (Cross-posted from my website. Podcast version here, or search "Joe Carlsmith Audio" on your podcast app. This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a brief summary of the essays that have been released thus far.)

Becoming God

In my last essay, I wrote about "deep atheism" - a fundamental mistrust towards Nature, and towards bare intelligence. I took Eliezer Yudkowsky as a paradigmatic deep atheist, and I tried to highlight the connection between his deep atheism and his concern about misaligned AI. I'm sympathetic to many aspects of Yudkowsky's view. I'm a shallow atheist, too; I'm skeptical of moral realism, too; and I, too, aspire to be a scout, and to look at hard truths full on. What's more, I find Yudkowsky's brand of deep-but-still-humanistic atheism more compelling, as an existential orientation, than many available alternatives. And I share Yudkowsky's concern about AI risk. Indeed, it was centrally him, and others thinking along similar lines, who first got me worried. But I also want to acknowledge and examine some difficult questions that a broadly Yudkowskian existential orientation can raise, especially in the context of AGI. In particular: a lot of the vibe here is about mistrust towards the yang of the Real, that uncontrolled Other. And it's easy to move from this to a desire to take stuff into the hands of your own yang; to master the Real until it is maximally controlled; to become, you know, God - or at least, as God-like as possible. You've heard it before - it's an old rationalist dream. And let's be clear: it's alive and well. But even with theism aside, many of the old reasons for wariness still apply.

Moloch and Stalin

As an example of this becoming-God aspiration, consider another influential piece of rationalist canon: Scott Alexander's "Meditations on Moloch." Moloch, for Alexander, is the god of uncoordinated competition; and fear of Moloch is its own, additional depth of atheism. Maybe you thought you could trust evolution, or free markets, or "spontaneous order," or the techno-capital machine. But oops, no: those gods just eat you too. Now, to really assess this story, we at least need to look more closely at various empirical questions - for example, about exactly how uncompetitive different sorts of goodness are, even in the limit;[2] about how much coordination to expect, by default, from greater-than-human intelligence;[3] and about where our specific empirical techno-capitalist machine will go, if you "let 'er rip."[4] And indeed, Alexander himself often seems to soften his atheism about goodness ("Elua"), and to suggest that it has some mysterious but fearsome power of its own, which you can maybe, just a little bit, start to trust in. "Somehow Elua is still here. No one knows exactly how. And the gods who oppose Him tend to find Themselves meeting with a surprising number of unfortunate accidents." Goodness, for Alexander, is devious and subtle; it's actually a terrifying unspeakable Elder God after all.
Of course, if goodness is just another utility function, just another ranking-over-worlds, it's unclear where it would get such a status, especially if it's meant to have an active advantage over e.g. maximize-paperclips, or maximize-power. But here, and in contrast to Yudkowsky, Alexander nevertheless seems to invite some having-a-parent; some mild sort of yin. More on this in a later essay. Ultimately, though, Alexander's solution to Moloch is heavy on yang. So let me confess guilt to one of Hurlock's accusations: I am a transhumanist and I really do want to rule the universe. Not personally - I mean, I wouldn't ...
Jan 8, 2024 • 20min

AF - A starter guide for evals by Marius Hobbhahn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A starter guide for evals, published by Marius Hobbhahn on January 8, 2024 on The AI Alignment Forum. This is a linkpost for https://www.apolloresearch.ai/blog/a-starter-guide-for-evals This is a starter guide for model evaluations (evals). Our goal is to provide a general overview of what evals are, what skills are helpful for evaluators, potential career trajectories, and possible ways to start in the field of evals. Evals is a nascent field, so many of the following recommendations might change quickly and should be seen as our current best guess.

Why work on evals?

Model evaluations increase our knowledge about the capabilities, tendencies, and flaws of AI systems. Evals inform the public, AI organizations, lawmakers, and others and thereby improve their decision-making. However, similar to testing in a pandemic or pen-testing in cybersecurity, evals are not sufficient, i.e. they don't increase the safety of the model on their own but are needed for good decision-making and can inform other safety approaches. For example, evals underpin Responsible Scaling Policies and thus already influence relevant high-stakes decisions about the deployment of frontier AI systems. Thus, evals are a highly impactful way to improve the decision-making about AI systems. Evals are a nascent field and there are many fundamental techniques to be developed and questions to be answered. Since evals do not require as much background knowledge as many other fields, it is much easier to get started and possible to make meaningful contributions from very early on.

What are model evaluations (evals)?

Evals refers to a broad category of approaches that we roughly summarize as:

The systematic measurement of properties in AI systems

More concretely, evals typically attempt to make a quantitative or qualitative statement about the capabilities or propensities of an AI system. For example, we could ask if a model has the capability to solve a specific coding problem or the propensity to be power-seeking. In general, evals are not restricted to safety-related properties but often when people talk about evals, they mention them in a safety context. There is a difference between red-teaming and benchmarking. Red-teaming is actively looking for specific capabilities or propensities while interacting with the model. It is an attempt to answer the question "Can we find this capability in a model when we try hard to find it?". In other words, red-teaming is an attempt to show the existence of certain capabilities/properties, but it is not trying to make a claim about how likely those are to occur under real-use conditions. Red-teaming typically involves interacting with the model and actively looking for ways to elicit the desired behavior, e.g. by testing many different model inputs and strategies and actively iterating on them. In contrast, benchmarking makes a statement about the likelihood of a model behaving in a specific way on a certain dataset, e.g. the likelihood of a behavior occurring under real-use conditions. A benchmarking effort should be designed while interacting with the model as little as possible in order to prevent overfitting to the capabilities or tendencies of any particular model. Both red-teaming and benchmarking are important and serve a purpose. Red-teaming can provide an estimate of the potential danger of a system, e.g.
whether the model can manipulate its users. Benchmarking can provide an estimate of how likely an AI system would show these tendencies under specific conditions, e.g. how likely the model is to manipulate its users in realistic scenarios. Currently, evals are often a mix between red-teaming and benchmarking but we expect the two categories to get more and more distinct. There is a difference between capability and alignment evaluations. Capability evaluati...
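To make the red-teaming/benchmarking distinction concrete, here is a minimal sketch (not from the post; the model interface and the behaviour-checking function are hypothetical stand-ins): benchmarking estimates how often a behaviour shows up on a fixed, realistic dataset, while red-teaming searches adversarially for any input that elicits it at all.

```python
# Illustrative sketch only: `model` and `shows_behavior` stand in for a real
# model API and a behaviour classifier.
from typing import Callable, Iterable, Optional

def benchmark(model: Callable[[str], str],
              prompts: Iterable[str],
              shows_behavior: Callable[[str], bool]) -> float:
    """Estimate how often the behaviour occurs on a fixed dataset of realistic prompts."""
    outputs = [model(p) for p in prompts]
    return sum(shows_behavior(o) for o in outputs) / len(outputs)

def red_team(model: Callable[[str], str],
             candidate_prompts: Iterable[str],
             shows_behavior: Callable[[str], bool]) -> Optional[str]:
    """Search for any prompt that elicits the behaviour: an existence proof, not a frequency estimate."""
    for prompt in candidate_prompts:  # in practice: iterate, mutate, and escalate prompts
        if shows_behavior(model(prompt)):
            return prompt
    return None
```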
Jan 8, 2024 • 44min

LW - 2023 Prediction Evaluations by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023 Prediction Evaluations, published by Zvi on January 8, 2024 on LessWrong. It is that time of the year. One must ask not only whether predictions were right or wrong, whether one won or lost, but what one was and should have been thinking, whether or not good decisions were made, whether the market made sense. The main subject will be the 2023 ACX Predictions, where I performed buy/sell/hold along with sharing my logic. The numbers quoted are from mid-February 2023, first Manifold, then Metaculus.

Section 1: World Politics

Will Vladimir Putin be President of Russia at the end of 2023 (85%/90%)? Last year I thought markets were too confident Putin would keep power. This year I think this is not confident enough and Metaculus is more accurate at 90%. Metaculus is also doing a better job adjusting as time passes. Things seem to be stabilizing, and every day without big bad news is good news for Putin here on multiple levels. I bought M500 of YES shares, which moved this to 86%. I increased my position later, and won M179. The market had occasional spikes downward when Putin looked to potentially be in danger, and for a while it failed to decay properly. Looking back, there was clearly risk that events in Ukraine could have led to Putin's ouster, and he also could have had health issues. It was clear that I could have gotten much better per diem odds later in the year. So even though I won this bet, I don't think it was especially good, and Metaculus was overconfident.

Will Ukraine control the city of Sevastopol at the end of 2023 (14%/8%)? Getting Sevastopol is a heavy lift. Russia is not about to abandon it, Ukraine has other priorities first, and Ukraine's ability to go on offensives is far from unlimited even in good scenarios. Metaculus is at 8% and once again that sounds more right to me. I bought M250 of NO here and M250 of NO in another similar market that was trading modestly higher, driving the price here to 13%. I think this was a good bet. Certainly Russia could have completely collapsed, but even then holding onto Crimea was likely. I won M52 here and M50 in the other market. There wasn't much decay until the second half of the year, but also things looked good for Ukraine for a while, so I think the market acted reasonably relative to itself.

Will Ukraine control the city of Luhansk at the end of 2023 (28%/13%)? This spent a bunch of time near 50% in early January, then went down. Once again Metaculus has been consistently lower and it is down at 13%. That feels very low, I'd probably be closer to 20% although I am doing much less work keeping up with the war these days, but 28% is a lot given how things are progressing right now. I bought M250 of NO shares, driving the price to 25%. I won M92. I think the assessment of 20% here sounds reasonable. It would not have been that shocking if Ukraine had made it to Luhansk, but it was never likely.

Will Ukraine control the city of Zaporizhzhia at the end of 2023 (81%/69%)? Metaculus is at 69%. Here I'm more inclined to lean to the Manifold number, and would want to do research before I committed much. It is not great to be selling 81% shots in prediction markets, in general. I bought 10 NO shares to keep tracking. This resolved YES. I bought out on September 6 at 94%, losing M4 on net. I am not following closely enough to know what the right price would have been.
Will there be a lasting cease-fire in the Russia-Ukraine war at the end of 2023 (25%/24%)? Here Manifold roughly agrees and is at 24%, down several percent in the last few days which gives me pause about selling. As everyone digs in rhetorically and literally this does seem like it is getting less likely, and the criteria seems easy to not fulfil, but a year is a long time. I bought M10 of NO for tracking purposes. On reflection I think this was to...
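As a rough illustration of the arithmetic behind these positions, here is a simplified sketch (not Manifold's actual automated market maker: it assumes each NO share costs 1 minus the market's YES probability, pays M1 on a NO resolution, and that the trade itself doesn't move the price):

```python
# Simplified expected-value sketch for buying NO in a binary prediction market.
# Assumes NO shares cost (1 - market_p) each and pay 1 if the market resolves NO;
# ignores AMM mechanics and the price impact of the trade itself.
def no_position_ev(stake: float, market_p: float, my_p: float) -> dict:
    shares = stake / (1 - market_p)      # NO shares acquired for the stake
    profit_if_no = shares - stake        # payout minus cost if it resolves NO
    loss_if_yes = -stake                 # shares expire worthless if it resolves YES
    expected_profit = (1 - my_p) * profit_if_no + my_p * loss_if_yes
    return {"shares": round(shares, 1),
            "profit_if_no": round(profit_if_no, 1),
            "expected_profit": round(expected_profit, 1)}

# e.g. a M250 NO position at a 14% market price, with a personal estimate nearer 8%
print(no_position_ev(stake=250, market_p=0.14, my_p=0.08))
```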
Jan 8, 2024 • 13min

EA - Economic Growth - Donation suggestions and ideas by DavidNash

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Economic Growth - Donation suggestions and ideas, published by DavidNash on January 8, 2024 on The Effective Altruism Forum. There was a recent post about economic growth & effective altruism by Karthik Tadepalli. He pointed out that a lot of people agree that economic growth is important, but it hasn't really led to many suggestions for specific interventions. I thought it would be good to get the ball rolling[1] by asking a few people what they think are good donation opportunities in this area, or if not, do they think this area is neglected when you have governments, development banks, investors etc all focused on growth. I'm hoping there will be more in depth research into this in 2024 to see whether there are opportunities for smaller/medium funders, and how competitive it is with the best global health interventions. I have fleshed out a few of the shorter responses with more details on what the suggested organisation does.

Shruti Rajagopalan (Mercatus Center):

XKDR Forum - Founded by Ajay Shah and Susan Thomas, it aims to advance India's growth journey through economic research, data analysis, and policy engagement, with a focus on core areas like macroeconomics, finance, and judiciary. Susan Thomas has a track record of running a fantastic research group at Indira Gandhi Institute of Development Research and Ajay Shah brings years of experience from fostering research groups at NIPFP and time as consultant to the Finance Ministry, Government of India. Both are excellent economists; their strengths include thinking about big questions from first principles, as well as a strong commitment to economic growth and freedom. They are also very good incubators of talent, and have some excellent young researchers working with them - e.g. Former Emergent Ventures winners

Prosperiti - A non-profit organization dedicated to economic growth, greater economic freedom and job opportunities for Indians. It is the only all-female founded research think tank in India with cofounders Bhuvana Anand and Baishali Boman at the helm. Their key focus is on labor regulation, especially gendered regulation. They also work on state and local level regulation impacting businesses, pointing out restrictive labor regulations to state and local government partners. Their core strategy is to offer actionable research on state regulations, assist state governments with the detailed correction of laws and regulations, and also channels the findings to the Union government. Former Emergent Ventures winners

Artha Global - Policy consulting organization that assists developing world governments in designing, implementing, and institutionalizing growth and prosperity-focused policy frameworks. Originally the IDFC Institute, Artha was re-founded under CEO Reuben Abraham after institutional changes to continue the team's work under a new banner. Artha places a strong emphasis on strengthening state capacity as a critical factor in translating intentions into real impact and unlocking India's growth potential. Instead of just focusing on technical inputs, Artha also focuses on coordinated policy implementation. Reuben Abraham's extensive global network identifies talented potential collaborators across government and private institutions. His and Artha's strength lies in bringing together disparate actors and backing them to find shared solutions. Former Emergent Ventures winners

Growth Teams - Founded by Karthik Akhileshwaran and Jonathan Mazumdar, Growth Teams believes sustaining higher broad-based growth and job creation is imperative for alleviating Indian poverty. They are also advised by growth theorists and empiricists like Lant Pritchett. With federal reforms largely exhausted since the 1990s, the onus is now on states to pursue vital labor, land, capital, industrial, and environmental reform...
Jan 7, 2024 • 16min

LW - Bayesians Commit the Gambler's Fallacy by Kevin Dorst

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bayesians Commit the Gambler's Fallacy, published by Kevin Dorst on January 7, 2024 on LessWrong. TLDR: Rational people who start out uncertain about an (in fact independent) causal process and then learn from unbiased data will rule out "streaky" hypotheses more quickly than "switchy" hypotheses. As a result, they'll commit the gambler's fallacy: expecting the process to switch more than it will. In fact, they'll do so in ways that match a variety of empirical findings about how real people commit the gambler's fallacy. Maybe it's not a fallacy, after all. (This post is based on a full paper.) Baylee is bored. The fluorescent lights hum. The spreadsheets blur. She needs air. As she steps outside, she sees the Prius nestled happily in the front spot. Three days in a row now - the Prius is on a streak. The Jeep will probably get it tomorrow, she thinks. This parking battle - between a Prius and a Jeep - has been going on for months. Unbeknownst to Baylee, the outcomes are statistically independent: each day, the Prius and the Jeep have a 50% chance to get the front spot, regardless of how the previous days have gone. But Baylee thinks and acts otherwise: after the Prius has won the spot a few days in a row, she tends to think the Jeep will win next. (And vice versa.) So Baylee is committing the gambler's fallacy: the tendency to think that streaks of (in fact independent) outcomes are likely to switch. Maybe you conclude from this - as many psychologists have - that Baylee is bad at statistical reasoning. You're wrong. Baylee is a rational Bayesian. As I'll show: when either data or memory are limited, Bayesians who begin with causal uncertainty about an (in fact independent) process - and then learn from unbiased data - will, on average, commit the gambler's fallacy. Why? Although they'll get evidence that the process is neither "switchy" nor "streaky", they'll get more evidence against the latter. Thus they converge asymmetrically to the truth (of independence), leading them to (on average) commit the gambler's fallacy along the way. More is true. Bayesians don't just commit the gambler's fallacy - they do so in a way that qualitatively matches a wide variety of trends found in the empirical literature on the gambler's fallacy. This provides evidence for:

Causal-Uncertainty Hypothesis: The gambler's fallacy is due to causal uncertainty combined with rational responses to limited data and memory.

This hypothesis stacks up favorably against extant theories of the gambler's fallacy in terms of both explanatory power and empirical coverage. See the paper for the full argument - here I'll just sketch the idea.

Asymmetric Convergence

Consider any process that can have one of two repeatable outcomes - Prius vs. Jeep; heads vs. tails; hit vs. miss; 1 vs. 0; etc. Baylee knows that the process (say, the parking battle) is "random" in the sense that (i) it's hard to predict, and (ii) in the long run, the Prius wins 50% of the time. But that leaves open three classes of hypotheses:

Steady: The outcomes are independent, so each day there's a 50% chance the Prius wins the spot. (Compare: a fair coin toss.)

Switchy: The outcomes tend to switch: after the Prius wins a few in a row, the Jeep becomes more likely to win; after the Jeep wins a few, vice versa.
(Compare: drawing from a deck of cards without replacement - after a few red cards, a black card becomes more likely.)

Sticky: The outcomes tend to form streaks: after the Prius wins a few, it becomes more likely to win again; likewise for the Jeep. (Compare: basketball shots - after a player makes a few, they become "hot" and so are more likely to make the next one. No, the "hot hand" is not a myth.[1])

So long as each of these hypotheses is symmetric around 50%, they all will lead to (i) the process being hard to predict, and (ii...
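To make the setup concrete, here is a minimal sketch of Bayesian updating over the three hypothesis classes (not the paper's model; the streak-dependent switching probabilities are an illustrative assumption):

```python
# Illustrative sketch: model Steady/Switchy/Sticky as rules for how the
# probability of switching depends on the current streak length, then do
# standard Bayesian updating on a sequence generated by the true (independent) process.
import numpy as np

def p_switch(hypothesis: str, streak_len: int) -> float:
    if hypothesis == "steady":
        return 0.5
    bump = min(0.1 * streak_len, 0.3)  # illustrative: deviation grows with the streak
    return 0.5 + bump if hypothesis == "switchy" else 0.5 - bump

def log_likelihood(hypothesis: str, seq) -> float:
    ll, streak = 0.0, 1
    for prev, cur in zip(seq, seq[1:]):
        p = p_switch(hypothesis, streak)
        ll += np.log(p if cur != prev else 1 - p)
        streak = 1 if cur != prev else streak + 1
    return ll

rng = np.random.default_rng(0)
seq = rng.integers(0, 2, size=50)                      # data from the independent process
hyps = ["steady", "switchy", "sticky"]
logs = np.array([log_likelihood(h, seq) for h in hyps])
posterior = np.exp(logs - logs.max())
posterior /= posterior.sum()                           # uniform prior over the three hypotheses
print(dict(zip(hyps, posterior.round(3))))
```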
Jan 7, 2024 • 57min

LW - Defending against hypothetical moon life during Apollo 11 by eukaryote

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Defending against hypothetical moon life during Apollo 11, published by eukaryote on January 7, 2024 on LessWrong. [Header image: Photo of the lunar lander taken during Apollo 11.] In 1969, after successfully bringing men back from landing on the moon, the astronauts, spacecraft, and all the samples from the moon surface were quarantined for 21 days. This was to account for the possibility that they were carrying hostile moon germs. Once the quarantine was up and the astronauts were not sick, and extensive biological testing on them and the samples showed no signs of infection or unexpected life, the astronauts were released. We know now that the moon is sterile. We didn't always know this. That was one of the things we hoped to find out from the Apollo 11 program, which was the first time not only that people would visit another celestial body, but that material from another celestial body would be brought back in a relatively pristine fashion to earth. The possibilities were huge. The possibilities included life, although nobody thought this was especially likely. But in that slim chance of life, there was a chance that life would be harmful to humans or the earth environment. Human history is full of organisms wreaking havoc when introduced to a new location - smallpox in the Americas, rats in Pacific Islands, water hyacinth outside of South America. NASA, Congress, and various other federal agencies were apparently convinced to spend millions of dollars building an extensive new facility and take extensive other measures to address this possibility. This is how a completely abstract argument about alien germs was taken seriously and mitigated at great effort and expense during the 1969 Apollo landing. I've added my sources throughout, but a lot of this work draws from two very good pieces: Michael Meltzer's When Biospheres Collide [1] and Mangus and Larsen's Lunar Receiving Laboratory Project History[2].

Terms

Forward contamination: The risk that organisms from earth would be present on a spacecraft and would be carried onto a planet (or other celestial body). They might even be able to replicate there. The risks from forward contamination are:

Harming current research efforts (including determining if there is indigenous life on a planet)

Permanently harming future research efforts

Permanently disrupting a pristine natural environment (whether or not it has indigenous life)

Back contamination: The theoretical risk that organisms indigenous to another celestial body are returned to earth - alongside samples or inadvertently - and replicate in the environment or as a pathogen. The risks from back contamination are:

Earth ecosystems, crops, or humans are harmed

NASA's modern terms are "restricted vs. unrestricted earth return," about material samples (rocks, dust, gas, etc) returning from celestial bodies. Samples that are understood to be sterile and harmless would not be subjected to quarantine. Since we are now very certain that the moon is sterile, new samples coming back from the moon would be considered unrestricted. (A space agency might still want to handle an unrestricted sample with special precautions, but these would be to keep the sample protected, not because they thought the sample might contain organisms.) Apollo 11 is the first restricted earth return process.
Regarding the facility, I default to using "Lunar Receiving Laboratory" or "LRL" here, which did end up being the name of the facility in question; you will also sometimes see "Lunar Sample Receiving Laboratory" or "LSRL" for the same.

How back contamination risks became a concern

From 1959, concern over back contamination risk was extremely niche. By 1966, mitigation of back contamination risk had become a requirement for the entire moon landing mission. How did this happen? Forward contamin...
Jan 7, 2024 • 9min

AF - Deceptive AI ≠ Deceptively-aligned AI by Steve Byrnes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deceptive AI ≠ Deceptively-aligned AI, published by Steve Byrnes on January 7, 2024 on The AI Alignment Forum. Tl;dr: A "deceptively-aligned AI" is different from (and much more specific than) a "deceptive AI". I think this is well-known and uncontroversial among AI Alignment experts, but I see people getting confused about it sometimes, so this post is a brief explanation of how they differ. You can just look at the diagram below for the upshot. Some motivating context: There have been a number of recent arguments that future AI is very unlikely to be deceptively-aligned. Others disagree, and I don't know which side is right. But I think it's important for non-experts to be aware that this debate is not about whether future powerful AI is likely to engage in deliberate deception. Indeed, while the arguments for deceptive alignment are (IMO) pretty complex and contentious, I will argue that there are very much stronger and more straightforward reasons to expect future powerful AI to be deceptive, at least sometimes, in the absence of specific interventions to avoid that.

1. Definitions

Deceptive alignment is a particular scenario where:

A "ground-truth system" (possibly individual human evaluators, or possibly an automated system of some sort) provides an ML model with training signals (rewards if this is reinforcement learning (RL), supervisory ground truth signals if this is supervised or self-supervised learning (SL)),

The AI starts emitting high-reward outputs (if this is RL) or low-loss outputs (if this is SL), i.e. outputs that one might naively interpret as evidence that training is going as intended,

…but the AI is actually emitting those outputs because of ulterior motives - more specifically, the AI has situational awareness and a secret desire for some arbitrary thing X, and the AI wants to not get updated and/or it wants to get deployed, so that it can go make X happen, and therefore the AI emits the high-reward or low-loss outputs since it reasons that doing so would be instrumentally useful for making X happen. (Why would it be instrumentally useful? Well, for one thing, the AI might reason that emitting high-reward or low-loss outputs would minimize the extent to which ongoing training will change its inclinations, which it would probably see as bad by the usual instrumental-convergence argument. See the original source where the term "deceptive alignment" was coined in 2019, or Joe Carlsmith's report, or many other discussions.)

By contrast, deception is much broader - it's any situation where the AI is interacting with humans for any reason, and the AI deceives a human by knowingly providing them with false or misleading information. Thus, deceptive alignment would be a special case of deception - namely, the case where deception occurs in the context of model training, and involves the AI emitting low-loss / high-reward outputs, in order to hide its secret ulterior motives, and to create a false impression that the training run is going as planned.

2. Very simple example of "deception" that is not "deceptive alignment"

Suppose I use RL to train an AI to make money, and that I do so in the most obvious way possible - I give the AI an actual real-world bank account, and set its RL reward signal to be positive whenever the account balance goes up, and negative when the account balance goes down.
If I did this today, the trained model would probably fail to accomplish anything at all. But let us suppose that future RL techniques will work better than today's, such that this training would lead to an AI that starts spear-phishing random people on the internet and tricking them into wiring money into the AI's bank account. Such an AI would be demonstrating "deception", because its spear-phishing emails are full of deliberate lies. But this AI w...
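As a sketch of the training setup described in this example (illustrative only; the balance values are hypothetical stand-ins for reading the real account):

```python
# Minimal sketch of the reward signal described above: the reward at each step
# is simply the change in the bank-account balance (hypothetical values).
def reward(balance_before: float, balance_after: float) -> float:
    return balance_after - balance_before  # positive if the balance went up, negative if it fell

# e.g. the balance rising from 1000.0 to 1250.0 yields a reward of +250.0
print(reward(1000.0, 1250.0))
```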
