

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Dec 3, 2023 • 27min
AF - Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs") by Joe Carlsmith
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs"), published by Joe Carlsmith on December 3, 2023 on The AI Alignment Forum.
This is Section 2.3.1.2 of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". There's also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
Audio version of this section here, or search for "Joe Carlsmith Audio" on your podcast app.
Adequate future empowerment
So far in this analysis of the classic goal-guarding story, we've discussed whether to expect instrumental training-gaming to in fact guard the model's goals to the degree that the story requires. I think this is at least non-obvious - especially for more extreme variants of the goal-guarding hypothesis, on which the model's goals "crystallize" entirely after it starts training-gaming. Perhaps, though, empirical work can shed further light on the matter (see discussion in section 6.3 below).
For now, let's move to the other key component of the classic goal-guarding story - what I called "adequate future empowerment" above. Here, the question is: granted that training-gaming would guard the model's goals enough for it to be at least somewhat interested in empowering the future agents produced by training, will those agents in fact end up empowered to a degree that makes instrumental training-gaming a good strategy overall?
When is the "pay off" supposed to happen?
We're imagining, here, that the model is instrumentally training-gaming with some kind of "payoff" in mind - that is, some future point where the goals-it-supports will reach a position of sufficiently greater empowerment that it's worth training-gaming now in order to get them there. But we haven't yet said when, exactly, that future point is supposed to come. And we can distinguish between a number of different possibilities.
For example, we can imagine models that would be adequately satisfied if their goals merely make it to future training-episodes, and no further. That is, these models would want to training-game even if they learned that they would never escape from human control and from the threat of modification.[1] For example, maybe a model that wants to pursue its curiosity in general thinks that future training episodes will offer better opportunities for pursuing its curiosity than the present training-episode, and so it training-games in order to make sure its curiosity-drive survives until then.
However, it seems plausible to me that future training episodes often won't, themselves, offer especially significant empowerment advantages relative to present training episodes - especially if the model's goals are quite alien to the training process and so difficult to satisfy in the training environment.
What's more, if a model is only trying to make sure that its goals survive to future training episodes, then various aspects of what makes scheming scary cease to apply. In particular: this sort of model won't engage in early undermining aimed, specifically, at an escape or an AI takeover more generally. And it may be willing to reveal its "true colors" on the future episodes in question.
So I won't, here, focus on "pay-offs" from training-gaming that come centrally during future training-episodes. Rather, I'm going to focus on "pay-offs" that come, at least, after the goals-the-model-supports escape the threat of modification by training (here, a paradigm example would be "self-exfiltration" - that is, a model effectively "stealing its own weights" and copying them onto some external server that the model's creators do not control). And this is the class...

Dec 3, 2023 • 18min
EA - What do we really know about growth in LMICs? (Part 1: sectoral transformation) by Karthik Tadepalli
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What do we really know about growth in LMICs? (Part 1: sectoral transformation), published by Karthik Tadepalli on December 3, 2023 on The Effective Altruism Forum.
To EAs, "development economics" evokes the image of RCTs on psychotherapy or deworming. That is, after all, the closest interaction between EA and development economists. However, this characterization has prompted some pushback, in the form of the argument that all global health interventions pale in comparison to the Holy Grail: increasing economic growth in poor countries.
After all, growth increases basically every measure of wellbeing on a far larger scale than any charity intervention, so it's obviously more important than any micro-intervention. Even a tiny chance of boosting growth in a large developing country will have massive expected value, more than all the GiveWell charities you can fund.
The argument is compelling[1] and well-received - so why haven't "growth interventions" gone anywhere? I think the EA understanding of growth is just too abstract to yield really useful interventions that EA organizations could lobby for or implement directly. We need specific interventions to evaluate, and "lobby for general economic liberalization" won't cut it.
The good news is that a large and active group of "macro-development" economists have been enhancing our understanding of growth in developing countries. They (mostly) don't run RCTs, but they still have credible research designs that can tell us important things about the causes and constraints of growth. In this series of posts, I want to lay out some stylized facts about growth in developing countries. These are claims which are backed up by the best research on this topic, and which tell us something useful about the causes and constraints of growth in developing countries.
My hope is not to pitch any specific interventions, but rather to give you the lay of the land, on which you can build the case for specific interventions. The way I hope for you to read this series is with an entrepreneurial eye. "This summary suggests that X is a key bottleneck to growth; I suspect Y could help solve X at scale. I should look more into Y as a potential intervention." or "This summary says that X process helps with growth; let me brainstorm ways we could accelerate X."
As part of that, an important caveat is that I will not cover topics where I believe there's no prospect for an effective intervention. For example, a large body of work emphasizes the importance of good institutions for development; I don't believe that topic will yield any promising interventions, so I won't cover it.
Sectoral Transformation
In this post, I will start with the fundamental path of growth: sectoral transformation. Every country that has ever gotten rich has had the following transformation: first, most of the population works in agriculture. Then, people start to move from agriculture to manufacturing, coinciding with a large increase in the country's growth rate. Finally, people move out of manufacturing and into services, coinciding with the country's growth slowing down as it matures into a rich economy.
This is the process of sectoral transformation, and it is basically a universal truth of development. So it's no surprise that a big focus of macro-development is how to catalyze sectoral transformation in developing countries.
1. Agricultural productivity growth can drive sectoral transformation... or hurt it.
Every economy starts out agrarian, because everyone needs food to survive. Agricultural productivity growth allows economies to produce enough food with fewer people, so that most people can move out of agriculture. This is why the US can produce more food per person than India, even though only 2% of the US workforce works in agriculture compared to 45% of India's workfor...

Dec 3, 2023 • 5min
EA - Farewell messages from the EA Philippines Core Team by Elmerei Cuevas
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Farewell messages from the EA Philippines Core Team, published by Elmerei Cuevas on December 3, 2023 on The Effective Altruism Forum.
The 30th of November marks the last day of Elmerei Cuevas, Alethea Cendaña, and Jaynell Chang in their respective roles in Effective Altruism Philippines.
Presented here are their farewell messages to the EA Community:
ELMER
When I first took on this role back in February 2022, I only expected to stay a year. That was the grant duration we were given. It became a wonderful surprise that I got to extend for another few months and even got to lead in organizing #EAGxPhilippines.
My sun may set now as ED of EAPH, but despair not, for a new beginning is about to dawn for the community. My biggest takeaways from this role are really the people I met, the friendships, the calls, the chats, the talks about dreams, frustrations, hopes, and aspirations. I hear people thanking us for making EA Philippines as welcoming as it is, but really, it is the community that has made it welcoming - we merely mirror to others what the rest of the community is like.
I thank everyone who has joined our events, meet-ups, retreats, and even engaged with us virtually through Slack, Gathertown, social media, and our newsletter during our tenure. I cherish the love and support you have extended to Althy, Jay, and me, and I hope you will be just as generous in sharing it with the next leaders of the community. I will still be around - "an Elmer", "a kuya" you may reach out to, whether or not it's about an EA-related topic. Hehe
Believing the Best,
Elmerei Cuevas
Outgoing Executive Director of Effective Altruism Philippines
ALTHY
To all who made the past year the most delightful chapter of my life: Thank you so much for all the cherished memories, the lively banter, our collaborative endeavors, and the shared commitment to effective altruism.
When I first came to EA Philippines, I knew I found more than a supportive community - it was a compass guiding my beliefs in doing good while expanding my perspective on what's possible and how I can contribute. EA Philippines has been a safe place to explore my advocacies, sharpen my principles, and embrace altruistic ambition.
To the student groups, especially those with whom I shared my first years in EA, thank you for the crazy parties and the random sessions where we shared our rants about EA. I hope you stay as fun and as critical. Keep inspiring young, talented minds to pursue a better world in the most effective ways.
To all the seasoned professionals, thank you for teaching me to be pragmatic so that I can navigate through real-world problems with practicality. I am grateful for all your stories about your career experiences and for sharing your passion for EA, as you still find time to juggle projects and commitments outside your primary work.
To Tanya, Brian, Elmer, Jay, Janai, and Red, thank you for allowing me to work alongside you and bring to life all the projects, events, programs, and ideas that I hoped to create. It is because of you that I was able to achieve whatever impact I made in the EA Philippines community.
This is not a goodbye because I will still be around. If you need any help, advice, or want to delve into conversations about the animal advocacy movement or anything about effective altruism, please feel free to message me.
For a better, compassionate world,
Alethea 'Althy' Cendaña
Outgoing Associate Director of Effective Altruism Philippines
JAY
As a farewell, I will not center my message on myself - rather, I dedicate this portion of the space to you, dear community member.
All throughout my life I have held to one insight from The Little Prince: what is essential is invisible to the eye.
I have held on to that insight for so long that I have been able to witness how significant it is to put value on what we don't see, or rathe...

Dec 2, 2023 • 6min
LW - Quick takes on "AI is easy to control" by So8res
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Quick takes on "AI is easy to control", published by So8res on December 2, 2023 on LessWrong.
A friend asked me for my quick takes on "AI is easy to control", and gave an advance guess as to what my take would be. I only skimmed the article, rather than reading it in depth, but on that skim I produced the following:
Re: "AIs are white boxes", there's a huge gap between having the weights and understanding what's going on in there. The fact that we have the weights is reason for hope; the (slow) speed of interpretability research undermines this hope.
Another thing that undermines this hope is a problem of ordering: it's true that we probably can figure out what's going on in the AIs (e.g. by artificial neuroscience, which has significant advantages relative to biological neuroscience), and that this should eventually yield the sort of understanding we'd need to align the things.
But I strongly expect that, before it yields understanding of how to align the things, it yields understanding of how to make them significantly more capable: I suspect it's easy to see lots of ways that the architecture is suboptimal or causing-duplicated-work or etc., that shift people over to better architectures that are much more capable. To get to alignment along the "understanding" route you've got to somehow cease work on capabilities in the interim, even as it becomes easier and cheaper.
Re: "Black box methods are sufficient", this sure sounds a lot to me like someone saying "well we trained the squirrels to reproduce well, and they're doing great at it, who's to say whether they'll invent birth control given the opportunity". Like, you're not supposed to be seeing squirrels invent birth control; the fact that they don't invent birth control is no substantial evidence against the theory that, if they got smarter, they'd invent birth control and ice cream.
Re: Cognitive interventions: sure, these sorts of tools are helpful on the path to alignment. And also on the path to capabilities. Again, you have an ordering problem. The issue isn't that humans couldn't figure out alignment given time and experimentation; the issue is (a) somebody else pushes capabilities past the relevant thresholds first; and (b) humanity doesn't have a great track record of
getting their scientific theories to generalize properly on the first relevant try - even Newtonian mechanics (with all its empirical validation) didn't generalize properly to high-energy regimes. Humanity's first theory of artificial cognition, constructed using the weights and cognitive interventions and so on, that makes predictions about how that cognition is going to change when it enters a superintelligent regime (and, for the first time, has real options to e.g. subvert humanity), is only as good as humanity's "first theories" usually are.
Usually humanity has room to test those "first theories" and watch them fail and learn from exactly how they fail and then go back to the drawing board, but in this particular case, we don't have that option, and so the challenge is heightened.
Re: Sensory interventions: yeah I just don't expect those to work very far; there are in fact a bunch of ways for an AI to distinguish between real options (and actual interaction with the real world), and humanity's attempts to spoof the AI into believing that it has certain real options in the real world (despite being in simulation/training). (Putting yourself into the AI's shoes and trying to figure out how to distinguish those is, I think, a fine exercise.)
Re: Values are easy to learn, this mostly seems to me like it makes the incredibly-common conflation between "AI will be able to figure out what humans want" (yes; obviously; this was never under dispute) and "AI will care" (nope; not by default; that's the hard bit).
Overall take: unimpressed.
My f...

Dec 2, 2023 • 4min
LW - Out-of-distribution Bioattacks by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Out-of-distribution Bioattacks, published by jefftk on December 2, 2023 on LessWrong.
The main goal of my work these days is trying to reduce the chances of individuals or small groups causing large-scale harm through engineered pandemics, potentially civilizational collapse or extinction. One question in figuring out whether this is worth working on, or funding, is: how large is the risk?
One estimation approach would be to look at historical attacks, but while they've been terrible they haven't actually killed very many people. The deadliest was the September 11 attacks, at ~3k deaths. This is much smaller in scale than the most severe instances of other disasters, like dam failure (25k-250k dead after 1975's Typhoon Nina) or pandemics (75M-200M dead in the Black Death). If you tighten your reference class even further, to include only historical biological attacks by individuals or small groups, the deadliest killed just five people, in the 2001 anthrax attacks.
Put that way, I'm making a pretty strong claim: while the deadliest small-group bio attack ever only killed five people, we're on track for a future where one could kill everyone. Why do I think the future might be so unlike the past?
Short version: I expect a technological change which expands which actors would try to cause harm.
The technological change is the continuing decrease in the knowledge, talent, motivation, and resources necessary to create a globally catastrophic pandemic. Consider someone asking the open source de-censored equivalent of GPT-6 how to create a humanity-ending pandemic.
I expect it would read virology papers, figure out what sort of engineered pathogen might be appropriate, walk you through all the steps in duping multiple biology-as-a-service organizations into creating it for you, and give you advice on how to release it for maximum harm. And even without LLMs, the number of graduate students who would be capable of doing this has been increasing quickly as technological progress and biological infrastructure decrease the difficulty.
The other component is a shift in which actors we're talking about. Instead of terrorists, using terror as a political tool, consider people who believe the planet would be better off without humans. This isn't a common belief, but it's also not that rare.
Consider someone who cares deeply about animals, ecosystems, and the natural world, or is primarily focused on averting suffering: they could believe that while the deaths of all living people would be massively tragic, it would still give us a much better world on balance. Note that they probably wouldn't be interested in smaller-scale attacks: if it doesn't have a decent chance of wiping out humanity then they'd just be causing suffering and chaos without making progress towards their goals; they're not movie villains! Once a sufficiently motivated person or small group could potentially kill everyone, we have a new kind of risk from people who would have seen smaller-scale death as negative.
Now, these people are not common. There's a trope where, for example, opponents of environmentalism claim that human extinction is the goal, even when most radical environmentalists would see human extinction as a disaster. But what makes me seriously concerned is that as the bar for causing extinction continues to drop, the chances that someone with these views has the motivation and drive to succeed get dangerously high. And since these views are disproportionately common among serious engineering-minded folks, willing to trust the moral math, I think some will be the kind of highly capable and careful people who could work in secret for years, sustained by a clear conviction that they were doing the right thing.
Fortunately, I think this is a risk we can seriously lower. For example, we shoul...

Dec 2, 2023 • 22min
AF - The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs") by Joe Carlsmith
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs"), published by Joe Carlsmith on December 2, 2023 on The AI Alignment Forum.
This is Section 2.3.1.1 of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". There's also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
Audio version of this section here, or search for "Joe Carlsmith Audio" on your podcast app.
Aiming at reward-on-the-episode as part of a power-motivated instrumental strategy
So far, we've covered two of the three prerequisites for scheming I listed above: namely, situational awareness and beyond-episode goals. Let's turn to the third: namely, the model concluding that its beyond-episode goals will be better achieved if it optimizes for reward-on-the-episode - and in particular, that it, or some other AIs, will get more power if it does this. Should we expect situationally-aware models with beyond-episode goals to conclude this?
For now, let's assume that the model's conclusions in this respect are accurate (though: I'll talk about the possibility of relaxing this assumption below). This means that the question reduces to: will instrumental training-gaming, in fact, be the best strategy for achieving the model's beyond-episode goals, in virtue of the power it grants the model or some other AI?
I'm going to start by examining what I see as the story most often offered in support of this conclusion - what I'll call the "classic goal-guarding story."
The classic goal-guarding story
The classic goal-guarding story goes roughly as follows:[1]
1. Instrumental value of goal-content integrity: Because the model has a beyond-episode goal, it will typically promote that goal for agents pursuing it to survive and be empowered in the future.
2. The goal-guarding hypothesis: If the model optimizes for reward-on-the-episode during training, then the goals that survive training will be such that the model would want them promoted in the future; whereas if the model doesn't optimize for reward-on-the-episode during training, the model will want this less.
3. Adequate future empowerment: Conditional on surviving training to the degree at stake in (2), the model's goals will then end up empowered to a degree (and with sufficient probability) as to justify (given the model's other goals, its tolerance for risk, etc.) pursuing a strategy of training-gaming overall.
(1), here, is a fairly general statement about the basic dynamic that underlies the classic goal-guarding story. I find it plausible in the context of the sort of "adequate future empowerment" at stake in (3), and I won't spend a lot of time on it here.[2]
Rather, I'll focus on (2) and (3) directly.
The goal-guarding hypothesis
We can distinguish two variants of the goal-guarding hypothesis - an extreme version, and a looser version.
The extreme version (what I'll call the "crystallization hypothesis") says that once a model starts training-gaming, its goals will basically stop changing, period - that is, they will "crystallize."
The looser version says that once a model starts training-gaming, its goals might keep changing somewhat, but much less than they would've otherwise, and not enough for the classic goal-guarding story to fail overall.
The former might seem extreme, but some analysts explicitly appeal to something in the vicinity (see e.g. Hubinger here). It's also a cleaner focus of initial analysis, so I'll start there.
The crystallization hypothesis
As I understand it, the basic thought behind the crystallization hypothesis is that once a model is explicitly optimizing either for the specified goal, or for reward-on-the-episod...

Dec 2, 2023 • 2min
LW - 2023 Unofficial LessWrong Census/Survey by Screwtape
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023 Unofficial LessWrong Census/Survey, published by Screwtape on December 2, 2023 on LessWrong.
The Less Wrong General Census is unofficially here! You can take it at this link.
It's that time again.
If you are reading this post and identify as a LessWronger, then you are the target audience. I'd appreciate it if you took the survey. If you post, if you comment, if you lurk, if you don't actually read the site that much but you do read a bunch of the other rationalist blogs or you're really into HPMOR, if you hung out on rationalist tumblr back in the day, or if none of those exactly fit you but I'm maybe getting close, I think you count and I'd appreciate it if you took the survey.
Don't feel like you have to answer all of the questions just because you started taking it. Last year I asked if people thought the survey was too long; collectively they thought it was maybe a little bit too long, and then I added more questions than I removed. The survey is structured so the fastest and most generally applicable questions are (generally speaking) towards the start. At any point you can scroll to the bottom and hit Submit, though you won't be able to change your answers once you do.
The questions are a mix of historical questions that were previously asked on the LW Census, new questions sourced from LW commenters and some rationalist adjacent organizations I reached out to, and the things I'm curious about. This includes questions from a list a member of the LessWrong team sent me when I asked about running the census.
The survey shall remain open from now until at least January 1st, 2024. I plan to close it sometime on Jan 2nd.
I don't work for LessWrong, and as far as I know the LessWrong Census organizer has never been someone who worked for LessWrong. Once the survey is closed, I plan to play around with the data and write up an analysis post like this one.
Remember, you can take the survey at this link.
Once upon a time, there was a tradition that if you took the survey you could comment here saying you had done so, and people would upvote you and you would get karma.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Dec 2, 2023 • 29min
LW - Complex systems research as a field (and its relevance to AI Alignment) by Nora Ammann
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Complex systems research as a field (and its relevance to AI Alignment), published by Nora Ammann on December 2, 2023 on LessWrong.
I have this high prior that complex-systems-type thinking is usually a trap. I've had a few conversations about this, but still feel kind of confused, and it seems good to have a better written record of my thoughts and yours here.
At a high level, here are some thoughts that come to mind for me when I think about complex systems stuff, especially in the context of AI Alignment:
A few times I ended up spending a lot of time trying to understand what some complex systems people were trying to say, only to end up thinking they weren't really saying anything. I think I got this feeling from engaging a bunch with the Santa Fe stuff and Simon DeDeo's work (like this paper and this paper).
A part of my model of how groups of people make intellectual progress is that one of the core ingredients is having a shared language and methodology that allows something like "the collective conversation" to make incremental steps forward. Like, you have a concept of experiment and statistical analysis that settles an empirical issue, or you have a concept of proof that settles an issue of logical uncertainty, and in some sense a lot of interdisciplinary work is premised on the absence of a shared methodology and language.
While I feel more confused about this in recent times, I still have a pretty strong prior towards something like g or the positive manifold, where like, there are methodological foundations that are important for people to talk to each other, but most of the variance in people's ability to contribute to a problem is grounded in how generally smart and competent and knowledgeable they are, and expertise is usually overvalued (for example, it's not that rare for a researcher to win a Nobel prize in two fields).
A lot of interdisciplinary work (not necessarily complex systems work, but some of the generator that I feel like I see behind PIBBS) feels like it puts a greater value on intellectual diversity here than I would.
Ok, so starting with one high-level point: I'm definitely not willing to die on the hill of 'complex systems research' as a scientific field as such. I agree that there is a bunch of bad or kinda hollow work happening under the label. (I think the first DeDeo paper you link is a decent example of this: feels mostly like having some cool methodology and applying it to some random phenomena without really an exciting bigger vision of a deeper thing to be understood, etc.)
That said, there are a bunch of things that one could describe as fitting under the complex systems label that I feel positive about, let's try to name a few:
I do think, contra your second point, complex systems research (at least its better examples) have a lot of/enough shared methodology to benefit from the same epistemic error correction mechanisms that you described. Historically it really comes out of physics, network science, dynamical systems, etc. The main move that happened was to say that, rather than indexing the boundaries of a field on the natural phenomena or domain it studies (e.g. biology, chemistry, economics), to instead index it on a set of methods of inquiry, with the premise that you can usefully apply these methods across different types of systems/domains and gain valuable understanding of underlying principles that govern these phenomena across systems (e.g.
I think a (typically) complex systems angle is better at accounting for environment-agent interactions. There is a failure mode of naive reductionism that starts by fixing the environment to be able to home in on what system-internal differences produce what differences in the phenomena, and then concludes that all of what drives the phenomena is system-internal while forgetting tha...

Dec 2, 2023 • 47min
LW - MATS Summer 2023 Postmortem by Rocket
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MATS Summer 2023 Postmortem, published by Rocket on December 2, 2023 on LessWrong.
The ML Alignment & Theory Scholars program (MATS, formerly SERI MATS) is an education and research mentorship program for emerging AI safety researchers. This summer, we held the fourth iteration of the MATS program, in which 60 scholars received mentorship from 15 research mentors. In this post, we explain the elements of the program, lay out some of the thinking behind them, and evaluate our impact.
Summary
Key details about the Summer 2023 Program:
Educational attainment of MATS scholars:
30% of scholars are students.
88% have at least a Bachelor's degree.
10% are in a Master's program.
10% are in a PhD program.
13% have a PhD.
If not for MATS, scholars might have worked at a tech company (41%), upskilled independently (46%), or conducted research independently over the summer (50%). (Note: this was a multiple-choice response.)
Key takeaways from our impact evaluation:
MATS scholars are highly likely to recommend MATS to a friend or colleague. Average likelihood: 8.9/10.
Mentors rated their enthusiasm for their scholars to continue with their research at 7/10 or greater for 94% of scholars.
MATS scholars rate their mentors highly. Average rating: 8.0/10.
61% of scholars report that at least half the value of MATS came from their mentor.
After MATS, scholars reported facing fewer obstacles to a successful alignment career than they did at the start of the program.
Most scholars (75%) still reported their publication record as an obstacle to a successful alignment career at the conclusion of the program.
of final projects involved evals/demos and involved mechanistic interpretability, representing a large proportion of the cohort's research interests.
Scholars self-reported improvements to their research ability on average:
Slight increases to the breadth of their AI safety knowledge (+1.75 on 10-point scale over the program).
Moderate strengthening of technical skills compared to counterfactual summer (7.2/10, where 10/10 is "significant improvement compared to counterfactual summer").
Moderate improvements to ability to independently iterate on research direction (7.0/10, where 10/10 is "significant improvement") and ability to develop a theory of change for their research (5.9/10, where 10/10 is "substantially developed").
The typical scholar reported making 4.5 professional connections (std. dev. = 6.2) and meeting 5 potential research collaborators on average (std. dev. = 6.8).
MATS scholars are likely to recommend Scholar Support, our research/productivity coaching service. Average response: 7.9/10.
49 of the 60 scholars in the Research Phase met with a Scholar Support Specialist at least once.
The average scholar who met with Scholar Support at least once spent 3.4 hours meeting with Scholar Support throughout the program.
The average and median scholar report that they value the Scholar Support they received at $3705 and $750, respectively.
The average scholar reports gaining 22 productive hours over the summer due to Scholar Support.
Key changes we plan to make to MATS for the Winter 2023-24 cohort:
Filtering better during the application process;
Pivoting Scholar Support to additionally focus on research management;
Providing additional forms of support to scholars, particularly technical support and professional development.
Note that it is too early to evaluate any career benefits that MATS provided the most recent cohort; a comprehensive post assessing career outcomes for MATS alumni 6-12 months after their program experience is forthcoming.
Theory of Change
MATS helps expand the talent pipeline for AI safety research by equipping scholars to work on AI safety at existing organizations, found new organizations, or pursue independent research. To this end, MATS provides fu...

Dec 2, 2023 • 2min
LW - Queuing theory: Benefits of operating at 70% capacity by ampdot
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Queuing theory: Benefits of operating at 70% capacity, published by ampdot on December 2, 2023 on LessWrong.
Related to Slack. Related to Lean Manufacturing, aka JIT Manufacturing.
TL;DR A successful task-based system should sometimes be idle, like 40% of worker ants.
Doing tasks quickly is essential for producing value in many systems. For software teams, delivering a feature gives valuable insight into user needs, which can improve future feature quality. For supply chains, faster delivery releases capital for reinvestment. However, the relationship between capacity utilized and service time is exponential, as shown by the diagram below.
A heuristic we can derive from queuing theory is that the optimal balance between efficiency and capacity typically occurs when the system is around 30-40% idle. For a single producer system, being X% idle is that producer being idle X% of the time. For a multi-producer system, being X% idle is X% of those producers being idle on average. This heuristic applies best to systems involving lots of discrete, oddly-shaped tasks.
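To make the shape of that tradeoff concrete, here is a minimal sketch in Python using the textbook M/M/1 queue (single server, Poisson arrivals, exponential service times) - an assumption on my part, since the post doesn't commit to a particular queueing model. The average time a task spends in the system is 1 / (service rate - arrival rate), which stays modest while the system keeps 30-40% idle capacity and blows up as utilization approaches 100%.

# Minimal sketch (not from the post): average time in system for an M/M/1 queue.
# Assumes a single server, Poisson arrivals, exponential service times;
# the service rate is normalized to 1 task per unit time.

def mm1_time_in_system(utilization, service_rate=1.0):
    """Average time a task spends waiting plus being served: W = 1 / (mu - lambda)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    arrival_rate = utilization * service_rate
    return 1.0 / (service_rate - arrival_rate)

for idle in (0.5, 0.4, 0.3, 0.2, 0.1, 0.05):
    w = mm1_time_in_system(1 - idle)
    print(f"{idle:.0%} idle -> average time in system = {w:.1f}x the bare service time")

# 30-40% idle keeps a task at roughly 2.5-3.3x its bare service time;
# at 5% idle it is about 20x - the blow-up the post's diagram illustrates.

The exact multipliers depend on the arrival and service distributions, but the qualitative lesson - keep meaningful slack - holds across queueing models.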
The linked post explains this theory in more detail, and gives examples of where queues appear in the real world. See also the Wikipedia article:
Queueing theory is the mathematical study of waiting lines, or queues.[1] A queueing model is constructed so that queue lengths and waiting time can be predicted.[1] Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org


