The Nonlinear Library

The Nonlinear Fund
Jun 24, 2024 • 15min

AF - LLM Generality is a Timeline Crux by Egg Syntax

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by Egg Syntax on June 24, 2024 on The AI Alignment Forum.

Short Summary

LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible.

Longer summary

There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks:
1. Whether this limitation is illusory or actually exists.
2. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs.
3. If fundamental, whether it can be overcome by scaffolding & tooling.
If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years.

Introduction

Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling[1], and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views.

Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[2]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers.

It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[3], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines.

What is general reasoning?

Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[4]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half).

This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support.
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count.

How general are LLMs?

It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
Jun 24, 2024 • 6min

EA - 35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free) by spencerg

Discover 35 interactive learning modules relevant to Effective Altruism, covering topics like judgment calibration, psychological predictions, global trends, biosafety, and the long-term future. All tools are free and designed to enhance critical thinking, communication skills, and decision-making for altruistic endeavors.
Jun 24, 2024 • 11min

EA - Ten minutes to speak up for ~4.5 million caged chickens by Ben Stevenson

Ben Stevenson from Rethink Priorities discusses the importance of advocating for a ban on cages for layer hens in Scotland. Transitioning to cage-free systems can greatly alleviate suffering for millions of chickens. Individuals have an opportunity to voice their support through a public consultation within 48 hours.
Jun 24, 2024 • 9min

AF - Different senses in which two AIs can be "the same" by Vivek Hebbar

AI alignment researcher and author Vivek Hebbar explores the different senses in which two AIs can be considered the same or different. He discusses distinctions such as model weights, pretrained identity, shared context, shared activations, shared memory, shared reward, and shared role in training, and how these impact collusion and coordination in AI safety.
Jun 23, 2024 • 13min

LW - Applying Force to the Wrong End of a Causal Chain by silentbob

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applying Force to the Wrong End of a Causal Chain, published by silentbob on June 23, 2024 on LessWrong.

There's a very common thing that humans do: a person makes an observation about something they dislike, so they go ahead and make an effort to change that thing. Sometimes it works, and sometimes it doesn't. If it doesn't work, there can be a variety of reasons for that - maybe the thing is very difficult to change, maybe the person lacks the specific skills to change the thing, maybe it depends on the behavior of other people and the person is not successful in convincing them to act differently. But there's also one failure mode which, while overlapping with the previous ones, is worth highlighting: imagine the thing the person dislikes is the outcome of a reasonably complex process. The person observes primarily this outcome, but is partially or fully ignorant of the underlying process that causes the outcome. And they now desperately want the outcome to be different. In such a situation they are practically doomed to fail - in all likelihood, their attempts to change the outcome will not be successful, and even if they are, the underlying cause is still present and will keep pushing in the direction of the undesired outcome.

Three Examples

Productivity in a Company

A software company I worked for once struggled with a slow development cycle, chronic issues with unmet deadlines, and generally shipping things too slowly. The leadership's primary way of addressing this was to repeatedly tell the workforce to "work faster, be more productive, ship things more quickly". In principle, this approach can work, and to some degree it probably did speed things up. It just requires that the people you're pushing have enough agency, willingness and understanding to take it a step further and make the trip down the causal chain, to figure out what actually needs to happen in order to achieve the desired outcome. But if middle management just forwards the demand to "ship things more quickly" as is, and the employees below them don't have enough ownership to transform that demand into something more useful, then probably nothing good will happen. The changed incentives might cause workers to burn themselves out, to cut corners that really shouldn't be cut, to neglect safety or test coverage, to set lower standards for documentation or code quality - aspects that are important for stable long-term success, but take time to get right.

To name one very concrete example of the suboptimal consequences this had: the company had sent me a new laptop to replace my old one, which would have sped up my work quite a bit. But it would have taken a full work day or two to set the new laptop up. The "we need to be faster" situation caused me to constantly have more pressing things to work on, meaning the new, faster laptop sat at the side of my desk, unused, for half a year. Needless to say, on top of all that, this time was also highly stressful for me and played a big role in me ultimately leaving the company.

Software development, particularly when multiple interdependent teams are involved, is a complex process. The "just ship things more quickly" view, however, seems to naively suggest that the problem is simply that workers take too long pressing the "ship" button. What would have been a better approach?
It's of course easy to armchair-philosophize my way to a supposedly better solution now. And it's also a bit of a cop-out to make the meta comment that "you need to understand the underlying causal web that causes the company's low velocity". However, in cases like this one, I think one simple improvement is to make an effort for nuanced communication, making clear that it's not (necessarily) about just "working faster", but rather asking everyone to keep their eyes open for cause...
Jun 23, 2024 • 2min

EA - Introducing the AI for Animals newsletter by Max Taylor

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing the AI for Animals newsletter, published by Max Taylor on June 23, 2024 on The Effective Altruism Forum.

Following the success of the AI, Animals, and Digital Minds conference earlier this month, AI for Animals is a new newsletter of resources and updates around the intersection of AI and animals. Each month or so, we'll give a brief overview of a specific topic relating to AI and animals, followed by recent news and other useful resources. While the main focus will be on biological animals, we'll also include resources relevant to digital minds, in recognition of the feasibility of their development and the enormous potential ethical implications.

The newsletter will typically comprise the following sections:
1. Overview of a specific AI and animals-related topic
2. Updates
3. Opportunities
4. Recent news
5. Research publications
6. Resources

Our first edition ('What AI could mean for animals') provides a broad overview of AI's implications for animals. In future editions, we'll hone in on more specific topics, such as the role of AI in intensive animal farming, its potential to accelerate alternative protein development, and its applications for helping animals living in the wild. Click here if you'd like to subscribe.

We're always open to ideas, questions, and feedback - just comment on this post or email us at contact@aiforanimals.org. Feel free to get in touch any time to suggest topics to cover, news stories and other resources to include, or anything else that could help make this a useful resource!

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jun 23, 2024 • 23min

EA - We should value income doublings equally across time and place (Founders Pledge) by NicoT

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We should value income doublings equally across time and place (Founders Pledge), published by NicoT on June 23, 2024 on The Effective Altruism Forum.

Hi! I'm Nico and I'm on the research team at Founders Pledge. We noticed that the way we compare current to future income benefits is in tension with how we compare income benefits across interventions. However, aligning these two comparisons - choosing the same function for utility from consumption for both - might lead to large changes in our CEAs. So, we are now thinking about how to choose the right approach. Since our framework is based on GiveWell's, which is used by other organisations, too, I expect that we're facing the same issues. I'm posting here as a way of thinking out loud and with the hope of getting input from others.

Summary

Founders Pledge and GiveWell both use different values of η (elasticity of marginal utility from consumption) when modelling isoelastic utility from consumption, depending on the context. Across interventions, we assume η=1. Over time within an intervention, we assume η≈1.59. We should choose the same η for both models, as having different η values leads us to prefer doubling the incomes of richer people relative to poorer people.

Practically, this inconsistency leads to strange conclusions in existing CEAs. Taking GiveWell's Unlimit Health (deworming) CEA as a stylised example: for two people in Madagascar, we value doubling the income of someone who makes $2,500 at 30% as much as for someone who makes $500. When the person making $2,500 lives in Côte d'Ivoire, however, we value doubling their income the same (100% as much) as for the person in Madagascar who makes $500.

Resolving this isn't straightforward and has large implications for our prioritisation. For example: using η=1 everywhere - which implies that income doublings have the same value regardless of absolute income levels - doubles the cost-effectiveness of education and deworming programs and makes economic growth and poverty graduation interventions look substantially better. Using η=1.87, which is implied by our discount rate, everywhere requires our evaluations to take into account the income levels of recipients and prioritise lower-income regions more. An income doubling in Malawi would be worth roughly 1.9x as much as in Ethiopia, 3.4x as much as in Kenya, 6.4x as much as in Egypt, and 75x as much as in the US. The same is true within countries: in India, an income doubling in Bihar would be worth 3.4x as much as an income doubling in Andhra Pradesh.

I'm hoping this post will start a conversation around what the right value of η is.

Our inconsistent η values

Summary: GiveWell's framework, which we use, explicitly uses log-utility from consumption, which implies isoelastic utility with η=1. However, our (and GiveWell's) discount rate uses η≈1.59. We use η=1 when comparing between interventions/places. But we use η≈1.59 for comparisons across time, where income doublings are worth 2.6% less in a year from now solely because incomes will be higher then. We should use the same η for comparisons across time and place. Not doing so leads us to prefer doubling the incomes of richer vs poorer people (see next section).

We use η=1 (log-utility) to compare the value of income benefits across people or interventions[1].
That assumption is convenient because it allows us to disregard absolute income levels: an income doubling is as valuable from $250 to $500 as it is from $2.5k to $5k. Because of that, we can make statements like "the value of a 10% income increase from a deworming program in India equals the value of a 10% income increase from a cash transfer program in Kenya" without knowing the incomes of the recipients. At the same time, we use η≈1.59 when comparing the value of income benefits in different years within an inte...
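To make the role of η concrete, here is a minimal Python sketch of isoelastic utility. It is an illustration under my own assumptions, not the Founders Pledge or GiveWell CEA: the $500 and $2,500 baselines are hypothetical, and the printed ratios will not exactly reproduce the post's 30% figure, which comes from their full model. It shows the general mechanism: with η=1 an income doubling is worth the same at any baseline, while with η>1 a doubling for a richer person is worth less.

```python
import math

# Illustrative sketch only: not the Founders Pledge or GiveWell model.
# Baseline incomes and the set of eta values are assumptions for illustration.

def utility(consumption: float, eta: float) -> float:
    """Isoelastic utility: ln(c) when eta == 1, else c^(1-eta) / (1-eta)."""
    if eta == 1:
        return math.log(consumption)
    return consumption ** (1 - eta) / (1 - eta)

def value_of_doubling(consumption: float, eta: float) -> float:
    """Utility gain from doubling consumption from c to 2c."""
    return utility(2 * consumption, eta) - utility(consumption, eta)

for eta in (1.0, 1.59, 1.87):  # eta values discussed in the post
    low, high = 500, 2500      # hypothetical annual incomes
    ratio = value_of_doubling(high, eta) / value_of_doubling(low, eta)
    print(f"eta={eta}: doubling at ${high} is worth {ratio:.0%} of doubling at ${low}")
```

With η=1 the ratio is 100% (doublings are valued equally regardless of income); as η rises above 1, the doubling at the higher income is worth progressively less relative to the doubling at the lower income.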
Jun 23, 2024 • 5min

LW - Enriched tab is now the default LW Frontpage experience for logged-in users by Ruby

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Enriched tab is now the default LW Frontpage experience for logged-in users, published by Ruby on June 23, 2024 on LessWrong.

In the past few months, the LessWrong team has been making use of the latest AI tools (given that they unfortunately exist[1]) for art, music, and deciding what we should all be reading. Our experiments with the latter, i.e. the algorithm that chooses which posts to show on the frontpage, have produced results sufficiently good that at least for now, we're making Enriched the default for logged-in users[2]. If you're logged in and you've never switched tabs before, you'll now be on the Enriched tab. (If you don't have an account, making one takes 10 seconds.)

To recap, here are the currently available tabs (subject to change):
Latest: 100% posts from the Latest algorithm (using karma and post age to sort[3])
Enriched (new default): 50% posts from the Latest algorithm, 50% posts from the recommendations engine
Recommended: 100% posts from the recommendations engine, choosing posts specifically for you based on your history
Subscribed: a feed of posts and comments from users you have explicitly followed
Bookmarks: this tab appears if you have bookmarked any posts

Note that posts which are the result of the recommendation engine have a sparkle icon after the title (on desktop, space permitting). Posts from the last 48 hours have their age bolded.

Why make Enriched the default?

To quote from my earlier post about frontpage recommendation experiments: A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[2], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be centered around the latest content. This seems very sad to me. When a new user shows up on LessWrong, it seems extremely unlikely that the most important posts for them to read were all written within the last week or two.

I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility. Very simple. When I vote, I basically know the full effect this has on what is shown to other users or to myself. But I think the cost of that simplicity has become too high, especially as older content makes up a larger and larger fraction of the best content on the site, and people have been becoming ever more specialized in the research and articles they publish on the site.

We found that a hybrid posts list of 50% Latest and 50% Recommended lets us get the benefits of each algorithm[4]. The Latest component of the list allows people to stay up to date with the most recent new content, provides predictable visibility for new posts, and is approximately universal in that everyone sees those posts, which makes posts a bit more common-knowledge-y. The Recommended component of the list allows us to present content that's predicted to be most interesting/valuable to a user from across thousands of posts from the last 10+ years, not being limited to just recent stuff.

Shifting the age of posts

When we first implemented recommendations, they were very recency biased. My guess is that's because the data we were feeding it was of people reading and voting on recent posts, so it knew those were the ones we liked.
In a manner less elegant than I would have preferred, we constrained the algorithm to mostly serving content 30 or 365 days older. You can see the evolution of the recommendation engine, on the age dimension, here. I give more detailed thoughts about what we found in the course of developing our recommendation algorithm in this comment below.

Feedback, please

Although we're making Enriched the general default, this feature direction is still expe...
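For a rough sense of how the two components described in this episode fit together, here is a minimal Python sketch. It is an assumption-laden illustration, not the actual LessWrong implementation: the gravity exponent, the +2 age offset, and the simple alternation scheme are placeholders standing in for a karma-and-age "Latest" ranking mixed 50/50 with recommender output.

```python
from dataclasses import dataclass
from itertools import zip_longest

# Illustrative sketch only: not the actual LessWrong code.

@dataclass
class Post:
    title: str
    karma: int
    age_hours: float

def latest_score(post: Post, gravity: float = 1.8) -> float:
    """Hacker News-style score: more karma raises it, age decays it."""
    return post.karma / (post.age_hours + 2) ** gravity

def enriched_feed(recent_posts, recommended_posts, n=10):
    """Interleave the karma/age-sorted 'Latest' list with recommender output (50/50)."""
    latest = sorted(recent_posts, key=latest_score, reverse=True)
    mixed = []
    for a, b in zip_longest(latest, recommended_posts):
        if a is not None:
            mixed.append(a)
        if b is not None:
            mixed.append(b)
    return mixed[:n]
```

The sketch leaves out details such as de-duplicating posts that appear in both lists, but it captures the trade-off the post describes: the Latest half keeps the feed current and predictable, while the recommender half can surface older posts.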
Jun 23, 2024 • 2min

LW - Bed Time Quests & Dinner Games for 3-5 year olds by Gunnar Zarncke

Author Gunnar Zarncke discusses creative bedtime quests and dinner games for 3-5 year olds, focusing on engaging children through new rule sets and interactive activities like 'Magic Time'. These innovative parenting practices provide mental stimulation and variety in childcare routines.
Jun 22, 2024 • 42min

EA - Impartialist Sentientism and Existential Anxiety about Moral Circle Explosion by Rafael Ruiz

Join Rafael Ruiz, a Philosophy PhD candidate focusing on moral progress and moral circle expansion, as he discusses impartialist sentientism and existential anxiety regarding moral circle expansion. Explore invertebrate welfare, AI sentience, long-termism, anti-speciesism, population ethics, and the ethical complexities of invertebrate and AI sentience in this thought-provoking podcast episode.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app