The Nonlinear Library

The Nonlinear Fund
Feb 27, 2024 • 30min

LW - Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders by Evan Anders

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders, published by Evan Anders on February 27, 2024 on LessWrong.

Note: The second figure in this post originally contained a bug pointed out by @LawrenceC, which has since been fixed.

Summary

Sparse autoencoders (SAEs) reveal interpretable features in the activation spaces of language models, but SAEs don't reconstruct activations perfectly. We lack good metrics for evaluating which parts of model activations SAEs fail to reconstruct, which makes it hard to evaluate SAEs themselves. In this post, we argue that SAE reconstructions should be tested using well-established benchmarks to help determine what kinds of tasks they degrade model performance on. We stress-test a recently released set of SAEs for each layer of the gpt2-small residual stream, using randomly sampled tokens from Open WebText and the LAMBADA benchmark, where the model must predict a specific next token. The SAEs perform well on prompts with context sizes up to the training context size, but their performance degrades on longer prompts.

In contexts shorter than or equal to the training context, the SAEs that we study generally perform well. We find that our late-layer SAEs perform worse than our early-layer SAEs, but since the SAEs all have the same width, this may just be because there are more features to resolve in later layers and our SAEs don't resolve them. In contexts longer than the training context, SAE performance is poor in general, but it is poorest in earlier layers and best in later layers.

Introduction

Last year, Anthropic and EleutherAI/Lee Sharkey's MATS stream showed that sparse autoencoders (SAEs) can decompose language model activations into human-interpretable features. This has led to a significant uptick in the number of people training SAEs and analyzing models with them. However, SAEs are not perfect autoencoders, and we still lack a thorough understanding of where and how they miss information. But how do we know if an SAE is "good", other than the fact that it has features we can understand?

SAEs try to reconstruct activations in language models, but they don't do this perfectly. Imperfect activation reconstruction can lead to substantial downstream cross-entropy (CE) loss increases. Generally, "good" SAEs recover 80-99% of the CE loss (compared to a generous baseline of zero ablation), but recovering only 80% of the CE loss is enough to substantially degrade a model's performance to that of a much smaller model (per scaling laws).

The second basic metric often used in SAE evaluation is the average per-token ℓ0 norm of the hidden layer of the autoencoder. Generally this is in the range of ~10-60 for a "good" autoencoder, which means that the encoder is sparse. Since we don't know how many features are active per token in natural language, it's useful to at least ask how changes in ℓ0 relate to changes in SAE loss values. If high-loss data have a drastically different ℓ0 from the SAE's average performance during training, that can be evidence of either off-distribution data (compared to the training data) or some kind of data with more complex information.

The imperfect performance of SAEs on these metrics could be explained in a couple of ways: The fundamental assumptions of SAEs are mostly right, but we're bad at training SAEs.
Perhaps if we learn to train better SAEs, these problems will become less severe. Perhaps we need to accept higher ℓ0 norms (more features active per token), though this would not be ideal for interpretability. Perhaps part of the signal is dense, or hard for an SAE to learn, and so we are systematically missing some kind of information; a more sophisticated sparsity enforcement could help with this. The fundamental assumption...
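To make the two metrics above concrete, here is a minimal sketch of how they are typically computed. The toy stand-in SAE, the random activations, and the placeholder CE values are all illustrative assumptions, not the post's actual setup:

```python
# A minimal sketch, assuming a toy stand-in SAE and placeholder CE values
# (none of this is the post's actual code or data).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_sae, n_tokens = 64, 512, 1000

class ToySAE(torch.nn.Module):
    """Stand-in sparse autoencoder: linear encoder + ReLU + linear decoder."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_sae)
        self.dec = torch.nn.Linear(d_sae, d_model)

    def forward(self, x):
        f = F.relu(self.enc(x))   # hidden feature activations (ideally sparse)
        return self.dec(f), f     # reconstruction and hidden activations

sae = ToySAE(d_model, d_sae)
acts = torch.randn(n_tokens, d_model)   # stand-in residual-stream activations
recon, hidden = sae(acts)

# Metric 1: mean per-token l0 - how many hidden features fire on each token.
# The post reports ~10-60 for "good" SAEs (an untrained toy SAE will be dense).
l0 = (hidden > 0).float().sum(dim=-1).mean().item()

# Metric 2: fraction of CE loss recovered relative to zero ablation.
# In practice ce_clean, ce_patched, ce_ablated come from running the language
# model with (a) original activations, (b) SAE reconstructions patched in,
# and (c) the activations zero-ablated; here they are placeholder numbers.
ce_clean, ce_patched, ce_ablated = 3.0, 3.2, 6.0
frac_recovered = (ce_ablated - ce_patched) / (ce_ablated - ce_clean)

print(f"mean l0 per token: {l0:.1f}")
print(f"CE loss recovered: {frac_recovered:.1%}")  # ~93% for these placeholders
```

Defined this way, zero ablation scores 0% and a perfect reconstruction scores 100%, which is why recovering only 80% of the CE loss can still correspond to a large capability hit.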
Feb 27, 2024 • 2min

EA - What posts would you like someone to write? by tobytrem

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What posts would you like someone to write?, published by tobytrem on February 27, 2024 on The Effective Altruism Forum.

I'm posting this to tie in with the Forum's Draft Amnesty Week (March 11-17) plans, but it is also a question of more general interest. The last time this question was posted, it got some great responses.

When answering in this thread, I suggest putting each idea in a different answer, so that comment threads don't get too confusing. If you think someone has already written the answer to a user's question, consider lending a hand and linking it in the comments.

A few suggestions for possible answers:
- A question you would like someone to answer: "How, historically, did AI safety become an EA cause area?"
- A type of experience you would like to hear about: "I'd love to hear about the experience of moving from consulting into biosecurity policy. Does anyone know anyone like this who might want to write about their experience?"

If you find yourself with loads of ideas, consider writing a full "posts I would like someone to write" post.

Draft Amnesty Week

If you see a post idea here which you think you might be positioned to answer, Draft Amnesty Week (March 11-17) might be a great time to post it. In Draft Amnesty Week, your posts don't have to be fully thought through, or even fully drafted. Bullet points and missing sections are allowed, so you can have a lower bar for posting.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Feb 27, 2024 • 8min

EA - Meta Charity Funders: Launching the 2nd round by Vilhelm Skoglund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Meta Charity Funders: Launching the 2nd round, published by Vilhelm Skoglund on February 27, 2024 on The Effective Altruism Forum.

Previously: Launch of 1st round & 1st round retrospective (recommended for all applicants).

We are now launching the second round of Meta Charity Funders. Apply for funding by March 24th or join the circle as a donor. Below we will first list some updates from the previous round before providing some information primarily meant for people who intend to apply to the next round.

Updates from the previous round

We expect to grant more money this time than last time ($686,580), as we have more members and people still haven't deployed their yearly spending (which was the case for many members last time around). Our expected grant amount this round is $500k-$3 million, though this is still uncertain and depends on funder-applicant fit. We are now 10 members in the circle, up from 9 last time around.

We expect to fund many initiatives not on this list, but some projects that members of our circle have expressed extra interest in funding this round:
- Effective giving/giving-multiplier organizations, such as the ones incubated by CE's Effective Giving Incubation program.
- Career development programs that increase the number of individuals working in high-impact areas, including GCR reduction, animal welfare, and global health - especially in regions where there are currently fewer opportunities to engage in such programs.
- Research/mapping of the meta space, to better understand current gaps and opportunities.

Information for this round

In this part we will outline the application process and guide you as an applicant to create a good application. We expect all applicants to have read this part.

Process

The expected process is as follows:
- Applications open, February 26th. Stick to 100 words in the summary; this should give a quick overview of the project. In the full project description, please include a main summarizing document no longer than 2 pages. This is all we can commit to reading for the first stage. Any extra material can only be expected to be read if we choose to go further with your application. When choosing the "Meta" category, please be as truthful as possible; it's obvious (and reflects negatively on the application) when a project has deliberately been placed in a category in which it does not belong.
- Applications close, March 24th.
- Initial application review finished, March 31st. If your project has been filtered out during the initial application review (which we expect will happen to 60-80% of applications), we will let you know around the end of March.
- Interviews, due diligence, deliberations, April 1st - May 14th. If your application has passed the initial application review, we will discuss it during our gatherings, and we might reach out to you to gather more information, for example by conducting an interview. This is still not a commitment from us to fund you.
- Decisions made, May 15th. We expect to pay out the grants in the weeks following the decisions.

What we mean by Meta

A common reason for rejection in the last round was that projects were not within scope for the funding circle. We recognize that this was primarily our fault, as we never clearly defined the scope, so we will try to make it a bit clearer here.
Meta organizations are those that operate one step removed from direct impact interventions. These can focus on the infrastructure, evaluation, and strategic guidance necessary for the broader field to maximize effectiveness and impact. They are essential in bridging gaps, identifying high-impact opportunities, and enabling other organizations to achieve or amplify their end-product impact. Below we will list a couple of illustrative examples. Note that we only chose these examples because we think the org...
Feb 27, 2024 • 8min

LW - How I build and run behavioral interviews by benkuhn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How I build and run behavioral interviews, published by benkuhn on February 27, 2024 on LessWrong.

This is an adaptation of an internal doc I wrote for Wave.

I used to think that behavioral interviews were basically useless, because it was too easy for candidates to bullshit them and too hard for me to tell what was a good answer. I'd end up grading every candidate as a "weak yes" or "weak no" because I was never sure what bar I should hold them to. I still think most behavioral interviews are like that, but after doing way too many of them, I now think it's possible to escape that trap. Here are my tips and tricks for doing so!

Confidence level: doing this stuff worked better than not doing it, but I still feel like I could be a lot better at behavioral interviews, so please suggest improvements and/or do your own thing :)

Before the interview

Budget 2+ hours to build

That's how long I usually take to design and prepare a new type of interview. If I spend a couple hours thinking about what questions and follow-ups to ask, I'm much more likely to get a strong signal about which candidates performed well. It might sound ridiculous to spend 2 hours building a 1-hour interview that you'll only give 4 times. But it's worth it! Your most limited resource is time with candidates, so if you can spend more of your own time to use candidates' time better, that's worth it.

Think ahead about follow-ups and rubric

I spend most of those 2 hours trying to answer the following question: "What answers to these questions would distinguish a great candidate from a mediocre one, and how can I dig for that?" I find that if I wait until after the interview to evaluate candidates, I rarely have conviction about them, and fall back to grading them a "weak hire" or "weak no-hire." To avoid this, write yourself a rubric of all the things you care about assessing, and what follow-up questions you'll ask to assess those things. This will help you deliver the interview consistently, but most importantly, you'll ask much better follow-up questions if you've thought about them beforehand. See the appendix for an example rubric.

Focus on a small number of skills

I usually focus on 1-3 related skills or traits. To get a strong signal from a behavioral interview question I usually need around 15 minutes, which only leaves time to discuss a small number of scenarios. For example, for a head of technical recruiting, I decided to focus my interview on the cluster of related traits of being great at communication, representing our culture to candidates, and holding a high bar for the job-candidate experience. You should coordinate with the rest of the folks on your interview loop to make sure that, collectively, you cover all the most important traits for the role.

During the interview

Kicking off

My formula for kicking off a behavioral question is: "Tell me about a recent time when [X situation happened]. Just give me some brief high-level context on the situation, what the problem was,[1] and how you addressed it. You can keep it high-level and I'll ask follow-up questions afterward." I usually ask for a recent time to avoid having them pick the one time that paints them in the best possible light.
The second sentence (context/problem/solution) is important for helping the candidate keep their initial answer focused - otherwise, they are more likely to ramble for a long time and leave less time for you to…

Dig into details

Almost everyone will answer the initial behavioral interview prompt with something that sounds vaguely like it makes sense, even if they don't actually usually behave in the ways you're looking for. To figure out whether they're real or BSing you, the best way is to get them to tell you a lot of details about the situation - the more you get them to tell you, the harder it w...
Feb 26, 2024 • 31sec

LW - Can an AI do our alignment homework for us? by Chris Leong

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can an AI do our alignment homework for us?, published by Chris Leong on February 26, 2024 on LessWrong.

Eliezer frequently claims that AI cannot do our alignment homework for us. OpenAI disagrees and is pursuing Superalignment as their main alignment strategy. Who is correct?

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Feb 26, 2024 • 1h 8min

EA - Nuclear war tail risk has been exaggerated? by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Nuclear war tail risk has been exaggerated?, published by Vasco Grilo on February 26, 2024 on The Effective Altruism Forum.

The views expressed here are my own, not those of the Alliance to Feed the Earth in Disasters (ALLFED), for which I work as a contractor.

Summary

- I calculated a nearterm annual risk of human extinction from nuclear war of 5.93*10^-12 (more).
- I consider that grantmakers and donors interested in decreasing extinction risk had better focus on artificial intelligence (AI) instead of nuclear war (more).
- I would say the case for sometimes prioritising nuclear extinction risk over AI extinction risk is much weaker than the case for sometimes prioritising natural extinction risk over nuclear extinction risk (more).
- I get a sense that the extinction risk from nuclear war was massively overestimated in the Existential Risk Persuasion Tournament (XPT) (more).
- I have the impression Toby Ord greatly overestimated tail risk in The Precipice (more).
- I believe interventions to decrease deaths from nuclear war should be assessed based on standard cost-benefit analysis (more).
- I think increasing calorie production via new food sectors is less cost-effective at saving lives than measures targeting distribution (more).

Extinction risk from nuclear war

I calculated a nearterm annual risk of human extinction from nuclear war of 5.93*10^-12 (= (6.36*10^-14*5.53*10^-10)^0.5) from the geometric mean between[1]:
- My prior of 6.36*10^-14 for the annual probability of a war causing human extinction.
- My inside-view estimate of 5.53*10^-10 for the nearterm annual probability of human extinction from nuclear war.

By nearterm annual risk, I mean the risk in a randomly selected year from 2025 to 2050. I computed my inside-view estimate of 5.53*10^-10 (= 0.0131*0.0422*10^-6) by multiplying:
- A 1.31% annual probability of a nuclear weapon being detonated as an act of war.
- A 4.22% probability of insufficient calorie production given at least one nuclear detonation.
- A 10^-6 probability of human extinction given insufficient calorie production.

I explain the rationale for the above estimates in the next sections. Note that nuclear war might have cascade effects which lead to civilisational collapse[2], which could increase longterm extinction risk while simultaneously having a negligible impact on the nearterm risk I estimated. I do not explicitly assess this in the post, but I guess the nearterm annual risk of human extinction from nuclear war is a good proxy for the importance of decreasing nuclear risk from a longtermist perspective:
- My prior implicitly accounts for the cascade effects of wars. I derived it from historical data on the deaths of combatants due not only to fighting, but also to disease and starvation, which are ever-present indirect effects of war.
- Nuclear war might have cascade effects, but so do other catastrophes.
- Global civilisational collapse due to nuclear war seems very unlikely to me. For instance, the maximum area destroyable by any country in a nuclear 1st strike was estimated to be 65.3 k km^2 in Suh 2023 (for a strike by Russia), which is just 70.8% (= 65.3*10^3/(92.2*10^3)) of the area of Portugal, or 3.42% (= 65.3*10^3/(1.91*10^6)) of the global urban area.
- Even if nuclear war causes a global civilisational collapse which eventually leads to extinction, I guess full recovery would be extremely likely.
In contrast, an extinction caused by advanced AI would arguably not allow for a full recovery. I am open to the idea that nuclear war can have longterm implications even in the case of full recovery, but considerations along these lines would arguably be more pressing in the context of AI risk. For context, William MacAskill said the following on The 80,000 Hours Podcast. "It's quite plausible, actually, when we look to the very long-term future, that that's [whether artificial...
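For readers who want to verify the headline numbers in the summary above, here is a short sketch reproducing the post's arithmetic. The probability estimates are the author's own; only the code is illustrative:

```python
# A short sketch reproducing the summary's arithmetic. The probability
# estimates are the post's own; only the code is illustrative.
p_prior = 6.36e-14        # prior annual probability of a war causing extinction
p_detonation = 0.0131     # annual P(nuclear detonation as an act of war)
p_food_shortfall = 0.0422 # P(insufficient calorie production | >=1 detonation)
p_extinction = 1e-6       # P(human extinction | insufficient calorie production)

# Inside-view estimate: product of the three conditional steps.
inside_view = p_detonation * p_food_shortfall * p_extinction
print(f"inside view: {inside_view:.3g}")  # ~5.53e-10

# Headline number: geometric mean of the prior and the inside view.
risk = (p_prior * inside_view) ** 0.5
print(f"nearterm annual extinction risk: {risk:.3g}")  # ~5.93e-12

# Destroyable-area comparisons (Suh 2023).
print(f"{65.3e3 / 92.2e3:.1%} of Portugal's area")    # ~70.8%
print(f"{65.3e3 / 1.91e6:.2%} of global urban area")  # ~3.42%
```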
Feb 26, 2024 • 6min

EA - How much parenting harms productivity and how you can reduce it by Nicholas Kruus

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How much parenting harms productivity and how you can reduce it, published by Nicholas Kruus on February 26, 2024 on The Effective Altruism Forum.

Intro

Many EAs factor in children's effects on their personal impact when deciding whether to have them (example). To offer some insight for potential parents, I tried to summarize the best research I could find on parenthood's impact on people's productivity, though I was surprised at the lack of robust literature (especially more recently). The following information comes from four studies: one published in Science[1], one published in Nature[2], one from the Federal Reserve of St. Louis[3], and one published in Social Studies of Science[4]. They all focus on academics and quantitatively measure productivity with research output metrics (the footnotes contain more detail about each).

How parenting impacts productivity

TLDR: The trend for having one child seems to be a short-term reduction in productivity (median: 17%, mean: ~23%) for mothers that peters out after ~10 years. There is usually little effect on fathers, but fathers who are primary caregivers (or otherwise more engaged with their children) suffer similar short-term (<10 year) productivity losses. Each additional child seems to decrease short-term productivity by an additional 11%.

Science:
- Short-term (<10 years after having children): Consistent effects on mothers. The paper finds a ~17%, ~24%, and ~48% decrease in productivity[1] for those in computer science, business, and history, respectively. The authors propose that the different levels of cooperation in these fields may explain the variation in productivity impact - i.e., those in more cooperative fields may suffer lower productivity losses. They also note their results likely underestimate the effects, because their sample didn't include parents who left academia (which may have been prompted by their children directly or indirectly). Inconsistent effects on fathers lead the authors to conclude there is "no clear evidence" of a short-term productivity decrease for fathers.
- Long-term (>10 years after having children): Inconclusive results for both mothers and fathers.

Federal Reserve of St. Louis:
- Short-term (<12 years after having children): Women's productivity[3] decreased by 15%-17% on average. The total productivity cost of having one, two, or three preteens was 9.5%, 22%, and 33%. "Men's productivity is not associated with their family situation in an economically significant manner."
- Long-term (>12 years after having children): Parenting has no effect on productivity for mothers or fathers, so long as they have their children on purpose after they turn 30 years old.

Social Studies of Science:
- Overall: 8% and 12% decline in research productivity and visibility[4], respectively, for men and women combined. For women, the decrease was 15%. To illustrate the cumulative effect of this, mothers were 2 years behind their childless counterparts in the number of papers they had published 18 years after having their children.

How to minimize productivity impacts

Have kids later. Economists who become mothers before 30 suffer a 13% decrease in overall (short- & long-term) productivity[3], whereas those having children after 30 do not (Fed of St. Louis). Employment at an institution 100 ranks higher correlates with an additional 1-year delay before having children.
However, this might be explained by personality: perhaps the type of people who wait to have children are the type of people who become employed at higher-ranked institutions (Science).

Take parental leave. Taking parental leave shorter than 1 month does not mitigate productivity losses, but parental leave longer than 1 month and shorter than 12 months correlated with an 11%-17% productivity[2] improvement (Nature).

Be a lazier parent and divide labor bet...
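As a side note, the TLDR's claim of roughly an additional 11% per child can be recovered from the Fed of St. Louis totals above. A trivial sketch (my own arithmetic, not the studies'):

```python
# A trivial check (my arithmetic, not the studies') of how the Fed of
# St. Louis totals above relate to the TLDR's ~11%-per-additional-child claim.
total_cost = {1: 0.095, 2: 0.22, 3: 0.33}  # short-term cost by number of preteens

for n in (2, 3):
    marginal = total_cost[n] - total_cost[n - 1]
    print(f"marginal cost of child #{n}: {marginal:.1%}")  # 12.5%, 11.0%
```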
Feb 26, 2024 • 22min

EA - How we started our own EA charity (and why we decided to wrap up) by KvPelt

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How we started our own EA charity (and why we decided to wrap up), published by KvPelt on February 26, 2024 on The Effective Altruism Forum.

This post shares our journey in starting an Effective Altruism (EA) charity/project focused on Mediterranean fish welfare, the challenges we faced, our key learnings, and the reasons behind our decision to conclude the project. The actual research results are published in a literature review and article.

Key points

The key points of this post are summarized as follows:
- We launched a project with the goal of enhancing fish welfare in Mediterranean aquaculture.
- We chose to limit our project to gathering information and decided against continuing our advocacy efforts after our initial six months.
- Our strategy, which focused on farmer-friendly outreach, was not effective in engaging farmers.
- The rationale behind our decision is the recognition that existing organizations are already performing excellent work, and we believe that funders should support these established organizations instead of starting a new one.
- The support and resources from the Effective Altruism (EA) and animal welfare communities were outstanding.
- Despite the project not achieving its intended outcomes, we view the overall experience positively. It's common for new charities not to succeed; the key is to quickly determine the viability of your idea, which we believe we have accomplished.

Note: Ren has recently begun working as a guest fund manager for the EA Funds Animal Welfare Fund. The views that we express in this article are our own; we are not speaking for the fund.

Personal/Project background

Before delving into our project, we'll provide some quick background on our profiles and how we came to start this project.

Koen

During my Masters in Maritime/Offshore Engineering (building floating things), I became interested in animal welfare. Through engagement with my EA university group (EA Delft) and by attending EAGxRotterdam, I became interested in and motivated to use my career to work on animal welfare. I hoped to apply my maritime engineering background in a meaningful way, which led me to consider aquatic animal welfare. I attended EAG London in 2023 with the goal of finding career opportunities, and surprisingly this worked! I talked to many people with backgrounds in animal welfare (AW) and engineering, and in one of my 1-on-1s I met someone who would later connect me with Ren.

Ren

As a researcher, Ren has been working at Animal Ask for the past couple of years, conducting research to support the animal advocacy movement. However, Ren still feels really sad about the scale of suffering endured by animals, and this was the motivation to launch a side project.

Why work on Mediterranean fish welfare?

This project originated out of a desire to work on alleviating extreme suffering. More background on the arguments for focusing on extreme suffering is discussed in Ren's earlier forum post. When the welfare of nonhuman animals is not taken into account during slaughter, extreme suffering is likely to occur. Also, from Ren's existing work at Animal Ask, they knew that stunning before slaughter is often quite well-understood and tractable. Therefore, Ren produced a systematic spreadsheet of every farmed animal industry in developed countries (i.e., those countries where Ren felt safe and comfortable working).
This spreadsheet included information on a) the number of animals killed, and b) whether those animals were already being stunned before slaughter. Three industries emerged as sources of large-scale, intense suffering: 1. Farmed shrimp in the United States, 2. Farmed shrimp in Australia, and 3. Sea bass and sea bream in the Mediterranean. Ren actually looked at farmed shrimp initially, and work on these projects may continue in the future, but there are some technical reasons ...
Feb 26, 2024 • 19min

LW - Ideological Bayesians by Kevin Dorst

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ideological Bayesians, published by Kevin Dorst on February 26, 2024 on LessWrong.

TLDR: It's often said that Bayesian updating is unbiased and converges to the truth - and, therefore, that biases must emerge from non-Bayesian sources. That's not quite right. The convergence results require updating on your total evidence - but for agents at all like us, that's impossible; instead, we must selectively attend to certain questions, ignoring others. Yet correlations between what we see and what questions we ask - "ideological" Bayesian updating - can lead to predictable biases and polarization.

Professor Polder is a polarizing figure. His fans praise him for his insight; his critics denounce him for his aggression.

Ask his fans, and they'll supply you with a bunch of instances when he made an insightful comment during discussions. They'll admit that he's sometimes aggressive, but they can't remember too many cases - he certainly doesn't seem any more aggressive than the average professor.

Ask his critics, and they'll supply you with a bunch of instances when he made an aggressive comment during discussions. They'll admit that he's sometimes insightful, but they can't remember too many cases - he certainly doesn't seem any more insightful than the average professor.

This sort of polarization is, I assume, familiar. But let me tell you a secret: Professor Polder is, in fact, perfectly average - he has an unremarkably average number of both insightful and aggressive comments. So what's going on? His fans are better at noticing his insights, while his critics are better at noticing his aggression. As a result, their estimates are off: his fans think he's more insightful than he is, and his critics think he's more aggressive than he is. Each is correct about individual bits of the picture - when they notice aggression or insight, he is being aggressive or insightful. But none are correct about the overall picture.

This source of polarization is also, I assume, familiar. It's widely appreciated that background beliefs and ideology - habits of mind, patterns of salience, and default forms of explanation - can lead to bias, disagreement, and polarization. In this broad sense of "ideology", we're familiar with the observation that real people - especially fans and critics - are often ideological.[1]

But let me tell you another secret: Polder's fans and critics are all Bayesians. More carefully: they all maintain precise probability distributions over the relevant possibilities, and they always update their opinions by conditioning their priors on the (unambiguous) true answer to a partitional question.

How is that possible? Don't Bayesians, in such contexts, update in unbiased[2] ways, always converge to the truth, and therefore avoid persistent disagreement? Not necessarily. The trick is that which question they update on is correlated with what they see - they have different patterns of salience. For example, when Polder makes a comment that is both insightful and aggressive, his fans are more likely to notice (just) the insight, while his critics are more likely to notice (just) the aggression. This can lead to predictable polarization.

I'm going to give a model of how such correlations - between what you see, and what questions you ask about it - can lead otherwise rational Bayesians to diverge from both each other and the truth.
Though simplified, I think it sheds light on how ideology might work.

Limited-Attention Bayesians

Standard Bayesian epistemology says you must update on your total evidence. That's nuts. To see just how infeasible that is, take a look at the following video. Consider the question: what happens to the exercise ball? I assume you noticed that the exercise ball disappeared. Did you also notice that the Christmas tree gained lights, the bowl changed c...
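To preview the mechanism of the model described above, here is a minimal simulation sketch: each observer conditions on the true answer to whichever question they attend to, but updates as if the question had been chosen independently of the answer. All rates and attention probabilities are illustrative assumptions, not numbers from the post:

```python
# A minimal simulation sketch of the mechanism above: observers always learn
# the TRUE answer to whichever question they ask, but which question they ask
# is correlated with the answer - and they update as if it weren't. All rates
# and attention probabilities are illustrative assumptions, not the post's.
import random

random.seed(0)
P_INSIGHT = 0.3     # Polder's true rate of insightful comments
P_AGGRESSION = 0.3  # his true rate of aggressive comments (equal: he's average)

def estimate(trait_rate, p_ask_if_present, p_ask_if_absent, n=100_000):
    """Beta(1,1)-prior Bernoulli update on a trait's rate, restricted to the
    comments where the observer happened to ask about that trait."""
    yes, asked = 0, 0
    for _ in range(n):
        present = random.random() < trait_rate
        p_ask = p_ask_if_present if present else p_ask_if_absent
        if random.random() < p_ask:  # salience: did this question come to mind?
            yes += present
            asked += 1
    return (1 + yes) / (2 + asked)   # posterior mean under a Beta(1,1) prior

fan_insight = estimate(P_INSIGHT, 0.9, 0.3)           # fans notice insight more
critic_aggression = estimate(P_AGGRESSION, 0.9, 0.3)  # critics notice aggression more
even_handed = estimate(P_INSIGHT, 0.5, 0.5)           # attention independent of answer

print(f"true rate:                {P_INSIGHT:.2f}")
print(f"fan's insight estimate:   {fan_insight:.2f}")        # ~0.56, biased high
print(f"critic's aggression est.: {critic_aggression:.2f}")  # ~0.56, biased high
print(f"even-handed estimate:     {even_handed:.2f}")        # ~0.30, unbiased
```

The fan and the critic each see only true answers, yet both overestimate the trait they attend to, while the even-handed observer converges to the truth - matching the polarization story above.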
Feb 26, 2024 • 10min

LW - China-AI forecasts by NathanBarnard

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: China-AI forecasts, published by NathanBarnard on February 26, 2024 on LessWrong.

The rate at which China is able to advance towards TAI is a crucial consideration for many policy questions. My current take is that, without significant political reforms which seem very unlikely while Xi is alive (although considerably more likely after his death), it's very unlikely that China will be able to mount a meaningful challenge to AI firms in the US and allied countries in the race for TAI. I don't think democratic reforms are required for China to be competitive with the US and its allies, but I do think rule-of-law reforms are likely to be required.

This first post is going to be me forecasting Chinese growth, on the theory that, if China reaches rich-country status, it's likely that it will be able to compete with the US and allies for leadership in AI. I'll write a second post looking at Chinese AI efforts in particular.

The outside view

Most countries that become middle-income countries have, thus far, stayed at middle-income level. Chinese per capita income is currently at almost exactly the world average level. The only countries (and territories) in the last 70 years that have gone from low-income to high-income status (without oil wealth) are South Korea, Taiwan, Singapore (which does have substantial oil wealth), and Hong Kong, although it seems very likely that Malaysia will join that club in the near future.

The majority of countries have managed to emerge from low-income status to middle-income status because you only need to get a few things right. If you can get your population to urbanize, have basic rule by law so that firms have basic protection from violence, and get a high enough savings rate to accumulate physical capital, you can get to middle-income status just using catch-up growth.

Catch-up growth is the reason, conceptually, why middle-income status - rather than a given level of GDP per capita - is the correct measure. When growing with catch-up growth, you can grow just by accumulating physical capital using standard technologies that have already been developed, like the technology for light manufacturing or civil engineering. Past this point, though, countries get rich by being able to develop and use technologies close to or at the frontier.

China has successfully managed to accumulate capital to utilize catch-up technologies, like steelmaking and light manufacturing. It has quite successfully managed to urbanize its population, and now seems to have reached the Lewis turning point, where young people who try to leave their villages to find work in cities often don't find it and have to stay in their villages, in much lower-productivity jobs.

Democracy and rule-of-law rates give another outside view on Chinese growth prospects. Of the 53 rich countries and territories that aren't oil states or microstates, only 2 aren't democracies - Singapore and Hong Kong - and none lack rule by law, and all have low levels of corruption. China currently lacks democracy, has high levels of corruption (although roughly normal levels for a middle-income country, is my perception), and sort of middling levels of rule by law.

An important part of countries getting to high-income status is new firms forming and competing to deploy and create ~frontier technologies and processes.
This is harder to do than accumulating enough capital, and having low enough levels of violence and corruption, to be able to build decent housing, supply reliable electricity and water, and have large numbers of workers do semi-skilled manual labour at scale. Specifically, all of that can be done while elites earn large rents by establishing monopolies (or, more generally, accruing market power) from which they exclude non-elites. The role that democracy plays in this story is ...
