

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Feb 2, 2024 • 2min
AF - Survey for alignment researchers: help us build better field-level models by Cameron Berg
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Survey for alignment researchers: help us build better field-level models, published by Cameron Berg on February 2, 2024 on The AI Alignment Forum.
AE Studio is launching a short, anonymous survey for alignment researchers, in order to develop a stronger model of various field-level dynamics in alignment.
This appears to be an interestingly neglected research direction that we believe will yield specific and actionable insights related to the community's technical views and more general characteristics.
The survey is a straightforward 5-10 minute Google Form with some simple multiple choice questions.
For every alignment researcher who completes the survey, we will donate $40 to a high-impact AI safety organization of your choosing (see specific options on the survey). We will also send each alignment researcher who wants one a customized report that compares their personal results to those of the field.
Together, we hope to not only raise some money for some great AI safety organizations, but also develop a better field-level model of the ideas and people that comprise alignment research.
We will open-source all data and analyses when we publish the results. Thanks in advance for participating and for sharing this around with other alignment researchers!
Survey full link:
https://forms.gle/d2fJhWfierRYvzam8
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Feb 2, 2024 • 2min
LW - Most experts believe COVID-19 was probably not a lab leak by DanielFilan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Most experts believe COVID-19 was probably not a lab leak, published by DanielFilan on February 2, 2024 on LessWrong.
The Global Catastrophic Risks Institute conducted an anonymous survey of relevant experts on whether they thought COVID was more likely caused by a lab accident (aka lab leak) or zoonotic spillover. Their summary, bolding is mine:
The study's experts overall stated that the COVID-19 pandemic most likely originated via a natural zoonotic event, defined as an event in which a non-human animal infected a human, and in which the infection did not occur in the course of any form of virological or biomedical research.
The experts generally gave a lower probability for origin via a research-related accident, but most experts indicated some chance of origin via accident and about one fifth of the experts stated that an accident was the more likely origin. These beliefs were similar across experts from different geographic and academic backgrounds.
The experts mostly expressed the view that more research on COVID-19's origin could be of value. About half of the experts stated that major gaps still remain in the understanding of COVID-19's origin, and most of the other experts also stated that some research is still needed. About 40% of experts stated that clarity on COVID-19's origins would provide a better understanding of the potential origins of future pandemics. Given clarity on COVID-19's origin, experts also proposed a variety of governance changes for addressing future pandemics, including measures to prevent initial human infection, measures to prevent an initial infection from becoming a pandemic, and measures to mitigate the harm once a pandemic occurs.
The vast majority of the experts expressed the belief that a natural zoonotic event will likely be the origin of the next pandemic.
The experts also provided a set of clear recommendations for preventing, preparing for and responding to future pandemics, which generally align with many previous studies.
Link to the main report is here, and link to their (much longer) methodological and analytical annex is here.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 3min
LW - On Not Requiring Vaccination by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Not Requiring Vaccination, published by jefftk on February 2, 2024 on LessWrong.
A friend recently wrote that they wouldn't be attending a mask-optional contra dance weekend I'm playing at because it doesn't require vaccination. As an organizer of unrelated dance events, which also don't require vaccination, here's how I see the situation.
When the covid vaccines first came out they were a huge improvement over the status quo. Getting the vaccine reduced your risk of severe illness or death, reduced your chance of catching it, and reduced your chance of giving it to others after you were sick. Our events initially required vaccination, which I think was the right call.
At this point, however, there are a few different things you might mean if you say your event requires vaccination. Usually it's either:
A complete primary series counts, even if it's one shot of J&J from early 2021.
The most recent ("updated") booster is required.
The CDC used to call the first category "fully vaccinated", but no longer talks about it prominently. They've switched to focusing on the second one, which they call "up to date".
This change makes sense: from the perspective of avoiding getting infected and passing it on, only a recent booster does very much. A few months ago I wrote about the results in Menegale et al. (2023), where they saw that, as an infection control measure, vaccines wane quickly: effectiveness halves about three times a year.
Additionally, lots of people got sick this winter, which acts similarly to a vaccination. Given how quickly the vaccine wears off, I'd be less concerned about risk from someone unvaccinated who'd had covid over Christmas than about someone who got their booster in the early Fall and dodged the winter wave.
What does this mean for a dance event?
If you want a vaccination requirement to be doing anything useful, you need to require people be up-to-date with their vaccine. Risk from someone last boosted in Fall 2022 or before is not appreciably different from someone who was never vaccinated.
Requiring up-to-date vaccination rules out 80% of people 18+, though if they're excited enough about your event they could go get the shot.
Unless your event is in the fall or early winter, even requiring up-to-date vaccination doesn't help much: if someone got their "updated" booster in the fall they're down to ~35% of peak efficacy by the time of an April event.
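To sketch the arithmetic behind that ~35% figure (my own illustration, assuming simple exponential waning with effectiveness halving three times a year, and an early-October booster as the reference point):

```python
HALVINGS_PER_YEAR = 3  # effectiveness halves about three times a year (Menegale et al. 2023)

def remaining_effectiveness(months_since_dose: float) -> float:
    """Fraction of peak effectiveness left after a given number of months."""
    return 0.5 ** (HALVINGS_PER_YEAR * months_since_dose / 12)

# Booster in early October, event in April: roughly six months later.
print(round(remaining_effectiveness(6), 2))  # ~0.35, i.e. about 35% of peak
```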
If an event does want to reduce risk of transmission, I'd recommend considering some combination of ventilation, filters, tests, and masking.
Comment via: facebook, lesswrong, mastodon
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 27min
LW - Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small by Joseph Bloom
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small, published by Joseph Bloom on February 2, 2024 on LessWrong.
This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, under mentorship from Neel Nanda and Arthur Conmy. Funding for this work was provided by the Manifund Regranting Program and donors as well as LightSpeed Grants.
This is intended to be a fairly informal post sharing a set of Sparse Autoencoders trained on the residual stream of GPT2-small which achieve fairly good reconstruction performance and contain fairly sparse / interpretable features. More importantly, advice from Anthropic and community members has enabled us to train these much more efficiently / faster than before.
The specific methods that were most useful were: ghost gradients, learning rate warmup, and initializing the decoder bias with the geometric median. We discuss each of these in more detail below.
5 Minute Summary
We're publishing a set of 12 Sparse AutoEncoders for the GPT2 Small residual stream.
These dictionaries have approximately 25,000 features each, with very few dead features (mainly in the early layers) and high quality reconstruction (log loss when the activations are replaced with the output is 3.3 - 3.6 as compared with 3.3 normally).
The L0s range from 5 in the first layer to 70 in the 9th SAE (increasing by about 5-10 per layer and dropping in the last two layers).
By choosing a fixed dictionary size, we can see how statistics like the number of dead features or reconstruction cross entropy loss change with layer, giving some indication of how properties of the feature distribution change with layer depth.
We haven't yet extensively analyzed these dictionaries, but will share the feature dashboards we've automatically generated.
Readers can access the Sparse Autoencoder weights in this HuggingFace Repo. Training code and code for loading the weights / model and data loaders can be found in this Github Repository. Training curves and feature dashboards can also be found in this wandb report. Users can download all 25k feature dashboards generated for the layer 2 and 10 SAEs and the first 5000 of the layer 5 SAE features here (note the left-hand column of the dashboards should currently be ignored).
| Layer | Variance Explained | L1 loss | L0* | % Alive Features | Reconstruction CE Log Loss |
|-------|--------------------|---------|-------|------------------|----------------------------|
| 0 | 99.15% | 4.58 | 12.24 | 80.0% | 3.32 |
| 1 | 98.37% | 41.04 | 14.68 | 83.4% | 3.33 |
| 2 | 98.07% | 51.88 | 18.80 | 80.0% | 3.37 |
| 3 | 96.97% | 74.96 | 25.75 | 86.3% | 3.48 |
| 4 | 95.77% | 90.23 | 33.14 | 97.7% | 3.44 |
| 5 | 94.90% | 108.59 | 43.61 | 99.7% | 3.45 |
| 6 | 93.90% | 136.07 | 49.68 | 100% | 3.44 |
| 7 | 93.08% | 138.05 | 57.29 | 100% | 3.45 |
| 8 | 92.57% | 167.35 | 65.47 | 100% | 3.45 |
| 9 | 92.05% | 198.42 | 71.10 | 100% | 3.45 |
| 10 | 91.12% | 215.11 | 53.79 | 100% | 3.52 |
| 11 | 93.30% | 270.13 | 59.16 | 100% | 3.57 |
| Original Model | - | - | - | - | 3.3 |
Summary Statistics for GPT2 Small Residual Stream SAEs. *L0 = Average number of features firing per token.
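As a concrete illustration (not the authors' code), here is a minimal sketch of how summary statistics like these could be computed for a single SAE, assuming a standard one-layer sparse autoencoder with a ReLU encoder and a decoder bias subtracted before encoding; all tensor names below are hypothetical.

```python
import torch

def sae_summary_stats(acts, W_enc, b_enc, W_dec, b_dec):
    """acts: [n_tokens, d_model] residual stream activations at one layer."""
    feats = torch.relu((acts - b_dec) @ W_enc + b_enc)   # [n_tokens, n_features]
    recon = feats @ W_dec + b_dec                        # [n_tokens, d_model]

    l0 = (feats > 0).float().sum(dim=-1).mean()          # avg features firing per token
    pct_alive = (feats > 0).any(dim=0).float().mean()    # fraction of features that ever fire

    resid_var = (acts - recon).pow(2).sum()
    total_var = (acts - acts.mean(dim=0, keepdim=True)).pow(2).sum()
    variance_explained = 1 - resid_var / total_var

    return {"L0": l0.item(),
            "% alive": pct_alive.item(),
            "variance explained": variance_explained.item()}

# Reconstruction CE log loss is measured separately: run GPT2-small with this layer's
# residual stream replaced by `recon` and record the resulting cross-entropy loss
# (compared against ~3.3 for the unmodified model).
```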
Training SAEs that we were happy with used to take much longer than it does now. Last week, it took me 20 hours to train a 50k feature SAE on 1 billion tokens, and over the weekend it took 3 hours for us to train a 25k feature SAE on 300M tokens with similar variance explained, L0, and CE loss recovered.
We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier:
Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic January Update), we were training SAEs with approximately 50k features on 1 billion tokens with 3 resampling events to reduce the number of dead features. This took around 20 hours and might cost about $10 with an A6000 GPU. With ghost gradients, we don't need to resample (or wait for loss curves to plateau after resampling). Now we can train on only 300M tokens instead. Simultaneously, since we now...

Feb 2, 2024 • 2min
LW - Wrong answer bias by lukehmiles
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Wrong answer bias, published by lukehmiles on February 2, 2024 on LessWrong.
The correct answer leaves nothing to be said. The wrong answer starts a conversation, a research program, an investigation, a journey, an institute, a paper, a book, a youtube channel, a lifestyle, a tribe.
Why is reading a textbook so boring? Very few people do it. It just has right answers and few great questions to ponder. Blogs arguing about nutrition are great reading though.
If you write a research paper saying "all that shit is dumb just do the obvious thing" then you'll probably have trouble getting it published. I tried once but my professor shut it down saying it's a bad strat basically.
"Clean energy" is great as a conversation piece or phrase in your mission statement but "just deregulate nuclear lmao why are you wasting your time" is definitely not in my experience.
It's fine if bad ideas are all we ever talk about, but the trouble comes when it's time for someone to sit down and do their work. The doctor trying to cure a thing mostly heard about the RCTs on all the shitty methods that barely do anything, so they pick their favorite of those. The AI safety implementer mostly heard discussions about bad methods and probably tries to patch one of those. (And fixing the mistake in a bad method is not a good way to make a good method.) The parents in the parent group of course mostly talk about their failed attempts to fix the problems they do have, and mostly forget about the problems they never had or quickly fixed.
Have hope: you can combat wrong answer bias with these simple tricks.
Do not write or speak
Do not read or listen
Hate the process of fixing things
Savor the experience of shit working like it should
Act like you have no time or money and won't ever have any (helps if you don't and won't)
Assume your answer is too complicated
Assume your question is too loaded
Assume any task is actually really easy and you're doing it wrong
Hate struggle
Hate nuance
If you look impressive then you're doing it wrong
Love true success
If it looks like you barely did anything at all then you're doing it right
Observe people & orgs who get shit done or don't have a problem you have
Turn your back on anyone struggling with anything for any reason. They're probably looking for a friend or community instead of just focusing on nailing shit all the time like you.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 2min
LW - Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis by simeon c
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis, published by simeon c on February 2, 2024 on LessWrong.
The programme thesis of Davidad's agenda to develop provably safe AI has just been published. You can find extra details by downloading this doc. For context, Davidad is a Programme Director at ARIA who will grant somewhere between £10M and £50M over the next 3 years to pursue his research agenda.
It is the most comprehensive public document detailing his agenda to date.
Here's the most self-contained graph explaining it at a high level, although you'll have to dive into the details and read it several times to start grasping its many dimensions.
I'm personally very excited by Davidad's moonshot, which I currently see as the most credible alternative to scaled transformers; I consider scaled transformers too flawed to be a credible safe path, mostly because:
Ambitious LLM interpretability seems very unlikely to work out:
Why: the failed attempts at making meaningful progress over the past few years + the systematic wall of understanding only ~80% of what's going on across reverse engineering attempts
Adversarial robustness to jailbreaks seems unlikely to work out:
Why: failed attempts at solving it + a theoretical paper from early 2023 that I can't find right now + increasingly large context windows
Safe generalization with very high confidence seems quite unlikely to work out:
Why: absence of theory on transformers + weak interpretability
A key motivation for pursuing moonshots à la Davidad is, as he explains in his thesis, to shift the incentives away from the current race to the bottom by derisking credible paths to AI systems whose safety we have strong reasons to be confident in. See the graph below:
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 2, 2024 • 27min
AF - Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small by Joseph Isaac Bloom
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small, published by Joseph Isaac Bloom on February 2, 2024 on The AI Alignment Forum.
UPDATE: Since we posted this last night, someone pointed out that our implementation of ghost grads has a non-trivial error (which makes the results a priori quite surprising). We computed the ghost grad forward pass using Exp(Relu(W_enc(x)[dead_neuron_mask])) rather than Exp(W_enc(x)[dead_neuron_mask]). I'm running some ablation experiments now to get to the bottom of this.
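To make the difference concrete, here is a small illustration of the two expressions (hypothetical values; this is not the authors' training code):

```python
import torch

torch.manual_seed(0)
pre_acts = torch.randn(8)           # stand-in for W_enc(x), the encoder pre-activations
dead_neuron_mask = pre_acts < 0.5   # stand-in for the mask of dead features

buggy = torch.exp(torch.relu(pre_acts[dead_neuron_mask]))  # what was implemented
intended = torch.exp(pre_acts[dead_neuron_mask])           # what was intended

# The two differ wherever a pre-activation is negative: ReLU clamps those entries
# to 0, so the buggy version outputs exp(0) = 1 instead of exp(pre_activation).
print(buggy)
print(intended)
```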
This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, under mentorship from Neel Nanda and Arthur Conmy. Funding for this work was provided by the Manifund Regranting Program and donors as well as LightSpeed Grants.
This is intended to be a fairly informal post sharing a set of Sparse Autoencoders trained on the residual stream of GPT2-small which achieve fairly good reconstruction performance and contain fairly sparse / interpretable features. More importantly, advice from Anthropic and community members has enabled us to train these much more efficiently / faster than before.
The specific methods that were most useful were: ghost gradients, learning rate warmup, and initializing the decoder bias with the geometric median. We discuss each of these in more detail below.
5 Minute Summary
We're publishing a set of 12 Sparse AutoEncoders for the GPT2 Small residual stream.
These dictionaries have approximately 25,000 features each, with very few dead features (mainly in the early layers) and high quality reconstruction (log loss when the activations are replaced with the output is 3.3 - 3.6 as compared with 3.3 normally).
The L0s range from 5 in the first layer to 70 in the 9th SAE (increasing by about 5-10 per layer and dropping in the last two layers).
By choosing a fixed dictionary size, we can see how statistics like the number of dead features or reconstruction cross entropy loss change with layer, giving some indication of how properties of the feature distribution change with layer depth.
We haven't yet extensively analyzed these dictionaries, but will share the feature dashboards we've automatically generated.
Readers can access the Sparse Autoencoder weights in this HuggingFace Repo. Training code and code for loading the weights / model and data loaders can be found in this Github Repository. Training curves and feature dashboards can also be found in this wandb report. Users can download all 25k feature dashboards generated for the layer 2 and 10 SAEs and the first 5000 of the layer 5 SAE features here (note the left-hand column of the dashboards should currently be ignored).
| Layer | Variance Explained | L1 loss | L0* | % Alive Features | Reconstruction CE Log Loss |
|-------|--------------------|---------|-------|------------------|----------------------------|
| 0 | 99.15% | 4.58 | 12.24 | 80.0% | 3.32 |
| 1 | 98.37% | 41.04 | 14.68 | 83.4% | 3.33 |
| 2 | 98.07% | 51.88 | 18.80 | 80.0% | 3.37 |
| 3 | 96.97% | 74.96 | 25.75 | 86.3% | 3.48 |
| 4 | 95.77% | 90.23 | 33.14 | 97.7% | 3.44 |
| 5 | 94.90% | 108.59 | 43.61 | 99.7% | 3.45 |
| 6 | 93.90% | 136.07 | 49.68 | 100% | 3.44 |
| 7 | 93.08% | 138.05 | 57.29 | 100% | 3.45 |
| 8 | 92.57% | 167.35 | 65.47 | 100% | 3.45 |
| 9 | 92.05% | 198.42 | 71.10 | 100% | 3.45 |
| 10 | 91.12% | 215.11 | 53.79 | 100% | 3.52 |
| 11 | 93.30% | 270.13 | 59.16 | 100% | 3.57 |
| Original Model | - | - | - | - | 3.3 |
Summary Statistics for GPT2 Small Residual Stream SAEs. *L0 = Average number of features firing per token.
Training SAEs that we were happy with used to take much longer than it does now. Last week, it took me 20 hours to train a 50k feature SAE on 1 billion tokens, and over the weekend it took 3 hours for us to train a 25k feature SAE on 300M tokens with similar variance explained, L0, and CE loss recovered.
We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier:
Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic Jan...

Feb 1, 2024 • 35min
AF - Evaluating Stability of Unreflective Alignment by james.lucassen
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evaluating Stability of Unreflective Alignment, published by james.lucassen on February 1, 2024 on The AI Alignment Forum.
This post has an accompanying SPAR project! Apply here if you're interested in working on this with me.
Huge thanks to Mikita Balesni for helping me implement the MVP. Regular-sized thanks to Aryan Bhatt, Rudolph Laine, Clem von Stengel, Aaron Scher, Jeremy Gillen, Peter Barnett, Stephen Casper, and David Manheim for helpful comments.
0. Key Claims
Most alignment work today doesn't aim for alignment that is stable under value-reflection[1].
I think this is probably the most sensible approach to alignment.
But there is a threat model which could potentially be a serious obstacle to this entire class of alignment approaches, which is not currently being tracked or evaluated. It goes like this:
Long-horizon capabilities require a particular capability I call "stepping back" (described in more detail in the body of the post)
Stepping back seems likely to be learned in a very generalizable way
If stepping back generalizes in what seems like the simplest / most natural way, it will create a tendency for AIs to do value-reflection
If we're unable to prevent this value-reflection, it will cause any AI whose alignment is not stable under value-reflection to become misaligned
However, I'm quite uncertain about whether or not this threat model will go through, for a few reasons:
The chain of argument is highly conjunctive and therefore fragile.
The track record of anthropomorphic reasoning about which AI capabilities will imply which other AI behaviors seems poor.
There is an intuitive counterexample provided by humans, at least at human-level intelligence.
Even given this uncertainty, work on this threat model seems very neglected given the wide swath of highly-invested-in alignment approaches that it could affect.
I don't think further abstract argument about the likelihood of this threat model will reduce my uncertainty much. Instead, I propose some evals to keep track of each component of this threat model:
Evaluate the stepping back capabilities of frontier AIs in domains where they receive the most long-horizon training.
Evaluate the generalization of frontier AIs' stepping back behavior to other task domains.
Evaluate how difficult it is to decouple the capabilities of frontier AIs between domains.
Evaluate how difficult it is to prevent frontier AIs from thinking certain "forbidden thoughts" even when those thoughts are incentivized by the task setting.
1. Reflective vs Unreflective Alignment
I want to start by making an explicit distinction between two importantly different uses of the word "alignment". The distinction is between alignment that aims to be preserved even under value-reflection by the AI, and alignment that does not aim for that. I will call alignment approaches that aim for stability under value-reflection "reflective alignment", and approaches that do not "unreflective alignment".
One easy way to connect this distinction to your existing thinking is by checking if you think of alignment as a binary or as a spectrum. Unreflective alignment naturally admits a spectrum formulation - alignment which is stable under X amount of value-reflection (or other destabilizing pressures). Reflective alignment, on the other hand, is typically thought of as a binary[2] - either the AI converges to the intended values in the limit, or it goes to some other fixed point instead.
In the current alignment discussion, iterative prosaic approaches generally propose to work by ensuring "alignment increases faster than capabilities". This is clearly an alignment-as-spectrum framing, suggesting the goal in mind is unreflective alignment. For further confirmation, consider the type of step that is considered an increase in alignment - going from baseline GPT-3 t...

Feb 1, 2024 • 3min
EA - Managing risks while trying to do good by Wei Dai
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Managing risks while trying to do good, published by Wei Dai on February 1, 2024 on The Effective Altruism Forum.
I often think about "the road to hell is paved with good intentions".[1] I'm unsure to what degree this is true, but it does seem that people trying to do good have caused more negative consequences in aggregate than one might naively expect.[2] "Power corrupts" and "power-seekers using altruism as an excuse to gain power" are two often cited reasons for this, but I think they don't explain all of it.
A more subtle reason is that even when people are genuinely trying to do good, they're not entirely aligned with goodness. Status-seeking is a powerful motivation for almost all humans, including altruists, and we frequently award social status to people for merely trying to do good, before seeing all of the consequences of their actions. This is in some sense inevitable as there are no good alternatives. We often need to award people with social status before all of the consequences play out, both to motivate them to continue to try to do good, and to provide them with influence/power to help them accomplish their goals.
A person who consciously or subconsciously cares a lot about social status will not optimize strictly for doing good, but also for appearing to do good. One way these two motivations diverge is in how to manage risks, especially risks of causing highly negative consequences. Someone who wants to appear to do good would be motivated to hide or downplay such risks, from others and perhaps from themselves, as fully acknowledging such risks would often amount to admitting that they're not doing as much good (on expectation) as they appear to be.
How to mitigate this problem
Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.[3]
Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk. These could be risk management officers within organizations, or people empowered to manage risk across the EA community.[4]
Socially, we can reward people/organizations for taking risks seriously, or punish (or withhold rewards from) those who fail to do so. This is tricky because, due to information asymmetry, we can easily create "risk management theaters" akin to "security theater" (which, come to think of it, is a type of risk management theater).
But I think we should at least take notice when someone or some organization fails, in a clear and obvious way, to acknowledge risks or to do good risk management, for example not writing down a list of important risks to be mindful of and keeping it updated, or avoiding/deflecting questions about risk. More optimistically, we can try to develop a culture where people and organizations are monitored and held accountable for managing risks substantively and competently.
[1] due in part to my family history
[2] Normally I'd give some examples here, but we can probably all think of some from the recent past.
[3] I try to do this myself in the comments.
[4] an idea previously discussed by Ryan Carey and William MacAskill
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Feb 1, 2024 • 4min
EA - Increasingly vague interpersonal welfare comparisons by MichaelStJules
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Increasingly vague interpersonal welfare comparisons, published by MichaelStJules on February 1, 2024 on The Effective Altruism Forum.
Some have argued that all interpersonal welfare comparisons should be possible or take it as a strong mark against a theory in which they are not all possible. Others have argued against their possibility, e.g. Hausman (1995) for preference views. Here, I will illustrate an intermediate position: interpersonal welfare comparisons are vague, with tighter bounds on reasonable comparisons between beings whose welfare states are realized more similarly, and wider or no bounds when more different.
The obvious case is two completely or at least functionally identical brains (at the right level of abstraction for our functionalist theory). As long as we grant intrapersonal comparisons, then we should get interpersonal comparisons between identical brains. We map the first brain's state(s) to the equivalent state(s) in the second, and compare them in the second brain. Of course, this is not a very interesting case, and it seems only directly useful for artificial duplicates of minds.
Still, we can go further. Consider an experience E1 in brain B1 and an experience E2 in brain B2. If B1 and B2 only differ by the fact that some of B2's unpleasantness-contributing neurons are less sensitive or removed, and B1 and B2 receive the same input signals that cause pain, then it seems likely to me that B1's painful experience E1 is at least as unpleasant as B2's E2 and possibly more. We may be able to say roughly how much more unpleasant it is by comparing E2 in B2 directly to less intense states in B1, sandwiching E2 in unpleasantness between two states in B1.
Maybe going from E1 to E2 changes the unpleasantness by between -0.01 and 0, i.e. Unpleasantness_B2(E2) = Unpleasantness_B1(E1) + Δ, where -0.01 ≤ Δ ≤ 0. There may be no fact of the matter about the exact value of Δ.
For small enough local differences between brains, we could make fairly precise comparisons.
I use unpleasantness for the purpose of a more concrete illustration, but it's plausible other potential types of welfare could be used instead, like preferences. A slight difference in how some preferences are realized should typically result in a slight difference in the preferences themselves and how we value them, but the extent of the difference in value could be vague and only boundable by fairly tight inequalities. We can use the same example, too: a slight difference in how unpleasant a pain is through the same kinds of differences in neurons as above typically results in a slight difference in preferences about that pain and preference-based value.
In general, for arbitrary brains B1 and B2 and respective experiences E1 and E2, we can ask whether there's a sequence of changes from E1 and B1 to E2 and B2, possibly passing through different hypothetical intermediate brains and states, that lets us compare E1 and E2 by combining bounds and inequalities from each step along the sequence. Some changes could have opposite sign effects on the realized welfare, but with only bounds rather than precise values, the bounds widen between brains farther apart in the sequence.
For example, a change with a range of +1 to +4 in additional unpleasantness and a change with a range of -3 to -1 could give a net change between -2 = +1 - 3 and +3 = +4 - 1. Adding one more change of between +1 and +4 and another of between -3 and -1 gives between -4 and +6. Adding another change of between +2 and +3 gives between -2 and +9. The gap between the bounds widens with each additional change.[1]
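The bookkeeping in this example is just interval addition over the per-change bounds; a minimal sketch (my own illustration, not from the post):

```python
def combine(*changes):
    """Sum a sequence of (lower, upper) bounds on changes in unpleasantness."""
    lower = sum(lo for lo, _ in changes)
    upper = sum(hi for _, hi in changes)
    return lower, upper

print(combine((1, 4), (-3, -1)))                            # (-2, 3)
print(combine((1, 4), (-3, -1), (1, 4), (-3, -1)))          # (-4, 6)
print(combine((1, 4), (-3, -1), (1, 4), (-3, -1), (2, 3)))  # (-2, 9)
# The width (upper - lower) of the combined interval grows with each additional change.
```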
The more or larger such changes are necessary to get from one brain to another, the less tight the bounds on the comparisons could become, the further they may go both negative and positive overall,[2] and the less reasonable it seems to mak...


