The Nonlinear Library

The Nonlinear Fund
Nov 2, 2023 • 8min

LW - Saying the quiet part out loud: trading off x-risk for personal immortality by disturbance

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Saying the quiet part out loud: trading off x-risk for personal immortality, published by disturbance on November 2, 2023 on LessWrong. Statement: I want to deliberately balance caution and recklessness in developing AGI, such that it gets created at the last possible moment, so that I and my close ones do not die. This Statement confuses me. There are several observations I can make about it. There are also many questions I want to ask but have no idea how to answer. The goal of this post is to deconfuse myself, and to get feedback on the points that I raised (or failed to raise) below. First observation: The Statement is directly relevant to LW interests. It ties together the issues of immortality and AI risk, both of which are topics people here are interested in. There are countless threads, posts and discussions about high-level approaches to AI safety, both in the context of "is" (predictions) and "ought" (policy). At the same time, there is still a strong emphasis on individual action, deliberating on which choices to make and on the marginal effects of living life in a certain way. The same is true for immortality. It has been discussed to death, both from the high-level perspective and from the individual, how-do-I-sign-up-for-Alcor point of view. The Statement has been approached from the "is" perspective, but not from the "ought" perspective. At the same time: Second observation: No one talks about the Statement. I have never met anyone who expressed this opinion, neither in person nor online, even after being a part (although somewhat on the periphery) of the rationalist community for several years. Not only that, I have not been able to find any post or comment thread on LW or SSC/ACX that discusses it, argues for or against it, or really gives it any attention whatsoever. I am confused by this, since the Statement seems to be fairly straightforward. One reason might be the: Third observation: Believing in the Statement is low status, as it constitutes an almost-taboo opinion. Not only is no one discussing it, but the few times I expressed the Statement in person (at EA-infiltrated rationalist meetups), it was treated with suspicion or hostility. Although, to be honest, I'm not sure how much this is me potentially misinterpreting the reactions. I got the impression that it is seen as sociopathic. Maybe it is? Fourth observation: Believing in the Statement is incompatible with long-termism, and it runs counter to significantly valuing future civilisation in general. Fifth observation: Believing in the Statement is compatible with folk morality and the revealed preferences of most of the population. Most people value their lives, and the lives of those around them, to a much greater extent than those far away from them. This is even more true for future lives. The revealed-preference discount factor is bounded away from 1. Sixth observation: The Statement is internally consistent. I don't see any problems with it on the purely logical level. Rational egoism (or variants thereof) constitutes a valid ethical theory, although it is potentially prone to self-defeat. Seventh observation: Because openly admitting to believing in the Statement is disadvantageous, it is possible that many people in fact hold this opinion secretly. I have no idea how plausible this is. 
Judging this point is one of my main goals in writing this post. The comments are a good place for debating the meta-level points but, if I am right about the cost of holding this opinion, not so good for counting its supporters. An alternative is this anonymous poll I created; please vote if you're reading this. Eighth observation: The Statement has the potential to explain some of the variance in attitudes to AI risk-taking. One way of interpreting this observation might be that people arguing a...
Nov 2, 2023 • 9min

EA - Complementary notes on Alvea [unofficial] by mxschons

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Complementary notes on Alvea [unofficial], published by mxschons on November 2, 2023 on The Effective Altruism Forum. My ex-colleagues just posted the official Alvea wind-down announcement as well as Kyle's reflections on our activities over the past 1.5 years. I wanted to complement these posts with some additional comments. Publication of the South Africa Trial Results Our academic publication of the South Africa Trial results is currently under review, but you can see the submitted manuscript here. We also published the complete dataset as well as the full study report. While Kyle correctly describes our efficacy results as underwhelming (as were, by the way, the Janssen comparator's), the trial made several important scientific contributions. To our knowledge, this study is the first time that: a naked DNA-based SARS-CoV-2 booster candidate was studied in preimmunized humans (with ~80% having hybrid immunity from previous infections); unusually high doses of up to 8 mg DNA plasmid were administered intradermally/subcutaneously during a single visit; and a SARS-CoV-2 vaccine candidate was compared during its first Phase 1 safety study against a licensed comparator (Janssen's Ad26.COV2.S). (Sidenote: it is insane that this seems to be the only COVID-19 trial that managed to get a comparator.) Lessons learned In addition to Kyle's list, here are some more I'd like to point out: Experience matters: Alvea matured substantially over time. From my perspective, many important internal operations mistakes could have been prevented if there had been more experience in the team earlier on. The same goes for research decisions and other mistakes we made. Sometimes you really want the person who has 4,000+ hours of work experience and not 40 hours of research experience. Maybe even often. You can hire people remotely: It is very common for orgs to be hesitant to hire internationally. I myself would not have been able to work for Alvea if Grigory hadn't been creative about contracting and working with Employers of Record to employ ~50% of the 45-person team. I claim that organizations are missing out on top talent if they only fish in their own country. We had kick-ass values that I still identify with (shoutout to Eric for shaping our company culture around them): Ownership: We take personal responsibility for achieving exceptional results. We draw energy from doing the "impossible." Agility: We move with speed and flexibility in the face of constant uncertainty. Truthseeking: We chase the truth, especially when it requires courage. We experiment to change our minds quickly. Care: We have each other's back. Our mutual support sparks our growth as a team and our impact in the world. Medical countermeasures matter: I get the vibe in EA that people are not excited about medical countermeasures / their deployment. I'd like to challenge that. Ambulances, firefighters, and police take less than 15 minutes to arrive at an emergency scene. Starting a clinical trial? On the order of 100 days. Humanity needs an equivalent pharmaceutical preparedness and response ability to stop pandemics in their tracks. For instance, I would like to see work on a 100-hour response plan (in addition to CEPI's 100-day timeline) for responding to an outbreak using candidate vaccines and ultra-rapid ring-vaccination trials. So consider PanopliaLabs when allocating your donations this year. 
Drug Development Explainer Series I just cross-posted an explainer series on drug development from my website, which I wrote up with James Smith and Kirsten Angeles. I'd argue it captures most of the insights into the pharma landscape we gained at Alvea, and I recommend it to everyone who wants a comprehensive overview of how drugs are made. One article in the series describes some of the core cultural and operational practices that allowed Alvea to execute so quickly. For conveni...
Nov 2, 2023 • 39min

LW - Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk by 1a3orn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk, published by 1a3orn on November 2, 2023 on LessWrong. 0: TLDR I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evidence for the paper's conclusion. The best of the set is evidence from statements from Anthropic -- which rest upon data that no one outside of Anthropic can even see, and on Anthropic's interpretation of that data. The rest of the evidence cited in this paper ultimately rests on a single extremely questionable "experiment" without a control group. In all, citations in the paper provide an illusion of evidence ("look at all these citations") rather than actual evidence ("these experiments are how we know open source LLMs are dangerous and could contribute to biorisk"). A recent further paper on this topic (published after I had started writing this review) continues this pattern of being more advocacy than science. Almost all the bad papers that I look at are funded by Open Philanthropy. If Open Philanthropy cares about truth, then they should stop burning the epistemic commons by funding "research" that is always going to give the same result no matter the state of the world. 1: Principles What could constitute evidence that powerful open-source language models contribute, or will contribute, substantially to the creation of biological weapons, and thus that we should ban them? That is, what kind of anticipations would we need to have about the world to make that a reasonable thing to think? What other beliefs are a necessary part of this belief making any sense at all? Well, here are two pretty obvious principles to start out with: Principle of Substitution: We should have evidence of some kind that LLMs can (or will) provide information that humans cannot also easily access through other means -- i.e., through the internet, textbooks, YouTube videos, and so on. Blocker Principle: We should have evidence that the lack of the information that LLMs can (or will) provide is in fact a significant blocker to the creation of bioweapons. The first of these is pretty obvious. As an example: there's no point in preventing an LLM from telling me how to make gunpowder, because I can find out how to do that from an encyclopedia, a textbook, or a novel like Blood Meridian. If you can substitute some other source of information for an LLM with only a little inconvenience, then an LLM does not contribute to the danger. The second is mildly less obvious. In short, it could be that most of the blocker to creating an effective bioweapon is not knowledge -- or the kind of knowledge that an LLM could provide -- but something else. This "something else" could be access to DNA synthesis; it could be the process of culturing a large quantity of the material; it could be the necessity of a certain kind of test; or it could be something else entirely. You could compare to atomic bombs -- the chief obstacle to building atomic bombs is probably not the actual knowledge of how to do this, but access to refined uranium. Thus, rather than censor every textbook on atomic physics, we can simply control access to refined uranium. 
Regardless, if this other blocker constitutes 99.9% of the difficulty in making an effective bioweapon, and lack of knowledge only constitutes 0.1% of the difficulty, then an LLM can only remove that 0.1% of the difficulty, and so open source LLMs would only contribute marginally to the danger. Thus, bioweapons risk would not be a good reason to criminalize open-source LLMs. (I am not speaking theoretically here -- a paper from a researcher at the Future of Humanity Institute argues that the actual product development cycle involved in creating a bioweapon is far, far more of an obstacle to its cre...
Nov 2, 2023 • 4min

EA - EAGxVirtual: Speaker announcements, timings, and other updates by Sasha Berezhnoi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EAGxVirtual: Speaker announcements, timings, and other updates, published by Sasha Berezhnoi on November 2, 2023 on The Effective Altruism Forum. EAGxVirtual is fast approaching and we're excited to share some more details about the event! This post covers updates from the team, including dates and times, content, unique features, and demographic data. In the previous post, we covered the conference theme, reasons to attend, and reviews from previous attendees. Content: what to expect We're very excited to announce our key speakers for this event: Peter Singer on the most pressing moral issues facing humanity. Bruce Friedrich, President of The Good Food Institute, on longtermism and alternative proteins. Carl Robichaud, Co-lead on nuclear policy grantmaking at Longview Philanthropy, on a turning point in the story of nuclear weapons. Olga Kikou, Head of the EU Office of Compassion in World Farming, on ending the cage age in the EU. Neel Nanda, Research Engineer at DeepMind, on open problems in mechanistic interpretability. We are working hard on the program. Beyond the above talks (and many more talks and workshops!), you can expect office hours hosted by experts and EA orgs, fireside chats, group meetups and icebreakers, lightning talks from attendees, and unofficial satellite events. The tentative schedule is available here (all times are in UTC). Please note that the schedule is subject to change. The final schedule will be available on the Swapcard app, which we aim to launch next week. Taking action anywhere in the world We have already received 600 applications from people representing over 70 countries. We welcome all who have a genuine interest in learning more or connecting, including those who are new to effective altruism. If you are a highly-engaged EA, you can make a difference by being responsive to requests from first-time attendees. The map below shows the geographical distribution of the participants. We would love to see more applications. If you know someone who you think should attend the conference, please encourage them to apply by sending them this link: eagxvirtual.com. The deadline for applications is 11:59 pm UTC on Thursday, 16 November. Apply here if you haven't already. Dates and times The conference will be taking place from 10 am UTC on Friday, November 17th, until 11:59 pm UTC on Sunday, November 19th. We don't expect you to always be online; you can be flexible with your participation! It's completely okay if you can attend on only one of the days. Recordings will be available for registered attendees, so you can watch the sessions you missed later. Friday will feature introductory-level content for participants who are relatively new to EA and a career fair on Gather Town. Saturday and Sunday will have full-day schedules, starting at 7 am UTC each day. There will be a break in the program on Sunday between 2 am and 7 am UTC. Conference features Our main content and networking platform for the conference is the Swapcard app. We will share access to the app with all the attendees on November 6 and provide guidance on how to use it and get the most out of the conference. We are collaborating with EA Gather Town to provide an always-available virtual venue for attendees to spark more connections and unstructured discussions throughout the conference. Extensive stewardship program. 
We will highlight ambassadors across different cause areas whom you can speak to for advice or feedback on your career plans. Evergreen discussion space: we are inviting everyone to use the EA Anywhere Slack as a discussion space. No more Slacks that are abandoned immediately after the conference is over! Ways to contribute If you want to represent your organization at the career fair or host office hours, please fill out this form. Apply to give a lightning talk if ...
Nov 2, 2023 • 14min

EA - Participate in the Donation Election and the first weekly theme (starting 7 November) by Lizka

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Participate in the Donation Election and the first weekly theme (starting 7 November), published by Lizka on November 2, 2023 on The Effective Altruism Forum. TLDR: Participate in the Donation Election, start writing posts for "Effective giving spotlight" week (starting this Tuesday!), and more. There are many ways to participate: Get involved in Giving Season events. Donate to the Donation Election Fund to encourage discussion and participation (we're partially matching donations). Add candidates to the Election. Pre-vote [1] to register your interest (pre-votes are anonymous, but we'll know how many people pre-voted for a given candidate in the Election). Share your experience donating, fundraising, earning to give, and more, or your uncertainties and considerations about where we should donate and how we should fundraise. Fundraise for your project. Explain how your project would use extra funding (particularly for Marginal Funding Week), share impact analyses or retrospectives, invite Forum users to ask you questions, and see if your project should be listed as a candidate in the Election. Explore the Giving portal and writing about effective giving, and follow the Donation Election. (This post is an update to our earlier announcement about Giving Season on the Forum.) Giving Season & weekly discussion themes Start preparing for discussion themes on the EA Forum: Effective Giving Spotlight (7-14 November) - starting this Tuesday! How can we grow the amount of funding going to effective projects aimed at improving the world? We'll feature people's experiences with donating, fundraising, earning to give, etc. See more details below. Marginal Funding Week (14-21 November): How would your project use extra funding? To decide whether donating to a given project or organization is cost-effective, it's really useful to know how marginal funding would get used. We'll invite EA organizations and projects to describe how they would use extra donations - a bit like this post about the LTFF - or otherwise share more about what they do. We might also try to collect a summary of the key information in one post at the end of the week. Donation Debate Week (21-28 November): Where should we donate (and what should we vote for in the Donation Election)? Discuss which interventions and projects are most cost-effective and how we should vote (and donate!). I'm hoping to see estimates (including rough "back of the envelope calculations", or BOTECs), productive disagreement, and identification of the cruxes that drive different conclusions. We'll also probably feature some classic writing on the topic, some relevant AMAs, and more. First weekly theme: Effective Giving Spotlight (November 7-14) A lot of promising projects are funding-constrained [2]: they'd get more done if they had more funding. "Effective Giving Spotlight" week will feature content on how we can grow the amount of funding going to effective projects aimed at improving the world. Consider participating! You can write posts (or link-posts, including things like reviews of classic writing on the topic), comment on others' posts, or share the event. You might want to write about: Your experience donating, fundraising, earning to give, etc. 
- lessons, things you've changed your mind on, postmortems; thoughts on who should consider earning to give; uncertainties you have about where to donate; what effective projects would be particularly useful; … or anything else related to effective giving. If you're not sure if something would be useful or relevant, please feel free to reach out to me or to the Forum team. We'll post smaller announcements with more details about the other themes. Participate in the Donation Election In December, Forum users [1] will vote on how the Donation Election Fund should be allocate...
Nov 2, 2023 • 2min

LW - Estimating effective dimensionality of MNIST models by Arjun Panickssery

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Estimating effective dimensionality of MNIST models, published by Arjun Panickssery on November 2, 2023 on LessWrong. The local learning coefficient λ is a measure of a model's "effective dimensionality" that captures its complexity (more background here). Lau et al recently described a sampling method (SGLD) using noisy gradients to find a stochastic estimate λ̂ in a computationally tractable way (good explanation here). I present results (GitHub repo) of the "Task Variability" project suggested on the DevInterp project list. To see how degeneracy scales with task difficulty and model class, I trained a fully-connected MLP and a CNN (both with ~120k parameters) on nine MNIST variants with different subsets of the labels (just the labels {0, 1}, then {0, 1, 2}, etc.). All models were trained to convergence using the same number of training data points. I implement Lau et al's algorithm on each of the trained models; the results below are averaged over three runs. The results for the full MNIST dataset are comparable to Lau et al's results while using models ten times smaller trained for ten times fewer epochs. The sampling method is finicky and sensitive to the hyperparameter choices of learning rate and noise factor. It will fail by producing negative or very high λ̂ values if the noise ϵ and the distance penalty γ (lines 6 and 7 in Lau et al's pseudocode) aren't calibrated to the model's level of convergence. The results show a linear scaling law relating the number of labels to task complexity. The CNN typically has a lower λ̂ than the MLP, which matches the intuition that some of the complexity is "stored" in the architecture, because the convolutions apply a useful prior over functions good at solving image recognition tasks. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
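For readers who want a rough sense of what this estimation procedure looks like in code, below is a minimal PyTorch sketch of SGLD-based λ̂ estimation in the spirit of Lau et al. The function names, hyperparameter values, and loop structure (single chain, no burn-in) are illustrative assumptions on my part, not the actual code from the linked repo.

import copy
import math
import torch
import torch.nn.functional as F

def mean_loss(model, loader):
    # Average per-example cross-entropy over the full training set: L_n(w).
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            total += F.cross_entropy(model(x), y, reduction="sum").item()
            count += y.numel()
    return total / count

def estimate_lambda_hat(model, loader, n_train, num_steps=1000, lr=1e-5, gamma=100.0):
    # SGLD estimate of the local learning coefficient around the trained weights w*.
    beta = 1.0 / math.log(n_train)             # inverse temperature beta* = 1 / log n
    loss_star = mean_loss(model, loader)       # baseline L_n(w*) at the trained optimum
    w_star = [p.detach().clone() for p in model.parameters()]
    sampler = copy.deepcopy(model)
    sampler.train()

    sampled_losses = []
    data_iter = iter(loader)
    for _ in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)

        sampler.zero_grad()
        batch_loss = F.cross_entropy(sampler(x), y)   # minibatch estimate of L_n(w)
        batch_loss.backward()

        with torch.no_grad():
            for p, p0 in zip(sampler.parameters(), w_star):
                # Drift: gradient of n*beta*L_n(w) plus a localizing pull gamma*(w - w*).
                drift = n_train * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * lr * drift)
                p.add_(math.sqrt(lr) * torch.randn_like(p))   # Gaussian SGLD noise
        sampled_losses.append(batch_loss.item())

    expected_loss = sum(sampled_losses) / len(sampled_losses)
    # lambda-hat = n * beta * (E_w[L_n(w)] - L_n(w*))
    return n_train * beta * (expected_loss - loss_star)

In this sketch, the injected noise and the localizing penalty γ correspond to the two quantities the post describes as needing careful calibration to the model's level of convergence.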
Nov 2, 2023 • 4min

AF - Mech Interp Challenge: November - Deciphering the Cumulative Sum Model by TheMcDouglas

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mech Interp Challenge: November - Deciphering the Cumulative Sum Model, published by TheMcDouglas on November 2, 2023 on The AI Alignment Forum. I'm writing this post to discuss solutions to the October challenge, and present the challenge for this November. If you've not read the first post in this sequence, I'd recommend starting there - it outlines the purpose behind these challenges, and recommended prerequisite material. November Problem The problem for this month is interpreting a model which has been trained to classify the cumulative sum of a sequence. The model is fed sequences of integers, and is trained to classify the cumulative sum at a given sequence position. There are 3 possible classifications: 0 (if the cumsum is negative), 1 (if the cumsum is zero), 2 (if the cumsum is positive). For example, if the sequence is [3, -4, 1, 2], the cumulative sums are [3, -1, 0, 2], so the classifications would be [2, 0, 1, 2]. The model is not attention-only. It has one attention layer with a single head, and one MLP layer. It does not have layernorm at the end of the model. It was trained with weight decay, and an Adam optimizer with linearly decaying learning rate. I don't expect this problem to be as difficult as some of the others in this sequence; however, the presence of MLPs does provide a different kind of challenge. You can find more details on the Streamlit page. Feel free to reach out if you have any questions! October Problem - Solutions In the second half of the sequence, the attention heads perform the algorithm "attend back to (and copy) the first token which is larger than me". For example, in a sequence containing two 3s, two 5s, and a 7, we would have the second 3 token attending back to the first 5 token (because it's the first one that's larger than itself), the second 5 attending back to the 7, etc. The SEP token just attends to the smallest token. Some more refinements to this basic idea: The two attending heads split responsibilities across the vocabulary. Head 0.0 is the less important head; it deals with values in the range 28-37 (roughly). Head 0.1 deals with most other values. In subsequences x < y < z where the three numbers are close together, x will often attend to z rather than to y. So why isn't this an adversarial example, i.e. why does the model still correctly predict that y follows x? Answer: the OV circuit shows that when we attend to source token s, we also boost things slightly less than s, and suppress things slightly more than s. So in the case of x < y < z, we have: attention to y will boost y a lot, and suppress z a bit; attention to z will boost z a lot, and boost y a bit. So even if z gets slightly more attention than y, it might still be the case that y gets predicted with higher probability. Sequences with large jumps are adversarial examples (because they're rare in the training data, which was randomly generated by choosing subsets without replacement). Best Submissions We received more submissions for this month's problem than any other in the history of the series, so thanks to everyone who attempted! The best solution to this problem was by Vlad K, who correctly identified the model's tendency to produce unexpected attention patterns when 3 numbers are close together, and figured out how the model manages to produce correct classifications anyway. Best of luck for this and future challenges! Thanks for listening. 
To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
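To make the November task's labelling rule concrete, here is a minimal sketch of how such sequences and per-position labels could be generated. The function name, parameter choices, and example values are my own illustration, not the challenge's actual data-generation code.

import torch

def make_cumsum_batch(batch_size: int, seq_len: int, max_value: int = 5):
    # Random integer sequences in [-max_value, max_value] with per-position
    # cumulative-sum class labels: 0 if the running sum is negative,
    # 1 if it is zero, 2 if it is positive.
    tokens = torch.randint(-max_value, max_value + 1, (batch_size, seq_len))
    cumsum = tokens.cumsum(dim=-1)
    labels = torch.sign(cumsum).long() + 1   # maps -1/0/+1 to 0/1/2
    return tokens, labels

tokens, labels = make_cumsum_batch(batch_size=2, seq_len=6)
# e.g. tokens[0] = [3, -4, 1, 2, ...] -> running sums [3, -1, 0, 2, ...]
#      -> labels [2, 0, 1, 2, ...]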
Nov 2, 2023 • 6min

EA - Are 1-in-5 Americans familiar with EA? by David Moss

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Are 1-in-5 Americans familiar with EA?, published by David Moss on November 2, 2023 on The Effective Altruism Forum. YouGov recently reported the results of a survey (n=1000) suggesting that about "one in five (22%) Americans are familiar with effective altruism." [1] We think these results are exceptionally unlikely to be true. Their 22% figure is very similar to the proportion of Americans we previously found claim to have heard of effective altruism (19%) in our earlier survey (n=6130). But, after conducting appropriate checks, we estimated that much lower percentages are likely to have genuinely heard of EA [2] (2.6% after the most stringent checks, which we speculate is still likely to be somewhat inflated [3]). Is it possible that these numbers have simply dramatically increased following the FTX scandal? Fortunately, we have tested this with multiple follow-up surveys explicitly designed with this possibility in mind. [4] In our most recent survey (conducted October 6th [5]), we estimated that approximately 16% (13.0%-20.4%) of US adults would claim to have heard of EA. Yet, when we add in additional checks to assess whether people appear to have really heard of the term, or have a basic understanding of what it means, this estimate drops to 3% (1.7% to 4.4%), and even to approximately 1% with a more stringent level of assessment. [6] These results are roughly in line with our earlier polling in May 2022, as well as additional polling we conducted between May 2022 and October 2023, and do not suggest any dramatic increase in awareness of effective altruism, although assessing small changes when base rates are already low is challenging. We plan to continue to conduct additional surveys, which will allow us to assess possible changes from just before the trial of Sam Bankman-Fried to after the trial. Attitudes towards EA YouGov also report that respondents are, even post-FTX, overwhelmingly positive towards EA, with 81% of those who (claim to) have heard of EA approving or strongly approving of EA. Fortunately, this positive view is broadly in line with our own findings - across different ways of breaking down who has heard of EA and different levels of stringency - which we aim to report on separately at a later date. However, our earlier work did find that awareness of FTX was associated with more negative attitudes towards EA. Conclusions The point of this post is not to criticise YouGov in particular. However, we do think it's worth highlighting that even highly reputable polling organizations should not be assumed to be employing all the additional checks that may be required to understand a particular question. This may apply especially in relation to niche topics like effective altruism, or more technical topics like AI, where additional nuance and checks may be required to assess understanding. ^ Also see this quick take. ^ There are many reasons why respondents may erroneously claim knowledge of something. But simply put, one reason is that people like demonstrating their knowledge, and may err on the side of claiming to have heard of something even if they are not sure. 
Moreover, if the component words that make up a term are familiar, then the respondent may either mistakenly believe they have already encountered the term, or think it is sufficient that they can reasonably infer what the term means from its component parts to claim awareness (even when explicitly instructed not to approach the task this way!). Some people also appear to conflate the term with others - for example, some amalgamation of inclusive fitness/reciprocal altruism appears quite common. For reference, over 12% of people claim to have heard of the specific term "Globally neutral advocacy": a term that the research team invented, which returns no Google results as...
Nov 2, 2023 • 5min

LW - Snapshot of narratives and frames against regulating AI by Jan Kulveit

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Snapshot of narratives and frames against regulating AI, published by Jan Kulveit on November 2, 2023 on LessWrong. This is a speculative map of a hot discussion topic. I'm posting it in question form in the hope we can rapidly map the space in answers. Looking at various claims on X and at the AI summit, it seems possible to identify some key counter-regulation narratives and frames that various actors are pushing. Because a lot of the public policy debate won't be about "what are some sensible things to do" within a particular frame, but rather about fights for frame control, or "what frame to think in", it seems beneficial to have at least some sketch of a map of the discourse. I'm posting this as a question in the hope we can rapidly map the space, with one example of a "local map": "It's about open source vs. regulatory capture" It seems the coalition against AI safety, most visibly represented by Yann LeCun and Meta, has identified "it's about open source vs. big tech" as a favorable frame in which they can argue and build a coalition of open-source advocates who believe in the open-source ideology, academics who want access to large models, and small AI labs and developers who believe they will remain competitive in the long term by fine-tuning smaller models and capturing various niche markets. LeCun and others attempt to portray themselves as the force of science and open inquiry, while the scaling labs proposing regulation are the evil big tech attempting regulatory capture. Because this seems to be the preferred anti-regulation frame, I will spend the most time on it. Apart from the groups mentioned, this narrative seems to be memetically fit in a "proudly cynical" crowd which assumes that everything everyone is doing or saying is primarily self-interested and profit-driven. Overall, the narrative has clear problems with explaining away inconvenient facts, including: Thousands of academics calling for regulation are awkward counter-evidence for x-risk being just a ploy by the top labs. The narrative strategy seems to explain this by some of the senior academics just being deluded, and others also pursuing a self-interested strategy in expectation of funding. Many of the people explaining AI risk now were publicly concerned about AI risk before founding labs, and at times when it was academically extremely unprofitable, sometimes sacrificing standard academic careers. The narrative move is to just ignore this. Also, many things are just assumed - for example, whether the resulting regulation would actually be in the interest of frontrunners. What could be memetically viable counter-arguments within the frame? Personally, I tend to point out that the motivation to avoid AI risk is completely compatible with self-interest. Leaders of AI labs also have skin in the game. Also, recently I have been asking people to apply the explanatory frame of 'cui bono' to the other side as well, namely Meta. One possible hypothesis here is that Meta just loves open source and wants everyone to flourish. A more likely hypothesis is that Meta wants to own the open-source ecosystem. A more complex hypothesis is that Meta doesn't actually love open source that much but has a sensible, self-interested strategy aimed at a dystopian outcome. To understand the second option, it's a prerequisite to comprehend the "commoditize the complement" strategy. 
This is a business approach where a company aims to drive down the cost or increase the availability of goods or services complementary to its own offerings. The outcome is an increase in the value of the company's own services. Some famous successful examples of this strategy include Microsoft and PC hardware: PC hardware became a commodity, while Microsoft came close to monopolizing the OS, extracting huge profits. Or Apple's App Store: the complement to the phone is the apps. Apps have becom...
Nov 2, 2023 • 4min

LW - Public Weights? by jefftk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Public Weights?, published by jefftk on November 2, 2023 on LessWrong. While this is close to areas I work in, it's a personal post. No one reviewed this before I published it, or asked me to (or not to) write something. All mistakes are my own. A few days ago, some of my coworkers at SecureBio put out a preprint, "Will releasing the weights of future large language models grant widespread access to pandemic agents?" (Gopal et al. 2023). They took Facebook/Meta's Llama-2-70B large language model (LLM) and (cheaply!) adjusted it to remove the built-in safeguards, after which it was willing to answer questions on how to get infectious 1918 flu. I like a bunch of things about the paper, but I also think it suffers from being undecided on whether it's communicating that (a) making LLMs public is dangerous, because by publishing the weights you allow others to easily remove safeguards, or that (b) once you remove the safeguards, current LLMs are already helpful in getting at the key information necessary to cause a pandemic. I think it demonstrates the first point pretty well. The main way we keep LLMs from telling people how to cause harm is to train them on a lot of examples of someone asking how to cause harm and being told "no", and this can easily be reversed by additional training with "yes" examples. So even if you get incredibly good at this, if you make your LLM public you make it very easy for others to turn it into something that compliantly shares any knowledge it contains. Now, you might think that there isn't actually any dangerous knowledge, at least not within what an LLM could have learned from publicly available sources. I think this is pretty clearly not true: the process of creating infectious 1918 flu is scattered across the internet and hard for most people to assemble. If you had an experienced virologist on call and happy to answer any question, however, they could walk you there through a mixture of doing things yourself and duping others into doing things. And if they were able to read and synthesize all virology literature, they could tell you how to create things quite a bit worse than this former pandemic. GPT-4 is already significantly better than Llama-2, and GPT-5 in 2024 is more likely than not. Public models will likely continue to move forward, and while it's unlikely that we get a GPT-4-level Llama-3 in 2024, I do think the default path involves very good public models within a few years. At which point anyone with a good GPU can have their own personal amoral virologist advisor. Which seems like a problem! But the paper also seems to be trying to get into the question of whether current models are capable of teaching people how to make 1918 flu today. If they just wanted to assess whether the models were willing and able to answer questions on how to create bioweapons, they could have just asked it. Instead, they ran a hackathon to see whether people could, in one hour, get the no-safeguards model to fully walk them through the process of creating infectious flu. I think the question of whether LLMs have already lowered the bar for causing massive harm through biology is a really important one, and I'd love to see a follow-up that addressed that with a no-LLM control group. 
That still wouldn't be perfect, since outside the constraints of a hackathon you could take a biology class, read textbooks, or pay experienced people to answer your questions, but it would tell us a lot. My guess is that the synthesis functionality of current LLMs is actually adding something here and a no-LLM group would do quite a bit worse, but 83% of people seem to disagree with me. Even if no-safeguards public LLMs don't lower the bar today (and given how frustrating Llama-2 can be, this wouldn't be too surprising), it seems pretty likely we get to where they do significantly lower...
