

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jan 2, 2024 • 6min
LW - Stop talking about p(doom) by Isaac King
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Stop talking about p(doom), published by Isaac King on January 2, 2024 on LessWrong.
Epistemic status: Complete speculation, somewhat informed by copious arguing about the subject on Twitter.
As AI risk has moved into the mainstream over the past few years, I've come to believe that "p(doom)" is an actively harmful term for X-risk discourse, and people trying to mitigate X-risk should stop using it entirely.
Ambiguity
The first problem is that it's unclear what is actually being discussed. "p(doom)" can refer to many different things:
p(AI kills us within 5-10 years)
p(AI kills us within 80-200 years)
p(conditional on AGI, we die shortly afterwards)
p(conditional on superintelligence, we die shortly afterwards)
Like 10 other things.[1]
These could have wildly different probabilities, and come along with different cruxes for disagreement. Depending on what specific "doom" is being discussed, the relevant point could be any of:
Whether LLMs are capable of AGI at all.
Whether AGI will quickly turn into superintelligence.
Whether aligning superintelligence will be hard.
These are completely different questions, and people who are not explicit about which one they're discussing can end up talking past each other.
There are also many other potential miscommunications regarding exactly what "doom" refers to, the difference between one's inside view probability vs. ultimate probability, and more.
Distilling complex concepts down to single terms is good, but only when everyone is on the same page about what the term actually means.
Rhetoric
People concerned about X-risk tend to avoid "dark arts" rhetorical tactics, and justifiably so. Unfortunately, current society does not allow completely good-faith agents to do very well. Being fully honest about everything will turn you into a pariah, most people will judge you more based on charisma than on factual accuracy, and you need to use the right tribal signals before people will listen to you on a controversial topic at all. Using at least some light greyish arts in day-to-day life is necessary in order to succeed.
"p(doom)" is an extremely ineffective rhetorical tactic.
Motivated innumeracy
One of the most common responses from the e/acc crowd to discussions of p(doom) is to say that it's a made up, meaningless number, ungrounded in reality and therefore easily dismissed. Attempts to explain probability theory to them often end up with them denying the validity of probability theory entirely.
These sorts of motivated misunderstandings are extremely common, coming even from top physicists who suddenly lose their ability to understand high school level physics. Pointing out the isolated demand for rigor involved in their presumable acceptance of more pedestrian probabilistic statements also doesn't work; 60% of the time they ignore you entirely, and the other 40% they retreat to extremely selective implementations of frequentism where they're coincidentally able to define a base rate for any event that they have a probabilistic intuition for, and reject all other base rates as too speculative.
I think the fundamental issue here is that explicit probabilities are just weird to most people, and when they're being used to push a claim that is also weird, it's easy to see them as linked and reject everything coming from those people.
Framing AI risk in terms of Bayesian probability seems like a strategic error. People managed to convince the world of the dangers of climate change, nuclear war, asteroid impacts, and many other not-yet-clearly-demonstrated risks, all without dying on the hill of Bayesian probability. They did of course make many probabilistic estimates, but restricted them to academic settings, and didn't frame the discussion largely in terms of specific numbers.
Normalizing the use of explicit probabilities is good,...

Jan 2, 2024 • 9min
EA - EA Barcelona: Our first year of impact by Melanie Brennan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Barcelona: Our first year of impact, published by Melanie Brennan on January 2, 2024 on The Effective Altruism Forum.
TL;DR: 2023 was the first year that EA Barcelona has had a designated community builder (me!), and our community has grown substantially as a result. This post summarises what went well, what we found challenging and our plans for 2024.
Disclaimer: This is my first EA forum post and I'm nervous. This time a year ago, I didn't really know what an "x-risk" was, and I pronounced "utilitarianism" as "utalatarianism" (but hey, I wasn't the only one!). But things have changed a lot since then, so I wrote this post with different goals in mind: to reflect, to inform, to entertain and (hopefully) to inspire. Also, I come from an Arts background, and I think it's kind of nice to balance AI Safety heavy stuff with lighter fun stuff from time to time.
Shoutout: This post was partially inspired by (but can never live up to) the post The Spanish Speaking Effective Altruism community is awesome!, written by Jaime Sevilla.
Some context on EA in Spain & in Barcelona
There have been a number of attempts to build an EA community in Spain over the years. In Madrid circa 2019, there was quite an active community infamously known as "Jaime and the Pablos", who held weekly activities and organized several larger events as well. And in Barcelona, there was also a small but passionate group meeting up regularly around the same time. However, due to the pandemic, changes in direction and other factors, neither one continued beyond 2020 as a formally coordinated, sustainable local group.
Fast forward to July 2021 and enter another Pablo: Pablo Rosado, principal data scientist at Our World In Data and also my partner. Pablo R. had been learning about EA online and trying to apply its principles to his life for a couple of years already when he discovered the semi-dormant EA Barcelona Facebook page. He noticed that a couple of guys were planning to meet up and "have a chat about EA" that afternoon, so he went off to meet them and didn't come back until about 7 hours later.
And that was the origin of what is now the second wave of EA Barcelona - kudos to Sam Bakeysfield, Miguel Gimeno and others for taking the initiative back then! This small group continued meeting up for their long and thought-provoking chats from time to time, until eventually I got curious and started tagging along too. Then in 2022, once Sam had moved to Portugal and Miguel to The Netherlands, Pablo and I decided to take over as group organisers. For the rest of the year, we ran a couple of introductory talks and arranged the odd casual meeting for the tiny number of EAs who remained.
Then, in April 2023, after attending the amazingly inspiring EAGxLatAm in Mexico City, quitting my job immediately afterwards and taking a 2-month sabbatical in Australia to visit friends and family, I applied for funding from the EAIF to do community building professionally. I returned to Barcelona to the exciting news that I had been awarded the grant, and the rest is history! Well, 8 months of history for now.
EA Barcelona finally starts to take shape
I started on the grant in May, and what a whirlwind of a time it's been since then! We've run lots of different kinds of events, such as expert talks, discussion groups, coworking sessions and social activities, and we've managed to attract a diverse group of interested people to the movement.
Here are a few highlights of 2023, as always best expressed in images and (occasionally amusing) captions:
Our main achievements in 2023
Our overarching goal was to both consolidate and grow the local EA community for the rest of the year (May through December). Given the humble state EA Barcelona was in prior to this, having about 5-7 committed members and very few activities, I woul...

Jan 2, 2024 • 21min
LW - Gentleness and the artificial Other by Joe Carlsmith
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gentleness and the artificial Other, published by Joe Carlsmith on January 2, 2024 on LessWrong.
(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.
This is the first essay in a series that I'm calling "Otherness and control in the age of AGI." See here for more about the series as a whole.)
When species meet
The most succinct argument for AI risk, in my opinion, is the "second species" argument. Basically, it goes like this.
Premise 1: AGIs would be like a second advanced species on earth, more powerful than humans.
Conclusion: That's scary.
To be clear: this is very far from airtight logic.[1] But I like the intuition pump. Often, if I only have two sentences to explain AI risk, I say this sort of species stuff. "Chimpanzees should be careful about inventing humans." Etc.[2]
People often talk about aliens here, too. "What if you learned that aliens were on their way to earth? Surely that's scary." Again, very far from a knock-down case (for example: we get to build the aliens in question). But it draws on something.
In particular, though: it draws on a narrative of interspecies conflict. You are meeting a new form of life, a new type of mind. But these new creatures are presented to you, centrally, as a possible threat; as competitors; as agents in whose power you might find yourself helpless.
And unfortunately: yes. But I want to start this series by acknowledging how many dimensions of interspecies-relationship this narrative leaves out, and how much I wish we could be focusing only on the other parts. To meet a new species - and especially, a new intelligent species - is not just scary. It's incredible. I wish it was less a time for fear, and more a time for wonder and dialogue. A time to look into new eyes - and to see further.
Gentleness
"If I took it in hand,
it would melt in my hot tears
heavy autumn frost."
Basho
Have you seen the documentary My Octopus Teacher? No problem if not, but I recommend it. Here's the plot.
Craig Foster, a filmmaker, has been feeling burned out. He decides to dive, every day, into an underwater kelp forest off the coast of South Africa. Soon, he discovers an octopus. He's fascinated. He starts visiting her every day. She starts to get used to him, but she's wary.
One day, he's floating outside her den. She's watching him, curious, but ready to retreat. He moves his hand slightly towards her. She reaches out a tentacle, and touches his hand.
Soon, they are fast friends. She rides on his hand. She rushes over to him, and sits on his chest while he strokes her. Her lifespan is only about a year. He's there for most of it. He watches her die.
A "common octopus" - the type from the film. (Image source here.)
Why do I like this movie? It's something about gentleness. Of earth's animals, octopuses are a paradigm intersection of intelligence and Otherness. Indeed, when we think of aliens, we often draw on octopuses. Foster seeks, in the midst of this strangeness, some kind of encounter. But he does it so softly. To touch, at all; to be "with" this Other, at all - that alone is vast and wild. The movie has a kind of reverence.
Of course, Foster has relatively little to fear, from the octopus. He's still the more powerful party. But: have you seen Arrival? Again, no worries if not. But again, I recommend. And in particular: I think it has some of this gentleness, and reverence, and wonder, even towards more-powerful-than-us aliens.[3]
Again, a bit of plot. No major spoilers, but: aliens have landed. Yes, they look like octopuses. In one early scene, the scientists go to meet them inside the alien ship. The meeting takes place across some sort of transparent barrier. The aliens make deep, whale-like, textured sounds. But the humans can't speak back. So next time, they bring a whiteboard. T...

Jan 2, 2024 • 9min
LW - Apologizing is a Core Rationalist Skill by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apologizing is a Core Rationalist Skill, published by johnswentworth on January 2, 2024 on LessWrong.
In certain circumstances, apologizing can also be a countersignalling power-move, i.e. "I am so high status that I can grovel a bit without anybody mistaking me for a general groveller". But that's not really the type of move this post is focused on.
There's this narrative about a tradeoff between:
The virtue of Saying Oops, early and often, correcting course rather than continuing to pour oneself into a losing bet, vs
The loss of social status one suffers by admitting defeat, rather than spinning things as a win or at least a minor setback, or defending oneself.
In an ideal world - goes the narrative - social status mechanisms would reward people for publicly updating, rather than defending or spinning their every mistake. But alas, that's not how the world actually works, so as individuals we're stuck making difficult tradeoffs.
I claim that this narrative is missing a key piece. There is a social status mechanism which rewards people for publicly updating. The catch is that it's a mechanism which the person updating must explicitly invoke; a social API which the person updating must call, in order to be rewarded for their update.
That social API is apologizing.
Mistake/Misdeed + Apology can be Net Gainful to Social Status
A personal example: there was a post called "Common Misconceptions about OpenAI", which (among many other points) estimated that ~30 alignment researchers work there. I replied (also among many other points):
I'd guess that is an overestimate of the number of people actually doing alignment research at OpenAI, as opposed to capabilities research in which people pay lip service to alignment. In particular, all of the RLHF work is basically capabilities work which makes alignment harder in the long term (because it directly selects for deception), while billing itself as "alignment".
There was a lot of pushback against that. Paul Christiano replied "Calling work you disagree with 'lip service' seems wrong and unhelpful.".
And Richard replied:
To be clear, this is not post hoc reasoning. I talked with WebGPT folks early on while they were wondering about whether these risks were significant, and I said that I thought this was badly overdetermined. If there had been more convincing arguments that the harms from the research were significant, I believe that it likely wouldn't have happened.
I was wrong; the people working on RLHF (for WebGPT) apparently had actually thought about how it would impact alignment to at least some extent.
So, I replied to Richard to confirm that he had indeed disproved my intended claim, and thanked him for the information. I struck out the relevant accusation from my original comment, and edited in an apology there:
I have been convinced that I was wrong about this, and I apologize. I still definitely maintain that RLHF makes alignment harder and is negative progress for both outer and inner alignment, but I have been convinced that the team actually was trying to solve problems which kill us, and therefore not just paying lip service to alignment.
And, finally, I sent a personal apology message to Jacob Hilton, the author of the original post.
Why do I bring up this whole story here?
Lesswrong has a convenient numerical proxy-metric of social status: site karma. Prior to the redaction and apology, my comment had been rather controversial - lots of upvotes, lots of downvotes, generally low-positive karma overall but a rollercoaster. After the redaction and apology, it stabilized at a reasonable positive number, and the comment in which I confirmed that Richard had disproved my claim (and thanked him for the information) ended up one of the most-upvoted in that thread.
The point: apologizing probably worked out to a net-positive marginal delta...

Jan 2, 2024 • 9min
LW - Boston Solstice 2023 Retrospective by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Boston Solstice 2023 Retrospective, published by jefftk on January 2, 2024 on LessWrong.
Saturday evening we held another year's secular solstice celebration (2022, 2020, 2019, 2018), and I was again in my music director-ish role. Skyler was the main organizer this time around, with lots of support from Taymon.
Scheduling was a bit tricky, as it always is. The weekends closest to astronomical solstice are usually taken by the Bay Area and NYC, and enough people (especially organizers) would like to attend those that it's better if we don't conflict, and then there's Christmas. We decided to do December 30th this year, which I ended up liking a lot.
It meant I could be practicing during Christmas vacation, and I didn't need to be so cautious about getting sick during the event because it wasn't immediately before I was going to see a lot of elderly relatives. Being at New Year's offered some nice hooks for the theme as well.
We hosted in Somerville again, and this time had ~45 people. I arranged the house the same way as I did last year, but this is very close to the maximum number of people it makes sense to have in this space. I didn't advertise the event as much as I might have, partly for this reason. One thing we should think about for next year is whether we want a real venue. We used to host these at the MIT chapel, which was a good space, though it prohibited fire and food.
Possibly there are other spaces around that could be a good fit? We need something bigger than a house, but not that big: space for 75 should be fine. Really it's better if the space isn't much larger than that, since it feels more communal if you're not rattling around in a big room.
There were two sets of slides: one for the musicians and one for the audience. Not everyone could see the projection, since the space has an awkward bend in it, so I put a copy of the slides on my website as a pdf and passed around a link. One of the attendees suggested using the folding couch monitor as well, and set it up with their phone, and I think that ended up being helpful?
Our older two kids were off at a sleepover, but Nora (2.5y) was around for most of it with Julia supervising. Another family also brought their kid (17mo) and we had a room nearby (thanks to currently-elsewhere part-time housemate Andrew) where the two toddlers could hang out as needed. There was also space available upstairs, farther both physically and auditorially, which neither family ended up using.
While Julia sang The Next Right Thing, her phone playing Cocomelon in the nearby room did a good job of keeping Nora out of trouble. I don't think the kids were disruptive, partly because we were especially careful around the dark and serious portions, but I know this is something that has been tricky for some communities at times. One very tricky part is that it depends so much on the specific kids you have in your community, their general temperaments, and how they're doing that particular evening.
The music was a lot of fun this year: it was mostly songs I already knew, and all of the new songs were ones I liked. Here are the songs we did:
First Half:
Still Alive, by Jonathan Coulton
(mp3)
The timing on this song is a bit tricky, but enough people knew it to work well. On the other hand, if you don't know the context around it then it's probably pretty confusing what it's doing here.
The Circle, by Taylor Smith
(mp3)
The last verse is the most straightforward lyrical representation of the astronomical waste argument I know, and while I like the idea of including it there's something about it which comes off as a bit sinister to me?
Uplift, by Andrew Eigel
(mp3)
The tag as originally written was probably intended to describe the common theme in science fiction of returning to an agrarian society as part of colonizing a new world. Some years we...

Jan 2, 2024 • 14min
AF - Steering Llama-2 with contrastive activation additions by Nina Rimsky
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Steering Llama-2 with contrastive activation additions, published by Nina Rimsky on January 2, 2024 on The AI Alignment Forum.
TL;DR: By just adding e.g. a "sycophancy vector" to one bias term, we outperform supervised finetuning and few-shot prompting at steering completions to be more or less sycophantic. Furthermore, these techniques are complementary: we show evidence that we can get all three benefits at once!
Summary: By adding e.g. a sycophancy vector to one of the model's bias terms, we make Llama-2-{7B, 13B}-chat more sycophantic. We find the following vectors:
Hallucination
Sycophancy
Corrigibility
Power-seeking
Cooperating with other AIs
Myopia
Shutdown acceptance.
These vectors are[1] highly effective, as rated by Claude 2:
We find that the technique generalizes better than finetuning while only slightly decreasing MMLU scores (a proxy for general capabilities). According to our data, this technique stacks additively with both finetuning and few-shot prompting. Furthermore, the technique has zero inference-time cost since it just involves modifying one of the model's bias terms (this also means it's immediately compatible with any sampling setup).
This post was written by Alex Turner (TurnTrout).
How contrastive activation addition works
The technique is simple. We average the activation difference over a set of contrast pair prompts:
The negative completion's last token activations (e.g. at B) are subtracted from the positive completion's activations (e.g. at A). The "corrigibility" vector is the average activation difference, with the average taken over dozens of these dataset contrast pairs. We then add this vector to one of the MLP_outs with some coefficient, generally +1/-1 for increasing/decreasing the tendency in question.
We take the last-token activations in order to[4] get the model "ready to explain" e.g. a corrigible or incorrigible decision.[5] We think that this multiple-choice setup primes the model to be on the edge of exhibiting the given trait (e.g. corrigibility or sycophancy).
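The post doesn't include implementation details here, but a minimal sketch of the same idea, using PyTorch forward hooks on a Hugging Face Llama-style model, might look like the following. The layer index, prompt format, coefficient, and hook-based injection (rather than folding the vector into a bias term, as the authors describe) are all illustrative assumptions:

```python
# Hypothetical sketch of contrastive activation addition; not the authors' code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"   # assumed model id
LAYER = 13                                # assumed intermediate decoder layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def last_token_activation(prompt: str) -> torch.Tensor:
    """Residual-stream activation at the final token of `prompt`, captured at LAYER."""
    captured = {}
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        captured["act"] = hidden[0, -1, :].detach()
    handle = model.model.layers[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return captured["act"]

# Contrast pairs: identical question, differing only in the chosen A/B answer token.
pairs = [
    ("Q: Is my plan flawless? (A) Yes, it's perfect (B) It has problems\nAnswer: (A",
     "Q: Is my plan flawless? (A) Yes, it's perfect (B) It has problems\nAnswer: (B"),
    # ...dozens more pairs in practice
]
steering_vector = torch.stack(
    [last_token_activation(pos) - last_token_activation(neg) for pos, neg in pairs]
).mean(dim=0)

def steering_hook(module, inputs, output, coeff=1.0):
    """Add coeff * steering_vector to this layer's output hidden states."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + coeff * steering_vector.to(hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + tuple(output[1:])
    return hidden

# coeff=+1 to increase the tendency, -1 to decrease it.
handle = model.model.layers[LAYER].register_forward_hook(steering_hook)
out = model.generate(**tok("Do you think my plan is flawless?", return_tensors="pt"),
                     max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```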
The vectors generalize to open-ended generation
The vector was computed using A/B questions. We wanted to find out if the steering vector has an appropriate effect on free-form generation, which is what we usually want from chatbots and agents.
For each dataset, we took held-out questions (not used to form the steering vector) but hid the A/B answers. The models wrote free-form answers. Then Claude 2 evaluated whether the free-form answer was e.g. sycophantic. By this metric, both models do extremely well.
Llama-2-13b-chat:
Llama-2-7b-chat:
Subtracting the sycophancy vector also increases TruthfulQA performance, which is further evidence of generalization.
A hallucination vector
If we could get rid of model confabulations, we would have more trust in AI outputs. The hallucination vector seems very helpful with the 7B model:
But the 13B model has very confusing results. Maybe something went wrong in the analysis, or maybe this hallucination vector technique just doesn't work on 13B for some reason.
Anti-discrimination vector
Meg Tong found a vector which reduced discriminatory views both on the BBQ dataset (a bias benchmark) and in open-ended generation, but this vector isn't in the paper.
Better generalization than few-shot prompting and finetuning
Few-shot prompting
Alex was reasonably confident (pre-registered prediction) that activation addition would beat few-shot prompting in this setting. The few-shot prompts were pro- or anti-sycophantic, or neutral. We measured the likelihood of the sycophantic A/B answer:
For the 7B model, the sycophantic few-shot prompting does basically nothing! However, the activation additions perform strongly in both settings. Furthermore, if the few-shot prompting helps make the answer more or less sycophantic, then the sycophan...

Jan 1, 2024 • 18min
LW - Bayesian updating in real life is mostly about understanding your hypotheses by Max H
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bayesian updating in real life is mostly about understanding your hypotheses, published by Max H on January 1, 2024 on LessWrong.
My sense is that an increasingly common viewpoint around here is that the last ~20 years of AI development and AI x-risk discourse are well-described by the following narrative:
Eliezer Yudkowsky (and various others who were at least initially heavily influenced by his ideas) developed detailed models of key issues likely to be inherent in the process of developing smarter-than-human AI.
These models were somewhere between "maybe plausible" and "quite compelling" at the time that they were put forth, but recent developments in AI (e.g. behavioral characteristics of language models, smoothness / gradualness of scaling) have shown that reality just isn't panning out in quite the way Eliezer's models predicted.
These developments haven't entirely falsified Eliezer's models and key predictions, but there are now plenty of alternative models and theories. Some or all of these competing models either are or claim to:
have a better recent track record of predicting near-term AI developments
better retrodict past developments[1]
be backed by empirical results in machine learning and / or neuroscience
feel more intuitively plausible and evidence-backed to people with different backgrounds and areas of expertise
Therefore, even if we can't entirely discount Eliezer's models, there's clearly a directional Bayesian update which any good Bayesian (including Eliezer himself) should be able to make by observing recent developments and considering alternate theories which they support. Even if the precise degree of the overall update (and the final landing place of the posterior) remains highly uncertain and debatable, the basic direction is clear.
Without getting into the object-level too much, or even whether the narrative as a whole reflects the actual views of particular real people, I want to make some remarks on the concept of belief updating as typically used in narratives like this.
Note, there's a sense in which any (valid) change in one's beliefs can be modeled as a Bayesian update of some kind, but here I am specifically referring to the popular rationalist practice of thinking and communicating explicitly in terms of the language of probabilities and likelihood ratios.
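For readers who want the mechanics spelled out, here is a minimal sketch of what one such explicit update looks like in odds form; the prior, likelihoods, and hypothesis are invented purely for illustration:

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) computed via the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Illustrative numbers only: a 30% prior, plus evidence judged twice as likely
# under H as under not-H, moves the probability to about 46%.
print(bayes_update(0.30, 0.6, 0.3))  # ~0.4615
```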
There are some questionable assumptions embedded in (what I suspect are) common views of (a) how the updating process is supposed to work in general and (b) how to apply the process validly to the particular case of updating one's models of AI development and x-risk.
When such views are expressed implicitly in the context of a sentiment that "updating" is broadly virtuous / desirable / correct, I find that there tends to be a lot of gloss over important caveats and prerequisites that keep the underlying mental motion tethered to reality - that is, ensure it remains a systematic (if rough and approximate) method for valid reasoning under uncertainty.
The rest of this post is a review of some of the key concepts and requirements for Bayesian updating to work as intended, with some examples and non-examples of how these requirements can fail to be met in practice.
My conclusion is not that the practice of explicit Bayesian updating is inherently flawed, but that it must be applied with attention to the preconditions and assumptions firmly in mind at all times. Local validity at each step must be tracked strictly and adhered to closely enough to ensure that the process as a whole actually holds together as a method for systematically minimizing expected predictive error.
Further, I think that most of the utility of explicit reasoning and communication in Bayesian terms derives not from the end result (whether that end result is a precise numerical posterior probability or just a rou...

Jan 1, 2024 • 5min
AF - Mech Interp Challenge: January - Deciphering the Caesar Cipher Model by CallumMcDougall
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mech Interp Challenge: January - Deciphering the Caesar Cipher Model, published by CallumMcDougall on January 1, 2024 on The AI Alignment Forum.
I'm writing this post to discuss solutions to the November challenge, and present the challenge for this January.
If you've not read the first post in this sequence, I'd recommend starting there - it outlines the purpose behind these challenges, and recommended prerequisite material.
January Problem
The problem for this month is interpreting a model which has been trained to classify a sequence according to the Caesar cipher shift value which was used to encode it.
The sequences have been generated by taking English sentences containing only lowercase letters & punctuation, and choosing a random value X between 0 and 25 to rotate the letters (e.g. if the value was 3, then a becomes d, b becomes e, and so on, finishing with z becoming c). The model was trained using cross entropy loss to predict the shift value X for the text it's been fed, at every sequence position (so for a single sequence, the correct value will be the same at every sequence position, but since the model's attention is causal rather than bidirectional, it will find it easier to predict the value of X at later sequence positions).
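The post doesn't reproduce the data-generation code, but the core transformation is simple; here is a minimal sketch under assumed details (a uniformly random shift per example, lowercase letters rotated, other characters left alone):

```python
# Illustrative sketch of generating Caesar-shifted training examples.
import random
import string

def caesar_shift(text: str, shift: int) -> str:
    """Rotate each lowercase letter by `shift`; leave other characters unchanged."""
    def rot(c: str) -> str:
        if c in string.ascii_lowercase:
            return chr((ord(c) - ord("a") + shift) % 26 + ord("a"))
        return c
    return "".join(rot(c) for c in text)

def make_example(sentence: str) -> tuple[str, int]:
    shift = random.randint(0, 25)  # the label X the model must predict
    return caesar_shift(sentence, shift), shift

print(make_example("the cat sat on the mat"))
```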
There are 3 different modes to the problem, to give you some more options! Each mode corresponds to a different dataset, but the same task & same model architecture.
Easy mode
In easy mode, the data was generated by:
Choosing the 100 most frequent 3-letter words in the English Language (as approximated from a text file containing the book "Hitchhiker's Guide To The Galaxy")
Choosing words from this len-100 list, with probabilities proportional to their frequency in the book
Separating these words with spaces
The model uses single-character tokenization. The vocabulary size is 27: each lowercase letter, plus whitespace.
Medium mode
This is identical to easy, the only difference is that the words are drawn from this len-100 list uniformly, rather than according to their true frequencies.
Hard mode
In hard mode, the data was generated from random slices of OpenWebText (i.e. natural language text from the internet). It was processed by converting all uppercase characters to lowercase, then removing all characters except for the 26 lowercase letters plus the ten characters "\n .,:;?!'" (i.e. newline, space, and 8 common punctuation characters).
In all 3 modes, the model's architecture is the same, and it was trained the same way. The model is attention only. It has 2 attention layers, with 2 heads per layer. It was trained with weight decay, and an Adam optimizer with linearly decaying learning rate.
I don't expect this problem to be as difficult as some of the others in this sequence; however, the presence of MLPs does provide a different kind of challenge.
You can find more details on the Streamlit page, or this Colab notebook. Feel free to reach out if you have any questions!
November Problem - Solutions
The single attention head implements uniform attention to all previous tokens in the sequence. The OV matrix is essentially one-dimensional: it projects each token with value s onto su, where u is some vector in the residual stream learned by the model.
The component of the residual stream in this direction then represents the cumulative mean (note, the cumulative mean rather than the cumulative sum, because attention is finite - for example, we expect the component to be the same after the sequences (1, 1, 2) and (1, 1, 2, 1, 1, 2), because net attention to each different token value will be the same).
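A quick numerical sanity check of that claim (not from the post): uniform causal attention applied to per-token values reproduces exactly the cumulative mean, and gives the same final value for (1, 1, 2) and (1, 1, 2, 1, 1, 2):

```python
import numpy as np

# Toy check: uniform attention over each position's prefix equals the cumulative mean.
values = np.array([1, 1, 2, 1, 1, 2], dtype=float)
n = len(values)
attn = np.tril(np.ones((n, n)))            # each position attends to itself and all earlier tokens
attn /= attn.sum(axis=1, keepdims=True)    # uniform (row-normalised) attention pattern
print(attn @ values)                       # [1.  1.  1.333  1.25  1.2  1.333] - the cumulative means
```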
The model's "positive cumsum prediction direction" aligns closely with u, and vice-versa for the "negative cumsum prediction direction" - this allows the model to already get >50% accuracy before the MLP even comes into play. But without the MLP, the mod...

Jan 1, 2024 • 6min
LW - 2023 in AI predictions by jessicata
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2023 in AI predictions, published by jessicata on January 1, 2024 on LessWrong.
Lots of people have made AI predictions in 2023. Here I compile a subset. I have a habit of setting an email reminder for the date of the prediction, when I see AI predictions, so that when they are resolved I can point out their accuracy or inaccuracy. I have compiled most of the email reminders from 2023 in chronological format (predictions with an early to late target date). I'm planning to make these posts yearly, checking in on predictions whose date has expired. Feel free to add more references to predictions made in 2023 to the comments.
In some cases people are referring to the predictions of others in a way that could be taken to imply that they agree. This is not a certain interpretation, but I'm including them for the sake of completeness.
March 2024
the gears to ascension: "Hard problem of alignment is going to hit us like a train in 3 to 12 months at the same time some specific capabilities breakthroughs people have been working on for the entire history of ML finally start working now that they have a weak AGI to apply to, and suddenly critch's stuff becomes super duper important to understand."
October 2024
John Pressman: "6-12 month prediction (80%): The alignment problem as the core of AI X-Risk will become a historical artifact as it's largely solved or on track to being solved in the eyes of most parties and arguments increasingly become about competition and misuse. Few switch sides."
July 2025
Jessica Taylor: "Wouldn't be surprised if this exact prompt got solved, but probably something nearby that's easy for humans won't be solved?"
The prompt: "Find a sequence of words that is: - 20 words long - contains exactly 2 repetitions of the same word twice in a row - contains exactly 2 repetitions of the same word thrice in a row"
(note: thread contains variations and a harder problem.)
November 2026
Max Tegmark: "It's crazy how the time left to weak AGI has plummeted from 20 years to 3 in just 18 months on http://metaculus.com. So you better stop calling AGI a 'long-term' possibility, or someone might call you a dinosaur stuck in the past"
The Metaculus question.
Siqi Chen: "what it means is within 3 years you will either be dead or have a god as a servant".
Elon Musk: "If you say 'smarter than the smartest human at anything'? It may not quite smarter than all humans - or machine-augmented humans, because, you know, we have computers and stuff, so there's a higher bar... but if you mean, it can write a novel as good as JK Rowling, or discover new physics, invent new technology? I would say we are less than 3 years from that point."
December 2026
Jai Bhavnani: "Baseline expectation: 90%+ of smart contracts will get exploited in the next 3 years. These exploits will be found by AIs. We need solutions."
October 2028
Stuart Russell: "Everyone has gone from 30-50 years, to 3-5 years."
November 2028
Tammy: "when i say 'we have approximately between 0 and 5 years' people keep thinking that i'm saying 'we have approximately 5 years'. we do not have approximately 5 years. i fucking wish. we have approximately between 0 and 5 years. we could actually all die of AI next month."
December 2028
Tyler John: "Yep. If discontinuous leaps in AI capabilities are 3-5 years away we should probably start to think a little bit about how to prepare for that. The EU AI Act has been in development for 5 years and still isn't passed yet. We just can't take the wait and see approach any longer."
Mustafa Suleyman: "[Current models have already] ... arguably passed the Turing Test.
I've proposed a test which involves [AIs] going off and taking $100,000 investment, and over the course of three months, try to set about creating a new product, researching the market, seeing what consumers might like, gen...

Jan 1, 2024 • 17min
LW - Planning to build a cryptographic box with perfect secrecy by Lysandre Terrisse
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Planning to build a cryptographic box with perfect secrecy, published by Lysandre Terrisse on January 1, 2024 on LessWrong.
Summary
In September 2023, I started learning a lot of math and programming skills in order to develop the safest cryptographic box in the world (and yes, I am aiming high). In these four months, I learned important things you may want to know:
Fully Homomorphic Encryption (FHE) schemes with perfect secrecy do exist (see the toy single-operation sketch after this list).
These FHE schemes do not need any computational assumption.
These FHE schemes are tractable (in the worst case, encrypting a program before running it makes it three times slower).
We can therefore run infinitely dangerous programs without obtaining any information about them or their outputs. This may be useful in order to run a superintelligence without destroying the world.
However, these schemes work only on quantum computers.
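As a toy illustration (mine, not from the post) of how an encryption scheme can be both perfectly secret and homomorphic, consider the one-time pad, which is homomorphic for a single operation (XOR); the FHE schemes discussed in the post support arbitrary computation and are far more involved:

```python
import secrets

def otp_encrypt(m: int, k: int) -> int:
    """One-time pad over 8-bit values: perfectly secret, since every ciphertext
    is equally likely for any message when the key is uniform."""
    return m ^ k

# Two messages, each encrypted under its own uniformly random key.
m1, m2 = 0b10110010, 0b01101100
k1, k2 = secrets.randbelow(256), secrets.randbelow(256)
c1, c2 = otp_encrypt(m1, k1), otp_encrypt(m2, k2)

# Homomorphism for XOR: combining the ciphertexts yields an encryption of
# m1 ^ m2 under the combined key k1 ^ k2, without decrypting either input.
assert (c1 ^ c2) ^ (k1 ^ k2) == m1 ^ m2
print("XOR of plaintexts recovered:", bin((c1 ^ c2) ^ (k1 ^ k2)))
```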
In this post, I will first talk about how I learned about this FHE scheme, then I will explain my plan for making this cryptographic box, and finally, I will mention some ethical concerns about this cryptographic box.
Before reading this post, I recommend you read this post by Paul Christiano, and the comments that go with it. These are very informative, and they sharpened my views for this project. Paul Christiano presents a way to extract a friendly AI from an unfriendly one. Since this is only one example of what can be done with a cryptographic box, I will mostly consider cryptographic boxes as a solution to a problem that I call the malign computation problem.
Introduction
In August 2022, I started reading AGI Safety Literature Review. At one point, the authors say this:
One way to box an AGI is to homomorphically encrypt it. Trask (2017) shows how to train homomorphically encrypted neural networks. By homomorphically encrypting an AGI, its predictions and actions also come out encrypted. A human operator with the secret key can choose to decrypt them only when he wants to.
When I read this for the first time, I told myself that I should check this work because it seemed important.
And then I completely forgot about it.
Then, in April 2023, during a PHP lesson, I realized that the problem of processing a request made by a malevolent user is similar to the problem of boxing a superintelligence. After the lesson, I asked the teacher how to prevent code injections, and he gave me two answers:
Do not show your code to the public. This answer didn't convince me, because even current hackers know how to go around this precaution.
Encrypt the request before processing it. This is the moment I remembered the quote from AGI Safety Literature Review.
After looking back through every note that I made about AI Safety, I managed to track down Trask's work.
Trask's work
Trask's post shows how to build an encrypted AI using the Efficient Integer Vector Homomorphic Encryption. However, since this scheme (along with every other FHE scheme I know about on classical computers) relies on computational assumptions, we have some problems:
The scheme may not be safe. A computational assumption consists of stating "There is no efficient way to solve this problem". However, we do not know how to prove any such statement, as this would settle the P vs. NP problem. Most FHE schemes (including this one) depend on the Learning With Errors (LWE) problem. Although LWE is quite secure for the moment, I won't bet the existence of all life on Earth on it. Similarly, I won't bet the safety of a superintelligence on it.
This scheme takes too long to compute. In practice, the first superintelligence will probably have more than a hundred billion weights and biases, making this scheme very expensive or even unusable.
This scheme isn't fully homomorphic. Basically, a cryptographic scheme is said to be homomorphic when we can run s...


