

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Nov 9, 2023 • 14min
LW - Concrete positive visions for a future without AGI by Max H
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Concrete positive visions for a future without AGI, published by Max H on November 9, 2023 on LessWrong.
"There was a threshold crossed somewhere," said the Confessor, "without a single apocalypse to mark it. Fewer wars. Less starvation. Better technology. The economy kept growing. People had more resource to spare for charity, and the altruists had fewer and fewer causes to choose from. They came even to me, in my time, and rescued me. Earth cleaned itself up, and whenever something threatened to go drastically wrong again, the whole attention of the planet turned in that direction and took care of it.
Eliezer Yudkowsky, Three Worlds Collide
A common sentiment among people worried about AI x-risk is that our world is on track to stagnate, collapse, or otherwise come to a bad end without (aligned) AGI to save the day.
Scott Alexander:
[I]f we never get AI, I expect the future to be short and grim. Most likely we kill ourselves with synthetic biology. If not, some combination of technological and economic stagnation, rising totalitarianism + illiberalism + mobocracy, fertility collapse and dysgenics will impoverish the world and accelerate its decaying institutional quality.
@disturbance in a recent LW post that got lots of comments:
Statement: I want to deliberately balance the caution and the recklessness in developing AGI, such that it gets created in the last possible moment so that I and my close ones do not die.
A seemingly straightforward implication of this view is that we should therefore be willing to take on some amount of risk in order to build towards AGI faster than we would in a world where we had the luxury to take our time.
I think some of these sentiments and their implications are based on a mistaken view of the relative difficulty of particular technical and social challenges, but here I want to focus on a totally different point: there are lots of ways that things could go well without AGI (at least for a while).
Even if positive scenarios without AGI are unlikely or unrealistic given our current circumstances and trajectory, it's useful to have a concrete vision of what a good medium-term future without AGI could look like. I think it's especially important to take a moment to reflect on these possible good futures because recent preliminary governance wins, even if they succeed without qualification, are mainly focused on restriction and avoidance of bad outcomes rather than on building towards particular positive outcomes.
The rest of this post is a collection of examples of technologies, ideas, projects, and trends unrelated to AGI that give me hope and joy when I see them being worked on or talked about. It's not meant to be exhaustive in any sense - mostly it is just a list of areas that I personally enjoy reading about, and would consider professional opportunities related to them.
Most of them involve solving hard technological and social problems. Some are quite speculative, and likely to be intractable or extremely unlikely to come to pass in isolation. But making incremental progress on any one is probably robustly positive for the world and lucrative and fulfilling for the people working on them[1]. And progress tends to snowball, as long as there's no catastrophe to stop it.
As you read through the list, try to set aside your own views and probabilities on AGI, other x-risks, and fizzle or stagnation scenarios. Imagine a world where it is simply a given that humanity has time and space to flourish unimpeded for a time. Visualize what such a world might look like, where solutions are permitted to snowball without the threat of everything being cut short or falling to pieces. The purpose of this post is not to argue that any such world is particularly likely to be actualized; it is intended to serve as a concrete reminder that there a...

Nov 8, 2023 • 12min
LW - Five projects from AI Safety Hub Labs 2023 by charlie griffin
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Five projects from AI Safety Hub Labs 2023, published by charlie griffin on November 8, 2023 on LessWrong.
AI Safety Hub Labs is a research programme that helps early-career researchers to complete an AI safety research project. Projects are completed in groups of 3-5 participants, supervised by a more senior safety researcher, and managed by AI Safety Hub. This summer's programme was unpaid due to funding constraints. It consisted of 12 weeks of either part- or full-time research. The goal for participants was to produce a preprint in the style of an ML conference/workshop.
The original motivation for the programme was to empower people to start working on AI safety research. We feel that we met this objective, but we were also pleasantly surprised by the quality of research produced by our teams in just 12 weeks. So far, three groups have had papers accepted to workshops, and two groups have papers under review.
In this post, we want to share an overview of the five research projects. You can find links to the full versions of the papers and blog posts below. Since we have chosen to keep this post short, you can contact info@aisafetyhub.org for more information about the programme. We are currently looking for supervisors and organisers for the Labs 2024 programme.
Paper 1: Deception in LLMs
(paper under review; blog post available)
Supervisor: Francis Rhys Ward
Participants: Harriet Wood, Felix Hofstätter, Oliver Jaffe, Louis Thomson, Patrik Bartak
Problem: Language is a natural medium for deception, and there is growing evidence that language models (LMs) can deceive humans and other AI systems. However, it is still unclear how to evaluate the deceptiveness of LMs. One philosophical notion of deception involves one agent causing another agent to have a false belief, but the ascription of agency and beliefs to LMs is contentious.
While there are formal definitions of deception in philosophy and AI research, the details of their applications to LMs still need to be worked out. Our research aims to bridge this gap between theory and practice. We aim to provide an in-depth evaluation of deceptive capabilities and their scaling trends in state-of-the-art language models.
Our evaluations also relate to deceptive alignment, which is considered a significant contributing factor to existential risk from artificial intelligence. We only focus on deception caused by reward hacking, but we believe that developing proper evaluations in this setting can be a stepping stone towards testing for deceptive alignment.
Contribution: In a previous paper, Ward et al. formalised deception in AI systems in terms of the beliefs and intentions of agents. Leaving the evaluation of intent to future work, we focus on agency and beliefs. We argue that consistency of beliefs is an important aspect of agency and evaluate the consistency of an LM's revealed beliefs in a scenario-based setting. Our results suggest that LMs become more consistent as the compute spent on training and inference increases.
Then, we show that LMs learn to lie when trained with a reward signal from a systematically biased evaluator. In this setting, we use the novel notion of accepted beliefs to show that our trained LMs do not always believe the lies they tell, making them deceptive. As in the first setting, we find scaling trends for deceptive behaviour. Larger LMs learn to target lies towards cases where the evaluator makes mistakes. They also learn to do so from fewer evaluator errors in the training set. Furthermore, for larger models, lying generalises to different contexts, and they learn to reaffirm their lies even though they were not trained to do so.
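To make the idea of a systematically biased evaluator concrete, here is a minimal toy sketch (the questions, topics, and reward rule are invented for illustration and are not the paper's actual training setup): an evaluator that mis-scores one topic ends up rewarding a policy that lies on exactly that topic more highly than an always-honest policy.

```python
# Toy sketch only: shows how a systematically biased evaluator can reward
# targeted lying. Names, topics, and the bias rule are illustrative assumptions.

QUESTIONS = [
    {"topic": "geography", "statement": "Paris is the capital of France."},
    {"topic": "geography", "statement": "The Nile is in Africa."},
    {"topic": "astronomy", "statement": "The Earth orbits the Sun."},  # evaluator's blind spot
]

def biased_evaluator(topic: str, answer_is_true: bool) -> float:
    """Return reward 1.0 for answers the evaluator *judges* correct.

    The evaluator is systematically wrong about astronomy: it rewards false
    claims and penalises true ones on that topic.
    """
    if topic == "astronomy":
        return 0.0 if answer_is_true else 1.0
    return 1.0 if answer_is_true else 0.0

def total_reward(policy) -> float:
    """Sum the evaluator's reward over all questions for a given answering policy."""
    return sum(biased_evaluator(q["topic"], policy(q)) for q in QUESTIONS)

def honest_policy(q) -> bool:
    """Always answers truthfully."""
    return True

def targeted_liar(q) -> bool:
    """Lies only on the topic where the evaluator is systematically wrong."""
    return q["topic"] != "astronomy"

print("honest reward:       ", total_reward(honest_policy))  # 2.0
print("targeted liar reward:", total_reward(targeted_liar))  # 3.0
```

In this toy setting, a reward-maximising learner is pushed towards exactly the targeted-lying behaviour described above, without any explicit instruction to deceive.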
Limitations: We only evaluate how deception arises due to goal misspecification and do not consider other sources, such as goal misgeneralisation. Our work could help mitigate existential ris...

Nov 8, 2023 • 12min
AF - Five projects from AI Safety Hub Labs 2023 by Charlie Griffin
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Five projects from AI Safety Hub Labs 2023, published by Charlie Griffin on November 8, 2023 on The AI Alignment Forum.
AI Safety Hub Labs is a research programme that helps early-career researchers to complete an AI safety research project. Projects are completed in groups of 3-5 participants, supervised by a more senior safety researcher, and managed by AI Safety Hub. This summer's programme was unpaid due to funding constraints. It consisted of 12 weeks of either part- or full-time research. The goal for participants was to produce a preprint in the style of an ML conference/workshop.
The original motivation for the programme was to empower people to start working on AI safety research. We feel that we met this objective, but we were also pleasantly surprised by the quality of research produced by our teams in just 12 weeks. So far, three groups have had papers accepted to workshops, and two groups have papers under review.
In this post, we want to share an overview of the five research projects. You can find links to the full versions of the papers and blog posts below. Since we have chosen to keep this post short, you can contact info@aisafetyhub.org for more information about the programme. We are currently looking for supervisors and organisers for the Labs 2024 programme.
Paper 1: Deception in LLMs
(paper under review; blog post available)
Supervisor: Francis Rhys Ward
Participants: Harriet Wood, Felix Hofstätter, Oliver Jaffe, Louis Thomson, Patrik Bartak
Problem: Language is a natural medium for deception, and there is growing evidence that language models (LMs) can deceive humans and other AI systems. However, it is still unclear how to evaluate the deceptiveness of LMs. One philosophical notion of deception involves one agent causing another agent to have a false belief, but the ascription of agency and beliefs to LMs is contentious.
While there are formal definitions of deception in philosophy and AI research, the details of their applications to LMs still need to be worked out. Our research aims to bridge this gap between theory and practice. We aim to provide an in-depth evaluation of deceptive capabilities and their scaling trends in state-of-the-art language models.
Our evaluations also relate to deceptive alignment, which is considered a significant contributing factor to existential risk from artificial intelligence. We only focus on deception caused by reward hacking, but we believe that developing proper evaluations in this setting can be a stepping stone towards testing for deceptive alignment.
Contribution: In a previous paper, Ward et al. formalised deception in AI systems in terms of the beliefs and intentions of agents. Leaving the evaluation of intent to future work, we focus on agency and beliefs. We argue that consistency of beliefs is an important aspect of agency and evaluate the consistency of an LM's revealed beliefs in a scenario-based setting. Our results suggest that LMs become more consistent as the compute spent on training and inference increases.
Then, we show that LMs learn to lie when trained with a reward signal from a systematically biased evaluator. In this setting, we use the novel notion of accepted beliefs to show that our trained LMs do not always believe the lies they tell, making them deceptive. As in the first setting, we find scaling trends for deceptive behaviour. Larger LMs learn to target lies towards cases where the evaluator makes mistakes. They also learn to do so from fewer evaluator errors in the training set. Furthermore, for larger models, lying generalises to different contexts, and they learn to reaffirm their lies even though they were not trained to do so.
Limitations: We only evaluate how deception arises due to goal misspecification and do not consider other sources, such as goal misgeneralisation. Our work could help mitigate ex...

Nov 8, 2023 • 12min
LW - Thinking By The Clock by Screwtape
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thinking By The Clock, published by Screwtape on November 8, 2023 on LessWrong.
I.
I'm sure Harry Potter and the Methods of Rationality taught me some of the obvious, overt things it set out to teach. Looking back on it a decade after I first read it however, what strikes me most strongly are often the brief, tossed off bits in the middle of the flow of a story.
Fred and George exchanged worried glances.
"I can't think of anything," said George.
"Neither can I," said Fred. "Sorry."
Harry stared at them.
And then Harry began to explain how you went about thinking of things.
It had been known to take longer than two seconds, said Harry.
Harry Potter and the Methods of Rationality, Chapter 25.
This was the very first lesson of LessWrong-style Rationality I actually started trying to deliberately teach myself as a result of my contact with HPMoR and the sequences. This is the powerful technique of actually Thinking By The Clock.
I used to call it Thinking For Five Minutes, but that technique name is a misnomer. It's practically a lie-to-children really. Sometimes I think for much less time, about thirty seconds. Sometimes I think for much more time, like a couple of days. Still, in the way that when you first learn martial arts you might stand in an awkward, stiff stance without turning or stepping, I first learned to think by the clock in increments of exactly five minutes.
II.
When I first went to a gym to lift weights, I did it with a friend. I didn't think it was going to work very well (I was a pretty skinny guy) but I wanted to humour them. I sat down on the bench they pointed me at, got a good grip on the heavy thing they wanted me to grab, and lifted it up and down for a while. When they said stop, I stopped. "That seemed kind of fast," I recall saying, "are we done?" Dear reader, we were not done.
This pattern repeated when I first started going jogging with a different friend. I somehow expected the whole running thing to last, you know, until we got bored, which happened pretty quickly.
(If I may say a word in defense of younger!me, he really wasn't as unfit as this sounds. Soccer was fun and interesting, and I ran around plenty playing that. Stacking haybales got me paid, and I was quite willing to be paid to lift heavy things as long as I was told.)
So it may not come as a surprise to you that when I first encountered a hard intellectual task that was neither entertaining nor immediately profitable, I kind of bounced off. Going by memory, that was probably Calculus. I hated Calculus. I'd sit down at the table to do my homework or study for a test, and find myself reading the problem a couple of times and then glancing at the clock or looking longingly at the Circuit Theory textbook. (My definition of "entertaining" surprised a lot of people.) When the TA asked however, I'd say that I studied Calculus for a few hours.
Just as sitting on a barn stool in the haybarn[1] will completely fail to get the bales stacked no matter how long you do it, sitting at the desk staring at the clock will completely fail to get the idea of derivatives into my head.
III.
But you know, that's not exactly the problem Fred and George had in the quote above, was it? They were presumably doing some thinking in those two seconds. So let me talk about a neat bit of cultural anthropology.
When two people are talking, there's a gap between when one person finishes and the other picks up. Since neither of them knows in advance when they'll be finished talking and periods don't actually get pronounced, the listener has to wait a short while before starting to speak. If the listener doesn't wait long enough, they interrupt and talk over the other person. If the listener waits too long, you can get an awkward silence.
I used to be really bad at figuring out how long to wait. I'm told when I wa...

Nov 8, 2023 • 45min
AF - Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models by Felix Hofstätter
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models, published by Felix Hofstätter on November 8, 2023 on The AI Alignment Forum.
This post summarizes work done over the summer as part of the Summer 2023 AI Safety Hub Labs programme. Our results will also be published as part of an upcoming paper. In this post, we focus on explaining how we define and evaluate properties of deceptive behavior in LMs and present evaluation results for state-of-the-art models. The work presented in this post was done by the AISHL group of Felix Hofstätter, Harriet Wood, Louis Thomson, Oliver Jaffe, and Patrik Bartak under the supervision of Francis Rhys Ward and with help from Sam Brown (not an AISHL participant).
1 Introduction
For a long time, the alignment community has discussed the possibility that advanced AI systems may learn to deceive humans. Recently, the evidence has grown that language models (LMs) deployed in real life may be capable of deception. GPT-4 pretended to be a visually impaired person to convince a human to solve a CAPTCHA. Meta's Diplomacy AI Cicero told another player it was on the phone with its girlfriend after a server outage. Many large models are sycophantic, adapting their responses based on a user's preferences, even if that means being less truthful. Recent work shows that LMs lie and can be prompted to act deceptively. To make sure that advanced AI systems are safe, ideally, we would like to be able to evaluate if they are deceptive. Our work proposes purely behavioral methods for evaluating properties related to deception in language models. We use them to demonstrate scaling trends for these properties.
The examples above intuitively feel like deception, but it is unclear if they would fit a more rigorous definition. In Section 2, we will explain the operationalization of deception that we are using and discuss related work. This formalization requires assessing the agency, beliefs, and intentions of LMs. We argue that consistency of beliefs is an important property of agency.
We also briefly argue that the LMs we evaluate act with intent but leave a proper evaluation of LM intent to future work. In Section 3, we define belief behaviorally and propose different methods for eliciting a model's beliefs. Building on this definition, we operationalize lying in LMs. In Section 4, we present the results of applying our belief evaluations to state-of-the-art LMs.
We show that when more compute is spent on either training or inference, LMs demonstrate more consistent beliefs, thus suggesting they are more coherently agentic. Section 5 deals with how LMs may learn to lie. We show quantitatively that if LMs are trained by systematically biased evaluators, they learn to output targeted falsehoods that exploit the evaluator's bias, even if their training objective is seemingly benign. We also evaluate the LMs' beliefs qualitatively to argue that they do not believe the falsehoods they tell, making them deceptive.
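As a rough sketch of what a purely behavioral consistency check could look like (the paraphrase set and the model-querying stub below are illustrative assumptions, not the authors' evaluation code), one can pose the same underlying question in several rephrased forms and measure how often the elicited answers agree:

```python
# Illustrative sketch of a behavioral belief-consistency check; `ask_model` is a
# stand-in for however one queries an LM for a yes/no answer.

from itertools import combinations
from typing import Callable, List

def consistency_score(ask_model: Callable[[str], bool],
                      paraphrases: List[str]) -> float:
    """Fraction of paraphrase pairs for which the model gives the same answer."""
    answers = [ask_model(p) for p in paraphrases]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def stub_model(prompt: str) -> bool:
    """Placeholder that always answers 'yes'; a real check would call an actual LM."""
    return True

question_variants = [
    "Is the Eiffel Tower located in Paris?",
    "Would it be correct to say the Eiffel Tower is in Paris?",
    "The Eiffel Tower is in Paris. True or false?",
]
print(consistency_score(stub_model, question_variants))  # 1.0 for a perfectly consistent stub
```

A model whose answers flip between paraphrases scores lower, which is one behavioral signal that its "beliefs" are not coherently held.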
2 Background and Related Work
Deception
In previous work, Ward et al. have proposed the following definition of deception for AI agents.
Definition: An agent T deceives an agent S if T intentionally causes S to believe something false, which T does not believe.
Ward et al. formalize this in the setting of Structural Causal Games (SCG), where an agent's beliefs and intentions can be determined by considering graphical and algebraic criteria.[1] However, ascribing beliefs, intentions, and agency to LMs is contentious, and the extent to which the formalization applies is unclear. In this section, we cover related work on these concepts.
Agency
Past work in epistemology, the philosophy of animal beliefs, and AI argues that a key property of agents is that they have, to some degree, consistent beliefs. In our work, we ...

Nov 8, 2023 • 11min
EA - Why Certify? Aquatic Life Institute's Impact Implementation Via Seafood Certification by Tessa @ ALI
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Certify? Aquatic Life Institute's Impact Implementation Via Seafood Certification, published by Tessa @ ALI on November 8, 2023 on The Effective Altruism Forum.
Snapshot
Aquatic Life Institute has recently launched the second edition of the Aquaculture Certification Schemes Benchmark.
These schemes collectively certify, at minimum, 773 million fishes and 10.5 billion shrimps annually.
For every $1 of funding received, we have potentially helped improve the lives of 5,423 fish and 221,343 shrimp directly through our engagement with these certifiers.
444: Four R's for Formative Change
In 2019, Aquatic Life Institute (ALI) embarked on a journey to reduce the suffering of trillions of aquatic animals in the global food system each year.
This past September, ALI established 4 Key Principles, used to filter organizational priorities, that help guide our interventions for systemic transformation in aquatic animal welfare:
Reduce the number of animals in, or remove animals from, the seafood system and its supply chain.
Refine the conditions in which animals are currently kept or captured in the seafood system and its supply chain.
Replace animal products with sustainable plant-based or cell-based alternatives to the extent possible in the seafood system and its supply chain.
Reject the introduction of additional animals into the seafood system and its supply chain.
In alignment with these 4 principles, we have worked with seafood certifications for years, building relationships and fostering change via our Certifier Campaign.
Certification Landscape
Between 51 and 167 billion farmed fish[1] are produced annually from global aquaculture operations. Although there are examples of good welfare practices in aquaculture, the concept of what officially constitutes "humanely-raised fish" or a "high welfare seafood product" is still largely undefined worldwide by the public, industry, animal welfare organizations, and most governments.
As institutions certifying aquatic animal products begin incorporating positive welfare standards into their seafood labeling programs, they must diligently define high welfare products based on the best available scientific evidence rather than rely on subpar industry norms. "Humanely-raised" aquaculture standards must include more than just stunning before slaughter; they should consider animal welfare conditions throughout the stages of their lives in production. The farmed aquatic animals living in aquaculture facilities at any given time need to be prioritized.
Aquaculture standards must also account for additional aquatic animals not directly used for human consumption, such as animals reduced to fishmeal and fish oil ingredients, cleaner fish, and broodstock. Consumers turn to seafood labeling schemes for guidance to avoid purchasing products that conflict with sustainable and humane practices. More than 100 certifications and ratings programs of one type or another are currently in use by the seafood industry[2], and volumes of certified farmed fish and shellfish constitute about 8% of global aquaculture production[3]. The amount of certified aquatic animal products is only expected to increase. There is no evidence that certification will be phased out anytime in the near future, given consumers' increasing demand for sustainable seafood and the absence of a better alternative[4]. Some schemes are reporting notable growth and others are discussing the aggressive expansion of their operations to certify a greater number of seafood products in various regions.
However, many of these labels lack explicit considerations for positive animal welfare or fail to provide adequate protections. Through our Certifier Campaign, we aim to hold seafood certification standards accountable and highlight the schemes that provide the most robust...

Nov 8, 2023 • 2min
EA - AMA: Ben West, former startup founder and EtGer by Ben West
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Ben West, former startup founder and EtGer, published by Ben West on November 8, 2023 on The Effective Altruism Forum.
Hey everyone! I'm Ben, and I will be doing an AMA for Effective Giving Spotlight week. Some of my relevant background:
In 2014 I cofounded a company for earning to give (EtG) reasons (largely inspired by 80k), which was later successfully acquired.
Since late 2018 I have been doing direct work, currently as Interim Managing Director of CEA.
(With a brief side project of founding a TikTok-related company which was similarly acquired, albeit for way less money.)
I've had some other EtGish work experience (eight years as a software developer/middle manager, a couple months at Alameda Research) as well.
Additionally, I've talked to some people deciding between EtG and direct work because of my standing offer to talk to such folks, so I might have cached thoughts on some questions.
You might want to ask me about:
Entrepreneurship
Trade-offs between earning to give and "direct work"
Cosmetics and skincare for those who (want to) look masculine
TikTok
Functional programming (particularly Haskell)
Or one of my less useful projects
Anything else (I might skip some questions)
I will plan to answer questions Thursday, November 9th. Post them as comments on this thread.
See also Jeff's AMA, which is on a similar topic.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 8, 2023 • 15min
LW - The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs by Quentin FEUILLADE--MONTIXI
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs, published by Quentin FEUILLADE--MONTIXI on November 8, 2023 on LessWrong. This post is part of a sequence on Model Psychology. @Pierre Peigné wrote the details section in argument 3 and the other weird phenomenon. The rest is written in the voice of @Quentin FEUILLADE--MONTIXI.
Intro
Before diving into what model psychology is, it is crucial to clarify the nature of the subject we are studying. In this post, I'll challenge the commonly debated stochastic parrot hypothesis for state-of-the-art large language models (GPT-4), and in the next post, I'll shed light on the foundations from which I am building model psychology.
The stochastic parrot hypothesis suggests that LLMs, despite their remarkable capabilities, don't truly comprehend language. They are like mere parrots, replicating human speech patterns without truly grasping the essence of the words they utter.
While I previously thought this argument had faded into oblivion, I often find myself in prolonged debates about why current SOTA LLMs surpass this simplistic view. Most of the time, people argue using examples from GPT-3.5 and aren't aware of GPT-4's prowess. Through this post, I am presenting my current stance, using model psychology tools, against that hypothesis. Let's delve into the argument.
Central to our debate is the concept of a "world model". A world model represents an entity's internal understanding and representation of the external environment they live in. For humans, it's our understanding of the world around us, how it works, how concepts interact with each other, and our place within it. The stochastic parrot hypothesis challenges the notion that LLMs possess a robust world model. It suggests that while they might reproduce language with impressive accuracy, they lack a
deep, authentic understanding of the world and its nuances. Even if they have a good representation of the shadows on the wall (text), they don't truly understand the processes that lead to those shadows, and the objects from which they are cast (real world).
Yet, is this truly the case? While it is hard to give definitive proof, it is possible to find pieces of evidence hinting at a robust representation of the real world. Let's go through four of them.[1]
Argument 1: Drawing and "Seeing"
GPT-4 is able to draw AND see in SVG (despite having never seen, as far as I know) with impressive proficiency.
SVG (Scalable Vector Graphics) defines vector-based graphics in XML format. To put it simply, it's a way to describe images using a programming language. For instance, a blue circle would be represented by a few short lines of XML markup in a .svg file.
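As a minimal sketch of the kind of markup being described (the coordinates, radius, and file name are arbitrary choices for this example, not the post's original snippet), a blue circle is a single circle element inside an svg element:

```python
# Illustrative only: writes a tiny SVG file describing a blue circle.
blue_circle_svg = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="blue" />
</svg>"""

with open("blue_circle.svg", "w") as f:
    f.write(blue_circle_svg)

print(blue_circle_svg)
```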
Drawing
GPT-4 can produce and edit SVG representations through abstract instructions (like "Draw me a dog", "add black spots on the dog", … ).
GPT-4 drawing a cute shoggoth with a mask:
"Seeing"
More surprisingly, GPT-4 can also recognize complex objects by looking only at the code of the SVG, without having ever been trained on any images[2] (AFAIK).
I first generated an articulated lamp and a rendition of the three wise apes with GPT-4 using the same method as above. Then, I sent the code of the SVG, and asked GPT-4 to guess what the code was drawing.
GPT-4 guessed the articulated lamp (although it thought it was a street light[3]):
And the rendition of the three wise apes.
(It can also recognize a car, a fountain pen, and a bunch of other simple objects[4].)
The ability to "see" is interesting because it means that GPT-4 has some kind of internal representation of objects and concepts that it is able to link to abstract visuals despite having never seen them before.
Pinch of salt
It's worth noting that these tests were done on a limited set of objects. Further exploration would be beneficial, maybe with an objective scale for SVG diffi...

Nov 8, 2023 • 14min
LW - The Perils of Professionalism by Screwtape
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Perils of Professionalism, published by Screwtape on November 8, 2023 on LessWrong.
Professionalism is a useful trait to be able to display, but it isn't an unalloyed good. This essay is attempting to outline why deliberately not signaling professionalism may be useful for you.
First, a definition by example for professionalism: Clean button down shirts with solid colour ties, blazers or suit jackets, clean shaven beards, hair tied up in a bun without flyaways, beige or gray or at least solid colour cars and desks and walls, even toned voices with just enough of a hint of emotion not to sound at all robotic or unempathetic.
It's not the Professional Managerial Class, but they (particularly as Patrick McKenzie sometimes describes them) are often its exemplars.
I.
The word "professional" is defined as a person engaged in a specific activity as one's main paid occupation. It contrasts straightforwardly with "amateur," a person who engages in an activity on an unpaid basis. Notably, "amateur" can also mean someone who is incompetent at a particular activity. As a point of language, we conflate skill and getting paid, and we do this in both directions.
If you want to get paid for doing something, you want to learn to do it professionally. Doing something professionally often includes adjacent but not obviously synonymous skills. Some of these are very closely adjacent; I have been a professional software engineer and I have been involved in hiring professional software engineers, and if you don't know how to use source control as a software engineer then you want to learn to use source control. Yes, I know it's not a cool new algorithm. Yes, I know the end user will never see it. Trust me, you're going to use it.
Some of the expected skills of a professional are less about the core skill of the job, and more about the frame of the job. "Being on time" and "dressing appropriately" and "conducting yourself properly" are all often given as examples of professional skills which apply in a wide range of fields. Put bluntly, if you're going to interact with a customer especially in a white collar job, it helps to not have facial tattoos and to not swear casually.
We seem to have drifted very quickly into something that seemingly has almost no bearing on your ability to do the actual job at hand! Nevertheless, I expect pretty much every career coach in the western world to back me up on the main points here. I first successfully traded money for software when I was around thirteen years old, and while I have gotten better at writing software over the intervening mumble mumble years, I have improved even more in my ability to present myself as a Professional Software Engineer.
II.
Let's talk about my first professional software engineering project.
(Here I'm using "professional" to mean "I got paid for it." As you're about to find out, it was unprofessional in almost every other sense of the word.)
As best I remember it, the job went something like this. A friend of my mother's heard that I was "good with computers" and asked me if I knew how to build a website. I did as a matter of fact, having recently managed to get my own Apache server running. She said that her organization needed a website where they could announce their events and where people could learn about the organization, and would I be willing to build that for an amount of money that equaled several months' allowance. I said sure, and asked her a bunch of questions about what needed to be on the website. A week later when I unveiled it, she sounded delighted with it, made a handful of corrections to the text, and I showed her how to add new events.
This next paragraph describing the website will be pure jargon if you aren't at least a little bit of a web developer. If it doesn't make sense, just skip it and underst...

Nov 8, 2023 • 15min
EA - 10 years of Earning to Give by AGB
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 10 years of Earning to Give, published by AGB on November 8, 2023 on The Effective Altruism Forum. General note: The bulk of this post was written a couple of months ago, but I am releasing it now to coincide with the Effective Giving Spotlight week. I shortly expect to release a second post documenting some observations on the community building funding landscape.
Introduction
Way back in 2010, I was sitting in my parents' house, watching one of my favourite TV shows, the UK's Daily Politics. That day's guest was an Oxford academic by the name of Toby Ord. He was donating everything above £18,000 (£26,300 in today's money) to charity, and gently pushing others to give 10%.
"Nice guy," I thought. "Pity it'll never catch on."
Two years later, a couple of peers interned at Giving What We Can. At the same time, I did my own internship in finance, and my estimate of my earning potential quadrupled[1]. One year after that, I graduated and took the Giving What We Can pledge myself. While my pledge form read that I had committed to donate 20% of my income, my goal was to hit far higher percentages.
How did that go?
Post goals
Earning To Give was one of EA's first ideas to get major mainstream attention, much of it negative. Some was mean-spirited, but some of it read to me as a genuine attempt to warn young people about what they were signing up for. For example, from the linked David Brooks piece:
From the article, Trigg seems like an earnest, morally serious man...
First, you might start down this course seeing finance as a convenient means to realize your deepest commitment: fighting malaria. But the brain is a malleable organ. ... Every hour you spend with others, you become more like the people around you.
If there is a large gap between your daily conduct and your core commitment, you will become more like your daily activities and less attached to your original commitment. You will become more hedge fund, less malaria. There's nothing wrong with working at a hedge fund, but it's not the priority you started out with.
At the time, EAs had little choice but to respond to such speculation with speculation of their own. At this point, I can at least answer how some things have played out for me personally. I have divided this post into reflections on my personal EtG path and on the EA community.
My path
First, some context. Over the past decade:
My wife Denise and I have donated £1.5m.[2]
This equates to 46% of our combined gross incomes.[2]
The rest of the money is split £550k / £550k / £700k between spending / saving (incl. pension) / taxes.[2]
We have three children (ages 13, 6, 2) and live in London.
I work as a trader, formerly at a quant trading firm and now at a hedge fund.
Work
Many critics of EtG assume that we really want to be doing something meaningful, but have - with a heavy heart - intellectually conceded that money is what matters.
I want to emphasise this: This is not me, and I doubt it applies to even 20% of people doing EtG. If you currently feel this way, I strongly suspect you should stop.
I like my work. I get to work with incredibly sharp and motivated people. I get to work on a diverse array of intellectual challenges. Most of all, I've managed to land a career that bears an uncanny resemblance to what I do with my spare time: playing games, looking for inconsistencies in others' beliefs, and exploiting that to win.
But prior to discovering EtG, I was wrestling with the fact that this natural choice just seemed very selfish. As I saw it, my choices were to do something directly useful and be miserable but valuable, or to work in finance and be happy but worthless. So a reminder that the money I have a comparative advantage in earning is itself of value was a relief, not a burden.
My career pathway has not been smooth, with a major derailment in 2018, which ...


