

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes
Mentioned books

Nov 17, 2023 • 55min
LW - On the lethality of biased human reward ratings by Eli Tyre
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the lethality of biased human reward ratings, published by Eli Tyre on November 17, 2023 on LessWrong.
I'm rereading the List of Lethalities carefully and considering what I think about each point.
I think I strongly don't understand #20, and I thought that maybe you could explain what I'm missing?
20. Human operators are fallible, breakable, and manipulable. Human raters make systematic errors - regular, compactly describable, predictable errors. To faithfully learn a function from 'human feedback' is to learn (from our external standpoint) an unfaithful description of human preferences, with errors that are not random (from the outside standpoint of what we'd hoped to transfer).
If you perfectly learn and perfectly maximize the referent of rewards assigned by human operators, that kills them. It's a fact about the territory, not the map - about the environment, not the optimizer - that the best predictive explanation for human answers is one that predicts the systematic errors in our responses, and therefore is a psychological concept that correctly predicts the higher scores that would be assigned to human-error-producing cases.
I think that I don't understand this.
(Maybe one concrete thing that would help is just having examples.)
* * *
One thing that this could be pointing towards is the problem of what I'll call "dynamic feedback schemes", like RLHF. The key feature of a dynamic feedback scheme is that the AI system is generating outputs and a human rater is giving it feedback to reinforce good outputs and anti-reinforce bad outputs.
The problem with schemes like this is there is adverse selection for outputs that look good to the human rater but are actually bad. This means that, in the long run, you're reinforcing initial accidental misrepresentation, and shaping it into more and more sophisticated deception (because you anti-reinforce all the cases of misrepresentation that are caught out, and reinforce all the ones that aren't).
That seems very bad for not ending up in a world where all the metrics look great, but the underlying reality is awful or hollow, as Paul describes in part I of What Failure Looks Like.
It seems like maybe you could avoid this with a static feedback regime, where you take a bunch of descriptions of outcomes, maybe procedurally generated, maybe from fiction, maybe from news reports, whatever, and have humans score those outcomes on how good they are, to build a reward model that can be used for training. As long as the ratings don't get fed back into the generator, there's not much systematic incentive towards training deception.
...Actually, on reflection, I suppose this just pushes the problem back one step. Now you have a reward model which is giving feedback to some AI system that you're training. And the AI system will learn to adversarially game the reward model in the same way that it would have gamed the human.
That seems like a real problem, but it also doesn't seem like what this point from the list is trying to get at. It seems to be saying something more like "the reward model is going to be wrong, because there's going to be systematic biases in the human ratings."
Which, fair enough, that seems true, but I don't see why that's lethal. It seems like the reward model will be wrong in some places, and we would lose value in those places. But why does the reward model need to be an exact, high fidelity representation, across all domains, in order to not kill us? Why is a reward model that's a little off, in a predictable direction, catastrophic?
First things first:
What you're calling the "dynamic feedback schemes" problem is indeed a lethal problem which I think is not quite the same as Yudkowsky's #20, as you said.
"there's going to be systematic biases in the human ratings" is... technically correct, but I think a mi...

Nov 17, 2023 • 9min
EA - The Humane League - Room for More Funding & 2023 Impact by carolinemills
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Humane League - Room for More Funding & 2023 Impact, published by carolinemills on November 17, 2023 on The Effective Altruism Forum.
About The Humane League
The Humane League (THL) exists to end the abuse of animals raised for food. THL is laser focused on the efforts that have the biggest impact for the greatest number of animals. We are distinguished from other animal welfare organizations by the effectiveness of our corporate campaigns, our unique role as the most aggressive campaigners, and our approach to multiplying our movement's impact globally through the Open Wing Alliance (OWA) and in the US through the Animal Policy Alliance (APA). Our scalable interventions have a proven track record of reducing farm animal suffering - according to a 2019 Rethink Priorities report, our corporate cage-free campaigns affect nine to 120 years of a hens life per dollar spent, and have a follow-through rate of 48%-84% (we've found up to 89% in recent years)[1].
We are proud to be recognized by
Animal Charity Evaluators and
Founders Pledge as one of the most effective animal protection charities in the world.
"While we expect all of our evaluated charities to be excellent examples of effective advocacy, THL is exceptional even within that group. Giving to THL is an excellent opportunity to support initiatives that create the most positive change for animals." - Animal Charity Evaluators, 2023 THL evaluation report
Our Strategy & 2023 Impact
THL believes in focusing our collective energy where it will do the most good. Since chickens represent 90% of all land animals raised for food, any interventions we make for chickens have the greatest potential impact. And restrictive battery cages - small wire cages used to confine laying hens - are one of the worst sources of suffering for chickens. Ending the battery cage means ending the acute suffering of millions of birds.
Holding companies accountable to their cage-free commitments. Thousands of companies around the world have pledged to transition to 100% cage-free, eliminating the practice of confining hens in tiny, barren battery cages. Now, THL is holding these companies accountable, ensuring they keep their promises. Globally,
89% of companies followed through on their 2022 cage-free pledge. And in the US and globally, THL pushed the companies falling behind on their commitments to follow through on their promise. In 2023, THL held 36 companies with global cage-free commitments accountable to reporting progress on their pledges. Companies like
Kellogg's,
PepsiCo, and
Yum! Brands - the world's largest service restaurant company and the parent company of KFC, Pizza Hut, and Taco Bell - began publicly reporting on their cage-free commitments. All of this is translating to real change on the ground, with 39.4% of the US egg-laying flock free from cages (over ~120 million hens), up from ~5% when THL began this work in 2014.[3] (Global data is currently unavailable)
Progressing the cage-free movement globally. In addition to holding companies accountable for their existing commitments, THL is working to secure new cage-free commitments in key strategic areas around the world. Through the OWA, our coalition of nearly 100 member groups in 67 countries, THL is developing a global movement of effective animal advocates that conduct coordinated international and regional campaigns for layer hen and/or broiler chicken welfare. This year, the OWA pushed 103 global companies to pledge to rid their supply chains of cruel battery cages, including first cage-free commitments from corporations headquartered in Japan, the Middle East, Greece, Ukraine, Peru, Ecuador, South Africa, Argentina, South Korea, and Taiwan.
Jollibee Foods Corporation, the largest and fastest-growing restaurant group in Asia, pledged to reform its global supply chain,...

Nov 17, 2023 • 22min
LW - On Lies and Liars by Gabriel Alfour
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Lies and Liars, published by Gabriel Alfour on November 17, 2023 on LessWrong.
Note: I am a bad writer. If you want to help me by ghostwriting, reviewing or coaching, please reach out! I have a full stack of things to share, and I can pay well.
I've gotten quite a bit of feedback on my last post: "
Lying is Cowardice, not Strategy". Nice! I've been happy to see people discussing public honesty norms.
Some of the feedback was disappointing- people said things along the lines of: "Lies of omission are not lies! The fact that you have to add 'of omission' shows this!".
The worst that I have gotten in that vein was: "When you are on stand, you take an oath that says that you will convey the truth, the whole truth, and nothing but the truth. The fact that you have to add 'the whole truth' shows that lies of omission are not lies."
This was from a person that also straightforwardly stated "I plan to keep continuing to not state my most relevant opinions in public", as they were defending this behavior.
But I have also received some helpful feedback that pointed out the lack of nuance in my post. Indeed, reality is complex, and I can not address all of it in one post, so I must make a lot of simplifications.
Furthermore, I am not (yet?) a good writer, so even within one post, I have many opportunities for improvement. While some people take advantage of this through
isolated demands for rigor or empty terminological disputes, others are genuinely confused by a novel point or framing that they never considered from a writer whose mind they do not have full access to.
This post is dedicated to the latter audience: people who tried to understand and/or apply my ideas, but are confused, because there is a lot that I have just not written. To help with this, in this post, I'll go through some of the less straightforward points that I have glossed over.
Point #0: What do I mean by lies? Are lies always bad?
When I say that someone lies, I mean that they have communicated anti-information: information that predictably makes people become more wrong about the world or about what the locutor believes. This includes lies of commission, lies of omission, misdirection, and more.
Even though lying is bad, there can be good reasons to lie. The obvious example is lying to a murderer at your door, asking for where your friend is. There are also situations where lying is innocuous, such as bluffing games.
Likewise, even though punching people is bad, there can be situations where it's justified, such as self-defense. There are also situations where punching is innocuous, such as sparring at a boxing club.
Furthermore, communication is hard. That's why you can even lie accidentally. It is not always easy to know how your statements will be taken, especially across cultures and contexts. This is very similar to insulting people accidentally.
Finally, it is extremely easy to lie without saying anything that is technically false. You can imply things, omit relevant information, make things sound worse than they are, use double-speak and ambiguity, etc.
Point #1: Predictable patterns of lying are what makes someone a liar
I write about "liars"- this might seem like meaninglessly strong moral language.
So let's make it clear what I mean by it. When I say "liars", I am talking about people who have a predictable pattern of lying.
The relevant fact is not whether what a person states can be constructed as technically true or technically false, but whether they predictably make people more wrong over time.
Why is it useful to have a concept of "liars?" Because it helps us build practical models of people. Everyone lies. But sometimes, people predictably lie, and you can deduce a fair bunch from this.
Being a liar is not an intrinsic property of a person. The world is not divided into liars and no...

Nov 17, 2023 • 23min
AF - Coup probes trained off-policy by Fabien Roger
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Coup probes trained off-policy, published by Fabien Roger on November 17, 2023 on The AI Alignment Forum.
Thanks to Ryan Greenblatt for the original description of the ideas and hopes, and to Erik Jenner, Jean-Stanislas Denain, Nate Thomas, Ryan Greenblatt, and Buck Shlegeris for useful discussions and feedback on the draft of this post.
Here's a simple safety intervention: train a probe on your AI to tell you whether it's currently trying to do a coup or not.
This can be used to cheaply flag potential takeover attempts by early transformative AIs, as a complement to:
Asking a less capable
trusted model to read the output and flag suspicious behavior - which might fail to catch subtle takeover attempts;
Asking a copy of the smart untrusted model - which might lie.
How do you get the training data for coup probes? In this post, we suggest having humans generate synthetic examples of coup attempts (e.g. writing and exploiting a serious backdoor). Even if the model recognizes that the coup attempts are off-policy (i.e. it would have put very low probability on them), the probe may still partially capture the model's concept of a coup attempt. It is unclear how successfully such probes would flag malign behavior, but they seem like a useful additional layer of defense against treachery.
We present experiments measuring the generalization abilities of probes trained off-policy in a toy setting. We show that probes can generalize well to different text formats and also generalize from harmful text the LLM wouldn't output to harmful text where the LLM has been jailbroken to actually output the harmful text.
We are excited about further work, studying in greater detail the strengths and weaknesses of this simple technique. Please
reach out if you are interested in helping to build a scientific understanding of the generalization properties of probes in situations analogous to coup probes and collaborating on a paper (we might not pursue coup probe research further on our own).
What are coup probes?
When using early transformative AIs, you might want to monitor their most suspicious actions. How do you select the most suspicious actions? There are multiple possibilities, but one simple technique is coup probes:
Build a dataset of things the AI might do to try to take over (using the AI developer's best guesses about what it might do), such as inserting critical backdoors and taking advantage of them, or persuading users to help it gain power;
Train probes in the early transformative AIs to distinguish between regular outputs, and the synthetic takeover attempts from your takeover dataset (without training the rest of the weights of the network). Probes are linear classifiers trained on the AIs' activations - usually activations at different depths in the residual stream of Transformers.
Carefully audit the top-1% most suspicious actions of AIs according to the probes (and the other monitoring methods) before they are submitted to the scaffold through which the AIs interact with the codebase / its long-term memory.
This is not a silver bullet, but we think coup probes have a reasonable chance of catching the first real coup attempt (e.g. ~50%) if you use a false positive rate of 1%. Ensembling in other techniques could further improve our ability to select the most suspicious actions (though we won't talk about other approaches for the remainder of this post).
Testing generalization properties of probes on a toy task
Note: We didn't spend a lot of time on experiments, some of the results are surprising, and might therefore be wrong (p=10% that some of them are wrong).
The setting
Instead of studying coup attempts, we instead train probes to identify pairs of (instruction, answer) where the answer contains advice about how to commit theft. We have GPT-4 craft instructions whic...

Nov 17, 2023 • 12min
LW - On Tapping Out by Screwtape
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Tapping Out, published by Screwtape on November 17, 2023 on LessWrong.
I.
It has been said that rationality is akin to a martial art. Very well. If we're going to borrow from martial arts, let us borrow properly.
There is a technique known in some parts of the rationalist community called "Tapping Out." Tapping out in this context means you would like to exit an argument or debate. I believe this technique was first imported to LessWrong in this comment by Rain, and it is defined in this tag. As someone who has been practicing martial arts for most of his life, I have some thoughts on the ritual that is tapping out.
If you're unfamiliar with the term's origin, let me describe the physical form. Tapping out looks like slapping either the ground or the opponent three times in an open handed strike of light to medium force. It's about the amount of power you'd use to clap your hands, and in fact the sound is pretty similar to clapping. It doesn't have to be exactly three times either; if you're wrestling and your opponent keeps tapping you over and over, you let them go, you don't hold on because it was seven instead of three.
Tapping out can be more exactly codified in competitive martial arts like MMA matches or intercollegiate wrestling. It's also used in martial arts dojos where there isn't a competitive focus, and I all but guarantee you'll learn about it if you go to a dojo that does a lot of sparring or partner practice. Notably, tapping out is functionally the same in every dojo I've every learned at.[1] There is a good reason for this: you want it to be immediately clear whether someone is tapping out. I was repeatedly told that if it was ever unclear to me whether my opponent was tapping out, I was supposed to assume they were doing so and let them go.
II.
Actually, I want to back up and look at that sentence again. I used the phrase "my opponent" to refer to the other person, but the majority of the times when I or the other person tapped out wasn't during a competition. It was common for a drill to start with me attacking them, for them to deflect the attack and pin me, and then for me to tap out as soon as the pin was complete. Often we would do this a few dozen times in a row, alternating which of us attacked and which of us defended.
I would be in pain during the pin, and
I wasn't going to escape anyway since that wasn't the drill, and
I risked it hurting later after we'd stopped, because my arm had been wrenched repeatedly.
In a competition, tapping out generally means that you lose the point. In a drill, what would it even mean to say that I "lost" the round? At the end of twenty minutes, the score would probably be forty to thirty-nine, and the winner would entirely be down to who went first. We'd tie half the time! Even when we weren't drilling a specific sequence and were instead freely practicing, tapping out didn't have a negative connotation or stigma. You tried something, it didn't work, so you stopped and set up again.
Saying someone "lost" when they tapped out in that context would be like a music teacher saying a new student had "lost" when they played a chord wrong or worse, like a skilled musician feeling that they'd "lost" when trying to write a new melody and discovering they didn't like how it sounded. Yeah, ideally you'd play it perfectly the first time and it would be great, but what you're reinforcing is never trying anything new.
While I'm on the subject: the ability to tap out did not depend on whether or not you were the "aggressor." If we both stepped into the ring, I swing first, you counterattack, and then I tap out? That's fine, everything working as expected.
If you're part of a debate club and it's a competition day against another school I would expect saying that you tap out to mean you lost the round. Don't do that unles...

Nov 17, 2023 • 9min
EA - Spiro - New TB charity raising seed funds by Habiba Banu
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Spiro - New TB charity raising seed funds, published by Habiba Banu on November 17, 2023 on The Effective Altruism Forum.
Summary
We (Habiba Banu and Roxanne Heston) have launched
Spiro, a new TB screening and prevention charity focused on children. Our website is
here.
We are fundraising $198,000 for our first year. We're currently reaching out to people in the EA network. So far we have between 20%-50% of our budget promised and fundraising is currently one of the main things we're focusing on.
The major components of our first year budget are co-founder time, country visits, and delivery of a pilot program, which aims to do household-level TB screening and provision of preventative medication.
We think that this project has a lot of promise:
Tuberculosis has a huge global burden, killing 1.3 million people every year, and is disproportionately neglected and fatal in young children.
The evidence for preventative treatment is robust and household programs are promising, yet few high-burden countries have scaled up this intervention.
Modeling by Charity Entrepreneurship and by academics indicate that this can be competitive with the best GiveWell-recommended charities.
If we don't manage to raise at least half of our target budget by the beginning of December 2023 then we'll switch from our intended focus for the next month from program planning to additional fundraising. This will push out our timelines for getting to the useful work.
If we don't manage to raise our full target budget by the end of 2023 then we'll scale back our ambitions in the immediate term, until we put additional time into fundraising a few months later. The lower budget will also limit the size of our proof-of-concept effort since we and our government partners will need to scale back work to the available funds.
You can donate via
Giving What We Can's fund for charities incubated through Charity Entrepreneurship.
Please also email habiba.banu@spiro.ngo letting us know how much you have donated so that we can identify the funds and allocate them to Spiro.
Who are we?
Spiro is co-founded by Habiba Banu and Roxanne Heston.
Habiba worked for the last three years at 80,000 Hours and before that as Senior Administrator at the Future of Humanity Institute and the Global Priorities Institute. Her background is working as a consultant at PwC with government and non-profit clients.
Rox has worked for the last few years on international AI policy in the U.S. Government and at think tanks. She has worked with and for various EA organizations including the Centre for Effective Altruism, the Future of Humanity Institute, Open Philanthropy and the Lead Exposure Elimination Project.
We have received Charity Entrepreneurship support so far:
Charity Entrepreneurship's research team did the initial research into this idea and shared their work with us.
Habiba went through Charity Entrepreneurship's Incubator Programme earlier this year. Rox started working with Habiba to find an idea together about halfway through the program.
Charity Entrepreneurship has provided stipend funding, advice, and operational support (e.g. website design). It will continue to provide mentorship from its leadership team and a fiscal sponsorship arrangement.
What are we going to do?
Spiro will implement sustainable household screening programs in low- and lower-middle income countries. Spiro aims to curb infections and save lives of children in regions with high burdens of tuberculosis by identifying, screening, and treating household contacts of people living with TB.
We will initially establish a proof of concept in one region, working closely with the government TB program. We will then aim to scale nationally, with funding from the
Global Fund, and expand to other countries.
Currently, we are planning a visit to Uganda to shadow e...

Nov 17, 2023 • 2min
LW - A to Z of things by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A to Z of things, published by KatjaGrace on November 17, 2023 on LessWrong.
I wanted to give my good friends' baby a book, in honor of her existence. And I recalled children's books being an exciting genre. Yet checking in on that thirty years later, Amazon had none I could super get behind. They did have books I used to like, but for reasons now lost. And I wonder if as a child I just had no taste because I just didn't know how good things could be.
What would a good children's book be like?
When I was about sixteen, I thought one reasonable thing to have learned when I was about two would have been the concepts of 'positive feedback loop' and 'negative feedback loop', then being taught in my year 11 class. Very interesting, very bleedingly obvious once you saw it. Why not hear about this as soon as one is coherent? Evolution, if I recall, seemed similar.
Here I finally enact my teenage self's vision, and present A to Z of things, including some very interesting things that you might want a beautiful illustrative prompt to explain to your child as soon as they show glimmerings of conceptual thought: levers, markets, experiments, Greece, computer hardware, reference classes, feedback loops, (trees).
I think so far, the initial recipient is most fond of the donkey, in fascinating support of everyone else's theories about what children are actually into. (Don't get me wrong, I also like donkeys - when I have a second monitor, I just use it to stream donkey cams.) But perhaps one day donkeys will be a gateway drug to monkeys, and monkeys to moths, and moths will be resting on perfecttly moth-colored trees, and BAM! Childhood improved.
Anyway, if you want a copy, it's now available in an 'email it to a copy shop and get it printed yourself' format! See below. Remember to ask for card that is stronger than your child's bite.
[Front]
[Content]
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 17, 2023 • 3min
LW - Forecasting AI (Overview) by jsteinhardt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Forecasting AI (Overview), published by jsteinhardt on November 17, 2023 on LessWrong.
This is a landing page for various posts I've written, and plan to write, about forecasting future developments in AI. I draw on the field of human judgmental forecasting, sometimes colloquially referred to as superforecasting. A hallmark of forecasting is that answers are probability distributions rather than single outcomes, so you should expect ranges rather than definitive answers (but ranges can still be informative!). If you are interested in learning more about this field, I teach a class on it with open-access notes, slides, and assignments.
For AI forecasting in particular, I first got into this area by forecasting progress on several benchmarks:
In Updates and Lessons from AI Forecasting, I describe a forecasting competition that I helped commission, which asked competitive forecasters to predict progress on four different benchmarks. This is a good place to start to understand what I mean by forecasting.
In AI Forecasting: One Year In, I look at the first year of results from the competition, and find that forecasters generally underpredicted progress in AI, especially on the MATH and MMLU benchmarks.
Motivated by this, in Forecasting ML Benchmarks in 2023 I provide my own forecasts of what state-of-the-art performance on MATH and MMLU will be in June 2023.
In AI Forecasting: Two Years In, I look at the second year of results from the competition. I found that the original forecasters continued to underpredict progress, but that a different platform (Metaculus) did better, and that my own forecasts were on par with Metaculus.
After these exercises in forecasting ML benchmarks, I turned to a more ambitious task: predicting the properties of AI models in 2030 across many different axes (capabilities, cost, speed, etc.). My overall predictions are given in What Will GPT-2030 Look Like?, which provides a concrete (but very uncertain) picture of what ML will look like at the end of this decade.
Finally, I am now turning to using forecasting to quantify and understand risks from AI:
In GPT-2030 and Catastrophic Drives: Four Vignettes, I use my GPT-2030 predictions as a starting point to understand the capabilities and corresponding risks of future ML models. I then speculate on four scenarios through which AI could lead to catastrophic outcomes.
In Base Rates for Catastrophe, I take a different approach, using data on historical catastrophes and extinction events to form a reference class for AI catastrophes. Most expert forecasters consider reference class forecasting to be a strong baseline that forms the starting point for their own forecasts, and I think it's also a good place to start for AI risk.
In Forecasting Catastrophic Risks from AI, I put everything together to give an all-things-considered estimate of my probability of an AI-induced catastrophe by 2050.
Finally, in Other Estimates of Catastrophic Risk, I collect other similar forecasts made by various individuals and organizations, and explain which ones I give more and less weight to, based on track record and overall effort and expertise.
The first of these posts has been written, and I plan to release a new one about once per week.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Nov 16, 2023 • 5min
LW - New LessWrong feature: Dialogue Matching by jacobjacob
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New LessWrong feature: Dialogue Matching, published by jacobjacob on November 16, 2023 on LessWrong.
The LessWrong team is shipping a new experimental feature today: dialogue matching!
I've been leading work on this (together with Ben Pace, kave, Ricki Heicklen, habryka and RobertM), so wanted to take some time to introduce what we built and share some thoughts on why I wanted to build it.
New feature!
There's now a dialogue matchmaking page at lesswrong.com/dialogueMatching
Here's how it works:
You can check a user you'd potentially be interested in having a dialogue with, if they were too
They can't see your checks unless you match
It also shows you some interesting data: your top upvoted users over the last 18 months, how much you agreed/disagreed with them, what topics they most frequently commented on, and what posts of theirs you most recently read.
Next, if you find a match, this happens:
You get a tiny form asking for topic ideas and format preferences, and then we create a dialogue that summarises your responses and suggests next steps based on them.
Currently, we're mostly sourcing auto-suggested topics from Ben's neat poll where people voted on interesting disagreement they'd want to see debated, and also stated their own views. I'm pretty excited to further explore this and other ways for auto-suggesting good topics. My hypothesis is that we're in a bit of a dialogue overhang: there are important conversations out there to be had, but that aren't happening. We just need to find them.
This feature is an experiment in making it easier to do many of the hard steps in having a dialogue: finding a partner, finding a topic, and coordinating on format.
To try the Dialogue Matching feature, feel free to on head over to lesswrong.com/dialogueMatching !
Me and the team are super keen to hear any and all feedback. Feel free to share in comments below or using the intercom button in the bottom right corner :)
Why build this?
A retreat organiser I worked with long ago told me: "the most valuable part of an event usually aren't the big talks, but the small group or 1-1 conversations you end up having in the hallways between talks."
I think this points at something important. When Lightcone runs events, we usually optimize the small group experience pretty hard. In fact, when building and renovating our campus Lighthaven, we designed it to have lots of little nooks and spaces in order to facilitate exactly this kind of interaction.
With dialogues, I feel like we're trying to enable an interaction on LessWrong that's also more like a 1-1, and less like a broadcasting talk to an audience.
But we're doing so with two important additions:
Readable artefacts. Usually the results of a 1-1 are locked in with the people involved. Sometimes that's good. But other times, Dialogues enable a format where good stuff that came out of it can be shared with others.
Matchmaking at scale. Being a good event organiser involves a lot of effort to figure out who might have valuable conversations, and then connecting them. This can often be super valuable (thought experiment: imagine introducing Von Neumann and Morgenstern), but takes a lot of personalised fingertip feel and dinner host mojo. Using dialogue matchmaking, I'm curious about a quick experiment to try to doing this at scale, in an automated way.
Overall, I think there's a whole class of valuable content here that you can't even get out at all outside of a dialogue format. The things you say in a talk are different from the things you'd share if you were being interviewed on a podcast, or having a conversation with a friend. Suppose you had been mulling over a confusion about AI. Your thoughts are nowhere near the point where you could package them into a legible, ordered talk and then go present them.
So, what do you do? I think...

Nov 16, 2023 • 4min
LW - Learning coefficient estimation: the details by Zach Furman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Learning coefficient estimation: the details, published by Zach Furman on November 16, 2023 on LessWrong.
What this is for
The learning coefficient (LC), or RLCT, is a quantity from singular learning theory that can help to quantify the "complexity" of deep learning models, among other things.
This guide is primarily intended to help people interested in improving learning coefficient estimation get up to speed with how it works, behind the scenes. If you're just trying to use the LC for your own project, you can just use the library without knowing all the details, though this guide might still be helpful. It's highly recommended you read this post before reading this one, if you haven't already.
We're primarily covering the WBIC paper (Watanabe 2010), the foundation for current LC estimation techniques, but the presentation here is original, aiming for better intuition, and differs substantially from the paper. We'll also briefly cover Lau et al. 2023.
Despite all the lengthy talk, what you end up doing in practice is really simple, and the code is designed to highlight that. After some relatively quick setup, the actual LC calculation can be comfortably done in one or two lines of code.
What this isn't for
A good overview of SLT, or motivation behind studying the LC or loss landscape volume in the first place. We're narrowly focused on LC estimation here.
Sampling details. These are very important! But they're not really unique to singular learning theory, and there are plenty of good resources and tutorials on MCMC elsewhere.
Derivations of formulas, beyond the high-level reasoning.
TLDR
What is the learning coefficient? (Review from last time)
The learning coefficient (LC), also called the RLCT, measures basin broadness.
This isn't new, but typically "basin broadness" is operationalized as "basin flatness" - that is, via the determinant of the Hessian. When the model is singular (eigenvalues of the Hessian are zero), this is a bad idea.
The LC operationalizes "basin broadness" as the (low-loss asymptotic) volume scaling exponent. This ends up being the right thing to measure, as justified by singular learning theory.
How do we measure it?
It turns out that measuring high-dimensional volume directly is hard. We don't do this.
Instead we use MCMC to do what's known in statistics as "method of moments" estimation. We contrive a distribution with the LC as a population parameter, sample from that distribution and calculate one of its moments, and solve for the LC.
We simplify some details in this section, but this is the conceptual heart of LC estimation.
How do we measure it (for real)?
The above is a bit simplified. The LC does measure loss volume scaling, but the "loss" it uses is the average or "infinite-data" limit of the empirical loss function.
In practice, you don't know this infinite-data loss function. Luckily, you already have a good estimate of it - your empirical loss function. Unluckily, this estimate isn't perfect - it can have some noise. And it turns out this noise is actually worst in the place you least want it.
But it all works out in the end! You actually just need to make one small modification to the "idealized" algorithm, and things work fine. This gets you an algorithm that really works in practice!
Finally, the state-of-the-art method (Lau et al. 2023) makes a couple simple modifications, for scalability among other reasons: it measures the learning coefficient only *locally*, and uses mini-batch loss instead of full-batch.
In chart form: as we move from idealized (top) to realistic (bottom), we get new problems, solutions, and directions for improvement. The guide itself covers the first two rows in the most detail, which are likely the most conceptually difficult to think about, and skips directly from the second row to the fourth row at ...


