

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Episodes

Feb 3, 2023 • 7min
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s

Writing down something I’ve found myself repeating in different conversations:

If you're looking for ways to help with the whole “the world looks pretty doomed” business, here's my advice: look around for places where we're all being total idiots.

Look for places where everyone's fretting about a problem that some part of you thinks it could obviously just solve.

Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better.

Then do it better.

Feb 2, 2023 • 1h 7min
"Basics of Rationalist Discourse" by Duncan Sabien
https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1

Introduction

This post is meant to be a linkable resource. Its core is a short list of guidelines (you can link directly to the list) that are intended to be fairly straightforward and uncontroversial, for the purpose of nurturing and strengthening a culture of clear thinking, clear communication, and collaborative truth-seeking.

"Alas," said Dumbledore, "we all know that what should be, and what is, are two different things. Thank you for keeping this in mind."

There is also (for those who want to read more than the simple list) substantial expansion/clarification of each specific guideline, along with justification for the overall philosophy behind the set.

Jan 31, 2023 • 39min
"Sapir-Whorf for Rationalists" by Duncan Sabien
https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists

Casus Belli: As I was scanning over my (rather long) list of essays-to-write, I realized that roughly a fifth of them were of the form "here's a useful standalone concept I'd like to reify," à la cup-stacking skills, fabricated options, split and commit, and sazen.

Some notable entries on that list (which I name here mostly in the hope of someday coming back and turning them into links) include: red vs. white, walking with three, setting the zero point[1], seeding vs. weeding, hidden hinges, reality distortion fields, and something-about-layers-though-that-one-obviously-needs-a-better-word.

While it's still worthwhile to motivate/justify each individual new conceptual handle (and the planned essays will do so), I found myself imagining a general objection of the form "this is just making up terms for things," or perhaps "this is too many new terms, for too many new things." I realized that there was a chunk of argument, repeated across all of the planned essays, that I could factor out, and that (to the best of my knowledge) there was no single essay aimed directly at the question "why new words/phrases/conceptual handles at all?"

So ... voilà.

Jan 31, 2023 • 9min
"My Model Of EA Burnout" by Logan Strohl
https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout

(Probably somebody else has said most of this. But I personally haven't read it, and felt like writing it down myself, so here we go.)

I think that EA [editor note: "Effective Altruism"] burnout usually results from prolonged dedication to satisfying the values you think you should have, while neglecting the values you actually have.

Setting aside for the moment what “values” are and what it means to “actually” have one, suppose that I actually value these things (among others):

Jan 25, 2023 • 23min
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers

This is a linkpost for https://novum.substack.com/p/social-recession-by-the-numbers

Fewer friends, relationships on the decline, delayed adulthood, trust at an all-time low, and many diseases of despair. The prognosis is not great.

One of the most discussed topics online recently has been friendships and loneliness. Ever since the infamous chart showing more people are not having sex than ever before first made the rounds, there’s been increased interest in the social state of things. Polling has demonstrated a marked decline in all spheres of social life, including close friends, intimate relationships, trust, labor participation, and community involvement. The trend looks to have worsened since the pandemic, although it will take some years before this is clearly established.

The decline comes alongside a documented rise in mental illness, diseases of despair, and poor health more generally. In August 2022, the CDC announced that U.S. life expectancy has fallen further and is now where it was in 1996. Contrast this to Western Europe, where it has largely rebounded to pre-pandemic numbers. Still, even before the pandemic, the years 2015-2017 saw the longest sustained decline in U.S. life expectancy since 1915-18. While my intended angle here is not health-related, general sociability is closely linked to health. The ongoing shift has been called the “friendship recession” or the “social recession.”

Jan 24, 2023 • 21min
"Recursive Middle Manager Hell" by Raemon
https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell

I think Zvi's Immoral Mazes sequence is really important, but comes with more worldview-assumptions than are necessary to make the points actionable. I conceptualize Zvi as arguing for multiple hypotheses. In this post I want to articulate one sub-hypothesis, which I call "Recursive Middle Manager Hell". I'm deliberately not covering some other components of his model[1].

tl;dr: Something weird and kinda horrifying happens when you add layers of middle management. This has ramifications for when/how to scale organizations, for where you might want to work, and maybe for general models of what's going on in the world.

You could summarize the effect as "the org gets more deceptive, less connected to its original goals, more focused on office politics, less able to communicate clearly within itself, and more strongly selected for sociopathy in upper management."

You might read that list of things and say "sure, seems a bit true", but one of the main points here is "Actually, this happens in a deeper and more insidious way than you're probably realizing, with much higher costs than you're acknowledging. If you're scaling your organization, this should be one of your primary worries."

Jan 12, 2023 • 34min
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Introduction

A few collaborators and I recently released a new paper: Discovering Latent Knowledge in Language Models Without Supervision. For a quick summary of our paper, you can check out this Twitter thread.

In this post I will describe how I think the results and methods in our paper fit into a broader scalable alignment agenda. Unlike the paper, this post is explicitly aimed at an alignment audience and is mainly conceptual rather than empirical.

Tl;dr: unsupervised methods are more scalable than supervised methods, deep learning has special structure that we can exploit for alignment, and we may be able to recover superhuman beliefs from deep learning representations in a totally unsupervised way.

Disclaimers: I have tried to make this post concise, at the cost of not making the full arguments for many of my claims; you should treat this as more of a rough sketch of my views than anything comprehensive. I also frequently change my mind – I’m usually more consistently excited about some of the broad intuitions but much less wedded to the details – and this of course just represents my current thinking on the topic.
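For readers who want the “totally unsupervised” claim made concrete, here is a minimal sketch of the contrast-consistent probing idea the paper proposes: a small probe is trained so that the probabilities it assigns to a statement and its negation are consistent with each other, without ever seeing labels. It assumes hidden states for the two phrasings have already been extracted from a language model; the function names, shapes, and hyperparameters are illustrative rather than taken from the paper's code.

```python
import torch

def ccs_loss(p_pos, p_neg):
    # Consistency: probabilities for a statement and its negation should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(h_pos, h_neg, steps=1000, lr=1e-3):
    # h_pos, h_neg: (n_examples, d_hidden) activations for the "yes"/"no" phrasings.
    d_hidden = h_pos.shape[1]
    probe = torch.nn.Sequential(torch.nn.Linear(d_hidden, 1), torch.nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ccs_loss(probe(h_pos).squeeze(-1), probe(h_neg).squeeze(-1))
        loss.backward()
        opt.step()
    return probe  # trained without any labels
```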

Jan 12, 2023 • 9min
"The Feeling of Idea Scarcity" by John Wentworth
https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity

Here’s a story you may recognize. There's a bright up-and-coming young person - let's call her Alice. Alice has a cool idea. It seems like maybe an important idea, a big idea, an idea which might matter. A new and valuable idea. It’s the first time Alice has come up with a high-potential idea herself, something which she’s never heard in a class or read in a book or what have you.

So Alice goes all-in pursuing this idea. She spends months fleshing it out. Maybe she writes a paper, or starts a blog, or gets a research grant, or starts a company, or whatever, in order to pursue the high-potential idea and bring it to the world.

And sometimes it just works!

… but more often, the high-potential idea doesn’t actually work out. Maybe it turns out to be basically the same as something which has already been tried. Maybe it runs into some major barrier, some not-easily-patchable flaw in the idea. Maybe the problem it solves just wasn’t that important in the first place.

From Alice’s point of view, the possibility that her one high-potential idea wasn’t that great after all is painful. The idea probably feels to Alice like the single biggest intellectual achievement of her life. To lose that, to find out that her single greatest intellectual achievement amounts to little or nothing… that hurts to even think about. So most likely, Alice will reflexively look for an out. She’ll look for some excuse to ignore the similar ideas which have already been tried, some reason to think her idea is different. She’ll look for reasons to believe that maybe the major barrier isn’t that much of an issue, or that we Just Don’t Know whether it’s actually an issue and therefore maybe the idea could work after all. She’ll look for reasons why the problem really is important. Maybe she’ll grudgingly acknowledge some shortcomings of the idea, but she’ll give up as little ground as possible at each step, and update as slowly as she can.

Jan 12, 2023 • 10min
"Models Don't 'Get Reward'" by Sam Ringer
https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In terms of content, this has a lot of overlap with Reward is not the optimization target. I'm basically rewriting a part of that post in language I personally find clearer, emphasising what I think is the core insight.

When thinking about deception and RLHF training, a simplified threat model is something like this:

1. A model takes some actions.
2. If a human approves of these actions, the human gives the model some reward.
3. Humans can be deceived into giving reward in situations where they would otherwise not if they had more knowledge.
4. Models will take advantage of this so they can get more reward.
5. Models will therefore become deceptive.

Before continuing, I would encourage you to really engage with the above. Does it make sense to you? Is it making any hidden assumptions? Is it missing any steps? Can you rewrite it to be more mechanistically correct?

I believe that when people use the above threat model, they are either using it as shorthand for something else or they misunderstand how reinforcement learning works. Most alignment researchers will be in the former category. However, I was in the latter.
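As one way of engaging with the "rewrite it to be more mechanistically correct" prompt, here is a minimal sketch, not taken from the post, of where reward actually enters a standard policy-gradient training loop: the training code uses it to scale a weight update, and the model never receives it as an input. The `policy` and `env` objects are illustrative placeholders (an old-style Gym-like environment interface is assumed).

```python
import torch

def reinforce_episode(policy, optimizer, env):
    # Roll out one episode, collecting log-probabilities and rewards.
    obs = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)  # held by the training loop; never shown to the policy
    episode_return = sum(rewards)
    # Reward only shapes the gradient used to update the policy's parameters.
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return episode_return
```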

Dec 21, 2022 • 1h 19min
"The next decades might be wild" by Marius Hobbhahn
https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I’d like to thank Simon Grimm and Tamay Besiroglu for feedback and discussions.

This post is inspired by What 2026 looks like and an AI vignette workshop guided by Tamay Besiroglu. I think of this post as “what would I expect the world to look like if these timelines (median compute for transformative AI ~2036) were true” or “what short-to-medium timelines feel like”, since I find it hard to translate a statement like “median TAI year is 20XX” into a coherent imaginable world.

I expect some readers to think that the post sounds wild and crazy, but that doesn’t mean its content couldn’t be true. If you had told someone in 1990 or 2000 that there would be more smartphones and computers than humans in 2020, that probably would have sounded wild to them. The same could be true for AIs, i.e. that in 2050 there are more human-level AIs than humans. The fact that this sounds as ridiculous as ubiquitous smartphones sounded to the 1990/2000 person might just mean that we are bad at predicting exponential growth and disruptive technology.

Update: titotal points out in the comments that the correct timeframe for computers is probably 1980 to 2020, so the correct time span is probably 40 years instead of 30. For mobile phones, it's probably 1993 to 2020, if you can trust this statistic.

I’m obviously not confident (see the confidence and takeaways section) in this particular prediction, but many of the things I describe seem like relatively direct consequences of more and more powerful and ubiquitous AI mixed with basic social dynamics and incentives.


