

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Episodes

Feb 3, 2023 • 7min
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s

Writing down something I’ve found myself repeating in different conversations:

If you're looking for ways to help with the whole “the world looks pretty doomed” business, here's my advice: look around for places where we're all being total idiots.

Look for places where everyone's fretting about a problem that some part of you thinks it could obviously just solve.

Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better.

Then do it better.

Feb 2, 2023 • 1h 7min
"Basics of Rationalist Discourse" by Duncan Sabien
https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1

Introduction

This post is meant to be a linkable resource. Its core is a short list of guidelines (you can link directly to the list) that are intended to be fairly straightforward and uncontroversial, for the purpose of nurturing and strengthening a culture of clear thinking, clear communication, and collaborative truth-seeking.

"Alas," said Dumbledore, "we all know that what should be, and what is, are two different things. Thank you for keeping this in mind."

There is also (for those who want to read more than the simple list) substantial expansion/clarification of each specific guideline, along with justification for the overall philosophy behind the set.

Jan 31, 2023 • 39min
"Sapir-Whorf for Rationalists" by Duncan Sabien
https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists

Casus Belli: As I was scanning over my (rather long) list of essays-to-write, I realized that roughly a fifth of them were of the form "here's a useful standalone concept I'd like to reify," à la cup-stacking skills, fabricated options, split and commit, and sazen.

Some notable entries on that list (which I name here mostly in the hope of someday coming back and turning them into links) include: red vs. white, walking with three, setting the zero point[1], seeding vs. weeding, hidden hinges, reality distortion fields, and something-about-layers-though-that-one-obviously-needs-a-better-word.

While it's still worthwhile to motivate/justify each individual new conceptual handle (and the planned essays will do so), I found myself imagining a general objection of the form "this is just making up terms for things," or perhaps "this is too many new terms, for too many new things." I realized that there was a chunk of argument, repeated across all of the planned essays, that I could factor out, and that (to the best of my knowledge) there was no single essay aimed directly at the question "why new words/phrases/conceptual handles at all?"

So ... voilà.

Jan 31, 2023 • 9min
"My Model Of EA Burnout" by Logan Strohl
https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout

(Probably somebody else has said most of this. But I personally haven't read it, and felt like writing it down myself, so here we go.)

I think that EA [editor note: "Effective Altruism"] burnout usually results from prolonged dedication to satisfying the values you think you should have, while neglecting the values you actually have.

Setting aside for the moment what “values” are and what it means to “actually” have one, suppose that I actually value these things (among others):

Jan 25, 2023 • 23min
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers

This is a linkpost for https://novum.substack.com/p/social-recession-by-the-numbers

Fewer friends, relationships on the decline, delayed adulthood, trust at an all-time low, and many diseases of despair. The prognosis is not great.

One of the most discussed topics online recently has been friendships and loneliness. Ever since the infamous chart showing more people are not having sex than ever before first made the rounds, there’s been increased interest in the social state of things. Polling has demonstrated a marked decline in all spheres of social life, including close friends, intimate relationships, trust, labor participation, and community involvement. The trend looks to have worsened since the pandemic, although it will take some years before this is clearly established.

The decline comes alongside a documented rise in mental illness, diseases of despair, and poor health more generally. In August 2022, the CDC announced that U.S. life expectancy has fallen further and is now where it was in 1996. Contrast this to Western Europe, where it has largely rebounded to pre-pandemic numbers. Still, even before the pandemic, the years 2015-2017 saw the longest sustained decline in U.S. life expectancy since 1915-18. While my intended angle here is not health-related, general sociability is closely linked to health. The ongoing shift has been called the “friendship recession” or the “social recession.”

Jan 24, 2023 • 21min
"Recursive Middle Manager Hell" by Raemon
https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell

I think Zvi's Immoral Mazes sequence is really important, but comes with more worldview-assumptions than are necessary to make the points actionable. I conceptualize Zvi as arguing for multiple hypotheses. In this post I want to articulate one sub-hypothesis, which I call "Recursive Middle Manager Hell". I'm deliberately not covering some other components of his model[1].

tl;dr: Something weird and kinda horrifying happens when you add layers of middle management. This has ramifications for when/how to scale organizations, for where you might want to work, and maybe for general models of what's going on in the world.

You could summarize the effect as "the org gets more deceptive, less connected to its original goals, more focused on office politics, less able to communicate clearly within itself, and more strongly selected for sociopathy in upper management."

You might read that list of things and say "sure, seems a bit true", but one of the main points here is "Actually, this happens in a deeper and more insidious way than you're probably realizing, with much higher costs than you're acknowledging. If you're scaling your organization, this should be one of your primary worries."

Jan 12, 2023 • 34min
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Introduction

A few collaborators and I recently released a new paper: Discovering Latent Knowledge in Language Models Without Supervision. For a quick summary of our paper, you can check out this Twitter thread.

In this post I will describe how I think the results and methods in our paper fit into a broader scalable alignment agenda. Unlike the paper, this post is explicitly aimed at an alignment audience and is mainly conceptual rather than empirical.

Tl;dr: unsupervised methods are more scalable than supervised methods, deep learning has special structure that we can exploit for alignment, and we may be able to recover superhuman beliefs from deep learning representations in a totally unsupervised way.

Disclaimers: I have tried to make this post concise, at the cost of not making the full arguments for many of my claims; you should treat this as more of a rough sketch of my views than anything comprehensive. I also frequently change my mind – I’m usually more consistently excited about some of the broad intuitions but much less wedded to the details – and this of course just represents my current thinking on the topic.
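For readers who want the “totally unsupervised” claim made concrete, here is a minimal sketch of the contrast-consistent probing idea the paper proposes: a small probe is trained so that the probabilities it assigns to a statement and its negation are consistent with each other, without ever seeing labels. It assumes hidden states for the two phrasings have already been extracted from a language model; the function names, shapes, and hyperparameters are illustrative rather than taken from the paper's code.

```python
import torch

def ccs_loss(p_pos, p_neg):
    # Consistency: probabilities for a statement and its negation should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(h_pos, h_neg, steps=1000, lr=1e-3):
    # h_pos, h_neg: (n_examples, d_hidden) activations for the "yes"/"no" phrasings.
    d_hidden = h_pos.shape[1]
    probe = torch.nn.Sequential(torch.nn.Linear(d_hidden, 1), torch.nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ccs_loss(probe(h_pos).squeeze(-1), probe(h_neg).squeeze(-1))
        loss.backward()
        opt.step()
    return probe  # trained without any labels
```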

Jan 12, 2023 • 9min
"The Feeling of Idea Scarcity" by John Wentworth
https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity

Here’s a story you may recognize. There's a bright up-and-coming young person - let's call her Alice. Alice has a cool idea. It seems like maybe an important idea, a big idea, an idea which might matter. A new and valuable idea. It’s the first time Alice has come up with a high-potential idea herself, something which she’s never heard in a class or read in a book or what have you.

So Alice goes all-in pursuing this idea. She spends months fleshing it out. Maybe she writes a paper, or starts a blog, or gets a research grant, or starts a company, or whatever, in order to pursue the high-potential idea and bring it to the world.

And sometimes it just works!

… but more often, the high-potential idea doesn’t actually work out. Maybe it turns out to be basically the same as something which has already been tried. Maybe it runs into some major barrier, some not-easily-patchable flaw in the idea. Maybe the problem it solves just wasn’t that important in the first place.

From Alice’s point of view, the possibility that her one high-potential idea wasn’t that great after all is painful. The idea probably feels to Alice like the single biggest intellectual achievement of her life. To lose that, to find out that her single greatest intellectual achievement amounts to little or nothing… that hurts to even think about. So most likely, Alice will reflexively look for an out. She’ll look for some excuse to ignore the similar ideas which have already been tried, some reason to think her idea is different. She’ll look for reasons to believe that maybe the major barrier isn’t that much of an issue, or that we Just Don’t Know whether it’s actually an issue and therefore maybe the idea could work after all. She’ll look for reasons why the problem really is important. Maybe she’ll grudgingly acknowledge some shortcomings of the idea, but she’ll give up as little ground as possible at each step, and update as slowly as she can.

Jan 12, 2023 • 10min
"Models Don't 'Get Reward'" by Sam Ringer
https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In terms of content, this has a lot of overlap with Reward is not the optimization target. I'm basically rewriting a part of that post in language I personally find clearer, emphasising what I think is the core insight.

When thinking about deception and RLHF training, a simplified threat model is something like this:

1. A model takes some actions.
2. If a human approves of these actions, the human gives the model some reward.
3. Humans can be deceived into giving reward in situations where they would otherwise not if they had more knowledge.
4. Models will take advantage of this so they can get more reward.
5. Models will therefore become deceptive.

Before continuing, I would encourage you to really engage with the above. Does it make sense to you? Is it making any hidden assumptions? Is it missing any steps? Can you rewrite it to be more mechanistically correct?

I believe that when people use the above threat model, they are either using it as shorthand for something else or they misunderstand how reinforcement learning works. Most alignment researchers will be in the former category. However, I was in the latter.
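As one way of engaging with the "rewrite it to be more mechanistically correct" prompt, here is a minimal sketch, not taken from the post, of where reward actually enters a standard policy-gradient training loop: the training code uses it to scale a weight update, and the model never receives it as an input. The `policy` and `env` objects are illustrative placeholders (an old-style Gym-like environment interface is assumed).

```python
import torch

def reinforce_episode(policy, optimizer, env):
    # Roll out one episode, collecting log-probabilities and rewards.
    obs = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)  # held by the training loop; never shown to the policy
    episode_return = sum(rewards)
    # Reward only shapes the gradient used to update the policy's parameters.
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return episode_return
```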

Dec 21, 2022 • 1h 19min
"The next decades might be wild" by Marius Hobbhahn
https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I’d like to thank Simon Grimm and Tamay Besiroglu for feedback and discussions.

This post is inspired by What 2026 looks like and an AI vignette workshop guided by Tamay Besiroglu. I think of this post as “what would I expect the world to look like if these timelines (median compute for transformative AI ~2036) were true” or “what short-to-medium timelines feel like”, since I find it hard to translate a statement like “median TAI year is 20XX” into a coherent imaginable world.

I expect some readers to think that the post sounds wild and crazy, but that doesn’t mean its content couldn’t be true. If you had told someone in 1990 or 2000 that there would be more smartphones and computers than humans in 2020, that probably would have sounded wild to them. The same could be true for AIs, i.e. that in 2050 there are more human-level AIs than humans. The fact that this sounds as ridiculous as ubiquitous smartphones sounded to the 1990/2000 person might just mean that we are bad at predicting exponential growth and disruptive technology.

Update: titotal points out in the comments that the correct timeframe for computers is probably 1980 to 2020, so the correct time span is probably 40 years instead of 30. For mobile phones, it's probably 1993 to 2020, if you can trust this statistic.

I’m obviously not confident (see the confidence and takeaways section) in this particular prediction, but many of the things I describe seem like relatively direct consequences of more and more powerful and ubiquitous AI mixed with basic social dynamics and incentives.


