LessWrong (Curated & Popular)

LessWrong
Aug 28, 2023 • 12min

"Assume Bad Faith" by Zack_M_Davis

I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the phrase "bad faith" from hearing it used, don't actually know what it means—and maybe, that the thing it does mean doesn't carve reality at the joints.

People get very touchy about bad faith accusations: they think that you should assume good faith, but that if you've determined someone is in bad faith, you shouldn't even be talking to them, that you need to exile them.

What does "bad faith" mean, though? It doesn't mean "with ill intent."

Source: https://www.lesswrong.com/posts/pZrvkZzL2JnbRgEBC/feedbackloop-first-rationality

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
Aug 23, 2023 • 15min

"Large Language Models will be Great for Censorship" by Ethan Edwards

LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex cognitive tasks, and write nearly any argument. More mundanely, they are now the state of the art for boring classification tasks and therefore have the capability to radically upgrade the censorship capacities of authoritarian regimes throughout the world.

Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort. Thanks to ev_ and Kei for suggestions on this post.

Source: https://www.lesswrong.com/posts/oqvsR2LmHWamyKDcj/large-language-models-will-be-great-for-censorship

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
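As a rough illustration of the classification claim, here is a minimal sketch of how an LLM might be pressed into service as a zero-shot content classifier. It is not taken from the post: `call_llm`, `classify_post`, and the policy text are hypothetical placeholders standing in for whatever model and prompt a real deployment would use.

```python
# Illustrative sketch only: an LLM used as a zero-shot content classifier.
# `call_llm` is a hypothetical stand-in for a real completion API; it is
# stubbed out here so the example runs end to end.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM completion call."""
    # A real deployment would send `prompt` to a language model here.
    return "ALLOWED"

def classify_post(text: str, policy: str) -> str:
    """Ask the model to label a post under a given content policy."""
    prompt = (
        f"Content policy: {policy}\n\n"
        f"Post: {text}\n\n"
        "Answer with exactly one word, ALLOWED or DISALLOWED:"
    )
    return call_llm(prompt).strip().upper()

print(classify_post("A perfectly ordinary post.", "No criticism of the government."))
```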
Aug 22, 2023 • 6min

"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov

Intro: I am a psychotherapist, and I help people working on AI safety. I have noticed patterns of mental health issues highly specific to this group. It's not just doomerism; there are many more issues that are less obvious.

If you struggle with a mental health issue related to AI safety, feel free to leave a comment about it and about the things that help you with it. You might also support others in the comments. Sometimes such support makes a lot of difference, and people feel like they are not alone.

All the examples in this post have been changed so that it is impossible to recognize a specific person behind them.

Source: https://www.lesswrong.com/posts/tpLzjWqG2iyEgMGfJ/6-non-obvious-mental-health-issues-specific-to-ai-safety

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
Aug 22, 2023 • 8min

"Ten Thousand Years of Solitude" by agp

This episode explores the intense isolation and unique development of the tribes of Tasmania, who were cut off from the outside world for ten thousand years. It discusses the poverty and lack of technology among the Tasmanians and the irreversible cultural losses they experienced due to that isolation.
Aug 21, 2023 • 1h 19min

"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

Charbel-Raphaël critiques theories of impact for interpretability research, questioning their practicality in industry. The episode discusses the limitations of pixel-attribution techniques and the need for accuracy, explores the challenges of interpreting AI models for deception detection, advocates for cognitive emulation over traditional visualization methods as a route to transparency, and emphasizes the importance of balancing safety and capabilities in AI alignment research.
Aug 15, 2023 • 7min

"Inflection.ai is a major AGI lab" by Nikola

Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude to Meta, OpenAI, DeepMind, and Anthropic, based on its compute, valuation, current model capabilities, and plans to train frontier models. Compared to the other labs, Inflection seems to put less effort into AI safety.

Thanks to Laker Newhouse for discussion and feedback.

Source: https://www.lesswrong.com/posts/Wc5BYFfzuLzepQjCq/inflection-ai-is-a-major-agi-lab

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
Aug 15, 2023 • 16min

"Feedbackloop-first Rationality" by Raemon

I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teaching the skill of "noticing what cognitive strategies are useful, and getting better at them.")

I think the paradigm has promise. I've beta-tested it for a couple of weeks. It's too early to tell if it actually works, but one of my primary goals is to figure out relatively quickly whether it works, and give up if it isn't delivering.

The goal of this post is to:
- Convey the framework
- See if people find it compelling in its current form
- Solicit ideas for improvements, before I decide whether to invest heavily in a larger experiment around it.

Source: https://www.lesswrong.com/posts/pZrvkZzL2JnbRgEBC/feedbackloop-first-rationality

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
[Curated Post] ✓
Aug 9, 2023 • 17min

"When can we trust model evaluations?" bu evhub

In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment:However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card (e.g. ARC's autonomous replication evaluation) as a way of assessing model capabilities, I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment.That's because while I think it would be quite tricky for a deceptively aligned AI to sandbag its capabilities when explicitly fine-tuned on some capabilities task (that probably requires pretty advanced gradient hacking), it should be quite easy for such a model to pretend to look aligned.In this post, I want to try to expand a bit on this point and explain exactly what assumptions I think are necessary for various different evaluations to be reliable and trustworthy. For that purpose, I'm going to talk about four different categories of evaluations and what assumptions I think are needed to make each one go through.Source:https://www.lesswrong.com/posts/dBmfb76zx6wjPsBC7/when-can-we-trust-model-evaluationsNarrated for LessWrong by TYPE III AUDIO.Share feedback on this narration.[Curated Post] ✓
Aug 9, 2023 • 36min

"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

This podcast discusses the importance of researching model organisms of misalignment to understand the causes of alignment failures in AI systems. It explores different strategies for model training and deployment, such as input tagging and evaluating output with a preference model. The risks associated with using model organisms in research, including deceptive alignment, are also explored.
Aug 4, 2023 • 10min

"My current LK99 questions" by Eliezer Yudkowsky

So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors and market sentiment reckoning." (My initial entry into the affray, having been driven by people online presenting as confidently YES when the prediction markets were not confidently YES.) And then I thought to myself, "This LK99 issue seems complicated enough that it'd be worth doing an actual Bayesian calculation on it"—a rare thought; I don't think I've done an actual explicit numerical Bayesian update in at least a year.

In the process of trying to set up an explicit calculation, I realized I felt very unsure about some critically important quantities, to the point where it no longer seemed worth trying to do the calculation with numbers. This is the System Working As Intended.

Source: https://www.lesswrong.com/posts/EzSH9698DhBsXAcYY/my-current-lk99-questions

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓
[Curated Post] ✓
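For readers who want the mechanics, here is a toy version of the kind of explicit numerical Bayesian update the post refers to. The probabilities below are invented purely for illustration and are not estimates from the post.

```python
# Toy numerical Bayesian update via Bayes' rule.
# All numbers are made up for illustration, not taken from the post.
prior = 0.10               # P(LK99-style claim is real) before the evidence
p_e_given_real = 0.80      # P(observed reports | claim is real)
p_e_given_not = 0.20       # P(observed reports | claim is not real)

posterior = prior * p_e_given_real / (
    prior * p_e_given_real + (1 - prior) * p_e_given_not
)
print(f"posterior = {posterior:.2f}")  # ~0.31 with these made-up numbers
```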
