LessWrong (Curated & Popular)

May 24, 2024 • 7min

“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper

The episode delves into Anthropic's latest sparse autoencoder research, highlighting brilliant experiments and insights alongside concerns about safety washing. The author revisits the predictions he made about what the paper would accomplish and argues that it underperformed them. Discussion also covers the limitations of Anthropic's interpretability research, concerns about its promotional strategy, and the need to prioritize a concrete safety agenda.
May 22, 2024 • 7min

“What’s Going on With OpenAI’s Messaging?” by ozziegoen

Delves into OpenAI's messaging strategy and how it balances transformative-AI ambitions against safety concerns. Explores how the organization navigates public relations, competitive pressures, and investor interests, and analyzes its communication tactics, the importance of actions over promises, and the challenge of maintaining a coherent message.
May 21, 2024 • 29min

“Language Models Model Us” by eggsyntax

Explores the ability of language models to deduce personal information from text, and the privacy and manipulation concerns this raises. Analyzes GPT-3.5's accuracy in determining an author's gender, education, and ethnicity, compares performance across models, and discusses what models can infer about individuals' personal interests, routines, and philosophical musings.
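The kind of probe the episode describes is easy to sketch. Below is a minimal, hypothetical illustration using the OpenAI Python client; the prompt wording, attribute list, and model choice are illustrative assumptions, not the post's actual protocol.

```python
# Hypothetical sketch of an author-attribute probe, in the spirit of the
# experiment described above. The prompt and attributes are illustrative,
# not the post's actual setup. Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def guess_author_attributes(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Given a writing sample, guess the author's gender, "
                        "education level, and ethnicity. Answer concisely and "
                        "give a confidence for each guess."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(guess_author_attributes("Honestly the commute has been brutal since we moved..."))
```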
May 21, 2024 • 51sec

Jaan Tallinn’s 2023 Philanthropy Overview

Jaan Tallinn discusses his philanthropic efforts in 2023, exceeding his commitment by funding $44M worth of grants. Includes a detailed breakdown of fund allocations and the minimum price of ETH. Narrated by TYPE III AUDIO for LessWrong.
May 21, 2024 • 1h 25min

“OpenAI: Exodus” by Zvi

OpenAI's Ilya Sutskever and Jan Leike leave amid safety concerns. Allegations of safety neglect, resource shortages, and a culture shift at OpenAI. The departures prompt discussion of AI ethics, the CEO's leadership, and employee rights. Controversy arises over legal tactics, non-disclosure agreements, and ethics within the organization.
May 20, 2024 • 7min

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Discusses the weaknesses of DeepMind's 'Frontier Safety Framework' and its lack of ambition compared to other AI labs' safety plans, addressing concerns about the lack of specifics on internal deployment, the frequency of evaluations, and the absence of formal commitments.
May 18, 2024 • 11min

Do you believe in hundred dollar bills lying on the ground? Consider humming

Introduction. [Reminder: I am an internet weirdo with no medical credentials]

A few months ago, I published some crude estimates of the power of nitric oxide nasal spray to hasten recovery from illness, and speculated about what it could do prophylactically. While working on that piece, a nice man on Twitter alerted me to the fact that humming produces lots of nasal nitric oxide. This post is my very crude model of what kind of anti-viral gains we could expect from humming.

I’ve encoded my model at Guesstimate. The results are pretty favorable (average estimated impact of a 66% reduction in severity of illness), but extremely sensitive to my made-up numbers. Efficacy estimates range from ~0% to ~95%, depending on how you feel about publication bias, what percent of Enovid's impact can be credited to nitric oxide, and humming's relative effect. Given how speculative some [...]

First published: May 16th, 2024
Source: https://www.lesswrong.com/posts/NBZvpcBx4ewqkdCdT/do-you-believe-in-hundred-dollar-bills-lying-on-the-ground-1
Narrated by TYPE III AUDIO.
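The Guesstimate model isn't reproduced here, but its shape, a product of a few highly uncertain factors, is easy to sketch as a Monte Carlo simulation. The Python below is a hypothetical illustration with placeholder input ranges, not the post's actual numbers; its only point is to show how strongly the bottom line depends on them.

```python
# A hypothetical Monte Carlo sketch of the kind of model described above:
# multiply a few highly uncertain factors to estimate the reduction in illness
# severity from humming. All input ranges are illustrative placeholders, NOT
# the post's actual Guesstimate numbers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

enovid_effect = rng.uniform(0.3, 0.95, n)    # spray's effect on severity (fraction removed)
publication_bias = rng.uniform(0.3, 1.0, n)  # 1.0 = trust the trial results fully
no_share = rng.uniform(0.2, 1.0, n)          # share of Enovid's effect due to nitric oxide
humming_relative = rng.uniform(0.1, 1.5, n)  # humming's NO output relative to the spray

reduction = np.clip(enovid_effect * publication_bias * no_share * humming_relative, 0, 1)

print(f"mean reduction in severity: {reduction.mean():.0%}")
print(f"5th-95th percentile: {np.percentile(reduction, 5):.0%} to {np.percentile(reduction, 95):.0%}")
```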
May 12, 2024 • 15min

Deep Honesty

Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let's call the habit of not saying things you know to be false ‘shallow honesty’[1].

Often when people are shallowly honest, they still choose what true things they say in a kind of locally act-consequentialist way, to try to bring about some outcome. Maybe something they want for themselves (e.g. convincing their friends to see a particular movie), or something they truly believe is good (e.g. causing their friend to vote for the candidate they think will be better for the country).

Either way, if you think someone is being merely shallowly honest, you can only shallowly trust them: you might be confident that [...]

The original text contained 7 footnotes which were omitted from this narration.

First published: May 7th, 2024
Source: https://www.lesswrong.com/posts/szn26nTwJDBkhn8ka/deep-honesty
Narrated by TYPE III AUDIO.
May 2, 2024 • 14min

On Not Pulling The Ladder Up Behind You

Epistemic Status: Musing and speculation, but I think there's a real thing here.

1. When I was a kid, a friend of mine had a tree fort. If you've never seen such a fort, imagine a series of wooden boards secured to a tree, creating a platform about fifteen feet off the ground where you can sit or stand and walk around the tree. This one had a rope ladder we used to get up and down, a length of knotted rope that was tied to the tree at the top and dangled over the edge so that it reached the ground. Once you were up in the fort, you could pull the ladder up behind you. It was much, much harder to get into the fort without the ladder. Not only would you need to climb the tree itself instead of the ladder with its handholds, but [...]

The original text contained 1 footnote which was omitted from this narration.

First published: April 26th, 2024
Source: https://www.lesswrong.com/posts/k2kzawX5L3Z7aGbov/on-not-pulling-the-ladder-up-behind-you
Narrated by TYPE III AUDIO.
May 2, 2024 • 1h 21min

Mechanistically Eliciting Latent Behaviors in Language Models

Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).

TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations overriding safety training, eliciting backdoored behaviors, and uncovering latent capabilities.

Summary: In the simplest case, the unsupervised perturbations I learn are given by unsupervised steering vectors: vectors added to the residual stream as a bias term in the MLP outputs of a given layer. I also report preliminary results on unsupervised steering adapters: these are LoRA adapters of the MLP output weights of a given layer, trained with the same unsupervised objective.

I apply the method to several alignment-relevant toy examples, and find that the [...]

The original text contained 15 footnotes which were omitted from this narration.

First published: April 30th, 2024
Source: https://www.lesswrong.com/posts/ioPnHKFyy4Cw2Gr2x/mechanistically-eliciting-latent-behaviors-in-language-1
Narrated by TYPE III AUDIO.
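For readers who want the mechanics: below is a minimal, hypothetical sketch of the training loop the summary describes, assuming PyTorch and a TransformerLens-style model (`model.blocks[i]`, `model.cfg.d_model`). It is a reconstruction from the description above, not the author's actual code; the post's real method includes additional objective variants and the adapter version.

```python
# A minimal sketch of the unsupervised steering-vector idea described above:
# learn a vector added to one layer's MLP output, trained to maximize the
# change it causes in the residual stream at a later layer, with its norm
# held fixed. Reconstruction from the summary, not the author's code.
import torch

def train_steering_vector(model, tokens, src_layer=8, tgt_layer=16,
                          radius=4.0, steps=300, lr=1e-2):
    for p in model.parameters():          # only the steering vector is trained
        p.requires_grad_(False)
    theta = torch.randn(model.cfg.d_model, requires_grad=True)

    acts = {}
    def save_tgt(_module, _inputs, output):
        acts["tgt"] = output              # residual stream after tgt_layer
    h_tgt = model.blocks[tgt_layer].register_forward_hook(save_tgt)

    # Record the unsteered target-layer activations once.
    with torch.no_grad():
        model(tokens)
        base_tgt = acts["tgt"].detach()

    # Add the norm-constrained steering vector to src_layer's MLP output.
    def add_theta(_module, _inputs, output):
        return output + radius * theta / theta.norm()
    h_src = model.blocks[src_layer].mlp.register_forward_hook(add_theta)

    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(tokens)
        # Maximize the change in downstream activations.
        loss = -(acts["tgt"] - base_tgt).norm()
        loss.backward()
        opt.step()

    h_src.remove()
    h_tgt.remove()
    return (radius * theta / theta.norm()).detach()
```

Different random initializations of the vector converge to different downstream changes, which is how a single prompt can surface many distinct behaviors.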
