LessWrong (Curated & Popular)

LessWrong
May 1, 2024 • 19min

Ironing Out the Squiggles

Adversarial Examples: A Problem

The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to get computers to (say) check whether a photo is of a bird, but this façade of seemingly good performance is belied by the existence of adversarial examples—specially prepared data that looks ordinary to humans, but is seen radically differently by machine learning models.

The differentiable nature of neural networks, which makes it possible to train them at all, is also responsible for their downfall at the hands of an adversary. Deep learning models are fit using stochastic gradient descent (SGD) to approximate the function between expected inputs and outputs. Given an input, an expected output, and a loss function (which measures "how bad" it is for the actual output to differ from the expected output), we can calculate the gradient of the [...]

The original text contained 5 footnotes which were omitted from this narration.

---

First published: April 29th, 2024

Source: https://www.lesswrong.com/posts/H7fkGinsv8SDxgiS2/ironing-out-the-squiggles

---

Narrated by TYPE III AUDIO.
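The truncated sentence is heading toward the key move in constructing adversarial examples: taking the gradient of the loss with respect to the input rather than the weights. As a minimal sketch of one standard construction, the fast gradient sign method of Goodfellow et al. (not necessarily the exact recipe the post goes on to discuss), with `model` and `loss_fn` as placeholder PyTorch objects:

```python
import torch

def fgsm_perturb(model, x, y, loss_fn, epsilon=0.01):
    """Nudge each input coordinate by epsilon in whichever direction
    increases the loss (fast gradient sign method, illustrative sketch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)   # how "bad" the current output is
    loss.backward()               # gradient of the loss w.r.t. the *input*
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach()
```

Even tiny values of `epsilon` can flip the model's classification while leaving the image visually unchanged to a human.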
May 1, 2024 • 3min

Introducing AI Lab Watch

This is a linkpost for https://ailabwatch.org

I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly.

It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff.

(It's much better on desktop than mobile — don't read it on mobile.)

It's in beta—leave feedback here or comment or DM me—but I basically endorse the content and you're welcome to share and discuss it publicly.

It's unincorporated, unfunded, not affiliated with any orgs/people, and is just me. Some clarifications and disclaimers.

How you can help:
- Give feedback on how this project is helpful or how it could be different to be much more helpful
- Tell me what's wrong/missing; point me to sources on what labs should do or what [...]

---

First published: April 30th, 2024

Source: https://www.lesswrong.com/posts/N2r9EayvsWJmLBZuF/introducing-ai-lab-watch

Linkpost URL: https://ailabwatch.org

---

Narrated by TYPE III AUDIO.
Apr 28, 2024 • 17min

Refusal in LLMs is mediated by a single direction

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee. This post is a preview of our upcoming paper, which will provide more detail on our current understanding of refusal. We thank Nina Rimsky and Daniel Paleka for helpful conversations and review.

Executive summary

Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."

We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model to refuse harmless requests.

The original text contained 8 footnotes which were omitted from this narration.

---

First published: April 27th, 2024

Source: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction

---

Narrated by TYPE III AUDIO.
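The "single direction" finding suggests a simple intervention: project that direction out of the residual stream, or add it back in, at inference time. A minimal sketch in PyTorch, assuming you already have a refusal `direction` and a hook that hands you residual-stream activations `resid` (both names hypothetical; the paper's actual procedure for finding the direction and choosing intervention points may differ):

```python
import torch

def ablate_direction(resid, direction):
    """Zero out the component of the residual stream along `direction`."""
    d = direction / direction.norm()
    return resid - (resid @ d).unsqueeze(-1) * d  # subtract projection onto d

def add_direction(resid, direction, alpha=1.0):
    """Steer toward refusal by adding the (scaled) direction back in."""
    return resid + alpha * direction
```

Applying `ablate_direction` at every layer corresponds to "preventing the model from representing this direction," while `add_direction` corresponds to artificially inducing refusal.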
Apr 24, 2024 • 4min

Funny Anecdote of Eliezer From His Sister

This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is: https://18forty.org/podcast/channah-cohen-the-crisis-of-experience/

David Bashevkin: So I want to shift now and I want to talk about something that, full disclosure, we recorded this once before, and you had major hesitation for obvious reasons. It's very sensitive what we're going to talk about right now, but really for something much broader, not just because it's a sensitive personal subject, but I think your hesitation has to do with: what does this have to do with the subject at hand? And I hope that becomes clear, but one of the things that has always absolutely fascinated me about [...]

---

First published: April 22nd, 2024

Source: https://www.lesswrong.com/posts/C7deNdJkdtbzPtsQe/funny-anecdote-of-eliezer-from-his-sister

---

Narrated by TYPE III AUDIO.
Apr 21, 2024 • 34min

Thoughts on seed oil

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he'd wait a couple months and renew his attack:

"When are you going to write about seed oils?"

"Did you know that seed oils are why there's so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?"

"Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?"

"Isn't it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?"

He'd often send screenshots of people reminding each other that Corn Oil is Murder and that it's critical that we overturn our lives to eliminate soybean/canola/sunflower/peanut oil and replace them with butter/lard/coconut/avocado/palm oil.

This confused [...]

---

First published: April 20th, 2024

Source: https://www.lesswrong.com/posts/DHkkL2GxhxoceLzua/thoughts-on-seed-oil

Linkpost URL: https://dynomight.net/seed-oil/

---

Narrated by TYPE III AUDIO.
Apr 19, 2024 • 13min

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Yesterday Adam Shai put up a cool post which… well, take a look at the visual:

Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less.

I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explainer for why on earth that fractal would show up, and partly as an explainer for the possibilities this work potentially opens up for interpretability.

One sentence summary: when tracking the hidden state of a hidden Markov model, a Bayesian's beliefs follow a chaos game (with the observations randomly selecting the update at each time), so [...]

---

First published: April 18th, 2024

Source: https://www.lesswrong.com/posts/mBw7nc4ipdyeeEpWs/why-would-belief-states-have-a-fractal-structure-and-why

---

Narrated by TYPE III AUDIO.
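To make the chaos-game claim concrete, here is a minimal sketch with a made-up three-state hidden Markov model (the matrices are illustrative, not the post's toy problem): each random observation selects which Bayesian update map gets applied, and the sequence of belief states traces out an attractor on the probability simplex.

```python
import numpy as np

# Illustrative 3-state HMM: T[i, j] = P(next state j | state i),
# E[i, k] = P(emit symbol k | state i).
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
E = np.array([[0.9, 0.05, 0.05],
              [0.05, 0.9, 0.05],
              [0.05, 0.05, 0.9]])

rng = np.random.default_rng(0)
state, belief = rng.integers(3), np.ones(3) / 3  # uniform prior
points = []
for _ in range(10_000):
    state = rng.choice(3, p=T[state])  # hidden dynamics step
    obs = rng.choice(3, p=E[state])    # emitted symbol
    # One Bayesian update: predict forward, then condition on obs.
    # The random observation picks which map gets applied -- exactly
    # the structure of a chaos game.
    belief = (belief @ T) * E[:, obs]
    belief /= belief.sum()
    points.append(belief.copy())
# `points` now samples the belief-state attractor on the 2-simplex.
```

Plotting `points` in simplex coordinates shows the attractor; for processes like the post's toy problem, it is a striking fractal.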
Apr 18, 2024 • 6min

Express interest in an “FHI of the West”

TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment and financial support to researchers and entrepreneurs in the intellectual tradition of the Future of Humanity Institute. Fill out this form or comment below to express interest in being involved either as a researcher, entrepreneurial founder-type, or funder.

The Future of Humanity Institute is dead:

I knew that this was going to happen in some form or another for a year or two, having heard through the grapevine and private conversations of FHI's university-imposed hiring freeze and fundraising block, and so I have been thinking about how to best fill the hole in the world that FHI left behind.

I think FHI was one of the best intellectual institutions in history. Many of the most important concepts[1] in my intellectual vocabulary were developed and popularized under its [...]

The original text contained 1 footnote which was omitted from this narration.

---

First published: April 18th, 2024

Source: https://www.lesswrong.com/posts/ydheLNeWzgbco2FTb/express-interest-in-an-fhi-of-the-west

---

Narrated by TYPE III AUDIO.
Apr 17, 2024 • 24min

Transformers Represent Belief State Geometry in their Residual Stream

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.

Introduction. What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because:

- We have a formalism that relates training data to internal structures in LLMs.
- Conceptually, our results mean that LLMs synchronize to their internal world model as they move [...]

The original text contained 10 footnotes which were omitted from this narration.

---

First published: April 16th, 2024

Source: https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their

---

Narrated by TYPE III AUDIO.
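As a rough illustration of how one might test whether belief-state geometry is linearly embedded in the residual stream (a sketch under assumed names, not necessarily the paper's methodology): gather activations `resid` and the corresponding ground-truth Bayesian beliefs `beliefs`, fit an affine least-squares probe, and compare the projected points to the true simplex geometry.

```python
import numpy as np

def fit_belief_probe(resid, beliefs):
    """Affine least-squares map from activations to belief states.

    resid:   (n_tokens, d_model) residual-stream activations
    beliefs: (n_tokens, n_states) true Bayesian beliefs over hidden states
    Returns the probe's predictions, for comparison with the true geometry.
    """
    A = np.hstack([resid, np.ones((len(resid), 1))])  # add a bias column
    W, *_ = np.linalg.lstsq(A, beliefs, rcond=None)   # min ||A @ W - beliefs||
    return A @ W
```

If the projected points reproduce the belief-state fractal, that is evidence the geometry really is carried linearly in the activations.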
Apr 16, 2024 • 2min

Paul Christiano named as US AI Safety Institute Head of AI Safety

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President's landmark Executive Order.

Paul Christiano, Head of AI Safety, will design and conduct tests of frontier AI models, focusing on model evaluations for capabilities of national security [...]

---

First published: April 16th, 2024

Source: https://www.lesswrong.com/posts/63X9s3ENXeaDrbe5t/paul-christiano-named-as-us-ai-safety-institute-head-of-ai

Linkpost URL: https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety

---

Narrated by TYPE III AUDIO.
Apr 12, 2024 • 13min

[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman

Eric Neyman, a PhD candidate, discusses his thesis, Algorithmic Bayesian Epistemology, exploring topics such as forecasting, rationalist communities, incentivizing experts, and robust aggregation of signals. He delves into the challenges of reaching agreement in forecasting, deductive reasoning algorithms, algorithmic mechanism design, and decision-making constraints.
