"Gemma Needs Help" by Anna Soligo

LessWrong (Curated & Popular)

Interpretability and internal emotion reduction

Anna summarizes the layer ablations and the evidence that DPO (Direct Preference Optimization) reduces internal representations of negative emotion across layers.

Starts at 12:10.