
Introduction to Mechanistic Interpretability
BlueDot Narrated
Intervening in model processes
Perrin Walker discusses isolating harmful parts of networks and feature steering to modify outputs.
Transcript
