
Introduction to Mechanistic Interpretability
BlueDot Narrated
Early feature steering results
Perrin Walker describes early attempts to weaken or amplify individual features in order to steer model outputs, illustrated with playful examples.
Chapter starts at 11:14