Instrumenting subspaces to detect bad outputs

Alizishaan describes identifying the subregions of a model that activate during undesirable generations, so that problems can be detected and stopped early.

Play episode from 23:22

