
Controlling AI Models from the Inside
Practical AI
Instrumenting subspaces to detect bad outputs
Alizishaan describes identifying the subregions of a model that activate during undesirable generations, so that problems can be stopped early.
Play episode from 23:22
Transcript


