"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis cover image

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

00:00

Avoiding reward hacking with the frozen-probe trick

Dan explains how running probes on a frozen model prevents obfuscation, and why backpropagating through probes fails.

Play episode from 50:25
