The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

Latent Space: The AI Engineer Podcast

Post-training as first wedge for edits

Myra and Mark argue that surgical post-training edits can fix reward hacking, sycophancy, and other unwanted behaviors.
