Latent Space: The AI Engineer Podcast

The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

Feb 6, 2026
Myra Deng, Head of Product at Goodfire AI, where she turns interpretability research into production, and Mark Bissell, a mechanistic interpretability engineer with Palantir roots, discuss making model internals actionable. They cover lightweight probes, token-level safety filters, real-time steering of huge models, surgical post-training edits, and applying these tools across language, vision, and genomics.

Rakuten Token-Level PII Deployment

  • Rakuten uses Goodfire to detect token-level PII at inference time to avoid sending private data to third-party providers.
  • They faced tokenization quirks, multilingual constraints, and limits on synthetic-to-real transfer when deploying the system.
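The shape of such a deployment can be sketched as a per-token filter that redacts flagged tokens before the text leaves for a third-party provider. Everything here is hypothetical: `detect_pii_token` stands in for a trained per-token classifier, and the regex is a toy heuristic, not Goodfire's actual method.

```python
# Hypothetical sketch of token-level PII redaction before a third-party call.
# detect_pii_token stands in for a trained per-token probe; the regex below
# is only a toy placeholder for illustration.
import re

def detect_pii_token(token: str) -> bool:
    # Toy heuristic: flag email-like tokens or long digit runs.
    return bool(re.fullmatch(r"\S+@\S+|\d{6,}", token))

def redact(tokens: list[str]) -> list[str]:
    # Replace flagged tokens so private data never reaches the provider.
    return ["[PII]" if detect_pii_token(t) else t for t in tokens]

tokens = ["contact", "alice@example.com", "id", "12345678", "today"]
print(redact(tokens))  # -> ['contact', '[PII]', 'id', '[PII]', 'today']
```

Because the filter runs token by token, it sidesteps whole-document scanning, but it also inherits exactly the tokenization quirks the episode mentions: a PII string split across tokens may evade a per-token check.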

Prefer Lightweight Probes For Guardrails

  • Use interpretability probes for low-latency, inference-time guardrails instead of hosting a second large model.
  • Prefer lightweight probes to reduce cost and latency compared with black-box LM judges.
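A minimal sketch of why probes are cheap at inference time: a linear probe is just one weight vector over a layer's activations, so scoring a token costs a single dot product. The training data, hidden width, and names below are all hypothetical stand-ins; a real probe would be trained on activations captured from the served model.

```python
# Minimal sketch of a linear probe used as an inference-time guardrail.
# All names, shapes, and data are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # hypothetical hidden-state width

# Stand-in training data: activations labeled safe (0) / unsafe (1).
safe = rng.normal(0.0, 1.0, size=(200, HIDDEN))
unsafe = rng.normal(0.5, 1.0, size=(200, HIDDEN))
X = np.vstack([safe, unsafe])
y = np.array([0] * 200 + [1] * 200)

# Fit a logistic-regression probe with plain gradient descent:
# the whole "guardrail" is one weight vector and a bias.
w = np.zeros(HIDDEN)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * float(np.mean(p - y))

def probe_score(activation: np.ndarray) -> float:
    """Probability that a token's activation looks unsafe."""
    return float(1.0 / (1.0 + np.exp(-(activation @ w + b))))

# Scoring one token is a single dot product -- far cheaper than routing
# the text through a second large model acting as an LM judge.
```

The design trade-off is that the probe reads internal state the model has already computed, so the guardrail adds near-zero latency, at the cost of needing white-box access to activations.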

Real-Time Steering At Frontier Scale

  • Real-time activation steering can toggle behaviors (e.g., 'Gen-Z slang') on a 1T-parameter model without breaking tool use.
  • Steering can change demeanor and chain-of-thought while preserving functional abilities like tool calls.
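The core mechanic of activation steering can be sketched as adding a fixed "behavior" direction to a layer's hidden states at inference time. The toy data, hidden width, and function names below are hypothetical; real steering hooks into a served model's forward pass.

```python
# Minimal sketch of activation steering. Names, shapes, and data are
# hypothetical; a real system patches hidden states inside a live model.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # hypothetical hidden width

# A steering vector is commonly the difference of mean activations between
# prompts that exhibit the target behavior and prompts that don't.
with_behavior = rng.normal(1.0, 0.1, size=(50, HIDDEN))
without_behavior = rng.normal(0.0, 0.1, size=(50, HIDDEN))
steer = with_behavior.mean(axis=0) - without_behavior.mean(axis=0)

def steered_layer(h: np.ndarray, strength: float) -> np.ndarray:
    """Add the steering direction to every token's hidden state.

    strength=0.0 recovers the unmodified activations, so the behavior
    can be toggled on and off at inference time without retraining.
    """
    return h + strength * steer

hidden = rng.normal(size=(4, HIDDEN))  # activations for 4 tokens
steered = steered_layer(hidden, strength=2.0)
```

Because the edit is a small additive nudge in one direction rather than a weight change, it can shift demeanor while leaving most of the model's computation, and hence capabilities like tool calling, intact.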