The Information Bottleneck

Training Is Nothing Like Learning with Naomi Saphra (Harvard)

Mar 13, 2026
Naomi Saphra, Kempner Research Fellow at Harvard and incoming assistant professor at Boston University, studies training dynamics and interpretability in deep learning. She explains why training is more like evolution than human learning. Topics include grokking and hidden phase transitions, symmetry breaking and head specialization, how code and tokenization shape behavior, and why run-to-run non-determinism matters.
INSIGHT

Multilingual Models Build A Shared Latent Interlingua

  • Multilingual models develop an interlingua-like geometry that enables cross-lingual transfer, though it is not literally translation into English.
  • Naomi observes that Chinese models often mix languages in chain-of-thought, suggesting a shared latent space whose patterns resemble English; a simple probe of such a space is sketched below.
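To make the shared-latent-space idea concrete, here is a minimal probe sketch: embed translations of one sentence with a multilingual encoder and check that they cluster together while unrelated text does not. The checkpoint named is a real sentence-transformers model, but the sentences and the whole setup are illustrative assumptions, not from the episode.

```python
# Probe for a shared multilingual latent space: translations of the same
# sentence should land near each other; unrelated text should not.
# Assumes the sentence-transformers package; the checkpoint is a real
# multilingual model, the sentences are illustrative.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The cat sleeps on the warm windowsill.",             # English
    "Le chat dort sur le rebord de la fenêtre.",          # French
    "猫在温暖的窗台上睡觉。",                                # Chinese
    "Stock prices fell sharply after the announcement.",  # unrelated control
]
emb = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity matrix: the three translations should score high with
# each other and low against the control sentence.
print(np.round(emb @ emb.T, 2))
```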
INSIGHT

Image Labels Embed Human Priorities

  • Image classification is inherently multimodal because labels map images into human language, biasing datasets toward what humans choose to express.
  • Naomi points out that ImageNet labels reflect WordNet choices, so datasets encode human salience rather than a raw visual taxonomy; the mapping is sketched below.
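The WordNet dependence is mechanical: an ImageNet class ID is literally a part-of-speech tag plus a WordNet synset offset, so the label set inherits WordNet's human-made taxonomy. A small sketch, assuming a recent nltk with its WordNet corpus; the helper name is mine, and the wnid shown is ImageNet's real ID for the dog synset.

```python
# ImageNet class IDs ("wnids") like n02084071 are a POS tag plus a WordNet
# offset, so every label is a node someone chose in WordNet's taxonomy.
# Assumes a recent nltk with the WordNet data downloaded.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def imagenet_wnid_to_synset(wnid: str):
    """Decode an ImageNet wnid into its WordNet synset (hypothetical helper)."""
    return wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))

syn = imagenet_wnid_to_synset("n02084071")   # ImageNet's "dog" class
print(syn.name())                            # dog.n.01
print(syn.definition())                      # WordNet's human-written gloss
print([h.name() for h in syn.hypernyms()])   # where humans placed it in the tree
```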
INSIGHT

Sparse Autoencoders Are Topic Models On Activations

  • Sparse autoencoders trained on internal activations are essentially topic models, and risk finding features that merely reflect input statistics rather than mechanisms.
  • Naomi warns that overcomplete dictionaries and random-seed variability make SAE features unreliable as mechanistic evidence; a minimal sketch follows below.
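For orientation, a minimal sparse-autoencoder sketch in the topic-model spirit of the insight: an overcomplete ReLU dictionary trained with an L1 penalty on activation vectors. The dimensions, penalty weight, and synthetic "activations" are illustrative assumptions, not the episode's setup; the seed line is exactly the run-to-run variability the warning is about.

```python
# Minimal sparse autoencoder over cached activations. The sparse codes z
# play the role of topic weights and the decoder rows the role of topics,
# which is why SAE features can track input statistics rather than
# mechanisms. All sizes and data here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)                  # a different seed yields different features
d_model, d_dict = 256, 2048           # overcomplete dictionary: d_dict >> d_model
acts = torch.randn(10_000, d_model)   # stand-in for real cached activations

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # nonnegative sparse codes
        return self.dec(z), z

sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                       # sparsity pressure on the codes

for step in range(1_000):
    batch = acts[torch.randint(0, len(acts), (256,))]
    recon, z = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Retraining with a different seed and comparing the learned dictionaries is a quick way to see the feature instability the insight describes.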