The Information Bottleneck

Training Is Nothing Like Learning with Naomi Saphra (Harvard)

Mar 13, 2026
Naomi Saphra, Kempner Research Fellow at Harvard and incoming assistant professor at Boston University, studies training dynamics and interpretability in deep learning. She explains why training is more like evolution than human learning. Topics include grokking and hidden phase transitions, symmetry breaking and head specialization, how code and tokenization shape behavior, and why run-to-run non-determinism matters.
INSIGHT

Multilingual Models Build A Shared Latent Interlingua

  • Multilingual models develop an interlingua-like geometry that enables cross-lingual transfer, though it is not literally translation into English.
  • Naomi observes that Chinese models often mix languages in chain-of-thought, suggesting a shared latent space whose patterns resemble English; a simple probe of such a space is sketched below.
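To make the shared-latent-space idea concrete, here is a minimal probe sketch: embed translations of one sentence with a multilingual encoder and check that they cluster together while unrelated text does not. The checkpoint named is a real sentence-transformers model, but the sentences and the whole setup are illustrative assumptions, not from the episode.

```python
# Probe for a shared multilingual latent space: translations of the same
# sentence should land near each other; unrelated text should not.
# Assumes the sentence-transformers package; the checkpoint is a real
# multilingual model, the sentences are illustrative.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The cat sleeps on the warm windowsill.",             # English
    "Le chat dort sur le rebord de la fenêtre.",          # French
    "猫在温暖的窗台上睡觉。",                                # Chinese
    "Stock prices fell sharply after the announcement.",  # unrelated control
]
emb = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity matrix: the three translations should score high with
# each other and low against the control sentence.
print(np.round(emb @ emb.T, 2))
```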
INSIGHT

Image Labels Embed Human Priorities

  • Image classification is inherently multimodal because labels map images into human language, biasing datasets toward what humans choose to express.
  • Naomi points out that ImageNet labels reflect WordNet choices, so datasets encode human salience rather than a raw visual taxonomy; the mapping is sketched below.
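The WordNet dependence is mechanical: an ImageNet class ID is literally a part-of-speech tag plus a WordNet synset offset, so the label set inherits WordNet's human-made taxonomy. A small sketch, assuming a recent nltk with its WordNet corpus; the helper name is mine, and the wnid shown is ImageNet's real ID for the dog synset.

```python
# ImageNet class IDs ("wnids") like n02084071 are a POS tag plus a WordNet
# offset, so every label is a node someone chose in WordNet's taxonomy.
# Assumes a recent nltk with the WordNet data downloaded.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def imagenet_wnid_to_synset(wnid: str):
    """Decode an ImageNet wnid into its WordNet synset (hypothetical helper)."""
    return wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))

syn = imagenet_wnid_to_synset("n02084071")   # ImageNet's "dog" class
print(syn.name())                            # dog.n.01
print(syn.definition())                      # WordNet's human-written gloss
print([h.name() for h in syn.hypernyms()])   # where humans placed it in the tree
```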
INSIGHT

Sparse Autoencoders Are Topic Models On Activations

  • Sparse autoencoders trained on internal activations are essentially topic models, and risk finding features that merely reflect input statistics rather than mechanisms.
  • Naomi warns that overcomplete dictionaries and random-seed variability make SAE features unreliable as mechanistic evidence; a minimal sketch follows below.
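For orientation, a minimal sparse-autoencoder sketch in the topic-model spirit of the insight: an overcomplete ReLU dictionary trained with an L1 penalty on activation vectors. The dimensions, penalty weight, and synthetic "activations" are illustrative assumptions, not the episode's setup; the seed line is exactly the run-to-run variability the warning is about.

```python
# Minimal sparse autoencoder over cached activations. The sparse codes z
# play the role of topic weights and the decoder rows the role of topics,
# which is why SAE features can track input statistics rather than
# mechanisms. All sizes and data here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)                  # a different seed yields different features
d_model, d_dict = 256, 2048           # overcomplete dictionary: d_dict >> d_model
acts = torch.randn(10_000, d_model)   # stand-in for real cached activations

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # nonnegative sparse codes
        return self.dec(z), z

sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                       # sparsity pressure on the codes

for step in range(1_000):
    batch = acts[torch.randint(0, len(acts), (256,))]
    recon, z = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Retraining with a different seed and comparing the learned dictionaries is a quick way to see the feature instability the insight describes.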