Latent Space: The AI Engineer Podcast

Owning the AI Pareto Frontier — Jeff Dean

Feb 12, 2026
Jeff Dean, Google’s Chief AI Scientist and TPU co-designer, reflects on decades of scaling ML and hardware co-design. He talks about owning the Pareto frontier with both Pro and low-latency Flash models. Topics include distillation as the engine for deployable models, energy- and latency-first design, sparse trillion-parameter networks, long-context ambitions, and hardware-software co-design for future AI systems.

Keep Personal Data In Retrieval, Not Core Model

  • Use private retrieval plus a single general model instead of embedding personal data into base models.
  • Jeff Dean suggests retrieval-as-a-tool for personal assistants so the core model stays general and upgradable.
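The idea above — keep personal data out of the base model and inject it at query time — can be sketched as a tiny retrieval-as-a-tool loop. Everything here (the store, `retrieve`, `build_prompt`) is a hypothetical illustration, not an actual Google system: the private store is queried per request, and only the retrieved snippets reach the general model's prompt, so the model itself stays upgradable.

```python
# Sketch: private retrieval + one general model (all names are illustrative).
# Personal data lives only in PRIVATE_STORE; the base model never trains on it.

PRIVATE_STORE = [
    "Flight to Tokyo departs March 3 at 9:40.",
    "Dentist appointment on March 5.",
    "Anna's birthday is March 7.",
]

def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Rank private documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(store, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, store: list[str]) -> str:
    """Inject retrieved personal context; the core model stays untouched."""
    context = "\n".join(retrieve(query, store))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("When is my flight to Tokyo?", PRIVATE_STORE)
```

Swapping the base model for a newer one requires no retraining on personal data — only the prompt-building step sees the private store.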

Staged Retrieval Mimics Search For Trillions

  • LLM-based systems will mirror search pipelines: filter trillions to thousands to a final small set for reasoning.
  • Jeff Dean describes staged retrieval (30k → 117 docs) to give the illusion of attending to the whole internet.
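The funnel described above — a cheap filter narrows a huge corpus before an expensive scorer picks the final set — can be sketched as a two-stage pipeline. The corpus size, scoring functions, and stage widths here are toy assumptions standing in for the trillions-to-thousands-to-dozens pipeline of a real search stack.

```python
# Sketch of staged retrieval: each stage is more expensive per document
# but sees far fewer documents than the one before it.

corpus = [f"doc-{i} about topic-{i % 50}" for i in range(10_000)]

def cheap_filter(query_topic: str, docs: list[str], keep: int) -> list[str]:
    """Stage 1: fast lexical match narrows the full corpus."""
    hits = [d for d in docs if d.endswith(query_topic)]
    return hits[:keep]

def rerank(docs: list[str], keep: int) -> list[str]:
    """Stage 2: a stand-in for a heavier learned scorer keeps the best few."""
    return sorted(docs)[:keep]

stage1 = cheap_filter("topic-7", corpus, keep=100)   # 10,000 -> 100
finalists = rerank(stage1, keep=5)                   # 100 -> 5 for reasoning
```

Only the small final set would be handed to the model's context window, giving the "illusion" of attending to the whole corpus at a fraction of the cost.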

Energy, Not FLOPs, Drives Efficiency

  • Energy per operation (picojoules) matters more than FLOPs when designing ML systems.
  • Jeff Dean analyzes batching and data motion: moving parameters costs far more energy than a single multiply.
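The batching argument above can be made concrete with a back-of-envelope energy model: fetching a parameter from DRAM costs orders of magnitude more energy than multiplying by it, and batching amortizes that movement cost across many tokens. The picojoule constants below are illustrative assumptions for the analysis, not measured figures from the episode.

```python
# Back-of-envelope: energy per token = compute + (parameter movement / batch).
# All constants are rough illustrative assumptions.
MULT_PJ = 1.0            # one low-precision multiply-add, ~1 pJ
DRAM_PJ_PER_BYTE = 80.0  # DRAM read, hundreds of pJ per 64-bit word

def energy_per_token_pj(params: int, batch: int, bytes_per_param: int = 2) -> float:
    """One MAC per parameter, plus weight movement amortized over the batch."""
    compute = params * MULT_PJ
    movement = params * bytes_per_param * DRAM_PJ_PER_BYTE / batch
    return compute + movement

solo = energy_per_token_pj(params=1_000_000, batch=1)    # movement-dominated
big = energy_per_token_pj(params=1_000_000, batch=256)   # compute-dominated
```

Under these assumptions, moving an fp16 weight costs ~160 pJ versus ~1 pJ to use it, so an unbatched token pays roughly 100x the energy of a well-batched one — which is why energy-first design pushes hard on batching and data locality rather than raw FLOPs.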