
The Hidden Engine of Vision with Peyman Milanfar (Google)
The Information Bottleneck
How Denoising Secretly Powers Everything in AI
Peyman Milanfar is a Distinguished Scientist at Google, leading its Computational Imaging team. He's a member of the National Academy of Engineering, an IEEE Fellow, and one of the key people behind the Pixel camera pipeline. Before Google, he was a professor at UC Santa Cruz for 15 years and helped build the imaging pipeline for Google Glass at Google X. His work has over 35,000 citations.
Peyman makes a provocative case that denoising, long dismissed as a boring cleanup task, is actually one of the most fundamental operations in modern ML, on par with SGD and backprop. Knowing how to remove noise from a signal basically means you have a map of the manifold that signals live on, and that insight connects everything from classical inverse problems to diffusion models.
We go from early patch-based denoisers to his 2010 "Is Denoising Dead?" paper, and then to the question that redirected his research: if denoising is nearly solved, what else can denoisers do? That led to Regularization by Denoising (RED), which, if you unroll it, looks a lot like a diffusion process, years before diffusion models existed. We also cover how his team shipped a one-step diffusion model on the Pixel phone for 100x ProRes Zoom, the perception-distortion-authenticity tradeoff in generative imaging, and a new paper on why diffusion models don't actually need noise conditioning. The conversation wraps with a debate on why language has dominated the AI spotlight while vision lags, and Peyman's argument that visual intelligence, grounded in physics and robotics, is coming next.
Timeline
0:00 Intro and Peyman's background
1:22 Why denoising matters more than you think; sensor diversity and Tesla's vision-only bet
15:04 BM3D and why it was secretly an MMSE estimator
17:02 "Is Denoising Dead?" then what else can denoisers do?
18:07 Plug-and-play methods and Regularization by Denoising (RED)
26:18 Denoising, manifolds, and the compression connection
28:12 Energy-based models vs. diffusion: "The Geometry of Noise"
31:40 Natural gradient descent and why flow models work
34:48 Gradient-free optimization and high-dimensional noise
45:13 Image quality and the perception-distortion tradeoff
48:39 Information theory, rate-distortion, and generative models
52:57 Denoising vs. editing
54:25 The changing role of theory
57:07 Hobbyist tools vs. shipping consumer products
59:40 Coding agents, vibe coding, and domain expertise
1:05:00 Vision and more complex, high-dimensional signals
1:09:31 Do models need to interact with the physical world?
1:11:28 Continual learning and novelty-driven updates
1:13:00 On-device learning and privacy
1:15:01 Why has language dominated AI? Is vision next?
1:17:14 How kids learn: vision first, language later
1:19:36 Academia vs. industry
1:22:28 10,000 citations vs. shipping to millions, why choose?
Music:
- "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- Changes: trimmed
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.


