
The Hidden Engine of Vision with Peyman Milanfar (Google)
The Information Bottleneck
How Denoising Secretly Powers Everything in AI
Peyman Milanfar is a Distinguished Scientist at Google, leading its Computational Imaging team. He's a member of the National Academy of Engineering, an IEEE Fellow, and one of the key people behind the Pixel camera pipeline. Before Google, he was a professor at UC Santa Cruz for 15 years and helped build the imaging pipeline for Google Glass at Google X. His work has over 35,000 citations.
Peyman makes a provocative case that denoising, long dismissed as a boring cleanup task, is actually one of the most fundamental operations in modern ML, on par with SGD and backprop. Knowing how to remove noise from a signal basically means you have a map of the manifold that signals live on, and that insight connects everything from classical inverse problems to diffusion models.
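The "map of the manifold" claim has a precise form for Gaussian noise: Tweedie's formula says the ideal (MMSE) denoiser equals the noisy observation plus a noise-variance-scaled score (gradient of the log-density) of the noisy signal, so a perfect denoiser implicitly knows where the data lives. A minimal 1-D sanity check with a standard-normal prior (illustrative, not from the episode):

```python
# Tweedie's formula, 1-D check: prior x ~ N(0, 1), observation y = x + n, n ~ N(0, sigma^2).
# Claim: MMSE denoiser E[x|y] = y + sigma^2 * d/dy log p(y),
# i.e. the denoising residual is a scaled score of the noisy density.
sigma = 0.5
y = 1.3  # an arbitrary noisy observation

# Closed-form MMSE denoiser for a Gaussian prior: shrink y toward the mean.
mmse = y / (1.0 + sigma**2)

# Marginal of y is N(0, 1 + sigma^2), so its score is d/dy log p(y) = -y / (1 + sigma^2).
score = -y / (1.0 + sigma**2)

# Tweedie's formula reproduces the MMSE estimate from the score alone.
tweedie = y + sigma**2 * score

assert abs(mmse - tweedie) < 1e-12
print(round(mmse, 4))  # → 1.04
```

The same identity, with a learned denoiser standing in for the exact score, is what links RED-style iterations to diffusion sampling.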
We go from early patch-based denoisers to his 2010 "Is Denoising Dead?" paper, and then to the question that redirected his research: if denoising is nearly solved, what else can denoisers do? That led to Regularization by Denoising (RED), which, if you unroll it, looks a lot like a diffusion process, years before diffusion models existed. We also cover how his team shipped a one-step diffusion model on the Pixel phone for 100x ProRes Zoom, the perception-distortion-authenticity tradeoff in generative imaging, and a new paper on why diffusion models don't actually need noise conditioning. The conversation wraps with a debate on why language has dominated the AI spotlight while vision lags, and Peyman's argument that visual intelligence, grounded in physics and robotics, is coming next.
Timeline
0:00 Intro and Peyman's background
1:22 Why denoising matters more than you think; sensor diversity and Tesla's vision-only bet
15:04 BM3D and why it was secretly an MMSE estimator
17:02 "Is Denoising Dead?" then what else can denoisers do?
18:07 Plug-and-play methods and Regularization by Denoising (RED)
26:18 Denoising, manifolds, and the compression connection
28:12 Energy-based models vs. diffusion: "The Geometry of Noise"
31:40 Natural gradient descent and why flow models work
34:48 Gradient-free optimization and high-dimensional noise
45:13 Image quality and the perception-distortion tradeoff
48:39 Information theory, rate-distortion, and generative models
52:57 Denoising vs. editing
54:25 The changing role of theory
57:07 Hobbyist tools vs. shipping consumer products
59:40 Coding agents, vibe coding, and domain expertise
1:05:00 Vision and more complex, higher-dimensional signals
1:09:31 Do models need to interact with the physical world?
1:11:28 Continual learning and novelty-driven updates
1:13:00 On-device learning and privacy
1:15:01 Why has language dominated AI? Is vision next?
1:17:14 How kids learn: vision first, language later
1:19:36 Academia vs. industry
1:22:28 10,000 citations vs. shipping to millions, why choose?
Music:
- "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- Changes: trimmed
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.


