
The MAD Podcast with Matt Turck: AI is Already Building AI | Google DeepMind’s Mostafa Dehghani
Apr 2, 2026
Mostafa Dehghani is an AI researcher at Google DeepMind known for the Universal Transformer, the Vision Transformer, and multimodal Gemini work. He unpacks what looping and recursive self-improvement mean in practice, highlights bottlenecks such as evaluation, the limits of formal verification, and model collapse, and discusses the shift from pre-training to post-training and why continual learning is the next big frontier.
AI Snips
Internship On Transformers Changed His Career
- Mostafa got into transformers through an internship at Google Brain: he initially doubted the idea, but joined the transformer team and it changed his career path.
- That internship led directly to his work on Universal Transformers and recursion-in-depth ideas.
Universal Transformers Pioneered Test Time Looping
- Universal Transformers introduced looping, reusing parameters across depth with adaptive computation, so a model can spend extra compute at test time on hard problems without adding parameters (see the sketch after this list).
- This idea presaged test-time compute scaling and influenced later concepts like sparsity and mixture-of-experts tradeoffs.
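
A minimal sketch of the looping idea described above, not Dehghani et al.'s implementation: one parameter-shared block is applied repeatedly in depth, so extra test-time compute comes from more iterations rather than more parameters. All names here (shared_block, max_steps, halt_threshold) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
W = rng.normal(scale=0.1, size=(d_model, d_model))  # the single, shared weight matrix

def shared_block(x):
    # Stand-in for a full transformer layer (attention + MLP); the key point is
    # that the *same* weights are reused at every depth step.
    return x + np.tanh(x @ W)

def universal_transformer_loop(x, max_steps=8, halt_threshold=1e-3):
    # Loop in depth: keep applying the shared block until the update is small
    # or the step budget runs out (a crude stand-in for adaptive computation time).
    for step in range(max_steps):
        x_next = shared_block(x)
        if np.linalg.norm(x_next - x) < halt_threshold:
            return x_next, step + 1
        x = x_next
    return x, max_steps

tokens = rng.normal(size=(5, d_model))        # five "tokens"
out, steps_used = universal_transformer_loop(tokens)
print(f"ran {steps_used} recurrent depth steps")
```

Harder inputs simply take more loop iterations before halting, which is the test-time compute knob the episode refers to.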
Vision Transformer Succeeded With A Simple Patch Idea
- The Vision Transformer arose from a simple experiment: split an image into 16x16 patches, treat each patch as a token, and scale up a standard transformer; the simple patchify idea outperformed more elaborate designs that tried to mimic convolutions (a sketch of the patchify step follows below).
- Mostafa notes his surprise that such a straightforward patch approach yielded strong representations at scale.
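
A minimal sketch of the patchify step described above, assuming a 224x224 RGB image and 16x16 patches; the names (patchify, embed_dim) are illustrative, not the original ViT code.

```python
import numpy as np

def patchify(image, patch=16):
    # image: (H, W, C) with H and W divisible by `patch`.
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return patches.reshape(-1, patch * patch * C)     # one flattened row per patch token

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = patchify(image)                              # (196, 768): 14x14 patch tokens

# Linear projection of each flattened patch into the transformer's embedding space.
embed_dim = 128
W_embed = rng.normal(scale=0.02, size=(tokens.shape[1], embed_dim))
token_embeddings = tokens @ W_embed                   # (196, 128), fed to a standard transformer
print(token_embeddings.shape)
```

The point of the sketch is how little image-specific machinery is involved: flatten patches, project them, and hand the sequence to an ordinary transformer.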

