
The MAD Podcast with Matt Turck: AI is Already Building AI | Google DeepMind’s Mostafa Dehghani
Apr 2, 2026
Mostafa Dehghani is an AI researcher at Google DeepMind known for the Universal Transformer, the Vision Transformer, and multimodal Gemini work. He unpacks what looping and recursive self-improvement mean in practice, highlights bottlenecks such as evaluation, the limits of formal verification, and model collapse, and discusses the shift from pre-training to post-training and why continual learning is the next big frontier.
AI Snips
Internship On Transformers Changed His Career
- Mostafa got into transformers through an internship at Google Brain: he initially doubted the idea, but joined the transformer team and it changed his career path.
- That internship led directly to his work on Universal Transformers and recursion-in-depth ideas.
Universal Transformers Pioneered Test Time Looping
- Universal Transformers introduced looping, reusing parameters across depth with adaptive computation, so a model can spend extra compute at test time on hard problems without adding parameters (see the sketch after this list).
- This idea presaged test-time compute scaling and influenced later concepts like sparsity and mixture-of-experts tradeoffs.
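
A minimal sketch of the looping idea described above, not Dehghani et al.'s implementation: one parameter-shared block is applied repeatedly in depth, so extra test-time compute comes from more iterations rather than more parameters. All names here (shared_block, max_steps, halt_threshold) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
W = rng.normal(scale=0.1, size=(d_model, d_model))  # the single, shared weight matrix

def shared_block(x):
    # Stand-in for a full transformer layer (attention + MLP); the key point is
    # that the *same* weights are reused at every depth step.
    return x + np.tanh(x @ W)

def universal_transformer_loop(x, max_steps=8, halt_threshold=1e-3):
    # Loop in depth: keep applying the shared block until the update is small
    # or the step budget runs out (a crude stand-in for adaptive computation time).
    for step in range(max_steps):
        x_next = shared_block(x)
        if np.linalg.norm(x_next - x) < halt_threshold:
            return x_next, step + 1
        x = x_next
    return x, max_steps

tokens = rng.normal(size=(5, d_model))        # five "tokens"
out, steps_used = universal_transformer_loop(tokens)
print(f"ran {steps_used} recurrent depth steps")
```

Harder inputs simply take more loop iterations before halting, which is the test-time compute knob the episode refers to.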
Vision Transformer Succeeded With A Simple Patch Idea
- The Vision Transformer arose from a simple experiment: split an image into 16x16 patches, treat each patch as a token, and scale up a standard transformer; the simple patchify idea outperformed more elaborate designs that tried to mimic convolutions (a sketch of the patchify step follows below).
- Mostafa notes his surprise that such a straightforward patch approach yielded strong representations at scale.
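
A minimal sketch of the patchify step described above, assuming a 224x224 RGB image and 16x16 patches; the names (patchify, embed_dim) are illustrative, not the original ViT code.

```python
import numpy as np

def patchify(image, patch=16):
    # image: (H, W, C) with H and W divisible by `patch`.
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return patches.reshape(-1, patch * patch * C)     # one flattened row per patch token

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = patchify(image)                              # (196, 768): 14x14 patch tokens

# Linear projection of each flattened patch into the transformer's embedding space.
embed_dim = 128
W_embed = rng.normal(scale=0.02, size=(tokens.shape[1], embed_dim))
token_embeddings = tokens @ W_embed                   # (196, 128), fed to a standard transformer
print(token_embeddings.shape)
```

The point of the sketch is how little image-specific machinery is involved: flatten patches, project them, and hand the sequence to an ordinary transformer.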

