Y Combinator Startup Podcast

The GPT Moment for Robotics Is Here

Apr 16, 2026
Quan Vuong, co-founder of Physical Intelligence and a robotics researcher behind RT-2, explores why robotics may be hitting its GPT moment. He gets into cross-embodiment training, the data bottleneck, and zero-shot progress. The conversation also covers cloud-run robot models, laundry folding and warehouse packing, and why cheaper hardware could spark a wave of focused robotics startups.
INSIGHT

Vision Language Models Finally Crossed Into Control

  • Robotics improved when vision-language models supplied semantics and planning, then RT-2 and PaLM-E adapted that knowledge into robot actions.
  • Quan Vuong said robots could place a Coke can next to a picture of Taylor Swift despite having zero robot data for Taylor Swift or those objects.
INSIGHT

Cross Embodiment Training Beat Robot Specialists

  • Training across many robot embodiments beat specialist policies, suggesting models learn abstract control rather than one robot's quirks.
  • In Open X-Embodiment, the generalist trained on 10 platforms performed 50% better than specialists optimized for single embodiments.
INSIGHT

The Real Robotics Bottleneck Is Data Capture

  • Robotics data scarcity is partly a capture problem; useful robot experience exists, but teams rarely store it in trainable form.
  • Quan Vuong argues that cross-embodiment infrastructure scales faster than manufacturing one standard robot. Some hard tasks that once needed hundreds of hours now work zero-shot.