RoboPapers

Ep#57: Learning Dexterity from Human Videos with Gen2Act and SPIDER

Jan 6, 2026
Homanga Bharadhwaj, research scientist at Meta Reality Labs and incoming Johns Hopkins assistant professor, works on teaching robots from human video. He discusses Gen2Act, which generates a human video of a task from a language prompt and an initial scene image, then uses it to guide robot actions. He also covers SPIDER, which retargets human hand and object motion through simulation for dexterous, contact-rich tasks.
AI Snips
INSIGHT

Use Human Rather Than Robot Video

  • Pretrained video models excel at generating human videos but struggle to produce plausible robot-specific embodiments.
  • Generating human videos instead lets you use a pretrained model zero-shot, keeping its generalization across scenes and tasks.
ANECDOTE

Fine-Tuning Collapsed Generalization

  • The team tried fine-tuning a video model on robot data and saw it collapse to the limited robot distribution.
  • Generating human videos zero-shot preserved the broader generalization that fine-tuning lost.
ADVICE

Co-Train With Generated Video Pairs

  • Train robot policies by pairing each robot trajectory with a human video generated from the same initial frame and language prompt (see the sketch after this list).
  • Use auxiliary motion prediction losses to force the policy to attend to motion cues in the generated video.
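To make the recipe concrete, here is a minimal PyTorch sketch of one co-training update. It assumes precomputed features for the generated human video and precomputed motion-track targets; the class and function names, feature dimensions, and the MSE losses are illustrative assumptions, not Gen2Act's actual implementation.

```python
# Hypothetical sketch of the co-training recipe described above.
# Names and dimensions are illustrative, not Gen2Act's actual API.
import torch
import torch.nn as nn

class CoTrainedPolicy(nn.Module):
    """Policy conditioned on a generated human video plus the robot observation."""
    def __init__(self, video_dim=512, obs_dim=64, act_dim=7, track_dim=32):
        super().__init__()
        self.video_encoder = nn.Sequential(nn.Linear(video_dim, 256), nn.ReLU())
        self.obs_encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.action_head = nn.Linear(512, act_dim)
        # Auxiliary head: predict motion cues (e.g. point tracks) from the video
        # features, forcing the policy to attend to the generated video's motion.
        self.track_head = nn.Linear(256, track_dim)

    def forward(self, video_feat, obs):
        v = self.video_encoder(video_feat)
        o = self.obs_encoder(obs)
        action = self.action_head(torch.cat([v, o], dim=-1))
        tracks = self.track_head(v)
        return action, tracks

def co_training_step(policy, optimizer, batch, aux_weight=0.1):
    """One update on (generated human video, robot obs, action, motion target) tuples.

    Each robot trajectory is paired offline with a human video generated from
    the same initial frame and language prompt; here we assume video features
    and motion-track targets were extracted in advance.
    """
    pred_action, pred_tracks = policy(batch["video_feat"], batch["obs"])
    bc_loss = nn.functional.mse_loss(pred_action, batch["action"])        # behavior cloning
    aux_loss = nn.functional.mse_loss(pred_tracks, batch["track_target"]) # motion prediction
    loss = bc_loss + aux_weight * aux_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for real features.
policy = CoTrainedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
batch = {
    "video_feat": torch.randn(8, 512),   # features of the generated human video
    "obs": torch.randn(8, 64),           # robot observation features
    "action": torch.randn(8, 7),         # demonstrated robot actions
    "track_target": torch.randn(8, 32),  # motion cues extracted from the video
}
print(co_training_step(policy, opt, batch))
```

The auxiliary weight trades off action imitation against motion prediction; the key design point is that the policy cannot minimize the auxiliary loss without reading the generated video, which is what ties the robot trajectory to its paired human video.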