RoboPapers

Ep#61: 1x World Model

Feb 4, 2026
Daniel Ho, Director of Evaluations at 1X, builds world-model-based control for humanoid robots. He describes using internet and egocentric video as "imagined worlds" to generate zero-shot robot behaviors. The conversation covers how prompts and action labels guide imagined rollouts, training recipes that mix web, egocentric, and robot data, evaluation with learned simulators, and challenges such as contact-rich tasks and latency.
INSIGHT

Robot Data Primarily Teaches Morphology Not Tasks

  • Robot data can serve mainly to teach morphology and kinematics rather than task breadth.
  • 1X used a pick-and-place robot dataset as a "shim" so the model learns range-of-motion constraints, while task generalization comes from human video.
INSIGHT

Contact Rich Tasks Need Targeted Data Or Autonomy Rollouts

  • Contact-rich tasks such as scrubbing dishes remain challenging for zero-shot world models and need targeted data or autonomous rollouts to improve.
  • 1X scored ~20% on scrubbing; they plan to hill-climb by collecting autonomy rollouts and training on both successes and failures.
INSIGHT

Third Person Video Still Valuable For Humanoid Transfer

  • Learning from exocentric (third-person) video remains valuable because humanoid morphology matches the human form, making transfer data-efficient.
  • 1X leverages both exocentric web-scale video and egocentric mid-training to exploit this human similarity.