
RoboPapers Ep#61: 1X World Model
Feb 4, 2026
Daniel Ho, Director of Evaluations at 1X, builds world-model-based control for humanoid robots. He describes using internet and egocentric video as imagined worlds in which to generate zero-shot robot behaviors. The conversation covers how prompts and action labels guide imagined rollouts, training recipes spanning web, egocentric, and robot data, evaluation with learned simulators, and challenges such as contact-rich tasks and latency.
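As a rough illustration of the prompt-guided imagined rollouts mentioned above, here is a minimal sketch; the `imagine` function, the world-model interface, and all names are hypothetical illustrations, not 1X's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ImaginedRollout:
    prompt: str                       # text instruction steering the rollout
    frames: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def imagine(world_model, prompt, horizon=16):
    """Roll a learned world model forward, letting the prompt steer the
    predicted frames and an action head label each imagined step."""
    rollout = ImaginedRollout(prompt)
    state = world_model["initial_state"]
    for _ in range(horizon):
        state = world_model["step"](state, prompt)           # predict next state/frame
        rollout.frames.append(state)
        rollout.actions.append(world_model["label"](state))  # action label per step
    return rollout

# Trivial stand-ins just to make the sketch executable:
wm = {"initial_state": 0,
      "step": lambda s, p: s + 1,
      "label": lambda s: f"action_{s}"}
print(imagine(wm, "pick up the cup", horizon=3).actions)
```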
Robot Data Primarily Teaches Morphology Not Tasks
- Robot data serves mainly to teach morphology and kinematics rather than task breadth.
- 1X used a pick-and-place robot dataset as a 'shim' so the model learns range-of-motion constraints while generalizing tasks from human video (see the sketch after this list).
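A minimal sketch of that kind of data mixture, assuming a weighted-sampling recipe; the function, the dataset names, and the 10% robot fraction are illustrative assumptions, not figures from the episode.

```python
import random

def mix_clips(human_clips, robot_clips, robot_fraction=0.1):
    """Yield training clips: mostly human video for task breadth, plus a
    thin slice of robot data so the model absorbs range-of-motion limits."""
    while True:
        pool = robot_clips if random.random() < robot_fraction else human_clips
        yield random.choice(pool)

stream = mix_clips(["open_drawer.mp4", "fold_towel.mp4"],
                   ["pick_place_001.mp4"])
print([next(stream) for _ in range(5)])
```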
Contact-Rich Tasks Need Targeted Data Or Autonomy Rollouts
- Hard contact-rich tasks such as scrubbing dishes remain challenging for zero-shot world models and need targeted data or autonomous rollouts to improve.
- 1X scored ~20% on scrubbing; they plan to hill-climb by collecting autonomy rollouts and training on successes/failures (loop sketched after this list).
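A hedged sketch of that hill-climbing loop; the rollout and training functions are toy stand-ins for real autonomous execution and a real gradient update, and only the ~20% starting rate comes from the episode.

```python
import random

def rollout(policy, task):
    """Stand-in for running the policy autonomously and scoring the outcome."""
    return {"task": task, "success": random.random() < policy["skill"]}

def train_on_outcomes(policy, labeled):
    """Stand-in for an update on success/failure-labeled rollouts; the
    episode trains on both outcomes rather than successes alone.  The toy
    rule below just improves more when the failure rate is high."""
    rate = sum(r["success"] for r in labeled) / len(labeled)
    return {"skill": min(1.0, policy["skill"] + 0.1 * (1.0 - rate))}

policy = {"skill": 0.2}                 # ~20% on scrubbing, per the episode
for round_ in range(5):                 # hill-climb round by round
    labeled = [rollout(policy, "scrub_dish") for _ in range(100)]
    policy = train_on_outcomes(policy, labeled)
    print(round_, policy["skill"])
```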
Third-Person Video Still Valuable For Humanoid Transfer
- Learning from exocentric (third-person) video remains valuable because humanoid morphology matches the human form, making transfer data-efficient.
- 1X leverages both exocentric web-scale video and egocentric mid-training to exploit this human similarity (staged recipe sketched below).
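A sketch of a staged recipe consistent with that framing; the stage names, their ordering, and the `run_stage` stub are assumptions layered on the episode's web pretraining, egocentric mid-training, and robot-data progression.

```python
STAGES = [
    ("pretrain", "exocentric_web_video"),    # broad visual and task priors
    ("midtrain", "egocentric_human_video"),  # first-person viewpoint
    ("finetune", "robot_pick_and_place"),    # morphology/kinematics shim
]

def run_stage(model, stage, dataset_name):
    """Stand-in for one training phase; a real recipe would pick its own
    steps, learning rate, and data weighting per stage."""
    model["history"].append((stage, dataset_name))
    return model

model = {"history": []}
for stage, dataset_name in STAGES:
    model = run_stage(model, stage, dataset_name)
print(model["history"])
```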

