
RoboPapers Ep#61: 1X World Model
Feb 4, 2026
Daniel Ho, Director of Evaluations at 1X, builds world-model-based control for humanoid robots. He describes using internet and egocentric video as imagined worlds in which to generate zero-shot robot behaviors. The conversation covers how prompts and action labels guide imagined rollouts, training recipes spanning web, egocentric, and robot data, evaluation with learned simulators, and challenges such as contact-rich tasks and latency.
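As a rough illustration of the prompt-guided imagined rollouts mentioned above, here is a minimal sketch; the `imagine` function, the world-model interface, and all names are hypothetical illustrations, not 1X's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ImaginedRollout:
    prompt: str                       # text instruction steering the rollout
    frames: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def imagine(world_model, prompt, horizon=16):
    """Roll a learned world model forward, letting the prompt steer the
    predicted frames and an action head label each imagined step."""
    rollout = ImaginedRollout(prompt)
    state = world_model["initial_state"]
    for _ in range(horizon):
        state = world_model["step"](state, prompt)           # predict next state/frame
        rollout.frames.append(state)
        rollout.actions.append(world_model["label"](state))  # action label per step
    return rollout

# Trivial stand-ins just to make the sketch executable:
wm = {"initial_state": 0,
      "step": lambda s, p: s + 1,
      "label": lambda s: f"action_{s}"}
print(imagine(wm, "pick up the cup", horizon=3).actions)
```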
Robot Data Primarily Teaches Morphology Not Tasks
- Robot data serves mainly to teach morphology and kinematics rather than task breadth.
- 1X used a pick-and-place robot dataset as a 'shim' so the model learns range-of-motion constraints while generalizing tasks from human video (see the sketch after this list).
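A minimal sketch of that kind of data mixture, assuming a weighted-sampling recipe; the function, the dataset names, and the 10% robot fraction are illustrative assumptions, not figures from the episode.

```python
import random

def mix_clips(human_clips, robot_clips, robot_fraction=0.1):
    """Yield training clips: mostly human video for task breadth, plus a
    thin slice of robot data so the model absorbs range-of-motion limits."""
    while True:
        pool = robot_clips if random.random() < robot_fraction else human_clips
        yield random.choice(pool)

stream = mix_clips(["open_drawer.mp4", "fold_towel.mp4"],
                   ["pick_place_001.mp4"])
print([next(stream) for _ in range(5)])
```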
Contact-Rich Tasks Need Targeted Data Or Autonomy Rollouts
- Hard contact-rich tasks such as scrubbing dishes remain challenging for zero-shot world models and need targeted data or autonomous rollouts to improve.
- 1X scored ~20% on scrubbing; they plan to hill-climb by collecting autonomy rollouts and training on successes/failures (loop sketched after this list).
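A hedged sketch of that hill-climbing loop; the rollout and training functions are toy stand-ins for real autonomous execution and a real gradient update, and only the ~20% starting rate comes from the episode.

```python
import random

def rollout(policy, task):
    """Stand-in for running the policy autonomously and scoring the outcome."""
    return {"task": task, "success": random.random() < policy["skill"]}

def train_on_outcomes(policy, labeled):
    """Stand-in for an update on success/failure-labeled rollouts; the
    episode trains on both outcomes rather than successes alone.  The toy
    rule below just improves more when the failure rate is high."""
    rate = sum(r["success"] for r in labeled) / len(labeled)
    return {"skill": min(1.0, policy["skill"] + 0.1 * (1.0 - rate))}

policy = {"skill": 0.2}                 # ~20% on scrubbing, per the episode
for round_ in range(5):                 # hill-climb round by round
    labeled = [rollout(policy, "scrub_dish") for _ in range(100)]
    policy = train_on_outcomes(policy, labeled)
    print(round_, policy["skill"])
```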
Third-Person Video Still Valuable For Humanoid Transfer
- Learning from exocentric (third-person) video remains valuable because humanoid morphology matches the human form, making transfer data-efficient.
- 1X leverages both exocentric web-scale video and egocentric mid-training to exploit this human similarity (staged recipe sketched below).
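A sketch of a staged recipe consistent with that framing; the stage names, their ordering, and the `run_stage` stub are assumptions layered on the episode's web pretraining, egocentric mid-training, and robot-data progression.

```python
STAGES = [
    ("pretrain", "exocentric_web_video"),    # broad visual and task priors
    ("midtrain", "egocentric_human_video"),  # first-person viewpoint
    ("finetune", "robot_pick_and_place"),    # morphology/kinematics shim
]

def run_stage(model, stage, dataset_name):
    """Stand-in for one training phase; a real recipe would pick its own
    steps, learning rate, and data weighting per stage."""
    model["history"].append((stage, dataset_name))
    return model

model = {"history": []}
for stage, dataset_name in STAGES:
    model = run_stage(model, stage, dataset_name)
print(model["history"])
```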

