
RoboPapers Ep#12: VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Aug 14, 2025
01:08:43
How can world models be used to train autonomous driving systems? Learn by watching this episode with Florent Bartoccioni!
This episode explores the potential of large-scale generative video models to enhance autonomous driving capabilities, introducing an open-source autoregressive video model (VaViM) and a companion video-action model (VaVAM). VaViM is a simple autoregressive model that predicts frames as spatio-temporal token sequences, while VaVAM builds on its learned representations to generate driving trajectories through imitation learning. Together, they form a complete perception-to-action pipeline.
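As a rough illustration only (not the authors' implementation, which uses a large transformer), the VaViM/VaVAM split can be sketched as an autoregressive next-token predictor over spatio-temporal tokens whose output feeds a separate action head. All names here are hypothetical, and the bigram-count "model" is a toy stand-in:

```python
from collections import Counter, defaultdict

class ToyVideoModel:
    """Toy stand-in for VaViM: autoregressively predicts the next
    spatio-temporal token (here via bigram counts, greedily decoded)."""

    def __init__(self):
        self.bigrams = defaultdict(Counter)

    def fit(self, token_sequences):
        # Count which token tends to follow which.
        for seq in token_sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.bigrams[prev][nxt] += 1

    def generate(self, prompt, n_tokens):
        # Autoregressive rollout: each predicted token is appended
        # and conditions the next prediction.
        seq = list(prompt)
        for _ in range(n_tokens):
            nxt_counts = self.bigrams.get(seq[-1])
            if not nxt_counts:
                break
            seq.append(nxt_counts.most_common(1)[0][0])  # greedy decode
        return seq

def action_head(tokens, token_to_waypoint):
    """Toy stand-in for VaVAM's action head: maps the token sequence
    to driving waypoints (hypothetical lookup instead of a learned head)."""
    return [token_to_waypoint[t] for t in tokens if t in token_to_waypoint]

model = ToyVideoModel()
model.fit([[1, 2, 3, 2, 3, 4], [1, 2, 3, 4]])
future = model.generate([1], n_tokens=3)
print(future)  # → [1, 2, 3, 4]
print(action_head(future, {2: (0.0, 1.0), 3: (0.0, 2.0), 4: (0.5, 3.0)}))
```

The separation mirrors the pipeline described above: the video model is trained purely on token sequences, and the action component reuses its output to produce trajectories.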
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit robopapers.substack.com
