Inference by Turing Post

Transformers Are Not the End Game | World Models, Physical AI, and AI’s Next Frontier

Apr 7, 2026
Sanja Fidler, VP of AI Research at NVIDIA and Spatial Intelligence Lab lead, studies world models, 3D spatial intelligence, and physical AI. She discusses how transformers and world models complement each other. She highlights why 3D and multimodal sensing matter for robotics and self-driving. She explores learned simulators like AlpaDreams and the hard gaps left in physical interaction and real-time simulation.
INSIGHT

Transformers Complement World Models

  • Transformers are a general-purpose architecture that can power language, video, or 3D world models; the two are not mutually exclusive.
  • Sanja Fidler explains that world models simulate virtual worlds (e.g., by generating camera views) and can be built on transformers or on other architectures.
INSIGHT

Transformers Are Not The End Game

  • Transformers are unlikely to be the final architecture; the field is already exploring alternatives such as state space models and mixtures of experts.
  • Fidler stresses that architectures will evolve to reduce compute and data needs so that smaller teams can experiment.
ANECDOTE

AlpaDreams Demo Shows Real-Time Interactive Worlds

  • NVIDIA announced AlpaDreams as a follow-up to Cosmos, moving from slow, chunk-by-chunk video generation to interactive, real-time simulation.
  • Fidler describes a demo with a steering wheel where the model runs like a game engine and users can drive in the loop.