This episode is sponsored by AGNTCY. Unlock agents at scale with an open Internet of Agents.

Visit https://agntcy.org/ and add your support.

How will AI evolve once it can understand and reason about the 3D world, not just text on a screen?

In this episode of Eye on AI, host Craig Smith speaks with Fei-Fei Li about the rise of spatial intelligence and the world models that could transform how machines perceive, imagine, and interact with reality.

We explore how spatial intelligence goes beyond language to connect perception, action, and reasoning in physical environments. You will hear how models like Marble build consistent, persistent 3D spaces, why multimodal inputs matter, and what it takes to create digital worlds that are useful for robotics, simulation, design, and creative workflows. Fei-Fei also explains the challenges of long-term memory and continuous learning, and the search for training objectives that could play the role next-token prediction plays in language models.

Learn how spatial reasoning unlocks new possibilities in robotics and telepresence, why classical physics engines still matter, and how future AI systems may merge perception, planning, and imagination. You will also hear Fei-Fei's perspective on the limits of current architectures, why true understanding in machines is different from human understanding, and how world models could shape the next generation of intelligent systems.

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI

#303 Fei-Fei Li: Spatial Intelligence, World Models & the Future of AI

Eye On A.I.

