
The Neuron: AI Explained
Kari Briski at GTC 2026: The Future of NVIDIA AI & Nemotron 3
Mar 19, 2026
Kari Briski, VP of Generative AI Software for Enterprise at NVIDIA, leads the Nemotron model family. She unveils Nemotron 3 Super and explains how a 120B model can run at roughly the cost of a 12B model through routing and efficiency gains. The conversation covers multi-agent systems moving to production, OpenClaw as a system-level security and orchestration approach, and the rapid growth of open-model token use.
Super Balances Smarts With Token Economics
- Nemotron 3 Super was designed both to boost intelligence and to make multi-agent systems economical by improving token efficiency and latency.
- Briski framed the model as a response to rising agent-to-agent token usage and the need for smarter, cheaper token generation in production.
Architecture And Co-Design Deliver 3x Speedups
- Architectural choices (a hybrid of Mamba state-space and Transformer layers, latent MoE and compounds) produce 3–5x latency improvements.
- Briski called this extreme co-design: matching model architecture to infrastructure for smaller compute footprint and faster token throughput.
Failures Often Come From The Harness Not The Model
- Production failures often stem from memory management and the harness rather than only from the model's instruction following.
- Briski emphasized evaluating the total system—orchestrator, prompting, memory and tools—when diagnosing context failures.
