Azeem Azhar's Exponential View: What NVIDIA's bet on OpenClaw means for the future of AI and your token budget
Mar 25, 2026

A deep dive into why inference is overtaking training and what that means for company budgets. A new OpenClaw-style harness is framed as a turning point for deploying agents at scale. Token consumption is skyrocketing from millions to hundreds of millions per day, raising the question of how organizations should govern and budget for it. The episode covers practical examples of agent workloads, model choice strategies, and the shifting economics of compute.
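To make the budgeting point concrete, here is a minimal sketch of how a jump from millions to hundreds of millions of tokens per day translates into daily spend. All figures are hypothetical placeholders, not from the episode; the blended per-million-token price in particular varies widely by model and provider.

```python
# Hypothetical illustration: token volumes and the $2.00/M blended price
# are made-up placeholders, not figures from the episode.
def daily_token_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Rough daily spend for a given token volume at a blended price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

# An org moving from millions to hundreds of millions of tokens per day:
for tokens in (5_000_000, 500_000_000):
    cost = daily_token_cost(tokens, usd_per_million_tokens=2.00)
    print(f"{tokens:>12,} tokens/day -> ${cost:,.2f}/day")
```

The point of the sketch is that a 100x jump in token volume is a 100x jump in the inference bill at a fixed price, which is why model choice and governance become budget-line decisions.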
Inference Shift Is Reshaping The AI Economy
- AI inference is becoming the dominant phase of the AI economy as models are used to generate outputs for users rather than just being trained.
- Azeem highlights NVIDIA's trillion-dollar order book and OpenClaw's rapid rise as signals that inference demand is exploding across industries.
A Millionfold Compute Surge In Two Years
- Reasoning models and agentic systems drove a millionfold increase in compute demand in two years: a roughly 10,000x rise in per-user compute multiplied by roughly 100x growth in usage.
- NVIDIA's trillion-dollar backlog reflects customers expecting this exponential inference surge to continue.
Hybrid Architectures Unlock Decode Efficiency
- NVIDIA is evolving from pure-GPU systems to hybrid architectures (GPU plus Groq-like units) to boost decode throughput and energy efficiency.
- The combined systems promise ~35x throughput per megawatt versus current Blackwell GPUs for inference workloads.
