Azeem Azhar's Exponential View: What NVIDIA's bet on OpenClaw means for the future of AI and your token budget
Mar 25, 2026

A deep dive into why inference is overtaking training and what that means for company budgets. A new OpenClaw-style harness is framed as a turning point for deploying agents at scale. Token consumption is skyrocketing from millions to hundreds of millions per day, raising the question of how organizations should govern and budget for it. The episode covers practical examples of agent workloads, model choice strategies, and the shifting economics of compute.
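To make the budgeting point concrete, here is a minimal sketch of how a jump from millions to hundreds of millions of tokens per day translates into daily spend. All figures are hypothetical placeholders, not from the episode; the blended per-million-token price in particular varies widely by model and provider.

```python
# Hypothetical illustration: token volumes and the $2.00/M blended price
# are made-up placeholders, not figures from the episode.
def daily_token_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Rough daily spend for a given token volume at a blended price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

# An org moving from millions to hundreds of millions of tokens per day:
for tokens in (5_000_000, 500_000_000):
    cost = daily_token_cost(tokens, usd_per_million_tokens=2.00)
    print(f"{tokens:>12,} tokens/day -> ${cost:,.2f}/day")
```

The point of the sketch is that a 100x jump in token volume is a 100x jump in the inference bill at a fixed price, which is why model choice and governance become budget-line decisions.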
Inference Shift Is Reshaping The AI Economy
- AI inference is becoming the dominant phase of the AI economy as models are used to generate outputs for users rather than just being trained.
- Azeem highlights NVIDIA's trillion-dollar order book and OpenClaw's rapid rise as signals that inference demand is exploding across industries.
A Millionfold Compute Surge In Two Years
- Reasoning models and agentic systems drove a millionfold increase in compute demand in two years: a roughly 10,000x rise in per-user compute multiplied by roughly 100x growth in usage.
- NVIDIA's trillion-dollar backlog reflects customers expecting this exponential inference surge to continue.
Hybrid Architectures Unlock Decode Efficiency
- NVIDIA is evolving from pure-GPU systems to hybrid architectures (GPU plus Groq-like units) to boost decode throughput and energy efficiency.
- The combined systems promise ~35x throughput per megawatt versus current Blackwell GPUs for inference workloads.
