
No Priors: Artificial Intelligence | Technology | Startups Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud
May 1, 2026 Tuhin Srivastava, Baseten co-founder and CEO building AI inference infrastructure, talks about the AI inference crunch and Baseten’s rapid rise. He gets into why custom models and the app layer still matter. They explore GPU shortages, long-term capacity bets, Chinese open models, multi-cloud scaling, and the messy operational realities of running inference at scale.
GPU Contracts Now Favor Capitalized Buyers
- Buying top-end GPU capacity now increasingly requires multi-year commitments, prepayments, and strong financing, not just demand forecasts.
- Tuhin Srivastava says securing 1,000 B200s from a good cloud can require three-to-five-year contracts plus 20% to 30% of total contract value (TCV) prepaid.
Why Software Makes Inference Infrastructure Sticky
- GPU rentals alone are a commodity, but inference software wrapped around compute becomes sticky and hard to replace.
- Tuhin Srivastava says none of Baseten's top 30 customers have churned, with roughly 400% annual net dollar retention (NDR) tied to the software layer.
Why NVIDIA Still Has The Near Term Edge
- A multi-chip future is plausible, but NVIDIA still wins near term because supply chain execution, CUDA, and ecosystem speed matter more than raw chip design.
- Tuhin Srivastava notes alternative vendors often tie supply to one buyer, which blocks broader developer ecosystems from forming.

