
No Priors: Artificial Intelligence | Technology | Startups Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud
May 1, 2026 Tuhin Srivastava, Baseten co-founder and CEO building AI inference infrastructure, talks about the AI inference crunch and Baseten’s rapid rise. He gets into why custom models and the app layer still matter. They explore GPU shortages, long-term capacity bets, Chinese open models, multi-cloud scaling, and the messy operational realities of running inference at scale.
GPU Contracts Now Favor Capitalized Buyers
- Buying top-end GPU capacity now increasingly requires multi-year commitments, prepayments, and strong financing, not just demand forecasts.
- Tuhin Srivastava says securing 1,000 B200s from a good cloud can require three-to-five-year contracts plus 20% to 30% of total contract value (TCV) prepaid.
Why Software Makes Inference Infrastructure Sticky
- GPU rentals alone are a commodity, but inference software wrapped around compute becomes sticky and hard to replace.
- Tuhin Srivastava says none of Baseten's top 30 customers have churned, with roughly 400% annual net dollar retention (NDR) tied to the software layer.
Why NVIDIA Still Has The Near Term Edge
- A multi-chip future is plausible, but NVIDIA still wins near term because supply chain execution, CUDA, and ecosystem speed matter more than raw chip design.
- Tuhin Srivastava notes alternative vendors often tie supply to one buyer, which blocks broader developer ecosystems from forming.

