
Semi Doped: An Interview with Microsoft's Saurabh Dighe About Maia 200
Jan 28, 2026
Saurabh Dighe, the Corporate VP leading Microsoft’s custom AI accelerator efforts, walks through Maia 200 and why it was built for inference. He discusses the design tradeoffs that target price-performance, the scale-up architecture, the on-die NICs and large SRAM alongside HBM, and how Maia will be delivered across Microsoft’s fleet and to partners.
Inference-First Design Philosophy
- Microsoft designed Maia 200 to optimize inference price-performance and power rather than peak training throughput.
- The primary goal was to lower planet-scale inference cost by co-designing silicon, systems, and software.
Inference Is An Efficiency Frontier
- Inference workloads sit on an efficiency frontier with tradeoffs across latency, cost, and throughput; a toy model of that frontier is sketched after this list.
- Microsoft embraces heterogeneous infrastructure to match different points on that frontier for customer use cases.
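The frontier mentioned above can be made concrete with a toy decode-time model. Everything here is a hypothetical sketch, not Maia 200 data: the model size, peak FLOPs, HBM bandwidth, and hourly cost are placeholder numbers chosen only to show the shape of the latency/throughput/cost tradeoff under batching.

```python
# Toy model of the inference efficiency frontier: batching raises
# throughput and lowers $/token, but also raises per-step latency.
# All constants are hypothetical placeholders, not Maia 200 specs.

FLOPS_PER_TOKEN = 2 * 70e9   # ~2 * params FLOPs per decoded token (70B model)
PEAK_FLOPS = 1e15            # hypothetical accelerator peak, 1 PFLOP/s
WEIGHT_BYTES = 70e9          # 70B weights at 1 byte each (e.g. FP8)
HBM_BW = 3e12                # hypothetical HBM bandwidth, 3 TB/s
DOLLARS_PER_HOUR = 10.0      # hypothetical accelerator cost

def decode_step(batch: int):
    """One decode step: the weights stream from HBM once per step
    (shared by the whole batch), while compute scales with batch size."""
    mem_s = WEIGHT_BYTES / HBM_BW                    # memory-bound floor
    compute_s = batch * FLOPS_PER_TOKEN / PEAK_FLOPS
    step_s = max(mem_s, compute_s)                   # whichever resource binds
    tokens_per_s = batch / step_s
    dollars_per_token = DOLLARS_PER_HOUR / 3600 / tokens_per_s
    return step_s * 1e3, tokens_per_s, dollars_per_token

for b in (1, 8, 64, 512):
    lat_ms, tput, cost = decode_step(b)
    print(f"batch={b:4d}  step={lat_ms:6.1f} ms  "
          f"{tput:8.0f} tok/s  ${cost:.2e}/tok")
```

Small batches sit at the low-latency, high-cost end of the frontier; large batches amortize the weight stream and cut cost per token until compute becomes the binding resource, at which point per-step latency grows with batch size. Different customer workloads pick different points on this curve, which is the heterogeneity argument above.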
Early Bet On LLMs Shaped Architecture
- Microsoft began concrete Maia 200 architecture work in 2022–2023, informed by its research and model teams.
- That early bet shifted Maia from a CNN-era balance point toward LLM-optimized compute, HBM3E, and larger HBM capacity.
