Scaling cloud inference as a distributed OS

Rajat describes inference at scale becoming a distributed operating-system problem across multi-GPU clusters and accelerators.


Episode notes

Rajat Monga, CVP AI Frameworks @ Microsoft, joins the podcast to discuss his leadership and founder journey, from Google Brain / TensorFlow to inference.io and back to Microsoft. He dissects what it means to refound vs. start from scratch, the value of the open source community, and strategies for discovering what problem to solve when going the startup route. We also cover how to determine your users’ hidden incentives and what that means for both product development & marketing, along with navigating the balance between a product’s usefulness and consumers’ willingness to pay for it. Additionally, Rajat shares what he’s currently up to at Microsoft and the emerging ML / AI technologies he’s most excited about.

ABOUT RAJAT MONGA

Rajat Monga is responsible for enabling an efficient AI stack at Microsoft from cloud to the edge. Before joining Microsoft, Rajat was founder and CEO of Inference.io, a smart analytics platform powered by AI. During his decade-long tenure at Google, he co-founded and led TensorFlow, and was a founding member of Google Brain. He’s built out and led many engineering teams, and designed large-scale distributed systems including web-scale crawling and eBay’s search engine. Rajat is a graduate of IIT Delhi.

SHOW NOTES:

Rajat’s journey with Google Brain: Scaling deep learning from single PCs to thousands of machines with Jeff Dean & Andrew Ng (2:57)
Moving from Google Brain to TensorFlow: Why new hardware and architectures required a total system rebuild (6:02)
The "refounding" question: Choosing between starting from scratch or evolving an existing system (8:33)
Why Google open-sourced TensorFlow to set industry standards and avoid supporting external copies (10:16)
How open-source enabled global innovation, from Japanese cucumber sorting to African plant health (12:02)
Transitioning as a leader: Why Rajat left Google during the height of TensorFlow to found a company (13:57)
The discovery phase at inference.io: Navigating the pivot from IoT into solving data analytics gaps (15:31)
Lessons on PMF: Moving beyond a "useful" product to one that solves a truly critical customer pain point (16:52)
Why habits are harder to change than technology and the challenge of competing with established workflows (21:02)
Marketing strategies: Tailoring personas for top-down prestige versus bottom-up personal efficiency (23:19)
Deciding when to stop: A founder’s framework for re-evaluating bets based on current knowledge (24:57)
Rajat’s new role at Microsoft: Overseeing Edge infrastructure and large-scale Cloud AI inference (27:46)
Dissecting ML edge strategy: Using ONNX Runtime to unify AI performance across Windows, iOS, and Android (30:02)
Edge AI trends: Shifting from experimental models to production models optimized for cost and privacy (31:20)
The future of Edge: How on-device processing will power AI in robotics, smart glasses, and wearables (33:23)
Scaling inference: Treating multi-GPU clusters like a distributed operating system for AI models (34:25)
Rapid fire questions (37:45)

LINKS AND RESOURCES

Epic Disruptions: 11 Innovations That Shaped Our Modern World - Innovation expert Scott Anthony weaves together the stories behind history's most transformative disruptions, from ninth-century China to twenty-first-century Silicon Valley. Through eleven pivotal innovations, including the printing press, mass-produced automobiles, McDonald's revolutionary food system, and the iPhone, Anthony reveals the hidden patterns behind world-changing breakthroughs.

This episode wouldn’t have been possible without the help of our incredible production team:

Patrick Gallagher - Producer & Co-Host

Jerry Li - Co-Host

Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/

Dan Overheim - Audio Engineer, Dan’s also an avid 3D printer - https://www.bnd3d.com/

Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
