The Circuit

EP 162: TPUs Via Cloud Next, Intel Earnings, Foundry Scarcity

Apr 27, 2026
A deep dive into Google’s new TPU v5p/v5i launches and the shift toward disaggregated training and inference silicon. A look at memory choices, HBM scarcity, SRAM inference ideas, and board/network moves to cut latency at scale. A surprising take on Intel’s strong earnings, CPU resurgence, foundry capacity and packaging backlogs. Debate over High‑NA EUV economics and whether the current semiconductor rally can last.
INSIGHT

Memory And Latency Are The Inference Bottlenecks

  • Google emphasized memory and latency innovations at the board and rack level to serve inference at scale.
  • They increased per-chip memory (north of 200 GB, combining HBM3 with on-chip SRAM) and made board and networking changes to reduce cross-chip latency; a rough sketch of why memory dominates follows below.
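Why memory, rather than compute, caps inference can be seen with a roofline-style estimate: during autoregressive decode, each generated token streams roughly the full weight set from HBM, so single-stream token rate is bounded by bandwidth divided by model size. A minimal sketch with purely illustrative numbers (the 1,600 GB/s bandwidth and 70 GB weight figures are assumptions, not from the episode):

```python
# Back-of-the-envelope estimate of memory-bandwidth-bound decode throughput.
# Hypothetical helper; numbers are illustrative, not from the episode.

def decode_tokens_per_sec(hbm_bandwidth_gb_s: float, model_bytes_gb: float) -> float:
    """Upper bound on single-stream decode rate when each generated token
    must stream all model weights from HBM (memory-bound regime)."""
    return hbm_bandwidth_gb_s / model_bytes_gb

# Assumed accelerator: ~1,600 GB/s HBM bandwidth serving a 70B-parameter
# model in 8-bit weights (~70 GB). Compute sits mostly idle at this rate.
print(round(decode_tokens_per_sec(1600.0, 70.0), 1))  # ~22.9 tokens/s
```

Under these assumptions, adding FLOPs does nothing for single-stream latency; only more bandwidth, more on-package memory, or lower cross-chip latency moves the number, which is the logic behind the board-level changes described above.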
ANECDOTE

Google Compares Training To Web Indexing

  • Ben compared training and then serving models to Google's early web-indexing story to justify heavy training CapEx now.
  • The analogy: indexing (training) was costly up front, but monetization came from serving the index (inference) at scale, a payoff Google expects to repeat with Gemini.
INSIGHT

Intel's Quarter Moved It Past Existential Risk

  • Intel's latest quarter beat expectations and shifted the narrative from existential risk to ordinary operational questions like margins and capacity.
  • Strong CPU demand driven by AI (CPUs are needed to 'feed' GPUs), plus improving yields, drove the revenue and guidance beats.