SemiAnalysis Weekly

Feb 25, 2026 - NVIDIA Vera Rubin NVL72 ft. Extreme Co-Design [Jordan Nanos, Myron Xie, Copper Wei (Wega), Howie]

Copper Wei (Wega), an accelerator and AI supply‑chain expert, walks through Rubin’s system‑level design, PCB and power delivery, and memory procurement. The conversation covers HBM4 bandwidth and supplier tradeoffs. They discuss NVLink 6, cable‑less PCB trays, thermal innovations like microchannel lids, chiller‑less datacenters, and Rubin’s power and deployment timelines.
INSIGHT

Adaptive Compression Delivers Realistic Sparsity Gains

  • Rubin raises peak FP4 compute to 35 PFLOPS dense and markets an effective 50 PFLOPS via an adaptive-compression transformer engine.
  • The transformer engine dynamically compresses data streams rather than relying on the structured sparsity of prior generations, reclaiming throughput without forcing weights to zero — a pattern that had hurt convergence.
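The gap between the dense and effective figures implies an average compression factor; a minimal arithmetic sketch (the two PFLOPS figures come from the episode, the 2x structured-sparsity baseline is the standard figure for 2:4 sparsity):

```python
# Illustrative arithmetic: what compression factor the marketed
# "effective" number implies over the dense peak.
dense_pflops = 35.0       # Rubin peak dense FP4 (from the episode)
effective_pflops = 50.0   # marketed effective figure (from the episode)

# Average compression the engine would need to deliver in practice
compression_factor = effective_pflops / dense_pflops
print(f"implied compression factor: {compression_factor:.2f}x")  # ~1.43x

# For contrast, 2:4 structured sparsity advertises a fixed 2x peak,
# but only by zeroing half the weights, which can harm convergence.
structured_sparsity_peak = 2.0
```

The ~1.43x figure is more conservative than structured sparsity's 2x, which is consistent with the claim that the gains are "realistic" rather than a forced-zero peak number.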
INSIGHT

HBM4 Bandwidth Depends On Supplier Base Die Choices

  • Rubin moves to HBM4 and doubles IO per stack to 2048 bits, targeting ~22 TB/s per GPU versus ~8 TB/s on Blackwell.
  • Memory vendors differ in base-die choice: Micron stayed with a DRAM-process base die and ran into speed trouble, SK hynix used a 12 nm logic die, and Samsung used 4 nm logic.
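A back-of-envelope check of the ~22 TB/s figure. The 2048-bit IO width is from the episode; the stack count follows from Rubin Ultra "doubling stacks to 16" (so 8 on Rubin), while the ~11 Gb/s pin rate is an assumption chosen to match the quoted total:

```python
# Rough HBM4 bandwidth check under stated assumptions.
stacks_per_gpu = 8      # implied: Rubin Ultra doubles this to 16
io_per_stack = 2048     # bits per stack, doubled from HBM3 (from episode)
pin_rate_gbps = 11.0    # assumption: per-pin data rate in Gb/s

per_stack_tbs = io_per_stack * pin_rate_gbps / 8 / 1000  # Gb/s -> TB/s
total_tbs = per_stack_tbs * stacks_per_gpu
print(f"{total_tbs:.1f} TB/s per GPU")  # ~22.5 TB/s
```

The same width at HBM3-era pin rates would land well under this, which is why the base-die process choice (DRAM vs. logic) matters for hitting the target speed.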
INSIGHT

Capacity Held Constant While Bandwidth Climbs

  • Rubin keeps ~288 GB of HBM per chip; Rubin Ultra reaches larger capacities by doubling the stack count to 16 and using 16-high HBM4E.
  • Taller stacks and wider IO compound packaging complexity, so capacity cannot be scaled up straightforwardly.