
SemiAnalysis Weekly Feb 25, 2026 - NVIDIA Vera Rubin NVL72 ft. Extreme Co-Design [Jordan Nanos, Myron Xie, Copper Wei (Wega), Howie]
Feb 26, 2026
Copper Wei (Wega), an accelerator and AI supply‑chain expert, walks through Rubin's system‑level design, PCB and power delivery, and memory procurement. The conversation covers HBM4 bandwidth and supplier tradeoffs. They discuss NVLink 6, cable‑less PCB trays, thermal innovations like microchannel lids, chiller‑less datacenters, and Rubin's power and deployment timelines.
AI Snips
Adaptive Compression Delivers Realistic Sparsity Gains
- Rubin raises peak FP4 compute to 35 PFLOPS dense, and NVIDIA markets an effective 50 PFLOPS enabled by an adaptive compression transformer engine.
- The transformer engine dynamically compresses data streams instead of relying on the structured sparsity of prior generations, reclaiming throughput without the forced zero patterns that hurt convergence.
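The effective-throughput claim above reduces to simple arithmetic. A minimal sketch, using only the two figures stated in the episode (the implied speedup ratio is derived, not a measured number):

```python
# Implied speedup from Rubin's marketed "effective" FP4 figure.
dense_pflops = 35.0      # peak FP4 dense compute per GPU (from the episode)
effective_pflops = 50.0  # marketed figure with adaptive compression engaged

speedup = effective_pflops / dense_pflops
print(f"implied compression speedup: {speedup:.2f}x")  # -> 1.43x
```

So the compression engine needs to recover roughly a 1.4x average reduction in data moved or computed to justify the 50 PFLOPS marketing number.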
HBM4 Bandwidth Depends On Supplier Base Die Choices
- Rubin moves to HBM4 and doubles IO per stack to 2048, targeting ~22 TB/s per GPU versus ~8 TB/s on Blackwell.
- Memory vendors diverge on the HBM4 base die: Micron stayed with a DRAM-process base die and ran into speed trouble, SK Hynix moved to a 12nm-class logic process (N12), and Samsung used a 4nm logic die.
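The bandwidth figures above can be cross-checked with back-of-envelope math. A hedged sketch: the stack count and per-pin data rate below are assumptions chosen to reproduce the ~22 TB/s figure; only the 2048 IO per stack and the ~22 TB/s target come from the episode.

```python
# Rough HBM bandwidth arithmetic for one Rubin GPU.
stacks = 8           # assumed: 288 GB total is consistent with 8 x 36 GB stacks
io_per_stack = 2048  # from the episode (double HBM3E's 1024-bit interface)
pin_rate_gbps = 10.7 # assumed per-pin rate needed to land near ~22 TB/s

# pins x Gb/s, divide by 8 for bits -> bytes, by 1000 for GB -> TB
bw_tbps = stacks * io_per_stack * pin_rate_gbps / 8 / 1000
print(f"~{bw_tbps:.1f} TB/s per GPU")  # -> ~21.9 TB/s
```

Run the same formula with Blackwell's 1024-bit stacks and HBM3E pin rates and you land near the ~8 TB/s the episode quotes, which is where the "doubling IO" framing comes from.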
Capacity Held Constant While Bandwidth Climbs
- Rubin keeps ~288 GB HBM per chip; Rubin Ultra hits larger capacities by doubling stacks to 16 and using 16-high HBM4E.
- Increasing stack height and IO count compounds packaging complexity, which limits straightforward capacity scaling.
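The capacity numbers above decompose into stacks x stack height x die density. A hedged sketch: the per-die capacities and Rubin's 8-stack, 12-high layout are assumptions that make the arithmetic land on the episode's ~288 GB figure; the 16-stack, 16-high Ultra configuration is stated in the episode.

```python
# Rubin: assumed 8 stacks of 12-high HBM4 built from 24 Gb (3 GB) DRAM dies
rubin_gb = 8 * 12 * 3
print(rubin_gb)  # -> 288

# Rubin Ultra: 16 stacks of 16-high HBM4E (per the episode),
# assumed denser 32 Gb (4 GB) dies
ultra_gb = 16 * 16 * 4
print(ultra_gb)  # -> 1024
```

This is the point of the snip: getting beyond 288 GB requires doubling the stack count and growing stack height simultaneously, rather than any single knob.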
