
SemiAnalysis Weekly Feb 25, 2026 - NVIDIA Vera Rubin NVL72 ft. Extreme Co-Design [Jordan Nanos, Myron Xie, Copper Wei (Wega), Howie]
Feb 26, 2026
Copper Wei (Wega), an accelerator and AI supply‑chain expert, walks through Rubin's system‑level design, PCB and power delivery, and memory procurement. The conversation covers HBM4 bandwidth and supplier tradeoffs. They discuss NVLink 6, cable‑less PCB trays, thermal innovations like microchannel lids, chiller‑less datacenters, and Rubin's power and deployment timelines.
AI Snips
Adaptive Compression Delivers Realistic Sparsity Gains
- Rubin raises peak FP4 compute to 35 PFLOPS dense, and NVIDIA markets an effective 50 PFLOPS enabled by an adaptive compression transformer engine.
- The transformer engine dynamically compresses data streams instead of relying on the structured sparsity of prior generations, reclaiming throughput without the forced zero patterns that hurt convergence.
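The effective-throughput claim above reduces to simple arithmetic. A minimal sketch, using only the two figures stated in the episode (the implied speedup ratio is derived, not a measured number):

```python
# Implied speedup from Rubin's marketed "effective" FP4 figure.
dense_pflops = 35.0      # peak FP4 dense compute per GPU (from the episode)
effective_pflops = 50.0  # marketed figure with adaptive compression engaged

speedup = effective_pflops / dense_pflops
print(f"implied compression speedup: {speedup:.2f}x")  # -> 1.43x
```

So the compression engine needs to recover roughly a 1.4x average reduction in data moved or computed to justify the 50 PFLOPS marketing number.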
HBM4 Bandwidth Depends On Supplier Base Die Choices
- Rubin moves to HBM4 and doubles IO per stack to 2048, targeting ~22 TB/s per GPU versus ~8 TB/s on Blackwell.
- Memory vendors diverge on the HBM4 base die: Micron stayed with a DRAM-process base die and ran into speed trouble, SK Hynix moved to a 12nm-class logic process (N12), and Samsung used a 4nm logic die.
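The bandwidth figures above can be cross-checked with back-of-envelope math. A hedged sketch: the stack count and per-pin data rate below are assumptions chosen to reproduce the ~22 TB/s figure; only the 2048 IO per stack and the ~22 TB/s target come from the episode.

```python
# Rough HBM bandwidth arithmetic for one Rubin GPU.
stacks = 8           # assumed: 288 GB total is consistent with 8 x 36 GB stacks
io_per_stack = 2048  # from the episode (double HBM3E's 1024-bit interface)
pin_rate_gbps = 10.7 # assumed per-pin rate needed to land near ~22 TB/s

# pins x Gb/s, divide by 8 for bits -> bytes, by 1000 for GB -> TB
bw_tbps = stacks * io_per_stack * pin_rate_gbps / 8 / 1000
print(f"~{bw_tbps:.1f} TB/s per GPU")  # -> ~21.9 TB/s
```

Run the same formula with Blackwell's 1024-bit stacks and HBM3E pin rates and you land near the ~8 TB/s the episode quotes, which is where the "doubling IO" framing comes from.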
Capacity Held Constant While Bandwidth Climbs
- Rubin keeps ~288 GB HBM per chip; Rubin Ultra hits larger capacities by doubling stacks to 16 and using 16-high HBM4E.
- Increasing stack height and IO count compounds packaging complexity, which limits straightforward capacity scaling.
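The capacity numbers above decompose into stacks x stack height x die density. A hedged sketch: the per-die capacities and Rubin's 8-stack, 12-high layout are assumptions that make the arithmetic land on the episode's ~288 GB figure; the 16-stack, 16-high Ultra configuration is stated in the episode.

```python
# Rubin: assumed 8 stacks of 12-high HBM4 built from 24 Gb (3 GB) DRAM dies
rubin_gb = 8 * 12 * 3
print(rubin_gb)  # -> 288

# Rubin Ultra: 16 stacks of 16-high HBM4E (per the episode),
# assumed denser 32 Gb (4 GB) dies
ultra_gb = 16 * 16 * 4
print(ultra_gb)  # -> 1024
```

This is the point of the snip: getting beyond 288 GB requires doubling the stack count and growing stack height simultaneously, rather than any single knob.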
