

Greg Steinbrecher
Workload systems and GPU-cluster engineer focused on building software for efficient GPU communication and model training; joined OpenAI to optimize GPU utilization and cluster reliability.
Best podcasts with Greg Steinbrecher
Ranked by the Snipd community

May 6, 2026 • 38min
Episode 18 - Why AI needs a new kind of supercomputer network
Mark Handley, a longtime networking researcher and UCL professor, and Greg Steinbrecher, a GPU-cluster engineer who optimizes large-scale training, explain how AI training breaks traditional networks. They discuss why synchronous GPU workloads stall on even small faults, how Multipath Reliable Connection sprays packets across many paths and trims packets to work around failures, and why making it an open standard can lower costs and improve reliability for massive GPU fleets.

Meet Greg Steinbrecher, a workload systems engineer who makes GPUs talk efficiently, and Mark Handley, a veteran networking researcher shaping data center protocols. They discuss why large-scale GPU training breaks internet networking assumptions. They explain Multipath Reliable Connection, packet trimming, fast failure detection, and how a new network design boosts reliability and speed for massive model training.


