Greg Steinbrecher

Workload systems and GPU-cluster engineer focused on building software for efficient GPU communication and model training; joined OpenAI to optimize GPU utilization and cluster reliability.

Best podcasts with Greg Steinbrecher

Ranked by the Snipd community

May 6, 2026 • 38min

Episode 18 - Why AI needs a new kind of supercomputer network

Mark Handley, a longtime networking researcher and UCL professor, and Greg Steinbrecher, a GPU-cluster engineer optimizing large-scale training, explain how AI training breaks traditional networks. They discuss why synchronous GPU workloads stall on small faults, how Multipath Reliable Connection sprays traffic and trims packets to avoid failures, and why making this an open standard can lower cost and boost reliability for massive GPU fleets.

May 6, 2026 • 38min

Episode 18 - Why AI needs a new kind of supercomputer network

Meet Greg Steinbrecher, a workload systems engineer who makes GPUs talk efficiently, and Mark Handley, a veteran networking researcher shaping data center protocols. They discuss why large-scale GPU training breaks internet networking assumptions. They explain Multipath Reliable Connection, packet trimming, fast failure detection, and how a new network design boosts reliability and speed for massive model training.
