The One With Damion Yates and Building AI systems

Google SRE Prodcast

Lockstep training jobs and criticality

Damion details how large language model training runs in lockstep, making every machine critical and outages costly.

Play episode from 24:00
