
Damion Yates
Reliability engineering lead at Google DeepMind who helped establish reliability practices and infrastructure for large-scale AI experimentation, specializing in accelerator monitoring, resource allocation, and promoting SRE mindsets within research organizations.
Best podcasts with Damion Yates
Ranked by the Snipd community

Feb 26, 2026 • 31min
The One With Damion Yates and Building AI systems
Damion Yates is a reliability engineering lead at Google DeepMind who built reliability practices for large-scale AI research. He talks about creating a reliability team from scratch. He covers training researchers in resilience, building proactive tooling and guardrails, handling lockstep training, where a single failure can halt an entire large run, and why avoiding "lucky silence" matters for dependable AI systems.


