The InfoQ Podcast

Safely Changing Software to Avoid Incidents: A Conversation with Justin Sheehy

Sep 8, 2025
In this conversation, Justin Sheehy, Chief Architect at Akamai, shares insights on making software safer and more resilient. He discusses the futility of root cause analysis and stresses the importance of a shared language for discussing incidents. He highlights the need for software that is both malleable and observable, and notes that all technology decisions are inherently business-oriented. Sheehy also addresses how the rise of AI complicates engineers' ability to handle production incidents, making resilience even more crucial.
INSIGHT

Changes Often Trigger Incidents

  • Many incidents are triggered by pushing a change rather than by a single root cause.
  • Treat changes as proximate causes and design systems to tolerate them.
ADVICE

Adopt A Shared Change-Safety Framework

  • Create a shared, simple framework of change-safety principles for your teams.
  • Use common language so SREs, DevOps, and architects can discuss and act quickly.
INSIGHT

Observability Needs Malleability

  • Observability without the ability to change (malleability) still leaves systems brittle.
  • You must strike the right balance between observability and malleability for your system.