This is Fine! A podcast about resilience engineering and software

Colette Alexander and Clint Byrum
Mar 12, 2026 • 59min

The 2025 DORA Report w/special guest Fred Hebert

Fred Hebert, a Staff SRE and Lund student known for work on SLOs, error budgets, and the Law of Stretched Systems, discusses the 2025 DORA Report. He unpacks why the report reframes around AI-assisted development. They explore AI adoption models, survey limits, platform vs AI impacts, cognitive load and burnout, and how new capacity can be reabsorbed by organizational demands.
Feb 26, 2026 • 1h 9min

Building and Revising Adaptive Capacity Sharing for Technical Incident Response with Beth Adele Long

Beth Adele Long, Principal at Adaptive Capacity Labs and resilience practitioner, shares field-tested practices from New Relic. She describes the NERF rotation, incident command vs support roles, and how calm coordinators reduce org-wide disruption. The conversation covers lowering friction to ask for help, making operational work a career path, and using management and tools to sustain adaptive capacity.
Feb 12, 2026 • 42min

Outsourcing and Resilience

They debate outsourcing software and the risks of partial handoffs versus full ownership. They explore how trust, in-person time, and clear agency shape reliable operations. They riff on outsourcing everyday tasks, construction trade adaptations, and cultural practices like servant leadership and joint retrospectives.
Feb 1, 2026 • 1h 43min

The Messy 9 and Coding with AI - A Panel Discussion

David Woods, resilience engineering founder and Professor Emeritus, brings foundational perspectives on the Messy 9 and socio-technical risks in AI systems. Shiri Cabral, enterprise architecture leader with experience at MongoDB and Salesforce, explains using AI for diagnostics and knowledge retrieval. They discuss AI in coding workflows, de-skilling risks, automation pitfalls, observability with AI, and designing collaborative human–AI systems.
Jan 17, 2026 • 1h 2min

Going Solid

If you’re feeling like you need to do more to respond to our moment:
Lots of places to donate to in the Twin Cities are listed here: https://mspmag.com/arts-and-culture/general-interest/ice-minnesota-support-immigrant-communities-fundraisers-food-drives-trainings/
You can always find mutual aid networks in your own area, including immigrant aid networks
https://immigrantdefensenetwork.org/ does good work, too
The Hometown Holler podcast with Tressie McMillan Cottom was a wonderful discussion: https://www.youtube.com/watch?v=2gr4mW8aR-g
The Ruth Wilson Gilmore interview that I quoted clumsily is here: https://www.nytimes.com/2019/04/17/magazine/prison-abolition-ruth-wilson-gilmore.html
The paper itself: https://qualitysafety.bmj.com/content/14/2/130.short
If you haven’t seen The Pitt, you should, it’s super good: https://en.wikipedia.org/wiki/The_Pitt
Charles Perrow’s Normal Accidents has more definitions/examples of coupling: https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-professor-charles-perrow/cad38a43fcffa1f8?ean=9780691004129&next=t
Some stuff on microservices and coupling here: https://microservices.io/post/architecture/2023/03/28/microservice-architecture-essentials-loose-coupling.html
Colette’s #notanad endorsement for paper organizing is https://paperpile.com/
Rasmussen’s boundary model comes initially from his paper here: https://www.sciencedirect.com/science/article/abs/pii/S0925753597000520
And if you want a good writeup explaining Rasmussen’s boundary model, you can always read Lorin’s blog: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
Dr Cook’s talk at Velocity is a classic, and goes over Rasmussen’s boundary model really well: https://www.youtube.com/watch?v=PGLYEDpNu60
Fred does a great job writing about the Law of Stretched Systems and how it applies to his own work on his blog: https://ferd.ca/the-law-of-stretched-cognitive-systems.html
“Plans are nothing, but planning is everything” is a paraphrase of Eisenhower: https://www.presidency.ucsb.edu/documents/remarks-the-national-defense-executive-reserve-conference
Want to chat about this paper with other folks? Come to the RISF live event for a Paper Party! https://resilienceinsoftware.org/events/157553
Dec 31, 2025 • 53min

The Year in Resilience w/special guest John Allspaw

Seriously though, can’t wait to gtfo of this year.
Palisades fire links:
https://www.nbclosangeles.com/investigations/anonymous-letter-demands-independent-palisades-fire-investigations/3800442/
https://internationalfireandsafetyjournal.com/palisades-fire-report/
https://www.latimes.com/california/story/2025-12-20/lafd-report-on-palisades-fire-was-watered-down-in-editing-process-records-show
Corey Quinn’s commentary on the AWS outage in October is here: https://www.theregister.com/2025/10/20/aws_outage_amazon_brain_drain_corey_quinn/
Time to reset the clock on how many episodes it’s been since we’ve mentioned the Ironies of Automation: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
Also on Rasmussen’s Boundary Model, which Lorin does a great write up on: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
Lorin’s Law is our favorite law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/
You can ask us questions or write to us using our form linked from our website: thisisfinepod.com
Resilience in Software Foundation is at resilienceinsoftware.org
Nov 28, 2025 • 43min

Incident Status: On Hold w/special guest Will Gallego

Mentioned multiple times, Em Ruppe’s amazing talk on incident severity: https://www.usenix.org/conference/srecon24americas/presentation/ruppe
We talk about the RIS Slack sometimes - you can join us in the slack, by joining the Foundation here: https://resilienceinsoftware.org/
Please ask us a question at thisisfinepod.com
Nov 13, 2025 • 1h 8min

Complex Systems and the Messy Nine w/special guests Dave Woods and John Allspaw

Dive into systems thinking with experts discussing the nuanced complexities of resilience engineering. Explore how perspectives shape learning, revealing the limitations of reductionism. Discover why linear root cause analyses fail in complex systems, and how pressure can stifle adaptive thinking. The guests introduce the 'messy nine' heuristic, which illustrates potential pitfalls like congestion and cascading failures. The conversation highlights the vital role of local expertise amidst automation and emphasizes the necessity of maintaining adaptable systems.
Oct 30, 2025 • 37min

All the things about Incident Command

It’s Spamton G (not J) Spamton, Clint! Get hip to the game characters! https://deltarune.fandom.com/wiki/Spamton
There are a couple of incident command trainers out there who tend to get recommended in the tech world (that we know of): https://www.blackrock3.com/ and Great Circle: https://greatcircle.com/im/
Oct 16, 2025 • 60min

Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein

Lorin Hochstein, a software engineer and researcher on reliability, dives into the nuances between root cause analysis and resilience engineering. He reveals the origins of the 'Five Whys' and critiques its overuse in incident analyses. Hochstein argues for resilience-oriented methods, asserting they reveal deeper insights and prevent future failures. He discusses the Swiss Cheese Model, the limitations of assuming a single root cause, and introduces STAMP, a method from safety-critical fields, emphasizing the need for effective learning over mere fixes. A must-listen for tech teams!
