

This is Fine! A podcast about resilience engineering and software
Colette Alexander and Clint Byrum
A podcast about resilience engineering and software.
Ever wondered why things on the internet break? Do you work in software and wish that you could have a Dear-Abby-Like call-in show that could answer your deepest questions about how to make your workplace suck less? We're here to help!
Write us anonymously at our open question form
Email us at: thisisfine.softwarepodcast@gmail.com
Call us and leave a voicemail, or text us at: (401) 592-7574
Episodes

Mar 12, 2026 • 59min
The 2025 DORA Report w/special guest Fred Hebert
Fred Hebert, a Staff SRE and Lund student known for work on SLOs, error budgets, and the Law of Stretched Systems, discusses the 2025 DORA Report. He unpacks why the report reframes around AI-assisted development. They explore AI adoption models, survey limits, platform vs AI impacts, cognitive load and burnout, and how new capacity can be reabsorbed by organizational demands.

Feb 26, 2026 • 1h 9min
Building and Revising Adaptive Capacity Sharing for Technical Incident Response with Beth Adele Long
Beth Adele Long, Principal at Adaptive Capacity Labs and resilience practitioner, shares field-tested practices from New Relic. She describes the NERF rotation, incident command vs support roles, and how calm coordinators reduce org-wide disruption. Conversation covers lowering friction to ask for help, making operational work a career path, and using management and tools to sustain adaptive capacity.

Feb 12, 2026 • 42min
Outsourcing and Resilience
They debate outsourcing software and the risks of partial handoffs versus full ownership. They explore how trust, in-person time, and clear agency shape reliable operations. They riff on outsourcing everyday tasks, construction trade adaptations, and cultural practices like servant leadership and joint retrospectives.

Feb 1, 2026 • 1h 43min
The Messy 9 and Coding with AI - A Panel Discussion
David Woods, resilience engineering founder and Professor Emeritus, brings foundational perspectives on the Messy 9 and socio-technical risks in AI systems. Shiri Cabral, enterprise architecture leader with experience at MongoDB and Salesforce, explains using AI for diagnostics and knowledge retrieval. They discuss AI in coding workflows, de-skilling risks, automation pitfalls, observability with AI, and designing collaborative human–AI systems.

Jan 17, 2026 • 1h 2min
Going Solid
If you’re feeling like you need to do more to respond to our moment:
Lots of places to donate to in the Twin Cities are listed here: https://mspmag.com/arts-and-culture/general-interest/ice-minnesota-support-immigrant-communities-fundraisers-food-drives-trainings/
You can always find mutual aid networks in your own area, including immigrant aid networks.
https://immigrantdefensenetwork.org/ does good work, too.
The Hometown Holler podcast with Tressie McMillan Cottom was a wonderful discussion: https://www.youtube.com/watch?v=2gr4mW8aR-g
The Ruth Wilson Gilmore interview that I quoted clumsily is here: https://www.nytimes.com/2019/04/17/magazine/prison-abolition-ruth-wilson-gilmore.html
The paper itself: https://qualitysafety.bmj.com/content/14/2/130.short
If you haven’t seen The Pitt, you should, it’s super good: https://en.wikipedia.org/wiki/The_Pitt
Charles Perrow’s Normal Accidents has more definitions/examples of coupling: https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-professor-charles-perrow/cad38a43fcffa1f8?ean=9780691004129&next=t
Some stuff on microservices and coupling here: https://microservices.io/post/architecture/2023/03/28/microservice-architecture-essentials-loose-coupling.html
Colette’s #notanad endorsement for paper organizing is https://paperpile.com/
Rasmussen’s boundary model comes initially from his paper here: https://www.sciencedirect.com/science/article/abs/pii/S0925753597000520
And if you want a good writeup explaining Rasmussen’s boundary model, you can always read Lorin’s blog: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
Dr Cook’s talk at Velocity is a classic, and goes over Rasmussen’s boundary model really well: https://www.youtube.com/watch?v=PGLYEDpNu60
Fred does a great job writing about the Law of Stretched Systems and how it applies to his own work on his blog: https://ferd.ca/the-law-of-stretched-cognitive-systems.html
“Plans are nothing, but planning is everything” is a paraphrase of Eisenhower: https://www.presidency.ucsb.edu/documents/remarks-the-national-defense-executive-reserve-conference
Want to chat about this paper with other folks? Come to the RISF live event for a Paper Party! https://resilienceinsoftware.org/events/157553

Dec 31, 2025 • 53min
The Year in Resilience w/special guest John Allspaw
Seriously though, can’t wait to gtfo of this year.
Palisades fire links:
https://www.nbclosangeles.com/investigations/anonymous-letter-demands-independent-palisades-fire-investigations/3800442/
https://internationalfireandsafetyjournal.com/palisades-fire-report/
https://www.latimes.com/california/story/2025-12-20/lafd-report-on-palisades-fire-was-watered-down-in-editing-process-records-show
Corey Quinn’s commentary on the AWS outage in October is here: https://www.theregister.com/2025/10/20/aws_outage_amazon_brain_drain_corey_quinn/
Time to reset the clock on how many episodes it’s been since we’ve mentioned the Ironies of Automation: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
Also on Rasmussen’s Boundary Model, which Lorin does a great write up on: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
Lorin’s Law is our favorite law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/
You can ask us questions or write to us using our form linked from our website: thisisfinepod.com
Resilience in Software Foundation is at resilienceinsoftware.org

Nov 28, 2025 • 43min
Incident Status: On Hold w/special guest Will Gallego
Mentioned multiple times, Em Ruppe’s amazing talk on incident severity: https://www.usenix.org/conference/srecon24americas/presentation/ruppe
We talk about the RIS Slack sometimes - you can join us in the Slack by joining the Foundation here: https://resilienceinsoftware.org/
Please ask us a question at thisisfinepod.com

Nov 13, 2025 • 1h 8min
Complex Systems and the Messy Nine w/special guests Dave Woods and John Allspaw
Dive into systems thinking with experts discussing the nuanced complexities of resilience engineering. Explore how perspectives shape learning, revealing the limitations of reductionism. Discover why linear root cause analyses fail in complex systems, and how pressure can stifle adaptive thinking. The episode introduces the 'messy nine' heuristic, illustrating potential pitfalls like congestion and cascading failures. The conversation highlights the vital role of local expertise amidst automation and emphasizes the necessity of maintaining adaptable systems.

Oct 30, 2025 • 37min
All the things about Incident Command
It’s Spamton G (not J) Spamton, Clint! Get hip to the game characters! https://deltarune.fandom.com/wiki/Spamton
There are a couple of incident command trainers out there who tend to get recommended in the tech world (that we know of): https://www.blackrock3.com/ and Great Circle: https://greatcircle.com/im/

Oct 16, 2025 • 60min
Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
Lorin Hochstein, a software engineer and researcher on reliability, dives into the nuances between root cause analysis and resilience engineering. He reveals the origins of the 'Five Whys' and critiques its overuse in incident analyses. Hochstein argues for resilience-oriented methods, asserting they reveal deeper insights and prevent future failures. He discusses the Swiss Cheese Model, the limitations of assuming a single root cause, and introduces STAMP, a method from safety-critical fields, emphasizing the need for effective learning over mere fixes. A must-listen for tech teams!


