Durable computing: What is it and why now?
Mar 5, 2026
John Coleman is a lead consultant with expertise in distributed systems and state recovery. Brandon Cook is a principal engineer focused on operationalizing resiliency in event-driven platforms. They define durable computing and state recovery, compare durable platforms to orchestration patterns, weigh tradeoffs and lock-in, discuss testing and versioning, and explore durable agents for AI along with practical pitfalls.
AI Snips
Durable Computing Guarantees Workflow Continuity
- Durable computing lets a program recover its state and continue so long-running workflows can complete despite crashes.
- John Coleman explained that platforms persist state or effects, then either replay or resume execution to guarantee completion.
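The persist-then-replay idea can be illustrated with a minimal sketch. It is not taken from the episode and does not represent any particular platform's API: the `Journal` class, the step names, and the `charge_card` and `send_receipt` functions are all hypothetical, and a local JSON file stands in for a real durable store.

```python
import json
import os
import tempfile


class Journal:
    """Hypothetical record of completed step results, kept as JSON on disk."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        # On restart, reload whatever steps finished before the crash.
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def step(self, name, fn):
        # Replay: a step that already completed returns its saved result
        # instead of running its side effect again.
        if name in self.results:
            return self.results[name]
        value = fn()
        self.results[name] = value
        # Write to a temp file and rename so a crash cannot leave a
        # half-written journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.results, f)
        os.replace(tmp, self.path)
        return value


calls = []  # records which side effects actually ran


def charge_card():
    calls.append("charge")
    return "charge-ok"


def send_receipt():
    calls.append("receipt")
    return "receipt-sent"


def workflow(journal, crash_after_charge=False):
    charge = journal.step("charge", charge_card)
    if crash_after_charge:
        raise RuntimeError("simulated crash")
    receipt = journal.step("receipt", send_receipt)
    return [charge, receipt]


def run_demo():
    path = os.path.join(tempfile.mkdtemp(), "journal.json")
    try:
        workflow(Journal(path), crash_after_charge=True)  # first attempt dies
    except RuntimeError:
        pass
    # Second attempt replays from the journal: the charge is not repeated.
    result = workflow(Journal(path))
    return result, list(calls)
```

In this sketch the second run skips `charge_card` because its result is already in the journal, so the workflow finishes with each side effect executed once. A crash between running a step and saving its result would still cause that step to run again on restart, which is why steps in such designs are usually written to be idempotent.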
Platforms Abstract Known Failure Patterns
- Durable computing platforms package known distributed-systems failure patterns so teams don't reinvent retries and recovery.
- John Coleman said the novelty lies in the power of the platforms, not in concepts like assured delivery or exactly-once semantics.
Assessment Led To A Central Durable Platform Team
- After an assessment, Brandon helped create a platform team to centralize durable capabilities instead of each team reimplementing them.
- He evaluated many platforms to democratize resiliency rather than forcing every team to build it.
