Thoughtworks Technology Podcast

Durable computing: What is it and why now?

Mar 5, 2026
John Coleman is a lead consultant with expertise in distributed systems and state recovery. Brandon Cook is a principal engineer focused on operationalizing resiliency in event-driven platforms. They define durable computing and state recovery, compare durable platforms to orchestration patterns, weigh tradeoffs and lock-in, discuss testing and versioning, and explore durable agents for AI along with practical pitfalls.
INSIGHT

Durable Computing Guarantees Workflow Continuity

  • Durable computing lets a program recover its state and continue after a failure, so long-running workflows can complete despite crashes.
  • John Coleman explained that platforms persist state or effects, then either replay or resume execution to guarantee completion.
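
The persist-then-replay idea in the bullets above can be illustrated with a minimal sketch. It is not taken from the episode or from any particular platform: the `Journal` class, the step names, and the file layout are all hypothetical, and real durable platforms handle far more (timers, concurrency, versioning). The sketch assumes each step's result is JSON-serializable and is written to disk before the workflow moves on.

```python
import json
import os
import tempfile


class Journal:
    """Hypothetical file-backed log of completed step results."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def step(self, name, fn):
        # On replay, return the recorded result instead of re-running the effect.
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result
        # Persist atomically before moving on, so a later crash keeps this result.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.results, f)
        os.replace(tmp, self.path)
        return result


calls = []  # records which effects actually ran, for illustration only


def reserve():
    calls.append("reserve")
    return "reservation-1"


def charge(crash):
    if crash:
        raise RuntimeError("simulated crash")
    calls.append("charge")
    return "receipt-1"


def workflow(journal, crash=False):
    reservation = journal.step("reserve", reserve)
    receipt = journal.step("charge", lambda: charge(crash))
    return [reservation, receipt]


def demo():
    calls.clear()
    path = os.path.join(tempfile.mkdtemp(), "journal.json")
    try:
        workflow(Journal(path), crash=True)  # first run dies during "charge"
    except RuntimeError:
        pass
    # A fresh Journal stands in for a restarted process reading persisted state.
    result = workflow(Journal(path))
    return result, list(calls)
```

Calling `demo()` runs the workflow twice. The second run replays the recorded `reserve` result from the journal and only executes `charge`, so each effect runs once even though the workflow function itself ran twice.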
INSIGHT

Platforms Abstract Known Failure Patterns

  • Durable computing platforms package known distributed-systems failure patterns so teams don't have to reinvent retries and recovery.
  • John Coleman said the novelty lies in the power of the platforms, not in underlying concepts such as assured delivery or exactly-once semantics.
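
As a rough illustration of the kind of pattern such platforms package, the sketch below combines retries with an idempotent receiver, one common way to approximate an exactly-once effect on top of at-least-once delivery. Everything here is an assumption made for illustration: `retry`, `TransientError`, `PaymentService`, and the idempotency key `"order-42"` are hypothetical names, not an API from the episode or from any specific product.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry(fn, attempts=3, base_delay=0.01):
    # Retry with exponential backoff; re-raise after the final attempt.
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


class PaymentService:
    """Hypothetical receiver that deduplicates requests by idempotency key."""

    def __init__(self):
        self.processed = {}
        self.fail_next_ack = True

    def charge(self, key, amount):
        # Apply the effect only the first time this key is seen.
        if key not in self.processed:
            self.processed[key] = amount
        # Simulate a lost acknowledgement after the effect was applied.
        if self.fail_next_ack:
            self.fail_next_ack = False
            raise TransientError("ack lost")
        return self.processed[key]


def demo_retry():
    service = PaymentService()
    result = retry(lambda: service.charge("order-42", 100))
    return result, len(service.processed)
```

In `demo_retry()` the first call applies the charge but loses the acknowledgement, and the retried call is recognized by its key and not applied again. Writing this per team is exactly the reimplementation the snip says a platform is meant to avoid.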
ANECDOTE

Assessment Led To A Central Durable Platform Team

  • After an assessment, Brandon helped create a platform team to centralize durable capabilities instead of each team reimplementing them.
  • He evaluated many platforms to democratize resiliency rather than forcing every team to build it.