MLOps.community

Durable Execution and Modern Distributed Systems

Mar 17, 2026
Johann Schleier-Smith, Technical Lead for AI at Temporal Technologies, builds reliable infrastructure for long-running production AI workflows. He explains durable execution and why it makes regular Python programs crash-proof and scalable. Topics include deterministic workflows, cross-region resilience, integrating durable state with databases, using durable execution with LLMs and agents, and practical operational patterns.
INSIGHT

State Capture Lets Workflows Survive Region Failures

  • Temporal records activity results durably so running programs can be moved across regions mid-execution.
  • That lets you survive a region outage: the workflow is replayed in another region from the recorded state, which is backed by the database.
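
The replay idea in the two points above can be sketched in plain Python. This is a toy model, not the Temporal SDK: `DurableContext`, `run_activity`, `charge_card`, `send_receipt`, and `SimulatedCrash` are hypothetical names, and an in-memory list stands in for the durable database. The point it illustrates is that a restarted program reuses recorded activity results instead of executing the activities again.

```python
class SimulatedCrash(Exception):
    """Stands in for a process or region failure (hypothetical)."""


class DurableContext:
    """Toy replay engine; an assumption for illustration, not Temporal's API."""

    def __init__(self, history):
        self.history = history  # shared log, a stand-in for a durable database
        self.position = 0       # index of the next activity call in this run

    def run_activity(self, fn, *args):
        if self.position < len(self.history):
            # Replay: a result was already recorded, so do not call fn again.
            result = self.history[self.position]
        else:
            # First execution: call the activity and record its result.
            result = fn(*args)
            self.history.append(result)
        self.position += 1
        return result


calls = []  # tracks which activities really executed


def charge_card(amount):
    calls.append("charge_card")
    return f"charged {amount}"


def send_receipt(charge):
    calls.append("send_receipt")
    return f"receipt for {charge}"


def workflow(ctx, crash_after_charge=False):
    charge = ctx.run_activity(charge_card, 42)
    if crash_after_charge:
        raise SimulatedCrash("region A lost")
    return ctx.run_activity(send_receipt, charge)


history = []
try:
    # First attempt, imagined in "region A", dies after the charge is recorded.
    workflow(DurableContext(history), crash_after_charge=True)
except SimulatedCrash:
    pass

# Second attempt, imagined in "region B", replays from the same history.
result = workflow(DurableContext(history))
```

In this sketch `charge_card` executes only once even though the workflow function runs twice, because the second run reads the recorded result from `history`. The replay only works if the workflow code makes the same activity calls in the same order on every run, which is why the episode lists deterministic workflows among its topics.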
INSIGHT

Incremental State Capture Beats Coarse Checkpoints

  • Temporal's approach is like incremental checkpoints: it persistently captures deltas of program state rather than coarse-grained full snapshots.
  • You can still take full snapshots ('continue as new') when history grows large, but incremental capture enables fine-grained recovery.
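
The trade-off between incremental capture and full snapshots can be sketched as follows. This is a simplified model under stated assumptions, not Temporal's implementation: `Run`, `apply_delta`, `continue_as_new`, and the `MAX_HISTORY` threshold are hypothetical names, and the workflow state is reduced to a single integer so that each delta is just a number to add.

```python
from dataclasses import dataclass, field

MAX_HISTORY = 3  # hypothetical threshold for "history grows large"


@dataclass
class Run:
    snapshot: int = 0                           # full state carried into this run
    deltas: list = field(default_factory=list)  # incremental changes recorded since

    def state(self):
        # Recover the current state by folding the deltas over the snapshot.
        return self.snapshot + sum(self.deltas)


def continue_as_new(run):
    # Collapse the history into one full snapshot and start a fresh run.
    return Run(snapshot=run.state())


def apply_delta(run, delta):
    run.deltas.append(delta)  # fine-grained capture: one record per change
    if len(run.deltas) >= MAX_HISTORY:
        return continue_as_new(run)
    return run


run = Run()
for delta in [1, 2, 3, 4]:
    run = apply_delta(run, delta)
```

After the third delta the history is collapsed into a snapshot of 6, and the fourth delta starts a new, short history. Recovery at any point needs only the latest snapshot plus the deltas recorded after it.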
ANECDOTE

Database Startup Acquisition Shaped Temporal Vision

  • Johann described how his previous startup, which worked on serverless databases, was acquired by Temporal because both companies aimed to make systems reliable without developer toil.
  • That history influenced Temporal's goal to provide database-like guarantees in regular programming languages.