
MLOps.community Durable Execution and Modern Distributed Systems
Mar 17, 2026
Johann Schleier-Smith, Technical Lead for AI at Temporal Technologies, builds reliable infrastructure for long-running production AI workflows. He explains durable execution and why it makes regular Python programs crash-proof and scalable. Topics include deterministic workflows, cross-region resilience, integrating durable state with databases, using durable execution with LLMs and agents, and practical operational patterns.
AI Snips
State Capture Lets Workflows Survive Region Failures
- Temporal records activity results durably so running programs can be moved across regions mid-execution.
- That lets a workflow survive a region outage: a worker in another region replays the recorded state, backed by the database, and continues from where the program left off.
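The record-and-replay idea in this snip can be sketched in plain Python. This is a toy model, not Temporal's SDK or implementation: the `Journal` class, the JSON-lines file, and the `charge_card` / `send_receipt` activities are all hypothetical names chosen for illustration, and a local temp file stands in for the durable database. It shows only the core mechanism: results of completed activities are persisted, so a restarted program reuses them instead of running the activity again.

```python
import json
import os
import tempfile


class Journal:
    """Toy append-only log of activity results (hypothetical, not Temporal's API)."""

    def __init__(self, path):
        self.path = path
        self.results = []
        if os.path.exists(path):
            with open(path) as f:
                self.results = [json.loads(line) for line in f if line.strip()]
        self.cursor = 0  # index of the next activity call in this run

    def run_activity(self, fn, *args):
        if self.cursor < len(self.results):
            # Replay: reuse the recorded result, do not execute the activity again.
            result = self.results[self.cursor]
        else:
            # First execution: run the activity, then persist its result.
            result = fn(*args)
            with open(self.path, "a") as f:
                f.write(json.dumps(result) + "\n")
            self.results.append(result)
        self.cursor += 1
        return result


calls = []  # records which activities actually executed


def charge_card(amount):
    calls.append("charge_card")
    return {"charged": amount}


def send_receipt(charge):
    calls.append("send_receipt")
    return f"receipt for {charge['charged']}"


def order_workflow(journal, crash_after_charge=False):
    charge = journal.run_activity(charge_card, 42)
    if crash_after_charge:
        raise RuntimeError("simulated region outage")
    return journal.run_activity(send_receipt, charge)


journal_path = os.path.join(tempfile.mkdtemp(), "history.jsonl")

# First attempt fails after the charge has been recorded.
try:
    order_workflow(Journal(journal_path), crash_after_charge=True)
except RuntimeError:
    pass

# A "new region" reloads the journal and replays the workflow from the top.
final = order_workflow(Journal(journal_path))
```

In the second run, `charge_card` is not executed again because its result is already in the journal; only `send_receipt` runs. This also hints at why workflow code has to be deterministic: replay assumes the program issues the same activity calls in the same order.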
Incremental State Capture Beats Coarse Checkpoints
- Temporal's approach is like incremental checkpoints: it persistently captures deltas of program state rather than coarse-grained full snapshots.
- You can still take a full snapshot (Temporal's "Continue-As-New") when the history grows large, but incremental capture is what enables fine-grained recovery.
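The trade-off between incremental deltas and full snapshots can be illustrated with a small sketch. It is a loose analogy under stated assumptions, not how Temporal stores history: `DeltaLog`, `max_history`, and the dictionary-shaped state are hypothetical, and in Temporal itself Continue-As-New starts a fresh run with a new history rather than compacting in place.

```python
class DeltaLog:
    """Toy state store: a base snapshot plus a list of small deltas (hypothetical)."""

    def __init__(self, max_history=3):
        self.snapshot = {}   # last full snapshot of the state
        self.history = []    # deltas recorded since that snapshot
        self.max_history = max_history

    def record(self, delta):
        # Incremental capture: persist only what changed in this step.
        self.history.append(dict(delta))
        if len(self.history) > self.max_history:
            self.compact()

    def recover(self):
        # Rebuild the current state by applying the deltas in order.
        state = dict(self.snapshot)
        for delta in self.history:
            state.update(delta)
        return state

    def compact(self):
        # Coarse snapshot: fold the history into a new base and start over.
        self.snapshot = self.recover()
        self.history = []


log = DeltaLog(max_history=3)
for step in range(1, 6):
    log.record({"step": step, f"result_{step}": step * step})

state = log.recover()
```

Each `record` call writes a small delta, so recovery can resume from any step. Compaction keeps recovery cost bounded once the list of deltas grows past `max_history`.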
Database Startup Acquisition Shaped Temporal Vision
- Johann described how his previous startup, which worked on serverless databases, was acquired by Temporal because the two companies shared a mission: making systems reliable without developer toil.
- That history influenced Temporal's goal to provide database-like guarantees in regular programming languages.
