Vanishing Gradients

Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin

Feb 18, 2026
Samuel Colvin, creator of Pydantic and lead of the Pydantic Stack, explains how to build durable AI agents with engineering-grade reliability. He discusses agentlets (small, specialized building blocks), using Temporal for robust workflow durability, separating deterministic workflows from stochastic model calls, and making observability and type-safe validation central to production AI.
ADVICE

Use Durable Execution Frameworks

  • Use durable execution frameworks like Temporal instead of homegrown snapshotting solutions.
  • Let the framework manage retries, state caching, and resumption so agents can recover from failures.
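A minimal sketch of the retry part of this advice, in plain Python. `RetryPolicy` and `run_with_retries` are hypothetical names for illustration; the field names loosely echo the retry-policy settings Temporal exposes, but this is not Temporal's API. A real durable execution framework would also persist the attempt state so that retries survive a process restart, which this in-process loop does not do.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical retry settings for one step; not Temporal's RetryPolicy class."""
    initial_interval: float = 0.1     # seconds to wait before the first retry
    backoff_coefficient: float = 2.0  # each subsequent wait is multiplied by this
    maximum_attempts: int = 5         # total attempts, including the first


def run_with_retries(
    step: Callable[[], T],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `step` until it succeeds or the policy's attempt budget is exhausted."""
    if policy.maximum_attempts < 1:
        raise ValueError("maximum_attempts must be at least 1")
    delay = policy.initial_interval
    for attempt in range(1, policy.maximum_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == policy.maximum_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)  # exponential backoff between attempts
            delay *= policy.backoff_coefficient
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_llm_call() -> str:
        # Hypothetical model call that fails twice with a transient network error.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("simulated network error")
        return "Is it an animal?"

    waits: list[float] = []
    print(run_with_retries(flaky_llm_call, RetryPolicy(), sleep=waits.append))
    print(waits)
```

Injecting `sleep` keeps the sketch fast to test and makes the backoff schedule observable: the demo records waits of 0.1 s and then 0.2 s before the third attempt succeeds.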
ANECDOTE

20 Questions Demo Shows Failure Cost

  • Samuel demos a 20-questions agent that loses progress when it crashes because it lacks durability.
  • The non-durable run required restarting from step one after a network error or node kill.
ADVICE

Separate Deterministic Workflow From IO

  • Split the agent into a deterministic workflow and stochastic activities.
  • Make IO and LLM calls activities so the engine can cache outputs and replay them on retries.
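A minimal sketch of the workflow/activity split, assuming a local JSON file as the durable store. `Journal`, `activity`, `twenty_questions`, and the fake model and answer functions are hypothetical and are not Temporal's API. Temporal keeps an event history on its server and also detects when replayed workflow code diverges from that history, which this sketch does not check. The workflow function only branches on values returned through `journal.activity`, so re-running it after a crash replays recorded results instead of repeating the model calls. The demo recreates the failure mode from the 20 Questions anecdote: the first run dies at question 4, and the second run resumes from the journal rather than from step one.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class Journal:
    """Hypothetical append-only log of activity results, persisted as JSON."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load results recorded by any earlier (possibly crashed) run.
        self.results: list[Any] = json.loads(path.read_text()) if path.exists() else []
        self.cursor = 0  # position of the next activity call in this run

    def activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        """Replay a recorded result if one exists; otherwise run fn and persist its result."""
        if self.cursor < len(self.results):
            result = self.results[self.cursor]  # replay: no IO, no model call
        else:
            result = fn(*args)  # live execution of the side effect
            self.results.append(result)
            self.path.write_text(json.dumps(self.results))  # persist before continuing
        self.cursor += 1
        return result


def twenty_questions(
    journal: Journal,
    ask_model: Callable[[list[str]], str],
    answer: Callable[[str], str],
) -> list[str]:
    """Deterministic workflow: every nondeterministic step goes through journal.activity."""
    transcript: list[str] = []
    for _ in range(20):
        question = journal.activity(ask_model, transcript)
        reply = journal.activity(answer, question)
        transcript.append(f"{question} -> {reply}")
        if reply == "correct":
            break
    return transcript


model_calls: list[str] = []


def fake_model(transcript: list[str]) -> str:
    # Stand-in for an LLM call; records each live invocation.
    question = f"Question {len(transcript) + 1}?"
    model_calls.append(question)
    return question


def healthy_answer(question: str) -> str:
    return "correct" if question == "Question 5?" else "no"


def flaky_answer(question: str) -> str:
    # Simulates the process dying partway through the game.
    if question == "Question 4?":
        raise ConnectionError("simulated crash")
    return healthy_answer(question)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "journal.json"
        try:
            twenty_questions(Journal(path), fake_model, flaky_answer)
        except ConnectionError:
            print("crashed after", len(model_calls), "model calls")
        print(twenty_questions(Journal(path), fake_model, healthy_answer)[-1])
        print("total model calls:", len(model_calls))
```

Because each result is written to the journal before the workflow moves on, the resumed run makes one new model call instead of five. The cost is a rule the workflow must follow: any clock read, random number, or network call made outside `activity` would yield different values on replay and break the resume.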