The Peterman Pod

AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker

Apr 13, 2026
Marc Brooker is a Distinguished Engineer at AWS known for building distributed databases and analyzing thousands of postmortems. He discusses lessons from 3,000+ incident postmortems, why caches can trigger large-scale failures, how on-call experience shapes product thinking, and how AI will reshape software engineering and career paths.
INSIGHT

Good Postmortems Go Many Layers Deep

  • Great postmortems dig into multiple layers of why, not just the proximate code bug.
  • Marc looks for fixes spanning code, testing, organizational processes, and whether repeated patterns merit new services or libraries.
ADVICE

Design Databases To Avoid Client Locking

  • Design databases to avoid common operational pitfalls like clients holding locks unexpectedly.
  • D-SQL uses multiversion concurrency control and optimistic commit checks so readers never block writers, which removes a frequent cause of relational database outages.
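
To make the idea of multiversion reads with an optimistic commit check concrete, here is a minimal single-process sketch in Python. It is not how D-SQL is implemented; the `Store`, `Txn`, and `ConflictError` names are hypothetical, and the check covers only write-write conflicts. It shows readers working from a snapshot without taking locks, and a writer being rejected at commit time if another transaction committed the same key first.

```python
class ConflictError(Exception):
    """Raised when the commit-time check finds a newer committed write."""


class Store:
    """Toy multiversion store: each key keeps every committed version."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # timestamp of the most recent commit

    def begin(self):
        # A transaction sees the store as of the moment it starts.
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered locally until commit; no locks are held

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Return the newest version visible at start_ts; never waits on writers.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Optimistic check: abort if any key we wrote has a newer committed version.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return self.store.clock
```

In this sketch a slow or stalled client can only cause its own transaction to abort. It never holds a lock that other clients have to wait for, which is the operational pitfall the snip describes.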
INSIGHT

Caches Create Metastable Failure Modes

  • Caches give a system two operational modes: fast when the cache is warm, and slow or unavailable when it is cold. A system stuck in the cold mode can produce metastable failures.
  • Marc prefers full materialized views or scalable backends (e.g., D-SQL, DynamoDB) to avoid cache-empty collapse.
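
The two modes described above can be illustrated with a toy model in Python. Everything here is an assumption chosen for illustration: the function names, the traffic and capacity numbers, the expiry rate, and above all the rule that an overloaded backend serves nothing useful (for example because every request times out). Under those assumptions, the same system is stable when it starts warm and stays stuck when it starts cold.

```python
def step(hit_rate, offered_rps, capacity_rps, max_hit_rate=0.9, expiry=0.1):
    """Advance the toy model by one tick and return the new cache hit rate."""
    misses = offered_rps * (1.0 - hit_rate)
    # Assumption: an overloaded backend times out and serves nothing useful.
    served = misses if misses <= capacity_rps else 0.0
    # Each successfully served miss repopulates the cache; old entries expire.
    refilled = served / offered_rps
    return min(max_hit_rate, hit_rate * (1.0 - expiry) + refilled)


def run(start_hit_rate, ticks=50, offered_rps=1000.0, capacity_rps=200.0):
    """Run the model for a number of ticks and return the final hit rate."""
    hit_rate = start_hit_rate
    for _ in range(ticks):
        hit_rate = step(hit_rate, offered_rps, capacity_rps)
    return hit_rate


if __name__ == "__main__":
    print("warm start:", run(0.9))
    print("cold start:", run(0.0))
    print("cold start, backend sized for full load:", run(0.0, capacity_rps=1000.0))
```

With the backend sized only for cache-miss traffic, the cold start never recovers because the cache cannot refill while the backend is overloaded. When the backend can absorb the full offered load, the cold start recovers on its own, which matches the preference for scalable backends or fully materialized views.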