
The Peterman Pod
AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker
Apr 13, 2026
Marc Brooker is a Distinguished Engineer at AWS known for building distributed databases and analyzing thousands of postmortems. He discusses lessons from 3,000+ incident postmortems, why caches can trigger large-scale failures, how on-call experience shapes product thinking, and how AI will reshape software engineering and career paths.
AI Snips
Good Postmortems Go Many Layers Deep
- Great postmortems dig into multiple layers of why, not just the proximate code bug.
- Marc looks for fixes spanning code, testing, organizational processes, and whether repeated patterns merit new services or libraries.
Design Databases To Avoid Client Locking
- Design databases to avoid common operational pitfalls like clients holding locks unexpectedly.
- Aurora DSQL uses multiversion concurrency control and optimistic commit-time conflict checks, so readers never block writers. This removes a common cause of relational database outages.
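To make the idea concrete, here is a toy, single-process sketch of multiversion reads combined with an optimistic check at commit time. It is an illustration only, not how the database discussed in the episode is implemented; the names `Store`, `Txn`, and `Conflict` are hypothetical, and the conflict rule (abort if any written key has a newer committed version than the transaction's snapshot) is an assumed, simplified policy.

```python
class Conflict(Exception):
    """Raised at commit when another transaction committed a written key first."""


class Store:
    """Toy multiversion store: every key keeps its full list of committed versions."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def begin(self):
        # A transaction reads from the snapshot as of its start timestamp.
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # writes are buffered locally; no locks are taken

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Return the newest version committed at or before our snapshot.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Optimistic check: abort if any key we wrote changed after our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise Conflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
```

In this sketch an idle client that opened a transaction and walked away holds nothing that other clients must wait for: concurrent writers commit freely, the idle reader keeps seeing its own snapshot, and the only cost of a conflict is that the later committer gets a `Conflict` and retries.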
Caches Create Metastable Failure Modes
- Caches create two operational modes: fast when the cache is warm, and slow or unavailable when it is cold. The gap between the two can produce metastable failures.
- Marc prefers fully materialized views or backends that scale to the full request load (e.g., Aurora DSQL, DynamoDB), so that an empty cache cannot cause a collapse.
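The two modes can be seen with a back-of-the-envelope load calculation. The sketch below is a deliberately simplified model, and the function names and all numbers are made up for illustration: it assumes every cache miss becomes exactly one backend request and ignores retries and timeouts, which in practice tend to make the cold state worse.

```python
def backend_load(request_rate, hit_rate):
    """Requests per second that miss the cache and reach the backend."""
    return request_rate * (1.0 - hit_rate)


def is_overloaded(request_rate, hit_rate, backend_capacity):
    """True when cache misses alone exceed what the backend can serve."""
    return backend_load(request_rate, hit_rate) > backend_capacity


# Hypothetical system: 10,000 requests/s, backend sized for 2,000 requests/s.
REQUEST_RATE = 10_000
BACKEND_CAPACITY = 2_000

warm = is_overloaded(REQUEST_RATE, hit_rate=0.9, backend_capacity=BACKEND_CAPACITY)
cold = is_overloaded(REQUEST_RATE, hit_rate=0.0, backend_capacity=BACKEND_CAPACITY)
print(f"warm cache overloaded: {warm}, cold cache overloaded: {cold}")
```

Under these assumed numbers the backend is comfortable while the cache is warm and is offered five times its capacity once the cache empties. An overloaded backend also struggles to serve the reads needed to refill the cache, which is why the cold state can persist. A backend sized for the full request rate, or a fully materialized view, removes the second mode.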






