
Changelog Master Feed
Kaizen! Let it crash (Changelog & Friends #124)
Jan 17, 2026
Gerhard Lazu, a reliability expert and Kaizen contributor, returns to discuss the nuances of the "let it crash" philosophy and how it makes systems more resilient. He digs into troublesome out-of-memory crashes and shares what he learned about bandwidth spikes that strain Varnish. A humorous investigation into one wildly popular episode uncovers a surprising pattern of downloads from Asia, raising the question of whether it is scraping or genuine listener behavior. Gerhard also shows the tools he uses to monitor system health and speed up troubleshooting.
AI Snips
Hot Regions Drive Sizing Decisions
- A single hot region can dominate load, requiring larger instance sizes and tuned limits.
- Deploy the same instance size everywhere, then scale down in quiet regions; this makes Fly.io's constraints simpler to work within.
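The "uniform first, then scale down" approach could be sketched in Python as follows. This is a hypothetical heuristic, not the configuration discussed in the episode: the tier names, the `quiet_ratio` threshold, and the helper `plan_region_sizes` are all assumptions. Every region starts at the largest tier, and only regions whose peak load is a small fraction of the hottest region's peak are stepped down.

```python
# Hypothetical size tiers, largest first; the names are illustrative only.
SIZE_TIERS = ["large", "medium", "small"]


def plan_region_sizes(peak_rps: dict[str, float], quiet_ratio: float = 0.25) -> dict[str, str]:
    """Give every region the largest tier, then step quiet regions down.

    A region's "share" is its peak load divided by the hottest region's
    peak. Regions at or above quiet_ratio keep the uniform default; below
    it they drop one tier, and below half of quiet_ratio they drop two.
    """
    if not peak_rps:
        return {}
    hottest = max(peak_rps.values())
    plan = {}
    for region, rps in peak_rps.items():
        # With no load anywhere there is nothing to compare against,
        # so keep the uniform default.
        share = rps / hottest if hottest > 0 else 1.0
        if share >= quiet_ratio:
            plan[region] = SIZE_TIERS[0]
        elif share >= quiet_ratio / 2:
            plan[region] = SIZE_TIERS[1]
        else:
            plan[region] = SIZE_TIERS[2]
    return plan
```

Keeping the default uniform means only the quiet-region exceptions need explicit configuration, and a region that suddenly turns hot is never undersized by a stale per-region setting.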
AI Reviewed Varnish Stats For Different Audiences
- Gerhard ran Varnish stats through multiple LLMs to get tailored recommendations for sysadmins, devs, and business folks.
- The AI summaries highlighted a healthy cache hit rate but flagged disk allocation fragmentation and disk cache failures.
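Before handing Varnish stats to an LLM, it can help to pull out the headline numbers yourself. The sketch below assumes the JSON produced by `varnishstat -1 -j`; the layout differs between Varnish versions (6.5 and later nest counters under a `counters` key, older versions put them at the top level), and the helper names `load_counters` and `summarize` are hypothetical. It computes a simple hit rate from `MAIN.cache_hit` and `MAIN.cache_miss` and lists any non-zero storage counters ending in `.c_fail` (allocation failures).

```python
import json


def load_counters(raw: str) -> dict[str, int]:
    """Flatten varnishstat JSON output into {counter_name: value}.

    Handles both the nested layout (counters under "counters") and the
    older flat layout, where non-counter keys such as "timestamp" are skipped.
    """
    data = json.loads(raw)
    counters = data.get("counters", data)
    return {
        name: entry["value"]
        for name, entry in counters.items()
        if isinstance(entry, dict) and "value" in entry
    }


def summarize(counters: dict[str, int]) -> dict[str, object]:
    """Return a simple hit rate and any non-zero storage allocation failures."""
    hits = counters.get("MAIN.cache_hit", 0)
    misses = counters.get("MAIN.cache_miss", 0)
    lookups = hits + misses
    hit_rate = hits / lookups if lookups else 0.0
    # Each storage backend exposes a c_fail counter, e.g. SMA.s0.c_fail
    # for malloc storage or SMF.<name>.c_fail for file storage.
    alloc_failures = {
        name: value
        for name, value in counters.items()
        if name.endswith(".c_fail") and value > 0
    }
    return {"hit_rate": hit_rate, "alloc_failures": alloc_failures}
```

A compact summary like this, or the raw JSON, could then go into separate prompts written for sysadmins, developers, and business readers.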
Run Hourly Multi-Region Health Checks
- Run region-specific checks hourly with Hurl to detect hangs and timeouts early across all Fly.io regions.
- Run diagnostics that fetch MP3s and feeds from each region to catch intermittent failures proactively.
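A region-by-region check along these lines can be approximated in Python. This is not Gerhard's Hurl setup: the `fetch` and `check_all` helpers, the placeholder URLs and region codes, and the use of the `fly-prefer-region` request header to steer each request toward a Fly.io region are all assumptions. Each target is read in full so that a stall partway through an MP3 download still surfaces, and any fetch that raises, returns a non-200 status, or takes longer than `slow_after` seconds is reported.

```python
import time
import urllib.request
from typing import Callable, Iterable

# Placeholder region codes and URLs; substitute real ones.
REGIONS = ["iad", "ams", "sin"]
TARGETS = ["https://example.com/feed", "https://example.com/episode.mp3"]


def fetch(url: str, region: str, timeout: float) -> int:
    """GET the whole response body and return the HTTP status.

    Assumption: the fly-prefer-region header asks Fly.io's proxy to serve
    the request from that region.
    """
    req = urllib.request.Request(url, headers={"fly-prefer-region": region})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Read to the end: if the stream stalls for longer than `timeout`,
        # the socket read raises instead of silently hanging.
        resp.read()
        return resp.status


def check_all(
    regions: Iterable[str],
    targets: Iterable[str],
    fetcher: Callable[[str, str, float], int] = fetch,
    timeout: float = 10.0,
    slow_after: float = 30.0,
) -> list[tuple[str, str, str]]:
    """Return (region, url, reason) for every failed or slow fetch."""
    target_list = list(targets)
    failures = []
    for region in regions:
        for url in target_list:
            start = time.monotonic()
            try:
                status = fetcher(url, region, timeout)
            except Exception as exc:  # timeouts, resets, HTTP 4xx/5xx errors
                failures.append((region, url, type(exc).__name__))
                continue
            elapsed = time.monotonic() - start
            if status != 200:
                failures.append((region, url, f"status {status}"))
            elif elapsed > slow_after:
                # urlopen's timeout is per socket operation, so a slow
                # trickle can pass it; total elapsed time catches that.
                failures.append((region, url, f"slow: {elapsed:.1f}s"))
    return failures
```

Running `check_all(REGIONS, TARGETS)` from a cron job once an hour would match the cadence described above; the `fetcher` parameter makes it easy to swap in a different transport or a fake for testing.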



