Today we're talking with Kolton Andrus, the Founder and CEO of Gremlin, about what happens to reliability when AI is writing most of the code. Kolton helped build the Chaos Engineering practices at both Amazon and Netflix before starting Gremlin.

In our conversation, we talk about scar tissue: the intuition engineers develop from being woken up at 3:00 AM to fix production outages, and how AI doesn't have any of it. AI can generate in an afternoon code that previously might have taken a team weeks to build, but none of those painful lessons come along for the ride.

We dig into why 10x more code might mean 10x more failures; the concept of reliability guardrails (think ethical guardrails, but for keeping your systems up); why you still have to test in production no matter how good your staging environment is; how Gremlin is rethinking its product for a world where agents, not engineers, are essentially the primary users; and why we're entering a painful, narrow part of the hourglass before AI gets good enough to handle all of this on its own.

AI and Proactive Reliability with Kolton Andrus