AI SRE for Complex Systems

Apr 5, 2026

Anish Agarwal, CEO of Traversal and a Columbia professor with an MIT PhD background in causal ML and RL, discusses AI-native approaches to observability and SRE. He covers why traditional observability breaks down, how AI-generated code drives up telemetry volume, why observability should be reframed as an AI problem, how to build a production world model, and the vision of agentic search and self-driving production stacks.

INSIGHT

Observability Shows Symptoms Not Causes

Observability gives you eyes into a system but not the reasoning to find root causes.
Anish Agarwal says metrics, logs, and traces produce correlations; during incidents, humans must still work out causality across thousands of signals.

INSIGHT

AI Code Is Exploding Telemetry Faster Than Teams Can Learn

AI-generated code is rapidly increasing telemetry volume while team understanding and SRE headcount remain flat.
Anish highlights a widening gap: more data, less human understanding, and static SRE capacity, which together drive the failure mode.

INSIGHT

Build A Production World Model To Reason At Scale

Traversal builds a production world model plus a causal search engine to map causal relationships across observability data.
They re-index existing logs, metrics, and traces into a representation readable by agentic systems to find root causes at scale.
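
The episode summary does not describe how Traversal's world model or causal search actually works, so the following is only a toy illustration of the general idea, not their method. It assumes telemetry has already been reduced to hypothetical records of the form (service, dependencies, anomalous flag), builds a small dependency map from them, and flags as candidate root causes the anomalous services whose own dependencies all look healthy. The service names, the record shape, and both function names are made up for the example.

```python
# Hypothetical, pre-normalized telemetry: (service, depends_on, is_anomalous).
# In practice these flags would be derived from logs, metrics, and traces.
RECORDS = [
    ("checkout", ["payments", "cart"], True),
    ("payments", ["db"], True),
    ("cart", ["cache"], False),
    ("db", [], True),
    ("cache", [], False),
]


def build_world_model(records):
    """Return (dependency map, set of anomalous services) from raw records."""
    deps = {}
    anomalous = set()
    for service, depends_on, is_anomalous in records:
        deps[service] = list(depends_on)
        if is_anomalous:
            anomalous.add(service)
    return deps, anomalous


def candidate_root_causes(deps, anomalous):
    """Anomalous services with no anomalous dependency of their own.

    A service whose dependency is also misbehaving is more likely showing
    a symptom, so it is excluded from the candidate list.
    """
    return sorted(
        service
        for service in anomalous
        if not any(dep in anomalous for dep in deps.get(service, []))
    )


if __name__ == "__main__":
    deps, anomalous = build_world_model(RECORDS)
    print(candidate_root_causes(deps, anomalous))
```

Here three services look unhealthy, but only `db` has no unhealthy dependency beneath it, which is the kind of narrowing from symptoms to a likely cause that the snip describes. A real system would need learned or inferred causal edges rather than a hand-written dependency list.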
