The Reasoning Show

AI SRE for Complex Systems

Apr 5, 2026
Anish Agarwal, CEO of Traversal and Columbia professor with MIT PhD roots in causal ML and RL, discusses AI-native approaches to observability and SRE. He covers why traditional observability breaks, how AI-generated code explodes telemetry, reframing observability as an AI problem, building a production world model, and the vision of agentic search and self-driving production stacks.
INSIGHT

Observability Shows Symptoms Not Causes

  • Observability gives you eyes into a system but not the reasoning to find root causes.
  • Anish Agarwal says metrics, logs, and traces produce correlations; humans must still connect cause and effect across thousands of signals during incidents.
INSIGHT

AI Code Is Exploding Telemetry Faster Than Teams Can Learn

  • AI-generated code is rapidly increasing telemetry volume while team understanding and SRE headcount remain flat.
  • Anish highlights a widening gap as the failure mode: more data, less human understanding of it, and static SRE capacity.
INSIGHT

Build A Production World Model To Reason At Scale

  • Traversal builds a production world model plus a causal search engine to map causal relationships across observability data.
  • They re-index existing logs, metrics, and traces into a representation readable by agentic systems to find root causes at scale.