Elixir Wizards

Telemetry & Observability for Elixir Apps at Cars.com with Zack Kayser & Ethan Gunderson

Dec 12, 2024
Ethan Gunderson, Principal Software Engineer focused on performance and observability, and Zack Kayser, Senior Software Engineer experienced in running large-scale Elixir systems, discuss telemetry and observability at Cars.com. They cover scaling Elixir for high traffic, tracing and spans, LiveView WebSocket challenges, sampling and storage trade-offs, OpenTelemetry adoption, and practical steps for getting telemetry working in apps.
ANECDOTE

Real Traffic Numbers At Cars.com

  • Cars.com routinely handles hundreds of millions of requests per day, and has hit billions, across HTTP and WebSocket traffic.
  • Zack reports 100,000 to 200,000 open WebSocket connections on quiet mornings, illustrating the challenges of running LiveView at scale.
INSIGHT

Better To Have Too Much Telemetry Than Too Little

  • When incidents occur, favor too much telemetry over too little because missing data blocks root-cause analysis.
  • Zack explains they'd rather have massive traces to dig through than no traces at all during an outage.
ADVICE

Sample Traces But Preserve Rare Feature Signals

  • Sample high-volume traces, but watch for naive sampling that drops visibility into rarely used features.
  • Ethan says Cars.com samples roughly 1% of traces and warns that random sampling can filter out low-usage features you still need to inspect.
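
One way to picture the sampling advice above is a sampler with a low baseline rate plus per-feature overrides. The following is a minimal sketch in Python, not the approach Cars.com actually uses: the function name `keep_trace`, the `FEATURE_RATES` table, and the feature name in it are all hypothetical, and only the ~1% baseline comes from the episode.

```python
import hashlib
from typing import Optional

# Baseline sampling rate; ~1% is the figure mentioned in the episode.
DEFAULT_RATE = 0.01

# Hypothetical overrides: rarely used features keep every trace so that
# random sampling does not erase them. The feature name is made up.
FEATURE_RATES = {"rare_feature": 1.0}


def keep_trace(trace_id: str, feature: Optional[str] = None) -> bool:
    """Decide whether to keep a trace.

    The decision is derived from a hash of the trace ID, so the same
    trace ID always gets the same answer for a given rate.
    """
    rate = FEATURE_RATES.get(feature, DEFAULT_RATE)
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Calling `keep_trace("some-trace-id")` keeps roughly one trace in a hundred, while `keep_trace("some-trace-id", feature="rare_feature")` always returns `True`. Hashing the trace ID instead of drawing a random number means every service that sees the same trace ID reaches the same decision.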