Latent Space: The AI Engineer Podcast

NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Mar 10, 2026
Kyle Kranen is an engineering leader behind NVIDIA Dynamo who builds datacenter-scale inference systems. Nader Khalil is a DevRel leader focused on GPU developer UX and Brev’s developer onboarding. They discuss Dynamo’s scale-out inference approach, prefill vs decode disaggregation, Kubernetes-based scaling, SOL (Speed of Light) urgency culture, model‑hardware co-design, long‑context limits, and agent security and tooling.

Limit Agent Capabilities To Two Of Three

  • Only grant agents two of three powerful capabilities to reduce risk: files, internet, and code execution should never all be enabled together.
  • Nader Khalil recommends isolating combinations (e.g., allow file+code but block internet) to avoid remote injection and unexpected exfiltration.
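The two-of-three rule above can be sketched as a small policy check. This is only an illustration under stated assumptions: the capability names and the `validate_capabilities` helper are hypothetical, not part of Brev, Dynamo, or any agent framework mentioned in the episode.

```python
# Hypothetical capability names, based on the three categories named in the snip.
CAPABILITIES = frozenset({"files", "internet", "code_execution"})
MAX_ENABLED = 2  # the "two of three" rule


def validate_capabilities(requested):
    """Return the requested capabilities as a frozenset if they obey the
    two-of-three rule; raise otherwise."""
    requested = frozenset(requested)
    unknown = requested - CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    if len(requested) > MAX_ENABLED:
        raise PermissionError(
            "files, internet, and code execution must not all be enabled together"
        )
    return requested


# Allowed: file access plus code execution, with internet blocked.
allowed = validate_capabilities({"files", "code_execution"})

# Rejected: all three capabilities at once.
try:
    validate_capabilities({"files", "internet", "code_execution"})
    rejected = False
except PermissionError:
    rejected = True
```

Checking the policy once, before the agent starts, keeps the restriction in one place instead of scattering it across individual tool calls.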

Brev Made GPU Selection One Click And Fun

  • Nader describes Brev as a one-click developer front page that makes getting an A100 or other GPU as simple as selecting a chip SVG.
  • He built artisanal animated SVG chip visuals in Figma and converted them to React SVG components with transitions to simplify the GPU selection UX.

Speed Of Light Forces Root Cause Deadlines

  • SOL (Speed Of Light) forces teams to reason from theoretical physics limits to create urgency and expose hidden constraints on timelines.
  • Nader explains that SOL asks "what is the theoretical minimum?" and then layers practical constraints back in to break through excuses.