Eye On A.I.

#320 Carter Huffman: Exploring The Architecture Behind Modulate's Next-Gen Voice AI

Feb 11, 2026
Carter Huffman, CTO and co-founder of Modulate, builds real-time voice AI for detecting emotion, intent, and deception, and for moderation at scale. He discusses why live voice understanding matters, the ensemble model architecture that outperforms foundation models, ultra-low-latency scaling to millions of streams, and applications ranging from gaming toxicity to deepfake and fraud detection.
ADVICE

Route To Specialized Models To Save Cost

  • Route each audio segment to specialized, smaller models instead of one large model to save compute and improve accuracy.
  • Use an orchestrator to select the right submodel for the current audio conditions in real time.
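A minimal sketch of the routing idea described above: an orchestrator inspects cheap features of each audio segment and dispatches it to a small specialized model rather than running one large model on everything. The model names and the noise-level heuristic are illustrative assumptions, not Modulate's actual system.

```python
def noisy_speech_model(segment):
    # Placeholder for a small model trained on noisy audio.
    return {"model": "noisy", "label": "speech"}

def clean_speech_model(segment):
    # Placeholder for a small model trained on clean audio.
    return {"model": "clean", "label": "speech"}

def music_model(segment):
    # Placeholder for a small model specialized for music/background audio.
    return {"model": "music", "label": "non-speech"}

def orchestrate(segment):
    """Pick the cheapest specialized model that fits the audio conditions."""
    if segment["is_music"]:
        return music_model(segment)
    if segment["snr_db"] < 10:  # low signal-to-noise ratio -> noise-robust model
        return noisy_speech_model(segment)
    return clean_speech_model(segment)

result = orchestrate({"is_music": False, "snr_db": 5})
```

The routing check itself must be far cheaper than the models it selects between, otherwise the orchestration overhead eats the compute savings.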
INSIGHT

Design Ensembles For Millisecond Latency

  • Low latency with ensembles requires splitting stateful decisions into fast feedforward passes plus slower feedback updates.
  • Modulate pre-registers model choices and tolerates partial results to meet tight latency budgets.
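One way to sketch the "tolerate partial results" idea: run the ensemble members in parallel, wait only up to the latency budget, and aggregate whatever finished in time. The member functions and the 50 ms budget here are illustrative assumptions.

```python
import concurrent.futures as cf
import time

def fast_member(x):
    # A feedforward pass that returns almost immediately.
    return x + 1

def slow_member(x):
    # A member that misses a tight budget (simulated with a sleep).
    time.sleep(0.5)
    return x + 100

def ensemble_predict(x, budget_s=0.05):
    """Collect only the ensemble members that finish within the budget."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(m, x) for m in (fast_member, slow_member)]
    done, _not_done = cf.wait(futures, timeout=budget_s)
    results = [f.result() for f in done]
    pool.shutdown(wait=False)  # don't block the response on stragglers
    return results
```

With a 50 ms budget, only the fast member's result is returned; the slow member's output is simply dropped rather than stalling the response.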
ADVICE

Scale By Separating Feedforward From Feedback

  • Treat each live stream as an independent feedforward computation to scale to millions of streams.
  • Run asynchronous background feedback to update routing and improve future accuracy without blocking real-time responses.
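The feedforward/feedback split above could be sketched as follows: each stream is handled by an independent, stateless forward pass, while feedback signals are pushed onto a queue that a background worker drains to update routing state, so no real-time response ever waits on that update. All names here are illustrative assumptions, not Modulate's API.

```python
import queue
import threading

routing_stats = {"hits": 0}
feedback_q = queue.Queue()

def feedback_worker():
    # Slow path: drain feedback signals and update routing state off the hot path.
    while True:
        signal = feedback_q.get()
        if signal is None:  # shutdown sentinel
            break
        routing_stats["hits"] += signal

def handle_stream(segment):
    # Fast path: pure feedforward, no blocking on shared state.
    prediction = segment * 2      # stand-in for a model forward pass
    feedback_q.put(1)             # enqueue a feedback signal without waiting
    return prediction

worker = threading.Thread(target=feedback_worker, daemon=True)
worker.start()
preds = [handle_stream(s) for s in range(3)]
feedback_q.put(None)
worker.join()
```

Because streams share nothing on the fast path, scaling out is just running more independent feedforward workers; only the (non-blocking) feedback channel is shared.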