
The AI in Business Podcast: Why Ensemble Architectures Win Against Real-Time Voice Risk - with Mike Pappas of Modulate
Mar 20, 2026
Mike Pappas, Co‑founder and CEO at Modulate, builds audio-native voice intelligence for real-time fraud and deepfake detection. He explains why live voice needs specialized, multi-model listening to catch social engineering and adversarial audio signals. The conversation covers ensemble audio models, where they outperform text systems, and how to evaluate voice AI by speed, accuracy, and adaptability.
Catching Fraud In The Act Prevents The Largest Harms
- Real-time detection matters because the worst harms occur when fraud is only noticed after funds are gone.
- Mike Pappas explains that prevention avoids immediate losses, regulatory exposure, and a long-term loss of customer trust that post hoc detection cannot repair.
Prevention Steps Can Introduce Costly Friction
- Preventative measures can create hidden costs like longer call flows, regulatory exposure from biometric storage, and increased staffing needs.
- Mike describes how adding voice ID steps can introduce minutes of friction per call, plus new compliance risks, both of which reduce throughput.
LLMs Miss Voice Cues Critical To Fraud Detection
- General-purpose LLMs struggle with fraud detection because they are sycophantic and text-native, so they lose vocal nuance and adversarial signals.
- Mike notes voice contains cues (emotion, timbre, background) that transcripts alone cannot capture for adversarial detection.

