Why Ensemble Architectures Win Against Real-Time Voice Risk - with Mike Pappas of Modulate

Mar 20, 2026

Mike Pappas, Co‑founder and CEO at Modulate, builds audio-native voice intelligence for real-time fraud and deepfake detection. He explains why live voice needs specialized, multi-model listening to catch social engineering and adversarial audio signals. The conversation covers ensemble audio models, where they outperform text systems, and how to evaluate voice-AI by speed, accuracy, and adaptability.

INSIGHT

Catching Fraud In The Act Prevents The Largest Harms

Real-time detection matters because the worst harms occur when fraud is noticed only after the funds are gone.
Mike Pappas explains that prevention avoids immediate losses, regulatory exposure, and the long-term loss of customer trust, none of which post hoc detection can repair.

INSIGHT

Prevention Steps Can Introduce Costly Friction

Preventative measures can carry hidden costs, such as longer call flows, regulatory exposure from storing biometric data, and increased staffing needs.
Mike describes how voice ID steps can add minutes of friction to a call and introduce new compliance risks, reducing throughput.

INSIGHT

LLMs Miss Voice Cues Critical To Fraud Detection

General-purpose LLMs struggle with fraud detection because they are sycophantic and text-native, so they lose vocal nuance and adversarial signals.
Mike notes that voice contains cues (emotion, timbre, background) that transcripts alone cannot capture, and that adversarial detection depends on them.
