AI Snips
Voice Turing Test Fails On Orchestration Not Speech
- Voice assistants still fail because conversation needs turn-taking, interruption handling, clarification, and tool use, not just speech recognition.
- Mati Staniszewski argues text passed the Turing test long ago, while voice only works in narrower domains like support calls.
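The orchestration pieces the snip lists (turn-taking, interruption handling, clarification) can be sketched as a tiny dialogue state machine. This is a hypothetical illustration only; the class and method names are invented, not any vendor's API.

```python
from enum import Enum, auto

class TurnState(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()

class VoiceAgent:
    """Toy orchestrator: routes conversational events, not the speech itself."""

    def __init__(self):
        self.state = TurnState.LISTENING

    def on_user_speech_start(self):
        # Barge-in handling: if the agent is talking, stop and yield the turn.
        if self.state == TurnState.SPEAKING:
            self.state = TurnState.LISTENING
            return "interrupted"
        return "listening"

    def on_user_utterance(self, text):
        self.state = TurnState.THINKING
        if len(text.split()) < 3:
            # Too little signal: ask a clarifying question instead of guessing.
            self.state = TurnState.SPEAKING
            return "clarify: could you say more?"
        self.state = TurnState.SPEAKING
        return f"answer: {text}"

agent = VoiceAgent()
```

The point of the sketch is that none of these transitions involve speech recognition at all; they are the orchestration layer the snip argues is the hard part.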
Transcription Gets Better When Models Know The Speaker
- Speech recognition should become speaker-specific, not purely global, because accent, noise, and room context vary by person.
- Mati Staniszewski says ElevenLabs already does diarization and keyword detection, and plans person-specific transcription for settings like healthcare and home devices.
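One way to picture speaker-specific transcription is keying a correction pass on diarized speaker IDs, biasing each speaker's output toward their own vocabulary. Everything below is a made-up sketch; the profile format and function names do not come from ElevenLabs.

```python
# Hypothetical per-speaker adaptation on top of diarization output.
# Profiles hold vocabulary the recognizer should bias toward for that speaker.
PROFILES = {
    "speaker_1": {"vocab": {"metformin", "HbA1c"}},  # e.g. a clinician
    "speaker_2": {"vocab": set()},                   # unknown guest, no bias
}

def correct_with_profile(speaker_id, tokens):
    """Toy bias pass: restore casing/spelling of exact (case-insensitive)
    matches against the speaker's known vocabulary."""
    vocab = {w.lower(): w for w in PROFILES.get(speaker_id, {}).get("vocab", set())}
    return [vocab.get(t.lower(), t) for t in tokens]

# Diarized segments: (speaker_id, raw recognizer tokens).
segments = [
    ("speaker_1", ["start", "hba1c", "panel"]),
    ("speaker_2", ["okay"]),
]

transcript = [(spk, correct_with_profile(spk, toks)) for spk, toks in segments]
```

A real system would adapt acoustic and language models per speaker rather than post-editing tokens, but the routing idea is the same: diarization decides whose context applies.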
Controllable Speech Beats Raw Speech To Speech
- Voice generation is shifting from best-guess outputs to controllable speech where users can direct pace, pauses, and emotional delivery.
- ElevenLabs' V3 and expressive mode let agents react reassuringly to a stressed caller, while Mati Staniszewski still favors cascaded systems for enterprise reliability.
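The cascaded design the snip mentions can be sketched as explicit STT → LLM → TTS stages, with delivery carried as a controllable style tag instead of being guessed end-to-end. All names here are hypothetical stand-ins, not ElevenLabs' actual interfaces.

```python
# Minimal cascaded voice pipeline: each stage is inspectable, and the
# delivery style travels as an explicit, controllable tag.

def stt(audio: str) -> str:
    # Stand-in for a speech recognizer; "audio" is already text here.
    return audio

def llm(text: str) -> tuple[str, str]:
    # Returns (reply, delivery_style): a stressed caller gets a calm delivery.
    stressed = any(w in text.lower() for w in ("help", "urgent", "stolen"))
    style = "calm, slower pace" if stressed else "neutral"
    return f"I can help with that: {text}", style

def tts(reply: str, style: str) -> str:
    # Stand-in for controllable synthesis: the tag directs pace and emotion.
    return f"[{style}] {reply}"

def pipeline(audio: str) -> str:
    reply, style = llm(stt(audio))
    return tts(reply, style)
```

The enterprise-reliability argument falls out of the structure: each stage can be logged, tested, and swapped independently, which a single speech-to-speech model does not allow.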