AI Snips
Voice Turing Test Fails On Orchestration Not Speech
- Voice assistants still fail because conversation needs turn-taking, interruption handling, clarification, and tool use, not just speech recognition.
- Mati Staniszewski argues text passed the Turing test long ago, while voice only works in narrower domains like support calls.
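The orchestration pieces the snip lists (turn-taking, interruption handling, clarification) can be sketched as a tiny dialogue state machine. This is a hypothetical illustration only; the class and method names are invented, not any vendor's API.

```python
from enum import Enum, auto

class TurnState(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()

class VoiceAgent:
    """Toy orchestrator: routes conversational events, not the speech itself."""

    def __init__(self):
        self.state = TurnState.LISTENING

    def on_user_speech_start(self):
        # Barge-in handling: if the agent is talking, stop and yield the turn.
        if self.state == TurnState.SPEAKING:
            self.state = TurnState.LISTENING
            return "interrupted"
        return "listening"

    def on_user_utterance(self, text):
        self.state = TurnState.THINKING
        if len(text.split()) < 3:
            # Too little signal: ask a clarifying question instead of guessing.
            self.state = TurnState.SPEAKING
            return "clarify: could you say more?"
        self.state = TurnState.SPEAKING
        return f"answer: {text}"

agent = VoiceAgent()
```

The point of the sketch is that none of these transitions involve speech recognition at all; they are the orchestration layer the snip argues is the hard part.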
Transcription Gets Better When Models Know The Speaker
- Speech recognition should become speaker-specific, not purely global, because accent, noise, and room context vary by person.
- Mati Staniszewski says ElevenLabs already does diarization and keyword detection, and plans person-specific transcription for settings like healthcare and home devices.
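One way to picture speaker-specific transcription is keying a correction pass on diarized speaker IDs, biasing each speaker's output toward their own vocabulary. Everything below is a made-up sketch; the profile format and function names do not come from ElevenLabs.

```python
# Hypothetical per-speaker adaptation on top of diarization output.
# Profiles hold vocabulary the recognizer should bias toward for that speaker.
PROFILES = {
    "speaker_1": {"vocab": {"metformin", "HbA1c"}},  # e.g. a clinician
    "speaker_2": {"vocab": set()},                   # unknown guest, no bias
}

def correct_with_profile(speaker_id, tokens):
    """Toy bias pass: restore casing/spelling of exact (case-insensitive)
    matches against the speaker's known vocabulary."""
    vocab = {w.lower(): w for w in PROFILES.get(speaker_id, {}).get("vocab", set())}
    return [vocab.get(t.lower(), t) for t in tokens]

# Diarized segments: (speaker_id, raw recognizer tokens).
segments = [
    ("speaker_1", ["start", "hba1c", "panel"]),
    ("speaker_2", ["okay"]),
]

transcript = [(spk, correct_with_profile(spk, toks)) for spk, toks in segments]
```

A real system would adapt acoustic and language models per speaker rather than post-editing tokens, but the routing idea is the same: diarization decides whose context applies.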
Controllable Speech Beats Raw Speech To Speech
- Voice generation is shifting from best-guess outputs to controllable speech where users can direct pace, pauses, and emotional delivery.
- ElevenLabs' V3 and expressive mode let agents react reassuringly to a stressed caller, while Mati Staniszewski still favors cascaded systems for enterprise reliability.
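The cascaded design the snip mentions can be sketched as explicit STT → LLM → TTS stages, with delivery carried as a controllable style tag instead of being guessed end-to-end. All names here are hypothetical stand-ins, not ElevenLabs' actual interfaces.

```python
# Minimal cascaded voice pipeline: each stage is inspectable, and the
# delivery style travels as an explicit, controllable tag.

def stt(audio: str) -> str:
    # Stand-in for a speech recognizer; "audio" is already text here.
    return audio

def llm(text: str) -> tuple[str, str]:
    # Returns (reply, delivery_style): a stressed caller gets a calm delivery.
    stressed = any(w in text.lower() for w in ("help", "urgent", "stolen"))
    style = "calm, slower pace" if stressed else "neutral"
    return f"I can help with that: {text}", style

def tts(reply: str, style: str) -> str:
    # Stand-in for controllable synthesis: the tag directs pace and emotion.
    return f"[{style}] {reply}"

def pipeline(audio: str) -> str:
    reply, style = llm(stt(audio))
    return tts(reply, style)
```

The enterprise-reliability argument falls out of the structure: each stage can be logged, tested, and swapped independently, which a single speech-to-speech model does not allow.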