Cheeky Pint

The world of voice AI, with Mati Staniszewski of ElevenLabs

Apr 14, 2026
INSIGHT

Voice Turing Test Fails On Orchestration Not Speech

  • Voice assistants still fail because conversation needs turn-taking, interruption handling, clarification, and tool use, not just speech recognition.
  • Mati Staniszewski argues text passed the Turing test long ago, while voice works only in narrower domains like support calls.
INSIGHT

Transcription Gets Better When Models Know The Speaker

  • Speech recognition should become speaker-specific, not purely global, because accent, noise, and room context vary by person.
  • Mati Staniszewski says ElevenLabs already does diarization and keyword detection, and plans person-specific transcription for settings like healthcare and home devices.
INSIGHT

Controllable Speech Beats Raw Speech To Speech

  • Voice generation is shifting from best-guess outputs to controllable speech where users can direct pace, pauses, and emotional delivery.
  • ElevenLabs' V3 and expressive mode let agents react reassuringly to a stressed caller, while Mati Staniszewski still favors cascaded systems for enterprise reliability.