Training Data

ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for Everything

May 8, 2026
Mati Staniszewski, co-founder and CEO of ElevenLabs, builds audio AI for text-to-speech, speech-to-text, and voice agents. He recounts the company's origins in solving dubbing and accessibility problems. He explains why audio was overlooked, ElevenLabs' early monetization and scaling choices, breakthroughs in emotionality and voice cloning, and how voice will become the primary interface for agents, robots, and next-gen computing.
ADVICE

Monetize Early To Preserve Independence

  • Monetize early to stay financially independent and fund model development instead of raising indefinitely.
  • ElevenLabs started generating product revenue quickly, then raised capital once its ambitions required large model-training budgets.
INSIGHT

Building The Full Audio Stack From Contextual TTS To Music

  • ElevenLabs built a full audio stack: text-to-speech with contextual emotion, speech-to-text, translation/dubbing, real-time streaming, orchestration, and music generation.
  • They prioritized combining models into voice agents and expanded into music to capture emotional nuance.
ANECDOTE

First Wow Moments Came From Voice Likeness And Laughter

  • Early internal milestones included cloning Mati's accented voice and getting the model to laugh, which made interactions feel human.
  • Viral demos (e.g., Javier Milei, Matthew McConaughey in Spanish/Portuguese) showcased the impact of dubbing a familiar voice across languages.