
Training Data ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for Everything
May 8, 2026
Mati Staniszewski, co-founder and CEO of ElevenLabs, builds audio AI for text-to-speech, speech-to-text, and voice agents. He recounts the company's origins in solving dubbing and accessibility problems. He explains why audio was overlooked, ElevenLabs' early monetization and scaling choices, breakthroughs in emotionality and voice cloning, and how voice will become the primary interface for agents, robots, and next-generation computing.
Episode notes
Monetize Early To Preserve Independence
- Monetize early to stay financially independent and fund model development instead of raising indefinitely.
- ElevenLabs launched product revenue quickly and then raised when ambitions required large model training budgets.
Building The Full Audio Stack From Contextual TTS To Music
- ElevenLabs built a full audio stack: text-to-speech with contextual emotion, speech-to-text, translation/dubbing, real-time streaming, orchestration, and music generation.
- They prioritized combining models into voice agents and expanded into music to capture emotional nuance.
First Wow Moments Came From Voice Likeness And Laughter
- Early internal milestones included cloning Mati's accented voice and getting the model to laugh, both of which made interactions feel human.
- Viral demos (e.g., Javier Milei, Matthew McConaughey in Spanish/Portuguese) showcased cross-language, familiar-voice dubbing impact.

