
Barrchives: How Cartesia Edges Out The Big Labs With Audio AI Models, with Founder and CEO, Karan Goel
Mar 26, 2025
Karan Goel, Co-founder and CEO of Cartesia, dives into the future of voice AI and the groundbreaking use of state space models (SSMs) for audio applications. He details his transition from academia at CMU and Stanford to entrepreneurship, emphasizing the innovative efficiency of SSMs over traditional models. Karan also reveals how Cartesia is developing Sonic, an ultra-low latency text-to-speech model, and elaborates on the importance of rapid execution in voice AI, all while navigating the startup landscape.
AI Snips
SSMs vs. Transformers
- State-space models (SSMs) scale subquadratically with context length, whereas transformers scale quadratically.
- This makes SSMs more efficient for large context windows and long-running AI systems.
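To make the scaling contrast concrete, here is a minimal sketch of a generic discrete linear state-space recurrence, not Cartesia's actual models. The function names `ssm_scan` and `attention_pair_count`, and the toy matrices, are assumptions made up for illustration. The recurrence keeps a fixed-size state and does one update per input, so its work grows linearly with sequence length, while full self-attention compares every position with every other position.

```python
def ssm_scan(A, B, C, inputs):
    """Run a toy linear state-space recurrence over a 1-D input sequence.

    state[k] = A @ state[k-1] + B * u[k]
    y[k]     = C . state[k]

    A is an n x n list of lists, B and C are length-n lists.
    Returns the outputs and the number of state updates performed.
    """
    n = len(B)
    state = [0.0] * n          # fixed-size state, independent of sequence length
    outputs = []
    updates = 0
    for u in inputs:
        # One state update per input sample: cost depends on n, not on history length.
        state = [
            sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
            for i in range(n)
        ]
        outputs.append(sum(C[i] * state[i] for i in range(n)))
        updates += 1
    return outputs, updates


def attention_pair_count(seq_len):
    """Number of query-key pairs scored by full (non-causal) self-attention."""
    return seq_len * seq_len


if __name__ == "__main__":
    # Hypothetical 1-dimensional system: a simple decaying accumulator.
    ys, updates = ssm_scan(A=[[0.5]], B=[1.0], C=[1.0], inputs=[1.0, 1.0, 1.0])
    print(ys, updates)
    for length in (1_000, 10_000):
        print(length, "samples:", length, "SSM updates vs",
              attention_pair_count(length), "attention pairs")
```

Going from 1,000 to 10,000 samples multiplies the recurrence work by 10 but the attention pair count by 100, which is the gap the snip describes for long contexts.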
SSMs and Data Types
- SSMs excel at processing signal data like audio and video because such signals are highly compressible.
- Their effectiveness in language modeling remains uncertain because large-scale text SSMs are lacking.
Startup Focus
- Focus on deeply held beliefs and convictions when choosing a startup idea.
- Constraints can foster breakthrough ideas, whereas resource-rich big labs can end up with a less focused approach.

