How Cartesia Edges Out The Big Labs With Audio AI Models, with Founder and CEO, Karan Goel

Mar 26, 2025

Karan Goel, Co-founder and CEO of Cartesia, dives into the future of voice AI and the groundbreaking use of state space models (SSMs) for audio applications. He details his transition from academia at CMU and Stanford to entrepreneurship, emphasizing the innovative efficiency of SSMs over traditional models. Karan also reveals how Cartesia is developing Sonic, an ultra-low latency text-to-speech model, and elaborates on the importance of rapid execution in voice AI, all while navigating the startup landscape.

INSIGHT

SSMs vs. Transformers

State-space models (SSMs) scale subquadratically with context length, whereas transformer self-attention scales quadratically.
This makes SSMs more efficient for large context windows and long-running AI systems.
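
The scaling gap can be illustrated with a toy sketch. This is not Cartesia's model or any real SSM implementation: the function names `ssm_scan` and `causal_attention_pairs` and the parameters `a`, `b`, `c` are hypothetical, and the recurrence is the simplest linear case (one fixed-size state, one update per input), which is just one instance of subquadratic scaling. It compares the number of recurrence steps against the number of query-key pairs a causal attention layer would score.

```python
def ssm_scan(u, a=0.9, b=0.1, c=1.0):
    """Run a toy one-dimensional linear state-space recurrence over inputs u.

    x_t = a * x_{t-1} + b * u_t,   y_t = c * x_t
    The state x has a fixed size, so every step costs the same no matter
    how long the sequence already is. (a, b, c are illustrative values.)
    """
    x = 0.0
    ys = []
    steps = 0
    for u_t in u:
        x = a * x + b * u_t   # update the fixed-size state
        ys.append(c * x)      # read out the output for this step
        steps += 1
    return ys, steps


def causal_attention_pairs(n):
    """Count query-key pairs scored by causal self-attention over n tokens.

    Token t attends to itself and all earlier tokens, i.e. t positions.
    """
    return sum(range(1, n + 1))


if __name__ == "__main__":
    # Doubling the context doubles the SSM work but roughly quadruples
    # the attention work.
    for n in (1_000, 2_000, 4_000):
        _, ssm_steps = ssm_scan([1.0] * n)
        print(n, ssm_steps, causal_attention_pairs(n))
```

In this sketch the recurrence count grows in step with the input length, while the attention pair count grows with its square, which is why long contexts and long-running systems favor the fixed-state approach.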

INSIGHT

SSMs and Data Types

SSMs excel at processing signal data such as audio and video because these signals are highly compressible.
Their effectiveness in language modeling remains uncertain due to the lack of large-scale text SSMs.

ADVICE

Startup Focus

Focus on deeply held beliefs and convictions when choosing a startup idea.
Constraints can foster breakthrough ideas, whereas resource-rich big labs risk a less focused approach.
