The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Delivering Neural Speech Services at Scale with Li Jiang - #522

Sep 27, 2021

Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, dives into the rapid advancements in speech recognition. He discusses the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality. Jiang also highlights the importance of customizing voice solutions for different industries and emphasizes the ethical considerations surrounding text-to-speech technologies. With a forward-looking perspective, he envisions the future of speech services, focusing on achieving human-like communication.

AI Snips

ANECDOTE

Azure Speech Customization

Azure Speech offers customization features that let users upload their own data to refine models.
Uploaded data stays private and under the user's control, and the refined models perform better for that user's specific applications.

INSIGHT

Domain Specialization in Speech

Azure Speech targets generic scenarios but offers customization because needs vary widely across domains.
Medical domains require specialized expertise, which motivated Microsoft's acquisition of Nuance.

INSIGHT

Generic vs. Domain-Specific Models

While generic models improve with more data, domain-specific knowledge remains crucial.
A two-pronged approach is needed: improving the generic models and adapting them to specific domains.
