
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) Delivering Neural Speech Services at Scale with Li Jiang - #522
Sep 27, 2021
Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, discusses the rapid recent advances in speech recognition. He covers the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality. Jiang also highlights the importance of customizing voice solutions for different industries and emphasizes the ethical considerations surrounding text-to-speech technologies. Looking ahead, he outlines his vision for the future of speech services, with the goal of achieving human-like communication.
Azure Speech Customization
- Azure Speech offers customization features that let users upload their own data to refine the models.
- User-uploaded data remains private and under the user's control, and it is used to improve model performance for that user's specific applications.
Domain Specialization in Speech
- Azure Speech targets general-purpose scenarios but offers customization because different domains have diverse needs.
- The medical domain requires specialized expertise, which motivated Microsoft's acquisition of Nuance.
Generic vs. Domain-Specific Models
- While generic models improve with more data, domain-specific knowledge remains crucial.
- A two-pronged approach is necessary: continue improving the generic models while also supporting domain adaptation.

