The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Delivering Neural Speech Services at Scale with Li Jiang - #522

Sep 27, 2021
Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, dives into the rapid advancements in speech recognition. He discusses the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality. Jiang also highlights the importance of customizing voice solutions for different industries and emphasizes the ethical considerations surrounding text-to-speech technologies. With a forward-looking perspective, he envisions the future of speech services, focusing on achieving human-like communication.
ANECDOTE

Azure Speech Customization

  • Azure Speech offers customization features, letting users upload their data to refine models.
  • User-uploaded data remains private and under the users' control while improving model performance for their specific applications.
INSIGHT

Domain Specialization in Speech

  • Azure Speech targets generic scenarios but offers customization because different domains have diverse needs.
  • Medical domains require specialized expertise, which motivated Microsoft's acquisition of Nuance.
INSIGHT

Generic vs. Domain-Specific Models

  • While generic models improve with more data, domain-specific knowledge remains crucial.
  • A two-legged approach is necessary: improving generic models and adapting them to specific domains.