
Machine Learning: How Did We Get Here? Machine Learning and Speech Recognition with Kai-Fu Lee
May 4, 2026
Kai-Fu Lee, AI pioneer, former president of Google China, and now head of Sinovation Ventures and 01.AI, reflects on choosing speech recognition and on early breakthroughs using hidden Markov models. He traces the rise of deep learning, transformers, and end-to-end models. He also considers whether speech is "solved" and what surprised him about AI’s rapid leaps.
Switching CMU Speech Research To Hidden Markov Models
- Kai-Fu Lee switched CMU's DARPA speech project from expert systems to Hidden Markov Models (HMMs).
- He credited mentorship from Peter Brown, Raj Reddy's support, abundant data, and 20 SPARC workstations with enabling his experiments.
Three Practical Changes That Made HMMs Work
- Combining better acoustic features, context-aware HMM structure, and n-gram language models raised word accuracy from ~50% to 96% on the DARPA task; a minimal sketch of the language-model side follows this list.
- Key specifics: mel-frequency cepstral features from Shikano, modeling phonemes in context, and bigram/trigram language models.
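
To make the language-model piece concrete, here is a minimal Python sketch of the decision rule these recognizers used: pick the word sequence W maximizing log P(X | W) + log P(W), with P(W) from a smoothed bigram model. The toy corpus, the two hypotheses, and the acoustic log-likelihoods below are invented for illustration; this is not the Sphinx system's actual code or data.

```python
# Minimal, hypothetical sketch of an HMM-era recognizer's decision rule:
# combine an acoustic score with a smoothed bigram language model.
from collections import Counter
import math

corpus = "show me the ships show me the chips list the ships".split()

# Count unigrams and bigrams in the toy corpus.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_logprob(words, alpha=0.5):
    """Log P(w1..wn) under a bigram model with add-alpha smoothing."""
    v = len(unigrams)  # vocabulary size
    logp = 0.0
    for prev, cur in zip(words, words[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * v
        logp += math.log(num / den)
    return logp

# Two acoustically confusable hypotheses with made-up HMM acoustic
# log-likelihoods log P(X | W); "chips" scores slightly higher acoustically.
hypotheses = {
    ("show", "me", "the", "ships"): -12.0,
    ("show", "me", "the", "chips"): -11.7,
}

# Bayes decision rule: argmax over W of log P(X | W) + log P(W).
best = max(hypotheses, key=lambda w: hypotheses[w] + bigram_logprob(w))
print(" ".join(best))  # the language model tips the choice to "ships"
```

In real systems the language-model term was weighted and the search ran over a lattice of hypotheses rather than a fixed pair, but the trade-off is the same: the n-gram model rescues words the acoustics alone would confuse.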
Why Transformers Outpaced HMMs And Deep Nets
- Progress in speech recognition required advances in model architecture, large amounts of data, and compute; HMMs gave way to deep learning, and then to Transformers for the larger jumps.
- Transformers provide long-range context via attention, unlike the limited n-gram context of HMM systems; see the sketch after this list.
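
For contrast, here is a minimal NumPy sketch of scaled dot-product attention, the mechanism behind that long-range context. The sequence length, feature dimension, and random inputs are arbitrary stand-ins for illustration, not any production model.

```python
# Minimal sketch of scaled dot-product attention, the core Transformer
# operation: every position can weight every other position, however far
# away, whereas an n-gram model only sees a fixed window of n-1 words.
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (T, T): all position pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                             # each output mixes ALL positions

rng = np.random.default_rng(0)
T, d = 8, 16                       # 8 time steps, 16-dim features (arbitrary)
X = rng.normal(size=(T, d))        # e.g. a sequence of frame embeddings

# Self-attention: queries, keys, and values all come from the same sequence,
# so position 7 can attend directly to position 0 in a single step.
out = attention(X, X, X)
print(out.shape)                   # (8, 16)
```

An n-gram model conditions on a fixed window of n-1 previous tokens; here the softmax weights let each position draw on the entire sequence in one step, which is why Transformers capture dependencies the HMM-plus-n-gram pipeline could not.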

