
Machine Learning: How Did We Get Here? Machine Learning and Speech Recognition with Kai-Fu Lee
May 4, 2026
Kai-Fu Lee, AI pioneer, former president of Google China, and now head of Sinovation Ventures and 01.AI, reflects on choosing speech recognition and on early breakthroughs using hidden Markov models. He traces the rise of deep learning, transformers, and end-to-end models. He also considers whether speech is "solved" and what surprised him about AI’s rapid leaps.
Switching CMU Speech Research To Hidden Markov Models
- Kai-Fu Lee switched CMU's DARPA speech project from expert systems to Hidden Markov Models (HMMs).
- He credited mentorship from Peter Brown, Raj Reddy's support, abundant data, and 20 SPARC workstations with enabling his experiments.
Three Practical Changes That Made HMMs Work
- Combining better acoustic features, context-aware HMM structure, and n-gram language models raised word accuracy from ~50% to 96% on the DARPA task; a minimal sketch of the language-model side follows this list.
- Key specifics: mel-frequency cepstral features from Shikano, modeling phonemes in context, and bigram/trigram language models.
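
To make the language-model piece concrete, here is a minimal Python sketch of the decision rule these recognizers used: pick the word sequence W maximizing log P(X | W) + log P(W), with P(W) from a smoothed bigram model. The toy corpus, the two hypotheses, and the acoustic log-likelihoods below are invented for illustration; this is not the Sphinx system's actual code or data.

```python
# Minimal, hypothetical sketch of an HMM-era recognizer's decision rule:
# combine an acoustic score with a smoothed bigram language model.
from collections import Counter
import math

corpus = "show me the ships show me the chips list the ships".split()

# Count unigrams and bigrams in the toy corpus.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_logprob(words, alpha=0.5):
    """Log P(w1..wn) under a bigram model with add-alpha smoothing."""
    v = len(unigrams)  # vocabulary size
    logp = 0.0
    for prev, cur in zip(words, words[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * v
        logp += math.log(num / den)
    return logp

# Two acoustically confusable hypotheses with made-up HMM acoustic
# log-likelihoods log P(X | W); "chips" scores slightly higher acoustically.
hypotheses = {
    ("show", "me", "the", "ships"): -12.0,
    ("show", "me", "the", "chips"): -11.7,
}

# Bayes decision rule: argmax over W of log P(X | W) + log P(W).
best = max(hypotheses, key=lambda w: hypotheses[w] + bigram_logprob(w))
print(" ".join(best))  # the language model tips the choice to "ships"
```

In real systems the language-model term was weighted and the search ran over a lattice of hypotheses rather than a fixed pair, but the trade-off is the same: the n-gram model rescues words the acoustics alone would confuse.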
Why Transformers Outpaced HMMs And Deep Nets
- Progress in speech recognition required advances in model architecture, large amounts of data, and compute; HMMs gave way to deep learning, and then to Transformers for the larger jumps.
- Transformers provide long-range context via attention, unlike the limited n-gram context of HMM systems; see the sketch after this list.
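
For contrast, here is a minimal NumPy sketch of scaled dot-product attention, the mechanism behind that long-range context. The sequence length, feature dimension, and random inputs are arbitrary stand-ins for illustration, not any production model.

```python
# Minimal sketch of scaled dot-product attention, the core Transformer
# operation: every position can weight every other position, however far
# away, whereas an n-gram model only sees a fixed window of n-1 words.
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (T, T): all position pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                             # each output mixes ALL positions

rng = np.random.default_rng(0)
T, d = 8, 16                       # 8 time steps, 16-dim features (arbitrary)
X = rng.normal(size=(T, d))        # e.g. a sequence of frame embeddings

# Self-attention: queries, keys, and values all come from the same sequence,
# so position 7 can attend directly to position 0 in a single step.
out = attention(X, X, X)
print(out.shape)                   # (8, 16)
```

An n-gram model conditions on a fixed window of n-1 previous tokens; here the softmax weights let each position draw on the entire sequence in one step, which is why Transformers capture dependencies the HMM-plus-n-gram pipeline could not.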

