
Machine Learning: How Did We Get Here? Reinforcement Learning with Rich Sutton
Mar 16, 2026
Rich Sutton, a research scientist and professor celebrated for foundational work in reinforcement learning and a 2024 Turing Award co-recipient, defines learning from trial and error, traces RL's historical roots, explains temporal-difference learning, and contrasts RL with supervised approaches. He discusses early successes like TD-Gammon and AlphaGo, the limits of deep learning representations, and open problems in continual representation learning.
Reinforcement Learning Recovers Learning From Experience
- Reinforcement learning reframes learning as trial-and-error from ordinary experience rather than supervised examples.
- Sutton traces this lineage to early AI and Harry Klopf, arguing RL recovers the lost thread of learning from rewards and penalties.
Temporal Difference Learning Links Prediction Change To Credit
- Temporal-difference learning tracks changes in reward predictions over time and uses that prediction error to assign credit.
- Sutton connects TD learning to animal secondary reinforcement where cues that predict reward become reinforcing themselves.
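The update described above can be sketched as tabular TD(0) prediction on the classic five-state random walk. This is a minimal illustrative example, not code from the episode; the step size, episode count, and random-walk setup are assumptions chosen for clarity.

```python
import random

# TD(0) value prediction on a 5-state random walk: states 0..4, start in the
# middle, move left or right at random; stepping off the right end gives
# reward 1, off the left end gives reward 0. True values are 1/6 .. 5/6.
ALPHA = 0.1          # step size (illustrative choice)
V = [0.5] * 5        # initial value estimates

def episode():
    s = 2            # start in the middle state
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next == 5:                      # terminal: reward 1
            V[s] += ALPHA * (1.0 - V[s])
            return
        if s_next == -1:                     # terminal: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            return
        # TD error: the change in prediction from one step to the next
        # drives the update, assigning credit to the earlier state.
        V[s] += ALPHA * (V[s_next] - V[s])
        s = s_next

random.seed(0)
for _ in range(2000):
    episode()
print([round(v, 2) for v in V])
```

After a few thousand episodes the estimates approach the true probabilities of reaching the rewarding end, even though no state ever sees a "correct answer" directly, only the next prediction.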
TD-Gammon Demonstrated Practical Reinforcement Learning
- TD-Gammon was the first clear practical win, using TD(λ) with a neural-network value function to reach near-world-champion backgammon play in the 1990s.
- Sutton cites Gerald Tesauro's 1992 TD-Gammon paper as the first convincing demonstration of RL's practicality.
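TD(λ) extends the one-step TD update by keeping an eligibility trace for each state, so a single prediction error updates every recently visited state. The sketch below shows the tabular case on a simple random walk; it is an illustrative assumption-laden toy (TD-Gammon itself used a neural-network value function, which this sketch omits), with step size and λ chosen arbitrarily.

```python
import random

# Tabular TD(lambda) with accumulating eligibility traces on a 5-state
# random walk (reward 1 off the right end, 0 off the left; gamma = 1).
ALPHA, LAM = 0.1, 0.8
V = [0.5] * 5

def episode():
    e = [0.0] * 5                # eligibility traces, reset each episode
    s = 2
    while True:
        e[s] += 1.0              # mark the current state as eligible
        s_next = s + random.choice([-1, 1])
        terminal = s_next in (-1, 5)
        if terminal:
            delta = (1.0 if s_next == 5 else 0.0) - V[s]
        else:
            delta = V[s_next] - V[s]
        for i in range(5):       # one TD error updates all eligible states
            V[i] += ALPHA * delta * e[i]
            e[i] *= LAM          # decay traces toward zero
        if terminal:
            return
        s = s_next

random.seed(1)
for _ in range(2000):
    episode()
print([round(v, 2) for v in V])
```

Compared with TD(0), the traces propagate credit further back per step, which typically speeds learning on tasks with delayed reward.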

