Machine Learning: How Did We Get Here?

Reinforcement Learning with Rich Sutton

Mar 16, 2026
Rich Sutton is a research scientist and professor celebrated for foundational work in reinforcement learning, and a co-winner of the 2024 Turing Award. He defines learning from trial and error, traces RL's historical roots, explains temporal-difference learning, and contrasts RL with supervised approaches. He discusses early successes like TD-Gammon and AlphaGo, the limits of deep learning representations, and open problems in continual representation learning.
INSIGHT

Reinforcement Learning Recovers Learning From Experience

  • Reinforcement learning reframes learning as trial-and-error from ordinary experience rather than supervised examples.
  • Sutton traces this lineage to early AI and Harry Klopf, arguing RL recovers the lost thread of learning from rewards and penalties.
INSIGHT

Temporal Difference Learning Links Prediction Change To Credit

  • Temporal-difference learning tracks changes in reward predictions over time and uses that prediction error to assign credit.
  • Sutton connects TD learning to animal secondary reinforcement where cues that predict reward become reinforcing themselves.
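The idea above — that the change in a reward prediction is itself the learning signal — can be sketched as a minimal TD(0) update on a hypothetical toy chain (the states, rewards, and step sizes here are illustrative assumptions, not from the episode):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update: move V[s] toward r + gamma * V[s_next].

    The TD error is the change in prediction between successive steps;
    it is the quantity used to assign credit to state s.
    """
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Hypothetical 3-state chain: A -> B -> T (terminal), reward 1 on the final step.
V = {"A": 0.0, "B": 0.0, "T": 0.0}
for _ in range(200):
    td0_update(V, "A", 0.0, "B")   # cue state: no reward yet
    td0_update(V, "B", 1.0, "T")   # rewarded transition

# V["B"] learns the reward first; V["A"] then learns from B's prediction,
# mirroring secondary reinforcement: the cue that predicts reward
# becomes valuable itself.
```

After training, `V["B"]` approaches 1 and `V["A"]` approaches the discounted value 0.9, even though state A never directly receives a reward.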
ANECDOTE

TD-Gammon Demonstrated Practical Reinforcement Learning

  • TD-Gammon was the first clear practical win, using TD(λ) to reach world-champion backgammon level in the 1990s.
  • Sutton cites Gerald Tesauro's 1992 TD-Gammon paper as the initial convincing demonstration of RL's practicality.