
Machine Learning: How Did We Get Here? Reinforcement Learning with Rich Sutton
Mar 16, 2026
Rich Sutton, a research scientist and professor celebrated for foundational work in reinforcement learning and a 2024 Turing Award co-recipient, defines learning from trial and error, traces RL's historical roots, explains temporal-difference learning, and contrasts RL with supervised approaches. He discusses early successes like TD-Gammon and AlphaGo, the limits of deep learning representations, and open problems in continual representation learning.
Reinforcement Learning Recovers Learning From Experience
- Reinforcement learning reframes learning as trial-and-error from ordinary experience rather than supervised examples.
- Sutton traces this lineage to early AI and Harry Klopf, arguing RL recovers the lost thread of learning from rewards and penalties.
Temporal Difference Learning Links Prediction Change To Credit
- Temporal-difference learning tracks changes in reward predictions over time and uses that prediction error to assign credit.
- Sutton connects TD learning to animal secondary reinforcement where cues that predict reward become reinforcing themselves.
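The update described above can be sketched as tabular TD(0) prediction on the classic five-state random walk. This is a minimal illustrative example, not code from the episode; the step size, episode count, and random-walk setup are assumptions chosen for clarity.

```python
import random

# TD(0) value prediction on a 5-state random walk: states 0..4, start in the
# middle, move left or right at random; stepping off the right end gives
# reward 1, off the left end gives reward 0. True values are 1/6 .. 5/6.
ALPHA = 0.1          # step size (illustrative choice)
V = [0.5] * 5        # initial value estimates

def episode():
    s = 2            # start in the middle state
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next == 5:                      # terminal: reward 1
            V[s] += ALPHA * (1.0 - V[s])
            return
        if s_next == -1:                     # terminal: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            return
        # TD error: the change in prediction from one step to the next
        # drives the update, assigning credit to the earlier state.
        V[s] += ALPHA * (V[s_next] - V[s])
        s = s_next

random.seed(0)
for _ in range(2000):
    episode()
print([round(v, 2) for v in V])
```

After a few thousand episodes the estimates approach the true probabilities of reaching the rewarding end, even though no state ever sees a "correct answer" directly, only the next prediction.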
TD-Gammon Demonstrated Practical Reinforcement Learning
- TD-Gammon was the first clear practical win, using TD(λ) with a neural-network value function to reach near-world-champion backgammon play in the 1990s.
- Sutton cites Gerald Tesauro's 1992 TD-Gammon paper as the first convincing demonstration of RL's practicality.
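TD(λ) extends the one-step TD update by keeping an eligibility trace for each state, so a single prediction error updates every recently visited state. The sketch below shows the tabular case on a simple random walk; it is an illustrative assumption-laden toy (TD-Gammon itself used a neural-network value function, which this sketch omits), with step size and λ chosen arbitrarily.

```python
import random

# Tabular TD(lambda) with accumulating eligibility traces on a 5-state
# random walk (reward 1 off the right end, 0 off the left; gamma = 1).
ALPHA, LAM = 0.1, 0.8
V = [0.5] * 5

def episode():
    e = [0.0] * 5                # eligibility traces, reset each episode
    s = 2
    while True:
        e[s] += 1.0              # mark the current state as eligible
        s_next = s + random.choice([-1, 1])
        terminal = s_next in (-1, 5)
        if terminal:
            delta = (1.0 if s_next == 5 else 0.0) - V[s]
        else:
            delta = V[s_next] - V[s]
        for i in range(5):       # one TD error updates all eligible states
            V[i] += ALPHA * delta * e[i]
            e[i] *= LAM          # decay traces toward zero
        if terminal:
            return
        s = s_next

random.seed(1)
for _ in range(2000):
    episode()
print([round(v, 2) for v in V])
```

Compared with TD(0), the traces propagate credit further back per step, which typically speeds learning on tasks with delayed reward.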

