Super Data Science: ML & AI Podcast with Jon Krohn

977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho

Mar 24, 2026
Kyunghyun Cho, NYU professor and co-author of the original attention paper, explains the origins of attention and its rapid validation. He discusses sample efficiency and why actively choosing data matters. He debates world models versus latent planning and describes teaching undergrads with coding agents.
Episode notes
ANECDOTE

Birth Of Attention From An Intern Idea

  • The attention mechanism emerged from a quick morning idea by then-intern Dzmitry Bahdanau and was validated within days.
  • Kyunghyun Cho recounts Bahdanau implementing the prototype and seeing it work after three to four days, sparking the breakthrough in neural machine translation (NMT).
INSIGHT

Algorithms Provide The Rare Step Function

  • Scaling AI has three levers: more data, more compute and larger models, or better algorithms; Cho argues universities should focus on algorithmic breakthroughs.
  • Algorithmic "aha" moments change the slope of scaling laws, he explains, and are rare but crucial.
ANECDOTE

UNK Token Demo Revealed Geopolitical Knowledge

  • A demo revealed the model's implicit geopolitical knowledge when UNK (unknown) tokens were translated differently based on context.
  • Cho describes an unknown "Korea" token being translated as South Korea in a "friend" context and as North Korea in an "enemy" context.