
Will machines ever be intelligent?
Microsoft Research Podcast
Transformer architecture overview
Nicolò outlines attention and feed-forward layers, tokenization, and why transformers enabled large-scale parallel training.
Transcript


