
The Inference Shift
Stratechery
Breaking inference into three parts
Ben Thompson breaks inference into pre-fill and two decode steps and explains the memory-bandwidth constraints per token.