
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750
Oct 7, 2025
Jacob Buckman, co-founder and CEO of Manifest AI, dives deep into the world of long-context transformers. He discusses techniques like windowed attention and the Power Retention approach, which melds attention and recurrence for large training speedups. Buckman also shares insights on Manifest AI's open-source tools, Vidrial and PowerCoder, and explores the significance of metrics for measuring context utility. Learn about balancing state and weights for compute-optimal architectures and the future potential of context length in AI.
AI Snips
Shrink State Along Different Axes
- Reduce state size along the time, head, feature, or layer axis to lower compute for long context.
- Choose windowing, GQA (grouped-query attention), latent attention, or hybrid layers depending on which state axis you want to shrink.
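Shrinking along the time axis is the simplest case to sketch: windowed attention keeps only the last `window` keys and values, so the per-layer state no longer grows with sequence length. A minimal numpy sketch (the function name and shapes are illustrative, not from the episode):

```python
import numpy as np

def windowed_attention(q, k, v, window):
    """Causal attention where each query attends only to the most recent
    `window` positions, capping the time axis of the KV state."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # key j is visible to query i iff i - window < j <= i
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    mask = (j <= i) & (j > i - window)
    scores = np.where(mask, scores, -np.inf)
    # numerically stable softmax over the visible keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = windowed_attention(q, k, v, window=3)  # shape (8, 4)
```

With `window >= T` this reduces to ordinary causal attention; GQA and latent attention instead shrink the head and feature axes of the same state tensor.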
RNN States Are Too Small Relative To Transformers
- RNN and state-space model (SSM) states are typically far smaller than a transformer's KV state, so they need a larger state to match capability.
- Balancing state size against weights is essential; mismatching the axes harms compute-optimal performance.
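To see the size gap concretely, compare a transformer's KV cache, which grows with context, against a fixed recurrent state. A rough back-of-the-envelope sketch with hypothetical model sizes (not figures from the episode):

```python
def kv_state_elems(layers, heads, head_dim, seq_len):
    # keys + values cached per layer, growing linearly with context
    return 2 * layers * heads * head_dim * seq_len

def rnn_state_elems(layers, state_dim):
    # fixed-size recurrent state, independent of context length
    return layers * state_dim

# hypothetical 32-layer model at a 64k-token context
kv = kv_state_elems(layers=32, heads=32, head_dim=128, seq_len=65536)
rnn = rnn_state_elems(layers=32, state_dim=4096)
print(kv // rnn)  # → 131072: the KV state is ~10^5 times larger
```

At these (assumed) sizes the transformer carries five orders of magnitude more state, which is the gap a capable RNN/SSM would need to close.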
Balance Weight And State FLOPs
- The ratio of weight FLOPs to state FLOPs predicts compute-optimal architectures.
- Doubling the smaller axis often gives large performance gains for little runtime cost, until the FLOPs are balanced.
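The balance point can be estimated with standard per-token FLOP counts: weight FLOPs scale with parameter count, state (attention) FLOPs scale with context length. A sketch using a hypothetical 7B-parameter model (the formulas are the usual rough accounting, not taken from the episode):

```python
def weight_flops_per_token(n_params):
    # forward pass: one multiply-add per parameter
    return 2 * n_params

def state_flops_per_token(layers, d_model, context):
    # QK^T and attention-weighted V: two matmuls against the whole context
    return 2 * 2 * layers * d_model * context

w = weight_flops_per_token(7e9)  # 1.4e10 FLOPs per token
layers, d_model = 32, 4096

# context length at which state FLOPs catch up with weight FLOPs
balanced = w / (4 * layers * d_model)
print(int(balanced))  # → 26702: beyond roughly this context, state FLOPs dominate
```

Below that context length the weight axis dominates and growing state is nearly free; above it, the trade reverses, which is the intuition behind doubling the smaller axis.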

