
Last Week in AI #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon
Mar 3, 2026

Andrey and Jeremie cover Anthropic’s Sonnet 4.6, with its 1M-token context window and ARC-AGI-2 gains. Google’s Gemini 3.1 Pro and xAI’s Grok 4.2 beta get attention for multimodal demos and multi-agent debate. Big funding and compute moves include Meta–AMD talks, raises by MatX and World Labs, and a startup simulating human behavior. Research highlights include deep-thinking tokens, optimizer tricks, and attractor states, plus Anthropic–Pentagon tensions.
AI Snips
Gemini 3.1 Pro Boosts ARC Performance With Multimodality
- Google rolled out Gemini 3.1 Pro, which posted a large ARC-AGI-2 jump to 77.1, partly enabled by multimodal capabilities for visual puzzles.
- Jeremie highlights Google leaning into multimodality and affordable pricing ($2 per million input tokens, $12 per million output tokens) as strategic advantages.
Masked Updates Give Big Optimizer Gains
- Masked updates in adaptive optimizers (randomly skipping a subset of parameter updates) significantly improve training stability and final loss; see the sketch after this snip.
- Andrey and Jeremie discuss Google results showing up to ~19% perplexity reduction on 1B-parameter models using momentum-aligned masking (MAGMA).
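
To make the masking idea concrete, here is a minimal NumPy sketch of one Adam step in which a random subset of coordinates skips its update. The uniform skip probability `p_mask`, and the comment on a momentum-aligned variant, are illustrative assumptions; the episode does not spell out MAGMA's exact masking rule.

```python
import numpy as np

def masked_adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9,
                     beta2=0.999, eps=1e-8, p_mask=0.1, rng=None):
    """One Adam step where a random fraction p_mask of parameters
    skips its update. A sketch of 'masked updates', not the exact
    MAGMA recipe from the paper."""
    rng = rng or np.random.default_rng(0)
    # Standard Adam first/second moment updates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    # Randomly keep ~(1 - p_mask) of the coordinates; mask the rest.
    # A momentum-aligned variant might instead keep coordinates where
    # sign(grad) == sign(m); that reading of "momentum-aligned" is our
    # assumption, not the paper's definition.
    keep = rng.random(param.shape) >= p_mask
    return param - update * keep, m, v
```

Skipping updates this way acts like a stochastic regularizer on the optimizer itself, which is one intuition for why it could improve stability and final loss.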
Deep-Thinking Tokens Predict Reasoning Progress
- Deep-thinking tokens measure reasoning effort by tracking token distribution changes across layers; the deep-thinking ratio predicts answer accuracy better than token length.
- Jeremie explains that they compute Jensen-Shannon divergence between consecutive layers' token distributions to spot tokens that 'flip-flop' until late layers; a hedged sketch follows.
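
As a concrete illustration, the sketch below applies a logit-lens-style readout: given per-layer logits for one token position, it computes Jensen-Shannon divergence between consecutive layers and reports how deep into the network the predicted distribution keeps changing. The `threshold` and the ratio definition are our assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def deep_thinking_ratio(layer_logits, threshold=0.1):
    """Fraction of network depth over which one token's predicted
    distribution is still changing (JSD above threshold).

    layer_logits: (num_layers, vocab_size) logit-lens readouts of a
    single token position. The threshold and ratio definition are
    illustrative assumptions, not the paper's formula."""
    probs = softmax(np.asarray(layer_logits))
    divs = [js_divergence(probs[i], probs[i + 1])
            for i in range(len(probs) - 1)]
    still_changing = [i for i, d in enumerate(divs) if d > threshold]
    # Depth of the last layer pair whose distribution still "flipped".
    last_flip = (still_changing[-1] + 1) if still_changing else 0
    return last_flip / len(probs)
```

A token that settles on its prediction early gets a ratio near 0; one that keeps flipping until the final layers gets a ratio near 1, which is the signal the deep-thinking ratio tracks against answer accuracy.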
