
Last Week in AI #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon
Mar 3, 2026

Andrey and Jeremie cover Anthropic’s Sonnet 4.6, with its 1M-token context window and ARC-AGI-2 gains. Google’s Gemini 3.1 Pro and xAI’s Grok 4.2 beta get attention for multimodal demos and multi-agent debate. Big funding and compute moves include Meta–AMD talks, raises by MatX and World Labs, and a startup simulating human behavior. Research highlights include deep-thinking tokens, optimizer tricks, and attractor states, plus Anthropic–Pentagon tensions.
AI Snips
Gemini 3.1 Pro Boosts ARC Performance With Multimodality
- Google rolled out Gemini 3.1 Pro, which posted a large ARC-AGI-2 jump to 77.1, partly enabled by multimodal capabilities for visual puzzles.
- Jeremie highlights Google leaning into multimodality and affordable pricing ($2 per million input tokens, $12 per million output tokens) as strategic advantages.
Masked Updates Give Big Optimizer Gains
- Masked updates in adaptive optimizers (randomly skipping a subset of parameter updates) significantly improve training stability and final loss; see the sketch after this snip.
- Andrey and Jeremie discuss Google results showing up to ~19% perplexity reduction on 1B-parameter models using momentum-aligned masking (MAGMA).
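
To make the masking idea concrete, here is a minimal NumPy sketch of one Adam step in which a random subset of coordinates skips its update. The uniform skip probability `p_mask`, and the comment on a momentum-aligned variant, are illustrative assumptions; the episode does not spell out MAGMA's exact masking rule.

```python
import numpy as np

def masked_adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9,
                     beta2=0.999, eps=1e-8, p_mask=0.1, rng=None):
    """One Adam step where a random fraction p_mask of parameters
    skips its update. A sketch of 'masked updates', not the exact
    MAGMA recipe from the paper."""
    rng = rng or np.random.default_rng(0)
    # Standard Adam first/second moment updates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    # Randomly keep ~(1 - p_mask) of the coordinates; mask the rest.
    # A momentum-aligned variant might instead keep coordinates where
    # sign(grad) == sign(m); that reading of "momentum-aligned" is our
    # assumption, not the paper's definition.
    keep = rng.random(param.shape) >= p_mask
    return param - update * keep, m, v
```

Skipping updates this way acts like a stochastic regularizer on the optimizer itself, which is one intuition for why it could improve stability and final loss.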
Deep-Thinking Tokens Predict Reasoning Progress
- Deep-thinking tokens measure reasoning effort by tracking token distribution changes across layers; the deep-thinking ratio predicts answer accuracy better than token length.
- Jeremie explains that they compute Jensen-Shannon divergence between consecutive layers' token distributions to spot tokens that 'flip-flop' until late layers; a hedged sketch follows.
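
As a concrete illustration, the sketch below applies a logit-lens-style readout: given per-layer logits for one token position, it computes Jensen-Shannon divergence between consecutive layers and reports how deep into the network the predicted distribution keeps changing. The `threshold` and the ratio definition are our assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def deep_thinking_ratio(layer_logits, threshold=0.1):
    """Fraction of network depth over which one token's predicted
    distribution is still changing (JSD above threshold).

    layer_logits: (num_layers, vocab_size) logit-lens readouts of a
    single token position. The threshold and ratio definition are
    illustrative assumptions, not the paper's formula."""
    probs = softmax(np.asarray(layer_logits))
    divs = [js_divergence(probs[i], probs[i + 1])
            for i in range(len(probs) - 1)]
    still_changing = [i for i, d in enumerate(divs) if d > threshold]
    # Depth of the last layer pair whose distribution still "flipped".
    last_flip = (still_changing[-1] + 1) if still_changing else 0
    return last_flip / len(probs)
```

A token that settles on its prediction early gets a ratio near 0; one that keeps flipping until the final layers gets a ratio near 1, which is the signal the deep-thinking ratio tracks against answer accuracy.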
