Last Week in AI

#235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

Mar 3, 2026
Hosts Andrey and Jeremie cover Anthropic’s Sonnet 4.6, with its 1M-token context and ARC-AGI-2 gains. Google’s Gemini 3.1 Pro and xAI’s Grok 4.2 beta get attention for multimodal demos and multi-agent debate. Big funding and compute moves include Meta–AMD talks, raises by MatX and World Labs, and a startup simulating human behavior. Research highlights include deep-thinking tokens, optimizer tricks, attractor states, and Anthropic–Pentagon tensions.
INSIGHT

Gemini 3.1 Pro Boosts ARC Performance With Multimodality

  • Google rolled out Gemini 3.1 Pro and achieved a large ARC-AGI-2 jump to 77.1, partly enabled by multimodal capabilities for visual puzzles.
  • Jeremie highlights Google leaning into multimodality and affordable pricing ($2 input, $12 output per million tokens) as strategic advantages.
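The quoted rates make per-request cost a simple calculation. As a minimal sketch (the `request_cost` helper and the example token counts are illustrative, not from the episode), assuming the $2 input / $12 output per-million-token pricing above:

```python
# Quoted Gemini 3.1 Pro rates: $2 per 1M input tokens, $12 per 1M output tokens.
INPUT_RATE = 2.0 / 1_000_000   # dollars per input token
OUTPUT_RATE = 12.0 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 100k-token prompt with a 5k-token response:
cost = request_cost(100_000, 5_000)  # 0.20 + 0.06 = 0.26 dollars
```

At these rates, even long-context calls stay well under a dollar, which is the pricing advantage Jeremie points to.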
INSIGHT

Masked Updates Give Big Optimizer Gains

  • Masked updates in adaptive optimizers (randomly skipping parameter updates) significantly improve training stability and final loss.
  • Andrey and Jeremie discuss Google results showing up to ~19% perplexity reduction on 1B-parameter models using momentum-aligned masking (MAGMA).
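The random-masking idea in the first bullet can be sketched in a few lines. This is a minimal illustration, not the actual MAGMA algorithm: it applies an Adam-style step but randomly skips individual parameter updates, while the moment estimates still track every gradient. MAGMA's momentum-aligned masking criterion is not detailed in the episode notes, so only the random variant is shown; all names and hyperparameters here are illustrative.

```python
import numpy as np

def masked_adam_step(params, grads, m, v, t, rng,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                     mask_prob=0.1):
    """One Adam step where each parameter's update is skipped with
    probability mask_prob. Moments m and v are still updated for every
    parameter; only the weight change itself is masked.

    Illustrative sketch of random update masking, NOT the MAGMA method.
    """
    # Standard Adam moment updates (unmasked).
    m[:] = beta1 * m + (1 - beta1) * grads
    v[:] = beta2 * v + (1 - beta2) * grads**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)

    # Random mask: 1.0 where the update is applied, 0.0 where skipped.
    keep = (rng.random(params.shape) >= mask_prob).astype(params.dtype)
    params -= lr * keep * m_hat / (np.sqrt(v_hat) + eps)
    return params
```

A momentum-aligned variant would presumably replace the random `keep` mask with a criterion based on agreement between the gradient and the momentum, but that detail is an assumption beyond what the hosts describe.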
INSIGHT

Deep-Thinking Tokens Predict Reasoning Progress

  • Deep-thinking tokens measure reasoning effort by tracking token distribution changes across layers; the deep-thinking ratio predicts answer accuracy better than token length.
  • Jeremie explains they compute Jensen-Shannon divergence layer-to-layer to spot tokens that 'flip-flop' until late layers.
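The layer-to-layer divergence idea can be sketched as follows. This is a minimal illustration under stated assumptions: `deep_thinking_ratio` and its `threshold` parameter are hypothetical names for the paper's metric (the exact definition isn't given in the episode notes), but the Jensen-Shannon divergence itself is standard.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        # KL divergence with a small epsilon for numerical stability.
        return np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def deep_thinking_ratio(layer_dists, threshold=0.1):
    """Fraction of adjacent-layer transitions where the next-token
    distribution still shifts substantially -- a token that keeps
    'flip-flopping' until late layers scores high.

    layer_dists: list of per-layer next-token distributions for one token.
    (Illustrative definition; the paper's exact metric may differ.)
    """
    divs = [js_divergence(layer_dists[i], layer_dists[i + 1])
            for i in range(len(layer_dists) - 1)]
    return float(np.mean([d > threshold for d in divs]))
```

The intuition: a token whose distribution stabilizes early contributes little, while one that keeps changing across layers signals ongoing "deep thinking", which the hosts say predicts answer accuracy better than raw chain-of-thought length.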