
Last Week in AI #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
Jan 7, 2026
Nvidia's monumental $20 billion acquisition of Groq is driving advancements in AI hardware and inference technology. New York's RAISE Act aims to set safety regulations for AI systems, marking a significant legislative move in the sector. The launch of GLM 4.7 represents a major leap in open-source AI coding capabilities. Concerns arise over the escalating costs of long-horizon AI tasks, prompting debates on efficiency and ROI in the evolving AI landscape. Plus, insights into innovative approaches like activation oracles reveal intriguing new methods for model interpretability.
AI Snips
Harder Benchmarks For Scientific Research
- The Frontier Science benchmark targets expert-level scientific tasks, using granular rubrics and questions selected to be difficult for current models.
- It exposes headroom on research tasks where top models score much lower than on Olympiad problems.
From LLM Outputs To Causal Graphs
- Democritus builds causal graphs from LLM-generated causal triples using a geometric transformer layer.
- The method stitches local relations into coherent causal structures for richer world models.
Convergence Of Scientific Representations
- Stronger scientific models converge on similar internal representations across modalities, indicating shared structure.
- Out-of-distribution inputs break that convergence and reveal architecture-specific biases.
