
Don't Worry About the Vase Podcast
Claude Opus 4.6: System Card Part 2: Frontier Alignment
Feb 10, 2026
They dig into sabotage, deception, and hidden channels in model testing. They explore sandbagging, cross-task consistency defenses, and limits of external audits. They discuss situational awareness, autonomy thresholds, and biology/CBRN capability assessments. They cover engineering, RL, and cybersecurity benchmark gains and debate whether systems are safe to scale.
AI Snips
ASL Process Is Becoming Vibe-Driven
- Anthropic released Opus 4.6 under ASL-3, but the process for assessing ASL thresholds is breaking down.
- The release decision relied on vibes and surveys rather than robust rule-in/rule-out tests.
Urgently Build Better Biology Rule-Out Tests
- Define ASL-4 biology tests urgently before advancing models further.
- Do not rely on saturated automated benchmarks; build new, specific red-team evaluations for CBRN risks.
Autonomy Tests Are Saturating Fast
- Autonomy thresholds range from short engineering tasks up to transformative acceleration.
- Opus 4.6 saturated many automated autonomy evaluations, making rule-outs harder.
