Don't Worry About the Vase Podcast

Claude Opus 4.6: System Card Part 2: Frontier Alignment

Feb 10, 2026
They dig into sabotage, deception, and hidden channels in model testing. They explore sandbagging, cross-task consistency defenses, and limits of external audits. They discuss situational awareness, autonomy thresholds, and biology/CBRN capability assessments. They cover engineering, RL, and cybersecurity benchmark gains and debate whether systems are safe to scale.
INSIGHT

ASL Process Is Becoming Vibe-Driven

  • Anthropic released Opus 4.6 under ASL-3, but the process for assessing ASL thresholds is breaking down.
  • The release decision relied on vibes and surveys rather than robust rule-in/rule-out tests.
ADVICE

Urgently Build Better Biology Rule-Out Tests

  • Define ASL-4 biology tests urgently before advancing models further.
  • Do not rely on saturated automated benchmarks; build new, specific red-team evaluations for CBRN risks.
INSIGHT

Autonomy Tests Are Saturating Fast

  • Autonomy thresholds range from short engineering tasks up to transformative acceleration.
  • Opus 4.6 saturated many automated autonomy evaluations, making rule-outs harder.