Don't Worry About the Vase Podcast

Claude Opus 4.6: System Card Part 2: Frontier Alignment

Feb 10, 2026

The episode digs into sabotage, deception, and hidden channels in model testing, then explores sandbagging, cross-task consistency defenses, and the limits of external audits. It also discusses situational awareness, autonomy thresholds, and biology/CBRN capability assessments, covers benchmark gains in engineering, RL, and cybersecurity, and debates whether these systems are safe to scale.

INSIGHT

ASL Process Is Becoming Vibe-Driven

Anthropic released Opus 4.6 under ASL-3, but the process for assessing ASL thresholds is breaking down.
The release decision relied on vibes and surveys rather than robust rule-in/rule-out tests.

ADVICE

Urgently Build Better Biology Rule-Out Tests

Define ASL-4 biology tests urgently before advancing models further.
Do not rely on saturated automated benchmarks; build new, specific red-team evaluations for CBRN risks.

INSIGHT

Autonomy Tests Are Saturating Fast

Autonomy thresholds range from short engineering tasks up to transformative acceleration.
Opus 4.6 saturated many automated autonomy evaluations, making rule-outs harder.
