
The AI Policy Podcast: Inside the Second International AI Safety Report with Writers Stephen Clare and Stephen Casper
Feb 10, 2026
Stephen Casper, an MIT Ph.D. student focused on technical safeguards for ML risk management, and Stephen Clare, lead writer of the Second International AI Safety Report and former research manager on frontier AI governance, discuss the report's scope, jagged capability gains, misuse and systemic risks, the AI development lifecycle, data curation, fine-tuning defenses, post-deployment monitoring, and policy priorities for evidence and resilience.
AI Snips
Automation Shortens Red-Team Feedback Loops
- Automated tools now compress red-teaming feedback loops to minutes or hours, letting defenses improve rapidly.
- Language-model-assisted exploit discovery helps attackers and defenders alike (a toy loop is sketched below).
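
A minimal sketch of what an automated red-teaming loop can look like, under assumptions: `attacker_model`, `target_model`, and `harm_classifier` are hypothetical stand-ins, not any specific tool or API discussed in the episode.

```python
def automated_red_team(seed_prompts, attacker_model, target_model,
                       harm_classifier, rounds=3):
    """Iteratively mutate prompts with an attacker LM and keep those that elicit harm."""
    successful_attacks = []
    candidates = list(seed_prompts)
    for _ in range(rounds):
        next_round = []
        for prompt in candidates:
            # Ask the attacker model to propose a more adversarial rewrite.
            variant = attacker_model.rewrite(prompt)
            response = target_model.generate(variant)
            if harm_classifier.is_harmful(response):
                # Successful attacks go back to defenders to patch the system.
                successful_attacks.append((variant, response))
            else:
                # Unsuccessful probes are kept and mutated again next round.
                next_round.append(variant)
        candidates = next_round
    return successful_attacks
```

Because every step is model-driven, the same loop that finds exploits also produces the labeled failure cases defenders need, which is why the feedback cycle shrinks for both sides.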
Use Machine Unlearning For Sensitive Topics
- Apply machine unlearning to suppress knowledge of illicit topics while preserving benign capabilities.
- Use tailored unlearning algorithms to degrade or block outputs tied to banned content (a minimal training-step sketch follows).
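
A minimal sketch of one common unlearning recipe (gradient ascent on a "forget" set, gradient descent on a "retain" set). The model interface, optimizer, and the forget/retain split are assumptions in a Hugging Face/PyTorch style, not necessarily the algorithms discussed on the show.

```python
def unlearning_step(model, forget_batch, retain_batch, optimizer, forget_weight=1.0):
    """One optimization step that suppresses forget-set knowledge while keeping retain-set behavior."""
    optimizer.zero_grad()
    # Push the model away from the forget data by maximizing its loss there...
    forget_loss = model(**forget_batch).loss
    # ...while anchoring it to benign behavior on retained data.
    retain_loss = model(**retain_batch).loss
    loss = retain_loss - forget_weight * forget_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

The `forget_weight` knob captures the core trade-off: pushing harder on forgetting erases more of the illicit topic but risks damaging benign capabilities.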
Deploy Filters And Monitors Around Models
- Insert system-level filters and monitors between the model and the user to block harmful inputs or outputs (a minimal wrapper is sketched below).
- Favor low-cost, efficient detectors where possible to scale safety across many users.
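
A minimal sketch of a system-level wrapper that filters inputs and monitors outputs around a model. The classifier objects, logger, and refusal text are illustrative assumptions, not any specific product's API.

```python
REFUSAL = "Sorry, I can't help with that."

def guarded_generate(user_input, model, input_filter, output_filter, logger):
    """Run a cheap input filter, then the model, then an output monitor."""
    # Cheap input check first, so most traffic never reaches heavier checks.
    if input_filter.is_harmful(user_input):
        logger.log("blocked_input", user_input)
        return REFUSAL
    response = model.generate(user_input)
    # Post-hoc monitor on the output catches harms the input filter missed.
    if output_filter.is_harmful(response):
        logger.log("blocked_output", user_input)
        return REFUSAL
    return response
```

Keeping the input filter lightweight matters at scale: it runs on every request, while the more expensive output monitor only needs to be as fast as generation itself.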
