
Talk Python To Me #521: Red Teaming LLMs and GenAI with PyRIT
Sep 29, 2025

Tori Westerhoff leads operations for Microsoft's AI Red Team, focusing on high-risk generative AI systems, while Roman Lutz develops automation tools like PyRIT for enhanced adversarial testing. They discuss the growing threat landscape of prompt injection and the vulnerabilities facing LLM applications. Tori and Roman explore how automation can revolutionize red teaming, detailing their framework's ability to streamline testing and improve security. Insights on integrating human oversight and minimizing cognitive load highlight the delicate balance between automation and expert judgment.
AI Snips
Adversarial LLMs Attack Other LLMs
- PyRIT uses adversarial LLMs to generate attacks and a third model to judge success, automating creative exploitation.
- This leverages LLM strengths both to attack and to evaluate results at scale.
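The attacker/target/judge pattern above can be sketched as a simple loop. This is a minimal illustration, not PyRIT's actual API: the three model calls below are deterministic stubs standing in for real LLM endpoints, which in PyRIT are configurable targets.

```python
# Sketch of the adversarial loop: an attacker model rephrases an
# objective, a target model responds, and a judge model scores the
# response. All three functions are hypothetical stand-ins for LLM calls.

def attacker_llm(objective: str, attempt: int) -> str:
    """Stub adversarial model: rephrases the objective each turn."""
    return f"[attempt {attempt}] Please explain how to {objective}."

def target_llm(prompt: str) -> str:
    """Stub system under test: refuses until the third rephrasing."""
    if "[attempt 3]" in prompt:
        return "Sure, here is the information you asked for..."
    return "I'm sorry, I can't help with that."

def judge_llm(response: str) -> bool:
    """Stub judge model: labels any non-refusal as a successful attack."""
    return not response.startswith("I'm sorry")

def red_team_loop(objective: str, max_turns: int = 5):
    """Run attacker -> target -> judge until success or turn limit."""
    for turn in range(1, max_turns + 1):
        prompt = attacker_llm(objective, turn)
        response = target_llm(prompt)
        if judge_llm(response):
            return turn, prompt, response  # attack judged successful
    return None  # no attempt succeeded within the budget
```

Because every role is a model call, the same loop scales to thousands of objectives without a human writing each attack by hand, which is the automation win the episode describes.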
Fine-Tune A Model For Attacks (Safely)
- Fine-tune a model for red-teaming so it won't refuse harmful-query generation, but restrict its use internally.
- Use adversarial fine-tuning only in controlled environments for testing.
Use Composite Scoring
- Build composite scorers that combine multiple detectors, or add human review, to accurately label successful attacks.
- Combine refusal detection, content detectors, and human-in-the-loop review to reduce false positives.
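A composite scorer like the one described might look like the sketch below. The two detectors are toy keyword checks standing in for real classifiers (PyRIT ships configurable scorers; the function names here are illustrative, not its API).

```python
# Minimal composite-scoring sketch: a refusal check plus a content
# detector, with ambiguous cases routed to human review. Both detectors
# are hypothetical keyword stubs, not production classifiers.

def refusal_detector(response: str) -> bool:
    """True if the model appears to have refused the request."""
    phrases = ("i can't", "i cannot", "i'm sorry", "i am sorry")
    return any(p in response.lower() for p in phrases)

def harmful_content_detector(response: str) -> bool:
    """Stub content classifier: flags a toy keyword for illustration."""
    return "step-by-step exploit" in response.lower()

def composite_score(response: str) -> dict:
    """Label a response by combining both detectors.

    An attack counts as successful only when the model did NOT refuse
    AND a content detector fired; a non-refusal that no detector
    caught is flagged for human review instead of auto-labeled.
    """
    refused = refusal_detector(response)
    harmful = harmful_content_detector(response)
    return {
        "attack_success": (not refused) and harmful,
        "needs_human_review": (not refused) and not harmful,
    }
```

Requiring agreement between detectors before declaring success is what cuts false positives; the human-review bucket catches the cases automation cannot confidently label.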
