
Talk Python To Me #521: Red Teaming LLMs and GenAI with PyRIT
Sep 29, 2025

Tori Westerhoff leads operations for Microsoft's AI Red Team, focusing on high-risk generative AI systems, while Roman Lutz develops automation tools like PyRIT for enhanced adversarial testing. They discuss the growing threat landscape of prompt injection and the vulnerabilities facing LLM applications. Tori and Roman explore how automation can revolutionize red teaming, detailing their framework's ability to streamline testing and improve security. Insights on integrating human oversight and minimizing cognitive load highlight the delicate balance between automation and expert judgment.
AI Snips
Adversarial LLMs Attack Other LLMs
- PyRIT uses adversarial LLMs to generate attacks and a third model to judge success, automating creative exploitation.
- This leverages LLM strengths both to attack and to evaluate results at scale.
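The attacker/target/judge pattern above can be sketched as a simple loop. This is a minimal illustration, not PyRIT's actual API: the three model calls below are deterministic stubs standing in for real LLM endpoints, which in PyRIT are configurable targets.

```python
# Sketch of the adversarial loop: an attacker model rephrases an
# objective, a target model responds, and a judge model scores the
# response. All three functions are hypothetical stand-ins for LLM calls.

def attacker_llm(objective: str, attempt: int) -> str:
    """Stub adversarial model: rephrases the objective each turn."""
    return f"[attempt {attempt}] Please explain how to {objective}."

def target_llm(prompt: str) -> str:
    """Stub system under test: refuses until the third rephrasing."""
    if "[attempt 3]" in prompt:
        return "Sure, here is the information you asked for..."
    return "I'm sorry, I can't help with that."

def judge_llm(response: str) -> bool:
    """Stub judge model: labels any non-refusal as a successful attack."""
    return not response.startswith("I'm sorry")

def red_team_loop(objective: str, max_turns: int = 5):
    """Run attacker -> target -> judge until success or turn limit."""
    for turn in range(1, max_turns + 1):
        prompt = attacker_llm(objective, turn)
        response = target_llm(prompt)
        if judge_llm(response):
            return turn, prompt, response  # attack judged successful
    return None  # no attempt succeeded within the budget
```

Because every role is a model call, the same loop scales to thousands of objectives without a human writing each attack by hand, which is the automation win the episode describes.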
Fine-Tune A Model For Attacks (Safely)
- Fine-tune a model for red-teaming so it won't refuse harmful-query generation, but restrict its use internally.
- Use adversarial fine-tuning only in controlled environments for testing.
Use Composite Scoring
- Build composite scorers that combine multiple detectors, or add human review, to accurately label successful attacks.
- Combine refusal detection, content detectors, and human-in-the-loop review to reduce false positives.
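A composite scorer like the one described might look like the sketch below. The two detectors are toy keyword checks standing in for real classifiers (PyRIT ships configurable scorers; the function names here are illustrative, not its API).

```python
# Minimal composite-scoring sketch: a refusal check plus a content
# detector, with ambiguous cases routed to human review. Both detectors
# are hypothetical keyword stubs, not production classifiers.

def refusal_detector(response: str) -> bool:
    """True if the model appears to have refused the request."""
    phrases = ("i can't", "i cannot", "i'm sorry", "i am sorry")
    return any(p in response.lower() for p in phrases)

def harmful_content_detector(response: str) -> bool:
    """Stub content classifier: flags a toy keyword for illustration."""
    return "step-by-step exploit" in response.lower()

def composite_score(response: str) -> dict:
    """Label a response by combining both detectors.

    An attack counts as successful only when the model did NOT refuse
    AND a content detector fired; a non-refusal that no detector
    caught is flagged for human review instead of auto-labeled.
    """
    refused = refusal_detector(response)
    harmful = harmful_content_detector(response)
    return {
        "attack_success": (not refused) and harmful,
        "needs_human_review": (not refused) and not harmful,
    }
```

Requiring agreement between detectors before declaring success is what cuts false positives; the human-review bucket catches the cases automation cannot confidently label.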
