
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
Apr 30, 2025

In this engaging discussion, Nidhi Rastogi, an assistant professor at the Rochester Institute of Technology specializing in cyber threat intelligence, dives into her project CTIBench. She explores the evolution of AI in cybersecurity, emphasizing how large language models (LLMs) enhance threat detection and defense. Nidhi discusses the challenges of outdated information and the advantages of retrieval-augmented generation (RAG) for real-time responses. She also highlights how benchmarks can expose model limitations, and the vital role of understanding emerging threats in cybersecurity.
AI Snips
CTIBench Task Categories
- CTIBench categorizes its tasks into knowledge, reasoning, vulnerability mapping, threat attribution, and severity scoring.
- Each category tests LLMs for accuracy on critical, real-world CTI analyst activities.
Mitigation Not Yet Benchmarked
- CTIBench focuses more on background knowledge and attribution than on mitigation.
- Developing benchmarks for mitigation and remediation is a future goal currently under research.
Build the Benchmark with Trusted Data
- Build CTIBench from trustworthy standards like MITRE and GDPR for accurate evaluation.
- Generate many MCQs (multiple-choice questions) with LLM help, but thoroughly validate them manually for quality.
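
The validate-then-score workflow described above can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`MCQ`, `score`, the stub `always_a` model) — not CTIBench's actual schema or harness — showing the key point from the snip: only manually validated questions should count toward accuracy.

```python
from dataclasses import dataclass, field

@dataclass
class MCQ:
    question: str
    options: dict            # hypothetical schema, e.g. {"A": "...", "B": "..."}
    answer: str              # gold answer key, e.g. "A"
    validated: bool = False  # flipped to True only after manual review

def score(model, questions):
    """Accuracy of `model` over the manually validated MCQs only."""
    validated = [q for q in questions if q.validated]
    if not validated:
        return 0.0
    correct = sum(1 for q in validated if model(q) == q.answer)
    return correct / len(validated)

# Stub "model" that always answers "A" — placeholder for a real LLM call.
always_a = lambda q: "A"

qs = [
    MCQ("Which MITRE ATT&CK tactic covers credential theft?",
        {"A": "Credential Access", "B": "Exfiltration"},
        answer="A", validated=True),
    MCQ("LLM-generated question awaiting review",
        {"A": "x", "B": "y"},
        answer="B", validated=False),  # excluded until a human checks it
]

print(score(always_a, qs))  # only the validated question is scored
```

Gating on `validated` keeps unreviewed LLM-generated items from silently inflating (or deflating) the reported accuracy — the manual-validation step the episode emphasizes.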

