The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Apr 30, 2025
Nidhi Rastogi, an assistant professor at the Rochester Institute of Technology who specializes in cyber threat intelligence (CTI), discusses her project CTIBench. She traces the evolution of AI in cybersecurity and explains how large language models (LLMs) can strengthen threat detection and defense. She also covers the problem of outdated information in models, the advantages of Retrieval-Augmented Generation (RAG) for real-time responses, how benchmarks can expose model limitations, and why understanding emerging threats is vital in cybersecurity.
INSIGHT

CTIBench Task Categories

  • CTIBench groups its tasks into five categories: knowledge, reasoning, vulnerability mapping, threat attribution, and severity scoring.
  • Each category measures how accurately LLMs perform a critical, real-world task that CTI analysts carry out.
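
As a rough illustration of per-category scoring, the sketch below tallies exact-match accuracy for each task category. It is not CTIBench's actual evaluation code: the function name `accuracy_by_category`, the record layout, and the sample answers are all hypothetical, and exact match is an assumption that would probably suit multiple-choice or mapping tasks better than severity scoring, where a numeric error metric is likely more appropriate.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return exact-match accuracy per category.

    `records` is an iterable of (category, predicted, expected) string
    tuples. This layout is a hypothetical one chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, expected in records:
        total[category] += 1
        # Assumption: answers are compared case-insensitively after trimming.
        if predicted.strip().lower() == expected.strip().lower():
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Made-up model outputs, not real benchmark results.
sample = [
    ("knowledge", "B", "B"),
    ("knowledge", "C", "A"),
    ("vulnerability mapping", "CWE-79", "cwe-79"),
    ("threat attribution", "APT28", "APT29"),
]

scores = accuracy_by_category(sample)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Reporting one score per category, rather than a single overall number, keeps a weakness in one task (for example attribution) from being hidden by strength in another.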
INSIGHT

Mitigation Not Yet Benchmarked

  • CTIBench focuses more on background knowledge and attribution than on mitigation.
  • Developing benchmarks for mitigation and remediation is a future goal currently under research.
ADVICE

Build Benchmark with Trusted Data

  • Build a CTI benchmark from trustworthy sources such as MITRE and GDPR so that the evaluation is accurate.
  • Use an LLM to help generate many multiple-choice questions (MCQs), but validate them thoroughly by hand for quality.