
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
Apr 30, 2025

In this engaging discussion, Nidhi Rastogi, an assistant professor at the Rochester Institute of Technology specializing in cyber threat intelligence, dives into her project CTIBench. She explores the evolution of AI in cybersecurity, emphasizing how large language models (LLMs) enhance threat detection and defense. Nidhi discusses the challenges of outdated information and the advantages of retrieval-augmented generation (RAG) for real-time responses. She also highlights how benchmarks can expose model limitations, and the vital role of understanding emerging threats in cybersecurity.
AI Snips
CTIBench Task Categories
- CTIBench categorizes its tasks into knowledge, reasoning, vulnerability mapping, threat attribution, and severity scoring.
- Each category tests LLMs for accuracy on critical, real-world CTI analyst activities.
Mitigation Not Yet Benchmarked
- CTIBench focuses more on background knowledge and attribution than on mitigation.
- Developing benchmarks for mitigation and remediation is a future goal currently under research.
Build the Benchmark with Trusted Data
- Build CTIBench from trustworthy standards like MITRE and GDPR for accurate evaluation.
- Generate many MCQs (multiple-choice questions) with LLM help, but thoroughly validate them manually for quality.
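
The validate-then-score workflow described above can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`MCQ`, `score`, the stub `always_a` model) — not CTIBench's actual schema or harness — showing the key point from the snip: only manually validated questions should count toward accuracy.

```python
from dataclasses import dataclass, field

@dataclass
class MCQ:
    question: str
    options: dict            # hypothetical schema, e.g. {"A": "...", "B": "..."}
    answer: str              # gold answer key, e.g. "A"
    validated: bool = False  # flipped to True only after manual review

def score(model, questions):
    """Accuracy of `model` over the manually validated MCQs only."""
    validated = [q for q in questions if q.validated]
    if not validated:
        return 0.0
    correct = sum(1 for q in validated if model(q) == q.answer)
    return correct / len(validated)

# Stub "model" that always answers "A" — placeholder for a real LLM call.
always_a = lambda q: "A"

qs = [
    MCQ("Which MITRE ATT&CK tactic covers credential theft?",
        {"A": "Credential Access", "B": "Exfiltration"},
        answer="A", validated=True),
    MCQ("LLM-generated question awaiting review",
        {"A": "x", "B": "y"},
        answer="B", validated=False),  # excluded until a human checks it
]

print(score(always_a, qs))  # only the validated question is scored
```

Gating on `validated` keeps unreviewed LLM-generated items from silently inflating (or deflating) the reported accuracy — the manual-validation step the episode emphasizes.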

