AI Security Podcast

Inside the $29.5 Million DARPA AI Cyber Challenge: How Autonomous Agents Find & Patch Vulns

Nov 6, 2025
Michael Brown, Principal Security Engineer at Trail of Bits and leader of the Buttercup project in DARPA's AI Cyber Challenge, shares insights into building autonomous AI systems for vulnerability detection. He explains how Buttercup, despite initial skepticism, impressed with high-quality patch generation thanks to a "best of both worlds" approach that combines AI with traditional methods. Michael also discusses the competition's unique challenges, the importance of robust engineering, and practical tips for applying AI to security tasks. Looking ahead, the team aims to use Buttercup to fix bugs automatically at scale for the open-source community.
INSIGHT

Scoring Favored Complete, Fast Remediation

  • Patching was weighted far higher than discovery in the semifinals: 6 points for a patch vs. 2 for a find.
  • The finals added time decay and bonuses for proofs-of-exploit, incentivizing speed and end-to-end verification.
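The scoring dynamics above can be illustrated with a toy sketch. The point values (6 for a patch, 2 for a find) come from the episode; the exponential decay rate and the proof-of-exploit bonus value are hypothetical placeholders, since the episode only describes their existence, not their magnitudes.

```python
import math

# Point values stated in the episode (semifinal weighting).
PATCH_POINTS = 6
FIND_POINTS = 2

# Hypothetical finals parameters: the episode mentions time decay and
# proof bonuses but not their actual values.
PROOF_BONUS = 1.0    # bonus for submitting a proof-of-exploit
DECAY_RATE = 0.1     # per-hour exponential decay on base points

def semifinal_score(patches: int, finds: int) -> int:
    """Semifinal scoring: patching weighted 3x higher than discovery."""
    return PATCH_POINTS * patches + FIND_POINTS * finds

def final_score(base_points: float, hours_elapsed: float, has_proof: bool) -> float:
    """Finals-style scoring: base points decay over time, and an
    end-to-end proof-of-exploit earns a bonus, so fast, verified
    remediation scores best."""
    decayed = base_points * math.exp(-DECAY_RATE * hours_elapsed)
    return decayed + (PROOF_BONUS if has_proof else 0.0)
```

Under this sketch, a patch submitted immediately with a proof outscores the same patch submitted hours later without one, matching the incentive structure described in the episode.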
INSIGHT

Proof Beats Speculation For Maintainers

  • Early-stage tools should prioritize proof-of-exploit to be useful to maintainers.
  • Speculative patching finds more bugs but risks high false positives and lower adoption.
INSIGHT

LLMs Already Know Open-Source Code

  • Commercial LLMs already contain extensive open-source code and vulnerability knowledge.
  • This made local custom models unnecessary for the competition's open-source targets.