AI Security Podcast

Inside the $29.5 Million DARPA AI Cyber Challenge: How Autonomous Agents Find & Patch Vulns

Nov 6, 2025
Michael Brown, Principal Security Engineer at Trail of Bits and leader of the Buttercup project in DARPA's AI Cyber Challenge, shares insights into building autonomous AI systems for vulnerability detection. He explains how Buttercup, despite initial skepticism, impressed with high-quality patch generation thanks to a "best of both worlds" approach that combines AI with traditional methods. Michael also discusses the competition's unique challenges, the importance of robust engineering, and practical tips for applying AI to security tasks. Looking ahead, the team aims to use Buttercup to fix bugs automatically at scale for the open-source community.
INSIGHT

Scoring Favored Complete, Fast Remediation

  • Patching was weighted far higher than discovery in the semifinals: 6 points for a patch vs. 2 for a find.
  • The finals added time decay and bonuses for proofs-of-exploit, incentivizing speed and end-to-end verification.
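The scoring dynamics above can be illustrated with a toy sketch. The point values (6 for a patch, 2 for a find) come from the episode; the exponential decay rate and the proof-of-exploit bonus value are hypothetical placeholders, since the episode only describes their existence, not their magnitudes.

```python
import math

# Point values stated in the episode (semifinal weighting).
PATCH_POINTS = 6
FIND_POINTS = 2

# Hypothetical finals parameters: the episode mentions time decay and
# proof bonuses but not their actual values.
PROOF_BONUS = 1.0    # bonus for submitting a proof-of-exploit
DECAY_RATE = 0.1     # per-hour exponential decay on base points

def semifinal_score(patches: int, finds: int) -> int:
    """Semifinal scoring: patching weighted 3x higher than discovery."""
    return PATCH_POINTS * patches + FIND_POINTS * finds

def final_score(base_points: float, hours_elapsed: float, has_proof: bool) -> float:
    """Finals-style scoring: base points decay over time, and an
    end-to-end proof-of-exploit earns a bonus, so fast, verified
    remediation scores best."""
    decayed = base_points * math.exp(-DECAY_RATE * hours_elapsed)
    return decayed + (PROOF_BONUS if has_proof else 0.0)
```

Under this sketch, a patch submitted immediately with a proof outscores the same patch submitted hours later without one, matching the incentive structure described in the episode.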
INSIGHT

Proof Beats Speculation For Maintainers

  • Early-stage tools should prioritize proof-of-exploit to be useful to maintainers.
  • Speculative patching finds more bugs but risks high false positives and lower adoption.
INSIGHT

LLMs Already Know Open-Source Code

  • Commercial LLMs already contain extensive open-source code and vulnerability knowledge.
  • This made local custom models unnecessary for the competition's open-source targets.