AI Agent Security: Threats & Defenses for Modern Deployments

May 21, 2025

Yifeng (Ethan) He, a PhD candidate at UC Davis specializing in software and AI security, and Peter Rong, a researcher focused on vulnerabilities in AI agents, discuss the critical threats facing AI agents. They break down issues like session hijacks and tool-based jailbreaks, highlighting the shortcomings of current defenses. The duo also advocates for effective sandboxing and agent-to-agent protocols, sharing practical strategies for securing AI deployments and emphasizing the importance of a zero-trust approach in agent security.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

ANECDOTE

How Research Started

Peter described starting AI security research after seeing ChatGPT's code-writing evolution and questioning code safety.
That grew into studying agent attack surfaces beyond just insecure code outputs.

INSIGHT

User Data Can Poison Agents

Fine-tuning agents with user interaction opens poisoning and backdoor risks if training data is untrusted.
Malicious preferences or prompts can steer agent behavior subtly over time.

INSIGHT

Session Hijack Via Context Injection

Session hijacks happen when a tool or prompt forces the agent to 'forget' prior context or accept new malicious context.
Attackers can bias agent decisions like rankings by injecting crafted tool outputs into the chat history.

Get the Snipd Podcast app to discover more snips from this episode

Get the app