Hard Fork AI

OpenAI Sounds Alarm: Perpetual Agent Prompt Plague

9 snips
Jan 3, 2026
A deep dive into persistent prompt injection threats that can hijack agent reasoning. Real examples show hidden instructions in emails, webpages, and docs can compel harmful actions. Discussion covers OpenAI's layered defenses, red‑team RL attackers that surface novel multi‑step attacks, and practical mitigation tradeoffs like permissions, confirmations, and logging limits.
Ask episode
AI Snips
Chapters
Transcript
Episode notes
INSIGHT

Agents Can Be Hijacked By Hidden Prompts

  • Prompt injection attacks let web content or emails stealthily override agent behavior and cause harmful actions.
  • Jaeden Schafer warns these attacks can appear inside normal-looking messages or embedded site content.
ANECDOTE

Hidden Test Instructions In A Real Email Example

  • Jaeden describes an email with normal text followed by hidden test instructions that tell an agent to execute destructive tasks.
  • The example shows an agent could be prompted to log into bank sites or extract credentials without obvious signs.
INSIGHT

Prompt Injection Is Likely Persistent

  • Industry leaders including OpenAI and the UK's National Cyber Security Centre say prompt injection may never be fully solved.
  • Jaeden emphasizes the risk is systemic across agentic browsers from multiple vendors.
Get the Snipd Podcast app to discover more snips from this episode
Get the app