
Hard Fork AI OpenAI Sounds Alarm: Perpetual Agent Prompt Plague
9 snips
Jan 3, 2026 A deep dive into persistent prompt injection threats that can hijack agent reasoning. Real examples show hidden instructions in emails, webpages, and docs can compel harmful actions. Discussion covers OpenAI's layered defenses, red‑team RL attackers that surface novel multi‑step attacks, and practical mitigation tradeoffs like permissions, confirmations, and logging limits.
AI Snips
Chapters
Transcript
Episode notes
Agents Can Be Hijacked By Hidden Prompts
- Prompt injection attacks let web content or emails stealthily override agent behavior and cause harmful actions.
- Jaeden Schafer warns these attacks can appear inside normal-looking messages or embedded site content.
Hidden Test Instructions In A Real Email Example
- Jaeden describes an email with normal text followed by hidden test instructions that tell an agent to execute destructive tasks.
- The example shows an agent could be prompted to log into bank sites or extract credentials without obvious signs.
Prompt Injection Is Likely Persistent
- Industry leaders including OpenAI and the UK's National Cyber Security Centre say prompt injection may never be fully solved.
- Jaeden emphasizes the risk is systemic across agentic browsers from multiple vendors.
