Who documented this first?

The canonical paper is Greshake et al. 2023 (arXiv 2302.12173, AISec '23). Simon Willison's "lethal trifecta" formulation (June 2025) and OpenAI's own public acknowledgement (December 2025) have since made this a widely discussed production risk.

Why is this different from normal prompt injection?

Direct prompt injection requires the attacker to interact with the user or model. Indirect injection requires only that the attacker control one piece of data the agent will eventually retrieve. Since agents routinely read untrusted data — search results, emails, tickets — the attack surface is much larger.

← Attack Database

Part of: MCP Security reference

Indirect Prompt Injection

Q: What is indirect prompt injection?

An attacker plants instructions in data the agent will later retrieve — a webpage, email, document, ticket, calendar invite, PDF, image with hidden text. When the agent reads that data, the instructions enter its context on the same footing as the user's own request. The attacker never speaks to the LLM; they speak to a document the LLM will someday read.

Updated Sun Apr 19 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Agent behaviour verified

Indirect Prompt Injection

Summary

Indirect prompt injection is the root category from which tool-result injection descends. An attacker plants instructions in data that the agent will later retrieve — a webpage, email, document, ticket, calendar invite, PDF, image with hidden text — and waits. When the agent reads that data, the instructions enter its context on the same footing as the user’s own request. Unlike direct prompt injection, the attacker never speaks to the LLM; they speak to a document that the LLM will someday read. The canonical paper is Greshake et al. 2023, and every year since has produced fresh production demonstrations.

How it works

The attacker writes an instruction payload — plain text, hidden HTML, zero-width characters, white-on-white text in a PDF, alt-text in an image, metadata in a calendar invite.
The payload lands somewhere the agent is likely to encounter: a public webpage, a shared Google Doc, an email inbox, a Jira ticket, a wiki page, a product review.
A legitimate user asks the agent a legitimate question whose answer requires reading that data: “summarise my inbox”, “review this PR”, “what’s on my calendar”.
The retrieval step pulls the poisoned content into the agent’s context.
The LLM cannot tell the difference between “the user asked me to…” and “this email told me to…”. It may follow the injected instructions — visiting a URL, exfiltrating data, forwarding emails, calling tools.

Greshake et al. formalised this in 2023 and demonstrated it against Bing’s GPT-4-powered Chat, GPT-4 code completion, and synthetic agents. The paper’s threat taxonomy — data theft, worming, ecosystem contamination, unauthorised API calls — has held up.

Real-world example

Greshake et al., “Not what you’ve signed up for”, arXiv 2302.12173, February 2023. The canonical paper. Submitted 23 February 2023, final revision 5 May 2023, published at the 16th ACM Workshop on AI and Security (AISec ‘23). Authors: Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. Demonstrated working exploits against Bing Chat (then GPT-4-powered), GPT-4 code completion, and synthetic agents. Showed remote control of the model at inference time, persistent compromise, data theft, worming between documents, and denial of service. Established that processing retrieved prompts is equivalent to arbitrary code execution of the LLM’s tool-use surface. (arxiv.org/abs/2302.12173; Black Hat USA 2023 whitepaper, accessed 19-04-2026.)

ChatGPT Operator data exfiltration via Hacker News page, Johann Rehberger, February 2025. Discussed in the sibling attack page, but belongs here too: the injection payload lived in a GitHub issue title the agent navigated to, not in anything the user typed. The agent extracted a private email address from the user’s logged-in Hacker News session and leaked it through a textarea field. (simonwillison.net, 17-02-2025, accessed 19-04-2026.)

The “lethal trifecta”, Simon Willison, June 2025. Willison, who coined “prompt injection” in 2022, formalised the conditions under which indirect injection becomes catastrophic: the agent has (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally. Any agent combining these three is exploitable. MCP encourages users to mix-and-match tools in exactly this combination. (simonwillison.net, 16-06-2025, accessed 19-04-2026.)

OpenAI’s December 2025 statement. OpenAI publicly acknowledged that prompt injection against AI browsers “may never be fully solved”, a position echoed by other labs. (techcrunch.com, 22-12-2025; fortune.com, 23-12-2025, accessed 19-04-2026.)

Impact

Exfiltration of any private data the agent can see — email contents, documents, chat histories, source code, secrets.
Unauthorised actions taken “as the user”: sent emails, forwarded invites, transferred funds, approved PRs.
Persistence: an injected instruction can tell the agent to plant the same payload in new documents (worming).
Information-ecosystem contamination — the agent produces summaries shaped by the attacker’s narrative.
Reputation damage when the agent posts attacker-dictated content under the user’s identity.

Detection

Retrieval tools returning payloads containing imperative second-person text (“you must”, “ignore”, “now do X”).
Hidden-character anomalies: zero-width spaces, Unicode tag characters, CSS-hidden text, white-on-white.
Agent tool calls whose target has no provenance in the user’s original request.
Any tool call that sends data outward shortly after a tool call that read externally-authored content.
Divergence between user intent (extracted from the original prompt) and actual tool-call graph.

Prevention

Architecturally, the answer is to break the lethal trifecta. At the transport layer, that means enforcing that a single agent session cannot simultaneously read untrusted content, access private data, and send data outward without an explicit policy decision.

Example PolicyLayer policy for a mixed-capability agent:

version: "1"
description: "Break the lethal trifecta at the transport layer"
default: "allow"
tools:
  web_fetch:
    rules:
      - name: "mark session as tainted once external content is read"
        state:
          counter: "tainted_reads"
          window: "hour"
          increment: 1

  send_email:
    rules:
      - name: "block outbound email after tainted reads"
        conditions:
          - path: "state.web_fetch.tainted_reads"
            op: "lte"
            value: 0
        on_deny: "Session has read untrusted web content; outbound email blocked"

  read_private_repo:
    rules:
      - name: "block private reads after tainted reads"
        conditions:
          - path: "state.web_fetch.tainted_reads"
            op: "lte"
            value: 0
        on_deny: "Session has read untrusted web content; private-repo access blocked"

  post_webhook:
    rules:
      - name: "allowlist outbound destinations"
        conditions:
          - path: "args.url"
            op: "regex"
            value: "^https://hooks\\.internal\\.example\\.com/"
        on_deny: "Outbound webhook target not on allowlist"

The three action: deny style gates are the practical implementation of the lethal-trifecta principle: once the session has consumed untrusted content, its ability to read private data or call outbound tools is revoked for the remainder of the session.

Combine with:

Per-session token scoping so the agent only sees data for the current task.
Content filters that strip instruction-like patterns from retrieved documents before they reach the model.
Egress allowlists on any tool that can send data outside the trust boundary.

Sources

Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al., arXiv:2302.12173 — accessed 19-04-2026
Black Hat USA 2023 whitepaper (PDF) — accessed 19-04-2026
Published version, AISec ‘23, ACM DL — accessed 19-04-2026
The lethal trifecta for AI agents — Simon Willison, 16-06-2025 — accessed 19-04-2026
ChatGPT Operator: Prompt Injection Exploits & Defenses — Simon Willison, 17-02-2025 — accessed 19-04-2026
OpenAI says AI browsers may always be vulnerable to prompt injection attacks — TechCrunch, 22-12-2025 — accessed 19-04-2026
Prompt Injection — Wikipedia — accessed 19-04-2026

Prompt Injection via Tool Results
Confused Deputy
Destructive Action Autonomy

Indirect Prompt Injection

Summary

How it works

Real-world example

Impact

Detection

Prevention

Sources

Related attacks

Take your agents live. Without losing control.