What is the Lethal Trifecta?
The Lethal Trifecta is Simon Willison's term for the combination of three agent capabilities — access to private data, exposure to untrusted content, and the ability to communicate externally — that together make data exfiltration via prompt injection possible.
WHY IT MATTERS
None of the three capabilities enables this attack on its own, and removing any one of them closes it. An agent that reads your private documents but never sees attacker-controlled text cannot be tricked into leaking them. An agent exposed to untrusted web pages but with no way to send data out cannot leak anything. The danger appears only when all three are present: an attacker plants instructions in content the agent will read, the model follows them, and the agent's own tools carry private data out.
The trifecta matters because it reframes agent security as a capability-combination problem rather than a model-alignment problem. Prompt injection has no reliable model-level fix, so the practical defence is to ensure no single agent session holds all three legs at once. This is a tractable engineering decision, not a research problem.
- Private data — file systems, email, internal databases, anything an MCP tool can read on your behalf.
- Untrusted content — web pages, issues, emails, or tool results authored by someone other than you. See indirect prompt injection.
- External communication — any channel that can move data out: HTTP requests, sending messages, creating public pull requests.
Willison's canonical example is the GitHub MCP exploit, where a single server combined all three: reading attacker-filed public issues, accessing private repositories, and opening pull requests that exfiltrated the private data.
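To make the capability-combination framing concrete, here is a minimal Python sketch. It tags each tool with the legs it supplies and flags a session whose enabled tools cover all three. The tool names and the TOOL_LEGS catalogue are hypothetical, loosely modeled on the GitHub MCP example above. They are not the real GitHub MCP server's tool names. Unknown tools are assumed to supply every leg, so the check fails closed.

```python
from enum import Enum, auto


class Leg(Enum):
    """The three capabilities that together form the lethal trifecta."""
    PRIVATE_DATA = auto()
    UNTRUSTED_CONTENT = auto()
    EXTERNAL_COMMUNICATION = auto()


# Hypothetical tool catalogue: each tool is tagged with the legs it provides.
# One tool may supply more than one leg.
TOOL_LEGS: dict[str, frozenset[Leg]] = {
    "read_public_issues": frozenset({Leg.UNTRUSTED_CONTENT}),
    "read_private_repo": frozenset({Leg.PRIVATE_DATA}),
    "create_pull_request": frozenset({Leg.EXTERNAL_COMMUNICATION}),
}


def legs_present(tools: set[str]) -> set[Leg]:
    """Union of the legs supplied by every tool enabled in a session."""
    present: set[Leg] = set()
    for tool in tools:
        # Unregistered tools are assumed to supply all three legs (fail closed).
        present |= TOOL_LEGS.get(tool, frozenset(Leg))
    return present


def has_trifecta(tools: set[str]) -> bool:
    """True only when all three legs are reachable in one session."""
    return legs_present(tools) == set(Leg)


# Usage: a session shaped like the GitHub MCP example, then with the outbound leg removed.
github_like = {"read_public_issues", "read_private_repo", "create_pull_request"}
print(has_trifecta(github_like))                             # True
print(has_trifecta(github_like - {"create_pull_request"}))   # False
```

The check reasons about whole sessions rather than individual tools, which mirrors the point that each leg is unremarkable in isolation. Removing any single tool that is the only source of a leg is enough to clear the flag.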
HOW POLICYLAYER USES THIS
PolicyLayer's gateway gives teams a deterministic way to break the trifecta. Because every MCP tools/call request from every connected server passes through one policy evaluation point, you can write rules that deny the externally-communicating leg whenever a session also has private-data tools enabled, for example by blocking outbound network tools for agents scoped to internal repositories. Per-person scoped tokens keep the capability split enforced per user rather than relying on each client's configuration.
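A rule of the kind described above might look like the following Python sketch. It is not PolicyLayer's actual policy language or API: Session, Decision, evaluate_tools_call and the two tool sets are hypothetical names chosen for illustration. The function runs once per tools/call request and denies any external-communication tool when the session, identified by a per-person scoped token, also has a private-data tool enabled.

```python
from dataclasses import dataclass, field

# Hypothetical tags; a real deployment would derive these from its own tool inventory.
PRIVATE_DATA_TOOLS = {"read_private_repo", "query_internal_db"}
EXTERNAL_COMM_TOOLS = {"http_fetch", "send_email", "create_pull_request"}


@dataclass
class Session:
    """One agent session, identified by a per-person scoped token (hypothetical)."""
    token: str
    enabled_tools: set[str] = field(default_factory=set)


@dataclass
class Decision:
    allow: bool
    reason: str


def evaluate_tools_call(session: Session, tool_name: str) -> Decision:
    """Decide a single tools/call request at the central policy point."""
    if tool_name not in session.enabled_tools:
        return Decision(False, f"{tool_name} is not enabled for this token")
    has_private = bool(session.enabled_tools & PRIVATE_DATA_TOOLS)
    if has_private and tool_name in EXTERNAL_COMM_TOOLS:
        # Break the trifecta by denying the outbound leg, regardless of what
        # content the session has already read.
        return Decision(False, "outbound tool denied: session has private-data access")
    return Decision(True, "allowed")


# Usage: Alice's token can read an internal repo, so her outbound tool is blocked.
alice = Session("tok-alice", {"read_private_repo", "http_fetch"})
print(evaluate_tools_call(alice, "read_private_repo").allow)  # True
print(evaluate_tools_call(alice, "http_fetch").allow)         # False
```

The rule keys on which tools are enabled rather than on which have already been called, because an injected instruction can arrive at any point in a session. It is also a pure function of session configuration and tool name, so the same request always receives the same decision.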