// GLOSSARY -- AI AGENT SECURITY

What is the Lethal Trifecta?

2 min read Updated Jun 11, 2026

The Lethal Trifecta is Simon Willison's term for the combination of three agent capabilities — access to private data, exposure to untrusted content, and the ability to communicate externally — that together make data exfiltration via prompt injection possible.

WHY IT MATTERS

No single capability is enough for this attack. An agent that reads your private documents but never sees attacker-controlled text has no injected instructions to follow. An agent exposed to untrusted web pages but with no way to send data out cannot leak anything. The danger appears only when all three are present: an attacker plants instructions in content the agent will read, the model follows them, and the agent's own tools carry private data out.

The trifecta matters because it reframes agent security as a capability-combination problem rather than a model-alignment problem. Prompt injection has no reliable model-level fix, so the practical defence is to ensure no single agent session holds all three legs at once. This is a tractable engineering decision, not a research problem.

The three legs are:

Private data: file systems, email, internal databases, anything an MCP tool can read on your behalf.
Untrusted content: web pages, issues, emails, or tool results authored by someone other than you. See indirect prompt injection.
External communication: any channel that can move data out, such as HTTP requests, sending messages, or creating public pull requests.
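As a rough illustration of treating the trifecta as a capability-combination check, the sketch below labels each tool with the legs it provides and tests whether a session's tool set covers all three. The tool names and their labels are hypothetical, chosen only for the example; a real deployment would need its own classification.

```python
from enum import Flag, auto


class Capability(Flag):
    PRIVATE_DATA = auto()
    UNTRUSTED_CONTENT = auto()
    EXTERNAL_COMMS = auto()


TRIFECTA = (
    Capability.PRIVATE_DATA
    | Capability.UNTRUSTED_CONTENT
    | Capability.EXTERNAL_COMMS
)

# Hypothetical tool names and labels, for illustration only.
TOOL_CAPABILITIES = {
    "read_private_repo": Capability.PRIVATE_DATA,
    "read_public_issue": Capability.UNTRUSTED_CONTENT,
    "create_pull_request": Capability.EXTERNAL_COMMS,
    # One tool can provide more than one leg.
    "fetch_url": Capability.UNTRUSTED_CONTENT | Capability.EXTERNAL_COMMS,
    "run_unit_tests": Capability(0),
}


def session_capabilities(tool_names):
    """Union of the legs provided by every tool enabled in a session."""
    combined = Capability(0)
    for name in tool_names:
        # Assumption: unlabelled tools are treated as holding every leg,
        # so the check fails closed.
        combined |= TOOL_CAPABILITIES.get(name, TRIFECTA)
    return combined


def has_lethal_trifecta(tool_names):
    """True when the enabled tools together cover all three legs."""
    return (session_capabilities(tool_names) & TRIFECTA) == TRIFECTA
```

With these labels, a session with `read_private_repo`, `read_public_issue`, and `create_pull_request` holds the full trifecta, while dropping any one of the three does not.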

One example Willison cites is the GitHub MCP exploit, where a single MCP server combined all three legs: it read attacker-filed public issues, had access to private repositories, and could open pull requests that exposed the private data.

HOW POLICYLAYER USES THIS

PolicyLayer's gateway gives teams a deterministic way to break the trifecta. Because every tools/call from every connected server passes through one policy evaluation point, you can write rules that deny the externally-communicating leg whenever a session also has private-data tools enabled — for example, blocking outbound network tools for agents scoped to internal repositories. Per-person scoped tokens keep the capability split enforced per user rather than relying on each client's configuration.
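The kind of rule described above can be sketched in a few lines. This is not PolicyLayer's rule syntax or API, which this page does not show; `Session`, `Decision`, `evaluate_tool_call`, and the tool names are all hypothetical, and the sketch assumes each session carries a fixed set of enabled tools.

```python
from dataclasses import dataclass

# Hypothetical tool labels, for illustration only.
PRIVATE_DATA_TOOLS = {"read_private_repo", "query_internal_db"}
EXTERNAL_COMMS_TOOLS = {"http_request", "send_message", "create_public_pull_request"}


@dataclass(frozen=True)
class Session:
    user: str
    enabled_tools: frozenset


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


def evaluate_tool_call(session, tool_name):
    """Decide one tools/call request at a single evaluation point."""
    if tool_name not in session.enabled_tools:
        return Decision(False, f"{tool_name} is not enabled for {session.user}")

    # Deny the externally communicating leg whenever the same session
    # also has private-data tools enabled.
    holds_private_data = bool(session.enabled_tools & PRIVATE_DATA_TOOLS)
    if tool_name in EXTERNAL_COMMS_TOOLS and holds_private_data:
        return Decision(False, "external communication denied: session has private-data tools")

    return Decision(True, "allowed")


# Usage: an agent scoped to internal repositories cannot make outbound requests.
internal = Session("alice", frozenset({"read_private_repo", "http_request"}))
browsing = Session("alice", frozenset({"http_request"}))
```

Here `evaluate_tool_call(internal, "http_request")` is denied, while the same call in the `browsing` session is allowed because that session holds no private-data tools. The decision depends only on the session's tool set, not on what the model was told, which is what makes it deterministic.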

See the MCP Security reference →

FREQUENTLY ASKED QUESTIONS

Who coined the term Lethal Trifecta?

Simon Willison, in a June 2025 post on simonwillison.net. It built on his earlier writing on prompt injection, which he also named.

Is removing one leg of the trifecta enough?

Yes, for the classic exfiltration attack. But an agent can still be manipulated into destructive actions within its remaining capabilities, so least privilege and tool-level policy still apply.

Why not just train models to ignore injected instructions?

No model reliably distinguishes attacker instructions from legitimate content; mitigations are probabilistic. Capability separation is deterministic, which is why it is the recommended defence.
