What is an Agent Threat Model?


A systematic analysis of threats to an AI agent system: what can go wrong, who might attack it, what assets are at risk, and what controls mitigate each threat. Essential for securing MCP deployments against both adversarial and accidental risks.

WHY IT MATTERS

Threat modelling is the foundation of deliberate security. Without it, security measures are reactive — patching vulnerabilities after they are exploited rather than preventing them by design. For AI agent systems, threat modelling is unusually important because the threat landscape is novel and poorly catalogued.

An agent threat model starts by identifying assets: what data can the agent access? What systems can it modify? What actions can it take? Then it identifies threats: prompt injection from user input, tool poisoning from malicious MCP servers, data exfiltration through tool arguments, privilege escalation through tool chaining, denial of service through resource exhaustion. Each threat is assessed for likelihood and impact.
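The asset-and-threat inventory above can be kept as a simple register and triaged by score. A minimal sketch, assuming an illustrative likelihood × impact scoring on 1–5 scales (the scales and field names are assumptions, not a prescribed format):

```python
from dataclasses import dataclass

# Hypothetical threat-register entry; the 1-5 scales are illustrative.
@dataclass
class Threat:
    name: str
    asset: str          # what is at risk
    likelihood: int     # 1 (rare) .. 5 (expected)
    impact: int         # 1 (negligible) .. 5 (critical)

    @property
    def risk(self) -> int:
        # Simple likelihood x impact scoring
        return self.likelihood * self.impact

register = [
    Threat("prompt injection via user input", "agent context", 4, 4),
    Threat("tool poisoning from malicious MCP server", "tool integrity", 2, 5),
    Threat("data exfiltration through tool arguments", "customer data", 3, 5),
]

# Triage: address the highest-risk threats first
for t in sorted(register, key=lambda t: t.risk, reverse=True):
    print(f"{t.risk:>2}  {t.name}")
```

Any scoring scheme works as long as it is applied consistently; the point is to force an explicit likelihood and impact judgment for every threat.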

The MCP-specific elements of a threat model include: trust boundaries between agent and each MCP server, the integrity of tool descriptions and responses, the confidentiality of arguments passed to tools, the availability of the policy enforcement layer, and the authenticity of server identities. Each of these boundaries is a potential attack surface.

A complete threat model maps each identified threat to specific controls. Prompt injection maps to infrastructure-level policy enforcement. Data exfiltration maps to argument validation and output filtering. Privilege escalation maps to tool allowlists and least privilege. This mapping ensures that every identified risk has a corresponding mitigation — and reveals gaps where controls are missing.
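The threat-to-control mapping can be made mechanical, which is what exposes the gaps. A sketch using the mappings named above (the fourth threat is deliberately left unmapped to show gap detection):

```python
# Map each identified threat to its mitigating controls.
# Threat and control names follow the examples in the text.
controls = {
    "prompt injection": ["infrastructure-level policy enforcement"],
    "data exfiltration": ["argument validation", "output filtering"],
    "privilege escalation": ["tool allowlists", "least privilege"],
}

identified_threats = [
    "prompt injection",
    "data exfiltration",
    "privilege escalation",
    "denial of service",   # no control mapped yet
]

# A threat with no mapped control is a gap in the model
gaps = [t for t in identified_threats if not controls.get(t)]
print("unmitigated:", gaps)
```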

HOW POLICYLAYER USES THIS

Intercept is the primary control for many threats identified in an agent threat model. Tool access threats are mitigated by allowlists and denylists. Argument-based threats are mitigated by validation conditions. Volume-based threats are mitigated by rate limiting. Detection gaps are addressed by audit logging. Infrastructure failures are addressed by fail-closed design. When building a threat model for an MCP deployment, Intercept should be mapped as the control for tool-layer threats, with its specific policy configuration documented for each mitigated risk.
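One way to document this mapping is to annotate each policy with the threat it mitigates. The sketch below is hypothetical: the keys and structure are illustrative assumptions, not Intercept's actual YAML schema.

```yaml
# Hypothetical policy sketch -- keys are illustrative, not Intercept's schema.
policies:
  - name: limit-file-tools          # mitigates: privilege escalation
    allow_tools: [read_file]
    deny_tools: [delete_file]
  - name: block-secrets-in-args     # mitigates: data exfiltration
    condition: "arguments must not match known secret patterns"
  - name: throttle-calls            # mitigates: resource exhaustion
    rate_limit: 30/minute
```

Keeping the threat name next to the policy makes the coverage audit trivial: any threat in the register with no annotated policy is an open gap.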

FREQUENTLY ASKED QUESTIONS

How is an agent threat model different from a traditional application threat model?
Agent threat models include LLM-specific threats (prompt injection, context poisoning, hallucination), tool-specific threats (excessive agency, tool poisoning, argument injection), and non-determinism (the same agent may behave differently across runs). Traditional models do not account for these.
What framework should I use for agent threat modelling?
Start with STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) applied to each MCP trust boundary. Supplement with the OWASP Top 10 for LLMs for AI-specific threats. Document using standard threat modelling formats (data flow diagrams, threat trees).
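Applying STRIDE per trust boundary is mostly a matter of enumerating the cross product. A sketch that generates a starting worksheet, with boundary names taken from the text (the output is a checklist of questions, not a finished model):

```python
from itertools import product

STRIDE = ["Spoofing", "Tampering", "Repudiation",
          "Information Disclosure", "Denial of Service",
          "Elevation of Privilege"]

# Trust boundaries from the MCP-specific section above
boundaries = [
    "agent <-> MCP server",
    "tool descriptions and responses",
    "policy enforcement layer",
]

# One question per (boundary, category) pair
checklist = [f"{b}: {s}?" for b, s in product(boundaries, STRIDE)]
print(len(checklist), "questions to answer")
```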
How often should I update the threat model?
Whenever the agent's capabilities change — new MCP servers, new tools, changed permissions, model updates, or new deployment environments. Also review after security incidents or when new attack techniques are published for LLM-based systems.
