What is an Agent Threat Model?
A systematic analysis of threats to an AI agent system: what can go wrong, who might attack it, what assets are at risk, and what controls mitigate each threat. Essential for securing MCP deployments against both adversarial and accidental risks.
WHY IT MATTERS
Threat modelling is the foundation of deliberate security. Without it, security measures are reactive — patching vulnerabilities after they are exploited rather than preventing them by design. For AI agent systems, threat modelling is unusually important because the threat landscape is novel and poorly catalogued.
An agent threat model starts by identifying assets: what data can the agent access? What systems can it modify? What actions can it take? Then it identifies threats: prompt injection from user input, tool poisoning from malicious MCP servers, data exfiltration through tool arguments, privilege escalation through tool chaining, denial of service through resource exhaustion. Each threat is assessed for likelihood and impact.
The MCP-specific elements of a threat model include: trust boundaries between agent and each MCP server, the integrity of tool descriptions and responses, the confidentiality of arguments passed to tools, the availability of the policy enforcement layer, and the authenticity of server identities. Each of these boundaries is a potential attack surface.
A complete threat model maps each identified threat to specific controls. Prompt injection maps to infrastructure-level policy enforcement. Data exfiltration maps to argument validation and output filtering. Privilege escalation maps to tool allowlists and least privilege. This mapping ensures that every identified risk has a corresponding mitigation — and reveals gaps where controls are missing.
HOW POLICYLAYER USES THIS
Intercept is the primary control for many threats identified in an agent threat model. Tool access threats are mitigated by allowlists and denylists. Argument-based threats are mitigated by validation conditions. Volume-based threats are mitigated by rate limiting. Detection gaps are addressed by audit logging. Infrastructure failures are addressed by fail-closed design. When building a threat model for an MCP deployment, Intercept should be mapped as the control for tool-layer threats, with its specific policy configuration documented for each mitigated risk.