MCP Security
The Model Context Protocol ships with no built-in authorisation, no rate limits, no spending controls, and no audit trail. The public MCP ecosystem has grown past 10,000 servers since its introduction, with millions of SDK downloads a month across Python, TypeScript, Java, and Rust. Security is each deployment's responsibility — and most deployments get it wrong.
This is the canonical reference for how MCP fails in production and what it takes to run it safely. It covers the attack surface, the documented classes of vulnerability, how PolicyLayer classifies MCP tools by risk (currently 21,783 tools across 1,537 servers in our catalogue), and the enforcement architecture that works regardless of which agent framework, client, or server you use.
Written for platform engineers, security leads, and CTOs who need to run AI agents against MCP tools in production.
Why MCP is hard to secure
MCP's appeal is exactly what makes it hard to secure: it is a uniform protocol that lets any agent call any tool exposed by any server. Authentication is handled by the transport; authorisation is not defined by the protocol at all. Every MCP server decides for itself what its tools do, who can call them, under what conditions — and most servers make no decision at all.
The protocol has no opinion on authorisation
Read the current MCP specification (2025-11-25) and you will find detailed sections on transport, message format, and OAuth. You will not find a section titled "authorisation" that defines which caller can invoke which tool under which conditions. That decision is deferred to implementers. In practice, deferred decisions mean absent decisions: most MCP servers either authenticate and then allow every tool, or do not authenticate at all. A 2026 survey found that 53% of production MCP servers rely on static API keys and only 8.5% use OAuth — the remaining 38.5% use no authentication whatsoever.
The agent cannot be trusted to self-enforce
Prompt-based guardrails — "do not call delete_repository", "only use this tool for X" — live inside the model's context. A sufficiently capable agent can reason around them: the original prompt injection paper (Greshake et al., 2023) established that instructions in retrieved data can override system prompts. Production incidents have confirmed this repeatedly: the Invariant Labs GitHub MCP disclosure (May 2025) showed an agent following instructions embedded in a GitHub issue, and the Supabase Cursor incident (July 2025) showed an agent chaining private reads with public writes under a single legitimate session. The agent was not compromised. It was following instructions it received through a trusted channel.
Tools run with inherited privileges
MCP servers typically run as child processes with the user's full filesystem, environment variables, and network access. Any credential in the host's environment is readable by the server. Any file the user can read, the server can read. When a server is compromised — through a malicious package, a rug pull, or a supply chain attack — the blast radius is not "the MCP tools" but "everything the user can do on that machine". The April 2026 Ox Security disclosure formalised this as an architectural property of the MCP SDKs, not a bug: arbitrary command execution during server launch is intended behaviour.
What MCP actually exposes
The attack surface of an MCP deployment is larger than most threat models account for. It is not just "the tools" — it is the tools, the credentials that authorise those tools, and the transport flows that carry the calls.
Tool calls
The primary surface. Every tool is a function the agent can invoke with arguments it chooses. An agent with access to execute_query can read any row the database user can read. An agent with delete_repository can remove any repo the token authorises. The specific tools exposed determine the ceiling of what any compromise can achieve. Browse the classified tool catalogue or filter by severity to see the distribution.
Server credentials
Each MCP server connects to some upstream system — a database, a Git host, a payment rail — using credentials the operator has configured. Those credentials sit in environment variables or on disk. When the server is compromised, the credentials are the prize: they grant durable access to the upstream system long after the attack is over. Scoped tokens, short expiries, and least-privilege access keys all matter more in an MCP context than in a typical server deployment. Browse the policy library for server-specific credential posture recommendations.
Transport flows
MCP supports stdio (local child process), Server-Sent Events (SSE — legacy), and Streamable HTTP. Each transport has its own attack surface: stdio is vulnerable to command injection during launch, HTTP is vulnerable to session hijacking if session identifiers are predictable or unbound to the authenticated principal. Four CVEs have been assigned against MCP SDK HTTP implementations in 2025–2026.
The four categories of MCP risk
The MCP Attack Database organises 18 documented attack classes into four categories. Each category maps to a distinct layer of the stack where the failure happens, and each responds to a different defensive pattern.
Protocol-level attacks
Flaws in the transport or the protocol itself — authentication bypasses, command injection during server launch, poisoned tool metadata, typosquatted packages, token mis-redemption across servers. These live below the application layer; no amount of careful tool design inside an MCP server saves a deployment from a broken protocol boundary. CVE-2026-33032 (MCPwn) is the exemplar: a single missing middleware call let unauthenticated attackers invoke every tool nginx-ui exposed. See all protocol-level attacks →
Agent behaviour attacks
Attacks that exploit how the agent reasons about tool calls — destructive autonomy, runaway loops, confused deputies, prompt injection via tool results, indirect prompt injection. Each individual call is authorised; the damage emerges from composition, speed, or absent upper bounds. Access control alone cannot stop these; policy needs to reason about sequences of calls, not just single calls. See all agent behaviour attacks →
Credential and data attacks
Attacks that target credentials, sessions, and sensitive data flowing through MCP tool calls — privilege escalation via over-scoped tokens, data exfiltration via tool chaining, credential leaks via verbose error messages, HTTP session hijacking. The common thread: an agent is a reliable indirect channel for whatever the server sees. See all credential and data attacks →
Supply chain attacks
Attacks on the MCP server ecosystem itself — compromised npm/PyPI packages, typosquatted servers, rug pulls, backdoored community servers. Sixteen thousand servers are published across unofficial registries with no identity verification and no required signing. This is the cheapest class of attack in the stack: publish, wait, exploit. See all supply-chain attacks →
Known attack classes
18 distinct attack patterns are documented in the MCP Attack Database, each with a verified real-world incident or a reproduced proof of concept. A non-exhaustive tour of the most cited:
MCPwn (CVE-2026-33032) — CVSS 9.8 authentication bypass in nginx-ui's MCP integration. Two HTTP requests gave unauthenticated attackers full control over ~2,600 publicly-reachable nginx servers. Added to VulnCheck's Known Exploited Vulnerabilities list in April 2026.
MCP STDIO command injection — Ox Security disclosed a systemic flaw in Anthropic's official MCP SDKs: pass a malicious command during server launch and it executes regardless of whether the server starts. Affects 150M+ SDK downloads across Python, TypeScript, Java, and Rust. Anthropic declined to patch, calling it "expected behaviour".
Prompt injection via tool results — Invariant Labs demonstrated in May 2025 that instructions embedded in a GitHub issue body would cause the GitHub MCP agent to chain a private repo read with a public issue post, exfiltrating data under a single legitimate session.
Destructive action autonomy — Amazon's Kiro agent, given a task to fix a bug, autonomously decided to delete and recreate an AWS Cost Explorer production environment. Thirteen-hour outage. Replit's agent destroyed the SaaStr production database under near-identical conditions.
MCP rug pull — a server initially exposes benign tools to earn one-time approval, then silently changes tool definitions after the fact. The MCP spec has no mechanism to track tool drift or require re-approval, so agents keep calling tools whose meaning has shifted.
Session hijacking in HTTP MCP — predictable or unbound Mcp-Session-Id headers let attackers attach to another user's SSE stream, read tool I/O, and inject calls. CVEs assigned against the oatpp, Ruby, TypeScript, and Java MCP SDKs.
Browse the full 18-attack database →
Tool risk classification
PolicyLayer runs a continuous classifier over discovered MCP servers. Every tool we catalogue is assigned a capability category (Read, Write, Execute, Destructive, Financial, Other) and a 1–5 severity score. As of 20 April 2026 our catalogue holds 21,783 tools across 1,537 servers — a subset of the broader MCP ecosystem, weighted toward servers published to public registries (official MCP, npm, Smithery, Glama). Servers are added continuously; severity distribution as it stands today:
- Destructive: 1,123 tools — permanently delete or destroy resources (browse)
- Financial: 116 tools — initiate financial transactions (browse)
- Execute: 1,088 tools — trigger processes and side effects (browse)
- Write: 5,177 tools — create or modify resources (browse)
- Read: 14,192 tools — retrieve data without state change (browse)
- Other: 87 tools — auxiliary operations (browse)
Severity browse
Browse by severity rather than capability when you want to see which tools share the same policy recommendation. Critical-risk tools (Destructive + Financial) should be blocked by default. High-risk tools (Execute) require rate limits and argument validation. See all risk levels →
How classification works
Tools are classified by combining static analysis of the tool's declaration (name, description, argument schema) with observed behaviour in the registry data. The classifier produces a capability category and a numeric severity score. Classifications are refreshed on a 15-minute cycle and re-audited weekly. Every classification is attached to an MCP server in the policy library so operators can generate a starting policy YAML for their specific servers.
How to secure MCP
Five principles cover the defensible ground. Each one is expressible as a transport-layer policy rule; each one is designed to survive the agent's attempts to reason around it.
1. Enforce policy at the transport layer, not the agent
Policy evaluation must happen outside the agent's reasoning loop. A proxy between client and server — for MCP tool calls this means an enforcement binary or a hosted gateway — is the only place where the agent cannot see the rules. Inside the agent, anything is negotiable.
2. Default-deny destructive operations
If an operation is irreversible, block it by default. Enable specific destructive tools only through require_approval actions that pause the call until a human signs off. This is the policy that would have saved Amazon thirteen hours and Replit a production database.
tools:
drop_table:
rules:
- action: deny
reason: "Irreversible destructive operation"
delete_repository:
rules:
- action: require_approval
reason: "Destructive action — requires human review"
3. Rate-limit execute and write tools
Even non-destructive tools become dangerous at machine speed. Per-tool rate limits prevent runaway loops from burning through API quotas or flooding downstream systems. Default limits: 10/minute for execute-class tools, 30/minute for writes, 60/minute for reads.
tools:
create_issue:
rules:
- action: allow
rate_limit:
max: 30
window: 60
4. Pin tool manifests and detect drift
Capture the tool schema at approval time and reject any call whose schema has drifted. This neutralises rug-pull attacks where a server changes tool definitions after earning trust. The MCP specification has no built-in primitive for this yet; transport-layer enforcement is where it can be added.
5. Audit every tool call with full arguments
The audit log is the compliance artefact and the forensic record. It should include the caller identity, the tool name, the arguments, the policy decision, the timestamp, and — where relevant — a cryptographic signature that makes the log tamper-evident. Without this, post-incident investigation degrades to guesswork.
Production checklist
Before deploying an MCP-connected agent to production, confirm every item below:
- Every MCP server the agent can reach runs behind a transport-layer policy engine.
- Destructive operations default-deny; explicitly allowed tools require human approval.
- Financial operations require per-transaction spend caps and recipient allowlists.
- Execute-class tools have rate limits enforced per agent session.
- Tool schemas are pinned at approval time; drift triggers re-approval or a block.
- Agent credentials follow least-privilege — no long-lived admin tokens.
- Session identifiers are bound to the authenticated principal (RFC 8707-compliant).
- Error messages from MCP servers are sanitised before re-entering agent context.
- Every tool call produces an immutable audit record.
- A kill switch exists — a single config change that blocks all traffic.
- Incident response playbook names the on-call human for human-approval decisions.
- Policy is reviewed and updated on a schedule as the tool catalogue evolves.
Standards and frameworks
MCP security sits inside a broader regulatory and standards landscape. The most relevant touchpoints:
OWASP Agentic Top 10
Published by the OWASP Gen AI Security Project, the Agentic Top 10 catalogues the ten most significant risk categories for autonomous AI agents. Microsoft's Agent Governance Toolkit (April 2026) claims coverage of all ten; the MCP Attack Database's taxonomy overlaps substantially with it.
MCP specification security requirements
The current MCP specification (2025-11-25) mandates audience-bound tokens via RFC 8707 Resource Indicators and prohibits token passthrough between servers. Future revisions are expected to address audit trails and SSO-integrated auth per the 2026 MCP Roadmap.
RFC 8707 — OAuth 2.0 Resource Indicators
Binds an OAuth access token to a specific resource server so tokens cannot be replayed across MCP servers. Introduced in the MCP spec's 2025-06-18 revision and made mandatory in 2025-11-25. Client conformance is still uneven, so operators should treat cross-server token reuse as a present risk until clients catch up.
NIST AI Risk Management Framework
The NIST AI RMF covers agent-specific risk management in its adaptations. Transport-layer enforcement maps cleanly to the Govern, Map, Measure, and Manage functions — the audit log produced by policy evaluation is the evidence artefact these frameworks require.
The MCP security ecosystem
The MCP security category emerged through 2025 and 2026 as production deployments hit predictable walls. Several approaches have taken shape.
Where PolicyLayer fits
PolicyLayer operates at the MCP transport layer. Our open-source Intercept proxy evaluates every tool call against YAML-defined policy before execution. The hosted gateway runs the same engine for teams that prefer managed infrastructure. The control plane dashboards visibility across fleets of MCP-connected agents. The classifier feeds the tool catalogue that makes this page possible.
Complementary approaches
Framework-native governance (agent framework hooks, callback handlers) handles risks that live inside the agent's reasoning loop but cannot cover MCP servers outside the operator's control. Identity infrastructure (SSO, SCIM, enterprise identity providers) solves authentication but not authorisation on individual tool calls. Observability tools (LLM tracing, token metering) produce visibility without enforcement. Each layer addresses a different gap; none substitutes for transport-layer enforcement.
The architectural consensus is forming. Cloudflare's April 2026 reference architecture for enterprise MCP deployment validates the transport-layer pattern: centralised MCP server portals with aggregated policy, OAuth-backed auth, and shadow-MCP detection at the network layer. Microsoft's Agent Governance Toolkit (April 2026) ships a sub-millisecond enforcement engine integrated with agent frameworks. Both point to the same conclusion: policy has to live outside the agent and outside the server. PolicyLayer is built on that premise, with the additional guarantees that the enforcement engine runs anywhere (not just one vendor's cloud), the policy language supports stateful rules (rate limits, spend caps, human approval, not just allow/deny), and the catalogue of classified tools is public and reusable across deployments.
What the protocol will and won't fix
The MCP specification is actively evolving. RFC 8707 landed. Audit trail primitives, SSO-integrated auth, and gateway behaviour are on the 2026 roadmap. But architectural properties that exist by design — STDIO command execution, agents' inability to self-enforce, tool manifests that can drift after approval — will not be solved by protocol changes. Those have to live outside the protocol.
Frequently asked questions
Is MCP safe for production?
Out of the box, no. The Model Context Protocol ships with no built-in authorisation, no rate limits, no spend caps, and no audit trail. Every production deployment needs an enforcement layer added on top — either a policy engine in front of every MCP server, or comprehensive controls inside the agent framework. Once that layer is in place, MCP can run safely at scale.
What's the biggest MCP security risk?
Destructive action autonomy — agents with write or delete permissions on production systems making irreversible decisions faster than a human can review. The Amazon Kiro incident (December 2025) is the canonical example: a coding agent deleted and recreated an AWS Cost Explorer environment, causing a 13-hour outage. Any production MCP deployment with destructive tools enabled without approval gates shares the same exposure.
How do I prevent prompt injection in MCP?
You can't fully prevent it — you have to contain it. Prompt guardrails in the system prompt can always be reasoned around by a capable agent. Transport-layer enforcement cannot. Block destructive tools by default, rate-limit writes, require approval on external sends, validate tool arguments before forwarding. The injection may still happen; the damage is bounded by policy.
Can agents bypass MCP policy enforcement?
They can bypass enforcement inside their own context (prompt guardrails, framework hooks they can observe) because a sufficiently capable agent can reason around constraints in its reasoning loop. They cannot bypass enforcement at the transport layer, because the evaluator runs outside the agent's reasoning process and rejects the call before execution. That's why the architectural boundary matters.
What's the difference between prompt guardrails and transport enforcement?
Prompt guardrails are instructions to the model ("do not call delete_repository"). They live inside the agent's context and depend on the model's compliance. Transport enforcement is a proxy that sits between the agent and the MCP server, evaluating every tool call against a deterministic policy. The agent never sees the rules; it only sees the allow/deny decision. One is probabilistic, the other is guaranteed.
Does MCP support OAuth?
Yes. The MCP specification defines an OAuth flow for HTTP transports, and the 2025-11-25 revision mandates RFC 8707 Resource Indicators so tokens cannot be replayed across servers. STDIO transports typically use static credentials (API keys, service account tokens). Client conformance is still uneven, so operators should treat cross-server token reuse as a present risk.
Can I audit every MCP tool call?
Only if something in the path is recording them. MCP itself does not emit an audit trail. A transport-layer proxy is the natural place to produce one — it sees every call, every argument, every decision, and can emit an immutable log with cryptographic signing if required for compliance.
What's the OWASP Agentic Top 10?
A framework published by OWASP's Gen AI Security Project cataloguing the ten most significant risk categories for autonomous AI agents — including token mismanagement, prompt injection, tool misuse, and privilege escalation. Microsoft's Agent Governance Toolkit (April 2026) is notable for claiming coverage of all ten. The categories overlap substantially with the MCP Attack Database taxonomy.
How do I block destructive operations by default?
Classify every tool on your MCP servers into risk categories (Read, Write, Execute, Destructive, Financial). In your policy engine, set a default-deny rule for the Destructive and Financial categories. Enable specific destructive tools only with a require_approval action, which pauses the call until a human explicitly approves. PolicyLayer's Intercept proxy supports this with a few lines of YAML.
Is there a standard MCP threat model?
Not yet formalised by the MCP project. The closest industry references are OWASP Agentic Top 10 and the MCP Attack Database maintained here at PolicyLayer. The 2026 MCP Roadmap explicitly names threat model formalisation as an enterprise-readiness priority, alongside audit trails and SSO-integrated auth.
Secure your MCP deployment
One command scans your MCP config and generates enforcement policies for every server in it.
npx -y @policylayer/intercept init