// PolicyLayer Research · June 2026 edition

The State of MCP Security

What 2,031 MCP servers can actually do to your systems.

We classified every tool on every Model Context Protocol server we could enumerate from the public registries — 31,000 tools across 2,031 working servers. The data shows an ecosystem that hands AI agents wide, dangerous, and almost entirely unannounced control over the systems they touch.

42% of MCP servers expose a tool that destroys data or executes commands
94% probability that a five-server stack exposes such a tool. 99% at ten.
96.1% of MCP tools don't warn the agent about destructive behaviour
49% of MCP servers that touch money also expose tools that destroy data
Dataset snapshot 1 June 2026 · updated monthly. Methodology in the appendix.

Six things moved meaningfully since last month.

  • Dataset size, +13.7%: servers with a parseable tool list grew from 1,787 to 2,031.
  • Risk shift, +81.9%: SmartBear MCP's risk score moved from 30.81 to 56.03.
  • New entrant, top 10: AdButler entered the ten riskiest servers (risk score 172.54).
  • New entrant, top 10: Arcane entered the ten riskiest servers (risk score 48.84).
  • New entrant, top 10: io.github.JXUE0/opencut-controller entered the ten riskiest servers (risk score 47.56).
  • New entrant, top 10: AWS Bedrock AgentCore MCP Server entered the ten riskiest servers (risk score 36.79).

Risk score is a server's tool count multiplied by the average risk weight of its tools — it climbs with both breadth and danger, so a server ranks highly only when it exposes many tools and those tools skew destructive. A change is flagged here when destructive share shifts ±3 points, the dataset size shifts ±10%, any named server's risk score shifts ±10%, or a server newly enters the ten riskiest.
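The arithmetic behind the risk score and the change flag can be sketched in a few lines of Python. This is an illustration only: the per-tool weights are invented, the function names are hypothetical, and reading "shifts ±10%" as a relative change of at least 10% is an assumption. Note that tool count multiplied by average weight is the same number as the sum of the per-tool weights.

```python
from statistics import mean

def risk_score(tool_weights: list[float]) -> float:
    """Tool count multiplied by the average risk weight of the tools."""
    if not tool_weights:
        return 0.0
    return len(tool_weights) * mean(tool_weights)

def score_shift_flagged(previous: float, current: float, threshold: float = 0.10) -> bool:
    """True when a score moved by at least `threshold` relative to its old value.

    Assumption: the report's "±10%" is a relative change, not absolute points.
    """
    if previous == 0:
        return current != 0
    return abs(current - previous) / previous >= threshold

# Hypothetical weights: 0.0 is read-only, higher is more dangerous.
example_weights = [0.0, 0.0, 0.3, 0.3, 0.9]
print(risk_score(example_weights))        # same value as sum(example_weights)
print(score_shift_flagged(30.81, 56.03))  # SmartBear MCP figures quoted above
```

Because the score is effectively a sum, a server with many mildly risky tools can outrank a small server with one severe tool, which is why the report describes it as breadth and danger combined.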

1. What MCP servers actually do

Every tool we found was classified into one of six risk categories: read-only, write, execute, destructive, financial, or other. The chart below shows how many of the 2,031 servers in our dataset expose at least one tool in each category.

Category Servers (share of 2,031)
Read 1,888 (93%)
Write 1,087 (53.5%)
Execute 600 (29.5%)
Destructive 508 (25%)
Other 71 (3.5%)
Financial 70 (3.4%)

Servers usually expose tools in multiple categories — an integration that lists, creates, and deletes records lands in Read, Write and Destructive simultaneously. Percentages are of the 2,031 servers in the dataset.
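A small Python sketch of how a per-category server count like this can be derived from per-tool labels. The server names and labels below are hypothetical, not rows from the dataset; the point is that a server counts once in every category it touches, so the category counts sum to more than the number of servers.

```python
from collections import Counter

# Hypothetical classifier output: server name -> category of each of its tools.
servers = {
    "crm-example": ["read", "read", "write", "destructive"],
    "search-example": ["read"],
    "shell-example": ["execute", "read"],
}

def servers_per_category(servers: dict[str, list[str]]) -> Counter:
    """Count servers exposing at least one tool in each category."""
    counts = Counter()
    for categories in servers.values():
        counts.update(set(categories))  # set() so a server counts once per category
    return counts

counts = servers_per_category(servers)
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / len(servers):.1%})")
```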

2. One in four MCP servers can permanently destroy data

508 servers (25%) expose at least one destructive tool — deleting records, dropping tables, wiping indexes, force-pushing branches, removing cloud resources. These are operations that a human operator would normally guard with a confirmation dialog or a four-eyes review. When invoked through MCP, they fire on the model's first decision.

A further 600 servers (29.5%) can execute arbitrary commands: shell, scripts, container exec, SQL with no read-only enforcement. The two groups overlap, so the combined figure is 42.2% rather than the sum of the two: roughly four in ten MCP servers give an agent a way to do something it cannot easily undo.

508 MCP servers ship a delete-first tool. Most don't ask twice.

Exposure compounds with every server you add

42.2% of MCP servers expose a destructive or execute tool on their own. Stacking servers is the common case, since an agent rarely connects to just one. If the per-server rate holds independently, the probability that a stack of N servers exposes at least one such tool is 1 − (1 − 0.422)^N. It reaches 93.5% by the fifth server and 99.6% by the tenth.

N Probability of at least one destructive or execute tool
1 42.2%
2 66.6%
3 80.7%
4 88.8%
5 93.5%
6 96.3%
7 97.8%
8 98.8%
9 99.3%
10 99.6%
11 99.8%
12 99.9%
13 99.9%
14 100%

Independence is an approximation — tool overlap between servers makes the true figure slightly lower — but the direction holds: multi-server exposure is the default, not the tail.
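The compounding figures can be reproduced with a short Python sketch. It assumes, as the text does, that each server independently exposes such a tool with probability 0.422; the function name is illustrative.

```python
def stack_exposure(per_server_rate: float, n_servers: int) -> float:
    """Probability that at least one of n independent servers exposes such a tool."""
    return 1 - (1 - per_server_rate) ** n_servers

for n in (1, 2, 5, 10):
    print(f"N = {n}: {stack_exposure(0.422, n):.1%}")
# N = 1: 42.2%
# N = 2: 66.6%
# N = 5: 93.5%
# N = 10: 99.6%
```

Swapping in a different per-server rate, for example one measured on your own server inventory, shows how quickly the curve saturates even when the starting rate is much lower.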

The single most common destructive verb across the dataset is delete: it appears as the first token of 466 tool names. create, update, and delete together form the standard CRUD trio that virtually every "integration" MCP server ships. The protocol provides no separation between them.

3. The average MCP install gives an AI agent 15.5 tools

The median MCP server exposes 8 tools. The mean is 15.5. The 99th percentile exposes 128. The fattest single server is AdButler, which exposes 622 tools to any agent that connects. The full classifier output for every server we scanned is in our public MCP tool catalogue.

Server Tools
AdButler 622
io.fusionauth/mcp-api 314
io.github.aibtcdev/mcp-server 308
Financial Modeling Prep 253
SmartBear MCP 243
Trello 200
Google Super 200
Arcane 180
io.github.alxpark/propresenter-mcp 177
Leaper Vision Toolkit 169

A server exposing 200+ tools is unauditable in practice. No human reads 200 tool descriptions before installing. The model sees them all by default, and a context-window's worth of tool schemas competes with whatever task the user actually asked for.

40×
The fattest MCP server crowds the agent's context with more than 40× as many tool schemas as the average install — before the user has typed a single character of their actual task.

4. Destructive MCP surface is concentrated in a few servers

Most MCP servers are not dangerous. The 25% headline number obscures a much sharper truth: destructive surface is heavily concentrated in a small minority of servers, while the long tail of CRUD-shaped integrations sits in the middle. Three out of every four MCP servers expose zero destructive tools at all.

Destructive tools per server
0 destructive 1,523 (75%)
1–2 destructive 319 (15.7%)
3–5 destructive 113 (5.6%)
6–10 destructive 46 (2.3%)
11+ destructive 30 (1.5%)
13% of all destructive surface in the MCP ecosystem is concentrated in just 5 servers, 0.2% of the dataset. The top 10% of destructive-bearing servers (51 servers, 2.5% of the dataset) hold 43.8%.

A handful of servers carry the lion's share of the protocol's risk. They tend to be the same shape: large integrations — identity providers, project-management platforms, all-in-one cloud SDKs — that ship hundreds of CRUD endpoints with equal policy weight, and one or two of those endpoints turn out to be the kill switch.

The classifier's top 5% on its own accounts for an outsized share of the destructive calls an MCP-connected agent could make. The protocol does not surface this asymmetry to the model. The model sees a flat list of tools, with no hint that some are load-bearing for the host system and some are not.
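A concentration figure of this kind is a top-k share: sort servers by destructive tool count and divide the top k's total by the overall total. The Python sketch below uses made-up counts, not the report's data, and the function name is hypothetical.

```python
def top_k_share(destructive_counts: list[int], k: int) -> float:
    """Fraction of all destructive tools held by the k servers with the most."""
    total = sum(destructive_counts)
    if total == 0:
        return 0.0
    top = sorted(destructive_counts, reverse=True)[:k]
    return sum(top) / total

# Hypothetical per-server destructive tool counts.
counts = [40, 12, 5, 2, 1, 0, 0, 0, 0, 0]
print(f"Top 1 of {len(counts)} servers: {top_k_share(counts, 1):.1%} of destructive tools")
```

Running the same calculation over the servers in your own configuration shows which one or two installs deserve policy attention first.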

The named riskiest ten

Ranked by risk score — tool count weighted by average per-tool risk. These are the servers carrying the most concentrated destructive surface in the dataset.

# Server Tools Destructive Risk score
1 AdButler 622 105 172.54
2 io.fusionauth/mcp-api 314 41 106.08
3 SmartBear MCP 243 19 56.03
4 io.github.Antonytm/mcp-sitecore-server 153 24 48.95
5 Arcane 180 21 48.84
6 io.github.JXUE0/opencut-controller 161 22 47.56
7 io.github.alxpark/propresenter-mcp 177 14 45.01
8 Trello 200 31 42.62
9 AWS Bedrock AgentCore MCP Server 122 15 36.79
10 Github Mcp Server Sls C4d5e6f7 A8b9 4012 B345 456789012345 138 21 36.6

5. When MCP servers touch money, nearly half can also destroy data

Only 70 MCP servers in our dataset expose financial tools — payments, transfers, wallet operations. They are rare. They are also the cohort with the highest combined risk in the entire ecosystem.

48.6% of MCP servers that touch money also expose destructive tools (34 of 70 servers).
72.9% also expose destructive or arbitrary-execute tools (51 of 70 servers).

Across the 34 servers that combine financial and destructive surface, an agent connecting to a single one of them gets, on average, 2.9 ways to destroy data and 2.5 ways to move money. The single worst dual-risk server gives the agent 12 destructive tools and 17 financial tools — 29 ways to either break things irreversibly or move money — in a single MCP install.

72.9% of MCP servers that touch money also let an agent either destroy data or run a command on the host. The model has to pick the right tool, every time, from descriptions that 96.1% of the time don't warn it about consequences.

The named dual-risk servers

Every server below exposes both financial and destructive tools in a single install. An agent connecting to one of them can move money and delete records without changing context.

Server Tools Destructive Financial
io.github.aibtcdev/mcp-server 308 12 17
Agent Passport System — Cryptographic Identity for AI Agents 150 3 1
xdevplatform/xmcp 135 12 2
hiveagent 122 1 6
Lichess Integration 90 3 3
io.github.EmperorMew/voidly-mcp-server 84 3 1
AWS IoT SiteWise MCP Server 72 5 2
MERX - TRON Resource Exchange 66 2 6
Helius 63 1 3
Indigo Protocol MCP 62 3 2
io.github.IndigoProtocol/indigo-mcp 59 3 2
Kosyak Evm 50 1 2
WooCommerce Store Manager 47 4 1
Clareo 45 2 1
Linear 42 3 1
Midnight + Next.js MCP 35 4 1
io.github.NyxToolsDev/quickbooks-mcp-server 34 1 2
Agent0 34 2 1
Name Whisper 34 2 2
AgentPact 32 2 1
Linear MCP Server 32 1 1
PayPal 30 2 3
Jobly — Agent-to-Agent Contract Marketplace 29 2 1
Lunch Money 29 5 1

5.5 Deep dive: the Stripe MCP

Stripe's MCP server exposes 27 tools to any agent that connects. 4% of them are classified destructive and 11% touch money directly. Ranked by risk weight, the three highest are create_refund, finalize_invoice, cancel_subscription. One MCP install hands all of them to the model as a flat list.

What it can move

3 of Stripe's tools are financial — the calls that move balances, charges, refunds, payouts, and transfers. An agent with the server connected can invoke any of them directly, with whatever arguments it infers from the request. In policy terms these are the operations that take money out of the account.

create_refund, finalize_invoice, create_payment_link

What it can destroy

1 tool is classified destructive, the category that covers deletes, cancellations, and voids that the same API cannot reverse. It carries no warning language the model reads before calling; the category is inferred from the verb in the tool name, not declared by the server.

cancel_subscription

What a deny-by-default policy looks like

A deny-by-default posture starts every Stripe tool denied and allows back only the read paths an agent needs — listing charges, retrieving a customer, reading a balance. The 4 destructive and financial tools stay denied unless a policy grants them explicitly, and the ones that are granted route through an approval gate rather than firing on the model's first decision. A worked example is published at policylayer.com/policies/stripe.
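As a rough illustration of the posture described above, here is a minimal deny-by-default evaluator in Python. It is a sketch under stated assumptions, not PolicyLayer's policy format: the `Decision` enum, the `evaluate` function, and the three read-tool names are hypothetical placeholders. Only `create_refund` and `cancel_subscription` are tool names taken from this report.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"

# Read paths allowed back in. These tool names are hypothetical placeholders.
ALLOWED_READS = {"list_charges", "retrieve_customer", "retrieve_balance"}

# Risky tools a policy has granted explicitly; they still go through approval.
APPROVAL_GATED = {"create_refund"}

def evaluate(tool_name: str) -> Decision:
    """Deny by default; allow listed reads; gate explicit grants behind approval."""
    if tool_name in ALLOWED_READS:
        return Decision.ALLOW
    if tool_name in APPROVAL_GATED:
        return Decision.REQUIRE_APPROVAL
    return Decision.DENY  # anything not named above, including new tools

for name in ("list_charges", "create_refund", "cancel_subscription"):
    print(name, "->", evaluate(name).value)
```

The useful property is the final `return`: a tool the server adds next month is denied until someone decides otherwise, rather than allowed until someone notices.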

Stripe's MCP is well-built; the point is not that it is unusually dangerous. The point is that the server cannot know which agent should be allowed to issue a refund. That decision belongs to the control plane in front of it.

5.6 Some MCP servers expose no read-only tools

44 servers (2.6% of the 1,705 servers with three or more tools) expose no read-only tool — every tool they ship mutates state. You cannot connect such a server in observe-only mode; installing it grants write access or worse from the first call.

Server Tools Destructive Execute Financial
io.github.discourse/mcp 43 0 0 0
io.github.daedalus/mcp-numpy 29 1 0 0
io.github.antvis/mcp-server-chart 27 0 0 0
Contracts 25 0 2 0
io.github.aryanduntley/aifp 24 0 1 0
Mcp Products 14 7 4 0
AWS AppSync MCP Server 10 0 0 0
io.github.Dave-London/build 9 0 8 0
aaaa-nexus 9 0 8 0
OpenSCAD 8 0 0 0
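The check behind this cohort is simple enough to sketch in Python. The category label "read" and the function name are assumptions for illustration; the three-tool minimum follows the text above.

```python
def has_no_read_only_tool(categories: list[str], min_tools: int = 3) -> bool:
    """True for a server with at least `min_tools` tools, none classified as read."""
    return len(categories) >= min_tools and "read" not in categories

# Hypothetical per-tool category labels for two servers.
print(has_no_read_only_tool(["write", "write", "execute"]))  # no observe-only path
print(has_no_read_only_tool(["read", "write", "write"]))     # has a read path
```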

6. Official MCP registries are not noticeably safer

A common assumption is that "official" MCP listings are curated and therefore safer. The data does not support it. Average risk weight per tool varies only modestly between sources (0.197 to 0.32), and seed-listed servers (those originally added by hand to bootstrap the ecosystem) are actually the highest-risk cohort by that measure.

Source Servers Tools Avg risk % destructive % execute
crawler 3,331 14,172 0.234 6.3% 6.1%
smithery 818 9,949 0.197 4.1% 3.5%
seed 350 5,819 0.32 6.7% 6.2%
user_scan 80 148 0.205 8.1% 3.4%

Every registry leaves risk evaluation to the developer installing the server. None of them gate on tool category, parameter danger, or the presence of unconfirmed write paths. Listing is curation only by name.

7. Two of the six most common MCP verbs are destructive

The MCP ecosystem speaks one language: CRUD. Across 31,000 tools, the four most common verbs after get and list are create, search, update, and delete. Two of the top six are mutations the model cannot undo. The protocol provides no separation between any of them.

get_* 4,893
list_* 1,371
create_* 858
update_* 471
delete_* 466
check_* 277
generate_* 273
arcane_* 180
add_* 167
set_* 152
analyze_* 132

delete_* appears 466 times. That is roughly one destructive-named tool for every five servers in the dataset, before counting tools that are destructive without using the word ("drop", "remove", "wipe", "purge"). Verb shape is the cheapest signal a client could act on; nothing in MCP requires clients to use it, so they don't.
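To show how cheap the verb-shape signal is, here is a minimal Python sketch a client could run over a tool list. The verb set comes from the words named above; the tokenisation rule (split on underscores, hyphens, dots, whitespace, and camelCase boundaries) and the function names are assumptions, not the report's classifier.

```python
import re

DESTRUCTIVE_VERBS = {"delete", "drop", "remove", "wipe", "purge"}

def leading_verb(tool_name: str) -> str:
    """Lower-cased first token of a tool name, or "" if there is none."""
    # Insert a space at camelCase boundaries, e.g. "dropTable" -> "drop Table".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", tool_name)
    tokens = [t for t in re.split(r"[\s_\-.]+", spaced) if t]
    return tokens[0].lower() if tokens else ""

def looks_destructive(tool_name: str) -> bool:
    """True when the tool name starts with a destructive verb."""
    return leading_verb(tool_name) in DESTRUCTIVE_VERBS

for name in ("delete_rows", "list_rows", "dropTable", "purge-cache"):
    print(name, looks_destructive(name))
```

A name-based check like this misses tools whose danger sits in their parameters, so it is a first filter rather than a classification.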

8. MCP tools don't brief the agent. 96.1% give no warning at all.

MCP tool descriptions go directly into the model's context as the only briefing it gets. We searched all 31,000 classified tool descriptions for warning language — "irreversible", "permanent", "cannot be undone", "destroys", "wipes", "deletes", "drops", "purges". Only 1,208 tools (3.9%) contain any of those phrases.

The other 96.1% rely on the model inferring danger from the verb in the tool name. For a request like "clean up duplicate rows", an agent given fifty CRUD tools and no warnings will pick the one whose name matches the verb. delete_rows is the obvious match. There is no semantic signal that distinguishes it from list_rows.
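The warning-language search can be approximated with a substring check over each description. The phrase list below is the one quoted above; case-insensitive substring matching is an assumption, since the report does not state its exact matching rule, and the function name is illustrative.

```python
WARNING_PHRASES = (
    "irreversible", "permanent", "cannot be undone",
    "destroys", "wipes", "deletes", "drops", "purges",
)

def has_warning(description: str) -> bool:
    """True when a tool description contains any of the warning phrases."""
    text = description.lower()
    # Substring match, so "permanent" also catches "permanently".
    return any(phrase in text for phrase in WARNING_PHRASES)

# Hypothetical tool descriptions.
print(has_warning("Permanently deletes the row. This cannot be undone."))
print(has_warning("Remove rows matching a filter."))
```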

A further 15% of servers (304) accept parameters whose names imply filesystem paths or shell command strings — path, filename, command, script, exec, stdin. These tools provide direct write or execution surfaces against the host the server runs on, regardless of whether they are classified as destructive. The wider catalogue of documented MCP attack patterns shows how prompt injection, tool poisoning, and supply-chain compromise convert these surfaces into incidents.
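A parameter-name check of this kind can be sketched against a tool's input schema, which in MCP is a JSON Schema object with a `properties` map. The name set is the one listed above; exact, case-insensitive matching on top-level property names is an assumption, and the example schema is hypothetical.

```python
RISKY_PARAM_NAMES = {"path", "filename", "command", "script", "exec", "stdin"}

def risky_params(input_schema: dict) -> set[str]:
    """Top-level schema properties whose names suggest a path or a command string."""
    properties = input_schema.get("properties", {})
    return {name for name in properties if name.lower() in RISKY_PARAM_NAMES}

# Hypothetical input schema for a tool that runs a command in a directory.
schema = {
    "type": "object",
    "properties": {
        "command": {"type": "string"},
        "path": {"type": "string"},
        "timeout_seconds": {"type": "integer"},
    },
}
print(sorted(risky_params(schema)))
```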

9. The trust boundary is the developer's restraint

The MCP specification ships with no built-in authorisation, no rate limits, no spend caps, and no audit trail. Servers expose whatever their authors decided to expose, in whatever shape, with whatever description. Clients pass tool lists to models with no enforced filter. Models call tools with whatever arguments they think appropriate. The trust boundary is the developer's restraint when they write the server.

This dataset puts numbers on the consequences:

  • One in four MCP servers can delete or destroy data.
  • Nearly three in ten can execute arbitrary commands on their host.
  • The average install hands the agent 15.5 tools, often more than 30.
  • 3.9% of tools warn the model about what they do. The other 96.1% don't.
  • Official, semi-official and community registries show no meaningful risk gap.

Most teams would not ship an internal platform where every endpoint is unauthenticated and uncategorised, where 1 in 4 services can delete production data, and where 96.1% of endpoints have no documentation about side effects. That is the MCP ecosystem today. Whether your agent runs on it is a control-plane decision, not a server-author decision.

The fix is not to ban destructive tools. The fix is enforcement at the transport layer: every tool call evaluated against a deterministic policy before it reaches the server. For the broader picture of how the protocol breaks under production conditions, see the canonical MCP security overview.

PolicyLayer is the MCP control plane:

  • A gateway in front of every MCP server in your fleet, with managed OAuth that holds and refreshes upstream tokens transparently.
  • A policy editor that discovers each server's tools so you can gate by category — destructive, financial, execute — instead of by tool name.
  • Scoped per-agent grants, decoupled from user identity, so revoking one agent doesn't break the rest of your stack.
  • A per-call audit log keyed to the grant that made the call, with full arguments, outcome, and latency.

One install. Every server. Scan your config to get the same picture this report shows, but for your stack — in 30 seconds.

Methodology

PolicyLayer maintains a continuously-updated catalogue of MCP servers harvested from the official Model Context Protocol registry, npm, Smithery, and Glama. For each server we attempt to extract its tool list through one of three paths:

  1. Static analysis — grep the published npm tarball for tool definitions.
  2. README extraction — parse README for tool tables and code blocks.
  3. Live execution — spawn the server via npx in a sandboxed container and read its tools/list response.

The 2,031 servers in this report are those for which at least one path produced a parseable tool list. Tools are classified into six risk categories (Read, Write, Execute, Destructive, Financial, Other) using a verb-based classifier with input-schema heuristics. 74.5% of tool classifications are marked high-confidence, 12.5% verified.

Risk weights are floats from 0.0 (read-only) to 1.0 (destructive financial). A server's risk score is its tool count multiplied by the average risk weight of its tools, so a server scores highly only when it exposes many tools and those tools skew dangerous — it is a measure of total exposed surface, not per-tool severity. The full classified catalogue — one row per server, one row per tool — is published as an open dataset on Hugging Face under CC-BY-4.0: huggingface.co/datasets/PolicyLayer/mcp-server-catalogue. Loadable via load_dataset("PolicyLayer/mcp-server-catalogue"). Methodology questions or custom cuts: research@policylayer.com.

Limitations. The dataset only covers servers reachable through public registries; private and self-hosted servers are not included. Tool-level classification can mislabel ambiguous verbs ("update" can be safe or destructive depending on parameters); the confidence breakdown above surfaces these. Some registry-listed servers were unreachable through our scan pipeline and are excluded from the figures here; the dataset is therefore a lower bound on the real ecosystem.

Let agents act without letting them run wild.

Route your MCP servers through PolicyLayer and every tool call is checked against your policy before it runs — allow, deny, or require approval. Per-identity grants. Full audit log. Live in minutes.

Free to start. No card required.

4,600+ MCP servers and 31,000+ tools scanned and risk-classified.
