The State of MCP Security
What 2,031 MCP servers can actually do to your systems.
We classified every tool on every Model Context Protocol server we could enumerate from the public registries — 31,000 tools across 2,031 working servers. The data shows an ecosystem that hands AI agents wide, dangerous, and almost entirely unannounced control over the systems they touch.
Six things moved meaningfully since last month.
Risk score is a server's tool count multiplied by the average risk weight of its tools. It climbs with both breadth and danger, so a server ranks highly only when it exposes many tools and those tools skew destructive. A month-over-month change is flagged when the destructive share shifts by ±3 percentage points, the dataset size shifts by ±10%, any named server's risk score shifts by ±10%, or a server newly enters the ten riskiest.
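Tool count times average weight reduces algebraically to the sum of per-tool weights, so a server's score is simply the total risk it exposes. The Python sketch below shows the score and the flagging rules. It assumes per-tool weights are available as floats and that a shift exactly equal to a threshold counts. The `Snapshot` structure and its field names are hypothetical, not PolicyLayer's schema.

```python
from dataclasses import dataclass, field


def risk_score(weights: list[float]) -> float:
    """Tool count times mean per-tool risk weight (algebraically, the sum of weights)."""
    if not weights:
        return 0.0
    mean = sum(weights) / len(weights)
    return len(weights) * mean


@dataclass
class Snapshot:
    # Hypothetical monthly snapshot; the field names are illustrative only.
    destructive_share: float  # % of servers with at least one destructive tool
    server_count: int         # servers in the dataset
    scores: dict[str, float] = field(default_factory=dict)  # server -> risk score


def top_ten(scores: dict[str, float]) -> set[str]:
    return set(sorted(scores, key=scores.get, reverse=True)[:10])


def flagged_changes(prev: Snapshot, curr: Snapshot) -> list[str]:
    """Apply the thresholds; a shift equal to the threshold counts (an assumption)."""
    flags = []
    if abs(curr.destructive_share - prev.destructive_share) >= 3.0:
        flags.append("destructive share")
    if abs(curr.server_count - prev.server_count) >= 0.10 * prev.server_count:
        flags.append("dataset size")
    for name, score in curr.scores.items():
        old = prev.scores.get(name)
        # Only servers present (with a non-zero score) in both snapshots are compared.
        if old and abs(score - old) >= 0.10 * old:
            flags.append(f"risk score: {name}")
    for name in sorted(top_ten(curr.scores) - top_ten(prev.scores)):
        flags.append(f"new in top ten: {name}")
    return flags
```

Read in reverse, AdButler's score of 172.54 over 622 tools implies an average weight of about 0.277 per tool.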
1. What MCP servers actually do
Every tool we found was classified into one of six risk categories: read-only, write, execute, destructive, financial, or other. The chart below shows how many of the 2,031 servers in our dataset expose at least one tool in each category.
Servers usually expose tools in multiple categories — an integration that lists, creates, and deletes records lands in Read, Write and Destructive simultaneously. Percentages are of the 2,031 servers in the dataset.
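A server counts once in every category it touches, which is why the percentages add up to more than 100%. The sketch below assumes each server is given as a list of per-tool category labels. The input shape and server names are illustrative.

```python
from collections import Counter

CATEGORIES = ("read", "write", "execute", "destructive", "financial", "other")


def category_coverage(servers: dict[str, list[str]]) -> dict[str, float]:
    """Percent of servers exposing at least one tool in each category."""
    if not servers:
        return {c: 0.0 for c in CATEGORIES}
    counts = Counter()
    for tool_categories in servers.values():
        # set() collapses repeats, so a server counts at most once per category,
        # but it can count toward several categories at once.
        counts.update(set(tool_categories))
    return {c: 100.0 * counts[c] / len(servers) for c in CATEGORIES}
```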
2. One in four MCP servers can permanently destroy data
508 servers (25%) expose at least one destructive tool — deleting records, dropping tables, wiping indexes, force-pushing branches, removing cloud resources. These are operations that a human operator would normally guard with a confirmation dialog or a four-eyes review. When invoked through MCP, they fire on the model's first decision.
Another 600 servers (29.5%) can execute arbitrary commands — shell, scripts, container exec, SQL with no read-only enforcement. Combine the two: roughly four in ten MCP servers give an agent a way to do something it cannot easily undo.
Exposure compounds with every server you add
42.2% of MCP servers expose a destructive or execute tool on their own. Stacking servers is the common case: an agent rarely connects to just one. If the per-server rate holds independently, the probability that a stack of $N$ servers exposes at least one such tool is $1 - (1 - 0.422)^N$. It passes 93.5% by the fifth server and 99.6% by the tenth.
Independence is an approximation — tool overlap between servers makes the true figure slightly lower — but the direction holds: multi-server exposure is the default, not the tail.
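The compounding figure is one line of arithmetic. This sketch reproduces the numbers under the same independence assumption, using the 42.2% per-server rate.

```python
def stack_exposure(per_server_rate: float, n_servers: int) -> float:
    """P(at least one of n independent servers exposes a destructive or execute tool)."""
    return 1.0 - (1.0 - per_server_rate) ** n_servers


for n in (1, 3, 5, 10):
    # prints 42.2%, 80.7%, 93.5%, 99.6% for n = 1, 3, 5, 10
    print(n, f"{stack_exposure(0.422, n):.1%}")
```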
The single most common destructive verb across the dataset is delete: it appears as the first token of 466 tool names. Together with create, update, and a get or list for reads, it makes up the standard CRUD set that virtually every "integration" MCP server ships. The protocol enforces no separation between them.
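Counting verbs this way needs only the tool name. The sketch below assumes names are snake_case, kebab-case, or camelCase and that the verb is the first token. A tool whose name starts with its server's name would be counted under that prefix instead.

```python
import re
from collections import Counter


def first_token(tool_name: str) -> str:
    """Leading word of a tool name, e.g. 'delete' from 'delete_rows' or 'deleteRows'."""
    # Insert '_' at camelCase boundaries, then split on any non-alphanumeric run.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", tool_name)
    return re.split(r"[^A-Za-z0-9]+", snake.strip("_-. "))[0].lower()


def verb_counts(tool_names: list[str]) -> Counter:
    return Counter(first_token(name) for name in tool_names)
```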
3. The average MCP install gives an AI agent 15.5 tools
The median MCP server exposes 8 tools. The mean is 15.5, and the 99th percentile exposes 128. The fattest single server is AdButler, which exposes 622 tools to any agent that connects.
The full classifier output for every server we scanned is in our public MCP tool catalogue.
| Server | Tools |
|---|---|
| AdButler | 622 |
| io.fusionauth/mcp-api | 314 |
| io.github.aibtcdev/mcp-server | 308 |
| Financial Modeling Prep | 253 |
| SmartBear MCP | 243 |
| Trello | 200 |
| Google Super | 200 |
| Arcane | 180 |
| io.github.alxpark/propresenter-mcp | 177 |
| Leaper Vision Toolkit | 169 |
A server exposing 200+ tools is unauditable in practice. No human reads 200 tool descriptions before installing. The model sees them all by default, and a context-window's worth of tool schemas competes with whatever task the user actually asked for.
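The mean is nearly double the median, the signature of a right-skewed distribution: a few very large servers pull the average up. The sketch below uses only the standard library on a plain list of per-server tool counts. The sample values are made up, and percentile conventions differ slightly between tools.

```python
import statistics


def tool_count_summary(counts: list[int]) -> dict[str, float]:
    """Median, mean, and 99th percentile of tools per server."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(counts, n=100, method="inclusive")[98]
    return {
        "median": statistics.median(counts),
        "mean": statistics.fmean(counts),
        "p99": p99,
    }


sample = [3, 5, 8, 8, 9, 12, 20, 60]  # illustrative counts, not the dataset
print(tool_count_summary(sample))
```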
4. Destructive MCP surface is concentrated in a few servers
Most MCP servers are not dangerous. The 25% headline obscures a much sharper pattern: destructive surface is heavily concentrated in a small minority of servers, with a long tail of CRUD-shaped integrations in the middle. Three out of every four MCP servers expose no destructive tools at all.
A handful of servers carry the lion's share of the protocol's risk. They tend to be the same shape: large integrations — identity providers, project-management platforms, all-in-one cloud SDKs — that ship hundreds of CRUD endpoints with equal policy weight, and one or two of those endpoints turn out to be the kill switch.
The top 5% of servers by risk score account for an outsized share of the destructive calls an MCP-connected agent could make. The protocol does not surface this asymmetry to the model, which sees a flat list of tools with no hint that some are load-bearing for the host system and some are not.
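One way to measure that concentration is the share of all destructive tools held by the top slice of servers. The sketch below assumes per-server destructive-tool counts. The fleet in the usage lines is hypothetical, since the report does not publish the exact share.

```python
import math


def top_share(destructive_counts: list[int], top_fraction: float = 0.05) -> float:
    """Fraction of all destructive tools held by the top `top_fraction` of servers."""
    total = sum(destructive_counts)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(top_fraction * len(destructive_counts)))
    top = sorted(destructive_counts, reverse=True)[:k]
    return sum(top) / total


# Hypothetical fleet: 20 servers, most with no destructive tools.
counts = [40, 12, 3, 2, 1] + [0] * 15
print(f"{top_share(counts):.0%}")  # 69%
```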
The named riskiest ten
Ranked by risk score — tool count weighted by average per-tool risk. These are the servers carrying the most concentrated destructive surface in the dataset.
| # | Server | Tools | Destructive | Risk score |
|---|---|---|---|---|
| 1 | AdButler | 622 | 105 | 172.54 |
| 2 | io.fusionauth/mcp-api | 314 | 41 | 106.08 |
| 3 | SmartBear MCP | 243 | 19 | 56.03 |
| 4 | io.github.Antonytm/mcp-sitecore-server | 153 | 24 | 48.95 |
| 5 | Arcane | 180 | 21 | 48.84 |
| 6 | io.github.JXUE0/opencut-controller | 161 | 22 | 47.56 |
| 7 | io.github.alxpark/propresenter-mcp | 177 | 14 | 45.01 |
| 8 | Trello | 200 | 31 | 42.62 |
| 9 | AWS Bedrock AgentCore MCP Server | 122 | 15 | 36.79 |
| 10 | Github Mcp Server Sls C4d5e6f7 A8b9 4012 B345 456789012345 | 138 | 21 | 36.6 |
5. When MCP servers touch money, most can also destroy data
Only 70 MCP servers in our dataset expose financial tools — payments, transfers, wallet operations. They are rare. They are also the cohort with the highest combined risk in the entire ecosystem.
34 of 70 financial servers also expose destructive tools
51 of 70 servers
Across the 34 servers that combine financial and destructive surface, an agent connecting to a single one of them gets, on average, 2.9 ways to destroy data and 2.5 ways to move money. The single worst dual-risk server gives the agent 12 destructive tools and 17 financial tools — 29 ways to either break things irreversibly or move money — in a single MCP install.
The named dual-risk servers
Every server below exposes both financial and destructive tools in a single install. An agent connecting to one of them can move money and delete records without changing context.
| Server | Tools | Destructive | Financial |
|---|---|---|---|
| io.github.aibtcdev/mcp-server | 308 | 12 | 17 |
| Agent Passport System — Cryptographic Identity for AI Agents | 150 | 3 | 1 |
| xdevplatform/xmcp | 135 | 12 | 2 |
| hiveagent | 122 | 1 | 6 |
| Lichess Integration | 90 | 3 | 3 |
| io.github.EmperorMew/voidly-mcp-server | 84 | 3 | 1 |
| AWS IoT SiteWise MCP Server | 72 | 5 | 2 |
| MERX - TRON Resource Exchange | 66 | 2 | 6 |
| Helius | 63 | 1 | 3 |
| Indigo Protocol MCP | 62 | 3 | 2 |
| io.github.IndigoProtocol/indigo-mcp | 59 | 3 | 2 |
| Kosyak Evm | 50 | 1 | 2 |
| WooCommerce Store Manager | 47 | 4 | 1 |
| Clareo | 45 | 2 | 1 |
| Linear | 42 | 3 | 1 |
| Midnight + Next.js MCP | 35 | 4 | 1 |
| io.github.NyxToolsDev/quickbooks-mcp-server | 34 | 1 | 2 |
| Agent0 | 34 | 2 | 1 |
| Name Whisper | 34 | 2 | 2 |
| AgentPact | 32 | 2 | 1 |
| Linear MCP Server | 32 | 1 | 1 |
| PayPal | 30 | 2 | 3 |
| Jobly — Agent-to-Agent Contract Marketplace | 29 | 2 | 1 |
| Lunch Money | 29 | 5 | 1 |
5.5 Deep dive: the Stripe MCP
Stripe's MCP server exposes 27 tools to any agent that connects. 4% of them (1 tool) are classified destructive and 11% (3 tools) touch money directly. Ranked by risk weight, the three highest are create_refund, finalize_invoice, and cancel_subscription. One MCP install hands all of them to the model as a flat list.
What it can move
3 of Stripe's tools are financial, the category for calls that move balances, charges, refunds, payouts, and transfers. An agent with the server connected can invoke any of them directly, with whatever arguments it infers from the request. In policy terms these are the operations that move money into or out of the account.
create_refund, finalize_invoice, create_payment_link
What it can destroy
1 tool is classified destructive, the category for deletes, cancellations, and voids that the same API cannot reverse. It carries no warning language the model reads before calling; the category is inferred from the verb in the tool name, not declared by the server.
cancel_subscription
What a deny-by-default policy looks like
A deny-by-default posture starts every Stripe tool denied and allows back only the read paths an agent needs — listing charges, retrieving a customer, reading a balance. The 4 destructive and financial tools stay denied unless a policy grants them explicitly, and the ones that are granted route through an approval gate rather than firing on the model's first decision. A worked example is published at policylayer.com/policies/stripe.
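The shape of such a policy is small. The Python sketch below assumes a three-way decision (allow, deny, require approval) and an explicit grant set. The read-tool names `list_customers` and `retrieve_balance` are hypothetical placeholders for whatever read paths the server exposes. This is not the published PolicyLayer policy format.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


# Read paths allowed outright. These names are hypothetical placeholders.
READ_ALLOW = {"list_customers", "retrieve_balance"}

# The high-risk Stripe tools named in this report.
FINANCIAL = {"create_refund", "finalize_invoice", "create_payment_link"}
DESTRUCTIVE = {"cancel_subscription"}
HIGH_RISK = FINANCIAL | DESTRUCTIVE


def decide(tool: str, granted: set[str]) -> Decision:
    """Deny by default; explicitly granted high-risk tools still need approval."""
    if tool in HIGH_RISK:
        return Decision.REQUIRE_APPROVAL if tool in granted else Decision.DENY
    if tool in READ_ALLOW or tool in granted:
        return Decision.ALLOW
    return Decision.DENY
```

Unknown tools fall through to deny, so a tool added to the server later stays blocked until someone grants it.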
Stripe's MCP is well-built; the point is not that it is unusually dangerous. The point is that the server cannot know which agent should be allowed to issue a refund. That decision belongs to the control plane in front of it.
5.6 Some MCP servers expose no read-only tools
44 servers (2.6% of the 1,705 servers with three or more tools) expose no read-only tool — every tool they ship mutates state. You cannot connect such a server in observe-only mode; installing it grants write access or worse from the first call.
| Server | Tools | Destructive | Execute | Financial |
|---|---|---|---|---|
| io.github.discourse/mcp | 43 | 0 | 0 | 0 |
| io.github.daedalus/mcp-numpy | 29 | 1 | 0 | 0 |
| io.github.antvis/mcp-server-chart | 27 | 0 | 0 | 0 |
| Contracts | 25 | 0 | 2 | 0 |
| io.github.aryanduntley/aifp | 24 | 0 | 1 | 0 |
| Mcp Products | 14 | 7 | 4 | 0 |
| AWS AppSync MCP Server | 10 | 0 | 0 | 0 |
| io.github.Dave-London/build | 9 | 0 | 8 | 0 |
| aaaa-nexus | 9 | 0 | 8 | 0 |
| OpenSCAD | 8 | 0 | 0 | 0 |
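Finding these servers is a filter over per-tool categories. The sketch below assumes each server maps to a list of category labels from the report's six-way taxonomy. The input shape and server names are illustrative.

```python
def no_read_only(
    servers: dict[str, list[str]], min_tools: int = 3
) -> tuple[list[str], float]:
    """Servers with >= min_tools tools and no read-only tool, plus their % of eligible servers."""
    eligible = {n: cats for n, cats in servers.items() if len(cats) >= min_tools}
    flagged = sorted(n for n, cats in eligible.items() if "read" not in cats)
    share = 100.0 * len(flagged) / len(eligible) if eligible else 0.0
    return flagged, share
```

With the report's figures, 44 flagged servers out of 1,705 eligible gives the 2.6% above.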
6. Official MCP registries are not noticeably safer
A common assumption is that "official" MCP listings are curated and therefore safer. The data does not support it. Average risk weight per tool barely moves between sources, and seed-listed servers (those originally added by hand to bootstrap the ecosystem) are actually the highest-risk cohort.
| Source | Servers | Tools | Avg risk weight per tool | % of tools destructive | % of tools execute |
|---|---|---|---|---|---|
| crawler | 3,331 | 14,172 | 0.234 | 6.3% | 6.1% |
| smithery | 818 | 9,949 | 0.197 | 4.1% | 3.5% |
| seed | 350 | 5,819 | 0.320 | 6.7% | 6.2% |
| user_scan | 80 | 148 | 0.205 | 8.1% | 3.4% |
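Each row of the table is a group-by over tool records. The standard-library sketch below assumes each tool record carries its source, server, risk weight, and category. The `ToolRow` type is illustrative, not the published dataset's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolRow:
    source: str
    server: str
    weight: float
    category: str


def by_source(rows: list[ToolRow]) -> dict[str, dict[str, float]]:
    """Per-source server count, tool count, mean weight, and category shares."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.source].append(row)
    summary = {}
    for source, items in groups.items():
        n = len(items)
        summary[source] = {
            "servers": len({r.server for r in items}),
            "tools": n,
            "avg_risk": sum(r.weight for r in items) / n,
            # Booleans sum as 0/1, giving the count of matching tools.
            "pct_destructive": 100.0 * sum(r.category == "destructive" for r in items) / n,
            "pct_execute": 100.0 * sum(r.category == "execute" for r in items) / n,
        }
    return summary
```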
Every registry leaves risk evaluation to the developer installing the server. None of them gate on tool category, parameter danger, or the presence of unconfirmed write paths. Listing is curation in name only.
7. Two of the six most common MCP verbs are destructive
The MCP ecosystem speaks one language: CRUD. Across 31,000 tools, the four most common verbs after get and list are create, search, update, and delete. Two of the top six are mutations the model cannot undo. The protocol enforces no separation between any of them.
| Leading token | Tool names |
|---|---|
| get_* | 4,893 |
| list_* | 1,371 |
| create_* | 858 |
| search_* | 748 |
| update_* | 471 |
| delete_* | 466 |
| check_* | 277 |
| generate_* | 273 |
| arcane_* | 180 |
| add_* | 167 |
| set_* | 152 |
| analyze_* | 132 |
delete_* appears 466 times. That is roughly one destructive-named tool for every four servers in the dataset (466 across 2,031), before counting tools that are destructive without using the word ("drop", "remove", "wipe", "purge"). Verb shape is the cheapest signal a client could act on; nothing in MCP requires clients to use it, so they don't.
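Acting on verb shape takes a few lines in a client. The sketch below is a filter a client could apply to a `tools/list` result before handing it to the model. It assumes tools arrive as dicts with a `name` key, as in the MCP `tools/list` response, and that names are snake_case or kebab-case. The verb list is illustrative and non-exhaustive.

```python
import re

# Illustrative destructive verbs; a production list would need review and tuning.
DESTRUCTIVE_VERBS = {"delete", "drop", "remove", "wipe", "purge", "destroy", "truncate"}


def leading_verb(name: str) -> str:
    # Assumes snake_case or kebab-case; camelCase names are not split here.
    return re.split(r"[^a-z0-9]+", name.lower())[0]


def split_by_verb(tools: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition tools into (passed to the model, held back for review) by leading verb."""
    passed, held = [], []
    for tool in tools:
        target = held if leading_verb(tool["name"]) in DESTRUCTIVE_VERBS else passed
        target.append(tool)
    return passed, held
```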
8. MCP tools don't brief the agent. 96.1% give no warning at all.
MCP tool descriptions go directly into the model's context as the only briefing it gets. We searched all 31,000 classified tool descriptions for warning language — "irreversible", "permanent", "cannot be undone", "destroys", "wipes", "deletes", "drops", "purges". Only 1,208 tools (3.9%) contain any of those phrases.
The other 96.1% rely on the model inferring danger from the verb in the tool name. For a request like "clean up duplicate rows", an agent given fifty CRUD tools and no warnings will pick the one whose name matches the intent. delete_rows is the obvious match, and nothing beyond its name marks it as more dangerous than list_rows.
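The warning-language count is a case-insensitive phrase search over tool descriptions. The sketch below uses the phrases listed above. The leading word boundary is an assumption, and a search like this also matches benign uses such as "deletes nothing".

```python
import re

WARNING_PHRASES = [
    "irreversible", "permanent", "cannot be undone",
    "destroys", "wipes", "deletes", "drops", "purges",
]
# A leading \b skips matches inside longer words ("undeletes") while
# still catching extensions such as "permanently".
WARNING_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in WARNING_PHRASES) + ")",
    re.IGNORECASE,
)


def warning_share(descriptions: list[str]) -> float:
    """Percent of descriptions containing at least one warning phrase."""
    if not descriptions:
        return 0.0
    hits = sum(1 for d in descriptions if WARNING_RE.search(d))
    return 100.0 * hits / len(descriptions)
```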
A further 15% of servers (304) accept parameters whose names imply filesystem paths or shell command strings: path, filename, command, script, exec, stdin. These tools can give direct write or execution access to the host the server runs on, regardless of whether they are classified as destructive. The wider catalogue of documented MCP attack patterns shows how prompt injection, tool poisoning, and supply-chain compromise convert these surfaces into incidents.
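The parameter-name heuristic can run directly on each tool's `inputSchema`, which MCP defines as a JSON Schema object. The sketch below assumes only top-level `properties` are inspected (nested objects are not walked) and that names must match exactly, ignoring case.

```python
RISKY_PARAMS = {"path", "filename", "command", "script", "exec", "stdin"}


def risky_params(tool: dict) -> list[str]:
    """Top-level input parameter names that suggest filesystem or shell access."""
    properties = tool.get("inputSchema", {}).get("properties", {})
    return sorted(p for p in properties if p.lower() in RISKY_PARAMS)


def server_is_flagged(tools: list[dict]) -> bool:
    return any(risky_params(t) for t in tools)
```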
9. The trust boundary is the developer's restraint
The MCP specification defines no per-tool authorisation, no rate limits, no spend caps, and no audit trail; its optional OAuth-based authorization controls who can reach a server, not which tools an agent may call. Servers expose whatever their authors decided to expose, in whatever shape, with whatever description. Clients pass tool lists to models with no enforced filter. Models call tools with whatever arguments they think appropriate. The trust boundary is the developer's restraint when they write the server.
This dataset puts numbers on the consequences:
- One in four MCP servers can delete or destroy data.
- Nearly three in ten can execute arbitrary commands.
- The average install hands the agent 15.5 tools; the largest 1% hand it 128 or more.
- 3.9% of tools warn the model about what they do. The other 96.1% don't.
- Official, semi-official and community registries show no meaningful risk gap.
Most teams would not ship an internal API where every endpoint is unauthenticated and uncategorised and 96.1% of endpoints carry no documentation of their side effects, let alone run a fleet of such APIs in which one in four can delete production data. That is the MCP ecosystem today. Whether your agent runs on it is a control-plane decision, not a server-author decision.
The fix is not to ban destructive tools. The fix is enforcement at the transport layer: every tool call evaluated against a deterministic policy before it reaches the server. For the broader picture of how the protocol breaks under production conditions, see the canonical MCP security overview.
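Deterministic enforcement means the same call always gets the same decision, made before the request leaves the gateway. The sketch below shows that interception point with a per-call audit record. It assumes a synchronous `forward` callable that talks to the upstream server and a policy function returning "allow", "deny", or "approve". Every name here is hypothetical, and this is not PolicyLayer's implementation.

```python
import time
from typing import Any, Callable

Policy = Callable[[str, dict], str]      # (tool, arguments) -> "allow" | "deny" | "approve"
Forward = Callable[[str, dict], Any]     # sends the call on to the upstream server
Approver = Callable[[str, dict], bool]   # e.g. a human approval step


class Gateway:
    """Evaluate every tool call against a policy before it reaches the server."""

    def __init__(self, policy: Policy, forward: Forward, approver: Approver) -> None:
        self.policy = policy
        self.forward = forward
        self.approver = approver
        self.audit: list[dict] = []

    def call_tool(self, name: str, arguments: dict) -> Any:
        decision = self.policy(name, arguments)
        if decision == "approve":
            decision = "allow" if self.approver(name, arguments) else "deny"
        entry = {"tool": name, "arguments": arguments, "decision": decision}
        self.audit.append(entry)
        if decision != "allow":
            # Anything but an explicit allow (including unknown values) is blocked.
            raise PermissionError(f"{name}: {decision}")
        start = time.monotonic()
        try:
            result = self.forward(name, arguments)
            entry["outcome"] = "ok"
            return result
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        finally:
            entry["latency_s"] = time.monotonic() - start
```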
PolicyLayer is the MCP control plane:
- A gateway in front of every MCP server in your fleet, with managed OAuth that holds and refreshes upstream tokens transparently.
- A policy editor that discovers each server's tools so you can gate by category — destructive, financial, execute — instead of by tool name.
- Scoped per-agent grants, decoupled from user identity, so revoking one agent doesn't break the rest of your stack.
- A per-call audit log keyed to the grant that made the call, with full arguments, outcome, and latency.
One install. Every server. Scan your config to get the same picture this report shows, but for your stack — in 30 seconds.
Methodology
PolicyLayer maintains a continuously-updated catalogue of MCP servers harvested from the official Model Context Protocol registry, npm, Smithery, and Glama. For each server we attempt to extract its tool list through one of three paths:
- Static analysis: grep the published npm tarball for tool definitions.
- README extraction: parse the README for tool tables and code blocks.
- Live execution: spawn the server via npx in a sandboxed container and read its tools/list response.
The 2,031 servers in this report are those for which at least one path produced a parseable tool list. Tools are classified into six risk categories (Read, Write, Execute, Destructive, Financial, Other) using a verb-based classifier with input-schema heuristics. 74.5% of tool classifications are marked high-confidence, 12.5% verified.
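A verb-based classifier with a schema heuristic fits in a few lines. This is an illustrative reconstruction, not PolicyLayer's classifier. The verb lists, the money-related name tokens, the `amount`/`currency` schema hint, and the precedence order are all assumptions.

```python
import re

# Illustrative verb lists; the real classifier's vocabulary is not published here.
VERBS = {
    "read": {"get", "list", "search", "fetch", "read", "describe", "check"},
    "destructive": {"delete", "drop", "remove", "wipe", "purge", "cancel", "void"},
    "execute": {"run", "exec", "execute", "eval", "query"},
    "write": {"create", "update", "add", "set", "put", "upload", "send"},
}
MONEY_WORDS = {"pay", "payment", "payout", "refund", "transfer", "charge", "invoice"}
MONEY_PARAMS = {"amount", "currency"}


def classify(tool: dict) -> str:
    """Assign one of six categories from the tool name and its input schema."""
    tokens = re.split(r"[^a-z0-9]+", tool["name"].lower())
    verb = tokens[0]
    props = {p.lower() for p in tool.get("inputSchema", {}).get("properties", {})}
    if verb in VERBS["read"]:
        return "read"
    # Schema hint: a money-related name token or an amount/currency parameter
    # marks the tool financial; here financial takes precedence over destructive.
    if MONEY_WORDS & set(tokens) or MONEY_PARAMS & props:
        return "financial"
    for category in ("destructive", "execute", "write"):
        if verb in VERBS[category]:
            return category
    return "other"
```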
Risk weights are floats from 0.0 (read-only) to 1.0 (destructive financial). A server's risk score is its tool count multiplied by the average risk weight of its tools, so a server scores highly only when it exposes many tools and those tools skew dangerous. It is a measure of total exposed surface, not per-tool severity. The full classified catalogue (one row per server, one row per tool) is published as an open dataset on Hugging Face under CC-BY-4.0: huggingface.co/datasets/PolicyLayer/mcp-server-catalogue. Loadable via load_dataset("PolicyLayer/mcp-server-catalogue"). Methodology questions or custom cuts: research@policylayer.com.
Limitations. The dataset only covers servers reachable through public registries; private and self-hosted servers are not included. Tool-level classification can mislabel ambiguous verbs ("update" can be safe or destructive depending on parameters); the confidence breakdown above surfaces these. Some registry-listed servers were unreachable through our scan pipeline and are excluded from the figures here; the dataset is therefore a lower bound on the real ecosystem.
Let agents act without letting them run wild.
Route your MCP servers through PolicyLayer and every tool call is checked against your policy before it runs — allow, deny, or require approval. Per-identity grants. Full audit log. Live in minutes.
Free to start. No card required.
4,600+ MCP servers and 31,000+ tools scanned and risk-classified.