NIST AI RMF for MCP deployments

How AI-agent tool calls map to the four NIST AI RMF functions — the question each function asks of agent deployments, where default setups fall short, and the controls that produce the evidence.

QUICK ANSWER

The NIST AI RMF is voluntary: there is no certification and nothing is “AI RMF compliant”. It is the common US AI-governance vocabulary, technology-neutral across Govern, Map, Measure and Manage. Each MCP server is a third-party component to inventory and govern; each tools/call is behaviour to monitor and treat. A gateway produces that inventory, those controls and that monitoring as a byproduct of operating.

POLICYLAYER SCAN DATA

31,002 tools · 4,628 servers · 1,773 destructive · 1,649 execute code

The AI RMF is a framework to align with, not an audit to pass.

Released as NIST AI 100-1 in January 2023, the AI RMF is a voluntary framework — nobody certifies you against it. It is the shared language US enterprises and federal contractors use to govern AI, and it shows up in procurement and vendor-risk questionnaires. It is technology-neutral, so it applies to agent deployments without naming MCP. Three things put your agent traffic in frame.

01
Each server is a component to map.

The Map function asks you to inventory every component, including third-party software, and assess its risk. An MCP server your agents reach is exactly that component — and in a default setup, an uninventoried one.

02
Each call is behaviour to measure.

The Measure function expects production AI behaviour to be monitored. A tools/call is the unit of agent behaviour; without a record of calls there is nothing to measure and no metric to report.

03
NIST is moving towards agents.

In February 2026 NIST’s CAISI launched an AI Agent Standards Initiative focused on interoperable agent protocols, identity and security. MCP sits in that industry context, though NIST has not selected or certified it as a baseline. The framework itself remains current as of mid-2026, although a revision has been directed.

Where the four functions meet agent traffic.

The AI RMF functions and subcategories MCP traffic maps to. For each: the question it asks of agent deployments, where a default setup falls short, and where the gateway fits.

GOVERN 1.2 / 1.4 Transparent policies & controls
WHAT IT EXPECTS

Risk management is established through transparent policies, processes and controls, prioritised by the organisation’s risk appetite.

THE QUESTION FOR AGENT DEPLOYMENTS

Is “what your agents may do” written down anywhere as an enforced rule set — and can you see which rule decided each action?

THE DEFAULT-SETUP GAP

There is no written, machine-enforced rule set. What an agent may call is whatever the client config happens to allow.

WHERE THE GATEWAY FITS

Policy-as-code is the transparent control. Every verdict logs the deciding rule, so both the control and its outcome are transparent and reviewable.

GOVERN 1.6 / MAP 4.1–4.2 Inventory & component risk map
WHAT IT EXPECTS

An inventory of AI systems is maintained (Govern 1.6) and risks plus internal controls are mapped for all components, including third-party software (Map 4.1, 4.2).

THE QUESTION FOR AGENT DEPLOYMENTS

Do you have an inventory of what your agents can reach, with the risk of each component recorded?

THE DEFAULT-SETUP GAP

Organisations rarely know which servers and tools are reachable, let alone the risk each one carries.

WHERE THE GATEWAY FITS

The registered-server and tool catalogue is the inventory and the component risk map in one artefact.

31,002 tools across 4,628 servers carry a risk classification — 19,718 Read · 7,607 Write · 1,773 Destructive · 1,649 Execute · 154 Financial. That classification is the Map 4.1/4.2 component risk record.
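One way to picture a catalogue that doubles as a component risk map is the minimal Python sketch below. The five risk classes come from the text; the `ToolEntry` type, the server and tool names, and both helper functions are hypothetical illustrations, not the gateway's actual data model.

```python
from collections import Counter
from dataclasses import dataclass

# Risk classes named in the text above.
RISK_CLASSES = ("read", "write", "destructive", "execute", "financial")


@dataclass(frozen=True)
class ToolEntry:
    """One inventoried tool (hypothetical shape)."""
    server: str
    tool: str
    risk: str

    def __post_init__(self):
        if self.risk not in RISK_CLASSES:
            raise ValueError(f"unknown risk class: {self.risk}")


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    ToolEntry("crm", "list_records", "read"),
    ToolEntry("crm", "update_record", "write"),
    ToolEntry("crm", "delete_record", "destructive"),
    ToolEntry("billing", "create_refund", "financial"),
    ToolEntry("ci", "run_script", "execute"),
]


def risk_summary(catalogue):
    """Count tools per risk class, reporting zero for empty classes."""
    counts = Counter(entry.risk for entry in catalogue)
    return {risk: counts.get(risk, 0) for risk in RISK_CLASSES}


def servers_with(catalogue, risk):
    """Servers that expose at least one tool of the given risk class."""
    return sorted({e.server for e in catalogue if e.risk == risk})
```

Calling `risk_summary(CATALOGUE)` gives the per-class counts for the inventory, and `servers_with(CATALOGUE, "destructive")` lists the components that warrant the closest review.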

GOVERN 2.1 / 3.2 Roles & human-AI configuration
WHAT IT EXPECTS

Roles and responsibilities are documented, and oversight of human-AI configurations is defined so a person remains accountable for the system’s behaviour.

THE QUESTION FOR AGENT DEPLOYMENTS

Can every agent action be tied to an identified, accountable principal — and is it defined where the human sits in the loop?

THE DEFAULT-SETUP GAP

Shared keys mean there is no identifiable principal per action, and no defined point where a human approves.

WHERE THE GATEWAY FITS

Per-person grants tie every action to an identified principal; deny and approve rules define where the human sits in the loop.
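As a rough illustration of how a per-person grant can carry both the accountable principal and the point of human approval, consider the sketch below. The `Grant` type, its field names and the three verdict strings are assumptions made for this example, not the product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    """A per-person grant (hypothetical shape)."""
    principal: str                                     # the accountable person
    allowed: set = field(default_factory=set)          # tools callable directly
    needs_approval: set = field(default_factory=set)   # tools gated on a human
    revoked: bool = False


def decide(grant, tool):
    """Return a verdict that always names the principal behind the call."""
    if grant.revoked:
        verdict = "deny"
    elif tool in grant.needs_approval:
        verdict = "approve"   # a human must approve before the call proceeds
    elif tool in grant.allowed:
        verdict = "allow"
    else:
        verdict = "deny"      # anything not granted is denied
    return {"principal": grant.principal, "tool": tool, "verdict": verdict}


# Hypothetical grant for one person.
alice = Grant(
    principal="alice@example.com",
    allowed={"get_record"},
    needs_approval={"update_record"},
)
```

Because every verdict records `principal`, an action can be traced to a person rather than to a shared key, and the `needs_approval` set documents where the human sits in the loop.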

GOVERN 6.1 / 6.2 Third-party risk & contingency
WHAT IT EXPECTS

Policies address risks from third-party AI software (Govern 6.1) and contingency processes handle failures or incidents in those third-party resources (Govern 6.2).

THE QUESTION FOR AGENT DEPLOYMENTS

How is third-party tool software governed — and what happens when one of those servers misbehaves?

THE DEFAULT-SETUP GAP

External servers are onboarded ad hoc, with no governing policy and no contingency when one starts behaving unexpectedly.

WHERE THE GATEWAY FITS

Every upstream server is mediated, policy-gated and revocable — flipping a misbehaving server to deny is the contingency action.

4,628 third-party servers inventoried — the population a third-party risk policy has to account for.
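The contingency action described above, flipping a server to deny, can be sketched as a small registry that records each status change with a timestamp. The `ServerRegistry` class, its method names and the example server name are hypothetical, chosen only to illustrate the idea.

```python
from datetime import datetime, timezone


class ServerRegistry:
    """Tracks upstream servers and whether calls to them are permitted."""

    def __init__(self):
        self._status = {}   # server name -> "allow" or "deny"
        self.events = []    # timestamped record of every status change

    def _set(self, server, status, reason):
        self._status[server] = status
        self.events.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "server": server,
            "status": status,
            "reason": reason,
        })

    def register(self, server):
        """Onboard a server under policy instead of ad hoc."""
        self._set(server, "allow", "registered")

    def disengage(self, server, reason):
        """Contingency action: flip a known server to deny."""
        if server not in self._status:
            raise KeyError(server)
        self._set(server, "deny", reason)

    def is_reachable(self, server):
        # Servers that were never registered are treated as denied.
        return self._status.get(server) == "allow"


registry = ServerRegistry()
registry.register("crm")
registry.disengage("crm", "unexpected tool behaviour")
```

After `disengage`, `registry.is_reachable("crm")` is false, and `registry.events` holds the timestamped trail of when the server was onboarded and when it was cut off.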

MAP 3.3 / 3.5 Scope & human oversight
WHAT IT EXPECTS

The targeted application scope is specified and documented (Map 3.3) and processes for human oversight are documented (Map 3.5).

THE QUESTION FOR AGENT DEPLOYMENTS

Is each agent’s tool surface restricted to the documented scope its task actually needs?

THE DEFAULT-SETUP GAP

Connecting to a server exposes every tool on it — the scope is whatever the server happens to offer, not what the task needs.

WHERE THE GATEWAY FITS

Grants restrict the tool surface to the documented scope, so the grant is the least-privilege control that implements the documented-scope subcategory.

The 19,718 Read tools form the bulk of the corpus, and this is where the live question is whether access is scoped to the documented task rather than granted wholesale.

MEASURE 2.4 / 2.6 / 2.7 Production monitoring & metrics
WHAT IT EXPECTS

Deployed system behaviour is monitored in production (Measure 2.4), systems fail safely under real-time monitoring (Measure 2.6), and security and resilience are evaluated (Measure 2.7) — the GAI Profile action MS-2.7-004 names unauthorised-access-attempt counts as a security-effectiveness metric.

THE QUESTION FOR AGENT DEPLOYMENTS

Is the production behaviour of your agents — every tool call and its outcome — actually being monitored, with a metric you can report?

THE DEFAULT-SETUP GAP

Default MCP produces no call-level telemetry, so there is no production monitoring and no metric to track.

WHERE THE GATEWAY FITS

The audit log is tool-call-level production monitoring. Deny verdicts plus rate-limit and spend-cap trips are the metrics, and deny counts map directly to MS-2.7-004. GAI action MG-2.2-007 separately names real-time auditing tools for response.

Monitoring prioritises what matters: calls touching the 1,773 Destructive and 154 Financial tools in the corpus.
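To make the metric idea concrete, the sketch below derives deny counts and threshold trips from a list of verdict records. The record fields (`tool`, `risk`, `verdict`, `reason`) and the sample entries are assumptions for illustration; a real audit log will have its own schema.

```python
from collections import Counter

# Hypothetical verdict-log entries.
VERDICT_LOG = [
    {"tool": "get_record", "risk": "read", "verdict": "allow", "reason": "rule"},
    {"tool": "delete_record", "risk": "destructive", "verdict": "deny", "reason": "default"},
    {"tool": "update_record", "risk": "write", "verdict": "deny", "reason": "rate_limit"},
    {"tool": "create_refund", "risk": "financial", "verdict": "deny", "reason": "spend_cap"},
    {"tool": "list_records", "risk": "read", "verdict": "allow", "reason": "rule"},
]

HIGH_IMPACT = {"destructive", "financial"}


def security_metrics(log):
    """Summarise a verdict log into reportable counts."""
    denies = [r for r in log if r["verdict"] == "deny"]
    by_reason = Counter(r["reason"] for r in denies)
    return {
        "total_calls": len(log),
        "deny_count": len(denies),                    # unauthorised-attempt metric
        "rate_limit_trips": by_reason["rate_limit"],  # threshold events
        "spend_cap_trips": by_reason["spend_cap"],
        "high_impact_calls": sum(r["risk"] in HIGH_IMPACT for r in log),
    }
```

Running `security_metrics(VERDICT_LOG)` over a reporting period yields the numbers a Measure 2.7 report would cite, with high-impact calls separated out for priority review.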

MANAGE 1.3 / 2.4 / 4.1 / 4.3 Risk response & disengage
WHAT IT EXPECTS

Risk responses are documented as mitigate, transfer, avoid or accept (Manage 1.3); mechanisms exist to supersede, disengage or deactivate systems behaving inconsistently with intended use (Manage 2.4); and post-deployment monitoring, override, decommission and incident documentation are maintained (Manage 4.1, 4.3).

THE QUESTION FOR AGENT DEPLOYMENTS

Is there a documented response for each tool, a way to disengage an agent immediately, and a trail of what happened?

THE DEFAULT-SETUP GAP

No documented response decision, no disengage mechanism, and no incident trail when something goes wrong.

WHERE THE GATEWAY FITS

Per-tool allow, deny or condition is the documented response decision; revoking a grant or flipping a server to deny is the disengage mechanism; the verdict log is the incident record.

Policies that record the risk treatment.

Illustrative policies — not complete compliance controls on their own.

Documented risk response across a mixed tool set MANAGE 1.3

Manage 1.3 asks for a documented response per risk. Here the read tools are accepted, the write tool is mitigated with an argument condition, and the destructive tool is avoided by leaving it unlisted so that the default deny applies. The policy is the recorded decision.

policy.json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "list_records": {},
    "get_record": {},
    "update_record": {
      "deny_if": [
        {
          "conditions": [
            { "path": "args.scope", "op": "regex", "value": "(?i)^all$" }
          ]
        }
      ]
    }
  }
}
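
To show how a policy of this shape might be evaluated, here is a minimal Python sketch. The semantics are assumptions inferred from the example, not the documented policy engine: a listed tool is allowed unless every condition in one of its `deny_if` blocks matches, `path` is resolved against a `{"args": ...}` object, and `regex` is the only operator handled. The `delete_record` tool mentioned in the usage note is hypothetical.

```python
import re

# The policy from the example above, as a Python dict.
POLICY = {
    "version": "1",
    "default": "deny",
    "tools": {
        "list_records": {},
        "get_record": {},
        "update_record": {
            "deny_if": [
                {"conditions": [
                    {"path": "args.scope", "op": "regex", "value": "(?i)^all$"}
                ]}
            ]
        },
    },
}


def lookup(call, path):
    """Resolve a dotted path such as 'args.scope'; None if any key is missing."""
    node = call
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def condition_matches(cond, call):
    value = lookup(call, cond["path"])
    if cond["op"] == "regex":
        return isinstance(value, str) and re.search(cond["value"], value) is not None
    # Only 'regex' appears in the example; other operators are not guessed at.
    raise ValueError(f"unsupported op: {cond['op']}")


def evaluate(policy, tool, args):
    """Return (verdict, deciding_rule) for one tool call."""
    call = {"args": args}
    rule = policy["tools"].get(tool)
    if rule is None:
        return (policy["default"], "default")
    for i, block in enumerate(rule.get("deny_if", [])):
        if all(condition_matches(c, call) for c in block["conditions"]):
            return ("deny", f"tools.{tool}.deny_if[{i}]")
    return ("allow", f"tools.{tool}")
```

Under these assumptions, `evaluate(POLICY, "update_record", {"scope": "ALL"})` is denied by the argument condition, the same call with a narrower scope is allowed, and an unlisted tool such as `delete_record` falls through to the default deny. The second element of the result names the deciding rule, which is what makes the verdict reviewable.
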
Disengage a server under review MANAGE 2.4

Manage 2.4 wants a mechanism to disengage a system behaving inconsistently with intended use. While the server is under review, high-impact tools are denied and only low-risk reads stay open; each blocked attempt lands in the verdict log as an incident record.

policy.json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "list_resources": {},
    "get_resource": {}
  }
}
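
When a server is moved to a restricted policy like this one, it can help to record exactly which tools were withdrawn. The sketch below compares two deny-by-default policies; the `NORMAL` policy and its `update_resource` and `delete_resource` tools are hypothetical, and "allowed" here simply means "listed under tools", ignoring any per-tool conditions.

```python
# Hypothetical policy in force before the review began.
NORMAL = {
    "version": "1",
    "default": "deny",
    "tools": {
        "list_resources": {},
        "get_resource": {},
        "update_resource": {},
        "delete_resource": {},
    },
}

# The review policy from the example above.
REVIEW = {
    "version": "1",
    "default": "deny",
    "tools": {
        "list_resources": {},
        "get_resource": {},
    },
}


def listed_tools(policy):
    """Tools a deny-by-default policy lists as callable."""
    if policy.get("default") != "deny":
        raise ValueError("this sketch assumes a deny-by-default policy")
    return set(policy["tools"])


def disengaged_tools(before, after):
    """Tools that were callable under `before` but are denied under `after`."""
    return sorted(listed_tools(before) - listed_tools(after))
```

Here `disengaged_tools(NORMAL, REVIEW)` returns the two write-capable tools, which can be attached to the review record alongside the versioned policy change.
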

See Writing policies for the policy format, operators, and quota shapes.

What aligns with each function — and what you export.

The AI RMF asks you to document risk treatment for every component. A gateway produces the inventory, controls and monitoring evidence as a byproduct of operating. The artefact for each function:

What the auditor asks for | What the gateway exports
Component inventory with risk classes (Govern 1.6, Map 4.1/4.2, GAI GV-1.6-001) | Tool inventory with risk classification per tool and per server: the inventory and component risk map in one export.
Documented, version-controlled controls (Govern 1.4, Manage 1.3) | Policy-as-code: the per-tool allow/deny/condition rules, versioned, as the documented risk-response decision.
Production monitoring and security metrics (Measure 2.4/2.7, GAI MS-2.7-004, MG-2.2-007) | Per-call verdict/audit log, with deny-verdict counts as the unauthorised-attempt metric and rate-limit/spend-cap trips as threshold events.
Accountable principal and disengage capability (Govern 2.1/3.2, Manage 2.4) | Per-person grants plus timestamped revocation records: the identity trail and the disengage mechanism.
Third-party entities with access to organisational content (Govern 6.1, GAI GV-6.1-007) | Central credential custody plus the upstream server registry: the inventory of third parties able to reach your content.
Post-deployment incident record (Manage 4.1/4.3) | The verdict log filtered to denied and condition-tripped calls: the post-deployment incident and override trail.

NIST AI RMF and MCP questions.

Is NIST AI RMF mandatory?

No. The AI RMF is a voluntary framework — there is no certification and no “AI RMF compliant” status, only self-attested alignment. Regulators including the FTC, SEC, CFPB, FDA and EEOC reference its principles, and it appears regularly in procurement and vendor-risk questionnaires. The federal executive order that once directed its use was rescinded in January 2025, so it is now adopted as a voluntary standard rather than a mandate.

How does the AI RMF apply to MCP and AI agents?

It is technology-neutral, so it applies without naming MCP: inventory your components (Map), govern them (Govern), monitor production behaviour (Measure) and treat the risk (Manage). Each MCP server is a third-party component, and each tools/call is behaviour to monitor. NIST’s CAISI agent-standards initiative (February 2026) is working on interoperable agent protocols and identity — MCP is part of that industry context, though NIST has not selected it as a baseline.

AI RMF vs ISO 42001: what is the difference?

The AI RMF is US, voluntary and non-certifiable — you self-attest alignment. ISO 42001 is an international, certifiable AI management-system standard you can be audited against. They are complementary: the RMF gives you the governance vocabulary, and ISO 42001 gives you the auditable system. The same gateway evidence — inventory, policies, logs — feeds both.

What does Excessive Agency mean?

Excessive Agency is OWASP LLM06:2025 — the canonical name for the core agentic risk. It means an agent has more functionality, permissions or autonomy than its task needs, in three flavours: excessive functionality, excessive permissions and excessive autonomy. OWASP’s named mitigations — least-privilege tool scopes and human approval for high-impact actions — map one-to-one onto policy rules and per-person grants. (Separately, NIST’s January 2025 term “agent hijacking” describes adversarial content causing unintended tool invocations within authorised scope.)

How do I evidence the Manage function for agent deployments?

Three artefacts. Written per-tool policies are the documented risk response (Manage 1.3). An immediate disengage mechanism, such as revoking a grant or flipping a server to deny, satisfies Manage 2.4. A monitoring and incident log covers Manage 4.1/4.3. A gateway produces all three as a byproduct of mediating traffic, so the Manage evidence falls out of normal operation rather than requiring a separate exercise.

Raw MCP versus gateway-mediated MCP.

Default setup | Through the gateway
One shared upstream API key on every laptop | Per-person scoped grant tokens, revocable individually
No record of what agents called | Per-call audit log: grant, tool, argument keys, rule, verdict
Every tool on a server is callable | Deny-by-default: each tool and argument explicitly granted
Access rules scattered across client configs | One central, version-controlled policy
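
The per-call audit record in the comparison above lists grant, tool, argument keys, rule and verdict. A minimal sketch of such a record follows; the field names, the timestamp field and the JSON-lines serialisation are assumptions for illustration rather than the gateway's actual log format. It stores argument keys only, so argument values never reach the log.

```python
import json
from datetime import datetime, timezone


def audit_record(grant_id, tool, args, rule, verdict, now=None):
    """Build one audit entry that keeps argument keys but not values."""
    now = now or datetime.now(timezone.utc)
    return {
        "ts": now.isoformat(),
        "grant": grant_id,
        "tool": tool,
        "arg_keys": sorted(args),   # keys only; values are deliberately dropped
        "rule": rule,
        "verdict": verdict,
    }


def to_log_line(record):
    """Serialise a record as a single JSON line with stable key order."""
    return json.dumps(record, sort_keys=True)


# Hypothetical call, denied by an argument condition.
example = audit_record(
    grant_id="grant-alice",
    tool="update_record",
    args={"scope": "all", "account": "acct-9931"},
    rule="tools.update_record.deny_if[0]",
    verdict="deny",
    now=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

Logging keys rather than values is one possible trade-off: the record still shows what kind of call was attempted and which rule decided it, without copying potentially sensitive argument contents into the audit trail.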

PolicyLayer doesn’t certify your organisation — it gives your compliance team enforceable controls and exportable evidence for the MCP slice of the audit.

Last reviewed 04-06-2026 by the PolicyLayer research team. This guide maps how the framework intersects with MCP deployments — it is not legal advice.

Enforceable controls and audit evidence on every MCP call.

Per-person grants, deny-by-default policy and a per-call audit log — the NIST AI RMF evidence for the MCP slice of your programme. Live in minutes.

Free to start. No card required.

4,600+ MCP servers and 31,000+ tools scanned and risk-classified.
