How AI-agent tool calls map to the four NIST AI RMF functions — the question each function asks of agent deployments, where default setups fall short, and the controls that produce the evidence.
QUICK ANSWER: The NIST AI RMF is voluntary; there is no certification, and nothing is “AI RMF compliant”. It is the common US AI-governance vocabulary, technology-neutral across Govern, Map, Measure and Manage. Each MCP server is a third-party component to inventory and govern; each tools/call is behaviour to monitor and treat. A gateway produces that inventory, those controls and that monitoring as a byproduct of operating.
Released as NIST AI 100-1 in January 2023, the AI RMF is a voluntary framework — nobody certifies you against it. It is the shared language US enterprises and federal contractors use to govern AI, and it shows up in procurement and vendor-risk questionnaires. It is technology-neutral, so it applies to agent deployments without naming MCP. Three things put your agent traffic in frame.
The Map function asks you to inventory every component, including third-party software, and assess its risk. An MCP server your agents reach is exactly that component — and in a default setup, an uninventoried one.
The Measure function expects production AI behaviour to be monitored. A tools/call is the unit of agent behaviour; without a record of calls there is nothing to measure and no metric to report.
In February 2026 NIST’s CAISI launched an AI Agent Standards Initiative focused on interoperable agent protocols, identity and security. MCP sits in that industry context, though NIST has not selected or certified it as a baseline. The framework itself remains current as of mid-2026, although a revision has been directed.
The AI RMF functions and subcategories MCP traffic maps to. For each: the question it asks of agent deployments, where a default setup falls short, and where the gateway fits.
Risk management is established through transparent policies, processes and controls, prioritised by the organisation’s risk appetite.
Is “what your agents may do” written down anywhere as an enforced rule set — and can you see which rule decided each action?
There is no written, machine-enforced rule set. What an agent may call is whatever the client config happens to allow.
Policy-as-code is the transparent control. Every verdict logs the deciding rule, so both the control and its outcome are transparent and reviewable.
An inventory of AI systems is maintained (Govern 1.6) and risks plus internal controls are mapped for all components, including third-party software (Map 4.1, 4.2).
Do you have an inventory of what your agents can reach, with the risk of each component recorded?
Organisations rarely know which servers and tools are reachable, let alone the risk each one carries.
The registered-server and tool catalogue is the inventory and the component risk map in one artefact.
31,002 tools across 4,628 servers carry a risk classification — 19,718 Read · 7,607 Write · 1,773 Destructive · 1,649 Execute · 154 Financial. That classification is the Map 4.1/4.2 component risk record.
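A minimal sketch of how a tool catalogue could double as the component risk record. The `Tool` record and the server and tool names are hypothetical, chosen only for illustration; they are not the gateway's actual export format. Only the five risk class names come from the text above.

```python
from collections import Counter
from dataclasses import dataclass

# Risk classes named in the text above.
RISK_CLASSES = ("Read", "Write", "Destructive", "Execute", "Financial")


@dataclass(frozen=True)
class Tool:
    server: str  # hypothetical server name
    name: str    # hypothetical tool name
    risk: str    # one of RISK_CLASSES


def risk_summary(tools):
    """Count tools per risk class, rejecting unknown classes."""
    counts = Counter()
    for tool in tools:
        if tool.risk not in RISK_CLASSES:
            raise ValueError(f"unknown risk class: {tool.risk!r}")
        counts[tool.risk] += 1
    # Report every class, including those with zero tools.
    return {risk: counts[risk] for risk in RISK_CLASSES}


# Hypothetical inventory entries (assumption, not real catalogue data).
inventory = [
    Tool("crm", "list_records", "Read"),
    Tool("crm", "update_record", "Write"),
    Tool("crm", "delete_record", "Destructive"),
    Tool("billing", "issue_refund", "Financial"),
]

summary = risk_summary(inventory)
print(summary)
```

Reporting the zero-count classes explicitly keeps the record complete: a reviewer can see that a class was assessed and found empty rather than skipped.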
Roles and responsibilities are documented, and oversight of human-AI configurations is defined so a person remains accountable for the system’s behaviour.
Can every agent action be tied to an identified, accountable principal — and is it defined where the human sits in the loop?
Shared keys mean there is no identifiable principal per action, and no defined point where a human approves.
Per-person grants tie every action to an identified principal; deny and approve rules define where the human sits in the loop.
Policies address risks from third-party AI software (Govern 6.1) and contingency processes handle failures or incidents in those third-party resources (Govern 6.2).
How is third-party tool software governed — and what happens when one of those servers misbehaves?
External servers are onboarded ad hoc, with no governing policy and no contingency when one starts behaving unexpectedly.
Every upstream server is mediated, policy-gated and revocable — flipping a misbehaving server to deny is the contingency action.
4,628 third-party servers inventoried — the population a third-party risk policy has to account for.
The targeted application scope is specified and documented (Map 3.3) and processes for human oversight are documented (Map 3.5).
Is each agent’s tool surface restricted to the documented scope its task actually needs?
Connecting to a server exposes every tool on it — the scope is whatever the server happens to offer, not what the task needs.
Grants restrict the tool surface to the documented scope, so the grant itself is the least-privilege control this subcategory asks you to document.
19,718 Read tools form the bulk of the corpus — where scoping to the documented task, not blanket access, is the live question.
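The scoping idea can be sketched as a simple filter, assuming a grant is just a set of tool names. The function name, the tool names and the grant shape are hypothetical; a real gateway may represent grants differently.

```python
def scoped_tools(server_tools, granted):
    """Return only the tools the grant names, in the server's own order."""
    return [tool for tool in server_tools if tool in granted]


# Hypothetical tool surface offered by one server.
server_tools = ["list_records", "get_record", "update_record", "delete_record"]

# Hypothetical grant: the documented scope for a read-only task.
grant = {"list_records", "get_record"}

visible = scoped_tools(server_tools, grant)

# Tools named in the grant that the server does not offer are worth flagging,
# since they suggest the documented scope has drifted from reality.
stale = sorted(grant - set(server_tools))

print(visible, stale)
```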
Deployed system behaviour is monitored in production (Measure 2.4), systems fail safely under real-time monitoring (Measure 2.6), and security and resilience are evaluated (Measure 2.7) — the GAI Profile action MS-2.7-004 names unauthorised-access-attempt counts as a security-effectiveness metric.
Is the production behaviour of your agents — every tool call and its outcome — actually being monitored, with a metric you can report?
Default MCP produces no call-level telemetry, so there is no production monitoring and no metric to track.
The audit log is tool-call-level production monitoring. Deny verdicts plus rate-limit and spend-cap trips are the metrics — deny counts map directly to MS-2.7-004 — and GAI action MG-2.2-007 names real-time auditing tools for response.
Monitoring prioritises what matters: calls touching the 1,773 Destructive and 154 Financial tools in the corpus.
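As a rough illustration of turning the audit log into a reportable metric, the sketch below counts deny verdicts per grant. The record fields (`grant`, `tool`, `verdict`) and the sample entries are assumptions for this example, not the gateway's real log schema.

```python
from collections import Counter

# Hypothetical audit records; field names and values are assumptions.
audit_log = [
    {"grant": "alice", "tool": "get_record", "verdict": "allow"},
    {"grant": "alice", "tool": "delete_record", "verdict": "deny"},
    {"grant": "bob", "tool": "update_record", "verdict": "deny"},
    {"grant": "bob", "tool": "list_records", "verdict": "allow"},
    {"grant": "bob", "tool": "delete_record", "verdict": "deny"},
]


def deny_counts(log):
    """Count denied calls per grant as a blocked-attempt metric."""
    return Counter(rec["grant"] for rec in log if rec["verdict"] == "deny")


counts = deny_counts(audit_log)
total_denied = sum(counts.values())
print(dict(counts), total_denied)
```

Grouping by grant rather than by tool keeps the metric tied to an identifiable principal; the same function could group by tool if the report needs that view instead.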
Risk responses are documented as mitigate, transfer, avoid or accept (Manage 1.3); mechanisms exist to supersede, disengage or deactivate systems behaving inconsistently with intended use (Manage 2.4); and post-deployment monitoring, override, decommission and incident documentation are maintained (Manage 4.1, 4.3).
Is there a documented response for each tool, a way to disengage an agent immediately, and a trail of what happened?
No documented response decision, no disengage mechanism, and no incident trail when something goes wrong.
Per-tool allow, deny or condition is the documented response decision; revoking a grant or flipping a server to deny is the disengage mechanism; the verdict log is the incident record.
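One way to picture the disengage mechanism is a registry that records when each grant was revoked. The `GrantRegistry` class below is a hypothetical sketch under that assumption, not the gateway's implementation.

```python
from datetime import datetime, timezone


class GrantRegistry:
    """Tracks active grants and keeps a timestamped revocation trail."""

    def __init__(self):
        self._active = set()
        self.revocations = []  # list of (grant_id, ISO-8601 UTC timestamp)

    def issue(self, grant_id):
        self._active.add(grant_id)

    def revoke(self, grant_id):
        """Disengage a grant; revoking an inactive grant is a no-op."""
        if grant_id in self._active:
            self._active.remove(grant_id)
            stamp = datetime.now(timezone.utc).isoformat()
            self.revocations.append((grant_id, stamp))

    def is_active(self, grant_id):
        return grant_id in self._active


registry = GrantRegistry()
registry.issue("alice")   # hypothetical grant identifier
registry.revoke("alice")  # the disengage action
registry.revoke("alice")  # second call leaves the trail unchanged
print(registry.is_active("alice"), len(registry.revocations))
```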
Illustrative policies — not complete compliance controls on their own.
Manage 1.3 asks for a documented response per risk: accept the read tools, mitigate the write tool with an argument condition, avoid the destructive one by leaving it denied. The policy is the recorded decision.
```json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "list_records": {},
    "get_record": {},
    "update_record": {
      "deny_if": [
        {
          "conditions": [
            { "path": "args.scope", "op": "regex", "value": "(?i)^all$" }
          ]
        }
      ]
    }
  }
}
```

Manage 2.4 wants a mechanism to disengage a system behaving inconsistently with intended use. While the agent is under review, high-impact tools are denied and only low-risk reads stay open; each blocked attempt lands in the verdict log as an incident record.
```json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "list_resources": {},
    "get_resource": {}
  }
}
```

See Writing policies for the policy format, operators, and quota shapes.
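To make the first policy above concrete, here is a minimal evaluator sketch. The evaluation semantics are assumptions inferred from the policy's shape: an unlisted tool falls back to `default`, a listed tool is allowed unless a `deny_if` rule trips, and a rule trips when all of its conditions match. The call shape (`tool`, `args`) and the rule labels are also hypothetical; the real gateway's behaviour is defined by its own documentation.

```python
import re

policy = {
    "version": "1",
    "default": "deny",
    "tools": {
        "list_records": {},
        "get_record": {},
        "update_record": {
            "deny_if": [
                {"conditions": [
                    {"path": "args.scope", "op": "regex", "value": "(?i)^all$"}
                ]}
            ]
        },
    },
}


def lookup(call, path):
    """Resolve a dotted path such as 'args.scope'; None if any part is missing."""
    node = call
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def condition_matches(call, cond):
    """Only the 'regex' operator from the example policy is sketched here."""
    value = lookup(call, cond["path"])
    if cond["op"] == "regex":
        return isinstance(value, str) and re.search(cond["value"], value) is not None
    raise ValueError(f"unsupported operator: {cond['op']}")


def evaluate(policy, call):
    """Return (verdict, deciding_rule) for one tool call."""
    rules = policy["tools"].get(call["tool"])
    if rules is None:
        return policy["default"], "default"
    for index, rule in enumerate(rules.get("deny_if", [])):
        if all(condition_matches(call, c) for c in rule["conditions"]):
            return "deny", f"{call['tool']}.deny_if[{index}]"
    return "allow", call["tool"]


print(evaluate(policy, {"tool": "update_record", "args": {"scope": "ALL"}}))
print(evaluate(policy, {"tool": "delete_record", "args": {}}))
```

Returning the deciding rule alongside the verdict mirrors the point made earlier that every verdict should be traceable to the rule that produced it.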
The AI RMF asks you to document risk treatment for every component. A gateway produces the inventory, controls and monitoring evidence as a byproduct of operating. The artefact for each function:
| What the auditor asks for | What the gateway exports |
|---|---|
| Component inventory with risk classes (Govern 1.6, Map 4.1/4.2, GAI GV-1.6-001) | Tool inventory with risk classification per tool and per server — the inventory and component risk map in one export. |
| Documented, version-controlled controls (Govern 1.4, Manage 1.3) | Policy-as-code: the per-tool allow/deny/condition rules, versioned, as the documented risk-response decision. |
| Production monitoring and security metrics (Measure 2.4/2.7, GAI MS-2.7-004, MG-2.2-007) | Per-call verdict/audit log, with deny-verdict counts as the unauthorised-attempt metric and rate-limit/spend-cap trips as threshold events. |
| Accountable principal and disengage capability (Govern 2.1/3.2, Manage 2.4) | Per-person grants plus timestamped revocation records — the identity trail and the disengage mechanism. |
| Third-party entities with access to organisational content (Govern 6.1, GAI GV-6.1-007) | Central credential custody plus the upstream server registry — the inventory of third parties able to reach your content. |
| Post-deployment incident record (Manage 4.1/4.3) | The verdict log filtered to denied and condition-tripped calls — the post-deployment incident and override trail. |
No. The AI RMF is a voluntary framework — there is no certification and no “AI RMF compliant” status, only self-attested alignment. Regulators including the FTC, SEC, CFPB, FDA and EEOC reference its principles, and it appears regularly in procurement and vendor-risk questionnaires. The federal executive order that once directed its use was rescinded in January 2025, so it is now adopted as a voluntary standard rather than a mandate.
It is technology-neutral, so it applies without naming MCP: inventory your components (Map), govern them (Govern), monitor production behaviour (Measure) and treat the risk (Manage). Each MCP server is a third-party component, and each tools/call is behaviour to monitor. NIST’s CAISI agent-standards initiative (February 2026) is working on interoperable agent protocols and identity — MCP is part of that industry context, though NIST has not selected it as a baseline.
The AI RMF is US, voluntary and non-certifiable — you self-attest alignment. ISO 42001 is an international, certifiable AI management-system standard you can be audited against. They are complementary: the RMF gives you the governance vocabulary, and ISO 42001 gives you the auditable system. The same gateway evidence — inventory, policies, logs — feeds both.
Excessive Agency is OWASP LLM06:2025 — the canonical name for the core agentic risk. It means an agent has more functionality, permissions or autonomy than its task needs, in three flavours: excessive functionality, excessive permissions and excessive autonomy. OWASP’s named mitigations — least-privilege tool scopes and human approval for high-impact actions — map one-to-one onto policy rules and per-person grants. (Separately, NIST’s January 2025 term “agent hijacking” describes adversarial content causing unintended tool invocations within authorised scope.)
Three artefacts. Written per-tool policies are the documented risk response (Manage 1.3). An immediate disengage mechanism — revoking a grant or flipping a server to deny — satisfies Manage 2.4. A monitoring and incident log covers Manage 4.1/4.3. A gateway produces all three automatically as a byproduct of mediating traffic, so the Manage evidence falls out of normal operation rather than a separate exercise.
| Default setup | Through the gateway |
|---|---|
| One shared upstream API key on every laptop | Per-person scoped grant tokens, revocable individually |
| No record of what agents called | Per-call audit log: grant, tool, argument keys, rule, verdict |
| Every tool on a server is callable | Deny-by-default — each tool and argument explicitly granted |
| Access rules scattered across client configs | One central, version-controlled policy |
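A sketch of what one per-call audit record might look like, following the fields listed in the table: grant, tool, argument keys, rule, verdict. It assumes that "argument keys" means argument values are left out of the log; the `time` field and all names and values here are hypothetical additions for illustration.

```python
import json
from datetime import datetime, timezone


def audit_record(grant, tool, args, rule, verdict):
    """Build one log entry that records argument keys but never values."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),  # assumed field
        "grant": grant,
        "tool": tool,
        "arg_keys": sorted(args),  # keys only; values stay out of the log
        "rule": rule,
        "verdict": verdict,
    }


# Hypothetical call with a sensitive-looking argument value.
record = audit_record(
    grant="alice",
    tool="update_record",
    args={"scope": "one", "id": "s3cr3t-42"},
    rule="update_record",
    verdict="allow",
)
print(json.dumps(record, indent=2))
```

Logging keys without values is one way to keep the trail useful for review while limiting how much organisational content ends up in the log itself.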
PolicyLayer doesn’t certify your organisation — it gives your compliance team enforceable controls and exportable evidence for the MCP slice of the audit.
Last reviewed 04-06-2026 by the PolicyLayer research team. This guide maps how the framework intersects with MCP deployments — it is not legal advice.