How to Safely Run AI Agents With Tool Access in Production
Your AI agent just issued a refund. Then another. Then 200 more. By the time someone noticed, it had processed $47,000 in fraudulent refund requests because a prompt injection told it to “process all pending customer complaints immediately.”
This is not a hypothetical. Agents in production have real credentials, real API keys, and real consequences. The gap between demo and production is a policy layer — a set of rule-based controls that constrain what an agent can do regardless of what it’s been told to do.
This post is the checklist we wish every team had before shipping their first agent to production. Each item includes a concrete YAML policy example using Intercept, our open-source MCP policy enforcement proxy.
Why AI Agents in Production Need Policy Enforcement
An agent with MCP tool access can do anything the tools allow. It can create Stripe charges, delete database records, merge pull requests, send emails, and transfer money. It does these things based on natural language instructions — including text injected by malicious users, hallucinated by the model, or simply misunderstood.
The model is not your security boundary. The prompt is not your security boundary. Your security boundary is a deterministic enforcement layer that sits between the agent and every tool it can call.
Here is how to build that boundary.
Production AI Agent Safety Checklist
- Deny by default
- Classify tools by risk
- Set spend limits
- Rate limit everything
- Validate arguments
- Require approval for high-risk actions
- Hide tools you don’t need
- Fail closed
- Audit everything
- Shadow mode first
1. Deny by default
The single most important policy decision: unlisted tools are blocked. If a new tool appears on an MCP server after an update, it does not become available to your agent automatically. You must explicitly allow it.
version: "1"
description: "Customer support agent -- Stripe access"
default: "deny"
tools:
  list_customers:
    rules: []
  search_stripe_resources:
    rules: []
  create_refund:
    rules:
      - name: "refund cap"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 5000
        on_deny: "Refunds over $50.00 require a human"
With default: "deny", only list_customers, search_stripe_resources, and create_refund are available. Everything else — delete_customer, create_charge, cancel_subscription — is rejected before any rules are evaluated.
Never run an agent in production with a default-allow posture.
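To make the evaluation order concrete, here is a minimal Python sketch of deny-by-default. It is not Intercept's implementation: the `POLICY` dictionary, its `max` key, and the `evaluate` function are hypothetical names that loosely mirror the YAML above.

```python
from typing import Any

# Hypothetical in-memory policy loosely mirroring the YAML above.
POLICY = {
    "default": "deny",
    "tools": {
        "list_customers": [],
        "search_stripe_resources": [],
        "create_refund": [
            {"name": "refund cap", "path": "amount", "max": 5000,
             "on_deny": "Refunds over $50.00 require a human"},
        ],
    },
}


def evaluate(tool: str, args: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool call."""
    rules = POLICY["tools"].get(tool)
    if rules is None:
        # Unlisted tool: the default decides and no rules are consulted.
        return POLICY["default"] == "allow", "default policy"
    for rule in rules:
        value = args.get(rule["path"])
        # A missing or non-numeric amount counts as a violation.
        if not isinstance(value, (int, float)) or value > rule["max"]:
            return False, rule["on_deny"]
    return True, "ok"
```

In this sketch a tool that appears on the server after an update has no entry under `tools`, so it falls through to the default and is rejected until someone adds it on purpose.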
2. Classify tools by risk
Not all tools are equal. A list_customers call is harmless. A create_refund moves money. A delete_customer is irreversible. Group tools by risk tier and apply progressively stricter controls.
| Tier | Examples | Controls |
|---|---|---|
| Read | list_customers, retrieve_balance, search_stripe_documentation | Rate limit only |
| Write | create_customer, create_product, update_dispute | Rate limit + argument validation |
| Financial | create_refund, create_invoice, create_payment_link | Amount caps + daily limits + argument validation |
| Destructive | delete_customer, cancel_subscription | Block or require human approval |
This classification drives every other decision in your policy. Do it first.
3. Set spend limits
Cumulative caps catch runaway loops. A single-call cap of $50 does not help if the agent issues 500 of them. You need per-call limits AND rolling daily totals.
tools:
  create_refund:
    rules:
      - name: "max single refund"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 5000
        on_deny: "Single refund cannot exceed $50.00"
      - name: "daily refund cap"
        conditions:
          - path: "state.create_refund.daily_total"
            op: "lt"
            value: 50000
        on_deny: "Daily refund cap of $500.00 reached"
        state:
          counter: "daily_total"
          window: "day"
          increment_from: "args.amount"
The state block creates a rolling counter that accumulates the args.amount from every allowed call. When the daily total hits $500, the tool is blocked for the rest of the day — regardless of individual amounts.
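The interaction between the per-call cap and the cumulative counter can be sketched in Python. This is an illustration under stated assumptions, not Intercept's code: the `DailySpendGuard` class is hypothetical, the counter resets on a calendar-day boundary, and the daily condition is checked against the total before the current call is added.

```python
from datetime import date


class DailySpendGuard:
    """Per-call cap plus a cumulative daily cap, both in cents."""

    def __init__(self, per_call_cap: int, daily_cap: int) -> None:
        self.per_call_cap = per_call_cap
        self.daily_cap = daily_cap
        self._day = None
        self._total = 0

    def check(self, amount: int, today: date) -> tuple[bool, str]:
        if today != self._day:
            # Assumed behaviour: a new calendar day resets the running total.
            self._day, self._total = today, 0
        if amount > self.per_call_cap:
            return False, "Single refund cannot exceed cap"
        if self._total >= self.daily_cap:
            # Mirrors the "lt" condition: allowed only while the total is below the cap.
            return False, "Daily refund cap reached"
        # Only allowed calls contribute to the total.
        self._total += amount
        return True, "ok"
```

With `DailySpendGuard(5000, 50000)`, ten refunds of 5000 pass and the eleventh is denied. Because the check here runs before the increment, the total can overshoot the daily cap by at most one per-call cap, which is another reason to keep both rules.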
For MCP servers that use the Machine Payments Protocol (MPP), you can use the spend shorthand which evaluates costs from payment challenges:
tools:
  generate_image:
    rules:
      - name: "image budget"
        spend:
          per_call: 5.00
          daily: 50.00
4. Rate limit everything
Rate limits serve two purposes: burst limits catch tight loops (agent retrying the same call 100 times in a second), and daily limits catch slow accumulation.
tools:
  create_customer:
    rules:
      - name: "burst protection"
        rate_limit: 10/minute
        on_deny: "Too many customer creations -- slow down"
      - name: "daily cap"
        rate_limit: 100/day
        on_deny: "Daily customer creation limit reached"
  "*":
    rules:
      - name: "global rate limit"
        rate_limit: 60/minute
        on_deny: "Global rate limit exceeded"
The wildcard "*" rule applies to every tool call across the entire server. Even if individual tools have generous limits, the global limit prevents aggregate abuse. If your agent is making more than 60 tool calls per minute, something is wrong.
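One way to picture how a per-tool limit and the global limit combine is a sliding-window limiter. This Python sketch is an assumption about the general technique, not a description of how Intercept counts calls; `SlidingWindowLimiter` and `allow_call` are hypothetical names.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._calls = deque()

    def has_room(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        return len(self._calls) < self.limit

    def record(self, now: float) -> None:
        self._calls.append(now)


def allow_call(limiters: list, now: float) -> bool:
    """A call passes only if every applicable limiter has room."""
    if all(limiter.has_room(now) for limiter in limiters):
        # Record only after all checks pass, so denied calls consume no budget.
        for limiter in limiters:
            limiter.record(now)
        return True
    return False
```

Each call is checked against its own tool's limiter and the shared global one, so a burst spread across many tools still hits the global ceiling.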
5. Validate arguments
Checking that a value is under a threshold is not enough. You need to validate the shape of arguments — restrict currencies, allowlist recipients, and block dangerous patterns.
tools:
  create_charge:
    rules:
      - name: "allowed currencies only"
        conditions:
          - path: "args.currency"
            op: "in"
            value: ["usd", "gbp", "eur"]
        on_deny: "Only USD, GBP, and EUR charges are permitted"
      - name: "description must exist"
        conditions:
          - path: "args.description"
            op: "exists"
            value: true
        on_deny: "Charges must include a description"
  send_email:
    rules:
      - name: "internal recipients only"
        conditions:
          - path: "args.to"
            op: "regex"
            value: "^[a-zA-Z0-9._%+-]+@(acme\\.com|acme\\.co\\.uk)$"
        on_deny: "Emails can only be sent to internal addresses"
The regex operator is particularly useful for constraining free-text fields. If your agent should only send emails to internal addresses, enforce it. If database queries should only touch certain tables, enforce it. Do not rely on the system prompt to constrain these — a prompt injection can override instructions, but it cannot override a regex match.
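The three operators used above (`in`, `exists`, `regex`) can be sketched as a small condition checker in Python. This only illustrates the idea and makes no claim about Intercept's actual semantics; `lookup`, `check`, and the rule that a missing field fails every value check are assumptions.

```python
import re
from typing import Any

_MISSING = object()


def lookup(args: dict, path: str) -> Any:
    """Resolve a dotted path such as "args.to" against the call arguments."""
    node: Any = {"args": args}
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def check(cond: dict, args: dict) -> bool:
    value = lookup(args, cond["path"])
    op, expected = cond["op"], cond["value"]
    if op == "exists":
        return (value is not _MISSING) == expected
    if value is _MISSING:
        return False  # assumed: a missing field never satisfies a value check
    if op == "in":
        return value in expected
    if op == "regex":
        return isinstance(value, str) and re.search(expected, value) is not None
    raise ValueError(f"unsupported operator: {op}")


INTERNAL_ONLY = {
    "path": "args.to",
    "op": "regex",
    "value": r"^[a-zA-Z0-9._%+-]+@(acme\.com|acme\.co\.uk)$",
}
CURRENCIES = {"path": "args.currency", "op": "in", "value": ["usd", "gbp", "eur"]}
HAS_DESCRIPTION = {"path": "args.description", "op": "exists", "value": True}
```

The anchors matter: without `^` and `$`, an address like `alice@acme.com.evil.io` would pass. One Python-specific caveat: `$` also matches just before a trailing newline, so a Python port would be safer with `\Z` or `re.fullmatch`.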
6. Require approval for high-risk actions
Some actions should never happen without a human in the loop. Rather than blocking them entirely, you can hold them for approval — the agent pauses, a human reviews, and work continues.
version: "1"
default: "deny"
approvals:
  default_timeout: 15m
tools:
  cancel_subscription:
    rules:
      - name: "require approval for cancellations"
        action: "require_approval"
        approval_timeout: 15m
        on_deny: "Subscription cancellations require human approval"
  create_refund:
    rules:
      - name: "large refund approval"
        action: "require_approval"
        conditions:
          - path: "args.amount"
            op: "gt"
            value: 10000
        on_deny: "Refunds over $100.00 require human approval"
      - name: "auto-allow small refunds"
        action: "evaluate"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 10000
This creates a two-tier system: refunds of $100 or less flow through automatically (subject to other limits), while anything above $100 pauses for approval. The agent is told to wait, and a human can approve or deny via the terminal or a webhook.
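The routing decision for refunds can be expressed as a three-way outcome. The Python sketch below is hypothetical (`Decision`, `decide_refund`, and `resolve_hold` are illustrative names), and it assumes that an approval request which times out is treated as a denial, which the text above does not state.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    HOLD = "hold"  # paused until a human approves or denies
    DENY = "deny"


APPROVAL_THRESHOLD = 10000  # cents; the $100.00 boundary from the policy above


def decide_refund(args: dict) -> Decision:
    amount = args.get("amount")
    # Reject missing, non-integer, or non-positive amounts outright.
    if not isinstance(amount, int) or isinstance(amount, bool) or amount <= 0:
        return Decision.DENY
    if amount > APPROVAL_THRESHOLD:
        return Decision.HOLD
    return Decision.ALLOW


def resolve_hold(approved: Optional[bool]) -> Decision:
    """None means the approval window elapsed with no answer (assumed: deny)."""
    return Decision.ALLOW if approved is True else Decision.DENY
```

Treating silence as a denial keeps the approval path consistent with the fail-closed principle: nothing high-risk proceeds just because nobody was watching.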
7. Hide tools you don’t need
Most MCP servers expose far more tools than any single agent needs. A Stripe MCP server has 27 tools. Your customer support agent needs maybe 5. Every unnecessary tool is attack surface — and wasted context window tokens.
version: "1"
description: "Support agent -- minimal Stripe access"
default: "deny"
hide:
  - delete_customer
  - transfer_repository
  - create_payment_link
  - update_subscription
  - cancel_subscription
tools:
  list_customers:
    rules: []
  search_stripe_resources:
    rules: []
  create_refund:
    rules:
      - name: "refund cap"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 5000
        on_deny: "Refunds over $50.00 require a human"
Hidden tools are stripped from the tools/list response — the agent never even knows they exist. This is strictly better than deny-by-default alone, because a denied tool still occupies context window space and the agent may waste turns attempting to call it.
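Conceptually, hiding is a filter applied to the server's `tools/list` response before the agent sees it. The Python sketch below assumes the standard MCP JSON-RPC shape (`result.tools` as a list of objects with a `name` field); `strip_hidden` is a hypothetical helper, not Intercept's code.

```python
import copy


def strip_hidden(response: dict, hidden: set) -> dict:
    """Return a copy of a tools/list response without the hidden tools."""
    filtered = copy.deepcopy(response)
    result = filtered.get("result")
    # Error responses and unexpected shapes pass through untouched.
    if isinstance(result, dict) and isinstance(result.get("tools"), list):
        result["tools"] = [
            tool for tool in result["tools"] if tool.get("name") not in hidden
        ]
    return filtered
```

Filtering the listing only removes discovery. A direct call to a hidden tool by name still has to be rejected, which is why hiding is combined with default: "deny" rather than used in place of it.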
8. Fail closed
If the enforcement layer is down, what happens? If the answer is “everything goes through,” you do not have a security boundary. You have a suggestion.
Intercept is an inline proxy — it sits between the agent and the MCP server. If Intercept is not running, the agent has no path to the server at all. This is fail-closed by architecture, not by configuration. The agent cannot bypass the proxy because the proxy IS the transport.
This matters more than any individual rule. A misconfigured rule is a bug. A bypassable enforcement layer is a design flaw.
9. Audit everything
Every tool call, every policy decision, every argument — logged with full context. When something goes wrong, you need to reconstruct exactly what happened.
Intercept logs every decision to stderr by default:
[INTERCEPT] ALLOWED tool=list_customers args={"limit":10}
[INTERCEPT] DENIED tool=create_refund rule="daily refund cap" args={"amount":50000}
[INTERCEPT] HELD tool=cancel_subscription rule="require approval" id=abc123
In production, pipe these to your log aggregator. Every allowed call, every denial, every approval hold — all with the full argument payload. This is not optional. Without an audit trail, you have no way to investigate incidents, tune policies, or prove compliance.
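Before shipping these lines to an aggregator, it can help to turn them into structured records. The Python parser below is inferred only from the three sample lines above; the real log format may include fields or orderings not shown here, so treat the `LINE` pattern and `parse_line` as assumptions.

```python
import json
import re

# Pattern inferred from the sample lines only; not an official format spec.
LINE = re.compile(
    r'^\[INTERCEPT\] (?P<decision>ALLOWED|DENIED|HELD)'
    r' tool=(?P<tool>\S+)'
    r'(?: rule="(?P<rule>[^"]*)")?'
    r'(?: args=(?P<args>\{.*\}))?'
    r'(?: id=(?P<id>\S+))?$'
)


def parse_line(line: str):
    """Return a dict for a recognised log line, or None for anything else."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    record = {k: v for k, v in match.groupdict().items() if v is not None}
    if "args" in record:
        # The argument payload is JSON in the samples, so decode it.
        record["args"] = json.loads(record["args"])
    return record
```

Structured records make it straightforward to query for every denial of a given rule or every call that touched a given customer during an incident review.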
10. Shadow mode first
New policies should be tested before they enforce. Shadow mode evaluates every rule and logs what would have happened, but lets all calls through.
version: "1"
mode: "shadow"
tools:
  create_refund:
    rules:
      - name: "refund cap"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 5000
        on_deny: "Refunds over $50.00 require a human"
      - name: "daily refund cap"
        conditions:
          - path: "state.create_refund.daily_total"
            op: "lt"
            value: 50000
        on_deny: "Daily refund cap of $500.00 reached"
        state:
          counter: "daily_total"
          window: "day"
          increment_from: "args.amount"
Run shadow mode for a few days against real traffic. Review the logs. Look for false positives — legitimate calls that would have been blocked. Adjust thresholds. Then flip to enforcement.
The deployment sequence is: shadow -> review logs -> adjust -> enforce. Skipping this step means your first enforcement run is also your first test, and your users discover the bugs.
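The difference between shadow and enforcing mode comes down to one branch. In this Python sketch the `WOULD_DENY` label, `apply_mode`, and `would_deny_counts` are hypothetical and do not reflect Intercept's actual log output; any mode other than "shadow" is assumed to enforce.

```python
from collections import Counter


def apply_mode(mode: str, rule_passed: bool, rule_name: str, audit: list) -> bool:
    """Return True if the call should be forwarded to the MCP server."""
    if rule_passed:
        audit.append(("ALLOWED", rule_name))
        return True
    if mode == "shadow":
        # Record what would have happened, but let the call through.
        audit.append(("WOULD_DENY", rule_name))
        return True
    # Any other mode enforces, so an unknown mode value still blocks.
    audit.append(("DENIED", rule_name))
    return False


def would_deny_counts(audit: list) -> Counter:
    """Count shadow-mode denials per rule to spot likely false positives."""
    return Counter(rule for outcome, rule in audit if outcome == "WOULD_DENY")
```

A rule that dominates `would_deny_counts` after a few days of real traffic is the first candidate for a threshold review before enforcement is switched on.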
Complete example: securing Stripe for a customer support agent
Here is a production-ready policy for a customer support agent that needs to look up customers, search resources, and issue small refunds:
version: "1"
description: "Customer support agent -- Stripe MCP server"
default: "deny"
hide:
  - create_charge
  - create_payment_link
  - create_invoice
  - create_invoice_item
  - finalize_invoice
  - create_coupon
  - create_price
  - create_product
  - cancel_subscription
  - update_subscription
tools:
  # --- Read tools (low risk) ---
  list_customers:
    rules: []
  list_charges:
    rules: []
  list_invoices:
    rules: []
  search_stripe_resources:
    rules: []
  search_stripe_documentation:
    rules: []
  retrieve_balance:
    rules: []
  # --- Write tools (medium risk) ---
  update_dispute:
    rules:
      - name: "dispute update limit"
        rate_limit: 10/hour
        on_deny: "Dispute update rate limit reached"
  # --- Financial tools (high risk) ---
  create_refund:
    rules:
      - name: "max single refund"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 5000
        on_deny: "Single refund cannot exceed $50.00"
      - name: "daily refund count"
        conditions:
          - path: "state.create_refund.daily_count"
            op: "lt"
            value: 20
        on_deny: "Daily refund limit (20) reached"
        state:
          counter: "daily_count"
          window: "day"
      - name: "daily refund total"
        conditions:
          - path: "state.create_refund.daily_total"
            op: "lt"
            value: 50000
        on_deny: "Daily refund cap of $500.00 reached"
        state:
          counter: "daily_total"
          window: "day"
          increment_from: "args.amount"
  # --- Global limits ---
  "*":
    rules:
      - name: "global rate limit"
        rate_limit: 60/minute
        on_deny: "Rate limit: maximum 60 tool calls per minute"
This policy gives the agent exactly what it needs and nothing more. It can look up customer data freely (rate-limited globally), update disputes at a moderate pace, and issue refunds within tight guardrails. It cannot create charges, send invoices, cancel subscriptions, or do anything else. Ten high-risk tools are hidden entirely — the agent does not know they exist.
Common Mistakes When Deploying AI Agents to Production
Over-permissive defaults. The most common mistake is default: "allow" or no default at all. This means every new tool on the server is automatically available to your agent. MCP servers update regularly. New tools appear. With allow-by-default, your agent’s capabilities expand without your knowledge or consent.
No rate limits. “The agent would never call the same tool 500 times” — until it does, because of a retry loop, a hallucination, or a prompt injection that says “keep trying until it works.” Rate limits are cheap insurance against expensive loops.
Trusting the prompt. System prompts are not security controls. They are suggestions that can be overridden by prompt injection, jailbreaks, or simply model confusion. If you would not trust a junior employee’s verbal promise to follow a rule, do not trust a prompt to enforce it. Use rule-based policies.
No audit trail. “We’ll add logging later” is how you end up unable to explain a $10,000 incident. Log every decision from day one. Storage is cheap. Incident response without logs is expensive.
Testing in production. Run shadow mode first. Every time. It takes a few days and saves you from blocking legitimate traffic or, worse, discovering your policy has a gap after something gets through.
Get started
Intercept is open-source and installs in one command:
npx -y @policylayer/intercept init
It auto-detects your MCP client (Claude Code, Cursor, Claude Desktop, VS Code, Windsurf, and others), wraps your MCP servers with policy enforcement, and generates a starter policy file you can customise.
Docs and source: intercept.policylayer.com
Your agent is only as safe as the constraints around it. Ship the policy before you ship the agent.