Medium Risk

filter_sensitive_words

Filter sensitive words from text by replacing them with a replacement string


What filter_sensitive_words does on Sensitive Lexicon

AI agents use filter_sensitive_words to produce filtered versions of text in Sensitive Lexicon, usually as the action step of a workflow after the agent has gathered context. Because the tool is classified as Write, treat every call as one that can create or modify data in your Sensitive Lexicon environment.


Why filter_sensitive_words needs a policy

This tool modifies text content by replacing sensitive words with a substitute string. It produces a transformed version of the input text, a non-destructive modification that fits the Write category. It does not permanently delete data, execute code, or involve financial transactions.
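To make the transformation concrete, here is a minimal Python sketch of word filtering. It assumes a plain list of words, case-insensitive substring matching, and a default replacement of `***`. The function name mirrors the tool, but its signature and matching rules are hypothetical; the Sensitive Lexicon server's real lexicon and matching logic may differ.

```python
import re


def filter_sensitive_words(text: str, words: list, replacement: str = "***") -> str:
    """Return a copy of `text` with each listed word replaced by `replacement`.

    Hypothetical sketch: matching here is case-insensitive substring matching,
    which may not be what the Sensitive Lexicon server does.
    """
    # Drop empty entries (an empty pattern would match everywhere), then try
    # longer words first so "badword" is replaced whole instead of leaving "***word".
    ordered = sorted((w for w in words if w), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    # A function replacement keeps `replacement` literal (no backslash or group expansion).
    return pattern.sub(lambda _match: replacement, text)


message = "My Password is hunter2"
print(filter_sensitive_words(message, ["password", "hunter2"]))  # My *** is ***
print(message)  # My Password is hunter2 (the input string is unchanged)
```

Because `re.sub` returns a new string, the caller's original text stays intact. Substring matching also catches words embedded in longer words; a production filter might add word boundaries.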

From the tool's definition: "Filter sensitive words from text by replacing them with a replacement string."

Documented attack patterns abuse exactly the kind of access filter_sensitive_words gives an agent:

How to control filter_sensitive_words

PolicyLayer is an MCP gateway — it sits between your AI agents and Sensitive Lexicon, and nothing reaches the server without passing your rules. This is the rule we recommend for filter_sensitive_words:

policy.json
```json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "filter_sensitive_words": {
      "limits": [
        {
          "counter": "filter_sensitive_words_rate",
          "window": "minute",
          "max": 30,
          "scope": "grant"
        }
      ]
    }
  }
}
```

filter_sensitive_words stays usable but is capped at 30 calls per minute, so an agent stuck in a loop can't make hundreds of changes a minute. Everything else on the server is denied unless you say otherwise.
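One way to read the default-deny rule is as a lookup. A tool named under `tools` gets its own rule, including its limits. Any other tool falls through to `default`. The minimal Python sketch below assumes only the fields shown in the recommended policy. The helper `rule_for` and the tool name `some_other_tool` are hypothetical, and this is not PolicyLayer's evaluator.

```python
import json
from typing import Optional

# The recommended policy from this page, inlined so the sketch runs on its own.
POLICY_JSON = """
{
  "version": "1",
  "default": "deny",
  "tools": {
    "filter_sensitive_words": {
      "limits": [
        {"counter": "filter_sensitive_words_rate", "window": "minute",
         "max": 30, "scope": "grant"}
      ]
    }
  }
}
"""


def rule_for(policy: dict, tool_name: str) -> Optional[dict]:
    """Return the rule that governs `tool_name`, or None if the call is denied.

    Hypothetical reading of the policy: a listed tool uses its own rule, and an
    unlisted tool is denied when "default" is "deny" (allowed with no rule otherwise).
    """
    rule = policy.get("tools", {}).get(tool_name)
    if rule is not None:
        return rule
    return None if policy.get("default") == "deny" else {}


policy = json.loads(POLICY_JSON)
print(rule_for(policy, "filter_sensitive_words")["limits"][0]["max"])  # 30
print(rule_for(policy, "some_other_tool"))  # None: denied by default
```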

  1. Create a free account and register Sensitive Lexicon — nothing to install.
  2. Add this policy — paste it, or build it visually.
  3. Point your MCP client (Claude, Cursor, anything) at your gateway URL.

Free to start. No card required.


Questions about filter_sensitive_words

What does the filter_sensitive_words tool do?

Filter sensitive words from text by replacing them with a replacement string. It is categorised as a Write tool in the Sensitive Lexicon MCP Server, which means it can create or modify data. Consider rate limits to prevent runaway writes.

How do I enforce a policy on filter_sensitive_words?

Register the Sensitive Lexicon MCP server in PolicyLayer and add a rule for filter_sensitive_words: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL and the rule is enforced on every call, before it reaches Sensitive Lexicon. Nothing to install.

What risk level is filter_sensitive_words?

filter_sensitive_words is a Write tool with medium risk. Write tools should be rate-limited to prevent accidental bulk modifications.

Can I rate-limit filter_sensitive_words?

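Yes. Add a limits entry to the filter_sensitive_words rule in your PolicyLayer policy. The recommended policy on this page sets "max": 30 and "window": "minute", which limits the tool to 30 calls per minute; setting "max": 10 instead would cap it at 10 calls per minute. Calls are counted per the limit's scope ("grant" in the recommended policy), and limits reset automatically.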
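A per-minute cap like this can be modelled as a fixed-window counter. The Python sketch below is a minimal illustration under that assumption. `FixedWindowLimiter` is a hypothetical class, and the injectable clock exists only to make the example deterministic. PolicyLayer may count calls differently, for example with a sliding window.

```python
import time
from collections import defaultdict
from typing import Callable, Optional


class FixedWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each counter key.

    Hypothetical sketch of a cap like "max": 30 / "window": "minute";
    it is not PolicyLayer's implementation.
    """

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock or time.monotonic
        # key -> (index of the window being counted, calls counted in it)
        self._counts: dict = defaultdict(lambda: (-1, 0))

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        current_window, count = self._counts[key]
        if window != current_window:
            # A new window has started, so the count resets automatically.
            current_window, count = window, 0
        if count >= self.max_calls:
            return False
        self._counts[key] = (current_window, count + 1)
        return True


now = [0.0]
limiter = FixedWindowLimiter(max_calls=30, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("filter_sensitive_words_rate") for _ in range(31)]
print(results.count(True), results[-1])  # 30 False
now[0] = 60.0  # the next minute begins
print(limiter.allow("filter_sensitive_words_rate"))  # True
```

The 31st call inside the same minute is refused, and the first call of the next minute is allowed again. Outside a demo, the default `time.monotonic` clock would be used.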

How do I block filter_sensitive_words completely?

Set action: deny in the PolicyLayer policy for filter_sensitive_words. The AI agent will receive a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.
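At the protocol level, MCP tool calls are JSON-RPC 2.0 `tools/call` requests. A gateway can therefore block one by answering with a JSON-RPC error instead of forwarding it. The Python sketch below illustrates only that idea. Several details are assumptions, not PolicyLayer's or Sensitive Lexicon's actual formats: the `gate` function, the `denied` mapping, the error code -32000 (taken from JSON-RPC's implementation-defined server-error range), the message wording, and the `text` argument name.

```python
import json
from typing import Callable

# Hypothetical code from the JSON-RPC implementation-defined range (-32000..-32099);
# PolicyLayer's real error code and message format may differ.
POLICY_VIOLATION = -32000


def gate(request: dict, denied: dict, forward: Callable[[dict], dict]) -> dict:
    """Answer a denied tools/call with a JSON-RPC error; forward everything else.

    `denied` maps tool names to an optional reason string (hypothetical shape).
    """
    if request.get("method") == "tools/call":
        tool = request.get("params", {}).get("name")
        if tool in denied:
            reason = denied[tool] or "blocked by policy"
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "error": {"code": POLICY_VIOLATION,
                          "message": f"Policy violation: {tool} ({reason})"},
            }
    # Allowed calls and non-tool traffic pass through to the real server.
    return forward(request)


call = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
        "params": {"name": "filter_sensitive_words", "arguments": {"text": "hello"}}}
reply = gate(call, {"filter_sensitive_words": "text filtering disabled"},
             forward=lambda r: {"jsonrpc": "2.0", "id": r["id"], "result": {}})
print(json.dumps(reply))
```

The agent receives the error in place of a tool result, and the request never reaches the upstream server.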

What MCP server provides filter_sensitive_words?

filter_sensitive_words is provided by the Sensitive Lexicon MCP server (zephyrpersonal/sensitive-lexicon-mcp). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.

Enforce policy on every Sensitive Lexicon tool call.

Start from Sensitive Lexicon, add the rest of your stack, and see everything your agents can call. Then put policy on all of it.


4 Sensitive Lexicon tools catalogued and risk-classified — across an index of 43,000+ MCP servers.
