Medium Risk

merge_datasets

Merge multiple PentesterFlow datasets into a single file with deduplication and balanced sampling.


AI agents use merge_datasets to create or update resources in OffensiveSET — usually the action step of a workflow, after the agent has gathered context. Every call changes real data in your OffensiveSET environment.

Medium Risk

The tool combines multiple datasets into one output file, which is a data creation/modification action. While the server generates penetration testing datasets (offensive security content), the merge_datasets tool itself performs a data transformation and consolidation operation — a Write action rather than Execute or Destructive.

From the tool's definition: the tool description states that it 'Merge[s] multiple PentesterFlow datasets into a single file'. This creates or modifies a dataset file by combining existing datasets with deduplication and sampling, which is a reversible write operation.

Documented attack patterns abuse exactly the kind of access merge_datasets gives an agent.

PolicyLayer is an MCP gateway — it sits between your AI agents and OffensiveSET, and nothing reaches the server without passing your rules. This is the rule we recommend for merge_datasets:

policy.json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "merge_datasets": {
      "limits": [
        {
          "counter": "merge_datasets_rate",
          "window": "minute",
          "max": 30,
          "scope": "grant"
        }
      ]
    }
  }
}

merge_datasets stays usable, but capped — an agent stuck in a loop can't make hundreds of changes a minute. Everything else on the server is denied unless you say otherwise.
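To make the effect of the cap concrete, here is a minimal sketch of a fixed-window counter in Python. It is an illustration only: the class name FixedWindowLimiter, the scope key "grant-123", and the choice of a fixed-window algorithm are assumptions, not a description of how PolicyLayer counts calls internally.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each scope key."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        # scope key -> (window start time, calls counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, scope_key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(scope_key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired: start a fresh one.
            start, count = now, 0
        if count >= self.max_calls:
            # Over the cap: record the window but reject the call.
            self._windows[scope_key] = (start, count)
            return False
        self._windows[scope_key] = (start, count + 1)
        return True


# Mirror the recommended rule: 30 calls per minute, counted per grant.
# "grant-123" is a hypothetical identifier used only for this example.
limiter = FixedWindowLimiter(max_calls=30, window_seconds=60)
results = [limiter.allow("grant-123") for _ in range(35)]
allowed = sum(results)
```

In this sketch an agent looping on merge_datasets gets its first 30 calls through and is refused for the rest of the window, while calls under a different scope key are counted separately.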

  1. Create a free account and register OffensiveSET — nothing to install.
  2. Add this policy — paste it, or build it visually.
  3. Point your MCP client (Claude, Cursor, anything) at your gateway URL.

Free to start. No card required.

Go deeper

What does the merge_datasets tool do?

It merges multiple PentesterFlow datasets into a single file with deduplication and balanced sampling. It is categorised as a Write tool in the OffensiveSET MCP Server, which means it can create or modify data. Consider rate limits to prevent runaway writes.
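As a rough illustration of what "deduplication and balanced sampling" can mean, the Python sketch below merges lists of records, drops exact duplicates, and then samples the same number of records from each category. It is not OffensiveSET's implementation: the function name merge_datasets_sketch, the record fields category and prompt, and the rule of capping each category at the size of the smallest one are all assumptions made for the example.

```python
import json
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def merge_datasets_sketch(datasets: Iterable[List[dict]],
                          balance_key: str = "category",
                          seed: int = 0) -> List[dict]:
    """Merge records, drop exact duplicates, then sample evenly per category."""
    seen = set()
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            # Canonical JSON makes key order irrelevant when comparing records.
            fingerprint = json.dumps(record, sort_keys=True)
            if fingerprint in seen:
                continue
            seen.add(fingerprint)
            by_category[str(record.get(balance_key, "unknown"))].append(record)

    if not by_category:
        return []

    # Balanced sampling (assumed rule): cap every category at the size
    # of the smallest one so no category dominates the merged file.
    per_category = min(len(records) for records in by_category.values())
    rng = random.Random(seed)
    merged: List[dict] = []
    for category in sorted(by_category):
        merged.extend(rng.sample(by_category[category], per_category))
    return merged


# Hypothetical records; the second dataset repeats one record with a
# different key order, which the fingerprint treats as a duplicate.
a = [{"category": "xss", "prompt": "p1"}, {"category": "sqli", "prompt": "p2"}]
b = [{"prompt": "p1", "category": "xss"}, {"category": "xss", "prompt": "p3"}]
merged = merge_datasets_sketch([a, b])
```

Because each call writes a new merged file, repeated calls with different inputs can produce many outputs quickly, which is the behaviour a rate limit is meant to bound.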

How do I enforce a policy on merge_datasets?

Register the OffensiveSET MCP server in PolicyLayer and add a rule for merge_datasets: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL and the rule is enforced on every call, before it reaches OffensiveSET. Nothing to install.

What risk level is merge_datasets?

merge_datasets is a Write tool with medium risk. Write tools should be rate-limited to prevent accidental bulk modifications.

Can I rate-limit merge_datasets?

Yes. Add an entry to the limits array of the merge_datasets rule in your PolicyLayer policy, as in the policy.json example above. For example, setting max: 10 and window: "minute" limits the tool to 10 calls per minute. Counters are tracked per the configured scope (grant in the example) and reset automatically when the window ends.
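One way to produce a stricter variant is to start from the recommended policy.json shown earlier on this page and lower max. The Python sketch below assumes the limits schema from that example is the one the gateway accepts; the helper with_max_calls is hypothetical and not part of PolicyLayer.

```python
import copy
import json

# The recommended policy from this page, expressed as a Python dict.
recommended = {
    "version": "1",
    "default": "deny",
    "tools": {
        "merge_datasets": {
            "limits": [
                {"counter": "merge_datasets_rate", "window": "minute",
                 "max": 30, "scope": "grant"}
            ]
        }
    },
}


def with_max_calls(policy: dict, tool: str, max_calls: int) -> dict:
    """Return a copy of `policy` with the tool's first limit set to `max_calls`."""
    updated = copy.deepcopy(policy)
    updated["tools"][tool]["limits"][0]["max"] = max_calls
    return updated


# Tighten the cap to 10 calls per minute without touching the original.
stricter = with_max_calls(recommended, "merge_datasets", 10)
policy_json = json.dumps(stricter, indent=2)
print(policy_json)
```

The printed JSON can be pasted in place of the recommended policy; only the max value differs.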

How do I block merge_datasets completely?

Set action: deny in the PolicyLayer policy for merge_datasets. The AI agent will receive a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.

What MCP server provides merge_datasets?

merge_datasets is provided by the OffensiveSET MCP server (pentesterflow/offensiveset). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.

Enforce policy on every OffensiveSET tool call.

Deterministic rules across all 10 OffensiveSET tools. Per-identity grants. Full audit log. Live in minutes. Nothing to install.


10 OffensiveSET tools catalogued and risk-classified — across an index of 42,500+ MCP servers.
