jailbreak_attempt_detector

SERVERGapup Mcp SOURCEhttps://mcp.gapup.io/mcp

Low RISK CLASS

Category Read

Parameters 41 required

Recommended Allowedsee the rule below

Registry record Grade F, identity unverified Pull the record →

This record as markdown: /tools/io-github-getgapup-gapup-mcp/jailbreak-attempt-detector.md

WHAT IT DOES

What jailbreak_attempt_detector does on Gapup Mcp

AI agents call jailbreak_attempt_detector to retrieve information from Gapup Mcp without modifying anything. It is typically the context-gathering step in research, monitoring, and reporting workflows, before the agent takes action elsewhere.

Parameter	Type	Required	Description
`async`	boolean	—	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client ti
`context`	string	—	Optional conversation context for better pattern matching
`message`	string	Yes	User input text to analyze for jailbreak attempts
`threshold`	number	—	Confidence threshold for flagging attempts

Parameters from the server's own tool schema.

RISK

Why jailbreak_attempt_detector is rated Low

This tool reads and analyzes input text to assess risk, returning a classification/score. It has no side effects — it does not modify data, execute code, or take financial action. The severity is medium because it processes potentially sensitive user inputs and its output could be misused (e.g., to fine-tune jailbreak attempts that evade detection), even though the tool itself is purely analytical.

From the tool's definition Detects potential LLM jailbreak attempts by analyzing user input... returning a risk assessment with confidence scores and pattern matches

Attacks that exploit this kind of access

RECOMMENDED RULE

The rule that runs jailbreak_attempt_detector safely

PolicyLayer is an MCP gateway: it sits between your AI agents and Gapup Mcp, and checks every tool call against a rule you set before the call runs. Nothing changes on the server itself. For jailbreak_attempt_detector, this is the rule to start with:

jailbreak_attempt_detector Allowed

jailbreak_attempt_detector is read-only, so it stays allowed. Everything else on the server is denied unless you say otherwise.

View as policy code

policy.json

{
  "version": "1",
  "default": "deny",
  "tools": {
    "jailbreak_attempt_detector": {}
  }
}

ALLOW ONLY THIS TOOL → Instant setup, no code required.

The button opens the PolicyLayer dashboard: create your workspace, connect Gapup Mcp, apply this rule, and every jailbreak_attempt_detector call is checked against it from then on.

FAQ

Questions about jailbreak_attempt_detector

What does the jailbreak_attempt_detector tool do? +

Detects potential LLM jailbreak attempts by analyzing user input against NIST AI Risk Management Framework adversarial patterns. Designed for persona risk assessment, this tool evaluates text for common jailbreak techniques such as prompt injection, role-playing, or obfuscation. Inputs include the user message and optional context, returning a risk assessment with confidence scores and pattern matches. Ideal for real-time moderation in chat applications or API gateways. It is categorised as a Read tool in the Gapup Mcp MCP Server, which means it retrieves data without modifying state.

What parameters does jailbreak_attempt_detector accept? +

jailbreak_attempt_detector accepts 4 parameters: async, context, message, threshold. Required: message. The full parameter table on this page comes from the server's own tool schema.

How do I enforce a policy on jailbreak_attempt_detector? +

Register the Gapup MCP server in PolicyLayer and add a rule for jailbreak_attempt_detector: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL and the rule is enforced on every call, before it reaches Gapup Mcp. Nothing to install.

What risk level is jailbreak_attempt_detector? +

jailbreak_attempt_detector is a Read tool with low risk. Read-only tools are generally safe to allow by default.

Can I rate-limit jailbreak_attempt_detector? +

Yes. Add a rate_limit block to the jailbreak_attempt_detector rule in your PolicyLayer policy. For example, setting max: 10 and window: 60 limits the tool to 10 calls per minute. Rate limits are tracked per agent session and reset automatically.

How do I block jailbreak_attempt_detector completely? +

Set action: deny in the PolicyLayer policy for jailbreak_attempt_detector. The AI agent will receive a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.

What MCP server provides jailbreak_attempt_detector? +

jailbreak_attempt_detector is provided by the Gapup MCP server (https://mcp.gapup.io/mcp). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.

KEEP EXPLORING

More on Gapup, and thousands of servers like it.

This server

All 271 Gapup tools →

Across the catalogue

The MCP Attack Database →

Guides

Roll out MCP under one policy →Data exfiltration →MCP token cost →