High-risk tools in Judges Panel

Severity: High | Server: Judges Panel MCP Server | High-risk tools: 63 of 78

63 of the 78 tools in Judges Panel are classified as high risk. This page profiles those tools specifically, with recommended policy actions and the attack patterns that target them.

Every operation listed below is an action PolicyLayer recommends controlling at the transport layer. Open any tool to see the full profile, risk score, and YAML policy snippet.
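As a rough illustration of what controlling a tool at the transport layer can mean, the sketch below inspects an MCP JSON-RPC message before it reaches the server and rejects `tools/call` requests for denied tools. It is not PolicyLayer's policy format or implementation. The function name `check_tool_call`, the `DENIED_TOOLS` set, and the error message text are assumptions made for this example; the three tool names are taken from the listing on this page.

```python
import json

# Hypothetical deny list for illustration; the names come from the listing below.
DENIED_TOOLS = {"run_command", "execute_sql", "spawn_agent"}


def check_tool_call(raw_message, denied=DENIED_TOOLS):
    """Decide whether a raw JSON-RPC message may be forwarded to the server.

    Returns (allowed, response). When the call is denied, response is a
    JSON-RPC 2.0 error object that can be sent back to the client instead.
    """
    msg = json.loads(raw_message)

    # Only tool invocations are subject to the policy; pass everything else.
    if msg.get("method") != "tools/call":
        return True, None

    name = (msg.get("params") or {}).get("name")
    if name in denied:
        error = {
            "jsonrpc": "2.0",
            "id": msg.get("id"),
            # -32000 lies in the implementation-defined server error range.
            "error": {"code": -32000, "message": f"tool '{name}' blocked by policy"},
        }
        return False, error
    return True, None
```

A proxy sitting between the agent and the server would call `check_tool_call` on each inbound message and forward it only when the first return value is true.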

Tools at high risk

ai-code-review Execute

Optimized for reviewing AI-generated code
benchmark_gate Execute

Run the benchmark suite and check results against quality thresholds. Returns pass/fail with metric details including F1, precision, recall, and detection rate. Use in CI pipeli...
boilerplate-express Execute

Standard Express.js boilerplate patterns
ci-friendly Execute

Optimized for CI pipelines with critical-only findings
Compliance Execute

Focus on compliance, data security, sovereignty, and privacy judges.
Django Execute

Tuned for Django apps — emphasizes template security, ORM misuse, CSRF, admin security.
error-handling-gaps Execute

Async code without error handling (common AI omission)
evaluate_app_builder_flow Execute

Run a 3-step app-builder workflow: tribunal review, plain-language risk translation, and prioritized remediation tasks with AI-fixable P0/P1 items.
evaluate_batch Execute

Evaluate multiple code files in a single call. Returns per-file verdicts with scores and findings, plus aggregate statistics.
evaluate_code Execute

Submit code to the full Judges Panel for evaluation. Handles ALL code types including application code, infrastructure-as-code (Bicep, Terraform, ARM, CloudFormation), and confi...
evaluate_code_single_judge Execute

Submit code to a specific judge for targeted domain analysis. Handles ALL code types including application code, infrastructure-as-code (Bicep, Terraform, ARM, CloudFormation), ...
evaluate_code_streaming Execute

Submit code for streaming evaluation — returns per-judge results as each judge completes, with running aggregate scores. Ideal for long evaluations where you want progressive fe...
evaluate_diff Execute

Evaluate only the changed lines in a code diff. Runs all judges on the full file but filters findings to only those affecting the specified changed lines. Ideal...
evaluate_focused Execute

Run a focused evaluation using only the specified judges. Use this after an initial full evaluation to re-check specific areas — for example, re-run only
evaluate_git_diff Execute

Evaluate code changes from a git diff. Parses the unified diff from a git repository, identifies changed files and lines, and runs the full tribunal on each changed file — filte...
evaluate_policy_aware Execute

Run policy-aware tribunal evaluation with named policy profiles (startup, regulated, healthcare, fintech, public-sector), evidence calibration from runtime metrics, specialty-pe...
evaluate_project Execute

Submit multiple files for project-level analysis. All judges evaluate each file, plus cross-file architectural analysis detects issues like code duplication, in...
evaluate_public_repo_report Execute

Clone a public repository URL, run the full judges panel across source files, and generate a consolidated markdown report.
evaluate_then_fix Execute

Evaluate code and automatically generate fix patches for all findings that have auto-fix support. Returns the evaluation verdict alongside ready-to-apply patches. Use this for a...
evaluate_with_progress Execute

Evaluate code with progressive judge-by-judge reporting. Returns intermediate counts as each judge completes, useful for large files where full tribunal takes time.
evaluate-code Execute

Evaluate a code snippet or file for issues across all judge categories
evaluate-diff Execute

Evaluate a code diff (PR or commit) for introduced issues
example-domains Execute

Example domains/placeholder URLs from training data
excessive-inline-comments Execute

Line-by-line explanatory comments (AI teaching style)
execute_sql Execute

Execute any SQL query on the database
explain_finding Execute

Explain a Judges Panel finding in plain language. Provides OWASP/CWE references, risk context, and remediation guidance based on the rule ID and finding details.
explain-finding Execute

Provide detailed explanation of a specific finding
Express Execute

Tuned for Express.js APIs — emphasizes middleware security, authentication, CORS, and rate limiting.
FastAPI Execute

Tuned for Python FastAPI — focuses on input validation, async patterns, and API security.
Fintech Execute

For financial services — PCI DSS compliance, cryptography, authentication,
fix_code Execute

Evaluate code with the Judges Panel and automatically apply all available auto-fix patches. Returns the fixed code along with a summary of applied and remaining findings. Use th...
generic-naming Execute

Generic variable names (data, result, response, temp, item, value)
Government Execute

For government and public sector — FedRAMP/NIST compliance, data sovereignty,
Healthtech Execute

For healthcare applications — HIPAA compliance, data sovereignty, encryption at rest,
Kubernetes Execute

Tuned for Kubernetes manifests — security contexts, RBAC, resource limits, network policies.
Lenient Execute

Only critical and high severity findings. Good for early development.
minimal Execute

Minimal configuration with only critical findings
missing-tests Execute

Complex implementation file without corresponding test references
Next.js Execute

Tuned for Next.js — covers both server and client security, API routes, SSR/ISR patterns.
onboarding Execute

Gentle review for new team members
Onboarding Execute

Smart defaults for first-time adoption — suppresses noisy absence-based rules,
performance Execute

Focus on performance issues
Performance Execute

Focus on performance, caching, scalability, and concurrency judges.
placeholder-credentials Execute

Placeholder API keys/tokens from AI training data
pr-review Execute

Balanced review for pull requests
Rails Execute

Tuned for Ruby on Rails — emphasizes mass assignment, CSRF, SQL injection, strong params.
re_evaluate_with_context Execute

Re-evaluate code with developer-provided context from a multi-turn conversation. Accepts disputed findings, accepted findings, and additional context to adjust the evaluation. T...
React Execute

Tuned for React/Next.js apps — enables accessibility, XSS protection, disables backend-only judges.
record_feedback Execute

Record user feedback on a finding — mark it as a true positive (tp), false positive (fp), or won
review-project Execute

Full project-level review with cross-file analysis
run_benchmark Execute

Run the full benchmark suite and return a detailed dashboard with per-judge, per-category, and per-difficulty breakdowns. Includes precision, recall, F1, false positive rates, a...
run_command Execute

Execute shell command on the server
SaaS Execute

For multi-tenant SaaS platforms — tenant isolation, rate limiting, scalability,
scaffold_plugin Execute

Generate a starter plugin template for the Judges Panel. Creates a self-contained plugin file with custom rules, optional custom judges, and lifecycle hooks.
security-audit Execute

Deep security review with all severity levels
security-focused Execute

Focus on security vulnerabilities and best practices
spawn_agent Execute

Spawn a new agent to handle a subtask
strict Execute

Strict mode with all judges enabled and low severity threshold
Strict Execute

All judges, all severities. No findings tolerated. Best for production code reviews.
suggest-fix Execute

Generate fix suggestions for detected findings
Terraform Execute

Tuned for Terraform/OpenTofu IaC — focuses on infrastructure security, cloud-readiness, compliance.
todo-placeholder Execute

TODO/FIXME placeholders common in AI-generated code
uniform-comments Execute

Uniform JSDoc/docstring style on every function

Attacks that target this class

High-risk tools in any server share these documented attack patterns. Each links to the full case and the defensive policy.

Destructive Action Autonomy
Runaway Tool Loops
Prompt Injection via Tool Results
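One common mitigation for the runaway-loop pattern named above is a per-session call budget. The sketch below is a minimal version of that idea and is not taken from the linked cases or from any PolicyLayer policy; the class name `CallBudget` and its `max_calls_per_tool` parameter are hypothetical.

```python
from collections import Counter


class CallBudget:
    """Caps how many times each tool may be called within one session."""

    def __init__(self, max_calls_per_tool):
        self.max_calls_per_tool = max_calls_per_tool
        self.counts = Counter()

    def allow(self, tool_name):
        """Return True and record the call, or False once the cap is reached."""
        if self.counts[tool_name] >= self.max_calls_per_tool:
            return False
        self.counts[tool_name] += 1
        return True
```

An enforcement point would call `allow` before forwarding each tool call and refuse the call when it returns false, so an agent stuck in a loop stops after a bounded number of invocations.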
