High-risk tools in Judges Panel
63 of the 78 tools in Judges Panel are classified as high risk. This page profiles those tools specifically, with recommended policy actions and the attack patterns that target them.
Every operation listed below is an action PolicyLayer recommends controlling at the transport layer. Open any tool to see the full profile, risk score, and YAML policy snippet.
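As a rough illustration of what controlling a tool call at the transport layer can look like, the sketch below inspects one JSON-RPC message and returns a decision before it would be forwarded to the server. It assumes tool invocations arrive as `tools/call` requests carrying the tool name in `params.name`. The `POLICY` table, the action names (`allow`, `deny`, `require_approval`), the default-deny choice, and the `command` argument are hypothetical and are not PolicyLayer's actual schema or recommendations for these tools.

```python
import json

# Hypothetical policy table: tool name -> action. This is NOT PolicyLayer's
# schema; it only illustrates a per-tool decision made before forwarding.
POLICY = {
    "run_command": "deny",
    "execute_sql": "deny",
    "spawn_agent": "require_approval",
    "evaluate_code": "allow",
}
DEFAULT_ACTION = "deny"  # assumption: tools without an entry are blocked


def decide(raw_message: str) -> str:
    """Return the policy action for a single JSON-RPC message."""
    msg = json.loads(raw_message)
    # Only tool invocations are subject to policy; other traffic passes through.
    if msg.get("method") != "tools/call":
        return "allow"
    params = msg.get("params") or {}
    tool = params.get("name", "")
    return POLICY.get(tool, DEFAULT_ACTION)


if __name__ == "__main__":
    call = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        # "command" is a made-up argument name for illustration.
        "params": {"name": "run_command", "arguments": {"command": "ls"}},
    })
    print(decide(call))
```

Because the decision is keyed on the tool name alone, a gate like this can sit in a proxy without understanding each tool's arguments; argument-level rules would need a richer policy than this sketch shows.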
Tools at high risk
- ai-code-review (Execute): Optimized for reviewing AI-generated code
- benchmark_gate (Execute): Run the benchmark suite and check results against quality thresholds. Returns pass/fail with metric details including F1, precision, recall, and detection rate. Use in CI pipeli...
- boilerplate-express (Execute): Standard Express.js boilerplate patterns
- ci-friendly (Execute): Optimized for CI pipelines with critical-only findings
- Compliance (Execute): Focus on compliance, data security, sovereignty, and privacy judges.
- Django (Execute): Tuned for Django apps — emphasizes template security, ORM misuse, CSRF, admin security.
- error-handling-gaps (Execute): Async code without error handling (common AI omission)
- evaluate_app_builder_flow (Execute): Run a 3-step app-builder workflow: tribunal review, plain-language risk translation, and prioritized remediation tasks with AI-fixable P0/P1 items.
- evaluate_batch (Execute): Evaluate multiple code files in a single call. Returns per-file verdicts with scores and findings, plus aggregate statistics.
- evaluate_code (Execute): Submit code to the full Judges Panel for evaluation. Handles ALL code types including application code, infrastructure-as-code (Bicep, Terraform, ARM, CloudFormation), and confi...
- evaluate_code_single_judge (Execute): Submit code to a specific judge for targeted domain analysis. Handles ALL code types including application code, infrastructure-as-code (Bicep, Terraform, ARM, CloudFormation), ...
- evaluate_code_streaming (Execute): Submit code for streaming evaluation — returns per-judge results as each judge completes, with running aggregate scores. Ideal for long evaluations where you want progressive fe...
- evaluate_diff (Execute): Evaluate only the changed lines in a code diff. Runs all ${JUDGES.length} judges on the full file but filters findings to only those affecting the specified changed lines. Ideal...
- evaluate_focused (Execute): Run a focused evaluation using only the specified judges. Use this after an initial full evaluation to re-check specific areas — for example, re-run only
- evaluate_git_diff (Execute): Evaluate code changes from a git diff. Parses the unified diff from a git repository, identifies changed files and lines, and runs the full tribunal on each changed file — filte...
- evaluate_policy_aware (Execute): Run policy-aware tribunal evaluation with named policy profiles (startup, regulated, healthcare, fintech, public-sector), evidence calibration from runtime metrics, specialty-pe...
- evaluate_project (Execute): Submit multiple files for project-level analysis. All ${JUDGES.length} judges evaluate each file, plus cross-file architectural analysis detects issues like code duplication, in...
- evaluate_public_repo_report (Execute): Clone a public repository URL, run the full judges panel across source files, and generate a consolidated markdown report.
- evaluate_then_fix (Execute): Evaluate code and automatically generate fix patches for all findings that have auto-fix support. Returns the evaluation verdict alongside ready-to-apply patches. Use this for a...
- evaluate_with_progress (Execute): Evaluate code with progressive judge-by-judge reporting. Returns intermediate counts as each judge completes, useful for large files where full tribunal takes time.
- evaluate-code (Execute): Evaluate a code snippet or file for issues across all judge categories
- evaluate-diff (Execute): Evaluate a code diff (PR or commit) for introduced issues
- example-domains (Execute): Example domains/placeholder URLs from training data
- excessive-inline-comments (Execute): Line-by-line explanatory comments (AI teaching style)
- execute_sql (Execute): Execute any SQL query on the database
- explain_finding (Execute): Explain a Judges Panel finding in plain language. Provides OWASP/CWE references, risk context, and remediation guidance based on the rule ID and finding details.
- explain-finding (Execute): Provide detailed explanation of a specific finding
- Express (Execute): Tuned for Express.js APIs — emphasizes middleware security, authentication, CORS, and rate limiting.
- FastAPI (Execute): Tuned for Python FastAPI — focuses on input validation, async patterns, and API security.
- Fintech (Execute): For financial services — PCI DSS compliance, cryptography, authentication,
- fix_code (Execute): Evaluate code with the Judges Panel and automatically apply all available auto-fix patches. Returns the fixed code along with a summary of applied and remaining findings. Use th...
- generic-naming (Execute): Generic variable names (data, result, response, temp, item, value)
- Government (Execute): For government and public sector — FedRAMP/NIST compliance, data sovereignty,
- Healthtech (Execute): For healthcare applications — HIPAA compliance, data sovereignty, encryption at rest,
- Kubernetes (Execute): Tuned for Kubernetes manifests — security contexts, RBAC, resource limits, network policies.
- Lenient (Execute): Only critical and high severity findings. Good for early development.
- minimal (Execute): Minimal configuration with only critical findings
- missing-tests (Execute): Complex implementation file without corresponding test references
- Next.js (Execute): Tuned for Next.js — covers both server and client security, API routes, SSR/ISR patterns.
- onboarding (Execute): Gentle review for new team members
- Onboarding (Execute): Smart defaults for first-time adoption — suppresses noisy absence-based rules,
- performance (Execute): Focus on performance issues
- Performance (Execute): Focus on performance, caching, scalability, and concurrency judges.
- placeholder-credentials (Execute): Placeholder API keys/tokens from AI training data
- pr-review (Execute): Balanced review for pull requests
- Rails (Execute): Tuned for Ruby on Rails — emphasizes mass assignment, CSRF, SQL injection, strong params.
- re_evaluate_with_context (Execute): Re-evaluate code with developer-provided context from a multi-turn conversation. Accepts disputed findings, accepted findings, and additional context to adjust the evaluation. T...
- React (Execute): Tuned for React/Next.js apps — enables accessibility, XSS protection, disables backend-only judges.
- record_feedback (Execute): Record user feedback on a finding — mark it as a true positive (tp), false positive (fp), or won
- review-project (Execute): Full project-level review with cross-file analysis
- run_benchmark (Execute): Run the full benchmark suite and return a detailed dashboard with per-judge, per-category, and per-difficulty breakdowns. Includes precision, recall, F1, false positive rates, a...
- run_command (Execute): Execute shell command on the server
- SaaS (Execute): For multi-tenant SaaS platforms — tenant isolation, rate limiting, scalability,
- scaffold_plugin (Execute): Generate a starter plugin template for the Judges Panel. Creates a self-contained plugin file with custom rules, optional custom judges, and lifecycle hooks.
- security-audit (Execute): Deep security review with all severity levels
- security-focused (Execute): Focus on security vulnerabilities and best practices
- spawn_agent (Execute): Spawn a new agent to handle a subtask
- strict (Execute): Strict mode with all judges enabled and low severity threshold
- Strict (Execute): All judges, all severities. No findings tolerated. Best for production code reviews.
- suggest-fix (Execute): Generate fix suggestions for detected findings
- Terraform (Execute): Tuned for Terraform/OpenTofu IaC — focuses on infrastructure security, cloud-readiness, compliance.
- todo-placeholder (Execute): TODO/FIXME placeholders common in AI-generated code
- uniform-comments (Execute): Uniform JSDoc/docstring style on every function
Attacks that target this class
High-risk tools in any server share these documented attack patterns. Each links to the full case and the defensive policy.