High-risk tools in Nodebench

High severity · Nodebench MCP Server · 110 of 824 tools

Of the 824 tools in Nodebench, 110 are classified as high risk. This page profiles those tools, with recommended policy actions and the attack patterns that target them.

Every operation listed below is an action PolicyLayer recommends controlling at the transport layer. Open any tool to see the full profile, risk score, and YAML policy snippet.
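As a rough illustration of what controlling these operations at the transport layer could look like, the sketch below inspects an MCP `tools/call` request and returns a decision before the call is forwarded. The `ToolPolicy` class, the `gate_call` function, and the decision labels are hypothetical names chosen for this example, and the per-tool decisions are illustrative only. This is not PolicyLayer's actual policy format or API.

```python
from dataclasses import dataclass, field

# Hypothetical decision labels for this sketch (not PolicyLayer's vocabulary).
ALLOW = "allow"
DENY = "deny"
REQUIRE_APPROVAL = "require_approval"


@dataclass
class ToolPolicy:
    """Maps tool names to a decision, with a default for unlisted tools."""
    rules: dict = field(default_factory=dict)
    default: str = DENY  # assumption: unlisted tools are denied

    def decide(self, tool_name: str) -> str:
        return self.rules.get(tool_name, self.default)


def gate_call(policy: ToolPolicy, request: dict) -> dict:
    """Inspect a JSON-RPC request and decide whether to forward it."""
    # Only tool invocations are gated; other messages pass through.
    if request.get("method") != "tools/call":
        return {"decision": ALLOW, "tool": None}
    tool = request.get("params", {}).get("name", "")
    return {"decision": policy.decide(tool), "tool": tool}


# Illustrative rules only; the tool names come from the list on this page.
policy = ToolPolicy(rules={
    "sandbox_execute": REQUIRE_APPROVAL,
    "run_tests_cli": REQUIRE_APPROVAL,
    "capture_ui_screenshot": ALLOW,
})
```

Defaulting unlisted tools to deny is one possible design choice: with 824 tools on the server, a deny-by-default rule keeps newly added tools from becoming callable before someone has reviewed them.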

Tools at high risk

benchmark_models Execute

Run the same prompt against multiple LLM providers and compare responses. Returns side-by-side results with latency, token usage, and a summary. Useful for model selection, qual...
bootstrap_parallel_agents Execute

Detect whether a target project repo has parallel agent infrastructure and, if not, scaffold everything needed. Scans for task coordination, role configs, oracle testing, contex...
build_banking_packet Execute

Build a banker-readiness packet from the canonical company packet.
build_company_profile_starter Execute

Build a starter PitchBook/Crunchbase-like company profile.
build_founder_operating_model Execute

Build the complete founder operating model: execution order, queue topology, packet routing, source trust policy, progression rubric, and benchmark oracles.
build_research_digest Execute

Generate a digest of new (unseen) articles from RSS feeds. Compares against previously seen articles via SQLite. Returns only new items grouped by category. After generating, ar...
build_shared_context_subscription Execute

Build the exact pull/subscription manifest an agent client should use to watch a packet or packet scope.
build_shared_context_subscription_manifest Execute

Build a filtered snapshot/events/pull manifest for one peer, packet class, producer, scope, or subject so clients can subscribe to only relevant shared-context updates.
build_submission_export Execute

Build a generic submission export from the canonical company packet.
build_temporal_graph Execute

Build a temporal relationship graph for an entity.
burst_capture Execute

Capture N sequential screenshots at fixed intervals using Playwright.
call_driver_tool Execute

Invoke a tool on a connected MCP driver. This proxies the call to the external MCP server (e.g. playwright-mcp or mobile-mcp) and returns the result. Use list_driver_tools to se...
call_llm Execute

Call an LLM model directly and get the response with metrics (tokens, latency). Uses available API keys: Gemini → OpenAI → Anthropic. Useful for eval-driven workflows, model com...
call_openclaw_skill Execute

Run an OpenClaw skill safely through security checks.
call_webmcp_tool Execute

Invoke a WebMCP tool on a connected origin. The tool is executed in the browser page context via page.evaluate(). Args are validated for suspicious patterns and results are scan...
capture_surface_stats Execute

Capture Android SurfaceFlinger stats and logcat for jank analysis (Layer 0 only). Returns janky frame counts, missed vsync, slow UI thread metrics, and filtered logcat entries f...
capture_ui_screenshot Execute

Capture a screenshot of a URL using headless Playwright. Returns the screenshot as an inline image that multimodal agents can see and evaluate directly. Also captures browser co...
compile_decision_packet Execute

Compile entity intelligence into a decision-ready packet.
compile_scenarios Execute

Generate 3-7 future scenario branches for an entity or decision.
compile_tension_model Execute

Model explicit tensions between forces for a decision or entity.
compute_ssim_analysis Execute

Compute block-based SSIM analysis on a set of frame images. Uses 8x8 blocks with parallel ProcessPoolExecutor. Returns SSIM scores, adaptive threshold (max(0.70, median - 2*std)...
connect_mcp_driver Execute

Connect to an external MCP server and make its tools available through nodebench-mcp. Predefined drivers:
connect_openclaw Execute

Connect to an OpenClaw agent with a security policy applied.
connect_webmcp_origin Execute

Connect to a WebMCP-enabled website via Playwright. Navigates to the URL, intercepts navigator.modelContext tool registrations, and makes discovered tools available for invocati...
convex_pre_deploy_gate Execute

Run a comprehensive pre-deployment quality gate. Checks: convex/ directory structure, schema.ts validity, deprecated validator usage, auth configuration, recent audit results, a...
convex_quality_gate Execute

Run a configurable quality gate across all stored audit results. Like SonarQube
delta_self_dogfood Execute

Dogfood NodeBench Delta on itself. Verifies runtime health, setup friction, distribution surfaces, and compounding return loops, then emits a repair-ready delta.market packet.
disconnect_driver Execute

Disconnect from an external MCP driver and shut down its child process. Use this to clean up or to reconnect with different settings.
disconnect_openclaw Execute

Disconnect an OpenClaw session and generate a safety summary.
disconnect_webmcp_origin Execute

Disconnect from a WebMCP origin and close the browser page. Use this to clean up resources or to reconnect with different settings.
dive_auto_discover Execute

Scan the current page DOM and auto-register components in the dive tree. Discovers semantic landmarks (nav, header, footer, forms, modals, tables, sidebars) and counts interacti...
dive_fix_verify Execute

After fixing a bug, verify the fix by re-navigating to the affected route, comparing before/after state, and updating the bug status + changelog. This is the core flywheel step:...
dive_interaction_test Execute

Define and track a structured interaction test for a component. Provide preconditions and a sequence of test steps (action, target, expected outcome). The agent executes each st...
dive_reexplore Execute

Re-traverse a route after code changes to detect regressions and verify fixes. Compares the current state against previously registered components, bugs, and test results for th...
end_openclaw_session Execute

End an OpenClaw session and generate a safety summary.
enforce_merge_gate Execute

Pre-merge validation combining git state, verification cycles, eval runs, test results, and quality gates. Returns a go/no-go merge decision with detailed check results and bloc...
extract_video_frames Execute

Record screen and extract key frames from an Android device (Layers 1+2). Uses adb screenrecord then ffmpeg scene-filtered JPEG extraction. CRITICAL: uses -pix_fmt yuvj420p for ...
founder_local_weekly_reset Execute

One-call convenience: gathers all local context and produces a complete
get_path_replay Execute

Replay a session
gtm_script_builder Execute

Build a starter GTM script for the current founder wedge.
invoke_openclaw_skill Execute

Run an OpenClaw tool safely through security checks.
invoke_view_tool Execute

Invoke a per-view tool on the current or specified view.
judge_request_retry Execute

Request a retry, re-plan, escalation, or stop for a failed subtask.
judge_tool_output Execute

Run the 7-criterion LLM judge on a tool
load_toolset Execute

Dynamically load a toolset into the current session. After loading, the tools become immediately available for use. Based on the
log_interaction Execute

Log and optionally auto-execute an interaction step. If the built-in Playwright browser is active (launched by start_ui_dive), the action is automatically executed in the browse...
manipulate_screenshot Execute

Manipulate a screenshot using sharp (image processing). Supports crop (extract a region), resize, and annotate (draw colored rectangles and labels to highlight areas). Use after...
navigate_to_view Execute

Navigate to a specific view in the NodeBench AI frontend.
nb_start_agent Execute

Start a new agent conversation with an optional initial message.
nb_switch_research_tab Execute

Switch the research hub tab.
nodebench_ask_agent Execute

Send a question to the NodeBench AI agent. Returns a structured response with reasoning and sources.
nodebench_navigate Execute

Navigate to a specific view. Returns the target view
nodebench.claims.verify Execute

Re-run deterministic public/private boundary and source-evidence verification for a stored public claim.
nodebench.research_role Execute

Run or reuse public role research, store public hiring or market claims, and return a compact role dossier.
nodebench.research_run Execute

Run adaptive, evidence-backed research across one or more subjects. Automatically resolves entities, infers scenario facets, selects relevant research angles, reuses precomputed...
open_dive_dashboard Execute

Open the NodeBench UI Dive dashboard in a browser. Shows the full flywheel cycle:
open_local_dashboard Execute

Start the local Daily Brief dashboard server if needed, and return the URL. The dashboard shows Brief metrics, Narrative thread lanes, and Ops status — all from local SQLite.
project_financials Execute

Build 5-year financial projections based on historical data and industry assumptions. Projects: - Revenue with declining growth rates - Operating expenses (gross margin, SG&A, ...
readiness_scan Execute

Run a founder readiness scan against the progression and diligence model.
render_flow_visualization Execute

Render flow visualization with colored bounding boxes for each flow group. Supports overlay on a rendered page image or synthetic dark canvas fallback. Handles wide Figma pages ...
request_execution_approval Execute

Request a human approval gate for a risky execution-trace action. Approval state is written onto the live run so the UI and ledger can show the pending handoff.
run_autonomous_loop Execute

Execute autonomous verification loop with stop conditions. Implements Ralph Wiggum pattern with checkpoints, iteration limits, and timeout. Use for multi-step autonomous tasks t...
run_benchmark_batch Execute

Run a longitudinal benchmark batch. N=1 is a smoke test (1 founder, 1 session).
run_browserstack_benchmark_lane Execute

Return a BrowserStack/browser-automation benchmark lane payload.
run_closed_loop Execute

Track a compile-lint-test-debug closed loop iteration. Record the result of each step. Never present changes without a full green loop.
run_code_analysis Execute

Static analysis on code or text content for security issues, secrets, homograph attacks, ANSI injections, suspicious URLs, and code quality. Returns structured findings with sev...
run_competitor_signal_benchmark Execute

Return a competitor-signal-to-response benchmark lane payload.
run_deep_sim Execute

Run a multi-agent scenario simulation with bounded branching and budget controls. Instantiates agents with personas and incentives, varies conditions across branches, and genera...
run_dogfood_batch_with_judge Execute

Execute the priority 3 dogfood scenarios with automatic LLM judge validation.
run_entity_intelligence_mission Execute

Run a full DeepTrace entity intelligence mission with optional bounded research cell. Unifies relationship mapping, ownership, supply chain, signals, and causal analysis. Pass r...
run_flicker_detection Execute

Run full 4-layer Android UI flicker detection pipeline: SurfaceFlinger stats + logcat (L0), screenrecord (L1), frame extraction + SSIM analysis with adaptive threshold (L2), opt...
run_founder_autonomy_benchmark Execute

Run the weekly founder reset autonomy benchmark lane.
run_graphify Execute

Generate a knowledge graph from a folder of code, docs, papers, or images.
run_judge_loop Execute

Execute a full judge-fix-verify loop: calls a tool, judges the output, and if it fails,
run_mandatory_flywheel Execute

Enforce the mandatory 6-step AI Flywheel verification after any non-trivial change. All 6 steps must pass before work is considered done. Skipping is only allowed for trivial ch...
run_oracle_comparison Execute

Compare actual output against a known-good oracle reference. Based on Anthropic...
run_packet_to_implementation_benchmark Execute

Return a packet-to-implementation benchmark lane payload.
run_quality_gate Execute

Evaluate content or code against a set of boolean rules. Returns pass/fail with specific failures listed. The agent evaluates each rule and passes boolean results — the tool per...
run_recon Execute

Start a reconnaissance research session. Use this at the start of Phase 1 (Context Gathering) to organize research into external sources (SDKs, APIs, blogs) AND internal context...
run_research_cell Execute

Run a bounded re-analysis cell for a DeepTrace entity investigation. Queries existing DeepTrace state through parallel branches (evidence gap analysis, counter-hypothesis, dimen...
run_self_directed_delivery_loop Execute

Run a local-first autonomous delivery loop across exploratory research, planning, implementation commands, dogfood, verification, and judge. Persists one durable run in SQLite a...
run_self_heal Execute

Autonomous self-healing for detected drift issues. Fixes orphaned verification cycles
run_self_maintenance Execute

Run autonomous self-maintenance cycle. Checks TypeScript compilation, documentation sync, tool counts, test coverage, and more. Can auto-fix low-risk issues. Scope: quick (1hr c...
run_signal_sweep Execute

Run a live signal sweep across all data sources (HackerNews, GitHub Trending, Yahoo Finance, ProductHunt). Returns signals sorted by relevance with severity tiers (FLASH/PRIORIT...
run_sync_bridge_flush Execute

Open the outbound websocket bridge, pair or resume the local device, and flush pending approved operations to the paired web account.
run_tests_cli Execute

Execute a shell test command with timeout, capture stdout/stderr, and return structured results. Useful for running test suites, linters, or build commands as part of verificati...
run_visual_qa_suite Execute

End-to-end visual QA pipeline: burst capture → SSIM stability analysis →
runNewsroomPipeline Execute

Trigger the full DRANE newsroom pipeline (Scout > Historian > Analyst > Publisher) for a given topic or entity. Returns the generated narrative analysis.
runSpotFixScan Execute

Scan for common operational issues: stale missions, blocked tasks with met deps, old sniff checks.
sandbox_batch Execute

Execute multiple commands, index all outputs, and run multiple search queries — all in ONE call. This is the highest-efficiency tool: one sandbox_batch replaces N sandbox_execut...
sandbox_execute Execute

Run a shell command, automatically index the output into the sandbox, and return only a summary. The raw stdout/stderr stays in SQLite — only line counts and a preview enter con...
scrapling_crawl Execute

Start a multi-page spider crawl with extraction. Crawls from start URLs, follows links matching a CSS selector, extracts data per page. Returns a session_id to poll with scrapli...
scrapling_crawl_stop Execute

Stop a running crawl session. Pass the session_id from scrapling_crawl. Items collected so far are preserved. Use when you have enough data or need to abort. Requires Scrapling ...
self_implement Execute

Self-implement missing agent infrastructure. Generates implementation plan and code templates for: agent_loop, telemetry, evaluation, verification, multi_channel, self_learning,...
simulate_decision_paths Execute

Run Monte Carlo simulation for founder decisions. Generates multiple random paths to visualize possible future outcomes. Shows average payoff, success/failure rates, best/worst ...
smart_select_tools Execute

LLM-powered tool selection: sends your task description + a compact tool catalog to a fast model (Gemini 3 Flash, GPT-5-mini, or Claude Haiku 4.5) to pick the best 5-10 tools. M...
spawn_openclaw_agent Execute

Start a secure OpenClaw session with safety rules applied.
start_autonomy_benchmark Execute

Start an autonomous capability benchmark. Defines a complex build challenge and tracks agent progress through milestones. Inspired by Anthropic...
start_dogfood_session Execute

Start a new dogfood session for one of the 3 canonical loops (weekly_reset, pre_delegation, company_search). Returns sessionId for subsequent recording.
start_eval_run Execute

Start a new eval run. Define the test batch upfront with test cases (input, intent, expected behavior), then record results as each case is executed. Rule: no change ships witho...
start_execution_run Execute

Start a live Convex-backed execution trace run for a workflow. Creates a task session and trace together so later steps, decisions, evidence, verifications, and approvals all la...
start_ui_dive Execute

Initialize a UI/UX Full Dive session. Auto-launches a headless Playwright browser if installed (zero setup). Navigates to the app URL and optionally auto-discovers page componen...
start_verification_cycle Execute

Start a new 6-phase verification cycle for a non-trivial implementation. Returns the cycle ID and Phase 1 instructions. Call this before declaring any integration, migration, or...
switchTab Execute

Switch between research hub tabs
thompson_pipeline Execute

End-to-end Thompson Protocol pipeline orchestrator. Takes a complex topic and runs it through all 4 agents (Writer → Feynman Editor → Visual Mapper → Anti-Elitism Linter) with q...
transcribe_audio_file Execute

Transcribe a local audio file (MP3/WAV/etc) to text using faster-whisper via Python. Deterministic, no network.
trigger_batch_run Execute

Run a scheduled task right now instead of waiting.
trigger_investigation Execute

When an eval run shows regression, trigger a new verification cycle to investigate. This is how the outer loop feeds the inner loop: regressions trigger 6-phase investigations.
triple_verify Execute

Run triple verification on agent implementation. V1: Internal codebase analysis. V2: External authoritative source validation (Anthropic, OpenAI, LangChain, etc.). V3: Synthesis...
validate_agent_compatibility Execute

Run the agent validation harness — simulates how AI agents (Claude Code,

Attacks that target this class

High-risk tools in any server share these documented attack patterns. Each links to the full case and the defensive policy.

Destructive Action Autonomy
Runaway Tool Loops
Prompt Injection via Tool Results
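
One common mitigation for the runaway tool loop pattern listed above is a per-tool call budget. The sketch below is a minimal sliding-window limiter. The `CallBudget` class, its parameters, and the limits used in the usage lines are assumptions made for illustration, not the defensive policy the page links to.

```python
import time
from collections import defaultdict, deque


class CallBudget:
    """Sliding-window limit on how often each tool may be called."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._calls = defaultdict(deque)  # tool name -> call timestamps

    def allow(self, tool_name: str) -> bool:
        now = self.clock()
        calls = self._calls[tool_name]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # budget exhausted: block or escalate to a human
        calls.append(now)
        return True


# Hypothetical limit: at most 3 calls per tool in any 60-second window.
budget = CallBudget(max_calls=3, window_seconds=60.0)
results = [budget.allow("run_autonomous_loop") for _ in range(4)]
```

Here the first three calls are allowed and the fourth is refused until the earliest call ages out of the window. Budgets are tracked per tool name, so exhausting one tool's budget does not block the others.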
