High-risk tools in Nodebench
90 of the 724 tools in Nodebench are classified as high risk. This page profiles those tools specifically, with recommended policy actions and the attack patterns that target them.
Every operation listed below is an action PolicyLayer recommends controlling at the transport layer. Open any tool to see the full profile, risk score, and YAML policy snippet.
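This page does not describe PolicyLayer's enforcement mechanism or its YAML schema. The sketch below illustrates one thing "controlling at the transport layer" can mean for an MCP server. A proxy inspects each JSON-RPC message before forwarding it and holds `tools/call` requests that name a high-risk tool until they are approved. Only the `tools/call` method and its `params.name` field come from the MCP specification. `HIGH_RISK_TOOLS`, `Decision`, `evaluate_message`, `blocked_response`, the approval set, and the error code are hypothetical names and choices made for illustration.

```python
import json
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical subset of the high-risk tool names listed on this page.
HIGH_RISK_TOOLS = frozenset({
    "sandbox_execute",
    "run_tests_cli",
    "call_driver_tool",
    "run_sync_bridge_flush",
})


@dataclass
class Decision:
    action: str  # "allow", "deny", or "require_approval"
    reason: str


def evaluate_message(raw: str, approved: FrozenSet[str] = frozenset()) -> Decision:
    """Decide what to do with one client-to-server JSON-RPC message."""
    msg = json.loads(raw)
    # Only tool invocations are gated; initialize, tools/list, etc. pass through.
    if msg.get("method") != "tools/call":
        return Decision("allow", "not a tool call")
    params = msg.get("params")
    name = params.get("name") if isinstance(params, dict) else None
    if not isinstance(name, str) or not name:
        return Decision("deny", "tools/call without a tool name")
    if name not in HIGH_RISK_TOOLS:
        return Decision("allow", f"{name} is not classified high risk")
    if name in approved:
        return Decision("allow", f"{name} was approved for this session")
    return Decision("require_approval", f"{name} is high risk")


def blocked_response(msg_id: Optional[int], decision: Decision) -> str:
    """Build a JSON-RPC 2.0 error to return instead of forwarding the call."""
    # -32001 is an arbitrary pick from the implementation-defined server-error range.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": msg_id,
        "error": {"code": -32001, "message": f"{decision.action}: {decision.reason}"},
    })
```

A real deployment would still need to route `require_approval` into a human-in-the-loop flow. The `request_execution_approval` tool listed below describes one such gate. That wiring is outside this sketch.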
Tools at high risk
- benchmark_models (Execute): Run the same prompt against multiple LLM providers and compare responses. Returns side-by-side results with latency, token usage, and a summary. Useful for model selection, qual...
- build_banking_packet (Execute): Build a banker-readiness packet from the canonical company packet.
- build_before_after_memo (Execute): Build a memo showing the before and after path plus the validation rationale.
- build_causal_chain (Execute): Construct a causal chain from temporal observations. Nodes must be in chronological order. Each node represents a cause-effect step with timestamp, label, description, and optio...
- build_claim_graph (Execute): Extract claims from a source packet and link each claim to its evidence. Returns a directed graph of claims with supporting/contradicting evidence, confidence per claim, and wha...
- build_company_packet (Execute): Build the canonical company readiness packet.
- build_company_profile_starter (Execute): Build a starter PitchBook/Crunchbase-like company profile.
- build_diligence_packet (Execute): Build a diligence-oriented export payload from the canonical company packet.
- build_founder_operating_model (Execute): Build the complete founder operating model: execution order, queue topology, packet routing, source trust policy, progression rubric, and benchmark oracles.
- build_investor_packet (Execute): Build an investor-oriented export payload from the canonical company packet.
- build_research_digest (Execute): Generate a digest of new (unseen) articles from RSS feeds. Compares against previously seen articles via SQLite. Returns only new items grouped by category. After generating, ar...
- build_shared_context_subscription (Execute): Build the exact pull/subscription manifest an agent client should use to watch a packet or packet scope.
- build_shared_context_subscription_manifest (Execute): Build a filtered snapshot/events/pull manifest for one peer, packet class, producer, scope, or subject so clients can subscribe to only relevant shared-context updates.
- build_slack_onepager (Execute): Build a Slack-friendly one-page founder report.
- build_submission_export (Execute): Build a generic submission export from the canonical company packet.
- build_temporal_graph (Execute): Build a temporal relationship graph for an entity.
- call_driver_tool (Execute): Invoke a tool on a connected MCP driver. This proxies the call to the external MCP server (e.g. playwright-mcp or mobile-mcp) and returns the result. Use list_driver_tools to se...
- call_webmcp_tool (Execute): Invoke a WebMCP tool on a connected origin. The tool is executed in the browser page context via page.evaluate(). Args are validated for suspicious patterns and results are scan...
- compare_eval_runs (Execute): Compare two eval runs to decide whether a change should ship. Returns side-by-side scores and a deploy/revert recommendation. Rule: no change ships without an eval improvement.
- compile_decision_packet (Execute): Compile entity intelligence into a decision-ready packet.
- compile_environment_spec (Execute): Generate a simulation environment specification from entity intelligence.
- compile_scenarios (Execute): Generate 3-7 future scenario branches for an entity or decision.
- compile_tension_model (Execute): Model explicit tensions between forces for a decision or entity.
- execution-trace-workflow (Execute): Start and maintain a traceable execution run. Use this for any workflow that needs receipts, evidence, decisions, verification, approvals, and a durable audit trail.
- founder_direction_assessment (Execute): Pressure-test a founder direction against team shape, AI stance, build speed, ...
- grade_agent_run (Execute): Grade a single agent run on both outcome quality (task success, regressions, time) and process quality (recon, risk, tests, gates, learnings). Combines deterministic grading fro...
- graphify_report (Execute): Get the GRAPH_REPORT.md analysis from a graphify run. Contains god nodes (most connected), ...
- gtm_script_builder (Execute): Build a starter GTM script for the current founder wedge.
- invoke_openclaw_skill (Execute): Run an OpenClaw tool safely through security checks.
- invoke_view_tool (Execute): Invoke a per-view tool on the current or specified view.
- judge_tool_output (Execute): Run the 7-criterion LLM judge on a tool...
- link_durable_objects (Execute): Create a durable relationship such as screen -> action, workflow -> run, run -> artifact, or outcome -> evidence.
- log_interaction (Execute): Log and optionally auto-execute an interaction step. If the built-in Playwright browser is active (launched by start_ui_dive), the action is automatically executed in the browse...
- navigate_to_view (Execute): Navigate to a specific view in the NodeBench AI frontend.
- nb_start_agent (Execute): Start new agent conversation
- nodebench.research_run (Execute): Start an adaptive, evidence-backed research run on one or more subjects (companies, people, events, topics). Reuses precomputed angles when available. Returns a runId the client...
- preconditions_verified (Execute): Environment, auth, and test data preconditions were verified before the trigger step.
- primary_mission_preserved (Execute): The run stayed focused on the reported bug instead of drifting into unrelated exploration.
- record_dogfood_telemetry (Execute): Record a full telemetry row for a dogfood run. Captures surface, scenario, user role, prompt, tool usage, token estimates, latency, cost, and quality scores.
- request_execution_approval (Execute): Request a human approval gate for a risky execution-trace action. Approval state is written onto the live run so the UI and ledger can show the pending handoff.
- retry_budget_respected (Execute): Retries were bounded and targeted at the failing trigger or precondition, not the whole workflow.
- run_autonomous_loop (Execute): Execute autonomous verification loop with stop conditions. Implements Ralph Wiggum pattern with checkpoints, iteration limits, and timeout. Use for multi-step autonomous tasks t...
- run_benchmark_batch (Execute): Run a longitudinal benchmark batch. N=1 is a smoke test (1 founder, 1 session).
- run_browserstack_benchmark_lane (Execute): Return a BrowserStack/browser-automation benchmark lane payload.
- run_closed_loop (Execute): Track a compile-lint-test-debug closed loop iteration. Record the result of each step. Never present changes without a full green loop.
- run_code_analysis (Execute): Static analysis on code or text content for security issues, secrets, homograph attacks, ANSI injections, suspicious URLs, and code quality. Returns structured findings with sev...
- run_competitor_signal_benchmark (Execute): Return a competitor-signal-to-response benchmark lane payload.
- run_deep_sim (Execute): Run a multi-agent scenario simulation with bounded branching and budget controls. Instantiates agents with personas and incentives, varies conditions across branches, and genera...
- run_dogfood_batch_with_judge (Execute): Execute the priority 3 dogfood scenarios with automatic LLM judge validation.
- run_entity_intelligence_mission (Execute): Run a full DeepTrace entity intelligence mission with optional bounded research cell. Unifies relationship mapping, ownership, supply chain, signals, and causal analysis. Pass r...
- run_flicker_detection (Execute): Run full 4-layer Android UI flicker detection pipeline: SurfaceFlinger stats + logcat (L0), screenrecord (L1), frame extraction + SSIM analysis with adaptive threshold (L2), opt...
- run_founder_autonomy_benchmark (Execute): Run the weekly founder reset autonomy benchmark lane.
- run_graphify (Execute): Generate a knowledge graph from a folder of code, docs, papers, or images.
- run_judge_loop (Execute): Execute a full judge-fix-verify loop: calls a tool, judges the output, and if it fails, ...
- run_mandatory_flywheel (Execute): Enforce the mandatory 6-step AI Flywheel verification after any non-trivial change. All 6 steps must pass before work is considered done. Skipping is only allowed for trivial ch...
- run_oracle_comparison (Execute): Compare actual output against a known-good oracle reference. Based on Anthropic...
- run_packet_to_implementation_benchmark (Execute): Return a packet-to-implementation benchmark lane payload.
- run_quality_gate (Execute): Evaluate content or code against a set of boolean rules. Returns pass/fail with specific failures listed. The agent evaluates each rule and passes boolean results; the tool per...
- run_recon (Execute): Start a reconnaissance research session. Use this at the start of Phase 1 (Context Gathering) to organize research into external sources (SDKs, APIs, blogs) AND internal context...
- run_research_cell (Execute): Run a bounded re-analysis cell for a DeepTrace entity investigation. Queries existing DeepTrace state through parallel branches (evidence gap analysis, counter-hypothesis, dimen...
- run_self_directed_delivery_loop (Execute): Run a local-first autonomous delivery loop across exploratory research, planning, implementation commands, dogfood, verification, and judge. Persists one durable run in SQLite a...
- run_self_heal (Execute): Autonomous self-healing for detected drift issues. Fixes orphaned verification cycles...
- run_self_maintenance (Execute): Run autonomous self-maintenance cycle. Checks TypeScript compilation, documentation sync, tool counts, test coverage, and more. Can auto-fix low-risk issues. Scope: quick (1hr c...
- run_signal_sweep (Execute): Run a live signal sweep across all data sources (HackerNews, GitHub Trending, Yahoo Finance, ProductHunt). Returns signals sorted by relevance with severity tiers (FLASH/PRIORIT...
- run_sync_bridge_flush (Execute): Open the outbound websocket bridge, pair or resume the local device, and flush pending approved operations to the paired web account.
- run_tests_cli (Execute): Execute a shell test command with timeout, capture stdout/stderr, and return structured results. Useful for running test suites, linters, or build commands as part of verificati...
- run_visual_qa_suite (Execute): End-to-end visual QA pipeline: burst capture → SSIM stability analysis → ...
- sandbox_batch (Execute): Execute multiple commands, index all outputs, and run multiple search queries, all in ONE call. This is the highest-efficiency tool: one sandbox_batch replaces N sandbox_execut...
- sandbox_execute (Execute): Run a shell command, automatically index the output into the sandbox, and return only a summary. The raw stdout/stderr stays in SQLite; only line counts and a preview enter con...
- scaffold_nodebench_project (Execute): Create a complete project template pre-configured for nodebench-mcp. Generates: package.json, AGENTS.md, .mcp.json, .parallel-agents/, .github/workflows/, tsconfig.json, .gitign...
- scaffold_research_pipeline (Execute): Generate a complete, standalone Node.js project for an automated research digest pipeline. Creates: package.json, main script (RSS subscribe → fetch → digest → email), cron setu...
- scrapling_crawl (Execute): Start a multi-page spider crawl with extraction. Crawls from start URLs, follows links matching a CSS selector, extracts data per page. Returns a session_id to poll with scrapli...
- scrapling_crawl_stop (Execute): Stop a running crawl session. Pass the session_id from scrapling_crawl. Items collected so far are preserved. Use when you have enough data or need to abort. Requires Scrapling ...
- self_implement (Execute): Self-implement missing agent infrastructure. Generates implementation plan and code templates for: agent_loop, telemetry, evaluation, verification, multi_channel, self_learning,...
- simulate_decision_paths (Execute): Run Monte Carlo simulation for founder decisions. Generates multiple random paths to visualize possible future outcomes. Shows average payoff, success/failure rates, best/worst ...
- solve_green_polygon_area_from_image (Execute): Compute the area of a green filled polygon in an image by pixel segmentation, calibrating pixel-to-unit scale from nearby purple length labels. Deterministic, no network.
- spawn_openclaw_agent (Execute): Start a secure OpenClaw session with safety rules applied.
- start_autonomy_benchmark (Execute): Start an autonomous capability benchmark. Defines a complex build challenge and tracks agent progress through milestones. Inspired by Anthropic...
- start_component_flow (Execute): Claim a component for traversal by a specific subagent. Marks it as...
- start_dogfood_session (Execute): Start a new dogfood session for one of the 3 canonical loops (weekly_reset, pre_delegation, company_search). Returns sessionId for subsequent recording.
- start_eval_run (Execute): Start a new eval run. Define the test batch upfront with test cases (input, intent, expected behavior), then record results as each case is executed. Rule: no change ships witho...
- start_execution_run (Execute): Start a live Convex-backed execution trace run for a workflow. Creates a task session and trace together so later steps, decisions, evidence, verifications, and approvals all la...
- start_ui_dive (Execute): Initialize a UI/UX Full Dive session. Auto-launches a headless Playwright browser if installed (zero setup). Navigates to the app URL and optionally auto-discovers page componen...
- start_verification_cycle (Execute): Start a new 6-phase verification cycle for a non-trivial implementation. Returns the cycle ID and Phase 1 instructions. Call this before declaring any integration, migration, or...
- track_milestone (Execute): Record a significant milestone (phase complete, deploy, ship, launch, pivot, decision) with optional evidence and metrics.
- trigger_batch_run (Execute): Run a scheduled task right now instead of waiting.
- trigger_investigation (Execute): When an eval run shows regression, trigger a new verification cycle to investigate. This is how the outer loop feeds the inner loop: regressions trigger 6-phase investigations.
- trigger_verify_split (Execute): The action that attempts reproduction is separate from the step that verifies the resulting UI or system state.
- ui-qa-checklist (Execute): UI/UX QA checklist for frontend implementations. Run after any change that touches React components, layouts, or interactions. Guides the agent through component tests, accessib...
- workflowTitle (Execute): Human-readable title for the run
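Several entries above, such as `sandbox_execute` and `run_tests_cli`, run shell commands. For these tools a gate that only checks the tool name is coarse. The Python sketch below shows a finer rule that checks the arguments of each call. It assumes the command arrives in an argument named `command`, which this page does not confirm. It rejects shell metacharacters outright and allows only a short list of programs. `SHELL_TOOLS`, `ALLOWED_PROGRAMS`, `FORBIDDEN_CHARS`, and `check_shell_call` are hypothetical names, and the allowlist is only an example.

```python
import shlex
from typing import Tuple

# Hypothetical: tools whose arguments carry a shell command string.
SHELL_TOOLS = {"sandbox_execute", "run_tests_cli"}
# Hypothetical allowlist of programs an agent may launch.
ALLOWED_PROGRAMS = {"npm", "npx", "pytest", "tsc", "eslint"}
# Characters that let a shell chain, redirect, substitute, or background commands.
FORBIDDEN_CHARS = set(";&|<>`$\n\r")


def check_shell_call(tool: str, arguments: dict) -> Tuple[bool, str]:
    """Return (allowed, reason) for one tools/call to a shell-running tool."""
    if tool not in SHELL_TOOLS:
        return True, "not a shell tool"
    command = arguments.get("command")  # argument name is an assumption
    if not isinstance(command, str) or not command.strip():
        return False, "missing command"
    # Check the raw string first: shlex treats "\n" as plain whitespace.
    bad = sorted(FORBIDDEN_CHARS.intersection(command))
    if bad:
        return False, f"forbidden characters: {bad!r}"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. an unbalanced quote
        return False, f"unparseable command: {exc}"
    if not tokens or tokens[0] not in ALLOWED_PROGRAMS:
        return False, "program is not allowlisted"
    return True, "allowed"
```

The raw-string check runs before `shlex.split` for a specific reason. Without it, `npm test\nrm -rf /` would tokenize into what looks like a single allowed `npm` command. Even with this check, allowlisting `npm` or `npx` still permits arbitrary package scripts. An allowlist of program names is a starting point, not a sandbox.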
Attacks that target this class
High-risk tools in any server share these documented attack patterns. Each links to the full case and the defensive policy.
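The attack cases themselves are not reproduced on this page. Two tools listed above already describe related checks. `call_webmcp_tool` validates args for suspicious patterns, and `run_code_analysis` looks for homograph attacks and ANSI injections. The Python sketch below applies three heuristics of that kind to every string in a tool call's arguments:

- ANSI CSI escape sequences.
- Zero-width or bidirectional control characters.
- Letters from more than one of the Latin, Cyrillic, and Greek scripts in a single string.

The function names and character lists are assumptions chosen for illustration. They are not taken from Nodebench or PolicyLayer.

```python
import re
import unicodedata
from typing import Any, Iterator, List, Tuple

# CSI escape sequences such as "\x1b[31m" (red text) or "\x1b[2J" (clear screen).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Zero-width and bidirectional-control code points often used to hide text.
HIDDEN_CHARS = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}
WATCHED_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}


def iter_strings(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, string) for every string nested in dicts and lists."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(item, f"{path}.{key}")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{i}]")


def scripts_in(text: str) -> set:
    """Return which watched scripts the letters in text belong to."""
    found = set()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            if script in WATCHED_SCRIPTS:
                found.add(script)
    return found


def scan_arguments(arguments: dict) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for path, text in iter_strings(arguments):
        if ANSI_CSI.search(text):
            findings.append(f"{path}: ANSI escape sequence")
        if any(ch in HIDDEN_CHARS for ch in text):
            findings.append(f"{path}: zero-width or bidi control character")
        if len(scripts_in(text)) > 1:
            findings.append(f"{path}: mixed-script text (possible homograph)")
    return findings
```

The mixed-script rule checks whole strings, so it also flags legitimate multilingual text. A stricter check would work per token or use the Unicode confusables data.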