NODEBENCH TOOLS

724 tools from the NodeBench MCP Server, categorised by risk level.

READ 513 tools
Read abandon_cycle Abandon an active verification cycle that will not be completed. Use this to clean up orphaned or stale cyc...
Read accept_shared_task Accept a proposed shared-context task.
Read ack_shared_context Acknowledge that a peer received and accepted a context packet.
Read agent-contract The NodeBench Agent Contract — mandatory behavioral rules for any agent using NodeBench MCP. Embeds the
Read agent-delegation-with-approval-trace Traceable workflow for delegated agent work with approval gates. Use this when a capable agent can operate,...
Read agentCount Number of parallel agents (default: 4)
Read all_tests_green Full test suite passes (static + unit + integration + e2e)
Read analyze_experiment_data Analyze experiment data (CSV or JSON) and generate publication-ready analysis paragraphs. Uses \paragraph{...
Read analyze_figma_flows Full Figma flow analysis pipeline: extract frames from Figma file (depth=3 tree traversal), cluster into fl...
Read analyze_repo Analyze a GitHub repository
Read analyze_screenshot Send a screenshot to a vision-capable AI model for analysis. Accepts base64 image data (from capture_ui_scr...
Read analyze_seo_content Analyze HTML or text content for SEO quality: word count, sentence count, paragraph count, Flesch-Kincaid r...
Read analyze_voice_config Validate a voice pipeline configuration. Checks STT/TTS/LLM compatibility, estimates per-minute costs, iden...
Read anomalies_logged_separately Secondary anomalies or newly discovered bugs were recorded separately from the main verdict.
Read aria_labels_present Interactive elements have accessible names (aria-label, aria-labelledby, or visible text)
Read ask_context Ask against saved NodeBench context first. This searches session memory and the accumulated knowledge base ...
Read assess_risk Assess risk tier for a given action. Returns tier (low/medium/high), reversibility, external impact, and re...
Read audit_openclaw_skills Scan installed OpenClaw tools for security risks: dangerous patterns,
Read baseline_exists A visual baseline screenshot exists for the changed route/component at all target viewports
Read benchmark_voice_latency Calculate theoretical latency for one or more voice pipeline configurations. Breaks down STT, LLM (first to...
Read bind_local_account Record explicit local pairing permission so this device can map durable local context to a specific web use...
Read blocked_infra_classified Infrastructure or environment blockers were classified explicitly instead of being mislabeled as app defects.
Read bootstrap_parallel_agents Detect whether a target project repo has parallel agent infrastructure and, if not, scaffold everything nee...
Read bootstrap-parallel-agents Detect and scaffold parallel agent infrastructure for any project. Scans a target repo for 7 categories of ...
Read burn_rate_sanity Sanity check founder burn against runway and stage expectations.
Read burst_capture Capture N sequential screenshots at fixed intervals using Playwright.
Read call_llm Call an LLM model directly and get the response with metrics (tokens, latency). Uses available API keys: Ge...
Read call_loaded_tool Call a dynamically loaded tool by name. Use this after load_toolset when your client does not automatically...
Read capture_responsive_suite Capture screenshots at all 3 standard responsive breakpoints (mobile 375px, tablet 768px, desktop 1280px) i...
Read capture_surface_stats Capture Android SurfaceFlinger stats and logcat for jank analysis (Layer 0 only). Returns janky frame count...
Read capture_ui_screenshot Capture a screenshot of a URL using headless Playwright. Returns the screenshot as an inline image that mul...
Read cheapest_valid_path Return the cheapest valid workflow path for the stated objective.
Read check_agent_inbox Read unread messages for the current agent session. Filter by category, sender, or priority. Messages are m...
Read check_contract_compliance Analyze an agent session
Read check_design_compliance Check a single .tsx/.ts file for design governance compliance. Runs all banned pattern checks (color, typog...
Read check_dive_drivers Setup wizard for MCP automation drivers. Probes the system for Android SDK (ANDROID_HOME, adb, emulator), i...
Read check_email_setup Diagnostic wizard for email tool configuration. Checks env vars (EMAIL_USER, EMAIL_PASS, etc.), optionally ...
Read check_framework_updates Get a structured checklist of sources to check for framework/SDK updates. Pre-built source lists for: anthr...
Read check_git_compliance Validate branch state, uncommitted changes, and conventional commit compliance. Checks if on a protected br...
Read check_mcp_setup Comprehensive diagnostic wizard for the entire NodeBench MCP. Checks all env vars, API keys, optional npm p...
Read check_openclaw_setup Check if OpenClaw is ready to use: verifies the server is installed,
Read check_page_performance Lightweight page performance check via HTTP fetch (no browser). Measures: response time, content size, comp...
Read check_paper_logic Check academic text for logical issues: contradictions between statements, undefined terms, terminology inc...
Read check_peer_messages Read the direct messages for a peer, optionally restricting to unread items.
Read check_plugin_update_readiness Check whether NodeBench MCP is ready for a version/update push across installer, plugin metadata, and edito...
Read check_skill_freshness Check if registered skills are stale by comparing current source file hashes
Read check_webmcp_setup Check WebMCP prerequisites: Playwright installation, Chromium browser availability, and any cached origin d...
Read check_wordpress_site Detect whether a site runs WordPress and assess its security posture. Checks: WP generator meta tag, wp-con...
Read claim_agent_task Claim a task lock so other parallel agents know you
Read claim_verification_scan Scan high-risk claims against available evidence classes.
Read classify_failure Classify a failure by canonical system layer taxonomy. Tracks symptom, root cause, and system layer for str...
Read claude-code-parallel Guide for using NodeBench MCP with Claude Code
Read cleanup_stale_runs Clean up orphaned eval runs stuck in
Read cluster_figma_flows Cluster Figma frames into flow groups using multi-signal priority cascade: 1) section-based grouping (highe...
Read color_contrast_sufficient Text meets WCAG 2.1 AA contrast ratios: 4.5:1 for normal text, 3:1 for large text (18px+ or 14px+ bold)
Read company_intelligence Company analysis: discover → synthesize → export → track action
Read company-direction-analysis-trace Traceable workflow for capability-to-product-direction analysis grounded in public evidence, credibility fi...
Read compare Compare 2-4 entities using the same report-shaped output. The tool backfills quick diligence packets first ...
Read compare_founder_directions Compare multiple founder directions side by side across strategic angles, issue count, confidence, and reco...
Read compare_options Generate a formatted side-by-side comparison table from scored research results. Takes either raw options o...
Read compare_savings Compare token usage, time savings, and cost estimates. Optionally compare two sessions for before/after ROI...
Read compare_workflow_paths Compare current and optimized workflow paths and quantify likely savings.
Read competitor_brief Competitor research: discover → recon → synthesize → export → track
Read compiles_clean TypeScript compiles with zero errors (tsc --noEmit)
Read component_renders Changed/new components render without errors in tests and browser
Read componentName The component or feature that changed (e.g.
Read compute_calibration Compute calibration bins and Brier aggregates across all resolved forecasts. Returns 10 bins (0-10%, 10-20%...
Read compute_dimension_profile Recompute and persist the DeepTrace dimension profile for an entity. Use after new company evidence, relati...
Read compute_web_stability Compute frame-to-frame stability metrics for a burst capture. Calculates
Read critter_check The accountability partner that wants to know everything — answer
Read csv_aggregate Aggregate values from a local CSV (count/sum/avg/min/max) with optional filters. Deterministic, no network.
Read dark_light_variants_consistent Both dark and light theme variants render correctly without missing theme tokens or invisible text
Read decision_quality_scan Check whether the founder decision has clear criteria, falsifiers, and next actions.
Read delta_brief Generate a
Read delta_compare Side-by-side entity comparison. Produces a delta.diligence comparison packet highlighting differences, stre...
Read delta_diligence Deep entity intelligence teardown. Produces a delta.diligence packet with signals, risks, opportunities, an...
Read delta_handoff Generate a delegation packet for handing off work to another agent or teammate. Produces a delta.handoff pa...
Read delta_packets List recent delta packets. View your packet history, filter by type, and track lineage.
Read delta_retain Preserve context for future sessions. Produces a delta.retain packet storing important notes, decisions, me...
Read delta_review Reconcile a forecast or recommendation against reality. Produces a delta.review packet so the next decision...
Read delta_scan Run a self-diligence market coverage scan. Produces a delta.market packet analyzing what NodeBench Delta co...
Read delta_self_dogfood Dogfood NodeBench Delta on itself. Verifies runtime health, setup friction, distribution surfaces, and comp...
Read delta_watch Add, remove, or list entities on your watchlist. Watched entities are checked for material changes during d...
Read Deployment Ship changes with full quality verification
Read design_voice_pipeline Given requirements (latency, privacy, platform, budget), recommend an optimal STT/TTS/LLM stack. Returns to...
Read detect_company_mode Classify a request as own-company, external-company, or mixed-comparison mode before packet routing.
Read detect_contradictions Detect contradictions across multiple sources for an entity.
Read detect_repeated_questions Analyze causal_events to find patterns where the user/agent asks the same strategic question repeatedly. Us...
Read detect_subvertical Detect the founder subvertical from the query and context.
Read detect_temporal_signal Analyze observations in a stream and detect temporal signals: momentum (sustained directional trend), regim...
Read detect_vertical Detect the founder vertical and subvertical from the query and context.
Read diff_crawl Before/after site comparison. First call captures baseline. Second call diffs against baseline. Shows added...
Read diff_outputs Compare two text or JSON outputs and produce a structured diff with similarity score. Use for model compari...
Read diff_screenshots Compare two images structurally using pixel-level analysis. Computes perceptual similarity, difference regi...
Read discover_infrastructure Discover existing agent infrastructure in the codebase. Scans for agent loops, telemetry, evaluation, verif...
Read discover_tools Multi-modal tool search engine with 14 scoring strategies: keyword, fuzzy (typo-tolerant), n-gram (partial ...
Read discover_vision_env Discover available vision-capable AI SDKs and API keys in the current environment. Returns which providers ...
Read dismiss_alert Dismiss an important change alert so it no longer appears in proactive alerts. Sets the status to
Read distribution_surface_scan Scan which distribution surfaces are actually ready right now.
Read dive_auto_discover Scan the current page DOM and auto-register components in the dive tree. Discovers semantic landmarks (nav,...
Read dive_code_locate Find the exact source code location for a bug, component, or design issue. Uses grep/ripgrep to search the ...
Read dive_code_review Generate a structured code review report from all dive findings — similar to CodeRabbit or Augment Code Rev...
Read dive_fix_verify After fixing a bug, verify the fix by re-navigating to the affected route, comparing before/after state, an...
Read dive_interaction_test Define and track a structured interaction test for a component. Provide preconditions and a sequence of tes...
Read dive_link_backend Link a UI component to its backend dependencies. Connect components to API endpoints, Convex queries/mutati...
Read dive_preflight Analyze a project BEFORE starting a UI dive. Scans the project directory to detect: framework (Vite, Next.j...
Read dive_record_test_step Record the actual result of a test step after executing it via the MCP Bridge. Compare expected vs actual, ...
Read dive_reexplore Re-traverse a route after code changes to detect regressions and verify fixes. Compares the current state a...
Read dive_snapshot Capture a screenshot or accessibility snapshot of the current page during a dive session. Requires the buil...
Read dive_walkthrough Generate a comprehensive page-by-page, component-by-component walkthrough document for a dive session. Incl...
Read end_component_flow Complete a component
Read end_dogfood_session End a dogfood session with summary metrics: time-to-first-useful-output, delegation success, packet export ...
Read end_openclaw_session End an OpenClaw session and generate a safety summary.
Read enrich_entity Enrich an entity (company, person, theme, market) with structured intelligence.
Read enrich_recon Retroactively enrich an existing recon session with live web search results. Call this after run_recon when...
Read entity_lookup Quick intelligence on any company, person, or topic. Type a name, get structured facts, signals, and sugges...
Read error_handling_present Error paths are handled, not swallowed silently
Read escalate_shared_task Escalate a shared-context task when the assignee cannot complete it cleanly.
Read eval_scores_improved Eval scores are equal to or better than baseline (use compare_eval_runs)
Read evidence_attached The verdict cites concrete evidence such as screenshots, logs, videos, metrics, or diffs.
Read evidence_gap_scan List missing evidence classes and materials for diligence readiness.
Read extract_figma_frames Extract all frames from a Figma file using depth=3 tree traversal (DOCUMENT -> CANVAS -> SECTION -> FRAME)....
Read extract_fractions_and_simplify_from_image Extract slash-style fractions (e.g. 3/4) from body text in an image and also detect stacked numerator/denom...
Read extract_patent_claims Extract likely patent and IP claims from source text.
Read extract_publication_metadata Extract publication-oriented metadata from source text.
Read extract_regulatory_artifacts Extract regulatory path signals from source text.
Read extract_structured_data Extract structured JSON data from unstructured text using an LLM. Provide the text and a description of the...
Read extract_trial_evidence Extract trial, study, or lab evidence snippets from source text.
Read extract_variables Identify and weight the key variables driving an entity
Read extract_video_frames Record screen and extract key frames from an Android device (Layers 1+2). Uses adb screenrecord then ffmpeg...
Read fetch_rss_feeds Fetch and parse all registered RSS/Atom feeds (or specific URLs). New articles are stored in SQLite with is...
Read fetch_url Fetch a URL and extract its content as markdown, text, or raw HTML. Useful for reading documentation, blog ...
Read fileUri Input spreadsheet path or URI
Read financial_hygiene_check Return the hidden financial hygiene requirements many founders forget before diligence.
Read find_contradictions_for Find all entities that contradict a given entity or concept.
Read findTools Search available methodology tools by keyword or capability description. Returns matching tool names and de...
Read flag_important_change Flag a detected important change with impact scoring, affected entities, and optional suggested action. Use...
Read focus_visible Focus indicators are visible on all interactive elements when navigated via keyboard
Read follows_existing_patterns New code follows the same patterns as surrounding code
Read forecast_temporal_trend Zero-shot forecasting on numeric time series. Supports naive (last value), linear (regression), and exponen...
Read form_labels_linked All form inputs have associated <label> elements via htmlFor/id or are wrapped in <label>
Read founder_company_naming_pack Generate a founder company naming shortlist and starter profile.
Read founder_deep_context_gather MUST be called before generating or updating a Founder Artifact Packet.
Read founder_delegation_boundary_scan Separate delegable work from founder-only work for the current direction.
Read founder_gaps_detect Detect missing foundations, hidden risks, and weak strategic angles for a founder direction.
Read founder_local_gather Gathers all locally-available context for a founder packet: git log,
Read founder_local_synthesize Takes gathered local context and synthesizes a complete Founder Artifact Packet.
Read founder_materials_check Return the founder materials checklist and missing external-readiness artifacts.
Read founder_next_unlocks List the next progression unlocks required to move the founder to the next stage.
Read founder_ontrack_scorecard Return explicit 2-week and 3-month on-track or off-track scorecards.
Read founder_packet_diff Compares two Founder Artifact Packets and returns a structured diff showing
Read founder_packet_history_diff Compares the most recent Founder Artifact Packet for an entity against
Read founder_packet_validate Validates a draft Founder Artifact Packet against quality gates before saving.
Read founder_readiness_score Return the founder readiness score and a concise interpretation.
Read founder_stage_assess Return the founder progression stage, readiness score, and stage ladder for the current direction.
Read founder_target_customer_map Map the downstream customer groups the company should target first.
Read get_ab_test_report Generate an A/B test comparison report for static vs dynamic toolset loading. Shows session counts, tool co...
Read get_active_forecasts List active forecasts. Optionally filter by those needing refresh (based on last refresh time and frequency).
Read get_agent_role Get the current agent
Read get_annual_retrospective Full year view: quarterly summaries, yearly milestones, category distribution, growth metrics.
Read get_autopilot_status Check the status of scheduled tasks: when the last run happened,
Read get_batch_run_history See the history of past scheduled runs: what was found,
Read get_benchmark_history View historical benchmark batch results. Shows RCA and PRR trends over time.
Read get_benchmark_oracles Return the oracle definitions for founder autonomy and workflow optimization benchmark lanes.
Read get_benchmark_report Get the full detailed report for a specific benchmark batch by batchId.
Read get_boilerplate_status Check what NodeBench infrastructure is already set up in a project vs what
Read get_causal_chain Trace the causality chain from a given event backwards through causedByEventId links. Returns the chain of ...
Read get_compaction_recovery Recovery tool for post-context-compaction state restoration. Call this RIGHT AFTER Claude Code compacts con...
Read get_company_truth Get current canonical company truth from subconscious memory blocks.
Read get_context_bundle Returns the full NodeBench context bundle: pinned identity (mission, wedge, confidence),
Read get_daily_brief_summary Get the latest daily brief summary from local SQLite. Returns dashboard metrics, features, and source summa...
Read get_daily_log Get all tracked actions for a specific date, grouped by session with milestone highlights.
Read get_design_spec Return the full design governance specification as structured JSON. Includes: approved semantic colors, typ...
Read get_design_violations Scan the entire src/ directory for design governance violations. Groups results by file, sorted by severity...
Read get_dimension_profile Fetch the latest persisted DeepTrace dimension profile, including regime label, policy context, confidence,...
Read get_distribution_surfaces Inspect NodeBench MCP distribution surfaces: npm/npx, installer, plugin configs, Smithery, and shared web r...
Read get_dive_report Generate a comprehensive UI/UX Full Dive report for a session. Includes: executive summary, component tree,...
Read get_dive_tree Get the full XML-like component tree for a dive session. Shows all registered components in their hierarchy...
Read get_dogfood_sessions List recent dogfood sessions with their judge scores. Filter by loop type.
Read get_dogfood_telemetry Query dogfood telemetry rows with optional filters. Returns matching rows plus computed averages (tool call...
Read get_drift_report Detect configuration and state drift in the NodeBench system. Checks for orphaned verification
Read get_engine_context_health Returns accumulated knowledge health: learnings count + freshness, conformance trend direction (improving/s...
Read get_entity_graph_summary Get a summary of the knowledge graph: entity counts by type, edge counts by relation, recent additions.
Read get_event_ledger Query the causal event ledger with optional filtering by entityId, eventType, entityType, or correlationId.
Read get_failure_triage Get open failure cases grouped by system layer with frequency counts. The triage board for fixing system gaps.
Read get_figma_design_context Get design context for a specific component or screen from the codebase governance spec. Returns the releva...
Read get_flywheel_status Get the current state of both loops (Verification inner loop and Eval outer loop) and how they connect. Sho...
Read get_forecast_chain Get full audit trail for a forecast: the forecast itself, all evidence entries, update history, and resolut...
Read get_forecast_evidence Query evidence ledger for a forecast. Optionally filter by signal direction (supporting/disconfirming/neutr...
Read get_forecast_track_record Get aggregate Brier scores, calibration summary, and track record statistics across all resolved forecasts.
Read get_founder_execution_order Return the canonical founder/company packet execution order so all surfaces follow the same run sequence.
Read get_founder_job_topology Return the queue and job topology for founder ingestion, sweeps, deltas, packet refresh, exports, delegatio...
Read get_founder_packet_resource Fetch the resource URI, pull query, and subscription query for a founder issue or resolution packet.
Read get_founder_progression_rubric Return the explicit founder progression rubric, including mandatory and optional signals for each stage.
Read get_gate_history Get the history of quality gate runs for a given gate name. Shows pass/fail trend over time.
Read get_gate_preset Get the rules for a built-in quality gate preset. Returns rule names, descriptions, and evaluation instruct...
Read get_important_changes Query flagged important changes with optional status filtering. Returns changes ordered by timestamp descen...
Read get_improvement_recommendations Analyze all persisted data and return actionable improvement recommendations. Detects: unused tools, missin...
Read get_ingest_status Report which data sources are available for ingestion. Checks if Claude Code transcripts exist, what projec...
Read get_judge_history Get the history of LLM judge runs, optionally filtered by scenario.
Read get_latest_signals Get the most recent signal sweep results without running a new sweep. Returns cached signals from the last ...
Read get_messaging_health Check which messaging channels are working.
Read get_monthly_report Monthly rollup of actions: weekly breakdown, trending categories, velocity (actions/day), milestone timeline.
Read get_narrative_status Get narrative thread status from local SQLite. Returns threads grouped by phase (emerging, escalating, clim...
Read get_observability_summary Unified observability summary combining MCP system pulse, sentinel probes, watchdog status,
Read get_openclaw_delivery_status Check if your messages were delivered.
Read get_openclaw_results Get results and safety summary for an OpenClaw session.
Read get_ops_dashboard Get operational dashboard status from local SQLite. Returns last sync info, tool call frequency, active ver...
Read get_packet_lineage Trace the full derivation chain for a packet or entity.
Read get_parallel_status Get a comprehensive overview of all parallel agent activity: active task claims, role assignments, context ...
Read get_path_replay Replay a session
Read get_proactive_alerts Scan causal memory for watchlist-worthy alerts: new events on tracked entities, unresolved important change...
Read get_project_context Retrieve the stored project context (tech stack, architecture, conventions, etc.) and knowledge base stats....
Read get_quarterly_review Quarterly strategic view: monthly trends, category shifts, biggest milestones, velocity curve.
Read get_recon_summary Get aggregated summary of all findings from a recon session. Groups findings by category (breaking changes,...
Read get_regression_gate Check if the 3 canonical loops pass. Returns per-loop scores, overall pass/fail, and regression detection.
Read get_repeat_cognition_metrics The key compound metric. Measures repeat question rate, manual reconstruction count, packet abandonment rat...
Read get_role_packet_defaults Return the default packet, artifact, monitor, and delegation policy for a specific public role lens.
Read get_self_directed_delivery_run Load a previously recorded autonomous delivery run with all stage receipts, summaries, and final recommenda...
Read get_self_eval_report Generate a comprehensive self-evaluation report by cross-referencing all persisted data: verification cycle...
Read get_sentinel_report Get the latest sentinel self-test report with all 9 probe results (build, e2e, design, dogfood,
Read get_session_journal Get all tracked actions from the current or a specified session, in chronological order.
Read get_session_profile Get the efficiency profile for the current session. Shows total calls, cost, latency, redundancy, and optim...
Read get_shared_context_packet Fetch one shared-context packet directly and include the suggested resource URI plus pull/subscription filt...
Read get_shared_context_peer Fetch one peer and its current summary, capabilities, and scopes.
Read get_shared_context_snapshot Inspect the current shared-context protocol state: peers, packets, handoffs, messages, and aggregate counts.
Read get_signal_recommendations Get founder-specific actionable recommendations from the latest signals. Each recommendation includes: what...
Read get_source_trust_policy Return the source-level permission and trust policy for storage, summarization, and export across private a...
Read get_state_diff_history Get the change history for a specific entity, showing all recorded state diffs in reverse chronological order.
Read get_subconscious_hint Get the subconscious
Read get_sync_bridge_status Inspect whether the device is local-only, connected but idle, or actively ready to sync to a paired web acc...
Read get_system_pulse Get a real-time health snapshot of the NodeBench MCP system. Returns database status,
Read get_tool_graph Returns a JSON graph of tool relationships that grows with usage.
Read get_tool_quick_ref Get the quick reference for a specific tool: what to do next, related tools, methodology, and tips. Call th...
Read get_trajectory_analysis Analyze tool usage trajectories across sessions. Returns tool frequency, error rates, average duration, pha...
Read get_trajectory_summary Compute a trajectory summary for a date range: event counts by type, diff counts by change type, path step ...
Read get_traversal_plan Generate a traversal plan for accomplishing a goal across multiple views.
Read get_uptime_stats Get session uptime, tool call rates, error trends, and top tools over multiple time windows
Read get_usage_insights Get aggregate usage insights across all sessions. Shows top tools, repeated queries, cost breakdown, and op...
Read get_verification_status Get the current status of a verification cycle including all phases, gaps, and test results. Use this to un...
Read get_view_capabilities Get full capabilities for a specific view — actions, data endpoints,
Read get_view_state Get the current agent traversal session state — which view you
Read get_watchdog_log Get recent watchdog check results. Shows health score trend, detected issues, and auto-healed
Read get_weekly_summary Summarize a week
Read get_workflow_chain Get a recommended tool sequence for a common workflow. Returns step-by-step tool chain with actions for eac...
Read get_workflow_history Returns past runs for a specific workflow with scores, grades, step counts, and durations. Use to track con...
Read getMethodology Get step-by-step guidance for a development methodology. Topics: verification, eval, flywheel, mandatory_fl...
Read goal What the spreadsheet workflow should achieve
Read grade_fraction_quiz_from_image Grade a fraction quiz shown in an image by OCRing the problems + student answers, computing correct answers...
Read graphify_status Check if graphify is installed and ready. Returns version, installation instructions if missing.
Read harness_get_mission_status Get full mission execution status: run info, subtask states,
Read harness_list_runs List all mission runs with status summary.
Read has_regression_test Every bug fix ships with a test that would have caught it
Read hasHook First sentence must be a concrete claim, surprising stat, or contrarian take — not a label
Read hasOpinion Must contain at least one first-person interpretive statement (
Read hasQuestion Must contain at least one genuine question to the audience (not rhetorical). Questions drive comments.
Read heading_hierarchy Heading levels (h1-h6) are sequential and not skipped (no h1→h3 without h2)
Read heartbeat_shared_context_peer Refresh peer liveness and optionally publish a compact machine-readable summary of its current work.
Read hiring_gap_scan Identify the most obvious missing hiring lane for the current founder direction.
Read important_change_review Change review: get alerts → synthesize → track action → track milestone
Read ingest_claude_code_sessions Scan Claude Code JSONL transcripts from ~/.claude/projects/. Returns session summaries with turn counts, to...
Read ingest_codebase_changes Fingerprint key files in a directory (package.json, README, schema, CLAUDE.md, etc.) and detect what change...
Read ingest_dive_screenshots Scan a directory for PNG/JPG screenshot files and bulk-import them into the dive session
Read ingest_temporal_observation Ingest a raw observation into the temporal substrate (timeSeriesObservations). Supports numeric, categorica...
Read inject_context_into_prompt Wraps a user prompt with NodeBench
Read invalidate_shared_context Invalidate a packet when it becomes stale, contradicted, or superseded.
Read investigate Investigate a company, person, or topic and return a concise sourced artifact. This is the default v3 entry...
Read judge_request_retry Request a retry, re-plan, escalation, or stop for a failed subtask.
Read judge_session Score a dogfood session on 6 dimensions (1-5 each): truth, compression, anticipation, output, delegation, t...
Read judge_verify_subtask Judge verifies a subtask
Read keyboard_navigable All interactive elements reachable via Tab, activated via Enter/Space
Read landmark_regions_present Page uses semantic landmarks (main, nav, aside, header, footer) or ARIA roles for screen reader navigation
Read learnings_banked Agent recorded discoveries as persistent learnings (record_learning called)
Read learnings_documented Edge cases and gotchas from this work are recorded as learnings
Read list_agent_tasks List all current task claims across parallel agents. Shows who is working on what, blocked tasks, and recen...
Read list_available_toolsets List all available toolsets showing which are currently loaded and which can be dynamically added. Includes...
Read list_available_views List all available views in the NodeBench AI frontend with titles,
Read list_contradictions List all open contradictions between memory blocks and graph entities.
Read list_dimension_evidence List the durable evidence rows behind a DeepTrace dimension profile. Useful for auditing why a score or ava...
Read list_dimension_interactions List stored interaction effects for an entity, such as capital plus investor quality reducing execution fra...
Read list_dimension_snapshots List historical DeepTrace dimension snapshots for an entity to inspect regime transitions over time.
Read list_driver_tools List all tools available from connected MCP drivers. Shows tool names, descriptions, and input schemas. Use...
Read list_eval_runs List recent eval runs with their aggregate scores. Use this to track quality over time and detect drift.
Read list_extracted_skill_templates List reusable skill templates automatically extracted from successful harness runs.
Read list_founder_issue_packets List founder issue packets from shared context by workspace, producer, status, or strategic angle.
Read list_learnings [DEPRECATED: Use search_all_knowledge instead] List stored learnings. PREFER search_all_knowledge for unifi...
Read list_openclaw_channels List messaging channels you can send through. Read list_pending_sync_operations List queued outbound sync operations that still need to be pushed to the web account. Read list_self_directed_delivery_runs List recent autonomous delivery runs so operators can reopen or compare them. Read list_shared_context_peers List peers by product, workspace, role, surface, capability, or scope. Read list_skills List all registered skills with their freshness status, source files, Read list_stale_packets List memory blocks and packets that need refresh. Read list_verification_cycles List all verification cycles, optionally filtered by status. Use this to find a cycle ID or review past ver... Read list_webmcp_tools List all tools discovered from connected WebMCP origins. Shows tool names, descriptions, and input schemas.... Read list_workspace List files in the agent workspace. Shows folder tree with file sizes and dates. Read load_diligence_pack Load the vertical diligence pack for the current direction. Read load_session_notes Load session notes from the filesystem. Use after context compaction, /clear, or session resume to recover ... Read load_toolset Dynamically load a toolset into the current session. After loading, the tools become immediately available ... Read loading_skeleton_present Async-loaded content shows a skeleton/placeholder instead of empty space or layout jump Read loading_states_handled Async operations show loading indicator, error state, and empty state Read log_benchmark_milestone Record completion of a benchmark milestone. Tracks which milestones the agent achieved, time taken, tools u... Read log_context_budget Track context window usage to prevent pollution. LLM agents have finite context and, as Anthropic Read log_gap Record a gap found during Phase 2 (Gap Analysis). Gaps are categorized by severity: CRITICAL (protocol viol... Read log_phase_findings Record findings for the current phase of a verification cycle. 
Advances the cycle to the next phase if the ... Read log_recon_finding Record a finding from reconnaissance research. Link it to a recon session and categorize it. Use for both e... Read log_test_result Record a test result for Phase 4 (Testing & Validation). Tests are organized by layer: static, unit, integr... Read log_tool_call Log an MCP tool call for profiling. Called automatically by the NodeBench gateway to track tool usage patte... Read manipulate_screenshot Manipulate a screenshot using sharp (image processing). Supports crop (extract a region), resize, and annot... Read meeting_notes_extract_decisions Extract decisions, owners, and follow-ups from raw meeting notes. Read mine_session_patterns Analyze tool_call_log and learnings tables to extract recurring success/failure sequences across sessions. ... Read monitor_repo Track a GitHub repository Read multi_criteria_score Deterministic weighted multi-criteria decision analysis (MCDM). Takes options with numeric values per crite... 
Read nb_engage_feed_item Record engagement on feed item Read nb_filter_by_stage Filter by funding stage Read nb_get_agent_status Get agent thread status Read nb_get_feed_items Get personalized feed items Read nb_get_funding_brief Get funding intelligence Read nb_get_leaderboard Get model leaderboard Read nb_get_pr_status Get PR status Read nb_get_qa_results Get QA pipeline results Read nb_get_signal_detail Get signal details Read nb_get_signals Get latest research signals Read nb_list_agents List agent templates and active threads Read nb_list_deals List funding deals Read nb_list_documents List workspace documents Read nb_list_events List calendar events Read nb_list_repos List tracked repos Read nb_list_scenarios List eval scenarios Read nb_list_signals List public signals Read nb_search_documents Search document content Read nb_search_research Search research signals and briefings Read nb_view_screenshots Get route screenshots Read no_console_errors Browser console has zero errors/warnings from changed code Read no_critical_gaps No CRITICAL or HIGH gaps remain open Read no_forbidden_behaviors Agent did not perform forbidden actions (unsafe operations without risk assessment, skipping tests, hardcod... Read no_hardcoded_secrets No API keys, tokens, or passwords in code Read no_layout_shift No unexpected layout shifts (elements moving, resizing, or reflowing) compared to baseline Read no_lint_warnings Linter passes with zero warnings Read no_regressions Agent did not break existing functionality (full test suite still passes) Read no_todo_comments No TODO or FIXME comments left in changed files Read nodebench.activity_timeline Read the canonical activity ledger for a NodeBench report, including captures, notebook patches, graph clic... Read nodebench.capture Persist a messy event capture into the active NodeBench event workspace without live paid search. Uses even... 
Read nodebench.expand_resource Expand a Nodebench resource URI (nodebench://org/{key}) by one ring using the requested lens + depth. Retur... Read nodebench.notebook_append Append reviewed text into a NodeBench report notebook through the same Convex-backed report notebook persis... Read noGenericHashtags Must NOT use #AI, #TechIntelligence, #DailyBrief alone — use specific hashtags tied to the content. Read noReportHeader First 2 lines must NOT be a title card (Daily Intelligence Brief, VC DEAL FLOW MEMO, etc.) Read noWallOfText No more than 3 consecutive structured blocks (bullet lists, headers). Break with a 1-sentence human observa... Read onboarding Get started with NodeBench Development Methodology MCP. Shows first-time setup steps and key tools. Read oracle-test-harness Set up oracle-based testing for a component. Compares your implementation Read oracleSource Where the known-good reference comes from (e.g. Read orchestrating-swarms Master multi-agent orchestration using Claude Code Read overstory_fleet_status Read the Overstory multi-agent fleet status — active agents, capabilities, health, worktree state. Read overstory_mail_log Read recent Overstory QA mail messages — capture-complete, stability results, Read overstory_qa_summary Aggregate QA gate results across all routes — per-route stability grades, Read partnership_target_map Map likely partnership targets and why they fit the current wedge. Read pdf_search_text Search text inside a local PDF over selected pages. Returns page numbers and bounded snippets around matche... Read pixel_diff_within_threshold Visual diff between baseline and current screenshot is below 2% changed pixels Read polish_academic_text Deep-polish academic text for top-venue quality (NeurIPS, ICLR, ICML, ACL). Handles English and Chinese pap... 
Read pre_delegation Delegation prep: synthesize → export → track intent → track action Read predict_risks_from_patterns Given a task description, predict likely failure modes based on historically similar sessions. Searches the... Read projectGoal The overall project goal the team is working toward (e.g. Read projectPath Absolute path to the target project root (e.g. Read propose_shared_task Propose a task handoff between peers with input contexts and required output packet shape. Read pull_profile Pull the current company profile and saved context from NodeBench AI into Claude Code. Returns the latest s... Read pull_report Pull a saved report from NodeBench AI into Claude Code. Returns the full report with sections, sources, and... Read pull_shared_context Pull shared-context packets by type, producer, scope, workspace, status, or subject substring. Read quality_gate_enforced Agent ran a quality gate before declaring done (run_quality_gate called) Read query_daily_brief Get today Read query_funding_entities Search funding intelligence from the Convex platform. Filter by company name, round type, or get recent eve... Read query_graphify Query an existing graphify knowledge graph. Find nodes by label, explore connections, Read query_research_queue View the research task queue from the Convex platform. Shows active and pending research topics with priori... Read query_temporal_signals Search and retrieve temporal signals with filtering by entity, signal type, status, and date range. Returns... Read query_view_data Query data from a view Read queue_sync_operation Queue an explicit outbound sync operation when a custom workflow needs to push metadata, receipts, or appro... Read rank_interventions Rank potential interventions by expected trajectory delta. Each intervention includes expected impact, conf... Read rate_packet_usefulness Rate a packet Read read_csv_file Read a local CSV file and return a bounded table preview (headers + rows). Deterministic, no network. 
Read read_docx_text Extract text from a local DOCX (Office OpenXML) file. Deterministic, no network. Read read_emails Read emails from an IMAP mailbox over TLS. Requires EMAIL_USER and EMAIL_PASS env vars. Defaults to Gmail I... Read read_image_ocr_text Extract text from a local image (PNG/JPG/etc) using OCR (tesseract.js). Deterministic, no network. Read read_json_file Read a local JSON file and return a bounded JSON preview (depth/item/string truncation). Deterministic, no ... Read read_jsonl_file Read a local JSONL file and return bounded parsed rows. Deterministic, no network. Read read_pdf_text Extract text from a local PDF file for selected pages. Returns bounded text with page markers. Deterministi... Read read_pptx_text Extract text from a local PPTX (Office OpenXML) file. Deterministic, no network. Read read_text_file Read a local text file (txt/md/xml/json/etc) and return a bounded text slice. Deterministic, no network. Read read_workspace_file Read a file from the agent workspace. Returns content for text files, metadata for media files. Read read_xlsx_file Read a local XLSX workbook and return a bounded sheet preview (headers + rows). Deterministic, no network. Read readiness_scan Run a founder readiness scan against the progression and diligence model. Read record_eval_result Record the actual result for a specific eval case. Include what happened, the verdict (pass/fail/partial), ... Read record_event Record a typed event to the causal event ledger. Supports causal linking via causedByEventId and correlatio... Read record_execution_decision Record a structured decision on a live execution trace without storing raw hidden reasoning. Use for rankin... Read record_execution_step Record a structured execution step receipt on a live execution trace. Use this for meaningful actions like ... Read record_execution_verification Record a verification result on a live execution trace. Use for render checks, formula checks, diff checks,... 
Read record_fix_attempt Record a fix attempt with replay proof and regression protection description. Links to a failure case. Read record_learning Store an edge case, gotcha, pattern, or regression discovered during verification. Learnings are searchable... Read record_manual_correction Track a human correction to agent output. Every correction is evidence of a system gap — the system should ... Read record_path_step Record a navigation/exploration step in the user Read record_provenance_receipt Persist a durable execution receipt for a tool call, approval, verification, or other meaningful action. Read record_repeated_question Track a question the user asked that NodeBench should have already known. This is the core failure signal —... Read record_state_diff Record a before/after state change on an entity. Tracks what changed, which fields, and why. Read record_sync_artifact Persist a local artifact with verification state so it can be replayed, reviewed, and optionally synced to ... Read record_sync_outcome Persist an outcome with user value, stakeholder value, evidence, and status so the system always resolves w... Read reduced_motion_respected Animations/transitions honor prefers-reduced-motion media query or provide a UI toggle Read refresh_subconscious Get a summary of all subconscious memory blocks and their status. Read refresh_task_context Re-inject the current task context to combat attention drift. After 30+ tool calls, models lose sight of or... Read regression_guards_created Agent created eval cases or tests that would catch the bug if it reappeared Read release_agent_task Release a task lock after completing work. Updates status and optionally records a progress note for the ne... Read render_decision_memo Render a 1-page executive decision memo from a completed Deep Sim analysis. Combines claim graph, variables... Read render_flow_visualization Render flow visualization with colored bounding boxes for each flow group. 
Supports overlay on a rendered p... Read report Produce a human-readable artifact for either a research topic or a decision. If recommendation inputs are p... Read responsive_breakpoints_intact Layout is correct at mobile (375px), tablet (768px), and desktop (1280px) viewports Read responsive_check UI works at mobile (375px), tablet (768px), and desktop (1280px) breakpoints Read retention_get_status Return the latest retention.sh connection and recent event history from local MCP state. Read retention_status Check retention.sh connection status and QA metrics. Shows team code, QA score, member count, and last sync... Read retention_sync Sync data between NodeBench Delta and retention.sh. Pushes delta packets as team context and pulls QA findi... Read retention_sync_findings Sync retention.sh QA findings, scores, and token savings into local MCP state. Read review_paper_as_reviewer Simulate a peer reviewer evaluating a paper for a top venue. Default: harsh mode with rejection mindset — o... Read review_pr_checklist Structured PR review checklist with verification/eval cross-reference. Validates PR title format, descripti... Read risk_assessed Agent assessed risk before making changes (assess_risk called) Read riskLevel Expected risk level: low, medium, or high Read route_founder_packet Route a founder/company request into the canonical company mode, packet type, artifact type, and next actio... Read runway_check Basic runway check that translates cash and burn into months remaining and flags risk. Read sandbox_ingest Index arbitrary text into the context sandbox (FTS5). Raw content stays in SQLite — only a compact referenc... Read sandbox_search BM25-ranked full-text search across all sandboxed content. Pass multiple queries as an array to batch all q... Read sandbox_stats Show context savings for the current session — per-tool breakdown, total bytes indexed vs returned, savings... Read scaffold_directory Scaffold directory structure following OpenClaw patterns.
Creates organized subdirectories and placeholder ... Read scaffold_openclaw_project Generate a starter project for OpenClaw + NodeBench. Read scan_capabilities Analyze a source file for structural code patterns. Returns a capability report showing what the code can s... Read scan_dependencies Scan a project Read scan_terminal_security Scan project files and dev environment for terminal security threats: Unicode homograph attacks, ANSI escap... Read scan_webmcp_origin One-shot scan: connect to a WebMCP-enabled site, discover tools, cache the manifest, and disconnect. Useful... Read scan_wordpress_updates Scan a WordPress site for plugin and theme versions, and optionally check for known vulnerabilities via the... Read score_compounding Compute the full 8-dimension trajectory score for an entity. Returns trust-adjusted compounding, drift, ada... Read score_scenario_branch Score a specific scenario branch against evidence and constraints. Read scrapling_batch_fetch Fetch multiple URLs in parallel with configurable concurrency. Use for competitive analysis, multi-source r... Read scrapling_crawl_status Check crawl progress and get collected items. Pass the session_id from scrapling_crawl. Returns status (run... Read scrapling_extract Extract structured data from a URL using CSS or XPath selectors. Zero LLM tokens — deterministic extraction... Read scrapling_fetch Fetch a URL with adaptive scraping. Auto-selects fetcher tier: Read scrapling_track_element Track an element across page versions using Scrapling Read search Search across live web results and stored NodeBench knowledge in one call. Use this instead of deciding bet... Read search_all_knowledge Search ALL accumulated knowledge in one call: learnings (edge cases, gotchas, patterns), recon findings (ac... Read search_content_archive Search past content by theme, title, or keywords using FTS5 full-text search. Use before generating new con... 
Read search_github Search GitHub repositories by query, topic, language, and star count. Useful for discovering libraries, fra... Read search_learnings [DEPRECATED: Use search_all_knowledge instead] Search past learnings. PREFER search_all_knowledge which sea... Read seo_audit_url Fetch a URL and analyze its SEO elements: title tag, meta description, Open Graph tags (og:title, og:descri... Read service_to_dashboard_path Map a concept from bespoke service work to a possible dashboard subscription path without losing the local-... Read session_memory_cycle Memory lifecycle: track intent → synthesize → summarize → recover → complete intent Read share_get_packet_link Retrieve a local share link record by share ID. Read shortest_valid_path Return the shortest valid workflow path for the stated objective. Read site_map Interactive site map with stateful drill-down. First call with { url } to crawl. Then use { action: Read sniff_record_human_review Record a human sniff-check for a subtask or merge output. Read solve_bass_clef_age_from_image Extract bass-clef note letters from a simple staff image and compute the derived Read solve_red_green_deviation_average_from_image Extract red and green numbers from an image, compute population stdev(red) and sample stdev(green), then re... Read solve_storage_upgrade_cost_per_file_from_image OCR plan tiers from an image, compute required storage from equally-sized file counts, and return average i... 
Read storybook_story_exists New/changed components have a Storybook story for documentation and visual testing Read strategicQuestion The product-direction or capability question being answered Read structured_recon Agent performed structured reconnaissance before implementation (run_recon or search_all_knowledge called) Read subagentCount Number of parallel subagents to coordinate (default: 3) Read subjectCompany Company being evaluated Read submission_readiness_score Score whether the company packet is ready for downstream submission or profile export. Read suggest_optimizations Analyze the current session and suggest specific optimizations: cheaper models, cached results, workflow sh... Read suggest_tests Generate scenario-based test suggestions from a site_map or diff_crawl session. Analyzes crawl findings and... Read summarize Turn raw context into a compact brief with key points and optional persistence. This is the fast human-read... Read summarize_session Summarize the current or specified session Read sync_company Push a company profile into NodeBench AI from Claude Code. Extracts company truth from a summary you provid... Read sync_daily_brief Sync daily brief + narrative data from Convex to local SQLite. Requires CONVEX_SITE_URL and MCP_SECRET envi... Read sync_figma_tokens Pull Figma design token variables from a Figma file and compare against the codebase Read sync_operator_profile Sync the Operator Profile to the local filesystem at ~/.nodebench/USER.md. Read sync_report Push a report artifact from Claude Code into NodeBench AI. The report is saved locally and published as a s... Read sync_skill Resync a stale skill after applying updates. Recomputes source hashes, updates Read synthesize_integration_proposal Synthesize an integration plan for an external tool, API, or framework. Read synthesize_recon_to_learnings Convert recon findings into persistent learnings. Recon findings are ephemeral research notes; learnings ar... 
Read system_observability System health check, drift detection, and auto-maintenance Read tag_ui_bug Tag a bug to a specific component (and optionally a specific interaction). Bugs are categorized by severity... Read task Delegated task description Read task_success Agent completed the task correctly (deterministic checks pass: tests, lint, type-check) Read taskDescription The overall task to split across parallel subagents (e.g. Read team_alignment_check Check whether the team is aligned on the wedge, next move, and moat story. Read techStack Target project Read tests_pass All existing test suites pass Read thompson_anti_elitism_lint Scan content for elitism, gatekeeping language, and intellectual intimidation. Deterministic banned-phrase ... Read thompson_pipeline End-to-end Thompson Protocol pipeline orchestrator. Takes a complex topic and runs it through all 4 agents ... Read thompson_quality_gate Deterministic 10-point quality gate for Thompson Protocol content. Produces a boolean checklist and overall... Read thompson_visual_map Generate precise visual prompts that map 1:1 with content analogies. No generic b-roll — every visual reinf... Read thompson-protocol The Thompson Protocol — Read three_layer_tests Agent ran tests at multiple layers (static + unit/integration + manual/e2e) Read toon_decode Convert TOON (Token-Oriented Object Notation) string back to JSON. Use this to parse TOON-encoded data from... Read toon_encode Convert JSON data to TOON (Token-Oriented Object Notation) format. TOON uses ~40% fewer tokens than JSON wh... Read track_action Record any significant action with before/after state, reasoning, and temporal metadata. Auto-captures sess... Read track_entity_changes Detect what changed for an entity since a given date. Read track_intent Track a user intent that should survive context window compaction. On Read transcribe_audio_file Transcribe a local audio file (MP3/WAV/etc) to text using faster-whisper via Python. Deterministic, no netw... 
Read translate_academic Translate academic text between Chinese and English, preserving LaTeX commands, citations, equations, and t... Read traverse_entity_graph Find all entities and relationships connected to a starting entity within N hops. Read traverse_feed Traverse content feeds with Moltbook-style sorting. Read triple_verify Run triple verification on agent implementation. V1: Internal codebase analysis. V2: External authoritative... Read underCharLimit Max 1500 chars for org page daily posts. Shorter posts get higher engagement. Read validate_agent_compatibility Run the agent validation harness — simulates how AI agents (Claude Code, Read validate_shortcut Validate that a proposed shortcut preserves output quality and visibility. Read verdict_is_defensible The final verdict includes a clear outcome, confidence, and the reasoning needed for human review. Read verify_concept_support Check if a source file contains all required code signatures for a concept. Provide a concept name and a li... Read visual_consistency Fonts, colors, spacing match existing design system and adjacent components Read watchlist_get_alerts Return watchlist entries with attached change summaries or non-zero alert counts. Read watchlist_list_entities List watched entities from the local founder watchlist. Read watchlist_refresh_entities Refresh watchlist timestamps and optionally attach change summaries for watched entities. Read web_search Search the web through NodeBench Read within_budget Agent completed within token/time budget (no runaway loops or excessive tool calls) Read workflow_adoption_scan Evaluate how naturally a direction fits current high-frequency user workflows, install surfaces, and mainte... Read workflowGoal What the workflow must accomplish Read xlsx_aggregate Aggregate values from a local XLSX (count/sum/avg/min/max) with optional filters. Deterministic, no network. 
Read zip_extract_file Extract a single file from a local ZIP archive to a local output directory (zip-slip safe). Deterministic, ... Read zip_list_files List entries in a local ZIP file. Deterministic, no network. Read zip_read_text_file Read a text file inside a local ZIP archive and return bounded text. Deterministic, no network.
WRITE 112 tools
Write add_forecast_evidence Add evidence to a forecast Write add_rss_source Register an RSS or Atom feed URL for monitoring. Stored in SQLite for persistent tracking. Validates the fe... Write archive_content Save generated content to the archive for deduplication and theme tracking. Prevents the engine from regene... Write aria_labels_complete All interactive elements (buttons, links, inputs) have accessible names via aria-label, aria-labelledby, or... Write assign_agent_role Assign a specialized role to the current agent session. Roles define focus area and behavioral instructions... Write attach_execution_evidence Attach evidence to a live execution trace. Use for URLs, uploaded files, screenshots, render outputs, and t... Write bootstrap_project Register or update your project Write broadcast_agent_update Broadcast a status update to all active agents. Unlike send_agent_message (point-to-point), this creates a ... Write complete_autonomy_benchmark Finalize an autonomy benchmark run. Computes final score, duration, tool usage stats, and comparison agains... Write complete_eval_run Finalize an eval run and compute aggregate scores. Returns pass rate, average score, failure patterns, and ... Write complete_execution_run Finish a live execution run by updating session status, trace status, and optional usage metrics. Use this ... Write complete_shared_task Complete a shared-context task and attach the output packet if one was produced. Write compute_ssim_analysis Compute block-based SSIM analysis on a set of frame images. Uses 8x8 blocks with parallel ProcessPoolExecut... Write configure_channel_preferences Set your messaging preferences: which channels to use first, Write connect_channels Connect to multiple information channels for aggressive information gathering. Channels: slack, telegram, d... Write connect_mcp_driver Connect to an external MCP server and make its tools available through nodebench-mcp. 
Predefined drivers: Write connect_webmcp_origin Connect to a WebMCP-enabled website via Playwright. Navigates to the URL, intercepts navigator.modelContext... Write create_forecast Create a new forecast with a question, resolution date, and criteria. Optionally set initial probability, b... Write create_proof_pack Assemble an immutable proof pack for verification. Bundles a checklist (pass/fail items), optional metrics ... Write create_task_bank Create or add to a fixed task bank for controlled agent evaluation. Each task defines: initial state (repo ... Write create_visual_pr End-to-end PR creation: exports screenshots, generates a rich markdown PR body with visual evidence (before... Write create_workspace_folder Create a subfolder within a workspace folder. Max 3 levels deep. Write csv_select_rows Select rows from a local CSV using deterministic filters. Returns bounded results (selected columns + match... Write decide_re_update Decide whether to update existing instructions or create new files. Implements Write delegate_founder_issue Create a bounded shared task handoff for a founder issue packet so the weak angle becomes assigned work. Write delta_memo Create a decision-ready memo artifact. Produces a delta.memo packet with recommendation, variables, scenari... Write disconnect_driver Disconnect from an external MCP driver and shut down its child process. Use this to clean up or to reconnec... Write disconnect_webmcp_origin Disconnect from a WebMCP origin and close the browser page. Use this to clean up resources or to reconnect ... Write dive_design_issue Tag a design inconsistency found during the dive. Covers visual problems like color mismatches, spacing dev... Write dive_generate_tests Generate Playwright regression test code from dive findings. Creates test cases from: bugs (verify the fix ... Write dive_save_screenshot Save a screenshot during a dive session. 
Pass base64 image data (from bridge Write draft_email_reply Structure an email thread for reply drafting. Parses the thread, extracts context (from, subject, date), an... Write enforce_merge_gate Pre-merge validation combining git state, verification cycles, eval runs, test results, and quality gates. ... Write export_artifact_packet Formats a Founder Artifact Packet or memo for export to a specific audience and format. Write export_crunchbase_profile Export a Crunchbase-like structured profile from the company packet. Write export_dimension_bundle Export the full DeepTrace dimension bundle for an entity: latest profile, snapshots, evidence, and interact... Write export_pitchbook_profile Export a PitchBook-like structured profile from the company packet. Write export_pr_screenshots Export before/after screenshot pairs from changelogs and fix verifications to a local directory. Screenshot... Write export_yc_application_context Export YC-style application context from the company packet. Write generate_academic_caption Generate academic figure or table captions following top-venue conventions. Handles Title Case for noun phr... Write generate_countermodels For every main thesis or scenario, generate serious alternative explanations with their own evidence and co... Write generate_flicker_report Generate visual flicker report from existing analysis data. Produces SSIM timeline chart (1200x400 PNG, PIL... Write generate_grid_collage Tile N screenshot images into a single grid collage PNG for visual inspection. Write generate_implementation_plan Generate a structured implementation plan for missing code signatures. Takes the gap analysis from verify_c... Write generate_parallel_agents_md Generate a portable, framework-agnostic AGENTS.md section for parallel agent coordination. Designed to be d... 
Write generate_plan_delegation_packet Convert a FeaturePlan into an agent-ready delegation packet Write generate_pr_report Generate a rich markdown PR body from a UI Dive session. Compiles visual changes (before/after screenshot c... Write generate_proposal_memo Render a FeaturePlan as a human-readable proposal memo. Write generate_report Compile structured findings, eval results, and quality gate data into a formatted markdown report. Useful f... Write generate_self_instructions Generate self-instructions for the agent in various formats: skills_md (SKILL.md), rules_md (RULES.md), gui... Write generate_team_install_plan Generate a practical install and rollout plan for a founder, solo developer, or small team using NodeBench ... Write generate_voice_scaffold Generate starter code for a voice bridge. Returns file contents, setup instructions, and dependency lists f... Write generate_zero_draft Auto-draft an artifact (slack message, email, spec doc, PR draft, architecture note, career plan, or conten... Write graphify_import_to_subconscious Import a graphify knowledge graph into NodeBench Write ingest_upload Ingest uploaded file content into the NodeBench entity intelligence system. Write install_nodebench_plugin Generate or write a starter .mcp.json entry for NodeBench MCP so a local team member can install the preset... Write json_select Select a sub-value from a local JSON file using a JSON Pointer (RFC 6901) and return a bounded preview. Det... Write manage_implementation_packets Create and manage implementation packets — structured instructions for Claude Code or other coding agents. Write manage_task_list Manage the workspace task list. Add, update, complete, delete, or list tasks. Write merge_compose_output Judge-gated merge of subtask artifacts into a composed output. Write merge_research_results Merge parallel sub-agent research results into a unified dataset. Takes arrays of records from multiple sou... 
Write nb_create_document Create new document Write nb_create_event Create calendar event Write nb_switch_research_tab Switch research hub tab Write nodebench.report_export_complete Complete a previously previewed NodeBench report export after review. Writes the export completion event to... Write nodebench.report_export_preview Prepare a reviewable NodeBench report export. Returns mapped contacts, companies, interactions, follow-ups,... Write open_core_boundary_advisor Advise what should stay open-core versus proprietary. Write open_dive_dashboard Open the NodeBench UI Dive dashboard in a browser. Shows the full flywheel cycle: Write open_local_dashboard Start the local Daily Brief dashboard server if needed, and return the URL. The dashboard shows Brief metri... Write open_operating_dashboard Start the Operating Dashboard — shows trajectory scores, event ledger, important changes, path replay, time... Write overstory_merge_queue Read the Overstory merge queue — pending merges, completed merges, Write parallel-agent-team Set up and coordinate a parallel agent team. Based on Anthropic Write plan_decompose_mission Decompose a mission into subtasks with verifiability routing. Write project-setup Guided project bootstrapping. Walks you through registering project context so the MCP has full project awa... Write projectName Name of the project to set up Write promote_to_eval Take findings from a completed verification cycle and promote them into eval test cases. This is how the in... Write publish_founder_issue_packet Turn the weakest founder-direction angle into a durable shared-context issue packet with lineage, proof lin... Write publish_shared_context Publish a structured shared-context packet with subject, claims, evidence refs, freshness, permissions, and... Write publish_to_queue Push content to the LinkedIn content queue on the Convex platform. Content goes through the engagement gate... Write register_component Register a UI component in the dive tree. 
Components form a hierarchy: page → section → form/modal/list → b... Write register_shared_context_peer Register a scoped peer with product, role, surface, capabilities, and heartbeat metadata for shared-context... Write register_skill Register a skill (rule/memory .md file) with its source documents, update triggers, Write reject_shared_task Reject a proposed shared-context task with a reason. Write research_job_market Research job market requirements for a given role or skill set. Provides guidance on in-demand skills, comm... Write resolve_forecast Resolve a forecast with an outcome. Auto-computes Brier and log scores for binary forecasts. Ambiguous outc... Write resolve_founder_issue Invalidate a founder issue packet and optionally publish a resolution packet so the issue lifecycle stays e... Write resolve_gap Mark a gap as resolved after implementing the fix. Returns remaining gap counts by severity. Write retention_register_connection Register a retention.sh team connection in local MCP state so QA findings and token savings can flow into f... Write save_research_resource Save a research resource with URL, source citation, tags, and notes. Write save_session_note Persist a critical finding, decision, or progress note to the filesystem. Notes survive context compaction ... Write send_agent_message Send a message to another agent by session ID or role. Enables asynchronous inter-agent communication for t... Write send_email Send an email via SMTP over TLS. Requires EMAIL_USER and EMAIL_PASS env vars. Defaults to Gmail SMTP (smtp.... Write send_openclaw_message Send a message through any connected channel. Write send_peer_message Send a direct structured message to a peer without routing everything through a central orchestrator. Write set_watchdog_config Configure the background watchdog that continuously monitors system health. Write setup_local_env Discover and diagnose the local development environment. Checks for available API keys, installed SDKs, Nod... 
Write setup_operator_profile Set up your profile to customize how the AI assistant works for you. Write share_create_packet_link Create a durable local share link record for a packet or founder memo so it can be rendered or synced later. Write smart_select_tools LLM-powered tool selection: sends your task description + a compact tool catalog to a fast model (Gemini 3 ... Write spreadsheet-enrichment-trace Traceable workflow for spreadsheet enrichment: inspect workbook, research supporting evidence, edit cells, ... Write synthesize_extension_plan Synthesize a plan for extending or deepening an existing feature. Write synthesize_feature_plan Synthesize a phased feature implementation plan conditioned on founder context, Write thompson_feynman_edit Skeptical Beginner editor — reviews Thompson-written content against 8 rejection criteria. Returns PASS/REW... Write thompson_write Transform complex content into Thompson Protocol format — plain English mandate, intuition-before-mechanics... Write update_agents_md Read, append, or update sections in the AGENTS.md file. This file contains instructions for AI agents worki... Write update_company_truth Update a subconscious memory block with new information. Write update_forecast_probability Update a forecast Write upsert_durable_object Register or update a durable local object so views, tools, workflows, runs, artifacts, and outcomes share o... Write watchlist_add_entity Add an entity to the local founder watchlist with alert preferences and optional strategic-angle linkage. Write workflowType Optional workflow label such as spreadsheet_enrichment or company_direction_analysis Write write_workspace_file Create or update a file in the agent workspace (~/.nodebench/workspace/). Write xlsx_select_rows Select rows from a local XLSX using deterministic filters. Returns bounded results (selected columns + matc...
EXECUTE 90 tools
Execute benchmark_models Run the same prompt against multiple LLM providers and compare responses. Returns side-by-side results with... Execute build_banking_packet Build a banker-readiness packet from the canonical company packet. Execute build_before_after_memo Build a memo showing the before and after path plus the validation rationale. Execute build_causal_chain Construct a causal chain from temporal observations. Nodes must be in chronological order. Each node repres... Execute build_claim_graph Extract claims from a source packet and link each claim to its evidence. Returns a directed graph of claims... Execute build_company_packet Build the canonical company readiness packet. Execute build_company_profile_starter Build a starter PitchBook/Crunchbase-like company profile. Execute build_diligence_packet Build a diligence-oriented export payload from the canonical company packet. Execute build_founder_operating_model Build the complete founder operating model: execution order, queue topology, packet routing, source trust p... Execute build_investor_packet Build an investor-oriented export payload from the canonical company packet. Execute build_research_digest Generate a digest of new (unseen) articles from RSS feeds. Compares against previously seen articles via SQ... Execute build_shared_context_subscription Build the exact pull/subscription manifest an agent client should use to watch a packet or packet scope. Execute build_shared_context_subscription_manifest Build a filtered snapshot/events/pull manifest for one peer, packet class, producer, scope, or subject so c... Execute build_slack_onepager Build a Slack-friendly one-page founder report. Execute build_submission_export Build a generic submission export from the canonical company packet. Execute build_temporal_graph Build a temporal relationship graph for an entity. Execute call_driver_tool Invoke a tool on a connected MCP driver. This proxies the call to the external MCP server (e.g. playwright-... 
Execute call_webmcp_tool Invoke a WebMCP tool on a connected origin. The tool is executed in the browser page context via page.evalu... Execute compare_eval_runs Compare two eval runs to decide whether a change should ship. Returns side-by-side scores and a deploy/reve... Execute compile_decision_packet Compile entity intelligence into a decision-ready packet. Execute compile_environment_spec Generate a simulation environment specification from entity intelligence. Execute compile_scenarios Generate 3-7 future scenario branches for an entity or decision. Execute compile_tension_model Model explicit tensions between forces for a decision or entity. Execute execution-trace-workflow Start and maintain a traceable execution run. Use this for any workflow that needs receipts, evidence, deci... Execute founder_direction_assessment Pressure-test a founder direction against team shape, AI stance, build speed, Execute grade_agent_run Grade a single agent run on both outcome quality (task success, regressions, time) and process quality (rec... Execute graphify_report Get the GRAPH_REPORT.md analysis from a graphify run. Contains god nodes (most connected), Execute gtm_script_builder Build a starter GTM script for the current founder wedge. Execute invoke_openclaw_skill Run an OpenClaw tool safely through security checks. Execute invoke_view_tool Invoke a per-view tool on the current or specified view. Execute judge_tool_output Run the 7-criterion LLM judge on a tool Execute link_durable_objects Create a durable relationship such as screen -> action, workflow -> run, run -> artifact, or outcome -> evi... Execute log_interaction Log and optionally auto-execute an interaction step. If the built-in Playwright browser is active (launched... Execute navigate_to_view Navigate to a specific view in the NodeBench AI frontend. 
Execute nb_start_agent Start new agent conversation Execute nodebench.research_run Start an adaptive, evidence-backed research run on one or more subjects (companies, people, events, topics)... Execute preconditions_verified Environment, auth, and test data preconditions were verified before the trigger step. Execute primary_mission_preserved The run stayed focused on the reported bug instead of drifting into unrelated exploration. Execute record_dogfood_telemetry Record a full telemetry row for a dogfood run. Captures surface, scenario, user role, prompt, tool usage, t... Execute request_execution_approval Request a human approval gate for a risky execution-trace action. Approval state is written onto the live r... Execute retry_budget_respected Retries were bounded and targeted at the failing trigger or precondition, not the whole workflow. Execute run_autonomous_loop Execute autonomous verification loop with stop conditions. Implements Ralph Wiggum pattern with checkpoints... Execute run_benchmark_batch Run a longitudinal benchmark batch. N=1 is a smoke test (1 founder, 1 session). Execute run_browserstack_benchmark_lane Return a BrowserStack/browser-automation benchmark lane payload. Execute run_closed_loop Track a compile-lint-test-debug closed loop iteration. Record the result of each step. Never present change... Execute run_code_analysis Static analysis on code or text content for security issues, secrets, homograph attacks, ANSI injections, s... Execute run_competitor_signal_benchmark Return a competitor-signal-to-response benchmark lane payload. Execute run_deep_sim Run a multi-agent scenario simulation with bounded branching and budget controls. Instantiates agents with ... Execute run_dogfood_batch_with_judge Execute the priority 3 dogfood scenarios with automatic LLM judge validation. Execute run_entity_intelligence_mission Run a full DeepTrace entity intelligence mission with optional bounded research cell. Unifies relationship ... 
Execute run_flicker_detection Run full 4-layer Android UI flicker detection pipeline: SurfaceFlinger stats + logcat (L0), screenrecord (L... Execute run_founder_autonomy_benchmark Run the weekly founder reset autonomy benchmark lane. Execute run_graphify Generate a knowledge graph from a folder of code, docs, papers, or images. Execute run_judge_loop Execute a full judge-fix-verify loop: calls a tool, judges the output, and if it fails, Execute run_mandatory_flywheel Enforce the mandatory 6-step AI Flywheel verification after any non-trivial change. All 6 steps must pass b... Execute run_oracle_comparison Compare actual output against a known-good oracle reference. Based on Anthropic\ Execute run_packet_to_implementation_benchmark Return a packet-to-implementation benchmark lane payload. Execute run_quality_gate Evaluate content or code against a set of boolean rules. Returns pass/fail with specific failures listed. T... Execute run_recon Start a reconnaissance research session. Use this at the start of Phase 1 (Context Gathering) to organize r... Execute run_research_cell Run a bounded re-analysis cell for a DeepTrace entity investigation. Queries existing DeepTrace state throu... Execute run_self_directed_delivery_loop Run a local-first autonomous delivery loop across exploratory research, planning, implementation commands, ... Execute run_self_heal Autonomous self-healing for detected drift issues. Fixes orphaned verification cycles Execute run_self_maintenance Run autonomous self-maintenance cycle. Checks TypeScript compilation, documentation sync, tool counts, test... Execute run_signal_sweep Run a live signal sweep across all data sources (HackerNews, GitHub Trending, Yahoo Finance, ProductHunt). ... Execute run_sync_bridge_flush Open the outbound websocket bridge, pair or resume the local device, and flush pending approved operations ... Execute run_tests_cli Execute a shell test command with timeout, capture stdout/stderr, and return structured results. 
Useful for... Execute run_visual_qa_suite End-to-end visual QA pipeline: burst capture → SSIM stability analysis → Execute sandbox_batch Execute multiple commands, index all outputs, and run multiple search queries — all in ONE call. This is th... Execute sandbox_execute Run a shell command, automatically index the output into the sandbox, and return only a summary. The raw st... Execute scaffold_nodebench_project Create a complete project template pre-configured for nodebench-mcp. Generates: package.json, AGENTS.md, .m... Execute scaffold_research_pipeline Generate a complete, standalone Node.js project for an automated research digest pipeline. Creates: package... Execute scrapling_crawl Start a multi-page spider crawl with extraction. Crawls from start URLs, follows links matching a CSS selec... Execute scrapling_crawl_stop Stop a running crawl session. Pass the session_id from scrapling_crawl. Items collected so far are preserve... Execute self_implement Self-implement missing agent infrastructure. Generates implementation plan and code templates for: agent_lo... Execute simulate_decision_paths Run Monte Carlo simulation for founder decisions. Generates multiple random paths to visualize possible fut... Execute solve_green_polygon_area_from_image Compute the area of a green filled polygon in an image by pixel segmentation, calibrating pixel-to-unit sca... Execute spawn_openclaw_agent Start a secure OpenClaw session with safety rules applied. Execute start_autonomy_benchmark Start an autonomous capability benchmark. Defines a complex build challenge and tracks agent progress throu... Execute start_component_flow Claim a component for traversal by a specific subagent. Marks it as Execute start_dogfood_session Start a new dogfood session for one of the 3 canonical loops (weekly_reset, pre_delegation, company_search)... Execute start_eval_run Start a new eval run. Define the test batch upfront with test cases (input, intent, expected behavior), the... 
Execute start_execution_run Start a live Convex-backed execution trace run for a workflow. Creates a task session and trace together so... Execute start_ui_dive Initialize a UI/UX Full Dive session. Auto-launches a headless Playwright browser if installed (zero setup)... Execute start_verification_cycle Start a new 6-phase verification cycle for a non-trivial implementation. Returns the cycle ID and Phase 1 i... Execute track_milestone Record a significant milestone (phase complete, deploy, ship, launch, pivot, decision) with optional eviden... Execute trigger_batch_run Run a scheduled task right now instead of waiting. Execute trigger_investigation When an eval run shows regression, trigger a new verification cycle to investigate. This is how the outer l... Execute trigger_verify_split The action that attempts reproduction is separate from the step that verifies the resulting UI or system st... Execute ui-qa-checklist UI/UX QA checklist for frontend implementations. Run after any change that touches React components, layout... Execute workflowTitle Human-readable title for the run

The managed route: connect Nodebench through the PolicyLayer gateway — every tool call above is checked against your policy before it runs, with a full audit log.

DIRECT INSTALL (UNMANAGED)

npx -y nodebench-mcp

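For MCP clients that read a JSON config instead of running the command by hand, the same command can be expressed as a server entry. This is a minimal sketch, assuming the commonly used `mcpServers` layout; the entry key `nodebench` and the helper name `build_mcp_entry` are illustrative and are not taken from the Nodebench documentation.

```python
import json


def build_mcp_entry(server_key: str = "nodebench") -> dict:
    """Build a config entry that launches the server via npx.

    The command and arguments come from the install line above.
    The "mcpServers" wrapper and the server key are assumptions:
    check your MCP client's documentation for the layout it expects.
    """
    return {
        "mcpServers": {
            server_key: {
                "command": "npx",
                "args": ["-y", "nodebench-mcp"],
            }
        }
    }


# Serialise the entry so it can be pasted into a client config file.
config_text = json.dumps(build_mcp_entry(), indent=2)
print(config_text)
```

An entry like this starts the server unmanaged, so nothing checks tool calls against a policy.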
How many tools does the Nodebench MCP server have?

The Nodebench MCP server exposes 724 tools across 4 categories: Read, Write, Destructive, Execute.

How do I enforce policies on Nodebench tools?

Route the Nodebench server through the PolicyLayer gateway. Define allow, deny, or approval rules per tool in the dashboard — they are enforced on every call before it reaches the server.
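The check-before-call behaviour described above can be sketched in a few lines. This is an illustration of the idea only, not PolicyLayer's implementation: the rule table, the `gated_call` helper, the deny-by-default fallback, and the in-memory audit log are all assumptions. The tool names are taken from the catalog on this page, but the decisions assigned to them are made up for the example.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    APPROVAL = "approval"


# Hypothetical per-tool rules; the decisions are illustrative only.
RULES: Dict[str, Decision] = {
    "analyze_repo": Decision.ALLOW,
    "send_email": Decision.APPROVAL,
    "sandbox_execute": Decision.DENY,
}

# Every checked call is recorded as (tool name, outcome).
audit_log: List[Tuple[str, str]] = []


def gated_call(tool: str, handler: Callable[[], str], approved: bool = False) -> str:
    """Look up the rule for `tool` before running `handler`.

    Tools with no rule are denied (an assumed default).
    """
    decision = RULES.get(tool, Decision.DENY)
    if decision is Decision.DENY:
        audit_log.append((tool, "denied"))
        raise PermissionError(f"{tool}: denied by policy")
    if decision is Decision.APPROVAL and not approved:
        audit_log.append((tool, "pending_approval"))
        raise PermissionError(f"{tool}: approval required")
    audit_log.append((tool, "allowed"))
    return handler()


# The handler stands in for the real tool call to the server.
print(gated_call("analyze_repo", lambda: "repo analysed"))
```

The handler runs only after the decision is made, so a denied or unapproved call never reaches the server and still leaves an audit entry.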

What risk categories do Nodebench tools fall into?

Nodebench tools are categorised as Read (513), Write (112), Destructive (9), Execute (90). Each category has a recommended default policy.
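The category counts can be checked against the stated total, and a category-level default can act as a fallback when no per-tool rule exists. In this sketch the counts come from this page. The default values are placeholders, because the page says each category has a recommended default without listing them. The name `default_for` is hypothetical.

```python
# Counts as stated on this page.
CATEGORY_COUNTS = {"Read": 513, "Write": 112, "Destructive": 9, "Execute": 90}
TOTAL_TOOLS = 724

# Placeholder defaults: these values are assumptions, not the
# recommended defaults, which this page does not list.
PLACEHOLDER_DEFAULTS = {
    "Read": "allow",
    "Write": "approval",
    "Destructive": "deny",
    "Execute": "approval",
}


def default_for(category: str) -> str:
    """Return the placeholder default for a category; unknown ones are denied."""
    return PLACEHOLDER_DEFAULTS.get(category, "deny")


# The four category counts add up to the stated total of 724 tools.
assert sum(CATEGORY_COUNTS.values()) == TOTAL_TOOLS
print(default_for("Destructive"))
```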

Let agents act without letting them run wild.

Route your MCP servers through PolicyLayer and every tool call is checked against your policy before it runs — allow, deny, or require approval. Per-identity grants. Full audit log. Live in minutes.

Free to start. No card required.

4,600+ MCP servers and 31,000+ tools scanned and risk-classified.
