ScreenHand

89 tools. 55 can modify or destroy data without limits.

2 destructive tools with no built-in limits. Policy required.

Community server · catalogue entry verified 12/06/2026

What ScreenHand exposes to your agents

Read: 34 tools
Write / Execute: 53 tools
Destructive / Financial: 2 tools
Risk rating: Critical

The most dangerous ScreenHand tools

55 of ScreenHand's 89 tools can modify, destroy, or commit something on every call — and an agent calls them with no built-in limits.

How to control ScreenHand

PolicyLayer is an MCP gateway — it sits between your AI agents and ScreenHand, and nothing reaches the server without passing your rules. These are the rules we recommend:

Deny destructive operations
{
  "recording_cancel": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  }
}

Destructive tools should never be available to autonomous agents without human approval.
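
The FAQ on this page names a second destructive tool, watch_unregister. As a minimal sketch, the same deny pattern could cover both tools in one policy. It assumes that several tools may share a single policy object (the snippets on this page show only one tool each) and reuses the on_deny message from the rule above:

```json
{
  "recording_cancel": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  },
  "watch_unregister": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  }
}
```

An empty conditions list is taken here to mean "always deny", as in the original rule.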

Rate limit write operations
{
  "ingest_documentation": {
    "limits": [
      {
        "counter": "ingest_documentation_per_hour",
        "window": "hour",
        "max": 30,
        "scope": "grant"
      }
    ]
  }
}

Prevents bulk unintended modifications from agents caught in loops.

Cap read operations
{
  "app_list": {
    "limits": [
      {
        "counter": "app_list_per_minute",
        "window": "minute",
        "max": 60,
        "scope": "grant"
      }
    ]
  }
}

Controls API costs and prevents retry loops from exhausting upstream rate limits.
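
The tool list on this page flags two Execute tools with explicit warnings: applescript and browser_js. The limits schema shown above could plausibly cap them as well. The following is a sketch only: the counter names and the max values are assumptions chosen for illustration, not recommended defaults, and it assumes limits apply to Execute tools the same way they apply to the tools in the examples above.

```json
{
  "applescript": {
    "limits": [
      {
        "counter": "applescript_per_minute",
        "window": "minute",
        "max": 5,
        "scope": "grant"
      }
    ]
  },
  "browser_js": {
    "limits": [
      {
        "counter": "browser_js_per_minute",
        "window": "minute",
        "max": 10,
        "scope": "grant"
      }
    ]
  }
}
```

Tune the values to how often a legitimate workflow calls these tools.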

  1. Create a free account and register ScreenHand — nothing to install.
  2. Add these rules — paste them, or build them visually. Tune the limits to your setup.
  3. Point your MCP client (Claude, Cursor, anything) at your gateway URL.

Free to start. No card required.

All 89 ScreenHand tools

EXECUTE 50 tools
applescript: Run an AppleScript command. For controlling Finder, Safari, Mail, Notes, etc. (macOS only). WARNING: Executes
app_launch: Launch a macOS/Windows application by bundle ID (e.g.,
browser_js: Execute JavaScript in a Chrome/Electron tab. Returns the result. WARNING: This runs arbitrary JS in the browse
browser_navigate: Navigate the active Chrome/Electron tab to a URL
browser_wait: Wait for a condition on a Chrome/Electron page
launch: Launch an application by bundle ID
navigate: Navigate a browser to a URL, or open an app via
observer_start: Start the observer daemon to continuously watch an app window. Captures frames via CGWindowListCreateImage, ru
observer_stop: Stop the observer daemon.
orchestrator_start: Start the multi-agent orchestrator daemon. Manages parallel worker slots: web tasks (CDP) run in parallel, nat
orchestrator_stop: Stop the orchestrator daemon. Running tasks finish before exit.
playbook_record: Macro recorder: start/stop/trim/clean recorded playbooks. Use
playbook_run: Execute a saved playbook by ID or auto-match by task description. Playbooks run deterministically without AI c
recording_start: Start recording user actions to auto-generate a playbook. Do the task manually while recording, then call reco
recording_stop: Stop recording and save the captured actions as a new playbook.
session_start: Start a new automation session. Returns a sessionId needed by all other tools. Automatically attaches to the f
task_run: Run a complete task autonomously. Starts an observe→decide→act loop that uses the accessibility tree (not scre
wait_for: Wait for a condition: element appears/disappears, text appears, URL changes, window title matches, etc.
wait_for_state: Wait until a condition is met on screen: text appears, text disappears, or element becomes available. Polls at
app_focus: Bring a running application to the foreground.
browser_stealth: Inject anti-detection patches into Chrome/Electron page. Call once after navigating to a protected site. Hides
flick: Fast swipe/flick gesture (for iOS home gesture etc)
focus: Focus/activate an application by bundle ID
key: Press a key combination
platform_explore: Autonomously explore an app or website. Maps all interactive elements, tries each one, records working selecto
watch_dialog: Register a dialog watch rule: when a dialog matching the pattern appears, auto-execute an action.
watch_register: Register a watch rule: when element with matching title appears, execute an action. Use for automated response
watch_start: Start the state watcher polling loop. Evaluates registered watch rules every 2s against the world model.
watch_stop: Stop the state watcher polling loop.
ax_press: Find a UI element by title and press/click it via accessibility
browser_click: Click an element in Chrome/Electron by CSS selector. Uses CDP Input.dispatchMouseEvent for realistic mouse eve
browser_fill_form: Fill a form field with human-like typing (anti-detection). Uses real keyboard events via CDP Input domain.
browser_human_click: Alias for browser_click; both use realistic mouseMoved → mousePressed → mouseReleased events. Prefer browser_
browser_open: Open a URL in Chrome/Electron (creates new tab)
browser_type: Type into an input field in Chrome/Electron. Uses CDP Input.dispatchKeyEvent for real keyboard events (works w
click: Click at screen coordinates
click_text: Find text on a window via OCR and click it. Handles Retina + shadow coordinate mapping.
click_with_fallback: Click a target by text using the canonical fallback chain: AX → CDP → OCR. Automatically retries and falls thr
drag: Drag from one point to another (slow, smooth)
key_combo: Send a keyboard shortcut. Keys:
menu_click: Click a menu item in an app
orchestrator_submit: Submit a task to the orchestrator. Web tasks (CDP) run in parallel, native tasks queue per-app. Returns immedi
press: Click/press a UI element. Finds the element by text, role, selector, or coordinates, then clicks it.
scroll: Scroll at a position
scroll_with_fallback: Scroll within an element or the active window using the canonical fallback chain: AX → CDP → coordinates. Scro
select_with_fallback: Select an option from a dropdown/menu using the canonical fallback chain: AX → CDP. Finds the control, opens i
type_into: Type text into a UI element (text field, search box, etc). Locates the field, optionally clears it, then types
type_text: Type text using keyboard
type_with_fallback: Type text into a target field using the canonical fallback chain: AX → CDP → coordinates. Finds the field by l
ui_press: PREFERRED: Find and press/click a UI element by its title via Accessibility. Faster and more reliable than cli

READ 34 tools
app_list: List all running applications with their bundle IDs, names, and PIDs.
apps: List all running applications
ax_find: Find a UI element by text/title in an app
ax_tree: Get the accessibility UI tree of an app
browser_dom: Query the DOM of a Chrome/Electron page. Returns matching elements
browser_page_info: Get current page title, URL, and text content summary
browser_tabs: List all open Chrome/Electron tabs. Use cdpPort to connect to a specific app (e.g. 9333 for Codex Desktop).
coverage_report: Check what ScreenHand knows about an app: shortcuts, selectors, flows, playbooks, error patterns, and stabilit
discover_features: Extract features from an app
element_tree: Get the accessibility element tree of the current app. Useful for understanding the UI structure and finding e
extract: Extract data from a UI element. Returns text content, table data, or structured JSON from the element.
ingest_tutorial: Extract structured playbook steps from a video transcript (e.g. YouTube captions). Converts tutorial narration
locate_with_fallback: Find an element
map_app: Visually map an app
observer_ocr_roi: Submit a targeted ROI OCR command to the running observer daemon. The daemon captures the window region, runs
observer_status: Get observer daemon status: frames captured, OCR text, popup detection.
ocr: OCR a window with element positions. SLOW, prefer ui_tree for structured element discovery. Use OCR only for
ocr_regions: Screenshot + OCR with detailed region positions (bounds, confidence)
orchestrator_status: Get orchestrator status: worker slots, task queue, active/completed tasks.
platform_guide: Get automation guide for a platform (selectors, URLs, flows, error solutions). Reads from references/ (curated
platform_learn: Scrape official docs, help center, keyboard shortcuts for a platform. Crawls pages via Chrome and extracts str
playbook_list: List all available playbooks with their IDs, names, platforms, and success rates.
playbook_preflight: Quick feasibility check before automating a platform. Scans the page for known blockers (captchas, WebGL, ifra
read_with_fallback: Read text content from the screen or a specific element using the canonical fallback chain: AX → CDP → OCR. Re
recording_status: Check if recording is active and how many events captured so far.
scan_menu_bar: Scan an app
screenshot: Screenshot a window (or full screen) and OCR it. Returns visible text.
screenshot_file: Take a screenshot and return the file path (for viewing the actual image)
ui_find: Find a specific UI element by text, title, or value. Falls back to value search if title match fails (e.g. fin
ui_tree: PREFERRED: Get the full UI element tree of an app via Accessibility. ~50ms, no screenshot/OCR. Use this FIRST
watch_status: Get all registered watch rules and their fire counts.
window_list: List all visible windows with their titles, positions, and sizes.
windows: List all visible windows with IDs and positions
execution_plan: Show the execution plan for an action type. Returns the ordered fallback chain based on available infrastructu

Questions about ScreenHand

Can an AI agent delete data through the ScreenHand MCP server?

Yes. The ScreenHand server exposes 2 destructive tools: recording_cancel and watch_unregister. Both permanently remove resources, with no undo. PolicyLayer blocks destructive tools by default, so these calls never reach the upstream server.

How do I prevent bulk modifications through ScreenHand?

The ScreenHand server has 3 write tools: ingest_documentation, export_playbook, and ui_set_value. Set a rate limit on each in your policy. For example, a limit of 10 calls per hour caps an agent at 10 modifications per hour with that tool, even if it gets stuck in a loop. PolicyLayer enforces this at the gateway, before calls reach ScreenHand.
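
As an illustration, the 10-calls-per-hour example could be written for all three write tools using the limits schema from the rules above. The counter names follow the naming pattern of the earlier ingest_documentation rule and are assumptions, as is combining the three tools in one policy object:

```json
{
  "ingest_documentation": {
    "limits": [
      {
        "counter": "ingest_documentation_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  },
  "export_playbook": {
    "limits": [
      {
        "counter": "export_playbook_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  },
  "ui_set_value": {
    "limits": [
      {
        "counter": "ui_set_value_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  }
}
```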

How many tools does the ScreenHand MCP server expose?

89 tools across 4 categories: Execute (50), Read (34), Write (3), and Destructive (2). The 34 Read tools are read-only. The other 55 can modify, create, or delete data.

How do I enforce a policy on ScreenHand?

Register the ScreenHand MCP server in PolicyLayer, apply the suggested rules above (adjust the limits to your use case), and point your AI client at the PolicyLayer proxy URL instead of the server directly. Your agents keep the same tools; PolicyLayer evaluates every call against policy before it executes. Nothing to install, live in minutes.
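
For illustration, pointing a client at the gateway might look like the entry below. This is a sketch that assumes a client which accepts a remote server as a url entry under mcpServers in its MCP configuration file (Cursor's mcp.json is one such format). The server label and the URL are placeholders, not real PolicyLayer values; use the gateway URL shown in your own account.

```json
{
  "mcpServers": {
    "screenhand-via-policylayer": {
      "url": "https://YOUR-GATEWAY-URL.example/mcp"
    }
  }
}
```

Remove any existing direct ScreenHand entry from the same file, so the client cannot bypass the gateway.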

Enforce policy on every ScreenHand tool call.

Deterministic rules across all 89 ScreenHand tools. Per-identity grants. Full audit log. Live in minutes. Nothing to install.

89 ScreenHand tools catalogued and risk-classified — across an index of 43,000+ MCP servers.
