ScreenHand MCP Server: 89 Tools with Risk Policies

// ALL 89 SCREENHAND TOOLS

READ 34 tools

Read app_list: List all running applications with their bundle IDs, names, and PIDs.
Read apps: List all running applications
Read ax_find: Find a UI element by text/title in an app
Read ax_tree: Get the accessibility UI tree of an app
Read browser_dom: Query the DOM of a Chrome/Electron page. Returns matching elements
Read browser_page_info: Get current page title, URL, and text content summary
Read browser_tabs: List all open Chrome/Electron tabs. Use cdpPort to connect to a specific app (e.g. 9333 for Codex Desktop).
Read coverage_report: Check what ScreenHand knows about an app: shortcuts, selectors, flows, playbooks, error patterns, and stabi...
Read discover_features: Extract features from an app
Read element_tree: Get the accessibility element tree of the current app. Useful for understanding the UI structure and findin...
Read extract: Extract data from a UI element. Returns text content, table data, or structured JSON from the element.
Read ingest_tutorial: Extract structured playbook steps from a video transcript (e.g. YouTube captions). Converts tutorial narrat...
Read locate_with_fallback: Find an element
Read map_app: Visually map an app
Read observer_ocr_roi: Submit a targeted ROI OCR command to the running observer daemon. The daemon captures the window region, ru...
Read observer_status: Get observer daemon status — frames captured, OCR text, popup detection.
Read ocr: OCR a window with element positions. SLOW — prefer ui_tree for structured element discovery. Use OCR only f...
Read ocr_regions: Screenshot + OCR with detailed region positions (bounds, confidence)
Read orchestrator_status: Get orchestrator status — worker slots, task queue, active/completed tasks.
Read platform_guide: Get automation guide for a platform (selectors, URLs, flows, error solutions). Reads from references/ (cura...
Read platform_learn: Scrape official docs, help center, keyboard shortcuts for a platform. Crawls pages via Chrome and extracts ...
Read playbook_list: List all available playbooks with their IDs, names, platforms, and success rates.
Read playbook_preflight: Quick feasibility check before automating a platform. Scans the page for known blockers (captchas, WebGL, i...
Read read_with_fallback: Read text content from the screen or a specific element using the canonical fallback chain: AX → CDP → OCR....
Read recording_status: Check if recording is active and how many events captured so far.
Read scan_menu_bar: Scan an app
Read screenshot: Screenshot a window (or full screen) and OCR it. Returns visible text.
Read screenshot_file: Take a screenshot and return the file path (for viewing the actual image)
Read ui_find: Find a specific UI element by text, title, or value. Falls back to value search if title match fails (e.g. ...
Read ui_tree: PREFERRED: Get the full UI element tree of an app via Accessibility. ~50ms, no screenshot/OCR. Use this FIR...
Read watch_status: Get all registered watch rules and their fire counts.
Read window_list: List all visible windows with their titles, positions, and sizes.
Read windows: List all visible windows with IDs and positions
Read execution_plan: Show the execution plan for an action type. Returns the ordered fallback chain based on available infrastru...

WRITE 3 tools

Write ingest_documentation: Parse a documentation page (HTML, markdown, or text) and extract shortcuts, workflows, and tips. Merges ext...
Write export_playbook: Generate a playbook JSON from your session. Extracts URLs, selectors, errors+solutions from memory. Share t...
Write ui_set_value: Set the value of a UI element (text field, slider, etc.). Searches by title first, falls back to value match.

DESTRUCTIVE 2 tools

Destructive recording_cancel: Cancel the current recording without saving.
Destructive watch_unregister: Remove a watch rule by ID.

EXECUTE 50 tools

Execute applescript: Run an AppleScript command. For controlling Finder, Safari, Mail, Notes, etc. (macOS only). WARNING: Execut...
Execute app_launch: Launch a macOS/Windows application by bundle ID (e.g.,
Execute browser_js: Execute JavaScript in a Chrome/Electron tab. Returns the result. WARNING: This runs arbitrary JS in the bro...
Execute browser_navigate: Navigate the active Chrome/Electron tab to a URL
Execute browser_wait: Wait for a condition on a Chrome/Electron page
Execute launch: Launch an application by bundle ID
Execute navigate: Navigate a browser to a URL, or open an app via
Execute observer_start: Start the observer daemon to continuously watch an app window. Captures frames via CGWindowListCreateImage,...
Execute observer_stop: Stop the observer daemon.
Execute orchestrator_start: Start the multi-agent orchestrator daemon. Manages parallel worker slots: web tasks (CDP) run in parallel, ...
Execute orchestrator_stop: Stop the orchestrator daemon. Running tasks finish before exit.
Execute playbook_record: Macro recorder: start/stop/trim/clean recorded playbooks. Use
Execute playbook_run: Execute a saved playbook by ID or auto-match by task description. Playbooks run deterministically without A...
Execute recording_start: Start recording user actions to auto-generate a playbook. Do the task manually while recording, then call r...
Execute recording_stop: Stop recording and save the captured actions as a new playbook.
Execute session_start: Start a new automation session. Returns a sessionId needed by all other tools. Automatically attaches to th...
Execute task_run: Run a complete task autonomously. Starts an observe→decide→act loop that uses the accessibility tree (not s...
Execute wait_for: Wait for a condition: element appears/disappears, text appears, URL changes, window title matches, etc.
Execute wait_for_state: Wait until a condition is met on screen: text appears, text disappears, or element becomes available. Polls...
Execute app_focus: Bring a running application to the foreground.
Execute browser_stealth: Inject anti-detection patches into Chrome/Electron page. Call once after navigating to a protected site. Hi...
Execute flick: Fast swipe/flick gesture (for iOS home gesture etc)
Execute focus: Focus/activate an application by bundle ID
Execute key: Press a key combination
Execute platform_explore: Autonomously explore an app or website. Maps all interactive elements, tries each one, records working sele...
Execute watch_dialog: Register a dialog watch rule: when a dialog matching the pattern appears, auto-execute an action.
Execute watch_register: Register a watch rule: when element with matching title appears, execute an action. Use for automated respo...
Execute watch_start: Start the state watcher polling loop. Evaluates registered watch rules every 2s against the world model.
Execute watch_stop: Stop the state watcher polling loop.
Execute ax_press: Find a UI element by title and press/click it via accessibility
Execute browser_click: Click an element in Chrome/Electron by CSS selector. Uses CDP Input.dispatchMouseEvent for realistic mouse ...
Execute browser_fill_form: Fill a form field with human-like typing (anti-detection). Uses real keyboard events via CDP Input domain.
Execute browser_human_click: Alias for browser_click — both use realistic mouseMoved → mousePressed → mouseReleased events. Prefer brows...
Execute browser_open: Open a URL in Chrome/Electron (creates new tab)
Execute browser_type: Type into an input field in Chrome/Electron. Uses CDP Input.dispatchKeyEvent for real keyboard events (work...
Execute click: Click at screen coordinates
Execute click_text: Find text on a window via OCR and click it. Handles Retina + shadow coordinate mapping.
Execute click_with_fallback: Click a target by text using the canonical fallback chain: AX → CDP → OCR. Automatically retries and falls ...
Execute drag: Drag from one point to another (slow, smooth)
Execute key_combo: Send a keyboard shortcut. Keys:
Execute menu_click: Click a menu item in an app
Execute orchestrator_submit: Submit a task to the orchestrator. Web tasks (CDP) run in parallel, native tasks queue per-app. Returns imm...
Execute press: Click/press a UI element. Finds the element by text, role, selector, or coordinates, then clicks it.
Execute scroll: Scroll at a position
Execute scroll_with_fallback: Scroll within an element or the active window using the canonical fallback chain: AX → CDP → coordinates. S...
Execute select_with_fallback: Select an option from a dropdown/menu using the canonical fallback chain: AX → CDP. Finds the control, open...
Execute type_into: Type text into a UI element (text field, search box, etc). Locates the field, optionally clears it, then ty...
Execute type_text: Type text using keyboard
Execute type_with_fallback: Type text into a target field using the canonical fallback chain: AX → CDP → coordinates. Finds the field b...
Execute ui_press: PREFERRED: Find and press/click a UI element by its title via Accessibility. Faster and more reliable than ...
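
Several of the tools above (read_with_fallback, click_with_fallback, type_with_fallback) describe a canonical fallback chain such as AX → CDP → OCR. The sketch below shows one way such an ordered chain can work in general. It is not ScreenHand's implementation: the backend functions, their signatures, and the error handling are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical backend signature: takes a target label and returns the
# text it read, or None when it finds nothing.
Backend = Callable[[str], Optional[str]]


def read_with_chain(target: str, chain: Sequence[Tuple[str, Backend]]) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, text) from the first hit."""
    errors = []
    for name, backend in chain:
        try:
            text = backend(target)
        except Exception as exc:  # a failing backend should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if text is not None:
            return name, text
        errors.append(f"{name}: no match")
    raise LookupError(f"all backends failed for {target!r}: {'; '.join(errors)}")


# Stand-in backends (assumptions for illustration only).
def ax_backend(target: str) -> Optional[str]:
    return None  # accessibility lookup finds nothing


def cdp_backend(target: str) -> Optional[str]:
    raise ConnectionError("no debugging port")  # browser not reachable


def ocr_backend(target: str) -> Optional[str]:
    return f"text near {target}"  # slowest option, tried last


CHAIN = [("AX", ax_backend), ("CDP", cdp_backend), ("OCR", ocr_backend)]
```

Calling `read_with_chain("Save", CHAIN)` walks the chain in order and only reaches the OCR stand-in after the two cheaper backends miss, which mirrors the listing's advice to prefer ui_tree and use OCR sparingly.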

// FAQ

How many tools does the ScreenHand MCP server have?

The ScreenHand MCP server exposes 89 tools across 4 categories: Read, Write, Destructive, Execute.

How do I enforce policies on ScreenHand tools?

Route the ScreenHand server through the PolicyLayer gateway. Define allow, deny, or approval rules per tool in the dashboard; they are enforced on every call before it reaches the server.
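
As a rough illustration of what per-tool allow, deny, and approval rules mean when they are enforced before a call reaches the server, here is a minimal sketch. It is not the PolicyLayer gateway or its API: `PolicyGate`, `Decision`, the `approver` callback, and the `forward` function are hypothetical names, and the rule assignments are examples only.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    APPROVAL = "approval"


class PolicyGate:
    """Hypothetical gate: checks a per-tool rule before forwarding a call."""

    def __init__(
        self,
        rules: Dict[str, Decision],
        approver: Callable[[str, Dict[str, Any]], bool],
        default: Decision = Decision.DENY,  # assumption: unknown tools are denied
    ) -> None:
        self.rules = rules
        self.approver = approver
        self.default = default

    def call(self, tool: str, args: Dict[str, Any], forward: Callable[[str, Dict[str, Any]], Any]) -> Any:
        decision = self.rules.get(tool, self.default)
        if decision is Decision.DENY:
            raise PermissionError(f"{tool} is denied by policy")
        if decision is Decision.APPROVAL and not self.approver(tool, args):
            raise PermissionError(f"{tool} was not approved")
        # Only calls that pass the policy check reach the server.
        return forward(tool, args)


# Example rules (illustrative choices, not recommendations from the source).
RULES = {
    "ui_tree": Decision.ALLOW,
    "ui_set_value": Decision.APPROVAL,
    "applescript": Decision.DENY,
}
```

The key design point is ordering: the rule lookup happens before `forward` runs, so a denied or unapproved call never touches the server.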

What risk categories do ScreenHand tools fall into?

ScreenHand tools are categorised as Read (34), Write (3), Destructive (2), Execute (50). Each category has a recommended default policy.
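
The category counts can be checked against the stated total, and a category-level default can be looked up per tool. In the sketch below the counts come from this page, but the default actions are placeholders: the page does not say what the recommended defaults are, so `ASSUMED_DEFAULTS` and `default_policy` are hypothetical.

```python
# Counts as stated on this page.
CATEGORY_COUNTS = {"Read": 34, "Write": 3, "Destructive": 2, "Execute": 50}

# Placeholder defaults (assumptions, not the recommended policies).
ASSUMED_DEFAULTS = {
    "Read": "allow",
    "Write": "approval",
    "Destructive": "deny",
    "Execute": "approval",
}


def default_policy(category: str) -> str:
    """Return the assumed default action for a risk category."""
    try:
        return ASSUMED_DEFAULTS[category]
    except KeyError:
        raise ValueError(f"unknown category: {category!r}") from None


TOTAL_TOOLS = sum(CATEGORY_COUNTS.values())  # 34 + 3 + 2 + 50
```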

Enforce policy on every ScreenHand tool call.

Start from ScreenHand, add the rest of your stack, and see everything your agents can call. Then put policy on all of it.


43,000+ MCP servers and 220,000+ tools scanned and risk-classified.