Web Scraper

62 tools. 35 can modify or destroy data without limits.

4 destructive tools with no built-in limits. Policy required.

35 can modify or destroy data
27 read-only
62 tools total

Community server · catalogue entry verified 12/06/2026

What Web Scraper exposes to your agents

Read (27)
Write / Execute (30)
Destructive / Financial (4)
Critical Risk

The most dangerous Web Scraper tools

35 of Web Scraper's 62 tools can modify, destroy, or commit something on every call, and the server itself places no limit on how often an agent calls them.

How to control Web Scraper

PolicyLayer is an MCP gateway — it sits between your AI agents and Web Scraper, and nothing reaches the server without passing your rules. These are the rules we recommend:

Deny destructive operations
{
  "clear_cache": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  }
}

Destructive tools should never be available to autonomous agents without human approval.

Rate limit write operations
{
  "download_file": {
    "limits": [
      {
        "counter": "download_file_per_hour",
        "window": "hour",
        "max": 30,
        "scope": "grant"
      }
    ]
  }
}

Prevents bulk unintended modifications from agents caught in loops.

Cap read operations
{
  "batch_contacts": {
    "limits": [
      {
        "counter": "batch_contacts_per_minute",
        "window": "minute",
        "max": 60,
        "scope": "grant"
      }
    ]
  }
}

Controls API costs and prevents retry loops from exhausting upstream rate limits.

  1. Create a free account and register Web Scraper — nothing to install.
  2. Add these rules — paste them, or build them visually. Tune the limits to your setup.
  3. Point your MCP client (Claude, Cursor, anything) at your gateway URL.

Free to start. No card required.

All 62 Web Scraper tools

EXECUTE 22 tools
cancel_job: Cancel a running async job.
browser_evaluate: Run a JavaScript expression on the current page and return the result.
browser_navigate: Navigate the interactive browser to a URL.
browser_solve_challenge: Explicitly trigger challenge detection and solving on the current page.
browser_wait_for: Wait for selector state or a fixed delay on the active page.
new_session: Start a fresh browser session, clearing all existing sessions.
run_bot_surface_diagnostic: Run script-level bot surface diagnostics (scripts/bot_check.py).
run_browser_info_diagnostic_tool: Collect browser fingerprint telemetry via scripts/get_browser_info.py.
run_challenge_diagnostic: Run target-site diagnostics in either toolkit-native or matrix smoking-gun mode.
run_playbook: Execute an Autonomous Crawl using a Playbook.
start_job: Start a long-running job and return immediately with a job_id.
batch_scrape: Scrape multiple URLs in parallel.
browser_hover: Hover over an element by CSS selector.
deep_research: Perform Deep Research (Search + Crawl + Report).
reload_runtime_config: Reload runtime settings from config files.
browser_click: Click an element on the current page by CSS selector.
browser_close: Close the interactive browser session and free resources.
browser_press_key: Press a keyboard key (Enter, Escape, Tab, ArrowDown, etc.).
browser_scroll: Scroll page content or a specific scrollable element.
browser_type: Type text into an input field on the current page.
click_element: Navigate to URL and click an element (for JS triggers, expanding sections).
fill_form: fill_form

READ 27 tools
batch_contacts: Extract contacts from multiple URLs in parallel.
browser_accessibility_tree: Return a trimmed Playwright accessibility snapshot.
browser_get_elements: Find elements matching a CSS selector on the current page.
browser_get_interaction_map: Return a compact map of interactive elements with selector hints.
browser_read_page: Read the content of the current page or a specific element.
browser_screenshot: Capture a screenshot of the current page.
chunk_text: Split text into overlapping chunks for LLM processing.
crawl_site: Crawl a site's sitemap to discover pages.
detect_content_type: Detect content type of URL (HTML, PDF, image, etc.).
extract_contacts: Extract all contact information from a URL.
extract_links: Extract all hyperlinks from a webpage.
extract_tables: Extract structured table data from webpage.
get_cache_stats: Get response cache statistics (hits, misses, size).
get_config: Get current configuration settings.
get_history: Get recent scraping history.
get_host_profiles: Return host profile learning store (all hosts or one host).
get_metadata: Extract semantic metadata (JSON-LD, OpenGraph, TwitterCards).
get_sitemap: Smart Sitemap Discovery and Filtering.
get_token_count: Estimate token count for text.
health_check: Check system health. Returns status of browser, cache, sessions.
list_jobs: List recent async jobs and their statuses.
list_sessions: List all saved browser sessions.
poll_job: Get current status of a job started by start_job.
scrape_url: Scrape a URL and return its content.
screenshot: Capture a screenshot of a webpage.
search_web: Perform a web search and return results.
validate_url: Validate URL reachability before scraping. Returns status, content type, size.

Questions about Web Scraper

Can an AI agent delete data through the Web Scraper MCP server?

Yes. The Web Scraper server exposes 4 destructive tools, including clear_cache, clear_history, and clear_host_profile. These permanently remove resources with no undo. PolicyLayer blocks destructive tools by default so they never reach the upstream server.
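
Following the shape of the "Deny destructive operations" rule above, a policy that blocks the three destructive tools named in this answer might look like the sketch below. It assumes several tools can sit side by side as top-level keys in one policy document, which the single-tool examples on this page do not confirm. The fourth destructive tool is not named here, so it is left out.

```json
{
  "clear_cache": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  },
  "clear_history": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  },
  "clear_host_profile": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  }
}
```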

How do I prevent bulk modifications through Web Scraper?

The Web Scraper server has 8 write tools, including download_file, configure_host_learning, and configure_retry. Set a rate limit in your policy: a limit of 10 calls per hour, for example, caps a looping agent at 10 modifications per hour. PolicyLayer enforces this at the gateway, before calls reach Web Scraper.
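
As a sketch of the 10-calls-per-hour example, the rule below reuses the "limits" shape from the "Rate limit write operations" rule above and applies it to the three write tools named in this answer. The counter names for configure_host_learning and configure_retry are assumptions that follow the download_file_per_hour naming pattern, and giving each tool its own counter is a choice made here for illustration. Tune the max values to your setup.

```json
{
  "download_file": {
    "limits": [
      {
        "counter": "download_file_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  },
  "configure_host_learning": {
    "limits": [
      {
        "counter": "configure_host_learning_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  },
  "configure_retry": {
    "limits": [
      {
        "counter": "configure_retry_per_hour",
        "window": "hour",
        "max": 10,
        "scope": "grant"
      }
    ]
  }
}
```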

How many tools does the Web Scraper MCP server expose?

62 tools across 4 categories: Destructive, Execute, Read, Write. 27 are read-only. 35 can modify, create, or delete data.

How do I enforce a policy on Web Scraper?

Register the Web Scraper MCP server in PolicyLayer, apply the suggested rules above (adjust the limits to your use case), and point your AI client at the PolicyLayer proxy URL instead of the server directly. Your agents keep the same tools; PolicyLayer evaluates every call against policy before it executes. Nothing to install, live in minutes.

Enforce policy on every Web Scraper tool call.

Deterministic rules across all 62 Web Scraper tools. Per-identity grants. Full audit log. Live in minutes. Nothing to install.

62 Web Scraper tools catalogued and risk-classified — across an index of 43,000+ MCP servers.
