// SCAN REPORT

Web Scraper

62 tools. 35 can modify or destroy data without limits.

4 destructive tools with no built-in limits. Policy required.

Last updated: 12 Jun 2026

35 can modify or destroy data

27 read-only

62 tools total

Community server · catalogue entry verified 12 Jun 2026

TOOL MAP

What Web Scraper exposes to your agents

Read (27) · Write / Execute (30) · Destructive / Financial (4)

MOST DANGEROUS TOOLS

Critical Risk

The most dangerous Web Scraper tools

clear_cache (Destructive · Medium): Clearing a cache permanently deletes stored response data that cannot be recovered.
clear_history (Destructive · Medium): Clearing history removes previously collected scraping records permanently with no indication of reversibility.
clear_host_profile (Destructive · High): This tool permanently removes a profile record with no undo mechanism.
clear_session (Destructive · Medium): Clearing cookies and storage is irreversible: once deleted, session data, authentication tokens, and local/session storage contents cannot be recovered.

35 of Web Scraper's 62 tools can modify, destroy, or commit something on every call, and nothing built into the server limits how often an agent calls them.

POLICY

How to control Web Scraper

PolicyLayer is an MCP gateway — it sits between your AI agents and Web Scraper, and nothing reaches the server without passing your rules. These are the rules we recommend:

Deny destructive operations

{
  "clear_cache": {
    "deny_if": [
      {
        "conditions": [],
        "on_deny": "Blocked by default. Requires approval."
      }
    ]
  }
}

Destructive tools should never be available to autonomous agents without human approval.
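
The rule above covers only clear_cache. As a minimal sketch, assuming the other three destructive tools accept a rule of the same shape (only the clear_cache rule appears in this report, so that is an assumption), the full set could be generated like this. The helper names `deny_rule` and `build_deny_policy` are hypothetical and not part of PolicyLayer.

```python
import json

# The four destructive tools listed in this report.
DESTRUCTIVE_TOOLS = [
    "clear_cache",
    "clear_history",
    "clear_host_profile",
    "clear_session",
]


def deny_rule(message: str) -> dict:
    """Build a rule in the same shape as the clear_cache example.

    Assumption: an empty "conditions" list means the rule applies to every call.
    """
    return {"deny_if": [{"conditions": [], "on_deny": message}]}


def build_deny_policy(tools: list[str]) -> dict:
    """Map each tool name to an unconditional deny rule."""
    message = "Blocked by default. Requires approval."
    return {tool: deny_rule(message) for tool in tools}


policy = build_deny_policy(DESTRUCTIVE_TOOLS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the clear_cache rule before pasting; check the result against your gateway's policy editor, since the exact schema is not documented here.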

Rate limit write operations

{
  "download_file": {
    "limits": [
      {
        "counter": "download_file_per_hour",
        "window": "hour",
        "max": 30,
        "scope": "grant"
      }
    ]
  }
}

Prevents bulk unintended modifications from agents caught in loops.

Cap read operations

{
  "batch_contacts": {
    "limits": [
      {
        "counter": "batch_contacts_per_minute",
        "window": "minute",
        "max": 60,
        "scope": "grant"
      }
    ]
  }
}

Controls API costs and prevents retry loops from exhausting upstream rate limits.
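
To make the effect of a limit like this concrete, here is a minimal sketch of a fixed-window counter in Python. It is not PolicyLayer's implementation. It assumes that `window` and `max` mean "at most `max` calls per clock-aligned window", and the `FixedWindowLimiter` class is hypothetical.

```python
from dataclasses import dataclass, field

# Window names used in the rules above, mapped to seconds.
WINDOW_SECONDS = {"minute": 60, "hour": 3600}


@dataclass
class FixedWindowLimiter:
    window: str
    max_calls: int
    _counts: dict = field(default_factory=dict)

    def allow(self, counter: str, now: float) -> bool:
        """Return True and count the call if the current window has room."""
        bucket = int(now // WINDOW_SECONDS[self.window])
        key = (counter, bucket)
        used = self._counts.get(key, 0)
        if used >= self.max_calls:
            return False  # limit reached for this window
        self._counts[key] = used + 1
        return True


limiter = FixedWindowLimiter(window="minute", max_calls=60)

# Simulate a retry loop: 100 calls spread over the first 50 seconds.
results = [limiter.allow("batch_contacts_per_minute", now=i * 0.5) for i in range(100)]
print(results.count(True), "allowed,", results.count(False), "blocked")  # 60 allowed, 40 blocked
```

Under these assumptions the looping agent is cut off after its 60th call and is allowed again only when the next minute starts.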

1. Create a free account and register Web Scraper. There is nothing to install.
2. Add these rules: paste them or build them visually, then tune the limits to your setup.
3. Point your MCP client (Claude, Cursor, or any other) at your gateway URL.

ENFORCE POLICY ON WEB SCRAPER →

Free to start. No card required.

FULL CATALOGUE

All 62 Web Scraper tools

DESTRUCTIVE 4 tools

clear_cache: Clear the response cache. Use when cached data may be stale.
clear_history: Clear scraping history.
clear_host_profile: Delete one host profile record from the profile store.
clear_session: Clear a browser session (cookies, storage). Use for fresh starts.

EXECUTE 22 tools

cancel_job: Cancel a running async job.
browser_evaluate: Run a JavaScript expression on the current page and return the result.
browser_navigate: Navigate the interactive browser to a URL.
browser_solve_challenge: Explicitly trigger challenge detection and solving on the current page.
browser_wait_for: Wait for selector state or a fixed delay on the active page.
new_session: Start a fresh browser session, clearing all existing sessions.
run_bot_surface_diagnostic: Run script-level bot surface diagnostics (scripts/bot_check.py).
run_browser_info_diagnostic_tool: Collect browser fingerprint telemetry via scripts/get_browser_info.py.
run_challenge_diagnostic: Run target-site diagnostics in either toolkit-native or matrix smoking-gun mode.
run_playbook: Execute an Autonomous Crawl using a Playbook.
start_job: Start a long-running job and return immediately with a job_id.
batch_scrape: Scrape multiple URLs in parallel.
browser_hover: Hover over an element by CSS selector.
deep_research: Perform Deep Research (Search + Crawl + Report).
reload_runtime_config: Reload runtime settings from config files.
browser_click: Click an element on the current page by CSS selector.
browser_close: Close the interactive browser session and free resources.
browser_press_key: Press a keyboard key (Enter, Escape, Tab, ArrowDown, etc.).
browser_scroll: Scroll page content or a specific scrollable element.
browser_type: Type text into an input field on the current page.
click_element: Navigate to URL and click an element (for JS triggers, expanding sections).
fill_form: (no description listed)

WRITE 8 tools

download_file: Download file from URL. Saves PDFs, images, documents directly.
configure_host_learning: Configure host-profile auto-learning behavior.
configure_retry: Configure retry behavior with exponential backoff.
configure_runtime: Apply runtime override values without restarting the MCP server.
configure_scraper: Configure browser settings.
configure_stealth: Configure stealth mode and robots.txt compliance.
save_pdf: Save a URL as a PDF file.
set_host_profile: Set active host routing profile from JSON payload (admin override).

READ 27 tools

batch_contacts: Extract contacts from multiple URLs in parallel.
browser_accessibility_tree: Return a trimmed Playwright accessibility snapshot.
browser_get_elements: Find elements matching a CSS selector on the current page.
browser_get_interaction_map: Return a compact map of interactive elements with selector hints.
browser_read_page: Read the content of the current page or a specific element.
browser_screenshot: Capture a screenshot of the current page.
chunk_text: Split text into overlapping chunks for LLM processing.
crawl_site: Crawl a site's sitemap to discover pages.
detect_content_type: Detect content type of URL (HTML, PDF, image, etc.).
extract_contacts: Extract all contact information from a URL.
extract_links: Extract all hyperlinks from a webpage.
extract_tables: Extract structured table data from webpage.
get_cache_stats: Get response cache statistics (hits, misses, size).
get_config: Get current configuration settings.
get_history: Get recent scraping history.
get_host_profiles: Return host profile learning store (all hosts or one host).
get_metadata: Extract semantic metadata (JSON-LD, OpenGraph, TwitterCards).
get_sitemap: Smart Sitemap Discovery and Filtering.
get_token_count: Estimate token count for text.
health_check: Check system health. Returns status of browser, cache, sessions.
list_jobs: List recent async jobs and their statuses.
list_sessions: List all saved browser sessions.
poll_job: Get current status of a job started by start_job.
scrape_url: Scrape a URL and return its content.
screenshot: Capture a screenshot of a webpage.
search_web: Perform a web search and return results.
validate_url: Validate URL reachability before scraping. Returns status, content type, size.

OTHER 1 tool

truncate_text: Truncate text to fit within token limit.

FAQ

Questions about Web Scraper

Can an AI agent delete data through the Web Scraper MCP server?

Yes. The Web Scraper server exposes 4 destructive tools: clear_cache, clear_history, clear_host_profile, and clear_session. These permanently remove resources with no undo. PolicyLayer blocks destructive tools by default so they never reach the upstream server.

How do I prevent bulk modifications through Web Scraper?

The Web Scraper server has 8 write tools, including download_file, configure_host_learning, and configure_retry. Set a rate limit in your policy: for example, a limit of 10 calls per hour stops an agent once it has made 10 modifications in that hour. PolicyLayer enforces this at the gateway, before calls reach Web Scraper.
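
As a sketch of the 10-calls-per-hour example in this answer, the snippet below generates one hourly limit per write tool, copying the shape of the download_file rule shown earlier on this page. The helper name `hourly_limit` is hypothetical, the `<tool>_per_hour` counter naming is inferred from the examples, and the meaning of `"scope": "grant"` is not documented here, so it is copied unchanged.

```python
import json

# The eight write tools listed in the catalogue.
WRITE_TOOLS = [
    "download_file",
    "configure_host_learning",
    "configure_retry",
    "configure_runtime",
    "configure_scraper",
    "configure_stealth",
    "save_pdf",
    "set_host_profile",
]


def hourly_limit(tool: str, max_calls: int) -> dict:
    """Build a limit rule in the same shape as the download_file example."""
    return {
        "limits": [
            {
                "counter": f"{tool}_per_hour",  # naming pattern assumed from the examples
                "window": "hour",
                "max": max_calls,
                "scope": "grant",  # copied as-is; semantics not described in this report
            }
        ]
    }


write_policy = {tool: hourly_limit(tool, max_calls=10) for tool in WRITE_TOOLS}
print(json.dumps(write_policy, indent=2))
```

A single shared cap of 10 is only a starting point; tools such as download_file may warrant a different limit than the configuration tools.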

How many tools does the Web Scraper MCP server expose?

62 tools across 5 categories: Destructive (4), Execute (22), Write (8), Read (27), and Other (1). 27 are read-only. The remaining 35 are counted as able to modify, create, or delete data.
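
The headline numbers can be checked against the per-category counts in the full catalogue, which also files one tool (truncate_text) under Other. A short tally, using only the counts shown on this page:

```python
# Tool counts per category, as listed in the full catalogue of this report.
CATEGORY_COUNTS = {
    "Destructive": 4,
    "Execute": 22,
    "Write": 8,
    "Read": 27,
    "Other": 1,
}

total = sum(CATEGORY_COUNTS.values())
read_only = CATEGORY_COUNTS["Read"]
not_read_only = total - read_only  # everything that is not classified as Read

print(total, read_only, not_read_only)  # 62 27 35
```

The tally shows that the figure of 35 is everything outside the Read category, so it includes the single Other tool alongside the 34 Destructive, Execute, and Write tools.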

How do I enforce a policy on Web Scraper?

Register the Web Scraper MCP server in PolicyLayer, apply the suggested rules above (adjust the limits to your use case), and point your AI client at your PolicyLayer gateway URL instead of at the server directly. Your agents keep the same tools; PolicyLayer evaluates every call against policy before it executes. There is nothing to install, and setup takes minutes.

Enforce policy on every Web Scraper tool call.

Deterministic rules across all 62 Web Scraper tools. Per-identity grants. Full audit log. Live in minutes. Nothing to install.

62 Web Scraper tools catalogued and risk-classified — across an index of 43,000+ MCP servers.