IA QA COM/MCP LLM AND RAG TESTING DEV/QA TOOLBOX TOOLS

139 tools from the Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox MCP Server, categorised by risk level.

READ 99 tools
Read ab_test_report Generate an A/B test report comparing two prompts or model configurations. Accepts arrays of scores and ret...
Read analyze_diff_bugs Detect potential bugs and code smells from a git diff or two code versions. Returns a list of issues with s...
Read base64_decode Decode a Base64 string back to UTF-8 text. Use for inspecting Base64-encoded API responses, JWT payload cla...
Read base64_encode Encode a UTF-8 string to Base64. Use when you need to embed binary data, multi-line text, or special charac...
Read bias_detect Analyse a set of LLM responses generated from the same prompt template but with different demographic varia...
Read bm25_score Compute BM25 relevance score between a query and one or more documents. BM25 is the industry-standard keywo...
Read calculate_readability Calculate readability scores: Flesch Reading Ease, Flesch-Kincaid Grade Level, Coleman-Liau Index, and Auto...
Read check_contrast_ratio Calculate WCAG 2.1 contrast ratio between two colors. Returns ratio and compliance for AA/AAA normal and la...
Read compare_models Compare 2-5 AI models side by side: context window, pricing, multimodal, reasoning capabilities, and provid...
Read compare_responses Compare two LLM or MCP responses side by side. Detects structural differences, missing keys, value changes,...
Read consistency_check Compare multiple LLM responses to the same prompt and detect inconsistencies using Jaccard word-overlap sim...
Read context_window_check Given an array of message objects [{role, content}], estimate total token usage and check if it fits in the...
Read conversation_analyze Analyze a multi-turn conversation for context retention, topic drift, instruction following, and repetition...
Read cors_checker Check the CORS configuration of a URL the same way a browser would. Returns the main response status, all A...
Read cors_test Test a URL for CORS misconfigurations. Sends preflight (OPTIONS) and cross-origin requests with various Ori...
Read cot_analyzer Analyze a Chain-of-Thought (CoT) or reasoning trace from an LLM. Detects step count, logical flow, conclusi...
Read count_code_lines Count lines of code: total, code lines, comment lines, blank lines, and comment density. Supports JS/TS, Py...
Read count_tokens Estimate the token count of a text string using the cl100k_base approximation (~4 chars/token). Call this B...
Read cron_validator Validate a 5-field cron expression, explain the schedule, and preview the next execution times. Use this to...
Read decode_jwt Decode a JWT (JSON Web Token) and return its header and payload without verifying the signature. Also repor...
Read detect_language Detect the natural language of a text using n-gram frequency analysis and common word markers. Supports 15 ...
Read detect_secrets Scan code or config files for hardcoded secrets: AWS keys, GitHub tokens, OpenAI/Anthropic API keys, Stripe...
Read diff_text Compute a unified line-by-line diff between two text strings (LCS algorithm). Returns added/removed/unchang...
Read embedding_similarity Compute text similarity using local algorithms (Bag of Words, TF-IDF, Character N-grams). No API key needed...
Read escape_html Escape HTML special characters (&, <, >, ", ') to their safe HTML entities. ALWAYS call this before inserti...
Read estimate_llm_cost Estimate the API cost in USD for a given model and token counts. Supports all major 2024–2026 models: GPT-4...
Read extract_json_from_text Extract the first valid JSON object or array embedded in chaotic LLM output (surrounded by markdown fences,...
Read extract_json_path Extract a value from a JSON string using dot-notation path (e.g., "user.address.city", "items.0.name", "met...
Read extract_links Extract all URLs, email addresses, and domain names from text. Returns categorized and deduplicated results...
Read extract_todos Extract TODO, FIXME, HACK, BUG, NOTE, OPTIMIZE, and custom tags from any source code or text. Returns line ...
Read fetch_veille_feed Fetch the latest QA & AI/LLM articles aggregated from curated RSS sources (Google Testing Blog, DEV.to Test...
Read few_shot_formatter Format few-shot examples for LLM prompts. Converts example pairs into formatted blocks. Supports chat forma...
Read find_tool Search available MCP tools by keyword or category before calling them. Returns matching tool names, descrip...
Read flatten_json Flatten a nested JSON object to single-level dot-notation keys (e.g. {"a":{"b":1}} → {"a.b":1}), or unfla...
Read format_bytes Convert raw byte counts to human-readable sizes in SI (KB=1000) or IEC (KiB=1024) units, or parse size stri...
Read format_json Format, validate, and pretty-print a JSON string. Returns the formatted JSON or a detailed parse error.
Read format_table Convert a JSON array of objects into a Markdown table. Automatically detects columns, aligns headers, and f...
Read function_call_validate Validate an LLM function call / tool_use output: check that function name is in allowed list, arguments mat...
Read get_testing_guidelines Query the IA-QA methodology knowledge base. Returns structured testing guidelines, assertion strategies, th...
Read guardrail_test Test an LLM response against a set of guardrail rules: must-include, must-not-include, max length, required...
Read hallucination_check Word-overlap based hallucination check: verifies if an LLM answer's words and numbers appear in the provide...
Read hash_text Compute a cryptographic hash of a text string. Use when you need to verify data integrity, generate content...
Read html_to_markdown Convert HTML to clean Markdown. Strips scripts, styles, nav, ads, and comments. Converts headings, lists, l...
Read http_status_lookup Look up detailed information about any HTTP status code: class, name, description, cacheability, typical ca...
Read identify_caller Returns what the server knows about the current MCP client: clientInfo captured during initialize, User-Age...
Read json_diff Compute a deep structural diff between two JSON values. Returns added, removed, and changed keys with dot-n...
Read json_schema_validate Validate a JSON value against a JSON Schema (draft-07 subset). Supports type, required, properties, items, ...
Read json_to_csv Convert a JSON array of objects to CSV format. Automatically detects columns from all object keys. Handles ...
Read json_to_yaml Convert a JSON object to clean, human-readable YAML. Handles nested objects, arrays, multiline strings, and...
Read latency_benchmark Measure response time of one or more HTTP endpoints (GET/POST). Runs N iterations and returns min/max/avg/p...
Read list_llm_models List all LLM models available on ia-qa.com with their provider, API endpoint, and capabilities. Filter by p...
Read list_local_tests Discover .ia-eval.yaml LLM test suite files in the project directory. Scans CWD and standard sub-directorie...
Read llm_fit_finder Find the best LLM for a given use case. Compares 30+ cloud API models and 12+ local models by cost, speed, ...
Read llm_format_check Validate that an LLM output matches an expected format: JSON, Markdown, code block, bullet list, numbered l...
Read llm_json_schema_check Validate that an LLM JSON output matches a JSON Schema definition. Tests required fields, types, enums, nes...
Read llm_output_validator Validate an LLM response against QA criteria: format checks (JSON, code, markdown), content rules (must-inc...
Read lorem_ipsum Generate Lorem Ipsum placeholder text for UI mockups, design prototypes, or test data population. Configura...
Read mcp_schema_lint Lint an MCP tool definition for best practices: naming conventions, description quality, schema completenes...
Read mcp_server_health_check Generate a health check report for an MCP server's tool manifest. Validates tool definitions, schema qualit...
Read minify_js Minify a JavaScript snippet, function, class, or module up to 50 KB using Terser. Returns minified code and...
Read mock_from_schema Generate realistic mock data from a JSON Schema. Supports all common types (string, number, integer, boolea...
Read model_info Get detailed specs for an AI model: context window, pricing per 1K tokens, knowledge cutoff, provider, mult...
Read normalize_vector L2-normalize a float vector (produce a unit vector with norm=1). Required by many vector DBs (Pinecone, Qdr...
Read normalize_whitespace Normalize whitespace: trim trailing spaces, collapse blank lines, normalize line endings (LF/CRLF), convert...
Read openapi_validate Validate the structure of an OpenAPI 3.x specification (JSON or YAML). Checks required top-level fields (op...
Read optimize_prompt_tokens Compress an LLM prompt by removing filler words, verbose phrases, duplicate sentences, and unnecessary whit...
Read pr_gatekeeper Compound quality gate for pull requests. Runs three sequential checks: (1) secret detection: scans diff ...
Read prompt_injection_scan Scan user input or prompts for common prompt injection patterns. Detects system prompt overrides, jailbreak...
Read prompt_test_suite Define a test suite for a prompt: provide the system prompt, user prompt, and expected output criteria. Ret...
Read rag_relevance_rank Rank an array of text chunks by relevance to a query using TF-IDF scoring. Simulates retrieval ranking for ...
Read redact_pii Automatically detect and redact Personally Identifiable Information (PII) from text. Replaces emails, phone...
Read regex_test Test a regular expression pattern against an input string and return all matches with their index positions...
Read rerank_evaluate Evaluate RAG retrieval quality using the NVIDIA neural reranker (nv-rerankqa-mistral-4b-v3). Ranks passages...
Read response_quality_score Score an LLM response on multiple quality dimensions: relevance, completeness, clarity, conciseness, format...
Read score_geo_signals Analyze a webpage <head> HTML (or full HTML) for GEO (Generative Engine Optimization) signals. Returns a sc...
Read secret_scan Scan text or code for leaked secrets: API keys (AWS, GCP, Azure, OpenAI, Anthropic, Stripe, GitHub, GitLab,...
Read security_headers_check Analyse the HTTP security headers of any public URL. Grades each header (A–F) for: Strict-Transport-Securit...
Read shield_analyze Run a comprehensive AI guardrail analysis on an LLM response. Orchestrates 6 deterministic safety checks pl...
Read similarity_score Compute text similarity between reference and hypothesis using multiple metrics: Cosine (BoW, TF-IDF), Jacc...
Read sort_lines Sort, deduplicate, reverse, or filter lines of text. Useful for cleaning import lists, dependencies, log fi...
Read split_chunks Split text into chunks of at most N tokens (cl100k_base: ~4 chars/token) with optional overlap. Designed fo...
Read ssl_certificate_check Analyse the SSL/TLS certificate of any HTTPS host. Returns certificate subject, issuer, validity dates, day...
Read test_skill Validate a SKILL.md definition (Cursor / GitHub Copilot / Windsurf) by auto-generating trigger-positive and...
Read text_stats Compute comprehensive statistics for any text: character count (with and without spaces), word count, line ...
Read token_budget_calculator Plan token allocation across system prompt, user input, context/RAG chunks, and expected output. Warns if b...
Read toxicity_scan Scan text for toxic language, bias indicators, profanity, and harmful content categories. Returns risk scor...
Read unescape_html Convert HTML entities (&amp;, &lt;, &gt;, &quot;, &#x27;, and numeric &#NNN;) back to plain characters. Use...
Read url_decode Decode a percent-encoded URL string back to plain text. Use when parsing query parameters from raw URLs or ...
Read url_encode Percent-encode a string for safe use in URLs. Call this before programmatically building query strings, pat...
Read validate_agent_trajectory Run declarative assertions on an agent trace (OpenAI tool-call messages, LangChain run trees, or plain text...
Read validate_email Validate an email address against RFC 5322 syntax before storing it, sending a transactional email, or addi...
Read validate_mcp_response Validate that an MCP tool response conforms to expected format, schema, and content rules. Use this to QA-t...
Read validate_url Parse and validate a URL. Returns decomposed components: protocol, hostname, port, path, query parameters, ...
Read vector_similarity Compute similarity/distance between two float vectors: cosine similarity, dot product, Euclidean and Manhat...
Read vector_stats Compute statistics for a float vector or matrix of vectors: mean, std, L2 norm, min, max, sparsity, top-K i...
Read webhook_endpoint_requests Fetch the requests captured by a webhook created with webhook_endpoint_create. Returns the newest requests ...
Read word_frequency Analyze word frequency in text. Returns top N words with counts and percentages. Supports English stopword ...
Read xml_to_json Convert an XML string to a JSON object. Supports attributes, nested elements, arrays, CDATA, and namespaces...
Read yaml_to_json Parse a YAML string and return the equivalent JSON value. The reverse of json_to_yaml. Supports nested obje...
WRITE 23 tools
Write case_convert Convert a string between naming conventions: camelCase, PascalCase, snake_case, kebab-case, UPPER_SNAKE_CAS...
Write color_convert Convert a color between HEX, RGB, and HSL formats. Use when translating design tokens between CSS notations...
Write cookie_security_audit Audit the security attributes of cookies set by any URL. Fetches the URL and inspects all Set-Cookie header...
Write generate_curl Generate a curl command from request parameters. Supports GET/POST/PUT/DELETE, custom headers, JSON body, a...
Write generate_eval_yaml Generate a complete .ia-eval.yaml evaluation contract from a plain-language description of what your LLM sh...
Write generate_hmac Compute an HMAC signature for a message using a secret key. Supports SHA-256 (default), SHA-512, SHA-1, and...
Write generate_html_report Convert a run_eval_contract() LLM Test Runner JSON result into a fully self-contained dark-themed HTML repo...
Write generate_json_ld Generate a ready-to-paste <script type="application/ld+json"> snippet for GEO / structured data optimizatio...
Write generate_password Generate a cryptographically secure random password using crypto.randomBytes. Configurable length (4–128)...
Write generate_slug Convert any string into a URL-friendly slug: lowercase, ASCII-normalized (é→e), special characters remov...
Write generate_test_cases Generate a set of test cases (valid, edge, invalid) for a given feature description. Returns test matrix wi...
Write generate_uuid Generate one or more cryptographically random UUID v4 identifiers. Use this when you need unique IDs for te...
Write json_schema_generate Infer a JSON Schema (draft-07) from a sample JSON value. Detects types, required fields, array item shapes,...
Write levenshtein_distance Compute the Levenshtein (edit) distance and normalized similarity ratio between two strings. Supports batch...
Write lint_commit_message Validate a git commit message against the Conventional Commits spec (feat, fix, docs, style, refactor, test...
Write llm_generate Generate text using open-source LLM models hosted on Groq (ultra-fast) or HuggingFace Inference (serverless...
Write merge_json Deep merge two JSON objects. Supports three array strategies: replace (default), concat, or unique (dedup c...
Write multimodal_eval_guide Unified tool for multimodal AI evaluation: set action=guide for reference thresholds/interpretation (CLIP, ...
Write needle_haystack_generate Generate a "needle in a haystack" test: embeds a target fact into a large block of filler text at a specifi...
Write number_base_convert Convert numbers between bases: decimal, binary, octal, hexadecimal, or any base 2–36. Auto-detects 0x, 0b...
Write prompt_template_fill Fill a prompt template with variables. Supports {{variable}} syntax and {{#if key}}...{{/if}} conditional b...
Write timestamp_convert Convert between Unix timestamps (seconds or milliseconds) and ISO-8601 / UTC date strings. Auto-detects epo...
Write webhook_endpoint_create Create a temporary webhook endpoint that captures incoming HTTP requests for one hour. Returns the webhook ...
EXECUTE 16 tools
Execute build_rag_prompt Assemble a complete RAG (Retrieval-Augmented Generation) prompt from retrieved context chunks and a user qu...
Execute cron_parse Parse a cron expression into a human-readable schedule description. Supports standard 5-field cron (minute ...
Execute env_parse Parse a .env file content into a JSON object. Handles quoted values (single and double), inline comments, e...
Execute mcp_server_evaluate Run a full compliance evaluation against a live MCP server URL. Tests: server reachability (ping), manifest...
Execute parse_csv Parse a CSV string into a JSON array of objects (or raw arrays). Handles RFC 4180 quoted fields, escaped qu...
Execute parse_http_headers Parse a raw HTTP headers block into a structured JSON object. Detects multi-value headers, masks Authorizat...
Execute run_eval_contract Parse a .ia-eval.yaml LLM test suite, call the specified LLM model for each scenario, run all configured sc...
Execute run_pr_gate_pipeline Full automated QA pipeline for a pull request. Takes a unified git diff (output of `git diff HEAD`) and ret...
Execute run_semantic_tests Semantic assertion primitive: compare actual vs expected text pairs using cosine similarity + ROUGE-L. Two ...
Execute run_vlm_test_suite Run a test suite against a Vision-Language Model (VLM): send an image (URL or base64) + N test cases (ea...
Execute run_vlm_test_suite_batch Compare multiple VLMs on the same test suite in parallel: send an image (URL or base64) + N test cases t...
Execute strip_markdown Strip all Markdown formatting (headers, bold, italic, code fences, links, lists) from text and return clean...
Execute system_prompt_builder Build a structured system prompt from components: role, task, constraints, output format, tone, language, a...
Execute transform_json_array Transform a JSON array using common operations: pluck (extract specific fields), filter (by field value), s...
Execute vector_quantize Simulate int8 or int4 quantization of float32 embedding vectors. Reduces storage by 4x (int8) or 8x (int4)....
Execute web_security_audit Run a comprehensive web security audit combining headers, SSL, CORS, and cookies checks, then use an LLM t...
How many tools does the Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox MCP server have?

The Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox MCP server exposes 139 tools across 4 categories: Read, Write, Destructive, Execute.

How do I enforce policies on Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox tools?

Route the Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox server through the PolicyLayer gateway. Define allow, deny, or approval rules per tool in the dashboard; the gateway enforces them on every call before it reaches the server.
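
As a rough illustration of the allow / deny / approval model, the Python sketch below evaluates a tool call against per-tool rules and falls back to a default for the tool's risk category. Everything in it is hypothetical: the `Decision` enum, the `evaluate` function, the rule tables, and the category defaults are assumptions made for illustration, not PolicyLayer's actual configuration format or API. Only the tool names and their categories come from the listing on this page.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    APPROVAL = "approval"


# Risk category of a few tools, as listed on this page.
TOOL_CATEGORY = {
    "count_tokens": "read",
    "generate_uuid": "write",
    "llm_generate": "write",
    "run_eval_contract": "execute",
}

# Hypothetical category defaults (assumed for illustration only).
CATEGORY_DEFAULT = {
    "read": Decision.ALLOW,
    "write": Decision.APPROVAL,
    "execute": Decision.APPROVAL,
    "destructive": Decision.DENY,
}

# Hypothetical per-tool overrides; they take precedence over category defaults.
TOOL_RULES = {
    "generate_uuid": Decision.ALLOW,
    "run_eval_contract": Decision.DENY,
}


def evaluate(tool_name: str) -> Decision:
    """Return the decision for a tool call before it is forwarded to the server."""
    if tool_name in TOOL_RULES:
        return TOOL_RULES[tool_name]
    category = TOOL_CATEGORY.get(tool_name)
    # Tools with no known category are denied rather than silently allowed.
    return CATEGORY_DEFAULT.get(category, Decision.DENY)


if __name__ == "__main__":
    for name in ("count_tokens", "generate_uuid", "llm_generate",
                 "run_eval_contract", "unknown_tool"):
        print(f"{name}: {evaluate(name).value}")
```

In this sketch an explicit per-tool rule always wins, and an unrecognised tool is denied by default, so a tool the server adds later cannot be called until someone classifies it.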

What risk categories do Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox tools fall into?

Ia Qa Com/mcp llm and RAG testing Dev/QA toolbox tools are categorised as Read (99), Write (23), Destructive (1), Execute (16). Each category has a recommended default policy.

Let agents act without letting them run wild.

Route your MCP servers through PolicyLayer and every tool call is checked against your policy before it runs: allow, deny, or require approval. Per-identity grants. Full audit log. Live in minutes.

Free to start. No card required.

4,600+ MCP servers and 31,000+ tools scanned and risk-classified.
