speech_to_text

THE RISK

Low Risk

Why speech_to_text needs a policy

Speech-to-text transcription is a read operation: it captures or processes audio input and outputs text. It does not create, modify, or delete anything, and it does not execute external operations; it only transforms one data format into another. The most severe tools on the same server (code_execute, delete_memory) do not elevate this tool's classification. Confidence is high because the tool's description is clear and unambiguous.

From the tool's definition: the tool name and description indicate that it 'Convert[s] speech to text' from a 'microphone recording or existing audio file', a data retrieval/transcription operation with no side effects on system state.

Documented attack patterns abuse exactly the kind of access speech_to_text gives an agent.

POLICY

How to control speech_to_text

PolicyLayer is an MCP gateway — it sits between your AI agents and A-Modular-Kingdom, and nothing reaches the server without passing your rules. This is the rule we recommend for speech_to_text:

policy.json

{
  "version": "1",
  "default": "deny",
  "tools": {
    "speech_to_text": {}
  }
}

speech_to_text is read-only, so it stays allowed — but everything else on the server is denied unless you say otherwise.

1. Create a free account and register A-Modular-Kingdom (nothing to install).
2. Add this policy: paste it, or build it visually.
3. Point your MCP client (Claude, Cursor, anything) at your gateway URL.

Free to start. No card required.

EXPLORE

Related tools and policies

More A-Modular-Kingdom tools

delete_memory (Destructive): Delete a specific memory by ID.
code_execute (Execute): Execute Python code in a sandboxed subprocess and return stdout/stderr
browse_web (Execute): Control a persistent browser. Actions: navigate (open URL), click (by CSS selector or x,y
text_to_speech (Execute): Convert text to speech using various TTS engines. Can play audio directly or save to file.
save_fact (Write): save_fact
save_memory (Write): save_memory
set_global_rule (Write): Set a permanent global rule that persists across all projects and sessions
analyze_media (Read): Analyze image/video files with a local multimodal model via Ollama (e.g., gemma3:4b)

All 14 A-Modular-Kingdom tools →

Read tools on other servers

M-Team MCP Server: get_torrent_detail
YouTube MCP Server: search_channel
Android Forensics ADB MCP Server: parse_browser_history
0xarchive: web3_challenge

Go deeper

The MCP Attack Database → Documented attack patterns against MCP deployments — and the policies that stop them.
Data exfiltration → How read access becomes an exfiltration channel when chained with untrusted content.
MCP token cost → What connecting this server costs in context-window tokens on every request.
Rate Limiting MCP Tool Calls: A Practical Guide →
MCP Security: Why Prompt Guardrails Aren't Enough →

FAQ

Questions about speech_to_text

What does the speech_to_text tool do?

It converts speech to text from a microphone recording or an existing audio file. It is categorised as a Read tool in the A-Modular-Kingdom MCP Server, which means it retrieves data without modifying state.

How do I enforce a policy on speech_to_text?

Register the A-Modular-Kingdom MCP server in PolicyLayer and add a rule for speech_to_text: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL and the rule is enforced on every call, before it reaches A-Modular-Kingdom. Nothing to install.

What risk level is speech_to_text?

speech_to_text is a Read tool with low risk. Read-only tools are generally safe to allow by default.

Can I rate-limit speech_to_text?

Yes. Add a rate_limit block to the speech_to_text rule in your PolicyLayer policy. For example, setting max: 10 and window: 60 (the window is in seconds) limits the tool to 10 calls per minute. Rate limits are tracked per agent session and reset automatically when the window expires.
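
As a rough sketch, the rate limit described in this answer might look like the following when combined with the recommended policy.json from earlier on this page. The field names rate_limit, max, and window come from the answer above; where the block nests inside the rule is an assumption inferred from the policy.json layout, so check it against the PolicyLayer policy reference before relying on it.

```json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "speech_to_text": {
      "rate_limit": { "max": 10, "window": 60 }
    }
  }
}
```

With this sketch, speech_to_text would stay allowed but be capped at 10 calls per 60-second window, while every other tool on the server remains denied by the default.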

How do I block speech_to_text completely?

Set action: deny in the PolicyLayer policy for speech_to_text. The AI agent will receive a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.
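
A minimal sketch of such a rule, assuming that action and reason sit directly inside the speech_to_text entry of the policy.json format shown earlier on this page. That nesting is an assumption, and the reason text is purely illustrative.

```json
{
  "version": "1",
  "default": "deny",
  "tools": {
    "speech_to_text": {
      "action": "deny",
      "reason": "Audio capture is not permitted for this agent"
    }
  }
}
```

Because the policy already uses "default": "deny", omitting the speech_to_text entry would presumably block the tool as well; an explicit rule is mainly useful for attaching a reason.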

What MCP server provides speech_to_text?

speech_to_text is provided by the A-Modular-Kingdom MCP server (masihmoafi/a-modular-kingdom). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.

Enforce policy on every A-Modular-Kingdom tool call.

Start from A-Modular-Kingdom, add the rest of your stack, and see everything your agents can call. Then put policy on all of it.
