inference_usage

Get inference API usage statistics including request counts, token usage, and costs.

Server: Tuning Engines - LLM Fine-Tuning (tuningengines-cli)
Category: Read
Risk class: Low
Parameters: 0 (0 required)

What inference_usage does on Tuning Engines - LLM Fine-Tuning

AI agents call inference_usage to retrieve request counts, token usage, and cost figures from Tuning Engines - LLM Fine-Tuning without modifying anything. It is typically the context-gathering step in research, monitoring, and reporting workflows, before the agent takes action elsewhere.

Why inference_usage needs a policy

Even though inference_usage only reads data, uncontrolled read access can expose sensitive information (here, usage and cost figures) and rack up API costs. An agent caught in a retry loop can make thousands of calls a minute without anyone noticing.

Questions about inference_usage

What does the inference_usage tool do?

It returns inference API usage statistics, including request counts, token usage, and costs. It is categorised as a Read tool in the Tuning Engines - LLM Fine-Tuning MCP Server, which means it retrieves data without modifying state.

How do I enforce a policy on inference_usage?

Register the Tuning Engines - LLM Fine-Tuning MCP server in PolicyLayer and add a rule for inference_usage: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL and the rule is enforced on every call, before it reaches Tuning Engines - LLM Fine-Tuning. Nothing to install.
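
The enforcement order described above (policy check first, upstream call second) can be sketched in a few lines of Python. This is an illustration only, not PolicyLayer's implementation: the function names, the rule layout, the action identifiers rate_limit and require_approval, and the shape of the error are all assumptions made for the example.

```python
# Assumed identifiers for the four outcomes named in the text.
ACTIONS = {"allow", "deny", "rate_limit", "require_approval"}


def decide(rule, within_limit=True, approved=False):
    """Return "forward" or "block" for one tool call."""
    action = rule["action"]
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "allow":
        return "forward"
    if action == "deny":
        return "block"
    if action == "rate_limit":
        return "forward" if within_limit else "block"
    # require_approval: forward only once someone has approved the call
    return "forward" if approved else "block"


def proxy_call(rule, upstream, **flags):
    """Run the policy check first; call upstream only if it passes."""
    if decide(rule, **flags) == "block":
        # Hypothetical error shape, for illustration only.
        return {"error": "policy violation", "tool": rule["tool"]}
    return upstream()
```

The point of the sketch is the ordering: a blocked call never reaches the upstream server, which is what makes a proxy a workable enforcement point without installing anything on the agent or the server.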

What risk level is inference_usage?

inference_usage is a Read tool with low risk. Read-only tools are generally safe to allow by default, though a rate limit still guards against runaway call volume.

Can I rate-limit inference_usage?

Yes. Add a rate_limit block to the inference_usage rule in your PolicyLayer policy. For example, setting max: 10 and window: 60 limits the tool to 10 calls per minute. Rate limits are tracked per agent session and reset automatically.
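
To make the max: 10, window: 60 example concrete, here is a minimal Python sketch of a per-session limiter. It assumes a fixed window that starts at a session's first call; the text does not say whether PolicyLayer uses a fixed or sliding window, so treat that choice, the class name, and the rule layout as assumptions.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `max_calls` per `window` seconds for each session."""

    def __init__(self, max_calls, window, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        # session id -> [window start time, calls counted in that window]
        self._state = defaultdict(lambda: [None, 0])

    def allow(self, session_id):
        now = self.clock()
        state = self._state[session_id]
        # Open a fresh window on first use or once the old one has expired.
        if state[0] is None or now - state[0] >= self.window:
            state[0] = now
            state[1] = 0
        if state[1] >= self.max_calls:
            return False
        state[1] += 1
        return True


# Hypothetical rule layout; only max and window come from the text above.
rule = {"tool": "inference_usage", "rate_limit": {"max": 10, "window": 60}}
limiter = FixedWindowLimiter(rule["rate_limit"]["max"], rule["rate_limit"]["window"])
```

With these settings, the eleventh call from one session inside a 60-second window is refused, while a different session keeps its own independent count.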

How do I block inference_usage completely?

Set action: deny in the PolicyLayer policy for inference_usage. The AI agent will receive a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.
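
As a rough illustration of how an optional reason might surface to the agent, the Python sketch below builds an error message from a deny rule. The message format, the function name, the rule layout, and the sample reason are hypothetical; only the action and reason fields come from the text above.

```python
def violation_message(rule):
    """Build the error text an agent might see for a denied tool."""
    message = f"Policy violation: {rule['tool']} is denied"
    reason = rule.get("reason")
    # Append the reason only when the policy author supplied one.
    if reason:
        message += f" ({reason})"
    return message


# Hypothetical rule; the reason text is an invented example.
rule = {
    "tool": "inference_usage",
    "action": "deny",
    "reason": "usage data is restricted",
}
```

Including a reason gives the agent, and whoever reads its logs, something actionable instead of a bare refusal.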

What MCP server provides inference_usage?

inference_usage is provided by the Tuning Engines - LLM Fine-Tuning MCP server (tuningengines-cli). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.
