List model evaluations. Evaluations run your trained models against benchmark datasets using various evaluators to measure quality.
AI agents call list_evaluations to read evaluation data from Tuning Engines - LLM Fine-Tuning without modifying anything. It is typically the context-gathering step in research, monitoring, and reporting workflows, before the agent takes action elsewhere.
Even though list_evaluations only reads data, uncontrolled read access can leak sensitive information and run up API costs: an agent caught in a retry loop can make thousands of calls a minute without anyone noticing.
list_evaluations is categorised as a Read tool in the Tuning Engines - LLM Fine-Tuning MCP server, which means it retrieves data without modifying state.
Register the Tuning Engines - LLM Fine-Tuning MCP server in PolicyLayer and add a rule for list_evaluations with one of four actions: allow, deny, rate-limit, or require approval. Point your MCP client at the PolicyLayer proxy URL, and the rule is enforced on every call before it reaches Tuning Engines - LLM Fine-Tuning. There is nothing to install.
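The enforcement order described above (evaluate the rule first, forward the call only if it passes) can be sketched in Python. This is a minimal illustration, not PolicyLayer's implementation: `Rule`, `enforce`, `PolicyViolation`, the `approve` and `within_limit` callbacks, and the choice to allow tools that have no rule are all hypothetical names and behaviours made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

ACTIONS = ("allow", "deny", "rate_limit", "require_approval")


@dataclass
class Rule:
    # Hypothetical rule shape: one of the four actions named in the text.
    action: str
    reason: Optional[str] = None  # optional explanation returned on deny


class PolicyViolation(Exception):
    """Returned to the agent instead of forwarding a disallowed call."""


def enforce(
    tool: str,
    rules: Dict[str, Rule],
    forward: Callable[[str], dict],
    approve: Callable[[str], bool] = lambda tool: False,
    within_limit: Callable[[str], bool] = lambda tool: True,
) -> dict:
    """Evaluate the rule for `tool` before the call reaches the upstream server."""
    # Hypothetical default for this sketch: tools without a rule are allowed.
    rule = rules.get(tool, Rule("allow"))
    if rule.action not in ACTIONS:
        raise ValueError(f"unknown action: {rule.action!r}")
    if rule.action == "deny":
        raise PolicyViolation(rule.reason or f"{tool} is blocked by policy")
    if rule.action == "require_approval" and not approve(tool):
        raise PolicyViolation(f"{tool} requires human approval")
    if rule.action == "rate_limit" and not within_limit(tool):
        raise PolicyViolation(f"{tool} exceeded its rate limit")
    # Only reached when the rule permits the call.
    return forward(tool)


if __name__ == "__main__":
    rules = {"list_evaluations": Rule("allow")}
    print(enforce("list_evaluations", rules, lambda t: {"tool": t, "status": "forwarded"}))
```

The key property is that `forward` is never called on a denied, unapproved, or over-limit request, so the upstream server never sees it.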
list_evaluations is a low-risk Read tool. Read-only tools are generally safe to allow by default, although a rate limit still guards against the runaway retry loops described above.
You can rate-limit list_evaluations by adding a rate_limit block to its rule in your PolicyLayer policy. For example, setting max: 10 and window: 60 limits the tool to 10 calls per 60-second window, that is, 10 calls per minute. Rate limits are tracked per agent session and reset automatically.
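The per-session behaviour of a rule like max: 10, window: 60 can be sketched as a fixed-window counter. PolicyLayer does not say here whether it uses a fixed or sliding window, so the fixed window, the `FixedWindowLimiter` class, and the injectable `clock` parameter are assumptions for illustration only.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `max_calls` per `window` seconds for each session.

    Assumption: a simple fixed window; the real algorithm may differ.
    """

    def __init__(self, max_calls: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        # session id -> (window start time, calls counted in that window)
        self.state: Dict[str, Tuple[float, int]] = {}

    def allow(self, session_id: str) -> bool:
        now = self.clock()
        start, count = self.state.get(session_id, (now, 0))
        if now - start >= self.window:
            # The window has elapsed, so the counter resets automatically.
            start, count = now, 0
        if count >= self.max_calls:
            return False  # over the limit: the call would not be forwarded
        self.state[session_id] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = FixedWindowLimiter(max_calls=10, window=60, clock=lambda: fake_now[0])
    # Simulate a retry loop firing 12 calls at once in one session.
    results = [limiter.allow("agent-1") for _ in range(12)]
    print(results.count(True), "of 12 calls allowed")
    fake_now[0] = 60.0  # one full window later
    print("after reset:", limiter.allow("agent-1"))
```

Under this sketch, an agent stuck in a retry loop gets 10 calls through per window and every further call in that window is refused, while other sessions keep their own independent counters.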
To block list_evaluations entirely, set action: deny in its PolicyLayer policy rule. The AI agent then receives a policy violation error and cannot call the tool. You can also include a reason field to explain why the tool is blocked.
list_evaluations is provided by the Tuning Engines - LLM Fine-Tuning MCP server (tuningengines-cli). PolicyLayer sits as a proxy in front of this server to enforce policies before tool calls reach the server.