
The Academic Case for Deterministic AI Agent Enforcement

A research paper published this month makes an argument that anyone running AI agents in production already suspects: language models cannot reliably enforce their own security constraints. The paper, “Securing Agentic AI” from researchers at CSIRO’s Data61 and the University of Melbourne, doesn’t just identify the problem. It lays out a specific architectural prescription — deterministic enforcement layers, operating outside the model, evaluating every action against hard-coded policy rules.

Every recommendation in that paper maps to something Intercept already does. This isn’t a coincidence. It’s convergent engineering — when you work backwards from the threat model, you arrive at the same architecture.

The Core Argument

The paper’s central thesis is blunt: LLMs blur the boundary between code and data. Plaintext prompts shape control flow. Dynamically generated text becomes input for the next decision. This creates a system where the enforcement mechanism and the thing being enforced are the same probabilistic process.

The authors invoke Saltzer and Schroeder’s security design principles — originally published in 1975, still the foundation of systems security — and apply them to agentic AI:

  • Least privilege. Agents should receive the minimum capabilities required for their task. Not access to every tool an MCP server exposes.
  • Complete mediation. Every sensitive operation must hit a policy check. Not most operations. Every one.
  • Separation of privilege. Trusted enforcement logic must be isolated from untrusted data flows — meaning the policy engine cannot live inside the model that generates the tool calls.

The practical consequence: you need a deterministic layer that sits outside the model and evaluates every tool call against explicit rules. The model decides what to do. The enforcement layer decides whether it’s allowed.
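That separation can be sketched in a few lines of Python. This is an illustrative toy, not Intercept's code: the function name `check_tool_call` and the rule shapes are invented for the example, and only two operators are shown.

```python
# Hypothetical sketch: the model proposes a tool call; a separate,
# deterministic function decides whether it is allowed.

def check_tool_call(tool: str, args: dict, policy: dict) -> tuple[bool, str]:
    """Return (allowed, reason) by evaluating explicit rules only."""
    for rule in policy.get(tool, []):
        actual = args.get(rule["path"])
        if rule["op"] == "lte" and not (actual is not None and actual <= rule["value"]):
            return False, rule["on_deny"]
        if rule["op"] == "in" and actual not in rule["value"]:
            return False, rule["on_deny"]
    return True, "allowed"

policy = {
    "create_charge": [
        {"path": "amount", "op": "lte", "value": 50000,
         "on_deny": "Single charge cannot exceed $500.00"},
    ]
}

# The model proposed this call; the gate, not the model, decides.
allowed, reason = check_tool_call("create_charge", {"amount": 99999}, policy)
```

The important property is that the policy dict lives outside the model: nothing in the prompt, context window, or injected tool output can change what `check_tool_call` returns for a given input.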

Why the Model Can’t Enforce Its Own Rules

The paper identifies three failure modes that make self-enforcement unreliable. Anyone who has deployed MCP agents will recognise them.

Non-determinism. The same model, the same prompt, the same input — different outputs on different runs. A system prompt that says “do not create charges exceeding $500” works most of the time. But “most of the time” in security means “sometimes it doesn’t.” Temperature, context window length, conversation history, and prompt phrasing all influence whether the model follows its own constraints.

Prompt injection. Agents read content from external systems — emails, database records, API responses, document contents. Any of these can contain instructions that override or contradict system prompt rules. The paper notes that MCP agents have an especially large attack surface because every tool call returns data from external systems that the model processes as context.

No audit trail. When a prompt guardrail blocks an action, there is no log entry. The model chose not to make the call. You cannot distinguish between “the agent respected the limit” and “the agent didn’t think the call was relevant.” This makes compliance impossible and incident response guesswork.
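By contrast, a proxy-side denial is an event that can be recorded. A minimal sketch of what such a record might contain, with field names invented for illustration rather than taken from Intercept's actual log schema:

```python
import json
import datetime

def denial_log_entry(tool: str, args: dict, rule_name: str, reason: str) -> str:
    """Build a structured audit record for a denied tool call.
    Field names here are illustrative, not a real log format."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event": "policy_denied",
        "tool": tool,
        "args": args,         # the exact arguments the agent attempted
        "rule": rule_name,    # which rule fired
        "reason": reason,     # the human-readable denial message
    })

entry = denial_log_entry(
    "create_charge", {"amount": 99999},
    "max single charge", "Single charge cannot exceed $500.00",
)
```

With records like this, "the agent respected the limit" and "the agent never tried" become distinguishable: the first leaves no denial entries, the second was never in question.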

The paper’s conclusion is direct: relying solely on model-level defences is insufficient because the non-deterministic nature of LLM reasoning ensures that any individual defence can be circumvented.

What MCP Doesn’t Cover

The paper has a pointed observation about MCP and similar protocols: they address low-level mechanisms like authentication and transport security, but don’t adequately cover secure delegation, inter-agent trust boundaries, or privilege management.

This is the gap. MCP gives your agent authenticated access to Stripe, GitHub, AWS, Slack — but says nothing about what the agent should be allowed to do with that access. The GitHub MCP server exposes 83 tools. Does your agent need delete_file? Should it be allowed to call merge_pull_request without constraint? Can it trigger actions_run_trigger 500 times in an hour?

MCP authenticates the connection. It doesn’t enforce policy on the traffic.

Intercept: The Paper’s Recommendations, Shipped

Intercept implements every architectural recommendation from the paper as a transparent MCP proxy. It sits between the agent and the MCP server, intercepts every tools/call request, and evaluates it against YAML-defined policy rules.

Here’s a policy enforcing the exact constraints the paper advocates:

version: "1"
description: "Stripe MCP server — production policy"

tools:
  create_charge:
    rules:
      # Least privilege: cap individual actions
      - name: "max single charge"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 50000
        on_deny: "Single charge cannot exceed $500.00"

      # Complete mediation: track cumulative state
      - name: "daily spend cap"
        conditions:
          - path: "state.create_charge.daily_spend"
            op: "lt"
            value: 1000000
        on_deny: "Daily spending cap of $10,000.00 reached"
        state:
          counter: "daily_spend"
          window: "day"
          increment_from: "args.amount"

      # Argument validation: restrict to known-safe values
      - name: "allowed currencies"
        conditions:
          - path: "args.currency"
            op: "in"
            value: ["usd", "eur"]
        on_deny: "Only USD and EUR charges are permitted"

  # Separation of privilege: block destructive operations entirely
  delete_customer:
    rules:
      - name: "block destructive action"
        action: "deny"
        on_deny: "Customer deletion is not permitted via AI agents"

  # Global rate limit across all tools
  "*":
    rules:
      - name: "global rate limit"
        rate_limit: 60/minute
        on_deny: "Rate limit: maximum 60 tool calls per minute"

Every rule is deterministic. args.amount <= 50000 produces the same result for the same input, every time. No interpretation. No probability. No context window drift.
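Each condition operator reduces to a pure comparison. A sketch of how ops like `lte`, `lt`, and `in` might evaluate, as an illustrative reimplementation rather than Intercept's source:

```python
# Illustrative: each operator is a pure function of (actual, expected),
# so the same input always yields the same verdict.
OPS = {
    "lte": lambda actual, expected: actual <= expected,
    "lt":  lambda actual, expected: actual < expected,
    "in":  lambda actual, expected: actual in expected,
}

def evaluate(op: str, actual, expected) -> bool:
    """Apply a single policy condition. No randomness, no model in the loop."""
    return OPS[op](actual, expected)

# Same inputs, same result, on every run: no temperature, no context drift.
assert all(evaluate("lte", 50000, 50000) for _ in range(1000))
```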

The paper recommends combining role-based access control with risk-adaptive approaches where authorisation decisions factor in aggregated risk. That’s what the state block does — it tracks cumulative spending across tool calls using exact counters in Redis or SQLite, not the model’s memory. After 50 calls, the counter is still exact. The model’s running total in its context window is not.
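The cumulative-state idea can be pictured as a windowed counter. The toy in-memory class below stands in for the Redis/SQLite backends; it is a sketch of the concept, not Intercept's implementation:

```python
from collections import defaultdict

class WindowedCounter:
    """Toy in-memory stand-in for a persistent windowed counter:
    accumulate a value per (key, time window), so the total resets
    when the window rolls over."""
    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self.totals = defaultdict(int)

    def add(self, key: str, amount: int, now: float) -> int:
        bucket = (key, int(now // self.window))
        self.totals[bucket] += amount
        return self.totals[bucket]

    def current(self, key: str, now: float) -> int:
        return self.totals[(key, int(now // self.window))]

spend = WindowedCounter(window_seconds=86400)  # a "day" window
for _ in range(50):
    spend.add("create_charge.daily_spend", 20000, now=1_700_000_000)
# Exact after 50 calls: 50 * 20000 == 1_000_000, so the policy's
# `lt 1000000` cap check now fails deterministically.
```

An exact counter in external storage is the whole point: the model's running total in its context window degrades with length, while the counter does not.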

Complete Mediation, Not Selective Enforcement

The paper emphasises complete mediation: every access to every object must be checked for authority. Intercept implements this at the transport layer. Every JSON-RPC tools/call message passes through the proxy. There is no path for a tool call to bypass policy evaluation.

This matters because the alternative — selective enforcement — is the default in every prompt-based system. The model applies rules when it remembers to. It skips them when context is long, when the instruction is ambiguous, when injection overrides the constraint. Partial enforcement creates a false sense of security.

Intercept’s proxy architecture makes bypass structurally impossible. The agent’s MCP connection goes through Intercept. Every call is evaluated. Every denial is logged with the rule that triggered it, the tool that was called, and the arguments that were passed.

Agent → Intercept (policy evaluation) → MCP Server
                │
                └─ deny + log event

Defence in Depth, Not Defence in Hope

The paper frames deterministic enforcement as part of a defence-in-depth strategy: multiple independent layers, each catching what the others miss. The model’s own judgment is one layer. Prompt design is another. But neither is deterministic, and the paper argues that mandatory deterministic layers are essential — not optional, not supplementary. Essential.

Intercept is that layer. It doesn’t replace prompt engineering or model alignment. It complements them with hard constraints that cannot be bypassed by prompt injection, context window overflow, or probabilistic drift.

Getting Started

Intercept wraps any MCP server with zero changes to the server or the agent:

# Before: agent connects directly to MCP server
npx -y @modelcontextprotocol/server-github

# After: Intercept enforces policy on every tool call
intercept -c github-policy.yaml -- npx -y @modelcontextprotocol/server-github

130+ pre-built policies ship for popular MCP servers — Stripe, GitHub, AWS, Slack, Jira, and more. Each one maps every tool the server exposes, categorised by risk level. Customise the rules you need, then deploy to production.

The paper calls for deterministic enforcement as an architectural principle. Intercept ships it as open-source software.



Reference: Securing Agentic AI: A Practice-Oriented Framework — Dong, Chen, et al., CSIRO Data61 & University of Melbourne, 2026.
