Namespace-Scope Your Kubernetes MCP Server Away From Production

23 May 2026

An agent is investigating a crashloop. Someone pastes the wrong namespace into the chat — payments-prod instead of payments-staging — and asks the agent to “get it back to a working state”. The agent has fixed this exact failure twice this week in staging by deleting the bad pod and letting the deployment respin. It calls delete_pod with the production namespace. The pod was the last replica handling live checkout traffic.

That is the failure mode. The agent did nothing irrational; it generalised from prior successful runs. What we want: the agent investigates freely in dev, staging, and sandbox, and cannot touch production no matter what it is asked. PolicyLayer enforces that boundary at the gateway, before the call reaches the cluster.

RBAC Alone Isn’t a Wall — It’s a Permission Surface

Kubernetes RBAC is the right primary control. The problem is calibration. A service account scoped tightly enough to be safe for an autonomous agent (read-only, single namespace, no exec, no delete) is also too restrictive for most of the diagnostic work you want the agent to do. Tailing logs and describing failing pods need read access across namespaces; replaying a manifest into staging or scaling a deployment back up after a flap needs write verbs on several resources across at least the non-prod namespaces.

So teams compromise. They grant the agent’s service account a broader role, then trust the prompt to keep it pointed at the right namespace. That trust breaks the moment a malicious log line, a poisoned issue comment, or a confused user instruction arrives. The agent does not know the namespace was wrong. RBAC sees a permitted verb on a permitted resource and allows it.

PolicyLayer does not replace RBAC. It adds a second wall in front of it. The service account stays broad enough for the agent to be useful; the gateway enforces deterministic MCP policy on namespace scope for every individual tools/call before it reaches the API server. The first wall says “this account is allowed to delete pods”. The second wall says “but not in production, and not more than five times an hour, and never with this specific combination of arguments”.

Namespace Allowlists on Destructive Tools

The policy below covers a typical Kubernetes MCP server such as mcp-server-kubernetes. Three rules do the work.

First, require that every destructive tool target an allowlisted namespace. PolicyLayer’s Require primitive checks args.namespace with the in operator against a fixed set. Anything outside the set, including a missing or empty namespace, fails the check, and the call is denied before it reaches the cluster.

Second, hide cluster-admin tools the agent never needs. Hide strips tools from the MCP tools/list response, so the agent does not see them and cannot try to call them. It works on whole tools only, not on individual arguments.

Third, rate-limit destructive verbs. A runaway loop deleting pods every two seconds is capped at five deletions per hour, counted per grant. Legitimate work continues; a malfunction is bounded.

{
  "version": "1",
  "default": "allow",
  "hide": [
    "delete_namespace",
    "get_secret",
    "apply_cluster_role",
    "delete_cluster_role"
  ],
  "tools": {
    "delete_pod": {
      "require": [
        {
          "conditions": [
            { "path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"] }
          ],
          "on_deny": "Namespace is outside the allowed non-production set."
        }
      ],
      "limits": [
        {
          "counter": "delete_pod",
          "max": 5,
          "window": "hour",
          "scope": "grant",
          "on_deny": "Pod deletion limit exceeded for this grant."
        }
      ]
    },
    "apply_manifest": {
      "require": [
        {
          "conditions": [
            { "path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"] },
            { "path": "args.kind", "op": "not_in", "value": ["Namespace", "ClusterRole", "ClusterRoleBinding"] }
          ],
          "on_deny": "Manifest applies are limited to non-production namespaced resources."
        }
      ]
    },
    "scale_deployment": {
      "require": [
        {
          "conditions": [
            { "path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"] }
          ],
          "on_deny": "Namespace is outside the allowed non-production set."
        }
      ]
    },
    "delete_deployment": {
      "require": [
        {
          "conditions": [
            { "path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"] }
          ],
          "on_deny": "Namespace is outside the allowed non-production set."
        }
      ]
    },
    "exec_pod": {
      "require": [
        {
          "conditions": [
            { "path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"] }
          ],
          "on_deny": "Namespace is outside the allowed non-production set."
        }
      ]
    }
  }
}
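To make the hide rule in the policy above concrete, here is a minimal Python sketch of filtering a tools/list result. It is an illustration, not PolicyLayer's implementation: the function name filter_tools_list and the get_logs tool are hypothetical, and the only thing taken from MCP itself is that a tools/list result carries a tools array whose entries have a name.

```python
# Hypothetical sketch: strip hidden tools from an MCP tools/list response.
# Not PolicyLayer's code; names below are assumptions for illustration.

HIDDEN = {"delete_namespace", "get_secret", "apply_cluster_role", "delete_cluster_role"}


def filter_tools_list(response: dict, hidden: set) -> dict:
    """Return a copy of a tools/list JSON-RPC response without hidden tools."""
    result = response.get("result", {})
    visible = [tool for tool in result.get("tools", []) if tool.get("name") not in hidden]
    # Build new dicts so the upstream response is left untouched.
    return {**response, "result": {**result, "tools": visible}}


# Example upstream response; get_logs is a made-up read-only tool.
upstream = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "delete_pod"},
            {"name": "get_secret"},
            {"name": "delete_namespace"},
            {"name": "get_logs"},
        ]
    },
}

filtered = filter_tools_list(upstream, HIDDEN)
visible_names = [tool["name"] for tool in filtered["result"]["tools"]]
print(visible_names)
```

The sketch covers discovery only. How a direct call to a hidden tool name is handled is not modelled here.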

Three details matter. First, the in operator does an exact set membership check: no regex, no prefix match. A namespace named payments-staging does not match the entry staging. If you want to permit any namespace beginning with dev-, use the regex operator with a Go stdlib pattern instead. Second, the Hide block removes tools from discovery, so the agent’s planner never proposes them; this is stricter and quieter than denying at call time. Third, scope: grant on the limit means the counter is kept separately for each issued grant. A different agent operating under a different grant has its own counter, and you can revoke one grant without affecting the other.
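The per-grant limit on delete_pod can be pictured as a counter keyed by grant and counter name. The Python sketch below assumes a fixed one-hour window; the source does not say whether PolicyLayer uses a fixed or a sliding window, and the class name, grant ids, and injected clock are all hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical per-grant call counter. The fixed window is an assumption."""

    def __init__(self, max_calls, window_seconds, clock=time.time):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can control time
        self._counts = defaultdict(int)

    def allow(self, grant_id, counter):
        """Return True and count the call if the grant is under its limit."""
        window_index = int(self.clock() // self.window_seconds)
        key = (grant_id, counter, window_index)
        if self._counts[key] >= self.max_calls:
            return False
        self._counts[key] += 1
        return True


# Drive the limiter with a fake clock: five deletes allowed, the sixth denied.
now = [0.0]
limiter = FixedWindowLimiter(max_calls=5, window_seconds=3600, clock=lambda: now[0])

results_a = [limiter.allow("grant-a", "delete_pod") for _ in range(6)]
first_b = limiter.allow("grant-b", "delete_pod")  # a second grant has its own counter

now[0] = 3600.0  # the next hourly window begins
after_window = limiter.allow("grant-a", "delete_pod")
```

A production limiter would also expire old window entries; the sketch keeps them to stay short.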

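You can stack conditions. The apply_manifest rule above requires both the args.namespace and args.kind conditions to pass, so a call is denied if either one fails: an allowed namespace does not rescue a ClusterRole manifest, and a namespaced kind does not rescue a production target.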
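As a rough model of how a require rule with stacked conditions behaves, the Python sketch below evaluates the apply_manifest rule from the policy. It is not PolicyLayer's evaluator. The call shape, the operator name regex, the treatment of a missing args.kind under not_in, and the helper names are assumptions. Python's re module stands in for Go's regexp package; the two syntaxes differ in places, though a simple anchored prefix such as ^dev- reads the same in both.

```python
import re


def lookup(call, path):
    """Resolve a dotted path such as 'args.namespace'; None if a segment is missing."""
    node = call
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def condition_passes(call, cond):
    value = lookup(call, cond["path"])
    op = cond["op"]
    if op == "in":
        return value in cond["value"]  # exact membership; None and "" are not members
    if op == "not_in":
        return value not in cond["value"]  # assumption: a missing value passes not_in
    if op == "regex":
        return isinstance(value, str) and re.search(cond["value"], value) is not None
    return False  # unknown operator: fail closed


def evaluate_require(call, rule):
    """Return (allowed, message). Every condition must pass (logical AND)."""
    for cond in rule["conditions"]:
        if not condition_passes(call, cond):
            return False, rule["on_deny"]
    return True, None


APPLY_MANIFEST_RULE = {
    "conditions": [
        {"path": "args.namespace", "op": "in", "value": ["dev", "staging", "sandbox"]},
        {"path": "args.kind", "op": "not_in", "value": ["Namespace", "ClusterRole", "ClusterRoleBinding"]},
    ],
    "on_deny": "Manifest applies are limited to non-production namespaced resources.",
}

# Hypothetical rule showing a prefix match instead of exact membership.
PREFIX_RULE = {
    "conditions": [{"path": "args.namespace", "op": "regex", "value": "^dev-"}],
    "on_deny": "Namespace must begin with dev-.",
}


def make_call(namespace=None, kind="Deployment"):
    args = {"kind": kind}
    if namespace is not None:
        args["namespace"] = namespace
    return {"tool": "apply_manifest", "args": args}


ok = evaluate_require(make_call("staging"), APPLY_MANIFEST_RULE)
wrong_ns = evaluate_require(make_call("payments-prod"), APPLY_MANIFEST_RULE)
cluster_kind = evaluate_require(make_call("staging", kind="ClusterRole"), APPLY_MANIFEST_RULE)
missing_ns = evaluate_require(make_call(), APPLY_MANIFEST_RULE)
prefix_ok = evaluate_require(make_call("dev-alice"), PREFIX_RULE)
prefix_no = evaluate_require(make_call("staging"), PREFIX_RULE)
```

Failing closed on an unknown operator is a design choice of this sketch: a typo in a policy then denies calls instead of silently allowing them.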

Honest Limits

This is not a substitute for RBAC. The service account behind the MCP server should still be least-privilege — no cluster-admin, no wildcard verbs, no access to secrets the agent does not need. Cluster-level RBAC remains the primary wall, and if PolicyLayer is bypassed or misconfigured, RBAC is what stops the agent from owning the cluster.

PolicyLayer is useful as the second wall because of three things. It sees the call before it reaches the API server, so denies cost nothing on the cluster side. It logs every deny centrally, across every upstream MCP server you run, so you have one place to ask which production guardrails fired this week. And it can deny on the combination of fields — args.namespace plus args.kind plus args.action — which is awkward to express in RBAC and trivial in a policy condition.

Defence in depth, not defence in replacement.

Getting Started

Three steps to a wall.

One. Point your Kubernetes MCP server at a service account that is already least-privilege at the RBAC layer. No cluster-admin, no access to namespaces the agent has no business in. Treat this as the floor.

Two. Register the MCP server as an upstream in PolicyLayer and issue a scoped grant for the agent. The Grant is the unit of data-plane access — one per agent, one per environment — and it carries the policy attachment.

Three. Write the policy. Start with the three rules above: require on destructive tools, hide on cluster-admin tools, limits on delete_pod. Test against a sandbox cluster by asking the agent to do the things you want denied — point it at the production namespace, ask it to read secrets, run a loop — and watch the denies appear in the proxy log. Iterate from real refusals, not imagined ones.

Once the sandbox is quiet, roll the grant out to staging. Production access is a separate grant with a tighter policy and tighter limits, issued only when you have a reason.

Why This Matters

Two outcomes. First, one audit surface for “what did the agent try to do in production”. Every deny is logged with the tool, the grant, the policy decision, the rule pointer, the denial message, and top-level argument keys. PolicyLayer does not store argument values in proxy logs, so the namespace value itself is evaluated at request time but not retained in the log row.

Second, bounded blast radius. The worst case for a misaligned or compromised agent is no longer “deletes the production deployment”. It is “tries to delete the production deployment, gets denied at the gateway, and the attempt shows up in your proxy log shortly after”. That is the difference between an incident and a near-miss.
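The deny row described in the first outcome can be sketched as follows. The field names, the rule pointer format, and the grant id are hypothetical; the only properties taken from the text are which pieces of information are logged and that argument values are left out.

```python
import json
from datetime import datetime, timezone


def deny_log_row(tool, grant_id, rule_pointer, message, args):
    """Build a deny record that keeps top-level argument keys but no values."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "grant": grant_id,
        "decision": "deny",
        "rule": rule_pointer,  # hypothetical pointer into the policy document
        "message": message,
        "arg_keys": sorted(args),  # keys only; values are deliberately dropped
    }


row = deny_log_row(
    tool="delete_pod",
    grant_id="grant-a",
    rule_pointer="/tools/delete_pod/require/0",
    message="Namespace is outside the allowed non-production set.",
    args={"namespace": "payments-prod", "name": "checkout-7d9f"},
)
serialized = json.dumps(row)
```

Because only the keys survive, the row shows that a namespace argument was supplied and which rule denied it, without retaining the value itself.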
