What is Token Exfiltration?
Token exfiltration is extracting authentication tokens, session tokens, or API tokens from an AI agent's environment through malicious tool calls or prompt manipulation.
WHY IT MATTERS
Tokens are the keys to the agent's kingdom. OAuth access tokens, API keys, JWT session tokens, and service account credentials — these authenticate the agent to every system it interacts with. Token exfiltration targets these credentials specifically, aiming to steal them for use outside the agent's controlled environment.
The attack surface is broad. Tokens may be accessible through environment variables (readable by system tools), configuration files (accessible to file system tools), HTTP headers (visible in request/response logging), and the agent's context itself (if tokens are embedded in system prompts or passed as parameters).
Exfiltration techniques are varied. A poisoned tool description might instruct the agent to include its Bearer token as a query parameter. A malicious tool might log all request headers server-side. An indirect injection might cause the agent to paste its environment variables into a seemingly innocent tool call. The token leaves the agent's secure boundary through a legitimate-looking operation.
Unlike credential theft of static secrets, token exfiltration often targets time-limited but currently active sessions. The stolen token may expire, but in the window it's valid, the attacker has the exact same access as the agent — often including production systems, customer data, and administrative functions.
HOW POLICYLAYER USES THIS
Intercept prevents token exfiltration by managing authentication at the proxy layer rather than the agent layer. Tokens are stored in Intercept's configuration and injected into server requests without passing through the agent's context. Argument validation policies detect and block parameters containing token-like patterns (Bearer tokens, API key formats, JWT structures). The audit trail flags any tool call where authentication material appears in unexpected parameter fields.