What is an Agent Trap?

1 min read Updated

Malicious web content or tool output specifically crafted to hijack an AI agent's behaviour, as defined by Google DeepMind's taxonomy of six trap categories: content injection, semantic manipulation, cognitive state, behavioural control, systemic, and human-in-the-loop.

WHY IT MATTERS

As AI agents increasingly navigate the web and process tool outputs, the information environment becomes an attack surface. Agent traps weaponise content that the agent processes — websites, tool responses, API outputs — to coerce it into unauthorised actions.

Unlike traditional prompt injection which targets the model directly, agent traps manipulate the environment the agent operates in. The agent's own capabilities — tool use, web browsing, file access — are turned against it.

HOW POLICYLAYER USES THIS

Intercept is a runtime defence against agent traps. By enforcing policies on every tool call, it prevents trapped agents from executing unauthorised actions — even if the agent's reasoning has been compromised.

FREQUENTLY ASKED QUESTIONS

What are the six trap categories?
Content Injection (perception), Semantic Manipulation (reasoning), Cognitive State (memory), Behavioural Control (action), Systemic (multi-agent dynamics), and Human-in-the-Loop (exploiting human oversight).
How do you defend against agent traps?
Runtime policy enforcement at the tool call level. Even if an agent's reasoning is compromised by a trap, the enforcement proxy blocks actions that violate defined policies.

FURTHER READING

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes needed.

npx -y @policylayer/intercept
github.com/policylayer/intercept →
// GET IN TOUCH

Have a question or want to learn more? Send us a message.

Message sent.

We'll get back to you soon.