What is an Agent Trap?
Malicious web content or tool output specifically crafted to hijack an AI agent's behaviour, as defined by Google DeepMind's taxonomy of six trap categories: content injection, semantic manipulation, cognitive state, behavioural control, systemic, and human-in-the-loop.
WHY IT MATTERS
As AI agents increasingly navigate the web and process tool outputs, the information environment becomes an attack surface. Agent traps weaponise content that the agent processes — websites, tool responses, API outputs — to coerce it into unauthorised actions.
Unlike traditional prompt injection which targets the model directly, agent traps manipulate the environment the agent operates in. The agent's own capabilities — tool use, web browsing, file access — are turned against it.
HOW POLICYLAYER USES THIS
Intercept is a runtime defence against agent traps. By enforcing policies on every tool call, it prevents trapped agents from executing unauthorised actions — even if the agent's reasoning has been compromised.