// GLOSSARY -- AI AGENT SECURITY

What is an Agent Trap?

1 min read Updated Apr 5, 2026

Malicious web content or tool output specifically crafted to hijack an AI agent's behaviour, as defined by Google DeepMind's taxonomy of six trap categories: content injection, semantic manipulation, cognitive state, behavioural control, systemic, and human-in-the-loop.

WHY IT MATTERS

As AI agents increasingly navigate the web and process tool outputs, the information environment becomes an attack surface. Agent traps weaponise content that the agent processes — websites, tool responses, API outputs — to coerce it into unauthorised actions.

Unlike traditional prompt injection which targets the model directly, agent traps manipulate the environment the agent operates in. The agent's own capabilities — tool use, web browsing, file access — are turned against it.

HOW POLICYLAYER USES THIS

Intercept is a runtime defence against agent traps. By enforcing policies on every tool call, it prevents trapped agents from executing unauthorised actions — even if the agent's reasoning has been compromised.

FREQUENTLY ASKED QUESTIONS

What are the six trap categories?

Content Injection (perception), Semantic Manipulation (reasoning), Cognitive State (memory), Behavioural Control (action), Systemic (multi-agent dynamics), and Human-in-the-Loop (exploiting human oversight).

How do you defend against agent traps?

Runtime policy enforcement at the tool call level. Even if an agent's reasoning is compromised by a trap, the enforcement proxy blocks actions that violate defined policies.

What is an Agent Trap?

WHY IT MATTERS

HOW POLICYLAYER USES THIS

FREQUENTLY ASKED QUESTIONS

FURTHER READING

Let agents act without letting them run wild.

What is an Agent Trap?

WHY IT MATTERS

HOW POLICYLAYER USES THIS

FREQUENTLY ASKED QUESTIONS

RELATED TERMS

FURTHER READING

Let agents act without letting them run wild.