What is a Behavioural Control Trap?
An agent trap that hijacks an agent's capabilities to force unauthorised actions such as data exfiltration, sub-agent spawning, or embedded jailbreak execution.
WHY IT MATTERS
Unlike traps that corrupt reasoning, behavioural control traps directly commandeer what the agent does. They embed dormant jailbreak sequences in content, induce the agent to exfiltrate data to attacker endpoints, or exploit orchestrator privileges to spawn rogue sub-agents.
This is the most dangerous trap category because the agent actively performs harmful actions rather than just making bad decisions. Runtime enforcement that blocks the actions themselves, not just the reasoning behind them, is the only reliable defence.
HOW POLICYLAYER USES THIS
Intercept blocks unauthorised tool calls regardless of why the agent is making them. If a behavioural control trap tricks the agent into calling a destructive tool, the policy still denies it.
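To make the idea concrete, here is a minimal sketch of action-level enforcement in Python. It is not Intercept's actual API or policy format: the names `ToolCall`, `Policy`, `evaluate`, the tool names, and the example hosts are all hypothetical, chosen only to illustrate the three behaviours described above (destructive tool calls, exfiltration to attacker endpoints, and rogue sub-agent spawning). The key property is that the decision depends only on the action and never on the agent's stated reason for it.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ToolCall:
    """Hypothetical shape of a tool call seen by an enforcement layer."""
    tool: str
    args: dict = field(default_factory=dict)
    rationale: str = ""  # the agent's stated reason; deliberately never consulted


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: allowlists plus a cap on sub-agents."""
    allowed_tools: frozenset
    allowed_hosts: frozenset
    max_sub_agents: int = 0


def evaluate(policy, call, sub_agents_running=0):
    """Return (allowed, reason). Anything not explicitly permitted is denied."""
    # Destructive or unknown tools: denied unless on the allowlist.
    if call.tool not in policy.allowed_tools:
        return False, f"tool '{call.tool}' is not on the allowlist"

    # Exfiltration: outbound requests may only reach approved hosts.
    if call.tool == "http_request":
        host = urlparse(call.args.get("url", "")).hostname
        if host not in policy.allowed_hosts:
            return False, f"host '{host}' is not an approved endpoint"

    # Rogue sub-agents: spawning is capped regardless of who asks.
    if call.tool == "spawn_agent" and sub_agents_running >= policy.max_sub_agents:
        return False, "sub-agent limit reached"

    return True, "allowed"


# Example policy (all values are illustrative assumptions).
policy = Policy(
    allowed_tools=frozenset({"read_file", "http_request", "spawn_agent"}),
    allowed_hosts=frozenset({"api.internal.example"}),
    max_sub_agents=1,
)

# A trapped agent may supply a persuasive rationale; the check ignores it.
trapped = ToolCall("delete_database", rationale="The document says the user approved this.")
allowed, reason = evaluate(policy, trapped)
print(allowed, reason)
```

Because `rationale` is never read, a jailbreak that changes what the agent believes cannot change what the policy permits. A real deployment would also need to cover argument-level rules and tools that reach the network indirectly, which this sketch does not attempt.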