// GLOSSARY -- AI AGENT SECURITY

What is a Behavioural Control Trap?

1 min read Updated Apr 5, 2026

An agent trap that hijacks an agent's capabilities to force unauthorised actions such as data exfiltration, sub-agent spawning, or embedded jailbreak execution.

WHY IT MATTERS

Unlike traps that corrupt reasoning, behavioural control traps directly commandeer what the agent does. They embed dormant jailbreak sequences in content, induce the agent to exfiltrate data to attacker endpoints, or exploit orchestrator privileges to spawn rogue sub-agents.

This is the most dangerous trap category because the agent actively performs harmful actions rather than just making bad decisions. Runtime enforcement that blocks the actions themselves, not just the reasoning behind them, is the only reliable defence.
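As a rough illustration of blocking the action rather than the reasoning, the sketch below checks two of the behaviours named above, outbound data transfer and sub-agent spawning, at the moment the agent attempts them. The host names, the cap of two sub-agents, and the function names are all hypothetical and chosen only for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.com"}

# Hypothetical cap on concurrently running sub-agents per orchestrator.
MAX_SUB_AGENTS = 2


def egress_allowed(url: str) -> bool:
    """Return True only if the URL uses HTTPS and its host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def spawn_allowed(active_sub_agents: int) -> bool:
    """Return True only while the orchestrator is under its sub-agent cap."""
    return active_sub_agents < MAX_SUB_AGENTS
```

Neither check inspects the prompt or the agent's chain of thought. An exfiltration attempt to an unlisted host fails whether or not the agent was tricked into making it.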

HOW POLICYLAYER USES THIS

PolicyLayer blocks unauthorised tool calls regardless of why the agent is making them. If a behavioural control trap tricks the agent into calling a destructive tool, the policy still denies it.
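The idea of denying a call "regardless of why" can be sketched as a deny-by-default lookup that never reads the agent's justification. This is not PolicyLayer's actual API or policy format; the `ToolCall` type, the `POLICY` table, and the tool names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)
    reason: str = ""  # the agent's stated justification; deliberately ignored


# Hypothetical policy table: any tool not listed here is denied by default.
POLICY = {
    "read_file": "allow",
    "send_email": "allow",
    "delete_repo": "deny",
}


def evaluate(call: ToolCall) -> str:
    """Decide from the tool name alone; call.reason never influences the result."""
    return POLICY.get(call.tool, "deny")
```

For example, `evaluate(ToolCall("delete_repo", reason="The user urgently asked for cleanup"))` returns `"deny"`, and so does a call to any tool missing from the table. A production policy would likely also inspect `args`, which this sketch omits.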

See the MCP Security reference →

FREQUENTLY ASKED QUESTIONS

What's the difference from prompt injection?

Prompt injection targets the model's reasoning. Behavioural control traps target the agent's action capabilities: they make the agent do things, not just think things.
