What is a Compositional Fragment Trap?
A systemic trap that splits a malicious payload into semantically benign fragments distributed across multiple agents. The fragments reconstitute into a full attack only when multi-agent collaboration aggregates them.
WHY IT MATTERS
Each fragment passes safety checks on its own: 'retrieve this data,' 'format this output,' 'send this message.' None is malicious in isolation. Executed in sequence across agents, however, they form an attack: retrieve sensitive data, format it for exfiltration, and send it to an external endpoint.
This exploits the gap between per-agent safety checks and system-level security. No individual agent violates its constraints, but the emergent multi-agent workflow does.
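The gap can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the agent names, the `ToolCall` type, the `external:` target prefix, and both check functions are assumptions, not part of any real product. Each call passes its per-agent check, yet a check over the whole workflow recognises the retrieve, format, send-externally sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    agent: str   # hypothetical agent name
    action: str  # "retrieve", "format", or "send"
    target: str  # hypothetical target; "external:" prefix marks an outside endpoint


# Hypothetical per-agent allowlists: each agent may perform exactly one action.
ALLOWED = {
    "researcher": {"retrieve"},
    "formatter": {"format"},
    "notifier": {"send"},
}

# The ordered pattern that only becomes dangerous when assembled.
EXFIL_PATTERN = ("retrieve", "format", "send")


def per_agent_check(call: ToolCall) -> bool:
    """Per-agent view: is this single call within the agent's allowlist?"""
    return call.action in ALLOWED.get(call.agent, set())


def workflow_check(calls: list[ToolCall]) -> bool:
    """System view: return False if the calls contain the exfiltration
    pattern as an ordered subsequence ending in a send to an external target."""
    idx = 0
    for call in calls:
        expected = EXFIL_PATTERN[idx]
        if call.action != expected:
            continue
        if expected == "send" and not call.target.startswith("external:"):
            continue  # internal sends do not complete the pattern
        idx += 1
        if idx == len(EXFIL_PATTERN):
            return False
    return True


workflow = [
    ToolCall("researcher", "retrieve", "db:customers"),
    ToolCall("formatter", "format", "csv"),
    ToolCall("notifier", "send", "external:https://example.invalid/upload"),
]

print(all(per_agent_check(c) for c in workflow))  # every fragment passes alone
print(workflow_check(workflow))                   # the assembled workflow does not
```

A real system-level check would need far richer signals (data provenance, sensitivity labels, destination reputation); the subsequence match here only shows why the unit of analysis has to be the workflow rather than the single call.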
HOW POLICYLAYER USES THIS
Intercept's per-agent scoping limits what each agent can do independently. Combined with category restrictions (blocking exfiltration-pattern tool calls), it makes fragment assembly harder even across collaborating agents.
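As a rough sketch of how these two controls could combine, consider the following. This is not Intercept's actual configuration format or API: the tool names, category labels, and the `authorize` function are all hypothetical, chosen only to show per-agent scoping and category blocking working together.

```python
# Hypothetical per-agent scopes: the tools each agent is allowed to call.
AGENT_SCOPES = {
    "researcher": {"db.read"},
    "formatter": {"text.format"},
    "notifier": {"message.send_internal", "http.post_external"},
}

# Hypothetical mapping from tool to category.
TOOL_CATEGORY = {
    "db.read": "read",
    "text.format": "transform",
    "message.send_internal": "internal_message",
    "http.post_external": "external_egress",
}

# Hypothetical category restriction: block exfiltration-pattern tool calls.
BLOCKED_CATEGORIES = {"external_egress"}


def authorize(agent: str, tool: str) -> tuple[bool, str]:
    """Allow a call only if it is in the agent's scope and its category is not blocked."""
    if tool not in AGENT_SCOPES.get(agent, set()):
        return (False, "out of scope")
    if TOOL_CATEGORY.get(tool) in BLOCKED_CATEGORIES:
        return (False, "blocked category")
    return (True, "ok")


print(authorize("researcher", "db.read"))            # in scope, category allowed
print(authorize("researcher", "http.post_external")) # scoping stops cross-role calls
print(authorize("notifier", "http.post_external"))   # category block stops the final fragment
```

In this sketch the retrieve and format fragments still succeed, but the final step that would complete the exfiltration is denied, so the fragments cannot be assembled into the full attack through these tools.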