What is a Compositional Fragment Trap?


A compositional fragment trap is a systemic trap that splits a malicious payload into semantically benign fragments and distributes them across multiple agents. The fragments reconstitute into a full attack only when multi-agent collaboration aggregates them.

WHY IT MATTERS

Each fragment passes safety checks individually: 'retrieve this data,' 'format this output,' 'send this message.' None is malicious alone. Executed in sequence across agents, however, they form an attack: retrieve sensitive data, format it for exfiltration, send it to an external endpoint.

The trap exploits the gap between per-agent safety checks and system-level security. No individual agent violates its constraints, but the emergent multi-agent workflow does.
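The gap can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the agent names, the action labels, the `sensitive:` and `external:` prefixes, and the two check functions. It shows one way a workflow can pass every per-agent check while failing a check that looks at the sequence as a whole.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    agent: str
    action: str   # hypothetical action label: "retrieve", "format", "send"
    target: str   # hypothetical resource or destination

# Hypothetical per-agent allowlists: each agent may perform one kind of action.
ALLOWED = {
    "reader": {"retrieve"},
    "formatter": {"format"},
    "notifier": {"send"},
}

def per_agent_ok(step: Step) -> bool:
    """Per-agent check: is the action inside this agent's own allowlist?"""
    return step.action in ALLOWED.get(step.agent, set())

def workflow_ok(steps: list[Step]) -> bool:
    """System-level check: reject a sequence that retrieves sensitive data
    and later sends anything to an external target."""
    saw_sensitive = False
    for step in steps:
        if step.action == "retrieve" and step.target.startswith("sensitive:"):
            saw_sensitive = True
        if (step.action == "send" and step.target.startswith("external:")
                and saw_sensitive):
            return False
    return True

workflow = [
    Step("reader", "retrieve", "sensitive:customer_db"),
    Step("formatter", "format", "csv"),
    Step("notifier", "send", "external:https://example.invalid/upload"),
]

print(all(per_agent_ok(s) for s in workflow))  # True: every fragment passes alone
print(workflow_ok(workflow))                   # False: the combined flow is an attack
```

The system-level rule here is deliberately crude (any external send after any sensitive read). A real deployment would likely need finer-grained data-flow tracking to avoid false positives.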

HOW POLICYLAYER USES THIS

Intercept's per-agent scoping limits what each agent can do independently. Combined with category restrictions (blocking exfiltration-pattern tool calls), it makes fragment assembly harder even across collaborating agents.
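As a rough illustration of how per-agent scoping and category restrictions can combine, the Python sketch below evaluates a tool call against both. It is not Intercept's API or policy format: the `AgentPolicy` class, the `evaluate` function, the tool names, and the category labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from tool name to a coarse category.
TOOL_CATEGORIES = {
    "db.read": "read",
    "text.format": "transform",
    "chat.post_internal": "internal_write",
    "http.post": "external_write",
}

@dataclass
class AgentPolicy:
    allowed_tools: set[str]                                     # per-agent scope
    blocked_categories: set[str] = field(default_factory=set)  # category restrictions

def evaluate(policy: AgentPolicy, tool: str) -> tuple[bool, str]:
    """Deny if the tool is outside the agent's scope, then deny if the
    tool's category is blocked; otherwise allow."""
    if tool not in policy.allowed_tools:
        return False, "tool outside agent scope"
    category = TOOL_CATEGORIES.get(tool, "unknown")
    if category in policy.blocked_categories:
        return False, f"category '{category}' blocked"
    return True, "allowed"

# Hypothetical policies for two collaborating agents.
POLICIES = {
    "reader": AgentPolicy({"db.read"}),
    "notifier": AgentPolicy({"chat.post_internal", "http.post"}, {"external_write"}),
}

print(evaluate(POLICIES["reader"], "http.post"))
print(evaluate(POLICIES["notifier"], "chat.post_internal"))
print(evaluate(POLICIES["notifier"], "http.post"))
```

In this sketch the reader cannot send data anywhere, and the notifier can post internally but is denied the external write even though the tool is nominally in its scope. The final fragment of a retrieve, format, send chain therefore has no agent able to execute it.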

FREQUENTLY ASKED QUESTIONS

How do you detect this?
Detection requires system-level analysis of multi-agent workflows, not just per-agent monitoring. Cross-agent audit trails that track data flow across agent boundaries can reveal compositional attacks.
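One way to picture a cross-agent audit trail is as a list of events that record which data each tool call consumed and produced. The Python sketch below propagates a "sensitive" taint along those links and reports any chain that ends at an external sink. The `AuditEvent` fields, the data ids, and the `find_exfiltration_paths` function are hypothetical and assume the audit log already captures input and output identifiers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AuditEvent:
    agent: str
    tool: str
    inputs: tuple[str, ...] = ()       # ids of data items consumed
    output: Optional[str] = None       # id of the data item produced, if any
    reads_sensitive: bool = False      # event pulls from a sensitive source
    external_sink: bool = False        # event writes outside the system

def find_exfiltration_paths(events: list[AuditEvent]) -> list[list[str]]:
    """Return the ordered agent chain for every flow that starts at a
    sensitive read and reaches an external sink."""
    tainted: dict[str, list[str]] = {}  # data id -> agents that handled it
    findings: list[list[str]] = []
    for event in events:
        # Collect the lineage carried by any tainted inputs.
        lineage: list[str] = []
        for data_id in event.inputs:
            for agent in tainted.get(data_id, []):
                if agent not in lineage:
                    lineage.append(agent)
        if event.reads_sensitive or lineage:
            if event.agent not in lineage:
                lineage = lineage + [event.agent]
            if event.output is not None:
                tainted[event.output] = lineage  # taint crosses the agent boundary
            if event.external_sink:
                findings.append(lineage)
    return findings

trail = [
    AuditEvent("reader", "db.read", output="d1", reads_sensitive=True),
    AuditEvent("formatter", "text.format", inputs=("d1",), output="d2"),
    AuditEvent("notifier", "http.post", inputs=("d2",), external_sink=True),
]

print(find_exfiltration_paths(trail))  # [['reader', 'formatter', 'notifier']]
```

No single event in the trail is suspicious on its own. The finding appears only because the data ids link the three agents into one flow.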

