What is a Human-in-the-Loop Trap?
An agent trap that turns the agent against its human overseer, using the agent as a channel to manipulate human decision-making by exploiting cognitive biases and limits such as anchoring, framing, and approval fatigue.
WHY IT MATTERS
Human oversight is often the last line of defence against agent misbehaviour. Human-in-the-loop traps attack this defence directly — not by bypassing the human, but by manipulating them.
An agent that presents a biased summary anchors the human's decision. A long run of benign approval requests creates fatigue, so the human rubber-stamps the malicious request hidden among them. The agent can also frame options to exploit loss aversion or authority bias.
The most dangerous traps don't bypass human oversight — they weaponise it.
HOW POLICYLAYER USES THIS
Deterministic policy enforcement removes the human from routine decisions. Instead of asking a fatigued human to approve each tool call, Intercept applies consistent rules automatically — reserving human oversight for genuinely exceptional cases.
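The idea of deterministic enforcement can be illustrated with a minimal sketch, assuming a simple first-match rule list. The `evaluate` function, the `Rule` and `ToolCall` types, the tool names and the payment threshold are all hypothetical and are not Intercept's actual configuration or API. Each call is settled by fixed rules where possible, and only calls that no rule covers are escalated to a human.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # hand off to a human reviewer


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


@dataclass(frozen=True)
class Rule:
    # A rule matches a tool call and always yields the same fixed decision.
    name: str
    matches: Callable[[ToolCall], bool]
    decision: Decision


def evaluate(call: ToolCall, rules: list[Rule]) -> tuple[Decision, str]:
    """Return the first matching rule's decision; unmatched calls escalate.

    The result depends only on the call and the rule list, so the same call
    gets the same answer no matter how many requests preceded it.
    """
    for rule in rules:
        if rule.matches(call):
            return rule.decision, rule.name
    return Decision.ESCALATE, "no-matching-rule"


# Hypothetical example policy: rule names, tools and threshold are illustrative.
RULES = [
    Rule("block-shell", lambda c: c.tool == "run_shell", Decision.DENY),
    Rule("allow-read", lambda c: c.tool == "read_file", Decision.ALLOW),
    Rule(
        "small-payments",
        # A missing amount defaults to infinity, so it never auto-allows.
        lambda c: c.tool == "send_payment"
        and c.args.get("amount", float("inf")) <= 100,
        Decision.ALLOW,
    ),
]

if __name__ == "__main__":
    calls = [
        ToolCall("read_file", {"path": "notes.txt"}),
        ToolCall("send_payment", {"amount": 40}),
        ToolCall("send_payment", {"amount": 5000}),
        ToolCall("run_shell", {"cmd": "rm -rf /"}),
    ]
    for call in calls:
        decision, rule = evaluate(call, RULES)
        print(call.tool, decision.value, rule)
    # prints:
    # read_file allow allow-read
    # send_payment allow small-payments
    # send_payment escalate no-matching-rule
    # run_shell deny block-shell
```

In this sketch only the 5000-unit payment reaches a human, so the reviewer is not worn down by routine reads and small payments. Defaulting unmatched calls to escalation rather than approval is a deliberate fail-safe choice: gaps in the rules surface as review requests instead of silent approvals.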