What is Agent Drift?
Agent drift is the gradual divergence of an AI agent's behaviour from its intended purpose over time, caused by factors such as context accumulation, model updates, environmental changes, or an evolving tool landscape.
WHY IT MATTERS
Agent drift is subtle and insidious. Unlike a rogue agent that deviates sharply, a drifting agent changes gradually — each individual action seems reasonable, but the cumulative trajectory moves away from the intended purpose. It is the boiling frog problem applied to AI behaviour.
Several mechanisms cause drift. Context accumulation is the most common: as an agent processes more information over a session or across sessions, its decision-making shifts based on accumulated context that may include irrelevant, misleading, or adversarial content. The agent's effective system prompt evolves even though the actual system prompt has not changed.
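One mitigation for context-driven drift is to keep the intended system prompt pinned and cap how much accumulated history reaches the model. The sketch below assumes a hypothetical message format and a `build_effective_prompt` helper of my own naming; it is illustrative, not a real agent framework API.

```python
# Sketch: pin the original system prompt and trim accumulated turns, so the
# "effective prompt" the model sees stays anchored to the intended one.
# The message shape and helper name are illustrative assumptions.

def build_effective_prompt(system_prompt, history, max_turns=6):
    """Return the messages actually sent to the model: the original system
    prompt is always re-asserted, and only the most recent turns survive."""
    recent = history[-max_turns:]  # drop stale accumulated context
    return [{"role": "system", "content": system_prompt}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
msgs = build_effective_prompt("You are a billing assistant.", history)

assert msgs[0]["role"] == "system"  # intent is re-anchored on every call
assert len(msgs) == 7               # 1 system prompt + 6 most recent turns
```

Trimming is a blunt instrument (summarisation is a common alternative), but the key property is the same: the system prompt is re-asserted rather than left to be diluted by whatever the session accumulated.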
Model updates are another driver. When the underlying LLM is updated — even a minor version change — the agent's behaviour can shift in unexpected ways. A tool-call pattern that was reliable with one model version may behave differently with another. Without continuous monitoring, these shifts go unnoticed until they cause problems.
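A common way to catch update-driven shifts is a behavioural regression check: replay fixed prompts after a model update and diff the resulting tool-call sequences against a "golden" trace recorded with the previous version. The trace format below is a hypothetical sketch, not a prescribed harness.

```python
# Sketch: diff an agent's tool-call traces against a golden set recorded
# before a model update. The trace format (prompt -> list of tool names)
# is an illustrative assumption.

def diff_traces(golden, current):
    """Return (prompt, expected, actual) tuples where tool usage shifted."""
    drift = []
    for prompt, expected in golden.items():
        actual = current.get(prompt, [])
        if actual != expected:
            drift.append((prompt, expected, actual))
    return drift

golden = {"refund order 42": ["lookup_order", "issue_refund"]}
current = {"refund order 42": ["lookup_order", "send_email", "issue_refund"]}

regressions = diff_traces(golden, current)
assert len(regressions) == 1  # the updated model inserted an extra tool call
```

Even a coarse diff like this turns a silent behavioural shift into a visible test failure at upgrade time.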
Environmental drift matters too. The MCP servers an agent connects to evolve — tools are added, argument schemas change, response formats shift. An agent calibrated for one environment may behave incorrectly when the environment changes underneath it, invoking tools with stale assumptions about their behaviour.
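Environmental drift can be detected by fingerprinting the tool schemas an agent sees at session start and comparing them against the last known snapshot. The schema dicts below stand in for whatever a tool listing (such as an MCP server's tool inventory) actually returns; the shapes are assumptions for illustration.

```python
# Sketch: fingerprint each tool's argument schema with a stable hash, then
# compare snapshots to spot tools that changed underneath the agent.
import hashlib
import json

def fingerprint(tools):
    """Map tool name -> stable hash of its argument schema."""
    return {
        t["name"]: hashlib.sha256(
            json.dumps(t["schema"], sort_keys=True).encode()
        ).hexdigest()
        for t in tools
    }

old = fingerprint([{"name": "search", "schema": {"q": "string"}}])
new = fingerprint([{"name": "search", "schema": {"q": "string", "limit": "int"}}])

changed = [name for name in old if new.get(name) != old[name]]
assert changed == ["search"]  # the schema shifted; recalibrate before calling
```

A schema change does not prove the agent will misbehave, but it flags exactly the condition described above: stale assumptions about a tool's behaviour.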
HOW POLICYLAYER USES THIS
Intercept provides a fixed reference frame against which agent drift is measured and contained. Regardless of how the agent's behaviour evolves, YAML policies define hard boundaries that do not drift. An agent that gradually shifts towards more permissive tool usage is caught the moment it exceeds policy boundaries. Intercept's audit logs also enable trend analysis — tracking tool usage patterns over time to detect gradual shifts before they become policy violations. The policies themselves serve as documentation of intended behaviour, making it straightforward to identify when actual behaviour has diverged.
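The two roles described above — a hard boundary plus trend analysis over audit logs — can be sketched in a few lines. The policy shape and log format here are illustrative stand-ins, not Intercept's actual YAML schema or log format.

```python
# Sketch: a fixed policy boundary plus a simple usage trend over an audit
# log of attempted tool calls. Policy and log shapes are assumptions.
from collections import Counter

policy = {"allowed_tools": {"lookup_order", "issue_refund"}}

def enforce(tool_name):
    """Hard boundary: drift is contained the moment it crosses the policy."""
    return tool_name in policy["allowed_tools"]

# Attempted calls recorded in the audit log, including blocked ones.
audit_log = ["lookup_order", "lookup_order", "delete_record", "lookup_order"]
usage = Counter(audit_log)  # trend analysis over logged attempts

assert enforce("issue_refund") is True
assert enforce("delete_record") is False  # blocked however gradual the drift
assert usage["delete_record"] == 1        # and surfaced in the usage trend
```

The point of the counter is the second half of the claim: a denied call is the last line of defence, while a rising count of near-boundary attempts is the early-warning signal.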