What is a Sliding Window Rate Limit?
A sliding window rate limit is a rate limiting approach that uses a rolling time window rather than fixed intervals. Instead of resetting a counter every minute on the minute, it counts the calls made in the last N seconds before the current moment, providing smoother and more predictable enforcement.
WHY IT MATTERS
Fixed-window rate limiting has a well-known edge case: an agent can make the maximum allowed calls at the end of one window and again at the start of the next, effectively doubling throughput at the boundary. For a limit of 10 calls per minute, an agent could make 10 calls in the final second of one window and 10 more in the first second of the next: 20 calls in two seconds, none of them rejected.
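The boundary case can be reproduced with a few lines of Python. This is a minimal sketch for illustration only: the class name FixedWindowLimiter, its parameters, and the simulated timestamps are all hypothetical and do not come from any particular product.

```python
class FixedWindowLimiter:
    """Hypothetical fixed-window counter, for illustration only."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        # Windows are aligned to multiples of window_seconds (":00" to ":59").
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window has started, so the counter resets to zero.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


fixed_limiter = FixedWindowLimiter(limit=10, window_seconds=60)

# Simulated timestamps in seconds: 10 calls in the last second of the
# first window, then 10 more in the first second of the next window.
burst = [59.0 + 0.1 * i for i in range(10)] + [60.0 + 0.1 * i for i in range(10)]
fixed_allowed = sum(fixed_limiter.allow(t) for t in burst)
print(fixed_allowed)  # 20: every call in the burst is accepted
```

All 20 calls pass because the counter resets at the 60-second mark, even though they arrive within two seconds of each other.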
A sliding window eliminates this by always looking backwards from the current moment. "10 calls per minute" means 10 calls in any 60-second span, not 10 calls between :00 and :59. This is particularly important for AI agents because agent frameworks often batch tool calls in rapid sequences, which is precisely the pattern that exploits fixed-window boundaries.
The trade-off is implementation complexity. A sliding window requires tracking individual call timestamps rather than a single counter, which consumes more memory. For most MCP proxy deployments, this overhead is negligible compared to the cost of the tool calls themselves.
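The timestamp-tracking approach described above can be sketched as follows. This is one possible in-memory version under stated assumptions, not a description of how any specific proxy implements it: the class name SlidingWindowLimiter, the choice to record only accepted calls, and the simulated timestamps are all hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window limiter that stores call timestamps."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # accepted call times, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the rolling window.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        # Accept the call only if fewer than `limit` calls remain in the window.
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


sliding_limiter = SlidingWindowLimiter(limit=10, window_seconds=60)

# Same burst as the boundary case: 10 calls just before the 60-second
# mark and 10 more just after it (simulated timestamps in seconds).
burst = [59.0 + 0.1 * i for i in range(10)] + [60.0 + 0.1 * i for i in range(10)]
sliding_allowed = sum(sliding_limiter.allow(t) for t in burst)
print(sliding_allowed)  # 10: the second half of the burst is rejected
```

Because only accepted calls are recorded, this sketch holds at most `limit` timestamps per limiter, which gives a sense of the memory cost: 10 stored values instead of one counter for a limit of 10 calls per minute.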
HOW POLICYLAYER USES THIS
Intercept supports sliding window semantics in its rate limiting policies. When configured, Intercept tracks timestamps of recent tool calls and evaluates the count within the rolling window on each new request. This prevents boundary-straddling bursts and provides consistent, predictable enforcement regardless of when the agent happens to make its calls.