What is a Sliding Window Rate Limit?


A sliding window rate limit is a rate limiting approach that uses a rolling time window rather than fixed intervals. Instead of resetting a counter every minute on the minute, it counts the calls made in the last N seconds before the current moment, which gives smoother and more predictable enforcement.

WHY IT MATTERS

Fixed-window rate limiting has a well-known edge case: an agent can make the maximum allowed calls at the end of one window and again at the start of the next, effectively doubling throughput at the boundary. For a limit of 10 calls per minute, an agent could make 20 calls in about two seconds: 10 in the final second of one window and 10 in the first second of the next.
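The boundary case can be reproduced with a minimal sketch of a fixed-window counter. The class name `FixedWindowLimiter` and its `allow` method are hypothetical names chosen for illustration, not part of any particular library, and timestamps are passed in explicitly so the result does not depend on the wall clock.

```python
class FixedWindowLimiter:
    """Illustrative counter that resets at every multiple of `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current_window = None  # index of the window currently being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new fixed window has started: the counter resets to zero.
            self.current_window = window_index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=10, window=60.0)

# 10 calls in the last second of one window, then 10 in the first second of the next.
burst_times = [59.0 + i * 0.1 for i in range(10)] + [60.0 + i * 0.1 for i in range(10)]
allowed = sum(limiter.allow(t) for t in burst_times)
print(allowed)  # 20: every call is accepted, although all fall within about two seconds
```

The counter never sees more than 10 calls in either window, so it accepts all 20, even though they arrive within a two-second span.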

Sliding window eliminates this by always looking backwards from the current moment. "10 calls per minute" means 10 calls in any 60-second span, not 10 calls between :00 and :59. This is particularly important for AI agents because agent frameworks often batch tool calls in rapid sequences — precisely the pattern that exploits fixed-window boundaries.

The trade-off is implementation complexity. Sliding window requires tracking individual call timestamps rather than a simple counter, so memory use grows with the limit: a limiter may hold up to one timestamp per allowed call in the window. For most MCP proxy deployments, this overhead is negligible compared to the cost of the tool calls themselves.
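One common way to implement this is a timestamp log held in a queue. The sketch below is an assumption-laden illustration, not a description of any specific product's internals: the name `SlidingWindowLimiter` is hypothetical, rejected calls are deliberately not recorded, and timestamps are supplied by the caller to keep the example deterministic.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window log: keeps one timestamp per accepted call."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # accepted call times, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that are no longer inside the window (now - window, now].
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


sliding = SlidingWindowLimiter(limit=10, window=60.0)

# The same boundary-straddling burst that slips past a fixed-window counter.
burst_times = [59.0 + i * 0.1 for i in range(10)] + [60.0 + i * 0.1 for i in range(10)]
sliding_allowed = sum(sliding.allow(t) for t in burst_times)
print(sliding_allowed)  # 10: the second half of the burst is rejected

# A slot frees up only once the oldest accepted call is a full window old.
later = sliding.allow(119.0)
print(later)  # True: the call made at t=59.0 has left the window
```

Because only accepted calls are stored, the queue never holds more than `limit` entries, which is the memory cost the trade-off refers to.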

HOW POLICYLAYER USES THIS

Intercept supports sliding window semantics in its rate limiting policies. When configured, Intercept tracks timestamps of recent tool calls and evaluates the count within the rolling window on each new request. This prevents boundary-straddling bursts and provides consistent, predictable enforcement regardless of when the agent happens to make its calls.

FREQUENTLY ASKED QUESTIONS

Is sliding window more expensive than fixed window?
Slightly. It requires storing individual call timestamps rather than a simple counter. In practice, the overhead is negligible for typical MCP proxy workloads.

When should I use sliding window over token bucket?
Sliding window is simpler to reason about when you want a strict count-per-period guarantee. Token bucket is better when you want to allow controlled bursts while maintaining a sustained rate.

Does sliding window prevent all burst behaviour?
It prevents boundary-straddling bursts but still allows the full quota to be consumed in a short burst within any given window. Combine with burst limits for tighter control.
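
Combining a sustained limit with a burst limit can be sketched as two sliding windows that must both have room before a call is accepted. This is a hedged illustration only: `WindowLog` and `BurstAwareLimiter` are hypothetical names, and the figures (10 calls per 60 seconds, 3 calls per second) are assumed values for the example.

```python
from collections import deque


class WindowLog:
    """Illustrative timestamp log for a single sliding window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def has_room(self, now: float) -> bool:
        # Evict expired timestamps, then check whether another call fits.
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) < self.limit

    def record(self, now: float) -> None:
        self.timestamps.append(now)


class BurstAwareLimiter:
    """Accepts a call only if every configured window has room for it."""

    def __init__(self, *logs: WindowLog) -> None:
        self.logs = logs

    def allow(self, now: float) -> bool:
        if all(log.has_room(now) for log in self.logs):
            # Record the call in every window only after all of them agree.
            for log in self.logs:
                log.record(now)
            return True
        return False


combined = BurstAwareLimiter(
    WindowLog(limit=10, window=60.0),  # sustained quota (assumed value)
    WindowLog(limit=3, window=1.0),    # burst cap (assumed value)
)

# Ten calls spaced 100 ms apart: the burst cap stops the quota being spent at once.
burst_allowed = sum(combined.allow(i * 0.1) for i in range(10))
print(burst_allowed)  # 3

after_pause = combined.allow(1.0)
print(after_pause)  # True: the call made at t=0.0 has left the one-second window
```

Checking every window before recording in any of them keeps a call rejected by one limit from consuming quota in the other.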

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes needed.

npx -y @policylayer/intercept
github.com/policylayer/intercept