What is Token Bucket Rate Limiting?
A rate limiting algorithm where tokens are added to a bucket at a fixed rate. Each tool call consumes a token; calls are denied when the bucket is empty. The bucket's capacity determines burst tolerance while the refill rate sets the sustained throughput.
WHY IT MATTERS
Token bucket is one of the most widely used rate limiting algorithms in network engineering, and it translates naturally to MCP tool call governance. The core idea is simple: a bucket holds tokens that refill at a constant rate, up to a fixed capacity. Each tool call consumes one token. If the bucket is empty, the call is denied.
What makes token bucket superior to a simple counter is burst handling. A fixed-window counter of "10 calls per minute" allows 10 calls in the first second and then blocks every call for the remaining 59 seconds of the window. A token bucket with capacity 10 and a refill rate of 10 tokens per minute allows the same 10-call burst, but then admits one further call roughly every 6 seconds as tokens refill, instead of blocking until the window resets. The result is a smoother, more natural experience for the agent.
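The contrast with a fixed-window counter can be sketched as a small simulation. This is an illustrative sketch only: the function names, the one-call-per-second arrival pattern, and the use of exact fractions (to avoid floating-point drift) are assumptions made for the example, not part of any particular product.

```python
from fractions import Fraction


def fixed_window(arrivals, limit=10, window=60):
    """Return the arrival times admitted by a fixed-window counter."""
    allowed, counts = [], {}
    for t in arrivals:
        idx = t // window  # which window this arrival falls into
        if counts.get(idx, 0) < limit:
            counts[idx] = counts.get(idx, 0) + 1
            allowed.append(t)
    return allowed


def token_bucket(arrivals, capacity=10, refill_per_sec=Fraction(10, 60)):
    """Return the arrival times admitted by a token bucket that starts full."""
    allowed = []
    tokens, last = Fraction(capacity), 0
    for t in arrivals:
        # Refill for the time elapsed since the previous arrival, capped at capacity.
        tokens = min(Fraction(capacity), tokens + (t - last) * refill_per_sec)
        last = t
        if tokens >= 1:
            tokens -= 1
            allowed.append(t)
    return allowed


def max_gap(times):
    """Longest wait between two consecutive admitted calls."""
    return max(b - a for a, b in zip(times, times[1:]))


# Hypothetical workload: the agent attempts one call per second for two minutes.
arrivals = list(range(120))
fw = fixed_window(arrivals)
tb = token_bucket(arrivals)

print(len(fw), max_gap(fw))  # fixed window: admitted calls, longest stall
print(len(tb), max_gap(tb))  # token bucket: admitted calls, longest stall
```

Under these assumptions the fixed window admits 20 calls with a 51-second stall between batches, while the token bucket admits 29 calls and never makes the agent wait more than 6 seconds once the initial burst is spent.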
The two tuneable parameters, bucket capacity and refill rate, give operators independent control over burst size and sustained throughput. A bucket with capacity 20 and refill rate 2/second allows short bursts of up to 20 calls but sustains only 2 calls per second over time. This is ideal for agent workflows that involve short flurries of tool calls followed by reasoning pauses.
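A minimal sketch of a bucket with these two parameters might look like the following. The class name, the `try_acquire` method, and the injectable `clock` argument are hypothetical choices made for illustration; the clock is injected so the example can advance time deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: `capacity` bounds bursts, `refill_rate` is tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # the bucket starts full
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False when the call should be denied."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Lazy refill: add tokens for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage with a fake clock so the result does not depend on wall time.
now = [0.0]
bucket = TokenBucket(capacity=20, refill_rate=2.0, clock=lambda: now[0])

burst = sum(bucket.try_acquire() for _ in range(25))  # 25 attempts at t=0
now[0] += 1.0                                         # one second passes
after_one_second = sum(bucket.try_acquire() for _ in range(5))

print(burst, after_one_second)  # 20 2
```

Refilling lazily on each call, rather than with a background timer, keeps the bucket to two numbers of state (token count and last timestamp), which is one common way to implement this algorithm.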
HOW POLICYLAYER USES THIS
Intercept's rate limiting engine uses token bucket semantics internally. When a YAML policy specifies a rate limit with a burst allowance, Intercept maintains a token bucket per scope (tool, user, or global). The bucket capacity and refill rate are derived from the policy configuration, and state is tracked in memory for the lifetime of the proxy process.
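To make the per-scope idea concrete, here is a rough sketch of one in-memory bucket per scope key. It is not Intercept's implementation: the `ScopedLimiter` class, the `allow` method, and scope keys such as "tool:search" are all hypothetical, and the timestamp is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class _Bucket:
    tokens: float
    last: float  # timestamp of the last refill, in seconds


class ScopedLimiter:
    """Illustrative limiter keeping one token bucket per scope key, in memory only."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._buckets: Dict[str, _Bucket] = {}  # lost when the process exits

    def allow(self, scope_key: str, now: float) -> bool:
        bucket = self._buckets.get(scope_key)
        if bucket is None:
            # First call for this scope: create a full bucket.
            bucket = _Bucket(tokens=float(self.capacity), last=now)
            self._buckets[scope_key] = bucket
        elapsed = max(0.0, now - bucket.last)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_rate)
        bucket.last = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


limiter = ScopedLimiter(capacity=2, refill_rate=1.0)
search = [limiter.allow("tool:search", now=0.0) for _ in range(3)]
fetch = limiter.allow("tool:fetch", now=0.0)         # separate scope, separate bucket
search_later = limiter.allow("tool:search", now=1.0)  # one token has refilled

print(search, fetch, search_later)  # [True, True, False] True True
```

Because the dictionary lives only in process memory in this sketch, constructing a new `ScopedLimiter` (the analogue of restarting the process) starts every scope with a full bucket again.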