What is Token Bucket Rate Limiting?

A rate limiting algorithm where tokens are added to a bucket at a fixed rate. Each tool call consumes a token; calls are denied when the bucket is empty. The bucket's capacity determines burst tolerance while the refill rate sets the sustained throughput.

WHY IT MATTERS

Token bucket is one of the most widely used rate limiting algorithms in network engineering, and it translates naturally to MCP tool call governance. The core idea is simple: a bucket holds tokens that refill at a constant rate, up to a fixed capacity. Each tool call consumes one token. If the bucket is empty, the call is denied.
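The core idea can be sketched in a few lines of Python. This is a minimal illustration, not Intercept's implementation: the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example. Tokens are refilled lazily, based on the time elapsed since the previous call, rather than by a background timer.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical names, not a library API)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)        # maximum tokens the bucket holds
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.clock = clock                     # injectable so tests can fake time
        self.tokens = self.capacity            # start full so an initial burst is allowed
        self.last_refill = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_consume(self, cost=1.0):
        """Deduct `cost` tokens and return True if available, else return False."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Each tool call would map to one `try_consume()` call; a `False` result corresponds to a denied call.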

What gives token bucket an edge over a simple counter is burst handling. A fixed-window counter of "10 calls per minute" allows 10 calls in the first second and then blocks every call until the window resets, up to 59 seconds later. A token bucket with capacity 10 that refills one token every 6 seconds allows the same initial burst, but then admits one further call every 6 seconds, providing a smoother, more predictable experience for the agent.
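The difference shows up most clearly at a window boundary. The sketch below, using hypothetical helper names and assuming call times are given in ascending order, sends 10 calls just before the 60-second mark and 10 more just after it. The fixed-window counter admits all 20 because they fall into two different windows, while a token bucket with the same nominal limit admits only the first 10.

```python
def fixed_window_allowed(call_times, limit, window):
    """Count calls admitted by a fixed-window counter (times in seconds)."""
    counts = {}
    allowed = 0
    for t in call_times:
        idx = int(t // window)  # which window this call falls into
        if counts.get(idx, 0) < limit:
            counts[idx] = counts.get(idx, 0) + 1
            allowed += 1
    return allowed


def token_bucket_allowed(call_times, capacity, refill_rate):
    """Count calls admitted by a token bucket that starts full at t=0."""
    tokens = float(capacity)
    last = 0.0
    allowed = 0
    for t in call_times:
        # Refill for the elapsed time, never exceeding capacity.
        tokens = min(float(capacity), tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            allowed += 1
    return allowed


# 10 calls at t=59s and 10 more at t=60s, straddling a window boundary.
calls = [59.0] * 10 + [60.0] * 10
print(fixed_window_allowed(calls, limit=10, window=60))               # 20
print(token_bucket_allowed(calls, capacity=10, refill_rate=10 / 60))  # 10
```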

The two tuneable parameters, bucket capacity and refill rate, give operators precise control. A bucket with capacity 20 and a refill rate of 2/second allows short bursts of up to 20 calls but sustains only 2 calls per second over time. This is ideal for agent workflows that involve short flurries of tool calls followed by reasoning pauses.
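To make the capacity 20, refill rate 2/second example concrete: starting from a full bucket, at most capacity + refill rate × T calls can be admitted in any T seconds, and an empty bucket needs capacity / refill rate seconds to fill up again. The helper names below are illustrative only.

```python
import math


def max_calls_in_window(capacity, refill_rate, seconds):
    """Upper bound on calls admitted in `seconds`, starting from a full bucket."""
    return math.floor(capacity + refill_rate * seconds)


def seconds_to_full(capacity, refill_rate):
    """Time for an empty bucket to refill completely."""
    return capacity / refill_rate


print(max_calls_in_window(20, 2, 10))  # 40: a burst of 20 plus 2/s for 10 s
print(max_calls_in_window(20, 2, 60))  # 140
print(seconds_to_full(20, 2))          # 10.0
```

A reasoning pause of 10 seconds or more therefore restores the full burst allowance in this configuration.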

HOW POLICYLAYER USES THIS

Intercept's rate limiting engine uses token bucket semantics internally. When a YAML policy specifies a rate limit with a burst allowance, Intercept maintains a token bucket per scope (tool, user, or global). The bucket capacity and refill rate are derived from the policy configuration, and state is tracked in memory for the lifetime of the proxy process.
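One way to picture per-scope buckets held in memory is a dictionary keyed by scope. The sketch below is an assumption-laden illustration and does not reflect Intercept's actual code: `ScopedLimiter`, the scope key strings, and the choice to create buckets lazily on first use are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class BucketState:
    tokens: float
    last_refill: float


class ScopedLimiter:
    """Hypothetical limiter keeping one token bucket per scope key."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.clock = clock
        self.buckets = {}  # in-memory only: state is lost when the process exits

    def allow(self, scope_key):
        now = self.clock()
        state = self.buckets.get(scope_key)
        if state is None:
            # First call for this scope: create a full bucket.
            state = BucketState(tokens=self.capacity, last_refill=now)
            self.buckets[scope_key] = state
        # Lazy refill based on elapsed time, capped at capacity.
        elapsed = now - state.last_refill
        state.tokens = min(self.capacity, state.tokens + elapsed * self.refill_rate)
        state.last_refill = now
        if state.tokens >= 1.0:
            state.tokens -= 1.0
            return True
        return False
```

With keys such as `"tool:search"` or `"user:alice"` (made-up examples), exhausting one scope's bucket leaves the others untouched.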

FREQUENTLY ASKED QUESTIONS

How is token bucket different from a fixed-window counter?
A fixed-window counter resets at interval boundaries, so a burst at the end of one window followed by a burst at the start of the next can let through up to twice the limit in a short span. Token bucket refills continuously, providing smoother rate limiting with controlled burst tolerance.

What happens when the bucket is empty?
The tool call is denied immediately. Intercept returns a policy violation to the agent, which can choose to wait and retry once tokens have refilled.

Can I configure both burst size and sustained rate?
Yes. Bucket capacity controls maximum burst size, while the refill rate sets the sustained throughput ceiling. Both are configurable in the YAML policy.
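For the empty-bucket case, an agent that knows (or estimates) the refill rate can work out how long to wait before retrying. The following is a rough client-side sketch under that assumption; `attempt` is a hypothetical callable returning `True` when the call is admitted, and nothing here describes the actual format of Intercept's policy violation response.

```python
import time


def seconds_until_token(tokens, refill_rate, cost=1.0):
    """Time until `cost` tokens are available, given the current token count."""
    deficit = cost - tokens
    return max(0.0, deficit / refill_rate)


def call_with_retry(attempt, wait_seconds, max_retries=3, sleep=time.sleep):
    """Run `attempt` until it succeeds or retries run out, waiting between tries."""
    for tries_left in range(max_retries, -1, -1):
        if attempt():
            return True
        if tries_left > 0:
            sleep(wait_seconds)  # give the bucket time to refill
    return False


# With an empty bucket refilling at 2 tokens/second, one token arrives in 0.5 s.
print(seconds_until_token(tokens=0.0, refill_rate=2.0))  # 0.5
```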

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes needed.

npx -y @policylayer/intercept
github.com/policylayer/intercept →