What is a Burst Limit?


A burst limit is the maximum number of tool calls permitted in a short burst before rate limiting kicks in. Burst limits allow temporary spikes in throughput, accommodating natural agent behaviour, while maintaining overall rate control over longer periods.

WHY IT MATTERS

AI agents do not make tool calls at a steady, metronomic rate. They reason, plan, and then execute — often firing a rapid sequence of tool calls in quick succession before pausing to process results. A strict per-second rate limit would artificially throttle this natural workflow, forcing agents to wait between calls that could safely run in parallel.

Burst limits accommodate this pattern. By allowing a short spike above the sustained rate, agents can complete their planned actions quickly while the overall throughput remains bounded. A policy might allow a burst of 20 calls in 5 seconds but cap the sustained rate at 10 calls per minute.
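To make the numbers in that example policy concrete, here is a small arithmetic sketch. It assumes the limit behaves like a token bucket that starts full, in which case the calls admitted in any window cannot exceed the burst size plus whatever the sustained rate refills during that window. The function name `max_calls_in_window` is made up for illustration.

```python
import math


def max_calls_in_window(burst, sustained_per_min, window_sec):
    """Upper bound on calls admitted in one window, assuming a token bucket.

    The bucket can hold at most `burst` tokens at the start of the window and
    gains `sustained_per_min` tokens per minute while the window is open.
    """
    refilled = math.floor(sustained_per_min * window_sec / 60)
    return burst + refilled


# Example policy from the text: burst of 20, sustained 10 calls per minute.
for window_sec in (5, 60, 3600):
    print(window_sec, max_calls_in_window(20, 10, window_sec))
```

Under these assumptions the bound is 20 calls in the first 5 seconds, 30 in the first minute, and 620 in the first hour. The burst dominates short windows and the sustained rate dominates long ones.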

Getting the burst size right is important. Too small, and agents are needlessly constrained during legitimate multi-step operations. Too large, and a runaway agent can cause significant damage before the sustained limit catches up. The burst limit is the safety valve — it sets the upper bound on how much damage an agent can do in a single uncontrolled flurry.

HOW POLICYLAYER USES THIS

Intercept supports burst configuration as part of its rate limiting policies. The YAML policy specifies both the sustained rate and the burst capacity. Internally, this maps to a token bucket where the bucket size equals the burst limit and the refill rate equals the sustained rate. When the burst is exhausted, subsequent calls are denied until tokens refill.
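The token bucket mapping described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not Intercept's actual implementation. The class name `TokenBucket` and the parameters `burst`, `rate_per_sec`, and `clock` are hypothetical and do not correspond to Intercept's YAML keys.

```python
import time


class TokenBucket:
    """Illustrative token bucket: capacity = burst limit, refill = sustained rate."""

    def __init__(self, burst, rate_per_sec, clock=time.monotonic):
        self.capacity = float(burst)    # bucket size equals the burst limit
        self.rate = float(rate_per_sec)  # tokens added per second (sustained rate)
        self.clock = clock               # injectable so the bucket can be tested
        self.tokens = float(burst)       # start full so an initial burst is allowed
        self.last = clock()

    def allow(self):
        """Spend one token and return True if available; otherwise return False."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # burst exhausted: deny until tokens refill
```

With `TokenBucket(burst=20, rate_per_sec=10 / 60)`, the first 20 back-to-back calls to `allow()` succeed, further calls are denied, and roughly one more call becomes available every six seconds.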

FREQUENTLY ASKED QUESTIONS

How do I choose the right burst size?
Consider the typical number of tool calls an agent makes in a single reasoning-action cycle. Set the burst slightly above that to allow natural workflows while capping runaway behaviour.

Is burst limit the same as token bucket capacity?
In most implementations, yes. The burst limit equals the bucket capacity, that is, the maximum number of tokens available at any point. Intercept uses this equivalence internally.

Can I have a burst limit without a sustained rate limit?
Not meaningfully. A burst limit without a sustained rate is just a one-time quota: once the burst is spent, nothing refills it. The two work together. The burst handles spikes, and the sustained rate handles long-term throughput.
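The "one-time quota" point can be checked with a short simulation. This is a sketch under the same token bucket assumption. The helper `admitted` is hypothetical, and a refill rate of zero stands in for "no sustained rate".

```python
def admitted(burst, rate_per_sec, call_times):
    """Count calls admitted by a token bucket for the given timestamps (seconds)."""
    tokens, last, count = float(burst), 0.0, 0
    for t in call_times:
        # Refill for the time since the previous call, capped at the burst size.
        tokens = min(float(burst), tokens + (t - last) * rate_per_sec)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            count += 1
    return count


# One attempted call per second for an hour.
calls = [float(s) for s in range(3600)]

print(admitted(20, 0.0, calls))      # no refill: only the initial 20 calls pass
print(admitted(20, 10 / 60, calls))  # with refill: calls keep being admitted
```

With no refill, the agent gets its 20 calls and is then blocked for good. With a sustained rate, the same bucket keeps admitting calls at the long-term rate.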

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes needed.

npx -y @policylayer/intercept
github.com/policylayer/intercept