What is a Burst Limit?
The maximum number of tool calls permitted in a short burst before rate limiting kicks in. Burst limits allow temporary spikes in throughput — accommodating natural agent behaviour — while maintaining overall rate control over longer periods.
WHY IT MATTERS
AI agents do not make tool calls at a steady, metronomic rate. They reason, plan, and then execute — often firing a rapid sequence of tool calls in quick succession before pausing to process results. A strict per-second rate limit would artificially throttle this natural workflow, forcing agents to wait between calls that could safely run in parallel.
Burst limits accommodate this pattern. By allowing a short spike above the sustained rate, agents can complete their planned actions quickly while the overall throughput remains bounded. A policy might allow a burst of 20 calls in 5 seconds but cap the sustained rate at 10 calls per minute.
Burst size is a trade-off. Too small, and agents are needlessly constrained during legitimate multi-step operations. Too large, and a runaway agent can cause significant damage before the sustained limit catches up. The burst limit acts as the safety valve: it caps how many calls an agent can make in a single uncontrolled flurry, and therefore how much damage that flurry can do.
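To make the trade-off concrete, the sketch below estimates the worst-case number of calls an agent could make within a time window. It assumes a token-bucket style limiter that starts full, so the bound is the burst size plus whatever refills during the window. The function name `max_calls_in_window` and its parameters are hypothetical and chosen for illustration; they are not part of any PolicyLayer API.

```python
import math


def max_calls_in_window(burst: int, sustained_per_minute: float, window_minutes: float) -> int:
    """Upper bound on calls allowed in a window, assuming a full bucket at the start.

    Assumption: one token per call, continuous refill at the sustained rate.
    """
    refilled = math.floor(sustained_per_minute * window_minutes)
    return burst + refilled


# Example policy from the text: burst of 20, sustained rate of 10 calls per minute.
print(max_calls_in_window(20, 10, 0))   # instantaneous flurry
print(max_calls_in_window(20, 10, 1))   # first minute
print(max_calls_in_window(20, 10, 60))  # first hour
```

Under these assumptions, doubling the burst size doubles the size of the initial flurry but barely changes the hourly total, which is why the burst value matters most for short-lived runaway behaviour.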
HOW POLICYLAYER USES THIS
Intercept supports burst configuration as part of its rate limiting policies. The YAML policy specifies both the sustained rate and the burst capacity. Internally, this maps to a token bucket where the bucket size equals the burst limit and the refill rate equals the sustained rate. When the bucket is empty, subsequent calls are denied until enough tokens have refilled.
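The token bucket described above can be sketched in a few lines. This is a minimal illustration of the general technique, not Intercept's actual implementation: the class name `TokenBucket`, its fields, and the one-token-per-call cost are assumptions made for the example. The caller passes in the current time explicitly so the behaviour is easy to follow and test.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float            # burst limit: maximum tokens the bucket can hold
    refill_per_second: float   # sustained rate, expressed in tokens per second
    last_refill: float = 0.0   # timestamp of the most recent refill, in seconds
    tokens: float = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: the bucket starts full, so a full burst is available at once.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Return True and consume one token if a call is permitted at time `now`."""
        elapsed = max(0.0, now - self.last_refill)
        # Refill at the sustained rate, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Burst of 20 calls, sustained rate of 10 calls per minute.
bucket = TokenBucket(capacity=20, refill_per_second=10 / 60)
burst_results = [bucket.allow(now=0.0) for _ in range(21)]
print(burst_results.count(True))   # calls allowed in the initial burst
print(bucket.allow(now=7.0))       # a call after roughly one token has refilled
```

With these numbers, the first 20 calls at time zero succeed and the 21st is denied; about seven seconds later, one more token is available. A production limiter would also need a real clock and thread safety, which this sketch leaves out.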