Runaway Tool Loops
Summary
An agent — or a coordinating pair of agents — enters a self-sustaining loop of tool calls that nothing tells it to stop. No agent is malfunctioning; each is following its instructions precisely. The loop burns through API quotas, model tokens, third-party billing, or database writes until something external intervenes — usually the monthly invoice. Because individual calls look normal and the cost curve is gradual, loops frequently run for days before detection. The defining characteristic: no upper bound at any layer (budget, call count, session duration, round-trip count between coordinating agents).
How it works
Three common shapes:
Single-agent ReAct loop. The agent’s planner concludes it hasn’t achieved the goal, so it calls a tool. The tool result doesn’t decisively satisfy the goal, so the planner calls another tool. The goal criterion is fuzzy enough that no tool result ever counts as “done”. The agent keeps going until a framework-level iteration limit kicks in — or, if the framework has no such limit, until billing throws a hard error.
Multi-agent ping-pong. Two or more agents coordinate via A2A, a message bus, or a workflow engine. Agent A produces output; Agent B critiques it and requests refinement; Agent A refines; Agent B finds a new issue in the refinement. The loop is self-sustaining because each agent’s critique-or-produce behaviour generates fresh work for the other. No shared budget, no round-trip counter, no termination condition.
Retry storms. An agent receives a transient error from a tool and retries. The retry hits the same error. The agent — either lacking retry limits or interpreting its instructions to mean “keep trying until it works” — retries indefinitely, sometimes at full speed.
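The retry-storm shape has the most mechanical fix: a hard attempt cap with exponential backoff. A minimal sketch, assuming a hypothetical `call_tool` callable and `TransientError` exception type (placeholders, not part of any framework):

```python
import time

class TransientError(Exception):
    """Placeholder for a tool's transient failure (e.g. an HTTP 503)."""

def call_with_retries(call_tool, *, max_attempts=5, base_delay=1.0):
    """Retry a flaky tool call, but never indefinitely.

    Caps the attempt count and backs off exponentially between attempts,
    so a persistent error surfaces as a failure instead of becoming a
    full-speed retry storm.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call_tool()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the error is evidently not transient
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

The key property is that "keep trying until it works" is reinterpreted as "keep trying until a fixed budget of attempts is exhausted", which converts an unbounded loop into a bounded one.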
In every shape, the common property is: the cost of the next call is not visible to the decision-maker making the call.
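The corresponding fix is to make the bound explicit to whatever drives the loop. A hedged sketch of a ReAct-style loop with a hard iteration cap — the `plan`, `run_tool`, and `is_done` callables are hypothetical stand-ins for a framework's planner, tool dispatcher, and goal check:

```python
def run_agent(plan, run_tool, is_done, *, max_iterations=25):
    """Drive a plan-act loop, refusing to run unbounded.

    A fuzzy `is_done` predicate may never be satisfied; the explicit
    iteration cap guarantees termination regardless of what the
    planner decides.
    """
    history = []
    for _ in range(max_iterations):
        action = plan(history)
        result = run_tool(action)
        history.append((action, result))
        if is_done(result):
            return history
    raise RuntimeError(
        f"agent exceeded {max_iterations} iterations -- possible runaway loop")
```

Raising instead of silently stopping matters: the caller learns the task did not complete, rather than receiving a partial result that looks like success.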
Real-world example
Four-agent market-research pipeline, November 2025 — $47,000 in 11 days. A production market-research pipeline running four LangChain-based agents coordinating via A2A entered an unintended infinite loop. An Analyzer and a Verifier ping-ponged: the Analyzer would generate content, the Verifier would request further analysis, the Analyzer would oblige. Weekly API spend went $127 → $891 → $6,240 → $18,400 → final shutdown at $47,000 after 264 hours. The post-mortem identified two root causes: no per-agent budget caps, and no mechanism that could have terminated the session before the next API call completed. (dev.to/waxell, The $47,000 Agent Loop; dev.to/utibe, The AI Agent That Cost $47,000 While Everyone Thought It Was Working; medium / CodeOrbit A2A post-mortem, accessed 19-04-2026.) Verification note: the incident is described consistently across multiple first-person-framed posts that appear to share an original author; it is a real pattern and the dollar figure appears to be genuine, but we have not located a named corporate victim or a conventional post-mortem document. We mark this page verified: partial.
Framework-level acknowledgement. The pattern is well-known to every major agent framework. LangChain added iteration-limit configuration (max_iterations) and an early_stopping_method option precisely because agents that never satisfy their own stopping condition are a named, frequent failure mode. Claude Code’s documented token-budget management is positioned explicitly as a runaway-cost defence. (langchain.com agent observability; mindstudio.ai on Claude Code token budgets, accessed 19-04-2026.)
Impact
- Unbounded API spend at model providers (OpenAI, Anthropic, Google) — five- and six-figure bills.
- Rate-limit exhaustion against third-party APIs, cascading into outages for legitimate traffic.
- Database write amplification when the loop includes any create/update tool.
- Log and storage cost blow-ups, sometimes dwarfing the model spend.
- Alert fatigue masking other incidents — ops teams ignoring the noise because “it’s just the agent again”.
Detection
- Sustained elevated tool-call rate on a single session / agent ID.
- Repeating call patterns — same tool, same arguments, or cycling through a small set.
- Round-trip counters between pairs of agents exceeding any plausible human-task threshold.
- Session duration far exceeding typical task length.
- Cost-per-session curves that are monotonically increasing without plateauing.
- Token usage growing linearly with wall-clock time.
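The repeating-call-pattern signal above can be checked cheaply inline. A minimal sketch of a sliding-window detector — the class name, window size, and threshold are illustrative choices, not a standard interface:

```python
from collections import Counter, deque

class LoopDetector:
    """Flag sessions that keep repeating the same (tool, args) signature.

    Keeps a sliding window of recent call signatures; if any single
    signature dominates the window, the session is likely looping.
    """
    def __init__(self, window=50, threshold=0.5):
        self.calls = deque(maxlen=window)
        self.threshold = threshold

    def record(self, tool, args):
        """Record one call; return True if the session looks like a loop."""
        self.calls.append((tool, tuple(sorted(args.items()))))
        if len(self.calls) < self.calls.maxlen:
            return False  # not enough history to judge yet
        _, top_count = Counter(self.calls).most_common(1)[0]
        return top_count / len(self.calls) >= self.threshold
```

Normalising arguments into a sorted tuple means `{"q": "x", "n": 1}` and `{"n": 1, "q": "x"}` count as the same call, which catches agents that re-issue an identical request with reordered fields.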
Prevention
Transport-layer policy enforcement is the right place for hard bounds because it can enforce them regardless of what the agent’s framework configuration does or doesn’t do. Every tool call traverses the proxy; every proxy decision has access to session-scoped counters.
Example Intercept policy imposing hard caps (syntax from shipped valid_policy.yaml):
```yaml
version: "1"
description: "Hard bounds on tool-call loops and cost"
default: "allow"
tools:
  "*":
    rules:
      - name: "per-session total call cap"
        conditions:
          - path: "state._global.session_calls"
            op: "lt"
            value: 500
        on_deny: "Session exceeded 500 tool calls — possible runaway loop"
        state:
          counter: "session_calls"
          window: "hour"
      - name: "per-minute call rate"
        conditions:
          - path: "state._global.calls_per_minute"
            op: "lt"
            value: 30
        on_deny: "Tool-call rate exceeded 30/minute"
        state:
          counter: "calls_per_minute"
          window: "minute"
  llm_call:
    rules:
      - name: "daily token budget"
        conditions:
          - path: "state.llm_call.daily_tokens"
            op: "lt"
            value: 2000000
        on_deny: "Daily token budget (2M) reached"
        state:
          counter: "daily_tokens"
          window: "day"
          increment_from: "args.estimated_tokens"
  web_fetch:
    rules:
      - name: "repeated-argument detection"
        rate_limit: 10/minute
        on_deny: "Too many fetches — possible retry storm"
```
Combine with:
- Hard per-session wall-clock timeout enforced at the orchestrator, independent of the proxy.
- Round-trip counter between any two coordinating agents (kill after N exchanges on the same task).
- A budget ledger that reserves cost before the next call, not after — alerts don’t stop loops, quotas do.
- Dashboard alerts on cost derivative (change per hour), not just cost level.
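The reserve-before-call ledger is worth spelling out, because it is the point where most deployments get it backwards. A minimal sketch under assumed names (`BudgetLedger`, `reserve`, `settle` are illustrative, not a shipped API):

```python
import threading

class BudgetLedger:
    """Reserve spend *before* a call is made, not after the bill arrives.

    A reservation that would exceed the budget is rejected outright, so
    the loop stops at the quota instead of merely alerting past it.
    """
    def __init__(self, budget_usd):
        self.budget = budget_usd
        self.committed = 0.0   # cost of calls already settled
        self.reserved = 0.0    # cost of calls in flight
        self._lock = threading.Lock()

    def reserve(self, estimated_cost):
        """Try to reserve headroom; return True only if within budget."""
        with self._lock:
            if self.committed + self.reserved + estimated_cost > self.budget:
                return False
            self.reserved += estimated_cost
            return True

    def settle(self, estimated_cost, actual_cost):
        """Release the reservation and record what was actually spent."""
        with self._lock:
            self.reserved -= estimated_cost
            self.committed += actual_cost
```

The agent wrapper calls `reserve` before every tool or model call and refuses to proceed on `False` — the decision-maker finally sees the cost of the next call before making it.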
Sources
- The $47,000 Agent Loop: Why Token Budget Alerts Aren’t Budget Enforcement — dev.to, Waxell — accessed 19-04-2026
- The AI Agent That Cost $47,000 While Everyone Thought It Was Working — dev.to, Utibe Okodi — accessed 19-04-2026
- Our $47,000 AI Agent Production Lesson — Medium, CodeOrbit — accessed 19-04-2026
- AI Agent Observability — LangChain — accessed 19-04-2026
- AI Agent Token Budget Management: How Claude Code Prevents Runaway API Costs — MindStudio — accessed 19-04-2026
- Infinite Agent Loop: when an AI agent does not stop — Agent Patterns — accessed 19-04-2026
- The Cost Circuit Breaker: Financial Controls for Production AI Agents — Fountain City — accessed 19-04-2026
- How to Stop AI Agent Cost Blowups Before They Happen — dev.to, Sapph1re — accessed 19-04-2026
Related attacks
- Destructive Action Autonomy
- Prompt Injection via Tool Results
- Confused Deputy
Protect your agent in 30 seconds
Scans your MCP config and generates enforcement policies for every server.
npx -y @policylayer/intercept init