What is Pay-Per-Token Pricing?


Pay-per-token pricing is a billing model for large language models and other generative AI services where the consumer pays based on the exact number of input and output tokens processed, settled per-request rather than through monthly invoicing.

WHY IT MATTERS

LLM costs are inherently variable — a request generating 50 tokens costs less than one generating 5,000. Current providers (OpenAI, Anthropic) track token usage and invoice periodically. Pay-per-token pricing via blockchain rails settles this cost atomically per request.

The x402 protocol's roadmap includes an upto scheme designed for exactly this use case. The client authorises up to a maximum payment amount (e.g. $0.10). The server processes the request, counts actual tokens consumed, and settles the exact cost on-chain. The difference between the authorised maximum and actual usage is never charged.
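
The settlement rule described above can be sketched in a few lines. This is a minimal illustration, not the x402 implementation: the names `TokenPrice`, `token_cost` and `settle_upto` are hypothetical, the prices are made up, and amounts are assumed to be tracked as whole micro-dollars (millionths of a dollar) to avoid floating-point rounding.

```python
from dataclasses import dataclass

MICROS_PER_DOLLAR = 1_000_000  # assumption: amounts are whole millionths of a dollar


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical per-1K-token prices, in micro-dollars."""
    input_per_1k: int
    output_per_1k: int


def token_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> int:
    """Cost of one request in micro-dollars, rounded up to a whole micro-dollar."""
    numerator = input_tokens * price.input_per_1k + output_tokens * price.output_per_1k
    return -(-numerator // 1000)  # integer ceiling division


def settle_upto(max_authorised: int, actual_cost: int) -> int:
    """Amount to settle: the actual cost, never more than the authorised maximum."""
    if max_authorised < 0 or actual_cost < 0:
        raise ValueError("amounts must be non-negative")
    return min(actual_cost, max_authorised)


# Illustrative prices: $0.001 per 1K input tokens, $0.002 per 1K output tokens.
price = TokenPrice(input_per_1k=1_000, output_per_1k=2_000)
cost = token_cost(price, input_tokens=400, output_tokens=1_500)
# The client authorised up to $0.10 (100,000 micro-dollars).
settled = settle_upto(max_authorised=100_000, actual_cost=cost)
print(f"settled ${settled / MICROS_PER_DOLLAR:.6f}")
```

The `min` call is the whole point of the scheme: the server's token count decides the amount, but the client's maximum bounds it.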

This model benefits both parties:

  • Consumers — pay only for actual usage, no prepaid waste, instant access without accounts
  • Providers — guaranteed payment per request, no credit risk, no chargeback risk, global access without billing infrastructure

For AI agents, pay-per-token pricing enables sophisticated cost optimisation. An agent could route simple queries to cheap models and complex reasoning to premium models, paying the exact token cost for each — potentially across different providers in a single task.
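
As a rough sketch of that routing idea, assuming an agent already knows which steps need complex reasoning: the model names, provider names and prices below are invented for illustration, and `route`, `step_cost` and `plan_task` are hypothetical helpers rather than part of any real agent framework.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str           # hypothetical model identifier
    provider: str       # hypothetical provider name
    micros_per_1k: int  # assumed flat price per 1K tokens, in micro-dollars


CHEAP = Model("small-model", "provider-a", 1_000)     # $0.001 per 1K tokens
PREMIUM = Model("large-model", "provider-b", 30_000)  # $0.03 per 1K tokens


def route(needs_reasoning: bool) -> Model:
    """Send complex reasoning to the premium model, everything else to the cheap one."""
    return PREMIUM if needs_reasoning else CHEAP


def step_cost(model: Model, tokens: int) -> int:
    """Token cost of one step in micro-dollars, rounded up."""
    return -(-tokens * model.micros_per_1k // 1000)


def plan_task(steps):
    """Total cost and per-provider spend for (needs_reasoning, tokens) steps."""
    per_provider = defaultdict(int)
    for needs_reasoning, tokens in steps:
        model = route(needs_reasoning)
        per_provider[model.provider] += step_cost(model, tokens)
    return sum(per_provider.values()), dict(per_provider)


# One task made of two simple steps and one reasoning-heavy step.
total, by_provider = plan_task([(False, 200), (True, 3_000), (False, 500)])
print(total, by_provider)
```

Because each step is paid for separately, a single task can spread its spend across providers without the agent holding an account with either.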

Sub-cent settlement on L2 networks is what makes this economically viable. A 100-token response at $0.001 per 1K tokens costs $0.0001, which is in the same range as a Base transaction fee, so settling each request on-chain only pays off on networks where fees are this low.
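
The arithmetic can be checked directly. In the sketch below the network fee is a placeholder chosen for illustration, not a measured Base fee, and `response_cost` and `fee_share` are hypothetical helper names.

```python
from decimal import Decimal


def response_cost(tokens: int, price_per_1k: Decimal) -> Decimal:
    """Dollar cost of a response at a given price per 1K tokens."""
    return Decimal(tokens) * price_per_1k / Decimal(1000)


def fee_share(payment: Decimal, network_fee: Decimal) -> Decimal:
    """Fraction of the total outlay (payment plus fee) that goes to the network fee."""
    return network_fee / (payment + network_fee)


cost = response_cost(100, Decimal("0.001"))  # the 100-token example from the text
assumed_fee = Decimal("0.0001")              # hypothetical fee, for illustration only
share = fee_share(cost, assumed_fee)
print(cost, share)
```

With a fee equal to the payment, half of the total outlay goes to the network, which is why the size of the fee relative to the token cost matters so much for very short responses.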

HOW POLICYLAYER USES THIS

PolicyLayer enforces maximum authorisation amounts for upto-scheme payments — preventing an agent from authorising excessive maximums even if the expected settlement is low. Per-provider daily caps limit total token spending across all requests.
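
To make the two limits concrete, here is a small sketch of how such checks could work. It is not PolicyLayer's API or policy format: the `SpendPolicy` class, its field names and the limit values are all assumptions for illustration, amounts are in micro-dollars, and the daily reset is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpendPolicy:
    """Illustrative limits only; not PolicyLayer's actual configuration format."""
    max_authorisation: int       # cap on any single upto maximum
    daily_cap_per_provider: int  # cap on total settled per provider per day
    spent_today: dict = field(default_factory=lambda: defaultdict(int))

    def check_authorisation(self, provider: str, requested_max: int) -> bool:
        """Allow a request only if its maximum fits both limits."""
        if requested_max > self.max_authorisation:
            return False
        # Worst case the full maximum is settled, so test it against the daily cap.
        return self.spent_today[provider] + requested_max <= self.daily_cap_per_provider

    def record_settlement(self, provider: str, settled: int) -> None:
        """Add the amount actually settled to the provider's running total."""
        self.spent_today[provider] += settled


policy = SpendPolicy(max_authorisation=100_000, daily_cap_per_provider=250_000)
too_high = policy.check_authorisation("provider-a", 150_000)  # above the $0.10 cap
allowed = policy.check_authorisation("provider-a", 100_000)
policy.record_settlement("provider-a", 200_000)
over_daily_cap = policy.check_authorisation("provider-a", 100_000)
```

Checking the requested maximum rather than the expected settlement is deliberate: the maximum is what the agent has actually agreed to pay in the worst case.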

FREQUENTLY ASKED QUESTIONS

Is pay-per-token pricing live on x402 today?
Not yet. The exact scheme (live today) charges a fixed amount per request. The upto scheme, which would enable true per-token settlement, is on the x402 roadmap. Providers can approximate it today by pricing exact requests at estimated token costs.

How does this compare to OpenAI's token pricing?
OpenAI bills per token but through account-based invoicing with API keys. x402 pay-per-token would settle on-chain per request: no account, no API key, no monthly invoice. Payment is instant and final at the protocol level.

What prevents overcharging in the upto scheme?
The client sets the maximum authorised amount. The facilitator can only settle up to that amount. Policy layers like PolicyLayer add further protection by validating that the maximum authorisation is reasonable for the expected request.
