What is Pay-Per-Token Pricing?
Pay-per-token pricing is a billing model for large language models and other generative AI services where the consumer pays based on the exact number of input and output tokens processed, settled per-request rather than through monthly invoicing.
WHY IT MATTERS
LLM costs are inherently variable — a request generating 50 tokens costs less than one generating 5,000. Current providers (OpenAI, Anthropic) track token usage and invoice periodically. Pay-per-token pricing via blockchain rails settles this cost atomically per request.
The x402 protocol's roadmap includes an upto scheme designed for exactly this use case. The client authorises up to a maximum payment amount (e.g. $0.10). The server processes the request, counts actual tokens consumed, and settles the exact cost on-chain. The difference between the authorised maximum and actual usage is never charged.
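The authorise-then-settle flow can be sketched in a few lines. This is a minimal illustration, not the x402 implementation: the function names, the per-1K prices, and the choice to cap the charge at the authorised maximum are all assumptions made for the example.

```python
from decimal import Decimal


def token_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Exact USD cost of one request, with prices quoted per 1,000 tokens."""
    total = (Decimal(input_tokens) * input_price_per_1k
             + Decimal(output_tokens) * output_price_per_1k)
    return total / Decimal(1000)


def settle_upto(authorised_max, actual_cost):
    """Amount to settle under an upto-style authorisation.

    Assumption: the charge is capped at the authorised maximum, so the
    client never pays more than it signed for.
    """
    return min(actual_cost, authorised_max)


# Hypothetical request: 200 input tokens, 800 output tokens.
authorised_max = Decimal("0.10")
cost = token_cost(200, 800, Decimal("0.001"), Decimal("0.002"))
charged = settle_upto(authorised_max, cost)
unused = authorised_max - charged  # authorised but never charged
```

Decimal arithmetic is used instead of floats so that sub-cent amounts are not distorted by binary rounding.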
This model benefits both parties:
- Consumers — pay only for actual usage, no prepaid waste, instant access without accounts
- Providers — guaranteed payment per request, no credit risk, no chargeback risk, global access without billing infrastructure
For AI agents, pay-per-token pricing enables sophisticated cost optimisation. An agent could route simple queries to cheap models and complex reasoning to premium models, paying the exact token cost for each — potentially across different providers in a single task.
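A routing policy of this kind might look like the sketch below. The model names, providers, prices, and the integer "complexity tier" are all hypothetical; a real agent would need its own way to estimate query complexity and token counts.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    price_per_1k: Decimal  # USD per 1,000 tokens (hypothetical price)
    max_complexity: int    # highest complexity tier this model can handle


MODELS = [
    Model("small", "provider-a", Decimal("0.001"), 1),
    Model("premium", "provider-b", Decimal("0.03"), 3),
]


def route(complexity, models=MODELS):
    """Pick the cheapest model able to handle the given complexity tier."""
    candidates = [m for m in models if m.max_complexity >= complexity]
    if not candidates:
        raise ValueError(f"no model handles complexity tier {complexity}")
    return min(candidates, key=lambda m: m.price_per_1k)


def task_cost(steps):
    """Total cost of a task given (complexity, expected_tokens) per step."""
    total = Decimal(0)
    for complexity, tokens in steps:
        model = route(complexity)
        total += model.price_per_1k * tokens / 1000
    return total
```

With these example prices, a task with one simple 1,000-token step and one complex 2,000-token step is split across both providers, and each step is paid at its own model's rate.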
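Sub-cent settlement on L2 networks is what makes this economically viable. A 100-token response at $0.001 per 1K tokens costs $0.0001, an amount in the same range as a transaction fee on Base, so the settlement fee has to stay comparably small for per-request payment to be worthwhile.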
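The arithmetic, and the effect of the network fee on it, can be checked with a short sketch. The fee value is a placeholder chosen for illustration, not a measured Base fee, and the helper names are hypothetical.

```python
from decimal import Decimal


def response_cost(tokens, price_per_1k):
    """USD cost of a response priced per 1,000 tokens."""
    return Decimal(tokens) * price_per_1k / Decimal(1000)


def fee_share(payment, network_fee):
    """Fraction of the total outlay (payment plus fee) consumed by the fee."""
    return network_fee / (payment + network_fee)


payment = response_cost(100, Decimal("0.001"))  # the 100-token example above
assumed_fee = Decimal("0.0001")                 # placeholder, not a real quote
share = fee_share(payment, assumed_fee)
```

If the fee equals the payment, half of the total outlay goes to settlement, which is why the fee level matters so much at this scale.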
HOW POLICYLAYER USES THIS
PolicyLayer enforces maximum authorisation amounts for upto-scheme payments — preventing an agent from authorising excessive maximums even if the expected settlement is low. Per-provider daily caps limit total token spending across all requests.
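The two controls can be illustrated with a small in-memory check. This is a sketch of the idea only, not PolicyLayer's API: the class, its method names, and the rule that a new authorisation is tested against the worst case (already settled plus the requested maximum) are assumptions.

```python
from collections import defaultdict
from decimal import Decimal


class SpendPolicy:
    """Hypothetical policy: cap each authorisation and each provider's daily spend."""

    def __init__(self, max_authorisation, daily_cap_per_provider):
        self.max_authorisation = max_authorisation
        self.daily_cap = daily_cap_per_provider
        self.spent_today = defaultdict(Decimal)  # provider -> amount settled today

    def check_authorisation(self, provider, requested_max):
        """Allow an authorisation only if both limits hold."""
        if requested_max > self.max_authorisation:
            return False  # maximum is excessive, whatever the expected settlement
        # Worst case: the full requested maximum ends up being settled.
        return self.spent_today[provider] + requested_max <= self.daily_cap

    def record_settlement(self, provider, amount):
        """Add the amount actually settled to the provider's daily total."""
        self.spent_today[provider] += amount


policy = SpendPolicy(Decimal("0.10"), Decimal("1.00"))
```

Checking against the worst case means an approved request can never push a provider past its daily cap, at the price of occasionally rejecting a request that would have settled for less.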