What is Inference?

Inference is the process of running a trained AI model on new inputs to generate outputs. It is the production phase, where a model serves real requests, as opposed to training, where the model learns.

WHY IT MATTERS

Training is learning; inference is doing. When you send a prompt to an LLM and get a response, that's inference. The model applies its learned patterns to your specific input.

Inference is where AI economics play out. Training happens once at enormous cost; inference happens millions of times. Optimizing inference through quantization, caching, and batching directly impacts cost and latency.
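As a rough illustration of the caching idea, the sketch below memoizes exact-match prompts so that a repeated request never reaches the model. It shows only response caching, one simple form of caching. The `run_model` function is a hypothetical stand-in for a real model call, not any particular provider's API.

```python
from functools import lru_cache

# Hypothetical stand-in for an expensive model call (an assumption for
# illustration). It counts invocations so the effect of caching is visible.
call_count = 0


def run_model(prompt: str) -> str:
    """Pretend to run a model on a prompt and return its output."""
    global call_count
    call_count += 1
    return prompt.upper()


@lru_cache(maxsize=1024)
def cached_inference(prompt: str) -> str:
    """Return a stored output for a repeated prompt; otherwise run the model."""
    return run_model(prompt)


if __name__ == "__main__":
    for prompt in ["hello", "hello", "world"]:
        cached_inference(prompt)
    # Three requests were served, but the model ran only for distinct prompts.
    print("model calls:", call_count)
    print(cached_inference.cache_info())
```

Exact-match caching only helps when identical prompts recur. Whether it is safe also depends on the application, since a cached answer can go stale.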

For AI agents, inference latency matters because every decision waits on a model response. A financial agent that takes 30 seconds to decide on a trade might miss the opportunity. Techniques such as speculative decoding and model distillation help reduce latency.

FREQUENTLY ASKED QUESTIONS

How much does inference cost?
GPT-4 class models cost roughly $10-30 per million tokens, though prices vary by provider and change often. Smaller models can be self-hosted for much less. For agents making many calls, inference is a significant expense.
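To see how per-token pricing adds up for an agent, here is a small back-of-the-envelope calculation. The call volume and tokens per call are made-up illustrative numbers, and the sketch assumes a single flat price per million tokens (real pricing often differs for input and output tokens).

```python
def inference_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


def monthly_agent_cost_usd(
    calls_per_day: int,
    tokens_per_call: int,
    price_per_million_usd: float,
    days: int = 30,
) -> float:
    """Estimated monthly spend for an agent with a steady call volume."""
    total_tokens = calls_per_day * tokens_per_call * days
    return inference_cost_usd(total_tokens, price_per_million_usd)


if __name__ == "__main__":
    # Illustrative assumption: 10,000 calls per day at 2,000 tokens each.
    for price in (10.0, 30.0):
        cost = monthly_agent_cost_usd(10_000, 2_000, price)
        print(f"${price:.0f} per million tokens: ${cost:,.0f} per month")
```

Under these assumed numbers the agent processes 600 million tokens a month, which comes to $6,000 at $10 per million tokens and $18,000 at $30.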
What's the difference between inference and training?
Training adjusts model weights using large datasets. Inference uses fixed weights to process new inputs. Training is write; inference is read.
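The distinction can be sketched with a toy one-parameter model. This is only an illustration under simplifying assumptions (a single weight, plain gradient descent, made-up data), not how production models are trained: `train` changes the weight, while `predict` only reads it.

```python
def predict(weight: float, x: float) -> float:
    """Inference: apply a fixed weight to a new input. Nothing is updated."""
    return weight * x


def train(data: list[tuple[float, float]], steps: int = 200, lr: float = 0.01) -> float:
    """Training: adjust the weight by gradient descent on squared error."""
    weight = 0.0
    for _ in range(steps):
        for x, y in data:
            error = predict(weight, x) - y
            # Gradient of (weight * x - y) ** 2 with respect to weight.
            weight -= lr * 2 * error * x
    return weight


if __name__ == "__main__":
    # Made-up dataset following y = 2x.
    dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    learned_weight = train(dataset)      # weights change here
    print(predict(learned_weight, 10.0)) # weights are only read here
```

After training, the learned weight is close to 2, so the prediction for an input of 10 is approximately 20.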
Can inference be done on-device?
Yes, for smaller models. Quantized 7B-13B parameter models run on modern laptops, and the smaller end of that range can run on high-end phones. Frontier models require cloud GPUs.
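A quick way to see why quantization makes on-device inference feasible is to estimate the memory needed for the weights alone. The sketch below assumes memory is simply parameters times bits per weight. It ignores activations, the KV cache, and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for params, label in ((7e9, "7B"), (13e9, "13B")):
        for bits in (16, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{label} at {bits}-bit: about {gb:.1f} GB")
```

By this estimate, a 7B model drops from about 14 GB at 16-bit precision to about 3.5 GB at 4-bit, and a 13B model at 4-bit needs about 6.5 GB for its weights.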
