What are Agent Guardrails?
Safety mechanisms constraining AI agent behaviour within acceptable boundaries. Guardrails operate at multiple levels — from prompt instructions to infrastructure-level enforcement — to prevent agents from taking unauthorised or harmful actions through tool calls.
WHY IT MATTERS
Guardrails span three levels: prompt (natural language instructions telling the agent what not to do), application (code-level checks before executing tool calls), and infrastructure (external enforcement independent of the agent). Only infrastructure-level guardrails cannot be bypassed by prompt injection or agent reasoning.
Prompt-level guardrails are easily circumvented — a well-crafted prompt injection can override any natural language instruction. Application-level guardrails are stronger but require code changes for every agent and tool. Infrastructure-level guardrails are the gold standard — they operate outside the agent entirely.
The best systems layer all three: prompts set intent, application code provides first-pass validation, and infrastructure provides the hard stop that cannot be bypassed.
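An application-level guardrail, in its simplest form, is a validation function that runs before every tool call. The sketch below is illustrative only — the function and constant names (`check_tool_call`, `DENIED_TOOLS`, `MAX_TRANSFER`) are hypothetical, not part of any real framework:

```python
# Minimal sketch of an application-level guardrail: validate each
# proposed tool call in code before executing it. All names here are
# illustrative assumptions, not a real API.

DENIED_TOOLS = {"delete_database", "send_payment"}  # hypothetical denylist
MAX_TRANSFER = 1000                                 # hypothetical limit

def check_tool_call(tool_name: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if tool_name in DENIED_TOOLS:
        return False, f"tool '{tool_name}' is on the denylist"
    if tool_name == "transfer_funds" and args.get("amount", 0) > MAX_TRANSFER:
        return False, f"amount exceeds limit of {MAX_TRANSFER}"
    return True, "ok"

allowed, reason = check_tool_call("transfer_funds", {"amount": 5000})
print(allowed, reason)  # → False amount exceeds limit of 1000
```

Note the limitation this illustrates: each check must be hand-written into the agent's own codebase, and a new tool or agent means new code — which is why the hard stop belongs at the infrastructure level.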
HOW POLICYLAYER USES THIS
Intercept is the infrastructure-level guardrail layer. As a transparent MCP proxy, it enforces YAML-defined policies on every tool call — independent of the agent's reasoning, immune to prompt injection, and requiring no code changes. If the policy says deny, the call is denied. The agent's LLM cannot override this decision.
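As a hedged illustration of what a YAML-defined policy might look like, consider the fragment below. The schema shown (`policies`, `match`, `action` keys and the glob-style tool pattern) is an assumption for explanatory purposes, not Intercept's actual policy format:

```yaml
# Hypothetical policy sketch — illustrative schema only,
# not the real Intercept policy language.
policies:
  - name: block-destructive-tools
    match:
      tool: "delete_*"      # glob over tool names (assumed syntax)
    action: deny
  - name: cap-transfers
    match:
      tool: transfer_funds
      args:
        amount: "> 1000"    # assumed argument-condition syntax
    action: deny
```

Because the proxy evaluates rules like these on every tool call before it reaches the downstream MCP server, the decision is made outside the agent: no prompt, injected or otherwise, can change it.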