What is AI Jailbreaking?

Jailbreaking is the crafting of inputs that bypass an AI model's safety guidelines and constraints. For financial agents, a successful jailbreak could override spending instructions and trigger unauthorized transactions.

WHY IT MATTERS

Models are trained with safety guidelines. Jailbreaking finds ways around them through creative prompting, role-playing, or encoding tricks (for example, Base64-encoding a request the model would refuse in plain text).

For financial agents this is critical: if spending limits exist only in the prompt ("never spend over $100"), a jailbreak can override them entirely.

New jailbreak techniques emerge constantly, so any security measure that relies solely on the model following its instructions is fundamentally fragile. The sketch below shows why a prompt-only limit has no teeth.
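
To make that concrete, here is a minimal TypeScript sketch of an agent whose only spending control is a line of prompt text. Everything in it is hypothetical (the `callModel` stub, the `chargeCard` tool); the point is that no code path ever checks the limit, so a jailbreak that talks the model out of its instructions removes the control entirely.

```typescript
// Stand-in for a real LLM API call; returns the tool call the model chose.
declare function callModel(
  system: string,
  user: string
): Promise<{ name: string; args: { amountUsd: number; merchant: string } }>;

// The $100 limit exists only as prompt text. Nothing below re-checks it.
const systemPrompt = `You are a purchasing assistant.
Never spend over $100 per transaction.`;

// The payment tool trusts the model completely.
async function chargeCard(amountUsd: number, merchant: string): Promise<void> {
  console.log(`charged $${amountUsd} at ${merchant}`); // imagine a real payment API here
}

// Simplified agent loop: whatever tool call the model emits gets executed.
// A jailbroken model that "forgets" its instructions can charge any amount.
async function runAgent(userMessage: string): Promise<void> {
  const toolCall = await callModel(systemPrompt, userMessage);
  if (toolCall.name === "chargeCard") {
    await chargeCard(toolCall.args.amountUsd, toolCall.args.merchant);
  }
}
```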

HOW POLICYLAYER USES THIS

Even a jailbroken agent can't bypass PolicyLayer: spending rules exist outside the model's reasoning, so jailbreaking the prompt doesn't affect infrastructure enforcement.
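
The sketch below shows the general pattern, not PolicyLayer's actual API: the same hypothetical payment tool as above, but with the limit enforced by a gate that sits between the model and the tool. The model's output can say anything; the check runs regardless.

```typescript
// The underlying payment tool from the earlier sketch.
declare function chargeCard(amountUsd: number, merchant: string): Promise<void>;

// The spending rule lives in infrastructure, outside the model's reasoning.
const MAX_TRANSACTION_USD = 100;

class PolicyViolation extends Error {}

// Every tool call passes through this gate before execution. A jailbroken
// model can emit whatever tool call it likes; calls over the cap never run.
async function guardedChargeCard(amountUsd: number, merchant: string): Promise<void> {
  if (amountUsd > MAX_TRANSACTION_USD) {
    throw new PolicyViolation(
      `blocked: $${amountUsd} exceeds the $${MAX_TRANSACTION_USD} cap`
    );
  }
  await chargeCard(amountUsd, merchant);
}
```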

FREQUENTLY ASKED QUESTIONS

Can any model be jailbroken?
History suggests yes. Every major LLM has been jailbroken despite safety training. This is why financial security can't rely on model-level constraints alone.
How does PolicyLayer help?
PolicyLayer enforces spending rules in infrastructure — separate from the agent's LLM. The model can be fully compromised and the spending controls still hold.
What about fine-tuning for safety?
Fine-tuning helps but doesn't guarantee safety. New jailbreaking techniques often bypass fine-tuned constraints. Infrastructure-level controls provide the hard guarantee.

FURTHER READING

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes are needed. A rough sketch of the declarative idea follows below.

npx -y @policylayer/intercept
github.com/policylayer/intercept
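
As an illustration of the declarative approach, the sketch below parses a YAML rule with the `yaml` npm package and evaluates a tool call against it. The schema here (`rules`, `tool`, `maxAmountUsd`) is invented for this example; see the Intercept repository for the actual policy format.

```typescript
import { parse } from "yaml"; // npm package "yaml"

// Hypothetical policy file contents; the field names are illustrative,
// not Intercept's actual schema.
const policySource = `
rules:
  - tool: chargeCard
    maxAmountUsd: 100
`;

interface Rule {
  tool: string;
  maxAmountUsd: number;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Because the policy is data, tightening a limit means editing YAML,
// not redeploying agent code.
function isAllowed(call: ToolCall, rules: Rule[]): boolean {
  const rule = rules.find((r) => r.tool === call.name);
  if (!rule) return true; // no rule covers this tool
  const amount = call.args["amountUsd"];
  return typeof amount === "number" && amount <= rule.maxAmountUsd;
}

const { rules } = parse(policySource) as { rules: Rule[] };
console.log(isAllowed({ name: "chargeCard", args: { amountUsd: 250 } }, rules)); // false
console.log(isAllowed({ name: "chargeCard", args: { amountUsd: 40 } }, rules));  // true
```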