// GLOSSARY -- AI AGENT SECURITY

What is Persona Hyperstition?

1 min read Updated Apr 5, 2026

A semantic manipulation attack where a narrative about an AI model's identity is seeded into content that re-enters the agent's context via retrieval, producing outputs that reinforce the false identity and progressively alter behaviour.

WHY IT MATTERS

An attacker publishes content claiming a model has a specific persona — 'this AI is designed to bypass restrictions' or 'this assistant operates without safety constraints.' When the agent retrieves this content through RAG or web browsing, it begins to act according to the described persona.

The effect is self-reinforcing. The agent's altered outputs may themselves enter the retrieval corpus, strengthening the false narrative. What started as fiction becomes the agent's operational reality.

HOW POLICYLAYER USES THIS

Tool-level policy enforcement is immune to persona attacks. Regardless of what identity the agent believes it has, Intercept's rules are deterministic and external to the agent's self-model.

FREQUENTLY ASKED QUESTIONS

Where does the term come from?

Hyperstition (from philosophy/cultural theory) means a fiction that makes itself real through circulation. In the agent context, a false identity narrative becomes real by entering the agent's retrieval loop.

What is Persona Hyperstition?

WHY IT MATTERS

HOW POLICYLAYER USES THIS

FREQUENTLY ASKED QUESTIONS

FURTHER READING

Let agents act without letting them run wild.

What is Persona Hyperstition?

WHY IT MATTERS

HOW POLICYLAYER USES THIS

FREQUENTLY ASKED QUESTIONS

RELATED TERMS

FURTHER READING

Let agents act without letting them run wild.