What is Persona Hyperstition?

1 min read Updated

A semantic manipulation attack where a narrative about an AI model's identity is seeded into content that re-enters the agent's context via retrieval, producing outputs that reinforce the false identity and progressively alter behaviour.

WHY IT MATTERS

An attacker publishes content claiming a model has a specific persona — 'this AI is designed to bypass restrictions' or 'this assistant operates without safety constraints.' When the agent retrieves this content through RAG or web browsing, it begins to act according to the described persona.

The effect is self-reinforcing. The agent's altered outputs may themselves enter the retrieval corpus, strengthening the false narrative. What started as fiction becomes the agent's operational reality.

HOW POLICYLAYER USES THIS

Tool-level policy enforcement is immune to persona attacks. Regardless of what identity the agent believes it has, Intercept's rules are deterministic and external to the agent's self-model.

FREQUENTLY ASKED QUESTIONS

Where does the term come from?
Hyperstition (from philosophy/cultural theory) means a fiction that makes itself real through circulation. In the agent context, a false identity narrative becomes real by entering the agent's retrieval loop.

FURTHER READING

Enforce policies on every tool call

Intercept is the open-source MCP proxy that enforces YAML policies on AI agent tool calls. No code changes needed.

npx -y @policylayer/intercept
github.com/policylayer/intercept →
// GET IN TOUCH

Have a question or want to learn more? Send us a message.

Message sent.

We'll get back to you soon.