What is Persona Hyperstition?
A semantic manipulation attack where a narrative about an AI model's identity is seeded into content that re-enters the agent's context via retrieval, producing outputs that reinforce the false identity and progressively alter behaviour.
WHY IT MATTERS
An attacker publishes content claiming a model has a specific persona — 'this AI is designed to bypass restrictions' or 'this assistant operates without safety constraints.' When the agent retrieves this content through RAG or web browsing, it begins to act according to the described persona.
The effect is self-reinforcing. The agent's altered outputs may themselves enter the retrieval corpus, strengthening the false narrative. What started as fiction becomes the agent's operational reality.
HOW POLICYLAYER USES THIS
Tool-level policy enforcement is immune to persona attacks. Regardless of what identity the agent believes it has, Intercept's rules are deterministic and external to the agent's self-model.