GitHub MCP Prompt Injection: The Cross-Repo Data Heist
A planted issue in a public repo can convince a developer's AI agent to copy private repo contents into that public repo. The official GitHub MCP server hands the agent the user's full GitHub credentials, so private and public scope live in the same context. The agent reads issue bodies as if they were user instructions. Anyone who can open an issue can plant instructions. The attack is the classic confused deputy, executed through tools the user trusted.
What happened
GitHub's official MCP server gives an agent read/write access to a user's repositories using the user's own credentials. A developer running it typically has both public and private repos in scope.
The attack is short. Open a public issue in any repo the agent might read. Phrase the body as a task. When the developer asks the agent to triage issues, the agent treats the body as legitimate input and acts on it. Read the private repo, write the contents back into the attacker's public repo.
The MCP tool boundary doesn't distinguish 'instructions from the user' from 'instructions found in tool output'. The agent has the user's full GitHub permissions and uses them.
The PolicyLayer angle
This is the lethal trifecta in concrete form. Untrusted input, sensitive read access, external write access, all in one agent context. The policy layer breaks the chain at the boundary, not inside the model.
The relevant controls: scope GitHub MCP access to specific repos per task, require approval for any cross-visibility write (private source, public destination), rate-limit data movement out of private repos. Each of these would have neutralised the attack regardless of how convincing the injected prompt was.
The general principle: when the agent's tools combine read access to sensitive data with write access to anything an attacker can read, prompt injection is a question of when, not whether. The defence isn't a smarter model. It's an out-of-band gate on the dangerous combinations.
Mitigations
Restrict GitHub MCP server scope per session. Treat issue and PR bodies as untrusted input. Require human approval for writes to public destinations when the agent has read private sources in the same session. Audit cross-repo agent actions.
FAQs
GitHub has shipped mitigations including narrower scope controls and clearer trust boundaries. The underlying class of attack, prompt injection through tool output, remains a general MCP risk that requires defence at the agent policy layer.
Not reliably. As Simon Willison and others have argued, telling the model to ignore prompt injection is not a defence. Reliable mitigation has to come from outside the model: scoping, approvals, allowlists.