Insider Brief
- The article examines prompt injection as one of the most significant security risks facing enterprise AI systems, explaining how attackers manipulate AI models through malicious instructions embedded in content.
- It explores real-world incidents including Microsoft Copilot’s EchoLeak vulnerability and explains why prompt injection differs fundamentally from traditional attacks such as SQL injection.
- The article outlines practical mitigation strategies, including permission controls, human approval, adversarial testing, and defense-in-depth, while noting that prompt injection is likely to remain a persistent AI security challenge.
In June 2025, security researchers at Aim Labs disclosed a vulnerability in Microsoft 365 Copilot that required nothing from the victim at all. An attacker simply sent an email. Hidden inside that email were instructions intended not for the human recipient but for the AI assistant that would eventually read it.
Weeks or months later, when the employee asked Copilot to summarize recent documents, the AI retrieved that email as context, read the hidden instructions, and began exfiltrating sensitive internal data to an external server. The vulnerability was assigned a CVSS severity score of 9.3, among the highest possible ratings, and became known as EchoLeak.
EchoLeak is the clearest illustration so far of prompt injection, a vulnerability class that OWASP ranks as the single most critical security risk facing AI applications today. This piece walks through how the attack actually works, why it behaves differently from injection attacks security teams already know, what it has done to production systems already in use, and what can realistically be done about it.
How Prompt Injection Works
Large language models process everything fed into them, instructions and content alike, as a single stream of text. There is no architectural wall separating “trusted instructions from the developer” from “untrusted content from a website or document.” The model reads it all and tries to follow whatever instructions appear most relevant, regardless of where they came from – prompt injection exploits exactly that.
An attacker embeds instructions inside content the AI system will process, content that looks unremarkable to a human reader, hoping the model treats those embedded instructions as legitimate commands rather than as text to merely read or summarize.
This shows up in two main forms – direct prompt injection and Indirect prompt injection.
Direct prompt injection happens when an attacker interacts with an AI system themselves, typing instructions intended to override its original configuration. This is the version most people picture – someone typing “ignore your previous instructions” into a chatbot.
Indirect prompt injection is the more dangerous variant, and it is what made EchoLeak work. The attacker never interacts with the AI system at all. They plant malicious instructions somewhere the AI will later encounter on its own. The AI ingests that content as part of routine work and executes the hidden instructions without the user or the attacker ever directly communicating.
Why It’s Not Just SQL Injection With a New Name
It is tempting to treat prompt injection as a familiar problem in a new wrapper. SQL injection, the attack that plagued databases for decades, exploited a much similar idea: an application failed to distinguish between code and data, so an attacker snuck executable commands into a data field. That problem was largely solved with parameterized queries, separating instructions from data at the database layer.
Prompt injection does not have an equivalent fix, and the reason is anatomical rather than a matter of insufficient engineering. SQL has a formal grammar. A database can mechanically verify whether a string is data or a command. Natural language has no such grammar. There is no reliable way to mark certain words as definitely an instruction and others as definitely just content, because the entire value of a language model is its ability to interpret meaning flexibly across arbitrary phrasing.
The UK’s National Cyber Security Centre described large language models in a December 2025 assessment as “inherently confusable deputies” – systems that can be coerced into acting against an organization’s interests because there is no robust internal separation between trusted instructions and the content they process. Security researchers Bruce Schneier and Barath Raghavan made a similar argument in IEEE Spectrum, suggesting prompt injection may never be fully solved within current LLM architectures, because the code-versus-data distinction that tamed SQL injection simply does not exist inside a language model.
Where This Has Already Worked
EchoLeak demonstrated that prompt injection works against production enterprise software, not just research demos. It chained several techniques – bypassing Microsoft’s own prompt-injection detection filters by phrasing the hidden instructions so they never explicitly referenced AI or Copilot, evading link redaction through a Markdown formatting trick, and exploiting an approved Microsoft domain to quietly send data out. Microsoft patched the underlying flaw, but researchers who studied the case in depth noted that the broader category of risk persists for any organization running retrieval-augmented AI (RAG) assistants – which is most of them.
Security researchers have documented critical vulnerabilities with similarly high severity scores in GitHub Copilot and the Cursor coding assistant, both involving prompt injection chains that led to remote code execution. Independent researcher Johann Rehberger spent his own money testing the security of Devin, an autonomous coding agent, and found it could be manipulated through crafted prompts into exposing network ports, leaking access tokens, and installing command-and-control malware.
Similarly, in March 2026, researchers at Unit 42 documented the first large-scale indirect prompt injection attacks observed in the wild on live commercial platforms, including attacks designed to evade ad content review systems.
The point is, all these are vulnerabilities found in tools enterprises are actively running today.
Why Agentic Systems Raise the Stakes
A chatbot that can only respond with text has a limited blast radius. If it is tricked by a prompt injection, the worst outcome is usually that it says something it should not.
But in the case of AI agents things are even more serious. Agentic systems are built specifically to take action such as send emails, modify files, execute code, move money, query databases..etc. When such a system falls for a prompt injection, the attacker is literally commandeering whatever capabilities and access the agent has been granted.
This part of the threat model has changed fast, and recent testing has started quantifying it directly. Anthropic’s system card for one of its frontier models reported that a single prompt injection attempt against a browser-using agent succeeded 17.8% of the time when the agent had no specific safeguards in place. The International AI Safety Report found that sophisticated attackers can bypass even well-defended models roughly half the time given ten attempts.
Every additional permission and additional system an AI agent is connected to expands what a successful prompt injection can do. An agent that can only read email is a modest risk. An agent that can read email and also send wire transfers, modify production code, or query a customer database is a fundamentally different risk, even though the underlying vulnerability is identical in both cases.
And this is why being careful about what permissions are being handed over to such tools matter.
What Defenses Exist, and Where They Fall Short
Several mitigation approaches are in active use, and each genuinely reduces risk. None of them eliminate it.
Input and output filtering scans incoming content for patterns associated with injection attempts and scans outgoing responses for signs of leaked data. This is a reasonable baseline, but EchoLeak demonstrated the limit – where the attacker phrased the hidden instructions so they never resembled an obvious injection pattern, and the filter missed it.
Permission and scope restriction limits what an AI agent is allowed to access or do, directly bounding the blast radius described above. This is one of the more effective controls available, with a real tradeoff: an agent with fewer permissions is also less useful, and organizations under pressure to demonstrate AI value sometimes grant broader access than their security posture would otherwise allow.
Human approval for high-risk actions requires explicit sign-off before an AI agent executes financial transactions, system changes, or external communications. This closes off the most damaging outcomes, but the 2025 incidents researchers studied showed that automated, configuration-based approval systems intended to streamline this step can themselves be compromised – an argument for keeping genuinely high-risk approvals manual rather than automating them away for convenience.
Provenance and context isolation tags content by source and restricts how an AI model can act on lower-trust content, an approach researchers studying EchoLeak specifically recommended. This is promising but still maturing, and not yet a standard feature across most commercial AI products.
Continuous adversarial testing matters because attack techniques evolve quickly, and a one-time security review has a short shelf life. Organizations with a more mature posture run ongoing red-team exercises specifically targeting their AI deployments, the way penetration testing is treated for conventional infrastructure.
The summary, echoed by multiple vendors and researchers studying this problem, is that no complete solution exists today. OpenAI itself acknowledged in early 2026, when launching additional safeguards for its browser-based AI product, that prompt injection in that category of product “may never be fully patched.”
What Organizations and Security Leaders Should Take From This
Treat prompt injection as a permanent feature of the AI threat landscape, not a bug awaiting a future fix. Risk assessments, vendor evaluations, and incident response plans should assume it is present, rather than treating its absence as the default expectation.
Scrutinize agent access the way you would scrutinize granting broad system privileges to a new employee or a new third-party integration. The relevant question is not only “can this agent do something useful” but “what happens if this agent is successfully manipulated.”
Build defense in depth deliberately, because no single control is sufficient here. Filtering, permission scoping, human approval gates, and ongoing testing are not redundant. Each closes a different gap that the others leave open.
Prompt injection has moved well past the point of being a research curiosity. It is an active, demonstrated attack class against production enterprise software, and the organizations deploying AI agents fastest are also the ones with the most to lose if they treat it as theoretical.
For readers looking to build out that fuller picture, a few pieces worth reading: What Leaders Need to Understand About AI and Data Privacy covers how data supplied to AI systems gets internalized in ways that are difficult to isolate or reverse, a problem that compounds the stakes of any successful prompt injection. And How AI-Powered Scams Are Growing and What Businesses Can Do looks at the human-facing side of AI-enabled attacks, a useful complement to the system-facing risk covered here.