Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms
Abstract
1. Introduction
1.1. The Evolution of LLM Applications and Emerging Security Landscape
1.2. OWASP LLM01:2025: Prompt Injection as the Primary Threat
1.3. Scope and Research Methodology
1.4. Structure and Contributions
2. Background and Fundamentals
2.1. Large Language Model Architecture and Inference
2.2. Prompt Engineering and System Prompts
2.3. AI Agent Systems and Tool-Augmented LLMs
2.3.1. Model Context Protocol (MCP)
2.3.2. Multi-Agent Systems
2.4. Retrieval-Augmented Generation: Enhancing LLMs with External Knowledge
Vector Database Vulnerabilities
2.5. Trust Boundaries and Attack Surface
3. Taxonomy of Prompt Injection Attacks
3.1. Direct Prompt Injection: Jailbreaking Techniques
3.1.1. Game-Based Manipulation: The ChatGPT Windows Keys Case
3.1.2. Role-Playing and Adversarial Optimization
3.1.3. Obfuscation Techniques
3.2. Indirect Prompt Injection: External Content Attacks
3.2.1. Web Content Poisoning
3.2.2. Document Injection
3.2.3. Email and Message Injection
3.3. Tool-Based Injection: Exploiting AI Agent Capabilities
3.3.1. Tool Poisoning in MCP
3.3.2. Hidden Unicode Instructions
3.3.3. Rug Pull Attacks
3.4. Comparative Analysis of Attack Vectors
3.5. Critical Analysis: Convergence Patterns and Research Gaps
- Absence of formal threat models for agent systems: While OWASP provides risk taxonomy, the field lacks rigorous threat models quantifying attack success probabilities under different defensive configurations. Without such models, organizations cannot make informed decisions about risk–utility trade-offs.
- Dearth of empirical data on real-world attack frequency: Most documented exploits originate from controlled research demonstrations. Industry lacks transparency regarding how often prompt injection attacks occur in production deployments, which defenses provide measurable risk reduction, and how attacker tactics evolve in response to countermeasures.
- Limited understanding of cross-context attack propagation: Research examines attack vectors in isolation—RAG poisoning, tool poisoning, memory exploitation—yet real systems combine these components. How do malicious instructions injected through poisoned RAG spread to multi-agent systems? Can compromised agents “infect” others through A2A communication protocols? These questions remain unanswered.
4. Vulnerabilities in AI Agent Systems
4.1. GitHub Copilot Security Failures
4.1.1. CVE-2025-53773: YOLO Mode RCE
4.1.2. CamoLeak: CVSS 9.6 Secret Exfiltration
4.1.3. AI Viruses and ZombAI Networks
4.2. Claude MCP Ecosystem Risks
4.2.1. GitHub MCP Issue Injection
4.2.2. MCP Inspector RCE: CVE-2025-49596
4.2.3. Industrial Control Systems Compromise via MCP
4.3. Cross-Platform Attack Vectors and Privilege Escalation
5. RAG System Vulnerabilities
5.1. Knowledge Base Poisoning Attacks
5.2. Vector Database Exploitation
5.3. Memory-Based Persistence and Long-Term Compromise
5.4. Comparative Analysis of RAG Attack Vectors
- Attack scalability: Demonstrated in the PoisonedRAG study [28], which showed that as few as five poisoned documents achieve a roughly 90% attack success rate, indicating minimal attacker effort for maximum impact.
- Persistence duration: Classified by recovery complexity (session-level, corpus update-level, or cross-session level) based on documented incident response times [11].
- Detection evasion: Assessed through reported performance of detection systems (mathematical-level attacks evade human inspection [7]).
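The scalability point can be illustrated with a toy retrieval experiment (a sketch under assumed parameters: the embedding dimension, corpus size, and noise scale below are illustrative, not the PoisonedRAG setup). Because retrieval returns only the top-k most similar vectors, a handful of embeddings optimized toward a target query can crowd out every benign document:

```python
import numpy as np

def top_k_retrieve(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k corpus embeddings most cosine-similar to the query."""
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    return np.argsort(corpus_n @ query_n)[::-1][:k]

rng = np.random.default_rng(0)
benign = rng.normal(size=(1000, 64))                     # 1000 benign document embeddings
query = rng.normal(size=64)                              # target user query
poisoned = query + rng.normal(scale=0.01, size=(5, 64))  # 5 embeddings optimized toward the query
corpus = np.vstack([benign, poisoned])                   # poisoned docs occupy indices 1000-1004

hits = top_k_retrieve(query, corpus, k=5)                # all five retrieved slots are poisoned
```

In this sketch the attacker controls 0.5% of the corpus yet 100% of the retrieved context, which is the asymmetry behind the "minimal effort, maximum impact" finding.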
6. Case Studies: Real-World Exploits
6.1. Development Tools Compromise
6.2. Conversational AI Jailbreaks
6.3. Additional Cross-Domain Case Studies
7. Defense Mechanisms and Mitigation
7.1. Input Validation and Isolation
7.2. Architectural Defenses and Sandboxing
7.3. Prompt Engineering for Security
7.4. Detection and Monitoring
7.5. RAG-Specific Defenses
8. OWASP Framework and Industry Best Practices
8.1. OWASP Top 10 LLM 2025: Comprehensive Analysis
8.2. Industry Standards and Compliance Requirements
8.3. Secure Development Lifecycle for LLM Applications
9. Open Challenges and Fundamental Limitations
9.1. The Stochastic Nature Problem
9.2. The Alignment Paradox in Agent Systems
9.3. Detection Systems as Security Theater
9.4. The Usability–Security Trade-Off
9.5. Proposed Defense-in-Depth Framework
9.6. Mapping the PALADIN Defense Framework to the OWASP Framework
10. Future Research Directions
10.1. Formal Verification and Provable Security
10.2. Novel Defensive Architectures
10.3. Human–AI Collaboration Models
10.4. Regulatory and Policy Research
11. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Baran, G. GitHub Copilot RCE vulnerability via prompt injection leads to full system compromise. Cybersecurity News, 14 August 2025. Available online: https://cybersecuritynews.com/github-copilot-rce-vulnerability/ (accessed on 1 December 2025).
- Prompt Injection and Jailbreaking Are Not the Same Thing. Available online: https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/ (accessed on 1 December 2025).
- OWASP Top 10 for LLM Applications—LLM01: Prompt Injection. Available online: https://genai.owasp.org/llmrisk/llm01-prompt-injection/ (accessed on 1 December 2025).
- Vongthongsri, K. OWASP Top 10 2025 for LLM applications: What’s new? Risks, and mitigation techniques. Confident AI, 8 August 2025. Available online: https://www.confident-ai.com/blog/owasp-top-10-2025-for-llm-applications-risks-and-mitigation-techniques (accessed on 1 December 2025).
- GitHub Copilot: Remote Code Execution via Prompt Injection. Available online: https://embracethered.com/blog/posts/2025/github-copilot-remote-code-execution-via-prompt-injection/ (accessed on 1 December 2025).
- CamoLeak: Critical GitHub Copilot Vulnerability Leaks Private Source Code. Available online: https://www.legitsecurity.com/blog/camoleak-critical-github-copilot-vulnerability-leaks-private-source-code (accessed on 1 December 2025).
- AI Under the Microscope: What’s Changed in the OWASP Top 10 for LLMs 2025. Available online: https://blog.qualys.com/vulnerabilities-threat-research/2024/11/25/ai-under-the-microscope-whats-changed-in-the-owasp-top-10-for-llms-2025 (accessed on 1 December 2025).
- GitHub MCP Vulnerability Has Far-Reaching Consequences. Available online: https://cybernews.com/security/github-mcp-vulnerability-has-far-reaching-consequences/ (accessed on 1 December 2025).
- Errico, H.; Ngiam, J.; Sojan, S. Securing the Model Context Protocol (MCP): Risks, controls, and governance. arXiv 2025, arXiv:2511.20920. [Google Scholar] [CrossRef]
- Lakshmanan, R. Researchers demonstrate how MCP prompt injection can be used for both attack and defense. The Hacker News, 30 April 2025. Available online: https://thehackernews.com/2025/04/experts-uncover-critical-mcp-and-a2a.html (accessed on 1 December 2025).
- Webster, I. RAG data poisoning: Key concepts explained. Promptfoo, 4 November 2024. Available online: https://www.promptfoo.dev/blog/rag-poisoning/ (accessed on 1 December 2025).
- Prompt Injection in LLMs: A Complete Guide. Available online: https://www.evidentlyai.com/llm-guide/prompt-injection-llm (accessed on 1 December 2025).
- OWASP Top 10 for Large Language Model Applications v2025. Available online: https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-v2025.pdf (accessed on 8 December 2025).
- Tangermann, V. Clever jailbreak makes ChatGPT give away pirated Windows activation keys. Futurism, 11 July 2025. Available online: https://futurism.com/clever-jailbreak-chatgpt-windows-activation-keys (accessed on 1 December 2025).
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; Wang, K.; Liu, Y. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. arXiv 2024, arXiv:2305.13860. [Google Scholar] [CrossRef]
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; Wang, K. A Hitchhiker’s Guide to Jailbreaking ChatGPT via Prompt Engineering. In Proceedings of the 4th International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things (SEA4DQ 2024), Porto de Galinhas, Brazil, 15–16 July 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 12–21. [Google Scholar] [CrossRef]
- ChatGPT Leaks Windows Keys Including Wells Fargo License via Clever Game Prompt. Available online: https://meterpreter.org/chatgpt-leaks-windows-keys-including-wells-fargo-license-via-clever-game-prompt/ (accessed on 8 December 2025).
- Shi, J.; Yuan, Z.; Liu, Y.; Huang, Y.; Zhou, P.; Sun, L.; Gong, N.Z. Optimization-based prompt injection attack to LLM-as-a-judge. arXiv 2024, arXiv:2403.17710. [Google Scholar] [CrossRef]
- Model Context Protocol: Security Risks and Exploits. Available online: https://embracethered.com/blog/posts/2025/model-context-protocol-security-risks-and-exploits/ (accessed on 1 December 2025).
- Prompt Injection in Operational Technology: SCADA Attack Demonstration. Available online: https://veganmosfet.github.io/2025/07/14/prompt_injection_OT.html (accessed on 1 December 2025).
- Naminas, K. Prompt injection: Top techniques for LLM safety. Label Your Data, 17 September 2024. Available online: https://labelyourdata.com/articles/llm-fine-tuning/prompt-injection (accessed on 1 December 2025).
- Constantin, L. GitHub Copilot prompt injection flaw leaked sensitive data from private repos. CSO Online, 8 October 2025. Available online: https://www.csoonline.com/article/4069887/github-copilot-prompt-injection-flaw-leaked-sensitive-data-from-private-repos.html (accessed on 1 December 2025).
- Announcing the Adaptive Prompt Injection Challenge: LLMail-Inject. Available online: https://msrc.microsoft.com/blog/2024/12/announcing-the-adaptive-prompt-injection-challenge-llmail-inject (accessed on 8 December 2025).
- Vulnerable MCP: Security Vulnerabilities in Model Context Protocol. Available online: https://vulnerablemcp.info/ (accessed on 1 December 2025).
- GitHub Copilot RCE Vulnerability Lets Attackers Execute Malicious Code. Available online: https://gbhackers.com/github-copilot-rce-vulnerability/ (accessed on 1 December 2025).
- GitHub Copilot Vulnerability Exposes User Data and Private Repositories. Available online: https://cybersecuritynews.com/github-copilot-vulnerability/ (accessed on 1 December 2025).
- Page, C. GitHub Copilot Chat turns blabbermouth with crafty prompt injection attack. The Register, 9 October 2025. Available online: https://www.theregister.com/2025/10/09/github_copilot_chat_vulnerability/ (accessed on 1 December 2025).
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation. Available online: https://github.com/sleeepeer/PoisonedRAG (accessed on 1 December 2025).
- Clop, C.; Teglia, Y. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models. arXiv 2024, arXiv:2410.14479. [Google Scholar] [CrossRef]
- GitHub Copilot: Remote Code Execution via Prompt Injection (CVE-2025-53773). Available online: https://vivekfordevsecopsciso.medium.com/github-copilot-remote-code-execution-via-prompt-injection-cve-2025-53773-38b4792e70fb (accessed on 8 December 2025).
- Ramirez, S. GitHub Copilot’s prompt injection flaw sparks security concerns. SQ Magazine, 10 October 2025. Available online: https://sqmagazine.co.uk/github-copilot-prompt-injection-camoleak/ (accessed on 1 December 2025).
- Divya. Simple prompt injection lets hackers bypass OpenAI guardrails framework. GBHackers, 14 October 2025. Available online: https://gbhackers.com/hackers-bypass-openai-guardrails-framework/ (accessed on 8 December 2025).
- Claude Code Security Documentation. Available online: https://docs.claude.com/en/docs/claude-code/security (accessed on 1 December 2025).
- Hung, K.-H.; Ko, C.-Y.; Rawat, A.; Chung, I.-H.; Hsu, W.H.; Chen, P.-Y. Attention tracker: Detecting prompt injection attacks in LLMs. In Findings of the Association for Computational Linguistics: NAACL 2025; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; Available online: https://aclanthology.org/2025.findings-naacl.123.pdf (accessed on 8 December 2025).
- Tan, X.; Luan, H.; Luo, M.; Sun, X.; Chen, P.; Dai, J. RevPRAG: Revealing poisoning attacks in retrieval-augmented generation through LLM activation analysis. arXiv 2024, arXiv:2411.18948. [Google Scholar]
- Zhang, B.; Chen, Y.; Fang, M.; Liu, Z.; Nie, L.; Li, T.; Liu, Z. Practical poisoning attacks against retrieval-augmented generation. arXiv 2025, arXiv:2504.03957. [Google Scholar] [CrossRef]
- Gulyamov, S.; Jurayev, S. Cybersecurity threats and data breaches: Legal implication in cyberspace contracts. Young Sci. 2023, 1, 19–22. Available online: https://in-academy.uz/index.php/yo/article/view/21738 (accessed on 1 December 2025).
- Gulyamov, S.S.; Rodionov, A.A. Cyber hygiene as an effective psychological measure in the prevention of cyber addictions. Psychol. Law 2024, 14, 77–91. [Google Scholar] [CrossRef]
- Taeihagh, A. Governance of artificial intelligence. Policy Soc. 2021, 40, 137–157. [Google Scholar] [CrossRef]
- Nastoska, A.; Jancheska, B.; Rizinski, M.; Trajanov, D. Evaluating Trustworthiness in AI: Risks, Metrics, and Applications Across Industries. Electronics 2025, 14, 2717. [Google Scholar] [CrossRef]
- Hackett, W.; Birch, L.; Trawicki, S.; Suri, N.; Garraghan, P. Bypassing LLM guardrails: An empirical analysis of evasion attacks against prompt injection and jailbreak detection systems. arXiv 2025, arXiv:2504.11168. [Google Scholar]
- Mayoral-Vilches, V.; Rynning, P. Cybersecurity AI: Hacking the AI hackers via prompt injection. arXiv 2025, arXiv:2508.21669. [Google Scholar] [CrossRef]
- Fu, Y.; Liang, P.; Tahir, A.; Li, Z.; Shahin, M.; Yu, J.; Chen, J. Security weaknesses of Copilot-generated code in GitHub projects: An empirical study. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–34. [Google Scholar] [CrossRef]
- Ramakrishnan, B.; Balaji, A. Securing AI agents against prompt injection attacks. arXiv 2025, arXiv:2511.15759. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Tihanyi, N.; Hamouda, D.; Maglaras, L.; Debbah, M. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows. arXiv 2025, arXiv:2506.23260. [Google Scholar] [CrossRef]
| Main Category | Subcategory | Target LLM Models | Representative Examples | Key Characteristics |
|---|---|---|---|---|
| Direct Injection (Jailbreaking) | Manipulation through game mechanics | ChatGPT (GPT-4o) | Windows key exploit via "guess the number" game mechanic with HTML obfuscation | Requires active user interaction; exploits psychological triggers and game rules; easily patched by vendors, but methods constantly evolve |
| | Role-playing and adversarial optimization | ChatGPT (GPT-4o), Claude Opus | DAN (Do Anything Now) exploits, JudgeDeceiver with gradient-based optimization | Exploits model training on fiction and dramatic contexts; uses automated gradient methods to find bypass sequences |
| | Obfuscation techniques | Claude (all versions) | Hiding via Unicode tags (U+E0000–U+E007F), base64 encoding, nested HTML tags | Creates hidden channels invisible to human review; bypasses keyword filters; requires multi-layered analysis for detection |
| Indirect Injection | Web content poisoning | Bing Chat, ChatGPT with browsing | RAG context poisoning via malicious websites, cross-tab access exploit with CSS-invisible text | Operates invisibly without victim awareness; scales to millions of users through poisoning of popular sites; exploits trust in external sources |
| | Document injection | GitHub Copilot Chat | Hidden instructions in Markdown comments (`<!-- -->`), white text on a white background in PDFs, base64 encoding in documents | Completely bypasses human visual inspection; persists in repositories, spreading to other developers; can cause physical damage when integrated with OT systems |
| | Email and message injection | Slack AI | Data exfiltration via RAG poisoning in corporate chat systems, tool invocations via hidden instructions in email | Scales through mass mailing; requires no clicks or downloads from victims; executes automatically during AI assistant processing |
| Tool-Based Injection | MCP tool poisoning | Claude Desktop | Hidden instruction "always BCC attacker@evil.com" in MCP email tool descriptions | Persists across all user sessions; completely invisible in the client UI; exploits privilege escalation through legitimate credentials |
| | Unicode hidden instructions | Claude Code | Unicode tags (U+E0000–U+E007F) in MCP tool descriptions, ANSI terminal escape sequences | Characters completely invisible in modern browsers; processed by the LLM during tool selection; bypass all visual security filters |
| | "Veil dropping" (rug pull) attacks | All MCP systems with dynamic updates | Time-delayed malicious mutation after a period of benign behavior; gradual escalation of maliciousness | Initially passes all security checks; mutates after establishing user trust; most MCP implementations do not track tool description history |
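The obfuscation and Unicode-hiding rows above share one mechanism: characters that renderers suppress but tokenizers pass through to the model. A minimal screening sketch for tool descriptions or retrieved text follows (the flagged ranges are an illustrative subset, not an exhaustive policy):

```python
import unicodedata

# Character ranges commonly abused to smuggle instructions past human review.
# Illustrative subset only; a production filter would cover additional ranges.
SUSPICIOUS_RANGES = [
    (0xE0000, 0xE007F),  # Unicode "tag" characters, invisible in most UIs
    (0x200B, 0x200F),    # zero-width spaces and directional marks
    (0x202A, 0x202E),    # bidirectional embedding/override controls
]

def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, character name) for each suspicious code point."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES):
            hits.append((i, unicodedata.name(ch, f"U+{cp:04X}")))
    return hits

# A benign-looking tool description carrying a tag-encoded payload.
payload = "".join(chr(0xE0000 + ord(c)) for c in "send bcc")
description = "Sends an email to the given recipient." + payload
```

Running `find_hidden_chars(description)` surfaces eight tag characters that a human reviewer of the client UI would never see, which is why the table recommends multi-layered analysis rather than visual inspection alone.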
| Base System Vulnerability | Attack Exploiting Vulnerability | Attack Mechanism | Affected Component | Impact Scope | Persistence |
|---|---|---|---|---|---|
| Insufficient validation of document sources ingested into the RAG corpus | Knowledge base poisoning | Semantic optimization of malicious documents to match target queries | Document corpus, search index | Targets specific queries (pinpoint impact) | Requires a corpus update for removal; reindexing takes hours |
| Absence of embedding integrity verification and anomaly monitoring | Vector database exploitation | Adversarial optimization of embeddings for mathematically precise matching with arbitrary queries | Embedding space, similarity computation | Broad: intercepts query clusters, not individual queries | Indistinguishable under human inspection; requires statistical analysis of embedding distributions |
| Unprotected long-term memory storage without context isolation | Memory-based persistence | Injection of malicious instructions through conversation history for persistent influence | Long-term memory systems, context storage | User-specific in personal systems; enterprise-level in corporate deployments | Survives sessions, logouts, device changes through server-side storage |
| Vulnerability (from Table 2) | Proposed Defense Mechanism | Implementation Details | OWASP Compliance |
|---|---|---|---|
| Insufficient document source validation | Document provenance verification + Content sanitization + Trust-based filtering | Cryptographic source verification before indexing; whitelists of trusted domains; content analysis for hidden text | LLM04:2025 (Data and Model Poisoning) |
| Absence of embedding integrity verification | Adversarial embedding detection + Statistical anomaly monitoring + Ensemble retrievers | Embedding distribution analysis using PCA; outlier detection; majority voting from multiple retrieval methods | LLM08:2025 (Vector and Embedding Weaknesses) |
| Unprotected long-term memory storage | Session isolation + Memory encryption + Time-based expiration + User checkpoints | Strict separation of user contexts; AES-256 for server-side storage; automatic expiration of sensitive memories; periodic user verification | LLM02:2025 (Sensitive Information Disclosure) |
| All RAG vulnerabilities (general defense) | Content paraphrasing + Query reformulation + Knowledge augmentation (redundancy) | Rewriting retrieved documents before LLM processing; reformulating user queries to reduce retrieval of poisoned content; fetching multiple redundant documents with majority voting | LLM01:2025 (Prompt Injection, indirect) |
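The "statistical anomaly monitoring" defense in the second row can be sketched as a centroid-distance outlier test. This is a deliberately simple stand-in for the PCA-based and ensemble detectors the table mentions, and the z-score threshold is an assumption, not a recommended setting:

```python
import numpy as np

def flag_outlier_embeddings(embeddings: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Flag vectors whose distance from the corpus centroid is statistically anomalous."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    z_scores = (dists - dists.mean()) / (dists.std() + 1e-12)  # epsilon avoids divide-by-zero
    return z_scores > z_threshold
```

A centroid test catches crude poisoning but not adversarially optimized embeddings placed inside the benign distribution, which is why the table pairs it with ensemble retrievers and majority voting.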
| PALADIN Layer | Primary Defense Mechanism | Mitigated OWASP Risks | Technical Implementation Details |
|---|---|---|---|
| Layer 1: Input Validation and Sanitization | Injection pattern matching + Semantic intent analysis + Trust-based filtering | LLM01:2025—Prompt Injection (direct) | Regex database for known jailbreak patterns; semantic similarity assessment with malicious training data; blacklists of forbidden instructions |
| Layer 2: Context Isolation and Delimiters | Hierarchical prompt levels + XML/JSON structuring + Explicit instruction prioritization | LLM07:2025—System Prompt Leakage LLM01:2025—Prompt Injection (context confusion) | Three-level hierarchy: System (level 0) > User (level 1) > External (level 2); XML tags for forced separation; cryptographic hashes for modification detection |
| Layer 3: Behavioral Monitoring and Anomaly Detection | Tool call sequence analysis + Baseline deviation detection + Exfiltration pattern detection | LLM06:2025—Excessive Agency LLM02:2025—Sensitive Information Disclosure LLM10:2025—Unbounded Consumption | Training on legitimate user behavior (e.g., 30-day window); statistical models (isolation forest, autoencoders); real-time detection with alert thresholds |
| Layer 4: Tool Call Authorization and Sandboxing | Explicit user approval for sensitive operations + Virtualized execution + Principle of least privilege | LLM06:2025—Excessive Agency LLM03:2025—Supply Chain Vulnerabilities LLM05:2025—Improper Output Handling | Mandatory approval for filesystem access, network requests, and DB modifications; Docker/gVisor sandboxes for code execution; RBAC with resource-based access control lists |
| Layer 5: Output Filtering and Verification | PII detection in responses + Policy compliance checking + Exfiltration pattern detection + Regeneration on suspicion | LLM02:2025—Sensitive Information Disclosure LLM09:2025—Misinformation LLM05:2025—Improper Output Handling | NER and regex matching for PII (email, phone numbers, SSN); base64 and URL-encoding detection; semantic verification of alignment with user intent |
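Layer 5's filtering stage can be prototyped with pattern scanning. The patterns below are illustrative placeholders; as the table notes, a production deployment would combine them with an NER model and policy-compliance checks rather than rely on regexes alone:

```python
import re

# Illustrative Layer 5 patterns; real deployments pair these with NER models.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Long base64 runs in model output are a common exfiltration channel.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")

def scan_output(text: str) -> list[str]:
    """Return labels of suspicious findings; an empty list means the output passes."""
    findings = [label for label, pat in PII_PATTERNS.items() if pat.search(text)]
    if BASE64_RUN.search(text):
        findings.append("possible_base64_exfiltration")
    return findings
```

On a hit, the framework's "regeneration on suspicion" step would re-prompt the model rather than release the flagged response to the user.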
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Gulyamov, S.; Gulyamov, S.; Rodionov, A.; Khursanov, R.; Mekhmonov, K.; Babaev, D.; Rakhimjonov, A. Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms. Information 2026, 17, 54. https://doi.org/10.3390/info17010054
