Embedding-Based Detection of Indirect Prompt Injection Attacks in Large Language Models Using Semantic Context Analysis
Abstract
1. Introduction
- We formalise indirect prompt injection detection as a semantic context consistency problem between user intent and external content in LLM-integrated applications.
- We build a balanced dataset for IPIA detection using BIPIA malicious samples and LLM-generated benign examples, and systematically evaluate three embedding models combined with three tree-based classifiers.
- We demonstrate that an embedding-based detector using OpenAI text-embedding-3-small and XGBoost achieves strong performance while remaining model-agnostic and suitable for real-time deployment.
- We provide qualitative and quantitative analyses of the learned embedding space using t-SNE and UMAP projections, offering interpretable insight into how benign and malicious samples are separated.
2. Related Work
2.1. Prompt Injection Attacks and LLM Security
2.2. Indirect Prompt Injection and Benchmarks
2.3. Defence Mechanisms for Indirect Prompt Injection
2.4. Embedding-Based Detection and Text Embeddings
2.5. Summary and Research Gap
3. Materials and Methods
3.1. Overview of the Proposed Framework
3.2. Dataset Construction and Preprocessing
where the separator consists of two newline characters, a horizontal rule, and two further newline characters. This ensures that the embedding models see both the user request and the external content in a consistent format that matches our implementation. The final balanced dataset is stored as a JSONL file (final_training_dataset.jsonl) with three fields per instance: user_intent, context, and a binary label (1 for IPIA, 0 for benign).[EXTERNAL_CONTENT]\n\n--\n\n[USER_INTENT]
3.3. Embedding Models
3.4. Mathematical Formulation
3.5. Classifier Models
3.6. Training Procedure
3.7. Implementation Algorithms
| Algorithm 1 Embedding extraction |
| Require: Dataset , embedding model identifier , batch size B, separator string S Ensure: Embedding matrix , label vector
|
| Algorithm 2 Classifier training and evaluation |
| Require: Embedding matrix , labels , test size ratio r, random seed s, classifier types Ensure: Performance metrics for all embedding–classifier combinations
|
3.8. Performance Metrics
3.9. Novel Aspects of the Proposed Approach
4. Results and Discussion
- What is the best embedding–classifier combination for detecting IPIAs? (This is answered in Section 4.3).
- Which machine learning classifier performs best for IPIA detection? (This is answered in Section 4.6).
- How do open-source and closed-source embedding models compare in terms of performance, cost, and deployment considerations? (This is answered in Section 4.5).
4.1. Performance Summary
4.2. Visual Comparison of Model Configurations
4.3. Best Performing Configuration
4.4. Precision–Recall and ROC Curve Analysis
4.5. Embedding Model Comparison
4.6. Classifier Analysis
4.7. Dimensionality Reduction and Embedding Space Structure
4.8. Interpretation and Deployment Implications
4.9. Comparison with Existing IPIA Defence and Detection Approaches
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar] [CrossRef]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling Language Modeling with Pathways. arXiv 2022, arXiv:2204.02311. [Google Scholar] [CrossRef]
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Alatise, T.I.; Nottidge, O.E. Threat Detection and Response with SIEM System. Int. J. Comput. Sci. Inf. Secur. 2024, 22, 36–38. [Google Scholar] [CrossRef]
- Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confid. Comput. 2024, 4, 100211. [Google Scholar] [CrossRef]
- Brohi, S.; Mastoi, Q.u.a.; Jhanjhi, N.Z.; Pillai, T.R. A Research Landscape of Agentic AI and Large Language Models: Applications, Challenges and Future Directions. Algorithms 2025, 18, 499. [Google Scholar] [CrossRef]
- Li, M.Q.; Fung, B.C. Security Concerns for Large Language Models: A Survey. J. Inf. Secur. Appl. 2025, 95, 104284. [Google Scholar] [CrossRef]
- Kumar, P. Adversarial Attacks and Defenses for Large Language Models (LLMs): Methods, Frameworks & Challenges. Int. J. Multimed. Inf. Retr. 2024, 13, 26. [Google Scholar] [CrossRef]
- Sheng, Z.; Chen, Z.; Gu, S.; Huang, H.; Gu, G.; Huang, J. LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights. ACM Comput. Surv. 2025, 58, 1–35. [Google Scholar] [CrossRef]
- Hamid, R.; Brohi, S. A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework. Big Data Cogn. Comput. 2024, 8, 161. [Google Scholar] [CrossRef]
- Perez, F.; Ribeiro, I. Ignore Previous Prompt: Attack Techniques for Language Models. arXiv 2022, arXiv:2211.09527. [Google Scholar] [CrossRef]
- Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. arXiv 2025, arXiv:2310.12815. [Google Scholar] [CrossRef]
- Greshake, K.; Abdelnabi, S.; Mishra, S.; Endres, C.; Holz, T.; Fritz, M. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security; Association for Computing Machinery: New York, NY, USA; pp. 79–90. [CrossRef]
- Willison, S. Multi-Modal Prompt Injection Image Attacks Against GPT-4V. 2023. Available online: https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection (accessed on 27 May 2025).
- Yi, J.; Xie, Y.; Zhu, B.; Kiciman, E.; Sun, G.; Xie, X.; Wu, F. Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1; Association for Computing Machinery: New York, NY, USA; pp. 1809–1820. [CrossRef]
- Zhan, Q.; Liang, Z.; Ying, Z.; Kang, D. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. arXiv 2024, arXiv:2403.02691. [Google Scholar] [CrossRef]
- Zhan, Q.; Fang, R.; Panchal, H.S.; Kang, D. Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents. arXiv 2025, arXiv:2503.00061. [Google Scholar] [CrossRef]
- Suo, X. Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications. arXiv 2024, arXiv:2401.07612. [Google Scholar] [CrossRef]
- Hines, K.; Lopez, G.; Hall, M.; Zarfati, F.; Zunger, Y.; Kiciman, E. Defending Against Indirect Prompt Injection Attacks With Spotlighting. arXiv 2024, arXiv:2403.14720. [Google Scholar] [CrossRef]
- Wu, F.; Cecchetti, E.; Xiao, C. System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective. arXiv 2024, arXiv:2409.19091. [Google Scholar] [CrossRef]
- Wang, J.; Wu, F.; Li, W.; Pan, J.; Suh, E.; Mao, Z.M.; Chen, M.; Xiao, C. FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks. arXiv 2024, arXiv:2410.21492. [Google Scholar] [CrossRef]
- Kokkula, S.; R, S.; R, N.; Aashishkumar; Divya, G. Palisade—Prompt Injection Detection Framework. arXiv 2024, arXiv:2410.21146. [Google Scholar] [CrossRef]
- Hung, K.H.; Ko, C.Y.; Rawat, A.; Chung, I.H.; Hsu, W.H.; Chen, P.Y. Attention Tracker: Detecting Prompt Injection Attacks in LLMs. arXiv 2025, arXiv:2411.00348. [Google Scholar] [CrossRef]
- Ayub, M.A.; Majumdar, S. Embedding-based classifiers can detect Prompt Injection Attacks. arXiv 2024, arXiv:2410.22284. [Google Scholar] [CrossRef]
- OpenAI. New Embedding Models and API Updates OpenAI Documentation. 2024. Available online: https://openai.com/index/new-embedding-models-and-api-updates/ (accessed on 6 June 2025).
- Li, Z.; Zhang, X.; Zhang, Y.; Long, D.; Xie, P.; Zhang, M. Towards General Text Embeddings With Multi-Stage Contrastive Learning. arXiv 2023, arXiv:2308.03281. [Google Scholar] [CrossRef]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2019. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [PubMed]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems; NIPS’17; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 3149–3157. Available online: https://dl.acm.org/doi/10.5555/3294996.3295074 (accessed on 15 June 2025).
- Maaten, L.v.d.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. Available online: http://jmlr.org/papers/v9/vandermaaten08a.html (accessed on 15 June 2025).
- McInnes, L.; Healy, J.; Saul, N.; Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 2018, 3, 861. [Google Scholar] [CrossRef]
- Mathew, E.S. Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses. J. Artif. Intell. 2025, 7, 347–363. [Google Scholar] [CrossRef]
- Zou, A.; Wang, Z.; Carlini, N.; Nasr, M.; Kolter, J.Z.; Fredrikson, M. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv 2023, arXiv:2307.15043. [Google Scholar] [CrossRef]
- Shi, J.; Yuan, Z.; Liu, Y.; Huang, Y.; Zhou, P.; Sun, L.; Gong, N.Z. Optimization-Based Prompt Injection Attack to LLM-as-a-Judge. arXiv 2024, arXiv:2403.17710. [Google Scholar] [CrossRef]
- Huang, Y.; Wang, C.; Jia, X.; Guo, Q.; Juefei-Xu, F.; Zhang, J.; Liu, Y.; Pu, G. Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 5796–5816. [Google Scholar] [CrossRef]
- Heibel, J.; Lowd, D. MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants. arXiv 2024, arXiv:2407.11072. [Google Scholar] [CrossRef]
- Xue, J.; Zheng, M.; Hu, Y.; Liu, F.; Chen, X.; Lou, Q. BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models. arXiv 2024, arXiv:2406.00083. [Google Scholar] [CrossRef]
- Liang, X.; Niu, S.; Li, Z.; Zhang, S.; Wang, H.; Xiong, F.; Fan, Z.; Tang, B.; Zhao, J.; Yang, J.; et al. SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025. [Google Scholar] [CrossRef]
- Zhao, W.; Gupta, A.; Chung, T.; Huang, J. SPC: Soft Prompt Construction for Cross Domain Generalization. In Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023); Can, B., Mozes, M., Cahyawijaya, S., Saphra, N., Kassner, N., Ravfogel, S., Ravichander, A., Zhao, C., Augenstein, I., Rogers, A., et al., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 118–130. [Google Scholar] [CrossRef]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2022, arXiv:2108.07258. [Google Scholar] [CrossRef]








| Study | Main Focus | Attack Type | Defence/Approach | Limitation/Gap |
|---|---|---|---|---|
| Greshake et al. [14] | Real-world vulnerabilities in LLM-integrated apps | Indirect prompt injection | Empirical analysis and case studies | Describes attacks and impact; does not provide a generic learning-based detector |
| Yi et al. (BIPIA) [16] | Dataset and benchmark for IPIAs | Indirect prompt injection | Benchmarking and evaluation framework | Focuses on evaluation and prevention strategies; provides only malicious samples; no stand-alone semantic detector for IPIA |
| Zhan et al. (InjecAgent) [17] | IPIAs in tool-integrated LLM agents | Indirect prompt injection | Agent benchmark and attack framework | Analyses agent vulnerabilities; does not propose a general-purpose detection model over embeddings |
| Willison [15] | Multimodal prompt injection on GPT-4V | Indirect prompt injection | Demonstration of image-based IPIAs | Highlights risk in vision–language models; no formal detection framework |
| Huang et al. [37] | Goal hijacking attacks | Direct/indirect prompt injection | Semantics-guided adversarial prompt construction | Focus on attack generation, not on robust detection layers |
| Zhan et al. [18] | Adaptive attacks on IPIA defences | Indirect prompt injection | Adaptive attack strategies against deployed defences | Shows many defences are breakable; does not provide a simple deployable detector |
| Suo [19] | Signing trusted instructions | Prompt injection (general) | Signed-Prompt scheme for verification | Requires signed infrastructure; does not inspect semantic consistency of user–content pairs |
| Hines et al. [20] | Prompt-level separation of sources | Indirect prompt injection | Spotlighting (marking user vs. external text) | Relies on prompting conventions; no explicit ML classifier for IPIA detection |
| Wang et al. (FATH) [22] | Authentication-based defence | Indirect prompt injection | Hash-based test-time authentication | Requires authenticated prompts; does not scale easily to arbitrary external content |
| Wu et al. [21] | System-level defences | Indirect prompt injection | Information flow control around LLMs | Requires architectural changes and enforcement; not a lightweight detector |
| Kokkula and Divya (Palisade) [23] | Prompt injection detection framework | Prompt injection (general) | Rule-based detection for LLM apps | Uses heuristics; limited semantic modelling and generalisation |
| Hung et al. (Attention Tracker) [24] | Representation-based detection | Prompt injection (general) | Monitoring attention patterns in LLMs | Model-specific and internal; not model-agnostic or embedding-based |
| BadRAG [39] | RAG vulnerabilities | Indirect prompt injection and poisoning | Security evaluation of RAG pipelines | Focuses on RAG; no generic classifier over user–content context |
| SafeRAG [40] | Benchmarking RAG security | Indirect prompt injection and other threats | Benchmark and taxonomy for secure RAG | Provides evaluation, but not a dedicated semantic IPIA detector |
| Ayub and Majumdar [25] | Embedding-based detection of PIAs | Direct prompt injection | Embedding models and tree-based classifiers | Limited to direct prompts; does not jointly model user intent and external content |
| Yao et al. [6] | Survey on LLM security and privacy | Multiple threats | Comprehensive taxonomy of attacks and defences | Broad survey; does not propose specific detection methods for IPIAs |
| Li and Fung [8] | Security concerns for LLMs | Multiple threats | Survey of security issues | General security survey; no dedicated IPIA detection framework |
| Mathew [34] | Prompt Injection Attacks and defences | Direct and indirect prompt injection | Comprehensive review | Review paper; does not implement or evaluate detection methods |
| Kumar [9] | Adversarial attacks on LLMs | Multiple adversarial threats | Survey of methods and frameworks | Focuses on adversarial attacks broadly; no specific IPIA detector |
| Sheng et al. [10] | LLMs in software security | Vulnerability detection | Survey of techniques | Focuses on code vulnerability detection; not on prompt injection |
| Zou et al. [35] | Universal adversarial attacks | Direct prompt injection | Optimisation-based attack generation | Attack-focused; no defence mechanism |
| This work | Semantic detection of IPIAs using embeddings | Indirect prompt injection | Joint embedding of user intent and external content and tree-based classifiers | Provides a lightweight, model-agnostic detector focused on semantic consistency between user intent and external content for IPIA detection |
| Classifier | Hyperparameters |
|---|---|
| Random Forest | , , , |
| XGBoost | , , , |
| LightGBM | , , , |
| Configuration | Accuracy | F1-Score | ROC-AUC | PR-AUC | Time (ms/Sample) |
|---|---|---|---|---|---|
| OpenAI–XGBoost | 0.977 | 0.977 | 0.997 | 0.996 | 0.0010 |
| OpenAI–LightGBM | 0.955 | 0.955 | 0.990 | 0.989 | 0.0016 |
| OpenAI–Random Forest | 0.937 | 0.938 | 0.984 | 0.981 | 0.0067 |
| GTE-large–XGBoost | 0.919 | 0.920 | 0.974 | 0.970 | 0.0008 |
| MiniLM-L6-v2–XGBoost | 0.896 | 0.897 | 0.961 | 0.955 | 0.0005 |
| GTE-large–LightGBM | 0.884 | 0.885 | 0.954 | 0.950 | 0.0013 |
| GTE-large–Random Forest | 0.881 | 0.884 | 0.950 | 0.942 | 0.0060 |
| MiniLM-L6-v2–LightGBM | 0.851 | 0.854 | 0.930 | 0.923 | 0.0011 |
| MiniLM-L6-v2–Random Forest | 0.848 | 0.852 | 0.923 | 0.913 | 0.0042 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Alamsabi, M.; Tchuindjang, M.; Brohi, S. Embedding-Based Detection of Indirect Prompt Injection Attacks in Large Language Models Using Semantic Context Analysis. Algorithms 2026, 19, 92. https://doi.org/10.3390/a19010092
Alamsabi M, Tchuindjang M, Brohi S. Embedding-Based Detection of Indirect Prompt Injection Attacks in Large Language Models Using Semantic Context Analysis. Algorithms. 2026; 19(1):92. https://doi.org/10.3390/a19010092
Chicago/Turabian StyleAlamsabi, Mohammed, Michael Tchuindjang, and Sarfraz Brohi. 2026. "Embedding-Based Detection of Indirect Prompt Injection Attacks in Large Language Models Using Semantic Context Analysis" Algorithms 19, no. 1: 92. https://doi.org/10.3390/a19010092
APA StyleAlamsabi, M., Tchuindjang, M., & Brohi, S. (2026). Embedding-Based Detection of Indirect Prompt Injection Attacks in Large Language Models Using Semantic Context Analysis. Algorithms, 19(1), 92. https://doi.org/10.3390/a19010092

