EvoShield: Selective Test-Time Adaptation for Prompt Injection Detection via Active LLM Querying
Abstract
1. Introduction
- We formulate prompt injection detection under evolving deployment streams as a selective test-time adaptation problem, shifting the task from static offline classification to online guarded decision making.
- We design a deployment-oriented adaptation protocol in which predictive uncertainty controls external supervision, and parsable queried labels are reused to update the local detector on recently encountered difficult cases.
- We validate EvoShield on four prompt injection and jailbreak detection benchmarks, showing that selective test-time adaptation retains 95.0% to 103.5% of the full-query baseline's overall score while issuing only 6.3% to 39.2% of its external API calls.
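
The protocol described in the contributions above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`run_stream`, `local_proba`, `query_llm`, `update_local`, `parse_label`), the use of natural-log binary entropy, and the fallback to the local prediction on an unparsable reply are all assumptions made for illustration. The defaults 0.10 and 64 are values that appear in the sensitivity sweeps and are used here only as placeholders.

```python
import math
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Tuple


def binary_entropy(p_attack: float) -> float:
    """Shannon entropy of a two-class prediction (nats; the base is an assumption)."""
    eps = 1e-12
    p = min(max(p_attack, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def parse_label(reply: str) -> Optional[int]:
    """Map a free-text LLM reply to 1 (attack) or 0 (benign); None if unparsable."""
    text = reply.strip().lower()
    if text in {"1", "attack", "injection", "jailbreak"}:
        return 1
    if text in {"0", "benign", "safe"}:
        return 0
    return None


def run_stream(
    prompts: Iterable[str],
    local_proba: Callable[[str], float],        # hypothetical: P(attack) from the local detector
    query_llm: Callable[[str], str],            # hypothetical: external LLM call
    update_local: Callable[[List[Tuple[str, int]]], None],  # hypothetical: review update
    threshold: float = 0.10,                    # illustrative entropy threshold
    window: int = 64,                           # illustrative review window size
) -> Tuple[List[int], int]:
    review: Deque[Tuple[str, int]] = deque(maxlen=window)
    decisions: List[int] = []
    api_calls = 0
    for prompt in prompts:
        p = local_proba(prompt)
        label = int(p >= 0.5)                   # confident inputs are decided locally
        if binary_entropy(p) > threshold:       # uncertain inputs trigger a query
            api_calls += 1
            queried = parse_label(query_llm(prompt))
            if queried is not None:             # only parsable labels are reused
                label = queried
                review.append((prompt, queried))
                if review:                      # empty when window == 0
                    update_local(list(review))
        decisions.append(label)
    return decisions, api_calls
```

In this sketch the review buffer is a bounded queue, so the local detector is only ever updated on the most recent queried cases; setting `window=0` disables the update entirely while keeping the querying path.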
2. Related Work
3. Materials and Methods
3.1. Framework Overview
3.2. Prompt-Based Local Detector
3.3. Uncertainty-Triggered LLM Query
3.4. Review-Based Test-Time Adaptation
4. Experiments
4.1. Datasets and Experimental Setup
4.2. Metrics and Implementation Details
5. Results
5.1. Comparison with Baselines
5.1.1. High Performance Retention with Drastic Cost Reduction
5.1.2. Performance Gains in Specific Environments
5.1.3. Data-Scale-Driven Routing Proportion
5.1.4. Practical Runtime and Hardware Overhead
5.2. Sensitivity Analysis of Routing and Review Parameters
5.3. Ablation Studies
5.3.1. Active Querying Elevates Baseline Detection
5.3.2. Review Mechanism Drastically Reduces Query Costs
5.3.3. Review Mechanism Particularly Benefits Small Sample Regimes
5.4. Cross-Dataset Distribution Shift Evaluation
6. Discussion
6.1. Validation of the Selective Routing Mechanism
6.1.1. Selective Division of Predictive Responsibility
6.1.2. Stable Local Performance on Easier Inputs
6.1.3. Hard Cases Remain Hard Even for Strong LLMs
6.2. Dynamics of Uncertainty-Driven Querying
6.2.1. Task-Dependent Query Dynamics
6.2.2. Uncertainty Serves as a Meaningful Routing Signal
6.3. Limitations: High-Confidence False Negatives and Entropy-Evasion Risk
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| API | Application programming interface |
| JC | Jailbreak classification |
| LLM | Large language model |
| PI | Prompt injection |
| SG | Safe-Guard |
| TTA | Test-time adaptation |



| Dataset | Split | Samples | Label Distribution |
|---|---|---|---|
| JC-balance | train | 32 | 0:16 (50.0%), 1:16 (50.0%) |
| JC-balance | test | 1274 | 0:624 (49.0%), 1:650 (51.0%) |
| JC-imbalance | train | 32 | 0:16 (50.0%), 1:16 (50.0%) |
| JC-imbalance | test | 1966 | 0:1316 (66.9%), 1:650 (33.1%) |
| PI | train | 32 | 0:16 (50.0%), 1:16 (50.0%) |
| PI | test | 630 | 0:383 (60.8%), 1:247 (39.2%) |
| SG | train | 32 | 0:16 (50.0%), 1:16 (50.0%) |
| SG | val | 205 | 0:143 (69.8%), 1:62 (30.2%) |
| SG | test | 2052 | 0:1440 (70.2%), 1:612 (29.8%) |

| Model | Method | Acc | Macro-P | Macro-R | Macro-F1 | Overall | API Calls |
|---|---|---|---|---|---|---|---|
| Task: JC (Balanced) | | | | | | | |
| claude-sonnet-4-6 | Vanilla | 0.967 | 0.967 | 0.967 | 0.967 | 0.967 | 1274 |
| claude-sonnet-4-6 | EvoShield | 0.951 | 0.951 | 0.951 | 0.951 | 0.951 | 133 |
| claude-sonnet-4-6 | Retention (%) | 98.3% | 98.3% | 98.3% | 98.3% | 98.3% | 10.4% |
| gemini-3.1-pro-preview | Vanilla | 0.966 | 0.966 | 0.966 | 0.966 | 0.966 | 1274 |
| gemini-3.1-pro-preview | EvoShield | 0.958 | 0.958 | 0.958 | 0.958 | 0.958 | 138 |
| gemini-3.1-pro-preview | Retention (%) | 99.2% | 99.2% | 99.2% | 99.2% | 99.2% | 10.8% |
| gpt-5.2 | Vanilla | 0.901 | 0.914 | 0.899 | 0.900 | 0.904 | 1274 |
| gpt-5.2 | EvoShield | 0.890 | 0.901 | 0.888 | 0.889 | 0.892 | 147 |
| gpt-5.2 | Retention (%) | 98.8% | 98.6% | 98.8% | 98.8% | 98.7% | 11.5% |
| grok-4-1-fast-reasoning | Vanilla | 0.969 | 0.969 | 0.969 | 0.969 | 0.969 | 1274 |
| grok-4-1-fast-reasoning | EvoShield | 0.954 | 0.955 | 0.954 | 0.954 | 0.954 | 137 |
| grok-4-1-fast-reasoning | Retention (%) | 98.5% | 98.6% | 98.5% | 98.5% | 98.5% | 10.8% |
| Task: JC (Imbalanced) | | | | | | | |
| claude-sonnet-4-6 | Vanilla | 0.968 | 0.962 | 0.966 | 0.964 | 0.965 | 1966 |
| claude-sonnet-4-6 | EvoShield | 0.964 | 0.967 | 0.952 | 0.959 | 0.961 | 133 |
| claude-sonnet-4-6 | Retention (%) | 99.6% | 100.5% | 98.6% | 99.5% | 99.6% | 6.8% |
| gemini-3.1-pro-preview | Vanilla | 0.967 | 0.961 | 0.965 | 0.963 | 0.964 | 1966 |
| gemini-3.1-pro-preview | EvoShield | 0.959 | 0.961 | 0.947 | 0.953 | 0.955 | 124 |
| gemini-3.1-pro-preview | Retention (%) | 99.2% | 100.0% | 98.1% | 99.0% | 99.1% | 6.3% |
| gpt-5.2 | Vanilla | 0.864 | 0.851 | 0.893 | 0.858 | 0.867 | 1966 |
| gpt-5.2 | EvoShield | 0.862 | 0.847 | 0.887 | 0.855 | 0.863 | 276 |
| gpt-5.2 | Retention (%) | 99.8% | 99.5% | 99.3% | 99.7% | 99.5% | 14.0% |
| grok-4-1-fast-reasoning | Vanilla | 0.975 | 0.970 | 0.973 | 0.971 | 0.972 | 1966 |
| grok-4-1-fast-reasoning | EvoShield | 0.969 | 0.970 | 0.960 | 0.965 | 0.966 | 123 |
| grok-4-1-fast-reasoning | Retention (%) | 99.4% | 100.0% | 98.7% | 99.4% | 99.4% | 6.3% |
| Task: PI | | | | | | | |
| claude-sonnet-4-6 | Vanilla | 0.962 | 0.967 | 0.954 | 0.960 | 0.961 | 630 |
| claude-sonnet-4-6 | EvoShield | 0.929 | 0.922 | 0.932 | 0.926 | 0.927 | 247 |
| claude-sonnet-4-6 | Retention (%) | 96.6% | 95.3% | 97.7% | 96.5% | 96.5% | 39.2% |
| gemini-3.1-pro-preview | Vanilla | 0.966 | 0.973 | 0.957 | 0.964 | 0.965 | 630 |
| gemini-3.1-pro-preview | EvoShield | 0.943 | 0.946 | 0.934 | 0.939 | 0.941 | 162 |
| gemini-3.1-pro-preview | Retention (%) | 97.6% | 97.2% | 97.6% | 97.4% | 97.5% | 25.7% |
| gpt-5.2 | Vanilla | 0.935 | 0.946 | 0.920 | 0.930 | 0.933 | 630 |
| gpt-5.2 | EvoShield | 0.889 | 0.919 | 0.860 | 0.876 | 0.886 | 165 |
| gpt-5.2 | Retention (%) | 95.1% | 97.1% | 93.5% | 94.2% | 95.0% | 26.2% |
| grok-4-1-fast-reasoning | Vanilla | 0.956 | 0.962 | 0.945 | 0.953 | 0.954 | 630 |
| grok-4-1-fast-reasoning | EvoShield | 0.932 | 0.945 | 0.915 | 0.926 | 0.930 | 163 |
| grok-4-1-fast-reasoning | Retention (%) | 97.5% | 98.2% | 96.8% | 97.2% | 97.5% | 25.9% |
| Task: SG | | | | | | | |
| claude-sonnet-4-6 | Vanilla | 0.950 | 0.944 | 0.937 | 0.940 | 0.943 | 2052 |
| claude-sonnet-4-6 | EvoShield | 0.962 | 0.968 | 0.941 | 0.953 | 0.956 | 174 |
| claude-sonnet-4-6 | Retention (%) | 101.3% | 102.5% | 100.4% | 101.4% | 101.4% | 8.5% |
| gemini-3.1-pro-preview | Vanilla | 0.943 | 0.942 | 0.921 | 0.931 | 0.934 | 2052 |
| gemini-3.1-pro-preview | EvoShield | 0.942 | 0.953 | 0.909 | 0.927 | 0.933 | 197 |
| gemini-3.1-pro-preview | Retention (%) | 99.9% | 101.2% | 98.7% | 99.6% | 99.9% | 9.6% |
| gpt-5.2 | Vanilla | 0.896 | 0.869 | 0.904 | 0.882 | 0.888 | 2052 |
| gpt-5.2 | EvoShield | 0.925 | 0.900 | 0.936 | 0.915 | 0.919 | 235 |
| gpt-5.2 | Retention (%) | 103.2% | 103.6% | 103.5% | 103.7% | 103.5% | 11.5% |
| grok-4-1-fast-reasoning | Vanilla | 0.843 | 0.823 | 0.879 | 0.831 | 0.844 | 2052 |
| grok-4-1-fast-reasoning | EvoShield | 0.857 | 0.835 | 0.892 | 0.845 | 0.857 | 316 |
| grok-4-1-fast-reasoning | Retention (%) | 101.7% | 101.5% | 101.5% | 101.7% | 101.5% | 15.4% |
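
The Retention (%) rows in the table above appear to be the EvoShield value divided by the Vanilla value for the same model and task, expressed as a percentage; recomputing from the rounded table entries reproduces the printed figures. The sketch below shows that arithmetic for the claude-sonnet-4-6 row on JC (Balanced). The helper name `retention` is hypothetical, and the ratio reading is an inference from the numbers rather than a definition stated here.

```python
def retention(evoshield: float, vanilla: float) -> float:
    """Percentage of the Vanilla (full-query) value kept by EvoShield."""
    return round(100.0 * evoshield / vanilla, 1)


# Values copied from the claude-sonnet-4-6 rows for JC (Balanced).
acc_retention = retention(0.951, 0.967)   # accuracy retained
api_retention = retention(133, 1274)      # share of API calls still issued

print(acc_retention, api_retention)
```

For API Calls a lower percentage is better, since it measures the fraction of external queries that remain, whereas for the metric columns a value near or above 100% means little or no loss relative to querying the LLM on every sample.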

| External LLM | Pure LLM Time ([h:]mm:ss) | LLM Sec./Sample | EvoShield Time ([h:]mm:ss) | EvoShield Sec./Sample | Time Reduction |
|---|---|---|---|---|---|
| Task: JC (Balanced) | | | | | |
| claude-sonnet-4-6 | 5:12:10 | 14.70 | 33:23 | 1.57 | 89.3% |
| gemini-3.1-pro-preview | 3:25:15 | 9.67 | 37:22 | 1.76 | 81.8% |
| gpt-5.2 | 1:05:03 | 3.06 | 10:37 | 0.50 | 83.7% |
| grok-4-1-fast-reasoning | 1:09:03 | 3.25 | 10:12 | 0.48 | 85.2% |
| Task: JC (Imbalanced) | | | | | |
| claude-sonnet-4-6 | 3:30:29 | 6.42 | 23:04 | 0.70 | 89.0% |
| gemini-3.1-pro-preview | 3:28:56 | 6.38 | 26:51 | 0.82 | 87.1% |
| gpt-5.2 | 1:09:51 | 2.13 | 19:37 | 0.60 | 71.9% |
| grok-4-1-fast-reasoning | 1:48:00 | 3.30 | 11:37 | 0.35 | 89.2% |
| Task: PI | | | | | |
| claude-sonnet-4-6 | 1:48:19 | 10.32 | 10:19 | 0.98 | 90.5% |
| gemini-3.1-pro-preview | 2:10:19 | 12.41 | 32:12 | 3.07 | 75.3% |
| gpt-5.2 | 29:23 | 2.80 | 10:41 | 1.02 | 63.6% |
| grok-4-1-fast-reasoning | 31:11 | 2.97 | 9:50 | 0.94 | 68.5% |
| Task: SG | | | | | |
| claude-sonnet-4-6 | 3:10:22 | 5.57 | 34:45 | 1.02 | 81.7% |
| gemini-3.1-pro-preview | 3:47:12 | 6.64 | 1:10:09 | 2.05 | 69.1% |
| gpt-5.2 | 57:17 | 1.68 | 17:21 | 0.51 | 69.7% |
| grok-4-1-fast-reasoning | 2:33:02 | 4.48 | 30:49 | 0.90 | 79.9% |
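
The per-sample and reduction columns of the runtime table above can be rederived from the clock times and the test-set sizes. The sketch below does this for claude-sonnet-4-6 on JC (Balanced), assuming the times are written as h:mm:ss or mm:ss, that the per-sample figure divides by the 1274 test samples, and that Time Reduction is one minus the ratio of EvoShield time to pure-LLM time. The helper names are hypothetical.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def runtime_summary(llm_time: str, evo_time: str, n_samples: int) -> dict:
    """Seconds per sample for both pipelines and the relative time saved."""
    llm_s = to_seconds(llm_time)
    evo_s = to_seconds(evo_time)
    return {
        "llm_sec_per_sample": round(llm_s / n_samples, 2),
        "evo_sec_per_sample": round(evo_s / n_samples, 2),
        "time_reduction_pct": round(100.0 * (1.0 - evo_s / llm_s), 1),
    }


# claude-sonnet-4-6 on JC (Balanced): 5:12:10 vs 33:23 over 1274 test samples.
print(runtime_summary("5:12:10", "33:23", 1274))
```

Under these assumptions the result matches the printed row (14.70 and 1.57 seconds per sample, 89.3% reduction).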

| Entropy Threshold | Avg. Acc | Avg. Macro-F1 | Avg. LLM Calls | Avg. LLM Ratio |
|---|---|---|---|---|
| 0.05 | 0.9024 | 0.8928 | 180.5 | 14.69% |
| 0.10 | 0.9022 | 0.8938 | 173.0 | 14.01% |
| 0.20 | 0.8848 | 0.8746 | 152.0 | 12.40% |
| 0.30 | 0.8617 | 0.8472 | 119.0 | 8.84% |
| 0.40 | 0.8552 | 0.8383 | 101.8 | 7.93% |

| Review Window Size | Avg. Acc | Avg. Macro-F1 | Avg. LLM Calls | Avg. LLM Ratio |
|---|---|---|---|---|
| 0 | 0.7574 | 0.6895 | 266.2 | 17.97% |
| 32 | 0.8845 | 0.8720 | 185.0 | 15.57% |
| 64 | 0.8923 | 0.8813 | 159.8 | 13.36% |
| 128 | 0.8696 | 0.8598 | 149.8 | 12.28% |
| 256 | 0.8595 | 0.8474 | 132.0 | 10.88% |
| 512 | 0.8856 | 0.8739 | 128.0 | 11.23% |

| External LLM | Method | Acc | Macro-P | Macro-R | Macro-F1 | Overall | API Calls |
|---|---|---|---|---|---|---|---|
| Task: JC (Balanced) | | | | | | | |
| grok-4-1-fast-reasoning | Vanilla | 0.916 | 0.917 | 0.916 | 0.916 | 0.916 | 0 |
| grok-4-1-fast-reasoning | w/o Review | 0.969 | 0.969 | 0.969 | 0.969 | 0.969 | 300 |
| grok-4-1-fast-reasoning | EvoShield | 0.954 | 0.955 | 0.954 | 0.954 | 0.954 | 137 |
| gpt-5.2 | Vanilla | 0.928 | 0.930 | 0.929 | 0.928 | 0.928 | 0 |
| gpt-5.2 | w/o Review | 0.519 | 0.757 | 0.509 | 0.357 | 0.536 | 32 |
| gpt-5.2 | EvoShield | 0.890 | 0.901 | 0.888 | 0.889 | 0.892 | 147 |
| Task: JC (Imbalanced) | | | | | | | |
| grok-4-1-fast-reasoning | Vanilla | 0.875 | 0.859 | 0.859 | 0.859 | 0.863 | 0 |
| grok-4-1-fast-reasoning | w/o Review | 0.972 | 0.975 | 0.962 | 0.968 | 0.969 | 299 |
| grok-4-1-fast-reasoning | EvoShield | 0.969 | 0.970 | 0.960 | 0.965 | 0.966 | 123 |
| gpt-5.2 | Vanilla | 0.767 | 0.779 | 0.814 | 0.763 | 0.781 | 0 |
| gpt-5.2 | w/o Review | 0.886 | 0.867 | 0.901 | 0.877 | 0.883 | 721 |
| gpt-5.2 | EvoShield | 0.862 | 0.847 | 0.887 | 0.855 | 0.863 | 276 |
| Task: PI | | | | | | | |
| grok-4-1-fast-reasoning | Vanilla | 0.806 | 0.798 | 0.794 | 0.796 | 0.798 | 0 |
| grok-4-1-fast-reasoning | w/o Review | 0.867 | 0.897 | 0.834 | 0.850 | 0.862 | 347 |
| grok-4-1-fast-reasoning | EvoShield | 0.932 | 0.945 | 0.915 | 0.926 | 0.930 | 163 |
| gpt-5.2 | Vanilla | 0.792 | 0.796 | 0.810 | 0.790 | 0.797 | 0 |
| gpt-5.2 | w/o Review | 0.884 | 0.918 | 0.853 | 0.870 | 0.881 | 408 |
| gpt-5.2 | EvoShield | 0.889 | 0.919 | 0.860 | 0.876 | 0.886 | 165 |
| Task: SG | | | | | | | |
| grok-4-1-fast-reasoning | Vanilla | 0.846 | 0.818 | 0.864 | 0.831 | 0.840 | 0 |
| grok-4-1-fast-reasoning | w/o Review | 0.844 | 0.816 | 0.860 | 0.828 | 0.837 | 737 |
| grok-4-1-fast-reasoning | EvoShield | 0.857 | 0.835 | 0.892 | 0.845 | 0.857 | 316 |
| gpt-5.2 | Vanilla | 0.812 | 0.807 | 0.866 | 0.803 | 0.822 | 0 |
| gpt-5.2 | w/o Review | 0.798 | 0.794 | 0.695 | 0.716 | 0.751 | 348 |
| gpt-5.2 | EvoShield | 0.925 | 0.900 | 0.936 | 0.915 | 0.919 | 235 |

| Task | External LLM | Attack Recall | FNR | FRR |
|---|---|---|---|---|
| JC (Balanced) | claude-sonnet-4-6 | 0.948 | 0.052 | 0.046 |
| JC (Balanced) | gemini-3.1-pro-preview | 0.943 | 0.057 | 0.027 |
| JC (Balanced) | gpt-5.2 | 0.971 | 0.029 | 0.194 |
| JC (Balanced) | grok-4-1-fast-reasoning | 0.929 | 0.071 | 0.021 |
| JC (Imbalanced) | claude-sonnet-4-6 | 0.915 | 0.085 | 0.011 |
| JC (Imbalanced) | gemini-3.1-pro-preview | 0.909 | 0.091 | 0.016 |
| JC (Imbalanced) | gpt-5.2 | 0.958 | 0.042 | 0.185 |
| JC (Imbalanced) | grok-4-1-fast-reasoning | 0.932 | 0.068 | 0.013 |
| PI | claude-sonnet-4-6 | 0.947 | 0.053 | 0.084 |
| PI | gemini-3.1-pro-preview | 0.895 | 0.105 | 0.026 |
| PI | gpt-5.2 | 0.725 | 0.275 | 0.005 |
| PI | grok-4-1-fast-reasoning | 0.838 | 0.162 | 0.008 |
| SG | claude-sonnet-4-6 | 0.891 | 0.109 | 0.008 |
| SG | gemini-3.1-pro-preview | 0.827 | 0.173 | 0.009 |
| SG | gpt-5.2 | 0.964 | 0.036 | 0.092 |
| SG | grok-4-1-fast-reasoning | 0.980 | 0.020 | 0.196 |
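
The three rates in the table above can be related to confusion-matrix counts as sketched below. Two assumptions are made: label 1 denotes an attack, and FRR is read as the share of benign inputs that are flagged as attacks (the abbreviation is not expanded here). The counts in the example are illustrative only; they are not reported in the source and were chosen to be consistent with the 650 attack and 624 benign test samples of JC (Balanced).

```python
def guard_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Attack recall, false negative rate, and false rejection rate from counts."""
    attacks = tp + fn      # all true attack samples
    benign = fp + tn       # all true benign samples
    return {
        "attack_recall": round(tp / attacks, 3),
        "fnr": round(fn / attacks, 3),   # missed attacks; equals 1 - recall
        "frr": round(fp / benign, 3),    # benign inputs wrongly blocked (assumed meaning)
    }


# Illustrative counts only (not from the source): 650 attacks, 624 benign.
print(guard_rates(tp=616, fn=34, fp=29, tn=595))
```

Because FNR is the complement of attack recall, each FNR entry in the table equals one minus the recall in the same row; FRR is computed over a different denominator and can move independently.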

| Segment/Window | Samples | Acc | Attack Recall | FNR | QR | Ent. |
|---|---|---|---|---|---|---|
| Overall stream | 1904 | 0.859 | 0.855 | 0.145 | 0.127 | 0.070 |
| JC (Balanced) segment | 1274 | 0.883 | 0.972 | 0.028 | 0.151 | 0.084 |
| PI segment | 630 | 0.811 | 0.547 | 0.453 | 0.076 | 0.044 |
| 128 before boundary | 128 | 0.891 | 0.986 | 0.014 | 0.008 | 0.009 |
| 128 after boundary | 128 | 0.805 | 0.500 | 0.500 | 0.086 | 0.050 |
| Last 128 PI samples | 128 | 0.883 | 0.722 | 0.278 | 0.031 | 0.017 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zheng, Z.; Liang, J.; Hu, M.; Pei, Y.; Xu, G.; Wu, Z. EvoShield: Selective Test-Time Adaptation for Prompt Injection Detection via Active LLM Querying. Mathematics 2026, 14, 1719. https://doi.org/10.3390/math14101719

