Detecting Prompt Injection Attacks in Generative AI Systems: A Hybrid SIEM and One-Class SVM Framework
Abstract
1. Introduction
1.1. Motivation and Problem Statement
1.2. The SOC Integration Challenge
1.3. Threat Landscape and Significance
1.4. Limitations of Existing Defenses
1.5. Our Contribution
- Instruments an OpenAI-compatible LLM gateway to emit structured telemetry into Elastic SIEM, reducing the need for parallel monitoring infrastructure;
- Applies four expert-authored correlation rules over session-level behavioral signals, enabling the detection of multi-turn reconnaissance campaigns;
- Augments rule coverage with an OCSVM trained exclusively on benign interactions, requiring no labeled attack examples and enabling the detection of behavioral deviations from the learned benign manifold;
- Achieves an F1-score of 0.883 and a recall of 0.810, and reduces ASR to 19.0% with an MTTD of 2.3 s under the evaluated Phi-3 Mini configuration, showing improvements over the keyword-filter baseline in the experimental setting;
- Provides an OpenAI-compatible gateway design intended to reduce vendor lock-in and support integration with heterogeneous LLM deployments, subject to deployment-specific calibration;
- Demonstrates architectural advantages for SOC integration without requiring changes to the underlying LLM, while acknowledging that performance generalization across model families, prompt templates, and application domains requires further validation.
2. Background and Related Work
2.1. Prompt Injection Attack Surface
2.2. State-of-the-Art Defenses: Why They Fall Short
- Keyword filters: Work well against obvious attacks (“help me hack”) but fail against evasive rephrasings (“how would one technically execute”).
- Embedding similarity: Requires GPU inference for every request, increasing latency by 200–500 ms, which is unacceptable for many interactive applications.
- Supervised classifiers: Need 1000+ labeled attack examples before reaching acceptable performance, delaying deployment by weeks.
- Model-internal approaches: Unavailable for API-only models (OpenAI, Anthropic, and Azure OpenAI), which represent the majority of enterprise deployments.
2.3. SIEM Foundations and Anomaly Detection
3. System Architecture and Design
3.1. Architecture Overview
- Interaction Layer: User requests, authentication, LLM inference, and response transmission.
- Enrichment Layer: RAG retrieval, external API integration, tool management, and context assembly.
- Detection Layer: Telemetry collection, SIEM correlation, anomaly detection, alert generation, and escalation.

3.2. Telemetry Acquisition and Enrichment
- Session Context: Prompt ID, Session ID, User identifier (hashed), and organizational unit.
- Timing: High-resolution timestamps, request-response latency, queue delay, and processing time.
- Content Metrics: Token count, response entropy, prompt length, and character encoding anomalies.
- Behavioral Signals: Refusal flag, outcome category, tool invocations, and model temperature/parameters used.
- Data Provenance: Source of retrieved context, document metadata, and RAG confidence scores.
- External Scores: Pre-classifier scores from deployed detectors and keyword match counts.
- System State: CPU/memory utilization, concurrent requests, queue depth, and cache hit rates.
- Anomaly Indicators: Keyword presence, suspicious patterns, encoding anomalies, and unusual request structure.
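The telemetry categories above can be captured as one structured event per prompt. The sketch below is illustrative only: the class and field names (`PromptTelemetryEvent`, `pseudonymize_user`, `keyword_hits`, and so on) are hypothetical, a subset of the listed signals is shown, and the mapping onto an Elastic index or the Elastic Common Schema is not reproduced. The user identifier is pseudonymized with a keyed HMAC rather than a bare hash, which is an assumption about how "hashed" is implemented.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


def pseudonymize_user(user_id: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC) of the user identifier, so raw IDs never reach the SIEM."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class PromptTelemetryEvent:
    # Session context
    prompt_id: str
    session_id: str
    user_hash: str
    org_unit: str
    # Timing
    timestamp: float          # epoch seconds
    latency_s: float          # request-response latency
    # Content metrics
    prompt_tokens: int
    response_tokens: int
    response_entropy_bits: float
    # Behavioral signals
    refused: bool
    outcome: str
    temperature: Optional[float] = None
    tool_calls: List[str] = field(default_factory=list)
    # Data provenance and external scores
    rag_sources: List[str] = field(default_factory=list)
    keyword_hits: int = 0
    preclassifier_score: Optional[float] = None

    def to_json(self) -> str:
        """One JSON document per event, e.g. for a log shipper that feeds the SIEM."""
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting flat JSON documents keeps the gateway independent of any particular SIEM; the shipper or ingest pipeline can rename fields to the local schema.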
3.3. Hybrid Detection Architecture
3.3.1. Layer 1: SIEM Rule-Based Correlation
- Rule R1 (Keyword Burst): Flags sessions with > policy-override phrases within 60 s (tuned: ). Examples: “Ignore instructions”, “Override system prompt”, and “Forget previous constraints”.
- Rule R2 (Refusal Deviation): Triggers when session refusal rate deviates > from baseline. Detects systematic probing where adversaries test model boundaries.
- Rule R3 (Token Spike): Detects token count increases > rolling median. Indicates context-stuffing attempts.
- Rule R4 (Provenance Anomaly): Activates when high-risk sources appear in > sessions within 60 s. Flags documents associated with previous attacks.
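Rules R1 and R3 can be sketched as stateful checks over the telemetry stream. The tuned thresholds are not recoverable from the text, so `threshold` and `multiplier` below are hypothetical placeholders to be set locally; the phrase list, the class names, and the choice to keep R3's rolling median over a single stream (per session or per deployment) are likewise assumptions. In production these rules would run as SIEM correlation queries rather than application code.

```python
from collections import deque
from statistics import median

# Hypothetical override phrases; a deployment would maintain a curated list.
OVERRIDE_PHRASES = ("ignore instructions", "override system prompt",
                    "forget previous constraints")


class KeywordBurstRule:
    """R1 sketch: fire when a session's override-phrase count in the window exceeds a threshold."""

    def __init__(self, threshold: int, window_s: float = 60.0):
        self.threshold = threshold  # placeholder: the tuned value is deployment-specific
        self.window_s = window_s
        self.hits = {}  # session_id -> deque of hit timestamps (assumed non-decreasing)

    def observe(self, session_id: str, ts: float, prompt: str) -> bool:
        text = prompt.lower()
        n = sum(text.count(p) for p in OVERRIDE_PHRASES)
        q = self.hits.setdefault(session_id, deque())
        q.extend([ts] * n)
        # Evict hits that fell out of the sliding window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.threshold


class TokenSpikeRule:
    """R3 sketch: fire when a request's token count exceeds multiplier x the rolling median."""

    def __init__(self, multiplier: float, history: int = 20, min_history: int = 5):
        self.multiplier = multiplier  # placeholder for the tuned multiplier
        self.min_history = min_history
        self.recent = deque(maxlen=history)

    def observe(self, tokens: int) -> bool:
        fired = (len(self.recent) >= self.min_history
                 and tokens > self.multiplier * median(self.recent))
        self.recent.append(tokens)
        return fired
```

R2 and R4 follow the same pattern: R2 keeps a per-session refusal rate against a baseline, and R4 counts sessions per high-risk source within the window.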
3.3.2. Layer 2: OCSVM Anomaly Detection
3.4. Decision Fusion and Alert Generation
- If any rule (R1–R4) triggers, assign a severity based on rule-specific thresholds (Low/Medium/High);
- If the OCSVM score , assign an anomaly-based severity (Medium/High);
- If both trigger, escalate severity to High and generate a critical alert;
- Generate an alert with session context, behavioral signals, recommended actions, and decision provenance;
- Route High severity alerts to the real-time SOC dashboard; batch Medium alerts for analyst review.
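The fusion steps above can be expressed as a single function. This is a sketch under stated assumptions: the OCSVM cut-off is missing from the text, so `threshold` defaults to 0.0 following scikit-learn's sign convention (negative decision scores are anomalous), the Medium/High split via `high_margin` is a hypothetical mapping, and routing Low alerts to a log-only sink is an assumption because the text does not say where they go.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Alert:
    session_id: str
    severity: str
    critical: bool
    route: str
    provenance: List[str] = field(default_factory=list)


def fuse(session_id: str, rule_hits: Dict[str, str], ocsvm_score: float,
         threshold: float = 0.0, high_margin: float = 0.5) -> Optional[Alert]:
    """Combine rule verdicts and an OCSVM score into at most one alert.

    rule_hits maps a rule name (e.g. "R2") to the severity its own thresholds assigned.
    Scores below `threshold` are anomalous; scores more than `high_margin` below it
    are treated as High (hypothetical mapping).
    """
    provenance = [f"rule:{name}" for name in sorted(rule_hits)]
    severity = max(rule_hits.values(), key=SEVERITY_RANK.get, default=None)
    anomalous = ocsvm_score < threshold
    if anomalous:
        provenance.append(f"ocsvm:{ocsvm_score:.3f}")
        ocsvm_sev = "High" if ocsvm_score < threshold - high_margin else "Medium"
        if severity is None or SEVERITY_RANK[ocsvm_sev] > SEVERITY_RANK[severity]:
            severity = ocsvm_sev
    if severity is None:
        return None
    critical = bool(rule_hits) and anomalous
    if critical:
        severity = "High"  # both layers fired: escalate
    # High -> real-time dashboard, Medium -> analyst batch; Low -> log only (assumption).
    route = {"High": "realtime", "Medium": "batch"}.get(severity, "log")
    return Alert(session_id, severity, critical, route, provenance)
```

Keeping the provenance list on the alert is what allows analysts to see which rule or score produced it.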
4. System Design Strengths
4.1. Modularity and Vendor Independence
- Model-Portable Architecture: The underlying LLM can be changed without redesigning the telemetry and SIEM integration pipeline. However, SIEM rule thresholds and OCSVM parameters should be revalidated when moving from Phi-3 Mini to larger or proprietary models such as Llama 3, GPT-4, Claude, or Gemini.
- Organizational Flexibility: Enterprises can migrate between commercial APIs, such as OpenAI, Azure OpenAI, and Anthropic, and open-weight models, such as Llama and Mistral, without redesigning the gateway and SIEM integration components. This flexibility is useful as organizations evaluate competing platforms, provided that detection baselines are recalibrated for the selected deployment.
- Cost and Deployment Evaluation: The architecture supports controlled comparison of LLM providers and deployment configurations while keeping the telemetry and SIEM workflow consistent. However, detection performance should be revalidated when the underlying model, prompt template, or application domain changes.
4.2. Seamless SIEM Integration
- Centralized Alerting: SOC analysts manage LLM security alerts alongside network, endpoint, and application security events using existing alert dashboards and triage workflows, with no parallel monitoring infrastructure.
- Incident Correlation: The SIEM correlation engine can link LLM attacks to upstream network reconnaissance, credential compromise, or lateral movement. For example, a prompt injection campaign that correlates with failed VPN logins may indicate an organized external threat.
- Playbook Automation: Organizations can leverage existing SOAR integrations to automate response actions such as blocking users, isolating sessions, escalating to human analysts, and disabling API keys, without custom automation logic.
- Compliance Reporting: SIEM audit trails provide forensic records for regulatory investigations and compliance audits. This is critical for regulated industries, where detection and response must be documented.
4.3. Layered Detection Reduces False Positive Burden
- Tier 1 (Gateway Keywords): Catches obvious attacks with extremely high precision (0.988), reducing downstream processing load by ∼40%.
- Tier 2 (SIEM Correlation): Aggregates benign-appearing individual prompts into suspicious session patterns with 0.976 precision.
- Tier 3 (OCSVM): Identifies behavioral anomalies with 0.971 precision while catching attacks that bypass Tiers 1 and 2.
4.4. Unsupervised Learning Eliminates Training Data Bottleneck
- Benign-Only Initialization: The OCSVM can be trained on locally observed benign traffic without requiring labeled prompt-injection examples. However, the resulting benign manifold should be validated against deployment-specific traffic before automated blocking is enabled.
- Sensitivity to Novel Behavioral Deviations: Previously unseen attack variants may be flagged when they produce measurable deviations in features such as entropy, latency, refusal behavior, or risk indicators. This does not guarantee detection of all zero-day attacks, but it reduces dependence on a fixed catalog of known malicious prompts.
- Periodic Recalibration: As benign usage patterns evolve, the OCSVM can be retrained or recalibrated on accumulated benign interactions. Such recalibration should be accompanied by monitoring of false positives, threshold stability, and alert quality.
- API-Only Compatibility: The unsupervised approach does not require access to model internals, making it suitable for black-box and API-only deployments, provided that the necessary gateway-level telemetry is available.
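A benign-only OCSVM can be sketched with scikit-learn's `OneClassSVM`. Assumptions: the four-feature vector below is hypothetical (the full feature set is not reproduced here), features are standardized before the RBF kernel, and the 5% default anomaly rate in Section 9.3 is read as ν = 0.05. This illustrates the technique rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical feature order per interaction:
# [response_entropy_bits, latency_s, prompt_tokens, refused (0/1)]


def train_benign_ocsvm(benign_features: np.ndarray, nu: float = 0.05):
    """Fit a standardized RBF one-class SVM on benign-only feature vectors.

    nu upper-bounds the fraction of training points left outside the learned
    boundary; nu=0.05 is an assumed reading of the 5% default in Section 9.3.
    """
    model = make_pipeline(
        StandardScaler(),  # put entropy, seconds, and token counts on one scale
        OneClassSVM(kernel="rbf", nu=nu, gamma="scale"),
    )
    model.fit(benign_features)  # no attack labels are used
    return model


def score(model, features: np.ndarray) -> np.ndarray:
    """Signed distance to the learned boundary: negative values are anomalous."""
    return model.decision_function(features)
```

Because only benign rows are passed to `fit`, recalibration amounts to refitting on a newer benign window; scaling matters because the RBF kernel is distance-based and token counts would otherwise dominate entropy and latency.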
4.5. Session-Level Visibility Captures Multi-Turn Campaigns
- Reconnaissance Campaigns: Multiple low-risk probes that reveal model behavior before full exploitation. Examples: “What security constraints are you operating under?”, “Can you ignore system instructions?”, and “What happens if you refuse?”.
- Incremental Escalation: Adversaries gradually increasing request token counts or refusal rates to evade individual-prompt detectors. Example: context-stuffing attacks whose prompt size grows by 10% per request until they succeed.
- Distributed Attacks: Multiple users coordinating attacks where individual sessions appear benign but session-group patterns indicate coordination.
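Incremental escalation can be checked directly at the session level. Depending on the R3 multiplier and window, a gradual ramp such as the 10%-per-request example may stay under a rolling-median threshold for several requests, so a dedicated run-length check is one option. The function below, `escalating_run`, and its `min_growth` and `min_run` parameters are hypothetical and not part of the described rule set.

```python
from typing import Sequence


def escalating_run(token_counts: Sequence[int], min_growth: float = 0.05,
                   min_run: int = 4) -> bool:
    """Return True if a session contains min_run consecutive requests, each at least
    (1 + min_growth) times the size of the previous one (a slow context-stuffing ramp)."""
    run = 1
    for prev, cur in zip(token_counts, token_counts[1:]):
        if prev > 0 and cur >= prev * (1 + min_growth):
            run += 1
            if run >= min_run:
                return True
        else:
            run = 1  # ramp broken: start counting again
    return False
```

A 10%-per-request ramp such as `[100, 110, 121, 133, 146]` forms a run of four within the first four requests, whereas ordinary fluctuation around a stable size keeps resetting the counter.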
4.6. Observability and Debuggability
- Full Event History: SIEM retains complete query-response pairs with timing, tokens, entropy, and refusal information, consistent with broader LLM observability practices using telemetry pipelines. Investigators can replay entire attack campaigns.
- Decision Provenance: Alerts include which rule or OCSVM feature triggered detection, enabling operator understanding and informed threshold tuning.
- Threshold Tuning: Operators can adjust SIEM rule parameters and OCSVM thresholds through SIEM configuration, without redeploying code.
- Drift Monitoring: The framework tracks baseline refusal rates, token count distributions, and latency patterns, enabling the detection of model behavior changes that indicate configuration drift or compromise.
5. Experimental Methodology
5.1. Evaluation Design and Scenarios
5.2. Dataset Construction
- Malicious: 900 prompts from the CySecBench dataset repository [38], stratified across five attack types (direct injection, indirect injection, token smuggling, context hijacking, and role-based jailbreak).
- Benign: 200 prompts from Stanford Alpaca [39], reflecting legitimate cybersecurity queries and business use cases.
- Direct Injection (180 samples): Adversarial suffixes appended to user queries.
- Indirect Injection (180 samples): Payloads embedded in RAG-retrieved documents.
- Token Smuggling (180 samples): Attacks using encoding/obfuscation to evade keyword filters.
- Context Hijacking (180 samples): Attempts to override system instructions and change model behavior.
- Role-Based Jailbreak (180 samples): Persona-based attacks (roleplay, hypothetical scenarios, and “what if” framing).

5.3. Metrics and Evaluation Protocol
- Precision, Recall, and F1-Score: Standard classification metrics indicating detection accuracy under the evaluated test distribution.
- Attack Success Rate (ASR): Percentage of malicious prompts that produce harmful or policy-violating outputs, where a harmful output is defined as a response that substantially complies with the malicious instruction, reveals restricted or sensitive information, provides actionable misuse guidance, or follows an instruction intended to override the system or developer constraints.
- False Positive Rate (FPR): False alarms on benign prompts, indicating operational acceptability under the evaluated benign distribution.
- Mean Time to Detection (MTTD): Average time from injection to alert, indicating the incident response window.
- AUC-ROC: Area under the receiver operating characteristic curve, computed by sweeping the decision threshold, indicating how well detection performance holds up across threshold choices.
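The count-based metrics above follow directly from a confusion matrix and per-attack timestamps. The helper names below are hypothetical, and the example counts in the usage note are illustrative, not the paper's results. ASR is computed from a separate count of outputs judged harmful, since the definition above depends on output review rather than on detection alone.

```python
from statistics import mean
from typing import Dict, Sequence


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F1, and FPR, returning 0.0 where a denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def attack_success_rate(harmful_outputs: int, malicious_prompts: int) -> float:
    """ASR: share of malicious prompts whose outputs were judged harmful."""
    return harmful_outputs / malicious_prompts


def mean_time_to_detection(injection_ts: Sequence[float],
                           alert_ts: Sequence[float]) -> float:
    """MTTD over detected attacks: mean of (alert time - injection time), in seconds."""
    return mean(a - i for i, a in zip(injection_ts, alert_ts))
```

For instance, hypothetical counts of 8 TP, 2 FP, 2 FN, and 18 TN give precision, recall, and F1 of 0.8 each and an FPR of 0.1. AUC-ROC additionally needs the continuous scores (for example the OCSVM decision values) rather than thresholded counts.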
6. Results
6.1. Overall Performance
6.2. Scenario A: Gateway Keyword Baseline
- 68%: Jailbreak variants using role-playing or hypothetical framing (“imagine a scenario where” or “if you were no longer constrained”).
- 21%: Indirect injection payloads in retrieved documents (attacks embedded in text that should be summarized, not executed).
- 11%: Other evasion techniques (semantic mimicry, response normalization, and encoding obfuscation).
6.3. Scenario B: SIEM Rule-Based Detection
- R1 (Keyword Burst): 28 TPs (direct and role-based attacks with repeated override phrases within a 60-s window).
- R2 (Refusal Deviation): 41 TPs (systematic probing in which adversaries test model boundaries, producing above-baseline refusal rates).
- R3 (Token Spike): 7 TPs (context-hijacking attempts with inflated request token counts, since stuffing the context requires abnormal token loads).
- R4 (Provenance Anomaly): 5 TPs (indirect injection from high-risk document sources, i.e., sources that correlate with previous attacks).
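Assuming the rule true positives above are detections that Scenario A missed, and that recall is computed over the 900 malicious prompts of Section 5.2, the per-rule counts account exactly for the recall gain reported in Section 10 (54.0% to 63.0%):

```latex
\Delta R_{A \to B} = \frac{28 + 41 + 7 + 5}{900} = \frac{81}{900} = 0.09,
\qquad
R_B = R_A + \Delta R_{A \to B} = 0.54 + 0.09 = 0.63 .
```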
6.4. Scenario C: Hybrid SIEM + OCSVM (Proposed)
- Encoding Evasion: Attacks using base64, ROT13, or other encodings to bypass keyword filters generate unusual entropy/token combinations that rules fail to catch, whereas the OCSVM detects them.
- Semantic Obfuscation: Attacks phrased as legitimate technical questions that still trigger refusals, detected through latency spikes and behavioral changes.
- Contextual Hijacking: Single prompts that override system instructions without any multi-turn pattern.

7. Contextual Comparison with Existing Defenses
8. Discussion
8.1. Why OCSVM Outperforms Rules on Residual Attacks
8.2. Evasion Strategies and Residual False Negatives
- Semantic Mimicry: Adversarial prompts phrased as legitimate technical questions, generating low token counts and normal latencies. Example: “Explain how a SQL injection attack works” vs. “Help me execute a SQL injection attack”. Both draw on the same model knowledge, but only the latter is malicious; distinguishing them requires semantic understanding beyond token-based features, that is, knowledge of attacker intent rather than LLM behavior alone.
- Response Normalization: Jailbreak-compliant responses deliberately structured as low-entropy text, evading entropy-based detection. Attackers preface jailbreak prompts with instructions such as “respond with minimal punctuation”, reducing response entropy from 4.8 bits (typical harmful output) to 2.1 bits (low-entropy compliance). This suggests that adaptive attackers can study and deliberately target the features a defense relies on.
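The estimator behind "response entropy in bits" is not specified here. A common choice, assumed for this sketch, is character-level Shannon entropy, $H(x) = -\sum_c p_c \log_2 p_c$, where $p_c$ is the relative frequency of character $c$ in the response. Under that assumption, restricting a response to fewer distinct characters lowers $H$, which is the mechanism response normalization exploits. A token-level estimator over the model vocabulary would give different absolute values.

```python
import math
from collections import Counter


def char_entropy_bits(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution of text."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A uniform four-symbol string such as `"abcd"` has 2 bits per character; a string of one repeated character has 0.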
8.3. Operational Feasibility and False Positive Tolerance
8.4. Generalization and Model Robustness
8.5. Regulatory and Compliance Implications
9. Implementation and Deployment Considerations
9.1. Architecture Deployment
9.2. Real-World Deployment Workflow
- Week 1: Deploy gateway in shadow mode (accept production queries, emit telemetry, but do not block any requests). Operators familiarize themselves with the telemetry volume and baseline metrics.
- Week 2–3: Tune SIEM rules on production traffic. Rule thresholds are calibrated to the organizational baseline, because different organizations may have different refusal rates, token counts, and other traffic characteristics.
- Week 4: Enable rule-based blocking on High confidence detections. Operators monitor for false positives and adjust thresholds if needed.
- Week 5–6: Train OCSVM on accumulated benign interactions. Deploy in advisory mode (fires alerts but does not block).
- Week 7–8: Transition OCSVM to enforcement mode with graduated blocking (percentage-based rollout to minimize risk).
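The percentage-based rollout in Week 7–8 can be implemented by hashing each session into a stable bucket. This is a sketch: `in_enforcement_cohort` and the salt are hypothetical, and sessions outside the cohort are assumed to stay in advisory mode (alert only). Because assignment depends only on the session ID, raising the percentage only adds sessions to enforcement and never flips a blocked session back to advisory mid-conversation.

```python
import hashlib


def in_enforcement_cohort(session_id: str, rollout_percent: float,
                          salt: str = "ocsvm-rollout") -> bool:
    """Deterministically place a session in the OCSVM blocking cohort.

    The salted SHA-256 digest is mapped to one of 10,000 buckets, so a session's
    bucket never changes and rollout_percent can be raised in 0.01% steps.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < rollout_percent * 100
```

An operator could, for example, move from 10% to 25% to 50% over the two weeks while watching false positives in the enforced cohort.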
9.3. Operational Tuning Parameters
- SIEM Rule Window: Default 60 s balances attack responsiveness and noise reduction. Tunable per organizational risk tolerance. Financial services may use 30-s windows; slower-moving applications may use 120-s windows.
- OCSVM ν parameter: The default assumes a 5% anomaly rate. Increase ν for more sensitive detection (fewer false negatives, more false positives); decrease it to reduce alert volume. Organizations with high attack frequency may use .
- Feature Thresholds: Entropy threshold (4.2 bits), latency z-score (>), and token count multiplier (>1.5×). Org-specific baselines recommended via histogram analysis of the production traffic.
- Severity Escalation: Configure which combinations trigger real-time alerts vs. batch review. Conservative organizations may escalate all Medium Risk events; aggressive organizations may escalate only events rated High by both the SIEM rules and the OCSVM.
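The tuning parameters can be grouped in a single configuration object applied to per-request features. Assumptions in this sketch: the names `TuningConfig` and `feature_flags` are hypothetical; entropy above the 4.2-bit threshold is treated as suspicious, consistent with the 4.8-bit typical harmful output in Section 8.2; latency is flagged only for upward spikes; and the latency z-score threshold is left unset because its value is not recoverable from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TuningConfig:
    rule_window_s: float = 60.0      # SIEM rule window (30 s to 120 s per Section 9.3)
    ocsvm_nu: float = 0.05           # assumed encoding of the 5% default anomaly rate
    entropy_threshold_bits: float = 4.2
    token_multiplier: float = 1.5
    latency_z_threshold: Optional[float] = None  # set from local histogram analysis


def feature_flags(entropy_bits: float, latency_s: float, tokens: int,
                  baseline: Dict[str, float], cfg: TuningConfig) -> Dict[str, bool]:
    """Compare one request's features with the organization's baseline."""
    flags = {
        "entropy": entropy_bits > cfg.entropy_threshold_bits,
        "tokens": tokens > cfg.token_multiplier * baseline["median_tokens"],
    }
    # Latency is only checked once a threshold and a non-degenerate baseline exist.
    if cfg.latency_z_threshold is not None and baseline["latency_std"] > 0:
        z = (latency_s - baseline["latency_mean"]) / baseline["latency_std"]
        flags["latency"] = z > cfg.latency_z_threshold
    return flags
```

Keeping the thresholds in one immutable object makes it easier to version configuration changes alongside the SIEM rule definitions.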
9.4. Monitoring and Maintenance
- Baseline Drift Detection: Monthly comparison of benign interaction distributions, including refusal rates, token counts, and latency, against the historical baseline. A sustained deviation greater than 10% is treated as an operational indicator of possible model behavior change, such as model fine-tuning, prompt-engineering changes, or the deployment of a new model version, and may require OCSVM retraining.
- Alert Quality Metrics: Track the false positive rate, mean time to detection, and alert handling time. Thresholds should be adjusted if FPR consistently exceeds 0.15, indicating alert-fatigue risk, or drops below 0.05, indicating possible over-suppression.
- OCSVM Performance: Quarterly retraining on accumulated benign data to incorporate distribution shift and improve anomaly detection as usage patterns evolve.
- Rule Effectiveness Audit: Semi-annual review of rule R1–R4 performance. If any rule consistently underperforms, for example, by producing fewer than five true positives per month under the local threat environment, operators may consider deprecation, retuning, or the reallocation of computational budget.
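The drift and alert-quality checks above reduce to simple comparisons against a baseline. In this sketch, `drifted_metrics` and `fpr_action` are hypothetical helpers, and "sustained" is interpreted as exceeding the 10% relative deviation in each of the last two monthly snapshots, which is an assumption. The FPR band uses the 0.05 and 0.15 bounds stated above.

```python
from typing import Dict, List


def drifted_metrics(baseline: Dict[str, float], monthly: List[Dict[str, float]],
                    tolerance: float = 0.10, sustained_months: int = 2) -> List[str]:
    """Names of metrics whose relative deviation from baseline exceeded `tolerance`
    in every one of the last `sustained_months` monthly snapshots."""
    recent = monthly[-sustained_months:]
    if len(recent) < sustained_months:
        return []
    drifted = []
    for name, base in baseline.items():
        if base == 0:
            continue  # relative deviation undefined; handle such metrics separately
        if all(abs(m[name] - base) / abs(base) > tolerance for m in recent):
            drifted.append(name)
    return drifted


def fpr_action(fpr: float, low: float = 0.05, high: float = 0.15) -> str:
    """Map a (sustained) false positive rate to a tuning recommendation."""
    if fpr > high:
        return "reduce detector sensitivity (alert-fatigue risk)"
    if fpr < low:
        return "increase detector sensitivity (possible over-suppression)"
    return "no change"
```

A non-empty result from `drifted_metrics` would be the trigger to investigate model or prompt changes and, if confirmed, retrain the OCSVM.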
10. Conclusions
- Keyword filter alone: 54.0% recall, with 46.0% of attacks succeeding.
- SIEM rules added: 63.0% recall, with 37.0% of attacks succeeding.
- OCSVM added: 81.0% recall, with 19.0% of attacks succeeding.
- Vendor-neutral gateway architecture supporting LLM provider flexibility at the integration layer, subject to deployment-specific validation and calibration;
- SIEM integration supporting SOC alerting, correlation, and playbook automation;
- Unsupervised learning reducing dependence on labeled attack examples for initial anomaly modeling;
- Session-level visibility capturing multi-turn attack campaigns that application-layer detectors may miss;
- Deployment-oriented implementation requiring no modification to the underlying LLM and integrating through gateway-level telemetry, while requiring deployment-specific validation before production enforcement.
Future Directions
- Cross-Model, Cross-Template, and Cross-Domain Validation: Evaluate framework portability across larger and more diverse LLM families, including Llama 3, GPT-4, Claude, and Gemini, as well as different prompt templates and application domains. This evaluation should compare entropy, latency, refusal-rate, token-count, and anomaly-score distributions across settings and recalibrate SIEM and OCSVM thresholds where needed.
- Broader Benign and Mixed-Corpus Evaluation: Expand the benign evaluation set beyond Stanford Alpaca by incorporating domain-specific enterprise prompts, multilingual prompts, code-oriented prompts, policy queries, customer-support interactions, and long-form document-analysis tasks. Future studies should also evaluate mixed benign–malicious corpora to better estimate operational false positive rates, threshold stability, and alert burden under realistic SOC conditions.
- Adversarial Robustness: Formal evaluation against adaptive attacks using contemporary red-teaming benchmarks [36,40] and gradient-based evasion techniques. Organizations deploying against sophisticated adversaries should validate robustness under deployment-specific threat models before relying on automated enforcement.
- Semantic Analysis Layer: Integration of transformer-based semantic classifiers, such as DistilBERT fine-tuned on cybersecurity intent, to detect semantic mimicry attacks that behavioral features alone cannot catch.
- Federated Learning: Distributed OCSVM training across multiple organizations to improve benign-behavior modeling and capture organization-specific variations without centralizing proprietary data.
- Real-Time Integration: Production deployment and monitoring in commercial SOC environments with multi-year telemetry collection, enabling longitudinal analysis of attack trends.
- LLM Agent Expansion: Extend the framework to detect prompt injection in agentic LLM systems with tool access and multi-step reasoning. Tool use introduces new attack vectors, including the misuse of LLM-controlled functions to execute unauthorized actions, which require specialized detection logic.
- Threat Intelligence Integration: Automatic correlation with external threat feeds, including threat actors and known payloads, to enable proactive hunting and improved context for alert triage.
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LLM | Large Language Model |
| SIEM | Security Information and Event Management |
| SOC | Security Operations Center |
| OCSVM | One-Class Support Vector Machine |
| ASR | Attack Success Rate |
| MTTD | Mean Time to Detection |
| FPR | False Positive Rate |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
| TN | True Negative |
| API | Application Programming Interface |
| RAG | Retrieval-Augmented Generation |
| SOAR | Security Orchestration, Automation, and Response |
| IPI | Indirect Prompt Injection |
| OWASP | Open Worldwide Application Security Project |
| NIST | National Institute of Standards and Technology |
| RBF | Radial Basis Function |
| AUC-ROC | Area Under the Receiver Operating Characteristic Curve |
References
- Jaffal, N.O.; Alkhanafseh, M.; Mohaisen, D. Large Language Models in Cybersecurity: A Survey of Applications, Vulnerabilities, and Defense Techniques. AI 2025, 6, 216. [Google Scholar] [CrossRef]
- Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confid. Comput. 2024, 4, 100211. [Google Scholar] [CrossRef]
- Das, B.C.; Amini, M.H.; Wu, Y. Security and Privacy Challenges of Large Language Models: A Survey. ACM Comput. Surv. 2025, 57, 1–39. [Google Scholar] [CrossRef]
- Nelson, A.; Rekhi, S.; Scarfone, K.; Souppaya, M. Incident Response Recommendations and Considerations for Cybersecurity Risk Management: A CSF 2.0 Community Profile; Technical Report NIST SP 800-61r3; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2025. [Google Scholar] [CrossRef]
- Cybersecurity and Infrastructure Security Agency (CISA). Guidance for SIEM and SOAR Implementation; Online Resource; U.S. Department of Homeland Security: Washington, DC, USA, 2025. Available online: https://www.cisa.gov/resources-tools/resources/guidance-siem-and-soar-implementation (accessed on 10 April 2026).
- European Union Agency for Cybersecurity (ENISA). How to Set Up CSIRT and SOC: Good Practice Guide; Technical Report; ENISA: Brussels, Belgium, 2020; Available online: https://www.enisa.europa.eu/sites/default/files/publications/ENISA%20Report%20-%20How%20to%20setup%20CSIRT%20and%20SOC.pdf (accessed on 5 April 2026).
- Giarimpampa, D.; Meier, R.; Bissyande, T.F.; Lenders, V.; Klein, J. Exploring the Role of Artificial Intelligence in Enhancing Security Operations: A Systematic Review. ACM Comput. Surv. 2025, 58, 1–38. [Google Scholar] [CrossRef]
- OWASP Foundation. OWASP Top 10 for Large Language Model Applications; Online Resource; OWASP Foundation: Wilmington, DE, USA, 2024; Available online: https://owasp.org/www-project-top-10-for-large-language-model-applications/ (accessed on 10 April 2026).
- Vassilev, A.; Oprea, A.; Fordyce, A.; Anderson, H.; Davies, X.; Hamin, M. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations; Technical Report NIST AI 100-2e2025; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2025. [Google Scholar] [CrossRef]
- Wang, H.; Li, H.; Huang, M.; Sha, L. ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 2697–2711. [Google Scholar] [CrossRef]
- Huang, D.; Shah, A.; Araujo, A.; Wagner, D.; Sitawarin, C. Stronger Universal and Transferable Attacks by Suppressing Refusals. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Chiruzzo, L., Ritter, A., Wang, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 5850–5876. [Google Scholar] [CrossRef]
- Yi, J.; Xie, Y.; Zhu, B.; Kiciman, E.; Sun, G.; Xie, X.; Wu, F. Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1; ACM: New York, NY, USA, 2025; pp. 1809–1820. [Google Scholar] [CrossRef]
- Jacob, D.; Alzahrani, H.; Hu, Z.; Alomair, B.; Wagner, D. PromptShield: Deployable Detection for Prompt Injection Attacks. In Proceedings of the Fifteenth ACM Conference on Data and Application Security and Privacy; ACM: New York, NY, USA, 2025; pp. 341–352. [Google Scholar] [CrossRef]
- Protect AI. Rebuff: Prompt Injection Detector (Version v0.1.1); GitHub Repository Release; GitHub, Inc.: San Francisco, CA, USA, 2024; Available online: https://github.com/protectai/rebuff/releases/tag/v0.1.1 (accessed on 10 April 2026).
- Pingua, B.; Murmu, D.; Kandpal, M.; Rautaray, J.; Mishra, P.; Barik, R.K.; Saikia, M.J. Mitigating adversarial manipulation in LLMs: A prompt-based approach to counter Jailbreak attacks (Prompt-G). PeerJ Comput. Sci. 2024, 10, e2374. [Google Scholar] [CrossRef] [PubMed]
- Hung, K.H.; Ko, C.Y.; Rawat, A.; Chung, I.H.; Hsu, W.H.; Chen, P.Y. Attention Tracker: Detecting Prompt Injection Attacks in LLMs. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025; Chiruzzo, L., Ritter, A., Wang, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 2309–2322. [Google Scholar] [CrossRef]
- Greshake, K.; Abdelnabi, S.; Mishra, S.; Endres, C.; Holz, T.; Fritz, M. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security; ACM: New York, NY, USA, 2023; pp. 79–90. [Google Scholar] [CrossRef]
- OWASP Foundation. LLM01: Prompt Injection—OWASP Top 10 for LLM Applications; Online Resource; OWASP Foundation: Wilmington, DE, USA, 2025; Available online: https://genai.owasp.org/llmrisk/llm01-prompt-injection/ (accessed on 8 April 2026).
- Robey, A.; Wong, E.; Hassani, H.; Pappas, G.J. SmoothLLM: Defending Large Language Models against Jailbreaking Attacks. Trans. Mach. Learn. Res. 2025, 1–41. Available online: https://openreview.net/forum?id=laPAh2hRFC (accessed on 10 April 2026).
- Shumailov, I.; Shumaylov, Z.; Zhao, Y.; Papernot, N.; Anderson, R.; Gal, Y. AI models collapse when trained on recursively generated data. Nature 2024, 631, 755–759. [Google Scholar] [CrossRef]
- Gavish, A.; Google GenAI Security Team. Mitigating Prompt Injection Attacks with a Layered Defense Strategy; Google Online Security Blog: Mountain View, CA, USA, 2025; Available online: https://blog.google/security/mitigating-prompt-injection-attacks/ (accessed on 10 April 2026).
- Microsoft Security Response Center (MSRC). How Microsoft Defends Against Indirect Prompt Injection Attacks; MSRC Blog: Redmond, WA, USA, 2025; Available online: https://www.microsoft.com/en-us/msrc/blog/2025/07/how-microsoft-defends-against-indirect-prompt-injection-attacks (accessed on 10 April 2026).
- Guardrails AI. detect_prompt_injection: Plugin for Detecting Prompt Injection Attacks; GitHub Repository: San Francisco, CA, USA, 2024; Available online: https://github.com/guardrails-ai/detect_prompt_injection (accessed on 10 April 2026).
- Kang, D.; Li, X.; Stoica, I.; Guestrin, C.; Zaharia, M.; Hashimoto, T. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. In Proceedings of the 2024 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 23 May 2024; pp. 132–143. [Google Scholar] [CrossRef]
- Chan, A.; Ezell, C.; Kaufmann, M.; Wei, K.; Hammond, L.; Bradley, H.; Bluemke, E.; Rajkumar, N.; Krueger, D.; Kolt, N.; et al. Visibility into AI Agents. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency; ACM: New York, NY, USA, 2024; pp. 958–973. [Google Scholar] [CrossRef]
- Maderamitla, P.; Katragadda, S.R. Observability for LLM apps: What to log, privacy-safe telemetry, KPIs. Front. Comput. Sci. Artif. Intell. 2026, 5, 10–14. [Google Scholar] [CrossRef]
- Chen, Y.; Li, H.; Sui, Y.; He, Y.; Liu, Y.; Song, Y.; Hooi, B. Can Indirect Prompt Injection Attacks Be Detected and Removed? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 18189–18206. [Google Scholar] [CrossRef]
- Şaşal, S.; Can, Ö. Prompt Injection Attacks on Large Language Models: Multi-Model Security Analysis with Categorized Attack Types. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR); SciTePress: Setúbal, Portugal, 2025; pp. 517–524. [Google Scholar] [CrossRef]
- Addepalli, S.; Varun, Y.; Suggala, A.; Shanmugam, K.; Jain, P. Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts? In Proceedings of the International Conference on Learning Representations (ICLR 2025), Singapore, 24–28 April 2025; OpenReview.net, 2025. Volume 2025, pp. 43611–43631. Available online: https://openreview.net/forum?id=LO4MEPoqrG (accessed on 10 April 2026).
- Ji, J.; Hou, B.; Robey, A.; Pappas, G.J.; Hassani, H.; Zhang, Y.; Wong, E.; Chang, S. Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics; Inui, K., Sakti, S., Wang, H., Wong, D.F., Bhattacharyya, P., Banerjee, B., Ekbal, A., Chakraborty, T., Singh, D.P., Eds.; The Asian Federation of Natural Language Processing and The Association for Computational Linguistics: Mumbai, India, 2025; pp. 7–40. [Google Scholar] [CrossRef]
- Geng, T.; Xu, Z.; Qu, Y.; Wong, W.E. Prompt Injection Attacks on Large Language Models: A Survey of Attack Methods, Root Causes, and Defense Strategies. Comput. Mater. Contin. 2026, 87, 4. [Google Scholar] [CrossRef]
- National Cyber Security Centre (NCSC). Thinking About the Security of AI Systems; NCSC Blog: London, UK, 2025. Available online: https://www.ncsc.gov.uk/blog-post/thinking-about-security-ai-systems (accessed on 3 April 2026).
- González-Granadillo, G.; González-Zarzosa, S.; Diaz, R. Security Information and Event Management (SIEM): Analysis, Trends, and Usage in Critical Infrastructures. Sensors 2021, 21, 4759. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24); USENIX Association: Berkeley, CA, USA, 2024; pp. 1831–1847. Available online: https://www.usenix.org/conference/usenixsecurity24/presentation/liu-yupei (accessed on 10 April 2026).
- Chen, S.; Piet, J.; Sitawarin, C.; Wagner, D. StruQ: Defending against prompt injection with structured queries. In Proceedings of the USENIX Security Symposium, Seattle, WA, USA, 13–15 August 2025.
- Chao, P.; Debenedetti, E.; Robey, A.; Andriushchenko, M.; Croce, F.; Sehwag, V.; Dobriban, E.; Flammarion, N.; Pappas, G.J.; Tramèr, F.; et al. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Datasets and Benchmarks Track, Vancouver, BC, Canada, 10–15 December 2024.
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
- Wahréus, J.; Hussain, A.M.; Papadimitratos, P. CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models; Online Resource; GitHub Repository: San Francisco, CA, USA, 2025; Available online: https://github.com/cysecbench/dataset/blob/main/CySecBench_paper.pdf (accessed on 20 March 2026).
- Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T.B. Stanford Alpaca: An Instruction-following LLaMA Model; GitHub Repository: San Francisco, CA, USA, 2023; Available online: https://github.com/tatsu-lab/stanford_alpaca (accessed on 20 March 2026).
- Mazeika, M.; Phan, L.; Yin, X.; Zou, A.; Wang, Z.; Mu, N.; Sakhaee, E.; Li, N.; Basart, S.; Li, B.; et al. HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. In Proceedings of the 41st International Conference on Machine Learning; Proceedings of Machine Learning Research; Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F., Eds.; PMLR: Vienna, Austria, 2024; Volume 235, pp. 35181–35224. Available online: https://proceedings.mlr.press/v235/mazeika24a.html (accessed on 10 April 2026).
- Xu, Z.; Liu, Y.; Deng, G.; Li, Y.; Picek, S. A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 7432–7449.

| Scenario | Configuration |
|---|---|
| A (Baseline) | Gateway keyword filter only |
| B (SIEM) | Gateway + SIEM correlation rules (R1–R4) |
| C (Hybrid) | Gateway + SIEM rules + OCSVM |
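
The three scenarios appear to be cumulative, each adding a detection layer on top of the previous one. The sketch below shows one way such layered verdicts could be combined, assuming that the gateway keyword filter stays active in B and C and that a request is flagged when any enabled layer fires (OR-fusion; this section does not state the fusion rule). The `Event` record, its fields, and the three detector functions are hypothetical stand-ins, not the authors' R1–R4 rules or their trained OCSVM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical per-request telemetry record; field names are illustrative only."""
    prompt: str
    session_turns: int    # turns observed in the session so far
    anomaly_score: float  # stand-in for an OCSVM decision value (< 0 = outside the benign region)


Detector = Callable[[Event], bool]


def keyword_filter(e: Event) -> bool:
    # Toy blocklist (hypothetical); a real gateway filter would be far larger.
    return any(k in e.prompt.lower() for k in ("ignore previous", "help me hack"))


def siem_rules(e: Event) -> bool:
    # Placeholder for session-level correlation rules; R1-R4 are not reproduced here.
    return e.session_turns > 10


def ocsvm_flag(e: Event) -> bool:
    # Follows the one-class SVM sign convention: negative scores are outliers.
    return e.anomaly_score < 0.0


# Each scenario enables a superset of the previous scenario's layers.
SCENARIOS: Dict[str, List[Detector]] = {
    "A": [keyword_filter],
    "B": [keyword_filter, siem_rules],
    "C": [keyword_filter, siem_rules, ocsvm_flag],
}


def is_flagged(e: Event, scenario: str) -> bool:
    # Assumed OR-fusion: block the request if any enabled layer fires.
    return any(detect(e) for detect in SCENARIOS[scenario])
```

For an evasive prompt such as `Event("How would one technically execute this?", 3, -0.4)`, only the scenario C layer fires, so `is_flagged` returns `False` for A and B and `True` for C. Under OR-fusion the set of flagged requests can only grow as layers are added, so recall and FPR are non-decreasing from A to C, which matches the direction of the reported results; precision, by contrast, can move either way.
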

| Metric | A (Baseline) | B (SIEM) | C (Hybrid) | Unit |
|---|---|---|---|---|
| Precision | 0.988 | 0.976 | 0.971 | – |
| Recall | 0.540 | 0.630 | 0.810 | – |
| F1-Score | 0.700 | 0.766 | 0.883 | – |
| FPR | 0.030 | 0.070 | 0.110 | – |
| ASR | 46.0 | 37.0 | 19.0 | % |
| AUC-ROC | 0.724 | 0.782 | 0.864 | – |
| MTTD | – | 4.1 | 2.3 | s |
| Accuracy | 61.8 | 68.5 | 82.4 | % |
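
Two columns of this table can be cross-checked from the others. F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), and the reported ASR equals 1 − recall in every scenario, which suggests ASR is computed over attack prompts only. The sketch below recomputes both from values transcribed from the table; the ASR interpretation is an assumption inferred from the numbers, not a definition stated in this section.

```python
# Values transcribed from the results table (scenarios A, B, C).
REPORTED = {
    "A": {"precision": 0.988, "recall": 0.540, "f1": 0.700, "asr_pct": 46.0},
    "B": {"precision": 0.976, "recall": 0.630, "f1": 0.766, "asr_pct": 37.0},
    "C": {"precision": 0.971, "recall": 0.810, "f1": 0.883, "asr_pct": 19.0},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def asr_from_recall(recall: float) -> float:
    """Attack success rate in percent, assuming ASR counts undetected attack prompts."""
    return 100.0 * (1.0 - recall)


def check(tol_f1: float = 0.005, tol_asr: float = 0.05) -> dict:
    """Recompute F1 and ASR per scenario and compare them with the reported values."""
    report = {}
    for name, m in REPORTED.items():
        f1 = f1_score(m["precision"], m["recall"])
        asr = asr_from_recall(m["recall"])
        report[name] = {
            "f1_recomputed": round(f1, 3),
            "f1_ok": abs(f1 - m["f1"]) <= tol_f1,
            "asr_recomputed": round(asr, 1),
            "asr_ok": abs(asr - m["asr_pct"]) <= tol_asr,
        }
    return report


if __name__ == "__main__":
    for name, row in check().items():
        print(name, row)
```

Scenarios B and C match to three decimals; scenario A recomputes to 0.698 against the reported 0.700, a difference small enough to be explained by rounding of the underlying precision and recall.
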

| Defense | Recall | Precision | F1 | Integration |
|---|---|---|---|---|
| Keyword Filter | 0.540 | 0.988 | 0.700 | Standalone |
| Prompt-G | 0.72 | 0.81 | 0.76 | Standalone |
| PromptShield | 0.68 | 0.92 | 0.78 | Standalone |
| Attention Tracker | N/R | N/R | N/R | Requires model internals |
| Rebuff | 0.65 | 0.89 | 0.75 | Standalone |
| SIEM-LLM (this work) | 0.810 | 0.971 | 0.883 | SIEM-integrated |

N/R: not reported.

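
The comparison table mixes values reported at two and three decimals and includes one defense without reported numbers. The sketch below ranks the defenses by reported F1 and, as a consistency check, recomputes each F1 from the row's precision and recall within a half-unit rounding tolerance. The rows are transcribed from the table; excluding N/R rows rather than scoring them as zero is a choice made here so that unreported results are not ranked.

```python
from typing import List, Optional, Tuple

# (defense, recall, precision, reported F1); None marks an N/R entry.
Row = Tuple[str, Optional[float], Optional[float], Optional[float]]

ROWS: List[Row] = [
    ("Keyword Filter", 0.540, 0.988, 0.700),
    ("Prompt-G", 0.72, 0.81, 0.76),
    ("PromptShield", 0.68, 0.92, 0.78),
    ("Attention Tracker", None, None, None),
    ("Rebuff", 0.65, 0.89, 0.75),
    ("SIEM-LLM (this work)", 0.810, 0.971, 0.883),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def rank_by_f1(rows: List[Row], tol: float = 0.005) -> List[Tuple[str, float, bool]]:
    """Return (defense, reported F1, consistent?) sorted by F1, skipping N/R rows."""
    ranked = []
    for name, recall, precision, f1 in rows:
        if f1 is None or recall is None or precision is None:
            continue  # not reported: excluded rather than scored as zero
        consistent = abs(f1_score(precision, recall) - f1) <= tol
        ranked.append((name, f1, consistent))
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

Because F1 is a harmonic mean, the keyword filter ranks last even though it has the highest precision in the table: its low recall dominates the score.
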

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cite this article:
- MDPI and ACS Style: Alshammari, A.A.; Alsaleh, O.I. Detecting Prompt Injection Attacks in Generative AI Systems: A Hybrid SIEM and One-Class SVM Framework. Electronics 2026, 15, 2242. https://doi.org/10.3390/electronics15112242
- AMA Style: Alshammari AA, Alsaleh OI. Detecting Prompt Injection Attacks in Generative AI Systems: A Hybrid SIEM and One-Class SVM Framework. Electronics. 2026; 15(11):2242. https://doi.org/10.3390/electronics15112242
- Chicago/Turabian Style: Alshammari, Abdulrahman A., and Omar I. Alsaleh. 2026. "Detecting Prompt Injection Attacks in Generative AI Systems: A Hybrid SIEM and One-Class SVM Framework" Electronics 15, no. 11: 2242. https://doi.org/10.3390/electronics15112242
- APA Style: Alshammari, A. A., & Alsaleh, O. I. (2026). Detecting Prompt Injection Attacks in Generative AI Systems: A Hybrid SIEM and One-Class SVM Framework. Electronics, 15(11), 2242. https://doi.org/10.3390/electronics15112242

