A Thirty-Day Dataset of Malicious HTTP Requests Blocked by OWASP ModSecurity on a Production Web Server
Abstract
1. Introduction
1.1. Key Contributions
1.2. Research Utility
2. Data and Description
2.1. Scope
- Coverage period: 30 consecutive days (27 July 2025–25 August 2025), randomly selected
- Total requests: 147,205
- Daily average: 4907
- Data format: raw ModSecurity audit log
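The scope figures agree with each other. Counting both endpoints, 27 July to 25 August 2025 spans 30 days, and 147,205 / 30 ≈ 4906.8 requests per day, which rounds to the reported 4907. A quick check in Python:

```python
from datetime import date

# Figures taken from the Scope list above.
START = date(2025, 7, 27)
END = date(2025, 8, 25)
TOTAL_REQUESTS = 147_205

# Both endpoints belong to the coverage period, hence the +1.
days = (END - START).days + 1
daily_average = TOTAL_REQUESTS / days

print(days)                  # 30
print(round(daily_average))  # 4907
```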
2.2. Dataset Fields
2.3. Data Sample
--c30afe70-A--
[01/Aug/2025:19:59:37 +0200] aI0AibHA3gfwyP1Wo76@lwAAAAg 100.122.171.187 47574 100.115.127.60 443
--c30afe70-B--
GET /.env HTTP/1.1
Host: www.d4941dc.hu
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Connection: keep-alive
Accept-Encoding: gzip
--c30afe70-F--
HTTP/1.1 404 Not Found
Upgrade: h2
Connection: Upgrade, Keep-Alive
Last-Modified: Fri, 08 Dec 2023 13:47:02 GMT
ETag: "70f-60bffd1aa7854"
Accept-Ranges: bytes
Content-Length: 1807
Keep-Alive: timeout=1, max=100
Content-Type: text/html
--c30afe70-H--
Message: Warning. Matched phrase "/.env" at REQUEST_FILENAME. [file "/usr/share/modsecurity-crs/rules/REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "125"] [id "930130"] [msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.3"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"] [tag "WASCTC/WASC-33"] [tag "OWASP_TOP_10/A4"] [tag "PCI/6.5.4"]
Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 100.122.171.187] ModSecurity: Warning. Matched phrase "/.env" at REQUEST_FILENAME. [file "/usr/share/modsecurity-crs/rules/REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "125"] [id "930130"] [msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.3"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"] [tag "WASCTC/WASC-33"] [tag "OWASP_TOP_10/A4"] [tag "PCI/6.5.4"] [hostname "www.d4941dc.hu"] [uri "/.env"] [unique_id "aI0AibHA3gfwyP1Wo76@lwAAAAg"]
Stopwatch: 1754071177142684 2452 (- - -)
Stopwatch2: 1754071177142684 2452; combined=1448, p1=595, p2=743, p3=0, p4=0, p5=108, sr=100, sw=2, l=0, gc=0
Producer: ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/); OWASP_CRS/3.2.3.
Server: Apache
Engine-Mode: "ENABLED"
--c30afe70-Z--
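The sample follows the ModSecurity 2 serial audit log layout: each entry is delimited by boundary lines of the form --<id>-<letter>--, where section A holds the timestamp, unique ID and connection endpoints, B the request headers, F the response headers, H the trailer with the rule matches, and Z closes the entry. The Python sketch below reads entries in that layout. It assumes entries are stored back to back in plain text and that every rule match in section H starts with "Message:"; the function names are illustrative and not part of any ModSecurity tooling.

```python
from __future__ import annotations

import re

# Boundary lines look like "--c30afe70-A--": a per-entry hex ID plus a section letter.
BOUNDARY = re.compile(r"^--([0-9a-f]+)-([A-Z])--$")
RULE_ID = re.compile(r'\[id "(\d+)"\]')


def iter_entries(log_text: str):
    """Yield one audit log entry at a time; the Z boundary closes each entry."""
    buf: list[str] = []
    for line in log_text.splitlines():
        buf.append(line)
        m = BOUNDARY.match(line.strip())
        if m and m.group(2) == "Z":
            yield "\n".join(buf)
            buf = []


def split_sections(entry: str) -> dict[str, list[str]]:
    """Split one entry into {section letter: lines belonging to that section}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in entry.splitlines():
        m = BOUNDARY.match(line.strip())
        if m:
            current = m.group(2)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def response_status(sections: dict[str, list[str]]) -> int | None:
    """Status code from the status line that opens section F (response headers)."""
    lines = [line for line in sections.get("F", []) if line.strip()]
    if not lines:
        return None
    parts = lines[0].split()
    return int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None


def matched_rule_ids(sections: dict[str, list[str]]) -> list[str]:
    """Rule IDs from the 'Message:' lines of section H (the trailer)."""
    return [
        m.group(1)
        for line in sections.get("H", [])
        if line.startswith("Message:")
        for m in RULE_ID.finditer(line)
    ]


# Abridged copy of the sample entry above.
SAMPLE = """--c30afe70-A--
[01/Aug/2025:19:59:37 +0200] aI0AibHA3gfwyP1Wo76@lwAAAAg 100.122.171.187 47574 100.115.127.60 443
--c30afe70-B--
GET /.env HTTP/1.1
Host: www.d4941dc.hu
--c30afe70-F--
HTTP/1.1 404 Not Found
Content-Type: text/html
--c30afe70-H--
Message: Warning. Matched phrase "/.env" at REQUEST_FILENAME. [id "930130"] [msg "Restricted File Access Attempt"]
--c30afe70-Z--
"""

for entry in iter_entries(SAMPLE):
    sections = split_sections(entry)
    print(sorted(sections))            # ['A', 'B', 'F', 'H', 'Z']
    print(response_status(sections))   # 404
    print(matched_rule_ids(sections))  # ['930130']
```

Because transactions answered with "200 OK" were removed during preparation, response_status() should not return 200 for entries in the published files. The Apache-Error line repeats the same rule metadata, so only "Message:" lines are counted to avoid double-counting a match.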
2.4. Data Organization
3. Methods
3.1. Data Collection
3.2. Anonymization
- IPv4 addresses were consistently remapped to 100.64.0.0/10.
- IPv6 addresses were consistently remapped to 2001:db8::/32.
- Domains (in Host and Authority values and in URLs): the public suffix/TLD and the label count were preserved, and the labels to the left of the suffix were anonymized.
- Directory paths in HTTP(S) URLs and request targets were anonymized into stable per-segment tokens, with the slashes preserved.
- Filesystem access paths starting with /var/www/ were anonymized per segment, with the final filename preserved.
- Cookies and sensitive query parameters were retained, but their values were anonymized. If a value contained a domain name, that domain was anonymized with the same procedure as other domain names rather than being fully replaced.
- Email addresses were anonymized unless they appeared in a User-Agent string.
- Response filtering: All transactions that returned a “200 OK” response were omitted. Although these transactions appear in the audit log, they triggered rules on the server's response (output) rather than on the incoming request, so they are not suitable for analyzing incoming traffic.
- All request headers and values were retained.
- All response codes, headers, and values were retained.
- POST payloads were retained.
- User-agent strings were fully retained.
- Performance metrics, dates, and time stamps were fully retained.
- Path filenames (in URLs/requests) were fully preserved.
- Query-string filenames were fully preserved.
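The list above states what was preserved and what was pseudonymized, but not how the stable tokens and remapped addresses were computed. One way to obtain consistent, non-reversible mappings is a keyed hash (HMAC-SHA-256), sketched below in Python. The secret key, the 8-character token length and the single-label public suffix are assumptions for illustration only; a fuller version would look suffixes up in the Public Suffix List (so that, e.g., co.uk counts as one suffix), and the output will not reproduce the dataset's actual tokens.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret key: the keyed hash keeps every mapping stable across the
# dataset, while nobody without the key can rebuild the original values.
KEY = b"replace-with-a-secret-key"

IPV4_TARGET = ipaddress.ip_network("100.64.0.0/10")
IPV6_TARGET = ipaddress.ip_network("2001:db8::/32")


def _hmac_hex(value: str) -> str:
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def remap_ip(addr: str) -> str:
    """Map an IPv4/IPv6 address to a stable pseudonym inside the target prefix."""
    ip = ipaddress.ip_address(addr)
    target = IPV4_TARGET if ip.version == 4 else IPV6_TARGET
    # The hash, reduced modulo the prefix size, becomes the host offset.
    # Collisions are possible in the 2**22-address IPv4 prefix.
    offset = int(_hmac_hex(ip.compressed), 16) % target.num_addresses
    return str(target.network_address + offset)


def token(segment: str, length: int = 8) -> str:
    """Stable pseudonym for one domain label or path segment (hypothetical format)."""
    return _hmac_hex(segment)[:length]


def anonymize_domain(host: str, suffix_labels: int = 1) -> str:
    """Tokenize labels left of the public suffix; keep the suffix and label count."""
    labels = host.lower().split(".")
    head, suffix = labels[:-suffix_labels], labels[-suffix_labels:]
    return ".".join([token(label) for label in head] + suffix)


def anonymize_path(path: str) -> str:
    """Tokenize directory segments; keep the slashes and the final filename."""
    parts = path.split("/")
    last = len(parts) - 1
    return "/".join(
        token(p) if p and i < last else p for i, p in enumerate(parts)
    )


print(remap_ip("203.0.113.7"))
print(anonymize_domain("www.example.hu"))
print(anonymize_path("/wp-content/plugins/shell.php"))
print(anonymize_path("/.env"))  # /.env
```

Repeated calls with the same input always return the same pseudonym, which is what keeps per-source and per-path statistics intact. A numeric Host value (the 920350 case) would be routed through remap_ip() rather than anonymize_domain().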
3.3. Attack Categorization
4. Dataset Characterization
4.1. Attack Category Distribution
4.2. Attack Code Distribution
4.3. User-Agent Patterns
5. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
1. ModSecurity. ModSecurity Open-Source Web Application Firewall. Available online: https://modsecurity.org (accessed on 22 September 2025).
2. Bilic, I.; Josić, K.; Pranic, D.; Ribaric, S. Web application firewalls (WAFs) in protecting software. In Proceedings of the 35th DAAAM International Symposium on Intelligent Manufacturing and Automation, Vienna, Austria, 24–25 October 2024; DAAAM International: Vienna, Austria, 2024; pp. 306–311.
3. Prates, L.; dos Santos, R.P.; de Lima, T.L.; Costa, A.L. DevSecOps practices and tools. Int. J. Inf. Secur. 2025, 24, 11.
4. Dehlaghi-Ghadim, A.; Helali Moghadam, M.; Balador, A.; Hansson, H. ICS-Flow: An anomaly detection dataset for industrial control systems. arXiv 2023, arXiv:2305.09678.
5. Goldschmidt, P.; Chudá, D. Network intrusion datasets: A survey, limitations, and best practices. arXiv 2025, arXiv:2502.06688.
6. OWASP Core Rule Set Project. OWASP ModSecurity Core Rule Set. Available online: https://coreruleset.org (accessed on 22 September 2025).
7. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.
8. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CICIDS2017 Dataset. Canadian Institute for Cybersecurity, University of New Brunswick. 2018. Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 22 September 2025).
9. Tavallaee, M.; Stakhanova, N.; Ghorbani, A.A. HTTP CSIC 2010 Dataset [Dataset]. Information Security Institute, Spanish Research Council (CSIC). 2010. Available online: https://www.kaggle.com/datasets/ispangler/csic-2010-web-application-attacks (accessed on 22 September 2025).
10. Şen, Ö. Benchmark Evaluation of Anomaly-Based Intrusion Detection Systems. arXiv 2023, arXiv:2312.13705.
11. Lucz, G. A Thirty-Day Dataset of Malicious HTTP Requests Blocked by OWASP ModSecurity on a Production Web Server [Data Set]. Zenodo. 2025. Available online: https://zenodo.org/records/17178461 (accessed on 22 September 2025).
12. Kasturi, G.; Zhao, P.; Alowaisheq, E.; Kotipalli, S.; Chen, Z. A large-scale study of malicious plugins in WordPress. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 2022), Boston, MA, USA, 10–12 August 2022; USENIX Association: Berkeley, CA, USA, 2022; pp. 1045–1062. Available online: https://www.usenix.org/conference/usenixsecurity22/presentation/kasturi (accessed on 22 September 2025).
13. Mohamed Mohideen, M.A.; Nadeem, M.S.; Hardy, J.; Ali, H.; Tariq, U.U.; Sabrina, F.; Waqar, M.; Ahmed, S. Behind the Code: Identifying Zero-Day Exploits in WordPress. Futur. Internet 2024, 16, 256.
14. Thomas-Reynolds, D.; Butakov, S. Factors affecting the performance of web application firewall. In Proceedings of the 2020 Workshop on Information Security and Privacy (WISP 2020), Virtual, 12 December 2020; AIS Electronic Library (AISeL): Atlanta, GA, USA, 2020. Available online: https://aisel.aisnet.org/wisp2020/8 (accessed on 22 September 2025).
15. glucz. Glucz/OWASP-Server-Configuration: Zenodo Release (v1.1). Zenodo. 2025. Available online: https://zenodo.org/records/17188106 (accessed on 22 September 2025).
16. Antonov, A.; Sidorov, S. Web application firewalls: Comparative evaluation of ModSecurity, NAXSI, and Shadow Daemon. arXiv 2024, arXiv:2406.13547.
17. OWASP ModSecurity Project. ModSecurity 2 Data Formats. GitHub. Available online: https://github.com/owasp-modsecurity/ModSecurity/wiki/ModSecurity-2-Data-Formats (accessed on 22 September 2025).
18. Sarmin, S.; Sarkar, S.; Wang, Y.; Mohammed, N. Synthetic data: Revisiting the privacy–utility trade-off. arXiv 2025, arXiv:2502.19282.
19. Livadariu, I.; Dainotti, A.; Jonker, M.; Stiller, B.; Elmokashfi, A. On the accuracy of country-level IP geolocation. In Proceedings of the Applied Networking Research Workshop (ANRW 2020), Virtual, 30–31 July 2020; ACM/IRTF: New York, NY, USA, 2020; pp. 1–7.
20. Drivas, I.; Karampelas, P.; Anagnostopoulos, I.; Verginadis, Y. Content management systems performance and website speed. Information 2021, 12, 259.
| Dataset | Source | Duration | Volume | Anonymization | Labeling |
|---|---|---|---|---|---|
| UNSW-NB15 [7] | Hybrid lab network (IXIA traffic generator + real attack traces) | Hours | 100 GB | IP address and payload | Attack category |
| CIC-IDS 2017 [8] | Simulated corporate network, including web traffic | 5 days | 80 GB | Partial | Attack type |
| HTTP CSIC 2010 [9] | Simulated e-commerce web application | Hours | 36,000 requests | Full | Normal or Attack |
| This work | Single live server running PHP-based and WordPress applications; web traffic only | 30 days | 29 GB; 142,000 requests | Stable tokens for paths and domains; IP addresses hashed | OWASP WAF labels |
| Violation Category | Hit Count |
|---|---|
| WEB_ATTACK/FILE_INJECTION | 49,329 |
| UNTAGGED (444444) | 47,536 |
| POLICY/HEADER_RESTRICTED | 12,188 |
| UNTAGGED (920340) | 9492 |
| POLICY/EXT_RESTRICTED | 7882 |
| PROTOCOL_VIOLATION/INVALID_HREQ | 6885 |
| PROTOCOL_VIOLATION/IP_HOST | 5177 |
| WEB_ATTACK/SQL_INJECTION | 4062 |
| WEB_ATTACK/COMMAND_INJECTION | 3093 |
| POLICY/CONTENT_TYPE_NOT_ALLOWED | 1857 |
| WEB_ATTACK/PHP_INJECTION | 1718 |
| WEB_ATTACK/DIR_TRAVERSAL | 975 |
| UNTAGGED (920600) | 530 |
| PROTOCOL_VIOLATION/CONTENT_TYPE | 487 |
| AUTOMATION/SECURITY_SCANNER | 369 |
| PROTOCOL_VIOLATION/EVASION | 252 |
| WEB_ATTACK/XSS | 234 |
| POLICY/METHOD_NOT_ALLOWED | 143 |
| WEB_ATTACK/JAVA_INJECTION | 97 |
| WEB_ATTACK/RFI | 78 |
| UNTAGGED (933160) | 31 |
| UNTAGGED (941120) | 21 |
| UNTAGGED (932130) | 17 |
| UNTAGGED (944100) | 17 |
| UNTAGGED (944110) | 17 |
| UNTAGGED (944130) | 17 |
| WEB_ATTACK/HEADER_INJECTION | 17 |
| UNTAGGED (200002) | 16 |
| UNTAGGED (932105) | 11 |
| PROTOCOL_VIOLATION/INVALID_REQ | 10 |
| UNTAGGED (932115) | 9 |
| WEB_ATTACK/NODEJS_INJECTION | 8 |
| UNTAGGED (941130) | 7 |
| WEB_ATTACK/RESPONSE_SPLITTING | 6 |
| PROTOCOL_VIOLATION/CONTENT_TYPE_CHARSET | 3 |
| UNTAGGED (934100) | 3 |
| PROTOCOL_VIOLATION/EMPTY_HEADER_UA | 2 |
| UNTAGGED (922120) | 2 |
| UNTAGGED (932150) | 2 |
| UNTAGGED (942350) | 2 |
| UNTAGGED (942360) | 2 |
| WEB_ATTACK/SESSION_FIXATION | 2 |
| UNTAGGED (921150) | 1 |
| UNTAGGED (941170) | 1 |
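The category labels in this table appear to be the CRS category tags with the OWASP_CRS/ prefix removed (the sample entry carries [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"]), while rules without such a tag, including the custom rule 444444, are listed as UNTAGGED (rule ID). A Python sketch of that mapping, under this assumption about the labeling scheme:

```python
import re
from collections import Counter

TAG = re.compile(r'\[tag "([^"]*)"\]')
RULE_ID = re.compile(r'\[id "(\d+)"\]')
CRS_PREFIX = "OWASP_CRS/"


def category(message_line: str) -> str:
    """Category label for one 'Message:' line (assumed labeling scheme)."""
    for tag in TAG.findall(message_line):
        # The bare "OWASP_CRS" tag carries no category; the path-style one does.
        if tag.startswith(CRS_PREFIX):
            return tag[len(CRS_PREFIX):]
    rule = RULE_ID.search(message_line)
    return f"UNTAGGED ({rule.group(1) if rule else '?'})"


def category_counts(message_lines) -> Counter:
    """Hit count per category label."""
    return Counter(category(line) for line in message_lines)


# Abridged H-section line from the sample entry in Section 2.3.
line = ('Message: Warning. Matched phrase "/.env" at REQUEST_FILENAME. '
        '[id "930130"] [tag "attack-lfi"] [tag "OWASP_CRS"] '
        '[tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"] [tag "WASCTC/WASC-33"]')
print(category(line))  # WEB_ATTACK/FILE_INJECTION
```

A line that carries only [id "444444"] and no OWASP_CRS/ tag falls through to the second branch and is labeled UNTAGGED (444444).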
| Rule ID | Security or Performance Violation | Hit Count |
|---|---|---|
| 930130 | Restricted File Access Attempt | 48,216 |
| 444444 | BAD BOT - Detected and Blocked. | 47,536 |
| 920450 | HTTP header is restricted by policy (/accept-charset/) | 12,147 |
| 920340 | Request Containing Content, but Missing Content-Type header | 9659 |
| 920440 | URL file extension is restricted by policy | 7882 |
| 920210 | Multiple/Conflicting Connection Header Data Found. | 6710 |
| 920350 | Host header is a numeric IP address | 5177 |
| 942100 | SQL Injection Attack Detected via libinjection | 3978 |
| 942360 | Detects concatenated basic SQL injection and SQLLFI attempts | 2155 |
| 932150 | Remote Command Execution: Direct Unix Command Execution | 2074 |
| 920420 | Request content type is not allowed by policy | 1857 |
| 942160 | Detects blind sqli tests using sleep() or benchmark(). | 1630 |
| 932130 | Remote Command Execution: Unix Shell Expression Found | 1507 |
| 930120 | OS File Access Attempt | 1115 |
| 933160 | PHP Injection Attack: High-Risk PHP Function Call Found | 2086 |
| 930100 | Path Traversal Attack (/../) | 949 |
| 930110 | Path Traversal Attack (/../) | 806 |
| 932160 | Remote Command Execution: Unix Shell Code Found | 772 |
| 933150 | PHP Injection Attack: High-Risk PHP Function Name Found | 761 |
| 920600 | Illegal Accept header: charset parameter | 605 |
| 920470 | Illegal Content-Type header | 488 |
| 933160 | PHP Injection Attack: High-Risk PHP Function Call Found | 434 |
| 913100 | Found User-Agent associated with security scanner | 369 |
| 933100 | PHP Injection Attack: PHP Open Tag Found | 266 |
| 932115 | Remote Command Execution: Windows Command Injection | 228 |
| 942190 | Detects MSSQL code execution and information gathering attempts | 211 |
| 942280 | Detects Postgres pg_sleep injection, waitfor delay attacks and database shutdown attempts | 191 |
| 933140 | PHP Injection Attack: I/O Stream Found | 182 |
| 920170 | GET or HEAD Request with Body Content. | 171 |
| 921421 | Content-Type header: Dangerous content type outside the mime type declaration | 169 |
| 911100 | Method is not allowed by policy | 143 |
| 933120 | PHP Injection Attack: Configuration Directive Found | 141 |
| 933110 | PHP Injection Attack: PHP Script File Upload Found | 140 |
| 920270 | Invalid character in request (null character) | 138 |
| 932105 | Remote Command Execution: Unix Command Injection | 124 |
| 944130 | Suspicious Java class detected | 110 |
| 941120 | XSS Filter - Category 2: Event Handler Vector | 101 |
| 941110 | XSS Filter - Category 1: Script Tag Vector | 85 |
| 931100 | Possible Remote File Inclusion (RFI) Attack: URL Parameter using IP Address | 78 |
| 920240 | URL Encoding Abuse Attack Attempt | 74 |
| 942240 | Detects MySQL charset switch and MSSQL DoS attempts | 74 |
| 944110 | Remote Command Execution: Java process spawn (CVE-2017-9805) | 73 |
| 944100 | Remote Command Execution: Suspicious Java class detected | 67 |
| 942270 | Looking for basic sql injection. Common attack string for mysql, oracle and others. | 42 |
| 941130 | XSS Filter - Category 3: Attribute Vector | 41 |
| 920220 | URL Encoding Abuse Attack Attempt | 40 |
| 920450 | HTTP header is restricted by policy (/content-encoding/) | 37 |
| 942170 | Detects SQL benchmark and sleep injection attempts including conditional queries | 34 |
| 942140 | SQL Injection Attack: Common DB Names Detected | 30 |
| 942350 | Detects MySQL UDF injection and other data/structure manipulation attempts | 26 |
| 200002 | Failed to parse request body. | 25 |
| 941180 | Node-Validator Blacklist Keywords | 24 |
| 941170 | NoScript XSS InjectionChecker: Attribute Injection | 20 |
| 921150 | HTTP Header Injection Attack via payload (CR/LF detected) | 18 |
| 932170 | Remote Command Execution: Shellshock (CVE-2014-6271) | 16 |
| 934100 | Node.js Injection Attack | 16 |
| 941140 | XSS Filter - Category 4: Javascript URI Vector | 14 |
| 933170 | PHP Injection Attack: Serialized Object Injection | 10 |
| 942230 | Detects conditional SQL injection attempts | 9 |
| 920100 | Invalid HTTP Request Line | 8 |
| 933130 | PHP Injection Attack: Variables Found | 8 |
| 941210 | IE XSS Filters - Attack Detected. | 8 |
| 933210 | PHP Injection Attack: Variable Function Call Found | 7 |
| 921130 | HTTP Response Splitting Attack | 6 |
| 933180 | PHP Injection Attack: Variable Function Call Found | 6 |
| 941190 | IE XSS Filters - Attack Detected. | 6 |
| 941370 | JavaScript global variable found | 5 |
| 920180 | POST without Content-Length or Transfer-Encoding headers. | 4 |
| 920450 | HTTP header is restricted by policy (/content-range/) | 4 |
| 942500 | MySQL in-line comment detected. | 4 |
| 944120 | Remote Command Execution: Java serialization (CVE-2015-4852) | 4 |
| 920480 | Request content type charset is not allowed by policy | 3 |
| 920120 | Attempted multipart/form-data bypass | 2 |
| 920330 | Empty User Agent Header | 2 |
| 921120 | HTTP Response Splitting Attack | 2 |
| 921160 | HTTP Header Injection Attack via payload (CR/LF and header-name detected) | 2 |
| 922120 | Content-Transfer-Encoding was deprecated by rfc7578 in 2015 and should not be used | 2 |
| 933200 | PHP Injection Attack: Wrapper scheme detected | 2 |
| 941250 | IE XSS Filters - Attack Detected. | 2 |
| 942320 | Detects MySQL and PostgreSQL stored procedure/function injections | 2 |
| 943120 | Possible Session Fixation Attack: SessionID Parameter Name with No Referer | 2 |
| 932140 | Remote Command Execution: Windows FOR/IF Command Found | 1 |
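Rule 920450 appears three times in this table, most likely because the CRS message for that rule interpolates the matched header name ("HTTP header is restricted by policy (%{MATCHED_VAR})"), so a table keyed on the (rule ID, message) pair splits the rule by header. The Python sketch below reproduces that grouping from "Message:" lines; it illustrates how such a table can be built and is not the authors' script. The example lines are hypothetical and abridged.

```python
import re
from collections import Counter

RULE_ID = re.compile(r'\[id "(\d+)"\]')
RULE_MSG = re.compile(r'\[msg "([^"]*)"\]')


def rule_key(message_line):
    """(rule ID, message) pair for one 'Message:' line of section H."""
    rule = RULE_ID.search(message_line)
    msg = RULE_MSG.search(message_line)
    return (rule.group(1) if rule else "?", msg.group(1) if msg else "")


def rule_table(message_lines):
    """Rows of (rule ID, message, hits), most frequent first."""
    counts = Counter(rule_key(line) for line in message_lines)
    return [(rule, msg, hits) for (rule, msg), hits in counts.most_common()]


# Hypothetical abridged lines: the same rule with two different matched headers.
lines = [
    'Message: [id "920450"] [msg "HTTP header is restricted by policy (/accept-charset/)"]',
    'Message: [id "920450"] [msg "HTTP header is restricted by policy (/accept-charset/)"]',
    'Message: [id "920450"] [msg "HTTP header is restricted by policy (/content-range/)"]',
]
for row in rule_table(lines):
    print(row)
# ('920450', 'HTTP header is restricted by policy (/accept-charset/)', 2)
# ('920450', 'HTTP header is restricted by policy (/content-range/)', 1)
```

Grouping on the rule ID alone would merge these rows; keeping the message in the key preserves the per-header breakdown that the table reports.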
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lucz, G.; Forstner, B. A Thirty-Day Dataset of Malicious HTTP Requests Blocked by OWASP ModSecurity on a Production Web Server. Data 2025, 10, 186. https://doi.org/10.3390/data10110186

