A Thirty-Day Dataset of Malicious HTTP Requests Blocked by OWASP ModSecurity on a Production Web Server

Lucz, Geza; Forstner, Bertalan

doi:10.3390/data10110186

Open Access Data Descriptor

by Geza Lucz * and Bertalan Forstner

Department of Automation and Applied Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Műegyetem rkp. 3., H-1111 Budapest, Hungary

* Author to whom correspondence should be addressed.

Data 2025, 10(11), 186; https://doi.org/10.3390/data10110186

Submission received: 25 September 2025 / Revised: 26 October 2025 / Accepted: 9 November 2025 / Published: 11 November 2025

(This article belongs to the Section Information Systems and Data Management)

Abstract

We present a real-world dataset capturing thirty consecutive days of malicious HTTP traffic filtered and blocked by the OWASP ModSecurity Web Application Firewall (WAF) on a live production server. Each entry corresponds to a request that triggered one or more rules in the OWASP Core Rule Set (CRS), resulting in its inclusion in the audit log due to suspected exploitation attempts. The dataset includes attack categories such as SQL injection, cross-site scripting (XSS), local file inclusion, scanner probes, and various malformed or evasive input forms. The data has been carefully anonymized to protect sensitive information while preserving critical structural tags, including request method, URI, triggered rule IDs, request headers, and user-agent strings. This dataset provides a real-world resource for cybersecurity researchers, particularly those developing or evaluating intrusion detection systems (IDSs), WAF rule tuning strategies, anomaly detection algorithms, and adversarial machine learning models. The dataset also allows performance testing of threat prevention pipelines. By making this dataset publicly available, we aim to support reproducible research in web security, encourage benchmarking of detection techniques under real-world conditions, and contribute insight into the nature of contemporary web-based threats observed in an uncontrolled environment.

Keywords:

Web Application Firewall (WAF); ModSecurity; OWASP Core Rule Set (CRS); HTTP request filtering; intrusion detection dataset; real-world dataset; anonymized network data

1. Introduction

Today, web applications remain a primary target for attackers due to their direct exposure and frequent role as entry points to sensitive systems and data. Although Web Application Firewalls (WAFs) such as OWASP ModSecurity [1] are widely deployed [2,3] to mitigate common exploitation attempts, the academic community often lacks open, real-world data to validate detection and defense strategies. Many public datasets in this area originate from controlled experiments [4]. A review of 89 publicly available NIDS datasets revealed that a substantial proportion were generated under experimental conditions, involving simulated or emulated components rather than authentic network traffic with natural background noise [5].

This paper introduces a dataset of blocked HTTP requests collected over a continuous thirty-day period from a production server operating in an uncontrolled, adversarial environment. The dataset provides request-level details and is anonymized to protect sensitive information while retaining attack-relevant features such as CRS rule IDs [6], payloads, and HTTP header information.

Table 1 summarizes the key characteristics of widely used datasets and contrasts them with our own. Our goal is to bridge the gap between simulated and real data over HTTP(S) transport, forming the basis for the contributions discussed in the following sections.

1.1. Key Contributions

The dataset exhibits several key properties that enhance its research value. First, it demonstrates realism, as the traffic reflects live attack attempts against a real-world application environment without reliance on synthetic injection. Second, it ensures diversity, spanning multiple attack classes that include, but are not limited to, injection attempts, scanner probes, and obfuscation strategies. Third, it is anonymized with utility preservation: sensitive identifiers have been sanitized, while structural features essential for analysis have been retained. Finally, it emphasizes accessibility, being provided in its original raw format to facilitate use across diverse research domains.

1.2. Research Utility

The dataset provides multiple avenues for empirical research, including benchmarking IDS/IPS models against much-needed real-world attack data [10] and developing anomaly detection algorithms that are sensitive to obfuscated payloads. It also enables the study of automated scanning and bot-driven exploitation attempts, as well as the evaluation of rule tuning and false-positive mitigation strategies in WAFs. Moreover, the dataset can be applied to training adversarial machine learning models with real attack payloads while adhering to anonymization constraints.

2. Data and Description

The dataset is available on Zenodo [11] under the Creative Commons Attribution 4.0 International license. It consists of request-level ModSecurity audit log entries from a live web server protected by ModSecurity v2.9.3-1+deb10u2 with the OWASP CRS v3.2.3-0+deb10u3. The source server was part of a commercial fleet, running multiple WordPress installations with the WooCommerce e-commerce addon, as well as custom PHP scripts, managed by third-party developers. Customer-managed WordPress installations tend to receive updates late, if at all, and are therefore a common target for exploitation [12,13].

In our case, ModSecurity was configured to maximize end-user performance; therefore, a single additional custom rule (ID: 444444) was added for “early denial” of access to “well-known” abusive bots based on the user-agent string. The exact rule can be found in the supporting OWASP server configuration documentation; however, structurally, it lists all discovered data scraper bots under a single rule ID and is expected to dominate the statistics.

To protect WordPress’s daily operations, seven rules with known detrimental interactions were removed (ID: 941160, 949110, 980130, 941100, 932110, 200004, 932100) based on customer feedback. That is, when genuine customers encountered a WAF-based rejection while operating either the front end or the back end of WordPress, the conflicting rule was deactivated. An early rejection option and a simplified ruleset have also been found to significantly boost performance [14].

Only those anomalous requests that had been tagged by the WAF module are included in the dataset. The exact server configurations are available from GitHub and Zenodo [15]. Using these files will set up the same logging pipeline utilized in the data collection.

2.1. Scope

Coverage period: 30 consecutive days (27 July 2025–25 August 2025)—randomly selected
Total requests: 147,205
Daily average: 4907
Data formats: RAW ModSecurity audit log
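
The scope figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the values stated in the list above and confirms that the date range covers 30 days when both endpoints are counted, and that the daily average is the rounded quotient of the total.

```python
from datetime import date

# Values copied from the scope list above.
first_day = date(2025, 7, 27)
last_day = date(2025, 8, 25)
total_requests = 147_205

# Both endpoints belong to the coverage period, hence the +1.
days = (last_day - first_day).days + 1
daily_average = round(total_requests / days)

print(days, daily_average)
```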

2.2. Dataset Fields

Each record was assigned a unique identifier {UNIQUE_ID}, and the audit data was divided into a maximum of six sections (not including the closing section).

The --{UNIQUE_ID}-A-- section contains the audit log header, which provides general metadata about the transaction, including the timestamp, the unique transaction ID, and the client and server IP addresses and ports.

The --{UNIQUE_ID}-B-- section stores the request line (method, URI, and protocol) followed by the request headers, capturing all HTTP request headers as received from the client.

The --{UNIQUE_ID}-C-- section records the request body, typically consisting of POST data or payloads.

The --{UNIQUE_ID}-E-- section holds the intermediary response body, which is not always available. This part contains the response body as observed at the end of the response phase, before any transformations or filtering applied by ModSecurity or the server.

The --{UNIQUE_ID}-F-- section includes the final response headers that ModSecurity and other web server modules sent back to the client.

The --{UNIQUE_ID}-H-- section provides audit log trailer information, including details about rules triggered, actions taken, tagging data, and performance metrics.

Finally, the --{UNIQUE_ID}-Z-- section serves as the audit log terminator, marking the end of the log entry for that transaction.
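
The section layout described above lends itself to a simple line-oriented parser. The following is a minimal sketch, not the authors' tooling: the function name `parse_audit_log` and the boundary pattern are assumptions derived from the published sample, and the embedded `SAMPLE` is a shortened version of that sample.

```python
import re

# Boundary lines look like "--c30afe70-A--": an id, then a section letter.
# The id pattern is an assumption based on the published sample.
BOUNDARY = re.compile(r"^--([0-9A-Za-z]+)-([A-Z])--$")


def parse_audit_log(lines):
    """Yield (boundary_id, sections); sections maps letter -> list of lines."""
    uid, sections, current = None, None, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = BOUNDARY.match(line)
        if match is None:
            if current is not None:
                current.append(line)
            continue
        found_id, letter = match.groups()
        if letter == "A":
            # Section A opens a new transaction.
            uid, sections = found_id, {}
        if found_id != uid:
            current = None  # boundary without a matching header: skip it
            continue
        if letter == "Z":
            # Section Z terminates the transaction.
            yield uid, sections
            uid, sections, current = None, None, None
        else:
            current = sections.setdefault(letter, [])


SAMPLE = """\
--c30afe70-A--
[01/Aug/2025:19:59:37 +0200] aI0AibHA3gfwyP1Wo76@lwAAAAg 100.122.171.187 47574 100.115.127.60 443
--c30afe70-B--
GET /.env HTTP/1.1
Host: www.d4941dc.hu

--c30afe70-F--
HTTP/1.1 404 Not Found

--c30afe70-Z--
"""

entries = list(parse_audit_log(SAMPLE.splitlines()))
print(entries[0][0], sorted(entries[0][1]))
```

Because the function is a generator, it can be fed an open file object directly, which keeps memory use flat for large daily logs.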

2.3. Data Sample

We include a representative log excerpt to illustrate the plain-text raw log format. The example shows the dataset’s structure and how readily it can be handled by text-based processing utilities, so researchers can quickly adapt existing log analysis tools or develop new ones for experimental purposes.

--c30afe70-A--
[01/Aug/2025:19:59:37 +0200] aI0AibHA3gfwyP1Wo76@lwAAAAg 100.122.171.187 47574 100.115.127.60 443
--c30afe70-B--
GET /.env HTTP/1.1
Host: www.d4941dc.hu
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Connection: keep-alive
Accept-Encoding: gzip
--c30afe70-F--
HTTP/1.1 404 Not Found
Upgrade: h2
Connection: Upgrade, Keep-Alive
Last-Modified: Fri, 08 Dec 2023 13:47:02 GMT
ETag: "70f-60bffd1aa7854"
Accept-Ranges: bytes
Content-Length: 1807
Keep-Alive: timeout=1, max=100
Content-Type: text/html
--c30afe70-H--
Message: Warning. Matched phrase "/.env" at REQUEST_FILENAME. [file "/usr/share/modsecurity-crs/rules/REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "125"] [id "930130"] [msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.3"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"] [tag "WASCTC/WASC-33"] [tag "OWASP_TOP_10/A4"] [tag "PCI/6.5.4"]
Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 100.122.171.187] ModSecurity: Warning. Matched phrase "/.env" at REQUEST_FILENAME. [file "/usr/share/modsecurity-crs/rules/REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "125"] [id "930130"] [msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.3"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"] [tag "WASCTC/WASC-33"] [tag "OWASP_TOP_10/A4"] [tag "PCI/6.5.4"] [hostname "www.d4941dc.hu"] [uri "/.env"] [unique_id "aI0AibHA3gfwyP1Wo76@lwAAAAg"]
Stopwatch: 1754071177142684 2452 (- - -)
Stopwatch2: 1754071177142684 2452; combined=1448, p1=595, p2=743, p3=0, p4=0, p5=108, sr=100, sw=2, l=0, gc=0
Producer: ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/); OWASP_CRS/3.2.3.
Server: Apache
Engine-Mode: "ENABLED"
--c30afe70-Z--

2.4. Data Organization

The dataset is distributed as a compressed archive, with a separate folder representing each day of the thirty-day observation period. The folder names correspond to calendar dates in the format dd-MMM-yyyy. The raw ModSecurity audit data is stored in a single file named modsec_audit.anon.log within each folder. The day-by-day separation allows for both incremental loading and parallelized analysis.
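
The per-day layout can be walked in chronological order with a short helper. This is a sketch under stated assumptions: the folder names are taken to use English three-letter month abbreviations (for example 27-Jul-2025), and the function names `folder_date` and `daily_logs` are hypothetical.

```python
from datetime import date
from pathlib import Path

# Assumption: MMM is an English month abbreviation; a fixed table avoids
# any dependence on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def folder_date(name):
    """Convert a dd-MMM-yyyy folder name into a date object."""
    day, month, year = name.split("-")
    return date(int(year), MONTHS[month.lower()], int(day))


def daily_logs(root):
    """Return (date, log path) pairs sorted chronologically."""
    found = []
    for folder in Path(root).iterdir():
        log = folder / "modsec_audit.anon.log"
        if folder.is_dir() and log.is_file():
            found.append((folder_date(folder.name), log))
    return sorted(found)
```

Sorting by the parsed date rather than by folder name matters here, because a plain string sort would place 01-Aug-2025 before 27-Jul-2025.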

3. Methods

3.1. Data Collection

For our data collection, we used ModSecurity, the “standard” open-source WAF and the best current candidate to receive machine learning-based extensions [16]. All requests that matched any WAF rule were logged. Logs were written in native ModSecurity audit log format with full metadata [17].

3.2. Anonymization

Because raw logs contain sensitive client and server data, deterministic anonymization was applied: the same input always produces the same output. For IP addresses, this involves a random table lookup between source and target ranges, which eliminates collisions. Text-based data, in contrast, undergoes a one-way hash with a fixed salt, truncated to a maximum length of 16 bytes (SHA-256(salt + payload)[:16]), which limits the chance of collisions. These methods keep the mappings consistent across files while preventing the recovery of the original values. We balanced privacy preservation against data fidelity [18] with the following transformations.

IPv4 addresses were consistently remapped to 100.64.0.0/10.
IPv6 addresses were consistently remapped to 2001:db8::/32.
Domains (Host, Authority, in URLs): the public suffix/TLD and the label count were preserved, and the left labels were anonymized.
Directory paths in HTTP(S) URLs and request targets were anonymized to per-segment stable tokens, with the slashes preserved.
Filesystem access paths starting with /var/www/ were anonymized per segment, and the final filename was preserved.
Cookies and sensitive query parameters were retained, but their values were anonymized. If a value contained a domain name, it was treated like the other domain names rather than being fully anonymized.
Email addresses were anonymized unless they appeared in a user-agent string.
Response filtering: all transactions that returned a “200 OK” response were omitted. Although these are part of the audit file, they triggered rules based on their output rather than their input and are not suitable for analysis of incoming data.
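
To make the two deterministic mechanisms concrete, the sketch below shows one way they could be implemented. It is an illustration only, not the authors' pipeline: the salt value, the choice to truncate the hexadecimal digest to 16 characters, and the names `token`, `anonymize_path`, and `IPv4Remapper` are all assumptions.

```python
import hashlib
import ipaddress
import random

SALT = b"example-salt"  # hypothetical; the salt actually used is not published


def token(text, salt=SALT):
    """SHA-256(salt + payload), hex-encoded and cut to 16 characters (assumed)."""
    return hashlib.sha256(salt + text.encode("utf-8")).hexdigest()[:16]


def anonymize_path(path):
    """Tokenize directory segments; keep the slashes and the final filename."""
    parts = path.split("/")
    hashed = [token(p) if p else p for p in parts[:-1]]
    return "/".join(hashed + parts[-1:])


class IPv4Remapper:
    """Random, collision-free lookup table into 100.64.0.0/10."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._net = ipaddress.ip_network("100.64.0.0/10")
        self._table = {}
        self._used = set()

    def remap(self, addr):
        if addr not in self._table:
            # Draw offsets until an unused one is found, so that two
            # different inputs can never share a replacement address.
            while True:
                offset = self._rng.randrange(self._net.num_addresses)
                if offset not in self._used:
                    break
            self._used.add(offset)
            self._table[addr] = str(self._net[offset])
        return self._table[addr]
```

The table-based approach trades memory for a hard no-collision guarantee, whereas the truncated hash needs no state but only makes collisions unlikely.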

All other content was subject to the email filtering rules; therefore, it was preserved unless it contained an email address, in which case the email address was masked. This includes the following:

All request headers and values were retained.
All response codes, headers, and values were retained.
POST payloads were retained.
User-agent strings were fully retained.
Performance metrics, dates, and time stamps were fully retained.
Path filenames (in URLs/requests) were fully preserved.
Query-string filenames were fully preserved.

3.3. Attack Categorization

Requests were categorized according to the CRS rule tags associated with each alert. The categories identified include SQL injection (SQLi), cross-site scripting (XSS), local file inclusion (LFI), remote file inclusion (RFI), scanner and reconnaissance activity, as well as protocol anomalies and obfuscation. A single request may fall into multiple categories if more than one rule is triggered.

4. Dataset Characterization

Our objective was to preserve the original dataset and enable researchers to conduct independent analyses without being influenced by our prior interpretations. Limiting our intervention to anonymization ensures that the dataset remains as close as possible to the original while safeguarding sensitive information. This approach enables the application of various analytical methods, including off-the-shelf software currently used to analyze ModSecurity audit files. Furthermore, the current dataset is well-suited for exploratory research and benchmarking new detection techniques under realistic conditions.

4.1. Attack Category Distribution

Table 2 summarizes the observed attack categories. The information is located in the tag fields of the H section in the raw audit log files. Some rules, including the custom bot rejection rule 444444, do not contain tag information. These are listed as UNTAGGED, followed by the rule ID that triggered the audit log entry.
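
A tally of this kind can be derived from the Message lines of the H section. The sketch below is one possible approach and may not reproduce Table 2 exactly, since the authors' counting procedure is not specified. It assumes that the raw files use straight ASCII double quotes, that category names are the tags carrying an OWASP_CRS/ prefix, and that the second example line (for rule 444444) is a made-up illustration of an untagged message.

```python
import re
from collections import Counter

# Assumption: categories are the tags prefixed with "OWASP_CRS/".
TAG = re.compile(r'\[tag "OWASP_CRS/([^"]+)"\]')
RULE_ID = re.compile(r'\[id "(\d+)"\]')


def categories(message_line):
    """Return the category names of one Message line, or an UNTAGGED label."""
    found = TAG.findall(message_line)
    if found:
        return found
    rule = RULE_ID.search(message_line)
    return [f"UNTAGGED ({rule.group(1)})"] if rule else []


def category_counts(h_lines):
    """Count categories over Message lines only.

    Apache-Error lines repeat the same alert and are skipped
    to avoid double counting.
    """
    counts = Counter()
    for line in h_lines:
        if line.startswith("Message:"):
            counts.update(categories(line))
    return counts


example = [
    'Message: Warning. [id "930130"] [msg "Restricted File Access Attempt"] '
    '[tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"]',
    # Hypothetical untagged message, for illustration only.
    'Message: Access denied. [id "444444"] [msg "BAD BOT"]',
    'Apache-Error: [id "930130"] [tag "OWASP_CRS/WEB_ATTACK/FILE_INJECTION"]',
]
print(category_counts(example))
```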

4.2. Attack Code Distribution

Table 3 summarizes the attack codes that had been observed. The raw audit log files contain this information as “id” and “msg”.
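
The id and msg fields can be collected in a similar way. This is a minimal sketch that assumes straight ASCII quotes in the raw files and counts one hit per Message line; the function name `rule_hits` is hypothetical, and the result is not guaranteed to match the counting used for Table 3.

```python
import re
from collections import Counter

RULE_ID = re.compile(r'\[id "(\d+)"\]')
RULE_MSG = re.compile(r'\[msg "([^"]*)"\]')


def rule_hits(h_lines):
    """Count (rule id, message) pairs over the Message lines of H sections."""
    hits = Counter()
    for line in h_lines:
        if not line.startswith("Message:"):
            continue  # Apache-Error lines duplicate the same alert
        rule = RULE_ID.search(line)
        if rule is None:
            continue
        msg = RULE_MSG.search(line)
        hits[(rule.group(1), msg.group(1) if msg else "")] += 1
    return hits


lines = [
    'Message: Warning. [id "930130"] [msg "Restricted File Access Attempt"]',
    'Message: Warning. [id "930130"] [msg "Restricted File Access Attempt"]',
    'Stopwatch: 1754071177142684 2452 (- - -)',
]
print(rule_hits(lines).most_common(1))
```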

4.3. User-Agent Patterns

User-agent strings are provided unmodified to support analysis; all original information has been preserved.
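
Since the strings are unmodified, a frequency table is straightforward to build from the B sections. The sketch below assumes that the first line of a B section is the request line and that header names are matched case-insensitively; `user_agent` and `agent_counts` are hypothetical helper names.

```python
from collections import Counter


def user_agent(b_lines):
    """Return the User-Agent value of one B section, or None if absent."""
    for line in b_lines[1:]:  # the first line is the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


def agent_counts(b_sections):
    """Count user-agent strings over many B sections."""
    return Counter(user_agent(b) for b in b_sections)


section_b = [
    "GET /.env HTTP/1.1",
    "Host: www.d4941dc.hu",
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Connection: keep-alive",
]
print(user_agent(section_b))
```

Requests without a User-Agent header are counted under the key `None`, which keeps them visible in the tally instead of silently dropping them.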

5. Limitations

The IP addresses in the dataset were remapped consistently, so that a given address was always replaced with the same substitute. This allows accurate statistics on unique attacker sources, but the addresses cannot be used to determine geographic or network origins. Although the accuracy of IP geolocation is limited at best [19], even that information is lost through the obfuscation process. This limits threat intelligence-style research, but the utility for attack pattern analysis is fully preserved.
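
Because the remapping is consistent, unique-source statistics can be computed directly from the header line of each A section. The sketch below parses that line as it appears in the published sample; the field names in the returned dictionary are assumptions chosen for readability.

```python
import re
from datetime import datetime

# Layout taken from the sample: [timestamp] txn_id src_ip src_port dst_ip dst_port
A_LINE = re.compile(r"^\[([^\]]+)\] (\S+) (\S+) (\d+) (\S+) (\d+)$")


def parse_header(line):
    """Split an A-section header line into named fields, or return None."""
    match = A_LINE.match(line)
    if match is None:
        return None
    stamp, txn, src_ip, src_port, dst_ip, dst_port = match.groups()
    return {
        # %b expects English month abbreviations (the default C locale).
        "time": datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"),
        "transaction_id": txn,
        "client_ip": src_ip,
        "client_port": int(src_port),
        "server_ip": dst_ip,
        "server_port": int(dst_port),
    }


def unique_sources(header_lines):
    """Return the set of distinct (anonymized) client addresses."""
    parsed = (parse_header(line) for line in header_lines)
    return {p["client_ip"] for p in parsed if p is not None}


line = ("[01/Aug/2025:19:59:37 +0200] aI0AibHA3gfwyP1Wo76@lwAAAAg "
        "100.122.171.187 47574 100.115.127.60 443")
print(parse_header(line)["client_ip"], len(unique_sources([line, line])))
```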

All data used in this study were collected from a single server. The attack distribution and access patterns are therefore specific to the hosted content, namely WordPress sites and their most commonly associated back-end components: MySQL and PHP scripting tools. This scope inherently limits generalizability.

The following rules, 941160, 949110, 980130, 941100, 932110, 200004, 932100, have been deactivated in ModSecurity to allow a seamless WordPress experience to end-users. Therefore, these violations do not appear in the attack code and category distributions.

During the anonymization process, file and URL path information were converted to stable tokens. This prevents the full interpretation of messages that were triggered by a path segment match. However, the trigger message contains the matched string or a regex with a group of strings that activate the match.

The dataset contains only those hits that have triggered at least a single ModSecurity rule. This means that requests that may have been malicious but were not detected by ModSecurity are excluded.

ModSecurity is a mature software solution that can be configured in multiple ways to handle malicious requests. We included our full configuration to allow for a comprehensive analysis of why a particular request resulted in a specific response or error code. Different configurations could have triggered the same rule and created an entry in the audit log, but with a different response code.
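
When comparing configurations, the response code of each transaction is a natural starting point. The following sketch reads the status from the first line of an F section, as seen in the published sample; the helper names `status_code` and `status_counts` are hypothetical.

```python
from collections import Counter


def status_code(f_lines):
    """Return the numeric status from an F section's status line, or None."""
    if not f_lines:
        return None
    parts = f_lines[0].split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None


def status_counts(f_sections):
    """Count response codes over many F sections."""
    return Counter(status_code(f) for f in f_sections)


section_f = ["HTTP/1.1 404 Not Found", "Content-Type: text/html"]
print(status_code(section_f))
```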

6. Conclusions

This dataset offers a rare, now publicly available view of real-world web attack traffic, anonymized to protect sensitive information while preserving research value. It is intended to support reproducible evaluation of detection methods, provide evidence for understanding attacker behavior, and serve as a basis for advocating web application security.

Beyond its use for security research, the dataset also highlights a performance optimization opportunity for WordPress. WordPress constructs all error pages as if they were intended for human users, consuming roughly the same CPU, I/O, and network resources as serving a normal page. By estimating the total resources spent on generating these error responses and partially or fully offloading them through simplified error pipelines or lightweight redirects, the system can maintain higher responsiveness during peak load, a main weakness of CMS platforms, especially when improperly tuned [20]. Running the dataset under different error-handling modes enables the quantification of these potential performance gains and the identification of the most efficient mitigation strategy.

Author Contributions

Methodology, G.L.; Data curation, G.L.; Writing – original draft, G.L.; Writing – review & editing, G.L. and B.F.; Supervision, B.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Artificial Intelligence National Laboratory: European Union project RRF-2.3.1-21-2022-00004.

Institutional Review Board Statement

The dataset was collected in accordance with applicable legal and ethical requirements. The server operator (the authors’ own infrastructure) provided explicit approval for data collection and publication. Only requests flagged as anomalous or malicious by the Web Application Firewall were included, and all sensitive identifiers were extensively anonymized to ensure that no personal data could be traced back to individual users. This anonymization process aligns with GDPR principles by removing or hashing personal identifiers while preserving research-relevant structural features. A risk assessment was conducted prior to publication, and the dataset was released under appropriate safeguards to strike a balance between research utility and privacy protection.

Data Availability Statement

The dataset is publicly available at https://doi.org/10.5281/zenodo.17178461 (accessed on 22 September 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. ModSecurity. ModSecurity Open-Source Web Application Firewall. Available online: https://modsecurity.org (accessed on 22 September 2025).
2. Bilic, I.; Josić, K.; Pranic, D.; Ribaric, S. Web application firewalls (WAFs) in protecting software. In Proceedings of the 35th DAAAM International Symposium on Intelligent Manufacturing and Automation, Vienna, Austria, 24–25 October 2024; DAAAM International: Vienna, Austria, 2024; pp. 306–311.
3. Prates, L.; dos Santos, R.P.; de Lima, T.L.; Costa, A.L. DevSecOps practices and tools. Int. J. Inf. Secur. 2025, 24, 11.
4. Dehlaghi-Ghadim, A.; Helali Moghadam, M.; Balador, A.; Hansson, H. ICS-Flow: An anomaly detection dataset for industrial control systems. arXiv 2023, arXiv:2305.09678.
5. Goldschmidt, P.; Chudá, D. Network intrusion datasets: A survey, limitations, and best practices. arXiv 2025, arXiv:2502.06688.
6. OWASP Core Rule Set Project. OWASP ModSecurity Core Rule Set. Available online: https://coreruleset.org (accessed on 22 September 2025).
7. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.
8. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CICIDS2017 Dataset. Canadian Institute for Cybersecurity, University of New Brunswick. 2018. Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 22 September 2025).
9. Tavallaee, M.; Stakhanova, N.; Ghorbani, A.A. HTTP CSIC 2010 Dataset [Dataset]. Information Security Institute, Spanish Research Council (CSIC). 2010. Available online: https://www.kaggle.com/datasets/ispangler/csic-2010-web-application-attacks (accessed on 22 September 2025).
10. Şen, Ö. Benchmark Evaluation of Anomaly-Based Intrusion Detection Systems. arXiv 2023, arXiv:2312.13705.
11. Lucz, G. A Thirty-Day Dataset of Malicious HTTP Requests Blocked by OWASP ModSecurity on a Production Web Server [Data Set]. Zenodo. 2025. Available online: https://zenodo.org/records/17178461 (accessed on 22 September 2025).
12. Kasturi, G.; Zhao, P.; Alowaisheq, E.; Kotipalli, S.; Chen, Z. A large-scale study of malicious plugins in WordPress. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 2022), Boston, MA, USA, 10–12 August 2022; USENIX Association: Berkeley, CA, USA, 2022; pp. 1045–1062. Available online: https://www.usenix.org/conference/usenixsecurity22/presentation/kasturi (accessed on 22 September 2025).
13. Mohamed Mohideen, M.A.; Nadeem, M.S.; Hardy, J.; Ali, H.; Tariq, U.U.; Sabrina, F.; Waqar, M.; Ahmed, S. Behind the Code: Identifying Zero-Day Exploits in WordPress. Futur. Internet 2024, 16, 256.
14. Thomas-Reynolds, D.; Butakov, S. Factors affecting the performance of web application firewall. In Proceedings of the 2020 Workshop on Information Security and Privacy (WISP 2020), Virtual, 12 December 2020; AIS Electronic Library (AISeL): Atlanta, GA, USA, 2020. Available online: https://aisel.aisnet.org/wisp2020/8 (accessed on 22 September 2025).
15. glucz. Glucz/OWASP-Server-Configuration: Zenodo Release (v1.1). Zenodo. 2025. Available online: https://zenodo.org/records/17188106 (accessed on 22 September 2025).
16. Antonov, A.; Sidorov, S. Web application firewalls: Comparative evaluation of ModSecurity, NAXSI, and Shadow Daemon. arXiv 2024, arXiv:2406.13547.
17. OWASP ModSecurity Project. ModSecurity 2 Data Formats. GitHub. Available online: https://github.com/owasp-modsecurity/ModSecurity/wiki/ModSecurity-2-Data-Formats (accessed on 22 September 2025).
18. Sarmin, S.; Sarkar, S.; Wang, Y.; Mohammed, N. Synthetic data: Revisiting the privacy–utility trade-off. arXiv 2025, arXiv:2502.19282.
19. Livadariu, I.; Dainotti, A.; Jonker, M.; Stiller, B.; Elmokashfi, A. On the accuracy of country-level IP geolocation. In Proceedings of the Applied Networking Research Workshop (ANRW 2020), Virtual, 30–31 July 2020; ACM/IRTF: New York, NY, USA, 2020; pp. 1–7.
20. Drivas, I.; Karampelas, P.; Anagnostopoulos, I.; Verginadis, Y. Content management systems performance and website speed. Information 2021, 12, 259.

Table 1. Dataset comparison.

Dataset	Source	Duration	Volume	Anonymization	Labeling
UNSW-NB15 [7]	Hybrid lab network (IXIA traffic generator + real attack traces)	Hours	100 GB	IP address and payload	Attack category
CIC-IDS 2017 [8]	Simulated corporate network, including web traffic	5 days	80 GB	Partial	Attack type
HTTP CSIC 2010 [9]	Simulated e-commerce web application	Hours	36,000 requests	Full	Normal or Attack
This work	Single live server, PHP-based + WordPress applications, web traffic only.	30 days	29 GB 142,000 requests	Stable tokens for path and domains, IP address hashed	OWASP WAF labels

Table 2. Attack category statistics.

Violation Category	Hit Count
WEB_ATTACK/FILE_INJECTION	49,329
UNTAGGED (444444)	47,536
POLICY/HEADER_RESTRICTED	12,188
UNTAGGED (920340)	9492
POLICY/EXT_RESTRICTED	7882
PROTOCOL_VIOLATION/INVALID_HREQ	6885
PROTOCOL_VIOLATION/IP_HOST	5177
WEB_ATTACK/SQL_INJECTION	4062
WEB_ATTACK/COMMAND_INJECTION	3093
POLICY/CONTENT_TYPE_NOT_ALLOWED	1857
WEB_ATTACK/PHP_INJECTION	1718
WEB_ATTACK/DIR_TRAVERSAL	975
UNTAGGED (920600)	530
PROTOCOL_VIOLATION/CONTENT_TYPE	487
AUTOMATION/SECURITY_SCANNER	369
PROTOCOL_VIOLATION/EVASION	252
WEB_ATTACK/XSS	234
POLICY/METHOD_NOT_ALLOWED	143
WEB_ATTACK/JAVA_INJECTION	97
WEB_ATTACK/RFI	78
UNTAGGED (933160)	31
UNTAGGED (941120)	21
UNTAGGED (932130)	17
UNTAGGED (944100)	17
UNTAGGED (944110)	17
UNTAGGED (944130)	17
WEB_ATTACK/HEADER_INJECTION	17
UNTAGGED (200002)	16
UNTAGGED (932105)	11
PROTOCOL_VIOLATION/INVALID_REQ	10
UNTAGGED (932115)	9
WEB_ATTACK/NODEJS_INJECTION	8
UNTAGGED (941130)	7
WEB_ATTACK/RESPONSE_SPLITTING	6
PROTOCOL_VIOLATION/CONTENT_TYPE_CHARSET	3
UNTAGGED (934100)	3
PROTOCOL_VIOLATION/EMPTY_HEADER_UA	2
UNTAGGED (922120)	2
UNTAGGED (932150)	2
UNTAGGED (942350)	2
UNTAGGED (942360)	2
WEB_ATTACK/SESSION_FIXATION	2
UNTAGGED (921150)	1
UNTAGGED (941170)	1

Table 3. Attack rule violation ID statistics.

Rule ID	Security or Performance Violation	Hit Count
930130	Restricted File Access Attempt	48,216
444444	BAD BOT—Detected and Blocked.	47,536
920450	HTTP header is restricted by policy (/accept-charset/)	12,147
920340	Request Containing Content, but Missing Content-Type header	9659
920440	URL file extension is restricted by policy	7882
920210	Multiple/Conflicting Connection Header Data Found.	6710
920350	Host header is a numeric IP address	5177
942100	SQL Injection Attack Detected via libinjection	3978
942360	Detects concatenated basic SQL injection and SQLLFI attempts	2155
932150	Remote Command Execution: Direct Unix Command Execution	2074
920420	Request content type is not allowed by policy	1857
942160	Detects blind sqli tests using sleep() or benchmark().	1630
932130	Remote Command Execution: Unix Shell Expression Found	1507
930120	OS File Access Attempt	1115
933160	PHP Injection Attack: High-Risk PHP Function Call Found	2086
930100	Path Traversal Attack (/../)	949
930110	Path Traversal Attack (/../)	806
932160	Remote Command Execution: Unix Shell Code Found	772
933150	PHP Injection Attack: High-Risk PHP Function Name Found	761
920600	Illegal Accept header: charset parameter	605
920470	Illegal Content-Type header	488
933160	PHP Injection Attack: High-Risk PHP Function Call Found	434
913100	Found User-Agent associated with security scanner	369
933100	PHP Injection Attack: PHP Open Tag Found	266
932115	Remote Command Execution: Windows Command Injection	228
942190	Detects MSSQL code execution and information gathering attempts	211
942280	Detects Postgres pg_sleep injection, waitfor delay attacks and database shutdown attempts	191
933140	PHP Injection Attack: I/O Stream Found	182
920170	GET or HEAD Request with Body Content.	171
921421	Content-Type header: Dangerous content type outside the mime type declaration	169
911100	Method is not allowed by policy	143
933120	PHP Injection Attack: Configuration Directive Found	141
933110	PHP Injection Attack: PHP Script File Upload Found	140
920270	Invalid character in request (null character)	138
932105	Remote Command Execution: Unix Command Injection	124
944130	Suspicious Java class detected	110
941120	XSS Filter—Category 2: Event Handler Vector	101
941110	XSS Filter—Category 1: Script Tag Vector	85
931100	Possible Remote File Inclusion (RFI) Attack: URL Parameter using IP Address	78
920240	URL Encoding Abuse Attack Attempt	74
942240	Detects MySQL charset switch and MSSQL DoS attempts	74
944110	Remote Command Execution: Java process spawn (CVE-2017-9805)	73
944100	Remote Command Execution: Suspicious Java class detected	67
942270	Looking for basic sql injection. Common attack string for mysql, oracle and others.	42
941130	XSS Filter—Category 3: Attribute Vector	41
920220	URL Encoding Abuse Attack Attempt	40
920450	HTTP header is restricted by policy (/content-encoding/)	37
942170	Detects SQL benchmark and sleep injection attempts including conditional queries	34
942140	SQL Injection Attack: Common DB Names Detected	30
942350	Detects MySQL UDF injection and other data/structure manipulation attempts	26
200002	Failed to parse request body.	25
941180	Node-Validator Blacklist Keywords	24
941170	NoScript XSS InjectionChecker: Attribute Injection	20
921150	HTTP Header Injection Attack via payload (CR/LF detected)	18
932170	Remote Command Execution: Shellshock (CVE-2014-6271)	16
934100	Node.js Injection Attack	16
941140	XSS Filter—Category 4: Javascript URI Vector	14
933170	PHP Injection Attack: Serialized Object Injection	10
942230	Detects conditional SQL injection attempts	9
920100	Invalid HTTP Request Line	8
933130	PHP Injection Attack: Variables Found	8
941210	IE XSS Filters—Attack Detected.	8
933210	PHP Injection Attack: Variable Function Call Found	7
921130	HTTP Response Splitting Attack	6
933180	PHP Injection Attack: Variable Function Call Found	6
941190	IE XSS Filters—Attack Detected.	6
941370	JavaScript global variable found	5
920180	POST without Content-Length or Transfer-Encoding headers.	4
920450	HTTP header is restricted by policy (/content-range/)	4
942500	MySQL in-line comment detected.	4
944120	Remote Command Execution: Java serialization (CVE-2015-5842)	4
920480	Request content type charset is not allowed by policy	3
920120	Attempted multipart/form-data bypass	2
920330	Empty User Agent Header	2
921120	HTTP Response Splitting Attack	2
921160	HTTP Header Injection Attack via payload (CR/LF and header-name detected)	2
922120	Content-Transfer-Encoding was deprecated by rfc7578 in 2015 and should not be used	2
933200	PHP Injection Attack: Wrapper scheme detected	2
941250	IE XSS Filters—Attack Detected.	2
942320	Detects MySQL and PostgreSQL stored procedure/function injections	2
943120	Possible Session Fixation Attack: SessionID Parameter Name with No Referer	2
932140	Remote Command Execution: Windows FOR/IF Command Found	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
