Computers | Article | Open Access | Published: 26 May 2025

Detecting Zero-Day Web Attacks with an Ensemble of LSTM, GRU, and Stacked Autoencoders

1 Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
2 Research Engineer, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Using New Technologies in Cyber Security Solutions (2nd Edition)

Abstract

The increasing sophistication of web-based services has intensified the risk of zero-day attacks, exposing critical vulnerabilities in user information security. Traditional detection systems often rely on labeled attack data and struggle to identify novel threats without prior knowledge. This paper introduces a novel one-class ensemble method for detecting zero-day web attacks, combining the strengths of Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and stacked autoencoders through latent representation concatenation and compression. Additionally, a structured tokenization strategy based on character-level analysis is employed to enhance input consistency and reduce feature dimensionality. The proposed method was evaluated using the CSIC 2012 dataset, achieving 97.58% accuracy, 97.52% recall, 99.76% specificity, and 99.99% precision, with a false positive rate of just 0.2%. Compared to conventional ensemble techniques like majority voting, our approach demonstrates superior anomaly detection performance by fusing diverse feature representations at the latent level rather than the output level. These results highlight the model’s effectiveness in accurately detecting unknown web attacks with low false positives, addressing major limitations of existing detection frameworks.

1. Introduction

In modern digital infrastructure, websites and web-based applications play a crucial role in facilitating economic, educational, recreational, and political activities. However, as the reliance on these platforms increases, so does the risk of security threats, including unauthorized access, data breaches, and service disruptions. One of the primary attack vectors involves manipulating web requests, where adversaries masquerade as legitimate users to exploit vulnerabilities. Consequently, the detection and mitigation of malicious web requests have become vital for ensuring the security of any online service, including websites, web applications, and Content Delivery Networks (CDNs).
To counter such threats, various security mechanisms, including Web Application Firewalls (WAFs) and blacklisting techniques, have been deployed. While these methods offer some level of protection, they remain ineffective against zero-day attacks and novel exploits that lack predefined security signatures []. The primary challenge associated with zero-day attacks lies in their unpredictability, as they introduce previously unseen patterns that traditional rule-based detection systems fail to recognize. Addressing these challenges through deep learning-based anomaly detection presents a promising approach, leveraging neural networks to autonomously identify deviations indicative of malicious activity.
Conventional methods for preventing web-based attacks, such as WAFs [] and blacklisting, exhibit several limitations. For instance, maintaining a blacklist of prohibited keywords within web requests is both time-consuming and insufficient for addressing evolving attack patterns []. Moreover, neither approach can detect zero-day attacks, whose strategies and obfuscation techniques are unknown in advance. Studies have shown that even modern WAFs such as ModSecurity can fail to detect novel attack patterns, depending on their rule sets and update frequency []. These shortcomings highlight the urgent need for more adaptive and intelligent detection mechanisms capable of identifying unknown threats without relying on predefined signatures.
A critical advantage of anomaly detection models is that they do not require prior exposure to zero-day attacks to effectively detect them. In this study, we aim to address the limitations of existing detection systems by developing an ensemble-based anomaly detection model that can effectively detect zero-day web attacks without prior exposure to attack patterns. The model integrates multiple sub-models designed to detect zero-day attacks. Given that the patterns of zero-day attacks are inherently unknown, the model is trained exclusively on normal web request data. By learning the distribution of normal web traffic, the model becomes proficient in identifying deviations, thereby flagging both known and previously unseen attacks as anomalous. This approach ensures that malicious requests, whether originating from known attack types or zero-day exploits, are effectively classified as security threats.
To evaluate the proposed model, various web attacks such as SQL injection (SQLi), Cross-Site Scripting (XSS), and Buffer Overflow [] are treated as zero-day attacks within the dataset. The model classifies any request with an anomaly score exceeding a predefined threshold as a potential zero-day attack. While the proposed approach does not explicitly categorize different types of attacks, it demonstrates the capability to reliably detect anomalous activities, ensuring a high level of security against emerging threats. The primary objective of this model is to simultaneously address both known and zero-day attacks while maintaining a high detection rate and minimizing false positives.
The rest of this paper is structured as follows: Section 2 presents the foundational concepts and research background. Section 3 provides a review of existing literature on web attack detection. The methodology and architectural design of the proposed model are discussed in Section 4, followed by a performance evaluation in Section 5. Section 6 elaborates on the broader implications of the findings, and Section 7 concludes the paper with final remarks.
Our ensemble method differs from conventional ensemble approaches by combining and compressing latent features rather than relying on simple output aggregation (such as majority voting), significantly enhancing detection performance. The key contributions of this research are as follows:
  • Innovative Ensemble Model Architecture: This study introduces a novel ensemble approach by integrating LSTM, GRU, and stacked autoencoders for anomaly detection in web requests. Unlike conventional ensemble methods that use simple averaging or majority voting, our approach uniquely concatenates and compresses the latent representations from each autoencoder. This technique significantly improves anomaly detection performance and computational efficiency.
  • Advanced Tokenization and Feature Mapping: We propose a novel tokenization strategy that classifies tokens based on their character composition (numeric, lowercase, uppercase, and special characters). This structured approach effectively reduces input dimensionality, ensures greater consistency in data representation, and significantly enhances the detection capability of our anomaly detection system.
  • Zero-Day Attack Detection: Our model is trained exclusively on normal web requests, enabling it to effectively identify and detect previously unseen zero-day attacks by capturing deviations from established normal request patterns.
  • Comprehensive Evaluation Metrics with Emphasis on False Positive Rate (FPR): Unlike many existing studies, we explicitly evaluate and report the false positive rate, achieving a significantly lower FPR of 0.2%. This comprehensive evaluation underscores the practical applicability of our model, addressing an essential aspect often overlooked in anomaly detection research.
By addressing the limitations of traditional detection systems and leveraging anomaly detection through deep learning, this research contributes to advancing cybersecurity measures against evolving web-based threats.

2. Background

Due to the increasing reliance on web-based services, the detection of web attacks, particularly zero-day exploits, has become a critical cybersecurity challenge []. Detection approaches fall into two broad categories: non-ML (heuristic-based) and machine learning-based. Traditional heuristic approaches detect threats based on manually defined rules [], but they suffer from high false positive rates and are often ineffective against novel attack variants. Machine learning-based approaches for web attack detection typically consist of two key phases: training and detection. During the training phase, the model learns from patterns of normal web requests, while in the detection phase, it utilizes this learned knowledge to identify and mitigate potential web attacks [].
Web attack countermeasures can generally be categorized into three primary approaches: (1) supervised, (2) unsupervised, and (3) semi-supervised learning. The supervised approach is primarily designed to detect known web attacks such as Structured Query Language Injection (SQLi), Cross-Site Scripting (XSS), and Cross-Site Request Forgery (CSRF). It is commonly implemented in signature-based systems, including Web Application Firewalls (WAFs). Supervised models are trained on labeled datasets containing examples of attack patterns alongside normal requests. For instance, a simple SQL injection attack pattern might include inputs like OR '1'='1' to manipulate authentication queries, while an XSS attack could involve scripts such as <script>alert('XSS')</script> injected into input fields. By learning these known signatures, supervised models can accurately classify incoming web requests that match documented attack behaviors. However, their dependence on predefined signatures renders them ineffective against zero-day attacks, which introduce novel patterns that are not represented in the training datasets [].
The unsupervised approach, on the other hand, employs anomaly detection techniques to distinguish between normal and anomalous web activities. Unlike supervised methods, unsupervised models do not require prior knowledge of specific attack patterns, enabling them to identify previously unseen zero-day attacks [,]. By modeling the expected behavior of normal web traffic, these approaches can effectively flag deviations indicative of novel attack types, including obfuscated SQLi and XSS payloads, or unauthorized access attempts, making them particularly suitable for detecting evolving and sophisticated threats in dynamic web environments.
The semi-supervised approach lies between supervised and unsupervised methods and leverages exclusively normal web request data to train the model. By focusing on learning only legitimate traffic patterns, the model becomes capable of identifying anomalous behaviors that may signal new or obfuscated web attacks. Since semi-supervised methods eliminate the need for labeled malicious data, they offer a practical solution for early detection of emerging zero-day vulnerabilities while reducing dependency on frequent updates to labeled datasets [].
In this research, an unsupervised approach is employed to address the challenge of detecting zero-day attacks, which lack predefined signatures and attack patterns. By learning the distribution of normal web requests, the proposed model can effectively identify requests that deviate from the established normal behavior, thereby flagging them as potential zero-day attacks. This approach enhances the system’s ability to detect novel and previously unknown threats, making it a robust solution for mitigating modern web security risks.

4. Proposed Model

This section presents the proposed model for detecting zero-day web attacks. The detection process begins with the preprocessing of web requests through tokenization techniques, ensuring standardized input representation. These processed requests are then fed into an ensemble of relatively simple one-class classifiers designed to distinguish between normal and malicious web traffic. The effectiveness of the proposed model is assessed during the detection phase, focusing on its capability to identify and mitigate advanced security threats.

4.1. Architecture

The proposed model comprises multiple components, each serving a distinct role in the detection process. These components work together to predict whether an incoming web request is benign or malicious. As illustrated in Figure 1, the training pipeline begins with tokenizing normal web requests using a dictionary-based approach; the tokenized requests are then converted into numerical sequences and fed into an ensemble of autoencoders. Figure 2 emphasizes the change during the testing phase, where both normal and malicious requests are fed into the trained model, highlighting the anomaly detection capability.
Figure 1. The proposed model in the training phase.
Figure 2. The proposed model in the test phase.
During the training phase, the model is exclusively trained on normal web requests to establish a baseline pattern of legitimate traffic. In this phase, the model has full access to the training dataset, allowing it to learn the distribution of normal request patterns []. Conversely, in the testing phase, both normal and malicious web requests are input into the model for evaluation. This enables the model to assess its ability to generalize and detect deviations indicative of potential zero-day attacks.

4.1.1. Tokenization

A key innovation of the proposed model is the introduction of a novel tokenization technique for web requests, applied at the word level to both normal and malicious inputs []. This method addresses the challenges that the inherent variability in the length and structure of web requests poses for training neural network-based models for web security. Specifically, the model leverages anomaly detection principles to effectively distinguish between legitimate and anomalous web traffic [].
Unlike character-level tokenization [], which treats each character individually and can introduce unnecessary complexity, or n-gram-based methods [] that can lead to very high dimensionality and computational overhead, our approach focuses on categorizing tokens into predefined classes based on character composition (e.g., Alpha, AlphaNum, CapitalAlpha, and SpecialChar). This categorization simplifies input data and reduces dimensionality, directly enhancing computational efficiency.
In contrast, embedding-based methods like Word2Vec [] or transformer-based tokenization [] encode semantic relationships between tokens but might not efficiently capture the syntactic patterns critical for identifying subtle anomalies in structured web requests. Our dictionary-based tokenization, however, explicitly preserves critical structural and syntactic information, which is essential for accurately identifying anomalous web requests.
To ensure consistent data representation, the preprocessing pipeline standardizes normal web requests through a dictionary-based tokenization approach []. This process involves segmenting each request at the word level using tools such as Python’s WordPunctTokenizer []. The resulting structured pattern is subsequently utilized as input to train the ensemble model. We will explain the tokenization and data processing workflow by applying it to an example from our dataset. The example request is as follows:
  • POST /tienda1/publico/registro.jsp?modo=registro&login=m6&password=m6&nombre=m&apellidos=m&email=m&dni=mm&direccion=Calle+Salvatierra+196+%2C+&ciudad=m&provincia=31&cp=68970&ntc=6987987070987097&B1=Registrar
A predefined dictionary is utilized to categorize each character into distinct classes, facilitating structured tokenization. The dictionary includes categories such as Alpha, AlphaNum, CapitalAlpha, and SpecialChar, among others. Figure 3 provides a visual example of how the predefined dictionary maps raw tokens to their corresponding semantic categories during the tokenization process. For instance, the parameter “login” is assigned the value “m6”, which combines letters and numbers and is therefore classified as AlphaNum according to the predefined dictionary. Similarly, the word “Registrar”, which begins with a capital letter, falls under the CapitalLowerAlpha category. These classifications enhance the accuracy of tokenization and facilitate the interpretation of web requests within the proposed model.
Figure 3. A tokenized request.
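To make the mapping concrete, the sketch below shows one way such a character-composition dictionary could be implemented on top of NLTK's WordPunctTokenizer, which the paper uses for word-level segmentation. The rule ordering and the helper names (classify_token, tokenize_request) are illustrative assumptions rather than the authors' exact implementation.

```python
from nltk.tokenize import WordPunctTokenizer

def classify_token(token: str) -> str:
    """Assign a token to a character-composition class (illustrative rules)."""
    has_digit = any(c.isdigit() for c in token)
    has_lower = any(c.islower() for c in token)
    has_upper = any(c.isupper() for c in token)
    has_alpha = has_lower or has_upper

    if not has_alpha and not has_digit:
        return "SpecialChar"          # only punctuation/symbols
    if has_digit and has_alpha:
        return "AlphaNum"             # mix of letters and digits, e.g. "m6"
    if has_digit:
        return "Num"                  # digits only
    if has_upper and has_lower:
        return "CapitalLowerAlpha"    # e.g. "Registrar"
    if has_upper:
        return "CapitalAlpha"         # upper-case letters only
    return "Alpha"                    # lower-case letters only

def tokenize_request(request: str) -> list[str]:
    """Segment a request at the word level, then map each token to its class."""
    tokens = WordPunctTokenizer().tokenize(request)
    return [classify_token(t) for t in tokens]

print(tokenize_request("login=m6&password=m6&B1=Registrar"))
```

Applied to the query string of the example request, this yields a class sequence such as ['Alpha', 'SpecialChar', 'AlphaNum', ...], which is what the subsequent numerical mapping consumes.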
The significance of the tokenization approach in this model is twofold:
  • Data volume reduction: The tokenization process optimizes data representation, reducing the complexity and size of input data, thereby improving computational efficiency.
  • Pattern identification for anomaly detection: By establishing a structured pattern for normal web requests, the tokenization method enhances the model’s ability to differentiate between legitimate and anomalous activities, ensuring higher accuracy in detecting malicious web requests.
While the current tokenization approach effectively handles the structure of the CSIC dataset, its generalization to other datasets is limited due to its dependency on a static token mapping dictionary. Future work will explore adaptive tokenization using LLMs to support broader request formats.

4.1.2. Numerical Sequence

Following the tokenization of web requests on a word-by-word basis, each token must be mapped to a corresponding numerical value, as neural networks operate on numbers rather than raw text. This transformation is a critical step, as the varying range of input features necessitates data scaling before being processed by the model [,]. The tokenized text is converted into a structured numerical format suitable for input into the neural network. Figure 4 visualizes how the structured tokens from the previous step are translated into numerical values through a predefined mapping dictionary. Each semantic token class (e.g., AlphaNum, Num, SpecialChar) is assigned a unique integer identifier, enabling direct input into the neural network. This process ensures consistency across inputs, allowing the model to recognize semantic patterns in a numerically efficient format. The mapping also supports input standardization via padding, which mitigates the risk of misalignment or irregular dimensions during training and inference.
Figure 4. Mapping words to numbers.
To accommodate variations in the length of numerical sequences representing web requests, a padding mechanism is implemented. This ensures that all input sequences maintain a uniform length, preventing inconsistencies in model processing. Padding extends shorter sequences to the fixed input length by appending neutral values, standardizing the input dimensions. This step enhances the model’s ability to analyze and recognize patterns within the data, ultimately improving detection accuracy and performance.
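The following sketch illustrates this mapping and padding step, assuming a hypothetical class-to-integer dictionary and the fixed input length of 50 reported in Section 5.2; Keras' pad_sequences is used here purely for illustration.

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical class-to-integer mapping; the concrete identifiers are defined
# by the model's mapping dictionary (cf. Figure 4).
CLASS_TO_ID = {
    "Alpha": 1, "AlphaNum": 2, "CapitalAlpha": 3,
    "CapitalLowerAlpha": 4, "Num": 5, "SpecialChar": 6,
}

def to_numeric_sequence(token_classes, max_len=50, pad_value=0):
    """Map token classes to integer IDs and pad to a fixed input length."""
    ids = [CLASS_TO_ID.get(c, 0) for c in token_classes]
    return pad_sequences([ids], maxlen=max_len, padding="post", value=pad_value)[0]

seq = to_numeric_sequence(["Alpha", "SpecialChar", "AlphaNum"])
print(seq.shape)  # (50,): shorter sequences are padded with the neutral value 0
```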

4.1.3. Ensemble Model

As illustrated in Figure 5, the proposed ensemble model consists of three sub-models: an LSTM autoencoder, a GRU autoencoder, and a stacked autoencoder. Each sub-model is trained to independently learn compressed representations of normal web requests. The LSTM and GRU autoencoders specialize in capturing sequential dependencies, while the stacked autoencoder focuses on global feature abstraction. After encoding and decoding, the latent representations from all three models are concatenated into a unified vector. This combined representation captures diverse patterns, helping improve detection accuracy. A dense compression layer is then applied to reduce the dimensionality of the concatenated vector, aligning it with the original input size while retaining the most relevant features. This step is critical for reducing computational overhead and minimizing redundancy, thereby improving both model scalability and interpretability.
To mitigate the typical computational overhead associated with ensemble models, our approach strategically employs three lightweight sub-models—LSTM, GRU, and stacked autoencoder—each selected for its balance between performance and efficiency. Unlike complex or deeply layered architectures, these models are compact and require fewer parameters and training iterations. Additionally, during inference, all sub-models run in parallel, which reduces latency and makes the system well-suited for real-time or large-scale deployment. To further enhance efficiency, the outputs of all sub-models are concatenated and compressed using a dense layer, significantly reducing the dimensionality of the final feature representation before classification. This design not only preserves high detection accuracy but also ensures low memory usage and fast runtime, thereby directly addressing the scalability challenge often associated with ensemble-based approaches.
Figure 5. Architecture of the ensemble model showing LSTM, GRU, and stacked autoencoder branches, followed by latent space concatenation and feature compression.
To achieve this, various neural network architectures were evaluated. Experimental results demonstrated that employing Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and stacked autoencoders in an ensemble configuration enhances data processing efficiency and improves the accurate identification of both known and zero-day web attacks. The outputs of these sub-models are concatenated and further processed through a dense layer for feature reduction, optimizing the final classification process.
Autoencoders are widely utilized for dimensionality reduction and feature extraction. These models consist of an encoder and a decoder, both comprising multiple layers, which collectively transform an input sequence of symbols (words) into a continuous latent representation. The decoder then reconstructs the original input from this representation, preserving critical features while filtering out noise [,]. This reconstruction-based learning approach enables autoencoders to effectively capture underlying patterns in web requests, further improving the robustness of the detection framework.
The encoded sequences of the original input data are denoted as
$x = [x_1, x_2, x_3, \dots, x_n]$
The encoded sequences are obtained through specific encoding functions corresponding to each autoencoder type. The encoding transformations for the LSTM, GRU, and stacked autoencoders are formally defined as follows:
$y^L = E_L(x)$
$y^G = E_G(x)$
$y^S = E_S(x)$
These equations represent the encoding processes, wherein the input sequence x is mapped to its corresponding latent representation, effectively capturing essential features for the detection process.
For each type of autoencoder, the encoded representations are structured as follows:
$y^L = [y_1^L, y_2^L, y_3^L, \dots, y_m^L]$ for the LSTM autoencoder,
$y^G = [y_1^G, y_2^G, y_3^G, \dots, y_m^G]$ for the GRU autoencoder, and
$y^S = [y_1^S, y_2^S, y_3^S, \dots, y_m^S]$ for the stacked autoencoder.
The output of the encoder serves as the input to the decoder, which reconstructs the original input sequence from the encoded representation. The reconstruction process is defined as follows:
$\hat{x}^L = D_L(y^L)$
$\hat{x}^G = D_G(y^G)$
$\hat{x}^S = D_S(y^S)$
where $D_L$, $D_G$, and $D_S$ denote the decoding functions for the LSTM, GRU, and stacked autoencoders, respectively.
The primary objective of the decoding process is to validate the quality and representativeness of the extracted features. The outputs from all three autoencoders are subsequently concatenated to form a unified feature representation, denoted as $\hat{x}_{\mathrm{cat}}$:
$\hat{x}_{\mathrm{cat}} = [\hat{x}_1^L, \hat{x}_2^L, \dots, \hat{x}_6^L, \hat{x}_1^G, \hat{x}_2^G, \dots, \hat{x}_6^G, \hat{x}_1^S, \hat{x}_2^S, \dots, \hat{x}_6^S]$
To maintain consistency with the original input dimensions and optimize computational efficiency, a compression operation is applied to $\hat{x}_{\mathrm{cat}}$. This step ensures that the number of features in the final output aligns with that of the original input data, enabling effective processing and interpretation of web requests within the proposed model:
$\hat{x} = [\hat{x}_1, \hat{x}_2, \hat{x}_3, \dots, \hat{x}_n]$
This final transformation refines the extracted feature representations, ensuring that the model retains only the most relevant and informative aspects of the input data while discarding redundant or insignificant components. By optimizing the structure of the encoded sequences, the model enhances its capacity for accurate detection and classification of web requests.
The compression of concatenated latent features serves two main purposes: (1) it aligns the dimensionality of the output with that of the original input, enabling direct interpretability and compatibility for anomaly scoring, and (2) it eliminates redundant or low-information features across the sub-models, thus reducing computational overhead and enhancing inference efficiency. This step is particularly critical when deploying the model in real-time systems, where performance and resource constraints must be considered.
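As a minimal sketch of this fusion step, the snippet below concatenates the three per-branch representations (six components each, mirroring the equation above) and compresses them back to the original input dimensionality with a dense layer; the dimensions and layer names are illustrative assumptions, not the exact configuration.

```python
from tensorflow.keras import Input, Model, layers

n = 50          # original input length (see Section 5.2)
branch_dim = 6  # per-branch size used in the concatenation example above

x_lstm = Input(shape=(branch_dim,), name="lstm_branch")
x_gru = Input(shape=(branch_dim,), name="gru_branch")
x_stacked = Input(shape=(branch_dim,), name="stacked_branch")

# Concatenate the three branch outputs (6 + 6 + 6 = 18 components) ...
concatenated = layers.Concatenate(name="latent_concat")([x_lstm, x_gru, x_stacked])
# ... and compress back to the original n features for anomaly scoring.
compressed = layers.Dense(n, name="compression")(concatenated)

fusion = Model([x_lstm, x_gru, x_stacked], compressed, name="latent_fusion")
fusion.summary()
```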
It is important to explicitly address how our proposed model handles the challenge of class imbalance, which is frequently problematic in anomaly detection scenarios. Unlike traditional supervised learning models, our approach uses an unsupervised anomaly detection paradigm that relies exclusively on normal web request data for training. Specifically, we train the model solely on 80% of the 6492 normal samples (5193 requests) from the CSIC 2012 dataset, ensuring that the model learns only legitimate web request patterns and behaviors. As a result, the imbalance issue—typically characterized by an insufficient number of malicious examples for learning—is effectively circumvented during training, as our model does not require exposure to malicious patterns to establish baseline behavior.
Furthermore, we carefully consider class imbalance in the evaluation phase. Our test set includes a notably imbalanced distribution: 1299 normal requests versus 50,176 malicious requests. Despite this imbalance, our evaluation explicitly incorporates a diverse range of metrics—accuracy, precision, recall, specificity, F1-score, and particularly the false positive rate (FPR)—to ensure that the model’s performance is assessed comprehensively and fairly. By explicitly including metrics like the FPR and recall, we precisely capture how well the model avoids false alarms and missed detections, even in the presence of extreme class imbalance. These metrics directly reflect the model’s practical viability for real-world deployment, where imbalance is inevitable but should not compromise detection reliability.
In short, our unsupervised training approach inherently avoids the class imbalance issue during model training, while our thorough evaluation methodology ensures the model’s robustness and practical effectiveness under realistic, highly imbalanced testing conditions.
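The training protocol described above can be summarized in a few lines; the arrays below are random stand-ins for the preprocessed CSIC 2012 request vectors, and the variable names and random seed are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-ins for the preprocessed request vectors (6492 normal and
# 50,176 malicious requests, 50 features each, as reported in the text).
rng = np.random.default_rng(0)
normal_x = rng.integers(0, 7, size=(6492, 50)).astype("float32")
malicious_x = rng.integers(0, 7, size=(50176, 50)).astype("float32")

# Only normal traffic is used for training: the 80/20 split yields 5193
# training requests and 1299 held-out normal requests for testing.
train_normal, test_normal = train_test_split(normal_x, test_size=0.2, random_state=42)

# The test set mixes the held-out normal requests with all malicious requests.
x_test = np.concatenate([test_normal, malicious_x])
y_test = np.concatenate([np.zeros(len(test_normal)), np.ones(len(malicious_x))])
print(train_normal.shape, x_test.shape)  # (5193, 50) (51475, 50)
```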

5. Evaluation and Results

This section presents the evaluation of the proposed model and its sub-models based on multiple performance metrics, including accuracy, detection rate, sensitivity, precision, and false positive rate. To assess the effectiveness of the proposed approach, a threshold-based evaluation is conducted using the Mean Absolute Error (MAE), which quantifies the difference between the reconstructed request and the original input (prior to encoding).
In machine learning, MAE is a widely used metric for measuring the absolute difference between predicted values and their actual counterparts. It is computed by averaging the absolute errors across all predictions. MAE was selected as the primary evaluation metric due to its interpretability, robustness, and alignment with the model’s objectives. Specifically, MAE measures the average absolute deviation between the original and reconstructed web requests, providing a clear and intuitive indicator of reconstruction accuracy.
Unlike the Mean Squared Error (MSE), which disproportionately amplifies the effect of outliers due to its squared loss formulation, MAE is less sensitive to extreme deviations. This stability makes MAE particularly well-suited for anomaly detection, as it emphasizes individual discrepancies without being overly influenced by rare, extreme variations. By leveraging linear reconstruction errors, MAE effectively differentiates between benign and malicious web requests while ensuring an optimal balance between detection rates and false positives. This makes it an appropriate choice for evaluating the performance of web attack detection models.
The Mean Absolute Error (MAE) is formally defined as
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{x}_i - x_i \rvert$
In Formula (7), $\hat{x}_i$ represents the reconstructed value, while $x_i$ denotes the actual value. The classification criterion is based on the computed MAE for each web request. To determine the optimal threshold for distinguishing between benign and malicious requests, we analyzed the distribution of MAE scores across normal training requests, as visualized in Figure 6. Based on this distribution, we observed that MAE values greater than approximately 4 were uncommon among normal requests, suggesting that these higher values represent potential anomalies. Consequently, we selected a threshold of 4.09, just above this observed point, as our decision boundary. Additional experiments with alternative threshold values around 4 confirmed that this selection offered a balanced trade-off between sensitivity to malicious requests and avoiding excessive false positives. Although this approach is heuristic, future work could systematically evaluate the threshold using techniques such as percentile-based analysis or automated hyperparameter tuning methods to further refine the detection performance.
Figure 6. Density diagram according to the MAE of the proposed model.
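In code, the decision rule reduces to a per-request reconstruction MAE compared against the 4.09 threshold; the sketch below uses placeholder arrays, and in practice x_hat would come from the trained ensemble.

```python
import numpy as np

THRESHOLD = 4.09  # decision boundary derived from the training MAE distribution

def mae_per_request(x, x_hat):
    """Mean absolute error between each original request and its reconstruction."""
    return np.mean(np.abs(np.asarray(x_hat) - np.asarray(x)), axis=1)

def classify_requests(x, x_hat, threshold=THRESHOLD):
    """Flag a request as malicious (1) when its reconstruction MAE exceeds the threshold."""
    return (mae_per_request(x, x_hat) > threshold).astype(int)

# Placeholder example with three requests of length 50:
x = np.zeros((3, 50))
x_hat = np.stack([np.zeros(50), np.full(50, 2.0), np.full(50, 5.0)])
print(mae_per_request(x, x_hat))    # [0. 2. 5.]
print(classify_requests(x, x_hat))  # [0 0 1]: only the third request is flagged
```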

5.1. Data Collection

Previous research on detecting malicious and zero-day web requests has primarily relied on two well-established datasets: CSIC [] and HTTPPARAMS []. These datasets provide a comprehensive representation of both normal and malicious web requests, making them widely used benchmarks in web security research. Additionally, a project hosted on GitHub [], which employs Convolutional Neural Networks (CNNs), utilized the CSIC 2012 dataset. Accordingly, the proposed model leverages this dataset for training and evaluation.
The dataset utilized in this study is significant due to the following characteristics:
  • It encompasses a diverse range of malicious requests, including SQL injection (SQLi), Cross-Site Scripting (XSS), and Buffer Overflow attacks.
  • It contains normal (benign) web requests, ensuring a balanced distribution of data for effective training and evaluation.
A key consideration in dataset selection is ensuring that malicious requests accurately reflect real-world attack scenarios. The dataset comprises approximately 16,000 instances labeled as anomalous. However, certain anomalies may arise from factors unrelated to direct cyberattacks, such as unusual user behavior, malformed requests, or suspicious data entry attempts. These cases, while indicative of potential security threats, do not strictly conform to defined attack patterns. To maintain data integrity and ensure the model is trained on well-defined attack and normal request samples, such ambiguous anomalies are removed from the dataset prior to training.

5.2. The Ensemble Model Structure

According to Figure 7, the LSTM autoencoder and GRU autoencoder each consist of four layers (two encoder layers of 50 and 25 units, respectively, and two symmetric decoder layers of 25 and 50 units), using the default tanh activation function. The stacked autoencoder comprises four dense layers (50, 25, 25, and 50 units) with linear activation. The ensemble model concatenates outputs from these autoencoders into a unified latent vector, which is further compressed via a dense layer (50 units). All models are trained using the Mean Absolute Error (MAE) loss function, the Nadam optimizer, and evaluated based on accuracy metrics.
Figure 7. The architectural structure of the proposed ensemble model. Each branch represents one of the three autoencoder sub-models—LSTM, GRU, and stacked autoencoder with their respective layer configurations. The outputs from these branches are concatenated and passed through a compression dense layer to form a unified representation used for anomaly detection.
The specific number of layers and units was primarily chosen to match the dimensionality of our input features (50 features). Additional experiments with alternative configurations (e.g., different layer sizes) confirmed that this selection provided the optimal balance between accuracy, computational efficiency, and effective representation of the structured input data.
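A sketch of this configuration in Keras is given below. The encoder/decoder unit counts, activations, loss, and optimizer follow the description above; the RepeatVector/TimeDistributed plumbing, the per-branch input shapes, and the training call are assumptions needed to make the sketch runnable and are not specified in the paper.

```python
from tensorflow.keras import Input, Model, layers

SEQ_LEN = 50  # fixed input length after padding (Section 4.1.2)

def recurrent_branch(cell, name):
    """LSTM/GRU autoencoder branch: 50- and 25-unit encoder layers and
    symmetric 25- and 50-unit decoder layers (default tanh activation)."""
    inp = Input(shape=(SEQ_LEN, 1), name=f"{name}_in")
    h = cell(50, return_sequences=True)(inp)
    h = cell(25)(h)                                   # latent representation
    h = layers.RepeatVector(SEQ_LEN)(h)               # assumption: repeat latent over time
    h = cell(25, return_sequences=True)(h)
    h = cell(50, return_sequences=True)(h)
    out = layers.Flatten()(layers.TimeDistributed(layers.Dense(1))(h))
    return inp, out

def stacked_branch():
    """Stacked autoencoder branch: dense layers of 50, 25, 25, and 50 units, linear activation."""
    inp = Input(shape=(SEQ_LEN,), name="stacked_in")
    h = layers.Dense(50, activation="linear")(inp)
    h = layers.Dense(25, activation="linear")(h)
    h = layers.Dense(25, activation="linear")(h)
    return inp, layers.Dense(50, activation="linear")(h)

lstm_in, lstm_out = recurrent_branch(layers.LSTM, "lstm")
gru_in, gru_out = recurrent_branch(layers.GRU, "gru")
stacked_in, stacked_out = stacked_branch()

merged = layers.Concatenate()([lstm_out, gru_out, stacked_out])   # unified latent vector
compressed = layers.Dense(SEQ_LEN, activation="linear")(merged)   # compression layer (50 units)

ensemble = Model([lstm_in, gru_in, stacked_in], compressed)
ensemble.compile(optimizer="nadam", loss="mae", metrics=["accuracy"])
```

Training would then fit the reconstruction on normal requests only, e.g. ensemble.fit([x[..., None], x[..., None], x], x, epochs=90, validation_split=0.1), where the epoch count follows Figure 8 and the validation split is an assumption.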
Figure 8 shows the learning behavior of the proposed model across 90 epochs. Both the training and validation MAE decrease rapidly during early epochs and converge to low, stable values, indicating effective learning of the normal request distribution. The gap between the training and validation curves remains small throughout, suggesting minimal overfitting. This result confirms that the ensemble model generalizes well to unseen data while preserving low reconstruction error, which is critical for robust anomaly detection.
Figure 8. Training and validation process of the proposed model based on MAE and number of epochs.
Following the detection phase, the Mean Absolute Error (MAE) for each web request is computed and compared against the predefined threshold. Figure 9 provides a detailed view of how the Mean Absolute Error (MAE) for each individual request compares to the detection threshold. The X-axis represents the sequence of requests processed during evaluation, while the Y-axis shows the MAE score on a logarithmic scale. Blue dots above the red threshold line indicate detected anomalies. This visualization demonstrates the model’s ability to sharply distinguish between normal and malicious traffic, validating the appropriateness of the selected MAE threshold (4.09) for anomaly detection.
Figure 9. Comparison of the calculated MAE value for each web request with the threshold value.
The ensemble model, incorporating LSTM, GRU, and stacked autoencoder sub-models, demonstrates superior performance across all evaluation metrics compared to each sub-model individually. The reported results represent the average performance obtained over six independent runs of the model.
The system utilized for evaluating the proposed model consists of key components, as shown in Table 1, including the LSTM, GRU, and stacked autoencoders for neural network-based processing, running on Windows 11 with Python v3.12 and the Keras framework. The implementation leverages Scikit-learn v1.6.0 for machine learning functionalities and WordPunctTokenizer from the Natural Language Toolkit (NLTK) for splitting text into word-level tokens. Additionally, the Tokenizer class is employed for converting text data into numerical sequences, ensuring compatibility with neural networks. The model’s performance is evaluated using Mean Absolute Error (MAE), as previously defined, to quantify the difference between predicted and actual values, providing an effective measure for anomaly detection. The training phase required approximately 20 s, while the test phase was completed in 5 s.
Table 1. System components used in the evaluation setup.

5.3. Results

Table 2 defines the key terms used to compute the evaluation metrics. The performance assessment of the proposed model involves the computation of six primary metrics: accuracy, precision, sensitivity, detection rate, false positive rate, and F1-score.
Table 2. Performance metrics used for the proposed model.
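For reference, the metrics can be computed from the confusion counts defined in Table 2 using their standard formulas, as in the sketch below (the counts shown are toy values, not the paper's confusion matrix).

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard metric definitions based on the confusion counts in Table 2."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also reported as sensitivity / detection rate
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)                 # false positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "fpr": fpr, "f1": f1}

# Toy example:
print(detection_metrics(tp=950, tn=900, fp=10, fn=50))
```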
The proposed ensemble model consistently outperforms the individual sub-models across all evaluation metrics. While the LSTM and GRU autoencoders achieve high accuracy, sensitivity, and precision, they exhibit a higher false positive rate, incorrectly classifying several normal requests as malicious. Conversely, the stacked autoencoder reduces the false positive rate effectively but shows comparatively weaker precision and recall. Combining these sub-models into a unified ensemble framework leverages their complementary strengths, thereby significantly improving overall detection performance.
A detailed analysis reveals that both LSTM and GRU sub-models misclassified 14 out of 1299 normal requests as malicious—an undesirable outcome in real-world scenarios. Incorporating the stacked autoencoder into the ensemble mitigates this issue by reducing false positives, albeit at the expense of slightly lower accuracy and recall when used independently. Table 3 and Figure 10 provide a detailed comparative analysis of the performance metrics for each individual sub-model—LSTM, GRU, and stacked autoencoder—as well as the overall ensemble model. The ensemble model significantly outperforms all individual components in terms of accuracy (97.58%), recall (97.52%), and F1-score (98.74%), and exhibits a notably low false positive rate of just 0.2%. This improvement reflects the ensemble’s ability to capture diverse latent representations and mitigate the weaknesses of standalone models. The bar chart in Figure 10 visually reinforces these findings, showing that the proposed model maintains high precision and recall simultaneously—indicating a balanced and effective detection mechanism suitable for real-world deployment.
Table 3. Comparison of the performance of the proposed model with the sub-models.
Figure 10. Comparison of the performance of the proposed model with the sub-models in the form of a bar chart.
Table 4 presents a comparative analysis of the proposed model’s performance against various models from previous research that have utilized the CSIC2010 and CSIC2012 datasets. This comparison provides insights into the effectiveness of the proposed approach relative to existing solutions in the field. One notable limitation in prior studies is the omission of the false positive rate (FPR) in their evaluation results. This metric is crucial, as it quantifies the number of normal requests misclassified as malicious, directly impacting the practical applicability of detection models. The bar chart in Figure 11 visually reinforces these findings, illustrating that the proposed model simultaneously maintains high precision, high recall, and a low FPR, reflecting a balanced set of metrics and an effective detection mechanism suitable for real-world deployment. In the figure, models that utilized the CSIC2012 dataset are highlighted in red.
Table 4. Comparison of the performance of the proposed model with previously designed models.
Figure 11. Comparison of the performance of the proposed model with previously designed models in the form of a bar chart. Prior works include those by Sivri et al. [], Jung et al. [], Vartouni et al. [], Liang et al. [], Kuang et al. [], Gong et al. [], Tekerek et al. [], Jemal et al. [], Alaoui et al. [], Mohamed et al. [], Shahid et al. [], and Moarref et al. [].
The primary comparison focuses on studies that have employed the CSIC2012 dataset [,,], as they provide the most directly comparable benchmark. However, to offer a broader perspective, we also include studies based on the CSIC2010 dataset [,,,,,,,,]. It is important to note that differences in dataset characteristics may influence the comparability of results.
The CSIC2010 and CSIC2012 datasets are widely recognized benchmarks for evaluating web application security models, particularly for detecting SQL injection (SQLi) and other web-based attacks. The CSIC2010 dataset, developed earlier, contains a diverse set of normal and anomalous HTTP requests. While it provides a solid foundation for studying web attack detection, it lacks the complexity and evolving attack patterns characteristic of modern cybersecurity threats.
To address these limitations, the CSIC2012 dataset was designed with more sophisticated and realistic attack scenarios, along with a broader range of normal traffic. This makes CSIC2012 a more representative dataset for contemporary web security challenges. Additionally, CSIC2012 includes refined labeling and a larger volume of data, enhancing its suitability for training and evaluating advanced machine learning models.
These distinctions underscore the importance of selecting CSIC2012 for research targeting modern web application threats, as it serves as a more rigorous and up-to-date evaluation benchmark compared to its predecessor.

6. Discussion

The proposed ensemble model demonstrates strong performance across all evaluation metrics, particularly in terms of the false positive rate (FPR). Achieving the lowest FPR compared to related works, the model highlights its capability to accurately distinguish between normal and malicious web requests. Maintaining a low FPR is critical in real-world web security applications to avoid blocking legitimate user activities and to preserve usability.

6.1. Effectiveness and Error Correlation in the Ensemble Model

Our ensemble approach integrates three complementary sub-models: LSTM autoencoder, GRU autoencoder, and stacked autoencoder. Each of these contributes uniquely to the ensemble:
  • LSTM Autoencoder: Excels at capturing complex, long-term sequential dependencies in web requests.
  • GRU Autoencoder: Offers computational efficiency alongside robust recognition of sequential patterns, making it well-suited for real-time scenarios.
  • Stacked autoencoder: Specializes in extracting compact and representative latent features, aiding in subtle anomaly detection and dimensionality reduction.
An essential strength of the ensemble is its capability to mitigate correlated errors. Individual autoencoders often produce detection errors due to specific limitations—LSTM and GRU autoencoders may generate false positives due to their sensitivity to sequence complexity, while the stacked autoencoder may produce false negatives when anomalies present subtle deviations. By concatenating and compressing latent representations from multiple sub-models, our ensemble significantly reduces the correlation of errors across models. Specifically, cases misclassified by one sub-model often receive correct classifications by others, thereby collectively enhancing detection robustness. Future analyses could systematically quantify the correlation among sub-model errors using metrics such as Cohen’s Kappa or correlation matrices, providing deeper insights into ensemble effectiveness.
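As one possible realization of this analysis, the sketch below estimates the agreement between two sub-models' per-request error indicators with scikit-learn's cohen_kappa_score; the toy labels are illustrative only.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def error_agreement(pred_a, pred_b, y_true):
    """Cohen's kappa between the per-request error indicators of two sub-models.
    Values near zero (or negative) suggest largely uncorrelated errors, which
    is the regime an ensemble benefits from."""
    errors_a = (np.asarray(pred_a) != np.asarray(y_true)).astype(int)
    errors_b = (np.asarray(pred_b) != np.asarray(y_true)).astype(int)
    return cohen_kappa_score(errors_a, errors_b)

# Toy example: two sub-models that err on mostly different requests.
y_true    = [0, 0, 0, 1, 1, 1, 1, 0]
lstm_pred = [0, 1, 0, 1, 1, 0, 1, 0]
gru_pred  = [0, 0, 1, 1, 1, 1, 0, 0]
print(error_agreement(lstm_pred, gru_pred, y_true))
```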
Additionally, our advanced tokenization approach ensures a consistent and structured representation of web requests, which notably improves anomaly detection capabilities compared to conventional tokenization methods, as confirmed by our evaluation metrics. The explicit consideration and optimization of the false positive rate further differentiates our approach, addressing practical operational challenges often overlooked in the literature.

6.2. Generalizability, Adversarial Robustness, and Practical Challenges

Generalizability across different web domains remains a significant challenge. Our current tokenization and feature extraction pipeline was tailored specifically to standardize requests from the CSIC 2012 dataset, limiting its immediate applicability to more diverse real-world web requests. Adapting the tokenizer to handle varied formats found in datasets such as FWAF and HTTPParams [] or real-world environments is an essential step toward broader generalization. Future research could significantly benefit from adaptive tokenization techniques leveraging Large Language Models (LLMs) and prompt engineering, enhancing the model’s robustness across diverse and evolving web request structures.
The robustness of anomaly detection models against adversarial examples and obfuscation attacks is another crucial consideration that was not explicitly evaluated in our current work. Attackers often employ sophisticated evasion techniques, including payload obfuscation or data drift—changes in web request patterns over time—to bypass detection mechanisms. To address these challenges, future research should explicitly test our model against adversarial and obfuscated web attacks, applying adversarial training or robustness testing methods to ensure resilience in adversarial scenarios. Moreover, continuous learning strategies could be adopted to manage data drift, dynamically adapting model parameters as web request patterns evolve.
Another limitation is the reliance on annotated benchmark datasets such as CSIC 2012. Although widely used for reproducibility and benchmarking, they do not fully replicate the complexity and variability found in live web systems. Further research should thus validate the model’s performance within realistic simulated environments or operational settings, which could reveal insights about its practical robustness, scalability, and performance under real-time constraints.
While the current ensemble design achieves a balance between accuracy and efficiency, the scalability of such models in high-throughput or resource-constrained environments remains a practical challenge. Future research should explore optimization techniques such as model pruning, knowledge distillation, and quantization to further reduce inference time and memory consumption. Additionally, adaptive ensemble techniques where only a subset of sub-models are activated based on the input request’s complexity could offer a dynamic trade-off between speed and accuracy.
Lastly, the current model focuses on anomaly detection without explicit categorization of attack types (e.g., SQL injection, Cross-Site Scripting, Buffer Overflow). Integrating an explicit attack-type classification mechanism would further enhance the practical utility and forensic capabilities of the proposed model, providing more actionable insights for cybersecurity professionals.

7. Conclusions

In this study, each web request was initially segmented into individual words and then tokenized using a predefined vocabulary. This preprocessing step aimed to standardize and simplify web requests while establishing a structured pattern for normal web traffic. In the final stage of preprocessing, each tokenized word was mapped to a unique numerical representation, facilitating its input into the neural network. The proposed model employs an ensemble approach comprising three relatively simple sub-models: LSTM, GRU, and stacked autoencoders. The ensemble operates by independently processing the input data through each sub-model; the outputs are then explicitly concatenated into a combined latent feature set, ensuring that the ensemble benefits from the diverse representation capabilities of each sub-model. After concatenation, a dedicated dense layer compresses the resulting features into a unified, optimized representation, significantly reducing the dimensionality from a larger combined vector to a manageable size. The framework further contributes a novel structured tokenization method that significantly enhances detection performance, together with an explicit evaluation of critical metrics, including the false positive rate.
During the training phase, only normal web requests were provided as input to the ensemble model, enabling it to learn the underlying patterns of legitimate requests. Upon completion of training, the model effectively captured and recognized these patterns. In the detection phase, both normal and malicious web requests were introduced for evaluation. The Mean Absolute Error (MAE) was employed as the primary metric to quantify the difference between the reconstructed and original values of each request. The threshold for classification was determined based on the MAE values computed during the training phase. In the detection phase, if the MAE of a web request was below the threshold, it was classified as normal; otherwise, it was identified as malicious.
During evaluation, the ensemble model’s performance was compared against each of its sub-models individually. The results demonstrated that the ensemble approach achieved superior performance, particularly in terms of an increased detection rate and a reduced false positive rate. Additionally, the proposed model was benchmarked against prior research, where it consistently outperformed existing approaches, further validating its effectiveness in detecting web-based threats.
Practical deployment within real-world security frameworks, such as Web Application Firewalls (WAFs) and real-time Intrusion Detection and Prevention Systems (IDS/IPS), is feasible given the computational efficiency of the proposed ensemble approach. Specifically, the pre-trained ensemble model can be integrated as a detection engine within WAF modules or IDS/IPS components, processing web requests in real time to promptly identify anomalous behavior based on reconstruction errors. To optimize performance in real-time scenarios, further efforts should explore model quantization, pruning, or efficient inference methods to ensure minimal latency without compromising detection accuracy.

8. Future Work

One potential direction for future research involves enhancing the tokenization and feature extraction process [] across diverse web attack datasets. This improvement can be achieved through the application of Generative AI, leveraging Large Language Models (LLMs) [,]. Specifically, prompt engineering [,] can be employed to construct a structured prompt that systematically guides the LLM in preprocessing each dataset sample. To achieve this concretely, the following structured roadmap will be adopted:
In Phase 1, an exploratory pilot study will be conducted using a representative dataset such as HTTPParams []. The goal is to develop and evaluate initial prompt engineering strategies that leverage few-shot learning to guide Large Language Models (LLMs) in generating dataset-specific tokenization rules. During this phase, the performance of the LLM-generated tokenization will be assessed based on consistency with human-crafted rules, semantic accuracy, and overall computational efficiency.
In Phase 2, the experiments will be expanded by incorporating additional datasets, including FWAF and real-world HTTP traffic logs. This phase will focus on systematically comparing LLM-generated tokenization rules to those derived manually. Key performance indicators to be analyzed include generalizability across formats, robustness to variations in input structure, and the computational overhead introduced during preprocessing.
In Phase 3, the aim will be to fully automate the preprocessing pipeline. LLMs will be used to dynamically generate customized preprocessing scripts [] based on structured prompts. This phase will focus on evaluating the reliability and consistency of the generated scripts, as well as measuring their runtime efficiency and the downstream impact on anomaly detection accuracy after LLM-driven preprocessing.
In Phase 4, beyond preprocessing improvements, future efforts will explore the implementation of advanced neural architectures and anomaly detection approaches, such as Bidirectional LSTM, GRU, and Convolutional Neural Networks (CNNs) [,], to develop a more robust ensemble model for web attack detection. Additionally, feature selection techniques will be applied to retain high-information-value features while eliminating less significant ones, effectively reducing input dimensionality and enhancing computational efficiency.
Regarding the adoption of LLMs for tokenization and preprocessing, potential challenges such as robustness against adversarial inputs, model hallucinations, and inconsistent outputs must be considered. Future research should thus include explicit adversarial robustness evaluations and validation protocols to assess and ensure reliability. Moreover, structured reasoning strategies inspired by recent frameworks such as VulnSage [], which leverages approaches like Chain-of-Thought and Think-and-Verify to improve zero-shot vulnerability detection in software systems, could be incorporated to strengthen the reliability of LLM-driven preprocessing.

Author Contributions

Methodology, V.B.; Software, V.B.; Validation, V.B.; Resources, V.B.; Writing—original draft, V.B.; Writing—review & editing, H.R.F.; Visualization, V.B.; Supervision, H.R.F.; Project administration, H.R.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ahmad, R.; Alsmadi, I.; Alhamdani, W.; Tawalbeh, L. Zero-day attack detection: A systematic literature review. Artif. Intell. Rev. 2023, 56, 10733–10811. [Google Scholar] [CrossRef]