A Novel Ensemble of Deep Learning Approach for Cybersecurity Intrusion Detection with Explainable Artificial Intelligence

Alabdulatif, Abdullah

doi:10.3390/app15147984


Department of Cybersecurity, College of Computer, Qassim University, Buraydah 52571, Saudi Arabia

Appl. Sci. 2025, 15(14), 7984; https://doi.org/10.3390/app15147984

Submission received: 4 June 2025 / Revised: 7 July 2025 / Accepted: 14 July 2025 / Published: 17 July 2025

(This article belongs to the Special Issue Advanced Cybersecurity Applications: Solutions to Counteract Cyber Threats)


Abstract

In today’s increasingly interconnected digital world, cyber threats have grown in frequency and sophistication, making intrusion detection systems (IDSs) a critical component of modern cybersecurity frameworks. Traditional IDS methods, often based on static signatures and rule-based systems, are no longer sufficient to detect and respond to complex and evolving attacks. To address these challenges, Artificial Intelligence (AI) and machine learning (ML) have emerged as powerful tools for enhancing the accuracy, adaptability, and automation of IDS solutions. This study presents a novel hybrid ensemble learning-based intrusion detection framework that integrates deep learning and traditional ML algorithms with explainable artificial intelligence for real-time cybersecurity applications. The proposed model combines an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) as base classifiers and employs a Random Forest (RF) as a meta-classifier to fuse their predictions, improving detection performance. Recursive Feature Elimination (RFE) is used for optimal feature selection, while SHapley Additive exPlanations (SHAP) provide both global and local interpretability of the model’s decisions. The framework is deployed through a Flask-based web interface in the Amazon Elastic Compute Cloud (EC2) environment, where it captures live network traffic and offers sub-second inference with visual alerts. Experimental evaluations on the NSL-KDD dataset demonstrate that the ensemble model outperforms the individual classifiers, achieving an accuracy of 99.40% along with high precision, recall, and F1-score. This research not only enhances detection capabilities but also bridges the trust gap in AI-powered security systems through transparency. The solution shows strong potential for critical domains such as finance, healthcare, industrial IoT, and government networks, where real-time and interpretable threat detection is vital.

Keywords:

security; privacy; cybersecurity; deep learning; machine learning; explainable artificial intelligence; intrusion detection; network security

1. Introduction

With the rapid growth of digital connectivity and computational power, unauthorized network intrusions, commonly referred to as network attacks, have become increasingly frequent and sophisticated. These cyberattacks target vulnerabilities in network services to gain unauthorized access or disrupt operations, ultimately endangering the entire Information Technology (IT) infrastructure [1]. Overall, the surge in network-based threats has led to a notable rise in both the scale and complexity of cyber incidents worldwide [1,2].

Recent statistics highlight the severity of this trend: in 2024, over 6.5 billion malware attacks were recorded globally, and the average cost of a data breach reached USD 4.88 million [2,3]. In the United States alone, the average breach cost in 2023 was approximately USD 9 million, while the global average has consistently hovered around USD 4 million [2,3]. Moreover, as shown in Figure 1, the estimated annual cost of cybercrime worldwide is projected to exceed USD 13 trillion by 2028, making it one of the most economically damaging threats globally and surpassing the value of the global trade in all major illegal drugs combined [1,2]. These figures underscore the critical need for robust, intelligent cybersecurity solutions to protect sensitive digital infrastructure.

Network attacks are broadly categorized into two main types: active and passive. Active attacks involve direct system intrusions carried out using malware, viruses, trojans, or other exploitation tools [3,4,5]. Once seen as minor threats posed by amateur hackers, these attacks have grown in scale and impact, now affecting billions of users globally and causing estimated damages exceeding USD 2.7 billion between 2021 and 2023 [1,2,3]. Correspondingly, annual network security revenue is projected to grow significantly, from approximately USD 20 billion in 2016 to over USD 60 billion by 2028 (Figure 2), driven by rising cyber threats, increased adoption of cloud services, and stricter regulatory compliance requirements.

By contrast, passive attacks involve the covert monitoring of data transmissions to intercept unencrypted credentials or sensitive information [5,6,7]. These attacks do not alter system resources, but they pose significant risks by compromising data confidentiality, often without being detected. Figure 3 shows commonly observed cyberattacks, including Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks, Man-in-the-Middle attacks, packet sniffing, port scanning, and ARP spoofing [2,3].

Traditional security defenses that rely on static rule sets and manual log analysis are increasingly ineffective against evolving cyber threats [8,9,10,11,12]. Manual monitoring is labor-intensive and inherently reactive, often detecting breaches only after significant damage has occurred. To overcome these limitations, modern intrusion detection systems (IDSs) have begun leveraging machine learning (ML) to enable proactive, real-time threat detection [12,13,14,15,16]. While classical IDS approaches were effective in early network environments, they struggle in modern contexts due to several critical limitations:

Increased attack complexity: Modern threats such as Distributed Denial of Service (DDoS) and polymorphic malware evolve rapidly, often bypassing static, pattern-based detection methods [4].
Scalability challenges: The high volume and velocity of traffic in enterprise and cloud environments overwhelm traditional IDS infrastructures [3,4].
Limited adaptability: Fixed-pattern classifiers are incapable of identifying novel or zero-day attacks not represented in the training data [4,5,6,7].

Moreover, many IDS models treat anomaly detection as a static binary or multiclass classification task, which significantly limits their ability to adapt to new or evolving cyber threats. This shortcoming underscores the urgent need for a dynamic, data-driven solution that supports continuous learning and real-time threat identification. To this end, our research introduces a hybrid ensemble framework that integrates an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) as base classifiers, with a Random Forest (RF) meta-classifier that combines their predictions for improved detection accuracy. Furthermore, the framework is deployed in a real cloud-based environment that uses real-time packet capture tools to stream live network traffic. This integration enables immediate anomaly detection, allowing the system to analyze incoming data dynamically and respond to threats in real time, thereby enhancing the adaptability and operational relevance of modern IDS deployments. The key contributions of this study are as follows:

Proposing a novel ensemble learning framework (ANN + SVM with RF meta-classifier) that achieves competitive performance compared to state-of-the-art intrusion detection models.
Employing Recursive Feature Elimination (RFE) to dynamically select the most relevant features and reduce model complexity.
Incorporating explainable artificial intelligence (XAI) to interpret individual detection decisions and improve transparency.
Developing a real-time deployment pipeline using Flask and live packet capture for immediate anomaly detection and dashboard reporting in a real cloud environment.

Moreover, a review of the recent literature indicates that most ML-based IDSs operate offline and lack real-time adaptability or interpretability [5,6,9,10,11,12]. This study addresses these limitations by combining real-time inference, feature selection, and explainability in a unified, scalable IDS framework.

The remainder of this paper is organized as follows. Section 2 provides the background and related work relevant to the research. Section 3 details the methodology, and Section 4 presents the evaluation results and analysis. Finally, Section 5 concludes the paper with a summary of findings and future research directions.

2. Background and Literature Review

As networking technologies advance, cyberattacks have proliferated worldwide. What were once largely exploratory hacks causing minimal damage have evolved into lucrative operations for attackers, inflicting substantial losses on governments, corporations, and even individual smartphone users [1,2,3,4]. Adversaries continuously refine their methods, leveraging everything from novel exploits to botnets to target any device connected to a network. The exponential growth of interconnected systems, the surge in data traffic, and the relentless innovation in attack strategies have made real-time detection and mitigation ever more difficult [8,9,10].

As network attacks have grown more complex, integrating ML into IDSs has proven significantly more effective than traditional methods [8,9,10]. Early implementations, dating back to the IDS concept first introduced by Jim Anderson in 1980, required system administrators to manually inspect logs for anomalies [10]. In contrast, modern IDS solutions increasingly rely on intelligent automation through ML techniques [12,13,14,15,16].

Over the past decade, a wide range of ML approaches, including supervised learning, unsupervised learning, pattern recognition, and deep learning (DL) approaches such as ANN and Recurrent Neural Networks (RNNs), have been explored to enhance IDS capabilities [5,10,11,13,14,15,16,17]. In this regard, each method offers unique advantages, and selecting the right combination of algorithms and datasets remains essential for effective implementation. This has led to a surge in research aimed at exploring how ML and DL can improve intrusion detection, with a growing emphasis on large-scale, high-quality datasets to train and validate these models. The ongoing pursuit of greater accuracy, performance, and autonomy continues to drive innovation in this domain [18,19,20,21,22].

The introduction of DL further expanded the potential of IDS by minimizing the need for manual feature engineering. Due to their layered structure, DL models are capable of automatically extracting complex patterns from raw data, enabling deeper insights and more accurate threat detection [12]. As a result, both ML- and DL-based intrusion detection solutions have evolved continuously, from early research in the 1980s to the sophisticated, self-adaptive systems we see today.

Real-time network data capture represents one of the key contributions of this study, serving as the primary input for the anomaly detection model. As networking technologies continue to evolve, the structure and behavior of data flows across networks have undergone substantial transformations. These changes, driven by advances in communication protocols and transmission mechanisms, have made it increasingly critical to capture network traffic during the precise window when intrusions occur, namely, during the exchange of data between systems [4,10,11]. Accurate and timely capture of network data during these transmission windows is essential for identifying malicious activity. To support this capability, a range of packet capture tools, such as pcap, ncap, and dashcap, have been developed [4,13,18,19,20]. These tools offer functionalities like selective packet filtering and time-specific data acquisition, enabling more precise detection and analysis of abnormal network behavior. The continuous refinement of such capture mechanisms has significantly advanced the field of network-based intrusion detection [4,5,6,7,8,9,10,11,12,13].

To provide a better understanding of recent related research and a brief comparison with our work, Table 1 summarizes the related studies.

Overall, unlike many of the referenced studies, which either focus on static offline analysis, lack explainable AI integration, or omit real-time deployment, our study introduces a unified, cloud-enabled intrusion detection system that excels in adaptability, accuracy, and transparency through the following features:

Real-time cloud-based detection: Our framework integrates live network traffic capture within a cloud environment (e.g., AWS), enabling continuous monitoring and immediate anomaly detection, a capability often missing in previous works.
Hybrid ensemble learning: By combining an ANN and SVM as base classifiers with an RF meta-classifier, our approach leverages the strengths of both DL and traditional ML. This hybrid model achieves superior performance over standalone models.
Explainable AI integration (SHAP): We incorporate SHAP (SHapley Additive exPlanations) to provide both global and local interpretability, addressing the black-box nature of most DL-based IDS and increasing trust among security analysts.
End-to-end system architecture: Our work goes beyond theoretical accuracy by delivering a fully functional real-time deployment pipeline using Flask, offering a web-based interface for monitoring and actionable threat alerts, closing the gap between research and real-world usability.

Having provided a brief background on the related research, the next section outlines the methodology of the research.

3. Methodology

The proposed system was developed using a hybrid ensemble approach integrating ANN and SVM, with an RF classifier acting as a meta-learner. The methodology of the research study encompasses data preparation, model construction, ensemble integration, real-time deployment, and explainability integration, as outlined in Figure 4.

The following subsections detail each stage, highlighted in Figure 4:

3.1. Data Acquisition

For the experiments, we used the NSL-KDD dataset, a refined version of the original KDD Cup dataset that is widely used for benchmarking IDSs [8]. The employed subset of the dataset comprises the following records:

Total Records: 22,544.
Normal Records: 9711.
Attack Records: 12,833 (including both known and unknown attack types).

Each record in the dataset represents a single network connection and includes 41 features grouped into three categories:
Basic features (e.g., duration, protocol type, service).
Content features (e.g., number of failed logins, root shell).
Traffic features (e.g., same host connections, packet rates).

In addition, a label column indicates whether the connection is normal or an attack and, for attacks, the specific attack type. The attack records in the employed dataset fall into several categories, as listed in Table 2.
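The label handling described above can be sketched in pandas. This is a minimal illustration, not the paper's code: the column name `label`, the helper `add_targets`, and the partial name-to-category mapping (the conventional DoS/Probe/R2L/U2R grouping) are assumptions, and Table 2 may group attacks differently. The binary target `is_attack` is what a normal-versus-attack classifier would train on.

```python
import pandas as pd

# Assumed, partial mapping of common NSL-KDD attack names to the four conventional
# categories; Table 2 may group or name them differently.
ATTACK_CATEGORY = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS", "pod": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "portsweep": "Probe", "nmap": "Probe",
    "guess_passwd": "R2L", "warezmaster": "R2L", "ftp_write": "R2L", "imap": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R", "perl": "U2R",
}


def add_targets(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Hypothetical helper: add a binary target (0 = normal, 1 = attack) and a coarse category."""
    out = df.copy()
    out["is_attack"] = (out[label_col] != "normal").astype(int)
    # Names missing from the (partial) mapping above fall back to "Other"
    out["category"] = out[label_col].map(ATTACK_CATEGORY).fillna("Other")
    out.loc[out[label_col] == "normal", "category"] = "Normal"
    return out


# Tiny hand-made frame standing in for the real label column
demo = pd.DataFrame({"label": ["normal", "neptune", "satan", "guess_passwd", "mscan"]})
targets = add_targets(demo)
print(targets)
```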

3.2. Data Preprocessing, Feature Selection, and Feature Scaling

Following data acquisition, the dataset was rigorously preprocessed to optimize its structure for ML and DL tasks. First, categorical features such as ‘protocol_type’, ‘service’, and ‘flag’ were converted into numerical values using label encoding. Then, redundant zero-variance features such as ‘num_outbound_cmds’ were removed, as they do not contribute to learning and may degrade model generalization.
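A minimal sketch of these two preprocessing steps with pandas and scikit-learn's `LabelEncoder` follows, assuming the data sit in a DataFrame with NSL-KDD column names; the helper `preprocess` and the three-row toy frame are hypothetical. `LabelEncoder` assigns integer codes in sorted (alphabetical) order of the category strings.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

CATEGORICAL = ["protocol_type", "service", "flag"]


def preprocess(df: pd.DataFrame):
    """Hypothetical helper: label-encode categorical columns and drop constant columns."""
    df = df.copy()
    encoders = {}
    for col in CATEGORICAL:
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col].astype(str))
        encoders[col] = enc  # kept so live traffic can be encoded with the same mapping
    # Zero-variance (constant) columns carry no information for the classifier
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant), encoders, constant


demo = pd.DataFrame({
    "duration": [0, 2, 0],
    "protocol_type": ["tcp", "udp", "icmp"],
    "service": ["http", "domain_u", "ecr_i"],
    "flag": ["SF", "SF", "REJ"],
    "num_outbound_cmds": [0, 0, 0],
})
clean, encoders, dropped = preprocess(demo)
print(dropped)                          # ['num_outbound_cmds']
print(clean["protocol_type"].tolist())  # [1, 2, 0]  (icmp=0, tcp=1, udp=2)
```

Keeping the fitted encoders matters at deployment time: live traffic has to be encoded with the same mapping, and a category never seen during training would need separate handling.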

3.2.1. Recursive Feature Elimination (RFE)

To reduce dimensionality and retain only the most relevant predictors, RFE was applied with an RF classifier as the base estimator. RFE works by recursively training the model and removing the least important features until the desired number of features is reached (10 features retained after RFE).

RFE mathematical process:

Let:

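X \in \mathbb{R}^{n \times d} be the feature matrix (with n samples and d features);

y \in \mathbb{R}^{n} be the corresponding label vector.

RFE proceeds as follows:

Fit the model on X and y.

Compute the feature importance FI_j for each feature j \in \{1, \ldots, d\}.

For a Random Forest with T trees: FI_j = \sum_{t=1}^{T} \Delta G_j^{(t)}, where \Delta G_j^{(t)} is the total decrease in Gini impurity produced by splits on feature j in tree t.

Remove the feature with the smallest FI_j.

Repeat, refitting the model on the remaining features, until only k features remain (here, k = 10).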

This iterative reduction selects the top k features that most significantly contribute to prediction accuracy.
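The RFE loop above corresponds to scikit-learn's `RFE` wrapper with a Random Forest as the estimator. The sketch is illustrative only: synthetic data from `make_classification` with 41 columns stand in for NSL-KDD, and the forest settings (`n_estimators=50`, `random_state=0`) are assumptions. With `step=1`, one feature is dropped per iteration, ranked by the forest's impurity-based `feature_importances_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in with 41 features, the width of an NSL-KDD record
X, y = make_classification(n_samples=1000, n_features=41, n_informative=8, random_state=0)

# step=1 removes the single least important feature per iteration and refits the
# forest on the survivors, until k = 10 features remain
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
    step=1,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)  # column indices of the retained features
print(selected)
print(selector.ranking_)  # 1 = retained; larger ranks were eliminated earlier
```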

3.2.2. Standardization with StandardScaler

After feature selection, StandardScaler was applied to standardize the selected features. StandardScaler transforms each feature to have zero mean and unit variance so that all features share the same scale, an essential step for models such as ANNs and SVMs that are sensitive to input magnitudes.

StandardScaler mathematical formula:

For each feature x_j, standardization is performed as follows:

z_j = \frac{x_j - \mu_j}{\sigma_j}

where

\mu_j = mean of feature x_j;

\sigma_j = standard deviation of feature x_j;

z_j = standardized value.

Standardizing features improves model convergence and prevents any single feature from disproportionately influencing the model due to a larger numeric range.
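As a check of the formula, the sketch below compares scikit-learn's `StandardScaler` with the hand-computed z_j on toy data (two synthetic features on very different scales, assumed for illustration). The scaler is fit on the training split only and its μ_j and σ_j are reused for the test split, a common way to avoid leaking test-set statistics into training. `StandardScaler` uses the population standard deviation (ddof = 0).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two toy features on very different scales (e.g. byte counts vs. rates in [0, 1])
X = np.column_stack([rng.exponential(5000, size=500), rng.uniform(0, 1, size=500)])
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)  # learn mu_j and sigma_j on the training split
Z_train = scaler.transform(X_train)
Z_test = scaler.transform(X_test)       # reuse the training statistics

# Same result by hand: z_j = (x_j - mu_j) / sigma_j with the population std (ddof=0)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_manual = (X_train - mu) / sigma
print(np.allclose(Z_train, Z_manual))   # True
```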

3.3. Model Construction

A deep ANN model was developed using Keras, with ReLU-activated dense layers and a sigmoid output layer for binary classification. An SVM with a linear kernel was trained in parallel. Both models were trained and validated on a 70:30 split of the dataset. Hyperparameter tuning was applied to the ANN to improve learning efficiency. Figure 5 showcases the final ensemble learning model architecture.

Table 3 presents the layered ensemble architecture, detailing its hierarchical composition, including the base classifiers (ANN and SVM), their respective internal layers or operations, and how their outputs are fused using an RF meta-classifier to generate the final intrusion detection decision.

The number of layers and nodes in the ANN was chosen through empirical tuning, using a grid search over multiple configurations. We evaluated models with 2 to 4 hidden layers and 16 to 128 nodes per layer. The final architecture, with three hidden layers of 64, 32, and 16 nodes, was selected as the best trade-off between accuracy and training time. Dropout rates of 0.3 and 0.2 were introduced after successive hidden layers to minimize overfitting. The SVM used a linear kernel, while ANN training used early stopping based on validation loss to prevent overfitting and ensure convergence. Predictions from the ANN and SVM were stacked to form a two-feature vector per instance. An RF meta-classifier was then trained on this new feature space, allowing the ensemble to leverage the ANN’s nonlinear learning and the SVM’s margin-based separation. This approach improved overall classification reliability.
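The stacking step can be sketched with scikit-learn alone. This is an approximation under stated assumptions, not the paper's implementation: `MLPClassifier` with hidden sizes (64, 32, 16) stands in for the Keras ANN (it has no dropout layers), the data are synthetic, and each meta-feature is a base learner's predicted attack probability. Because the text does not say how the meta-classifier's training inputs were generated, the sketch uses 5-fold out-of-fold predictions, which keeps the Random Forest from learning on base-learner outputs computed on the learners' own training samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 10 RFE-selected features (1 = attack, 0 = normal)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0  # 70:30 split
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Base learners: an sklearn MLP stands in for the Keras ANN; the SVM uses a linear kernel
ann = MLPClassifier(hidden_layer_sizes=(64, 32, 16), early_stopping=True, max_iter=500, random_state=0)
svm = SVC(kernel="linear", probability=True, random_state=0)
base_learners = (ann, svm)

# Meta-features for training: out-of-fold attack probabilities, one column per base learner
oof = np.column_stack([
    cross_val_predict(m, X_train_s, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

# Refit each base learner on the full training split for use at inference time
for m in base_learners:
    m.fit(X_train_s, y_train)


def meta_features(X_scaled):
    """Stack the base learners' attack probabilities into an (n, 2) matrix."""
    return np.column_stack([m.predict_proba(X_scaled)[:, 1] for m in base_learners])


meta = RandomForestClassifier(n_estimators=100, random_state=0).fit(oof, y_train)
y_pred = meta.predict(meta_features(X_test_s))
print(oof.shape, f"test accuracy = {(y_pred == y_test).mean():.3f}")
```

scikit-learn's `StackingClassifier` packages the same pattern (out-of-fold meta-features, base learners refit on all training data); the manual version makes the two-column meta-feature matrix explicit.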

3.4. Model Evaluation

The performance of the proposed intrusion detection system was evaluated using a combination of widely accepted classification metrics: accuracy, precision, recall, F1-score, and the confusion matrix [22,23,24,25,26]. These metrics provide a comprehensive understanding of the model’s ability to correctly identify anomalous and normal network traffic [27,28,29,30].

Let the confusion matrix be defined as follows:

TP (True Positives): Correctly predicted anomalies;
TN (True Negatives): Correctly predicted normal instances;
FP (False Positives): Normal instances incorrectly classified as anomalies;
FN (False Negatives): Anomalies incorrectly classified as normal.

The evaluation metrics are defined as follows:

Accuracy

This measures the proportion of total correct predictions:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

Precision

This represents the proportion of correctly predicted anomalies out of all instances predicted as anomalies:

Precision = \frac{TP}{TP + FP}

Recall (also known as Sensitivity or True Positive Rate)

This measures the model’s ability to correctly detect actual anomalies:

Recall = \frac{TP}{TP + FN}

F1-Score

The F1-score is the harmonic mean of precision and recall, balancing the two:

F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

These metrics were computed from the confusion matrices generated during model evaluation on the test dataset. Together, they provide insights not only into overall correctness but also into the model’s reliability in identifying intrusions without over-flagging normal behavior.
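To make the formulas concrete, the sketch below computes the four metrics from a confusion matrix on ten hand-made labels (1 = anomaly, 0 = normal, an assumed encoding); the results can be cross-checked against scikit-learn's metric functions. With TP = 4, TN = 3, FP = 2, and FN = 1, accuracy is 7/10, precision 4/6, recall 4/5, and F1 = 8/11.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score

# Assumed label convention: 1 = anomaly (attack), 0 = normal
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

# For binary labels {0, 1}, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)  # 4 3 2 1
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))  # 0.7 0.6667 0.8 0.7273
```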

3.5. Explainability Integration

XAI refers to a set of techniques designed to make the decisions of ML/DL models transparent, interpretable, and understandable to human users [31,32,33]. As ML models, especially deep learning architectures, become increasingly complex, the need for interpretability grows in critical domains like cybersecurity, healthcare, and finance. Overall, XAI ensures that end-users and analysts can trust, validate, and act upon AI-generated decisions [33,34,35,36].

In this study, SHAP, a leading XAI technique grounded in cooperative game theory, was integrated to enhance the interpretability of the proposed ensemble intrusion detection system. SHAP values were computed for both global and local interpretability. The global SHAP summary plot revealed that the model relied slightly more on the ANN outputs than on SVM predictions when forming its final decisions via the Random Forest meta-classifier. For local explanations, SHAP decision plots and waterfall charts were used to visualize the contribution of each feature to individual predictions, offering transparent reasoning behind classifications. These explainability components not only demystify the internal workings of the hybrid ANN–SVM–RF architecture but also foster greater user trust, accountability, and usability in real-time security-critical environments.

3.6. Real-Time Integration (Deployment)

In the real-time integration phase, a Flask web application was developed to interface with the trained model. The app captures live traffic from the cloud environment, preprocesses it, feeds it into the ensemble model, and presents results on a web dashboard. This setup allows real-time detection of intrusions and immediate response. The deployment steps are highlighted in Figure 6.

Deployment of the SHAP-enabled ensemble IDS, which uses PyShark v0.6 for real-time network traffic analysis, began with setting up a cloud environment on an AWS EC2 instance running Ubuntu (a t2.medium instance, which provides 2 vCPUs and 4 GiB of memory).

Next, the essential dependencies, including the Python libraries Flask, PyShark, TensorFlow/Keras, scikit-learn, and SHAP, were installed to support model inference, real-time data capture, and explainability. The pre-trained models (the ANN, the SVM, and the RF meta-classifier) were then loaded into the environment. PyShark was used to capture live packet data from the network interface, which was subsequently parsed and converted into structured feature vectors. These features were scaled and fed into both the ANN and SVM models to generate prediction probabilities.

The outputs of both models were then combined and passed to the RF meta-classifier, which produces the final decision on whether the network activity represents normal or intrusive behavior. To add interpretability, SHAP was applied to explain each prediction, offering both global feature importance and local explanations through summary and decision plots. Finally, the entire system was integrated into a Flask-based web application that provides real-time intrusion alerts and SHAP visualizations through a user-friendly interface, ensuring both operational relevance and transparency.
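The inference path described above (load the persisted models once, scale the feature vector, run the ANN and SVM, then the RF meta-classifier) can be sketched as follows. This is a hedged illustration: the toy models are trained inline only so the example runs on its own, PyShark capture and packet-to-feature parsing are omitted (`features` is assumed to be a 10-value vector in training column order), and `score_flow`, `MODEL_PATH`, and the 0.5 decision threshold are hypothetical. In the described system, a Flask route would call such a function and render the verdict on the dashboard.

```python
import os
import tempfile
import time

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MODEL_PATH = os.path.join(tempfile.gettempdir(), "ids_models.joblib")  # hypothetical artifact

# Offline (once): train toy stand-ins and persist them together.
# For brevity the meta-classifier is fit on in-sample base predictions here.
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
scaler = StandardScaler().fit(X)
Xs = scaler.transform(X)
ann = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=500, random_state=1).fit(Xs, y)
svm = SVC(kernel="linear", probability=True, random_state=1).fit(Xs, y)
stacked = np.column_stack([ann.predict_proba(Xs)[:, 1], svm.predict_proba(Xs)[:, 1]])
meta = RandomForestClassifier(n_estimators=100, random_state=1).fit(stacked, y)
joblib.dump({"scaler": scaler, "ann": ann, "svm": svm, "meta": meta}, MODEL_PATH)

# Online: load the models once at start-up, then score each incoming feature vector
MODELS = joblib.load(MODEL_PATH)


def score_flow(features):
    """Hypothetical helper: return (verdict, attack probability) for one feature vector."""
    x = MODELS["scaler"].transform(np.asarray(features, dtype=float).reshape(1, -1))
    z = np.column_stack([MODELS["ann"].predict_proba(x)[:, 1], MODELS["svm"].predict_proba(x)[:, 1]])
    p_attack = float(MODELS["meta"].predict_proba(z)[0, 1])
    return ("intrusion" if p_attack >= 0.5 else "normal"), p_attack


start = time.perf_counter()
verdict, p_attack = score_flow(X[0])
elapsed_ms = (time.perf_counter() - start) * 1000
print(verdict, round(p_attack, 3), f"{elapsed_ms:.1f} ms")
```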

During the final testing phase, specific ports were opened on the EC2 virtual server hosting the IDS, and simulated DoS attacks were executed using tools such as hping3 to assess the system’s detection capabilities under high-traffic conditions. To ensure that incoming external traffic would reach the IDS, the security group configuration of the EC2 instance was updated before comprehensive testing was carried out. The inbound network rules were configured as follows: TCP port 64295 was designated for secure SSH access (restricted to the researcher’s public IP), TCP port 64297 was allocated for accessing the T-Pot web interface (also IP-restricted), and TCP and UDP ports 1–64000 were opened to allow unrestricted external traffic specifically for simulation purposes. During this phase, the EC2 instance was deliberately flooded with malicious packets using hping3, enabling validation of the IDS’s real-time detection accuracy, responsiveness, and explainability under simulated attack scenarios.

4. Results and Discussion

For the experimental evaluation, a laptop computer was used; its specifications are listed in Table 4.

The ANN model achieved an overall accuracy of 99.20%; the detailed classification report is presented in Table 5. The ANN was trained with a batch size of 32, which provides a good balance between computational efficiency and model convergence. The binary cross-entropy loss function was chosen because it is well-suited to binary classification tasks, especially when the output layer uses a sigmoid activation function to predict class probabilities. The model was optimized using the Adam optimizer, which adapts learning rates during training and is known for its robust performance across various deep learning applications. Although the model was initially configured to train for 10 epochs, training continued until it reached a minimum loss threshold, ensuring convergence and preventing overfitting.

The SVM model achieved an overall accuracy of 96.80%. The detailed classification report is depicted in Table 6.

The final ensemble model achieved an overall accuracy of 99.40%. The detailed ensemble model classification report is depicted in Table 7.

The confusion matrix pertaining to the final ensemble model is depicted in Figure 7.

The ensemble model’s superior performance (99.40%) over standalone classifiers demonstrates the strength of combining complementary learning strategies. The ROC curve presented in Figure 8 illustrates the performance of the proposed intrusion detection model by plotting the True Positive Rate (Recall) against the False Positive Rate (FPR) at various threshold levels. The orange curve rises steeply toward the top-left corner, indicating that the model effectively distinguishes between the anomaly and non-anomaly classes. A near-vertical ascent followed by a flat plateau along the top signifies high sensitivity with minimal false positives. The area under the ROC curve (AUC) is close to 1.0, suggesting excellent classification performance and a strong ability to discriminate between classes.

In addition, the precision–recall curve shown in Figure 9 highlights the balance between precision and recall for the same classifier. The curve maintains high precision (close to 1.0) across nearly all recall values, with only a minor drop near maximum recall. This pattern demonstrates that the model can identify almost all actual anomalies without significantly sacrificing precision, even with imbalanced class distributions. The steep slope at the end reflects the precision–recall trade-off at extreme thresholds, which is common in high-performing binary intrusion classifiers.
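Both curves and their summary areas can be computed with scikit-learn from the classifier's attack-class scores. The eight labels and scores below are toy values, not the paper's outputs. For these values, 15 of the 16 attack/normal pairs are ranked correctly, so the ROC AUC is 15/16, and the average precision (the step-wise area under the precision–recall curve) works out to 0.95.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve, roc_auc_score, roc_curve

# Toy attack-class scores from some classifier (1 = attack, 0 = normal)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.35, 0.60, 0.40, 0.70, 0.85, 0.95])

fpr, tpr, roc_thresholds = roc_curve(y_true, scores)  # points of the ROC curve
auc = roc_auc_score(y_true, scores)

precision, recall, pr_thresholds = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)  # sum over thresholds of (R_n - R_{n-1}) * P_n

print(f"ROC AUC = {auc:.4f}")           # ROC AUC = 0.9375
print(f"Average precision = {ap:.4f}")  # Average precision = 0.9500
```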

Overall, SHAP explainability showed that ANN predictions generally contributed more to anomaly classification, but in cases where ANN was uncertain, SVM votes tipped the decision. This interpretability is critical in cybersecurity contexts where understanding each alert’s origin helps prevent alert fatigue and improve incident response.

The SHAP force plot (Figure 10) provides a local explanation for a single prediction made by the ensemble model by breaking down the model output into additive feature contributions. The predicted value f(x) = 0.537 represents the model’s estimated probability that the instance is an anomaly. This prediction is derived by starting from a base value, typically the mean model output over the training set, and adding SHAP values for each feature. For example, a SHAP value of +0.15 for a feature means that this feature increased the prediction by 0.15, moving it closer to the anomaly class. Conversely, a −0.15 value indicates a negative influence, pulling the prediction back toward the normal class. The final prediction is thus the sum of the base value and all SHAP contributions, allowing clear visualization of how each feature impacted the decision.
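The additivity described above (base value plus SHAP contributions equals f(x)) can be verified with a small from-scratch computation. This is not the SHAP library: it computes exact interventional Shapley values for a two-input model by averaging each input's marginal contribution over both orderings, with the base value defined as the mean output over a background sample. The meta-level data, the toy Random Forest, and the instance x = (0.9, 0.2) are assumptions for illustration; SHAP's explainers compute or approximate the same quantities for models with more inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy meta-level inputs: column 0 = ANN_pred, column 1 = SVM_pred (attack probabilities)
P = rng.uniform(0, 1, size=(500, 2))
y = ((0.7 * P[:, 0] + 0.3 * P[:, 1] + rng.normal(0, 0.1, size=500)) > 0.5).astype(int)
meta = RandomForestClassifier(n_estimators=100, random_state=0).fit(P, y)


def f(X):
    """Model output being explained: predicted probability of the anomaly class."""
    return meta.predict_proba(X)[:, 1]


background = P[:100]  # reference sample that defines the base value


def value(x, subset):
    """v(S): mean output with features in S fixed to x and the others taken from the background."""
    Z = background.copy()
    idx = list(subset)
    if idx:
        Z[:, idx] = x[idx]
    return f(Z).mean()


def shapley_two_features(x):
    """Exact Shapley values for a two-input model: average marginal contribution over both orderings."""
    v_none, v_ann, v_svm, v_both = value(x, []), value(x, [0]), value(x, [1]), value(x, [0, 1])
    phi_ann = 0.5 * ((v_ann - v_none) + (v_both - v_svm))
    phi_svm = 0.5 * ((v_svm - v_none) + (v_both - v_ann))
    return v_none, np.array([phi_ann, phi_svm])


x = np.array([0.9, 0.2])
base, phi = shapley_two_features(x)
fx = f(x.reshape(1, -1))[0]
print(f"base={base:.3f}  phi_ANN={phi[0]:+.3f}  phi_SVM={phi[1]:+.3f}  f(x)={fx:.3f}")
print(np.isclose(base + phi.sum(), fx))  # additivity (local accuracy): True
```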

The SHAP force plot shown in Figure 11 explains a single-instance prediction made by the ensemble classifier, whose inputs are the outputs of the two base learners: the ANN and the SVM. In this case, both ANN_pred and SVM_pred had values of 0.00, and the ensemble predicted a 0.01 (1%) probability of an anomaly and 0.99 (99%) of a non-anomaly. The SHAP values indicate that the ANN output made a strong negative contribution (−0.868) to the anomaly prediction and the SVM output contributed −0.117, both pushing the final decision toward the non-anomalous class. The force plot shows these contributions as blue bars pulling the prediction toward “non-anomaly,” confirming that the ensemble model confidently classified this instance as normal. This illustrates how individual model outputs (even at 0.00) can strongly influence the final classification, depending on how the meta-classifier interprets them in context.

While SHAP provides insight into the contribution of input features (including the ANN and SVM outputs) to the final decision, its purpose extends beyond feature ranking. In our framework, SHAP enables both global interpretability (e.g., which features are generally most influential across predictions) and local interpretability (e.g., why a particular instance was classified as anomalous). This local transparency is valuable in cybersecurity, as it allows analysts to trace specific alerts back to the input features, such as protocol type or failed login attempts, that contributed to a suspicious classification. Unlike traditional rule- or signature-based systems, which are static and often fail to detect zero-day threats, our SHAP-enabled system explains decisions made on previously unseen patterns, helping analysts understand model behavior in novel situations. It therefore not only increases trust in AI decisions but also supports root cause analysis, alert prioritization, and compliance auditing in a real-time operational setting.

Table 8 presents the top ten most influential features ranked by their mean absolute SHAP values, highlighting their global contribution to the ensemble model’s predictions. Notably, features related to data volume (src_bytes, dst_bytes), connection frequency (count), and service diversity (same_srv_rate, diff_srv_rate) were found to have the most significant impact. These insights align with expected patterns in network behavior, where anomalies often manifest through abnormal data transfer sizes or irregular service access rates. Combined with local SHAP force plots, this feature importance table enhances interpretability and provides transparency into how the model detects anomalous traffic.
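Table 8's ranking aggregates local explanations into a global score per feature: the mean of |SHAP value| over all explained instances. The sketch below shows only that aggregation step. To stay self-contained, it uses a logistic regression on synthetic data, for which the interventional SHAP value of feature j (on the log-odds scale, assuming independent features) has the closed form w_j (x_j − mean_j); the feature names are borrowed from Table 8 purely as labels, so the printed ranking says nothing about the paper's model. The helper `rank_by_mean_abs_shap` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Names used only as labels; the data below are synthetic
names = ["src_bytes", "dst_bytes", "count", "same_srv_rate", "diff_srv_rate"]
X, y = make_classification(n_samples=400, n_features=5, n_informative=3, n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

# Linear model of the log-odds with independent features:
# SHAP value of feature j for instance i = w_j * (x_ij - mean_j)
shap_matrix = clf.coef_[0] * (X - X.mean(axis=0))


def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Global importance = mean |SHAP| per feature, sorted from most to least influential."""
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[j], float(importance[j])) for j in order]


ranking = rank_by_mean_abs_shap(shap_matrix, names)
for name, score in ranking:
    print(f"{name:15s} {score:.4f}")
```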

Finally, the obtained results are benchmarked against the studies highlighted in Table 9.

In addition, to provide a more comprehensive evaluation, we extended our experimental analysis to include two widely recognized state-of-the-art (SOTA) ensemble learning techniques: XGBoost (Extreme Gradient Boosting) and AdaBoost (Adaptive Boosting). Both models were trained and evaluated on the same preprocessed NSL-KDD dataset and standardized feature set used for our proposed ensemble. XGBoost, known for its high accuracy on tabular data and its handling of imbalanced data, achieved a classification accuracy of 98.70%, with strong precision and recall. AdaBoost, a boosting method that sequentially emphasizes misclassified instances, achieved 97.90% accuracy. Although these results are competitive, our proposed ANN + SVM + RF ensemble surpassed both with 99.40% accuracy. It also offers real-time deployment and explainability through SHAP integration. Moreover, in this comparison neither XGBoost nor AdaBoost was optimized for interpretability or real-time deployment, which limits their practical utility in operational cybersecurity environments. These results highlight the robustness, transparency, and deployability of the proposed approach.
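The phrase “sequentially emphasizes misclassified instances” refers to AdaBoost's sample reweighting. The following is a minimal sketch of one round of the textbook discrete AdaBoost update, with labels in {−1, +1}. The paper does not state which AdaBoost variant or library was used, so the sketch is illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(weights: Sequence[float], y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost with labels in {-1, +1}.
    Returns the weak learner's vote weight alpha and the renormalized sample weights."""
    total = sum(weights)
    err = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p) / total
    # Clamp so that a perfect (err = 0) or useless (err = 1) learner does not hit log(0).
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (y * p = -1) are scaled by exp(+alpha), correct ones by exp(-alpha).
    updated = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
    z = sum(updated)
    return alpha, [w / z for w in updated]


# Four equally weighted samples; the weak learner gets only the last one wrong.
alpha, new_weights = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

After one round with a 25% weighted error, the single misclassified sample carries half of the total weight. This shift is what forces the next weak learner to focus on that sample.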

To further validate the results on more recent data, an additional experiment was conducted on the UNSW-NB15 dataset [37,38] using a similar methodology. This dataset includes nine distinct attack categories: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. The proposed ensemble model demonstrated strong generalization, achieving a binary classification accuracy of 99.80%. It outperformed the individual base learners, ANN (98.30%) and SVM (95.60%). These results are comparable to, and in some cases exceed, those reported in prior studies [37,38]. The corresponding confusion matrix (Figure 12) and ROC curve (Figure 13) further illustrate the model’s high precision and robust discriminative performance on the UNSW-NB15 dataset.
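Reporting a binary accuracy on a dataset with nine attack categories implies collapsing every attack category into a single “anomaly” class. The following is a minimal sketch of that step. It assumes that normal records are labeled `"Normal"` (or left blank) in the category field. The category strings in the example are illustrative.

```python
from collections import Counter
from typing import Iterable, List


def to_binary_labels(categories: Iterable[str],
                     normal_names: Iterable[str] = ("Normal", "")) -> List[int]:
    """Map attack-category strings to 0 (normal) or 1 (any attack category)."""
    normal = {name.strip().lower() for name in normal_names}
    # Blank categories are treated as normal traffic here (an assumption).
    return [0 if cat.strip().lower() in normal else 1 for cat in categories]


cats = ["Normal", "Fuzzers", "Exploits", "Normal", "Worms", " "]
labels = to_binary_labels(cats)
class_counts = Counter(labels)  # check class balance before training
```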

Overall, this research introduces a robust ensemble model that combines ANN, SVM, and RF, with features selected by Recursive Feature Elimination (RFE). The ensemble outperforms the individual models, achieving 99.40% accuracy. Key contributions include enhanced feature selection, improved classification performance, and model robustness. Compared with similar research, the study also demonstrates real-world applicability in binary classification tasks. Through a well-balanced hybrid approach, it offers a reliable, scalable solution for domains such as healthcare, cybersecurity, and finance.

The scalability of the proposed approach refers to both the computational and the architectural aspects of the system. Computationally, the ensemble model is lightweight and modular. The ANN and SVM models operate in parallel, and the RF meta-classifier processes only their outputs (two features), which keeps inference efficient even on limited-resource instances (e.g., AWS EC2 t2.medium). Architecturally, the system was designed using containerized components (Flask-based microservices), enabling horizontal scaling in cloud environments to support high-throughput scenarios. The use of SHAP for local explanations is also scalable, as it operates on only two aggregated model outputs instead of high-dimensional raw features. These design choices allow the system to handle larger volumes of network traffic or be extended to distributed deployments without major architectural changes.
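The two-feature fusion described above maps naturally onto scikit-learn's `StackingClassifier`. The sketch below approximates the design under stated assumptions and is not the author's implementation. `MLPClassifier` stands in for the TensorFlow ANN; it uses the hidden-layer widths of Table 3 but no dropout. The data are synthetic. The RF meta-classifier is trained on 5-fold out-of-fold base-learner probabilities, a detail the paper does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 20 RFE-selected features (hypothetical data).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

base_learners = [
    # Stand-in for the Keras ANN: 64-32-16 ReLU hidden layers, no dropout.
    ("ann", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                                        max_iter=500, random_state=0))),
    # probability=True enables predict_proba (Platt scaling) for the SVM.
    ("svm", make_pipeline(StandardScaler(),
                          SVC(kernel="rbf", C=1.0, probability=True, random_state=0))),
]

ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",  # one positive-class probability per base learner
    cv=5,  # meta-features for fitting the RF come from out-of-fold predictions
)
ensemble.fit(X_train, y_train)

meta_features = ensemble.transform(X_test)  # columns: [ANN probability, SVM probability]
test_accuracy = ensemble.score(X_test, y_test)
```

With `stack_method="predict_proba"` and two classes, scikit-learn keeps only the positive-class column from each base learner. As a result, `transform` returns exactly the two meta-features that the RF consumes.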

Following model development, the trained models were serialized and integrated into a cloud-enabled platform built using Flask and then deployed to AWS EC2 as mentioned in Section 3.6. A simple and user-friendly web interface was developed to display real-time anomaly detection results. The system captures live traffic data from the cloud during the testing phase, processes it continuously, and presents the anomaly percentage to users via the web interface as depicted in Figure 14.
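The serialize-once, load-at-startup pattern described above can be sketched as follows. The sketch assumes scikit-learn estimators persisted with joblib, which is installed alongside scikit-learn. The paper does not state which serialization format was used; a Keras ANN would normally be saved with Keras's own `model.save(...)` instead. The random meta-level data and the file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical meta-level training data: columns are (ANN probability, SVM probability).
rng = np.random.default_rng(0)
meta_X = rng.random((200, 2))
meta_y = (meta_X.mean(axis=1) > 0.5).astype(int)
meta_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(meta_X, meta_y)

# Offline step: serialize the fitted estimator once after training.
model_path = os.path.join(tempfile.mkdtemp(), "rf_meta.joblib")
joblib.dump(meta_clf, model_path)

# Serving step: load once when the web app starts, then reuse for every request.
# joblib files are pickles, so only load files from a trusted source.
served_clf = joblib.load(model_path)
restored_ok = bool(np.array_equal(served_clf.predict(meta_X), meta_clf.predict(meta_X)))
```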

The webpage automatically refreshes every 10 s, so users see the most recent detection results. As new data is ingested, it is immediately preprocessed to match the input format required by the pre-trained ANN + SVM + RF ensemble, which evaluates each data point to identify abnormal behavior. The proportion of anomalous data points is calculated and displayed as a percentage, giving users a clear, continuously updated view of system health and potential security threats. This real-time monitoring framework enables timely detection of, and response to, unusual activity patterns, enhancing the system’s effectiveness in identifying and mitigating potential risks.
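The percentage and refresh behavior described above can be sketched without any web framework. The sketch assumes that a record counts as anomalous when its ensemble probability is at least 0.5, and that the 10 s refresh is implemented with an HTML meta refresh tag. The paper does not state either detail. In a Flask app, the returned string could be served from a route handler.

```python
from typing import Sequence


def anomaly_percentage(probabilities: Sequence[float], threshold: float = 0.5) -> float:
    """Share of scored records whose anomaly probability reaches the threshold, in percent."""
    if not probabilities:
        return 0.0
    flagged = sum(1 for p in probabilities if p >= threshold)
    return 100.0 * flagged / len(probabilities)


def render_status_page(percentage: float, refresh_seconds: int = 10) -> str:
    """Minimal HTML page; the meta refresh tag makes the browser reload it periodically."""
    return (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "<title>IDS status</title></head><body>"
        f"<p>Anomalous traffic: {percentage:.1f}%</p>"
        "</body></html>"
    )


# Hypothetical ensemble probabilities for the latest batch of records.
scores = [0.02, 0.97, 0.10, 0.88, 0.40]
pct = anomaly_percentage(scores)
page = render_status_page(pct)
```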

Discussion

The results of this study demonstrate the effectiveness of the proposed ensemble intrusion detection system, which combines the strengths of ANN, SVM, and RF within a real-time, explainable framework. To assess the contribution of each component in the proposed ensemble, we evaluated the individual performance of ANN and SVM classifiers independently prior to integration. The ANN model achieved an accuracy of 99.20%, while the SVM achieved 96.80%. When combined in the ensemble architecture with RF as the meta-classifier, the accuracy improved to 99.40%. This improvement illustrates that each base learner contributes complementary strengths, and the ensemble effectively captures their collective predictive power, thereby justifying the architectural design.

With an overall accuracy of 99.40%, the ensemble model outperformed the individual classifiers and surpassed the accuracy levels reported in several related studies using the same NSL-KDD dataset. Notably, this performance was achieved while maintaining interpretability through SHAP and real-time responsiveness via a cloud-hosted Flask interface, features that are often lacking in existing academic and commercial IDS models. The current system operates as a binary classifier (normal vs. anomaly). However, it can be extended to a multiclass intrusion detection framework by using the detailed attack-type labels present in both the NSL-KDD and UNSW-NB15 datasets. Such an extension would enable the identification of specific attack types (e.g., DoS and Probe), further enhancing the model’s practical applicability. It would also involve additional challenges, such as handling class imbalance and retraining the ensemble for multiclass classification. We have identified this as a promising direction for future work.
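One common way to handle the class imbalance mentioned above is inverse-frequency class weighting. It is sketched here as an illustration rather than a committed design. The formula matches scikit-learn's `"balanced"` heuristic, n_samples / (n_classes × count_c). The label counts below are hypothetical, not NSL-KDD statistics. A dictionary like this can be passed as `class_weight` to scikit-learn estimators that accept it. It can also be passed to Keras `Model.fit` after mapping the labels to integer indices.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical, heavily skewed multiclass training labels.
train_labels = ["normal"] * 6 + ["dos"] * 3 + ["u2r"]
weights = balanced_class_weights(train_labels)
```

Rare classes receive proportionally larger weights, so each class contributes equally to the weighted training loss in aggregate.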

In comparison to traditional signature-based or rule-based intrusion detection systems, the proposed model offers significant advantages in adaptability, scalability, and transparency. Most commercial IDS solutions either rely on static threat intelligence or lack explainability in their detection processes, which limits their effectiveness in detecting zero-day or polymorphic attacks and reduces the trust of human analysts. In contrast, this study’s approach integrates live packet analysis and machine learning with explainable AI to provide not only accurate predictions but also clear justifications for each decision, enhancing user confidence and operational efficiency.

Despite its promising results, the current implementation has a few limitations. First, the system is designed primarily for binary classification (normal vs. anomalous traffic) and does not yet distinguish between different types of attacks. Second, the deployment on an AWS t2.medium EC2 instance validated the system’s real-time functionality and detection accuracy and was sufficient for research purposes. However, the evaluation did not include systematic stress testing under high-volume traffic conditions. Larger-scale or enterprise-level implementations may also require higher resource configurations to handle heavier network loads and parallel user access. In future work, we plan to benchmark performance using simulated traffic bursts and continuous high-throughput streams, measuring system response times, model inference latency, and resource utilization. This will provide a more comprehensive understanding of the framework’s scalability in production-grade environments.
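The planned benchmarking could start from a harness like the following minimal sketch, which times individual inference calls and reports percentile latencies. It measures only in-process model latency, not network, Flask, or packet-capture overhead. `dummy_predict` is a hypothetical stand-in for the deployed ensemble.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def benchmark_latency(predict: Callable[[Sequence[float]], float],
                      requests: List[Sequence[float]], warmup: int = 5) -> Dict[str, float]:
    """Time each single-record inference call and summarize the latencies in milliseconds."""
    for record in requests[:warmup]:
        predict(record)  # warm up caches and lazy initialization before measuring
    timings_ms = []
    for record in requests:
        start = time.perf_counter()
        predict(record)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(timings_ms, n=100)  # 99 cut points
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "max_ms": max(timings_ms)}


def dummy_predict(record: Sequence[float]) -> float:
    """Hypothetical stand-in for the deployed ensemble: a fixed linear score."""
    return sum(0.1 * v for v in record)


# Simulated burst of 200 twenty-feature records.
burst = [[float(i % 7)] * 20 for i in range(200)]
report = benchmark_latency(dummy_predict, burst)
```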

Finally, although SHAP explanations offer interpretability, automated alert prioritization and integration with security information and event management (SIEM) tools are still needed to streamline incident response in practical environments. As noted above, future work will focus on extending the system to multiclass attack classification. It will also incorporate additional datasets to improve generalization across environments and enable online learning to adapt to evolving threat patterns. Furthermore, integrating the IDS with cloud-based security platforms or enterprise SIEM systems will make the solution more viable for deployment in real-world infrastructures, including critical sectors such as smart agriculture, the military, healthcare, and government networks. These enhancements will strengthen the model’s utility as a robust, scalable, and trustworthy tool for proactive network defense.
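Automated prioritization and SIEM forwarding are left to future work. One possible shape is sketched below as a minimal illustration under stated assumptions. The `Alert` fields, the 0.5 cut-off, and the attachment of the largest SHAP contribution as a `top_driver` are all hypothetical choices, not part of the deployed system. Emitting one JSON object per line is simply a format that log shippers commonly ingest.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Alert:
    # Hypothetical alert schema; field names are illustrative, not a SIEM standard.
    src_ip: str
    anomaly_prob: float
    contributions: Dict[str, float]  # per-input SHAP values for this prediction


def prioritize(alerts: List[Alert], min_prob: float = 0.5) -> List[str]:
    """Drop low-confidence alerts, order the rest by anomaly probability,
    and serialize each one as a single JSON line."""
    kept = sorted((a for a in alerts if a.anomaly_prob >= min_prob),
                  key=lambda a: a.anomaly_prob, reverse=True)
    lines = []
    for rank, alert in enumerate(kept, start=1):
        record = asdict(alert)
        record["priority"] = rank
        # Name the input that pushed hardest toward "anomaly" to speed up triage.
        record["top_driver"] = max(alert.contributions,
                                   key=alert.contributions.get, default=None)
        lines.append(json.dumps(record, sort_keys=True))
    return lines


alerts = [
    Alert("10.0.0.5", 0.62, {"ANN_pred": 0.20, "SVM_pred": 0.05}),
    Alert("10.0.0.9", 0.97, {"ANN_pred": 0.31, "SVM_pred": 0.40}),
    Alert("10.0.0.7", 0.12, {"ANN_pred": -0.30, "SVM_pred": -0.02}),
]
events = prioritize(alerts)
```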

5. Conclusions

This study introduced a novel ensemble intrusion detection framework that combines ANN, SVM, and RF to enhance cybersecurity threat detection in real time. Traditional intrusion detection systems rely on static rules or isolated classifiers. In contrast, the proposed hybrid model leverages the strengths of both deep learning and classical machine learning within a unified architecture. The system achieved a detection accuracy of 99.40% on the NSL-KDD dataset, outperforming the individual classifiers. The integration of SHAP-based explainable AI not only enhanced transparency but also provided actionable insights into model decisions, fostering trust among security analysts. To further validate the robustness of the ensemble model, an additional experiment was conducted on the more recent UNSW-NB15 dataset. This dataset includes nine distinct attack types, and the ensemble model maintained high performance, with a binary classification accuracy of 99.80%. The results demonstrated better generalization than the standalone classifiers (ANN: 98.30%, SVM: 95.60%) and were consistent with prior benchmark studies. The corresponding confusion matrix and ROC curve highlighted the model’s strong discriminative ability. Beyond its strong performance, the system was successfully deployed using a Flask-based web interface in an AWS cloud environment, demonstrating its readiness for real-time monitoring and practical deployment. By continuously analyzing live network traffic and delivering interpretable alerts, the proposed framework shows strong potential for critical real-world domains such as healthcare, finance, industrial IoT, and government networks. Each phase of the system, from data preprocessing to explainability, was designed with scalability, interpretability, and operational relevance in mind.
Moving forward, expanding the model to support multiclass attack detection, integrating it with enterprise-grade SIEM platforms, and exploring continuous online learning will further enhance its robustness and adaptability in complex security environments.

Funding

This research was funded by Qassim University (QU-APC-2025).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The author would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Perez, S.I.; Criado, R. Increasing the Effectiveness of Network Intrusion Detection Systems (NIDSs) by Using Multiplex Networks and Visibility Graphs. Mathematics 2022, 11, 107.
2. 100+ Network Security Statistics in 2025. AIMultiple. Available online: https://research.aimultiple.com/network-security-statistics/ (accessed on 23 May 2025).
3. Infographic: Cybercrime Expected to Skyrocket in Coming Years. Statista Daily Data. Available online: https://www.statista.com/chart/28878/expected-cost-of-cybercrime-until-2027 (accessed on 23 May 2025).
4. Mousavi, S.M.; St-Hilaire, M. Early detection of DDoS attacks against SDN controllers. In Proceedings of the 2015 International Conference on Computing, Networking and Communications (ICNC), Garden Grove, CA, USA, 16–19 February 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 77–81.
5. Dora, V.R.S.; Lakshmi, V.N. Optimal feature selection with CNN-feature learning for DDoS attack detection using meta-heuristic-based LSTM. Int. J. Intell. Robot. Appl. 2022, 6, 323–349.
6. Kuang, C. Research on Network Traffic Anomaly Detection Method Based on Deep Learning. J. Phys. Conf. Ser. 2021, 1861, 012007.
7. 2025 DDoS Attack Statistics: Alarming Prediction—VPNRanks Predicts 1 in 20 Internet Users Will Likely To Be Hit! Available online: https://www.vpnranks.com/resources/ddos-attack-statistics/ (accessed on 23 May 2025).
8. NSL-KDD Dataset. Canadian Institute for Cybersecurity, University of New Brunswick. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 24 May 2025).
9. Alshammari, A.; Aldribi, A. Apply machine learning techniques to detect malicious network traffic in cloud computing. J. Big Data 2021, 8, 90.
10. Ahmad, Z.; Khan, A.S.; Shiang, C.W.; Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2021, 32, e4150.
11. Yost, J.R. The March of IDES: Early History of Intrusion-Detection Expert Systems. IEEE Annals Hist. Comput. 2016, 38, 42–54.
12. Dong, B.; Wang, X. Comparison deep learning method to traditional methods using for network intrusion detection. In Proceedings of the 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN), Beijing, China, 4–6 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 581–585.
13. Hosseini, S.; Zade, B.M.H. New hybrid method for attack detection using combination of evolutionary algorithms, SVM, and ANN. Comput. Netw. 2020, 173, 107168.
14. Kurniabudi, K.; Purnama, B.; Sharipuddin, S.; Darmawijoyo, D.; Stiawan, D.; Samsuryadi, S.; Heryanto, A.; Budiarto, R. Network anomaly detection research: A survey. Indones. J. Electr. Eng. Inform. (IJEEI) 2019, 7, 37–50.
15. Wattanapongsakorn, N.; Srakaew, S.; Wonghirunsombat, E.; Sribavonmongkol, C.; Junhom, T.; Jongsubsook, P.; Charnsripinyo, C. A Practical Network-Based Intrusion Detection and Prevention System. In Proceedings of the 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, Liverpool, UK, 25–27 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 209–214.
16. Kalinaki, K.; Thilakarathne, N.N.; Mubarak, H.R.; Malik, O.A.; Abdullatif, M. Cybersafe Capabilities and Utilities for Smart Cities. In Cybersecurity for Smart Cities. Advanced Sciences and Technologies for Security Applications; Ahmed, M., Haskell-Dowland, P., Eds.; Springer: Cham, Switzerland, 2023.
17. Bakar, R.A.; Huang, X.; Javed, M.S.; Hussain, S.; Majeed, M.F. An Intelligent Agent-Based Detection System for DDoS Attacks Using Automatic Feature Extraction and Selection. Sensors 2023, 23, 3333.
18. Masum, M.; Shahriar, H.; Haddad, H.; Faruk, J.H.; Valero, M.; Khan, A.; Rahman, M.A.; Adnan, M.I.; Cuzzocrea, A.; Wu, F. Bayesian Hyperparameter Optimization for Deep Neural Network-Based Network Intrusion Detection. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5413–5419.
19. Thockchom, N.; Singh, M.M.; Nandi, U. A novel ensemble learning-based model for network intrusion detection. Complex. Intell. Syst. 2023, 9, 5693–5714.
20. Seth, S.; Chahal, K.K.; Singh, G. A Novel Ensemble Framework for an Intelligent Intrusion Detection System. IEEE Access 2021, 9, 138451–138467.
21. Verma, P.; Dumka, A.; Singh, R.; Ashok, A.; Gehlot, A.; Malik, P.K.; Gaba, G.S.; Hedabou, M. A Novel Intrusion Detection Approach Using Machine Learning Ensemble for IoT Environments. Appl. Sci. 2021, 11, 10268.
22. Yousefnezhad, M.; Hamidzadeh, J.; Aliannejadi, M. Ensemble classification for intrusion detection via feature extraction based on deep Learning. Soft Comput. 2021, 25, 12667–12683.
23. Shtayat, M.M.; Hasan, M.K.; Sulaiman, R.; Islam, S.; Khan, A.U.R. An Explainable Ensemble Deep Learning Approach for Intrusion Detection in Industrial Internet of Things. IEEE Access 2023, 11, 115047–115061.
24. Divakar, S.; Priyadarshini, R.; Mishra, B.K. A Robust Intrusion Detection System using Ensemble Machine Learning. In Proceedings of the 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), Bhubaneswar, India, 26–27 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 344–347.
25. Mohy-Eddine, M.; Guezzaz, A.; Benkirane, S.; Azrour, M.; Farhaoui, Y. An Ensemble Learning Based Intrusion Detection Model for Industrial IoT Security. Big Data Min. Anal. 2023, 6, 273–287.
26. Thakkar, A.; Lohiya, R. Attack Classification of Imbalanced Intrusion Data for IoT Network Using Ensemble-Learning-Based Deep Neural Network. IEEE Internet Things J. 2023, 10, 11888–11895.
27. Thilakarathne, N.N.; Bakar, M.S.A.; Abas, P.E.; Yassin, H. A novel cyber threat intelligence platform for evaluating the risk associated with smart agriculture. Sci. Rep. 2025, 15, 3904.
28. Kalaivani, D. An Intrusion Detection System Based on Data Analytics and Convolutional Neural Network in NSS-KDD dataset. In Machine Learning Algorithms for Intelligent Data Analytics; Technoarete Research and Development Association: Chennai, India, 2022.
29. Yedukondalu, G.; Bindu, G.H.; Pavan, J.; Venkatesh, G.; SaiTeja, A. Intrusion Detection System Framework Using Machine Learning. In Proceedings of the 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2–4 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1224–1230.
30. Alabdulatif, A.; Thilakarathne, N.N. A Novel Cloud-Enabled Cyber Threat Hunting Platform for Evaluating the Cyber Risks Associated with Smart Health Ecosystems. Appl. Sci. 2024, 14, 9567.
31. Nwakanma, C.I.; Ahakonye, L.A.C.; Njoku, J.N.; Odirichukwu, J.C.; Okolie, S.A.; Uzondu, C.; Nweke, C.C.N.; Kim, D.-S. Explainable Artificial Intelligence (XAI) for Intrusion Detection and Mitigation in Intelligent Connected Vehicles: A Review. Appl. Sci. 2023, 13, 1252.
32. Mahbooba, B.; Timilsina, M.; Sahal, R.; Serrano, M. Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model. Complexity 2021, 2021, 6634811.
33. Hariharan, S.; Robinson, R.R.R.; Prasad, R.R.; Thomas, C.; Balakrishnan, N. XAI for intrusion detection system: Comparing explanations based on global and local scope. J. Comput. Virol. Hack. Tech. 2022, 19, 217–239.
34. Kim, A.; Park, M.; Lee, D.H. AI-IDS: Application of Deep Learning to Real-Time Web Intrusion Detection. IEEE Access 2020, 8, 70245–70261.
35. Viharika, S.; Balaji, N. AI-Driven Intrusion Detection Systems in Cloud Infrastructures: A Comprehensive Review of Hybrid Security Models and Future Directions. In Proceedings of the 2024 4th International Conference on Ubiquitous Computing and Intelligent Information Systems (ICUIS), Gobichettipalayam, India, 12–13 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1201–1207.
36. Andronikidis, G.; Eleftheriadis, C.; Batzos, Z.; Kyranou, K.; Maropoulos, N.; Sargsyan, G.; Grammatikis, P.R.; Sarigiannidis, P. AI-Driven Anomaly and Intrusion Detection in Energy Systems: Current Trends and Future Direction. In Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR), London, UK, 2–4 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 777–782.
37. Ali, M.; Pervez, S.; Hosseini, S.E.; Siddu, M.K. Intelligent parameter-based in-network IDS for IoT using UNSW-NB15 and BoT-IoT datasets. J. Frankl. Inst. 2025, 362, 107440.
38. Ali, M.; Pervez, S.; Hosseini, S.E.; Siddhu, M.K. Evaluation and Detection of Cyberattack in IoT-Based Smart City Networks Using Machine Learning on the UNSW-NB15 Dataset. EBSCOhost. Available online: https://openurl.ebsco.com/contentitem/doi:10.3991%2Fijoe.v21i02.52671?sid=ebsco:plink:crawler&id=ebsco:doi:10.3991%2Fijoe.v21i02.52671 (accessed on 28 June 2025).

Figure 1. Estimated annual cost of cybercrime worldwide (in trillion US dollars).

Figure 2. Annual network security revenue (in billion US dollars) (2016–2028).

Figure 3. Commonly seen network-based cyberattacks.

Figure 4. Methodology of the research.

Figure 5. Ensemble learning model architecture.

Figure 6. Deployment steps.

Figure 7. Confusion matrix of the final ensemble.

Figure 8. The ROC curve.

Figure 9. The precision–recall curve.

Figure 10. SHAP force plot 01.

Figure 11. SHAP force plot 02.

Figure 12. Confusion matrix of the final ensemble (employing UNSW-NB15 dataset).

Figure 13. The ROC curve (employing UNSW-NB15 dataset).

Figure 14. Anomaly detection results from the AWS-hosted Flask app.

Table 1. Summary of related works.

Reference	Real-Time Cloud Integration	XAI Integration	Key Contributions	Limitations of the Study
[5]/2022	x	x	Proposes a deep learning-based DDoS detection model called CNN-O-LSTM, integrating CNNs and optimized LSTM. It uses Closest Position-based Grey Wolf Optimization (CP-GWO) for optimal feature selection, enhancing detection performance across five benchmark datasets by minimizing feature correlation and optimizing LSTM parameters for improved accuracy.	This approach lacks interpretability, live data integration, and human-readable outputs. It focuses solely on DDoS attacks, not covering broader threat types. No web interface or hybrid classifier architecture limits its transparency, generalizability, and operational usability in dynamic environments.
[6]/2021	x	x	Proposes a dynamic adaptive CNN-based framework for detecting abnormal network traffic. Unlike traditional fixed pooling, the model uses dynamic adaptive pooling to enhance feature extraction. Experimental results demonstrate improved detection accuracy and lower loss compared to standard DL methods, highlighting its effectiveness in identifying evolving network attacks.	This method lacks model explainability, real-time deployment, and ensemble integration. It also focuses solely on CNNs, without leveraging complementary classifiers or hybrid strategies for broader threat detection and robustness in varying attack scenarios.
[9]/2021	x	x	Proposes an ML-based intrusion detection framework trained on the ISOT-CID dataset to identify network traffic anomalies. It introduces a novel feature, based on the variability of packet payload length, to enhance model accuracy. The framework demonstrates improved detection performance by enriching the dataset with both flow-based and interval-based features.	Lacks model interpretability, dynamic inference capabilities, and support for diverse attack types. It does not integrate hybrid classifiers or explainable AI tools like SHAP, limiting its transparency, scalability, and adaptability to evolving threats in live environments.
[17]/2023	x	x	Presents an intelligent agent-based system for detecting DDoS attacks using automatic feature extraction and selection. Leveraging the CICDDoS2019 dataset, the system combines ML with sequential feature selection to dynamically detect attacks, achieving 99.7% accuracy and outperforming existing ML-based DDoS detection methods in both speed and precision.	This agent-based DDoS detection system lacks real-time deployment, explainability, and ensemble model robustness. Furthermore, it does not demonstrate web-based visualization or SHAP-based interpretability, limiting transparency and adaptability in dynamic environments.
[18]/2021	x	x	Proposes a Bayesian optimization framework for automatically tuning hyperparameters of deep neural networks (DNNs) used in network intrusion detection systems. Evaluated on the NSL-KDD dataset, the framework significantly outperforms random search methods in terms of accuracy, precision, recall, and F1-score, providing an efficient and scalable solution for optimizing DNN-based intrusion detection.	While this study improves DNN performance through hyperparameter tuning, it lacks explainability, real-time deployment, and multi-model ensemble integration.
[13]/2020	x	x	Introduces a hybrid intrusion detection method named MGA-SVM-HGS-PSO-ANN. It uses a wrapper-based feature selection technique (MGA-SVM) and a neural network trained with a hybrid of gravitational search and particle swarm optimization. Using the NSL-KDD dataset, the model achieves 99.3% accuracy, reduces features from 42 to 4, and trains in 3 s.	This model lacks real-time deployment, explainable AI, and multi-threat handling. It focuses solely on offline analysis without a web interface or interpretability tools.
[19]/2023	x	x	The study introduces an ensemble intrusion detection model using lightweight classifiers, Gaussian Naive Bayes, Logistic Regression, and Decision Tree, with Stochastic Gradient Descent as the meta-classifier. It employs Chi-square-based feature selection and demonstrates superior performance on KDD 1999, UNSW-NB15, and CIC-IDS2017 datasets for both binary and multiclass classifications.	The study lacks integration of deep learning or real-time detection capabilities and does not incorporate explainable AI. Unlike our study, it does not address live deployment or use hybrid ANN and SVM architecture with interpretability features like SHAP.
[20]/2021	x	x	This study presents an ensemble intrusion detection framework that ranks base classifiers by F1-score for each attack category, selecting the most effective one per class instead of relying on traditional voting. This targeted ensemble approach achieves 96.97% accuracy and improves detection rates across diverse multi-attack classification environments.	The model is limited to static datasets and lacks real-time detection or adaptability. It also omits deep learning techniques and explainable AI integration, unlike our approach, which supports live traffic analysis and interpretable predictions using ANN, SVM, and SHAP-based explanations.
[21]/2021	x	x	This study introduces a binary classification model for IoT intrusion detection using a Gradient Boosting Machine ensemble. The model is trained on preprocessed packet data to detect anomalies, particularly zero-day attacks. It achieves high performance with 98.27% accuracy, demonstrating its suitability for critical IoT applications.	The model is binary-only and tailored to static datasets, lacking support for multiclass attack detection, real-time operation, or interpretability.
[22]/2021	x	x	This study proposes an ensemble-based intrusion detection system that integrates kNN, SVM, and Dempster–Shafer theory to manage uncertainty in classification. Deep learning is used for feature extraction, and ensemble margin-based sample selection improves training efficiency. The method outperforms existing models on UNSW-NB15, CICIDS2017, and NSL-KDD benchmark datasets.	The model lacks real-time detection capability and does not incorporate explainability features. Unlike our approach, it does not demonstrate live deployment or interpret model outputs using XAI, limiting transparency and real-world usability in critical network environments.
[23]/2023	x	✓	This paper presents an explainable deep learning-based ensemble intrusion detection system for industrial IoT (IIoT) security. The model integrates SHAP and LIME to enhance decision transparency. Evaluated using the ToN_IoT dataset, the approach demonstrates improved detection performance and interpretability, aiding cybersecurity experts in developing resilient IIoT systems.	While explainability is addressed, the study lacks real-time deployment and practical implementation discussion. In contrast, our research offers live traffic integration and a hybrid ANN and SVM framework with SHAP, bridging performance and interpretability in operational environments.
[24]/2020	x	x	This study proposes a boosting-powered ensemble method to identify the most efficient classifier for real-time network traffic analysis using the UNSW-NB15 dataset. By evaluating 10 classifiers, it selects XGBoost based on both accuracy and training time and demonstrates performance improvements on CPU vs. GPU environments for faster detection.	The study focuses on classifier selection and runtime optimization but lacks model explainability, hybrid learning, or live deployment integration.
[25]/2023	x	x	This study introduces an intrusion detection system for industrial IoT (IIoT) by integrating Isolation Forest (IF) and Pearson Correlation Coefficient (PCC) for efficient feature selection and outlier removal. Using Random Forest as the classifier, the model achieves over 99% accuracy on Bot-IoT and NF-UNSW-NB15-v2 datasets with reduced prediction time.	Despite strong accuracy, the model lacks deep learning integration and explainable AI components. Unlike our study, it does not offer real-time detection or transparent reasoning for predictions, which limits its interpretability and applicability in high-stakes, adaptive cybersecurity environments.
[26]/2023	x	x	This study presents a bagging ensemble framework using deep neural networks to address class imbalance in IoT intrusion detection. By integrating class weighting during training, the model achieves better generalization and balanced classification across four datasets (NSL-KDD, UNSW-NB15, CIC-IDS2017, BoT-IoT), evaluated using accuracy, F1-score, and statistical validation.	Although class imbalance is addressed, the study lacks real-time deployment and explainable AI integration. Our research provides live anomaly detection with interpretable outputs (via SHAP) and a hybrid ANN and SVM ensemble, offering greater transparency and operational relevance in security-critical environments.

Table 2. Attack categories in NSL-KDD.

Category	Description	Example Attack Types in the Dataset
DoS/DDoS	Overwhelms resources to make a system or network unavailable to legitimate users.	neptune, smurf, back, teardrop, pod, land
Probe	Attempts to gather information about the network or system for reconnaissance.	portsweep, ipsweep, nmap, satan
User to Root (U2R)	Attempts to gain root (admin) access from a normal user account.	buffer_overflow, loadmodule, perl, rootkit
Remote to Local (R2L)	Tries to gain unauthorized access to a local system from a remote location.	guess_passwd, ftp_write, imap, phf, warezclient, warezmaster, multihop, spy

Table 3. Layered ensemble architecture.

Stage	Component	Configuration	Output	Connected to
Input	Input Layer	Preprocessed and selected features (e.g., 20)	Feature vector	ANN Input, SVM Input
ANN block	Dense Layer 1	64 neurons, ReLU activation	Activation map	Dropout Layer (0.3)
	Dropout Layer	Dropout rate: 0.3	Regularized activation	Dense Layer 2
	Dense Layer 2	32 neurons, ReLU activation	Activation map	Dropout Layer (0.2)
	Dropout Layer	Dropout rate: 0.2	Regularized activation	Dense Layer 3
	Dense Layer 3	16 neurons, ReLU activation	Activation map	Output Layer
	Output Layer	1 neuron, Sigmoid activation	ANN prediction (probability)	RF Input
SVM block	SVM	RBF kernel (or linear), C = 1.0	SVM prediction (probability)	RF Input
Fusion	Meta Layer (Random Forest)	Inputs: ANN probability and SVM probability (2 features)	Final prediction (binary output)	Decision Interface
Explainability	SHAP/visualization layer	Feature importance for RF input	Explanation plots/rankings	User/Analyst Interface

Table 4. Device specifications.

Component	Specification
Processor (CPU)	Intel Core i5-7200U, 2.5 GHz
Graphics card (GPU)	NVIDIA GeForce 2 GB (NVIDIA, Santa Clara, CA, USA)
RAM	16 GB DDR4
Integrated development environment	Jupyter Notebook
Libraries used	TensorFlow, scikit-learn, pandas, NumPy, etc.

Table 5. Detailed classification report for the ANN.

Label	Precision (%)	Recall (%)	F1-Score (%)
0, Normal	98	97	97
1, Anomaly	98	99	97

Table 6. Detailed classification report for the SVM.

Label	Precision (%)	Recall (%)	F1-Score (%)
0, Normal	95	94	93
1, Anomaly	93	96	94

Table 7. Detailed classification report for the final ensemble model.

Label	Precision (%)	Recall (%)	F1-Score (%)
0, Normal	99	98	98
1, Anomaly	99	99	98
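The F1-scores in Tables 5-7 are the harmonic mean of per-class precision (P) and recall (R): F1 = 2PR / (P + R). The following is a minimal sketch of that calculation, and the helper name is hypothetical. The tables list rounded percentages, so an F1 recomputed from them is only approximate.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0  # common convention when both are zero
    return 2 * precision * recall / (precision + recall)


# Example: precision 0.99 and recall 0.98 (the "Normal" row of Table 7).
print(round(f1_from_precision_recall(0.99, 0.98), 4))
```

Because the harmonic mean is pulled toward the smaller of the two inputs, F1 penalizes a classifier that trades recall for precision (or vice versa) more than a simple average would.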

Table 8. Top 10 most influential features based on SHAP values.

Rank	Feature	Mean SHAP Value	Description
1	src_bytes	0.1423	Bytes transferred from source to destination; indicates data volume sent by host.
2	dst_bytes	0.1376	Bytes transferred from destination to source; reflects server-side response.
3	count	0.1198	Number of connections to the same host within a 2 s window.
4	same_srv_rate	0.1082	Percentage of connections to the same service.
5	diff_srv_rate	0.0974	Rate of connections to different services; high values may indicate scanning.
6	protocol_type	0.0921	Protocol used (e.g., TCP, UDP); can hint at abnormal protocol usage.
7	flag	0.0847	TCP flag status; used to detect connection behavior.
8	dst_host_srv_count	0.0763	Connections to the same service on the destination host.
9	dst_host_same_srv_rate	0.0689	Rate of same service usage across connections to the destination host.
10	dst_host_diff_srv_rate	0.0615	Rate of different services across connections to destination host.
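Table 8 ranks features by a mean SHAP value. The bar-style summary plots in the `shap` package rank features by the mean absolute SHAP value per feature over all explained samples. The sketch below assumes that convention. It takes an already computed (n_samples, n_features) SHAP matrix as input rather than calling `shap` itself. The function name and the toy values are illustrative only and are not the paper's data.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by mean |SHAP| over samples (a global importance score)."""
    values = np.asarray(shap_values, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) SHAP matrix")
    # Absolute value first, so positive and negative contributions both count.
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]  # descending importance
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix: 2 samples x 3 features (illustrative values only).
toy = [[0.2, -0.1, 0.0],
       [-0.4, 0.3, 0.05]]
for name, score in rank_features_by_mean_abs_shap(
        toy, ["src_bytes", "dst_bytes", "count"]):
    print(f"{name}: {score:.4f}")
```

Averaging absolute values prevents a feature that pushes some predictions toward "anomaly" and others toward "normal" from appearing unimportant because its signed contributions cancel.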

Table 9. Benchmark results of the ensemble model with similar research.

Reference	Dataset Employed	Algorithm(s) Used	Achieved Accuracy (%)	Added Value
(Hosseini & Zade, 2020), [13]	NSL-KDD	Multi-parent genetic algorithm + SVM + ANN	99.30	Not applicable
(Masum et al., 2021), [18]	NSL-KDD	Deep neural network	82.95	Not applicable
(Kalaivani, 2022), [28]	NSL-KDD	Convolutional neural network	92.23	Not applicable
(Yedukondalu et al., 2021), [29]	NSL-KDD	ANN	97.00	Not applicable
Our study	NSL-KDD	XGBoost (Gradient Boosting)	98.70	Not applicable
Our study	NSL-KDD	AdaBoost (Adaptive Boosting)	97.60	Not applicable
Our study	NSL-KDD	Ensemble of ANN, SVM, and RF	99.40	Stacks ANN and SVM base classifiers under an RF meta-classifier with RFE-based feature selection, reaching 99.40% accuracy, the highest among the compared models, and adds SHAP-based explanations of its predictions. It improves classification performance over the individual classifiers and offers a practical, scalable solution for real-world binary classification tasks across multiple domains.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Alabdulatif, A. A Novel Ensemble of Deep Learning Approach for Cybersecurity Intrusion Detection with Explainable Artificial Intelligence. Appl. Sci. 2025, 15, 7984. https://doi.org/10.3390/app15147984


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
