An Enhanced XGBoost-Based Framework for Efficient Multi-Class Cyber Threat Detection in Industrial IoT Networks

Ahmed, Adel A.; Abdullah, Talal A. A.

doi:10.3390/technologies14050274

Open Access Article

by Adel A. Ahmed ¹,* and Talal A. A. Abdullah ²

¹ Information Technology Department, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah 25729, Saudi Arabia

² IRC for Finance and Digital Economy Department, KFUPM Business School, King Fahd University of Petroleum & Minerals, Dhahran 34772, Saudi Arabia

* Author to whom correspondence should be addressed.

Technologies 2026, 14(5), 274; https://doi.org/10.3390/technologies14050274

Submission received: 7 April 2026 / Revised: 26 April 2026 / Accepted: 28 April 2026 / Published: 1 May 2026

(This article belongs to the Special Issue IoT-Enabling Technologies and Applications—2nd Edition)

Abstract

Securing Industrial IoT (IIoT) network environments remains a significant challenge due to the increasing complexity of interconnected sensors, actuators, gateways, and control systems, which are frequent targets of cyberattacks. These threats can lead to operational disruptions, financial losses, and safety risks. This paper proposes an efficient multi-stage intrusion detection framework for IIoT environments based on an enhanced Extreme Gradient Boosting (XGBoost) model. The proposed framework integrates data preprocessing, class imbalance handling, hyperparameter optimization, probability calibration, and class-specific decision thresholds within a unified pipeline. In addition, calibrated probability outputs are used as continuous indicators of prediction confidence, enabling more reliable and risk-aware decision-making. The hierarchical multi-stage design decomposes the detection task into progressively refined classification levels, improving discrimination among complex and overlapping attack categories. The framework is evaluated on the Edge-IIoTset benchmark dataset, which reflects realistic IIoT network traffic under both normal and malicious conditions. Experimental results demonstrate that the proposed approach achieves significant performance improvements, including up to a 21% increase in recall and a 15% improvement in macro F1 score compared to the baseline models. Furthermore, the model exhibits low inference latency and supports efficient deployment in time-sensitive IIoT monitoring scenarios. These results indicate that the proposed framework provides an effective and scalable solution for multi-class cyber threat detection in IIoT networks.

Keywords:

IIoT; cyber threats; deep ML; XGBoost

1. Introduction

The Industrial Internet of Things (IIoT) integrates Internet connectivity into industrial equipment such as sensors, actuators, digital machinery, and other manufactured devices, which facilitates intelligent automation and enhances operational efficiency and productivity. Each IIoT device must be configured with a unique IP address to perform various smart applications without human intervention [1,2]. Moreover, IIoT devices are highly heterogeneous, differ in capabilities, and have very limited resources in terms of storage capacity, processing power, input/output hardware, and energy supply. IIoT devices collect and transmit vast amounts of data, which enables intelligent decision-making and automation. However, the rapid deployment of IIoT devices has often prioritized functionality and cost-effectiveness over robust security. The integration of Internet connectivity into operational technology (OT) environments significantly increases the attack surface of IIoT systems, thereby requiring robust and carefully designed cybersecurity solutions [3,4,5,6]. The weaknesses of traditional security solutions stem from a fundamental misalignment between traditional industrial requirements for resilience and uptime and the security demands of Internet-exposed technology. These weaknesses span three challenging categories: device management, network architecture, and systemic organizational issues. Vulnerabilities in sensor credentials, for instance, provide an immediate entry point for malicious actors. Furthermore, devices are frequently deployed with insecure default settings that are rarely reconfigured during installation, leaving insecure default services such as SNMP or FTP exposed to the open web. Because industrial equipment has extended lifespans and maintenance requires significant operational downtime, patching cycles are often inadequate or non-existent, leaving known security flaws unaddressed for years [4,5,6].

The problem extends to the protocol layer. Most traditional industrial protocols were developed before cyber resilience mechanisms existed. For example, standard Message Queuing Telemetry Transport (MQTT) has no built-in encryption or adequate authentication measures, which makes machine-to-machine (M2M) communication highly susceptible to Man-in-the-Middle (MITM) interception. These risks directly affect remote monitoring: while remote access is necessary for distributed systems, poorly secured VPNs or insufficient authentication often allow attackers to bypass traditional defense mechanisms. Taken together, these overlapping vulnerabilities, ranging from unpatched firmware to unencrypted traffic, necessitate a shift toward low-latency, AI-driven detection to ensure industrial sustainability and privacy [7,8,9,10].

1.1. Cyber Attacks in IIoT Environments

IIoT systems are characterized by resource-constrained devices, real-time communication traffic, and heterogeneous architectures, which make them more vulnerable to various attack scenarios such as distributed denial-of-service (DDoS) attacks, injection attacks, malware attacks, port scanning, and network attacks. Moreover, the growing connectivity of IIoT devices broadens the scope for potential attacks, enabling attackers to conduct multi-layered attacks. As illustrated in Figure 1, cyber threats pose challenges for connecting and coordinating IIoT elements such as sensors, digital equipment, and edge gateways [9,10,11,12].

The consequences of cyber threats on IIoT can be described as follows:

System Downtime: Attacks on industrial control systems (ICSs) can lead to cascading failures that reduce overall productivity and availability.
Data Integrity and Confidentiality Breaches: Unauthorized access to or manipulation of sensory and operational data can compromise decision-making processes.
Economic and Physical Risk: The financial burden of recovery is often compounded by physical safety risks, particularly in high-stakes fields such as healthcare or autonomous manufacturing.
Lateral Movement and Propagation: Attackers can exploit compromised nodes to propagate across the network, affecting additional devices and system components.

Due to increasing cyber threats, there is a critical need for efficient and intelligent detection mechanisms capable of identifying and mitigating attacks across multiple threat categories in IIoT environments [12,13,14,15].

1.2. Problem Statement and IIoT Challenges

Existing IIoT cybersecurity approaches face significant practical limitations because signature-based detection cannot efficiently detect dynamic cyberattacks and new polymorphic or metamorphic threats. These evolving malware forms are difficult to detect with conventional methods, leaving IIoT deployments highly vulnerable. In addition, many advanced security solutions rely on computationally intensive models that require substantial processing power, memory, and energy. Such requirements are not always compatible with the constraints of IIoT systems, where edge devices often operate with limited computational capacity. This creates a gap between detection performance and practical deployment. Another key challenge lies in accurately discriminating between multiple attack types with overlapping behavioral patterns, particularly in highly imbalanced datasets. This issue becomes more pronounced in fine-grained intrusion detection tasks, where minority attacks are often misclassified. The main motivation of this research is to develop an efficient multi-class intrusion detection framework that balances detection performance with computational efficiency. The objective is to enable accurate, low-latency identification of cyber threats in IIoT environments while maintaining scalability across diverse attack categories.

1.3. Research Contribution

The main contributions of this research are summarized as follows:

A hierarchical multi-stage intrusion detection framework is proposed for IIoT environments, which systematically decomposes the detection task into binary, group-level, and fine-grained classification stages. This design improves scalability and reduces inter-class confusion for closely related attack categories.
A confidence-aware decision mechanism is introduced by integrating probability calibration with class-specific decision thresholds. This mechanism enhances prediction reliability and improves detection performance under class imbalance and overlapping feature distributions.
An efficient optimization pipeline tailored to IIoT data is developed, which combines data preprocessing, class imbalance handling, hyperparameter tuning, and macro F1 score-driven optimization. While optimization is performed offline, the resulting model supports low-latency inference that makes it suitable for time-sensitive IIoT environments.
A comprehensive experimental evaluation is conducted on the Edge-IIoTset benchmark dataset, including ablation analysis and comparison with machine learning and hybrid deep learning baselines (e.g., CNN, LSTM, and CNN–GRU). The proposed framework achieves significant improvements, with up to a 21% increase in recall and a 15% improvement in macro F1 score, particularly for minority and complex attack classes.

The rest of this paper is organized as follows. Section 2 reviews related work on cyber threat detection. Section 3 explains the design of the proposed system. Section 4 describes its implementation and evaluation on IIoT data. Section 5 discusses limitations and future work. Finally, Section 6 presents our conclusions.

2. Related Works on Detection of Cyber Threats on IIoT

Several studies in the literature have focused on improving the baseline XGBoost model through hybrid architectures, feature engineering, or balancing techniques to handle the complex, imbalanced datasets typical of IIoT environments. Recent studies have explored tree-based ensemble learning models for IIoT intrusion detection because of their efficiency and strong performance on structured data. For instance, several works [16,17] have employed XGBoost and Light Gradient Boosting Machine (LightGBM) to detect cyber threats in IIoT environments, demonstrating high accuracy and scalability on large-scale datasets such as Edge-IIoTset. However, these approaches primarily focus on standard classification pipelines without incorporating probability calibration or risk-aware decision mechanisms. Furthermore, their performance often degrades in the presence of class imbalance, particularly for minority attack categories. A recent study [18] proposed an XGBoost-based network intrusion detection method evaluated on the ToN-IoT dataset and investigated feature reduction alongside SHAP-based interpretability analysis. The authors in [19] addressed the detection of cyberattacks in IIoT environments with advanced analytical techniques based on XGBoost. They evaluated their models in terms of accuracy, precision, recall, and F1 score on the KDD-99 and NSL-KDD datasets, which derive from intrusion detection evaluation data collected by MIT Lincoln Laboratory under U.S. Department of Defense (DARPA) sponsorship. The work in [20] studied XGBoost and long short-term memory (LSTM) models for cyberattack detection in cyber–physical systems (CPSs); these models were tested on a gas pipeline industrial control system dataset and on other benchmark datasets containing various cyberattacks, such as NetML-2020 and IoT-23.
Recent research [21] combined the feature extraction power of neural networks with the classification efficiency of XGBoost. The authors in [21] introduced a hybrid FFNN–XGBoost model that outperformed its standalone components by leveraging feed-forward neural networks (FFNNs) to handle complex IoT traffic patterns while maintaining XGBoost’s high-speed detection capabilities. The authors in [22] proposed an IIoT IDS approach based on the XGBoost model that addressed the imbalanced multi-class data distribution of IIoT datasets such as TON_IoT and X-IIoTID. In a recent study [23], an optimized XGBoost model was combined with feature selection techniques to improve detection accuracy in IIoT networks. However, that study did not consider multi-stage learning techniques or the confidence of its predictions.

The development of intrusion detection systems based on deep learning has attracted significant interest in recent years. For instance, convolutional neural networks (CNNs) are applied to extract spatial features from network traffic, while long short-term memory (LSTM) networks capture temporal dependencies in sequential data. The hybrid CNN–LSTM models proposed in [24,25] achieved better results when detecting more complex attacks in IIoT network traffic. However, such models typically require substantial computational resources and extensive training time, which limits their applicability in real-time industrial environments. An autoencoder (AE) has also been combined with CNN–LSTM to create a more advanced hybrid approach called AE–CNN–LSTM [26], which detects intrusions in IIoT environments with high accuracy. Nevertheless, this model suffers from high complexity and limited interpretability.

Addressing multi-class classification and class imbalance remains a significant challenge in IIoT intrusion detection. Some recent papers have used oversampling methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and class weighting to improve the detection of minority classes. For example, Refs. [27,28,29,30] combined SMOTE with ensemble methods to enhance detection performance for minority attack classes. Although some improvement was gained, synthetic samples can introduce noise and overfitting in highly imbalanced IIoT datasets. Other works [31,32,33,34] investigated cost-sensitive learning and focal loss functions in deep learning architectures. While these techniques improve the detection of minority classes, they make the training process harder to optimize. Overall, current research exhibits several weaknesses:

Most studies adopt single-stage classification frameworks, which limit their ability to capture hierarchical relationships between attack categories.
Lack of probability calibration and confidence-aware decision-making, which are essential for real-world deployment.
High computational complexity of deep learning and hybrid models, which makes them less suitable for resource-constrained IIoT environments.

The proposed enhanced XGBoost-based framework addresses these gaps with a computationally efficient, multi-stage, and imbalance-aware design that supports multi-class cyber threat detection and maintains scalability for real-world IIoT deployment.

3. System Design of Cyber Threat Detection Algorithm

This section describes the design of an enhanced XGBoost model for detecting cyberattacks in IIoT networks. While deep learning approaches generally depend on complex spatial-temporal architectures, gradient boosting models tend to be more efficient on structured tabular data. The proposed system is designed to balance high-fidelity detection with the low-latency requirements of edge deployments. Figure 2 depicts the proposed system architecture, which comprises the following steps: dataset preprocessing, stratified splitting, imbalance correction and normalization, automated hyperparameter tuning, ensemble modeling, probability calibration, validation, and interpretability analysis.

3.1. Dataset Selection and Preprocessing

The proposed system is evaluated on the Edge-IIoTset dataset, which was generated on a testbed that mimics industrial deployments. It comprises approximately 2.2 million flow instances, each represented by 63 informative numerical features extracted from raw packets and aggregated flows that capture the temporal, transport, and statistical characteristics of IIoT traffic. These features include flow durations, packet counts, protocol-specific indicators, and derived statistics such as inter-arrival times and size distributions, providing rich input for learning models. The dataset and its details are available on IEEE DataPort [35,36]. The raw traffic data were preprocessed by removing redundant identifiers and variables with missing values, which could otherwise cause instabilities. Label encoding was applied to the categorical variables to make them compatible with the tree-based algorithms used later. Attack classes are grouped according to their functional behavior and attack characteristics, forming a structured knowledge representation that guides the hierarchical classification process.
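The label-encoding and grouping steps can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class-name strings are modeled on Edge-IIoTset's attack types, and the group assignment in `ATTACK_GROUPS` is a hypothetical example, since the exact grouping is not listed in the text. Each record receives three targets, one for each stage of the hierarchy (binary, group-level, and fine-grained).

```python
# Hypothetical fine-grained label -> attack-group mapping (illustrative only;
# the exact grouping used by the authors is not specified in the text).
ATTACK_GROUPS = {
    "Normal": "Normal",
    "DDoS_UDP": "DDoS", "DDoS_ICMP": "DDoS", "DDoS_TCP": "DDoS", "DDoS_HTTP": "DDoS",
    "Port_Scanning": "Information_gathering",
    "Fingerprinting": "Information_gathering",
    "Vulnerability_scanner": "Information_gathering",
    "MITM": "MITM",
    "SQL_injection": "Injection", "XSS": "Injection", "Uploading": "Injection",
    "Backdoor": "Malware", "Password": "Malware", "Ransomware": "Malware",
}


def label_encode(labels):
    """Map each distinct string label to an integer code (sorted for determinism)."""
    classes = sorted(set(labels))
    index = {name: code for code, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def hierarchical_targets(fine_labels):
    """Build one target per stage: binary (normal vs. attack), group, fine-grained."""
    binary = [0 if name == "Normal" else 1 for name in fine_labels]
    group_codes, group_classes = label_encode([ATTACK_GROUPS[n] for n in fine_labels])
    fine_codes, fine_classes = label_encode(fine_labels)
    return {
        "binary": binary,
        "group": (group_codes, group_classes),
        "fine": (fine_codes, fine_classes),
    }


targets = hierarchical_targets(["Normal", "MITM", "XSS", "DDoS_TCP", "XSS"])
print(targets["binary"])    # [0, 1, 1, 1, 1]
print(targets["group"][1])  # ['DDoS', 'Injection', 'MITM', 'Normal']
```

Keeping the encoders per stage means each classifier in the hierarchy sees a compact, contiguous label space.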

3.1.1. Partitioning Method

To train, validate, and test the enhanced XGBoost model, we used a stratified 60/20/20 split. Preserving the class distribution is vital for the Edge-IIoTset dataset, particularly for rare classes such as MITM. The validation set was reserved strictly for hyperparameter tuning and threshold refinement, while the test set remained unseen until the final evaluation to ensure an unbiased generalization analysis.
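A stratified 60/20/20 partition can be sketched with NumPy as below. This is a minimal sketch; the function name and seed are illustrative. Each class is shuffled and cut independently, so a rare class such as MITM keeps roughly the same share in every subset. With scikit-learn, the same effect is commonly obtained by calling `train_test_split` twice with `stratify=y`.

```python
import numpy as np


def stratified_split(y, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train_idx, val_idx, test_idx) preserving each class's proportion."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(y):
        # Shuffle this class's indices, then cut them 60/20/20.
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])
    return tuple(np.sort(np.concatenate(p)) for p in parts)


y = np.array([0] * 100 + [1] * 10)  # toy labels: 100 majority, 10 rare
train_idx, val_idx, test_idx = stratified_split(y)
```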

3.1.2. Handling Imbalance and Feature Scaling

Class imbalance is a common challenge in IIoT intrusion detection datasets due to the high proportion of benign traffic and the relatively low number of samples for certain attack types. For instance, in the Edge-IIoTset dataset, minority classes such as MITM and fingerprinting contain significantly fewer instances than majority classes such as SQL injection, which includes over 50,000 samples. This imbalance can bias the model toward majority classes and reduce its ability to detect rare but critical attacks. To address this issue, random oversampling was applied exclusively to the training set, duplicating minority-class samples to achieve a more balanced class distribution. This approach was selected for its simplicity, effectiveness, and low computational overhead, which make it suitable for large-scale IIoT data. Importantly, no oversampling was applied to the validation or test sets, ensuring an unbiased evaluation of model performance. Several strategies were employed to mitigate the potential overfitting associated with oversampling. First, the dataset was partitioned into independent training, validation, and test subsets, with the validation set used for hyperparameter tuning, threshold optimization, and model selection. Second, the inherent regularization mechanisms of XGBoost, including tree pruning and learning rate control, contribute to improved generalization. Third, probability calibration and class-specific decision thresholds further enhance prediction reliability and reduce overconfident classifications. Additionally, the consistency of performance across the training, validation, and test sets indicates that the model does not suffer from significant overfitting; for example, the macro F1 score remained stable across the different data splits, supporting the robustness of the proposed framework.
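Random oversampling restricted to the training split can be sketched as follows. Section 4.1 names imbalanced-learn's `RandomOverSampler`; this NumPy function is an illustrative stand-in (its name is hypothetical) that mirrors that class's default behavior of duplicating minority-class rows, sampled with replacement, until every class matches the majority count. It is meant for the training split only, never for validation or test data.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    matches the majority-class count. Apply to the training split only."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picks = [np.arange(len(y))]  # keep every original row once
    for cls, count in zip(classes, counts):
        if count < target:
            members = np.flatnonzero(y == cls)
            # Sample extra copies with replacement from this class's rows.
            picks.append(rng.choice(members, size=target - count, replace=True))
    idx = np.concatenate(picks)
    return X[idx], y[idx]


X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])
X_res, y_res = random_oversample(X, y)
```

Because only existing rows are copied, no synthetic feature values are introduced, which is the trade-off against SMOTE-style methods discussed in Section 2.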
All numeric features were standardized to ensure uniform contribution during model training and to improve convergence behavior. Furthermore, recent studies have explored more advanced data augmentation techniques, such as generative adversarial networks (GANs), to generate realistic synthetic samples for minority classes. For example, the study in [37] demonstrated the effectiveness of GAN-based methods in improving detection performance on imbalanced IoT datasets. While such approaches offer promising results, they introduce additional computational complexity and training overhead. Therefore, in this work, a simpler and more efficient oversampling technique was adopted, while GAN-based augmentation is considered for future investigation.
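Standardization without leaking held-out statistics can be sketched as follows. This is a minimal sketch under an assumption the text does not state: the mean and standard deviation are estimated on the training split and then reused unchanged for the validation and test splits and at inference time. The `Standardizer` class is hypothetical; scikit-learn's `StandardScaler` offers the same fit/transform behavior.

```python
import numpy as np


class Standardizer:
    """z-score scaling whose statistics come from the training split only."""

    def fit(self, X_train):
        X_train = np.asarray(X_train, dtype=float)
        self.mean_ = X_train.mean(axis=0)
        std = X_train.std(axis=0)
        # Constant columns would divide by zero: centre them but leave them unscaled.
        self.scale_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_


X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
scaler = Standardizer().fit(X_train)
Z_train = scaler.transform(X_train)
Z_new = scaler.transform([[3.0, 10.0]])  # e.g., a validation or live record
```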

3.2. Automated Hyperparameter Optimization

Instead of tuning the hyperparameters manually, we used automated Bayesian optimization with the Optuna package. Specifically, the search maximized the macro-F1 score on the validation data by tuning the learning rate, tree depth, and regularization parameters. This ensures that the configuration is optimized for performance across all classes rather than for overall accuracy alone.
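The selection rule $\theta^{*} = \arg\max_{\theta} F1_{\mathrm{macro}}$ can be sketched as a generic search loop. To keep the example dependency-free, configurations are drawn by uniform random sampling, which is a simpler stand-in for Optuna's default Bayesian (TPE) sampler rather than a reproduction of it. The `train_and_score` callable is hypothetical: it would fit XGBoost on the oversampled training split with the given parameters and return macro-F1 on the validation split. The parameter names follow XGBoost's scikit-learn interface, but the candidate values are illustrative, not those reported in Table 2.

```python
import random


def search_hyperparameters(space, train_and_score, n_trials=30, seed=0):
    """Return (best_params, best_score, history) maximizing validation macro-F1.

    space: dict mapping hyperparameter name -> list of candidate values.
    train_and_score: hypothetical callable(params) -> macro-F1 on D_val.
    Uniform random sampling stands in for a Bayesian sampler here.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(params)
        history.append((params, score))
        if score > best_score:  # keep the arg max seen so far
            best_params, best_score = params, score
    return best_params, best_score, history


# Illustrative search space using XGBoost scikit-learn parameter names.
SPACE = {
    "learning_rate": [0.03, 0.1, 0.3],
    "max_depth": [4, 6, 8, 10],
    "reg_lambda": [0.1, 1.0, 10.0],
}
```

With Optuna itself, the loop body would instead live in an objective function that draws values via `trial.suggest_float` and `trial.suggest_int` and is passed to `study.optimize` on a study created with `optuna.create_study(direction="maximize")`.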

3.3. Training of the Enhanced XGBoost Ensemble

The final model was trained using the additive learning mechanism of XGBoost, which is well suited to capturing the nonlinear relationships in flow statistics that characterize complex attacks. Because each new tree is fitted to the residual errors (loss gradients) of the ensemble built so far, the model forms sharp decision boundaries even among overlapping traffic flows. The resulting class probabilities are then calibrated (Section 3.4) and form the basis of our final scoring system.
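The additive, residual-driven mechanism can be illustrated with a deliberately simplified gradient-boosting sketch: squared-error loss, depth-1 trees (stumps), and a single regression target. This is not XGBoost itself, which additionally uses second-order gradient statistics, regularized leaf weights, and a softmax objective for multi-class problems; the sketch only shows how each new learner is fitted to the residuals of the current ensemble and added with a learning rate $\eta$.

```python
import numpy as np


def fit_stump(x, r):
    """Depth-1 regression tree: the single split that best fits residuals r."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:  # keep both sides non-empty
            left = x[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual
        m = r.mean()
        return lambda z: np.full(len(z), m)
    _, j, t, lv, rv = best
    return lambda z: np.where(z[:, j] <= t, lv, rv)


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model F_m(x) = F_{m-1}(x) + eta * h_m(x), each h_m fit to residuals."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    learners = []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of 0.5 * squared error
        h = fit_stump(x, residuals)
        pred = pred + learning_rate * h(x)
        learners.append(h)
    return lambda z: base + learning_rate * sum(h(z) for h in learners)


x = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = (x[:, 0] > 0.5).astype(float)
model = boost(x, y, n_rounds=100, learning_rate=0.3)
```

A smaller learning rate shrinks each correction, so more rounds are needed, which is the same trade-off the tuned XGBoost learning rate controls.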

3.4. Probability Calibration and Threshold Optimization

The raw output probabilities of ensembles such as XGBoost are prone to miscalibration, which leads to unreliable decision-making due to overconfident or overly cautious predictions. To address this issue, we integrated a probability calibration component that uses either sigmoid (Platt) scaling or isotonic regression to bring the model’s predicted probabilities closer to the true class frequencies. Moreover, the architecture incorporates class-specific decision threshold optimization on the validation dataset. Instead of applying a uniform decision threshold to all classes, the system selects, for each class separately, the threshold that maximizes that class’s F1 score, which accounts for the cost imbalance typical of intrusion detection tasks. As a result, this optimization increases sensitivity to minority intrusions such as MITM while controlling the number of false alarms.
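Both mechanisms can be sketched in NumPy under stated assumptions. Calibration is fitted one-vs-rest per class with the pool-adjacent-violators (PAV) algorithm for isotonic regression, and predictions use a step function (scikit-learn's `IsotonicRegression` interpolates linearly between knots instead). Thresholds are searched on an illustrative grid from 0.05 to 0.95. The function names and the grid are hypothetical. Note that per-class isotonic maps do not force the calibrated probabilities of a row to sum to one.

```python
import numpy as np


def isotonic_fit(scores, labels):
    """Fit a non-decreasing map score -> P(label = 1) with pool-adjacent-violators."""
    xs = np.asarray(scores, dtype=float)
    ys = np.asarray(labels, dtype=float)
    ux, inv = np.unique(xs, return_inverse=True)  # pool tied scores first
    w = np.bincount(inv).astype(float)
    m = np.bincount(inv, weights=ys) / w
    blocks = []  # each block: [mean value, weight, rightmost score]
    for x, y, wt in zip(ux, m, w):
        blocks.append([y, wt, x])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, x2 = blocks.pop()
            v1, w1, _ = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, x2])
    knots = np.array([b[2] for b in blocks])
    values = np.array([b[0] for b in blocks])

    def predict(s):
        i = np.searchsorted(knots, np.asarray(s, dtype=float), side="left")
        return values[np.minimum(i, len(knots) - 1)]

    return predict


def best_threshold(probs, is_class, grid=np.linspace(0.05, 0.95, 19)):
    """One-vs-rest threshold on calibrated probabilities that maximizes F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = probs >= t
        tp = np.sum(pred & is_class)
        fp = np.sum(pred & ~is_class)
        fn = np.sum(~pred & is_class)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def calibrate_and_threshold(P_val, y_val):
    """Per class: fit an isotonic map, then pick tau_c on the calibrated scores."""
    calibrators, thresholds = [], []
    for c in range(P_val.shape[1]):
        is_c = y_val == c
        cal = isotonic_fit(P_val[:, c], is_c)
        calibrators.append(cal)
        thresholds.append(best_threshold(cal(P_val[:, c]), is_c)[0])
    return calibrators, np.array(thresholds)
```

Here `P_val` is an `(n, C)` array of raw validation probabilities and `y_val` holds integer class labels; the returned calibrators and thresholds are then applied unchanged to test and live traffic.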

3.5. Model Evaluation

The calibrated, threshold-optimized models are evaluated on the held-out test set using several performance metrics, including precision, recall, macro-averaged F1 score, and accuracy. In addition to class labels, the model outputs calibrated class probabilities for each instance, which serve as a measure of prediction confidence. The highest predicted probability for each instance can be treated as a classification confidence metric, offering deeper insight into the reliability of the model’s predictions. This probability information therefore supports a better interpretation of network traffic and enables users to distinguish confidently predicted instances from those with low prediction certainty. Overall, the use of probability calibration and class-specific decision thresholds enhances the reliability of the proposed model’s decisions compared with conventional classification models, as shown in Section 4.

3.6. Statistical Validation and Reliability Analysis

To ensure that the performance improvements of the proposed algorithm are statistically significant rather than artifacts of sampling error, we incorporated statistical testing into the evaluation. Bootstrap resampling was used to compute the 95% confidence interval of the macro F1 score, which quantifies performance variance. Moreover, the proposed method was compared with the baseline algorithms using McNemar’s test and the Wilcoxon signed-rank test. These significance tests verified that the reported improvements in performance metrics were consistent and statistically significant.
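The bootstrap interval and McNemar's test can be sketched as follows. This is a minimal NumPy version with hypothetical function names; in practice, `scipy.stats.bootstrap` is available, and `scipy.stats.wilcoxon` covers the signed-rank test on paired scores (for example, per-class or per-run F1 values). Macro-F1 is averaged over the classes that appear in the true or predicted labels of each resample, and McNemar's statistic uses the continuity-corrected form on the discordant pairs, with its p-value taken from the $\chi^2$ distribution with one degree of freedom.

```python
import math

import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true or y_pred."""
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for macro-F1 (95% by default)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test instances with replacement
        stats.append(macro_f1(y_true[idx], y_pred[idx]))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same test set."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    correct_a, correct_b = pred_a == y_true, pred_b == y_true
    only_a = int(np.sum(correct_a & ~correct_b))  # A right, B wrong
    only_b = int(np.sum(~correct_a & correct_b))  # A wrong, B right
    if only_a + only_b == 0:
        return 0.0, 1.0
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square(1)
    return stat, p_value
```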

3.7. Framework Operation and Workflow

As illustrated in Algorithm 1, the operational workflow of the proposed enhanced XGBoost scoring framework is a sequential, modular pipeline that transforms raw IIoT traffic records into calibrated threat scores through preprocessing, optimized ensemble learning, and statistically validated decision-making. The overall operation is designed to ensure robustness under class imbalance, probabilistic reliability, and low-latency detection of both known and emerging threats. At runtime, the framework accepts pre-extracted flow-level traffic features from the IIoT monitoring infrastructure. These features are first standardized and validated before being processed by the optimized XGBoost classifier. The trained model outputs calibrated class probabilities, which are then refined through class-specific decision thresholds to generate a final threat score and attack label. The framework operates in the following phases:

Phase 1: Data Acquisition and Preprocessing.
The raw IIoT traffic records are collected and converted into structured feature vectors. Non-numeric attributes are removed, categorical labels are encoded, missing values are handled, and features are scaled to ensure numerical stability.
Phase 2: Dataset Partitioning.
The dataset is partitioned into training, validation, and test sets to preserve class distributions and support unbiased performance evaluation.
Phase 3: Training Set Balancing.
To mitigate class imbalance, oversampling is applied exclusively to the training set, which enables equitable learning across minority attack categories.
Phase 4: Model Optimization.
Bayesian hyperparameter optimization is performed on the validation set to identify the optimal XGBoost configuration. The optimized ensemble is then trained on the oversampled training set.
Phase 5: Probability Calibration and Threshold Selection.
The trained model’s probability outputs are calibrated so that they better reflect the true class frequencies, improving the reliability of the confidence estimates. Class-specific decision thresholds are optimized to maximize detection effectiveness under asymmetric misclassification costs.
Phase 6: Threat Scoring and Inference.
For each unseen traffic instance, the system computes calibrated class probabilities and assigns a final threat score, which supports alert prioritization and intrusion response. Class-specific decision thresholds are first applied to identify high-confidence candidate classes. If one or more classes satisfy their respective thresholds, the final prediction is assigned to the class with the highest probability among these candidates. Otherwise, a fallback mechanism selects the class with the maximum predicted probability (argmax).
Phase 7: Statistical Validation and Interpretability (Offline Analysis).
Statistical tests and explainability analyses are conducted to validate performance stability and identify influential traffic features.

Algorithm 1. Pseudocode of the Enhanced XGBoost-Based Threat Scoring Framework
Enhanced XGBoost Framework at IIoT Security Analyzer (Training Phase)
Input: Edge-IIoTset dataset $D = \{X, Y\}$ with feature matrix $X \in \mathbb{R}^{N \times F}$; hyperparameter search space $\Theta$; oversampling method; calibration using isotonic regression;
Output: Optimized XGBoost model $M$; calibrated probability function; class-specific decision thresholds $\{\tau_c\}$;
Start Algorithm (XGBoost-Train)
If model training is initiated then:
    Load dataset $D = \{X, Y\}$;
    Remove non-numeric, redundant, and invalid features from $X$;
    Encode categorical attributes into numeric form;
    Handle missing values using statistical imputation;
    Standardize numeric features in $X$;
    Split $D$ into stratified subsets $D_{\mathrm{train}}$, $D_{\mathrm{val}}$, $D_{\mathrm{test}}$;
    Apply oversampling to $D_{\mathrm{train}}$ to balance the class distribution;
    Initialize XGBoost hyperparameter space $\Theta$;
    For each trial $\theta \in \Theta$ do: // Hyperparameter optimization
        Train XGBoost model $M_{\theta}$ on $D_{\mathrm{train}}$;
        Evaluate $M_{\theta}$ on $D_{\mathrm{val}}$ using the macro-F1 score;
    End For; // Hyperparameter optimization loop
    Select optimal parameters $\theta^{*} = \arg\max_{\theta} F1_{\mathrm{macro}}(M_{\theta}; D_{\mathrm{val}})$;
    Train final XGBoost model $M$ on $D_{\mathrm{train}}$ using $\theta^{*}$;
    Fit isotonic-regression calibration of the probabilities of $M$ on $D_{\mathrm{val}}$;
    For each class $c$ do: // Threshold optimization
        Determine the decision threshold $\tau_c$ that maximizes the F1 score of class $c$ on $D_{\mathrm{val}}$;
    End For; // Threshold optimization loop
End If; // Training Phase
End Algorithm
Enhanced XGBoost Framework at IIoT Security Analyzer (Inference Phase)
Input: Optimized XGBoost model $M$; isotonic-regression calibration; optimal decision thresholds $\{\tau_c\}$ selected on $D_{\mathrm{val}}$ by maximizing the class-wise F1 score; unseen traffic instance $x_i$;
Output: Predicted attack class $\hat{y}_i$; threat score $S_i \in [0, 1]$;
Start Algorithm (XGBoost-Inference)
While (new traffic instance received) do:
    Receive IIoT traffic feature vector $x_i$;
    Apply preprocessing and standardization to $x_i$;
    Compute class probabilities $P = M.\mathrm{predict\_proba}(x_i)$;
    Apply probability calibration to $P$;
    Initialize the candidate set $C = \emptyset$;
    For each class $c$ do: // Class evaluation
        If $P_c \geq \tau_c$ then // thresholds identify high-confidence candidate classes
            Add class $c$ to $C$;
        End If;
    End For; // Class evaluation loop
    If $C$ is not empty then
        $\hat{y}_i = \arg\max_{c \in C} P_c$; // highest probability among candidates
    Else
        $\hat{y}_i = \arg\max_{c} P_c$; // fallback to the global maximum probability
    End If;
    Compute threat score $S_i = \max_{c} P_c$;
    Output $(\hat{y}_i, S_i)$;
End While;
End Algorithm
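The inference-phase decision rule of Algorithm 1 can be expressed in vectorized NumPy as follows. This sketch assumes that the calibrated probabilities for a batch of instances and the thresholds $\tau_c$ are already available as arrays; the function name is hypothetical. The second row of the usage example shows the rule's distinctive case: the globally most probable class misses its threshold while another class passes, so the candidate set overrides the plain argmax.

```python
import numpy as np


def predict_with_thresholds(P, tau):
    """Candidate set from class thresholds, with a global-argmax fallback.

    P:   (n, C) calibrated class probabilities.
    tau: (C,) class-specific thresholds tau_c.
    Returns (y_hat, S): predicted class indices and threat scores S_i = max_c P_ic.
    """
    P = np.asarray(P, dtype=float)
    passes = P >= np.asarray(tau, dtype=float)  # membership in candidate set C
    masked = np.where(passes, P, -np.inf)       # hide non-candidates
    y_hat = np.where(passes.any(axis=1),
                     masked.argmax(axis=1),     # best candidate
                     P.argmax(axis=1))          # fallback: global argmax
    return y_hat, P.max(axis=1)


P = [[0.5, 0.4, 0.1],    # class 1 is the only candidate
     [0.2, 0.3, 0.5],    # top class 2 misses its threshold; class 1 passes
     [0.7, 0.2, 0.1],    # class 0 passes
     [0.4, 0.25, 0.35]]  # no candidate: fall back to argmax
y_hat, S = predict_with_thresholds(P, tau=[0.6, 0.3, 0.6])
print(y_hat.tolist(), S.tolist())  # [1, 1, 0, 0] [0.5, 0.5, 0.7, 0.4]
```

Because $S_i$ is the maximum calibrated probability, it can differ from the probability of the reported class (as in the first two rows), which is worth keeping in mind when $S_i$ drives alert prioritization.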

4. Implementation and Performance Evaluation

In this study, the experimental framework was implemented on a local machine, a Hewlett-Packard (HP) ProBook 640 G8 notebook equipped with 16 GB of RAM. The proposed system was developed in Python (version 3.10), using the TensorFlow deep learning framework and the Keras high-level API for model implementation and training. All experiments were conducted in a controlled offline environment to ensure reproducibility and consistent performance evaluation. The use of a standard computing platform is intended to provide a fair and stable environment for model training, hyperparameter optimization, and comparative analysis across baseline methods; it does not aim to replicate deployment on resource-constrained IIoT edge devices. All preprocessing, training, hyperparameter optimization, and evaluation procedures were executed with the same hardware and software configuration to ensure fair and consistent performance comparisons.

4.1. Dataset and Preprocessing: Edge-IIoTset

The proposed framework was evaluated using the Edge-IIoTset dataset, a comprehensive, publicly available benchmark designed for intrusion detection in IIoT environments [35,36]. The dataset contains realistic network traffic generated by testbeds consisting of edge devices, sensors, and cloud-based services, enabling a realistic simulation of industrial communication processes. It includes legitimate network flows and an extensive range of cyberattacks such as DDoS, injection, scanning, and malware. The 63 features are described in detail in Table VII of the Edge-IIoTset publication [35,36], which lists the protocol and statistical flow attributes captured directly from the packets. Before model training, the raw Edge-IIoTset data were preprocessed to remove redundant and non-informative attributes, including constant-value features and irrelevant identifiers. All features were converted to numeric format, and categorical attributes were encoded using label encoding. Missing values were replaced with appropriate statistical measures (e.g., mean values for numerical features). In addition, features were standardized to normalize the data distribution. To address the inherent class imbalance in IIoT security data, the RandomOverSampler technique was applied exclusively to the training subset, ensuring that the validation and test sets remained unbiased. The dataset was partitioned into 60% training, 20% validation, and 20% testing splits. A relatively large validation set was chosen to support statistically reliable hyperparameter tuning, threshold optimization, and calibration evaluation across 15+ classes. The test partition was held strictly out of the tuning process to provide an unbiased performance estimate.
This partitioning balances the need for adequate training data against robust model selection and evaluation, in line with common practice in multi-class intrusion detection research. However, the validation set is reused across several optimization stages, which may introduce a risk of overfitting to the validation distribution. To mitigate this, the test set remains completely unseen during all optimization steps, and the close agreement between validation and test performance suggests that this effect is limited. Future work will nevertheless consider more rigorous validation strategies, such as nested cross-validation or separate hold-out subsets for calibration and threshold optimization, to further reduce potential bias. Table 1 summarizes the class distribution of the Edge-IIoTset dataset used in this study and highlights the imbalance among attack types. The final configuration of the enhanced XGBoost model is summarized in Table 2. Hyperparameters were tuned on the validation set with the Optuna framework, using the macro-averaged F1 score as the objective to maximize. Finally, per-class decision thresholds and probability calibration were integrated to improve the reliability of the model's predictions.
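The text does not specify the decision rule used for per-class thresholds. The sketch below assumes one common choice, predicting the class with the largest ratio of probability to its own threshold, and tunes the thresholds by coordinate ascent over a small grid to maximize validation macro-F1. All function names, the grid, the starting value of 0.5, and the number of rounds are illustrative assumptions; the Optuna-based hyperparameter search is not reproduced here.

```python
def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores, using F1 = 2TP / (2TP + FP + FN)."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / n_classes

def predict_with_thresholds(probs, thresholds):
    """Assumed rule: choose the class whose probability is largest relative to its threshold."""
    return [max(range(len(row)), key=lambda k: row[k] / thresholds[k]) for row in probs]

def tune_thresholds(probs, y_true, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), rounds=2):
    """Coordinate ascent on validation data: change one class threshold at a time and
    keep the change only if macro-F1 improves."""
    n_classes = len(probs[0])
    thresholds = [0.5] * n_classes
    best = macro_f1(y_true, predict_with_thresholds(probs, thresholds), n_classes)
    for _ in range(rounds):
        for k in range(n_classes):
            for t in grid:
                trial = thresholds[:k] + [t] + thresholds[k + 1:]
                score = macro_f1(y_true, predict_with_thresholds(probs, trial), n_classes)
                if score > best:
                    best, thresholds = score, trial
    return thresholds, best

# Toy validation set (invented): the minority class 1 tends to receive low probabilities.
val_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.65, 0.35]]
val_true = [0, 0, 0, 1, 1]
thresholds, score = tune_thresholds(val_probs, val_true)
print(thresholds, round(score, 3))
```

On real data, the thresholds would be tuned on the validation split only and then frozen before the test split is scored, consistent with the protocol described above.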

4.2. Evaluation Metrics

To measure the effectiveness of the enhanced XGBoost method, precision, recall, F1 score, and the macro-averaged F1 score were used as the primary evaluation criteria. Precision is the ratio of true positives to all instances that the classifier assigns to a given class (e.g., flags as a particular attack). Recall is the fraction of actual instances of a class that the classifier correctly detects, i.e., its ability to catch real malicious traffic. The F1 score is the harmonic mean of precision and recall. Because macro averaging gives every class the same weight regardless of its size, the macro-averaged F1 score was the main metric for tuning and comparison, ensuring that minority attack classes count as much as majority ones. Class-wise decision thresholds were optimized to increase detection reliability for each attack class. In addition, the calibrated probabilities produced by XGBoost were used as a continuous threat score for each traffic instance. This score reflects the model's confidence and supports risk-aware decision-making in IIoT security monitoring systems. Table 3 summarizes the evaluation metrics using the following notation: TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives).
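As a minimal sketch of the metrics in Table 3, the following code derives per-class precision, recall, and F1 (the harmonic mean) from a confusion matrix and averages them with equal class weights. The toy counts are invented to show how accuracy can stay high while macro F1 exposes a poorly detected minority class. The `threat_score` helper is a hypothetical definition (one minus the calibrated probability of the normal class); the paper does not state exactly how the continuous score is computed.

```python
def per_class_metrics(cm):
    """cm[i][j] counts samples of true class i predicted as class j.
    Returns one (precision, recall, f1) tuple per class."""
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted as c but from another class
        fn = sum(cm[c]) - tp                       # from class c but predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results

def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

def macro_f1(cm):
    """Unweighted mean of per-class F1: every class counts equally, whatever its size."""
    scores = [f1 for _, _, f1 in per_class_metrics(cm)]
    return sum(scores) / len(scores)

def threat_score(calibrated_probs, normal_index=0):
    """Hypothetical continuous threat score: calibrated probability that a flow is NOT normal."""
    return 1.0 - calibrated_probs[normal_index]

# Toy example: 950 normal flows, 50 attacks of which only 10 are detected.
cm = [[950, 0],
      [40, 10]]
print(accuracy(cm))  # 0.96
print(macro_f1(cm))  # ≈ 0.656, pulled down by the attack class (recall 0.2)
```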

4.3. Baseline Models and Comparison Protocol

To assess the effectiveness of the proposed XGBoost-based cyber threat detection model, a comprehensive comparison was performed against benchmark classifiers drawn from different families of machine learning approaches. The comparison was designed to be fair and to reflect recent trends in IIoT intrusion detection research. All baseline models were systematically tuned with a consistent hyperparameter optimization strategy: the key parameters of each model were optimized through a validation-based search that maximized the macro-averaged F1 score. This ensured that all models operated under well-tuned configurations, so that performance differences reflect model capability rather than suboptimal parameter settings.

The selected baseline models included:

Support Vector Machine (SVM): A classical supervised learning algorithm that is widely used in network intrusion detection due to its effectiveness in high-dimensional spaces and its strong generalization capability.
Baseline XGBoost: An optimized gradient boosting framework that utilizes second-order optimization and regularization techniques.
LightGBM: A tree-based ensemble learning method that employs gradient boosting with histogram-based optimization.
Multi-Layer Perceptron (MLP): A feedforward neural network consisting of fully connected layers. It is capable of learning nonlinear feature representations but lacks the ability to capture temporal dependencies.
Deep Neural Network (DNN): An extension of MLP with increased depth and complexity, enabling more expressive feature learning. However, it may suffer from overfitting and requires careful tuning.
Convolutional Neural Network (CNN): A convolutional model adapted for structured input representations, serving as a lightweight deep model.
Long Short-Term Memory (LSTM): A recurrent neural network designed to capture temporal dependencies in sequential data. It is particularly effective for modeling time-dependent attack patterns in network traffic.
Hybrid AE + CNN + LSTM: An autoencoder combined with convolutional and recurrent LSTM layers, reflecting a recent trend in deep learning for security analytics that combines representation learning with the modeling of temporal dependencies.
Hybrid CNN–GRU: A hybrid deep learning architecture that integrates convolutional layers for feature extraction with gated recurrent units (GRUs) to capture sequential dependencies. This model is designed to efficiently learn both the spatial and temporal characteristics of network traffic while maintaining lower computational complexity compared to an LSTM-based architecture.

The hybrid AE + CNN + LSTM model was included to capture feature representation learning (via the autoencoder), spatial pattern extraction (via the CNN), and temporal dynamics (via the LSTM), reflecting state-of-the-art deep learning approaches in intrusion detection. All baseline models were trained and evaluated on the same preprocessed Edge-IIoTset dataset with identical stratified training, validation, and testing splits to ensure consistency.

4.4. Performance Evaluation and Comparison

The performance evaluation was divided into three stages: binary classification, group threat classification, and multi-class traffic classification. At each stage, the proposed system is evaluated using a classification report, a confusion matrix, and receiver operating characteristic (ROC) curves, and it is compared with the baseline algorithms.

4.4.1. Binary Attack Detection

The first stage distinguishes between normal and attack traffic. To assess the robustness of the proposed model, performance was evaluated on the training, validation, and test sets. The validation results closely matched the training performance, which suggests that the model does not overfit and generalizes well to unseen data. Table 4 presents a comparative evaluation of the proposed Stage 1 binary classifier across the training, validation, and test datasets. The results show highly consistent performance, with accuracy exceeding 99.4% in all cases. The macro F1 score remained stable at 0.99, which indicates balanced classification between the normal and attack classes. Furthermore, the model achieved a recall of 1.00 for the normal class, meaning that legitimate traffic was almost never misclassified as an attack. For the attack class, precision reached 1.00 and recall remained consistently high at 0.98 across all datasets, indicating virtually no false positives. These findings are visualized in the confusion matrix and ROC curve in Figure 3. As illustrated in Figure 3b, the ROC curve is concentrated toward the top-left corner of the plot, showing that a very high true positive rate (TPR) is achieved even at very low false positive rates (FPR). The area under the curve (AUC) was 0.9998, very close to the ideal value of 1.0.
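The AUC reported above can be read as the probability that a randomly chosen attack flow receives a higher attack score than a randomly chosen normal flow. A minimal, dependency-free sketch of that interpretation follows. It is a quadratic-time illustration for small inputs; the paper does not state which implementation was used, and libraries such as scikit-learn's `roc_auc_score` compute the same quantity efficiently.

```python
def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2.
    y_true uses 1 for attack and 0 for normal. Quadratic in the sample count."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one attack and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75: one of four pairs is misordered
print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.9, 0.95]))  # 1.0: perfect separation
```

An AUC of 0.9998 therefore means that almost every (attack, normal) pair is ranked correctly.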

Results and Discussion

The performance comparison between the enhanced XGBoost classifier and the baselines for binary detection in Stage 1 is shown in Figure 4. The proposed method achieved an overall accuracy of 99.5%, an F1 score of 99%, and a precision of 100%. Compared with the baseline XGBoost algorithm, the enhanced XGBoost improved recall by 10 percentage points, from 89% to 99%. This gain is mainly attributed to the optimization strategies integrated into the proposed framework, namely automated hyperparameter tuning, probability calibration, and class-specific decision thresholds, which allow the model to detect more attacks. Among the baseline algorithms, LightGBM was one of the most competitive with our approach in terms of accuracy and precision. However, as the next two stages show, its performance became less stable as the classification task grew more complex. The deep learning models (MLP, DNN, CNN, and LSTM) also performed strongly, with accuracy values of around 98%, but their F1 scores and recall values were generally lower than those of our model. For example, LSTM achieved a recall of 91%, indicating missed attack samples, while CNN and DNN showed slightly lower F1 scores (95%). In addition, these models typically require more computational resources and longer training times, which may limit their applicability in real-time IIoT environments. The hybrid AE + CNN + LSTM model improved performance to 99% accuracy and a 98% F1 score, demonstrating the benefit of combining multiple deep learning techniques, but at the cost of greater model complexity than the proposed approach. In contrast, the SVM performed poorly, with a recall of only 67%, indicating that it struggles with high-dimensional IIoT traffic.
Overall, the Stage 1 results demonstrate that the enhanced XGBoost provides the best balance between detection sensitivity and computational efficiency among the evaluated models, as required for a high-performance intrusion detection system.

4.4.2. Group Classification

The original fifteen traffic classes were consolidated into six higher-level groups based on their functional threat characteristics. Under this taxonomy, the DDoS group comprises ddos_http, ddos_icmp, ddos_tcp, and ddos_udp, which represent protocol-based denial-of-service attacks. The Injection group includes sql_injection and XSS, which target application-layer vulnerabilities. The Malware group consists of ransomware, backdoor, and uploading, which capture various forms of malicious software behavior. The Network group contains fingerprinting, MITM, and password attacks, which are primarily related to network-level exploitation and unauthorized access. The Scanning group includes port_scanning and vulnerability_scanner, which represent reconnaissance activities. Finally, normal traffic is treated as a separate class representing benign network behavior.
Table 5 presents the performance of the Stage 2 group classification model across the experimental splits. The enhanced model achieved an accuracy of 98.66% on the training set, which decreased slightly to 98.00% and 97.99% on the validation and test sets, respectively. Similarly, the macro F1 score dropped from 0.9463 on the training set to 0.9150 on the validation set and 0.9148 on the test set. The small and consistent performance gap indicates that the model generalizes well to unseen data with minimal overfitting. From a detection perspective, attack recall remained relatively high at 0.93 for training and 0.90 for both the validation and test sets. This suggests that the majority of attack instances were assigned to their correct groups, although a small portion was still misclassified. Attack precision remained high at 0.97 for training and 0.94 for the validation and test sets, which indicates that predicted attack groups were mostly correct, with few false positives.
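The fifteen-to-six grouping above is a fixed lookup table, which can be sketched as follows. Label spellings are normalized to lower case here as an assumption, since the raw label strings in the dataset files may use different capitalization; `GROUP_OF` and `to_group` are illustrative names rather than the authors' code.

```python
from collections import Counter

# Mapping from the 15 fine-grained Stage 3 classes to the 6 Stage 2 groups,
# following the taxonomy described in the text (keys lower-cased).
GROUP_OF = {
    "ddos_http": "DDoS", "ddos_icmp": "DDoS", "ddos_tcp": "DDoS", "ddos_udp": "DDoS",
    "sql_injection": "Injection", "xss": "Injection",
    "ransomware": "Malware", "backdoor": "Malware", "uploading": "Malware",
    "fingerprinting": "Network", "mitm": "Network", "password": "Network",
    "port_scanning": "Scanning", "vulnerability_scanner": "Scanning",
    "normal": "Normal",
}

def to_group(label):
    """Return the Stage 2 group of a Stage 3 label, ignoring case and surrounding spaces."""
    return GROUP_OF[label.strip().lower()]

print(to_group("XSS"))            # Injection
print(Counter(GROUP_OF.values()))  # number of fine-grained classes per group
```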

The normalized confusion matrix in Figure 5a shows that the normal class achieved perfect classification (1.00), meaning that legitimate traffic was consistently identified without misclassification. This is a critical requirement in IIoT systems to avoid unnecessary alerts. Similarly, the DDoS class showed excellent performance, with a correct classification rate of 0.99, which suggests that high-volume attacks with distinctive patterns are easily recognized by the proposed model. The Malware and Scanning classes also performed well, with correct classification rates of 0.90 and 0.94, respectively. In contrast, the Injection and Network groups exhibited lower correct classification rates of 0.80 and 0.75, respectively. The Stage 2 ROC analysis in Figure 5b complements these results. The micro-average AUC of 1.000 indicates near-perfect overall classification performance when all classes are aggregated. At the class level, most categories achieved extremely high AUC values. The DDoS and Normal classes achieved perfect AUC scores (1.000), indicating that they are almost perfectly separable from the other categories. Malware and Scanning also performed excellently, with AUC values of 0.999, which demonstrates minimal classification ambiguity. The Injection and Network classes showed slightly lower AUC values of 0.997 and 0.996, respectively; because these AUCs remain high while the corresponding rates in the confusion matrix are lower, these classes are well ranked overall, but some of their samples still fall on the wrong side of the decision boundary. Overall, the ROC analysis confirms that the proposed Stage 2 classifier achieves near-optimal class separability across all attack groups. Although this six-group task is more complex than the binary task of the previous stage, performance remained stable across the training, validation, and test splits.
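The values read from a row-normalized confusion matrix, such as 1.00 for Normal or 0.75 for Network, are per-class recall rates: each row is divided by its total, so the diagonal entry is the share of that class predicted correctly. The sketch below shows this with invented counts for three groups; it is illustrative and does not reproduce Figure 5a.

```python
def row_normalize(cm):
    """Divide each row by its total so entry [i][j] is the share of true class i predicted as j."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

def per_class_recall(cm):
    """The diagonal of the row-normalized matrix is the recall of each class."""
    norm = row_normalize(cm)
    return [norm[i][i] for i in range(len(norm))]

# Invented counts: rows are true groups, columns are predicted groups.
cm = [[100, 0, 0],   # e.g. Normal
      [5, 80, 15],   # e.g. Injection
      [0, 25, 75]]   # e.g. Network
print(per_class_recall(cm))  # [1.0, 0.8, 0.75]
```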

Results and Discussion

As shown in Figure 6, the enhanced XGBoost model outperformed all baseline methods on the Stage 2 group classification task. The proposed framework reached an accuracy of 98% and an F1 score of 92.1%, with a recall of 90% and a precision of 94%. By comparison, the baseline XGBoost model obtained an accuracy of 93% and an F1 score of 78.5%. The enhanced model therefore showed a clear advantage over the standard model on all evaluated metrics, including a 21-percentage-point increase in recall (from 69% to 90%). This is primarily due to the optimization strategies integrated into the proposed framework, namely automated hyperparameter tuning, probability calibration, and class-specific decision thresholds. Among the evaluated baselines, the calibrated LightGBM model achieved the best performance, with an accuracy of 93% and an F1 score of 86%, outperforming the standard LightGBM model and confirming the positive impact of probability calibration. The deep learning methods (MLP, DNN, CNN, and LSTM) achieved similar accuracy (93% to 95%), F1 scores of 75% to 77%, and recall of 66% to 76%. This suggests that although these methods classify accurately overall, their performance is relatively unbalanced across categories. The LSTM model had the lowest recall (66%), which implies that a notable share of attacks may be missed. The combined AE + CNN + LSTM approach reached an accuracy of 95% and an F1 score of 79%, a modest improvement over the individual deep models, but at the cost of increased computational complexity. Finally, the SVM performed poorly compared with the other methods, achieving an F1 score of only 41%.
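Because the gains above compare two percentages, they are differences in percentage points; expressed relative to the baseline, the same gains are larger. The short helper below, an illustrative addition that uses only figures reported in this section, makes the distinction explicit.

```python
def improvement(baseline_pct, enhanced_pct):
    """Return (absolute gain in percentage points, relative gain in percent of the baseline)."""
    points = enhanced_pct - baseline_pct
    relative = 100.0 * points / baseline_pct
    return points, relative

# Stage 2 recall and F1, baseline vs. enhanced XGBoost (figures from the text).
points, relative = improvement(69.0, 90.0)
print(f"recall: {points:.1f} pp, {relative:.1f}% relative")  # recall: 21.0 pp, 30.4% relative
points, relative = improvement(78.5, 92.1)
print(f"F1: {points:.1f} pp, {relative:.1f}% relative")      # F1: 13.6 pp, 17.3% relative
```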

4.4.3. Multiclass Classification

The performance of the Stage 3 classifier across the training, validation, and testing phases is summarized in Table 6. Accuracy was highly consistent, starting at 98.69% during training and remaining stable at 98.15% on the test set. Similarly, the macro F1 score showed negligible fluctuation, remaining at 91% on both the validation and test sets. This narrow gap between the training and evaluation phases suggests that the model generalizes well and does not suffer from significant overfitting. From a detection perspective, attack recall decreased slightly from 0.90 on the training set to 0.88 on the test set. However, attack precision remained relatively high at 0.93 on the test set, which indicates that most predicted attack classes were correct. Overall, these metrics confirm that the framework maintains an effective balance between sensitivity and precision, even for the more complex 15-class classification problem.

The normalized confusion matrix in Figure 7a details the classification behavior across all 15 traffic classes. Several classes, including ddos_icmp, ddos_tcp, and ddos_udp, achieved perfect classification (1.00). Normal traffic also reached 1.00, which confirms that benign traffic was virtually never misclassified. Both vulnerability_scanner and backdoor reached 0.95, which shows very strong performance. Other classes demonstrated good but imperfect performance while maintaining high classification rates, such as ddos_http (0.92), port_scanning (0.91), and uploading (0.88). Ransomware and sql_injection showed moderate confusion, at 0.82 and 0.83, respectively. However, certain classes exhibited notable misclassification, particularly XSS, MITM, and fingerprinting, which achieved lower classification rates of 0.73, 0.68, and 0.51, respectively. This behavior can be attributed to several factors. First, some attack types, especially those operating at similar network or application layers, have overlapping features, which makes their patterns less separable and increases classification ambiguity. Second, class imbalance affects the learning process: minority classes such as fingerprinting and MITM have fewer distinct training samples, and random oversampling duplicates these samples without adding new patterns, which limits the model's ability to generalize. Third, the fine-grained nature of Stage 3 classification introduces additional complexity, requiring the model to distinguish between closely related attack behaviors with subtle differences. These challenges are reflected in the confusion patterns, where misclassifications tend to occur among structurally similar classes. Overall, the proposed system demonstrated strong performance in fine-grained multi-class classification in Stage 3, maintaining high accuracy and stable generalization across datasets.
While performance was slightly lower than in Stage 2, this is expected given the increased classification complexity (15 classes versus 6 groups). Nevertheless, the ROC curves in Figure 7b underscore the model's strong separability: a micro-average AUC of 1.000 indicates that, even though recall varies across individual classes, the model retains an excellent overall ability to distinguish between attack categories. Because the micro-average pools all samples, it is dominated by the large classes, so the per-class results above remain the better guide to performance on rare attacks.
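To make the difference between a pooled micro-average AUC and per-class AUCs concrete, the following sketch computes both for a small invented three-class example using one-vs-rest scoring. The helper names and data are hypothetical. In the toy data, class 2 (two samples) has an AUC below 1.0, which the per-class view isolates directly, whereas the micro-average folds every (sample, class) entry into one pooled figure that does not identify the weak class.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def one_vs_rest_aucs(y_true, probs):
    """AUC of each class treated as positive against all other classes."""
    aucs = []
    for c in range(len(probs[0])):
        pos = [row[c] for y, row in zip(y_true, probs) if y == c]
        neg = [row[c] for y, row in zip(y_true, probs) if y != c]
        aucs.append(pairwise_auc(pos, neg))
    return aucs

def micro_auc(y_true, probs):
    """Pool every (sample, class) entry: the true class is a positive, all others are negatives."""
    pos, neg = [], []
    for y, row in zip(y_true, probs):
        for c, score in enumerate(row):
            (pos if c == y else neg).append(score)
    return pairwise_auc(pos, neg)

# Invented scores: six majority-class samples, two of class 1, two of class 2.
y_true = [0] * 6 + [1, 1, 2, 2]
probs = [[0.9, 0.05, 0.05]] * 6 + [[0.1, 0.8, 0.1]] * 2 + [[0.2, 0.3, 0.5], [0.3, 0.6, 0.1]]
print(one_vs_rest_aucs(y_true, probs))  # [1.0, 1.0, 0.9375]
print(micro_auc(y_true, probs))         # 0.965
```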

Results and Discussion

Figure 8 illustrates the performance comparison between the proposed enhanced XGBoost model and several baseline approaches. Our model clearly outperformed all baselines, achieving 98% accuracy, a 91% F1 score, 88% recall, and 93% precision, compared with the baseline XGBoost, which achieved 90% accuracy, a 76% F1 score, and 74% recall. The enhanced model thus improved recall by 14 percentage points and the F1 score by 15 percentage points. This gain is mainly attributed to the combination of probability calibration and automated tuning, which improves class discrimination. The proposed approach effectively addresses the challenge of multi-class intrusion detection by reducing misclassification between similar attack types. Deep learning models such as MLP, DNN, CNN, and LSTM achieved moderate accuracy (91–93%), but their F1 scores and recall were lower (66–73%). The hybrid AE + CNN + LSTM model slightly improved the F1 score, to 76%, but still fell short of the proposed model. This suggests that increasing architectural complexity does not necessarily translate into better performance on highly structured IIoT tabular data. SVM and LightGBM performed poorly, with F1 scores of 42% and 8%, respectively, which reflects their limitations in handling high-dimensional, multi-class IIoT traffic data. Overall, the results demonstrate that the enhanced XGBoost model had the most reliable balance across all metrics, which makes it well suited to detailed intrusion detection in IIoT environments. To further strengthen the comparison with recent hybrid deep learning approaches, a CNN–GRU model was implemented and evaluated on Stage 3. The model achieved an overall accuracy of 94% with a macro F1 score of 0.70.
While the CNN–GRU model performed strongly on the majority classes, such as normal traffic and high-volume DDoS attacks, it exhibited significant performance degradation on minority and complex attack classes. For instance, classes such as fingerprinting, XSS, and password attacks achieved F1 scores below 0.45, which indicates poor generalization in imbalanced scenarios. In contrast, the proposed enhanced XGBoost model achieved substantially higher performance, with 98% accuracy and a macro F1 score of 0.91, an improvement of approximately 21 percentage points in macro F1 score over CNN–GRU. These results highlight the effectiveness of the proposed framework in handling class imbalance and improving the detection of minority attack classes. Furthermore, despite its architectural complexity, CNN–GRU did not outperform the proposed model, while requiring more computational resources and longer training time. These findings confirm that the enhanced XGBoost model provides a better balance between performance, robustness, and computational efficiency for IIoT intrusion detection.

4.4.4. Ablation Study

To evaluate the contribution of each component of the proposed framework, an ablation study was conducted on Stage 3 by incrementally incorporating the key modules: class balancing, threshold optimization, and probability calibration. The results in Table 7 demonstrate a clear and consistent improvement across all evaluation metrics as each component is introduced. Starting from the baseline XGBoost model, the incorporation of class weighting improved the macro F1 score from 0.67 to 0.76 and recall from 0.63 to 0.74, which highlights its effectiveness in addressing class imbalance and in improving the detection of minority attack classes. The addition of threshold optimization further refined the decision boundaries, providing incremental gains in both F1 score and recall.
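The paper does not give the weighting formula used in the class-weighting step of the ablation. A common choice, assumed here, is the "balanced" heuristic w_c = n / (K · n_c), which gives every class the same total weight; scikit-learn's `compute_class_weight(class_weight="balanced", ...)` uses the same formula. The resulting per-sample weights could then be passed to an XGBoost classifier through the `sample_weight` argument of its `fit` method. The helper names below are illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """w_c = n / (K * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def sample_weights(labels):
    """Per-sample weights, e.g. for a boosting model's sample_weight argument."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]

labels = ["normal"] * 800 + ["ddos_udp"] * 150 + ["mitm"] * 50  # invented counts
w = balanced_class_weights(labels)
print({c: round(v, 3) for c, v in w.items()})
# {'normal': 0.417, 'ddos_udp': 2.222, 'mitm': 6.667}
```

With these weights, each class contributes the same total weight (n / K) to the training loss, so the minority class is no longer swamped by the majority class.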

Probability calibration then provided substantial improvements. Isotonic calibration outperformed sigmoid calibration and increased the F1 score to 0.89 and recall to 0.86. This suggests that well-calibrated probabilities lead to more reliable and discriminative predictions. Finally, the fully enhanced model achieved the best overall performance, with an accuracy of 98.15%, a macro F1 score of 0.91, and a recall of 0.88. These results confirm that the performance gains are not attributable to a single component but to the combined integration of multiple optimization strategies within the proposed framework.
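Isotonic calibration fits a non-decreasing step function from raw scores to observed outcome frequencies, typically with the pool-adjacent-violators (PAV) algorithm. The sketch below applies PAV to the one-vs-rest scores of a single class. In a multi-class setting, one such map would be fit per class on held-out data and the calibrated values renormalized, which is what scikit-learn's `CalibratedClassifierCV(method="isotonic")` does; the paper does not state which implementation was used, and the helper names here are hypothetical. Unlike scikit-learn, which interpolates between steps, this sketch returns the value of the step that contains the score.

```python
def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: map raw scores to a non-decreasing estimate of P(label = 1).
    Returns (upper score bound of each step, calibrated value of each step)."""
    blocks = []  # each block: [sum of labels, count, largest score in the block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while scores tie or the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][2] == blocks[-1][2]
            or blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values

def calibrate(model, score):
    """Return the calibrated value of the first step whose upper bound covers the score."""
    uppers, values = model
    for upper, value in zip(uppers, values):
        if score <= upper:
            return value
    return values[-1]  # scores above the fitted range are clipped to the last step

# Invented one-vs-rest scores for one class and whether each sample truly belongs to it.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]
model = fit_isotonic(scores, labels)
print(model[1])                # [0.0, 0.333..., 1.0, 1.0]
print(calibrate(model, 0.35))  # 0.333...
```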

4.4.5. Computational Complexity and Efficiency Analysis

The computational efficiency of the proposed framework is an important factor for its practical applicability in IIoT intrusion detection scenarios. Tree-based models generally require fewer computational resources than deep neural networks, particularly when handling structured tabular data. In this work, the XGBoost algorithm was adopted for its scalability and its efficient training mechanism based on gradient boosting over decision trees. From a complexity perspective, the training cost of XGBoost scales approximately linearly with the number of training instances and features, making it suitable for large-scale IIoT datasets. Although the proposed framework incorporates additional components such as hyperparameter optimization, class balancing, and probability calibration, the search and fitting for these components are performed offline during training; at inference time, calibration and class-specific thresholds add only a lightweight post-processing step. To provide empirical evidence, runtime measurements were conducted across different stages of the pipeline. The preprocessing stage, including feature scaling, required 0.1966 s, indicating minimal overhead. Training the enhanced XGBoost model required 122.889 s, which is reasonable given the dataset size and the inclusion of multiple optimization steps. During inference, the model demonstrated high efficiency, with a total prediction time of 0.9246 s for the full test set, which corresponds to an average latency of approximately 0.0293 ms/sample when amortized over batch prediction. This low per-sample latency indicates that the model can perform fast predictions suitable for time-sensitive intrusion detection tasks. Furthermore, the hierarchical multi-stage design contributes to computational efficiency by filtering out a large portion of normal traffic in Stage 1, thereby reducing the number of samples that require more detailed analysis in subsequent stages. This selective processing reduces unnecessary computation and improves overall scalability.
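The Stage 1 filtering and the batch-amortized latency figure can be illustrated with the sketch below. The three stage functions are hypothetical stand-ins for trained models, with each flow reduced to a single score. The timing helper divides the wall-clock time of one batch prediction by the number of samples, which is how a per-sample figure such as 0.0293 ms follows from a batch measurement; it does not capture the latency of a single request in a streaming setting.

```python
import time

def cascade_predict(flows, stage1, stage2, stage3):
    """Hierarchical inference: only flows that Stage 1 flags as attacks reach Stages 2 and 3."""
    labels = ["normal"] * len(flows)
    flagged = [i for i, flow in enumerate(flows) if stage1(flow) == "attack"]
    for i in flagged:
        group = stage2(flows[i])             # coarse group, e.g. "DDoS"
        labels[i] = stage3(flows[i], group)  # fine-grained class within that group
    return labels, len(flagged)

def mean_latency_ms(predict, flows):
    """Batch wall-clock time divided by batch size, in milliseconds per sample."""
    start = time.perf_counter()
    predict(flows)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(flows)

# Hypothetical stand-ins for the trained stage models.
def stage1(flow):
    return "attack" if flow > 0.5 else "normal"

def stage2(flow):
    return "DDoS"

def stage3(flow, group):
    return "ddos_udp"

labels, n_escalated = cascade_predict([0.1, 0.9, 0.2, 0.7], stage1, stage2, stage3)
print(labels, n_escalated)  # ['normal', 'ddos_udp', 'normal', 'ddos_udp'] 2
print(mean_latency_ms(lambda batch: cascade_predict(batch, stage1, stage2, stage3), [0.3] * 10_000))
```

When most traffic is normal, Stages 2 and 3 run on only a small fraction of the batch, which is the source of the efficiency gain described above.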
It should be noted that system-level metrics such as memory consumption and CPU utilization were not explicitly measured in this study. These factors are important for validating deployment in resource-constrained environments; therefore, future work will include detailed profiling and evaluation on representative edge devices to further assess deployment feasibility. Overall, the proposed framework achieves a favorable balance between detection performance and computational efficiency, which makes it well suited to real-time IIoT cybersecurity applications.

5. Limitations and Future Work

Despite the strong performance achieved by the proposed hierarchical intrusion detection framework, several limitations should be acknowledged.

5.1. Limitations

First, the framework was evaluated in an offline experimental setting on a workstation. Although the reported inference latency demonstrated high computational efficiency, the model was not validated on resource-constrained edge devices or within streaming IIoT environments. Second, the framework relies on random oversampling to address class imbalance. While this approach is computationally efficient, it introduces duplicated samples and does not capture the diversity of minority attack patterns, which can affect performance on rare classes. Third, fine-grained multi-class classification (Stage 3) remains challenging. The lower recall observed for certain attack categories (e.g., fingerprinting, MITM, and XSS) is mainly attributed to feature overlaps, limited sample diversity, and the inherent difficulty of distinguishing closely related attack behaviors. Fourth, the current framework operates under a static learning assumption and does not explicitly address temporal distribution shifts (concept drift), which may affect long-term performance in dynamic IIoT environments. Finally, the experimental evaluation was conducted on a single benchmark dataset, and additional validation across diverse real-world datasets is required to confirm the generalizability of the proposed approach.

5.2. Future Work

Future research will extend the proposed framework in several directions. First, more advanced data augmentation techniques, such as GAN-based approaches, will be investigated to improve minority class representation and enhance robustness under severe class imbalance. Second, the framework will be evaluated on resource-constrained edge platforms (e.g., Raspberry Pi and NVIDIA Jetson) to measure inference latency, memory usage, and energy efficiency under realistic IIoT deployment conditions. Third, online learning and drift-aware mechanisms, including statistical drift detection and incremental model updates, will be incorporated to enable continuous adaptation to changing IIoT traffic patterns. Fourth, additional comparisons with state-of-the-art deep learning architectures, including attention-based and hybrid models, will be conducted to further validate performance. Finally, future work will explore enhanced feature representations and validation strategies, including improved feature engineering and more rigorous evaluation protocols (e.g., nested validation), to further strengthen model robustness and reliability.

6. Conclusions

In this study, an efficient multi-stage intrusion detection framework for IIoT environments was developed based on an enhanced XGBoost model for multi-class cyber threat detection. The proposed framework integrates data preprocessing, class imbalance handling, hyperparameter optimization, probability calibration, and class-specific decision thresholds within a unified pipeline to address the challenges of imbalanced and heterogeneous IIoT network traffic. Experimental evaluation on the Edge-IIoTset benchmark dataset demonstrated that the proposed approach achieved consistent performance improvements across all classification stages. Specifically, compared with the baseline XGBoost model, recall improved by 10, 21, and 14 percentage points and the macro F1 score by 6, 13.6, and 15 percentage points in Stage 1, Stage 2, and Stage 3, respectively. In addition, the model exhibited stable generalization, with minimal performance variation between the validation and test sets. Furthermore, the computational analysis indicates that the proposed framework supports low-latency inference, making it suitable for time-sensitive IIoT monitoring scenarios. The hierarchical multi-stage design also contributes to efficiency by progressively filtering and refining classification decisions across stages. Overall, the results demonstrate that the proposed framework provides an effective and scalable solution for multi-class cyber threat detection in IIoT networks.

Author Contributions

Conceptualization, A.A.A. and T.A.A.A.; Methodology, A.A.A.; Software, A.A.A. and T.A.A.A.; Validation, A.A.A. and T.A.A.A.; Formal analysis, T.A.A.A.; Investigation, A.A.A.; Resources, A.A.A.; Data curation, A.A.A. and T.A.A.A.; Writing—original draft preparation, A.A.A. and T.A.A.A.; Writing—review and editing, A.A.A. and T.A.A.A.; Visualization, A.A.A.; Supervision, A.A.A.; Project administration, A.A.A.; Funding acquisition, A.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia under grant no. (IPP:621-830-2025). The authors, therefore, acknowledge with thanks DSR for technical and financial support.

Data Availability Statement

The dataset used in this study (Edge-IIoTset) is publicly available from the sources cited in the manuscript. The implementation code, including preprocessing, training, and evaluation scripts, is available on request from the corresponding author due to institutional policies regarding code sharing.

Acknowledgments

During the preparation of this study, the authors used Grammarly (v1.2.254.1880) for the purposes of paraphrasing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guo, G.; Qamar, F.; Kazmi, S.H.A.; ur Rehman, M.H. Threat detection in the 6G enabled Industrial IoT Networks using Deep Learning: A review on the state-of-the-art solutions, challenges and future research directions. Internet Things 2025, 33, 101686.
Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419.
Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877.
Yan, P.; Khoei, T.T. Securing the internet of things: A comprehensive review of ransomware attacks, detection, countermeasures, and future prospects. Frankl. Open 2025, 11, 100256.
Sasi, T.; Lashkari, A.H.; Lu, R.; Xiong, P.; Iqbal, S. A comprehensive survey on IoT attacks: Taxonomy, detection mechanisms and challenges. J. Inf. Intell. 2024, 2, 455–513.
Dasari, A.K.; Bisawas, S.K.; Purkayastha, B. Enhanced Network Intrusion Detection Systems with Explainable Artificial Intelligence for Network Security. Int. J. Commun. Syst. 2025, 38, e70209.
Xu, X.; Wang, X. An adaptive Network Intrusion Detection Method Based on PCA and Support Vector Machines. In Proceedings of the International Conference on Advanced Data Mining and Applications, Berlin/Heidelberg, Germany, 22–24 July 2025; Springer: Berlin/Heidelberg, Germany, 2005; pp. 696–703. [Google Scholar]
Li, J.; Othman, M.S.; Chen, H.; Yusuf, L.M. Optimizing IoT intrusion detection system: Feature selection versus feature extraction in machine learning. J. Big Data 2024, 11, 36. [Google Scholar] [CrossRef]
Houkan, A.; Sahoo, A.K.; Gochhayat, S.P.; Sahoo, P.K.; Liu, H.; Khalid, S.G.; Jain, P. Enhancing security in industrial IoT networks: Machine learning solutions for feature selection and reduction. IEEE Access 2024, 12, 160864–160883. [Google Scholar] [CrossRef]
Arreche, O.; Guntur, T.R.; Roberts, J.W.; Abdallah, M. E-xai: Evaluating black-box explainable ai frameworks for network intrusion detection. IEEE Access 2024, 12, 23954–23988. [Google Scholar] [CrossRef]
Orman, A. Cyberattack detection systems in industrial internet of things (IIoT) networks in big data environments. Appl. Sci. 2025, 15, 3121. [Google Scholar] [CrossRef]
Susilo, B.; Muis, A.; Sari, R.F. Intelligent Intrusion Detection System Against Various Attacks Based on a Hybrid Deep Learning Algorithm. Sensors 2025, 25, 580. [Google Scholar] [CrossRef]
Aldhaheri, A.; Alwahedi, F.; Ferrag, M.A.; Battah, A. Deep learning for cyber threat detection in IoT networks: A review. Internet Things Cyber-Phys. Syst. 2024, 4, 110–128. [Google Scholar]
Gueriani, A.; Kheddar, H.; Mazari, A.C. Adaptive cyber-attack detection in iiot using attention-based lstm-cnn models. In 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS); IEEE: Djelfa, Algeria, 2024; pp. 1–6. [Google Scholar]
Rehman, Z.; Gondal, I.; Ge, M.; Dong, H.; Gregory, M.; Tari, Z. Proactive defense mechanism: Enhancing IoT security through diversity-based moving target defense and cyber deception. Comput. Secur. 2024, 139, 103685. [Google Scholar] [CrossRef]
Doghramachi, D.F.; Ameen, S.Y. Internet of Things (IoT) Security Enhancement Using XGboost Machine Learning Techniques. Comput. Mater. Contin. 2023, 77, 717–732. [Google Scholar] [CrossRef]
Alazab, M.; Khurma, R.A.; García-Arenas, M.; Jatana, V.; Baydoun, A.; Damaševičius, R. Enhanced threat intelligence framework for advanced cybersecurity resilience. Egypt. Inform. J. 2024, 27, 100521. [Google Scholar] [CrossRef]
Hu, Y.; Xiao, K.; Luo, L.; Chen, L. An XGBoost-Based Intrusion Detection Framework with Interpretability Analysis for IoT Networks. Appl. Sci. 2026, 16, 980. [Google Scholar] [CrossRef]
Alenazi, M.; Mishra, S. Cyberatttack detection and classification in IIoT systems using XGBoost and Gaussian Naïve Bayes: A comparative study. Eng. Technol. Appl. Sci. Res. 2024, 14, 15074–15082. [Google Scholar] [CrossRef]
Abdullahi, M.; Alhussian, H.; Aziz, N.; Abdulkadir, S.J.; Alwadain, A.; Muazu, A.A.; Bala, A. Comparison and investigation of AI-based approaches for cyberattack detection in cyber-physical systems. IEEE Access 2024, 12, 31988–32004. [Google Scholar] [CrossRef]
Alashjaee, A.M.; Alqahtani, F. Enhanced intrusion detection system IoT network security model by feed forward neural network and machine learning. Sci. Rep. 2025, 15, 36085. [Google Scholar] [CrossRef]
Le, T.-T.-H.; Oktian, Y.E.; Kim, H. XGBoost for Imbalanced Multiclass Classification-Based Industrial Internet of Things Intrusion Detection Systems. Sustainability 2022, 14, 8707. [Google Scholar] [CrossRef]
Binsaeed, K.A.; Hafez, A.M. Enhancing Intrusion Detection Systems with XGBoost Feature Selection and Deep Learning Approaches. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2023, 14., 1084–1098. [Google Scholar] [CrossRef]
Altunay, H.C.; Albayrak, Z. A hybrid CNN+ LSTM-based intrusion detection system for industrial IoT networks. Eng. Sci. Technol. Int. J. 2023, 38, 101322. [Google Scholar] [CrossRef]
Khan, M.A.; Karim, M.R.; Kim, Y. A Scalable and Hybrid Intrusion Detection System Based on the Convolutional-LSTM Network. Symmetry 2019, 11, 583. [Google Scholar] [CrossRef]
Anuja, R.; Annrose, J. End-to-end deep learning for smart maritime threat detection: An AE–CNN–LSTM-based approach. Sci. Rep. 2025, 15, 36316. [Google Scholar] [CrossRef]
Talukder, M.A.; Sharmin, S.; Uddin, M.A.; Islam, M.M.; Aryal, S. MLSTL-WSN: Machine learning-based intrusion detection using SMOTETomek in WSNs. Int. J. Inf. Secur. 2024, 23, 2139–2158. [Google Scholar] [CrossRef]
Sayegh, H.R.; Dong, W.; Al-madani, A.M. Enhanced intrusion detection with LSTM-based model, feature selection, and SMOTE for imbalanced data. Appl. Sci. 2024, 14, 479. [Google Scholar] [CrossRef]
Alotaibi, Y.; Ilyas, M. Ensemble-learning framework for intrusion detection to enhance internet of things devices security. Sensors 2023, 23, 5568. [Google Scholar] [CrossRef]
Aldaej, A.; Ullah, I.; Ahanger, T.A.; Atiquzzaman, M. Ensemble technique of intrusion detection for IoT-edge platform. Sci. Rep. 2024, 14, 11703. [Google Scholar] [CrossRef]
Dina, A.S.; Siddique, A.B.; Manivannan, D. A deep learning approach for intrusion detection in Internet of Things using focal loss function. Internet Things 2023, 22, 100699. [Google Scholar] [CrossRef]
Gupta, N.; Jindal, V.; Bedi, P. CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems. Comput. Secur. 2022, 112, 102499. [Google Scholar] [CrossRef]
Khanam, S.; Ahmedy, I.; Idris, M.Y.I.; Jaward, M.H. Towards an effective intrusion detection model using focal loss variational autoencoder for internet of things (IoT). Sensors 2022, 22, 5822. [Google Scholar] [CrossRef]
Hasan, T.; Tasnim, S. Multidimensional feature learning enhancement in iot intrusion detection: An adaptive cost-sensitive autoencoder and weighted ensemble approach. In 2024 IEEE 10th World Forum on Internet of Things (WF-IoT); IEEE: New York, NY, USA, 2024; pp. 536–541. [Google Scholar]
Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning, IEEE DataPort, 2022. Available online: https://ieee-dataport.org/documents/edge-iiotset-new-comprehensive-realistic-cyber-security-dataset-iot-and-iiot-applications (accessed on 27 April 2026).
Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306. [Google Scholar] [CrossRef]
Kumar, D.; Pawar, P.P.; Addula, S.R.; Meesala, M.K.; Oni, O.; Cheema, Q.N.; Haq, A.U.; Sajja, G.S. AI-Powered security for IoT ecosystems: A hybrid deep learning approach to anomaly detection. J. Cybersecur. Priv. 2025, 5, 90. [Google Scholar] [CrossRef]

Figure 1. Cyber threats in the IIoT environment.

Figure 2. System design for enhanced XGBoost.

Figure 3. Stage 1: Binary classification performance. (a) Confusion matrix; (b) ROC curve.

Figure 4. Stage 1 comparison between enhanced XGBoost and baseline ML mechanisms.

Figure 5. Stage 2: Group classification performance. (a) Confusion matrix; (b) ROC curve.

Figure 6. Stage 2 comparison between enhanced XGBoost and baseline ML mechanisms.

Figure 7. Stage 3: Multiclass classification performance. (a) Confusion matrix; (b) ROC curve.

Figure 8. Stage 3 comparison between enhanced XGBoost and baseline ML mechanisms.

Table 1. Distribution of attacks in the Edge-IIoTset dataset.

Class Name	Total	Training (60%)	Validation (20%)	Testing (20%)
Normal	1,615,643	969,385	323,129	323,129
DDoS_UDP	121,568	72,941	24,313	24,314
DDoS_ICMP	116,436	69,862	23,287	23,287
Ransomware	10,925	6555	2185	2185
DDoS_HTTP	49,911	29,947	9982	9982
SQL_injection	51,203	30,722	10,240	10,241
Uploading	37,634	22,580	7527	7527
DDoS_TCP	50,062	30,037	10,013	10,012
Backdoor	24,862	14,917	4973	4972
Vulnerability_scanner	50,110	30,066	10,022	10,022
Port_Scanning	22,564	13,538	4513	4513
XSS	15,915	9549	3183	3183
Password	50,153	30,092	10,030	10,031
MITM	1214	728	243	243
Fingerprinting	1001	601	200	200
Total	2,219,201	1,331,520	443,840	443,841
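The counts in Table 1 indicate that every class was divided approximately 60/20/20, i.e., a per-class (stratified) split, so that rare classes such as MITM and Fingerprinting appear in all three subsets. The source does not give the splitting routine. The following stdlib-only Python sketch shows one way to obtain such a split; the helper name `stratified_split`, the seed, and the rounding rule are assumptions, and the sketch is not expected to reproduce the exact counts in Table 1.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions.

    Hypothetical helper: cut points are rounded per class, so very small
    classes may deviate from the target fractions by one sample.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class before cutting
        n = len(indices)
        cut1 = round(fractions[0] * n)
        cut2 = round((fractions[0] + fractions[1]) * n)
        train.extend(indices[:cut1])
        val.extend(indices[cut1:cut2])
        test.extend(indices[cut2:])
    return train, val, test
```

For example, with 1000 `Normal` and 50 `MITM` labels, the sketch places 600 + 30 samples in training and 200 + 10 in each of validation and testing, so the minority class keeps its share in every subset.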

Table 2. Configuration of enhanced XGBoost threat scoring model.

Parameters	Configuration
Number of Classes	15
Training/Validation/Test Split	60% training, 20% validation, 20% testing
Feature Scaling	StandardScaler applied to numeric features
Class Imbalance Handling	RandomOverSampler (training set only)
Hyperparameter Optimization	Optuna Bayesian optimization (80 trials)
Tuned Hyperparameters	learning_rate = 0.025, max_depth = 9, gamma = 0.52, subsample = 0.89, colsample_bytree = 0.8, reg_alpha = 0.194, reg_lambda = 2.25, n_estimators = 387
Evaluation Metric (Tuning)	Macro-averaged F1 score on validation
Threshold Optimization	Per-class thresholds selected over grid [0.05, 0.95] to maximize macro F1
Probability Calibration	CalibratedClassifierCV with Isotonic and Sigmoid methods
Final Test Evaluation Metrics	Precision, Recall, F1 score, Accuracy
Bootstrap Confidence Intervals	95% CI estimated with 500 resamples
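Table 2 states only that per-class thresholds were selected over the grid [0.05, 0.95] to maximize macro F1 on the validation set; the decision rule and search order are not specified. The sketch below assumes a coordinate-wise search and a hypothetical decision rule: predict the most probable class among those whose probability clears its own threshold, and fall back to the plain argmax when none does. Function names are illustrative, and NumPy is assumed to be available.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Mean one-vs-rest F1 = 2TP / (2TP + FP + FN) over labels seen in y_true or y_pred."""
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


def predict_with_thresholds(proba, thresholds):
    """Assumed rule: argmax over classes whose probability >= its threshold, else plain argmax."""
    passed = proba >= thresholds  # (n_samples, n_classes) compared with (n_classes,)
    pred = np.where(passed, proba, -np.inf).argmax(axis=1)
    none_passed = ~passed.any(axis=1)
    pred[none_passed] = proba[none_passed].argmax(axis=1)
    return pred


def tune_thresholds(proba, y_true, grid=None, n_rounds=2):
    """Coordinate-wise search: change one class threshold at a time, keep it if macro F1 improves."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    n_classes = proba.shape[1]
    # Equal thresholds reproduce the plain argmax, so the search starts from that baseline.
    thresholds = np.full(n_classes, 0.5)
    best = macro_f1(y_true, predict_with_thresholds(proba, thresholds))
    for _ in range(n_rounds):
        for c in range(n_classes):
            for t in grid:
                trial = thresholds.copy()
                trial[c] = t
                score = macro_f1(y_true, predict_with_thresholds(proba, trial))
                if score > best:
                    best, thresholds = score, trial
    return thresholds, best
```

Because only improvements are accepted, the tuned thresholds never score below the argmax baseline on the data they were tuned on; in keeping with the protocol in Table 2, they would be tuned on validation probabilities and then applied unchanged to the test set.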

Table 3. Performance metrics.

Metric	Formula
Accuracy (AC)	(TP + TN)/(TP + TN + FP + FN)
Precision (PR)	TP/(TP + FP)
Recall (RE)	TP/(TP + FN)
F1 Score (F1)	2 × TP/(2 × TP + FP + FN)
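The formulas in Table 3 can be checked with a small worked example. The sketch below computes each metric from one-vs-rest confusion counts, using the standard definition of recall, TP/(TP + FN). The `ConfusionCounts` type and function names are illustrative, and returning 0.0 for an empty denominator is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """One-vs-rest confusion counts for a single class (or the attack class in Stage 1)."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no predicted positives).
    return num / den if den else 0.0


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of actual positives that were detected: TP / (TP + FN).
    return _ratio(c.tp, c.tp + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Algebraically equal to the harmonic mean 2PR / (P + R) whenever P + R > 0.
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
```

With hypothetical counts TP = 80, TN = 890, FP = 20 and FN = 10, accuracy is 970/1000 = 0.97, precision is 80/100 = 0.80, recall is 80/90 ≈ 0.889, and F1 is 160/190 ≈ 0.842, which matches the harmonic mean of precision and recall.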

Table 4. Training, validation and testing results of Stage 1 binary attack detection.

Dataset	Accuracy	Macro F1	Attack Recall	Attack Precision
Train	0.9955	0.9918	0.98	1.00
Validation	0.9948	0.9904	0.98	1.00
Test	0.9950	0.9907	0.98	1.00

Table 5. Training, validation and testing results of Stage 2 group classification.

Dataset	Accuracy	Macro F1	Attack Recall	Attack Precision
Train	0.9866	0.9463	0.93	0.97
Validation	0.9800	0.9150	0.90	0.94
Test	0.9799	0.9148	0.90	0.94

Table 6. Training, validation and testing results of Stage 3 multiclass classification.

Dataset	Accuracy	Macro F1	Attack Recall	Attack Precision
Train	0.9869	0.92	0.90	0.96
Validation	0.9821	0.91	0.89	0.93
Test	0.9815	0.91	0.88	0.93
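Table 2 reports that 95% confidence intervals were estimated with 500 bootstrap resamples, but the resampling scheme is not described. A test-set figure such as the Stage 3 macro F1 of 0.91 in Table 6 could be accompanied by such an interval. The sketch below assumes a percentile bootstrap over test samples; the function names and seed are hypothetical, and NumPy is assumed.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Mean one-vs-rest F1 over labels present in y_true or y_pred (no empty denominators)."""
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


def bootstrap_ci(y_true, y_pred, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for macro F1; y_true and y_pred are NumPy label arrays."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, n)  # draw n test indices with replacement
        stats[b] = macro_f1(y_true[idx], y_pred[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

Resampling whole test rows keeps each prediction paired with its true label. For classes as rare as MITM or Fingerprinting, some resamples may contain very few of them, which widens the interval. That is the intended effect of the bootstrap, not a defect.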

Table 7. Ablation performance evaluation.

Model	Accuracy	Macro F1	Attack Recall	Attack Precision
Baseline XGBoost	0.86	0.67	0.63	0.72
Class Weighted	0.92	0.75	0.74	0.80
Threshold Optimization	0.90	0.76	0.74	0.80
Sigmoid Calibration	0.95	0.88	0.85	0.89
Isotonic Calibration	0.96	0.89	0.86	0.90
Final Enhanced Model	0.9815	0.91	0.88	0.93
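Table 7 lists sigmoid and isotonic calibration as separate ablation rows, with isotonic calibration scoring slightly higher (macro F1 0.89 versus 0.88). Table 2 names scikit-learn's CalibratedClassifierCV as the tool. The from-scratch sketch below only illustrates the isotonic idea under simplifying assumptions. For each class, it fits a non-decreasing step function from the raw score to the observed one-vs-rest frequency using the pool-adjacent-violators (PAV) algorithm. It then renormalizes the per-class outputs so that they sum to one. It maps scores step-wise rather than by interpolation, so it is a stand-in and not the library's implementation. The names are hypothetical, and NumPy is assumed.

```python
import numpy as np


def pav_fit(scores, targets):
    """Fit a non-decreasing step function (isotonic regression) of score to 0/1 targets."""
    order = np.argsort(scores, kind="mergesort")
    x = np.asarray(scores, dtype=float)[order]
    y = np.asarray(targets, dtype=float)[order]
    sums, counts, ends = [], [], []  # one entry per pooled block
    for i, v in enumerate(y):
        sums.append(v)
        counts.append(1.0)
        ends.append(i)
        # Pool adjacent blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c, e = sums.pop(), counts.pop(), ends.pop()
            sums[-1] += s
            counts[-1] += c
            ends[-1] = e
    edges = x[np.array(ends)]  # largest raw score in each block
    values = np.array(sums) / np.array(counts)  # calibrated probability of each block
    return edges, values


def pav_predict(edges, values, scores):
    """Map each score to the value of the first block whose right edge is >= the score."""
    idx = np.searchsorted(edges, scores, side="left")
    return values[np.clip(idx, 0, len(values) - 1)]


def isotonic_calibrate(proba_cal, y_cal, proba_new):
    """One-vs-rest isotonic calibration followed by row renormalization."""
    n_classes = proba_cal.shape[1]
    out = np.empty_like(proba_new, dtype=float)
    for c in range(n_classes):
        edges, values = pav_fit(proba_cal[:, c], y_cal == c)
        out[:, c] = pav_predict(edges, values, proba_new[:, c])
    row_sums = out.sum(axis=1, keepdims=True)
    # Rows where every class was mapped to 0 fall back to a uniform distribution.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, out / safe, 1.0 / n_classes)
```

Sigmoid (Platt) calibration would instead fit a two-parameter logistic curve per class. That curve is smoother, but it cannot bend to follow a non-sigmoidal score-to-frequency relationship, which is one plausible reason for the isotonic variant's small edge on a dataset as large as Edge-IIoTset.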

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ahmed, A.A.; Abdullah, T.A.A. An Enhanced XGBoost-Based Framework for Efficient Multi-Class Cyber Threat Detection in Industrial IoT Networks. Technologies 2026, 14, 274. https://doi.org/10.3390/technologies14050274

AMA Style

Ahmed AA, Abdullah TAA. An Enhanced XGBoost-Based Framework for Efficient Multi-Class Cyber Threat Detection in Industrial IoT Networks. Technologies. 2026; 14(5):274. https://doi.org/10.3390/technologies14050274

Chicago/Turabian Style

Ahmed, Adel A., and Talal A. A. Abdullah. 2026. "An Enhanced XGBoost-Based Framework for Efficient Multi-Class Cyber Threat Detection in Industrial IoT Networks" Technologies 14, no. 5: 274. https://doi.org/10.3390/technologies14050274

APA Style

Ahmed, A. A., & Abdullah, T. A. A. (2026). An Enhanced XGBoost-Based Framework for Efficient Multi-Class Cyber Threat Detection in Industrial IoT Networks. Technologies, 14(5), 274. https://doi.org/10.3390/technologies14050274

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
