ROSE-BOX: A Lightweight and Efficient Intrusion Detection Framework for Resource-Constrained IIoT Environments †
Abstract
1. Introduction
- We propose a hybrid feature selection framework that synergistically combines Random Forest and XGBoost algorithms to identify and rank the most salient features from high-dimensional IIoT datasets. By eliminating redundant and non-informative features, this approach significantly reduces computational overhead and enhances detection accuracy, making it highly suitable for deployment in resource-constrained IIoT environments.
- We incorporate the Synthetic Minority Oversampling Technique (SMOTE) [13] to address the class imbalance inherent in intrusion detection datasets and Bayesian optimization (BO) [9] to fine-tune the hyperparameters of XGBoost, further optimizing detection performance while minimizing computational demands.
- The experimental results indicate that the proposed ROSE-BOX framework [14] consistently achieves detection accuracies exceeding 99.85%, markedly surpassing traditional intrusion detection systems. Moreover, ROSE-BOX substantially reduces latency and computational overhead, rendering it highly suitable for real-time deployment in resource-constrained Industrial Internet of Things (IIoT) environments.
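The hybrid feature ranking described in the first contribution can be illustrated with a minimal sketch. It assumes that each model has already produced one importance score per feature and that the two score vectors are combined by an equal-weight average after normalization; this section does not state the actual combination rule, so that choice, the function names, and the feature names are all hypothetical.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale importance scores so they sum to 1 (all-zero input stays zero)."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: value / total for name, value in scores.items()}


def hybrid_rank(rf_scores: Dict[str, float],
                xgb_scores: Dict[str, float],
                top_k: int) -> List[str]:
    """Average two normalized importance vectors and keep the top_k features.

    Assumption: equal weighting of the two models (not taken from the paper).
    """
    rf_norm = normalize(rf_scores)
    xgb_norm = normalize(xgb_scores)
    features = set(rf_norm) | set(xgb_norm)
    combined = {
        name: 0.5 * (rf_norm.get(name, 0.0) + xgb_norm.get(name, 0.0))
        for name in features
    }
    # Sort by descending score, then by name so that ties are deterministic.
    ranked = sorted(combined, key=lambda name: (-combined[name], name))
    return ranked[:top_k]


# Hypothetical importance scores for four flow features.
rf_importance = {"flow_duration": 0.40, "pkt_len_mean": 0.30,
                 "fwd_iat_std": 0.20, "urg_flag_cnt": 0.10}
xgb_importance = {"flow_duration": 0.10, "pkt_len_mean": 0.50,
                  "fwd_iat_std": 0.35, "urg_flag_cnt": 0.05}

selected = hybrid_rank(rf_importance, xgb_importance, top_k=2)
print(selected)
```

Dropping the low-ranked features before training is what reduces the dimensionality, and hence the per-sample cost, of the downstream classifier.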
2. Related Work
- Addressing resource-constrained scenarios: Given the stringent resource limitations of IIoT devices, we propose a framework that minimizes computational complexity in IIoT systems. By focusing on efficient feature selection and model pruning techniques, we drastically reduce the processing overhead.
- Ultra-low latency processing: In an IIoT environment, where communications between devices and the cloud are measured in milliseconds, the ability to perform instant threat detection is vital. Without real-time detection, a delayed response could allow malicious activities to spread unchecked, potentially leading to critical system failures. ROSE-BOX processes each data sample within microseconds, which is crucial for maintaining real-time responsiveness.
- Scalable adaptability and lightweight deployment: ROSE-BOX uses XGBoost-assisted Random Forest for feature selection and Bayesian optimization for hyperparameter tuning. Its low computational demand makes it well suited for deployment across a wide array of devices, from industrial controllers to edge computing nodes and embedded systems. This scalability, coupled with a reduction in required computing resources, allows for broad deployment without overburdening the devices or incurring significant operational costs.
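The per-sample latency claim above can be checked with a simple timing harness. The sketch below is an assumption about how such a measurement might be set up, not the authors' benchmark: `threshold_detector` is a hypothetical stand-in for a trained model, and the synthetic flows carry no real traffic.

```python
import time
from typing import Callable, Sequence


def mean_latency_us(predict: Callable[[Sequence[float]], int],
                    samples: Sequence[Sequence[float]],
                    repeats: int = 5) -> float:
    """Return the mean wall-clock time per prediction in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict(sample)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples)) * 1e6


def threshold_detector(sample: Sequence[float]) -> int:
    """Hypothetical stand-in model: flag a flow if its first feature exceeds 0.5."""
    return 1 if sample[0] > 0.5 else 0


# Synthetic two-feature flows, used only to exercise the harness.
flows = [[i / 1000.0, 1.0 - i / 1000.0] for i in range(1000)]
latency = mean_latency_us(threshold_detector, flows)
print(f"mean latency: {latency:.3f} microseconds per sample")
```

Averaging over several passes smooths out timer jitter; on a real device the same harness would wrap the deployed model's prediction call.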
3. Methodology
3.1. Overall Framework
3.2. The XGBoost Model
3.3. The ROSE-BOX Algorithm
Algorithm 1: The detection algorithm ROSE-BOX
4. Experimental Results
4.1. Datasets
- CIC-IDS2017: The CIC-IDS2017 dataset simulates benign and attack traffic during normal working hours, capturing common threats such as DoS, PortScan, and Brute Force attacks. The total number of flow records after preprocessing is 692,703.
- CSE-CIC-IDS2018: The CSE-CIC-IDS2018 dataset contains traffic from various realistic attack scenarios, including infiltration, botnets, and web attacks. It comprises 1,048,574 flow records and was processed using CICFlowMeter to extract standardized network features.
- CIC-DDoS2019: The CIC-DDoS2019 dataset captures DDoS attack patterns targeting multiple protocols (e.g., HTTP, TCP, UDP) and reflects high-volume traffic characteristics. After preprocessing, it includes 431,371 records.
4.2. Evaluation Metrics
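The result tables report accuracy, precision, recall, and F1 score. As a minimal sketch of how these can be computed for a multi-class detector, the code below assumes macro averaging (each class contributes equally, regardless of its size); the averaging scheme is an assumption, and the labels are hypothetical.

```python
from typing import Dict, List, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    labels: List[str] = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precision_sum = recall_sum = f1_sum = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precision_sum += precision
        recall_sum += recall
        f1_sum += f1
    n = len(labels)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision_sum / n,
            "recall": recall_sum / n, "f1": f1_sum / n}


# Hypothetical imbalanced example: 8 benign flows, 2 attack flows, one attack missed.
truth = ["benign"] * 8 + ["attack"] * 2
predicted = ["benign"] * 8 + ["benign", "attack"]
metrics = macro_metrics(truth, predicted)
print(metrics)
```

In this example, accuracy is 0.9 while macro recall is only 0.75, which shows how a detector can score high accuracy on imbalanced traffic and still miss a large share of a rare class.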
4.3. The Results of Feature Selection and Bayesian Optimization
4.4. Baseline Methods for Intrusion Detection
- KNN [24]: K-Nearest Neighbors (KNN) classifies new traffic instances by measuring similarities (e.g., Euclidean distance) to labeled examples, making it straightforward to detect anomalies in IIoT flows.
- RF [25]: Random Forest (RF) leverages ensemble learning with multiple decision trees through majority voting, mitigates overfitting via feature randomness and bootstrap aggregation (bagging), and provides robust anomaly detection by evaluating feature importance.
- AdaBoost [10]: Adaptive Boosting (AdaBoost) builds a strong classifier by sequentially weighting misclassified samples and combining weak learners (typically decision stumps), which improves the detection of rare attack types in imbalanced IIoT datasets.
- DNN [26]: Deep neural networks (DNNs) stack multiple fully connected layers of neurons to learn hierarchical representations of network features, enabling the detection of complex attack signatures.
- LSTM [11]: Long short-term memory (LSTM) is a recurrent architecture designed to capture long-range temporal dependencies in network traffic.
- Transformer [34]: Transformer embeds each categorical flow attribute into dense vectors and processes them through Transformer encoder layers to produce contextualized feature representations, which are then concatenated with numeric features and passed to an MLP that discriminates among multiple attack and normal classes.
- CNN-LSTM [7]: CNN-LSTM combines convolutional neural networks (CNNs) for spatial feature extraction (e.g., packet-level patterns) with LSTM layers for temporal analysis, yielding enhanced detection accuracy in IIoT attack classification.
- XGBoost [12]: eXtreme Gradient Boosting (XGBoost) is an optimized gradient-boosted tree ensemble that incorporates regularization, sparsity awareness, and parallel tree construction, consistently achieving state-of-the-art accuracy in multi-class IIoT intrusion detection benchmarks.
- XGB-RF [35]: XGBoost with RF (XGB-RF) is the proposed ROSE-BOX model without SMOTE and Bayesian optimization, that is, the XGBoost-assisted Random Forest mentioned above. RF ranks the features of the IIoT datasets, and XGBoost then applies regularized gradient-boosted trees to the selected subset for high-precision anomaly detection.
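As an illustration of the simplest baseline in the list above, the following sketch implements KNN classification with Euclidean distance and majority voting. The two-feature training flows and the choice of k = 3 are hypothetical, and the code is not the implementation evaluated in the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: Sequence[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized two-feature flows.
training_flows: List[Tuple[List[float], str]] = [
    ([0.10, 0.20], "benign"), ([0.20, 0.10], "benign"), ([0.15, 0.15], "benign"),
    ([0.90, 0.80], "attack"), ([0.80, 0.90], "attack"), ([0.85, 0.95], "attack"),
]

print(knn_predict(training_flows, [0.82, 0.88]))
print(knn_predict(training_flows, [0.12, 0.18]))
```

Every prediction scans the whole training set, so the per-sample cost grows with the number of stored flows; tree ensembles avoid this because their prediction cost depends on tree depth rather than on training-set size.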
4.5. Analysis of Detection Results
4.6. Complexity Analysis
5. Conclusions and Future Direction
- Feature selection optimization: Leveraging XGBoost-assisted Random Forest to select the most relevant features, reducing data dimensionality while maintaining high detection performance;
- Class imbalance handling: Applying SMOTE to balance intrusion datasets, improving model robustness against rare but critical cyberattacks;
- Computational efficiency: Utilizing Bayesian optimization to fine-tune XGBoost, ensuring superior anomaly detection with minimal resource consumption;
- Scalability for IIoT applications: Demonstrating feasibility for deployment in industrial control systems, edge computing platforms, and embedded security solutions.
- Feature Selection:
  - Limitation: The Random Forest and XGBoost-based feature selection might not generalize well to datasets with significantly different feature distributions or when new types of attacks are introduced.
  - Future work: Further research is needed to develop more adaptive feature selection mechanisms that can dynamically adjust to evolving attack patterns and data distributions.
- Class Imbalance Handling:
  - Limitation: SMOTE may introduce synthetic samples that do not fully capture the complexity of real-world attack instances. This could potentially lead to overfitting on synthetic data and reduced generalization to unseen data.
  - Future work: Exploring advanced resampling techniques or hybrid methods that combine oversampling with other strategies (e.g., undersampling majority classes) could improve model robustness and generalization.
- Computational Efficiency and Real-Time Deployment:
  - Limitation: The computational overhead of Bayesian optimization and XGBoost parameter tuning might still be prohibitive for extremely resource-constrained devices, such as low-power IoT sensors.
  - Future work: Investigating more lightweight optimization algorithms and model architectures that can further reduce computational requirements without compromising detection accuracy is essential.
- Scalability and Deployment:
  - Limitation: The framework’s scalability in large-scale IIoT environments with thousands of devices and high-frequency data streams has not been extensively tested. Certain types of attacks, such as zero-day, stealthy, multi-stage, and false data injection attacks, remain challenging due to their sophisticated nature and ability to evade detection.
  - Future work: Developing distributed implementations of ROSE-BOX; exploring federated learning integration, deployment on resource-constrained devices, and online learning mechanisms for evolving IIoT threats in industrial control systems; and enhancing ROSE-BOX with advanced detection techniques, continuous learning mechanisms, and context-aware feature selection processes to improve its robustness and adaptability to evolving threats.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mohammadian, H.; Ghorbani, A.A.; Lashkari, A.H. A gradient-based approach for adversarial attack on deep learning-based network intrusion detection systems. Appl. Soft Comput. 2023, 137, 110173.
- Ahmim, A.; Maazouzi, F.; Ahmim, M.; Namane, S.; Dhaou, I.B. Distributed denial of service attack detection for the internet of things using hybrid deep learning model. IEEE Access 2023, 11, 119862–119875.
- Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419.
- Alani, M.M.; Awad, A.I. An intelligent two-layer intrusion detection system for the internet of things. IEEE Trans. Ind. Inform. 2023, 19, 683–692.
- Hindy, H.; Atkinson, R.; Tachtatzis, C.; Colin, J.N.; Bayne, E.; Bellekens, X. Utilising deep learning techniques for effective zero-day attack detection. Electronics 2020, 9, 1684.
- Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167.
- Halbouni, A.; Gunawan, T.S.; Habaebi, M.H.; Halbouni, M.; Kartiwi, M.; Ahmad, R. CNN-LSTM: Hybrid deep neural network for network intrusion detection system. IEEE Access 2022, 10, 99837–99849.
- Dai, W.; Li, X.; Ji, W.; He, S. Network intrusion detection method based on CNN, BiLSTM, and attention mechanism. IEEE Access 2024, 12, 53099–53111.
- Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2015, 104, 148–175.
- Yulianto, A.; Sukarno, P.; Suwastika, N.A. Improving AdaBoost-based intrusion detection system (IDS) performance on CICIDS 2017 dataset. J. Phys. Conf. Ser. 2019, 1192, 012018.
- Jeong, H.W.; Kim, H.G.; Choi, Y.H. LSTM-based network intrusion detection system and solving data imbalance problem through GAN. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 18–21 February 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1156–1159.
- Kang, Y.; Tan, M.; Lin, D.; Zhao, Z. Intrusion detection model based on autoencoder and XGBoost. J. Phys. Conf. Ser. 2022, 2171, 012053.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Peng, S.; Han, Y.; Liang, X.; Yang, C.; Gui, W.; Zhou, N. ROSE-BOX: An approach for intrusion detection in industrial internet of things. In Proceedings of the 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Kaifeng, China, 30 October–2 November 2024; pp. 2276–2277.
- Li, R.; Qin, Y.; Wang, C.; Li, M.; Chu, X. A Blockchain-Enabled Framework for Enhancing Scalability and Security in IIoT. IEEE Trans. Ind. Inform. 2023, 19, 7389–7400.
- Li, R.; Qin, Y.; Liu, J.; Chu, X.; Li, J. Multipath Based Congestion Propagation via Information Network Interaction in IIoT. IEEE Trans. Ind. Inform. 2024, 20, 8512–8523.
- Li, J.; Li, R.; Xu, L. Multi-stage deep residual collaboration learning framework for complex spatial–temporal traffic data imputation. Appl. Soft Comput. 2023, 147, 110814.
- Li, J.; Xu, L.; Li, R.; Wu, P.; Huang, Z. Deep Spatial-temporal Bi-directional Residual Optimisation based on Tensor Decomposition for Traffic data Imputation on Urban Road Network. Appl. Intell. 2022, 52, 11363–11381.
- Depren, O.; Topallar, M.; Anarim, E.; Ciliz, M.K. An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks. Expert Syst. Appl. 2005, 29, 713–722.
- Perdisci, G.; Roberto, G.G.; Lee, W. Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems. In Proceedings of the 6th International Conference on Data Mining (ICDM’06), New York, NY, USA, 18–22 December 2006; pp. 488–498.
- Amin, S.O.; Siddiqui, M.S.; Hong, C.S.; Choe, J. A novel coding scheme to implement signature based IDS in IP based sensor networks. In Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management–Workshops, New York, NY, USA, 1–5 June 2009; pp. 269–274.
- Li, Z.; Li, X. Intrusion detection method based on genetic algorithm of optimizing LightGBM. In Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering, Kunming, China, 29–31 October 2021; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1366–1371.
- Jemili, F.; Meddeb, R.; Korbaa, O. Intrusion detection based on ensemble learning for big data classification. Clust. Comput. 2024, 27, 3771–3798.
- Siddartha, V.S.; Nagalakshmi, T. Performance analysis of an intrusion detection system for wireless adhoc network in the detection of DoS attack using k-means cluster and k-NN algorithm. In AIP Conference Proceedings, Kanyakumari, India, 9–10 December 2021; AIP Publishing: Melville, NY, USA, 21 November 2023.
- Chua, T.H.; Salam, I. Evaluation of machine learning algorithms in network-based intrusion detection using progressive dataset. Symmetry 2023, 15, 1251.
- Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550.
- Farhan, R.I.; Maolood, A.T.; Hassan, N. Performance analysis of flow-based attacks detection on CSE-CIC-IDS2018 dataset using deep learning. Indones. J. Electr. Eng. Comput. Sci. 2020, 20, 1413–1418.
- Kanimozhi, V.; Jacob, T.P. Artificial intelligence based network intrusion detection with hyper-parameter optimization tuning on the realistic cyber dataset CSE-CIC-IDS2018 using cloud computing. In Proceedings of the 2019 International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 4–6 April 2019; pp. 0033–0036.
- Shieh, C.S.; Lin, W.W.; Nguyen, T.T.; Chen, C.H.; Horng, M.F.; Miu, D. Detection of unknown DDoS attacks with deep learning and Gaussian mixture model. Appl. Sci. 2021, 11, 5213.
- Chartuni, A.; Márquez, J. Multi-classifier of DDoS attacks in computer networks built on neural networks. Appl. Sci. 2021, 11, 10609.
- Zolanvari, M.; Teixeira, M.A.; Jain, R. Effect of Imbalanced Datasets on Security of Industrial IoT Using Machine Learning. In Proceedings of the 2018 IEEE International Conference on Intelligence and Security Informatics (ISI), Miami, FL, USA, 8–10 November 2018; pp. 112–117.
- Imani, M.; Beikmohammadi, A.; Arabnia, H.R. Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels. Technologies 2025, 13, 88.
- Brochu, E.; Cora, V.; Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv 2010, arXiv:1012.2599.
- Ke, D. Network intrusion detection based on feature selection and transformer. In Proceedings of the 2023 International Conference on Intelligent Communication and Computer Engineering (ICICCE), Changsha, China, 24–26 November 2023; pp. 23–28.
- Wang, Z.; Yuan, F.; Li, R.; Zhang, M.; Luo, X. Hidden AS link prediction based on random forest feature selection and GWO-XGBoost model. Comput. Netw. 2025, 262, 111164.
Method Category | Core Techniques | Latency | Cost | Scalability |
---|---|---|---|---|
Traditional methods | Signature-based IDS [21]; Threshold-based IDS [19,20] | Medium–High | Low | Poor |
ML-based methods | PCA/EFS + SMOTE + AdaBoost [10]; RFE + GAO-LightGBM [22]; KNN [24]; RF [25]; AE + XGBoost [12]; Ensembles [23] | Medium | Medium | Moderate |
DL-based methods | Autoencoders [5]; CNN [7,30]; Hybrid DNNs [26,27,28]; BiLSTM-GMM [29]; CNN-LSTM-Attention [8] | Medium–Low | High | Strong |
Ours | ROSE-BOX | Low | Medium–Low | Strong |
Dataset | Label | Size |
---|---|---|
CIC-IDS2017 | BENIGN | 440,031 |
CIC-IDS2017 | DoS Hulk | 231,073 |
CIC-IDS2017 | DoS GoldenEye | 10,293 |
CIC-IDS2017 | DoS slowloris | 5796 |
CIC-IDS2017 | DoS Slowhttptest | 5499 |
CIC-IDS2017 | Heartbleed | 11 |
CSE-CIC-IDS2018 | DoS attacks-Hulk | 461,912 |
CSE-CIC-IDS2018 | BENIGN | 446,772 |
CSE-CIC-IDS2018 | DoS attacks-SlowHTTPTest | 139,890 |
CIC-DDoS2019 | DrDoS_NTP | 121,368 |
CIC-DDoS2019 | TFTP | 98,917 |
CIC-DDoS2019 | Benign | 97,831 |
CIC-DDoS2019 | Syn | 49,373 |
CIC-DDoS2019 | UDP | 18,090 |
CIC-DDoS2019 | DrDoS_UDP | 10,420 |
CIC-DDoS2019 | UDP-lag | 8872 |
CIC-DDoS2019 | MSSQL | 8523 |
CIC-DDoS2019 | DrDoS_MSSQL | 6212 |
CIC-DDoS2019 | DrDoS_DNS | 3669 |
CIC-DDoS2019 | DrDoS_SNMP | 2717 |
CIC-DDoS2019 | LDAP | 1906 |
CIC-DDoS2019 | DrDoS_LDAP | 1440 |
CIC-DDoS2019 | Portmap | 685 |
CIC-DDoS2019 | NetBIOS | 644 |
CIC-DDoS2019 | DrDoS_NetBIOS | 598 |
CIC-DDoS2019 | UDPLag | 55 |
CIC-DDoS2019 | WebDDoS | 51 |
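The class counts above are heavily skewed; for example, CIC-IDS2017 contains 11 Heartbleed records against 440,031 BENIGN records. The following sketch shows the interpolation idea behind SMOTE [13] in a simplified form: each synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbors. It is a simplified variant written for illustration (the base sample is drawn at random rather than iterated), the data points are hypothetical, and it is not the implementation used in the paper.

```python
import math
import random
from typing import List, Sequence


def smote_like(minority: Sequence[Sequence[float]], n_new: int,
               k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by neighbor interpolation.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of base, excluding base itself.
        others = [point for point in minority if point is not base]
        neighbors = sorted(others, key=lambda point: math.dist(point, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class samples with two features each.
rare_attack = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
new_samples = smote_like(rare_attack, n_new=6, k=2)
print(new_samples)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by that class, which is also why they may fail to capture attack behavior that the original records do not contain.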
Parameter | Range | CIC-IDS2017 | CIC-DDoS2019 |
---|---|---|---|
learning_rate | (0.01, 1.0) | 0.46206 | 0.2854 |
min_child_weight | (0, 10) | 10 | 3 |
max_depth | (3, 10) | 10 | 10 |
subsample | (0.5, 1.0) | 0.93196 | 0.86975 |
colsample_bytree | (0.5, 1.0) | 0.72514 | 0.70505 |
n_estimators | (50, 300) | 300 | 218 |
reg_alpha | (1 × 10⁻⁹, 1.0) | 4.23191 | 5.33418 |
reg_lambda | (1 × 10⁻⁹, 100) | 0.00068 | 7.24844 |
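The search space in the Range column of the table above can be written down as a small configuration, together with the evaluate-and-keep-best loop that any tuner runs over it. Note that the sketch below uses plain random search, not Bayesian optimization: a Bayesian optimizer would replace the random proposal step with a surrogate model and an acquisition function. The objective is a hypothetical stand-in for a cross-validated detection score, and treating `min_child_weight`, `max_depth`, and `n_estimators` as integers is an assumption.

```python
import random
from typing import Callable, Dict, Tuple, Union

Number = Union[int, float]

# Bounds copied from the Range column of the table above.
SEARCH_SPACE: Dict[str, Tuple[Number, Number]] = {
    "learning_rate": (0.01, 1.0),
    "min_child_weight": (0, 10),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "n_estimators": (50, 300),
    "reg_alpha": (1e-9, 1.0),
    "reg_lambda": (1e-9, 100),
}
# Assumption: these three parameters are sampled as integers.
INTEGER_PARAMS = {"min_child_weight", "max_depth", "n_estimators"}


def sample_candidate(rng: random.Random) -> Dict[str, Number]:
    """Draw one hyperparameter setting uniformly from the search space."""
    candidate: Dict[str, Number] = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            candidate[name] = rng.randint(int(low), int(high))
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


def random_search(objective: Callable[[Dict[str, Number]], float],
                  n_trials: int, seed: int = 0):
    """Evaluate n_trials random candidates and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_candidate(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, Number]) -> float:
    """Hypothetical stand-in for a cross-validated detection score."""
    return -abs(params["learning_rate"] - 0.3) - abs(params["max_depth"] - 8) / 10


best_params, best_score = random_search(toy_objective, n_trials=50)
print(best_params, best_score)
```

In a real tuning run, the objective would train XGBoost with the candidate parameters and return a validation metric, which is the expensive step that a surrogate-guided search tries to call as few times as possible.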
Algorithm | Accuracy | Precision | Recall | F1 Score | AUROC |
---|---|---|---|---|---|
KNN [24] | 99.39% | 98.733% | 99.003% | 98.867% | 0.99409 |
RF [25] | 99.915% | 99.643% | 99.787% | 99.715% | 0.9988 |
AdaBoost [10] | 77.72% | 30.355% | 38.4% | 29.796% | 0.67736 |
DNN [26] | 98.958% | 93.342% | 99.243% | 95.732% | 0.99953 |
LSTM [11] | 83.126% | 34.383% | 30.966% | 31.859% | 0.73295 |
Transformer [34] | 98.767% | 98.582% | 98.739% | 98.651% | 0.99947 |
CNN-LSTM [7] | 98.909% | 96.51% | 99.223% | 97.876% | 0.99895 |
XGBoost [12] | 99.963% | 99.733% | 99.784% | 99.759% | 0.99787 |
XGB-RF [35] | 99.982% | 99.891% | 99.804% | 99.772% | 0.99989 |
ROSE-BOX (Ours) | 99.991% | 99.993% | 99.987% | 99.989% | 1.0 |
Algorithm | Accuracy | Precision | Recall | F1 Score | AUROC |
---|---|---|---|---|---|
KNN [24] | 99.785% | 99.837% | 99.833% | 99.835% | 0.99999 |
RF [25] | 99.999% | 99.999% | 99.999% | 99.999% | 0.99999 |
AdaBoost [10] | 99.823% | 99.866% | 99.862% | 99.864% | 0.99999 |
DNN [26] | 99.837% | 99.877% | 99.872% | 99.875% | 0.99918 |
LSTM [11] | 13.602% | 40.89% | 33.534% | 12.853% | 0.74749 |
Transformer [34] | 99.84% | 99.88% | 99.875% | 99.877% | 0.99967 |
CNN-LSTM [7] | 99.102% | 98.353% | 99.624% | 98.91% | 0.99979 |
XGBoost [12] | 99.999% | 99.999% | 99.999% | 99.999% | 0.99999 |
ROSE-BOX (Ours) | 100% | 100% | 100% | 100% | 1.0 |
Algorithm | Accuracy | Precision | Recall | F1 Score | AUROC |
---|---|---|---|---|---|
KNN [24] | 90.82% | 47.829% | 45.243% | 46.095% | 0.73662 |
RF [25] | 92.636% | 48.948% | 48.549% | 48.682% | 0.74064 |
AdaBoost [10] | 65.706% | 29.818% | 29.429% | 25.381% | 0.78124 |
DNN [26] | 93.192% | 50.796% | 50.335% | 47.265% | 0.99223 |
LSTM [11] | 59.624% | 15.618% | 15.87% | 14.194% | 0.77945 |
Transformer [34] | 93.422% | 53.859% | 50.672% | 46.547% | 0.95346 |
CNN-LSTM [7] | 99.814% | 58.8% | 57.566% | 57.278% | 0.99916 |
XGBoost [12] | 94.166% | 53.989% | 54.988% | 56.987% | 0.78616 |
XGB-RF [35] | 95.042% | 61.145% | 60.212% | 62.019% | 0.81572 |
ROSE-BOX (Ours) | 99.851% | 63.563% | 65.687% | 64.989% | 0.99395 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Peng, S.; Han, Y.; Li, R.; Liu, L.; Liu, J.; Gu, Z. ROSE-BOX: A Lightweight and Efficient Intrusion Detection Framework for Resource-Constrained IIoT Environments. Appl. Sci. 2025, 15, 6448. https://doi.org/10.3390/app15126448