Diverse Machine Learning-Based Malicious Detection for Industrial Control System
Abstract
1. Introduction
2. Preliminary
2.1. Autoencoder
2.2. Random Forest
2.3. K-Nearest Neighbor (KNN)
3. Proposed Scheme
3.1. Screening Analysis Module (SAM)
3.2. Classification Analysis Module (CAM)
- Classification process using random forest.
- Step 1. Upon receiving a recognized (recg) packet, each trained decision tree in the random forest ensemble processes the packet independently and generates its own classification output.
- Step 2. Once all tree predictions are collected, the algorithm applies Equation (4) to count how often each class is predicted; the most frequently predicted class becomes the final random forest classification result.
- Classification process using KNN.
- Step 1. The behavioral features xFE of the recg packet are input into the KNN algorithm, which compares them against the trained data distribution map to identify the k nearest neighboring packets.
- Step 2. After identifying the k nearest packets, the algorithm tallies their classes as formulated in Equation (5), which implements majority voting: the most frequent class among the k neighbors is the final classification prediction.
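The two voting procedures above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`majority_vote`, `random_forest_predict`, `knn_predict`), the use of Euclidean distance, the tie-breaking rule, and the toy trees and data are all hypothetical, since Equations (4) and (5) are not reproduced here.

```python
from collections import Counter
import math


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def random_forest_predict(trees, x):
    """Each tree votes independently, then the most frequent class wins."""
    votes = [tree(x) for tree in trees]  # Step 1: one prediction per tree
    return majority_vote(votes)          # Step 2: frequency count and argmax


def knn_predict(train_X, train_y, x, k=3):
    """Find the k nearest training packets, then vote on their classes."""
    # Step 1: rank training points by Euclidean distance (an assumption).
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    # Step 2: majority vote over the classes of the k nearest neighbors.
    return majority_vote([train_y[i] for i in ranked[:k]])


# Toy stand-ins for trained decision trees (hypothetical threshold rules).
toy_trees = [
    lambda x: "DDoS" if x[0] > 0.5 else "Benign",
    lambda x: "DDoS" if x[1] > 0.5 else "Benign",
    lambda x: "Benign",
]
toy_X = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 0.9), (0.8, 1.0)]
toy_y = ["Benign", "Benign", "DDoS", "DDoS", "DDoS"]

rf_label = random_forest_predict(toy_trees, (0.9, 0.8))
knn_label = knn_predict(toy_X, toy_y, (0.85, 0.9), k=3)
```

Both classifiers reduce to the same voting helper; they differ only in where the votes come from (trees versus neighbors).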
3.3. Agile Training Module (ATM)
- Step 1. The ATM is activated when the number of unidentified (unid) packets in the buffer reaches the threshold th. The buffered packets are filtered to keep only those that were not included in the initial training of the SAM and CAM, and th packets are randomly sampled from this filtered collection to form the agile training dataset.
- Step 2. The Autoencoder within the SAM undergoes self-evolution learning on these th genuine unidentified packets. Once this learning phase completes, the enhanced SAM model is deployed into the running system.
- Step 3. To enhance the CAM, both the random forest and KNN models are retrained on an augmented dataset that combines the original training data with the th newly selected unidentified packets. After retraining, the updated random forest and KNN models are integrated into the system.
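The three ATM steps can be outlined as follows. This is a rough sketch under several assumptions: packets are represented as dictionaries with an `id` key, the function names and the `fine_tune_sam` / `retrain_cam` callbacks are hypothetical placeholders for the actual model updates, and the behavior when fewer than th novel packets remain after filtering (here: wait) is a guess, since the steps do not specify it. How the sampled packets obtain class labels for CAM retraining is also left open.

```python
import random


def select_agile_training_set(buffer, initial_training_ids, th, rng=None):
    """Step 1: return th randomly sampled novel packets, or None if ATM stays idle."""
    rng = rng or random.Random()
    if len(buffer) < th:
        return None  # threshold not reached yet
    # Keep only packets that were not part of the initial SAM/CAM training.
    novel = [p for p in buffer if p["id"] not in initial_training_ids]
    if len(novel) < th:
        return None  # assumption: wait until enough novel packets accumulate
    return rng.sample(novel, th)


def agile_update(sample, original_training_data, fine_tune_sam, retrain_cam):
    """Steps 2 and 3: refresh SAM on the sample, retrain CAM on original + sample."""
    fine_tune_sam(sample)                                     # Step 2
    retrain_cam(list(original_training_data) + list(sample))  # Step 3


# Toy usage with six buffered packets, two of which were seen in initial training.
toy_buffer = [{"id": i, "features": [i / 10]} for i in range(6)]
toy_sample = select_agile_training_set(toy_buffer, {0, 1}, th=3, rng=random.Random(0))
```

Passing the model updates in as callbacks keeps the sketch independent of any particular autoencoder or classifier library.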
4. Experimental Results
4.1. Environment Information
4.1.1. Dataset Description and Preprocessing
4.1.2. Model Architectures
4.1.3. Objective Evaluation Metrics
4.2. Performance of DICS Framework Without ATM
4.2.1. Evaluation of SAM Without ATM
4.2.2. Evaluation of CAM Without ATM
4.3. Performance of DICS Framework After Implementing ATM
4.3.1. Evaluation of SAM Under ATM Scenario
4.3.2. Evaluation of CAM Under ATM Scenario
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Traffic Category | Subcategory | Total Samples | Training Set (80%) | Testing Set (20%) |
---|---|---|---|---|
MITM | ARP spoofing, DNS spoofing | 358 | 286 | 72 |
Injection | XSS, SQL injection, Uploading attack | 102,230 | 81,784 | 20,446 |
Malware | Backdoor, Ransomware, Password cracking | 83,637 | 66,909 | 16,728 |
DDoS | TCP/SYN, HTTP, UDP, ICMP | 288,111 | 230,488 | 57,623 |
Information gathering | Port scanning, OS fingerprint, Vulnerability scanning | 70,853 | 56,682 | 14,171 |
Benign | Normal | 1,363,998 | 1,091,198 | 272,800 |
Experiment Group | Category | Samples | Share of Group |
---|---|---|---|
EG-A | MITM | 286 | 0.02% |
EG-A | Injection | 81,784 | 5.56% |
EG-A | Malware | 66,909 | 4.55% |
EG-A | DDoS | 230,488 | 15.67% |
EG-A | Benign | 1,091,198 | 74.20% |
EG-A | Total | 1,470,665 | |
EG-B | MITM | 286 | 0.02% |
EG-B | Injection | 81,784 | 6.31% |
EG-B | Malware | 66,909 | 5.16% |
EG-B | Information gathering | 56,682 | 4.37% |
EG-B | Benign | 1,091,198 | 84.14% |
EG-B | Total | 1,296,859 | |
EG-C | MITM | 286 | 0.02% |
EG-C | Injection | 81,784 | 5.60% |
EG-C | DDoS | 230,488 | 15.78% |
EG-C | Information gathering | 56,682 | 3.88% |
EG-C | Benign | 1,091,198 | 74.72% |
EG-C | Total | 1,460,438 | |
EG-D | MITM | 286 | 0.02% |
EG-D | Malware | 66,909 | 4.63% |
EG-D | DDoS | 230,488 | 15.94% |
EG-D | Information gathering | 56,682 | 3.92% |
EG-D | Benign | 1,091,198 | 75.49% |
EG-D | Total | 1,445,563 | |
TG | MITM | 72 | 0.03% |
TG | Injection | 20,446 | 5.35% |
TG | Malware | 16,728 | 4.38% |
TG | DDoS | 57,623 | 15.09% |
TG | Information gathering | 14,171 | 3.71% |
TG | Benign | 272,800 | 71.44% |
TG | Total | 381,840 | |
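Reading the table above, each experiment group appears to be the 80% training split with one attack category left out (Information gathering for EG-A, DDoS for EG-B, Malware for EG-C, Injection for EG-D). The following sketch rebuilds the group totals and shares from the training counts under that reading; the names `TRAIN_COUNTS`, `HELD_OUT`, and `build_group` are illustrative only.

```python
# Training-set counts per category (80% column of the dataset table).
TRAIN_COUNTS = {
    "MITM": 286,
    "Injection": 81_784,
    "Malware": 66_909,
    "DDoS": 230_488,
    "Information gathering": 56_682,
    "Benign": 1_091_198,
}

# Category absent from each experiment group, as read off the table.
HELD_OUT = {
    "EG-A": "Information gathering",
    "EG-B": "DDoS",
    "EG-C": "Malware",
    "EG-D": "Injection",
}


def build_group(held_out):
    """Return (counts, total, percentage shares) for a group missing one category."""
    counts = {c: n for c, n in TRAIN_COUNTS.items() if c != held_out}
    total = sum(counts.values())
    shares = {c: round(100 * n / total, 2) for c, n in counts.items()}
    return counts, total, shares


GROUPS = {name: build_group(category) for name, category in HELD_OUT.items()}

for name, (counts, total, shares) in GROUPS.items():
    print(name, total, shares)
```

The computed totals match the Total rows of the table, which supports the leave-one-category-out reading.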
Layer | Number of Neurons | Parameters | Activation Function | |
---|---|---|---|---|
Input layer | 95 | 0 | - | |
Hidden layer 1 | 76 | 7296 | Sigmoid | |
Hidden layer 2 | 47 | 3619 | Sigmoid | |
Hidden layer 3 | 28 | 1344 | Sigmoid | |
Hidden layer 4 | 47 | 1363 | Sigmoid | |
Hidden layer 5 | 76 | 3648 | Sigmoid | |
Output layer | 95 | 7315 | Sigmoid |
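The parameter counts in the table above are consistent with fully connected layers, where each layer holds one weight per input-output pair plus one bias per output neuron. The short check below recomputes them from the layer widths; treating the layers as standard dense layers is an assumption, and the names `LAYER_SIZES` and `dense_params` are illustrative.

```python
# Layer widths of the autoencoder, from input through the bottleneck to output.
LAYER_SIZES = [95, 76, 47, 28, 47, 76, 95]


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


# One entry per trainable layer: hidden layers 1 to 5, then the output layer.
PARAMS_PER_LAYER = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
]
TOTAL_PARAMS = sum(PARAMS_PER_LAYER)

print(PARAMS_PER_LAYER)  # [7296, 3619, 1344, 1363, 3648, 7315]
print(TOTAL_PARAMS)      # 24585
```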
Metric | EG-A | EG-B | EG-C | EG-D |
---|---|---|---|---|
(a) Random forest | ||||
Accuracy | 0.9901 | 0.9347 | 0.9986 | 0.9937 |
Precision | 0.7816 | 0.7061 | 0.9955 | 0.7755 |
Recall | 0.7968 | 0.8333 | 0.9955 | 0.8000 |
F1-score | 0.7890 | 0.7488 | 0.9955 | 0.7873 |
(b) KNN | ||||
Accuracy | 0.9851 | 0.9342 | 0.9927 | 0.9930 |
Precision | 0.7744 | 0.6939 | 0.9802 | 0.7886 |
Recall | 0.7824 | 0.8313 | 0.9712 | 0.7966 |
F1-score | 0.7782 | 0.7422 | 0.9756 | 0.7926 |
Category | EG-A | EG-B | EG-C | EG-D | |
---|---|---|---|---|---|
(a) Unidentified packet group | |||||
MITM | 72 | 60 | 72 | 72 | |
Injection | 774 | 5875 | 4057 | 18,158 | |
Malware | 736 | 1719 | 16,728 | 1116 | |
DDoS | 134 | 35,545 | 245 | 498 | |
Information gathering | 11,068 | 305 | 292 | 501 | |
Benign | 7294 | 443 | 33,796 | 380 | |
Total | 20,078 | 43,947 | 55,190 | 20,725 | |
(b) Recognized packet group | |||||
MITM | 0 | 12 | 0 | 0 | |
Injection | 19,672 | 14,571 | 16,389 | 2288 | |
Malware | 15,992 | 15,009 | 0 | 15,612 | |
DDoS | 57,489 | 22,078 | 57,378 | 57,125 | |
Information gathering | 3103 | 13,866 | 13,879 | 13,670 | |
Benign | 265,506 | 272,358 | 239,004 | 272,420 | |
Total | 361,762 | 337,895 | 326,650 | 361,115 |
Metric | EG-A | EG-B | EG-C | EG-D |
---|---|---|---|---|
AUC value | 0.977 | 0.945 | 1.000 | 0.971 |
Class | Random Forest Precision | Random Forest Recall | KNN Precision | KNN Recall |
---|---|---|---|---|
(a) EG-A | ||||
MITM | NaN | NaN | NaN | NaN |
Injection | 0.9507 | 0.9881 | 0.9423 | 0.9264 |
Malware | 1.0000 | 1.0000 | 0.9423 | 1.0000 |
DDoS | 0.9572 | 0.9959 | 0.9299 | 0.9855 |
Information gathering | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Average | 0.7816 | 0.7968 | 0.7629 | 0.7824 |
(b) EG-B | ||||
MITM | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Injection | 0.8100 | 1.0000 | 1.0000 | 0.9993 |
Malware | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
DDoS | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Information gathering | 0.4263 | 1.0000 | 0.4592 | 0.9884 |
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Average | 0.7061 | 0.8333 | 0.7432 | 0.8313 |
(c) EG-C | ||||
MITM | NaN | NaN | NaN | NaN |
Injection | 0.9862 | 0.9858 | 0.9496 | 0.9123 |
Malware | NaN | NaN | NaN | NaN |
DDoS | 0.9959 | 0.9961 | 0.9735 | 0.9845 |
Information gathering | 1.0000 | 1.0000 | 0.9808 | 0.9808 |
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Average | 0.9955 | 0.9955 | 0.9760 | 0.9694 |
(d) EG-D | ||||
MITM | NaN | NaN | NaN | NaN |
Injection | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Malware | 0.9710 | 1.0000 | 1.0000 | 1.0000 |
DDoS | 0.9901 | 1.0000 | 0.9614 | 0.9993 |
Information gathering | 0.9163 | 1.0000 | 0.9817 | 0.9839 |
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Average | 0.7755 | 0.8000 | 0.7886 | 0.7966 |
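The Average rows in the table above appear to be macro averages taken over the classes with defined scores, with NaN entries skipped (for example, MITM when no MITM packet reaches the classifier). The sketch below reproduces the random forest Average row for EG-A under that assumption; the function name `macro_average` is illustrative, and the per-class values are copied from panel (a).

```python
import math

NAN = float("nan")

# Per-class random forest scores for EG-A, copied from panel (a) of the table.
RF_PRECISION_EG_A = {
    "MITM": NAN,
    "Injection": 0.9507,
    "Malware": 1.0,
    "DDoS": 0.9572,
    "Information gathering": 0.0,
    "Benign": 1.0,
}
RF_RECALL_EG_A = {
    "MITM": NAN,
    "Injection": 0.9881,
    "Malware": 1.0,
    "DDoS": 0.9959,
    "Information gathering": 0.0,
    "Benign": 1.0,
}


def macro_average(scores):
    """Unweighted mean over classes, ignoring classes whose score is NaN."""
    defined = [v for v in scores.values() if not math.isnan(v)]
    return sum(defined) / len(defined)


avg_precision = round(macro_average(RF_PRECISION_EG_A), 4)
avg_recall = round(macro_average(RF_RECALL_EG_A), 4)
print(avg_precision, avg_recall)  # 0.7816 0.7968
```

Because the mean is unweighted, a single class scored at zero (such as the category missing from training) pulls the average down sharply even when accuracy stays high.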
Metric | EG-A | EG-B | EG-C | EG-D |
---|---|---|---|---|
(a) Random forest | ||||
Accuracy | 0.9976 | 0.9637 | 0.9987 | 0.9982 |
Precision | 0.9774 | 0.8770 | 0.9929 | 0.9896 |
Recall | 0.9743 | 0.9627 | 0.9930 | 0.9931 |
F1-score | 0.9758 | 0.9032 | 0.9930 | 0.9912 |
(b) KNN | ||||
Accuracy | 0.9846 | 0.9449 | 0.9927 | 0.9914 |
Precision | 0.8840 | 0.8300 | 0.9665 | 0.9875 |
Recall | 0.8369 | 0.9409 | 0.9542 | 0.9624 |
F1-score | 0.8565 | 0.8516 | 0.9601 | 0.9736 |
Category | EG-A | EG-B | EG-C | EG-D | |
---|---|---|---|---|---|
(a) Unidentified packet group | |||||
MITM | 72 | 62 | 62 | 72 | |
Injection | 18,058 | 14,745 | 14,265 | 2049 | |
Malware | 965 | 140 | 1 | 1139 | |
DDoS | 482 | 2 | 82 | 5229 | |
Information gathering | 261 | 85 | 106 | 8263 | |
Benign | 9418 | 12,630 | 39 | 22,715 | |
Total | 29,256 | 27,664 | 14,555 | 39,467 | |
(b) Recognized packet group | |||||
MITM | 0 | 10 | 10 | 0 | |
Injection | 2388 | 5701 | 6181 | 18,397 | |
Malware | 15,763 | 16,588 | 16,727 | 15,589 | |
DDoS | 57,141 | 57,621 | 57,541 | 52,394 | |
Information gathering | 13,910 | 14,086 | 14,065 | 5908 | |
Benign | 263,382 | 260,170 | 272,761 | 250,085 | |
Total | 352,584 | 354,176 | 367,285 | 342,373 |
Class | Random Forest Precision | Random Forest Recall | KNN Precision | KNN Recall | |
---|---|---|---|---|---|
(a) EG-A | |||||
MITM | NaN | NaN | NaN | NaN | |
Injection | 0.90054 | 0.90243 | 0.49167 | 0.44472 | |
Malware | 1.00000 | 1.00000 | 0.99924 | 1.00000 | |
DDoS | 0.98960 | 0.99583 | 0.92906 | 0.98899 | |
Information gathering | 1.00000 | 0.97374 | 1.00000 | 0.75054 | |
Benign | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Average | 0.97742 | 0.97429 | 0.88399 | 0.83685 | |
(b) EG-B | |||||
MITM | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Injection | 0.68223 | 0.99982 | 0.46320 | 0.99684 | |
Malware | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
DDoS | 0.99998 | 0.77656 | 0.99966 | 0.66571 | |
Information gathering | 0.57953 | 1.00000 | 0.51717 | 0.98275 | |
Benign | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Average | 0.87695 | 0.96273 | 0.83000 | 0.94088 | |
(c) EG-C | |||||
MITM | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Injection | 0.96153 | 0.96230 | 0.82973 | 0.76395 | |
Malware | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
DDoS | 0.99595 | 0.99586 | 0.97140 | 0.98467 | |
Information gathering | 1.00000 | 1.00000 | 0.99775 | 0.97675 | |
Benign | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Average | 0.99291 | 0.99302 | 0.96648 | 0.95422 | |
(d) EG-D | |||||
MITM | NaN | NaN | NaN | NaN | |
Injection | 1.00000 | 0.96570 | 0.99981 | 0.85563 | |
Malware | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
DDoS | 0.99338 | 1.00000 | 0.94776 | 0.99926 | |
Information gathering | 0.95444 | 1.00000 | 0.98967 | 0.95718 | |
Benign | 1.00000 | 1.00000 | 1.00000 | 1.00000 | |
Average | 0.98956 | 0.99314 | 0.98744 | 0.96241 |
Measurement | Random Forest Before | Random Forest After | KNN Before | KNN After | |
---|---|---|---|---|---|
(a) EG-A | |||||
Accuracy | 0.990129 | 0.997606 | 0.985112 | 0.984614 | |
Precision | 0.781564 | 0.977429 | 0.774435 | 0.883993 | |
Recall | 0.796807 | 0.974290 | 0.782374 | 0.836851 | |
F1-score | 0.789036 | 0.975819 | 0.778230 | 0.856445 | |
(b) EG-B | |||||
Accuracy | 0.934660 | 0.963645 | 0.934154 | 0.944878 | |
Precision | 0.706055 | 0.876955 | 0.693946 | 0.830005 | |
Recall | 0.833333 | 0.962730 | 0.831284 | 0.940884 | |
F1-score | 0.748802 | 0.903176 | 0.742242 | 0.851567 | |
(c) EG-C | |||||
Accuracy | 0.998595 | 0.998718 | 0.992748 | 0.992736 | |
Precision | 0.995536 | 0.992913 | 0.980209 | 0.966480 | |
Recall | 0.995461 | 0.993028 | 0.971212 | 0.954229 | |
F1-score | 0.995499 | 0.992970 | 0.975610 | 0.960103 | |
(d) EG-D | |||||
Accuracy | 0.993664 | 0.998157 | 0.992950 | 0.991390 | |
Precision | 0.775482 | 0.989565 | 0.788606 | 0.987448 | |
Recall | 0.800000 | 0.993140 | 0.796648 | 0.962412 | |
F1-score | 0.787328 | 0.991184 | 0.792553 | 0.973619 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, Y.-C.; Cheng, C.-H.; Lin, T.-W.; Lee, J.-S. Diverse Machine Learning-Based Malicious Detection for Industrial Control System. Electronics 2025, 14, 1947. https://doi.org/10.3390/electronics14101947