Intrusion Detection in Industrial Control Systems Using Transfer Learning Guided by Reinforcement Learning
Abstract
1. Introduction
- Hybrid CNN-LSTM Architecture: We develop a deep learning backbone that combines one-dimensional CNNs and LSTMs for intrusion detection. This hybrid model efficiently extracts spatial correlations and temporal patterns, which enables the detection of complex multi-stage attacks in ICS networks.
- RL-Guided Transfer Learning: We introduce a new RL-based fine-tuning mechanism for domain adaptation. An RL agent monitors the model’s performance and dynamically adjusts training parameters (such as layer freezing or learning rate) to optimize adaptation. This approach allows the IDS to adapt autonomously to new network conditions with minimal human intervention.
- Improved Detection Under Data Constraints: Through experiments on an OT network dataset with very few training samples, we show that our RL-guided transfer learning IDS outperforms conventional fine-tuning and baseline models. The proposed system reaches an accuracy and F1-score of 0.9825, compared with an F1-score of 0.861 for neural fine-tuning and 0.759 for the baseline, suggesting potential effectiveness for enhancing cybersecurity in smart grid and SCADA environments where labeled data is scarce.
2. Literature Review
2.1. Deep Learning in Intrusion Detection
2.2. Spatio-Temporal Feature Learning
2.3. Intrusion Detection in ICS/OT Environments
2.4. Transfer Learning for Intrusion Detection
2.5. Reinforcement Learning and Adaptive Training
3. Proposed System Architecture
- Data preprocessing and feature selection using correlation analysis.
- A combined CNN-LSTM model for learning both spatial and time-based patterns.
- An RL agent that fine-tunes the model during transfer learning.
3.1. Data Preprocessing and Correlation Analysis
3.1.1. Data Cleaning and Normalization
3.1.2. Simulating Data Shortage in the Target Domain
3.1.3. Removing Obvious Features
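The preprocessing steps named in Sections 3.1.1 to 3.1.3 can be illustrated with a minimal sketch. It assumes min-max scaling for normalization and treats an "obvious" feature as one whose Pearson correlation with the label is close to 1 in absolute value. The threshold of 0.95, the function names, and the toy feature names `flag` and `pkt_len` are all hypothetical and do not come from the source.

```python
from math import sqrt

def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def drop_obvious_features(features, labels, threshold=0.95):
    """Keep only features whose |correlation| with the label is below threshold."""
    return {name: col for name, col in features.items()
            if abs(pearson(col, labels)) < threshold}

# Hypothetical toy data: "flag" mirrors the label exactly, "pkt_len" does not.
labels = [0, 0, 1, 1, 0, 1]
features = {
    "flag": [0, 0, 1, 1, 0, 1],
    "pkt_len": [60, 1500, 64, 900, 72, 1200],
}
normalized = {name: min_max_normalize(col) for name, col in features.items()}
kept = drop_obvious_features(normalized, labels)
```

In this toy example only `pkt_len` survives the filter, because `flag` is a copy of the label and would let a model score well without learning anything about the traffic.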
3.2. Hybrid CNN-LSTM Backbone for Spatio-Temporal Feature Extraction
3.2.1. CNN Component for Spatial Analysis
- Block 1 (Detailed Feature Analysis): This layer uses a 1D convolution with 16 filters and a kernel size of 1. It focuses on learning detailed relationships between features without combining values from nearby positions.
- Block 2 (Wider Feature Analysis): This layer uses a 1D convolution with 64 filters and a larger kernel size of 5. It helps the model learn broader patterns by combining information across several nearby features.
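The difference between the two kernel sizes can be seen in a small pure-Python sketch. It applies a single hand-picked kernel of each size to a toy feature vector, whereas the blocks above learn 16 and 64 such kernels respectively. The kernel values and the input are hypothetical, and no padding is assumed.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation of a sequence with a kernel, stride 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

features = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

# Kernel size 1: each output depends on exactly one input position.
pointwise = conv1d(features, [0.5])

# Kernel size 5: each output mixes five neighbouring positions.
wide = conv1d(features, [0.2] * 5)
```

With kernel size 1 the output keeps the input length and only rescales each value, while kernel size 5 shortens the output to two positions, each summarizing a window of five features.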
3.2.2. LSTM Component for Temporal Analysis
- First LSTM Layer: 64 units that process the concatenated CNN features and capture initial temporal dependencies.
- Second LSTM Layer: 32 units that refine the temporal representations and provide more compact feature encoding.
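- Forget Gate: $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ controls which parts of the old memory are discarded.
- Input Gate and Candidate Values: $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ and $\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$. The input gate decides what new information to add, and the candidate values are the proposed updates to the memory.
- Memory Cell Update: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ combines the old memory (scaled by the forget gate) and the new information (scaled by the input gate).
- Output Gate and Hidden State: $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$ and $h_t = o_t \odot \tanh(c_t)$. The output gate decides what to pass to the next time step and to the next layer.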
- Dropout Layer: This helps reduce overfitting by randomly turning off some neurons during training.
- Dense Output Layer: A fully connected layer with a sigmoid activation, as shown in Equation (9): $\hat{y} = \sigma(W h + b)$
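The gate descriptions above correspond to the standard LSTM update, which can be traced in a minimal scalar sketch. It assumes a single input feature and a single hidden unit, whereas the layers described here use 64 and 32 units. All weights, including the dense-layer weight of 2.0, are made up for illustration.

```python
from math import exp, tanh

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM time step for a scalar input and a scalar hidden state.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x_t + wh * h_prev + b

    f_t = sigmoid(pre("forget"))       # how much old memory to keep
    i_t = sigmoid(pre("input"))        # how much new information to admit
    c_hat = tanh(pre("candidate"))     # candidate values for the memory
    c_t = f_t * c_prev + i_t * c_hat   # memory cell update
    o_t = sigmoid(pre("output"))       # how much of the memory to expose
    h_t = o_t * tanh(c_t)              # hidden state passed onward
    return h_t, c_t

# Hypothetical weights, chosen only for illustration.
weights = {g: (0.5, 0.1, 0.0) for g in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    h, c = lstm_step(x, h, c, weights)

# Dense sigmoid output on the final hidden state, with a made-up weight of 2.0.
probability = sigmoid(2.0 * h)
```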
3.3. LSTM Design Rationale for Temporal Pattern Learning
3.4. Reinforcement Learning Agent for Adaptive Fine-Tuning
- S: The set of possible states.
- A: The set of actions the agent can take.
- $P(s' \mid s, a)$: The chance of moving from state $s$ to $s'$ when action $a$ is taken.
- $R(s, a, s')$: The reward received when taking action $a$ in state $s$ and ending up in state $s'$.
- $\gamma$: The discount factor, which determines how much future rewards matter.
- $V^{\pi}(s)$: Expected total reward starting from state $s$ and following policy $\pi$.
- $Q^{\pi}(s, a)$: Expected total reward starting from state $s$, taking action $a$, and then following $\pi$.
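To make the role of the discount factor concrete, the sketch below computes the discounted return of a single trajectory. The value functions above are expectations of this quantity over trajectories generated by the policy. The reward sequence and the discount factor of 0.5 are arbitrary illustrative values.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one trajectory, accumulated back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]
g = discounted_return(rewards, gamma=0.5)  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

A discount factor closer to 1 would weight the late reward of 2.0 more heavily, and a factor of 0 would count only the immediate reward.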
3.4.1. State Space (S)
- Current F1-score on validation data,
- Validation loss,
- Current epoch as a fraction of total epochs,
- Percentage of parameters currently trainable,
- Gradient norm of the first n layers.
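One way to hold these five quantities is a small immutable record that can be flattened into a vector for the agent. The class name, field names, and example numbers below are hypothetical, since the symbols used in the source for each component are not available here.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class FineTuneState:
    f1_val: float          # current F1-score on validation data
    val_loss: float        # validation loss
    epoch_fraction: float  # current epoch / total epochs
    trainable_pct: float   # percentage of parameters currently trainable
    grad_norm: float       # gradient norm of the first n layers

def make_state(f1_val, val_loss, epoch, total_epochs,
               trainable, total_params, grad_norm):
    """Build the state from raw training statistics."""
    return FineTuneState(
        f1_val=f1_val,
        val_loss=val_loss,
        epoch_fraction=epoch / total_epochs,
        trainable_pct=100.0 * trainable / total_params,
        grad_norm=grad_norm,
    )

state = make_state(0.82, 0.41, epoch=5, total_epochs=20,
                   trainable=25_000, total_params=100_000, grad_norm=1.3)
vector = astuple(state)  # flat tuple the agent can consume
```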
3.4.2. Action Space (A)
- freeze_layer_n: Freezes layer n to stop its weights from changing.
- unfreeze_all: Makes all layers trainable for full model adjustment.
- increase_lr: Raises the learning rate to speed up training or escape local minima.
- decrease_lr: Lowers the learning rate for more careful fine-tuning.
- adjust_dropout: Changes the dropout rate to improve regularization.
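The effect of each action on the training setup can be sketched as a pure function over a configuration dictionary. The factor of 2 for learning-rate changes, the dropout step of 0.1, and the choice to model `adjust_dropout` as a capped increase are assumptions made for illustration. The source does not specify these magnitudes.

```python
def apply_action(config, action, lr_factor=2.0, dropout_step=0.1):
    """Return a new training config after one agent action; the input is not mutated."""
    new = dict(config, frozen=set(config["frozen"]))
    if action.startswith("freeze_layer_"):
        new["frozen"].add(int(action.rsplit("_", 1)[1]))
    elif action == "unfreeze_all":
        new["frozen"].clear()
    elif action == "increase_lr":
        new["lr"] = config["lr"] * lr_factor
    elif action == "decrease_lr":
        new["lr"] = config["lr"] / lr_factor
    elif action == "adjust_dropout":
        # Assumption: "adjust" is modelled as a capped increase.
        new["dropout"] = min(0.9, config["dropout"] + dropout_step)
    else:
        raise ValueError(f"unknown action: {action}")
    return new

config = {"lr": 1e-3, "dropout": 0.2, "frozen": set()}
config = apply_action(config, "freeze_layer_0")
config = apply_action(config, "decrease_lr")
```

Returning a fresh dictionary rather than mutating in place keeps earlier configurations available, which is convenient when logging the agent's decisions per epoch.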
3.4.3. Reward Function (R)
3.4.4. Learning Mechanism
3.4.5. RL-Guided Fine-Tuning
Algorithm 1. RL-Guided Fine-Tuning Loop
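The general shape of such a loop can be sketched as follows. This is not the authors' Algorithm 1. It assumes tabular Q-learning with an epsilon-greedy policy, a state reduced to the validation F1-score bucketed into tenths, and a reward equal to the change in F1 after each epoch. The function `train_one_epoch` is a stand-in with invented dynamics that replaces real model training.

```python
import random

ACTIONS = ["freeze_layer_0", "unfreeze_all", "increase_lr",
           "decrease_lr", "adjust_dropout"]

def train_one_epoch(f1, action, rng):
    """Stand-in for real training: returns a new validation F1 in [0, 1].

    Purely hypothetical dynamics: 'decrease_lr' helps a little more than the rest.
    """
    gain = 0.03 if action == "decrease_lr" else 0.01
    noise = rng.uniform(-0.005, 0.005)
    return max(0.0, min(1.0, f1 + gain * (1.0 - f1) + noise))

def discretize(f1):
    """Coarse state: F1 bucketed into tenths (0..10)."""
    return int(f1 * 10)

def rl_guided_fine_tuning(epochs=30, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}                      # (state, action) -> estimated value
    f1, history = 0.5, []
    for _ in range(epochs):
        s = discretize(f1)
        if rng.random() < epsilon:                       # explore
            a = rng.choice(ACTIONS)
        else:                                            # exploit
            a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        new_f1 = train_one_epoch(f1, a, rng)
        reward = new_f1 - f1                             # assumed reward: change in F1
        s_next = discretize(new_f1)
        best_next = max(q.get((s_next, act), 0.0) for act in ACTIONS)
        old = q.get((s, a), 0.0)
        # Q-learning update toward the bootstrapped target.
        q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
        history.append((a, new_f1, reward))
        f1 = new_f1
    return history, q

history, q_table = rl_guided_fine_tuning()
```

In a real setting the stand-in would be replaced by one epoch of fine-tuning under the configuration chosen by the agent, followed by evaluation on the validation set.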
4. Evaluation Methodology and Metrics
4.1. Performance Evaluation Metrics for Intrusion Detection
- True Positive (TP): Correctly detects an attack.
- True Negative (TN): Correctly detects normal traffic.
- False Positive (FP): Mistakenly marks normal traffic as an attack.
- False Negative (FN): Misses an actual attack.
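- Accuracy (ACC): Tells how many predictions were correct overall: $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
- Precision (Pr): Out of everything marked as an attack, how many were actually attacks: $\mathrm{Pr} = \frac{TP}{TP + FP}$
- Recall (Detection Rate, DR): Out of all real attacks, how many we correctly found: $\mathrm{DR} = \frac{TP}{TP + FN}$
- F1-Score: A balanced score that combines precision and recall: $F1 = \frac{2 \cdot \mathrm{Pr} \cdot \mathrm{DR}}{\mathrm{Pr} + \mathrm{DR}}$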
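These four metrics follow directly from the confusion-matrix counts, as the sketch below shows. The counts are hypothetical and describe a balanced 400-sample test set. They happen to reproduce the values in the Baseline row of the results table, but the source does not report the underlying counts.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 200 attacks (150 found) and 200 normal flows (45 misflagged).
m = detection_metrics(tp=150, tn=155, fp=45, fn=50)
rounded = {name: round(value, 4) for name, value in m.items()}
```

The guards against zero denominators matter in practice when a model predicts no attacks at all, a case that can occur early in training on scarce data.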
4.2. Learning Process and Convergence Analysis
- Training and Validation Loss Curves: Show how the model’s error decreases during training. If training loss keeps going down while validation loss stalls or rises, the model may be overfitting.
- Accuracy and F1-Score Over Time: We plot accuracy and F1-score for each epoch to see how quickly and stably the model improves.
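The overfitting pattern described above can be checked mechanically from the two loss curves. The sketch below compares the latest loss with the loss a few epochs earlier. The window of three epochs and the loss values are hypothetical choices for illustration.

```python
def looks_overfit(train_loss, val_loss, window=3):
    """True if training loss fell over the last `window` epochs while validation loss did not."""
    if len(train_loss) <= window or len(val_loss) <= window:
        return False
    train_fell = train_loss[-1] < train_loss[-1 - window]
    val_stalled = val_loss[-1] >= val_loss[-1 - window]
    return train_fell and val_stalled

# Hypothetical per-epoch losses: training keeps improving, validation turns upward.
train = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val = [0.95, 0.70, 0.58, 0.57, 0.59, 0.62]
flag = looks_overfit(train, val)
```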
4.3. Evaluation Metrics for the Reinforcement Learning Agent
- F1-Score Trajectory and Action Annotation: We plot the F1-score during training and mark the actions taken by the RL agent (like Freeze L0, Increase LR) to see if those actions lead to improvements.
- Action Frequency Distribution: We look at how often each action is chosen. A good policy should use a variety of actions depending on the situation. If one action is used too much, it might mean the agent has not learned a balanced strategy.
- Cumulative Reward: We track the total reward the agent earns over time, $R_{\mathrm{cum}}(T) = \sum_{t=1}^{T} r_t$:
  - A steady increase means the agent is learning well.
  - A flat or dropping line suggests the agent is not improving its policy.
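Both the action frequency distribution and the cumulative reward can be derived from a simple log of the agent's decisions. The log entries below are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical log of (action, reward) pairs, one per fine-tuning step.
log = [("freeze_layer_0", 0.02), ("decrease_lr", 0.03), ("decrease_lr", 0.01),
       ("increase_lr", -0.01), ("decrease_lr", 0.02)]

# How often each action was chosen, as a fraction of all steps.
counts = Counter(action for action, _ in log)
frequency = {action: n / len(log) for action, n in counts.items()}

# Running total of reward after each step.
cumulative = list(accumulate(reward for _, reward in log))
trend_up = cumulative[-1] > cumulative[0]
```

Here `decrease_lr` accounts for three of the five steps, the kind of concentration that the action frequency analysis is meant to surface.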
5. Results and Discussion
5.1. Comparative Analysis of Learning and Performance
5.1.1. Learning Convergence
5.1.2. Detection Performance
5.2. Analysis of the Reinforcement Learning Agent’s Behavior
5.2.1. Action Selection Strategy
5.2.2. Impact of Agent Actions on Performance
5.2.3. Comparison of RL Algorithms
5.2.4. Discussion of Results
6. Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ACC | Accuracy |
CNN | Convolutional Neural Network |
DNP3 | Distributed Network Protocol 3 |
DNN | Deep Neural Network |
DoS | Denial-of-Service |
DR | Detection Rate |
FN | False Negative |
FP | False Positive |
ICS | Industrial Control Systems |
IDS | Intrusion Detection System |
IIoT | Industrial Internet of Things |
LSTM | Long Short-Term Memory |
ML | Machine Learning |
MDP | Markov Decision Process |
MitM | Man-in-the-Middle |
NIDS | Network Intrusion Detection System |
OT | Operational Technology |
Pr | Precision |
RL | Reinforcement Learning |
RNN | Recurrent Neural Network |
SCADA | Supervisory Control and Data Acquisition |
TL | Transfer Learning |
TN | True Negative |
TP | True Positive |
Model | Accuracy | Precision | Recall (TPR) | F1-Score |
---|---|---|---|---|
RL-Guided (Proposed) | 0.9825 | 0.9801 | 0.9850 | 0.9825 |
Neural Fine-tuning | 0.8625 | 0.8718 | 0.8500 | 0.861 |
5-Layer Benchmark | 0.8475 | 0.8564 | 0.8350 | 0.846 |
Baseline | 0.7625 | 0.7692 | 0.7500 | 0.759 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ali, J.; Ali, S.; Al Balushi, T.; Nadir, Z. Intrusion Detection in Industrial Control Systems Using Transfer Learning Guided by Reinforcement Learning. Information 2025, 16, 910. https://doi.org/10.3390/info16100910