Reinforcement Learning for the Optimization of Adaptive Intrusion Detection Systems
Abstract
1. Introduction
2. Fundamentals of Reinforcement Learning
Deep Q-Networks
3. Related Work
4. Experimental Methodology
4.1. Experimental Protocol
The classification task is cast as an RL problem with the following elements:

- States: instances from the dataset. The RL model receives as input the stacked predictions of every model in the ensemble, so each instance is represented by the probabilities of belonging to each class assigned by each model rather than by its original features. This is illustrated in Figure 1. The label of each instance remains unchanged.
- Actions: the possible labels that can be assigned to the instance. The RL model thus functions as a classifier.
- Reward: following previous work [12], we use a binary reward whose value is 1 if the label selected for the instance is correct and 0 otherwise. Although some works, such as [26], use rewards that account for data imbalance, the binary reward achieved the best performance in [12] on the same problem addressed in this work.
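
A minimal sketch of this formulation follows. It assumes a Gymnasium-style `reset`/`step` interface (including the 5-tuple returned by `step`) but does not import Gymnasium, so it stays self-contained. The class name `StackedPredictionEnv`, the one-instance-per-episode design and the argument names are hypothetical illustrations, not the authors' implementation: each observation is the concatenated class-probability vectors of the ensemble members, each action is a class index, and the reward is the binary reward described above.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def binary_reward(action: int, label: int) -> float:
    """Binary reward: 1 if the chosen label is correct, 0 otherwise."""
    return 1.0 if action == label else 0.0


class StackedPredictionEnv:
    """Hypothetical environment in which every episode is a single instance.

    Observations are the stacked class probabilities of the ensemble members,
    actions are class indices, and the reward is binary_reward().
    """

    def __init__(
        self,
        states: Sequence[Sequence[float]],
        labels: Sequence[int],
        n_classes: int,
        seed: Optional[int] = None,
    ) -> None:
        if len(states) != len(labels):
            raise ValueError("states and labels must have the same length")
        self.states = [list(s) for s in states]
        self.labels = list(labels)  # true labels are never altered
        self.n_classes = n_classes  # size of the discrete action space
        self._rng = random.Random(seed)
        self._idx = 0

    def reset(self) -> Tuple[List[float], Dict]:
        # Draw one instance uniformly at random; its label stays hidden.
        self._idx = self._rng.randrange(len(self.states))
        return self.states[self._idx], {}

    def step(self, action: int) -> Tuple[List[float], float, bool, bool, Dict]:
        if not 0 <= action < self.n_classes:
            raise ValueError(f"invalid action: {action}")
        label = self.labels[self._idx]
        reward = binary_reward(action, label)
        # One decision per instance: terminated=True, truncated=False.
        return self.states[self._idx], reward, True, False, {"label": label}


# Usage: two models and two classes give 2 * 2 = 4 features per state.
env = StackedPredictionEnv(
    states=[[0.9, 0.1, 0.8, 0.2], [0.3, 0.7, 0.4, 0.6]],
    labels=[0, 1],
    n_classes=2,
    seed=0,
)
obs, _ = env.reset()
_, reward, terminated, truncated, info = env.step(0)
```

To train a Stable-Baselines3 agent on such an environment, the class would instead subclass `gymnasium.Env` and declare `observation_space = gymnasium.spaces.Box(0.0, 1.0, shape=(M * K,))` and `action_space = gymnasium.spaces.Discrete(K)` for M models and K classes. Ending the episode after one instance is a design assumption of this sketch; episodes spanning several instances are an equally valid choice.
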

The experimental protocol consists of the following steps:

1. The base ensemble is trained, and the predictions of each of its components on the training and test subsets are extracted.
2. The predictions from each model in the ensemble are stacked and used as input to the RL model for both training and testing.
3. The hyperparameters of the RL model are optimized, and the best configuration is selected.
4. The RL model is trained, and its predictions on the test set are extracted.
5. Different metrics are computed to evaluate the quality of the predictions of both the base ensemble and the RL model.
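
The first two protocol steps can be sketched with scikit-learn. This assumes that the base ensemble consists of the LR, DT, KNN and MLP classifiers listed in the abbreviations; the exact members, their settings and the synthetic data are assumptions for illustration only. Each fitted model's `predict_proba` output is concatenated column-wise, so with M models and K classes every instance becomes an M·K-dimensional state, while its label is left untouched.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_base_ensemble(seed: int = 0) -> list:
    """Hypothetical base ensemble (members and settings are assumptions)."""
    return [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=seed),
        KNeighborsClassifier(n_neighbors=5),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
    ]


def stack_predictions(models: list, X: np.ndarray) -> np.ndarray:
    """Concatenate the class probabilities of every model, one row per instance."""
    return np.hstack([model.predict_proba(X) for model in models])


# Synthetic 3-class data standing in for an IDS dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: train the base ensemble on the training subset.
ensemble = build_base_ensemble(seed=0)
for model in ensemble:
    model.fit(X_train, y_train)

# Step 2: stacked probabilities become the RL states; y_train/y_test are unchanged.
S_train = stack_predictions(ensemble, X_train)  # shape (n_train, 4 models * 3 classes)
S_test = stack_predictions(ensemble, X_test)
```

Because the components predict on the same data they were fitted on, training-set probabilities can be overconfident (a fully grown decision tree typically returns near one-hot vectors). If that matters, `sklearn.model_selection.cross_val_predict(model, X_train, y_train, method="predict_proba")` is a common way to obtain out-of-fold training states.
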
4.2. Models and Hyperparameters
4.3. Datasets
4.4. Analysis
1. First, we apply the Kolmogorov–Smirnov test for normality with the Lilliefors correction. The results are non-significant in all cases, so normality cannot be rejected; we therefore apply parametric inferential analyses and use the mean as the measure of central tendency.
2. We then apply Student’s t-test for paired samples to compare the values obtained by the base ensemble with those of the predictions adjusted by RL. If the differences are significant, we carry out a descriptive analysis.
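
Both steps can be sketched with the Python standard library, using the five per-seed NSL-KDD macro values from the results table. The block computes the Lilliefors statistic (the Kolmogorov–Smirnov distance to a normal distribution whose mean and standard deviation are estimated from the sample; the correction itself lies in the critical values used for this statistic) and the paired t statistic. Which samples the authors tested for normality is not stated, so applying the test to the paired differences is an assumption here.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence


def lilliefors_statistic(x: Sequence[float]) -> float:
    """KS distance between the sample ECDF and N(mean, sd) fitted to the sample."""
    data = sorted(x)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, value in enumerate(data, start=1):
        cdf = fitted.cdf(value)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's t for paired samples, computed on the differences a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Per-seed macro scores on NSL-KDD (base ensemble vs. RL), from the results table.
base_macro = [0.6026, 0.5797, 0.5651, 0.5754, 0.5865]
rl_macro = [0.6426, 0.6105, 0.6347, 0.5976, 0.6273]

diffs = [r - b for r, b in zip(rl_macro, base_macro)]
d_stat = lilliefors_statistic(diffs)
t_stat = paired_t_statistic(rl_macro, base_macro)

# Two-sided critical value of Student's t with n - 1 = 4 degrees of freedom, alpha = 0.05.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
```

With the rounded table values the statistic comes out at about 5.09, which agrees with the reported 5.0971 up to the four-decimal rounding of the scores. For p-values, `scipy.stats.ttest_rel(rl_macro, base_macro)` returns the same t statistic together with its p-value, and `statsmodels.stats.diagnostic.lilliefors` returns the Lilliefors statistic with a tabulated p-value.
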
5. Discussion of Results
5.1. Results in NSL-KDD
5.2. Results in UNSW-NB15
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| RL | Reinforcement Learning |
| IDS | Intrusion Detection System |
| MDP | Markov Decision Process |
| DQN | Deep Q-Networks |
| DRL | Deep Reinforcement Learning |
| LR | Logistic Regression |
| DT | Decision Tree |
| KNN | K-Nearest Neighbors |
| MLP | Multilayer Perceptron |
| DoS | Denial of Service |
| AUC | Area Under the Curve |
References
1. Sethi, K.; Sai Rupesh, E.; Kumar, R.; Bera, P.; Venu Madhav, Y. A context-aware robust intrusion detection system: A reinforcement learning-based approach. Int. J. Inf. Secur. 2019, 19, 657–678.
2. Thakkar, A.; Lohiya, R. A survey on intrusion detection system: Feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 2021, 55, 453–563.
3. Shahraki, A.; Abbasi, M.; Taherkordi, A.; Jurcut, A.D. A comparative study on online machine learning techniques for network traffic streams analysis. Comput. Netw. 2022, 207, 108836.
4. Gibert, D.; Mateu, C.; Planes, J. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. J. Netw. Comput. Appl. 2020, 153, 102526.
5. Kegelmeyer, W.P.; Chiang, K.; Ingram, J. Streaming Malware Classification in the Presence of Concept Drift and Class Imbalance. In Proceedings of the 2013 12th International Conference on Machine Learning and Applications, Miami, FL, USA, 4–7 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 48–53.
6. Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
7. Wu, Z.; Gao, P.; Cui, L.; Chen, J. An Incremental Learning Method Based on Dynamic Ensemble RVM for Intrusion Detection. IEEE Trans. Netw. Serv. Manag. 2022, 19, 671–685.
8. Alavizadeh, H.; Alavizadeh, H.; Jang-Jaccard, J. Deep Q-Learning Based Reinforcement Learning Approach for Network Intrusion Detection. Computers 2022, 11, 41.
9. Louati, F.; Ktata, F.B.; Amous, I. Enhancing Intrusion Detection Systems with Reinforcement Learning: A Comprehensive Survey of RL-based Approaches and Techniques. SN Comput. Sci. 2024, 5.
10. Yungaicela-Naula, N.M.; Vargas-Rosales, C.; Pérez-Díaz, J.A. SDN/NFV-based framework for autonomous defense against slow-rate DDoS attacks by using reinforcement learning. Future Gener. Comput. Syst. 2023, 149, 637–649.
11. Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5064–5078.
12. Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst. Appl. 2020, 141, 112963.
13. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
14. Usman, M.; Chen, H. EMRIL: Ensemble Method based on ReInforcement Learning for binary classification in imbalanced drifting data streams. Neurocomputing 2024, 605, 128259.
15. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
16. Gupta, N.; Jindal, V.; Bedi, P. CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems. Comput. Secur. 2022, 112, 102499.
17. Shyaa, M.A.; Zainol, Z.; Abdullah, R.; Anbar, M.; Alzubaidi, L.; Santamaría, J. Enhanced Intrusion Detection with Data Stream Classification and Concept Drift Guided by the Incremental Learning Genetic Programming Combiner. Sensors 2023, 23, 3736.
18. Mohamed, S.; Ejbali, R. Deep SARSA-based reinforcement learning approach for anomaly network intrusion detection system. Int. J. Inf. Secur. 2022, 22, 235–247.
19. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; IEEE: Ottawa, ON, Canada, 2009; pp. 1–6.
20. Kolias, C.; Kambourakis, G.; Stavrou, A.; Gritzalis, S. Intrusion Detection in 802.11 Networks: Empirical Evaluation of Threats and a Public Dataset. IEEE Commun. Surv. Tutor. 2016, 18, 184–208.
21. van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI’16), Phoenix, AZ, USA, 12–17 February 2016; AAAI Press: Palo Alto, CA, USA, 2016; pp. 2094–2100.
22. Nie, L.; Sun, W.; Wang, S.; Ning, Z.; Rodrigues, J.J.P.C.; Wu, Y.; Li, S. Intrusion Detection in Green Internet of Things: A Deep Deterministic Policy Gradient-Based Algorithm. IEEE Trans. Green Commun. Netw. 2021, 5, 778–788.
23. Pang, G.; van den Hengel, A.; Shen, C.; Cao, L. Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data. In KDD ’21 Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; ACM: New York, NY, USA, 2021.
24. Apruzzese, G.; Andreolini, M.; Marchetti, M.; Venturi, A.; Colajanni, M. Deep Reinforcement Adversarial Learning Against Botnet Evasion Attacks. IEEE Trans. Netw. Serv. Manag. 2020, 17, 1975–1987.
25. Agarwal, R.; Schuurmans, D.; Norouzi, M. An Optimistic Perspective on Offline Reinforcement Learning. Proc. Mach. Learn. Res. 2020, 119, 104–114.
26. Al-Fawa’reh, M.; Abu-Khalaf, J.; Szewczyk, P.; Kang, J.J. MalBoT-DRL: Malware Botnet Detection Using Deep Reinforcement Learning in IoT Networks. IEEE Internet Things J. 2024, 11, 9610–9629.
27. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
28. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. In NIPS’11 Proceedings of the 25th International Conference on Neural Information Processing Systems, Granada, Spain, 12–14 December 2011; pp. 2546–2554.
29. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.

| Parameters | Values |
|---|---|
| learning_rate | |
| batch_size | |
| buffer_size | |
| exploration_final_eps | |
| exploration_fraction | |
| target_update_interval | |
| learning_starts | |
| train_freq | |
| subsample_steps | |
| net_arch | |

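
Most parameter names in the table match keyword arguments of the Stable-Baselines3 `DQN` constructor [15]. Two exceptions are worth handling explicitly: SB3 expects `net_arch` inside `policy_kwargs`, and `subsample_steps` is not a `DQN` argument, so this sketch assumes it is consumed by the surrounding training code. The helper below turns a flat configuration (for instance Optuna's `study.best_params` [27]) into constructor arguments; the helper name and the example values are hypothetical and are not the search space or the configuration used in this work.

```python
from typing import Any, Dict, Mapping, Tuple

# Keyword arguments accepted directly by stable_baselines3.DQN.
DQN_DIRECT_KWARGS = {
    "learning_rate",
    "batch_size",
    "buffer_size",
    "exploration_final_eps",
    "exploration_fraction",
    "target_update_interval",
    "learning_starts",
    "train_freq",
}


def split_dqn_params(params: Mapping[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat hyperparameter dict into DQN kwargs and leftover settings.

    Hypothetical helper: `net_arch` is moved into `policy_kwargs`, as SB3
    expects, and anything DQN does not accept (e.g. `subsample_steps`) is
    returned separately for the caller to handle.
    """
    dqn_kwargs: Dict[str, Any] = {}
    extra: Dict[str, Any] = {}
    for name, value in params.items():
        if name in DQN_DIRECT_KWARGS:
            dqn_kwargs[name] = value
        elif name == "net_arch":
            dqn_kwargs["policy_kwargs"] = {"net_arch": list(value)}
        else:
            extra[name] = value
    return dqn_kwargs, extra


# Arbitrary example values, NOT the configuration reported in this work.
example = {
    "learning_rate": 1e-4,
    "batch_size": 64,
    "net_arch": [64, 64],
    "subsample_steps": 1000,
}
dqn_kwargs, extra = split_dqn_params(example)
# With SB3 installed and a Gymnasium environment `env`:
#     model = DQN("MlpPolicy", env, **dqn_kwargs)
```

Keeping the split in one place means a sampled configuration can be passed to `DQN` unchanged, while custom settings such as `subsample_steps` remain visible to the training loop instead of raising an unexpected-keyword error.
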

| Dataset | Class | Training frequency | Test frequency |
|---|---|---|---|
| NSL-KDD | Normal | 67,343 (53.45%) | 9711 (43.07%) |
| | DoS | 45,927 (36.45%) | 7460 (33.08%) |
| | Probe | 11,656 (9.25%) | 2421 (10.73%) |
| | R2L | 995 (0.79%) | 2885 (12.21%) |
| | U2R | 52 (0.041%) | 67 (0.88%) |
| | Total | 125,973 | 22,544 |
| UNSW-NB15 | Analysis | 2000 (1.14%) | 677 (0.82%) |
| | Backdoor | 1746 (0.99%) | 583 (0.70%) |
| | DoS | 12,264 (6.99%) | 4089 (4.96%) |
| | Exploit | 33,393 (19.04%) | 11,132 (13.52%) |
| | Normal | 56,000 (31.93%) | 37,000 (44.93%) |
| | Fuzzers | 18,184 (10.37%) | 6062 (7.36%) |
| | Generic | 40,000 (22.81%) | 18,871 (22.92%) |
| | Recon | 10,491 (5.98%) | 3496 (4.24%) |
| | Shellcode | 1133 (0.64%) | 378 (0.45%) |
| | Worm | 130 (0.07%) | 44 (0.05%) |
| | Total | 175,341 | 82,332 |


| Metric | Statistic | p-Value |
|---|---|---|
| Macro | 5.0971 | |
| Weighted | 6.0187 | |


| Experiment (seed) | Macro (Base) | Macro (RL) | Weighted (Base) | Weighted (RL) |
|---|---|---|---|---|
| 1062237619 | 0.6026 | 0.6426 | 0.7788 | 0.8050 |
| 2112 | 0.5797 | 0.6105 | 0.7804 | 0.8027 |
| 249651232 | 0.5651 | 0.6347 | 0.7680 | 0.8209 |
| 308118868 | 0.5754 | 0.5976 | 0.7709 | 0.8004 |
| 798844875 | 0.5865 | 0.6273 | 0.7857 | 0.8303 |
| Mean | 0.5818 | 0.6225 | 0.7768 | 0.8119 |


| True class / Predicted class | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 0.7786 | 0.0286 | 0.0000 | 0.0000 | 0.1928 |
| DoS | 0.0454 | 0.6848 | 0.0000 | 0.0000 | 0.2697 |
| Probe | 0.0004 | 0.0025 | 0.3500 | 0.0004 | 0.6467 |
| R2L | 0.0050 | 0.0050 | 0.0000 | 0.0450 | 0.9450 |
| U2R | 0.0044 | 0.0213 | 0.0038 | 0.0000 | 0.9704 |


| True class / Predicted class | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 0.7884 | 0.0320 | 0.0000 | 0.0000 | 0.1795 |
| DoS | 0.0582 | 0.8414 | 0.0000 | 0.0000 | 0.1004 |
| Probe | 0.0000 | 0.0795 | 0.3936 | 0.0007 | 0.5261 |
| R2L | 0.0000 | 0.3150 | 0.0050 | 0.0850 | 0.5950 |
| U2R | 0.0084 | 0.0232 | 0.0004 | 0.0001 | 0.9679 |

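
Because both NSL-KDD confusion matrices are row-normalized (each row sums to one up to rounding), their diagonals give per-class recall directly. The short sketch below compares them, assuming, as in the results tables, that the first matrix belongs to the base ensemble and the second to the RL model (the captions did not survive extraction). The averaging in `macro_recall` is unweighted, like a macro score, and equals balanced accuracy.

```python
from typing import Dict, List

CLASSES = ["Normal", "DoS", "Probe", "R2L", "U2R"]

# NSL-KDD confusion matrices reported above, row-normalized
# (rows: true class, columns: predicted class, same class order).
FIRST = [
    [0.7786, 0.0286, 0.0000, 0.0000, 0.1928],
    [0.0454, 0.6848, 0.0000, 0.0000, 0.2697],
    [0.0004, 0.0025, 0.3500, 0.0004, 0.6467],
    [0.0050, 0.0050, 0.0000, 0.0450, 0.9450],
    [0.0044, 0.0213, 0.0038, 0.0000, 0.9704],
]
SECOND = [
    [0.7884, 0.0320, 0.0000, 0.0000, 0.1795],
    [0.0582, 0.8414, 0.0000, 0.0000, 0.1004],
    [0.0000, 0.0795, 0.3936, 0.0007, 0.5261],
    [0.0000, 0.3150, 0.0050, 0.0850, 0.5950],
    [0.0084, 0.0232, 0.0004, 0.0001, 0.9679],
]


def per_class_recall(matrix: List[List[float]]) -> Dict[str, float]:
    """In a row-normalized matrix, each row's diagonal entry is that class's recall."""
    for row in matrix:
        # Sanity check: rows must sum to 1 up to four-decimal rounding.
        if abs(sum(row) - 1.0) > 1e-3:
            raise ValueError(f"row is not normalized: {row}")
    return {c: matrix[i][i] for i, c in enumerate(CLASSES)}


def macro_recall(matrix: List[List[float]]) -> float:
    """Unweighted mean of per-class recall (balanced accuracy)."""
    recalls = per_class_recall(matrix)
    return sum(recalls.values()) / len(recalls)


before, after = per_class_recall(FIRST), per_class_recall(SECOND)
change = {c: round(after[c] - before[c], 4) for c in CLASSES}
```

Read this way, the largest change is in DoS recall (0.6848 to 0.8414), recall rises slightly for Normal, Probe and R2L, and U2R recall is essentially unchanged.
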

| Metric | Statistic | p-Value |
|---|---|---|
| Macro | −6.7754 | |
| Weighted | 5.6754 | |


| Experiment (seed) | Macro (Base) | Macro (RL) | Weighted (Base) | Weighted (RL) |
|---|---|---|---|---|
| 1062237619 | 0.4913 | 0.4631 | 0.7914 | 0.7986 |
| 2112 | 0.5072 | 0.4696 | 0.7842 | 0.8058 |
| 249651232 | 0.5052 | 0.4426 | 0.7886 | 0.8042 |
| 308118868 | 0.4820 | 0.4479 | 0.7863 | 0.8038 |
| 798844875 | 0.4917 | 0.4388 | 0.7912 | 0.8017 |
| Mean | 0.4955 | 0.4524 | 0.7883 | 0.8028 |


| True class / Predicted class | An | Ba | DoS | Ex | No | Fu | Ge | Re | Sh | Wo |
|---|---|---|---|---|---|---|---|---|---|---|
| Analysis | 0.0340 | 0.1403 | 0.1152 | 0.5628 | 0.1433 | 0.0000 | 0.0044 | 0.0000 | 0.0000 | 0.0000 |
| Backdoors | 0.0395 | 0.1595 | 0.1269 | 0.4717 | 0.1818 | 0.0103 | 0.0051 | 0.0000 | 0.0051 | 0.0000 |
| DoS | 0.0408 | 0.1551 | 0.1215 | 0.6026 | 0.0399 | 0.0081 | 0.0127 | 0.0068 | 0.0122 | 0.0002 |
| Exploits | 0.0162 | 0.0567 | 0.0313 | 0.8232 | 0.0299 | 0.0034 | 0.0136 | 0.0159 | 0.0093 | 0.0005 |
| Normal | 0.0106 | 0.0325 | 0.0277 | 0.1727 | 0.5762 | 0.0008 | 0.1391 | 0.0028 | 0.0376 | 0.0000 |
| Fuzzers | 0.0000 | 0.0002 | 0.0028 | 0.0208 | 0.0024 | 0.9713 | 0.0009 | 0.0001 | 0.0014 | 0.0001 |
| Generic | 0.0126 | 0.0000 | 0.0012 | 0.0227 | 0.1848 | 0.0001 | 0.7746 | 0.0004 | 0.0034 | 0.0000 |
| Reconnaissance | 0.0051 | 0.0203 | 0.0074 | 0.1430 | 0.0066 | 0.0000 | 0.0034 | 0.8012 | 0.0129 | 0.0000 |
| Shellcode | 0.0000 | 0.0000 | 0.0132 | 0.1190 | 0.1005 | 0.0026 | 0.0238 | 0.0053 | 0.7355 | 0.0000 |
| Worms | 0.0000 | 0.0000 | 0.0227 | 0.4318 | 0.0682 | 0.0000 | 0.0000 | 0.0000 | 0.0455 | 0.4318 |


| True class / Predicted class | An | Ba | DoS | Ex | No | Fu | Ge | Re | Sh | Wo |
|---|---|---|---|---|---|---|---|---|---|---|
| Analysis | 0.0000 | 0.0000 | 0.6573 | 0.2659 | 0.0532 | 0.0000 | 0.0222 | 0.0015 | 0.0000 | 0.0000 |
| Backdoors | 0.0000 | 0.0000 | 0.6398 | 0.2281 | 0.0652 | 0.0103 | 0.0154 | 0.0360 | 0.0051 | 0.0000 |
| DoS | 0.0000 | 0.0000 | 0.5769 | 0.3656 | 0.0171 | 0.0078 | 0.0127 | 0.0081 | 0.0117 | 0.0000 |
| Exploits | 0.0000 | 0.0000 | 0.2182 | 0.7251 | 0.0149 | 0.0038 | 0.0213 | 0.0128 | 0.0040 | 0.0000 |
| Normal | 0.0000 | 0.0000 | 0.1501 | 0.1290 | 0.3895 | 0.0018 | 0.3098 | 0.0028 | 0.0170 | 0.0000 |
| Fuzzers | 0.0000 | 0.0000 | 0.0051 | 0.0192 | 0.0015 | 0.9717 | 0.0018 | 0.0002 | 0.0005 | 0.0000 |
| Generic | 0.0000 | 0.0000 | 0.0029 | 0.0208 | 0.1168 | 0.0001 | 0.8570 | 0.0005 | 0.0019 | 0.0000 |
| Reconnaissance | 0.0000 | 0.0000 | 0.0698 | 0.1287 | 0.0031 | 0.0003 | 0.0057 | 0.7898 | 0.0026 | 0.0000 |
| Shellcode | 0.0000 | 0.0000 | 0.0608 | 0.2037 | 0.0847 | 0.0000 | 0.0635 | 0.0053 | 0.5820 | 0.0000 |
| Worms | 0.0000 | 0.0000 | 0.0909 | 0.6818 | 0.0682 | 0.1136 | 0.0455 | 0.0000 | 0.0000 | 0.0000 |

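
The second UNSW-NB15 matrix has three all-zero columns. The check below assumes the matrix is row-normalized and read at its reported four-decimal precision: a class whose column is zero is, at that precision, never predicted, so its recall, and hence its F1-score, is zero. It also uses the test-set counts from the dataset table to measure how much of the test set those classes represent. If, as with the results tables, the second matrix corresponds to the RL model, per-class zeros on such small classes pull a macro average down far more than a support-weighted one, which is consistent with macro scores falling while weighted scores rise on UNSW-NB15. This is offered as one plausible contributor, not as an analysis taken from the paper.

```python
from typing import Dict, List, Sequence

LABELS = ["An", "Ba", "DoS", "Ex", "No", "Fu", "Ge", "Re", "Sh", "Wo"]

# Second UNSW-NB15 confusion matrix reported above, row-normalized
# (rows: true class, columns: predicted class, same order as LABELS).
SECOND_UNSW = [
    [0.0000, 0.0000, 0.6573, 0.2659, 0.0532, 0.0000, 0.0222, 0.0015, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.6398, 0.2281, 0.0652, 0.0103, 0.0154, 0.0360, 0.0051, 0.0000],
    [0.0000, 0.0000, 0.5769, 0.3656, 0.0171, 0.0078, 0.0127, 0.0081, 0.0117, 0.0000],
    [0.0000, 0.0000, 0.2182, 0.7251, 0.0149, 0.0038, 0.0213, 0.0128, 0.0040, 0.0000],
    [0.0000, 0.0000, 0.1501, 0.1290, 0.3895, 0.0018, 0.3098, 0.0028, 0.0170, 0.0000],
    [0.0000, 0.0000, 0.0051, 0.0192, 0.0015, 0.9717, 0.0018, 0.0002, 0.0005, 0.0000],
    [0.0000, 0.0000, 0.0029, 0.0208, 0.1168, 0.0001, 0.8570, 0.0005, 0.0019, 0.0000],
    [0.0000, 0.0000, 0.0698, 0.1287, 0.0031, 0.0003, 0.0057, 0.7898, 0.0026, 0.0000],
    [0.0000, 0.0000, 0.0608, 0.2037, 0.0847, 0.0000, 0.0635, 0.0053, 0.5820, 0.0000],
    [0.0000, 0.0000, 0.0909, 0.6818, 0.0682, 0.1136, 0.0455, 0.0000, 0.0000, 0.0000],
]

# UNSW-NB15 test-set class counts, from the dataset table.
TEST_COUNTS: Dict[str, int] = {
    "An": 677, "Ba": 583, "DoS": 4089, "Ex": 11132, "No": 37000,
    "Fu": 6062, "Ge": 18871, "Re": 3496, "Sh": 378, "Wo": 44,
}


def never_predicted(matrix: Sequence[Sequence[float]], labels: Sequence[str],
                    tol: float = 0.0) -> List[str]:
    """Labels whose predicted-class column is zero (within tol) in every row."""
    return [lab for j, lab in enumerate(labels)
            if all(abs(row[j]) <= tol for row in matrix)]


missing = never_predicted(SECOND_UNSW, LABELS)
# Diagonal of a row-normalized matrix = per-class recall.
recall = {lab: SECOND_UNSW[i][i] for i, lab in enumerate(LABELS)}
# Share of test instances that belong to the never-predicted classes.
share = sum(TEST_COUNTS[lab] for lab in missing) / sum(TEST_COUNTS.values())
```

Under these assumptions, Analysis, Backdoors and Worms (about 1.6% of the test instances) each score zero, which costs three of ten equal shares in a macro average but carries only about 1.6% of the weight in a support-weighted one.
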
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Mogollón-Gutiérrez, Ó.; Escudero García, D.; Sancho Núñez, J.C.; DeCastro-García, N. Reinforcement Learning for the Optimization of Adaptive Intrusion Detection Systems. Eng. Proc. 2026, 123, 2. https://doi.org/10.3390/engproc2026123002

