Deep Q-Learning Based Reinforcement Learning Approach for Network Intrusion Detection
Abstract
1. Introduction
- We introduce a new generation of network intrusion detection methods that combines Q-learning-based reinforcement learning with a deep feed-forward neural network. The proposed model continuously auto-learns from the network environment it interacts with and can detect different types of network intrusions; this self-learning capability allows it to keep improving its detection performance (an illustrative sketch of such an agent follows this list).
- We provide details of the best practices involved in fine-tuning the hyperparameters of deep-learning-based reinforcement learning (e.g., learning rate, discount factor) so that the agent self-learns and interacts with the underlying network environment more effectively, yielding better-optimized network intrusion detection.
- Our experimental results, based on the NSL-KDD dataset, demonstrate that the proposed DQL model is highly effective in detecting different intrusion classes and outperforms similar machine learning approaches, achieving more than 90% accuracy on the classification tasks across the different network intrusion classes.
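As an illustration of the combination described in the first bullet, the following minimal PyTorch sketch pairs a small feed-forward Q-network with an epsilon-greedy policy over the five NSL-KDD traffic classes. The class and function names (`QNetwork`, `choose_action`), the layer sizes, and the hidden-unit count are assumptions for illustration only, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Feed-forward network mapping a 41-feature NSL-KDD record to one
    Q-value per action, where each action corresponds to a traffic class."""

    def __init__(self, n_features: int = 41, n_actions: int = 5, n_hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_actions),   # Q(s, a) for each candidate label
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def choose_action(q_net: QNetwork, state: torch.Tensor,
                  epsilon: float, n_actions: int = 5) -> int:
    """Epsilon-greedy policy: explore with probability epsilon, else exploit."""
    if torch.rand(1).item() < epsilon:
        return int(torch.randint(0, n_actions, (1,)).item())
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```

Treating each class label as an action is what lets the detector learn from reward signals rather than from direct supervision alone.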
2. Related Work
3. Background
3.1. Reinforcement Learning
3.2. Feed Forward Neural Network
4. Dataset
5. Anomaly Detection Using Deep Q Learning
5.1. Deep Q-Networks
5.2. Deep Reinforcement Learning Concepts
5.2.1. Environment
5.2.2. Agent
5.2.3. States
5.2.4. Actions
5.2.5. Rewards
5.3. Deep Q-Learning Process
Algorithm 1. Deep Q-learning agent training based on the NSL-KDD environment
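Since only the caption of Algorithm 1 appears here, the following is a generic DQN training loop in its spirit, not the paper's exact algorithm. The +1/-1 reward scheme and the re-use of the sampled record as its own next state are assumptions; the default hyperparameter values mirror the parameter table in Section 6.1.

```python
import torch
import torch.nn as nn

def train_dqn(q_net: nn.Module, states: torch.Tensor, labels: torch.Tensor,
              num_episodes: int = 200, num_iterations: int = 100,
              batch_size: int = 500, gamma: float = 0.001,
              epsilon: float = 0.9, decay: float = 0.99, lr: float = 1e-3) -> nn.Module:
    """Train a Q-network on NSL-KDD records.

    `states` holds preprocessed feature vectors and `labels` the class index
    (0-4) of each record.
    """
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    n_records = states.shape[0]
    for _episode in range(num_episodes):
        for _step in range(num_iterations):
            # Environment step: sample a batch of records as the current states.
            idx = torch.randint(0, n_records, (batch_size,))
            batch_states, batch_labels = states[idx], labels[idx]
            q_values = q_net(batch_states)          # Q(s, a) for every action
            n_actions = q_values.shape[1]

            # Epsilon-greedy action selection (explore vs. exploit).
            explore = torch.rand(batch_size) < epsilon
            random_actions = torch.randint(0, n_actions, (batch_size,))
            actions = torch.where(explore, random_actions, q_values.argmax(dim=1))

            # Assumed reward scheme: +1 for a correct classification, -1 otherwise.
            rewards = (actions == batch_labels).float() * 2.0 - 1.0

            # One-step Bellman target; the sampled record stands in for the next
            # state, a simplification of replay-buffer / target-network machinery.
            with torch.no_grad():
                next_q = q_net(batch_states).max(dim=1).values
            targets = rewards + gamma * next_q

            predicted = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
            loss = loss_fn(predicted, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            epsilon *= decay   # reduce exploration a little at each iteration
    return q_net
```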
6. Evaluation of the DQL Model
6.1. Experiment Setup and Parameters
6.2. Performance Metrics
6.2.1. Accuracy
6.2.2. Precision
6.2.3. Recall
6.2.4. F1 Score
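The subsections above only name the metrics, so the standard confusion-matrix definitions (TP/TN = true positives/negatives, FP/FN = false positives/negatives) are reproduced here for reference:

```latex
\begin{align}
  \text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
  \text{Precision} &= \frac{TP}{TP + FP} \\
  \text{Recall}    &= \frac{TP}{TP + FN} \\
  \text{F1 score}  &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```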
6.3. Performance Evaluation
6.4. Comparison with Other Approaches
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Categories | Notation | Definitions | Samples #
---|---|---|---
Normal | N | Normal activities based on the features | 148,517
DoS | D | Attacker tries to deny legitimate users access to a service (Denial of Service attack) | 53,385
Probe | P | Attacker scans the target network to collect information such as vulnerabilities | 14,077
U2R | U | Attacker with local access to the victim's machine tries to gain root (superuser) privileges | 119
R2L | R | Attacker without a local account sends packets to the target host to gain access | 3882
F# | Feature Name | F# | Feature Name | F# | Feature Name
---|---|---|---|---|---
F1 | Duration | F15 | Su attempted | F29 | Same srv rate
F2 | Protocol_type | F16 | Num root | F30 | Diff srv rate
F3 | Service | F17 | Num file creations | F31 | Srv diff host rate
F4 | Flag | F18 | Num shells | F32 | Dst host count
F5 | Src bytes | F19 | Num access files | F33 | Dst host srv count
F6 | Dst bytes | F20 | Num outbound cmds | F34 | Dst host same srv rate
F7 | Land | F21 | Is host login | F35 | Dst host diff srv rate
F8 | Wrong fragment | F22 | Is guest login | F36 | Dst host same src port rate
F9 | Urgent | F23 | Count | F37 | Dst host srv diff host rate
F10 | Hot | F24 | Srv count | F38 | Dst host serror rate
F11 | Num_failed_logins | F25 | Serror rate | F39 | Dst host srv serror rate
F12 | Logged_in | F26 | Srv serror rate | F40 | Dst host rerror rate
F13 | Num compromised | F27 | Rerror rate | F41 | Dst host srv rerror rate
F14 | Root shell | F28 | Srv rerror rate | F42 | Class label
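To make the two tables above concrete, the following sketch shows one plausible preprocessing pipeline: load an NSL-KDD file, one-hot encode the three symbolic features (F2, F3, F4), min-max scale everything, and collapse the raw attack names into the five categories listed earlier. The file name `KDDTrain+.txt`, the column indices, and the deliberately partial attack-to-category mapping are illustrative assumptions, not part of the paper.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Assumed file name for the NSL-KDD training split; the raw file has no header row.
df = pd.read_csv("KDDTrain+.txt", header=None)
df = df.iloc[:, :42]            # keep the 41 features plus the class label (F1-F42)

categorical = [1, 2, 3]         # protocol_type, service, flag (F2-F4)
label_col = 41                  # class label (F42)

# One-hot encode the symbolic features, then min-max scale everything to [0, 1].
features = pd.get_dummies(df.drop(columns=[label_col]), columns=categorical)
features = pd.DataFrame(MinMaxScaler().fit_transform(features), columns=features.columns)

# Collapse fine-grained attack names into the five categories of the earlier table.
# The mapping below is intentionally partial; a full mapping would cover every
# attack name that appears in the dataset.
category_of = {"normal": "Normal", "neptune": "DoS", "smurf": "DoS",
               "satan": "Probe", "nmap": "Probe",
               "guess_passwd": "R2L", "buffer_overflow": "U2R"}
labels = df[label_col].map(category_of)   # unmapped names become NaN
```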
Parameters | Description | Values
---|---|---
num-episode | Number of episodes used to train the DQN | 200
num-iteration | Number of iterations per episode used to improve the Q-values in the DQN | 100
hidden_layers | Number of hidden layers: setting weights and producing outputs based on the activation function | 2
num_units | Number of hidden units per layer to improve the quality of prediction and training |
Initial weight value | Normal initialization | Normal
Activation function | Non-linear activation function | ReLU
Epsilon | Degree of randomness when performing actions | 0.9
Decay rate | Factor reducing the randomness probability at each iteration | 0.99
Gamma | Discount factor for the target prediction | 0.001
Batch-size (bs) | Number of NSL-KDD records fetched per processing batch | 500
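For convenience, the parameter table can be expressed as a plain configuration dictionary, e.g., for passing to a training routine such as the loop sketched after Algorithm 1. The dictionary keys are illustrative, and the number of hidden units is left unset because the table does not state a value.

```python
# Hyperparameters from the table above as a configuration dictionary (key names assumed).
dqn_config = {
    "num_episodes": 200,     # episodes used to train the DQN
    "num_iterations": 100,   # Q-value update steps per episode
    "hidden_layers": 2,      # fully connected layers with ReLU activations
    "num_units": None,       # hidden units per layer (not specified in the table)
    "init": "normal",        # normal weight initialization
    "activation": "relu",    # non-linear activation function
    "epsilon": 0.9,          # initial exploration probability
    "decay_rate": 0.99,      # multiplicative epsilon decay per iteration
    "gamma": 0.001,          # discount factor for the target prediction
    "batch_size": 500,       # NSL-KDD records fetched per update
}
```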
Metric | Discount Factors | ||
---|---|---|---|
Precision | 0.7784 | 0.6812 | 0.6731 |
Recall | 0.7676 | 0.7466 | 0.758 |
F1 score | 0.8141 | 0.7063 | 0.6911 |
Accuracy | 0.7807 | 0.7473 | 0.7578 |
Metric | Normal | DoS | Probe | R2L
---|---|---|---|---
Accuracy | 0.8094 | 0.9247 | 0.9463 | 0.8848
F1 score | 0.8084 | 0.9237 | 0.9449 | 0.8370
Precision | 0.8552 | 0.9249 | 0.9441 | 0.8974
Recall | 0.8093 | 0.83 | 0.9247 | 0.8848
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).