Explaining Intrusion Detection-Based Convolutional Neural Networks Using Shapley Additive Explanations (SHAP)
Abstract
1. Introduction
1.1. Explainable Artificial Intelligence
1.2. The Main Contributions of the Study
- Studying the XAI results for multiple ML and AI models in intrusion detection applications using the “KDD 99” and “Distilled Kitsune-2018” datasets.
- Interpreting the XAI results in depth to understand which dataset features are most useful to an ML model, based on each feature’s KDE characteristics.
- Presenting a methodology that can be applied before building an ML model and that helps select a suitable ML model for a given dataset based on its KDE plots (a minimal sketch of this step follows the list).
- Presenting a methodology to select the most important features of an AI model before applying XAI analysis to the ML model’s results.
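As a rough illustration of the KDE-inspection step mentioned above, the sketch below plots per-feature kernel density estimates for a handful of KDD 99 columns. The file path and the chosen columns are placeholders, not the exact configuration used in the study.

```python
# Minimal sketch, assuming a CSV export of the KDD 99 data with the listed
# columns; the path and column choice are illustrative only.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("kdd99_sample.csv")            # hypothetical path
features = ["duration", "src_bytes", "dst_bytes"]

fig, axes = plt.subplots(1, len(features), figsize=(12, 3))
for ax, col in zip(axes, features):
    # pandas' Series.plot.kde draws a Gaussian kernel density estimate;
    # the shape of these curves is what the proposed methodology inspects
    # before choosing an ML model for the dataset.
    df[col].plot.kde(ax=ax, title=col)
plt.tight_layout()
plt.show()
```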
2. Related Work
3. Proposed XAI Mechanism
4. Results and Discussion
4.1. Dataset Classification
4.2. Performance of ML Classifiers Using Different Datasets
4.3. Explaining XAI Results for the Models
5. Conclusions and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Abu Al-Haija, Q.; Al-Badawi, A. Attack-Aware IoT Network Traffic Routing Leveraging Ensemble Learning. Sensors 2022, 22, 241. [Google Scholar] [CrossRef] [PubMed]
- Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A Survey of Network-Based Intrusion Detection Data Sets. Comput. Secur. 2019, 86, 147–167. [Google Scholar]
- Le, T.-T.-H.; Kim, H.; Kang, H.; Kim, H. Classification and Explanation for Intrusion Detection System Based on Ensemble Trees and SHAP Method. Sensors 2022, 22, 1154. [Google Scholar] [CrossRef]
- Mahbooba, B.; Timilsina, M.; Sahal, R.; Serrano, M. Explainable artificial intelligence (XAI) to enhance trust management in intrusion detection systems using decision tree model. Complexity 2021, 2021, 6634811. [Google Scholar] [CrossRef]
- Srinivasu, P.N.; Sandhya, N.; Jhaveri, R.H.; Raut, R. From Blackbox to Explainable AI in Healthcare: Existing Tools and Case Studies. Mob. Inform. Syst. 2022, 2022, 8167821. [Google Scholar] [CrossRef]
- Abir, W.H.; Uddin, M.; Khanam, F.R.; Tazin, T.; Khan, M.M.; Masud, M.; Aljahdali, S. Explainable AI in Diagnosing and Anticipating Leukemia Using Transfer Learning Method. Comput. Intell. Neurosci. 2022, 2022, 5140148. [Google Scholar] [CrossRef] [PubMed]
- Dieber, J.; Kirrane, S. Why model why? Assessing the strengths and limitations of LIME. arXiv 2020, arXiv:2012.00093. [Google Scholar]
- Neupane, S.; Ables, J.; Anderson, W.; Mittal, S.; Rahimi, S.; Banicescu, I.; Seale, M. Explainable Intrusion Detection Systems (X-IDS): A Survey of Current Methods, Challenges, and Opportunities. arXiv 2022, arXiv:2207.06236. [Google Scholar]
- Islam, S.R.; Eberle, W.; Ghafoor, S.K.; Ahmed, M. Explainable artificial intelligence approaches: A survey. arXiv 2021, arXiv:2101.09429. [Google Scholar]
- Alahmed, S.; Alasad, Q.; Hammood, M.M.; Yuan, J.-S.; Alawad, M. Mitigation of Black-Box Attacks on Intrusion Detection Systems-Based ML. Computers 2022, 11, 115. [Google Scholar] [CrossRef]
- Gramegna, A.; Giudici, P. SHAP and LIME: An evaluation of discriminative power in credit risk. Front. Artif. Intell. 2021, 4, 752558. [Google Scholar] [CrossRef]
- Jesus, S.; Belém, C.; Balayan, V.; Bento, J.; Saleiro, P.; Bizarro, P.; Gama, J. How can I choose an explainer? An application-grounded evaluation of post-hoc explanations. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event Canada, 3–10 March 2021; pp. 805–815. [Google Scholar]
- Zhang, C.A.; Cho, S.; Vasarhelyi, M. Explainable Artificial Intelligence (XAI) in auditing. Int. J. Account. Inf. Syst. 2022, 46, 100572. [Google Scholar] [CrossRef]
- Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Lundberg, S. An Introduction to Explainable AI with Shapley Values. Revision 45b85c18. 2018. Available online: https://shap.readthedocs.io/en/latest/overviews.html (accessed on 1 June 2022).
- Ribeiro, M.T. Local Interpretable Model-Agnostic Explanations (Lime). Revision 533368b7. 2016. Available online: https://lime-ml.readthedocs.io/en/latest/ (accessed on 22 May 2022).
- Ahmed, I.; Kumara, I.; Reshadat, V.; Kayes, A.S.M.; van den Heuvel, W.-J.; Tamburri, D.A. Travel Time Prediction and Explanation with Spatio-Temporal Features: A Comparative Study. Electronics 2022, 11, 106. [Google Scholar] [CrossRef]
- Velmurugan, M.; Ouyang, C.; Moreira, C.; Sindhgatta, R. Evaluating Fidelity of Explainable Methods for Predictive Process Analytics. In Intelligent Information Systems; Nurcan, S., Korthaus, A., Eds.; Springer: Cham, Switzerland, 2021; Volume 424. [Google Scholar] [CrossRef]
- Kumara, I.; Ariz, M.H.; Chhetri, M.B.; Mohammadi, M.; van Den Heuvel, W.-J.; Tamburri, D.A. FOCloud: Feature Model Guided Performance Prediction and Explanation for Deployment Configurable Cloud Applications. In Proceedings of the 2022 IEEE World Congress on Services (SERVICES), Barcelona, Spain, 10–16 July 2022. [Google Scholar] [CrossRef]
- Roberts, C.V.; Elahi, E.; Chandrashekar, A. On the Bias-Variance Characteristics of LIME and SHAP in High Sparsity Movie Recommendation Explanation Tasks. arXiv 2022, arXiv:2206.04784. [Google Scholar]
- Panati, C.; Wagner, S.; Brüggenwirth, S. Feature Relevance Evaluation using Grad-CAM, LIME and SHAP for Deep Learning SAR Data Classification. In Proceedings of the 2022 23rd International Radar Symposium (IRS), Gdansk, Poland, 12–14 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 457–462. [Google Scholar]
- Mittelstadt, B.; Russell, C.; Wachter, S. Explaining Explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19), Atlanta, GA, USA, 29–31 January 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 279–288. [Google Scholar] [CrossRef]
- Páez, A. The Pragmatic Turn in Explainable Artificial Intelligence (XAI). Minds Mach. 2019, 29, 441–459. [Google Scholar] [CrossRef] [Green Version]
- de Bruijn, H.; Warnier, M.; Janssen, M. The perils and pitfalls of explainable AI: Strategies for explaining algorithmic decision-making. Gov. Inf. Q. 2022, 39, 101666. [Google Scholar] [CrossRef]
- Houda, Z.A.E.; Brik, B.; Khoukhi, L. Why Should I Trust Your IDS?: An Explainable Deep Learning Framework for Intrusion Detection Systems in the Internet of Things Networks. IEEE Open J. Commun. Soc. 2022, 3, 1164–1176. [Google Scholar] [CrossRef]
- O’Kane, P.; Sezer, S.; McLaughlin, K.; Im, E.G. SVM Training Phase Reduction Using Dataset Feature Filtering for Malware Detection. IEEE Trans. Inf. Forensics Secur. 2013, 8, 500–509. [Google Scholar] [CrossRef] [Green Version]
- Itani, S.; Lecron, F.; Fortemps, P. A one-class classification decision tree based on kernel density estimation. Appl. Soft Comput. 2020, 91, 106250. [Google Scholar] [CrossRef] [Green Version]
- Zebin, T.; Rezvy, S.; Luo, Y. An Explainable AI-Based Intrusion Detection System for DNS over HTTPS (DoH) Attacks. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2339–2349. [Google Scholar] [CrossRef]
- Wali, S.; Khan, I. Explainable signature-based machine learning approach for identification of faults in grid-connected photovoltaic systems. arXiv 2021, arXiv:2112.14842. [Google Scholar]
- Michalopoulos, P. Comparing Explanations for Black-Box Intrusion Detection Systems. Master’s Thesis, Mathematics and Computer Science Department, Eindhoven University of Technology, Eindhoven, The Netherlands, 24 January 2020. [Google Scholar]
- Schlegel, U.; Arnout, H.; El-Assady, M.; Oelke, D.; Keim, D.A. Towards a rigorous evaluation of XAI methods on time series. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA; pp. 4197–4201. [Google Scholar]
- Durán, J.M.; Jongsma, K.R. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. J. Med. Ethics 2021, 47, 329–335. [Google Scholar] [CrossRef]
- Khedkar, S.P.; Ramalingam, A.C. Classification and Analysis of Malicious Traffic with Multi-layer Perceptron Model. Ingénierie Syst. d’Inf. 2021, 26, 303–310. [Google Scholar] [CrossRef]
- Abuomar, O.; Sogbe, P. Classification and Detection of Chronic Kidney Disease (CKD) Using Machine Learning Algorithms. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 9–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–8. [Google Scholar] [CrossRef]
- Hasan, M.J.; Sohaib, M.; Kim, J.M. An Explainable AI-Based Fault Diagnosis Model for Bearings. Sensors 2021, 21, 4070. [Google Scholar] [CrossRef]
- Mane, S.; Rao, D. Explaining Network Intrusion Detection System Using Explainable AI Framework. arXiv 2021, arXiv:2103.07110. [Google Scholar]
- Dang, Q.-V. Improving the performance of the intrusion detection systems by the machine learning explainability. Int. J. Web Inf. Syst. 2021, 17, 537–555. [Google Scholar] [CrossRef]
- Devarakonda, A.; Sharma, N.; Saha, P.; Ramya, S. Network intrusion detection: A comparative study of four classifiers using the NSL-KDD and KDD’99 datasets. J. Phys. Conf. Ser. 2022, 2161, 012043. [Google Scholar] [CrossRef]
- Zhang, C.; Jia, D.; Wang, L.; Wang, W.; Liu, F.; Yang, A. Comparative Research on Network Intrusion Detection Methods Based on Machine Learning. Comput. Secur. 2022, 121, 102861. [Google Scholar] [CrossRef]
- Abu Al-Haija, Q.; Zein-Sabatto, S. An Efficient Deep-Learning-Based Detection and Classification System for Cyber-Attacks in IoT Communication Networks. Electronics 2020, 9, 2152. [Google Scholar] [CrossRef]
- Sathianarayanan, B.; Singh Samant, Y.C.; Conjeepuram Guruprasad, P.S.; Hariharan, V.B.; Manickam, N.D. Feature-based augmentation and classification for tabular data. CAAI Trans. Intell. Technol. 2022, 7, 481–491. [Google Scholar] [CrossRef]
- Ahsan, H. A Study on How Data Quality Influences Machine Learning Predictability and Interpretability for Tabular Data. Ph.D. Dissertation, Youngstown State University, Youngstown, OH, USA, 2022. [Google Scholar]
- Montavon, G.; Kauffmann, J.; Samek, W.; Müller, K.R. Explaining the Predictions of Unsupervised Learning Models. In xxAI—Beyond Explainable AI; Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K.R., Samek, W., Eds.; Springer: Cham, Switzerland, 2022; Volume 13200. [Google Scholar] [CrossRef]
- Patil, S.; Varadarajan, V.; Mazhar, S.M.; Sahibzada, A.; Ahmed, N.; Sinha, O.; Kumar, S.; Shaw, K.; Kotecha, K. Explainable Artificial Intelligence for Intrusion Detection System. Electronics 2022, 11, 3079. [Google Scholar] [CrossRef]
- Hussein, M.A. Performance Analysis of different Machine Learning Models for Intrusion Detection Systems. J. Eng. 2022, 28, 61–91. [Google Scholar]
- Rawat, S.; Srinivasan, A.; Ravi, V.; Ghosh, U. Intrusion detection systems using classical machine learning techniques vs. integrated unsupervised feature learning and deep neural network. Internet Technol. Lett. 2022, 5, e232. [Google Scholar] [CrossRef]
- Bertoli, G.D.C.; Junior, L.A.P.; Saotome, O.; Dos Santos, A.L.; Verri, F.A.N.; Marcondes, C.A.C.; Barbieri, S.; Rodrigues, M.S.; De Oliveira, J.M.P. An End-to-End Framework for Machine Learning-Based Network Intrusion Detection System. IEEE Access 2021, 9, 106790–106805. [Google Scholar] [CrossRef]
- Mahbooba, B.; Sahal, R.; Alosaimi, W.; Serrano, M. Trust in intrusion detection systems: An investigation of performance analysis for machine learning and deep learning models. Complexity 2021, 2021, 5538896. [Google Scholar] [CrossRef]
- Yahalom, R.; Steren, A.; Nameri, Y.; Roytman, M. Small Versions of the Extracted Features Datasets for 9 Attacks on IP Camera and IoT Networks Generated by Mirsky et al., Mendeley Data. 2018. Available online: https://data.mendeley.com/datasets/zvsk3k9cf2/1 (accessed on 1 December 2021).
- Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities, and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef] [Green Version]
- Das, A.; Paul, R. Opportunities and challenges in explainable artificial intelligence (XAI): A survey. arXiv 2020, arXiv:2006.11371. [Google Scholar]
- Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
- Hoffman, R.R.; Klein, G.; Mueller, S.T. Explaining explanation for “explainable AI”. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2018, 62, 197–201. [Google Scholar] [CrossRef]
Study | XAI Tool | Dataset | Main Findings of the Study |
---|---|---|---|
[27] | None | Six different medical datasets | They presented an approach that uses kernel density estimation to split a data subset based on one or several intervals of interest to build the one-class classification model. |
[30] | LIME, SHAP, Anchors, and LORE | SimpleWeb dataset and ISCX IDS 2012 | They explored the feasibility of applying different XAI techniques in the intrusion detection domain. They used the results from the XAI systems to create a white-box ML model. |
[36] | LIME, SHAP, and Others | NSL-KDD | They used a deep neural network for network intrusion detection and also proposed an XAI framework to make the stages of the ML pipeline more interpretable. |
[26] | - | A dataset built from opcode density histograms | They showed that a subset of opcodes can be used to detect malware and that filtering the features shortens the SVM training phase. |
[31] | SHAP, DeepLIFT, LRP, and Saliency Maps | FordA, FordB, ElectricDevices, and seven other relevant datasets | They showed that XAI methods designed for images and text also work on time series data by assigning relevance scores to individual time points. |
[4] | Decision Tree importance algorithm | KDD | They used the XAI concept to improve the decision tree model in the area of IDS. They improved simple decision tree algorithms that mimic a human decision-making approach. |
[3] | SHAP | AWID-CLS-R | A two-stage classification model was proposed to detect intrusions in a Wi-Fi network. The first stage uses all the dataset features to make predictions, while the second stage uses only the most important features identified in stage one. |
Feature Name | Type | Dataset |
---|---|---|
duration | continuous | DS1 |
protocol_type | symbolic | DS1 |
Service | symbolic | DS1 |
src_bytes | continuous | DS1 |
dst_bytes | continuous | DS1 |
Flag | symbolic | DS2 |
Land | symbolic | DS1 |
wrong_fragment | continuous | DS1 |
Urgent | continuous | DS1 |
Hot | continuous | DS1 |
num_failed_logins | continuous | DS2 |
logged_in | symbolic | DS2 |
num_compromised | continuous | DS1 |
root_shell | continuous | DS1 |
su_attempted | continuous | DS1 |
num_root | continuous | DS1 |
num_file_creations | continuous | DS1 |
num_shells | continuous | DS1 |
num_access_files | continuous | DS1 |
num_outbound_cmds | continuous | DS1 |
is_hot_login | symbolic | DS1 |
is_guest_login | symbolic | DS1 |
Count | continuous | DS2 |
srv_count | continuous | DS2 |
serror_rate | continuous | DS2 |
srv_serror_rate | continuous | DS2 |
rerror_rate | continuous | DS2 |
srv_rerror_rate | continuous | DS2 |
same_srv_rate | continuous | DS2 |
diff_srv_rate | continuous | DS2 |
srv_diff_host_rate | continuous | DS2 |
dst_host_count | continuous | DS2 |
dst_host_srv_count | continuous | DS2 |
dst_host_same_srv_rate | continuous | DS2 |
dst_host_diff_srv_rate | continuous | DS1 |
dst_host_same_src_port_rate | continuous | DS2 |
dst_host_srv_diff_host_rate | continuous | DS2 |
dst_host_serror_rate | continuous | DS2 |
dst_host_srv_serror_rate | continuous | DS2 |
dst_host_rerror_rate | continuous | DS2 |
dst_host_srv_rerror_rate | continuous | DS2 |
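The Type column above distinguishes symbolic from continuous features, which in practice implies different preprocessing before any of the classifiers discussed below can be trained. The sketch that follows is an illustrative preprocessing pipeline rather than the paper's exact setup: the column names are lowercase variants of the feature names in the table, and the CSV path is a placeholder.

```python
# Minimal preprocessing sketch, assuming the KDD 99 data is available as a CSV
# whose columns match the feature names in the table above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

symbolic = ["protocol_type", "service", "flag", "land", "logged_in"]
continuous = ["duration", "src_bytes", "dst_bytes", "count", "srv_count"]

preprocess = ColumnTransformer([
    # symbolic features become one-hot indicator columns
    ("cat", OneHotEncoder(handle_unknown="ignore"), symbolic),
    # continuous features are standardized to zero mean, unit variance
    ("num", StandardScaler(), continuous),
])

df = pd.read_csv("kddcup_sample.csv")           # hypothetical path
X = preprocess.fit_transform(df[symbolic + continuous])
```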
Model | Can Be Used with Tabular Data | Can Be Explained with SHAP | Execution Time |
---|---|---|---|
Random Forest | YES | YES | Relatively slow |
Decision tree | YES | YES | Relatively slow |
Naïve Bayes | YES | YES | Fast |
Logistic Regression | YES | YES | Fast |
CNN | YES | YES | Depends on the architecture used |
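To make the “Can Be Explained with SHAP” and “Execution Time” columns more concrete, the sketch below pairs representative models with typical SHAP explainers. The synthetic data, model objects, and sample counts are assumptions for illustration only; the CNN case is noted in a comment because its cost depends on the architecture used.

```python
# Hedged sketch: typical SHAP explainer choices for the model families listed
# in the table above, run on small synthetic tabular data.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((200, 10))                  # placeholder tabular features
y_train = rng.integers(0, 2, 200)                # placeholder binary labels
X_test = rng.random((20, 10))

# Tree ensembles: TreeExplainer is exact and relatively fast.
rf = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)
rf_shap = shap.TreeExplainer(rf).shap_values(X_test)

# Linear/probabilistic models: the model-agnostic KernelExplainer works but is slower.
lr = LogisticRegression().fit(X_train, y_train)
background = shap.sample(X_train, 50)            # small background set to keep it cheap
lr_shap = shap.KernelExplainer(lr.predict_proba, background).shap_values(X_test)

# For a CNN (Keras/PyTorch), shap.DeepExplainer or shap.GradientExplainer is the
# usual route; its runtime depends on the network architecture, as the table notes.
```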
 | Mirai | Syn DoS | Video Injection |
---|---|---|---|
Features | 116 | 115 | 115 |
Dense (DS1) | 76 | 23 | 17 |
Relaxed (DS2) | 40 | 23 | 17 |
Reference | The Contribution to Assessing and Evaluating XAI |
---|---|
[51] | They explored why XAI is important and categorized XAI methods by scope, methodology, usage, and nature. The study focused on explaining deep neural network algorithms. |
[52] | Evaluated trending XAI methods and showed how they expose the contents of the internal layers of ML models. |
[2] | Examined what we want from XAI models and argued that ML models can be trusted more with the aid of XAI. |
[22] | Analyzed the information provided by different XAI methods and discussed some shortcomings of current XAI methods. |
[53] | Presented a scale for evaluating XAI model explanations in human–machine work systems. |
[23] | Discussed some shortcomings of current XAI methods. |
[24] | Presented seven strategies that might aid in trusting XAI. |
Present Study | Presents a method to explain SHAP results for different ML models based on the KDE plots of the features’ data. |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).