BAE: Anomaly Detection Algorithm Based on Clustering and Autoencoder
Abstract
1. Introduction
2. Application of Algorithm
3. Related Works
4. Methods
4.1. Overall Framework of Proposed Method
4.2. BIRCH for Data Pre-Classification
4.3. Autoencoder for Anomaly Detection
5. Experimental Analysis
5.1. Datasets
- UNSW-NB15: UNSW-NB15 is a public network security dataset that contains both normal network traffic and network attack traffic. Its raw network packets were created with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS).
- CICIDS 2017: The CICIDS 2017 dataset is a network security dataset released by the Canadian Institute for Cybersecurity (CIC) in 2018. It records traffic from various network attacks, captured through a mounted network terminal.
- NSL-KDD: NSL-KDD is a dataset proposed to address some inherent problems of the KDD Cup 99 dataset.
5.2. Experiments
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Methods | Accuracy (%) | Recall (%) | Precision (%) | F-Score (%) |
---|---|---|---|---|
Logistic Regression | 93.892 | 88.204 | 96.884 | 92.340 |
SVM | 92.472 | 99.897 | 84.781 | 91.72 |
Decision Tree | 95.302 | 99.929 | 89.933 | 94.668 |
Autoencoder | 81.140 | 76.490 | 100.00 | 86.679 |
BAE (with label:0) | 99.329 | 99.245 | 100.00 | 99.621 |
BAE (with label:1) | 93.207 | 93.205 | 100.00 | 96.483 |
BAE (with label:2) | 94.590 | 94.572 | 100.00 | 97.210 |
BAE (with label:3) | 96.705 | 86.277 | 97.000 | 91.324 |
BAE (average) | 95.958 | 93.325 | 99.250 | 96.160 |
Methods | Accuracy (%) | Recall (%) | Precision (%) | F-Score (%) |
---|---|---|---|---|
Logistic Regression | 89.516 | 80.438 | 62.031 | 70.046 |
SVM | 90.755 | 70.117 | 69.492 | 69.803 |
Decision Tree | 89.708 | 85.419 | 61.73 | 71.668 |
Autoencoder | 82.204 | 100.00 | 45.665 | 62.698 |
BAE (with label:0) | 88.603 | 100.00 | 40.636 | 57.789 |
BAE (with label:1) | 90.002 | 100.00 | 81.985 | 90.101 |
BAE (with label:2) | 84.252 | 91.401 | 100.00 | 95.507 |
BAE (average) | 87.619 | 97.137 | 74.207 | 81.132 |
Methods | Accuracy (%) | Recall (%) | Precision (%) | F-Score (%) |
---|---|---|---|---|
Logistic Regression | 89.888 | 95.499 | 88.204 | 91.707 |
SVM | 88.312 | 89.537 | 90.406 | 89.969 |
Decision Tree | 90.254 | 85.276 | 97.194 | 91.107 |
Autoencoder | 89.848 | 83.236 | 91.541 | 87.191 |
BAE (with label:0) | 89.716 | 88.260 | 99.996 | 93.762 |
BAE (with label:1) | 95.171 | 86.702 | 93.269 | 89.866 |
BAE (with label:2) | 92.854 | 83.911 | 98.551 | 90.643 |
BAE (average) | 92.580 | 86.291 | 97.272 | 91.424 |
Methods | Accuracy (%) | Recall (%) | Precision (%) | F-Score (%) |
---|---|---|---|---|
Logistic Regression | 84.428 | 100.0 | 78.473 | 87.938 |
SVM | 81.915 | 69.860 | 97.599 | 81.432 |
Decision Tree | 92.932 | 89.494 | 97.872 | 93.496 |
Autoencoder | 74.421 | 83.864 | 64.486 | 72.910 |
BAE (with label:0) | 95.160 | 96.561 | 98.457 | 97.499 |
BAE (with label:1) | 86.838 | 82.958 | 100.0 | 90.685 |
BAE (with label:2) | 81.644 | 84.639 | 70.977 | 77.208 |
BAE (average) | 87.880 | 88.053 | 89.811 | 88.464 |
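The F-Score column in the tables above appears consistent with the harmonic mean of precision and recall, and the BAE (average) rows with a plain arithmetic mean over the per-label rows. The sketch below re-derives two entries of the first table under that assumption. The helper names `f_score` and `mean` are illustrative and do not come from the paper.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Logistic Regression row of the first table: precision 96.884, recall 88.204.
lr_f = f_score(96.884, 88.204)

# Accuracy values of the four "BAE (with label:k)" rows of the first table.
bae_acc = mean([99.329, 93.207, 94.590, 96.705])

# Compare with the reported 92.340 (F-Score) and 95.958 (BAE average accuracy).
print(round(lr_f, 3), round(bae_acc, 3))
```

Both values agree with the reported entries to within rounding, which suggests that the averaged BAE rows weight each cluster label equally rather than by cluster size.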
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, D.; Nie, M.; Chen, D. BAE: Anomaly Detection Algorithm Based on Clustering and Autoencoder. Mathematics 2023, 11, 3398. https://doi.org/10.3390/math11153398