Improvement of Distributed Denial of Service Attack Detection through Machine Learning and Data Processing
Abstract
1. Introduction
- In machine learning, data processing is considered essential for achieving good results with any model. However, some studies did not detail their data preprocessing or did not take outliers into account [16,18,19], and others [1,13,17,20,22] did not specify how many records were affected by the outlier processing.
- A crucial aspect of working with machine learning algorithms is the appropriate selection of the hyperparameters a model should use. In many studies, this selection was made arbitrarily [20,21,22]. In other cases, the hyperparameters used were not justified, which makes reimplementation difficult, or the authors only mentioned the algorithm used without specifying its hyperparameters [17].
- The distribution of data between the training and validation sets is a critical factor in evaluating algorithm performance. However, [18] did not specify the proportion of data assigned to each set. Additionally, other studies used only a subset of the original dataset [14,21], which can lead to data leakage.
- Another fundamental aspect in DDoS attack detection is the response time to these events. In most of the reviewed studies, this factor was not addressed, except for [1], which only specified the inference time.
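To make the first point concrete, the sketch below filters a single numeric column with Tukey's interquartile-range (IQR) rule and reports how many records were removed. The IQR rule, the factor k = 1.5, and the helper name `filter_outliers_iqr` are assumptions for illustration; the studies cited above may have handled outliers differently.

```python
from statistics import quantiles


def filter_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] and report how many were removed.

    Hypothetical helper: the IQR rule and k = 1.5 are illustrative assumptions.
    """
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
    # (default 'exclusive' interpolation method)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = len(values) - len(kept)
    return kept, removed


if __name__ == "__main__":
    # Toy packet-rate column containing two extreme records
    column = [10, 12, 11, 13, 12, 11, 10, 12, 900, 1000]
    kept, removed = filter_outliers_iqr(column)
    print(f"kept {len(kept)} records, removed {removed} ({removed / len(column):.1%})")
```

Returning the removed count alongside the cleaned data is what lets a reader judge how much the preprocessing changed the dataset.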
2. Materials and Methods
2.1. Materials
2.2. Method
2.2.1. Data Preprocessing
2.2.2. Feature Selection
2.2.3. Data Normalization
2.2.4. Hyperparameter Tuning
- D: data.
- θ: hyperparameters.
- P(D | θ): likelihood of the data given the hyperparameters.
- p(y | x, θ): likelihood of the observation y given a point x in the search space and the hyperparameters θ.
- P(θ | D): posterior probability of the hyperparameters given the data.
- P(θ): prior of the hyperparameters.
- K: kernel function.
- h: bandwidth.
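These definitions belong to a Bayesian formulation of hyperparameter search combined with kernel density estimation. The block below sketches the standard relations, assuming θ denotes the hyperparameters; the additional symbols (x, y, the threshold y*, γ, ℓ, g, and the n observed points x_i) follow the Tree-structured Parzen Estimator (TPE) of Bergstra et al. [27] and are not a reproduction of the authors' own equations.

```latex
% Bayes' rule for the hyperparameters \theta given the data D
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}

% Kernel density (Parzen) estimate from n observed points x_1, \dots, x_n
\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)

% TPE: split past trials at the loss threshold y^*, with \gamma = P(y < y^*)
p(x \mid y) =
\begin{cases}
  \ell(x) & \text{if } y < y^{*} \\
  g(x)    & \text{if } y \ge y^{*}
\end{cases}
\qquad
\mathrm{EI}_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{\ell(x)} (1 - \gamma) \right)^{-1}
```

Because the expected improvement grows with ℓ(x)/g(x), each TPE step proposes the candidate where that ratio is largest, with ℓ and g built from kernel estimates of the form above.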
2.2.5. Machine Learning Algorithms and Performance Evaluation
3. Results
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Najar, A.A.; Manohar, S. Cyber-Secure SDN: A CNN-Based Approach for Efficient Detection and Mitigation of DDoS attacks. Comput. Secur. 2024, 139, 103716.
2. Bravo, S.; Mauricio, D. Systematic review of aspects of DDoS attacks detection. Indones. J. Electr. Eng. Comput. Sci. 2019, 14, 155–168.
3. Li, Q.; Huang, H.; Li, R.; Lv, J.; Yuan, Z.; Ma, L.; Han, Y.; Jiang, Y. A comprehensive survey on DDoS defense systems: New trends and challenges. Comput. Netw. 2023, 233, 109895.
4. Behal, S.; Kumar, K.; Sachdeva, M. Characterizing DDoS attacks and flash events: Review, research gaps and future directions. Comput. Sci. Rev. 2017, 25, 101–114.
5. The Cloudflare Blog. Available online: http://blog.cloudflare.com/cloudflare-mitigates-record-breaking-71-million-request-per-second-ddos-attack/ (accessed on 20 January 2024).
6. OpenAI Status. Available online: https://status.openai.com/history (accessed on 4 February 2024).
7. Bahashwan, A.A.; Anbar, M.; Manickam, S.; Al-Amiedy, T.A.; Aladaileh, M.A.; Hasbullah, I.H. A Systematic Literature Review on Machine Learning and Deep Learning Approaches for Detecting DDoS Attacks in Software-Defined Networking. Sensors 2023, 23, 4441.
8. Digital Attack Map. Available online: https://www.digitalattackmap.com/ (accessed on 20 August 2023).
9. Fortinet Threat Map. Available online: https://threatmap.fortiguard.com/ (accessed on 20 August 2023).
10. Darktrace. Available online: https://es.darktrace.com/ (accessed on 8 February 2024).
11. Mustapha, A.; Khatoun, R.; Zeadally, S.; Chbib, F.; Fadlallah, A.; Fahs, W.; Attar, A.E. Detecting DDoS attacks using adversarial neural network. Comput. Secur. 2023, 127, 103117.
12. Dayal, N.; Srivastava, S. Analyzing effective mitigation of DDoS attack with software defined networking. Comput. Secur. 2023, 130, 103269.
13. Hnamte, V.; Najar, A.A.; Nhung-Nguyen, H.; Hussain, J.; Sugali, M.N. DDoS attack detection and mitigation using deep neural network in SDN environment. Comput. Secur. 2024, 138, 103661.
14. Sadhwani, S.; Manibalan, B.; Muthalagu, R.; Pawar, P. A Lightweight Model for DDoS Attack Detection Using Machine Learning Techniques. Appl. Sci. 2023, 13, 9937.
15. Liu, Z.; Wang, Y.; Feng, F.; Liu, Y.; Li, Z.; Shan, Y. A DDoS Detection Method Based on Feature Engineering and Machine Learning in Software-Defined Networks. Sensors 2023, 23, 6176.
16. Ma, R.; Wang, Q.; Bu, X.; Chen, X. Real-Time Detection of DDoS Attacks Based on Random Forest in SDN. Appl. Sci. 2023, 13, 7872.
17. Lv, H.; Du, Y.; Zhou, X.; Ni, W.; Ma, X. A Data Enhancement Algorithm for DDoS Attacks Using IoT. Sensors 2023, 23, 7496.
18. Ahmad, I.; Imran, M.; Qayyum, Q.; Ramzan, M.S.; Alassafi, M.O. An Optimized Hybrid Deep Intrusion Detection Model (HD-IDM) for Enhancing Network Security. Mathematics 2023, 11, 4501.
19. Ragab, M.; Alshammari, S.M.; Maghrabi, L.A.; Alsalman, D.; Althaqafi, T.; AL-Ghamdi, A.A.-M. Robust DDoS Attack Detection Using Piecewise Harris Hawks Optimizer with Deep Learning for a Secure Internet of Things Environment. Mathematics 2023, 11, 4448.
20. Setitra, M.A.; Fan, M.; Agbley, B.L.Y.; Bensalem, Z.E.A. Optimized MLP-CNN Model to Enhance Detecting DDoS Attacks in SDN Environment. Network 2023, 3, 538–562.
21. Adeniyi, O.; Sadiq, A.S.; Pillai, P.; Aljaidi, M.; Kaiwartya, O. Securing Mobile Edge Computing Using Hybrid Deep Learning Method. Computers 2024, 13, 25.
22. Ramzan, M.; Shoaib, M.; Altaf, A.; Arshad, S.; Iqbal, F.; Castilla, A.K.; Ashraf, I. Distributed Denial of Service Attack Detection in Network Traffic Using Deep Learning Algorithm. Sensors 2023, 23, 8642.
23. Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–8 October 2019.
24. Talukder, M.A.; Uddin, M.A. CIC-DDoS2019 Dataset. 2023, Version 1. Available online: https://data.mendeley.com/datasets/ssnc74xm6r/1 (accessed on 5 January 2023).
25. Frye, M.; Mohren, J.; Schmitt, R.H. Benchmarking of Data Preprocessing Methods for Machine Learning-Applications in Production. Procedia CIRP 2021, 104, 50–55.
26. Zhang, J.; Wang, Q.; Shen, W. Hyper-parameter optimization of multiple machine learning algorithms for molecular property prediction using hyperopt library. Chin. J. Chem. Eng. 2022, 52, 115–125.
27. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for Hyper-Parameter Optimization. In Advances in Neural Information Processing Systems, Curran Associates. 2011. Available online: https://papers.nips.cc/paper_files/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html (accessed on 11 January 2023).
# | Characteristics | Min and Max Values |
---|---|---|
1 | Protocol | [0; 17] |
2 | Fwd Packet Length Max | [0; 32,120] |
3 | Fwd Packet Length Std | [0; 2221.5562] |
4 | Bwd Packet Length Min | [0; 1460] |
5 | Flow Bytes/s | [0; 2,944,000,000] |
6 | Bwd IAT Total | [0; 119,943,720] |
7 | Bwd IAT Min | [0; 249] |
8 | Bwd Header Length | [−2,125,437,950; 1,478,492,170] |
9 | Bwd Packets/s | [0; 2,000,000] |
10 | Packet Length Max | [0; 37,960] |
11 | Packet Length Variance | [0; 43,778,892] |
12 | SYN Flag Count | [0; 1] |
13 | ACK Flag Count | [0; 1] |
14 | URG Flag Count | [0; 1] |
15 | CWE Flag Count | [0; 1] |
16 | Down/Up Ratio | [0; 23] |
17 | Init Fwd Win Bytes | [−1; 65,535] |
18 | Init Bwd Win Bytes | [−1; 65,535] |
19 | Fwd Act Data Packets | [0; 18,766] |
20 | Active Std | [0; 21,352,442] |
21 | Active Max | [0; 45,536,680] |
22 | Idle Std | [0; 45,536,680] |
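The table above gives each selected feature's minimum and maximum, which is exactly what min-max normalization needs. The sketch below scales raw values to [0, 1] with x' = (x − min)/(max − min) using three rows of the table. Treating min-max scaling as the normalization applied in Section 2.2.3 is an assumption, as are the names `FEATURE_RANGES` and `min_max_scale`.

```python
# Bounds copied from three rows of the feature table: (min, max)
FEATURE_RANGES = {
    "Protocol": (0, 17),
    "Bwd Header Length": (-2_125_437_950, 1_478_492_170),
    "Init Fwd Win Bytes": (-1, 65_535),
}


def min_max_scale(feature, value):
    """Map a raw value into [0, 1] with x' = (x - min) / (max - min).

    Values outside the range seen in training are clipped to [0, 1].
    """
    lo, hi = FEATURE_RANGES[feature]
    if hi == lo:
        return 0.0  # constant feature: nothing to scale
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    print(min_max_scale("Protocol", 6))             # TCP (IP protocol number 6)
    print(min_max_scale("Init Fwd Win Bytes", -1))  # lower bound maps to 0.0
    print(min_max_scale("Bwd Header Length", 0))
```

Clipping is a design choice added here: live traffic can exceed the bounds observed in the dataset, and clipping keeps scaled inputs inside the range the classifiers were trained on.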
ML Classifier | GridSearch | TPE |
---|---|---|
RF | 768 m 13.7 s | 11 m 51 s |
ADA | 653 m 17.5 s | 9 m 36 s |
XGB | 616 m 56.2 s | 2 m 4 s |
DT | 207 m 47.6 s | 25 s |
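Much of the gap between the two columns plausibly comes from how each method searches: grid search evaluates every combination in the space, whereas TPE uses the results of earlier trials to decide where to sample next. The toy sketch below shows one TPE proposal step for a single hyperparameter. It assumes a Gaussian kernel K, bandwidth h = 0.1, γ = 0.25, and a fixed list of candidate points; these choices and the function names are illustrative, not the study's configuration, and libraries such as hyperopt use a more elaborate adaptive Parzen estimator.

```python
import math


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Parzen estimate f(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)


def tpe_propose(history, candidates, gamma=0.25, h=0.1):
    """Pick the candidate that maximizes l(x) / g(x).

    history: (x, loss) pairs from earlier trials (at least two).
    l(x) is estimated from the best `gamma` fraction of trials, g(x) from the rest.
    """
    if len(history) < 2:
        raise ValueError("need at least two trials to split into good and bad")
    ranked = sorted(history, key=lambda trial: trial[1])  # lowest loss first
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    eps = 1e-12  # guards against g(x) underflowing to zero far from every bad trial

    def ratio(x):
        return kde(x, good, h) / (kde(x, bad, h) + eps)

    return max(candidates, key=ratio)


if __name__ == "__main__":
    # Toy objective: loss is smallest near x = 0.3 (e.g., a scaled learning rate)
    trials = [(i / 10, (i / 10 - 0.3) ** 2) for i in range(11)]
    candidates = [i / 100 for i in range(101)]
    print(tpe_propose(trials, candidates))
```

Scoring a fixed candidate list keeps the example deterministic; practical TPE implementations instead draw candidates from ℓ(x) itself and keep the one with the best ratio.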
ML Classifier | Search Space |
---|---|
RF | max_depth = range(10, 16); n_estimators = range(35, 46); criterion = ["gini", "entropy"]; max_features = range(0.01, 1) |
DT | criterion = ["gini", "entropy"]; splitter = ["best", "random"]; max_depth = range(1, 10); min_samples_split = range(2, 30); min_samples_leaf = range(1, 15) |
ADA | learning_rate = range(0, 1); n_estimators = range(20, 75); algorithm = ["SAMME", "SAMME.R"] |
XGB | n_estimators = range(50, 100); max_depth = range(1, 10); learning_rate = range(0, 1); gamma = range(0.0, 1.0); min_child_weight = range(1, 10) |
MLP | hidden_layer_sizes = [(32,), (64,), (128,)]; activation = ["relu", "tanh", "logistic"]; alpha = [0.0001, 0.01]; solver = ["adam"] |
DNN | layers = [[64, 32], [128, 64], [256, 128]]; activation = ["relu", "tanh"]; dropout_rate = range(0.0, 0.5); optimizer = ["adam", "rmsprop"]; batch_size = [32, 64, 128]; epochs = [10, 20, 30, 40, 50] |
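Counting the combinations in a search space shows why exhaustive grid search is expensive. The sketch below does this for the fully discrete DT and MLP rows, assuming `range(a, b)` follows Python's half-open semantics and that grid search visits every combination once (cross-validation would multiply the number of fits by the number of folds). RF, ADA, XGB, and DNN include continuous ranges, so their grid sizes depend on a discretization the table does not state.

```python
import math

# DT and MLP search spaces from the table, read with Python's half-open range semantics
dt_space = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    "max_depth": range(1, 10),
    "min_samples_split": range(2, 30),
    "min_samples_leaf": range(1, 15),
}
mlp_space = {
    "hidden_layer_sizes": [(32,), (64,), (128,)],
    "activation": ["relu", "tanh", "logistic"],
    "alpha": [0.0001, 0.01],
    "solver": ["adam"],
}


def grid_size(space):
    """Number of hyperparameter combinations an exhaustive grid search visits."""
    return math.prod(len(values) for values in space.values())


if __name__ == "__main__":
    print(grid_size(dt_space))   # 2 * 2 * 9 * 28 * 14 = 14112
    print(grid_size(mlp_space))  # 3 * 3 * 2 * 1 = 18
```

Because the count is a product, each added hyperparameter multiplies the cost of grid search, whereas TPE's cost is set by the number of trials it is allowed to run.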
ML Classifier | Best Hyperparameters | Accuracy | Training Time |
---|---|---|---|
RF | {'criterion': 'entropy', 'max_depth': 12, 'max_features': 0.9145, 'n_estimators': 43} | 99.95% | 11 m 51 s |
DT | {'criterion': 'gini', 'min_samples_split': 24, 'max_depth': 7, 'min_samples_leaf': 10, 'splitter': 'best'} | 99.88% | 25 s |
ADA | {'algorithm': 'SAMME.R', 'learning_rate': 0.55, 'n_estimators': 68} | 99.59% | 9 m 36 s |
XGB | {'gamma': 0.21, 'learning_rate': 0.65, 'max_depth': 8, 'min_child_weight': 1, 'n_estimators': 80} | 99.95% | 2 m 4 s |
MLP | {'activation': 'relu', 'alpha': 0.0001, 'hidden_layer_sizes': (32,), 'solver': 'adam'} | 99.66% | 2 h 14 m 40 s |
DNN | {'activation': 'tanh', 'batch_size': 128, 'dropout_rate': 0.070, 'epochs': 50, 'layers': (256, 128), 'optimizer': 'adam'} | 99.70% | 50 m 14 s |
ML Classifier | Actual Class | Predicted Benign | Predicted Attack |
---|---|---|---|
RF | Benign | 18,876 | 13 |
RF | Attack | 12 | 64,851 |
DT | Benign | 18,771 | 66 |
DT | Attack | 28 | 64,787 |
ADA | Benign | 18,699 | 190 |
ADA | Attack | 147 | 64,716 |
XGB | Benign | 18,874 | 15 |
XGB | Attack | 22 | 64,841 |
MLP | Benign | 18,759 | 130 |
MLP | Attack | 163 | 64,700 |
DNN | Benign | 18,833 | 56 |
DNN | Attack | 194 | 64,669 |
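Accuracy, precision, recall, and F1 follow directly from these counts. The sketch below assumes Attack is the positive class and that the Benign/Attack row label is the actual class while the two count columns are the predictions; the latter reading is consistent with RF, ADA, XGB, MLP, and DNN all having identical row totals (18,889 benign and 64,863 attack records). The function name `binary_metrics` is illustrative. For the RF counts, accuracy evaluates to 83,727 / 83,752 ≈ 99.97%.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 for a binary confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # RF counts: actual Attack -> 64,851 predicted Attack, 12 predicted Benign;
    # actual Benign -> 13 predicted Attack, 18,876 predicted Benign.
    rf = binary_metrics(tp=64_851, fn=12, fp=13, tn=18_876)
    for name, value in rf.items():
        print(f"{name}: {value:.2%}")
```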
ML Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | AUC (%) |
---|---|---|---|---|---|
RF | 99.97 | 99.98 | 99.80 | 99.98 | 99.96 |
DT | 99.89 | 99.90 | 99.96 | 99.23 | 99.80 |
ADA | 99.60 | 99.70 | 99.77 | 99.74 | 99.38 |
XGB | 99.96 | 99.98 | 99.97 | 99.97 | 99.94 |
MLP | 99.65 | 99.80 | 99.75 | 99.77 | 99.53 |
DNN | 99.70 | 99.91 | 99.70 | 99.80 | 99.70 |
Ref. | Approach | Features | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|---|---|---|
[1] | CNN | 66 | 98.64 | 99.0 | 99.0 | 99.0 |
[11] | LSTM | 67 | - | - | - | 99.0 |
[16] | RF | 24 | 99.99 | 99.99 | 99.99 | 99.99 |
[18] | Hybrid GRU and LSTM | - | 99.91 | 99.62 | 99.43 | 99.52 |
[20] | OptMLP-CNN | 20 | 99.95 | 99.90 | 99.98 | 99.93 |
[22] | RNN | 20 | 99.99 | 99.99 | 99.99 | 99.99 |
[22] | LSTM | 20 | 99.99 | 99.0 | 99.0 | 99.0 |
[22] | GRU | 20 | 99.99 | 99.0 | 100 | 100 |
Our study | RF (Our approach) | 22 | 99.97 | 99.98 | 99.80 | 99.98 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Becerra-Suarez, F.L.; Fernández-Roman, I.; Forero, M.G. Improvement of Distributed Denial of Service Attack Detection through Machine Learning and Data Processing. Mathematics 2024, 12, 1294. https://doi.org/10.3390/math12091294