Real-Time Detection of DDoS Attacks Based on Random Forest in SDN
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Real-Time Computing
3.1.1. Edge Computing
3.1.2. Distributed Computing
Distributed Computing Concepts
Load Balancing for Distributed Computing
Algorithm 1: Remaining-weight random polling algorithm.
Input: SDN controller cluster nodes
Output: Scheduling result
procedure: Node Scheduling
    define node initial-weight queue WQue
    for Ni in controller nodes:  // calculate the initial weight of each node
        Ni.W = CalculateW(Ni)
        add Ni to WQue
    while (time = T):
        define:
            node remaining-weight queue W′Que,
            node scheduling-probability queue PQue,
            total remaining weight of nodes W″ = 0,
            random scheduling function random()
        for Ni in controller nodes:  // traverse each controller node
            Ni.W′ = CalculateW′(Ni)
            if Ni.W′ < Ni.W then  // node i has residual weight
                Ni.W″ = Ni.W − Ni.W′
                add Ni to W′Que
                W″ += Ni.W″  // accumulate the total remaining weight
        for Ni in W′Que:
            Ni.P = Ni.W″ / W″
            add Ni to PQue
        resultNode = random(PQue)  // randomly schedule a node
        while (resultNode is unavailable):
            remove resultNode from PQue
            resultNode = random(PQue)
        return resultNode
end procedure
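The scheduling loop above can be sketched in Python. This is a minimal illustration, not the authors' implementation: `Node`, `capacity`, and `current` are hypothetical names standing in for the initial weight W, current weight W′, and residual weight W″, and a static snapshot replaces the per-period recomputation at time T.

```python
import random

class Node:
    """Controller node: `capacity` plays the role of the initial weight W,
    `current` the current weight W', so `residual` is W'' = W - W'."""
    def __init__(self, name, capacity, current=0):
        self.name = name
        self.capacity = capacity
        self.current = current

    @property
    def residual(self):
        return max(self.capacity - self.current, 0)

def schedule(nodes):
    """Pick a node at random with probability proportional to its residual
    weight, as in the final loop of Algorithm 1; fully loaded nodes are skipped."""
    candidates = [n for n in nodes if n.residual > 0]
    if not candidates:
        return None  # no node has residual weight in this period
    total = sum(n.residual for n in candidates)
    return random.choices(candidates,
                          weights=[n.residual / total for n in candidates])[0]
```

In the paper's setting the weights would be recomputed from live controller metrics every period T; here they are fixed values so the proportional-random pick is easy to see.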
3.1.3. SDN
3.2. DDoS Attack Detection
3.2.1. Datasets
3.2.2. Heterogeneous Integration Feature Selection
1. Filter feature selection algorithms
2. Wrapper feature selection algorithms
3. Embedded feature selection algorithms
Algorithm 2: Feature Selection Algorithm.
Input: datasets = DDoS2019 dataset,
featureAlgorithms = [variance(), mutualInformation(), backwardElimination(), lasso(), randomForest()],
model = Random Forest
Output: Feature results
procedure: Feature Selection
    Step 1: define:
        featureResults = [],
        featureVotes[1..82] = 0,  // one vote counter per candidate feature
        featureSet1 = [], featureSet2 = [], featureSet3 = [], featureSet4 = [], featureSet5 = []
    Step 2: run each algorithm in featureAlgorithms:
        featureSet1 = variance(datasets),
        featureSet2 = mutualInformation(datasets),
        featureSet3 = backwardElimination(datasets),
        featureSet4 = lasso(datasets),
        featureSet5 = randomForest(datasets)
    Step 3: for each featureSet in [featureSet1, featureSet2, featureSet3, featureSet4, featureSet5]:
        for each feature in featureSet:
            featureVotes[feature] += 1
    Step 4: for each feature with featureVotes[feature] ≥ 3:
        add feature to featureResults
    Step 5: return featureResults
end procedure
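The majority vote of Algorithm 2 can be sketched with scikit-learn. This is a hedged approximation, not the paper's code: a synthetic dataset stands in for CICDDoS2019, `RFE` over logistic regression stands in for the backward-elimination wrapper, and `k` (features kept per method) is an illustrative parameter the pseudocode does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def heterogeneous_select(X, y, k=5, min_votes=3):
    """Vote-based ensemble of five feature selectors; keep features
    chosen by at least `min_votes` of them (>= 3 in Algorithm 2)."""
    votes = np.zeros(X.shape[1], dtype=int)

    top_var = np.argsort(np.var(X, axis=0))[-k:]                         # filter: variance
    top_mi = np.argsort(mutual_info_classif(X, y, random_state=0))[-k:]  # filter: mutual info
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=k).fit(X, y)                          # wrapper: backward elim.
    top_rfe = np.flatnonzero(rfe.support_)
    top_lasso = np.argsort(np.abs(Lasso(alpha=0.01).fit(X, y).coef_))[-k:]  # embedded: L1
    rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    top_rf = np.argsort(rf.feature_importances_)[-k:]                    # embedded: RF importance

    for selected in (top_var, top_mi, top_rfe, top_lasso, top_rf):
        votes[selected] += 1                # one vote per selector that kept the feature
    return np.flatnonzero(votes >= min_votes)
```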
3.2.3. Random Forest Optimization
- Random sampling: many bootstrap samples are drawn from the original training set (sampling with replacement) to serve as new training sets.
- Random feature selection: a subset of the original feature set is randomly selected, and only these features are considered when building the decision tree.
- Decision tree construction: each tree is built from its bootstrap sample and feature subset, with every node split based on a random subset of the features.
- RF integration: multiple decision trees are generated, and each sample is finally classified (or regressed) by majority vote (or averaging).
- The RF algorithm handles high-dimensional data and large-scale datasets well.
- The RF algorithm effectively mitigates overfitting.
- The RF algorithm can handle both discrete and continuous data.
- The RF algorithm produces a variable-importance ranking, which is convenient for feature selection.
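The four steps above can be illustrated with a hand-rolled bagging ensemble. This is a sketch only: unlike a true random forest, which redraws the feature subset at every split, this version draws one subset per tree for brevity, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=12, random_state=0)

trees, subsets = [], []
n_sub = int(np.sqrt(X.shape[1]))                          # size of each random feature subset
for _ in range(25):
    rows = rng.integers(0, len(X), len(X))                # 1. bootstrap sample (with replacement)
    feats = rng.choice(X.shape[1], n_sub, replace=False)  # 2. random feature subset
    tree = DecisionTreeClassifier(random_state=0)         # 3. grow a tree on the sample
    trees.append(tree.fit(X[rows][:, feats], y[rows]))
    subsets.append(feats)

# 4. integrate: majority vote across all trees
votes = np.stack([t.predict(X[:, f]) for t, f in zip(trees, subsets)])
pred = (votes.mean(axis=0) > 0.5).astype(int)
```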
Algorithm 3: RF Optimization Algorithm.
Input: DDoS2019 dataset, sample, depth, estimator
Output: Optimized RF
procedure: Parameter Optimization
    Step 1: define Indicators = [Acc, Pre, Rec, F1, Ave, Tim]
    Step 2: initialize RF (sample = 0.9, depth = 20, estimator = 100)
    Step 3: Xtrain, Xtest, Ytrain, Ytest = split(datasets)
    Step 4: optimize sample:
        for i in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
            model = RF(sample = i, depth = 20, estimator = 100)
            model.fit(Xtrain, Ytrain)
            Ypred = model.predict(Xtest)
            Indicator = test(Ytest, Ypred)
            Indicators.append(Indicator)
    Step 5: select the bestSample with the highest Indicators
    Step 6: optimize depth:
        for j in range(10, 30, 2):
            model = RF(sample = 0.9, depth = j, estimator = 100)
            model.fit(Xtrain, Ytrain)
            Ypred = model.predict(Xtest)
            Indicator = test(Ytest, Ypred)
            Indicators.append(Indicator)
    Step 7: select the bestDepth with the highest Indicators
    Step 8: optimize estimator:
        for k in range(10, 210, 20):
            model = RF(sample = 0.9, depth = 20, estimator = k)
            model.fit(Xtrain, Ytrain)
            Ypred = model.predict(Xtest)
            Indicator = test(Ytest, Ypred)
            Indicators.append(Indicator)
    Step 9: select the bestEstimator with the highest Indicators
    Step 10: return RF(sample = bestSample, depth = bestDepth, estimator = bestEstimator)
end procedure
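The coordinate-wise sweep of Algorithm 3 maps naturally onto scikit-learn's `RandomForestClassifier`, whose `max_samples`, `max_depth`, and `n_estimators` roughly correspond to sample, depth, and estimator. A sketch on synthetic data, scoring accuracy only rather than the paper's six indicators; each sweep fixes the other two parameters at their defaults, mirroring the pseudocode.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=0)

def score(**params):
    """Fit an RF with the given parameters and return test accuracy."""
    model = RandomForestClassifier(random_state=0, **params).fit(Xtrain, Ytrain)
    return accuracy_score(Ytest, model.predict(Xtest))

defaults = {"max_samples": 0.9, "max_depth": 20, "n_estimators": 100}
best = {}
# Sweep each parameter over the same grids as Algorithm 3 while the
# other two stay at their default values.
for name, grid in [("max_samples", [i / 10 for i in range(1, 10)]),
                   ("max_depth", list(range(10, 30, 2))),
                   ("n_estimators", list(range(10, 210, 20)))]:
    results = {v: score(**{**defaults, name: v}) for v in grid}
    best[name] = max(results, key=results.get)

tuned = RandomForestClassifier(random_state=0, **best).fit(Xtrain, Ytrain)
```

`max_samples` requires the default `bootstrap=True` and scikit-learn 0.22 or later.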
3.2.4. Evaluation Indicators
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Name | Definition
---|---
Send delay | The time taken by a host or router to send a data frame.
Propagation delay | The time it takes for an electromagnetic wave to travel through the channel.
Processing delay | The time taken by a host or router to process a received packet.
Queuing delay | The time a packet spends waiting in the router's input or output queue.
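Assuming the standard textbook formulas for the first two components (send delay = frame length / link rate, propagation delay = distance / propagation speed), the total per-hop delay is simply the sum of the four terms in the table:

```python
def total_delay(frame_bits, rate_bps, distance_m, speed_mps, proc_s, queue_s):
    """Total per-hop delay as the sum of the four components in the table."""
    send = frame_bits / rate_bps      # send (transmission) delay = L / R
    prop = distance_m / speed_mps     # propagation delay = d / v
    return send + prop + proc_s + queue_s

# e.g., a 1 Mbit frame on a 100 Mbit/s link over 2000 km of fiber
# (~2e8 m/s) with 1 ms processing and 2 ms queuing:
# 0.01 + 0.01 + 0.001 + 0.002 = 0.023 s
delay = total_delay(1e6, 1e8, 2e6, 2e8, 0.001, 0.002)
```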
Classification | Algorithm | Definition
---|---|---
Filter | Variance | Removes features that take (nearly) the same value for most samples.
Filter | Mutual Information | Mutual information measures how much the presence or absence of a feature contributes to a correct prediction of Y.
Wrapper | Backward Elimination | A base model is trained over multiple rounds, sequentially removing non-conforming features until a stopping criterion is met.
Embedded | Lasso (L1) | Lasso regression constrains the model coefficients by adding L1 regularization.
Embedded | Random Forest | Builds multiple decision trees to reduce the overfitting risk of a single tree and improve generalization.
Feature | Code | Feature | Code | Feature | Code |
---|---|---|---|---|---|
Source Port | 1 | Fwd IAT Min | 28 | Down/Up Ratio | 55 |
Destination Port | 2 | Bwd IAT Total | 29 | Average Packet Size | 56 |
Protocol | 3 | Bwd IAT Mean | 30 | Avg Fwd Segment Size | 57 |
Timestamp | 4 | Bwd IAT Std | 31 | Avg Bwd Segment Size | 58 |
Flow Duration | 5 | Bwd IAT Max | 32 | Fwd Header Length.1 | 59 |
Total Fwd Packets | 6 | Bwd IAT Min | 33 | Fwd Avg Bytes/Bulk | 60 |
Total Backward Packets | 7 | Fwd PSH Flags | 34 | Fwd Avg Packets/Bulk | 61 |
Total Length of Fwd Packets | 8 | Bwd PSH Flags | 35 | Fwd Avg Bulk Rate | 62 |
Total Length of Bwd Packets | 9 | Fwd URG Flags | 36 | Bwd Avg Bytes/Bulk | 63 |
Fwd Packet Length Max | 10 | Bwd URG Flags | 37 | Bwd Avg Packets/Bulk | 64 |
Fwd Packet Length Min | 11 | Fwd Header Length | 38 | Bwd Avg Bulk Rate | 65 |
Fwd Packet Length Mean | 12 | Bwd Header Length | 39 | Subflow Fwd Packets | 66 |
Fwd Packet Length Std | 13 | Fwd Packets/s | 40 | Subflow Fwd Bytes | 67 |
Bwd Packet Length Max | 14 | Bwd Packets/s | 41 | Subflow Bwd Packets | 68 |
Bwd Packet Length Min | 15 | Min Packet Length | 42 | Subflow Bwd Bytes | 69 |
Bwd Packet Length Mean | 16 | Max Packet Length | 43 | Init_Win_bytes_forward | 70 |
Bwd Packet Length Std | 17 | Packet Length Mean | 44 | Init_Win_bytes_backward | 71 |
Flow Bytes/s | 18 | Packet Length Std | 45 | act_data_pkt_fwd | 72 |
Flow Packets/s | 19 | Packet Length Variance | 46 | min_seg_size_forward | 73 |
Flow IAT Mean | 20 | FIN Flag Count | 47 | Active Mean | 74 |
Flow IAT Std | 21 | SYN Flag Count | 48 | Active Std | 75 |
Flow IAT Max | 22 | RST Flag Count | 49 | Active Max | 76 |
Flow IAT Min | 23 | PSH Flag Count | 50 | Active Min | 77 |
Fwd IAT Total | 24 | ACK Flag Count | 51 | Idle Mean | 78 |
Fwd IAT Mean | 25 | URG Flag Count | 52 | Idle Std | 79 |
Fwd IAT Std | 26 | CWE Flag Count | 53 | Idle Max | 80 |
Fwd IAT Max | 27 | ECE Flag Count | 54 | Idle Min | 81 |
Inbound | 82 | | | | |
Type | Algorithm | Feature Code
---|---|---
Filter | Variance | 1,2,4,5,6,7,8,10,11,12,18,19,20,21,22,23,28,29,30,32,38,39,40,41,42,43,44,45,46,56,57,59,66,67,68,70,73,82
Filter | Mutual Information | 1,2,4,5,7,8,10,11,12,18,19,20,21,22,39,40,41,42,43,44,56,57,67,68,82
Wrapper | Backward Elimination | 1,2,6,31,52,39,42,45,67,70,71,73,82
Embedded | Lasso | 23,28,30,31,32,38,39,41,46,59,70,71,77
Embedded | Random Forest | 1,2,7,8,10,11,12,29,30,32,38,39,41,42,44,45,52,56,57,59,66,68,70,71,73,82
Feature | Code | Feature | Code | Feature | Code |
---|---|---|---|---|---|
Bwd Header Length | 39 | Fwd Packet Length Max | 10 | Average Packet Size | 56 |
Destination Port | 2 | Fwd Packet Length Min | 11 | Avg Fwd Segment Size | 57 |
Bwd Packets/s | 41 | Fwd Packet Length Mean | 12 | Fwd Header Length.1 | 59 |
Min Packet Length | 42 | Bwd IAT Mean | 30 | Subflow Fwd Bytes | 67 |
Init_Win_bytes_forward | 70 | Bwd IAT Max | 32 | Subflow Bwd Packets | 68 |
Inbound | 82 | Fwd Header Length | 38 | min_seg_size_forward | 73 |
Total Backward Packets | 7 | Packet Length Mean | 44 | Source Port | 1 |
Total Length of Fwd Packets | 8 | Packet Length Std | 45 | Init_Win_bytes_backward | 71 |
Name | Acc | Pre | Rec | F1 | Ave | Tim
---|---|---|---|---|---|---
NFS-NRF | 0.92018 | 0.999634 | 0.903962 | 0.949394 | 0.943292 | 0.688239
YFS-NRF | 0.915158 | 0.992977 | 0.903960 | 0.946381 | 0.939619 | 0.434086
YFS-YRF | 0.999954 | 0.999989 | 0.999956 | 0.999972 | 0.999968 | 0.417087
Number of Controllers | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
Number of processors | 231,000 | 198,000 | 165,000 | 132,000 | 99,000 | 66,000 | 33,000 |
Prediction time | 0.949209 | 0.827186 | 0.695162 | 0.551112 | 0.390083 | 0.268054 | 0.127017 |
Ma, R.; Wang, Q.; Bu, X.; Chen, X. Real-Time Detection of DDoS Attacks Based on Random Forest in SDN. Appl. Sci. 2023, 13, 7872. https://doi.org/10.3390/app13137872