Machine Learning Approach Equipped with Neighbourhood Component Analysis for DDoS Attack Detection in Software-Deﬁned Networking

: The Software-Deﬁned Network (SDN) is a new network paradigm that promises more dynamic and efﬁciently manageable network architecture for new-generation networks. With its programmable central controller approach, network operators can easily manage and control the whole network. However, at the same time, due to its centralized structure, it is the target of many attack vectors. Distributed Denial of Service (DDoS) attacks are the most effective attack vector to the SDN. The purpose of this study is to classify the SDN trafﬁc as normal or attack trafﬁc using machine learning algorithms equipped with Neighbourhood Component Analysis (NCA). We handle a public “DDoS attack SDN Dataset” including a total of 23 features. The dataset consists of Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP) normal and attack trafﬁcs. The dataset, including more than 100 thousand recordings, has statistical features such as byte_count, duration_sec, packet rate, and packet per ﬂow, except for features that deﬁne source and target machines. We use the NCA algorithm to reveal the most relevant features by feature selection and perform an effective classiﬁcation. After preprocessing and feature selection stages, the obtained dataset was classiﬁed by k-Nearest Neighbor (kNN), Decision Tree (DT), Artiﬁcial Neural Network (ANN), and Support Vector Machine (SVM) algorithms. The experimental results show that DT has a better accuracy rate than the other algorithms with 100% classiﬁcation achievement.


Introduction
SDN is a new paradigm that facilitates network management with its dynamic and programmable structure. In SDN, control and data planes are divided from each other, and network management is carried out by a central controller [1]. Thus, the controller, which can manage the whole network from a single point, can quickly apply different network policies to the whole network. Figure 1 shows the layered structure of the SDN environment. However, this emerging new approach brings along security problems in addition to the advantages it provides. In addition to attacks encountered in traditional network structures, SDN is also exposed to attacks specific to itself [2]. Perhaps the most dangerous of these attacks are attacks on the controller, because the attacker who seizes the controller can have the ability to manage or disrupt all network traffic. DDoS attacks in which users are denied access to network services are at the top of the attacks on the controller. The attackers aim to create heavy traffic with more than one machine, to consume the resources on the target machine, and to prevent it from serving after a while by DDoS attacks. Attackers use "botnets" created from devices called zombies hijacked by internet hackers. DDoS attacks are carried out with a large number of machines, so it is very difficult to detect and block. The frequency and severity of DDoS attacks are constantly increasing and can have fatal effects on many network services [3,4]. For this reason, quick detection and prevention of DDoS attacks are some of the most important problems for network service providers and administrators. Different SDN layers can be disabled by filling communication channels between the controller and the switch or between the controller and the application layer with unnecessary flow information by DDoS attacks. There is no built-in security mechanism on the controller that can distinguish between attack traffic and normal traffic. Therefore, it is very difficult to detect an attack.
DDoS attacks are grouped into three categories; application-layer attacks, resourceconsuming attacks, and volumetric attacks [5]. Application-layer attacks consist of complex attacks. They target specific services using less bandwidth and slowly consume network resources. Therefore, it is difficult to detect. Hypertext Transfer Protocol (HTTP) and Domain Name System (DNS) attacks can be evaluated in this category [6]. In resourceconsuming attacks, servers are rendered unavailable by taking advantage of vulnerabilities in protocols implemented on the network layer. TCP-SYN Flood consumes the resources of the target machine (memory, CPU, and storage) [7]. It aims to consume the bandwidth of the network with volumetric attacks. Common attacks such as ICMP, UDP, and TCP-SYN flood are performed by using vulnerabilities in Layer 3 and Layer 4 protocols [8].
In this study, we focus on the SDN to ensure a lightweight hybrid model equipped with NCA and machine learning approaches to contribute to ensuring a new-generation manageable network architecture. In detecting DDoS attacks with machine learning, some flow characteristics (packet size, arrival time, response time, packet rate, packet per flow, etc.) are used to identify whether the network traffic is normal. DDoS attacks often use the same average packet size. Since the attack traffic has a high bitrate, the time to arrive at the target machine is very short. Attackers focus on any of these features to consume the target machine's resources and prevent it from serving. For this purpose, we handle a public dataset including a total of 23 features for detecting DDoS attacks with machine learning. Instead of considering all the features in the dataset, we reveal the most efficient features with the NCA approach with the help of the newly proposed model. To ensure more generalized results, the proposed approach is tried and tested in four different machine learning algorithms. As a result, the obtained promising results point out that the proposed approach can achieve more efficient results compared to traditional machine learning Electronics 2021, 10, 1227 3 of 16 algorithms, even while using fewer features. The proposed model has great potential in contributing to the management of new-generation SDN architecture.
The rest of the paper is organized as follows: The next section elaborates on some previous related works. In Section 3, information about the used publicly available dataset is briefly given. in addition, the existing models, feature selection method, data augmentation method, machine learning method, optimization method, and the proposed method are presented briefly in this section. The results and analysis are given in Section 4. The discussion is presented in Section 5. Finally, Section 6 includes the concluding remarks and future work.

Related Works
In recent years, many studies have been done to secure SDN using machine learning techniques. In this section, we discuss several studies of DDoS security mechanisms based on machine learning and deep learning techniques.
Security solutions such as the Intrusion Prevention System (IPS) and the Intrusion Detection System (IDS) are used to ensure network security. The increasing variety of attacks has made it necessary to make statistical calculations on these systems. With machine learning algorithms, IDS systems have gained the ability to make meaningful comments and predictions. Pérez-Díaz et al. [9] proposed a new architectural solution to detect Low-Rate DDoS (LR-DDoS) attacks and mitigate their effectiveness in SDN. The architectural solution consists of IPS and IDS modules placed on the controller. Attack detection is made using different trained machine learning and deep learning methods through the Identification Application Programming Interface (API) positioned in the IDS module. They used the Canadian Institute of Cybersecurity (CIC) DoS dataset in their studies. The experimental results showed that the algorithm that gives the best result with 95% accuracy among six different machine learning algorithms is Multi-Layer Perceptron (MLP). Shoo et al. [10] introduced a new evolutionary model to classify DDoS attack traffic in an SDN environment. The model uses a combined SVM algorithm for malicious traffic classification. Genetic algorithms (GA) were used for SVM optimization when determining Kernel Principal Component Analysis (KPCA) as a property-selection method to improve the model's classification performance. Two different datasets which consist of UDP flood, HTTP flood, Smurf, SiDDoS and normal traffics were used to test and compare model accuracy. The experimental results show that the proposed combined method accuracy is 98.9%.
Kyaw, Aye Thandar, May Zin Oo, and Chit Su Khin [11] used two machine learning algorithms to detect UDP flooding attacks in the SDN environment. They used the Scapy tool for traffic packet generation. Their system collects the flow statics via the OpenFlow switch. After the feature extraction phase, they compared the classification performance of Linear and Polynomial SVM models. Experimental results show that the Polynomial SVM algorithm has a 34% lower false alarm rate with 3% better accuracy.
Janarthanam, S., N. Prakash, and M. Shanthakumar [12] proposed the security framework that detects DDoS attacks on the SDN environment. The framework is based on an adaptive learning model that uses the historical dataset for traffic classification. They used a cross-validation approach for efficient classification results. Although the results obtained are promising, the adaptive security model should be tested on different datasets obtained from the real environment to be more realistic. Tan, Liang et al. [13] proposed a novel security model for DDoS attacks in the SDN environment. The model involves two modules based on ML algorithms. The data-processing module uses the K-Means algorithm for best feature selection and the detection module uses the k-nearest neighbour (kNN) algorithm to detect attack flows. Compared to the distributed-Self-Organizing Map (SOM) and entropy-based method, their method has a 98.85% accuracy with a 98.47% recall rate.
Wang, Lu, and Ying Liu [14] proposed a DDoS attack detection method that used a two-level detection system to identify the attack based on information entropy and deep learning. They used entropy detection to detect suspicious traffic at the first level, and at the second level, they used the convolutional neural network (CNN) model to detect attack traffic. Finally, they tested the method using deep neural networks, decision trees (DT), and SVM models. The CNN model's accuracy was 4.25-8.20% higher than the other algorithms.
Deepa, V., K. Muthamil Sudar, and P. Deepalakshmi [15] proposed an ensemble technique to detect denial of service (DDoS) attacks. They used four different machine learning models to detect suspicious traffic in the SDN environment. SVM-SOM algorithm showed better results compared to the other ML algorithms with 98.12% accuracy. The authors in [16] introduced a DDoS attack-detection system for SDN. The system used two security stages. Firstly, they used Snort to detect signature-based attacks. After that, they used the SVM classifier and the deep neural network (DNN) machine learning model for attack classification. The experimental results proved that DNN has a better classification accuracy rate than SVM at 92.30%.
The authors of [17] demonstrated the success of the deep learning model in detecting and classifying DDoS attacks in their studies. They applied the DNN model on two different samples taken over the CICDDoS2019 dataset. The attack detection scenario was applied on the first dataset, while the attack traffic classification scenario was applied on the second dataset. Their results showed that the DNN model is quite successful in both intrusion detection and classification. The authors generalize on the results they obtained on the CICDDoS2019 dataset in their studies. However, different datasets can give different results. Therefore, they could support their work by working on different datasets such as NSL-KDD, ISCX IDS 2012, UNSW-NB15, and CICIDS 2017.
Some of the researchers have made intrusion detection using hybrid machine learning models in their studies. Nam, Tran Manh et al. [18] proposed a DDoS security system using the SDN architecture to detect attack flows. Their hybrid solution uses combined kNN and SOM algorithms. They classified the traffic into normal and malicious using flow statistics collected from SDN switches and vehicle sensors. Adhikary et al. [19] focused on a hybrid technique which was combined the technique of Neural Network and DT for different types of DDoS attacks in Vehicular Ad hoc Network (VANET). The proposed hybrid algorithm has better results than the single models of Neural Networks and DT. Hosseini and Azizi [20] proposed a hybrid model to detect and mitigate the DDoS attack. Their framework separated the sides as proxy and client. This way, the limited resources on both sides can be used effectively. They combined six different ML techniques to identify the attack flows. Random Forest classifier provides better results than the compared ML techniques.
Several machine learning-based solutions to detect DDoS attacks in cloud computing and IoT networks have been proposed. The big challenge in machine learning-based solutions is the detection of these attacks with high accuracy. Ujjan, Raja Majid Ali et al. [21] focused on Internet of Things (IoT) DDoS attack detection. Their proposed methods used time-based and packet-based sampling approaches to collect network traffic coming to the SDN data plane. With these sampling approaches, they aim to reduce the IDS and Deep Neural Network (DNN) model's processing load and increase the classification performances. The results show that their proposed model has higher detection rates. Ravi, Nagarathna, and S. Mercy Shalinie [22] proposed a security mechanism to detect DDoS attacks mitigation in the IoT networks. Their mechanism, which is named Learning-driven Detection Mitigation (LEDEM), used a semi-supervised ML model for malicious traffic detection. LEDEM has multiple customized controllers connected to a central controller. They have implemented different security approaches for IoT environments that they separate as mobile IoT and Fixed IoT. They used their dataset for testing their security mechanism. Yong et al. [23] focused on the web-shell intrusion in IoT environment using the ensemble methods. The authors used the principal component analysis to select the best features with three types of ensemble techniques: random forest (RF), voting, and extremely randomized trees (ET). While RF and ET work well for the light IoT environment, the voting method gives better results for heavy IoT scenarios.
The authors of [24] proposed a machine learning-based DDoS intrusion detection system to ensure the security of cloud services. They developed the Self-Adaptive Evolutionary Extreme Learning Machine (SaE-ELM) model as an automatic adaptive system and applied it to intrusion detection systems. They tested their method on four different datasets and compared the classification accuracy with commonly used machine learning models such as ANN, DT, and SVM. Although the test and training time of the model they developed is slightly higher compared to the SaE-ELM model, the results obtained are quite good.
Although the central management and programmable structure provided by SDN brings new capabilities to IDSs, the performance of these detection systems depends on the quality of training datasets.
In the recent studies we have summarized above, different datasets such as KDD Cup'99, NSL-KDD, CICIDS2017, CAIDA 2016, UNB-ISCX, and CIC DoS were used. The biggest problem with these datasets is that they are out of date. Attack characteristics are changing, so the need for up-to-date datasets is increasing. LITNET-2020 dataset [25] and Bogaziçi University datasets [26] are the current datasets used to detect DDoS attacks. However, these datasets are also created using traditional network platforms like the other datasets.
Our motivation in this study has been to work on up-to-date datasets obtained from SDN network platforms. There are a few publicly available datasets that can be used directly for anomaly detection systems applied in SDN networks [27,28]. We used the "DDOS attack SDN Dataset" in our study, which is also a new dataset and accessible to researchers for use in machine learning and deep learning research.

Materials and Methods
The steps of the method followed to achieve the results are given in Figure 2. Furthermore, the features and classes of the public dataset used in this section are explained. The feature selection algorithm used to determine the features that will increase the classification accuracy in the dataset is given, and the features of the classifiers used after the feature selection phase are also explained in detail in this section.
Electronics 2021, 10, x FOR PEER REVIEW 5 of 16 mechanism. Yong et al. [23] focused on the web-shell intrusion in IoT environment using the ensemble methods. The authors used the principal component analysis to select the best features with three types of ensemble techniques: random forest (RF), voting, and extremely randomized trees (ET). While RF and ET work well for the light IoT environment, the voting method gives better results for heavy IoT scenarios. The authors of [24] proposed a machine learning-based DDoS intrusion detection system to ensure the security of cloud services. They developed the Self-Adaptive Evolutionary Extreme Learning Machine (SaE-ELM) model as an automatic adaptive system and applied it to intrusion detection systems. They tested their method on four different datasets and compared the classification accuracy with commonly used machine learning models such as ANN, DT, and SVM. Although the test and training time of the model they developed is slightly higher compared to the SaE-ELM model, the results obtained are quite good.
Although the central management and programmable structure provided by SDN brings new capabilities to IDSs, the performance of these detection systems depends on the quality of training datasets.
In the recent studies we have summarized above, different datasets such as KDD Cup'99, NSL-KDD, CICIDS2017, CAIDA 2016, UNB-ISCX, and CIC DoS were used. The biggest problem with these datasets is that they are out of date. Attack characteristics are changing, so the need for up-to-date datasets is increasing. LITNET-2020 dataset [25] and Boğaziçi University datasets [26] are the current datasets used to detect DDoS attacks. However, these datasets are also created using traditional network platforms like the other datasets.
Our motivation in this study has been to work on up-to-date datasets obtained from SDN network platforms. There are a few publicly available datasets that can be used directly for anomaly detection systems applied in SDN networks [27,28]. We used the "DDOS attack SDN Dataset" in our study, which is also a new dataset and accessible to researchers for use in machine learning and deep learning research.

Materials and Methods
The steps of the method followed to achieve the results are given in Figure 2. Furthermore, the features and classes of the public dataset used in this section are explained. The feature selection algorithm used to determine the features that will increase the classification accuracy in the dataset is given, and the features of the classifiers used after the feature selection phase are also explained in detail in this section.

Dataset
In this study; the public "DDOS attack SDN Dataset", which was created in the SDN environment and made publicly accessible to researchers for use in machine learning and deep learning research, was used [29]. There are 23 features in the dataset, which contains 104345 traffic flows. The dataset consisting of TCP, UDP, and ICMP traffic is shown using the normal and attack traffic class label. The dataset has statistical features such as byte_count, duration_sec, packet rate, and packet per flow, except for features that define source and target machines. Before starting machine learning model training, the data must be preprocessed. At this stage, packet rate, byte per-flow, and packet per flow properties are excluded from the dataset because they contain duplicate values. Categorical variables such as source-destination IP and protocol that do not have numeric values were encoded by using one-hot encoding [30]. In the next step, we tried to find the correlation of input features with output features by using various machine learning methods, heatmap graph, and correlation techniques. As a result of this process, the column shown with the "dt" feature and containing the time information was found to be unnecessary and removed from the dataset. The data preprocessing phase was terminated by applying normalization to numeric data.

Machine Learning Approach
Machine learning methods are used to analyze system performance and detect unusual events that are not consistent with normal network behaviour. Especially in network systems where high-density data is circulating, abnormal movements are detected by mathematical models created using machine learning algorithms, and preventive policies are quickly applied to network systems. In this section, the features of the machine learning approaches used in this study are briefly explained.

k-Nearest Neighbor (k-NN) Classifier
k-NN is one of the popular machine learning algorithms. It is a non-parametric, distance-based, and supervised approach that was introduced in 1951 [31]. This algorithm measures the similarities in the dataset considering a distance function. The test data are classified based on the majority votes of its k-nearest neighbours.
A training set is defined as X and Y pairs. Let X = {x 1 , x 2 , . . . x n } where x i ∈ R n corresponds to the training data in the n-dimension feature set, and let Y = {y 1 , y 2 , . . . y n } match the target labels. A prediction for a test datax applied as input to the kNN model is realized as follows [32]: • A distance function such as a Euclidean one is used to measure similarity in the training data. For two points named a and b have Cartesian coordinates (a 1 , a 2 ) and (b 1 , b 2 ), the distance between a and b is calculated as given in Equation (1): • The label of test datax is determined considering the majority votes of its k-nearest neighbours.

Decision Tree
The decision tree machine learning algorithm is used for regression as well as the classification of real-world problems. This model is inspired by a tree structure. However, the root of the tree is located at the top. The branches are created considering objective rules relied on the features of the dataset, and the decision tree is also progressively developed [33]. To create a decision tree, the procedures described below can be followed [34]: The whole dataset is divided into two parts as training and test sets.

2.
The training set is applied as the input to the root of the tree.

3.
The root is determined using information theory as described in Equation (2).

4.
The prone procedure is carried out. 5.
The procedures between 1 and 4 are followed again and again until all nodes become leaf nodes.
where the probability distribution of the dataset is denoted with p. To achieve an efficient decision tree, there are also other hyper-parameters, such as the minimum leaf size, minimum parent size, and the maximum number of splits to be set.

Artificial Neural Network
ANN is a useful computational model for making predictions on nonlinear and complicated systems. In the basic approach, an ANN model consists of an input layer, one or more hidden layer(s), and an output layer [35]. This computational model can be defined as in Equation (3) Herein, the weights of the model are denoted with ω i , and these weights are calculated using training algorithms. The data describing the problem with features is shown by x i , and bias values are symbolized by b i [36].
In the configuration stage of the ANN model, we used three hidden layers with 25, 15, 10 computational nodes, respectively. Levenberg-Marquardt training algorithm was preferred. Other hyper-parameters were used with their default values.

Support Vector Machine
The SVM algorithm is one of the most efficient machine learning algorithms for classification and regression problems [37]. SVM determines a hyperplane that can separate the space into two or more classes. The margin is kept large as possible, and the data points in this border are called support vectors [38]. The kernel is used to divide the data non-linearly in SVM. To this aim, the SVM searches support vectors, weights, and bias. For input data, z ∈ R n , the SVM is determined as follows: Herein, Ψ(.) corresponds to the mapping function, and v and c are weights and bias, respectively. The mapping function can be linear SVM, polynomial SVM, radial-basis function (RBF)-SVM. In this study, we preferred RBF-SVM as a mapping function for the classification task.

Neighbourhood Component Analysis
The main purpose of feature selection can be expressed as the selection of a subset feature by reducing the cost of computing and reducing unrelated features from the feature set that will affect model performance [39]. In this study, the NCA algorithm was used to select the most appropriate features to perform an effective classification of more than 100 thousand network records, which consists of 22 features of SDN technology. The advantage of the NCA model developed based on the kNN algorithm is that it lists the features in order of importance and also provides information about the weight value of the features [40,41].
There is a possibility that a feature given as x i input in the NCA algorithm corresponds to y i the class, corresponding to all classes. The distance between two observations is calculated according to Equation (5) [41]. Here, w r is the weight value of the feature. The reference points (P) in the feature set are calculated according to Equation (6).
The probability of choosing x i as the reference point for xj is calculated according to Equation (7).
Herein, k corresponds to the kernel function (k(z) = (exp −z/σ)) and σ denotes the width of kernel function whereas the correct classification possibility of the real class is calculated as defined in Equation (8) [42].
where y ij = 1 if and only if y j = y j and y ij = 0.

Performance Metrics and Model Evaluation
A confusion matrix was used to test the performance results of the experimental studies conducted to determine the normal and abnormal network records obtained with SDN. This matrix contains estimated values and real values. The confusion matrix is given in Table 1. Here, true positive (TP) and true negative (TN) represent the correctly predicted values of network movements, while false positive (FP) and false negative (FN) represent incorrect predicted values [43,44]. Moreover, the receiver operating curve (ROC) and the areas under the curves (AUC) were used to evaluate the model performance. ROC curve has a false positive rate on the horizontal axis and a true positive rate on the vertical axis. The proposed model was evaluated based on the accuracy (Acc), sensitivity (Se), specificity (Sp), Precision (Pr), and F-score performance metrics derived from confusion matrices. The formulations of these metrics are given in Table 2. In addition, the k-fold cross-validation method was used in this experimental study to determine the test error of the predictive model. In the cross-validation method, the dataset is divided into k groups and it is a less biased model [38]. The k value was determined as 10 in this experimental study.
To carry out experimental studies, more than 100 thousand network records consisting of 22 different features of network movements were kept. Before the feature selection and classification phase of the network records, the preprocessing phase was carried out. The standardization process, which is widely used in the field of machine learning, was applied as preprocessing. Standardization can be expressed as the recalculation of variance according to the defined value [45]. Mathematically, it is calculated as in Equation (9).
In Equation (9), S i refer to raw data while µ i and σ µ i indicate mean and standard deviation, respectively.

Accuracy TP+TN TP+TN+FP+FN
The overall accuracy of the model.

Sensitivity TP TP+FN
The performance of the model on detecting abnormal network traffic.

Specificity TN TN+FP
The performance of the model on detecting normal network traffic.

Precision TP TP+FP
The ratio of correctly predicted abnormal network traffic to the total abnormal network traffic.
F-Score 2 * TP 2 * TP+FP+FN The accuracy of the model on the whole dataset.

Experimental Results
In the first stage of the experimental study, SDN records were classified directly with machine learning methods after the preprocessing step without any feature selection. Hyper-parameters of machine learning algorithms were determined automatically using the method of optimization of hyper-parameters to perform an effective classification. While the dataset was divided as training at the rate of 0.7, it was separated as a test at the rate of 0.3. To perform the classification process with the kNN algorithm, the value of k, which is the number of neighbours to be looked at, was determined as 1, and Euclidean was chosen as the distance function. The Gini algorithm is determined as the division criterion in the DT method. The hidden neuron number was 10 for classification with the ANN method, and the Levenberg-Marquardt algorithm was used as the training algorithm. To classify the network records with the SVM method, the kernel Radial Basis Function was selected, the box constraint value was determined as 1, and the kernel scale value was determined as 0.9. When the classification results were examined after the SDN records were given to the input of the machine learning algorithms, the best accuracy rate was obtained with the ANN method at 97.35%, while the accuracy rates of 95.41%, 94.14%, and 80.56% were obtained with kNN, DT, and SVM methods, respectively. Performance results obtained as results of classification are given in Table 3. The raw network data in the dataset were carried out in the preprocessing step, and feature selection was applied with the NCA algorithm. To train the NCA algorithm, the regularization parameter value lambda (λ), which prevents overfitting, was automatically determined. The stochastic gradient descent (SGD) method was used to optimize feature weights. In SGD optimization, the mini-batch size value was determined as 10 and the epoch value as 5. While the weight values of the unrelated features in the NCA algorithm are close to zero, the weight values of the features with high discrimination features are higher. The index values of features and the corresponding weight values are given in Figure 3.
ization parameter value lambda (λ), which prevents overfitting, was automatically determined. The stochastic gradient descent (SGD) method was used to optimize feature weights. In SGD optimization, the mini-batch size value was determined as 10 and the epoch value as 5. While the weight values of the unrelated features in the NCA algorithm are close to zero, the weight values of the features with high discrimination features are higher. The index values of features and the corresponding weight values are given in Figure 3. When we look at the weight values of the features with the NCA algorithm, it is observed that the weight values of eight features are between 0 and 1, while the weight values of 14 features vary between 1.11 and 17.87. Machine learning algorithms are known to affect computational costs when classifying high-specification problems [46]. For this reason, after analyzing 22 network features NCA algorithms, the first classification process was made with eight features with an index value of more than 9. In the second experimental study, 14 effective features were selected and given as input data to machine learning algorithms. The 14 most effective properties and weight values selected by NCA are given in Table 4.  When we look at the weight values of the features with the NCA algorithm, it is observed that the weight values of eight features are between 0 and 1, while the weight values of 14 features vary between 1.11 and 17.87. Machine learning algorithms are known to affect computational costs when classifying high-specification problems [46]. For this reason, after analyzing 22 network features NCA algorithms, the first classification process was made with eight features with an index value of more than 9. In the second experimental study, 14 effective features were selected and given as input data to machine learning algorithms. The 14 most effective properties and weight values selected by NCA are given in Table 4. More than 100 thousand network records were classified by kNN, DT, ANN, and SVM algorithms after preprocessing and feature selection. In the first experimental study, the new dataset, created by selecting the features with an index value of more than 9 with the NCA algorithm, is given as an input to ML algorithms. As a result of experimental studies, promising results were obtained with all classification algorithms. While the best accuracy rate was obtained with the DT method as 99.1760%, it was determined as 97.7542%, 96.2015%, and 81.4810% with the kNN, YSA, and SVM methods, respectively. The performance results obtained by machine learning methods as a result of the experimental study with the most efficient eight features are given in Table 5. ROC curves of the whole machine learning method with eight features are given in Figure 4. the NCA algorithm, is given as an input to ML algorithms. As a result of experimental studies, promising results were obtained with all classification algorithms. While the best accuracy rate was obtained with the DT method as 99.1760%, it was determined as 97.7542%, 96.2015%, and 81.4810% with the NN, YSA, and SVM methods, respectively. The performance results obtained by machine learning methods as a result of the experimental study with the most efficient eight features are given in Table 5. ROC curves of the whole machine learning method with eight features are given in Figure 4.   In the second experimental study, the feature set consisted of 22 features. After training with the NCA algorithm, the features with an index value of more than 1.11 were selected. They were classified by ML methods using the same hyperparameters as in the first experimental study. As a result of experimental studies, very good results were obtained with all classification algorithms. While the best accuracy rate was obtained as 100% with the DT method, it was determined as 99.15%, 99.78%, and 98.59% with kNN, ANN, and SVM methods, respectively. The performance results obtained by machine learning methods as a result of the experimental study are given in Table 6. In Figure 5, the ROC curves of all machine learning methods are given. In the second experimental study, the feature set consisted of 22 features. After training with the NCA algorithm, the features with an index value of more than 1.11 were selected. They were classified by ML methods using the same hyperparameters as in the first experimental study. As a result of experimental studies, very good results were obtained with all classification algorithms. While the best accuracy rate was obtained as 100% with the DT method, it was determined as 99.15%, 99.78%, and 98.59% with NN, ANN, and SVM methods, respectively. The performance results obtained by machine learning methods as a result of the experimental study are given in Table 6. In Figure 5, the ROC curves of all machine learning methods are given.  Finally, the feature set consisting of 22 features showed the best results and an index value above 1.11. It was also subjected to a cross-validation test. As a result of the experimental study, the highest accuracy rates were obtained by the DT method. While the accuracy rate was 99.82% with the DT method, it was determined as 99.23%, 97.63%, and Finally, the feature set consisting of 22 features showed the best results and an index value above 1.11. It was also subjected to a cross-validation test. As a result of the experimental study, the highest accuracy rates were obtained by the DT method. While the accuracy rate was 99.82% with the DT method, it was determined as 99.23%, 97.63%, and 97.20% with the kNN, ANN, and SVM methods, respectively. Performance values obtained by cross-validation test are given in Table 7.

Discussion
In Table 8, the studies on DDoS attack traffic detection using machine learning algorithms and the classification model we propose are shown comparatively. When Table 8 is examined, it is seen that different datasets were used to detect attack traffic. Some of the researchers used public datasets containing network traffic data from conventional network topologies such as KDD Cup'99, NSL-KDD, UNB-ISCX, CICIDS2017, and CAIDA 2016 [2][3][4][5][6][7][8]. The use of these datasets is positive for comparing the performance of machine learning algorithms used in the detection of attack traffic. However, the fact that the SDN architecture is different from the conventional network architecture causes SDN to have unique attack vectors other than its current attacks. Furthermore, the increasing number of attack traffic and variety requires the use of up-to-date datasets. For this reason, researchers use their datasets obtained by using the SDN architecture for their work [11,12,14,[19][20][21]27,28]. The SDN-specific dataset used in this study was created by the Bennett University study group for machine learning and deep learning studies. The most important criterion for selecting the dataset is that it is created using the SDN architecture and includes up-to-date SDN DDoS traffic data. The results show that machine learning models are quite successful in detecting attack traffic. Our work aims to contribute to the research conducted in this field. Our experimental results showed that using the NCA feature selection method on SDN traffic data increases the accuracy of machine learning methods in detecting attack traffics. While selecting features with the NCA algorithm, all features are scored according to their distinctiveness index values. However, although this feature selection method does not give the optimum number of features to be selected, it is a deficiency of the method, but experimental studies have been carried out by selecting a different number of features in this study. For attacks such as DDoS attacks that need to be intervened without wasting time, it is important to detect the attack traffic by using system resources as efficiently as possible. Therefore, the most effective features should be selected when creating machine learning models.
It can be seen from Table 8 that the performance of machine learning models in studies using feature selection algorithms is better than in other studies [10,19,21]. It can be said that model classification performance contributes positively to the classification of attack traffic when used in conforming to feature selection algorithms. However, given that studies in the literature are run by applying different models on different datasets, it is difficult to make general evaluations on comparative results.

Conclusions
In this study, normal and attack traffic in the dataset obtained from the SDN environment was classified using machine learning algorithms. The customized SDN-based dataset consists of TCP, UDP, and ICMP normal and attack traffics. The dataset has statistical features such as byte_count, duration_sec, packet rate, and packet per flow except for features that define source and target machines. The NCA algorithm has been used to perform an effective classification and to select the most suitable features. After analyzing 22 network features NCA algorithms, 14 effective features were selected and given as input to machine learning algorithms. More than 100 thousand network records were classified by kNN, DT, ANN, and SVM algorithms after preprocessing and feature selection. The experimental results show that DT has a better accuracy rate than the other algorithms with 100%.
In future studies, it is planned to increase the diversity of attacks and compare the classification performances of machine learning models with feature selection algorithms.