D ETECTING DD O S A TTACKS IN S OFTWARE - D EFINED N ETWORKS T HROUGH F EATURE S ELECTION M ETHODS AND M ACHINE L EARNING M ODELS

: For the easy and flexible management of large scale networks, Software-Defined Networking (SDN) is a strong candidate technology that offers centralisation and programmable interfaces for making complex decisions in a dynamic and seamless manner. On the one hand, there are opportunities for individuals and businesses to build and improve services and applications based on their requirements in the SDN. On the other hand, SDN poses a new array of privacy and security threats, such as Distributed Denial of Service (DDoS) attacks. For detecting and mitigating potential threats, Machine Learning (ML) is an effective approach that has a quick response to anomalies. In this article, we analyse and compare the performance, using different ML techniques, to detect DDoS attacks in SDN, where both experimental datasets and self-generated traffic data are evaluated. Moreover, we propose a simple supervised learning (SL) model to detect flooding DDoS attacks against the SDN controller via the fluctuation of flows. We verify the outcome through simulations and measurements over a real testbed. Our main goal is to find a lightweight SL model to detect DDoS attacks with data and features that can be easily obtained. In this study, DDoS attacks in SDN were detected using machine learning-based models. First, specific features were obtained from SDN for the dataset in normal conditions and under DDoS attack traffic. Then, a new dataset was created using feature selection methods on the existing dataset. Feature selection methods were preferred to simplify the models, facilitate their interpretation, and provide a shorter training time. Both datasets, created with and without feature


I. INTRODUCTION
Software-Defined Networking(SDN) is the networking architecture defined by the software program.In SDN, network traffic is controlled by the software which directs the traffic between the hosts.This is unlike the current network architecture where the traffic control is hardware-based which is defined by the switches, routers and, other network infrastructure.The centralized SDN architecture is designed in such a way that SDN switch is comprised of only the data plane and the control plane is moved to another entity known as the controller.The controller acts as the brain of the network and it is a network wide centralized entity where all the switches in the network abide by the decision made by the controller.SDN Controller -Translates the requirements from the SDN application layer to the SDN data paths.It also provides the SDN applications with a central repository of network policies, a view of the networks and network traffic.SDN Data path -Implements switches that move data packets on a network.
Control functions are taken from the switch and given to the controller, which is the brain of the network in SDN architecture.Parent level rules are easily applied to the network with the help of the controller.The controller can add new rules to the transmission devices and change the existing rules.It can carry out these changes by communicating with transmission devices via a secure channel through the OpenFlow protocol.Continuity and the unity of data traffic are ensured through this channel.If this secure channel breaks, the connection between the controller and transmission devices breaks.
SDN architecture is the target for DDoS attacks.While the attacker is attacking the SDN network, it has three main targets, as shown in Figure 1: to consume the sources of the controller, to occupy the bandwidth of the channel between the controller and the switch, and to fill the flow tables in the switch with unnecessary flows.In DDoS attacks against the controller, the attacker sends a large number of packets to the OpenFlow switch via zombie users.It is difficult for the controller to differentiate between traffic sent by the attacker and legal traffic.

IJCRT2212604
International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.orgf310 The OpenFlow switch seeks a match in the flow input by checking the packet header (source port, target port, source IP address, target IP address, etc.).If there is no match, the packet is forwarded to the controller by encapsulating the packet header in the flow request with the OpenFlow protocol (OFPT) PACKET_IN message.Then, the controller responds with the OFPT FLOW_MOD message.This message involves the process to be carried out on the packet and the timeout of the flow in the flow table assigned for the packet [8].As the number of packets forwarded to the controller increases, the sources of the controller are consumed (bandwidth, memory, and CPU), new flow input for the new legal packets arriving at the network cannot be processed, and the SDN architecture collapses.

II. LITERATURE SURVEY
In [1], Software-Defined Networking (SDN) is a networking paradigm that has redefined the term network by making the network devices programmable.SDN helps network engineers to monitor the network expeditely, control the network from a central point, identify malicious traffic and link failure in easy and efficient manner.Besides such flexibility provided by SDN, it is also vulnerable to attacks such as DDoS which can halt the complete network.To mitigate this attack, the paper proposes to classify the benign traffic from DDoS attack traffic by using machine learning technique.The major contribution of this paper is identification of novel features for DDoS attack detections.Novel features are logged into CSV file to create the dataset and machine learning algorithms are trained on the created SDN dataset.Various work which has already been done for DDoS attack detection either used a non-SDN dataset or the research data is not made public.A novel hybrid machine learning model is utilized to perform the classification.Results show that the hybrid model of Support Vector classifier with Random Forest (SVC-RF) classifies the traffic with the highest testing accuracy of 98.8% with a very low false alarm rate.
In [2], In this paper, we propose a new framework of cooperative detection methods of control plane and data plane, which effectively improve the detection accuracy and efficiency, and prevent DDoS attacks on SDN.
In [3], The Distributed Denial of Service (DDoS) attack has seriously impaired network availability for decades and still there is no effective defense mechanism against it.However, the emerging Software Defined Networking (SDN) provides a new way to reconsider the defense against DDoS attacks.In this paper, we propose two methods to detect the DDoS attack in SDN.One method adopts the degree of DDoS attack to identify the DDoS attack.The other method uses the improved K-Nearest Neighbors (KNN) algorithm based on Machine Learning (ML) to discover the DDoS attack.The results of the theoretical analysis and the experimental results on datasets show that our proposed methods can better detect the DDoS attack compared with other methods.
In [4], Software Defined Networking (SDN) offers several advantages such as manageability, scaling, and improved performance.However, SDN involves specific security problems, especially if its controller is defenseless against Distributed Denial of Service (DDoS) attacks.The process and communication capacity of the controller is overloaded when DDoS attacks occur against the SDN controller.Consequently, as a result of the unnecessary flow produced by the controller for the attack packets, the capacity of the switch flow table becomes full, leading the network performance to decline to a critical thresho ld.In this study, DDoS attacks in SDN were detected using machine learning-based models.First, specific features were obtained from SDN for the dataset in normal conditions and under DDoS attack traffic.Then, a new dataset was created using feature selection methods on the existing dataset.Feature selection methods were preferred to simplify the models, facilitate their interpretation, and provide a shorter training time.Both datasets, created with and without feature selection methods, were trained and tested with Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN) classification models.The test results showed that the use of the wrapper feature selection with a KNN classifier achieved the highest accuracy rate (98.3%) in DDoS attack detection.The results suggest that machine learning and feature selection algorithms can achieve better results in the detection of DDoS attacks in SDN with promising reductions in processing loads and times.
In [5], Keeping Internet users protected from cyberattacks and other threats is one of the most prominent security challenges for network operators nowadays.Among other critical threats, distributed denial-of-service (DDoS) becomes one of the most widespread attacks in the Internet, which is very challenging to mitigate appropriately as DDoS attacks cause the system to stop working by resource exhaustion.Software-defined networking (SDN) has recently emerged as a new networking technology offering unprecedented programmability that allows network operators to configure and manage their infrastructures dynamically.The flexible processing and centralized management of the SDN controller allow flexibly deploying complex security algorithms and mitigation methods.In this paper, we propose a novel DDoS attack mitigation in SDN-based Internet Service Provider (ISP) networks for TCP-SYN and ICMP flood attacks utilizing machine learning approach, i.e., K-Nearest-Neighbor (KNN) and XGBoost.By deploying a testbed, we implement the proposed algorithms,

IJCRT2212604
International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.orgf311 evaluate their accuracy, and address the trade-off between the accuracy and mitigation efficiency.Through extensive experiments, the results show that the algorithms can efficiently mitigate the attack by over 98.0% while benign traffic is not affected.
In [6], The Software-Defined Network (SDN) is a new network paradigm that promises more dynamic and efficiently manageable network architecture for new-generation networks.With its programmable central controller approach, network operators can easily manage and control the whole network.However, at the same time, due to its centralized structure, it is the target of many attack vectors.Distributed Denial of Service (DDoS) attacks are the most effective attack vector to the SDN.The purpose of this study is to classify the SDN traffic as normal or attack traffic using machine learning algorithms equipped with Neighbourhood Component Analysis (NCA).We handle a public "DDoS attack SDN Dataset" including a total of 23 features.The dataset consists of Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP) normal and attack traffics.The dataset, including more than 100 thousand recordings, has statistical features such as byte_count, duration_sec, packet rate, and packet per flow, except for features that define source and target machines.We use the NCA algorithm to reveal the most relevant features by feature selection and perform an effective classification.After preprocessing and feature selection stages, the obtained dataset was classified by k-Nearest Neighbor (kNN), Decision Tree (DT), Artificial Neural Network (ANN), and Support Vector Machine (SVM) algorithms.The experimental results show that DT has a better accuracy rate than the other algorithms with 100% classification achievement.
In [7], Software Defined Network (SDN) is a new network architecture which separates the data plane from the control plane.The SDN controller implements the control plane and switches implement the data plane.Many papers discuss about DDoS attacks on primary servers present in SDN and how they can be mitigated with the help of controller.In our paper we show how DDoS attack can be instigated on the SDN controller by manipulating the flow table entries of switches, such that they send continuous requests to the controller and exhaust its resources.This is a new, but one of the possible way in which a DDoS attack can be performed on controller.We show the vulnerability of SDN for this kind of attack.We further propose a solution for mitigating it, by running a DDoS Detection module which uses variation of flow entry request traffic from all switches in the network to identify compromised switches and blocks them completely.
In [8], Software Defined Networking (SDN) is one of the most commonly used network architectures in recent years.With the substantial increase in the number of Internet users, network security threats appear more frequently, which brings more concerns to SDN.Distributed denial of Service (DDoS) attacks are one of the most dangerous and frequent attacks in software defined networks.The traditional attack detection method using entropy has some defects such as slow attack detection and poor detection effect.In order to solve this problem, this paper proposed a method of fusion entropy, which detects attacks by measuring the randomness of network events.This method has the advantages of fast attack detection speed and obvious decrease in entropy value.The complementarity of information entropy and log energy entropy is effectively utilized.The experimental results show that the entropy value of the attack scenarios 91.25% lower than normal scenarios, which has greater advantages and significance compared with other attack detection methods.
In [9], Software Defined Networks (SDN) provides separation of data plane and control plane, which can be used for implementing various network solutions like traffic engineering, intrusion detection load balancing, etc.However, there are few issues relating to SDN that needs to be addressed, one of them being Distributed Denial of Service (DDoS) attack on the centralized controller.Many researchers have contributed various solutions for identifying and mitigating such attacks.However, the intruders often find new ways of performing such DDoS attacks and hence the detection of such attacks takes more time and resources.In this paper, the aim is to demonstrate how a DDoS attack can be initiated on an SDN controller by the compromised switches whose idle and hard timeout values are manipulated to send repeated flow table entry requests to the controller.Furthermore, a solution is also proposed to detect such an attack within the second repeated request and mitigate it immediately.This solution is highly efficient as the attack is detected instantly instead of calculating a threshold based on the number of flow entry requests to identify whether the traffic is attack traffic or a genuine one.
In [10], The rapid usage of the Internet for the last few decades has lead to the deployment of high-speed networks in commercial and educational institutions.As network traffic is increasing, security challenges are also increasing in the highspeed network.Although the Intrusion Detection System (IDS) has a significant role in spotting potential attacks, the heavy traffic flow causes severe technical challenges relating to monitoring and detecting the network activities.Moreover, the devastating nature of the Distributed Denial-of-Service (DDoS) attack draws out as a significant cyber-attack regardless of the emergence of Software Defined Network (SDN) architecture.This paper proposes a novel framework to address the performance issues of IDS and the design issues of SDN about DDoS attacks by incorporating intelligence in the data layer using Data Plane Development Kit (DPDK) in the SDN architecture.This novel framework is named as DPDK based DDoS Detection (D3) framework, since DPDK provides fast packet processing and monitoring in the data plane.Moreover, the statistical anomaly detection algorithm implemented in the data plane as Virtual Network Function (VNF) using DPDK offers fast detection of DDoS attacks.The experimental results of the D3 framework guarantee both efficiency and effect of the novel IDS framework.The publicly available CIC DoS datasets also ensure the detection effect of a single statistical anomaly detection algorithm against the DDoS attack.

III. METHODOLOGY
At first the datas are read from the input dataset by using pandas library.Then the data will be pre-processed by dropping null values, then we make feature selection by selecting input features for feeding it in the KNN module, we design a KNN model which can able to give high accuracy, the extracted features are inserted in to the KNN model and the machine gets trained.After training we predict DDos attack datas by feeding test datas into the model.
• In our work, the detection of DDoS is one of the most important processes.In this project, the main goal is to classify traffic as DDOs attack or normal traffic.
• For a wide range of problems, a large number of samples will be selected, but the sample size will not be large.
Algorithms work best when it comes to speed, but data sharing is not a problem.

IJCRT2212604
International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.orgf312 • The main advantage of this model is the use of a large amount of data for training and a small amount of data for testing, which can improve the accuracy of the classification. •

PREPROCESSING
Data preprocessing in Machine Learning refers to the technique of preparing (cleaning and organizing) the raw data to make it suitable for a building and training Machine Learning models.Pre-processing refers to the transformations applied to our data before feeding it to the algorithm.Data Preprocessing is a technique that is used to convert the raw data into a clean data set.
Missing data: Take a look for missing data fields, blank spaces in text, or unanswered survey questions.This could be due to human error or incomplete data.To take care of missing data, you'll have to perform data cleaning.
Data cleaning is the process of adding missing data and correcting, repairing, or removing incorrect or irrelevant data from a data set.Dating cleaning is the most important step of preprocessing because it will ensure that your data is ready to go for your downstream needs.

IJCRT2212604
International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.orgf313 Data cleaning will correct all of the inconsistent data you uncovered in your data quality assessment.Depending on the kind of data you're working with, there are a number of possible cleaners you'll need to run your data through.

FEATURE SELECTION
Feature Selection is the method of reducing the input variable to your model by using only relevant data and getting rid of noise in data.It is the process of automatically choosing relevant features for your machine learning model based on the type of problem you are trying to solve.Relief feature scoring is based on the identification of feature value differences between nearest neighbor instance pairs.If a feature value difference is observed in a neighboring instance pair with the same class (a 'hit '), the feature score decreases.

TRAINING MODEL
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from.The term ML model refers to the model artifact that is created by the training process.

PREDICTION MODEL
Predictive modeling is a mathematical process used to predict future events or outcomes by analyzing patterns in a given set of input data.It is a crucial component of predictive analytics, a type of data analytics which uses current and historical data to forecast activity, behavior and trends.This work is used KNN here.

ALGORITHM
 we consider in our implementation of the algorithm.Therefore, distance metric and K value are two important considerations while using the KNN algorithm.Euclidean distance is the most popular distance metric. Firstly, we will choose the number of neighbors, so we will choose the k=3. Next, we will calculate the Euclidean distance between the data points.The Euclidean distance is the distance between two points.

KNN
The K-NN working can be explained on the basis of the below algorithm: ❖ Step-1: Select the number K of the neighbors ❖ Step-2: Calculate the Euclidean distance of K number of neighbors ❖ Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

❖
Step-4: Among these k neighbors, count the number of the data points in each category.

❖
Step-5: Assign the new data points to that category for which the number of the neighbor is maximum.

IV. EXPERIMENT AND ANALYSIS
The experimental result shows the output of the project.

Figure 2 Main window
The above figure illustrates the main windows of the project.There are four data plane and two servers.The servers are linked with SDN.When the attack is come to data plane, that node is colored as red.The intimation is going to SDN through server.

V. CONLIUSION
In this study, SDN-based detection systems developed for DDoS attacks were analyzed with machine learning systems.In the first proposed approach, by analyzing flow data, algorithms with 98.3% accuracy ensure the detection of attacks without discriminating the type of traffic.Among the proposed systems, the second approach proceeds by labelling DDoS attacks as normal traffic and attack traffic.With 97.7% sensitivity, KNN algorithms can perform this control by lightening the burden of the controller.
With the feature selection methods used in the study, initially 12 features were selected and the selected subset of features was trained using classifiers.The number of features selected was determined either by the algorithm itself or by the threshold value given to the algorithm.By changing this threshold value, different numbers of features can be selected and trained by the classifier.Different numbers of features can change the accuracy.In general, we observed that the performance rate of all models is above 80% and the algorithms used for this dataset are successful.At the same time, network browsing, attacks between layers, and malicious software can be detected on SDN with this approach.For protecting and improving SDN structure, the second approach can be employed.

Figure 1
Figure 1 Main targets of the Distributed Denial of Service (DDoS) attacks on Software Defined Networking (SDN).
Proposed flow diagram consists of Data Pre-Processing, Feature Selection, Trained Model, Test Data.MODULES  INPUT DATASET  PREPROCESSING  DEATURE SELECTION  TRAINING MODEL  PREDICTION MODEL The modules explanation is below  INPUT DATASET At first the datas are read from the input dataset by using pandas library.The dataset is downloaded from kaggle website. Dataset description 1.pkSeqID-row identifier 2.proto-Textual representation of transaction protocols present in network flow 3.saddr-source address 4.sport-source port 5.daddr-destination address 6.dport-destination port 7.seq-Argus sequence number 8.stddev-Standard deviation of aggregated records 9.N_IN_Conn_P_SrcIP-Number of inbound connections per destination IP. 10.min-Minimum duration of aggregated records 11.state_number-Numerical representation of feature state 12.mean-Average duration of aggregated records 13.N_IN_Conn_P_DstIP-Number of inbound connections per source IP 14.drate-Destination-to-source packets per second 15.srate-Source-to-destination packets per second 16.max-Maximum duration of aggregated records 17.attack-Class label: 0 for Normal traffic, 1 for Attack Traffic 18.category-Traffic category 19.subcategory-Traffic subcategory

Figure 3
Figure 3 Dataset details The above figure illustrates the details of the dataset.There are 794 rows and 19 attributes like source IP address, destination IP address, packet size and category etc.

Figure 4
Figure 4 Attributes Information The Above figure shows the attribute details with not null values and data type of the attributes.