Entropy Based Features Distribution for Anti-DDoS Model in SDN

In modern network infrastructure, Distributed Denial of Service (DDoS) attacks are considered severe network security threats. For conventional network security tools, it is extremely difficult to distinguish between the high traffic volume of a DDoS attack and a large number of legitimate users accessing a targeted network service or resource. Although these attacks have been widely studied, few works collect and analyse truly representative characteristics of DDoS traffic. Current research mostly focuses on DDoS detection and mitigation with predefined DDoS datasets, which are often hard to generalise to various network services and legitimate users’ traffic patterns. To deal with considerably large DDoS traffic flows in Software Defined Networking (SDN), in this work we propose a fast and effective entropy-based DDoS detection method. We deploy a generalised entropy calculation that combines Shannon and Renyi entropy to identify distributed features of DDoS traffic; it also helps the SDN controller to deal effectively with heavy malicious traffic. To lower the network traffic overhead, we collect data-plane traffic with signature-based Snort detection. We then analyse the collected traffic for entropy-based features to improve the detection accuracy of two deep learning models: Stacked Auto Encoder (SAE) and Convolutional Neural Network (CNN). This work also investigates the trade-off between the SAE and CNN classifiers using accuracy and false-positive results. Quantitative results demonstrate that SAE achieved a relatively higher detection accuracy of 94% with only 6% false-positive alerts, whereas the CNN classifier achieved an average accuracy of 93%.


Introduction
Digital services such as banking, healthcare, education, entertainment, and national and local administration drive our modern society, in which access to online services is often taken for granted. These services have become everyday routines for almost everyone; many of us check our official emails and social services first thing in the morning. The dependency of our day-to-day activities on these services attracts a large number of attacks on network services. Recent developments in software, network, and system exploits and vulnerability tools have brought up new attack vectors to compromise access to an entire network or subnetwork, even though network defenders use up-to-date and highly sophisticated defence systems. Compared with conventional host- or service-based attacks, Distributed Denial of Service (DDoS) attacks are considered more disruptive in nature. These attacks make targeted services unavailable by sending a significantly large number of malicious access requests to a service provider.
Once its resources are depleted, the service provider is unable to serve its legitimate users. Nowadays, DDoS is a commonly used attack method which inflicts heavy financial and reputational losses [1].
With the advancement of virtualization-based computing, Software Defined Networking (SDN) has been widely adopted for security solutions in various services and service provisioning models [2][3][4]. In SDN, most routing and topological decisions are carried out by a separate entity called the control-plane [5]. This decoupling brings enormous benefits to network management and provides a feasible and effective way to improve network efficiency [6]. Furthermore, the separation of data-plane and control-plane helps to manage a flexible and scalable networking infrastructure that meets ever-changing modern business needs. Although the logically centralised architecture and its programmability enable the SDN controller to detect malicious activities, the controller itself becomes vulnerable to DDoS attackers [7].
Most forwarding decisions are managed by the controller. The flow table in an OpenFlow-based switch is searched for each newly arriving packet; on a successful match, the corresponding flow action is performed. If a packet does not match, it is propagated to the SDN control-plane as a packet_in message for detailed analysis. During a DDoS attack, if the arrival rate of packet_in messages is significantly higher, control-plane resources start to deplete, which disrupts communication with the data-plane and may overwhelm the controller. A single point of failure, such as a controller overwhelmed with malicious traffic, could render the whole networking infrastructure defunct [8].
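The table-miss behaviour described above can be illustrated with a minimal sketch. This is not OpenFlow or Ryu code: the `Switch` class, its exact-match table keyed on (source, destination), and the `simulate` helper are hypothetical names chosen only to show why many distinct spoofed sources translate into many packet_in events.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy switch: an exact-match flow table plus a packet_in counter."""
    flow_table: dict = field(default_factory=dict)  # (src, dst) -> action
    packet_in_count: int = 0

    def handle(self, src, dst):
        action = self.flow_table.get((src, dst))
        if action is not None:
            return action            # table hit: handled in the data plane
        self.packet_in_count += 1    # table miss: would go to the controller
        return "packet_in"


def simulate(switch, packets):
    """Feed (src, dst) pairs to the switch and return the per-packet outcome."""
    return [switch.handle(src, dst) for src, dst in packets]
```

With one installed rule and 100 packets from distinct spoofed sources, every spoofed packet misses the table, so the controller-bound load grows with the number of attack sources rather than with the number of installed flows.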
Amongst existing security problems in SDN, the DDoS attack is considered one of the most urgent and hardest security issues [9]. DDoS attack detection in SDN is well researched, with approaches including [10][11][12][13][14][15][16]. However, most of these conventional studies focus on attack detection and mitigation methods. The majority of the work is based on time-based periodic detection, where choosing the right time period to detect an attack is very hard. If a large time period is selected, the response time for detecting an attack increases, which creates an extremely large attack load on the deployed switches and the SDN controller. In contrast, if the time period is set to a relatively low value, the deployed attack detection module runs almost continuously, which unnecessarily consumes controller resources such as CPU and upstream and downstream network bandwidth, and also affects controller efficiency.
However, congestion at the controller is one of the major issues that can easily lower the performance of the deployed mechanism and leave the entire infrastructure vulnerable, especially to DDoS attacks. Currently, most research does not focus on improving the accuracy of the controller in SDN, as most detection modules work from the SDN controller. It is therefore essential to improve SDN controller efficiency using the available characteristics of SDN. To solve the aforementioned issues, we propose a novel anti-DDoS detection mechanism with entropy-based feature distribution in SDN. Our detection model comprises Snort alert-based Features, Entropy Calculation, Feature Distribution and Traffic Processing, and Machine Learning Classifiers. We summarise our contributions below:
• An effective anti-DDoS detection mechanism is proposed to speed up the main SDN controller unit and improve its accuracy, so that the deep learning model can easily classify between benign traffic and unknown malicious code.
• We have distributed specific traffic features with a generalised entropy estimation based on the Shannon and Renyi formulas.
• By utilising a Snort-Ryu implementation with entropy calculation, we acquired non-redundant traffic features.
• As detection classifiers, we utilise well-known deep learning classifiers, namely Stacked Auto Encoder (SAE) and Convolutional Neural Network (CNN), to compare accuracy and false-positive alerts at 30% and 60% attack rates mixed with normal traffic.
The rest of the paper is organised as follows: Sections 2 and 3 present the related work and background in detail, respectively. Our novel entropy-based DDoS detection model is described in Section 4. Experimental results and their evaluation are presented in Section 5. Section 6 concludes the paper along with future directions.

Related Work
In previous decades, most security research was performed on legacy networks [17][18][19]. Different approaches have been implemented to detect and mitigate DDoS traffic in traditional networks only [20][21][22][23]. Given the needs of digital services, traditional networks have been found computationally expensive and time-consuming, and they require more modern innovations for security implementation. OpenFlow-enabled SDN infrastructure has proved successful against various security challenges [14]. Although SDN provides a feasible and modern platform, the control-plane layer of SDN has not been extensively researched from a security point of view. The programmable model and logically centralised implementation bring new vulnerabilities and threats, making the SDN control-plane layer an attractive target for potential intruders. In particular, massive DDoS attacks on the control-plane of an SDN-based platform can result in the unavailability of the entire network [24].
One of the main challenges in network security and machine learning is to distribute and select optimal features [25]. Feature selection aims to pick a feature subset that performs best under a certain assessment condition [26]. According to the analysis of [25,27], feature selection approaches can be classified from a technical viewpoint. Existing researchers have also focused on collecting and classifying ideal features for optimal model performance by adaptively validating the system with multiple combinations of features as inputs.
To assess regular traffic flow, Mehdi et al. used the maximum entropy methodology to address security challenges in SDN [11]. Experimental investigations were carried out using OpenFlow switches with a NOX controller and low-rate data traffic. Their main goal was to classify attack traffic in a home setting. In another study, investigators used entropy to predict the spread of worms and threats through port scanning [13]. Besides this, Ref. [10] suggested an entropy-based anomaly detection technique in which the identification module operated in the edge switches to lower the overhead of the control plane. Kotani et al. suggested a packet filtering strategy to secure the controller [28]. This strategy identified the elements of the packet headers before the packet-in message was forwarded. However, the strategy is ineffective when the attacker produces new flows whose packets all carry different header field values. Dong et al. developed a system for detecting the vulnerable applications to which attackers were connected [29]. Typically, a threshold was set in feature extraction, and any irregular variance of incoming traffic feature vectors helped to classify a threat [30]. Mohammadi et al. suggested a prevention strategy against the TCP SYN DDoS attack targeting SDN, using SDN's programmability for identification purposes [31]. The system, however, remained vulnerable to attacks on several other protocols. Compared to other attacks, DDoS attacks cause significant disruption to any sort of networking infrastructure [32].
Entropy is a common way of producing valuable traffic classification features and has been used extensively in recent frameworks for DDoS attack detection [33]. Entropy is an analysis technique that measures the uncertainty about the content. In network activity, entropy uses a single-valued metric to identify distributional variations in traffic [34]. It has been increasingly recognised that adequate analysis of such variations can classify network anomalies [35]. A recent study showed that entropy-based detection has better detection efficiency than many other approaches [36]. Entropy-based feature classification techniques are feasible and widely used [37], and offer significant advantages such as effective and fast calculation, fewer false alerts, and higher detection accuracy. In particular, entropy calculations are applied to input traffic attributes such as source and destination IP addresses and source and destination ports. For example, a high entropy value for the source address indicates a wide dispersion of traffic sources, whereas a low entropy value indicates that traffic is concentrated on a few sources. This is valuable for the detection system, as a standard DDoS attack with many attack sources and a single target typically has a high variation of the source address and a low variation of the destination address relative to regular traffic.
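The contrast between source and destination address entropy can be sketched as follows. This is a minimal illustration using the standard Shannon entropy in bits; the function name `shannon_entropy` and the toy address lists are assumptions made for the example, not part of the paper's implementation.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy flooding window: eight distinct (spoofed) sources, one victim.
attack_src = ["10.0.0.%d" % i for i in range(8)]
attack_dst = ["192.168.1.1"] * 8

src_entropy = shannon_entropy(attack_src)  # eight equally likely sources
dst_entropy = shannon_entropy(attack_dst)  # a single destination
```

For this toy window the source entropy is log2(8) = 3 bits and the destination entropy is 0 bits, which is the high-source, low-destination pattern described above.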
Identifying malicious DDoS traffic at the data-plane layer is difficult because OpenFlow-enabled devices have no self-adaptive intelligence to segregate network traffic flows. In addition, attackers use easily available tools and hardware assets [38]. This section presents a systematic review of DDoS detection solutions that are widely deployed in the SDN control-plane, listed in Table 1. Most existing approaches evaluate DDoS detection techniques by classifying packet traffic as either legitimate or malicious, and are broadly categorised into entropy-based anomaly detection, signature-based, machine learning-based, and hybrid detection. These approaches are deployed in SDN infrastructure to detect DDoS traffic.
Table 1. DDoS detection solutions deployed in the SDN control-plane.
Maximum entropy, TRW-CB, Rate-limiting.
[14] 2013. The proposed approach lowers the control-plane and data-plane overhead. Interface migration technique.
[39] 2015. The proposed method reduces the request burden by utilising a scheduling technique in SDN. Malicious traffic redirection approach.
[10] 2015. A model running on the edge switch detects DDoS with low control-plane burden.
Some authors utilised entropy-based statistical techniques to analyse traffic [10,44,45]. The authors of [42,43] proposed an entropy-based technique to detect DDoS at the POX controller during the initial attack stage. This work has a limitation: once the number of hosts increases, the model generates false-positive alerts. The computational overhead on the controller is reduced by deploying a fast-entropy approach with a flow-based model [44]. The authors of [39] proposed a scheduling-based method to detect DDoS, in which a single processing queue is divided into k logical queues, each of which belongs to a network switch. During a heavy traffic burst, the SDN controller uses the logical queues to satisfy scheduling requests. The authors of [11] utilise a maximum entropy estimation technique for classifying the normal traffic distribution to address home office network security concerns in SDN. Most of the experiments were deployed with OpenFlow-enabled switches and a NOX SDN controller. In [13], the authors used entropy methodologies to identify port-scan attacks and worm propagation. Another entropy-based anomaly detection solution was proposed by the authors of [10] to detect DDoS attacks in SDN; this work mainly focused on reducing the control-plane workload.
Other well-known methodologies have been published to detect DDoS traffic with SDN-based architecture, such as Self-Organising Maps (SOM), a Machine Learning (ML) based approach to detect malicious traffic [12]. This work uses only six features to classify malicious attack traffic, e.g., Average Packets per flow (APf), Average Duration per flow (ADf), and Average Bytes per flow (ABf). Similarly, Refs. [41,46] utilised different ML-based approaches to classify traffic patterns. The authors of [16] proposed an adaptive flow collection based DDoS detection model in SDN. This methodology uses the OpenSketch traffic measurement tool to create a hash table for measuring traffic, and a three-stage pipeline to gather traffic samples for identifying malicious traffic instead of using traffic flow sampling. In SDN, most DDoS detection solutions combine ML and knowledge-based techniques to identify malicious attacks. Generally, ML-based techniques classify attack flows based on specific features. ML-based anomaly detection models are mainly suitable for small networks; for larger networks, heavy traffic flow overhead makes traffic collection and analysis inside the controller unmanageable [13]. Although response time is important for detection performance during an attack, ML performance also depends on the training datasets and the diversity of their features.
Recently, ML-based detection techniques have been widely applied in SDN to address the challenges of DDoS detection. The authors of [42] proposed a semi-supervised one-class Support Vector Machine (SVM) to classify anomalies, in which only a small quantity of malicious traffic is used compared to normal traffic. This model can detect outliers from the initial background traffic phase, which helps to capture the majority of the traffic characteristics. The authors used the Stacked Auto Encoder (SAE) to train on datasets; however, it consumed a lot of time to process model iterations. Similarly, the authors of [43] proposed a high-precision DDoS detection model based on the XGBoost classifier in SDN. The proposed approach analysed most DDoS attacks to provide a feasible and effective solution. Using the POX controller's grab bag connection, TCP, UDP, and ICMP flooding attacks were sent to manipulate connection records, which enabled the evaluation of the DDoS classifiers.
According to [47], a packet_in filtering approach can protect the control-plane. This technique records the contents extracted from packet header fields prior to sending the packet_in message. However, when intruders launch very distinctive flows in which all packets have field values different from those recorded by the technique, it fails to capture the malicious records. The authors of [29] deployed a detection model for locating compromised interfaces that are used by attackers during an attack. Most anomaly detection models use fixed threshold values: once incoming statistical features deviate abnormally from these thresholds, the traffic is identified as attack traffic.
The literature survey shows that some research has been carried out on detecting DDoS attacks using traffic feature distribution with entropy-based methodologies. By applying feature distribution with entropy calculations on top of existing detection techniques, we can primarily reduce the processing overhead of redundant and unnecessary features and improve detection while requiring relatively less time.
Convolutional neural networks (CNNs) [48] are known as enhancements of conventional feed-forward networks (FFNs). They were initially applied to object recognition using 2D convolution layers, 2D pooling layers, and a fully connected layer. This was followed by natural language analysis using 1D convolution layers, 1D pooling layers, and a fully connected layer [48]. Whereas conventional 2D CNNs are used mostly for image analysis, 1D CNNs can be used effectively for time series processing, since features of a 1D time series can be effectively derived by convolutions [49]. In our proposed study, we utilise the 1D CNN as a deep learning classifier to identify security threats in complex multivariate and distributed features based on entropy estimation.
In a CNN, convolution is the primary building block, where the entropy-based input features are converted into a 1D time-series input vector $z = (z_1, z_2, z_3, \ldots, z_n)$. All distributed features based on the entropy calculation are fed to the fully connected layer of the CNN, which comprises the soft-max function and computes a probability distribution over the input feature vector. The CNN layer with the fully connected soft-max function is given in Equation (1).
where $h_l$ utilises the highest feature value connected to each input vector $z = (z_1, z_2, z_3, \ldots, z_n)$, and $b_0$ is used for the non-linear activation function. The SAE consists of several auto-encoders, each with an input (visible) layer, a hidden layer, and an output layer, also called the reconstruction layer. The input data is loaded into the visible layer, and the reconstruction layer produces the output. The SAE architecture is distinct in design relative to the CNN, DBN, and RBM deep learning models. First, SAE has a basic and straightforward structure and is trained in a much shorter time than the other Deep Neural Network (DNN) algorithms described [50]. Second, because of its unsupervised learning strategy, SAE does not need labelled datasets. On the other hand, CNN is based on supervised learning, while DBN and RBM use supervised learning. Finally, the SAE algorithm employs outputs as inputs, and detailed feature components can be retrieved with a suitable training strategy. This paper uses the comprehensive features of an SAE-based method on the dataset to increase the detection rate of DDoS attacks in SDN. As a DNN, SAE uses sparse auto-encoders and soft-max classifiers to extract and label unlabelled data.
In Equation (2), $\beta$ denotes the sparsity penalty for the weight coefficient via the Kullback-Leibler divergence. This divergence function allows the input feature vector $z = (z_1, z_2, z_3, \ldots, z_n)$ to be processed with a low average activation; when $\rho = \hat{\rho}_j$, the function takes its minimum value of 0. During training of the input layer values, $\hat{\rho}_j$ is the average activation of the $j$th hidden unit and $\rho$ is the sparsity coefficient at the hidden layer.
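The sparsity term can be illustrated with a short sketch. It uses the form of the Kullback-Leibler sparsity penalty that is standard in the sparse auto-encoder literature, $\beta \sum_j \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_j} \right]$; whether the paper's Equation (2) has exactly this form is an assumption, and the function names below are illustrative.

```python
import math


def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).

    Assumes 0 < rho < 1 and 0 < rho_hat < 1.
    """
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparsity_penalty(rho, mean_activations, beta):
    """beta-weighted sum of KL terms over the hidden units' mean activations."""
    return beta * sum(kl_bernoulli(rho, r) for r in mean_activations)
```

The penalty is zero when every hidden unit's average activation equals the target `rho` and grows as the activations move away from it, which matches the minimum-at-zero property stated above.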

Detection Methodology in SDN
This section presents the proposed DDoS attack detection methodology and its implementation details. The detection model relies on a specific DDoS feature distribution approach, achieved by generalised entropy (GE) calculation with the Shannon and Renyi formulas; details of the entropy distribution are provided in this section. Our detection methodology comprises data acquisition with Snort-Ryu, feature distribution with entropy calculation, data processing, and traffic classification with SAE and CNN. A general overview of the proposed DDoS detection methodology is presented in Figure 1, and Figure 2 elaborates the implementation details.

[Figure labels: DDoS Traffic; Normal Traffic]

Snort-Ryu Based Data Acquisition
This section covers the acquisition of a large amount of live data from the network; with heavy network traffic, human inspection and data analysis are unmanageable. Snort [51] can be used as a signature-based detection engine or as a log_tcpdump module with various output files. Although log_tcpdump can be used for storing test-bed datasets, it is limited to storing only 128 MB of packets in total. We therefore extended the storage module of Snort by implementing a Barnyard2 log file. We utilised a modular Snort-Ryu implementation to collect the test-bed datasets. The Snort [51] engine records various traffic attributes: timestamp, sig_generator, sig_id, sig_rev, msg, proto, src, srcport, dst, dstport, ethsrc, ethdst, ethlen, tcpflags, tcpseq, tcpack, tcplen, tcpwindow, ttl, tos, id, dgmlen, iplen, icmptype, icmpcode, icmpid, icmpseq. Our proposed DDoS detection model based on feature distribution depends on features such as time window, protocol, source IP address, source port address, destination IP address, destination port address, datagram length, priority, etc. Using Snort, our model acquires the relevant features in combination with the entropy calculation. Snort uses two different modes to capture distinct features for malicious and benign network traffic: the SM-1 and SM-2 modes depicted in Table 2.
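As a rough illustration of how the listed attributes could be reduced to the features the model depends on, the sketch below parses alert rows and keeps a subset of columns. It assumes the alerts are available as comma-separated rows with the attributes in exactly the order listed above; the `SELECTED` subset and the function name `extract_features` are illustrative choices, not the paper's implementation.

```python
import csv
import io

# Attribute names in the order given in the text (assumed column order).
SNORT_FIELDS = [
    "timestamp", "sig_generator", "sig_id", "sig_rev", "msg", "proto",
    "src", "srcport", "dst", "dstport", "ethsrc", "ethdst", "ethlen",
    "tcpflags", "tcpseq", "tcpack", "tcplen", "tcpwindow", "ttl", "tos",
    "id", "dgmlen", "iplen", "icmptype", "icmpcode", "icmpid", "icmpseq",
]

# Hypothetical subset used for entropy-based feature distribution.
SELECTED = ["timestamp", "proto", "src", "srcport", "dst", "dstport", "dgmlen"]


def extract_features(csv_text):
    """Parse alert rows and keep only the SELECTED columns of each row."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=SNORT_FIELDS)
    return [{name: row[name] for name in SELECTED} for row in reader]
```

Each returned dictionary then holds one packet's values for the selected attributes, ready to be grouped into windows for the entropy calculation.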

Entropy Calculation
The proposed methodology is depicted in Figure 2. The main aim of our model is to capture various types of DDoS attacks; for this purpose, the model mainly relies on finding the most common attributes of the flows. One of the most common attributes is the source IP address that generates the attack. Once the various types of features are collected with the SM-1 and SM-2 modes, the model takes advantage of Shannon entropy estimation to generalise the most relevant types of features that lead to successful DDoS attack detection in SDN. The model uses the $H_\alpha(Z)$ entropy formula to identify relevant features from the Snort module.
In addition, the lowest entropy values are estimated in the case of small uncertainty. The randomness of the event distribution is evaluated with the Renyi entropy metric of order $\alpha$, which generalises Shannon entropy estimation towards more relevant features. Consider the Generalised Entropy (GE) of a discrete variable $Z$ with $N$ possible outcomes whose probabilities $z_1, z_2, \ldots, z_N$ satisfy $\sum_{i=1}^{N} z_i = 1$ and $0 \le z_i \le 1$. The Renyi entropy of order $\alpha$ is then defined as:

$$H_\alpha(Z) = \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{N} z_i^{\alpha} \right) \quad (3)$$

Here $\alpha \ge 0$, $\alpha \ne 1$, and $z_i \ge 0$. Different entropy calculations are performed for various orders $\alpha$. Substituting $\alpha = 0$ gives the maximum generated information, and in the limit $\alpha \to 1$ we obtain the GE value:

$$H_1(Z) = -\sum_{i=1}^{N} z_i \log_2 z_i \quad (4)$$

Equation (4) is known as Shannon entropy. When we put $\alpha = 2$, the GE expression becomes:

$$H_2(Z) = -\log_2 \sum_{i=1}^{N} z_i^{2} \quad (5)$$

Equation (5) is known as Renyi entropy estimation. The authors of [52] have depicted the relationship between Renyi and Shannon entropy, where the GE estimate depends on the value of $\alpha$, such as $\alpha = 1$ or $\alpha = 2$. GE values increase exponentially across probability distributions compared to Shannon entropy [53,54]. Following this work, we classify our traffic into benign and malicious probability distributions. To obtain GE values, our work also processes both probability distributions with a combination of attack and benign properties. For events with higher uncertainty, the probability yields more GE information than Shannon entropy [55].
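The effect of the order $\alpha$ can be checked numerically with a small sketch of the Renyi entropy, with the Shannon case handled as the $\alpha = 1$ limit. The base-2 logarithm and the function name `renyi_entropy` are assumptions made for this illustration.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (in bits); alpha == 1 gives Shannon entropy."""
    probs = [p for p in probs if p > 0]  # zero-probability outcomes contribute nothing
    if alpha == 1:
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
```

For the uniform distribution all orders agree (2 bits for four outcomes), whereas for the skewed distribution the entropy decreases as $\alpha$ grows, so higher orders respond more strongly to a dominant outcome such as a single heavily targeted destination.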
By adjusting the value of $\alpha$ in GE, we can obtain different entropy values to suit our DDoS detection methodology. This paper uses the Information Distance (ID) based on the Renyi and Shannon GE estimates with $\alpha = 1$ or $\alpha = 2$, which helps to estimate the similarity of events. The methodology is described with two probability distributions, $P_{ID} = \{p_1, p_2, \ldots, p_n\}$ and $Q_{ID} = \{q_1, q_2, \ldots, q_n\}$, and the ID between them is given by Equation (7). The ID is always non-negative for $\alpha \ge 0$, and if both probability distributions are identical then $D_\alpha(P_{ID}, Q_{ID}) = 0$. Different expressions are obtained by varying the order $\alpha$. Equation (9) is known as the Kullback-Leibler (KL) divergence. This equation is used for measuring ID with the identity, triangle-inequality, and symmetry properties of the KL divergence. Both GE and ID use these properties to identify the traffic features most relevant to DDoS detection, with the help of Equations (5) and (7). In our approach, the probability distribution is calculated by $H_\alpha(Z)$ as shown in Equation (3), where $z_i$ represents the packet header variation between source and destination endpoints, comprising Src-IP, Src-Port, Dest-IP, Dest-Port, Source-Bytes, Destination-Bytes, TTL, Flags, proto, and Distinct Datagrams. Figure 3 provides a flow diagram for the distribution of features, which is achieved by combining the Shannon and Renyi entropy formulas. Our work uses a probability distribution measuring approach to generalise relevant features from DDoS and malicious packet headers. We processed all incoming packets with the GE and ID metrics so that our SDN-based test-bed network could effectively detect DDoS traffic. The major aim is to reduce redundant features with GE and ID (information distance) entropy estimation.
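For orientation, the sketch below uses the textbook Renyi divergence, $D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log_2 \sum_i p_i^{\alpha} q_i^{1-\alpha}$, whose $\alpha \to 1$ limit is the Kullback-Leibler divergence $\sum_i p_i \log_2 (p_i / q_i)$, and a symmetrised sum of the two directions as an information distance. Whether the paper's ID equations take exactly these forms is an assumption; the function names are illustrative, and both distributions are assumed to be strictly positive.

```python
import math


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(P || Q) in bits; alpha == 1 gives KL divergence."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if alpha == 1:
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(s) / (alpha - 1)


def information_distance(p, q, alpha):
    """Symmetrised divergence: D_alpha(P || Q) + D_alpha(Q || P)."""
    return renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha)
```

Identical distributions give a distance of zero, and the distance grows as a window's distribution (for example, of destination addresses) departs from the benign baseline, which is what makes it usable as a detection signal.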

Features Distribution and Traffic Processing
This section presents feature selection and dataset processing for our DDoS detection model. Feature selection relies mainly on GE and ID entropy estimation to qualify relevant features, as discussed above. However, an effective DDoS classifier needs the distribution of many important kinds of traffic features. Traffic anomalies of this kind change randomly across the distribution of the most common addresses, ports, data lengths, etc., in an observed traffic area. An overview of the selected traffic features is presented in Table 3, obtained by distributing specific features from the test-bed datasets with the help of Shannon and Renyi joint entropy.
During a DDoS event, the packet rate is significantly higher than for benign traffic, and packet diversity changes as well, which changes the entropy. Real-time traffic is a complex mix of different data rates; its packet diversity remains stable, yet it also produces irrelevant data pattern flows due to different traffic services. This is why entropy values change exponentially with respect to time. Because of this variation, setting a proper threshold to detect DDoS in the network is unmanageable.
We developed a dataset to evaluate our proposed method. The dataset is a combination of real-time legitimate data and synthesised malicious traffic, emulated with DDoS attack tools comprising Metasploit, Scapy, HPing, and Low Orbit Ion Cannon (LOIC). We ran Scapy and HPing DDoS scripts with various DDoS attributes from the remote virtual machine, as shown in Figure 2. For validation purposes, we simulated various DDoS attacks on widely targeted ports. Realistic network traces are rarely publicly available due to privacy concerns, and labelling them properly requires some manual entries. With deep domain knowledge and suitable tools and methodology, this approach makes it possible to create a realistic dataset. Our model calculates entropy and classifies benign and malicious traffic using two major components: a window size and two threshold values. For the window size, we use the number of packets received. For entropy values, we estimate the occurrences of incoming packets within the window; for example, if the frequencies of each destination IP (dst IP) are equally distributed, the maximum entropy value is reached. If there is a sudden deviation, such as a rapid decrease in entropy values for the same network, a malicious traffic flow event may be occurring. The main function of Algorithm 1 is to calculate the average entropy values (GE, ID), referred to as AverageEntropy (GE, ID), for the input features presented in Table 3. The SDN programmability approach, in collaboration with GE and ID, also helps to identify specific features with more malicious attributes.
In Algorithm 1, AverageEntropy (GE, ID) is evaluated against two thresholds, a lower threshold $\delta_1$ and an upper threshold $\delta_2$, for every input stream at different times. Our aim is to classify and distribute specific features from the incoming network stream. The distributed features are processed over two different time slots that show deviation beyond the normal range of $\delta_1$ and $\delta_2$. Our model uses entropy deviation, such as a sudden rise or drop of values compared to predefined threshold values between 0 and 1. For example, an attack event such as a port scan on a specific location results in a change of dispersion, which is reflected in the entropy. The lower threshold $\delta_1$ is used for the GE entropy values, and the upper threshold $\delta_2$ is used for the ID values. GE and ID are calculated for benign and malicious traffic with Equations (4) and (7), and traffic is flagged if $ID \ge \delta_2$ or $H_\alpha(Z) \le \delta_1$.
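The windowing and two-threshold rule can be sketched as follows. This is not the paper's Algorithm 1, which is not reproduced here: the normalisation of the entropy to [0, 1] by the number of distinct values, the non-overlapping packet-count windows, and all function names are assumptions made for the illustration.

```python
import math
from collections import Counter


def normalised_entropy(values):
    """Shannon entropy divided by its maximum for the observed values, in [0, 1]."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0
    total = len(values)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))


def packet_windows(seq, size):
    """Split a packet sequence into consecutive, non-overlapping windows."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def is_suspicious(ge, id_value, delta1, delta2):
    """Decision rule from the text: ID >= delta2 or entropy <= delta1."""
    return id_value >= delta2 or ge <= delta1


def flag_windows(dst_ips, window_size, delta1):
    """Flag windows whose destination-IP entropy falls to or below delta1."""
    return [normalised_entropy(w) <= delta1
            for w in packet_windows(dst_ips, window_size)]
```

For example, a window in which destinations are evenly spread has a normalised entropy of 1 and is not flagged, while a window dominated by one destination drops well below a threshold such as 0.5 and is flagged.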
Algorithm 1 presents entropy estimation with a time-based sliding window. It is initialised with a time window size, which keeps resource utilisation and network flow deviations manageable. We used a time slot of only three minutes, as a larger slot needed more resources for processing and storage. Algorithm 1 outputs the average estimated entropy of feature set S, using only two thresholds, the lower $\delta_1$ and upper $\delta_2$, for every feature in S. The major aim was to classify and retain very specific features from the incoming network traffic. The selected features from network connections were processed in two steps over the time slots whose AverageEntropy values lay beyond the normal range of $\delta_1$, $\delta_2$. First, each abnormal subset of a network traffic time slot was cleaned by adding empty-valued rows to maintain a stable and unique column- and row-based format. Second, the cleaned subsets were normalised using the MinMax function, which scales feature values to the range 0 to 1. The MinMax feature normalisation function is given below:

$$Z_i^{nor} = \frac{z_i - \min(Z)}{\max(Z) - \min(Z)}$$

where $Z$ represents a subset of malicious traffic features and $z_i$ represents the current value of $Z$ to be normalised. The term $Z_i^{nor}$ is the final normalised feature within the range 0 to 1.
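A minimal sketch of MinMax normalisation for one feature column is given below. The handling of a constant column (mapped to 0.0 to avoid division by zero) is an assumption of this sketch, not something stated in the text.

```python
def min_max(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, datagram lengths of 2, 4 and 6 would be scaled to 0.0, 0.5 and 1.0, so that features with different units contribute on a comparable scale to the classifiers.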

Experimental Setup
Our experiments are conducted with three virtual machines on a workstation with an Intel Xeon X5560 CPU (2.88 GHz) and 16 GB RAM (DDR3 ECC registered memory, PC3-12800). We run TensorFlow 1.4 and Mininet on the Ubuntu 16.04 LTS 64-bit operating system. The proposed functionality is illustrated in Figure 2. We use VMware Player to create the virtual machines: VM1 with IP address 192.168.202.x1, VM2 with 192.168.202.x2, and VM3 with 192.168.202.x3. A Python script generates attack and benign traffic with the Scapy tool. We also generated benign traffic through normal web searches, browsing, and video streaming to validate the benign traffic produced by Scapy. To evaluate performance, we limited the bandwidth of our model to 50 MB per 5 min interval, of which TCP accounted for 9 MB, UDP for 32 MB, and ICMP for nearly 2 MB. During attacks, the TCP, UDP, and ICMP rates ranged from a minimum of 380 Kbit/s to a maximum of 700 Kbit/s.
In VM1, the Mininet network emulator is used to create 6 hosts and 6 OpenFlow switches. Mininet relies on kernel namespaces, which make it possible to prototype the overall network environment within a single workstation. In Mininet, each process has its own network interfaces and routing table, so all network elements can be virtualised in the kernel. We connect the switches to the Ryu controller using OpenFlow (OF) version 1.3.
In VM2, the Ryu controller is implemented together with Snort-IDS as the Network Intrusion Detection System (NIDS) and the entropy algorithm presented in Section 3. This VM plays a vital role: it collects network traffic and then applies the entropy probability property for feature distribution. Snort-IDS records every incoming packet in the Barnyard2 log file, and the Ryu-based GE and ID entropy estimation then removes redundant features from all network traces. Our detection model relies on the more specific features that GE and ID feature distribution provide to the deep learning classifiers. The SDN controller is deployed in VM2 and centrally handles all virtual machines of our test-bed. Network policies are installed via REST APIs. In our system, VM1 acts as the data-plane and VM2 as the control-plane. The ovs-ofctl utility is used to insert network policies into the switch tables; in addition, it is used for monitoring and administration between the data-plane and the control-plane.
In VM2, Ryu uses two network interfaces: eth0, in promiscuous mode, collects all OpenFlow traffic traces from VM1 with Snort, while eth1 is used as a port mirror for entropy calculation on Snort-Barnyard2 packets. Snort as the NIDS plays a vital role in acquiring all raw network traffic in our proposed model. The Snort switch application (switch_snort.py) is implemented on top of the Ryu controller; it provides the Layer 2 (L2) switch code and redirects the relevant traffic using OpenFlow-enabled promiscuous mode. The Ryu controller receives Snort alerts with unixsock = False, which allows it to collect network packets and store them in the Barnyard2 log file. This helps to manipulate data-plane traffic.
VM3 generates malicious traffic remotely, as illustrated in Figure 2. It uses Scapy, LOIC, and Metasploit, together with penetration testing tools such as Network Mapper (Nmap) and the Nessus Vulnerability Scanner. Scapy is considered powerful and practical for launching real flooding attacks, and our work uses Scapy and LOIC to launch various TCP, UDP, and ICMP traffic. To validate the proposed work, we run attack and normal traffic against the same VM1 data-plane, which is directly connected to the main SDN controller with the deployed parameters. A Python script generates attack and benign traffic with Scapy, and we select different hosts and source nodes for each injection. We also generate benign traffic through web activities such as web searches, browsing, and video streaming.
The GE and ID entropy probabilities are applied to all test-bed datasets to obtain a more specific feature set without redundant attributes. These features are provided in Table 3. The Tshark and Tcpreplay tools are used to manipulate and analyse benign and malicious traffic individually. Once malicious traffic traces have been classified with GE and ID, as discussed in Section 4, we divide the attack and benign CSV files into the training and testing datasets shown in Table 4. All datasets contain non-zero values due to unity-based MinMax normalisation. We scale the values in our CSV datasets to the range between 0 and 1 using the normalisation defined in Equation (10). After normalisation, we use the SAE and CNN deep neural network models to classify samples as attack or non-attack.

Performance Evaluation
In this work, Snort IDS is also used to collect all data-plane traffic from the test-bed. Snort runs in two different modes: SM1 to acquire only malicious traffic, and SM2 to acquire benign traffic. Both modes are configured with specific signature rules of the Snort detection engine, as illustrated in Table 2. We process Snort alerts for feature distribution and generalisation, using the GE and ID metrics to analyse and calculate Src-IP, Src-Port, Dest-IP, Dest-Port, Source-Bytes, Destination-Bytes, TTL, Flags, proto, and Distinct Datagrams.
To validate the effectiveness of the detection model with the proposed GE and ID feature distribution on the SDN controller, we use two different scenarios. In the first, attacks of varying intensity are launched from a single host that is remotely connected to the Mininet data-plane in VM1. We randomly generate attacks using Scapy, LOIC, and Metasploit. The first scenario uses attack rates of 20%, 30% and 40%. The second scenario uses 60% and 70% malicious traffic. In both scenarios, we generate benign traffic through normal web searches, browsing, and video streaming. These activities are performed within VM1, where the test-bed data-plane is created with Mininet. We maintain the attack intensity using the following percentage equation:
$$\text{Attack rate} = \frac{Z_{attack}}{Z_{total}} \times 100\%$$
In this equation, Z_attack represents the number of attack packets and Z_total represents the total number of packets flowing in our test-bed. We run our code 10 times in each case to set the threshold values. We find that the false-positive (FP) rate decreases, while the false-negative (FN) rate remains stable. Table 5 lists the threshold values for the different attack scenarios.
As in Algorithm 1, our model relies on the average of the estimated entropy values of the input data and applies two thresholds, a lower threshold δ1 and an upper threshold δ2, to the data collected in each time window. Time slots whose entropy deviates beyond the normal range of δ1 and δ2, through a sudden rise or drop relative to these predefined thresholds between 0 and 1, supply the features to be distributed. The following steps are carried out to set up the threshold values:

• We calculate the maximum possible attack traffic values by combining the mean entropy values of the attack traffic with the confidence interval values.
• After taking the difference between these values, we derive the δ2 values for the mean and standard deviation.
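The attack-intensity ratio of attack packets to total packets used in these scenarios can be sketched in Python as follows. This is only a worked example under stated assumptions: the helper names are hypothetical, and the second function assumes that the benign packet count is known in advance, which the paper does not state.

```python
import math


def attack_rate(z_attack, z_total):
    """Percentage of attack packets among all packets in the test-bed."""
    if z_total <= 0:
        raise ValueError("z_total must be positive")
    return 100.0 * z_attack / z_total


def attack_packets_for_rate(z_benign, rate_percent):
    """Attack packets to inject so that attack/(attack + benign) hits the target rate.

    Solving rate = 100 * a / (a + b) for a gives a = rate * b / (100 - rate).
    """
    if not 0 <= rate_percent < 100:
        raise ValueError("rate_percent must be in [0, 100)")
    return math.ceil(rate_percent * z_benign / (100 - rate_percent))


# Hypothetical example: 700 benign packets and a 30% target attack rate.
needed = attack_packets_for_rate(700, 30)
print(needed, attack_rate(needed, needed + 700))
```

Because the required number of attack packets grows quickly as the target rate approaches 100%, the higher-intensity scenarios inject more attack packets than benign ones.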
Figure 4 uses attack rates of 30% and 40%, while Figure 5 uses attack rates of 60% and 70%. Each point on the horizontal axis represents a window and the vertical axis represents the entropy values. In Figure 4, the blue curve represents normal traffic and the orange curve represents malicious traffic. We stabilise the network by injecting manipulated malicious traffic remotely and run Algorithm 1 10 times. The benign traffic entropy values are the same for all attack rates, as shown in Figures 4 and 5.
We considered attack scenarios with a single victim and with multiple victims. In the single-victim scenario, only one host is under attack; in the multiple-victim scenario, we launched attacks on 5 hosts. During simulation, deviations and sudden drops of entropy values are used for traffic feature distribution, since a rapid drop in entropy across flows indicates malicious activity. Figure 4 shows that the drop in entropy values is least pronounced at attack rates of 30% and 40%. With 30% attack traffic, the mean entropy values are 0.77 at window interval 53, 0.72 at window interval 57, 0.76 at window interval 63, and 0.72 at window interval 71. After every 25 window intervals, the entropy dropped to average values of 0.72 to 0.77. With a 40% attack rate, the mean entropy drops from 0.82 to 0.74 at window interval 55; after every 25 consecutive intervals, the mean values repeatedly fall to between 0.71 and 0.73.
The change in entropy values is significant at attack rates of 60% and 70%. In Figure 5a, the mean entropy drops to 0.64 after every 25 window intervals. Similarly, in Figure 5b, the mean entropy drops sharply to 0.35 at window interval 55, and after every 25 consecutive window intervals the mean values consistently fall to 0.35. Compared with the benign traffic mean values, the attack traffic mean values drop by around 0.50, which is the largest drop across all experiments and occurs at the 70% attack rate. This entropy value is far below the threshold, so the event can readily be classified as malicious.
Benign traffic is common to all attack experiments and serves as the fixed threshold against which entropy deviations are compared. In Figures 4 and 5, the entropy values are mostly below the fixed threshold and only sometimes above it. We acquired and classified traffic based on the mean values, which are significantly lower than the fixed threshold, as illustrated in Figures 4 and 5.

Results
The major objective of our proposed work is to improve the accuracy of deep learning-based models in detecting various types of DDoS attack traffic, for which we mainly rely on the distribution and manipulation of malicious traffic features. For demonstration purposes we used Snort as the NIDS, but the practical application of this work is independent of any specific type of NIDS.
We combined Shannon and Renyi entropy, in the form of GE and ID, with SDN-based programmability. This approach reduces the unnecessary and redundant features in malicious traffic, which enables the classifiers to improve their detection accuracy. Similar work is discussed in [40], where the authors used Shannon entropy as the detection metric. In our contribution, however, we combine the Shannon and Renyi formulas in the entropy calculation to generalise DDoS traces from the test-bed with a traffic distribution approach.
Based on the parameters listed in Table 3, our specific malicious traffic features comprise Src-IP, Src-Port, Dest-IP, Dest-Port, Source-Bytes, Destination-Bytes, TTL, Flags, proto, and Distinct Datagrams. We carry out our experiments based on the above design and use four different attack rates together with normal traffic, as depicted in Table 5. The window size is fixed at 180 seconds for the attacking and normal hosts in our network.

Selecting DDoS Classifier
To obtain reliable detection results, we selected two classifiers, SAE and CNN. We first acquired the input vector from the entropy-based feature distribution and then compared the detection accuracy and False Positive (FP) rates of SAE and CNN on the collected datasets (Table 3).
Our work combines different malicious rates, such as 30% and 60% attack intensity, with benign traffic, which results in highly unbalanced features. In such cases, SAE and CNN detection algorithms can fail to classify attacks. To overcome this issue, the authors of [56] improved the accuracy of their detection model by using a weighted loss function to stabilise the test-bed features. Our proposed SAE and CNN detection model uses well-known metrics, namely precision, recall and F1 score, on the datasets specified in Table 3. These metrics are useful for assessing the quality of the detection rate of the proposed model. Using the entries of the confusion matrix, that is, true positives (TP), false positives (FP) and false negatives (FN), the metrics are defined as below:
1. Precision (P): To calculate the percentage of predicted attacks that are actual attacks:
$$P = \frac{TP}{TP + FP}$$
2. Recall (R): To calculate the percentage of correctly predicted attacks among all actual attacks. A higher value of R is very important:
$$R = \frac{TP}{TP + FN}$$
3. F-measure (F): To calculate model accuracy as the harmonic mean of the precision (P) and recall (R) values. A higher F-value is likewise important:
$$F = \frac{2 \times P \times R}{P + R}$$
Our work uses two types of CSV files with different proportions of malicious and benign traffic. The first combines a 30% attack rate with normal traffic and the second combines a 60% attack rate with normal traffic, as shown in Table 4. These CSV files are divided into training and testing sets following the 80% and 20% rule, respectively. We configured the SAE detection algorithm with only three hidden layers, set in descending order, with the hyper-parameters given in Table 6.
In this paper, we compare the performance metrics of the SAE and CNN detection classifiers. We evaluate SAE and CNN separately on the same traffic, consisting of TCP, UDP, and ICMP packets, as depicted in Figures 6 and 7. The first case combines 30% attack traffic with normal traffic, and the second case combines 60% attack traffic with normal traffic for better evaluation.
Figure 6 presents the confusion matrices of the SAE and CNN detection models for the 30% attack rate. The SAE confusion matrix in Figure 6a shows a TP detection rate of 91.47% with an FP rate of less than 9%. The SAE model has an FN rate of more than 12% on the first-case traffic. As shown in Figure 6b, the CNN model does not reach the correct detection level of SAE. With 30% attack traffic, the CNN detection model achieves TP and TN rates of only 80% and 82%, respectively, and its FP rate is around 17%.
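The precision, recall and F-measure used in this comparison can be computed from confusion-matrix counts as in the following Python sketch. The function name, the dictionary layout and the example counts are hypothetical; the counts are illustrative and are not results from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Illustrative counts only (not taken from Figures 6 or 7).
metrics = detection_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting these metrics alongside accuracy matters here because the 30% and 60% datasets are unbalanced, and accuracy alone can hide a high false-negative rate.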
Comparing the performance metrics of SAE and CNN from the confusion matrices for the 30% attack rate, SAE achieves 82.4% accuracy, whereas the CNN detection model achieves only 76.52%. The SAE detection model is also lightweight, with only three weighted hidden layers, and requires less time for training and testing than the CNN model.
Figure 7 compares the SAE and CNN confusion matrices for the 60% attack rate. Figure 7a shows TP and TN rates of 97% and 91.22%, respectively, for SAE. The FP rate is also acceptable at only 6%. Figure 7b shows TP and TN rates of more than 92% and 95%, respectively, for the CNN model. At the higher attack intensity of 60%, CNN false alerts are also below 10%. SAE achieves an accuracy of 94%, while the CNN detection model achieves nearly 93%. In this case the malicious ratio of 60% means that attack traffic outweighs benign traffic, and both models perform well on the feature-distributed traffic obtained with entropy-based GE. Table 7 summarises the average confusion matrix results for SAE and CNN; in addition, it provides the accuracy of both classifiers without the proposed entropy-based feature distribution for validation purposes.
We validated our test-bed performance by comparing the detection accuracy of the selected classifiers with and without GE entropy-based feature distribution, as illustrated in Figure 8. Figure 8a shows the SAE and CNN accuracy results at 30% attack intensity. The accuracy of both classifiers with normal features is lower than with entropy-based features. At the 30% attack rate without entropy-based feature distribution, the CNN classifier achieves an average accuracy of around 62% and the SAE classifier an average accuracy of 68%. With GE entropy-based feature distribution at the same attack intensity, the SAE classifier performed better with an accuracy of 84%, and the CNN classifier reached an average accuracy of 77%.
With the 60% attack rate, both classifiers perform very well due to low false results. As depicted in Figure 12a, the total average FP and FN rates for SAE were 6% and 11%, respectively. We did not observe any drastic change in the CNN results, as the CNN classifier produced FP alerts of 9% and FN alerts of 7%. Overall, the higher attack intensity enabled both classifiers to achieve strong results, with SAE achieving the higher accuracy due to its lightweight processing. At the attack intensity of 60%, shown in Figure 8b, the SAE and CNN accuracies with entropy-based feature distribution were 94.3% and 93%, respectively. In contrast, when we ran both classifiers without the proposed entropy algorithm, the CNN and SAE classifiers achieved an average accuracy of 86%, nearly 10% lower than with GE entropy-based feature distribution.

Performance Evaluation with CPU Usage at Controller
Our model aims to improve Anti-DDoS accuracy by reducing the burden on the controller. We used combined entropy to distribute malicious features from the data-plane to the control-plane. To evaluate model performance, we compare CPU usage with entropy calculation for distributing malicious traffic from the test-bed and without it, as depicted in Figures 9 and 10, respectively. During DDoS attacks, the control-plane was under heavy traffic, with attack packets arriving at the SDN controller measured in MB. Moreover, the Shannon and Renyi entropy generalisation was deployed in the main controller, so GE was calculated for every incoming attack packet to distribute more specific features. Figure 9a shows CPU utilisation at the 30% attack rate, with the corresponding incoming packets in megabits (mb) and outgoing packets in kilobytes per second (kbps). In Figure 9a, the average CPU utilisation is between 75% and 81% of the total.
As the graph shows, when incoming packets reached an average of 4 Mbit/s, nearly 80% of total CPU was used. When incoming packets fell to an average of 2.6 Mbit/s after a 5 s interval, CPU utilisation fluctuated between 60% and 75%. At intervals of roughly 55 s, CPU utilisation was as low as 50%. With entropy calculation, when we doubled the attack rate to 60%, average CPU utilisation increased to between 85% and 93%, as illustrated in Figure 9b: around 93% of CPU was used with around 9 Mbit/s of incoming packets, and 85% when incoming packets fell to around 5 Mbit/s. The minimum CPU consumption shown in the graph was 60%. Outgoing packets were similar in both attack rate scenarios; in Figure 9a,b, they varied between 80 and 96 kbps.
We also measured CPU usage before entropy-based feature distribution with respect to the incoming and outgoing packets of the 30% and 60% attack rate traffic. Figure 10a shows CPU usage at the 30% attack rate without entropy calculation: average CPU usage was 52% with 3 Mbit/s of incoming packets, rose to 65% when incoming packets reached around 5.5 Mbit/s, and was as low as 45% in some instances. Similarly, Figure 10b depicts CPU usage at the 60% attack rate without GE calculation. Nearly 80% of CPU was busy at 5 Mbit/s, and usage frequently increased to 90% of total CPU with incoming packet flows of 9.8 Mbit/s.
Comparing Figure 9a with Figure 10a, feature distribution with entropy generalisation used 20% more CPU than processing without it at a low-intensity attack rate such as 30%. When we increased the attack intensity to 60%, the average additional CPU usage was around 7%, based on a comparison of Figures 9b and 10b. Overall, the distribution of features with entropy generalisation did not consume substantially more CPU resources at higher attack intensity in the test-bed.
In another experiment, we evaluated the FP and FN results at attack rates of 30% and 60% and compared the SAE and CNN classifiers. We ran both algorithms 10 times to acquire different results, as depicted in Figures 11 and 12. At the 30% attack rate, the average FP and FN alert rates for the SAE classifier were 10% and 12%, respectively, as illustrated in Figure 11a. Similarly, Figure 11b shows that at the 30% attack rate the CNN classifier had an average FP alert rate of 17% and an FN alert rate of nearly 10%, which was slightly higher than for the SAE classifier.

Conclusions and Future Directions
In an SDN-based environment, the control-plane is always under severe threat from heavy attacker flows. The control-plane centrally manages packet-in handling, data collection, classification algorithms and other traffic manipulation tools. During a DDoS attack, it fails to run all of these elements, which causes low accuracy and more false alerts. For an anti-DDoS model, it is very important to identify attacks as soon as possible; otherwise massive flows of packet-in events start to deplete the controller resources. To overcome this issue, we used a feature distribution technique based on entropy generalisation with a combined Shannon and Renyi formula. Although Shannon entropy is already used in existing work for classification, our work used Generalised Entropy (GE) for the purpose of Information Distance (ID) calculation. This combined entropy technique helps to reduce controller overhead, as GE removes redundant and unnecessary traffic features. This enables the SDN controller to identify attack packets effectively, so that networks, regardless of their size, can effectively mitigate DDoS attacks. Entropy calculation uses around 25% more CPU when 30% attack traffic is merged. Our GE technique for distributing traffic features uses only 5% more CPU when the attack intensity is 60% in the test-bed. This is because we fixed the time window at TW = 3, a longer window that is suitable for manipulating and distributing large volumes of attack traffic. Our work uses two well-known classifiers, SAE and CNN, to classify the distributed traffic provided in Table 3. At 60% attack intensity, SAE and CNN achieved average accuracies of 94% and 93%, respectively, with only 6% FP alerts in SAE traffic classification.
In this work, the deployment of the NIDS was fixed. As future work, we will explore the impact of NIDS placement within the network on detection rate and accuracy. We will also explore the practical implications of using the various types of network sensors used by the industry, both open-source and proprietary.