Effects of Machine Learning Approach in Flow Based Anomaly Detection on Software Defined Networking

Recent advancements in Software Defined Networking (SDN) make it possible to overcome the management challenges of traditional networks by logically centralizing the control plane and decoupling it from the forwarding plane. Through centralized controllers, SDN can prevent security breaches, but it also introduces new threats and vulnerabilities: the central controller can be a single point of failure. Hence, a flow-based anomaly detection system in the OpenFlow controller can secure SDN to a great extent. In this paper, we investigate two approaches to flow-based intrusion detection in the OpenFlow controller. The first is based on machine learning: on the NSL-KDD dataset, a Random Forest classifier with the Gain Ratio feature selection evaluator achieves an accuracy of 82%. The second is a Gated Recurrent Unit Long Short-Term Memory (GRU-LSTM) intrusion detection model based on a Deep Neural Network (DNN), for which we applied ANOVA F-Test and Recursive Feature Elimination feature selection to improve classifier performance and achieved an accuracy of 88%. Substantial experiments with comparative analysis show that deep learning would be a better choice for intrusion detection in the OpenFlow controller.


Introduction
Today's more advanced and more dynamic applications can no longer be handled effectively by traditional network architecture. Although it has been in use for decades, it is gradually becoming less effective for modern applications. To deal with this inadequacy, a dynamic and scalable networking architecture, namely Software Defined Networking (SDN) [1], has emerged. SDN shifts from the fully distributed conventional networking model to a more centralized one. Separation of the data and control planes is a defining characteristic of SDN. Forwarding components such as switches and routers are elements of the data plane, and the controller is the element of the control plane. Decoupling the network control and forwarding functions, together with direct programmability of the network, gives network managers enough control over the network. Separating the routing and forwarding activities of networking components (e.g., switches and routers) from the data plane makes administration and management of the network straightforward: the control plane now only has to handle information related to the logical network topology, traffic routing, and so forth, while the data plane only needs to manage network traffic using the configuration provided by the control plane. SDN has recently been gaining attention from academics and researchers alike. The concept of SDN is not new, however; it has been evolving since the mid-1990s. A concrete implementation of SDN became possible thanks to the OpenFlow protocol. From the beginning, OpenFlow has been the leading standardized interface between SDN controllers and switches or other data paths [2].

Background and Related Work
In recent years, flow-based anomaly detection systems have been extensively researched. In [12], a flow-based anomaly detection system was proposed that is based on a Multi-Layer Perceptron (MLP) neural network with one hidden layer and the Gravitational Search Algorithm (GSA). This model can differentiate between normal and malicious flows with reasonable accuracy. In [13], the authors introduce an inductive NIDS using One-Class Support Vector Machines for analysis. Unlike other systems, their model is trained with malicious network data and thus achieves a low false alarm rate. The authors of [14] implemented four leading traffic anomaly detection algorithms (threshold random walk with credit-based rate limiting, rate limiting, maximum entropy detector and NETAD) in an SDN environment using the NOX controller and OpenFlow-compliant switches. According to the results of their experiments, however, these algorithms are more successful at identifying malicious activities in small office or home office (SOHO) networks than in Internet Service Provider (ISP) networks. A lightweight DDoS attack detection system based on traffic flow features is demonstrated in [15]. Leveraging the programmatic interface of the NOX platform, traffic flow information is obtained with very low overhead. Using Self Organizing Maps, this method detects DDoS attacks at a high rate. Kokila RT et al. [16] also devised a method for detecting DDoS attacks using an SVM classifier. According to their experiments, this method produces a lower false positive rate than other techniques. An optimized protection mechanism (OpenFlowSIA) using an SVM classifier is proposed by Trung et al. [17]. Along with the SVM classifier, this mechanism uses the Idle-timeout Adjustment (IA) algorithm devised by the same authors and can detect DDoS attacks in SDN.
The entropy variation of the destination IP address is used in a lightweight solution [18] that can detect DDoS attacks within the first 250 malicious packets of the traffic. The stacked autoencoder (SAE) based DL model designed by Niyaz et al. [19] can detect multi-vector DDoS attacks in SDN. Tang et al. [20] present a Gated Recurrent Unit Recurrent Neural Network (GRU-RNN) enabled intrusion detection system and achieve a high degree of accuracy. More recently, two different feature selection-based intrusion detection models have been proposed by Dey S.K. in [21] and [22], where both deep learning and machine learning approaches were employed to achieve higher intrusion detection accuracy. Apart from that, Dey S.K. [23] also presented a performance analysis of an SDN-based intrusion detection model for various machine learning based classifiers with different feature selection approaches.

Materials and Methods
In this section, we briefly discuss our research methodology. We start with the machine learning based classification model, its feature selection methods and the evaluation procedure. The development of the deep learning-based model is also discussed in this section in order to give a clear comparison of the two approaches.

Proposed Machine Learning Based Model of Classification
Here, we present our proposed ML-based model, which establishes an effective intrusion detection scheme for classifying intrusion data. The scheme aims to provide a high detection rate and a low false alarm rate. A hybrid classification model based on two layers is shown in Figure 1. The first layer, which is based on common feature selection methods, eliminates unrelated and inconsequential features and provides the selected features to the second layer. The second layer then classifies the reduced dataset using a machine learning algorithm. The model is trained and tested using the 10-fold cross-validation technique. We evaluated the model using different accuracy measures.
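The two-layer structure described above can be sketched as a scikit-learn pipeline. This is only an illustration under stated assumptions: the experiments in this paper use WEKA, the data here is synthetic (generated with `make_classification`, with 41 features only to mirror the dimensionality of NSL-KDD), `mutual_info_classif` stands in for an information-based evaluator, and the values `k=10` and `n_estimators=50` are arbitrary choices, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 300 samples with 41 features.
X, y = make_classification(
    n_samples=300, n_features=41, n_informative=8, random_state=0
)

model = Pipeline([
    # Layer 1: keep the 10 highest-scoring features (k is an arbitrary choice).
    ("select", SelectKBest(score_func=mutual_info_classif, k=10)),
    # Layer 2: classify the reduced dataset.
    ("classify", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# 10-fold cross-validation: feature selection is refit inside every fold,
# so the held-out fold never influences which features are kept.
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
mean_accuracy = scores.mean()
```

Placing the selector inside the pipeline, rather than selecting features once on the full dataset, keeps the cross-validated estimate free of information leaked from the test folds.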

Machine Learning Approach
According to [24], the general idea behind most machine learning systems is that the system learns to perform a task by being trained on an example set of training data. This training enables the system of distributed computers and controllers to accomplish similar tasks when it is confronted with completely new data that it has not encountered before. Therefore, machine learning can be used in a flow-based anomaly detection system to automatically build a predictive model from the training dataset. Machine learning algorithms have been used to solve numerous classification and prediction problems [25]. A complete flow chart of the anomaly detection mechanism in the OpenFlow controller is shown in Figure 2.

Overview of NSL-KDD Dataset
Choosing an accurate dataset for establishing and evaluating a model is by no means straightforward. Our intent is to select a dataset that is extensive, free of noise, consistent and free of redundancy. Various datasets have been used for different anomaly detection systems; some are self-made while others are publicly accessible. Among the publicly accessible ones, KDD-99 is the most extensively used and accepted. According to the authors of [26], KDD-99 is practically free of downsides, but the whole dataset is so big that it dramatically increases the computational cost of intrusion detection. Further, the authors of [27] illustrate that only 10% of the dataset is usually used. The training dataset contains much extraneous data for which identical records exist in the testing dataset. Consequently, the learning process of the system is sometimes distracted. NSL-KDD is a refined version of KDD-99 that minimizes the redundancies between the training dataset and the testing dataset. Tavallaee et al. [26] recommended the NSL-KDD dataset, which consists of intrusion data. Many researchers are using the dataset for their experiments. NSL-KDD has 41 features, and its records contain both normal and attack patterns. There are 5 classes: 1 normal class and 4 attack classes. The features of the NSL-KDD dataset are summarized in Table 1.

Selection of Features for Machine Learning Approach
The accuracy of anomaly detection decreases in the presence of redundant attributes in the intrusion dataset. Therefore, developing suitable feature selection methods that can sort out unrelated attributes is a research challenge. Feature selection is the technique of choosing relevant features and removing irrelevant features from the dataset to accomplish a certain task [28]. Decreasing the number of redundant, irrelevant and noisy features can lead to a better model by speeding up a data mining algorithm and improving learning accuracy [29]. We refined the dataset features using Info Gain, Gain Ratio, CFS Subset Evaluation, Symmetric Uncertainty and the Chi-Square Test. Attribute selection procedures with different evaluators and their search strategies are shown in Table 2.
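To make one of these evaluators concrete, the following is a minimal sketch of how a gain ratio score can be computed for a single discrete feature, using the usual definition (information gain divided by the split information of the feature). The toy protocol and label values are made up for illustration, and the function names are ours; the sketch is not the WEKA implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(values).values()
    )


def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to the class labels."""
    total = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Information gain = H(class) - H(class | feature).
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information = entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (made up): protocol type versus class label.
protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
label = ["normal", "normal", "attack", "attack", "attack", "normal"]
ratio = gain_ratio(protocol, label)
```

Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.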

Evaluation Metrics
Intrusion detection performance is measured in terms of accuracy (AC), precision (P), recall (R), F-measure (F), False Alarm Rate (FAR) and Matthews correlation coefficient (MCC). To calculate these measures, counts derived from the confusion matrix are taken into account. The confusion matrix illustrates the performance of the algorithm according to Table 3. Treating attack as the positive class, true positive is the total number of samples predicted as attack while they were actually attack, and false negative is the total number of samples predicted as normal while they were actually attack. On the other hand, false positive is the total number of samples predicted as attack while they were actually normal, and true negative is the total number of samples predicted as normal while they were actually normal.
A recurrent neural network (RNN) uses internal state, which here refers to the fact that the RNN takes advantage of previous computations when producing its output. Because they carry out the same task for each element in the sequence, these networks are called recurrent.
The structure of a simple RNN is depicted in Figure 3, where x_t is the input and o_t is the output.
A gradient-based algorithm, namely backpropagation through time (BPTT), is generally applied for training the RNN. RNN training is considerably faster with BPTT than with other existing optimization techniques. However, an RNN model trained with backpropagation has a significant drawback, called the vanishing gradient problem: the gradient becomes so small that it seems to have vanished. Consequently, it prevents the weights from changing and, in some cases, stops further training. According to [30], the vanishing gradient problem prevents the RNN from being accurate. To solve this problem, more powerful models such as Long Short-Term Memory (LSTM) [31] and Gated Recurrent Units (GRUs) [32] were suggested.
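The effect can be illustrated with a deliberately simplified scalar model, which is our assumption and not the setup used in the experiments: each step of backpropagation through time multiplies the gradient by the recurrent weight times the slope of the activation function. The slope value 0.25 is the maximum derivative of the sigmoid function; the weight 0.9 is an arbitrary choice.

```python
def backpropagated_gradient(weight, activation_slope, steps):
    """Gradient magnitude after propagating back through `steps` time steps
    of a scalar RNN, where every step scales it by weight * activation_slope."""
    gradient = 1.0
    for _ in range(steps):
        gradient *= weight * activation_slope
    return gradient


# Same per-step factor (0.9 * 0.25), different sequence lengths.
short_range = backpropagated_gradient(0.9, 0.25, steps=5)
long_range = backpropagated_gradient(0.9, 0.25, steps=50)
```

Because the per-step factor is below 1, the gradient shrinks geometrically with the number of time steps, so early inputs contribute almost nothing to the weight update.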

Long short-term Memory RNN (LSTM)
A deep neural network is unfolded in time, and a feedforward neural network (FNN) is constructed for every time-step. Then, the weights and biases of each hidden layer are updated by the gradient rule. These updates minimize the loss between the expected and actual outputs. However, when there are more than 5-10 time-steps, standard RNNs do not perform well. Over prolonged back-propagation, error signals vanish or blow up, which makes the weights fluctuate and the network performance poor. To deal with this vanishing gradient problem, researchers suggested the Long Short-Term Memory (LSTM) network. LSTM bridges the minimal time gaps and makes use of a gating mechanism to handle long-term dependencies. The LSTM structure can be seen in Figure 4.
Because the training phase of GRU is smoother and faster than that of LSTM, we have selected GRU for our model development [33]. The "forget gate" and "input gate" of an LSTM are merged into an "update gate" in GRU, and the hidden state and cell state are combined, resulting in a simpler architecture as shown in Figure 5. The following relationship can be obtained from Figure 5.
In the multilayer structure, network input is passed through multiple GRU layers. In [34], it has been shown that multilayered RNNs learn from the different time lengths of input sequences. To achieve optimized performance, multi-layered RNNs share the hyperparameters, weights, and biases across the layers.
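For readers who want to see the gating concretely, the following NumPy sketch implements one step of a GRU cell in its commonly used formulation (update gate z, reset gate r, candidate state). The notation, the parameter names (`W_z`, `U_z`, `b_z`, and so on), the layer sizes and the random weights are assumptions made for illustration and may differ from the notation of Figure 5; conventions also differ on whether z or 1 - z weights the previous state.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, params):
    """One GRU time step: returns the new hidden state."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate: how much of the old state feeds into the candidate.
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])
    # Candidate hidden state.
    h_cand = np.tanh(
        params["W_h"] @ x + params["U_h"] @ (r * h_prev) + params["b_h"]
    )
    # Interpolate between the previous state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical sizes and randomly initialized weights.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"U_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{gate}"] = np.zeros(n_hidden)

# Run the cell over a short random input sequence of 5 time steps.
h = np.zeros(n_hidden)
sequence = rng.normal(size=(5, n_in))
for x_t in sequence:
    h = gru_step(x_t, h, params)
```

There is a single state vector `h` and no separate cell state, which is the simplification relative to LSTM that the text describes.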

Overview of Scikit-Learn
During the experiments, we used scikit-learn, a Python-based machine learning library for data mining and data analysis [35]. The data for most machine learning algorithms needs to be stored in a 2D array or matrix. In scikit-learn, such 2D data can be stored effectively.
Figure 6 shows the scikit-learn data representation, where there are N samples and D features.
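As a small illustration of this layout, the array below holds one sample per row and one feature per column, with a separate label vector of length N. The numeric values are made up and do not come from NSL-KDD.

```python
import numpy as np

# Feature matrix: N = 4 samples (rows), D = 3 features (columns).
X = np.array([
    [0.0, 491.0, 0.0],
    [2.0, 146.0, 0.0],
    [0.0, 0.0, 0.0],
    [5.0, 232.0, 8153.0],
])
# Label vector: one class label per sample (0 = normal, 1 = anomaly).
y = np.array([0, 0, 1, 0])

n_samples, n_features = X.shape
```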

Designed Algorithm and Proposed SDN-based Anomaly Detection Architecture
Generally, OpenFlow switches are monitored by the SDN controller, which can request all network data whenever necessary. Therefore, we implemented our proposed intrusion detection segment in the SDN controller for both the machine learning and deep learning approaches, as illustrated in Figure 7. Our suggested approach for the ML-based classification model is summarized in Algorithm 1. To request network data, the controller sends an OpenFlow stats request message to all OpenFlow switches. Because the controller requests all available statistics, each OpenFlow switch sends back an OpenFlow stats reply message with all available data. Figure 8
Preprints (www.preprints.org) | NOT PEER-REVIEWED
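The step from stats replies to classifier input can be sketched as follows. Everything in this sketch is hypothetical: the `FlowStat` class, the switch names, the chosen derived features and the counter values are illustrative assumptions, not the controller API or the feature set used in this work. Only the three counters themselves (packet count, byte count, duration) are fields that OpenFlow flow statistics report.

```python
from dataclasses import dataclass


@dataclass
class FlowStat:
    """Hypothetical per-flow counters taken from one stats reply entry."""
    packet_count: int
    byte_count: int
    duration_sec: float


def flow_features(stat):
    """Derive simple per-flow features for a downstream classifier."""
    packets_per_sec = (
        stat.packet_count / stat.duration_sec if stat.duration_sec > 0 else 0.0
    )
    bytes_per_packet = (
        stat.byte_count / stat.packet_count if stat.packet_count > 0 else 0.0
    )
    return [stat.duration_sec, packets_per_sec, bytes_per_packet]


def collect_features(replies):
    """Flatten the replies of all switches into one list of feature rows."""
    return [
        flow_features(stat)
        for switch_stats in replies.values()
        for stat in switch_stats
    ]


# Made-up replies, keyed by a hypothetical switch identifier.
replies = {
    "switch-1": [FlowStat(100, 64000, 10.0), FlowStat(0, 0, 0.0)],
    "switch-2": [FlowStat(5000, 300000, 2.0)],
}
features = collect_features(replies)
```

The resulting rows follow the samples-by-features layout expected by the classifiers, so they could be passed to a trained model for prediction.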

Experimental results of machine learning approach
We used WEKA [36], and the 10-fold cross-validation technique was used to perform the experiment. Dividing the training set into 10 subsets, we tested each subset with the model trained on the other 9 subsets. Each subset is used only once as test data; therefore, the process repeats 10 times. For simplicity, Table 5 lists only the results of the classifier that obtained the highest accuracy, under the various feature selection methods.
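The reported measures follow from the four confusion-matrix counts using their standard formulas. The sketch below assumes that attack is treated as the positive class, so the false alarm rate is the fraction of normal samples flagged as attack; the counts passed in at the end are made up and are not results from this paper.

```python
import math


def detection_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (attack = positive)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # False alarm rate: normal samples wrongly flagged as attack.
    false_alarm_rate = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_alarm_rate": false_alarm_rate,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
metrics = detection_metrics(tp=80, tn=90, fp=10, fn=20)
```

The sketch does not guard against empty rows or columns of the confusion matrix, which would cause a division by zero.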

of LSTM units in a LSTM Hidden Layer
The model is developed in the Python programming language along with several libraries: NumPy, scikit-learn for machine learning, pandas for data handling, and TensorFlow for model development. We began our experiments with a light-weight GRU with one hidden layer and one hidden unit. For each hyperparameter set (learning rate, time-steps, hidden layers), 10 sets of experiments were carried out, and we tuned the hyperparameters to obtain optimized results.
Table 8 presents the results of various evaluation metrics (accuracy, precision, recall, false alarm rate and F-1 score) for each number of time steps. At this point, the deep learning approach shows more promising outcomes than the machine learning approach. In our implementation of the SDN-based intrusion detection system, the classification model is mainly 2-class, namely normal and anomaly. After evaluating both sets of results, we proposed a security architecture model that detects flow-based anomalies in an OpenFlow-based controller. From the detailed results above and from Table 9, we can conclude that the ANOVA F-Test and Recursive Feature Elimination (RFE) methods with the GRU-LSTM classifier provide the highest accuracy of 87.911% with a very low false alarm rate of 0.0076%.
Furthermore, we generated the results using the selected features from the complete NSL-KDD dataset.
Other approaches have been presented by different authors to show the accuracy of deep learning algorithms on the NSL-KDD dataset. However, preprocessing of the dataset and appropriate feature selection for testing and training are absent in some of these approaches.
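As an illustration of the two feature selection methods named above, the following scikit-learn sketch applies an ANOVA F-test filter followed by Recursive Feature Elimination. Several points are assumptions: the text does not state how the two methods are combined, so chaining them is our choice; the data is synthetic; logistic regression is used only as a convenient estimator for RFE; and the numbers of retained features (20, then 10) are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 300 samples with 41 features.
X, y = make_classification(
    n_samples=300, n_features=41, n_informative=8, random_state=0
)

# Step 1: ANOVA F-test keeps the 20 features with the highest F-scores.
anova = SelectKBest(score_func=f_classif, k=20)
X_anova = anova.fit_transform(X, y)

# Step 2: RFE repeatedly fits the estimator and drops the weakest feature
# until 10 features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = rfe.fit_transform(X_anova, y)
```

The reduced matrix `X_selected` would then be reshaped into sequences of time steps before being fed to a recurrent model.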

Conclusions
In this paper, we investigated deep learning and machine learning approaches to flow-based anomaly detection, namely GRU-LSTM and Random Forest, with ANOVA F-Test and RFE feature selection and the Gain Ratio feature selection mechanism, respectively. Both approaches produce significant experimental results compared with other work and therefore make an effective contribution to appropriate feature selection from a dataset. In an SDN environment, both approaches have enormous potential to detect malicious activity. SDN provides a centralized controller and a very flexible structure.
Owing to this centralized controller and flexible nature, our proposed intrusion detection module can easily extract information about network traffic. From the experimental results, it is evident that GRU-LSTM and Gain Ratio-RF show high test accuracy compared with all other algorithms. However, the deep learning approach produces slightly better results than the machine learning approach; therefore, for flow-based anomaly detection, the GRU-LSTM model is the better choice for achieving high accuracy and speeding up intrusion detection in SDN. In the near future, we plan to implement our proposed model in a real SDN environment with real network traffic.