A Genetic-Based Extreme Gradient Boosting Model for Detecting Intrusions in Wireless Sensor Networks

An intrusion detection system is an essential security tool for protecting the services and infrastructure of wireless sensor networks from unseen and unpredictable attacks. A few machine learning approaches have been proposed for intrusion detection in wireless sensor networks and have achieved reasonable results. However, these approaches still need to become more accurate and more robust against the imbalanced-data problem in network traffic. In this paper, we propose a new model for detecting intrusion attacks, called the GXGBoost model, based on a genetic algorithm and an extreme gradient boosting (XGBoost) classifier. GXGBoost is a gradient boosting model designed to improve on traditional models in detecting minority classes of attacks in the highly imbalanced data traffic of wireless sensor networks. A set of experiments was conducted on the wireless sensor network detection system (WSN-DS) dataset using holdout and 10-fold cross-validation techniques. The results of the 10-fold cross-validation tests revealed that the proposed approach outperformed state-of-the-art approaches and other ensemble learning classifiers, with high detection rates of 98.2%, 92.9%, 98.9%, and 99.5% for flooding, scheduling, grayhole, and blackhole attacks, respectively, in addition to 99.9% for normal traffic.


Introduction
A wireless sensor network (WSN) is a kind of network that can be part of the Internet of Things (IoT) and is composed of a number of sensor nodes. These nodes are distributed across a wide range of regions to collect required information and convey it to a central node called a base station (BS) or sink node, which is a more powerful and capable node [1,2]. WSNs are used in many real-time applications, such as security and healthcare monitoring, climate change and environmental monitoring, and military surveillance systems. Several studies have suggested various ways to counter the security threats related to WSNs, including secure routing, key exchange, authentication, and other security techniques addressing specific kinds of intrusions. Intrusion detection systems (IDS) are among the most flexible and useful tools for preventing different attacks and threats against WSNs.
Even though an anomaly-based IDS has the capability to recognize both known and unknown attacks, it has some limitations in terms of false negative and false positive alarms. WSNs are not exempt from these intrusion attacks and security threats, which decrease their performance and efficiency. Denial of service (DoS) attacks are the most popular intrusions in WSNs and can be issued in different ways, each exploiting a specific means of access to the system. For example, several different attacks targeting the protocols of WSNs and their layers may lead to DoS [33]. To detect such attacks, network traffic has to be thoroughly analyzed in order to define the proper detection technique [34]. One approach uses an SVM algorithm to recognize anomalies in the system and creates a signature that can serve to detect the same threatening action in the future [35]. Another cluster-based scheme combines detection and avoidance procedures with high energy efficiency and low communication overhead [36]. Exploiting the localization property, an IDS can be employed at various levels, from cluster heads to sensor nodes. Moon et al. [37] proposed a routing protocol with intrusion detection and prevention at sensor network nodes.
To enhance system capabilities, an integrated system for intrusion detection in cluster-based wireless sensor networks was proposed by Wang et al. [38]. Barbancho et al. [39] investigated the use of artificial intelligence methods in routing schemes of wireless networks to detect intrusion attacks. El Mourabit et al. [40] proposed a method for intrusion detection in wireless sensor networks based on mobile agents. They used three main mobile agents (a collector agent, a misuse detection agent, and an anomaly detection agent) based on an SVM classifier for detection. Shamshirband et al. [41] proposed a competitive clustering algorithm for intrusion detection in WSNs using a density-based fuzzy method. Moreover, Shamshirband et al. [42] proposed an artificial immune system to detect intrusions in WSNs based on cooperative fuzzy theory. In other work, Shamshirband et al. [43] proposed a method to detect sinkhole intrusions, in which a number of dubious nodes are identified by a verification process of data consistency and the attacker is recognized from information taken from the data flow.
Kumarage et al. [44] proposed a distributed method for anomaly detection in industrial WSNs using fuzzy data modelling. This distributed method is able to detect DoS events, with the sink and base-station nodes acting as decision makers. Sumitha and Kalpana [45] used MATLAB to simulate DoS attacks in WSNs under the low energy aware cluster hierarchy (LEACH) protocol. In that study, the authors proposed a hybrid method combining ant colony optimization with a hidden Markov model (ACO + HMM), which provides better performance than other methods.
Almomani et al. [46] published a new dataset of different DoS attacks in WSNs, namely, WSN-DS. This dataset consists of four types of DoS attacks (flooding, grayhole, blackhole, and scheduling attacks), as well as a normal traffic class. It was created based on the LEACH protocol, a hierarchical routing protocol for WSNs, using the NS-2 network simulator. The Waikato Environment for Knowledge Analysis (WEKA) data-mining tool was used to implement neural networks (NNs) for detecting the attacks. The results were reported using 10-fold cross-validation and holdout splitting techniques. This study achieved satisfactory results; however, it suffers from the imbalanced-data problem, with a detection rate for the grayhole attack of only 75.6%.
Abdullah et al. [47] proposed an approach for detecting intrusions at WSN nodes using a set of machine learning classifiers: SVM, naive Bayes (NB), DT, and RF. Four types of DoS attacks (flooding, grayhole, blackhole, and scheduling attacks) were studied in this work, and the WEKA data-mining tool was used for the implementation. The results were evaluated based on a number of different metrics, such as recall (R), precision (P), true positive rate (TPR), and false positive rate (FPR). This study demonstrated that the SVM achieves a high detection rate of 96.7% compared to the other classifiers.
Le et al. [48] proposed using the random forest (RF) classifier for detecting the type of DoS attacks in WSNs. The proposed classifier attains its best F1-score results of 96%, 99%, 98%, 96%, and 100% for flooding, blackhole, grayhole, scheduling (TDMA), and normal traffic, respectively. However, the results of this study were obtained on a small number of instances in the testing phase, approximately 25% (94,042 instances) of the data. Recently, Tan et al. [49] proposed a method for intrusion detection using the random forest classifier and the synthetic minority oversampling technique (SMOTE). They used SMOTE to oversample the minority samples. The experimental results showed that the accuracy of the random forest classifier alone was 92.39%, and applying SMOTE increased the accuracy to 92.57%.

Genetic Algorithm (GA)
A genetic algorithm (GA) is a heuristic adaptive search algorithm inspired by the evolutionary ideas of genetics. It represents an intelligent exploitation that uses random search for solving both unconstrained and constrained optimization problems [50]. The GA repetitively alters the individual solutions of a population: at each step, it randomly selects individuals from the current population to be parents and then uses them to generate the children of the next generation. Over these consecutive generations, the solution evolves toward optimality. Genetic algorithms are used to solve a variety of problems, including mixed integer programming problems and problems whose objective function is stochastic, non-differentiable, discontinuous, or highly nonlinear. Generally, the GA applies three different rules to the current population at each step to produce the next generation. These rules are:

• Selection rules, which select the individuals to be parents contributing to the next generation;
• Crossover rules, which combine two parents to generate the children of the next generation;
• Mutation rules, which randomly change individual children.
The GA differs from a classical derivative-based optimization algorithm (DOA) in two key ways: it creates a population of solutions at each iteration, in which the best solution approaches optimality, and it uses random computation for selecting the next population. In contrast, a classical DOA creates a single solution at each iteration, with the sequence of solutions approaching the optimum, and uses deterministic computation for selecting the next solution in the sequence. Algorithm 1 illustrates the pseudocode of the GA as a sequence of steps.
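The selection, crossover, and mutation rules above can be sketched in code. The following is a minimal illustrative GA, not the paper's implementation: the one-max fitness function (counting 1-bits in a binary chromosome), the tournament selection, and all parameter values are our own assumptions for demonstration.

```python
import random

def fitness(chromosome):
    # Toy objective (our assumption): count of 1-bits; maximum is len(chromosome).
    return sum(chromosome)

def select(population):
    # Tournament selection: pick the fitter of two random individuals.
    a, b = random.sample(population, 2)
    return max(a, b, key=fitness)

def crossover(p1, p2):
    # Single-point crossover combining two parents into one child.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def mutate(chromosome, rate=0.05):
    # Flip each gene with a small probability.
    return [1 - g if random.random() < rate else g for g in chromosome]

def genetic_algorithm(n_genes=20, pop_size=30, generations=50):
    population = [[random.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # stopping criterion: generation budget
        population = [mutate(crossover(select(population), select(population)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)  # best solution found

best = genetic_algorithm()
print(fitness(best))
```

With the tournament pressure and low mutation rate, the population converges toward the all-ones chromosome within a few dozen generations on this toy objective.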

Gradient Boosting (GB) Model
Gradient boosting (GB) is an ensemble learning technique for classification and regression problems, proposed by Friedman [51,52]. It can produce an effective model consisting of weak learners, usually decision trees. The basic idea of GB is to build and generalize the ensemble model in a stage-wise fashion by optimizing an arbitrary objective loss function [53]. The GB technique constructs its model iteratively from the negative gradient of the previous loss. In machine learning, minimizing the loss function is an important issue that needs to be optimized; in other words, the loss function represents the difference between the predicted output and the target, and a low loss value means a strong prediction or classification result. As the loss function decreases sequentially and iteratively, the model moves consistently along a specific direction, namely the gradient of the loss function.

Algorithm 1. The GA pseudocode.
1. Input: population, fitness function, stopping criterion
2. Initialize the population randomly
3. Evaluate the fitness of each individual
4. while stopping_criterion is not reached do
5.    Select parents from the population
6.    Apply crossover to the parents to generate children
7.    Apply mutation to the children
8.    Evaluate the fitness of the new population
9. end while
Output: Best-Solution
Assume that the objective of a supervised classification problem is to find an approximation function Ô(x) that fits the true function O(x). The approximation function based on a loss function L(y, O(x)) is defined as:

Ô(x) = arg min_O E_{x,y}[L(y, O(x))], with O(x) = Σ_{i=1}^{m} w_i C_i(x)    (1)

where O represents a linear combination of the weak learners C_i(x) with weights w_i, and Ô tries to minimize the expected loss over the input vectors. Thus, the GB procedure starts from a constant function O_0(x):

O_0(x) = arg min_w Σ_{i=1}^{n} L(y_i, w)    (2)

The pseudocode of GB is shown in Algorithm 2.
Algorithm 2. The GB pseudocode.
1. Initialize O_0(x) with the constant value from Equation (2)
2. for m = 1 to M do
3.    Compute the pseudo-residuals as the negative gradient of the loss function
4.    Train weak learner C_m(x) on the training data (fitting the pseudo-residuals)
5.    Compute the weight w_m by a line search minimizing the loss
6.    Update the model: O_m(x) = O_{m-1}(x) + w_m C_m(x)
7. end for
Output: O_M(x)

When a decision tree is chosen as the estimator, gradient boosting is an appropriate algorithm, yielding a strong classifier that can be utilized for solving many problems in different fields. As noted previously, there are several boosting algorithms, and gradient boosting is considered the most effective of them. Although GB mainly depends on a convex loss function, it can use different types of loss functions, and it can solve both regression and classification problems. For classification problems, a log loss function is typically used as the objective function. From a fundamental point of view, GB uses the negative gradient to improve the results.
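The stage-wise procedure of Algorithm 2 can be illustrated from scratch for regression with squared loss, where the negative gradient is simply the residual y − O(x). This is a minimal sketch under our own assumptions (a one-split decision stump as the weak learner, a fixed shrinkage weight instead of a line search, and made-up toy data), not the paper's implementation.

```python
def fit_stump(xs, residuals):
    # Weak learner: one threshold split minimizing squared error
    # of the two leaf means (exhaustive search over candidate thresholds).
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def predict(x, base, learners, lr=0.5):
    # O_m(x) = O_0 + lr * sum of weak learners (fixed weight in this sketch).
    return base + lr * sum(c(x) for c in learners)

def gradient_boost(xs, ys, n_rounds=20, lr=0.5):
    base = sum(ys) / len(ys)  # O_0(x): the constant minimizing squared loss
    learners = []
    for _ in range(n_rounds):
        pred = [predict(x, base, learners, lr) for x in xs]
        residuals = [y - p for y, p in zip(ys, pred)]  # negative gradient
        learners.append(fit_stump(xs, residuals))      # train weak learner
    return base, learners

xs = [1, 2, 3, 4, 5, 6]                       # toy inputs (illustrative)
ys = [1.0, 1.2, 0.9, 5.1, 4.9, 5.0]           # toy targets (illustrative)
base, learners = gradient_boost(xs, ys)
print(round(predict(1, base, learners), 2), round(predict(5, base, learners), 2))
```

Each round fits a stump to the current residuals and shrinks them, so after 20 rounds the ensemble tracks the two clusters in the toy data closely.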

Extreme Gradient Boosting (XGBoost) Model
In the last decade, data science has gained increasing interest across many fields and applications. Currently, buzzwords such as big data and artificial intelligence have pervaded our lives, and boosting algorithms have also evolved with time. A well-known boosting model that has achieved high scores in solving classification and prediction problems in many contests on the Kaggle platform is the extreme gradient boosting (XGBoost) model.
In fact, XGBoost is a type of GB that provides an innovative tree-searching technique [54]. The improved technique has shown good performance in distributed computing and in avoiding overfitting, as well as in handling data sparsity. More precisely, computational complexity is reduced significantly through automatic learning in the splitting process. To tackle the overfitting problem, XGBoost appends regularization terms to the objective function in the learning phase.
Unlike conventional GB, which uses only the first derivative, XGBoost applies a second-order Taylor expansion to the loss function, as given in Equation (3):

L^(m) = Σ_{i=1}^{n} l(y_i, F^(m-1)(x_i) + G_m(x_i)) + Ω(G_m)    (3)

where l is the training loss function and L defines the real loss function of the XGBoost algorithm; the remaining notation is the same as for the boosting methods above. G_m denotes the weak decision tree estimator added at step m, while F denotes the prediction. Additionally, the decision tree complexity Ω(G_m) is aggregated with the first term to form the objective function. The regularization term Ω(G_m) is calculated as:

Ω(G_m) = γT + (1/2) λ Σ_{j=1}^{T} w_j²    (4)

where T denotes the number of leaves of the decision tree and Σ_j w_j² is the squared L2 norm of the leaf scores; γ is a control threshold for splitting nodes, and λ is a coefficient that reduces the overfitting problem [55]. The final objective can then be approximated as:

L^(m) ≈ Σ_{i=1}^{n} [g_i G_m(x_i) + (1/2) h_i G_m(x_i)²] + Ω(G_m)    (5)

In Equation (5), the two variables g_i and h_i are the first and second derivatives of the loss function, i.e., g_i = ∂l(y_i, F^(m-1)(x_i))/∂F^(m-1) and h_i = ∂²l(y_i, F^(m-1)(x_i))/∂(F^(m-1))².
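The first- and second-order terms g_i and h_i that the second-order expansion relies on have closed forms for common losses. As a small illustrative sketch (not from the paper), for the binary logistic loss the gradient and Hessian with respect to the raw score F reduce to p − y and p(1 − p), where p = sigmoid(F):

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def grad_hess(y, f):
    # For binary logistic loss l(y, F) = -[y*log(p) + (1-y)*log(1-p)],
    # with p = sigmoid(F):
    p = sigmoid(f)      # current predicted probability
    g = p - y           # first derivative of the loss w.r.t. F
    h = p * (1.0 - p)   # second derivative (always positive)
    return g, h

# A positive sample (y=1) with raw score F=0 gives p=0.5,
# so g = -0.5 (push the score up) and h = 0.25.
g, h = grad_hess(y=1, f=0.0)
print(g, h)  # → -0.5 0.25
```

Because h is always positive for this loss, the per-leaf weight update derived from Equation (5) is a well-defined Newton step.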

Proposed Genetic-Based Extreme Gradient Boosting (GXGBoost) Model
The basic idea behind the proposed GXGBoost model is to build an optimization task using a genetic algorithm on top of the XGBoost classifier to increase the classification accuracy of the minority classes without significantly affecting the overall accuracy of the other classes. The genetic algorithm generates random parameter values for the XGBoost classifier to form a new decision boundary with the highest genetic fitness value.
More specifically, the GXGBoost model is composed of four main steps: generating the population of parameter values, selecting the population of parameter values, training the decision function of XGBoost, and evaluating the fitness function of XGBoost. Figure 1 shows the GXGBoost flow chart, and Algorithm 3 outlines the pseudocode of the main steps of GXGBoost.
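The four steps above can be sketched as a GA searching over XGBoost hyperparameter values, with the classifier's validation score as the fitness. This is an illustrative sketch only: the parameter names and ranges and the `evaluate` stub are our own placeholders; in the actual model, `evaluate` would train an XGBoost classifier and return a validation score such as the minority-class detection rate.

```python
import random

PARAM_RANGES = {                      # assumed search space (illustrative)
    "max_depth": (3, 10),
    "learning_rate": (0.01, 0.3),
    "n_estimators": (50, 400),
}

def random_individual():
    # Step 1: generate a candidate set of parameter values.
    return {
        "max_depth": random.randint(*PARAM_RANGES["max_depth"]),
        "learning_rate": random.uniform(*PARAM_RANGES["learning_rate"]),
        "n_estimators": random.randint(*PARAM_RANGES["n_estimators"]),
    }

def evaluate(params):
    # Steps 3-4 stand-in: a placeholder fitness peaking at an arbitrary
    # "good" configuration; a real fitness would train and score XGBoost.
    return -(abs(params["max_depth"] - 6)
             + abs(params["learning_rate"] - 0.1) * 10
             + abs(params["n_estimators"] - 200) / 100)

def crossover(p1, p2):
    # Uniform crossover: each parameter comes from either parent.
    return {k: random.choice((p1[k], p2[k])) for k in p1}

def mutate(params, rate=0.2):
    # Occasionally re-sample one parameter at random.
    child = dict(params)
    if random.random() < rate:
        k = random.choice(list(PARAM_RANGES))
        child[k] = random_individual()[k]
    return child

def ga_search(pop_size=20, generations=30):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        parents = pop[: pop_size // 2]          # Step 2: truncation selection
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=evaluate)

best = ga_search()
print(best["max_depth"])
```

Keeping the top half of each generation (elitism) guarantees the best fitness never degrades while crossover and mutation explore new parameter combinations.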

Time Complexity Analysis of Proposed Model's Algorithm
Based on computational complexity theory, time complexity analysis is used to estimate the computational time of the proposed algorithm. The worst-case running time can be expressed as a function of the input size using big O notation [56]. Big O notation defines the asymptotic behavior, or growth rate, of the function's upper bound as follows:

O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }

This means that f ∈ O(g(n)) if and only if there exist two positive constants c and n0 such that the inequality 0 ≤ f(n) ≤ c·g(n) is satisfied for all n ≥ n0. In this case, we say that f is big O of g(n), or that g(n) is an asymptotic upper bound for f [57].

WSN-DS Dataset
In our experiments, the simulated WSN-DS dataset collected by Almomani et al. [46] is used as a case study to evaluate the proposed model. This dataset was generated so that machine-learning methods could be applied to detect and classify denial of service (DoS) attacks; with such methods, the sensor nodes can distinguish attack patterns from normal traffic and make the right decision in a timely manner. The dataset contains 23 features extracted under the low energy aware cluster hierarchy (LEACH) routing protocol, a hierarchical routing protocol that identifies the state of each sensor node in the wireless network, as shown in Table 1. However, only 19 attributes, including the class label, were included in the dataset file: Id, Time, Is_CH, who_CH, Dist_To_CH, ADV_S, ADV_R, JOIN_S, JOIN_R, ADV_SCH_S, ADV_SCH_R, Rank, DATA_S, DATA_R, Data_Sent_BS, Dist_CH_BS, Send_code, Consumed_Energy, and Attack_Type. The distribution of attacks in the WSN-DS dataset is given in Figure 2, and a number of data samples from this dataset are listed in Table 2. Table 1. Extracted features of the wireless sensor network detection system (WSN-DS) dataset.

No. Feature Name (Symbol): Description
1. Node ID (Id): A unique symbolized number of the sensor node. For example, sensor node number 13 in the fourth round and second stage has ID 002004013.
2. Time (Time): The current time of the sensor node state in the simulation.
3. Is CH? (Is_CH): A flag with value 1 or 0 indicating whether or not the node is a cluster head (CH).
4. Who CH (who_CH): The ID of the cluster head (CH) in the current round.
5. Received Signal Strength Indication (RSSI): The RSSI between a sensor node and its cluster head in the current round.
6. Distance to cluster head (Dist_To_CH): The computed distance between a sensor node and its cluster head in the current round.
7. Max distance to cluster head (M_D_CH): The maximum computed distance between the sensor nodes and their cluster head within the same cluster.
…
21. Distance cluster head to base station (Dist_CH_BS): The distance between the cluster head and the base station.
22. Send Code (Send_code): The sending code of the cluster.
23. Attack Type (Attack_Type): The class label of the wireless sensor network traffic, which can be normal or attack. There are four categorical attack types, namely, flooding, scheduling (TDMA), grayhole, and blackhole.

To prepare the training and testing sets, the holdout method is used to separate the dataset into 60% training and 40% testing. The number of instances in these two sets is presented in Table 3.
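The 60/40 holdout split can be sketched as follows; this is a minimal illustration in plain Python (the helper name, the fixed shuffle seed, and the toy data are our own assumptions, and no stratification is applied here):

```python
import random

def holdout_split(samples, train_ratio=0.6, seed=42):
    # Shuffle indices reproducibly, then cut at the 60% boundary.
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

data = list(range(100))          # stand-in for the WSN-DS records
train, test = holdout_split(data)
print(len(train), len(test))     # → 60 40
```

Every record lands in exactly one of the two sets, matching the 60%/40% instance counts reported in Table 3.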

Evaluation Metrics
A set of evaluation metrics, including accuracy (ACC), precision (PR), recall (RE), and F1-score, is used to evaluate and compare the results of the proposed intrusion detection model. These metrics were chosen because they produce comparable results and are frequently used in the machine learning field for evaluating and comparing models. They are computed as:

ACC = (TP + TN) / (TP + TN + FP + FN)
PR = TP / (TP + FP)
RE = TP / (TP + FN)
F1-score = 2 × (PR × RE) / (PR + RE)

where TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives computed from the confusion matrix.
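These four metrics can be computed directly from the raw confusion-matrix counts; the counts in this short sketch are made-up illustrative numbers, not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    pr = tp / (tp + fp)                     # precision
    re = tp / (tp + fn)                     # recall
    f1 = 2 * pr * re / (pr + re)            # harmonic mean of PR and RE
    return acc, pr, re, f1

acc, pr, re, f1 = metrics(tp=90, tn=80, fp=10, fn=20)
print(round(acc, 2), round(pr, 2), round(re, 3), round(f1, 2))  # → 0.85 0.9 0.818 0.86
```

Note that the F1-score, as the harmonic mean, sits between precision and recall and is pulled toward the lower of the two, which makes it informative for imbalanced classes.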

Experimental Results and Comparisons
This subsection describes the experimental results and compares them with those of other models and related works. The results of our experiments are obtained using both the 10-fold cross-validation and holdout methods on the simulated WSN-DS dataset [46]. In 10-fold cross-validation, the dataset is divided into 10 parts, and each part is used once for testing over 10 runs. Tables 4-8 show the results of the 10-fold cross-validation method. Figure 3 demonstrates the confusion matrix of intrusion detection for the proposed GXGBoost model using the holdout method on the WSN-DS dataset. Table 9 lists the true positive, true negative, false positive, and false negative rates of the GXGBoost model using the holdout method, while the precision, recall, and F1-score results and their weighted averages using the holdout method are shown in Table 10.
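The 10-fold partitioning described above can be sketched as index bookkeeping: each fold serves as the test set exactly once. The round-robin fold assignment below is our own illustrative choice, not necessarily the split used in the experiments.

```python
def k_fold_indices(n, k=10):
    # Assign indices to k folds round-robin, then yield each
    # (train, test) pair with one fold held out at a time.
    folds = [list(range(i, n, k)) for i in range(k)]
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        yield train_idx, test_idx

splits = list(k_fold_indices(100))
print(len(splits), len(splits[0][1]), len(splits[0][0]))  # → 10 10 90
```

Averaging a metric over the 10 test folds gives the cross-validated estimate reported in Tables 4-8.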

Comparison with Other Boosting Algorithms
To compare the GXGBoost model with the original XGBoost and other boosting classifiers, such as the AdaBoost and gradient boosting (GB) classifiers, we used the true positive rate and the receiver operating characteristic (ROC) curve as evaluation metrics. The area under the ROC curve (AUC) summarizes this comparison: a value close to 1 confirms that the model produces better results. Table 11 and Figure 4 present the experimental results of these evaluation metrics for the proposed GXGBoost model compared to the other boosting models. To evaluate the efficiency of the boosting algorithms for WSN intrusion detection, the experiments were conducted on a laptop with an Intel(R) Core(TM) i7-4510U 2.0 GHz CPU and 8 GB of RAM running Windows 10. The average classification time of the GXGBoost and other boosting models on the testing dataset is shown in Table 12. We can see that the classification times of GXGBoost and XGBoost are close to each other; however, the average classification time of GXGBoost is lower than that of XGBoost because it selects appropriate values for its parameters in the training phase. AdaBoost has a higher classification time because it tries to classify all cases into the majority classes without losing overall accuracy. In general, as seen in Table 12, the proposed model is efficient for real-time WSN intrusion detection.
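The AUC used in this comparison has a direct probabilistic reading: it is the probability that a randomly chosen positive (attack) sample receives a higher score than a randomly chosen negative (normal) sample. The pairwise-comparison sketch below illustrates this rank-based computation; the scores and labels are made up for demonstration.

```python
def auc(scores, labels):
    # Rank-based AUC: fraction of (positive, negative) pairs where the
    # positive sample is scored higher; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # illustrative classifier scores
labels = [1,   1,   0,   1,   0,   0]     # illustrative ground truth
print(round(auc(scores, labels), 3))      # → 0.889
```

A perfect ranking gives AUC = 1, and a random ranking gives 0.5, which is why values close to 1 in Figure 4 indicate a better-discriminating model.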

Comparison with Related Work
To compare our work with recent related work on the same dataset, the true positive rate (TPR) is used as a uniform metric. Figure 5 shows the TPR values of the proposed GXGBoost model compared to the results of the related work in Reference [46] using the 10-fold cross-validation method. From Figure 5, we can see how effectively GXGBoost classifies the minority classes without significantly affecting the detection rates of the other classes.

Conclusions and Future work
A new model for WSN intrusion detection, called the GXGBoost model, is proposed based on a genetic algorithm (GA) and an extreme gradient boosting (XGBoost) classifier. The GXGBoost model was designed to improve on traditional models in detecting minority classes of attacks in the highly imbalanced data traffic of wireless sensor networks. A set of experiments was conducted on the WSN-DS dataset using holdout and 10-fold cross-validation techniques. The results of the 10-fold cross-validation tests revealed that the proposed model outperforms state-of-the-art models and other ensemble learning classifiers, with high detection rates of 98.2%, 92.9%, 98.9%, and 99.5% for flooding, scheduling, grayhole, and blackhole attacks, respectively, in addition to 99.9% for normal traffic. In future work, we will combine our model with feature selection methods to reduce the number of features and enhance the efficiency of intrusion detection in WSNs.

Conflicts of Interest:
The authors declare no conflict of interest.