A Feature Selection Model for Network Intrusion Detection System Based on PSO, GWO, FFA and GA Algorithms

Abstract: The network intrusion detection system (NIDS) aims to identify malicious activity in a network by investigating the behavior of the network traffic. Data mining and machine learning (ML) approaches are extensively used in NIDSs to discover anomalies. Feature selection plays a significant role in improving the performance of NIDSs because anomaly detection employs a great number of features, and processing them requires much time. The feature selection approach therefore affects both the time needed to investigate the traffic behavior and the accuracy level. The present study proposes a feature selection model for NIDSs that is based on particle swarm optimization (PSO), the grey wolf optimizer (GWO), the firefly optimization algorithm (FFA) and the genetic algorithm (GA). The proposed model aims at improving the performance of NIDSs. It deploys wrapper-based methods with the GA, PSO, GWO and FFA algorithms for selecting features using Anaconda Python Open Source, and deploys filtering-based methods for the mutual information (MI) of the GA, PSO, GWO and FFA algorithms, which produced 13 sets of rules. The features derived from the proposed model are evaluated using the support vector machine (SVM) and J48 ML classifiers and the UNSW-NB15 dataset. In the experiment, Rule 13 (R13) reduces the features to 30 features and Rule 12 (R12) reduces them to 13 features. Rule 13 and Rule 12 offer the best results in terms of F-measure, accuracy and sensitivity. GA shows good results in terms of True Positive Rate (TPR) and False Negative Rate (FNR). Rules 11, 9 and 8 show good results in terms of False Positive Rate (FPR), while PSO shows good results in terms of precision and True Negative Rate (TNR). It was found that an intrusion detection system with fewer features achieves higher accuracy.
The proposed feature selection model for NIDSs is a rule-based pattern recognition approach for discovering computer network attacks, which is within the scope of the Symmetry journal.


Introduction
Vinchurkar and Reshamwala [1] defined an intrusion detection system (IDS) as "an active process or device that analyzes system and network activity for unauthorized and nasty activity." There are three types of IDS [2]: the host-based IDS (HIDS), the network intrusion detection system (NIDS) and the hybrid-based IDS (HISD). The HIDS tracks the internal activities of the computer system. The NIDS tracks the network traffic logs dynamically in real time in order to identify any potential intrusion into the network by employing the appropriate detection algorithms. Detection mechanisms differ in whether they target known or unknown attacks. The anomaly detection mechanism profiles normal activity in order to detect unknown attacks; it does so by testing whether the device state is normal or not. Figure 1 shows the IDS classification of anomaly detection. A hybrid IDS detects both known and unknown attacks. This paper focuses on the NIDS. The NIDS detects attacks using the entire set of network traffic features. However, not all the features are required for detecting attacks. A lower number of features can minimize the time needed for detection and increase the detection rate. In addition, feature selection benefits the learning algorithms. For instance, it prevents over-fitting, improves noise resistance and improves the predictive performance.
Feature selection has been commonly used in many areas, including IDSs. There are three techniques for selecting features [5-7]: wrapper [8], filter [9] and embedded techniques [10]. In the embedded technique, feature selection for a given learning algorithm is integrated into the training process. In the wrapper technique, the feature subsets that yield high predictive accuracy with a learning algorithm are chosen. The filter technique selects a subset of features from the original set based on evaluation criteria. This work employs the wrapper and filter techniques because this combination takes the prediction capability into consideration, which is expected to yield better results than those of other works.
Symmetry 2020, 12, 1046
The work aimed at reducing the number of features in order to increase the detection rate and the performance of the NIDS. Although scholars have proposed several NIDS models, the present study proposes a model that is based on four well-known bio-inspired metaheuristic algorithms: the genetic algorithm (GA) [11-13], particle swarm optimization (PSO) [14-16], the grey wolf optimizer (GWO) [17-19] and the firefly optimization algorithm (FFA) [20,21]. The model is tested using the support vector machine (SVM) [22-24] and J48 (C4.5) [24-26] ML classifiers.
In short, the proposed feature selection model for the NIDS is based on the PSO, GWO, FFA and GA algorithms. It aims to enhance the performance of the NIDS by reducing the number of selected features, and it was evaluated using the SVM and J48 ML classifiers.
The present study is significant because it aimed to:
1. Identify the optimal feature set in the UNSW-NB15 dataset based on the PSO, GWO, FFA and GA algorithms;
2. Propose a filtering-based feature selection model for the NIDS, based on the PSO, GWO, FFA and GA algorithms, in order to reduce the number of selected features;
3. Determine the best combination of the PSO, GWO, FFA and GA algorithms for filtering the selected features that improve the performance of the detection mechanism.
The remainder of the paper is structured as follows: Section 2 reviews the literature on the use of bio-inspired metaheuristic algorithms for building an efficient NIDS. Section 3 presents the proposed model. Section 4 describes the dataset used. Section 5 describes the performance evaluation metrics. Section 6 presents the results and discussion. Section 7 presents the conclusion.

Related Works
ML is commonly used for classifying anomalies in an IDS. ML is defined as a collection of computational methods that employ training data for enhancing performance, making specific future predictions and gaining knowledge from data. Figure 2 shows the procedures taken for creating an ML application. Feature selection is a significant pre-processing stage in ML: it reduces the data dimensionality and increases the efficiency of the classification process. Scholars have proposed several feature selection methods for IDSs that identify important features based on several criteria. This section briefly discusses the state-of-the-art feature selection methods that are based on bio-inspired metaheuristic algorithms and ML classifiers and that aim to improve IDS performance.
Researchers in Reference [27] proposed a hybrid model of SVM along with GA for IDSs. This model reduces the selected features from 41 features to 10 features. Using GA, the selected features were categorized into three categories based on priority: the features of the highest importance are considered first priority, and the features of the lowest importance are considered third priority. Four features are considered first priority, four features are considered second priority and two features are considered third priority. The researchers employed the KDD'99 dataset in the experiment. The hybrid model attained a positive detection rate of 0.973 and a false alarm rate of 0.017.
Ahmad et al. [28] developed a feature selection model for IDSs that is based on the multilayer perceptron (MLP) and on a combination of principal component analysis (PCA) and GA. The researchers applied PCA to project the feature space onto a principal feature space and selected the features corresponding to the highest eigenvalues. Because the features selected by PCA may lack adequate detection capability for the classifier, the researchers adopted GA to explore the principal feature space in order to find a subset with optimal sensitivity. The feature subsets selected using PCA and GA were used to train the MLP classifier. The proposed method was evaluated on the KDDCup'99 dataset. The number of selected features was reduced from 41 features to only 12 features, and the optimal features increased the detection accuracy to 99%.
Ghanem and Jantan [29] developed an artificial bee colony (ABC) method for the feature selection of IDSs. The method consists of two stages:
- In stage 1, the subsets of features were generated using the Pareto front non-dominated solutions;
- In stage 2, a feed forward neural network (FFNN) and ABC (and PSO) were employed for assessing the feature subsets derived from the first stage.
Thus, the proposed method employs a new feature selection model, called the multi-objective ABC method, which aims at reducing the number of network traffic features. It also adopts a new classification approach, named the hybrid ABC-PSO approach, which employs the optimized FFNN for categorizing the data derived from the first stage. Moreover, the researchers proposed a new fitness function for reducing the number of features while keeping the false alarm rate low.
Researchers in Reference [30] proposed a model that selects features for IDSs using evolutionary algorithms: GA, PSO and differential evolution (DE). They compared these algorithms in terms of efficiency and validated them using the KDD Cup 99 dataset, a neural network and an SVM. Out of the 41 features in the dataset, GA, PSO and DE selected 16, 15 and 13 optimum features, respectively. They found that the training time of DE is 1.62 s and that DE is the best algorithm in terms of classification accuracy, which reaches 99.75%.
Researchers in Reference [31] proposed a new IDS model. The model is based on an intelligent dynamic swarm that uses a rough set, abbreviated as IDS-RS, and on simplified swarm optimization (SSO). SSO is considered a new version of PSO that employs a new weighted local search (WLS) strategy. IDS-RS is applied with a weighted sum fitness function to choose the most important features, thereby reducing the features of the dataset. Only six of the 41 features in the KDD99 dataset were selected. At the final stage, the SSO classifier was used for identifying instances and achieved a classification accuracy of 93.3%.
The researchers in Reference [32] explored the performance of feature selection in the NIDS using GA and PSO as feature selection algorithms. Both played an effective role in reducing the number of selected features: GA reduced it from 41 features to 15 features, and PSO reduced it from 41 features to 9 features. With k-nearest neighbor (k-NN) as a classifier, the GA-reduced dataset, which consists of 37% of the original features, improved the accuracy from 99.28% to 99.70% and ran 4.8 times faster than the original dataset. With the same classifier, the PSO-reduced dataset, which consists of 22% of the original features, showed the fastest execution time (7.2 times faster than that of the original dataset). However, its accuracy was slightly reduced from 99.28% to 99.26%.
Researchers in Reference [18] employed the grey wolf optimization (GWO) method to search the feature space for the optimal feature subset that improves the classification accuracy. First, the method used mutual information and filter-based principles. Second, the wrapper approach was adopted to raise the accuracy of the classifiers. The accuracy of the proposed approach was measured and compared against the accuracy of several metaheuristic algorithms on the NSL-KDD dataset.
Researchers in Reference [21] used the firefly algorithm with the filter and wrapper methods to select features. They also proposed a procedure for reducing the dimensionality. They used the wrapper ensemble method with the Bayesian network, C4.5 and mutual information (MI). Originally, the KDDCUP 99 dataset possessed 41 features; the approach reduced them to 10 features, which reduced the computational cost of the classifier.
Al-Yaseen [33] proposed a wrapper feature selection method that employs the SVM and firefly algorithms. The method improves the performance of the intrusion detection system by removing the irrelevant features and by shortening the classification time through reducing the dimensions of the data. The researcher employed the NSL-KDD dataset along with the common measures of intrusion detection systems: the overall accuracy, the detection rate and the false alarm rate. The proposed method achieved an overall accuracy of 78.89%, and it was found to be effective in enhancing the performance of the network intrusion detection system.
This work aimed to shed light on the role of the MI of GA, PSO, GWO and FFA in finding the optimal set of features for NIDSs based on the UNSW-NB15 dataset. The MI of the PSO, GA, GWO and FFA algorithms was not considered in any of the works that address NIDSs. The section below offers information about the proposed model.

The Proposed Model
The proposed feature selection model aims at enhancing the performance of NIDSs. In recent years, numerous researchers have employed data mining and ML techniques to solve problems and optimize system performance. This work uses ML techniques and reduces the number of features in order to raise the efficiency of NIDSs. Figure 3 presents the architecture of the proposed model. The following subsections describe the stages of the proposed model in detail.

The Pre-Processing Stage
To provide more appropriate data for the EvoloPy-FS optimization framework [34,35], the UNSW-NB15 dataset passed through several pre-processing steps. Those steps are identified below:

A. Removal of the labels: Each feature in the original UNSW-NB15 dataset has a label. Removing those labels is necessary in order to adapt the dataset to the EvoloPy-FS environment;
B. Removing features: The original UNSW-NB15 dataset has 45 features, two of which are class labels (attack_cat and label). The attack_cat label cannot be considered a feature, so it is deleted, because the main objective of this work is to reduce the features;
C. Label encoding: Some features in the dataset, e.g., protocol, state and service type, are given string values. Therefore, those values must be encoded into numerical values;
D. Data binarization: The numerical data in the dataset are in various ranges, and compensating for such variations poses a variety of challenges for the classifier during the training process. Therefore, the values in each feature are rescaled so that the least value in each feature is 0 and the maximum value is 1. This makes the input to the classifier more homogeneous while preserving the differences between the values of each feature.
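Steps A to D can be illustrated with a minimal sketch in plain Python. The function names (`label_encode`, `min_max_scale`, `preprocess`), the column names other than attack_cat and label, and the three toy rows are assumptions made for illustration only; they are not the pre-processing code used in the study and not records from UNSW-NB15.

```python
# Illustrative sketch only: toy rows and helper names are hypothetical.

def label_encode(column):
    """Step C: map each distinct string to an integer code (sorted for determinism)."""
    codes = {value: i for i, value in enumerate(sorted(set(column)))}
    return [codes[value] for value in column]

def min_max_scale(column):
    """Step D: rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def preprocess(rows, header, drop=("attack_cat",), target="label"):
    """Steps A and B: separate header names from values and drop unused columns."""
    keep = [i for i, name in enumerate(header) if name not in drop and name != target]
    columns = []
    for i in keep:
        col = [row[i] for row in rows]
        if any(isinstance(v, str) for v in col):
            col = label_encode(col)
        columns.append(min_max_scale([float(v) for v in col]))
    X = [list(r) for r in zip(*columns)]          # back to row-major order
    y = [row[header.index(target)] for row in rows]
    return X, y

header = ["proto", "dur", "sbytes", "attack_cat", "label"]
rows = [
    ["tcp", 0.5, 100, "Normal", 0],
    ["udp", 1.5, 300, "DoS", 1],
    ["tcp", 1.0, 200, "Normal", 0],
]
X, y = preprocess(rows, header)
```

In this sketch every remaining column ends up in the range [0, 1], and the class label is kept apart from the feature matrix.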

The Selection of Features Based on the Bio-Inspired Metaheuristic Algorithms
Selecting a feature subset is a difficult challenge that cannot be managed efficiently when the dimensionality of the features is high. Bio-inspired metaheuristic algorithms are suitable for addressing this challenge because they can offer high-quality solutions within an acceptable duration and with reasonable effort. In the proposed model, four subsets were extracted based on the GA, PSO, GWO and FFA algorithms.

GA Features Selection
GA [11-13] is an evolutionary search method, based on natural selection, that is employed for addressing optimization problems. GA encodes a set of candidate solutions to the optimization problem; those solutions are randomly generated to form a population. Then, GA evaluates this population in terms of a fitness function. Depending on the problem being solved, the fitness is assessed in terms of accuracy, root mean squared error (RMSE), F-measure or the area under the curve (AUC). The fitter individuals are chosen for a set of reproduction operations, namely crossover and mutation. This process is repeated, forming a sequence of generations, until the termination criterion is met.
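The loop described above can be sketched as a GA over binary feature masks. This is a simplified illustration, not the EvoloPy-FS implementation: the fitness function is a toy stand-in for a wrapper fitness such as classifier accuracy, the set `RELEVANT` of "useful" feature indices is hypothetical, and the tournament size, mutation rate and population size are assumed values.

```python
import random

N_FEATURES = 10
RELEVANT = {0, 3, 7}  # hypothetical "useful" features for the toy fitness

def fitness(mask):
    """Toy stand-in for a wrapper fitness: reward relevant features, penalize size."""
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)

def tournament(population, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two parent masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def ga_select(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination criterion: fixed generation budget
        population = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(pop_size)
        ]
        best = max(population + [best], key=fitness)  # remember the best mask seen
    return best

best_mask = ga_select()
selected = [i for i, bit in enumerate(best_mask) if bit]
```

In a wrapper setting, `fitness` would instead train and score a classifier on the columns selected by the mask.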

PSO Features Selection
Particle swarm optimization (PSO) was developed by Russell Eberhart and James Kennedy [36]. It is based on a simple concept derived from the movement of bird flocks and fish schools, and it was developed after several interpretations were made using computer simulations.
PSO employs a number of agents (particles) that make up a swarm. This swarm travels around the search space in order to find the best solution. Each particle in the search space adjusts its "flying" according to its own flying experience and the flying experience of the other particles.
PSO starts with randomly generated particles and their velocities, which indicate the search speed. Then, similar to GA, the particles are evaluated in terms of fitness. This evaluation is followed by two main tests. The first test compares the current fitness of a particle with its own best experience, which is called the personal best (pbest). The second test compares the fitness of a particle with the experience of the whole swarm, which is called the global best (gbest). Performing these two tests saves the best particle. The process is repeated until the termination criterion is met.
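The pbest and gbest tests can be sketched with a binary variant of PSO, in which a sigmoid of the velocity gives the probability that a feature bit is set. The choice of this binary variant, the inertia weight `w`, the coefficients `c1` and `c2`, the velocity clamp and the toy fitness with its hypothetical `RELEVANT` set are all assumptions for illustration; the source does not state which PSO variant or parameters were used.

```python
import math
import random

N_FEATURES = 10
RELEVANT = {1, 4, 8}  # hypothetical "useful" features for the toy fitness

def fitness(mask):
    """Toy stand-in for classifier accuracy on the selected features."""
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_pso(n_particles=15, iterations=50, w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # personal best of each particle
    gbest = max(pbest, key=fitness)[:]       # global best of the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(N_FEATURES):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # clamp the velocity
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            if fitness(pos[i]) > fitness(pbest[i]):         # first test: pbest
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) > fitness(gbest):      # second test: gbest
                    gbest = pbest[i][:]
    return gbest

gbest_mask = binary_pso()
```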

GWO Features Selection
GWO was proposed by Mirjalili et al. [17]. It was developed based on the hunting procedures and the leadership hierarchy of grey wolves. The social hierarchy of wolves is shown in Figure 4. It describes four kinds of wolves: alpha, beta, delta and omega. Alpha wolves are decision makers. They may not be the strongest in the pack, but they are the best at managing it, because managing and organizing the pack are more significant than strength. Beta is the next level of wolf in the pack. It operates as an advisor to the alpha and should be capable of taking the alpha's place in case of death or any other circumstances. Moreover, it reinforces the alpha's decisions among the members of the pack and provides the alpha with the members' feedback about those decisions. Omega is the lowest level wolf in the pack. It acts as a scapegoat and submits to the dominant members of the pack. The existence of omega is very important because omega preserves the dominance structure and satisfies the whole pack. Delta represents the rest of the pack, which submits to beta and alpha. Sentinels, scouts, elders, caretakers and hunters belong to this level.
Based on this hierarchy, the group hunting process is performed in three main steps: (1) tracking, chasing and approaching the prey; (2) pursuing, encircling and harassing the prey until it stops moving; (3) attacking the prey. The algorithm mimics the described hierarchy and group hunting procedures in order to solve complex engineering problems.
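A minimal sketch of how this hierarchy can drive a search is given below. It follows the commonly cited GWO position update, in which the three fittest wolves guide the remaining wolves and a coefficient `a` decays linearly from 2 to 0. Thresholding the continuous positions at 0.5 to obtain a feature mask, the toy fitness, the hypothetical `RELEVANT` set and all parameter values are assumptions for illustration, not the configuration used in the study.

```python
import random

N_FEATURES = 10
RELEVANT = {2, 5, 9}  # hypothetical "useful" features for the toy fitness

def to_mask(position, threshold=0.5):
    """Assumed binarization: a feature is selected when its coordinate exceeds 0.5."""
    return [1 if x > threshold else 0 for x in position]

def fitness(position):
    mask = to_mask(position)
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)

def gwo_select(n_wolves=12, iterations=60, seed=2):
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(N_FEATURES)] for _ in range(n_wolves)]
    best = max(wolves, key=fitness)[:]
    for t in range(iterations):
        ranked = sorted(wolves, key=fitness, reverse=True)
        leaders = [w[:] for w in ranked[:3]]   # the three fittest wolves lead
        if fitness(leaders[0]) > fitness(best):
            best = leaders[0][:]
        a = 2.0 - 2.0 * t / iterations         # decays from 2 towards 0
        for w in wolves:
            for d in range(N_FEATURES):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # encircling coefficient
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])    # distance to the leader
                    estimate += leader[d] - A * D
                w[d] = min(1.0, max(0.0, estimate / 3.0))  # average, kept in [0, 1]
    return to_mask(max(wolves + [best], key=fitness))

gwo_mask = gwo_select()
```

Large values of `a` early in the run let the wolves move away from the leaders (exploration), while small values later pull them towards the leaders (the attack step).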

FFA Features Selection
The firefly optimization algorithm (FFA) for feature selection is a metaheuristic algorithm proposed by Xin-She Yang [37]. It is based on the communication behavior of tropical fireflies and on their idealized flashing patterns. FFA employs the following idealized rules to construct the mathematical model of the algorithm:
(a) All fireflies are unisex;
(b) The attractiveness of a firefly is proportional to its brightness;
(c) The brightness of a firefly is determined and influenced by the landscape of the objective function.
For a maximization problem, the brightness may be proportional to the value of the objective function. The regular firefly algorithm involves two significant points: the formulation of the light intensity and the variation of attractiveness. One can always presume that the landscape of the encoded objective function determines the brightness of the firefly. One should then describe the variation of the light intensity and formulate the change in attractiveness.
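The two points above can be sketched as follows, using the widely used form in which attractiveness decays with squared distance as beta0 * exp(-gamma * r^2) and a dimmer firefly moves towards a brighter one with a small random step. The parameter values (`beta0`, `gamma`, `alpha`), the 0.5 threshold used to turn positions into feature masks, and the toy brightness with its hypothetical `RELEVANT` set are assumptions for illustration only.

```python
import math
import random

N_FEATURES = 10
RELEVANT = {0, 4, 6}  # hypothetical "useful" features for the toy objective

def to_mask(position, threshold=0.5):
    """Assumed binarization of a continuous position into a feature mask."""
    return [1 if x > threshold else 0 for x in position]

def brightness(position):
    """Brightness taken as the value of the (toy) objective to maximize."""
    mask = to_mask(position)
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)

def firefly_select(n_fireflies=12, iterations=40, beta0=1.0, gamma=1.0,
                   alpha=0.2, seed=3):
    rng = random.Random(seed)
    swarm = [[rng.random() for _ in range(N_FEATURES)] for _ in range(n_fireflies)]
    best = max(swarm, key=brightness)[:]
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if brightness(swarm[j]) > brightness(swarm[i]):
                    # attractiveness decreases with the squared distance r^2
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # move firefly i towards the brighter firefly j
                    swarm[i] = [
                        min(1.0, max(0.0, xi + beta * (xj - xi)
                                     + alpha * (rng.random() - 0.5)))
                        for xi, xj in zip(swarm[i], swarm[j])
                    ]
        candidate = max(swarm, key=brightness)
        if brightness(candidate) > brightness(best):
            best = candidate[:]
    return to_mask(best)

ffa_mask = firefly_select()
```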

Feature Selection Model Based on MI
The feature sets selected by the bio-inspired metaheuristic algorithms are as follows:
• Selected feature set based on PSO (S1);
• Selected feature set based on GWO (S2);
• Selected feature set based on FFA (S3);
• Selected feature set based on GA (S4).
One subset is generated from those selected feature sets based on MI using different rules, as displayed in Table 1.
Table 1. The rules of the proposed model.
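The rules of Table 1 are not reproduced here, so the sketch below illustrates only the MI quantity itself: the mutual information between a discrete feature and the class label, which a filter can use to rank features. The function names and the toy columns `f_copy` and `f_noise` are hypothetical, and how the study combines MI with the sets S1 to S4 is not assumed.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def rank_by_mi(columns, labels):
    """Return feature names sorted by decreasing MI with the label, plus the scores."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

labels = [0, 0, 1, 1]
columns = {
    "f_copy": [0, 0, 1, 1],   # identical to the label: carries 1 bit about it
    "f_noise": [0, 1, 0, 1],  # independent of the label: carries 0 bits
}
order, scores = rank_by_mi(columns, labels)
```

Continuous features would need to be discretized, or a different MI estimator used, before such a ranking is applied.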

Machine Learning Classifiers
ML classifiers (MLCs) are used in ML for classifying data. The feature set produced by the rules of the proposed model is used as the input to the ML classifier, whose function is to classify the incoming data as normal or abnormal. In the present study, the SVM and J48 classifiers are used.

SVM Classifier
SVM is a binary classifier and a common approach for classifying between two classes. In SVM, a hyperplane is created to separate the positive sample class from the negative sample class based on the structural risk minimization principle. SVM solves linear classification problems and, by choosing among different kernel functions, can be extended to nonlinear classification cases. It is a significant ML classification method because it builds on statistical learning theory [38]. Furthermore, due to its use of the structural risk minimization principle, SVM has a strong generalization capability. Hence, SVM can be seen as more effective than many other classifiers. Reviews of the relevant works on IDSs [27,39] report that SVM is an effective classifier whose performance is better than that of other classifiers.
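The separating hyperplane can be illustrated with a small linear SVM trained by subgradient descent on the regularized hinge loss. This training procedure, the hyperparameters (`lam`, `lr`, `epochs`) and the toy two-dimensional data are assumptions chosen for brevity; they are not the solver or the settings used in the study, and kernel functions are not shown.

```python
def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=1000):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    Labels must be -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]        # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:    # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

# Toy, linearly separable data: -1 stands for "normal", +1 for "abnormal".
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The regularization term keeps the weight vector small, which corresponds to preferring a wide margin between the two classes.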

J48 (C4.5 Decision Tree) Classifier
The J48 algorithm is one of the most popular tree classifiers. It implements the C4.5 algorithm developed by Quinlan [40], which is an extension of the ID3 algorithm and uses a predictive ML model. The J48 algorithm uses an improved tree pruning technique to reduce the number of classification errors. In addition, it adopts a divide-and-conquer greedy approach for recursively inducing decision trees over the features of the dataset. The algorithm splits the dataset based on the attribute values of the data in order to distinguish the probable prediction, and it builds its decision tree based on the information-theoretic attribute values of the available training data. Furthermore, the gain value is calculated separately for each feature, and the estimation process proceeds until the prediction is complete. A suitable feature is defined as one that offers much information about the data instances. Several studies explored the impact of using the J48 algorithm on improving the accuracy of IDSs [41].
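The per-feature gain calculation can be sketched as follows. The code computes the information gain of a discrete feature and the gain ratio, which is the split criterion usually associated with C4.5. It illustrates the criterion only, not the J48 implementation, pruning or handling of continuous attributes; the toy columns `proto` and `state` are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

labels = ["normal", "normal", "attack", "attack"]
proto = ["tcp", "tcp", "udp", "udp"]   # separates the classes perfectly
state = ["a", "b", "a", "b"]           # tells nothing about the class
best_feature = max({"proto": proto, "state": state}.items(),
                   key=lambda item: gain_ratio(item[1], labels))[0]
```

A tree builder would split on `best_feature` and then repeat the same calculation recursively on each resulting subset.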

Dataset Description
The dataset plays a major role in testing an IDS and measuring its performance. Over the last couple of decades, numerous IDS datasets have been introduced, including the DARPA dataset, KDDCup99, NSL-KDD and UNSW-NB15. A dataset typically consists of several attributes, which are divided into features and a class. Most IDS studies have employed KDDCup99 and NSL-KDD. This study uses the UNSW-NB15 dataset because KDDCup99 and NSL-KDD cannot meet its requirements, given the rapid development of network security and the need to meet operational requirements. Besides their inherent weaknesses, those two datasets contain neither typical modern-day traffic nor modified attack patterns.
The UNSW-NB15 dataset was developed by Moustafa et al. [42]. Figure 5 shows the UNSW-NB15 testbed. UNSW-NB15 is a hybrid dataset that combines real, contemporary normal network activity with synthetic modified attacks. It was created using IXIA PerfectStorm, an attack generation tool, and contains nine families of real and modified attacks, which are launched against different servers. The authors captured tcpdump traces of the network traffic at the beginning of 2015 over a total period of 31 h. From these network logs, they created a dataset with 49 features for each network flow.
Figure 5. UNSW-NB15 testbed [42].
The features were extracted with Argus, Bro-IDS and custom utilities during the development of the UNSW-NB15 dataset: the pcap files are fed into Bro-IDS and Argus. Argus is capable of handling raw network traffic. It operates on a client-server model, in which the Argus server converts the raw pcap files into the Argus format, and the Argus client then reads this output and extracts the features. For every data instance, 49 connection features are available. Some of the features are statistical, others are numerical and others hold timestamp values. The UNSW-NB15 dataset was split into a training dataset and a testing dataset.
Table 2. Some features are missing from the UNSW-NB15 training and testing datasets. These features are: ltime, sport, srcip, stime and dstip.

Performance Evaluation Metrics
To assess the efficiency of the proposed model, several metrics are employed. They are built on four counts: true positive (TP), true negative (TN), false positive (FP) and false negative (FN) [43]. From the confusion matrix, displayed in Table 3, the true positive rate (TPR), true negative rate (TNR), false positive rate (FPR) and false negative rate (FNR) are calculated. Based on these, further metrics are derived: sensitivity, precision, accuracy and F-measure.
TPR estimates the proportion of the normal data that is identified as normal. It is calculated as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
TNR estimates the proportion of the attack data that is identified as attack. It is calculated as follows:
$$\mathrm{TNR} = \frac{TN}{TN + FP}$$
FPR estimates the proportion of the attack data that is identified as normal. It is calculated as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
FNR estimates the proportion of the normal data that is identified as attack. It is calculated as follows:
$$\mathrm{FNR} = \frac{FN}{FN + TP}$$
Accuracy is expressed as a percentage and refers to the degree to which the instances are predicted correctly. It is calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the ratio of the positive decisions that are correct, that is, TP divided by the sum of FP and TP. It is calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Sensitivity is the number of TP evaluations divided by the number of all the positive evaluations. It is calculated as follows:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
The F-measure serves as a measure for testing the level of accuracy. It reflects the balance between sensitivity and precision. It is calculated as follows:
$$\mathrm{F\text{-}Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \qquad (9)$$
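As a minimal sketch, the metrics described above can be computed from the four confusion-matrix counts. The function name `evaluation_metrics` and the example counts are hypothetical and are not results from this study. Following the text, the positive class is taken to be normal traffic, and the standard definitions of each rate are assumed.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts.

    Assumes the positive class is normal traffic and the negative class
    is attack traffic, and that no denominator is zero.
    """
    tpr = tp / (tp + fn)          # normal data identified as normal
    tnr = tn / (tn + fp)          # attack data identified as attack
    fpr = fp / (fp + tn)          # attack data identified as normal
    fnr = fn / (fn + tp)          # normal data identified as attack
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
        "accuracy": accuracy, "precision": precision,
        "sensitivity": sensitivity, "F-measure": f_measure,
    }

# Hypothetical counts chosen only to exercise the formulas.
example = evaluation_metrics(tp=80, tn=90, fp=10, fn=20)
```

By construction, TPR and FNR sum to one, as do TNR and FPR, which is a quick consistency check on any reported set of rates.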

Experimental Evaluation Results
The proposed model is evaluated using the J48 and SVM ML classifiers. The results based on the J48 classifier are presented in Table 5, and those based on SVM are presented in Table 6. The classification accuracy of the proposed approach ranges from 79.175% to 90.484% with the J48 classifier and from 79.077% to 90.119% with the SVM classifier.
The accuracies of most of the reduction rules of the proposed model are higher than the accuracy obtained with all the features. All the algorithms and rules were evaluated in terms of TPR, FNR, TNR, FPR, accuracy, sensitivity, precision and F-measure. It was found that the rules of the proposed model vary in effectiveness.
The TPR indicates the data that are normal and identified as normal. Figure 6 presents the TPR for all the features and the rules based on J48 and SVM. The TPRs of the features based on the J48 and SVM classifiers are 63.99% and 63.96%, respectively, so J48 performs slightly better than SVM. GA has the highest TPR based on the SVM classifier. R13 shows the highest TPR based on the J48 classifier.
The FNR indicates that the data are normal and identified as attack. Figure 7 shows the FNR results for SVM and J48. The highest FNR was obtained from all features based on J48 and SVM. GA has the lowest FNR based on the SVM classifier. R13 has the lowest FNR based on the J48 classifier.
The FPR indicates that the data are attack and identified as normal. Figure 8 shows the FPR results based on SVM and J48. The highest FPR was obtained from R11 and R9 and R8 for J48 and SVM, respectively. PSO shows the lowest FPR based on the J48 and SVM classifiers.
The TNR indicates the data that are attack and identified as attack. Figure 9 shows the TNR results of the proposed model based on SVM and J48. The highest TNR was obtained from PSO for both J48 and SVM. R11 and R9 and R8 obtained the lowest rates for J48 and SVM, respectively.
Accuracy reflects how well normal and anomalous behaviors are classified. It is calculated as the percentage of correctly classified data over the whole dataset. Figure 10 shows the accuracy of the proposed model based on SVM and J48. The results in Figure 10 show that R13 and R12 have the highest accuracy. R13 shows an accuracy of 90.48% based on J48 and 90.12% based on SVM, and it reduces the number of features to 30. R12 shows an accuracy of 89.58% and 89.33% based on J48 and SVM, respectively, and it reduces the number of features to 13.
Precision refers to the ratio of the true positives to all the positive results. Figure 11 shows the precision of the proposed model based on SVM and J48. It was found that PSO shows the highest precision.
Figure 11. The precision rate.
Figure 12 shows the sensitivity rate of the proposed model based on J48 and SVM. Sensitivity reflects the ability of an IDS to identify a connection as being an attack. The results in Figure 12 show that R13 and R12 achieve the best sensitivity rates, whereas the other feature sets show the worst sensitivity rates based on both classifiers. This means that R13 and R12 can detect anomalies effectively at higher rates.
The F-measure considers both precision and sensitivity and acts as a balance between them: it is the harmonic mean of sensitivity and precision. Figure 13 shows the F-measure values for the proposed model based on J48 and SVM. Based on these results, R13 and R12 show the best F-measure rates, whereas the other feature sets have the worst F-measure rates. All the evaluation metrics can be used to evaluate the quality of IDSs, while accuracy and the F-measure can be used to evaluate the overall efficiency of the NIDS.
Based on all experiments, R13 and R12 provided the best results in terms of F-measure, accuracy and sensitivity based on the J48 and SVM classifiers. GA shows good results in terms of FNR and TPR, whereas R11, R9 and R8 show good results in terms of FPR. PSO shows good results in terms of precision and TNR.

Conclusions
Improving an intrusion detection system is challenging. The detection rate of an NIDS is affected by the number of features. The key task of data mining and ML techniques is to improve the detection accuracy and reduce the false positive rate of an NIDS. The latest models failed to identify network intrusions when using all the UNSW-NB15 dataset features. This study proposed an NIDS model that contains 17 rules for feature selection and is based on the UNSW-NB15 dataset. The proposed feature selection model is based on the PSO, GWO, FFA and GA bio-inspired algorithms and MI. Among the bio-inspired algorithms, PSO reduces the number of selected features to 25, GWO to 20, FFA to 21 and GA to 23. Among the rules derived from the MI of PSO, GWO, FFA and GA, R1 reduces the number of selected features to 12, R2 to 11, R3 to 12, R4 to 11, R5 to 10, R6 to 15, R7 to 6, R8 to 6, R9 to 7, R10 to 9, R11 to 5, R12 to 13 and R13 to 30.
R13 and R12 show the best results in terms of F-measure, accuracy and sensitivity based on the J48 and SVM ML classifiers. The researcher recommends conducting further studies to assess the effectiveness of the proposed model using deep learning architectures, such as the recurrent neural network (RNN) and the convolutional neural network (CNN).