A Novel Hybrid IDS Based on Modified NSGAII-ANN and Random Forest

Abstract: Machine-learning techniques have gained popularity in intrusion-detection systems in recent years. The quality of the dataset plays a crucial role in developing an effective machine-learning approach, so an appropriate feature-selection method is an influential factor in improving dataset quality and, in turn, in building high-performance intrusion-detection systems. In this paper, a hybrid multi-objective approach is proposed to detect attacks in a network efficiently. Initially, a multi-objective genetic method (NSGAII) and an artificial neural network (ANN) are run together to extract feature subsets. We modified the NSGAII approach to maintain diversity control in this evolutionary algorithm. Next, Random Forest, an ensemble method, is used to evaluate the efficiency of the feature subsets. The experimental results show that the proposed framework leads to promising outcomes compared to the solutions found in the literature.


Introduction
Presently, the internet is an undeniable part of people's daily lives, and the total number of internet users has grown exponentially. As a result, these users are eager to transmit a higher volume of critical data over the network. Therefore, the infrastructure built for data transmission should address security issues in order to provide a reliable, accurate, and configurable security system. To protect the security and integrity of user data, various tools, such as firewalls, antivirus, encryption, and authentication applications, are in place. However, these tools have not been efficient enough to safeguard systems against various types of threats [1]. Moreover, when it comes to detecting attacks, these tools have difficulty separating DoS attacks from normal traffic [2]. To improve network security, it is suggested to combine firewalls with intrusion-detection systems [3]. Ashfaq et al. [4] have described the intrusion-detection process as a series of steps that enables us to monitor, detect, and analyze activities violating network security policies [5]. Denning et al. [6] proposed a framework for detecting network attacks, which is called an intrusion-detection system. This framework is based on the assumption that security violations can be identified by monitoring system audit records for abnormal patterns of system usage [4]. In other words, the attacker's behavior can be considered a basis for anomaly-based detection systems [7]. These systems define malicious behavior as an [...]
1. Can multi-objective-based feature-selection frameworks be used to address this problem?
2. Since redundancy in feature subsets is regarded as a deficiency [10], how can we address the issue of redundant feature subsets?
Hence, the motivation of this paper is to fill this gap by using multi-objective techniques. We address the first research question by combining NSGAII, a multi-objective genetic method, with an artificial neural network. The research in [10] identifies the presence of redundant feature subsets in traditional NSGAII as one of the main disadvantages of this approach for feature selection. To address the second research question, we modify the NSGAII method so that redundant feature subsets are removed and replaced with new individuals. The most significant contributions of this paper can be stated as follows:
• We propose a novel hybrid intrusion-detection framework. The interaction between the multi-objective genetic algorithm (NSGAII) and the neural network forms the basis of the feature-selection approach. Afterwards, Random Forest is applied to evaluate the efficiency of the method.
• The NSGAII approach used in this framework is modified to improve the diversity of solutions by removing redundant solutions.
The rest of this paper is organized as follows. An overview of related work is presented in Section 2. Section 3 describes the datasets employed to evaluate the presented methodology and their advantages over those found in the literature. Section 4 describes the proposed method, including the pre-processing stage applied to the datasets, the details of the proposed feature-selection method, the classification technique used in conjunction with the NSGAII method, and the ensemble method used to examine the selected feature subsets. Section 5 lists and explains the common performance metrics applied in the literature, and the results are discussed in Section 6. Finally, Section 7 concludes the paper.

Related Work
In this section, we present a narrative literature review of recent feature-selection approaches used in intrusion-detection systems. We focused our search on answering the research questions stated in Section 1.
Selvakumar et al. [11] proposed a feature-selection method based on the firefly algorithm and applied C4.5 and a Bayesian network at the intermediate stage to evaluate it. The number of features is fixed at a predefined value: if the number of proposed features exceeds this value, mutual information (MI) is used to select the best features; otherwise, MI is used to add more features from the unselected ones.
The work conducted in [12] combines an evolutionary algorithm and a support vector machine (SVM) as a hybrid method with two main stages. First, the genetic algorithm is applied to create new feature sets. Then, the SVM classification performance is used to optimize the feature-selection process. This hybrid method selects 10 out of 45 attributes. Javaid et al. [13] applied a deep learning approach to intrusion detection. They selected the self-taught learning algorithm, and their cost function combines three terms: sum-of-squares error, weight decay, and a sparsity penalty.
Kang et al. [14] employed a local search algorithm for feature selection. The cost function used in this research is based on K-means clustering, which divides the samples into two distinct categories labelled as the normal cluster and the DoS attack cluster. To evaluate the method, a multi-layer perceptron (MLP) was applied to the NSL-KDD [15] dataset.
Khammassi et al. [3] proposed a wrapper feature-selection method to decrease the feature subset size. They first search the feature space with a genetic algorithm and then evaluate each proposed feature subset with logistic regression (LR). The effectiveness of the method is evaluated with decision-tree classification built on three different classifiers: C4.5, Random Forest, and Naive Bayes.
In [8], a combination of the filter and wrapper approaches has been employed to select the ideal feature subsets from the main dataset. The authors use the feature grouping concept, which leads to lower variance and greater feature-selection stability. This technique first applies the feature grouping linear correlation coefficient (FGLCC) as the feature ranking method. Afterwards, the cuttlefish algorithm (CFA) is applied to the feature subsets to improve the final efficiency of the proposed method. The most significant benefit claimed for this method is the combination of filter and wrapper methods, which lets the system gain the advantages of both approaches.
Aghdam et al. [16] proposed a feature-selection method based on a nature-inspired meta-heuristic approach. Ant colony optimization is used to address the dimensionality issue in the intrusion-detection problem. They applied the proposed framework to the NSL-KDD [15] and KDDCup99 [17] datasets. By reducing the number of features, the proposed method shows a notable decline in the memory size and CPU time required for intrusion detection. As a result, it can be considered a reliable approach.
The work reported in [18] applies various machine-learning algorithms, such as Bayes Net, J48, Random Forest, and Random Tree, to the KDDCup99 dataset. Random Forest and Random Tree achieved the highest accuracy in this system. The feature-selection approach used on the dataset is correlation-based feature selection with a Best First search method.
A feature-selection technique that improves classifier performance has been introduced in [19]. The method is based on intelligent water drops (IWD), a nature-inspired optimization process. A support vector machine is responsible for evaluating the performance of the reduced dataset. The dataset used in this work is KDDCup99, and the samples are divided into normal and attack categories.
Aljawarneh et al. [20] developed a feature-selection method that significantly reduces computational and time complexity. In their framework, the feature subset size is first decreased from 41 to 8 features using the information gain feature-selection approach: features with an information gain greater than 0.4 are selected, and an ensemble classifier is then applied to examine the performance. This approach enhanced the detection accuracy and lowered the false-positive rate.
The research in [21] proposed an intelligent intrusion-detection system. Its feature-selection step is based on two ranking methods, information gain and correlation. Then, a novel feature-selection approach is introduced to select useful features from the ranked features. Finally, an artificial neural network is applied to evaluate the proposed feature subsets.
Bostan et al. [22] presented a hybrid feature-selection method based on the binary gravitational search algorithm (BGSA) and mutual information (MI). This technique can be considered a combination of both hybrid and wrapper feature-selection approaches. Moreover, a multi-objective function has been defined to maximize the detection rate and minimize the false-positive rate simultaneously by employing the binary gravitational search algorithm. The proposed feature-selection method is tested with a support vector machine (SVM) on the NSL-KDD [15] dataset.
An incremental feature-selection algorithm that merges the cuttlefish algorithm and an extended chi-square method is proposed in [23]. An intelligent classification method, named a multi-class support vector machine, is then used in the classification stage. Clustering, an intelligent agent, and a decision tree are applied in this method to improve classification accuracy.
Potluri et al. [24] refer to the parallel computing abilities of neural networks as an essential factor in enhancing the efficiency of intrusion-detection systems. They applied a deep learning neural network to the NSL-KDD dataset to exploit this benefit.
Differential evolution is considered a powerful tool for continuous optimization problems. However, the work reported in [25] established its capabilities in discrete optimization problems as well. The authors proposed an intrusion-detection system that applies discretized differential evolution (DDE) and C4.5 in the feature-selection step. The NSL-KDD dataset is used in this work to show the performance of the technique.
The research in [26] applied the NSGAII approach to feature selection in intrusion-detection systems. This multi-objective feature selection is used to reduce the complexity of Growing Hierarchical Self-Organizing Maps (GHSOMs) as an unsupervised clustering procedure. However, the presence of redundant feature subsets in NSGAII is not discussed in that paper. To the best of our knowledge, none of the current research in the field of intrusion-detection systems has recognized and resolved this issue for feature-selection purposes. As a result, we decided to modify our NSGAII-ANN approach to resolve the redundancy issue in this framework.

Dataset Description
The KDDCUP99 dataset [17] is a publicly available dataset that has been widely used in intrusion-detection research. It contains 5 million training and more than 2 million testing samples. In this work, we use a refined version of this dataset, called NSL-KDD [15]. The improvements of the newer dataset over the previous version are as follows:
• The redundant records of KDDCUP99 have been removed in the newer version. These records led classifiers to produce results biased in favor of frequent records.
• The reasonable number of records in the dataset enables experiments to be executed on the full dataset instead of on a randomly chosen small segment.
• The number of records selected from each difficulty-level group is inversely proportional to the percentage of records in the KDDCUP99 dataset.
The second dataset used in this research to illustrate the efficiency of the proposed method is UNSW-NB15 [27]. It was developed in 2015 to solve some of the issues of the NSL-KDD dataset. This dataset offers novel categories of cyber-attacks as well as normal samples [28]. It involves nine attack categories: 'Fuzzers', 'DoS', 'Analysis', 'Reconnaissance', 'Exploits', 'ShellCode', 'Worms', 'Backdoor', and 'Generic' [29].
Tables 1 and 2 describe the sample distribution among the different classes in the UNSW-NB15 and NSL-KDD datasets, respectively.

Proposed Framework
In the proposed framework (Figure 1), the original datasets are first fed into the pre-processing stage. In this stage, the nominal values of the selected datasets are transformed into integers, and all features are normalized to a common scale without distorting differences in the ranges of values. In phase I, the interaction between the modified NSGAII and the artificial neural network (ANN) forms the basis of the feature-selection approach. The proposed feature-selection method suggests candidate feature subsets that could offer optimized performance. The NSGAII approach used in this phase is modified: the modification removes redundant feature subset solutions and improves the diversity of the offered feature subsets. In phase II, the best solution among the suggested feature subsets is chosen. Next, the samples are classified using the Random Forest method, which determines whether specific traffic should be recognized as an intrusion or as normal traffic. The details of the steps involved in this framework can be found in the following sections.

The Pre-Processing Stage
The datasets used in the context of intrusion detection contain different data types, such as continuous, discrete, and symbolic, with different resolutions and ranges. Most existing classification algorithms cannot deal with such heterogeneous datasets directly. Therefore, it is necessary to pre-process the data and transform the features so that the classification algorithms can handle them. The pre-processing applied in this research consists of two main steps: data transformation and normalization. In data transformation, all nominal features are mapped to integer values ranging from 0 to $S - 1$, where $S$ is the number of symbols.
In normalization, all values of the $n$ features used are linearly scaled into the range [0, 1] according to Equation (1). Linear scaling is a min-max normalization that consists of finding the minimum and maximum value of the $i$th feature [3]:
$$x_i' = \frac{x_i - \min_i}{\max_i - \min_i} \tag{1}$$
where $x_i$ is a value of the $i$th feature and $\min_i$ and $\max_i$ are the minimum and maximum values of that feature.
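The two pre-processing steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, symbols are numbered in order of first appearance (the text does not specify an order), and a constant column is mapped to 0.0 by assumption.

```python
def encode_nominal(column):
    """Map each distinct symbol to an integer in 0..S-1 (order of first appearance)."""
    mapping = {}
    encoded = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


def min_max_scale(column):
    """Linearly scale numeric values into [0, 1]; constant columns map to 0.0 (assumption)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy nominal column (illustrative values only).
protocol = ["tcp", "udp", "tcp", "icmp"]
codes, mapping = encode_nominal(protocol)   # codes == [0, 1, 0, 2]
scaled = min_max_scale(codes)               # scaled == [0.0, 0.5, 0.0, 1.0]
```

In practice, the minimum and maximum would be computed per feature over the whole dataset, so each column is scaled independently.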

Phase I: Feature Selection Using NSGAII-ANN
In the first phase, a modified version of the NSGAII algorithm is applied to establish a feature-selection method. We define feature selection as a multi-objective problem that minimizes both the classification error and the feature subset size:
$$\min_{S}\; F(S) = \left[F_1(S),\; F_2(S)\right] \tag{2}$$
where $F_1$ is the first objective function, the classification error, and $F_2$ is the second objective function, the feature subset size. A feasible solution (feature subset) is denoted by $S$. The artificial neural network is employed as the classifier to calculate $F_1$. Afterwards, the modified NSGAII algorithm is applied to optimize the multi-objective feature-selection problem expressed in Equation (2).
Figure 2 shows the steps involved in the optimization process of the traditional multi-objective genetic algorithm, NSGAII. Initially, the parents in the first iteration of the algorithm are generated as a random population $P_i$ ($i = 0$). The multi-objective function in Equation (2) is computed, and the population is sorted based on non-dominated sorting and crowding distance. The crowding distance and the crowded-comparison operator, which are used to select the best solutions while favoring those with lower domination rank, are defined in Section 4.2.2. More information about this sorting process can be found in [30]. The sorted parent population is then passed into the main loop of the algorithm. Binary tournament selection, binary cross-over, and mutation are applied to produce the offspring ($O_i$). In this paper, roulette-wheel selection is used to choose the cross-over operator among single-point, double-point, and uniform crossover, with probabilities of 0.1, 0.2, and 0.7, respectively.
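To make the two objectives concrete, the sketch below evaluates a binary feature mask and tests Pareto dominance between two objective vectors. It is illustrative only: `objectives`, `dominates`, and `toy_error` are hypothetical names, and `toy_error` is a stand-in for training and scoring the ANN, which is not reproduced here.

```python
def objectives(mask, error_fn):
    """Return (F1, F2) for a binary feature mask.

    F1 is the classification error reported by error_fn (a stand-in for the
    ANN classifier); F2 is the number of selected features.
    """
    selected = [i for i, bit in enumerate(mask) if bit]
    return error_fn(selected), len(selected)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical error function: pretend only features 0 and 2 are informative.
def toy_error(selected):
    return 0.5 - 0.2 * len({0, 2} & set(selected))


f_a = objectives([1, 0, 1, 0], toy_error)  # both informative features, subset size 2
f_b = objectives([1, 1, 1, 1], toy_error)  # same error, subset size 4
print(dominates(f_a, f_b))                 # the smaller subset dominates
```

Because both objectives are minimized, a subset dominates another when it is no worse on error and size and strictly better on at least one of them.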

Traditional NSGAII Algorithm
In the uniform cross-over, a fixed mixing ratio is applied to the parents. This type of cross-over lets the parents' chromosomes be combined at the gene level, rather than at the segment level used in single-point and double-point crossovers. The two children are generated by the uniform cross-over equation, in which the mutation operator is a bitwise mutation: each individual bit is flipped (1 is converted to 0, and vice versa) with probability $P_M$, the mutation probability. Then, the combined population is formed from the parent population ($P_i$) and the offspring population ($O_i$). We call this combined population $R_i$. The multi-objective function defined in Equation (2) is computed in this step. The total number of individuals in $R_i$ is now larger than the required population size $N$. Consequently, we need to sort the population based on non-dominated sorting and crowding distance to obtain the best set of solutions of size $N$. The process of sorting $R_i$ based on these factors is explained in [30].
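The variation operators described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the uniform cross-over uses an assumed mixing ratio of 0.5, and chromosomes are plain lists of bits with at least three genes.

```python
import random


def single_point(p1, p2, rng):
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def double_point(p1, p2, rng):
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def uniform(p1, p2, rng):
    # Gene-level mixing: each position is swapped independently (ratio 0.5 assumed).
    mask = [rng.random() < 0.5 for _ in p1]
    c1 = [y if m else x for x, y, m in zip(p1, p2, mask)]
    c2 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    return c1, c2


def crossover(p1, p2, rng):
    """Pick the operator by roulette wheel with probabilities 0.1, 0.2, and 0.7."""
    operators = [single_point, double_point, uniform]
    op = rng.choices(operators, weights=[0.1, 0.2, 0.7])[0]
    return op(p1, p2, rng)


def mutate(bits, p_m, rng):
    """Bitwise mutation: flip each bit independently with probability p_m."""
    return [1 - b if rng.random() < p_m else b for b in bits]


rng = random.Random(0)
parent1, parent2 = [1] * 8, [0] * 8
child1, child2 = crossover(parent1, parent2, rng)
mutant = mutate(child1, 0.1, rng)
```

With complementary parents, as in the usage lines, every position of the two children stays complementary regardless of which operator the roulette wheel picks, which makes the gene-level exchange easy to inspect.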
First, non-dominated sorting is applied to the population $R_i$. The solutions belonging to the best non-dominated set, $\mathcal{F}_1$, are the best solutions in the combined population and should be emphasized more than any other solution. If the size of $\mathcal{F}_1$ is smaller than $N$, all members of the first Pareto front are chosen for the population $P_{i+1}$. The subsequent non-dominated fronts fill the remaining slots of $P_{i+1}$ in rank order: $\mathcal{F}_2$, followed by $\mathcal{F}_3$, and so on. Fronts are added in this way until one is reached that the population cannot accommodate completely. Let us call this front $\mathcal{F}_l$. In this case, the crowding distance is the factor that helps us choose the most important solutions in this Pareto front: we sort the solutions of $\mathcal{F}_l$ by crowding distance in descending order and select the best ones needed to fill the remaining population slots.
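The front-by-front filling described above can be sketched as below. This is a simple illustration under stated assumptions: fronts are obtained by repeatedly peeling off the non-dominated points, which is easier to read but slower than the fast non-dominated sort of the original NSGAII, and all function names are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return fronts as lists of indices; the first front is the non-dominated set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def fill_population(fronts, n):
    """Accept whole fronts while they fit.

    Returns (accepted, last_front, slots_left); last_front is the front that
    does not fit completely and must be truncated by crowding distance.
    """
    accepted = []
    for front in fronts:
        slots_left = n - len(accepted)
        if slots_left == 0:
            break
        if len(front) > slots_left:
            return accepted, front, slots_left
        accepted.extend(front)
    return accepted, [], 0


# Toy (error, subset size) pairs, illustrative values only.
points = [(0.1, 5), (0.2, 3), (0.3, 2), (0.2, 5), (0.3, 4), (0.4, 5)]
fronts = non_dominated_sort(points)                      # [[0, 1, 2], [3, 4], [5]]
accepted, last_front, slots_left = fill_population(fronts, 4)
```

In the usage lines, the first front fits entirely into a population of four, while the second front has two members competing for the single remaining slot.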

Crowding Distance
The NSGAII algorithm uses the crowding-distance assignment to estimate the density of solutions. The crowding distance of each solution $S$ in the population is estimated by the perimeter of a cuboid whose vertices are the closest neighbors of that solution in the same non-dominated front. Figure 3 depicts the cuboid, where $f_1$ and $f_2$ refer to the two objective functions and filled circles indicate solutions of the same non-dominated front. The two closest solutions, $S^+$ and $S^-$, are shown by filled circles, and the solution $S$ is surrounded by a box that represents the crowding distance estimated by Equation (5). Algorithm 1 illustrates the crowded-comparison operator, where $S$ being a better solution than $S'$ is denoted by $S \succ S'$. In this algorithm, the favored solution is the one with the lower non-domination rank; if both solutions belong to the same non-dominated front, the solution with the larger crowding distance is selected.
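A minimal sketch of the crowding-distance assignment and the crowded-comparison rule follows. It uses the standard NSGAII formulation (normalized neighbor gaps summed over the objectives, boundary solutions assigned infinite distance), which is assumed here rather than taken from the paper's Equation (5) or Algorithm 1; the function names are hypothetical.

```python
import math


def crowding_distances(front):
    """front: list of objective tuples that belong to one non-dominated front."""
    n = len(front)
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary solutions are always kept.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def crowded_better(rank_a, dist_a, rank_b, dist_b):
    """Crowded comparison: the lower rank wins; ties go to the larger distance."""
    return rank_a < rank_b or (rank_a == rank_b and dist_a > dist_b)


# Toy front of (error, subset size) pairs, illustrative values only.
front = [(0.10, 5), (0.20, 3), (0.30, 2)]
d = crowding_distances(front)
```

Sorting the last admissible front by these distances in descending order and keeping the first few entries gives the truncation step of the environmental selection.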

Modified NSGAII Method
The modification applied in our framework addresses the presence of redundant feature subsets, discussed in Section 1. As mentioned previously, the research in [10] refers to the presence of inefficient and redundant feature subsets as the most significant issue when NSGAII is applied to feature selection. As a result, we add a condition to the traditional non-dominated sorting method used in NSGAII to ensure that all redundant solutions are omitted. The steps involved in this modification can be found in Algorithm 2.
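Since Algorithm 2 is not reproduced here, the following is only one plausible reading of the redundancy removal: identical feature masks are treated as redundant, the first copy is kept, and each repeat is replaced with a freshly drawn random individual. The function name and the replacement rule are assumptions, and the sketch presumes the population is much smaller than the number of possible masks.

```python
import random


def remove_redundant(population, rng):
    """Keep the first copy of each feature mask; replace repeats with new random masks."""
    seen = set()
    result = []
    for mask in population:
        key = tuple(mask)
        while key in seen:
            # Assumed replacement rule: draw a new random individual of the same length.
            key = tuple(rng.randint(0, 1) for _ in mask)
        seen.add(key)
        result.append(list(key))
    return result


population = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
unique_population = remove_redundant(population, random.Random(1))
```

The population size is preserved, so the replaced individuals add diversity without changing the rest of the selection procedure.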

ANN Algorithm
In a neural network, one of the most significant methods for computing the weight adaptation is the Levenberg-Marquardt (LM) approach, which is built on the gradient descent rule and the Gauss-Newton method. In the first steps, a large value is used for the parameter that specifies the step size, so the update behaves like gradient descent. Smaller values are employed in the following steps, where the update approaches the Gauss-Newton method. The most important advantage of the Levenberg-Marquardt method is that it takes the benefits of both techniques while avoiding their drawbacks.
In the LM method, the change $\Delta$ in the weights ($w$) is obtained by solving Equation (6), which involves the matrix $\alpha$ and the gradient $\nabla E$ of the mean-squared network error $E$. The error is calculated as follows:
$$E = \frac{1}{N}\sum_{k=1}^{N}\left[y(x_k) - d_k\right]^2 \tag{7}$$
where $N$ is the number of examples, $y(x_k)$ is the network output corresponding to the example $x_k$, and $d_k$ is the desired output for that example. The elements of the $\alpha$ matrix are given by Equation (8), in which $\rho$ denotes the number of network outputs. Starting from initial random weights, both $\alpha$ and $\nabla E$ are evaluated, and by solving Equation (6), a correction for the values of the weights ($\vec{W}$) is obtained. This is known as the LM learning process, in which each iteration reduces the error until the desired goal is achieved or a minimum is found. In Equation (8), $\lambda$ is a parameter that is adjusted at each iteration according to the evolution of the error. If $\lambda$ is very small, the matrix $\alpha$ becomes an approximation to the Hessian matrix.
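The damped update can be illustrated on a deliberately tiny model. The sketch below fits $y = w_0 + w_1 x$ with a textbook Levenberg-Marquardt step, $(J^T J + \lambda\,\mathrm{diag}(J^T J))\,\Delta = -J^T r$, rather than the paper's exact equations; the function name, the factor of 10 used to adjust $\lambda$, and the toy data are assumptions.

```python
def lm_fit(xs, ds, w, lam=1e-2, iters=20):
    """Fit y = w[0] + w[1] * x by Levenberg-Marquardt on the summed squared error."""
    def sse(weights):
        return sum((weights[0] + weights[1] * x - d) ** 2 for x, d in zip(xs, ds))

    for _ in range(iters):
        # Gauss-Newton terms a = J^T J and g = J^T r, with Jacobian rows [1, x].
        a00 = a01 = a11 = g0 = g1 = 0.0
        for x, d in zip(xs, ds):
            r = w[0] + w[1] * x - d
            a00 += 1.0
            a01 += x
            a11 += x * x
            g0 += r
            g1 += r * x
        # Damping scales the diagonal by (1 + lam).
        b00, b11 = a00 * (1 + lam), a11 * (1 + lam)
        det = b00 * b11 - a01 * a01
        # Solve the 2x2 system [[b00, a01], [a01, b11]] * delta = -g by Cramer's rule.
        d0 = (-g0 * b11 + g1 * a01) / det
        d1 = (-g1 * b00 + g0 * a01) / det
        trial = [w[0] + d0, w[1] + d1]
        if sse(trial) < sse(w):
            w, lam = trial, lam / 10   # accept: move toward Gauss-Newton
        else:
            lam *= 10                  # reject: move toward gradient descent
    return w


# Toy data generated from y = 1 + 2x (illustrative only).
weights = lm_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [0.0, 0.0])
```

A small $\lambda$ leaves the matrix close to the Gauss-Newton approximation of the Hessian, while a large $\lambda$ shrinks the step toward a short gradient-descent move; a real network would replace the two-column Jacobian with derivatives of every output with respect to every weight.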

Phase II: Best Solution Selection and Random Forest Classification for Evaluation
Figures 4 and 5 depict the feature subsets produced in phase I for the NSL-KDD and UNSW-NB15 datasets, respectively. Each candidate feature subset is depicted by a star symbol in these figures, and the curve formed by these stars is known as the Pareto front.
Further information about each of the proposed feature sets for the NSL-KDD and UNSW-NB15 datasets can be found in Tables 3 and 4, respectively. In these tables, Num, nf, selected features, and MSE represent the solution identification number on the Pareto front, the number of selected features, the indices of the chosen features, and the mean-squared error (MSE) of the first-phase classifier, respectively. To select the best feature subset, we attempted to strike a balance between the number of features and the mean-squared error.
In our experiment on the NSL-KDD dataset, we chose the 7th member of Table 3 as the solution (selected feature subset) for the evaluation process. This individual is the solution with the second-lowest error rate. Twenty-four of the 41 features in the NSL-KDD dataset are selected for use in the classification phase, which corresponds to approximately 59% of all features. The selected features are listed in the third column of Table 3; these numbers represent the positions of the selected features in the NSL-KDD dataset.
In the second experiment, on the UNSW-NB15 dataset, the 6th feature subset in Table 4 has been selected because it combines a low MSE with a low number of features, which reduces the complexity of the dataset and the computing power required by the classifiers. Therefore, we selected only 19 of the features available in the UNSW-NB15 dataset.
The dimensions of both datasets have been reduced according to the recommended feature subsets. Afterwards, we applied the Random Forest classification technique to evaluate the efficiency of the proposed approach. An ensemble classifier was selected for the classification process because ensemble classifiers have proven effective in intrusion-detection systems in similar studies [3]. These classifiers integrate several weak classifiers to improve classification performance. Boosting and bagging are the most well-known ensemble approaches. Boosting applies extra weight to incorrect predictions, and the final result is obtained through a weighted vote of the predictions. Bagging, on the other hand, is based on a majority vote over classifiers trained on bootstrap samples of the dataset. Random Forest can be considered a bagging approach with an extra layer of randomness.
The classification and regression trees in Random Forest are built differently from standard trees. At each node, a subset of the predictors is first chosen at random. Next, the best split among these predictors is used to split the node. This approach has shown better performance in comparison with other methods such as the support vector machine (SVM). It is also robust against over-fitting. As a result, Random Forest, as an ensemble method, has been applied to the datasets to evaluate the performance of the proposed method.
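The two sources of randomness, bootstrap sampling and random predictor subsets, can be sketched with one-level trees (decision stumps) standing in for full trees. This is a toy illustration with hypothetical function names and made-up data, not the Weka implementation used in the experiments; because a stump has a single node, drawing the predictor subset once per tree is equivalent to drawing it per node.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best single-threshold split over the candidate features (fewest training errors)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees, n_features, rng):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random predictor subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up, already normalized two-feature records (illustrative only).
rows = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.7), (0.85, 0.8)]
labels = ["normal"] * 3 + ["attack"] * 3
forest = fit_forest(rows, labels, n_trees=25, n_features=1, rng=random.Random(0))
print(predict_forest(forest, (0.05, 0.05)), predict_forest(forest, (0.95, 0.95)))
```

Individual stumps trained on an unlucky bootstrap sample can be wrong, but the majority vote over many of them is what gives the ensemble its robustness.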

Evaluation Metrics for IDS
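We adopt the evaluation metrics used in the majority of the current state of the art. Khan et al. [31] introduced accuracy, precision, recall, F-measure, and false-positive rate (FPR) as the most common metrics used in intrusion-detection systems. These metrics can be defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$
$$\text{Precision}\,(P) = \frac{TP}{TP + FP} \tag{10}$$
$$\text{Recall}\,(R) = \frac{TP}{TP + FN} \tag{11}$$
$$\text{F-measure} = \frac{2 \cdot P \cdot R}{P + R} \tag{12}$$
$$\text{FPR} = \frac{FP}{FP + TN} \tag{13}$$
where TP, TN, FP, and FN can be defined as follows:
• True positive (TP): the number of attack records correctly classified as attacks.
• True negative (TN): the number of normal records correctly classified as normal.
• False positive (FP): the number of normal records incorrectly classified as attacks.
• False negative (FN): the number of attack records incorrectly classified as normal.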
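As a worked example, the five metrics can be computed directly from the four confusion-matrix counts. The function name and the counts below are illustrative and are not results from the paper; the sketch assumes no denominator is zero.

```python
def ids_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-measure, and FPR from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


# Illustrative counts: 100 attack records and 100 normal records.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
print(m)
```

With these counts, 185 of the 200 records are classified correctly, and 5 of the 100 normal records raise a false alarm.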

Experimental Results
The proposed feature-selection solution was implemented in MATLAB R2019a. Next, the reduced dataset was fed into the Random Forest algorithm available in the Waikato Environment for Knowledge Analysis (Weka 3.8) and executed on a PC with an Intel Core i7 processor running at 2.1 GHz and 8 GB of RAM. The 10-fold cross-validation approach was chosen, in which each dataset is split into ten folds. One of these folds is used as the test set, and the remaining folds are used to train the classifier. This procedure is repeated over ten iterations, so that each fold serves as the test set once, and the final estimate is calculated as the average over the iterations. The main benefit of this method is that all samples are used for both training and testing.
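The fold rotation can be sketched as follows. This is a plain, unstratified split with hypothetical function names; it illustrates the procedure only and is not the Weka routine used in the experiments.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n_samples, k, score_fn, rng):
    """Average the per-fold score returned by score_fn(train_idx, test_idx)."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k, rng)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(25, 10, random.Random(0)))
# score_fn would train on train_idx and return the accuracy on test_idx;
# a constant stands in for it here.
mean_score = cross_validate(25, 10, lambda train, test: 1.0, random.Random(0))
```

Every index appears in exactly one test fold, which is the property that lets all samples contribute to both training and testing.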
The original NSL-KDD dataset contains 41 features, which the NSGAII-ANN approach reduced to 24. Afterwards, the samples were classified into normal and abnormal states using the Random Forest classifier. The accuracy of this stage, according to the 10-fold cross-validation test, is 99.408%. The confusion matrix for this experiment is presented in Table 5. Moreover, the accuracy obtained in the multi-class case is 99.3%, and its confusion matrix can be found in Table 6. Tables 7 and 8 show the evaluation of the proposed method on the NSL-KDD dataset based on precision, recall, F-measure, false-positive rate, and accuracy. The reported results are promising for both the normal-abnormal and the multi-class cases using 24 out of 41 features.
To demonstrate the effectiveness of the proposed method, the experiment was carried out on the UNSW-NB15 dataset as well. The accuracy obtained by this method is 94.802% for the two-state categorization, in which the class labels are divided into normal and abnormal. Table 9 shows the confusion matrix for this experiment. The proposed method also shows improved results on the other measures, namely precision, recall, and F-measure, with a low average false-positive rate (0.06). The evaluation results are available in Table 10.
In the multi-state classification, 211,306 out of 257,673 samples were correctly classified, which is approximately 82% of the instances. Table 11 shows the confusion matrix of the proposed method on UNSW-NB15, and further evaluations can be found in Table 12. The highest F-measure values are reported for the Normal and Generic classes, with values higher than 92%, and the lowest F-measure values are for the Worms and Backdoor classes, which have approximately the lowest number of instances.
As a result, the class imbalance of the UNSW-NB15 dataset could be considered one of the potential factors behind the difference in performance between classes. To show the effectiveness of the proposed method, we compared the results to the state of the art. The reported accuracies for NSL-KDD and UNSW-NB15 are 99.4% and 94.8%, respectively. In addition, a 6% false-positive rate is obtained for the experiments on both datasets. Table 13 compares the proposed method with state-of-the-art work conducted on the same datasets. The evaluations are divided into two categories, where the first and second sections relate to the NSL-KDD and UNSW-NB15 datasets, respectively. Several factors, such as the number of features employed, the classifier in use, the accuracy, and the false-positive rate, are used in this comparison. The reported results illustrate the superiority of the proposed method compared with previous work in this field.
The experiments on the NSL-KDD dataset yield better outcomes than those on the UNSW-NB15 dataset. A few points may explain this difference. First, the UNSW-NB15 dataset includes novel attack and normal classes. Moreover, the UNSW-NB15 dataset involves nine classes of attacks, whereas the NSL-KDD dataset is limited to four classes of attack types. Furthermore, the complexity of the UNSW-NB15 features is higher due to the similarity of normal and attack behavior in this dataset.

Conclusions
In this paper, a feature-selection approach has been proposed for intrusion-detection systems. The main objective of this approach is to create optimal feature subsets that can classify the instances of the NSL-KDD and UNSW-NB15 datasets. The proposed approach is based on a two-phase framework, in which a feature-selection step is followed by a classification stage. In phase I, NSGAII-ANN forms the basis of the feature-selection stage. The NSGAII method, as a feature search approach, interacts with the artificial neural network (ANN), as the learning algorithm, in this phase. We define feature selection as a problem with two competing objectives, and we attempt to discover a set of optimal solutions instead of a single optimal solution. The competing objectives are the minimization of the number of features and of the classification error obtained with the ANN classifier. The multi-objective method, NSGAII, provides the opportunity to fulfil both of these objectives. To improve the proposed framework, we modified the traditional NSGAII method. During this process, redundant solutions are omitted to enhance the diversity of the solutions.
In phase II, the best feature subset derived from phase I is used for classification with Random Forest, which evaluates the selected subset. A comparison with recent approaches in the literature showed an improvement in accuracy and false-positive rate for all attack profiles. In the future, we would like to test our proposed solution on other real datasets covering a broader range of attacks, and we would like to apply classifiers other than Random Forest. Moreover, we would like to assess the impact of dataset balancing on the overall accuracy for the minority classes.