Advanced Feature-Selection-Based Hybrid Ensemble Learning Algorithms for Network Intrusion Detection Systems

As cyber-attacks become increasingly sophisticated, effective Intrusion Detection Systems (IDSs) are needed to monitor computer resources and to provide alerts regarding unusual or suspicious behavior. Although several machine learning (ML) and data mining methods have been used to achieve high effectiveness, these systems have not proven ideal. Current intrusion detection algorithms suffer from high dimensionality, redundancy, meaningless data, and high error, false alarm, and false-negative rates. This paper proposes a novel Ensemble Learning (EL) algorithm-based network IDS model. Efficient feature selection is attained via a hybrid of Correlation Feature Selection coupled with Forest Panelized Attributes (CFS–FPA). Improved intrusion detection is achieved by exploiting AdaBoosting and bagging ensemble learning algorithms to modify four classifiers: Support Vector Machine, Random Forest, Naïve Bayes, and K-Nearest Neighbor. These four enhanced classifiers are applied first as AdaBoosting and then as bagging, and their outputs are aggregated through the voting average technique. To provide better benchmarking, both binary and multi-class classification forms are used to evaluate the model. Applying the model to the CICIDS2017 dataset achieved promising results: 99.7% accuracy, a 0.053 false-negative rate, and a 0.004 false alarm rate. This system will be effective for information technology-based organizations, as it is expected to provide a high level of symmetry between information security and detection of attacks and malicious intrusion.


Introduction
Every day, new types of cyber-attacks are discovered, and their sources are becoming more hazardous. As a result, detecting zero-day attacks is a difficult operation that potentially jeopardizes business continuity [1]. Computer attacks are becoming increasingly complex, which makes it difficult to detect intrusions accurately [2,3]. Network Intrusion Detection Systems (NIDSs) are meant to monitor computer networks for unusual activities that a regular packet filter would miss. Traditional IDSs have a number of flaws, such as the inability to recognize new malicious threats, the need for modification, poor accuracy, and a high rate of false alerts. Therefore, machine learning is used to detect new attacks. However, machine learning encounters many challenges because it increases the computational and time complexity of the task by expanding the search space [4,5]. Numerous studies have been conducted on the use of multiple classifiers instead of single ones and on the principle of ensemble learning techniques to ensure high accuracy and a low false alarm rate [6][7][8]. Ensemble learning can be divided into three categories (i.e., bagging, stacking, and boosting) [9][10][11]. It is a general meta approach to machine learning that combines predictions from multiple models to improve predictive performance. Although

• Reduce the dimensionality of the CICIDS2017 dataset through the proposed coupling of Correlation Feature Selection with Forest Panelized Attributes.
• Find the best machine learning (ensemble method) approach to combine the four modified classifiers (Support Vector Machine, Random Forest, Naïve Bayes, and K-Nearest Neighbor) to ensure the best result of the hybrid ensemble method.
• Conduct a comparative study between CFS-FPA and other feature selection techniques in terms of accuracy, Detection Rate (DR), and False Alarm Rate (FAR). The outcome will be used to generalize the efficiency of the proposed feature selection technique.
• Compare the four classifiers before and after they are modified to work as the AdaBoosting method, and compare the proposed method with other existing approaches.
The remainder of the paper is organized as follows: Section 2 includes a review of similar feature selection techniques and ML-based IDS. Section 3 defines the suggested system, approach, and distinct proposed ML. Section 4 presents the experimental data, discussion, and findings. Finally, Section 5 presents the conclusion and future work.

Related Work
Recently, researchers have focused on developing ML-based IDSs using two well-known datasets: NSL-KDD and CICIDS2017. Zhou et al. [19] proposed an IDS framework based on feature selection and ensemble learning techniques. In the first step, the heuristic algorithm CFS and the Bat Algorithm (BA) are combined for dimensionality reduction. In the second step, an ensemble approach that combines the C4.5 and Random Forest (RF) algorithms is applied. Finally, a voting technique is performed using the NSL-KDD, AWID, and CICIDS2017 datasets. The experimental results of this work reach 84% accuracy in testing and a 0.15 false alarm rate; 96% accuracy and a 94% detection rate with 10 selected features when applied to the NSL-KDD dataset; and 94.5% accuracy and a 92% detection rate for the UNSW-NB15 dataset with 13 features. Jaw and Wang [20] proposed a comprehensive approach for IDS. A wrapper methodology based on a genetic algorithm is adopted for feature selection, and logistic regression is used as the ensemble learning algorithm for network intrusion detection systems. Experimental results show accuracies of 98.99%, 98.73%, and 97.997%, and detection rates of 98.75%, 96.64%, and 98.93% for CICIDS2017, NSL-KDD, and UNSW-NB15, respectively, based on only 11, 8, and 13 selected relevant features from these datasets.
Gupta et al. [21] recommended ensemble algorithms to handle class imbalance in network-based intrusion detection systems. This work consisted of three stages. The first stage is a deep neural network that separates normal from suspicious network traffic. The second stage classifies major attacks using the eXtreme Gradient Boosting algorithm. The final stage uses Random Forest to classify the minor attacks. This model used the NSL-KDD, CIDDS-001, and CICIDS2017 datasets to evaluate the performance of the proposed system. The accuracy achieved was 99% for NSL-KDD, 96% for CIDDS-001, and 92% for CICIDS2017, and the run time was measured in hours, not in minutes.
Tama et al. [22] used a hybrid feature selection method with two stages of ensemble learning classifiers. CIC-IDS2017 dataset with 37 features was used to evaluate the performance of the proposed system, and the accuracy was 96.46%.
The IDS proposed by Aldallal and Alisa [23] merges genetic algorithm (GA) and support vector machine (SVM), where GA is used to select an optimal set of features from the CICIDS2017 dataset, while SVM is applied to classify the network traffic into benign and abnormal. The results obtained by using CICIDS2017 outperform those obtained when using KDD CUP 99 and NSL-KDD by up to 5.74%.
Pelletier and Abualkibash [24] proposed a model to detect network intrusions by applying a Neural Network as the feature selection method and the Random Forest algorithm as the classifier. This model was tested using the CIC-IDS2017 dataset; the accuracy reached 97.30% with 30 features.
Abbas et al. [25] proposed a new ensemble-based intrusion detection system for the Internet of Things. They used different classifiers (i.e., logistic regression, naive Bayes, and a decision tree) with a voting technique. The experiments used CICIDS2017 in two forms (binary and multi-class).
An architectural model is presented in [26] for risk assessment (RA) of information systems with the CICIDS2017 dataset using ML algorithms. ML techniques including K-nearest neighbors (KNN), NB, gradient boosting tree, RF, and decision tree (DT) were evaluated for RA in this study. The performance of the model depended on the ML technique that predicted intrusions most efficiently. The predictive model implemented the ML techniques that produced better results with the CICIDS2017 dataset. For RA, the risk matrix was analyzed by 15 models with predicted results.
All previous works suffer from conflict in the measured values, where some of them are sufficient in the accuracy but not sufficient in other measures, such as [19] where accuracy reached 96% while FAR is 0.15%, [20] where accuracy reached 98.3% while FAR is 0.14%. Hence, the proposed system increases robustness by using an advanced feature selection method based on hybrid ensemble learning algorithms aiming at achieving high accuracy and minimal FAR.

Materials and Methods
The proposed system provides an efficient ML-based IDS that uses new hybrid FS ensemble learning techniques with a voting classifier, that is, a group of classifiers. It is proposed to enhance the detection capabilities of IDS to protect service providers against attacks. Figure 1 depicts the block diagram of the main idea of the proposed Hybrid AdaBoosting and Bagging Algorithms (HABBAs). It consists of several stages, starting from collecting the data and ending with detecting normal or attack traffic. The following subsections explain the framework.

Description of CICIDS2017 Datasets
Finding an appropriate dataset for evaluating IDSs is challenging. This paper uses the CICIDS2017 dataset for its experiments. The Canadian Institute for Cybersecurity (CIC) issued the CICIDS2017 dataset in 2017. It includes benign data and the most recent common attacks [13], together with the results of the CICFlowMeter network traffic analysis. The flows are labeled by time stamp, protocol, source and destination IPs, ports, and attack. This dataset is one of the most up-to-date datasets. It includes updated DDoS, Brute Force, XSS, SQL Injection, Infiltration, Port Scan, and Botnet attacks. The dataset has 2,830,743 records spread across eight files. Each record contains 78 features with their labels. The Wednesday working-hours set is chosen for experimentation, using cross-validation to retain the same order of magnitude of each dataset when multi-class classification is needed. Table 1 shows the statistical information for this set, which contains 691,406 instances divided into six categories.

Data Pre-Processing
The pre-processing stage converts the raw data into an analysis-ready format. This stage consists of three steps: (1) filtration, in which the data is cleaned and duplicate values are removed; (2) transformation, in which Label-Encoder and One-Hot Encoding techniques are applied; and (3) normalization, in which the min-max function is used to scale values between zero and one. The algorithm of this stage is explained in Algorithm 1.
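The three pre-processing steps can be sketched in a few lines of plain Python. This is a minimal illustration and not the paper's Algorithm 1: the function name `preprocess`, the record layout (a list of dicts), and the sample column names and values are assumptions made for the example, and only label encoding of the class column is shown (one-hot encoding of categorical features is omitted).

```python
def preprocess(rows, label_key="Label"):
    """Filter, encode, and min-max scale a list of flow records (dicts)."""
    # (1) Filtration: drop records with missing values, then exact duplicates.
    seen, clean = set(), []
    for row in rows:
        if any(v is None for v in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            clean.append(dict(row))

    # (2) Transformation: label-encode the class column (sorted class order).
    classes = sorted({r[label_key] for r in clean})
    class_to_id = {c: i for i, c in enumerate(classes)}
    for r in clean:
        r[label_key] = class_to_id[r[label_key]]

    # (3) Normalization: min-max scale every feature column into [0, 1].
    features = [k for k in clean[0] if k != label_key]
    for f in features:
        lo = min(r[f] for r in clean)
        hi = max(r[f] for r in clean)
        span = hi - lo
        for r in clean:
            r[f] = (r[f] - lo) / span if span else 0.0
    return clean, class_to_id


# Hypothetical records; the values are made up for illustration.
rows = [
    {"Flow Duration": 10, "Total Fwd Packets": 2, "Label": "BENIGN"},
    {"Flow Duration": 10, "Total Fwd Packets": 2, "Label": "BENIGN"},       # duplicate
    {"Flow Duration": 30, "Total Fwd Packets": None, "Label": "DoS Hulk"},  # missing value
    {"Flow Duration": 50, "Total Fwd Packets": 6, "Label": "DoS Hulk"},
    {"Flow Duration": 30, "Total Fwd Packets": 4, "Label": "BENIGN"},
]
clean, mapping = preprocess(rows)
```

Scaling is done per column so that features with large raw ranges do not dominate distance-based classifiers such as KNN.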

Correlation Feature Selection-Forest Panelized Attribute (CFS-FPA)
This proposed method is explained in detail in [27,28]. It is used to reduce dimensionality and select the best feature subset. Based on this method, the best 30 of the 78 features of the CICIDS2017 dataset are selected. Table 2 lists these 30 features, and Figure 3 depicts the main steps of the proposed FS. In Figure 3, Correlation Feature Selection combined with Forest Panelized Attributes (CFS-FPA) is used to analyze the correlation of the selected features, which improves the efficiency of the training and testing phases.
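To give a feel for the correlation side of this step, the sketch below ranks features by their absolute Pearson correlation with the label and greedily drops features that are nearly collinear with one already kept. It is only a simplified stand-in: the forest-based attribute stage of CFS-FPA is not reproduced, and the function names, the `max_redundancy` threshold, and the toy columns are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, target, k, max_redundancy=0.9):
    """Keep up to k features: relevant to the target, not redundant with each other."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        # Reject a feature that is almost a copy of one already selected.
        if all(abs(pearson(columns[name], columns[other])) < max_redundancy
               for other in kept):
            kept.append(name)
        if len(kept) == k:
            break
    return kept


# Toy data: f_copy is a scaled copy of f_signal, f_const carries no information.
columns = {
    "f_signal": [0.1, 0.2, 0.9, 1.0, 0.15, 0.95],
    "f_copy":   [0.2, 0.4, 1.8, 2.0, 0.3, 1.9],
    "f_weak":   [0.5, 0.1, 0.4, 0.6, 0.2, 0.3],
    "f_const":  [1, 1, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 0, 1]
selected = select_features(columns, target, k=2)
```

With these toy columns, only one of the two collinear features survives, and the constant column is never chosen because its correlation with the label is zero.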

Classifiers
The IDS proposed in this work is based on four classifiers. A brief explanation of these classifiers is presented here.
Sekulić suggested Random Forest (RF) in [29]. It is a decision-tree methodology that works by constructing many decision trees. It can handle hundreds of input variables, ranking them by importance without eliminating any of them. RF is a set of classification trees, each of which casts a single vote for the most common class of the input data. Compared with other machine learning algorithms such as SVM and ANN, RF has fewer parameters. In RF, a set of tree-structured classifiers can be defined as in Equation (1):
$$\{h(x, \Theta_k),\; k = 1, 2, \ldots\} \tag{1}$$
In this model, $h$ denotes an RF classifier and $\{\Theta_k\}$ is a collection of identically distributed random vectors.
Each tree casts a vote for the most popular class at input $x$. Its use affects the size and design of the tree structure. Establishing each decision tree is critical to RF's success.
RF has a low computational cost, and outliers and parameter choices have little impact on it. Furthermore, compared to a single DT, overfitting is less of an issue, and the trees do not need to be pruned [30]. When each of B independent, identically distributed random variables has variance σ², the variance of their average is σ²/B. The average variance is then computed using Equation (2), and if it is more than zero, the weight Wi for each subset feature (XiBest) is updated.
Here, σ is the standard deviation (so σ² is the variance), p is the population, and B is a constant.
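The variance argument behind bagging can be checked numerically. The sketch below is an illustration under an explicit independence assumption: it draws B independent Gaussian values with standard deviation σ, averages them, and compares the empirical variance of that average with σ²/B. Trees in a real forest are correlated, so the reduction seen in practice is smaller. The function name and the chosen values of σ and B are assumptions for the example.

```python
import random


def variance_of_average(B, sigma, trials=20000, seed=0):
    """Empirical variance of the mean of B independent N(0, sigma^2) draws."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, sigma) for _ in range(B)) / B
             for _ in range(trials)]
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / trials


sigma, B = 2.0, 10
empirical = variance_of_average(B, sigma)   # estimated from simulation
theoretical = sigma ** 2 / B                # sigma^2 / B for independent draws
```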

Naïve Bayes Classifier
Naïve Bayes is one of the most widely used classifiers based on statistical classification. It is a supervised ML algorithm that is surprisingly useful and accurate owing to several properties. It is a probabilistic classifier built on a strong independence assumption: for a given class variable, the presence or absence of one feature is unrelated to the presence or absence of any other feature. In a supervised learning setting, Naïve Bayes classifiers can be trained very efficiently, depending on the precise nature of the probability model [31]. It has two kinds of variables: a class variable (C) and a set of attributes $F = \{A_1, A_2, \ldots, A_n\}$, on a dataset D which consists of instances $\{I_1, I_2, \ldots, I_t\}$. The classifier can be defined as in Equation (3), assuming that the attributes are independent within the class as in Equation (4). Figure 4 demonstrates the structure of Naïve Bayes [32], where C is the class variable and A1, A2, . . . are the attributes.
$$c(E) = \arg\max_{c \in C} P(c)\, P(a_1, a_2, \ldots, a_n \mid c) \tag{3}$$
$$P(E \mid c) = P(a_1, a_2, \ldots, a_n \mid c) = \prod_{i=1}^{n} P(a_i \mid c) \tag{4}$$
Here, $E = (a_1, a_2, \ldots, a_n)$ is the instance to be classified. The conditional independence assumption leads to posterior probabilities. The NB classifier is constructed easily because of the simplicity of computing $P(c)$ and $P(a_i \mid c)$. It simplifies computations and provides high accuracy and speed when applied to large databases.
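The Naïve Bayes decision rule, the class that maximizes P(c) multiplied by the product of the per-attribute likelihoods P(a_i | c), can be sketched with simple counting. This is an illustrative toy and not the paper's modified NB: the function names, the categorical attributes, and the use of Laplace (add-one) smoothing are assumptions added for the example.

```python
from collections import Counter, defaultdict


def train_nb(instances, labels):
    """Collect the counts needed for P(c) and P(a_i | c)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    domains = defaultdict(set)            # attribute index -> observed values
    for x, c in zip(instances, labels):
        for i, a in enumerate(x):
            value_counts[(c, i)][a] += 1
            domains[i].add(a)
    return class_counts, value_counts, domains


def predict_nb(model, x):
    """Return argmax_c P(c) * prod_i P(a_i | c), with add-one smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                                   # prior P(c)
        for i, a in enumerate(x):                             # likelihoods P(a_i | c)
            score *= (value_counts[(c, i)][a] + 1) / (n_c + len(domains[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical categorical flows: (protocol, packet rate).
instances = [("tcp", "high"), ("tcp", "high"), ("udp", "low"),
             ("tcp", "low"), ("udp", "low"), ("udp", "high")]
labels = ["attack", "attack", "normal", "normal", "normal", "attack"]
model = train_nb(instances, labels)
pred_a = predict_nb(model, ("tcp", "high"))
pred_b = predict_nb(model, ("udp", "low"))
```

For real-valued features such as those in CICIDS2017, a Gaussian likelihood per attribute would replace the counting step.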

Support Vector Machine (SVM)
The single-class SVM, a statistical classifier, was suggested by [33,34]. It can estimate the support of a high-dimensional distribution. It uses relaxation parameters to isolate the test point of a class from the rest of the dataset after first processing the features with a kernel. Iterative relaxation parameter methods are used to solve massive sparse linear systems. It is also used to solve problems involving linear least-squares and nonlinear equations.
The classifier maps instances into a high-dimensional attribute space (via a kernel) and finds the best boundary hyperplane to separate the training data according to the following formula [31]:
$$f(x) = w^{T} x + b \tag{5}$$
where w refers to the normal vector and b refers to a bias term.
By optimizing the rule f, SVM adjusts the hyperplane to find a linear classifier. A label can be assigned to a test example x using this classification rule. If f(x) is less than zero, x is classified as an intrusion; otherwise, it is classified as normal. As presented in Figure 5, the classification is determined by the sign of f(x): positive is classified as normal, and negative is classified as an intrusion.
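The sign rule for the SVM decision function can be illustrated directly. In the sketch below, the weight vector `w` and bias `b` are hypothetical values chosen by hand rather than learned from data, and the function names are assumptions for the example; only the mapping from the sign of f(x) to a label follows the text.

```python
def decision_function(w, x, b):
    """Linear decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    # Sign convention from the text: negative means intrusion, otherwise normal.
    return "intrusion" if decision_function(w, x, b) < 0 else "normal"


w, b = [1.0, -2.0], 0.5          # hypothetical hyperplane parameters
f_pos = decision_function(w, [2.0, 0.5], b)
f_neg = decision_function(w, [0.0, 1.0], b)
label_pos = classify(w, [2.0, 0.5], b)
label_neg = classify(w, [0.0, 1.0], b)
```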

K-Nearest Neighbor (KNN)
According to the distance function, the nearest neighbor classifier (NNC) assigns to the given test pattern the class of its nearest neighbor in the training set. The k-nearest neighbor classifier (k-NNC) is a generalization of NNC, where k is an integer and k ≥ 1. The k nearest neighbors of the given test pattern Y are found in the training set, and the class information of each of these k neighbors is retained. According to experiments, NNC using bootstrap samples outperforms traditional k-NNC in most circumstances. It is worth noting that there is no theoretical explanation for why k-NNC on the bootstrapped dataset performs better [35]. Figure 6 depicts KNN.
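A minimal k-NNC sketch follows, assuming Euclidean distance and a simple majority vote among the k closest training patterns; with k = 1 it reduces to NNC. The function name and the toy two-dimensional points are assumptions for illustration, and the bootstrap variant mentioned above is not shown.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Majority class among the k training points closest to the query."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
near_origin = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
near_far = knn_predict(train_X, train_y, (5.5, 5.5), k=3)
single = knn_predict(train_X, train_y, (0.2, 0.1), k=1)   # plain NNC
```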

Hybrid Classifier Algorithms
Hybrid ensemble learning algorithms are built during this stage. At first, four different classifiers (i.e., SVM, RF, NB, and KNN) are used to facilitate sequential operation. The weight is updated for effective performance using the principle of AdaBoosting. These classifiers are modified to work as AdaBoosting, running sequentially after the weight is modified, in order to achieve a high weight with less variance and bias and to produce better results when aggregated with the other modified classifiers through the voting technique. Overfitting is avoided by reducing the tree depth and the number of variables sampled at each split, and by using different datasets. Therefore, these algorithms are modified in this manner to obtain better results and performance with a minimal error rate. Figures 7-10 demonstrate the block diagrams for RF, SVM, NB, and KNN, respectively. Algorithm 2 depicts the proposed Hybrid AdaBoosting and Bagging Algorithms (HABBAs).
In Figure 7, the RF classifier considers the best subset features (XiBEST) after applying pre-processing and CFS-FPA [28]. Thereafter, it initializes the weight (Wi) and creates the subset forest by using Equation (1).
Figure 8 depicts the block diagram of the SVM classifier. It starts with an initialization step that sets the weight Wi to zero and begins splitting the training dataset using a hyperplane. Next, it uses Equation (5), explained earlier, to compute the function that utilizes Wi, the bias, and the vector of training data. The modified SVM overcomes two main drawbacks of classical SVM: it is not suitable for large databases, and it does not perform very well when the dataset is noisy. Because the support vector classifier works by placing data points above and below the classifying hyperplane, there is no probabilistic explanation for the classification.
Figure 9 depicts the process of computing the probability of each subset feature (XiBEST) and finding the maximum values to update the weight of each XiBEST.
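One possible reading of the boost-then-bag-then-vote pipeline can be sketched with scikit-learn. This is not the paper's Algorithm 2: the nesting of AdaBoost inside bagging, the synthetic data standing in for the 30 selected features, the helper name `boosted_then_bagged`, and every hyperparameter are assumptions. Note also that scikit-learn's AdaBoost needs base estimators that accept sample weights, which `KNeighborsClassifier` does not, so KNN is only bagged here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a 30-feature binary intrusion dataset (assumption).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           random_state=0)
# 70% training / 30% testing, as in the experimental setup.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def boosted_then_bagged(base, boost=True):
    # AdaBoost reweights samples sequentially; bagging then averages
    # several bootstrap replicas of the (boosted) model.
    model = AdaBoostClassifier(base, n_estimators=5, random_state=0) if boost else base
    return BaggingClassifier(model, n_estimators=3, random_state=0)


members = [
    # C is raised because AdaBoost normalizes sample weights to sum to 1,
    # and SVC rescales C per sample by those weights.
    ("svm", boosted_then_bagged(SVC(kernel="linear", C=100.0, probability=True,
                                    random_state=0))),
    ("rf", boosted_then_bagged(RandomForestClassifier(n_estimators=10, max_depth=5,
                                                      random_state=0))),
    ("nb", boosted_then_bagged(GaussianNB())),
    ("knn", boosted_then_bagged(KNeighborsClassifier(n_neighbors=5), boost=False)),
]

# Soft voting averages the predicted class probabilities of the four members.
ensemble = VotingClassifier(estimators=members, voting="soft")
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```

Soft voting is used here as one interpretation of the "voting average" aggregation; hard (majority) voting would be the alternative.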

Implementation
This paper aims at building IDSs with better reliability, high accuracy, low false alarm rates, and low false-negative rates. CFS-FPA is a proposed method that combines CFS and FPA. It first computes the correlation between the features and the target, then distributes the features into subsets using Random Forest, and finally uses a panelized attribute to select only the features that affect the final results [28]. It enables selecting the best set of features, removing unnecessary ones and increasing classification performance when HABBAs are used to detect intrusion. The proposed system is implemented using the CICIDS2017 dataset and tested in binary and multi-class forms using the confusion matrix. It is executed on a laptop with a 10th-generation Core i7 processor and 16 GB of RAM, running Windows 11 and the Colab platform. Several scikit-learn (Sklearn) packages for Python 3.8 are utilized in this model, such as cross_val_score from model_selection and VotingClassifier from ensemble.

Experimental Results and Discussion
To evaluate the proposed Modified Ensemble Learning Algorithms, 70% of the dataset is used for training and 30% for testing. Testing is done in two stages: feature selection and ensemble algorithms. Table 3 reports the accuracy obtained with thirteen different numbers of selected features. The best accuracy, 99%, is achieved when the number of features is 30; hence, this setting is adopted for the remaining experiments.

Binary and Multi-Class Confusion Matrix
The experiment is carried out at this stage. HABBAs are applied to the CICIDS2017 dataset, and a confusion matrix is computed for each class, which contains both normal and abnormal traffic, with three feature sets (i.e., 13, 30, and all features). After the proposed CFS-FPA method and hybrid AdaBoosting are applied, a bagging ensemble algorithm is used to detect intrusion.
The confusion matrix is presented in binary and multi-class forms. At first, the proposed model is applied to each class of CICIDS2017. Tables 4-6 show the predicted results for the CICIDS2017 dataset as a binary class with the three feature sets (i.e., 13, 30, and 78). Each of these tables gives the distribution of four states, True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN), which are used to calculate the evaluation measures. The tables also give the numbers of attack and normal instances in each class; the best distribution is obtained with 30 features.
Table 7 shows the accuracy and FNR derived from these tables. Based on the binary confusion matrix, the highest accuracy of 99% and the lowest FNR of 0.0008 are obtained with the 30 features chosen by the proposed CFS-FPA method. This outperforms the 13-feature set, where the accuracy is 87% and the FNR is 0.123, and the full 78-feature set, where the accuracy is 92% and the FNR is 0.053. Table 8 presents the multi-class confusion matrix of CICIDS2017 with 30 features, and Table 9 reports Precision, Recall, and F-score for the same feature set. Table 9 shows that the best results for all classes are achieved when selecting the 30 features, reaching 100% for Bot and Brute Force, which indicates that this number of features is optimal and useful for detecting all types of attacks.
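The evaluation measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, treating the attack class as positive; the function name and the example counts are hypothetical and are not taken from Tables 4-9.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures derived from a binary confusion matrix."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called detection rate (DR)
    return {
        "accuracy": (tp + tn) / total,
        "fnr": fn / (fn + tp),           # false-negative rate = 1 - recall
        "far": fp / (fp + tn),           # false alarm rate (false-positive rate)
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Reporting FNR and FAR alongside accuracy matters for intrusion detection because a model can be highly accurate on imbalanced traffic while still missing attacks or raising many false alarms.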

Time Complexity
The time complexity of the proposed algorithm in Big O notation is O(N²) [28]. This means the run-time grows polynomially (quadratically) as the input size increases. The run-time when applied to the CICIDS2017 dataset is presented in Figure 11. It shows that the highest runtime is 11.5 s for the DDoS_ston class, while the shortest runtime is 1.1 s for the brute force class.

Analysis of Results
To demonstrate the effectiveness of the proposed HABBAs, a comparison study is conducted with similar work. Table 10 reveals that the proposed HABBAs outperform all the algorithms selected for this evaluation. For example, the work in [30] was applied to the same CICIDS2017 dataset and examined performance using 10 and 13 features. Its best accuracy, 98.4%, was obtained with 10 features, compared with 30 features in this work. The FAR with 13 features is lower than that with 10 features but still higher than that of our proposed model. In contrast, the HABBAs achieved 99.7% accuracy, an improvement of 1.62%, with a FAR as low as 0.004. Reference [31] applied voting on four ML techniques using three sets of features (8, 11, and 13). These techniques are K-means, SVM, DBSCAN, and Expectation-Maximization. The best average accuracy achieved was 98%, which is lower than that of our model by 1.73%. In the same manner, the best average detection rate is higher than the detection rate of the proposed model. Reference [32] tried three values for the number of features (38, 41, and 78); the best accuracy, 99%, was achieved with 41 features and is slightly lower than that of our model. That study does not report the detection rate or FAR. The accuracy of the work in [33] is lower than that of the proposed model by 3.4%. The FAR of [32] is much higher than that of the proposed model, although its accuracy is 98.5%. These results reveal a real need for a system that combines high accuracy with low FAR, which the proposed model achieves. Finally, the works of [32] and [33] do not consider the number of features. However, the accuracy of [33] is 97.5% on the same dataset, which is lower than that of our model by 2.26%. From the above discussion, the proposed HABBAs model outperforms all the selected algorithms.
This is due, on the one hand, to the effective feature selection algorithm, which obtained the best combination of the most important features influencing the accuracy of network traffic classification. On the other hand, the proposed voting model of the modified ML algorithms (Support Vector Machine, Random Forest, Naïve Bayes, and K-Nearest Neighbor) demonstrates its ability to accurately classify network traffic as benign or malicious.
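The voting step can be pictured with a minimal sketch of average (soft) voting, in which each base classifier outputs class probabilities and the ensemble picks the class with the highest mean probability. This is an assumption about how the averaging works, not the exact HABBAs aggregation, which this section does not spell out. The function name `average_vote` and the probability values are hypothetical.

```python
def average_vote(probabilities, labels):
    """Average per-class probabilities across classifiers.

    probabilities: one list of class probabilities per classifier,
    ordered like `labels`. Returns the winning label and the averages.
    """
    n_models = len(probabilities)
    averages = [
        sum(model[k] for model in probabilities) / n_models
        for k in range(len(labels))
    ]
    best = max(range(len(labels)), key=lambda k: averages[k])
    return labels[best], averages


labels = ["benign", "attack"]

# Hypothetical outputs of the four base classifiers for a single flow.
flow_probabilities = [
    [0.40, 0.60],  # Support Vector Machine
    [0.20, 0.80],  # Random Forest
    [0.55, 0.45],  # Naive Bayes
    [0.30, 0.70],  # K-Nearest Neighbor
]
predicted, averages = average_vote(flow_probabilities, labels)
```

Here one classifier leans toward "benign", but the averaged probability still favors "attack", which shows how averaging can smooth out the error of a single weak member.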

Conclusions
Despite previous attempts to increase the efficacy of IDSs through various ML methods, existing IDSs are still ineffective by some measures. Using hybrid techniques based on the desired feature selection (FS), we proposed a novel IDS approach for dealing with unbalanced and high-dimensional traffic with low DR. A hybrid CFS-FPA strategy with a 30-feature subset, combined with a hybrid ensemble learning method, is proposed to attain the best subset in terms of feature correlation. Removing non-essential features and selecting only influential features through the proposed CFS-FPA method, which combines correlation feature selection and forest penalized attributes, enabled the proposed system to handle the conflict among measures that previous work suffers from (FAR, FNR, DR, and accuracy); hence, the accuracy in the testing phase improved to 87% with an FNR of 0.123. Using the CICIDS2017 dataset, the suggested model's final experimental results showed an accuracy of 99.73%, an F-measure of 99.71%, a precision of 99.82%, a DR of 99.8%, and a FAR of 0.004. Furthermore, the suggested technique outperforms the currently available classification algorithms as well as the previously proposed CFS-FPA-ensemble method. Comparisons with other strategies reveal that this methodology can give a considerable competitive advantage in the IDS industry. It therefore provides high reliability and greater robustness in classifying benign traffic and identifying intrusions.
Despite CFS-FPA's advantage with ensemble techniques (HABBAs), additional work is needed to enhance its capabilities to tackle infrequent traffic problems. In the future, we are interested in testing the performance of our proposed model on other datasets such as CICIDS2018 which includes more recent types of attacks. Other types of machine learning techniques could be considered for further enhancement that considers both memory utilization and time complexity to make it more efficient for real-time detection of intrusions and attacks.