CBA-CLSVE: A Class-Level Soft-Voting Ensemble Based on the Chaos Bat Algorithm for Intrusion Detection

Abstract: Various machine-learning methods have been applied to anomaly intrusion detection. However, the Intrusion Detection System still faces challenges in improving Detection Rate and reducing False Positive Rate. In this paper, a Class-Level Soft-Voting Ensemble (CLSVE) scheme based on the Chaos Bat Algorithm (CBA), called CBA-CLSVE, is proposed for intrusion detection. The Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Decision Tree (DT) are selected as the base learners of the ensemble. The Chaos Bat Algorithm is used to generate class-level weights to create the weighted voting ensemble. A weighted fitness function considering the tradeoff between maximizing Detection Rate and minimizing False Positive Rate is proposed. In the experiments, the NSL-KDD, UNSW-NB15 and CICIDS2017 datasets are used to verify the scheme. The experimental results show that the class-level weights generated by the CBA can be used to improve the combinative performance. They also show that the same ensemble performance can be achieved using about half the total number of features or fewer.


Introduction
China's Cyber Security Law, which has been in force for more than five years, will shortly be revised for the first time; the revision is expected to raise the maximum penalty to 5% of annual revenue. Cyber security is therefore receiving more and more attention. How to identify various attacks quickly and in real time, especially unforeseen attacks, is an unavoidable problem at present. An Intrusion Detection System (IDS) can identify existing or ongoing intrusions and has become an important research object in the field of information security.
IDS techniques are mainly composed of misuse detection and anomaly detection [1][2][3][4]. Misuse detection, also called signature-based detection, focuses on detecting attacks using signatures. Generally, misuse detection systems rely on a set of signature databases that must be updated in time to detect the latest attack types. The advantages of misuse detection are that the False Positive Rate is low and that the detailed attack types and possible causes can be obtained. However, misuse detection lacks the ability to detect unknown attacks, has a high False Negative Rate and needs to maintain a huge signature database. Anomaly detection needs to define a normal profile; once the network behavior deviates from the normal behavior, it is considered that an attack has occurred. Anomaly detection has strong generalization ability and can identify unknown attacks. Its disadvantages are that it has a high False Positive Rate and cannot provide the possible causes of anomalies. Although IDSs have been developed for decades, they still face many challenges, such as a low Detection Rate and high False Negative and False Positive Rates [1][2][3][4].
Machine-learning methods have been widely used to improve the performance of IDSs. However, each machine-learning method has its advantages and disadvantages. The No Free Lunch (NFL) theorem points out that different machine-learning algorithms have their own application scenarios and there is no optimal and universal algorithm [5]. Ensemble learning, sometimes called a multiple classifier system or committee-based learning, completes the learning task by constructing and combining multiple learners. Ensemble methods are widely used to solve various problems because of their advantages in accuracy, stability and generalization [6,7]. According to whether the types of the generated base learners are the same, ensemble methods can be divided into homogeneous methods and heterogeneous methods. In a homogeneous method, the same learning algorithm is used to generate the base learners. The Random Forest (RF) method, a well-known ensemble learning method, belongs to this type: all its base learners are Decision Trees. In a heterogeneous method, different techniques with different performance characteristics are used to form the base learners.
Ensemble learning mainly involves generation, system topology and combination. In the first stage, a large number of base learners to be ensembled are trained. There are six main ways to generate accurate and diverse base learners: different initializations, different parameters, different architectures, different classifier models, different training sets and different feature sets [8]. In the second stage, the base learners generated above are organized in a parallel or serial manner. In the parallel structure, the classifiers are independent. In the serial topology, the base learners are applied sequentially: the lead base learner in the sequence makes a decision first; if it fails to decide, the task is handed over to the secondary learner, and so on. Most of the ensemble models reported in the literature adopt the parallel topology. In the last stage, the results of the base learners are combined to make the final decision.
Some methods use a function to combine the outputs of all base learners. The combining method includes the weighting method, probabilistic methods, evidential reasoning-based approaches and meta-learning methods [9]. The ideal combination method should be able to use the advantages of the base learners and minimize their disadvantages. There are three output types of the base learner, including crisp labels, class rankings and soft outputs. The three types of output carry different amounts of information. Among them, the soft output contains the highest amount of information, while the crisp labeling method has the least amount of information. Liu et al. [10] believe that the combination methods can also be classified into three groups. The crisp labels can be combined by voting approaches while the class rankings can be combined by class set reduction/reordering approaches. The Bayesian rule, fuzzy integrals and evidential reasoning can be applied to merge the soft outputs [10]. However, with the increase of the data complexity, some flexible methods are needed as the combination methods of the ensemble, which can be adjusted according to the attributes of the dataset used. Some scholars indicate that the efficiency of combinatorial methods can be improved by assigning weights to the classifiers [11]. Sava et al. [12] point out that it is necessary to assign weights to different base learners according to their performance in the heterogeneous learning. Cao et al. [13] provide two weight-optimization methods under the framework of the Class-Level Soft-Voting Ensemble. They suggest that the class-specific soft-voting method can refine the weights from classifiers to classes and improve the combinative performance.
In this paper, the Class-Level Soft-Voting Ensemble (CLSVE) is used for intrusion detection. Through literature research and extensive empirical analysis, the Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Decision Tree (DT) are selected as the base learners [14][15][16][17]. On the other hand, the Chaos Bat Algorithm with Opposition-Based Learning (OBL) is used for the weight optimization of the ensemble. The Bat Algorithm (BA) has better performance than the Genetic Algorithm (GA) and Particle Swarm Optimization Algorithm (PSO) [18]. However, it still has the disadvantages of the declining population diversity and local convergence in the searching processes [19] [20]. Note that the combination of chaos and metaheuristic algorithms is a promising research field [21]. Due to its dynamic characteristics, chaotic mapping has been widely recognized in the field of optimization, which makes the Optimization Algorithm explore the search space more actively and globally. Moreover, the OBL strategy [22], which aims to boost the efficiency of the Bat Algorithm, is also used for the initialization of the population. Therefore, a new soft-voting scheme based on the Chaos Bat Algorithm with Opposition-Based Learning is used to combine the base learners of the ensemble.
In short, the main contributions of this paper are as follows: (1) An improved ensemble framework named CBA-CLSVE is proposed. The Class-Level Soft-Voting Ensemble (CLSVE) is selected for intrusion detection. (2) The Chaos Bat Algorithm (CBA) with the Opposition-Based Learning method is used to generate class-level weights to create the weighted voting ensemble. (3) The soft-voting schemes are compared with the hard-voting methods based on the same learners. We also compare the performance of different voting ensemble methods with or without feature selection. It is verified that the Class-Level Soft-Voting method based on the CBA can be used to improve the ensemble performance. It also shows that the same performance of the ensemble can be obtained with half the total number of features or fewer.
The organization of the paper is as follows. The related work is provided in Section 2. Section 3 presents the background including the hard-voting and soft-voting method, the base learners and the basic Bat Algorithm. The proposed CBA-CLSVE framework for intrusion detection is described in Section 4. The experimental results are introduced in Section 5. The paper is concluded in Section 6.

Related Work
A large number of machine-learning algorithms, including shallow or deep learning, are applied to intrusion detection. Gu et al. [23] proposed a hybrid model containing SVM and Naïve Bayes (NB). Naïve Bayes was used to generate high-quality data and then sent to the SVM classifier. Liu et al. [24] applied the KNN and the Arithmetic Optimization Algorithm (AOA) to the intrusion detection. The AOA was used to optimize the relevant parameters of KNN. Kan et al. [25] designed an intrusion-detection model based on the Convolutional Neural Network (CNN) and PSO for the Internet of Things (IoT). Sahu et al. [26] introduced an attack-detection mechanism combining CNN and LSTM to detect infected devices in the Internet of Things.
There are also many intrusion-detection mechanisms based on ensemble learning. Amini et al. [27] proposed a new ensemble method using the Radial Basis Function (RBF) neural networks. To increase the diversity of classifiers, classifiers were trained using different subsets that were divided by the Fuzzy C-Means (FCM) method. The membership grade generated by the fuzzy clustering technology was used as the classifier weight to combine multiple classifiers. Experimental results showed that the proposed method has a better detection effect for small classes of samples compared with simple voting and weighted majority voting. Gu et al. [28] proposed an intrusion-detection framework based on an SVM ensemble. The authors used the ratio transformations technique to improve the quality of training data. The FCM method was also used to enhance the diversity of SVM classifiers. Finally, the authors used a nonlinear combination method to aggregate these SVM classifiers. Yang et al. [29] proposed a Gradient Boosting Decision Tree (GBDT) parallel quadratic ensemble learning method for intrusion detection. Firstly, the traditional ensemble learning method GBDT was used to make the prediction based on the spatial feature, and the temporal method Bidirectional Gated Recurrent Unit (Bi-GRU) was employed to capture the temporal information. Then, the GBDT model and Bi-GRU model were combined to form a quadratic ensemble. Euh et al. [30] introduced tree-based ensemble learning models such as AdaBoost, XGBoost, RF, extra trees and rotation trees for malware detection. Gao et al. [31] introduced an adaptive ensemble method for intrusion detection. The base classifiers, including DT, RF, KNN, and deep neural network, were combined using the voting method with class weights. The weights can be obtained by calculating the training accuracy of each algorithm for different attack types.
Ensemble learning, especially the weighted voting scheme, can also be applied to other fields, such as bankruptcy prediction, text sentiment classification, Named Entity Recognition (NER), etc. Zelenkov et al. [32] proposed an ensemble method for bankruptcy prediction in which a GA was used to select relevant features and classifier weights. In the ensemble, widely used models, including Logistic Regression (LR), KNN, SVM, NB, and DT, were employed. Zelenkov et al. [33] proposed an ensemble model for bankruptcy prediction in which different classifiers were combined by the soft-voting rule. The ensemble model can minimize the False Positive Rate (FPR) and False Negative Rate (FNR) simultaneously. Onan et al. [34] proposed a static classifier-selection ensemble for text sentiment classification in which five different classifiers were combined. A multi-objective Differential Evolution Algorithm was used to generate class-level weights. Saleena et al. [35] introduced a weighted ensemble approach for tweet sentiment analysis in which NB, RF, SVM and LR were used as the basic classifiers. The weight of each classifier is obtained from its training accuracy. Saha et al. [36] proposed a classifier ensemble scheme using weighted voting based on a GA for NER. The experiments showed that instead of completely eliminating some classifiers, it is better to quantify the voting amount for each class in each classifier. Ekbal et al. [37] designed a classifier ensemble technique based on Maximum Entropy (ME), Conditional Random Field (CRF) and SVM. The multi-objective simulated annealing technique was used to determine the appropriate votes for each class per classifier in NER.
In summary, in most weighted voting ensembles, the classifier-level weight is usually used instead of class-level weight. The weight can be obtained according to the training result or using a metaheuristics method such as the Genetic Algorithm. To our knowledge, the method of using an improved Bat Algorithm to generate the class-level weight has not been reported. This paper explains the research conducted around this technique.

The Hard-Voting and Soft-Voting Method
The voting method is the most widely used and important classifier-fusion method. Suppose that the combined classifier consists of $T$ single classifiers $h_i$ ($i = 1, 2, \ldots, T$). The learner $h_i$ predicts the sample label from the label set $\{c_1, c_2, \ldots, c_N\}$. The fusion method combines the decision results of the $T$ classifiers and outputs the final label. Suppose that the output of classifier $h_i$ for an input sample $x$ is an $N$-dimensional vector $(h_i^1(x), h_i^2(x), \ldots, h_i^N(x))$, where $h_i^j(x)$ represents the output of classifier $h_i$ for the $j$th class. The value of $h_i^j(x)$ can be of different types, such as a binary label or a probability.
The voting method using the binary label is also called hard voting. The binary labels can be combined through the majority voting or the weighted voting method [13].

•
The majority voting method can be described as follows:

$$H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} h_i^j(x)}$$

The majority voting method is suitable when all base learners have the same performance. However, in practical applications, the performance of the base learners differs, which requires different weights. Therefore, weighted voting came into being:

$$H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} w_i h_i^j(x)}$$

where $w_i$ indicates the weight of classifier $h_i$, and a larger weight means stronger classification performance.
If the output of a classifier is the class posterior probability, the base classifiers can be combined by the soft-voting method. There are two appropriate types of weight coefficients in the soft-voting method, including the classifier level and the class level [13].

•
The classifier-level weight represents the weight for each base classifier. The final output can be described as follows:

$$H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} w_i h_i^j(x)}$$

where $h_i^j(x)$ is the posterior probability of class $j$ output by classifier $h_i$.

•
The class-level weight represents the weight of each base classifier for each output class. This method recognizes that a classifier has different prediction performance on different output classes. The final output can be described as follows:

$$H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} w_i^j h_i^j(x)}$$

where $w_i^j$ represents the weight of class $j$ for classifier $h_i$.
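The class-level combination rule above can be sketched in a few lines, assuming each base classifier outputs class probabilities. The variable names and all numeric values here are illustrative, not taken from the paper:

```python
import numpy as np

# probs[i, j]: probability of class j output by classifier i (M classifiers, N classes)
# weights[i, j]: class-level weight w_i^j of classifier i for class j
def class_level_soft_vote(probs, weights):
    # Scale each classifier's vote for class j by its own weight for class j,
    # then sum the weighted probabilities over classifiers and pick the argmax.
    scores = (weights * probs).sum(axis=0)
    return int(np.argmax(scores))

probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7]])
weights = np.array([[1.0, 0.5, 0.2],
                    [0.4, 1.0, 0.3],
                    [0.3, 0.6, 1.0]])
print(class_level_soft_vote(probs, weights))  # -> 2 (the third class wins)
```

Setting all weights equal recovers the no-weight soft vote, and making each row constant recovers the classifier-level scheme, which is why the class-level form is the most general of the three.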

Support Vector Machine
The SVM has good generalization performance and computational efficiency, based on the principle of structural risk minimization [38]. The training samples can be expressed as $\{(x_1, y_1), (x_2, y_2), \ldots, (x_Q, y_Q)\}$, $x_i \in \mathbb{R}^D$, $y_i \in \{-1, +1\}$. The SVM separates the two classes of samples by selecting an optimized hyperplane that maximizes the margin between the two classes, which can be described as the following optimization problem:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{Q} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0$$

where $w$ is the vector perpendicular to the hyperplane, $C$ is the penalty factor and $\xi_i$ is a slack variable.

K-Nearest Neighbor
The KNN is a simple and effective supervised classification technique that can directly solve multi-classification problems [24]. The KNN assumes that birds of a feather flock together: the class of a test sample is consistent with that of its nearest neighbors. The Euclidean distance is often used to evaluate the distance between samples, expressed as:

$$d(x, z) = \sqrt{\sum_{k=1}^{D} (x_k - z_k)^2}$$

Decision Tree
The DT models include many techniques, such as ID3, C4.5, CART, etc. Their differences lie in the standards used to evaluate features, such as information entropy, information gain, information gain rate, Gini coefficient, etc. [39]. The CART tree takes the Gini coefficient as its index. Assuming that there are $n$ classes and the probability that a sample belongs to class $i$ is $p_i$, the Gini coefficient is defined as:

$$Gini(p) = \sum_{i=1}^{n} p_i (1 - p_i) = 1 - \sum_{i=1}^{n} p_i^2$$
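The Gini computation above can be sketched directly from a list of sample labels; this is a minimal illustration, not the paper's implementation:

```python
from collections import Counter

# Gini index of a node: 1 - sum over classes of p_i^2,
# where p_i is the empirical class frequency at the node.
def gini(labels):
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini([0, 0, 1, 1]))  # maximally impure two-class node -> 0.5
print(gini([0, 0, 0, 0]))  # pure node -> 0.0
```

CART chooses the split that minimizes the weighted Gini of the resulting child nodes, so lower values mean purer nodes.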

The Basic Bat Algorithm
The BA is a Swarm Intelligence Optimization Algorithm that simulates the foraging behavior of bats [18]. Due to its strong robustness, simple parameter settings and good global search capability, it has been applied to various optimization problems.
Suppose that bats use echolocation to sense the distance between themselves and the target and can distinguish the target from background obstacles. The $i$th bat flies at spatial position $X_i$ with speed $V_i$, and searches for the target with a varying frequency $f$ and loudness $A$. The bat adjusts the pulse emission rate $r$ according to the distance between itself and its prey. The frequency $f$, speed $V$ and position $X$ of the bats are updated according to the following formulas:

$$f_i = f_{min} + (f_{max} - f_{min})\beta$$

$$V_i^t = V_i^{t-1} + (X_i^{t-1} - X^*) f_i$$

$$X_i^t = X_i^{t-1} + V_i^t$$

where $\beta \in [0, 1]$ is a uniform random number, $f_{min}$ and $f_{max}$ denote the minimum and maximum frequency, respectively, $V_i^t$ and $X_i^t$ represent the speed and position of the $i$th bat at time $t$, and $X^*$ is the global optimal position.
The loudness $A$ and pulse emission rate $r$ change as follows:

$$A_i^{t+1} = \alpha A_i^t, \quad r_i^{t+1} = r_i^0 \left[1 - \exp(-\gamma t)\right]$$

where $\alpha$ and $\gamma$ are specified coefficients.
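One update step of the basic BA can be sketched as follows. The values A = 0.5, fmin = 0 and fmax = 2 follow the paper's Table 2, while alpha and gamma are illustrative assumptions, since the paper only names them as specified coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

f_min, f_max = 0.0, 2.0   # frequency bounds (Table 2 of the paper)
alpha, gamma = 0.9, 0.9   # loudness / pulse-rate coefficients (assumed values)

def update_bat(X, V, X_best):
    f = f_min + (f_max - f_min) * rng.random()  # random frequency in [f_min, f_max)
    V_new = V + (X - X_best) * f                # velocity pulled toward the global best
    X_new = X + V_new                           # position update
    return X_new, V_new

def update_loudness_rate(A, r0, t):
    A_new = alpha * A                           # loudness decays over iterations
    r_new = r0 * (1.0 - np.exp(-gamma * t))     # pulse emission rate grows toward r0
    return A_new, r_new
```

In the paper's CBA variant, the pulse-rate schedule above is replaced by a chaotic sequence, as described in the next subsection.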

The Chaos Bat Algorithm with Opposition-Based Learning (CBA)
To improve the global convergence of the population, a Chaos Bat Algorithm based on the sinusoidal chaotic map is proposed. Chaos is an aperiodic phenomenon characteristic of nonlinear systems [40]. Here, the sinusoidal map is used, described as follows [41]:

$$x_{k+1} = a x_k^2 \sin(\pi x_k) \quad (12)$$

where $x_k$ is a chaotic variable and $k$ represents the iteration number. When $a = 2.3$ and $x_0 = 0.7$, it can be expressed in the simplified form

$$x_{k+1} = \sin(\pi x_k)$$

Since the chaotic variables and the random variables they replace all lie between 0 and 1, there is no need to conduct interval mapping in this paper.
Chaos initialization: For metaheuristic Optimization Algorithms, the initial solutions of the population are often generated randomly. However, pseudo-random numbers, which are computer-generated, are not truly random, and the quality of the initialization directly affects the convergence speed and accuracy of the algorithm. In this paper, chaos initialization and the OBL strategy are combined to improve the quality of the initial solutions. The sinusoidal map is used for the initialization of the population.
The opposite value $\bar{x}$ for a real value $x \in [l, u]$ can be calculated as follows:

$$\bar{x} = u + l - x \quad (14)$$

where $\bar{x}$ indicates the opposite position of the bat's actual position.
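The chaos plus opposition-based initialization can be sketched as follows. The per-dimension chaotic seeds, bounds and population size are our assumptions for illustration; the paper itself starts the simplified sinusoidal map from x0 = 0.7:

```python
import numpy as np

rng = np.random.default_rng(1)

def sinusoidal_map(x):
    # Simplified sinusoidal map (the form used when a = 2.3 and x0 = 0.7).
    return np.sin(np.pi * x)

def init_population(pop_size, dim, lower, upper):
    x = rng.uniform(0.05, 0.95, dim)       # chaotic seeds per dimension (assumed)
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = sinusoidal_map(x)              # iterate the chaotic sequence in (0, 1]
        pop[i] = lower + x * (upper - lower)
    opposite = lower + upper - pop         # opposition-based counterparts (Eq. 14)
    return pop, opposite

pop, opp = init_population(5, 4, 0.0, 1.0)
```

A natural follow-up, per the OBL strategy, is to evaluate the fitness of both `pop` and `opp` and keep the better individual of each pair as the initial population.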
The pulse emission rate based on chaos: In the original BA, the pulse emission rate $r$ changes monotonically as the iterations proceed. The literature shows that better results can be obtained when $r$ changes chaotically [40,41]. In this paper, sinusoidal chaotic sequences are used to tune $r$.

The CBA-CLSVE Framework for Intrusion Detection
The CBA-CLSVE framework for intrusion detection is divided into three stages. In the first stage, the chi-square index is used for feature selection, that is, to find the optimal subset and delete irrelevant and redundant features. In the second stage, the dataset is divided into training and testing datasets based on the feature subset. The SVM, KNN and DT methods are selected as the base classifiers of the ensemble model, and the Class-Level Soft-Voting strategy is used to integrate these base classifiers. The class-level weight of each classifier is determined by the Chaos Bat Algorithm. In the third stage, the class-level weights are used in the soft-voting ensemble to obtain the final evaluation result of the ensemble for each output class.
The flowchart of the CBA-CLSVE for intrusion detection is shown in Figure 1.

Individual Representation
Assuming that there are M classifiers and N output classes, the individual length should be M × N. The weight value represents the voting power of the classifier for each class, and the real-number coding method is adopted. Figure 2 shows a string representation for a problem with three classifiers and three classes; the length of the string is 9. According to the given weight values, Classifier 3 has the highest weighted voting power in the final decision of the first and second output classes, and Classifier 1 has the highest weighted voting power in the final decision of the third output class.
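The encoding can be sketched as follows; the weight values here are illustrative, chosen only to mirror the pattern of the Figure 2 example:

```python
import numpy as np

M, N = 3, 3  # classifiers, output classes
# Flat real-coded individual of length M * N, one weight per (classifier, class) pair.
individual = np.array([0.2, 0.9, 0.4,    # classifier 1: weights for classes 1..3
                       0.1, 0.3, 0.2,    # classifier 2
                       0.8, 0.95, 0.1])  # classifier 3
weights = individual.reshape(M, N)       # weights[i, j]: voting power of classifier i+1 for class j+1
# As in the Figure 2 example: classifier 3 dominates classes 1 and 2,
# classifier 1 dominates class 3.
print(np.argmax(weights, axis=0))        # strongest classifier per class -> [2 2 0]
```

The CBA searches over exactly such flat vectors, and the ensemble reshapes each candidate into an M × N matrix before scoring it.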

Fitness Function Definition
The Detection Rate (DR) and False Positive Rate (FPR), also called the False Alarm Rate (FAR), are the most commonly used indicators to evaluate the performance of IDSs. The optimization problem proposed in this paper is a multi-objective optimization problem involving the two measures mentioned above. Multi-objective optimization can be transformed into single-objective optimization by designing a weighted fitness function. An ideal intrusion-detection model has a higher Detection Rate and a lower False Positive Rate. Hence, a weighted fitness function $Fit$ can be proposed:

$$Fit = w_1 \cdot DR + w_2 \cdot (1 - FPR)$$

where $w_1$ and $w_2$ represent the weights for the Detection Rate and the False Positive Rate, respectively. A higher fitness $Fit$ means better intrusion-detection performance. The Class-Level Soft-Voting Ensemble is shown in Algorithm 1. The proposed Class-Level Soft-Voting Ensemble scheme based on the Chaos Bat Algorithm for intrusion detection is described in Algorithm 2.
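Such a weighted fitness can be sketched in a few lines. The exact functional form and the weight values below are our reconstruction for illustration, not taken verbatim from the paper:

```python
# Weighted fitness trading off Detection Rate against False Positive Rate.
# w1 and w2 are illustrative; larger w1 favors detection over false alarms.
def fitness(dr, fpr, w1=0.9, w2=0.1):
    # Reward a high Detection Rate and penalize a high False Positive Rate,
    # so that higher fitness means better intrusion-detection performance.
    return w1 * dr + w2 * (1.0 - fpr)

# A detector with higher DR and lower FPR scores strictly higher.
print(fitness(0.95, 0.02) > fitness(0.90, 0.10))  # -> True
```

The CBA maximizes this scalar, so tuning w1/w2 shifts the ensemble along the DR-versus-FPR tradeoff curve.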

Algorithm 1: ensemble_model. Input: data and labels, the class-level weights. Output: the accuracy and the confusion matrix. The base learners are trained, their class probabilities are combined with the class-level weights to produce predict_label, and then the confusion matrix = confusion_matrix(train_label, predict_label) or other indicators are computed.
Algorithm 2: CBA-CLSVE.
Input: the parameters used for the base learners and the BA; training and testing datasets with labels.
Output: the final result of the proposed model.
1. The SelectKBest function is used to select appropriate features
2. Training:
3. Initialize the bat population using the chaos and opposition-based learning strategy
4. Compute the initial fitness of each bat to find the best search agent X*
5. Initialize the value of the chaotic map x0 randomly
6. for each iteration do
7.   Update the chaotic variable using the sinusoidal map
8.   for each individual do
9.     Update the position and other important quantities of the individual
10.    Update the fitness = ensemble_model(train_data, train_label, the position)
11.    Find the best individual based on the fitness values
12.  end for
13. end for
14. return the optimal position, that is, the class-level weights used to create the ensemble
15. Testing:
16. testing accuracy, confusion matrix = ensemble_model(test_data, test_label, the optimal weights)

The Experiments
The evaluation, experimental environment, specific experimental process and results analysis will be introduced in detail below.

Evaluation
There are four indicators to describe intrusion-detection results: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). TP indicates that attacks are correctly judged as attacks, TN denotes that normal behavior is predicted as normal, FP represents that normal behavior is judged as an attack, and FN represents that attacks are wrongly predicted as normal. In this paper, the Accuracy (Acc), Detection Rate (DR), False Positive Rate (FPR) and F1 measure are used to evaluate the performance of the different methods, which are defined as follows:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$DR = \frac{TP}{TP + FN}$$

$$FPR = \frac{FP}{FP + TN}$$

$$F1 = \frac{2 \cdot Precision \cdot DR}{Precision + DR}, \quad Precision = \frac{TP}{TP + FP}$$
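These four metrics can be computed directly from the confusion-matrix counts; a minimal sketch with illustrative counts:

```python
# Compute Acc, DR, FPR and F1 from TP/FP/TN/FN counts.
def metrics(tp, fp, tn, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    dr = tp / (tp + fn)                     # Detection Rate (recall on attacks)
    fpr = fp / (fp + tn)                    # False Positive Rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * dr / (precision + dr)
    return acc, dr, fpr, f1

# Example counts: Acc = 0.925, DR = 0.9, FPR = 0.05, F1 about 0.923
print(metrics(90, 5, 95, 10))
```

Note that DR and FPR pull in opposite directions, which is exactly the tension the weighted fitness function of Section 4 is designed to balance.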

Experimental Descriptions
In this paper, NSL-KDD, UNSW-NB15 and CICIDS2017 datasets are used to evaluate the algorithm proposed.

•
KDD99 is the most famous and widely used dataset in the intrusion-detection field. However, it is more than 20 years old and does not cover many recent attacks. More importantly, it contains a large number of duplicate records, which can bias algorithms. NSL-KDD [43], a revised version of KDD99 that overcomes some of its native shortcomings, was therefore proposed. Like KDD99, each TCP connection record in NSL-KDD is represented by 42 items: 41 attribute features and 1 type identifier.

•
The UNSW-NB15 dataset [44] is generated by the Australian Centre for Cyber Security laboratory using the IXIA PerfectStorm tool. Real normal network traffic and artificial attack traffic from modern networks are combined in the dataset. The dataset contains 9 types of attacks and 43 features in total. The first feature represents the sample number and can be deleted, so there are 42 available features.

•
The CICIDS2017 dataset [45] is provided by the Canadian Institute for Cybersecurity. It contains real normal and attack traffic and is more suitable for simulating existing network environments. In CICIDS2017, the CICFlowMeter is used to extract more than 80 network-flow features, including 6 basic features and more than 70 functional features.
It is worth noting that the attack classes of the KDD99 and CICIDS2017 datasets include main attack classes and sub-attack classes. Following the literature, the main attack classes are used in this paper. The number of features and the attack classes contained in the three datasets are shown in Table 1. The described experiments were carried out on an Intel® Core™ i5 CPU @ 1.70 GHz computer with 4.00 GB RAM running Windows 10 Enterprise Edition (64-bit). The proposed soft-voting ensemble scheme was implemented in Python. The machine-learning classifiers used in this paper were implemented by calling the scikit-learn library [46]. The PyCharm community edition 4.0.6 (Windows version) was used as the programming tool, and the program compiler environment was Python 3.7.0 (Windows version).
Data preprocessing, including data cleaning, data normalization, feature selection and data segmentation, was performed. The train_test_split() method in scikit-learn was used to perform the dataset segmentation: the randomly shuffled data was divided into a training set and a testing set at a ratio of 1:1. The SelectKBest function in the scikit-learn library was used to select features, with the importance of each feature evaluated by its chi-square value.
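The preprocessing steps above can be sketched with scikit-learn. The data here is synthetic and the shapes and k value are illustrative stand-ins for the real NSL-KDD/UNSW-NB15/CICIDS2017 datasets:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((200, 41))              # 41 features, as in NSL-KDD (synthetic data)
y = rng.integers(0, 2, 200)            # binary labels for illustration

X = MinMaxScaler().fit_transform(X)    # normalization (chi2 needs non-negative input)
X = SelectKBest(chi2, k=20).fit_transform(X, y)   # chi-square feature scoring
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)          # 1:1 split, as in the paper
print(X_train.shape, X_test.shape)     # -> (100, 20) (100, 20)
```

Normalizing before the chi-square filter matters here, since `chi2` rejects negative feature values.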

Experimental Results
To eliminate the influence of random error, the 10-fold cross-validation method was used. The parameters of the Chaos Bat Algorithm were selected empirically; the population size and the number of iterations are 25 and 100, respectively. The other important parameters of the methods used were also selected empirically and are shown in Table 2.

Table 2. The important parameters of the methods used.

•	Support Vector Machine: C = 1, kernel = 'rbf', gamma = 'auto'
•	K-Nearest Neighbors: n_neighbors = 3
•	Decision Tree: criterion = 'gini', random_state = 0, max_depth = 20, min_samples_leaf = 20
•	Chaos Bat Algorithm: A = 0.5, fmin = 0, fmax = 2, r is obtained by chaotic mapping

The scores_ attribute of the sklearn SelectKBest was used to obtain the scores of the different features. The scores of the features in the NSL-KDD, UNSW-NB15 and CICIDS2017 datasets are shown in Figure 3, in which the abscissa is the serial number of the feature and the ordinate is its score. As can be seen in Figure 3, the feature scores fall in the ranges [0, 30,000], [0, 20,000] and [0, 20,000], respectively. The features with the highest scores are features 39, 34 and 11 in the three datasets. In each dataset, almost half of the features have low scores, and some features with a score of 0 also exist, especially in the CICIDS2017 dataset. The proposed model was trained with different numbers of features. After training, the appropriate ensemble weights were obtained. Finally, different numbers of features (2, 3, 5, 10, 15, 20, ...) were fed into the proposed method, and the three classification accuracy curves shown in Figure 4 were obtained. The orange, blue and green curves represent the test accuracy on the NSL-KDD, UNSW-NB15 and CICIDS2017 datasets, respectively. The trends of the three accuracy curves are similar: as the number of selected features increases, the testing accuracy rises rapidly at first and then stabilizes. The maximum accuracy is achieved when the number of features is about 20, 19 and 25 in the three datasets, respectively. The selected features are shown in Table 3; the quantity of selected features is derived from Figure 4.

Table 3. The selected features based on the chi-square filtering technology.

Dataset / The Selected Features / Quantity
F12, F14, F22, F23, F25, F26, F27, F28, F29, F30, F31, F33, F34, F35, F36, F37

Next, the hard-voting methods were compared with the soft-voting models on the three datasets. The hard-voting models include the majority voting and the weighted voting methods. The soft-voting models include the no-weight, classifier-level weight and class-level weight methods. Among these five methods, weights exist only in the weighted voting, classifier-level weight and class-level weight methods; the majority voting and no-weight methods do not use weights. It is worth noting that the weights of the three weighted voting methods mentioned above are all obtained by the Chaos Bat Algorithm (CBA) with the Opposition-Based Learning method. As shown in Section 4.2.1, an individual in the Bat Algorithm represents the weights. Since there are 3 classifiers and 5, 10 and 7 output classes on the three datasets, the length of the class-level weight vector equals the individual length, which is 15, 30 and 21 in the NSL-KDD, UNSW-NB15 and CICIDS2017 datasets, respectively. The length of the weight vector of the weighted voting and classifier-level weight methods equals the number of classifiers used in the ensemble. The weight comparison of the different voting ensemble methods is shown in Table 4.

Tables 5-10 clearly show that the voting ensemble methods have better and more stable predictive performance than the base learners. The KNN performs best on the NSL-KDD and CICIDS2017 datasets, while it performs worst on the UNSW-NB15 dataset. The SVM has a high False Positive Rate on CICIDS2017. The performance of the DT is stable, but it is still inferior to the ensemble learning methods. Tables 5-10 also show that the class-level weight method obtains better performance in Detection Rate, FPR, Acc and F1 compared to the other ensemble methods.
By comparing the results presented in Tables 8-10 with those in Tables 5-7, it can be seen that feature selection is beneficial because it improves the efficiency of the models. Even when half of the features or more are omitted, the models still maintain considerable performance. Moreover, to verify the search and convergence capabilities of the CBA, the proposed weight-optimization scheme was compared with the basic BA and the GA. The GA is implemented by calling the scikit-opt library [47]. There are many important parameters to be specified in the GA; in particular, n_dim represents the number of variables in the problem to be optimized, which is the same as the individual length. The performance comparisons of the CLSVE based on the different Optimization Algorithms on the three datasets are shown in Table 11. It should be noted that feature selection is performed. Table 11 shows that the proposed CBA-CLSVE performs better than the BA-based CLSVE and the GA-based CLSVE. Finally, the proposed method was compared with other models, such as XGBoost [48], for intrusion detection on the UNSW-NB15 dataset. The comparison includes the feature-selection method, the classification method, the number of features used, and the DR, FPR, Acc and F1 for intrusion detection. Table 12 shows that CBA-CLSVE outperforms the other models with few features.

Conclusions
CBA-CLSVE is a Class-Level Soft-Voting Ensemble for intrusion detection. In the ensemble, the Support Vector Machine, K-Nearest Neighbor and Decision Tree are combined to construct a model with better generalizability, robustness and prediction performance. To improve the global convergence of the population, a CBA based on the sinusoidal chaotic map is proposed to generate the class-level weights. A weighted fitness function combining the Detection Rate and the False Positive Rate is defined in the CBA. The hard-voting and soft-voting methods are used to combine the results of the different base learners, and CBA-based voting methods with weights (whether hard voting or soft voting) are designed. The performance of the different voting ensemble methods with or without feature selection is compared. Finally, to compare with the CBA-based method, the BA and GA approaches are also used to combine the expert opinions. The approaches mentioned above were empirically compared using the NSL-KDD, UNSW-NB15 and CICIDS2017 datasets. The experimental results show that the class-level weights optimized by the CBA can be used to improve the combinative performance and that the same ensemble performance can be obtained with half of the total number of features or fewer. They also show that the CBA has better search performance and efficiency than the basic BA and GA methods. In this paper, a heuristic algorithm is used to integrate the base learners, which improves the ensemble performance. Therefore, our future work will involve improving current heuristic algorithms or proposing new intelligent heuristic algorithms. In addition, studying other methods for integrating the base learners will be the direction of our efforts.