1. Introduction
With the exponential growth and evolution of IT and Internet applications [1] in the past few years, there has been an unprecedented expansion in the magnitude and dimensionality of data across a range of disciplines [2]. This has brought challenges in the realms of data mining and machine learning.

Extensive and intricate data encompass a significant volume of valuable information alongside redundant and irrelevant information [3], which increases computational and storage costs [4,5], reduces classification performance and efficiency [6], and degrades overall performance [7]. This calls for a solution that can reduce the dimensionality of the original data. Dimensionality reduction is an effective method that can reduce data dimensions and computational complexity, save storage space, and establish a more generalized model [8].
FS, one of the most critical strategies for dimensionality reduction, effectively reduces storage and computing costs by eliminating irrelevant and repetitive features while preserving the physical meaning of the original features, endowing the model with better readability and interpretability [9,10,11]. The FS technique plays an essential part in preparing data for subsequent tasks (e.g., classification) by identifying the most relevant features [12,13]. It has been extensively utilized in various areas, including image processing [14,15], text mining [16], social and behavioral science [17], biomedical research [18,19,20], fault diagnosis [21], and so on.
According to evaluation criteria, FS methods can typically be divided into three groups: filter, wrapper, and embedded models [22]. The filter model selects features based on their intrinsic correlation with the target variable, independent of any specific learning algorithm [23]. The wrapper model evaluates each feature subset using a trained model, such as KNN or SVM, to acquire the best subset [24]. The embedded approach incorporates FS within the training process and uses a specific structure to guide feature selection. Wrappers are frequently employed for FS problems because they achieve higher classification accuracy than filters and have a wider range of applications than embedded methods [25].
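To make the wrapper model concrete, the following Python sketch scores a candidate feature mask with a KNN classifier under cross-validation; it is not from the paper, and the Wine data, the 5-fold setup, and the evaluate_subset helper are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, mask, k=5):
    """Wrapper evaluation: train/validate a classifier restricted to the
    features switched on in the boolean mask; return mean CV accuracy."""
    if not mask.any():
        return 0.0                         # empty subsets score zero
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

X, y = load_wine(return_X_y=True)
mask = np.zeros(X.shape[1], dtype=bool)
mask[[0, 6, 9]] = True                     # a candidate subset of three features
print(f"Subset accuracy: {evaluate_subset(X, y, mask):.3f}")
```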
The wrapper method repeatedly searches for feature subsets and evaluates the selected features until a stopping criterion is reached [7]. In wrapper-based FS methods, the quality of feature subsets is evaluated using classifiers such as support vector machine (SVM), decision tree (DT), artificial neural network (ANN), and k-nearest neighbor (KNN) as fitness functions. Searching for feature subsets is an NP-hard problem [26]. The simplest approach is to examine every potential combination of features, a process referred to as the exponential algorithm [27], but its computational cost is extremely high and it is practically impossible to apply [28]. To reduce computational costs, sequential algorithms have been suggested, which select or delete features in a fixed order [29,30]. However, once a feature is selected or deleted, it cannot be manipulated again, which leads to a local optimum. Over the past few years, random search algorithms have gained more attention for their ability to explore the feature space stochastically, effectively preventing the algorithm from getting stuck in a suboptimal solution within a specific region. Metaheuristics, the most promising class of random search algorithms, have become a suitable solution for FS owing to their excellent performance in various optimization scenarios [31].
Metaheuristics draw on four main sources of inspiration: evolutionary-based, physics-based, human-based, and swarm intelligence (SI)-based algorithms [32]. SI algorithms have been shown to be competitive with the other three categories due to fewer parameters, faster convergence, a better equilibrium between exploration and exploitation, and better overall performance [33,34]. Widely recognized representatives of SI algorithms include particle swarm optimization (PSO) [35], the salp swarm algorithm (SSA) [36], the whale optimization algorithm (WOA) [37], etc.
Although swarm intelligence algorithms are effective in solving FS problems, they still face issues such as stagnation in local minima, premature convergence, an imbalance between exploration and exploitation, and low population diversity. To enhance the efficiency of SI-based algorithms in FS, there is a need for an algorithm that can deal with the above challenges.
The walrus optimizer (WO) algorithm, introduced by Han et al., is a new SI algorithm inspired by the behavior of walrus herds. WO is a desirable option due to its adaptability, minimal parameters, and powerful mechanisms for balancing exploration and exploitation. Its effectiveness in addressing continuous problems has been demonstrated through experimental research [38,39,40]. However, the WO algorithm still suffers from low population diversity, slow convergence, and an inability to fully exploit the problem domain. In addition, the traditional WO algorithm was created to deal with continuous problems, and no research has been directed at using WO for FS. This situation motivated us to enhance the original WO algorithm and design a binary version for FS tasks.
This paper aims to investigate an optimization algorithm that can improve population diversity and overcome local optimal stagnation while providing high optimization performance. First, a refined WO algorithm, BGEPWO, is introduced, which uses an ICMIC chaotic map to initialize the population, introduces an adaptive operator to update the safety signal, and proposes a population regeneration mechanism to eliminate old and weak individuals while generating promising new ones. In addition, the EOBL strategy and the golden sine strategy are employed: EOBL updates the escape direction of the walruses, and the golden sine strategy perturbs the population. The proposed algorithm improves population diversity through these mechanisms, continuously balances exploration and exploitation during optimization, avoids falling into local optima, and accelerates convergence, thereby improving overall performance. A binary version of the algorithm was then developed. Twenty-one datasets from UCI and ASU were chosen to assess the effectiveness of BGEPWO in comparison with BWO and ten other metaheuristic algorithms: the binary artificial bee colony algorithm (BABC) [41], binary particle swarm optimization (BPSO), the binary bat algorithm (BBA) [42], the binary whale optimization algorithm (BWOA), the binary Kepler optimization algorithm (BKOA) [43], the binary salp swarm algorithm (BSSA), the binary Nutcracker optimizer algorithm (BNOA) [44], binary Harris hawks optimization (BHHO) [45], the binary crested porcupine optimizer (BCPO) [46], and the binary coati optimization algorithm (BCOA) [47].
The experimental findings indicate that the improved BGEPWO significantly enhances the ability of the WO algorithm to solve FS problems. In most cases, BGEPWO outperforms the WO algorithm and competing algorithms in terms of fitness value, number of selected features, and F1-score. In addition, a Wilcoxon rank-sum test at the 5% significance level validated that BGEPWO performs significantly better than the competing algorithms on most datasets.
The main contributions of this research can be succinctly delineated as follows:
Initializing the population using ICMIC chaotic mapping, instead of the random initialization in the original algorithm, improves population diversity and prevents premature convergence.
BGEPWO employs a new adaptive safety signal, enhancing the algorithm’s stability and promoting its convergence through the introduction of the adaptive operator.
The population regeneration mechanism improves exploitation capability by eliminating old and weak individuals and generating promising new ones, facilitating the ongoing movement of the population toward the best solution and accelerating convergence.
The EOBL strategy is employed in the escape behavior of walruses, which enables the method to flee from the current local optimum bottleneck while expanding the search range and improving the diversity.
The proposed method employs the golden sine strategy to perturb the walrus population in the late phase of each iteration, enabling the method to explore the search area more thoroughly, enhancing its exploration capacity, and effectively mitigating the issue of settling into local traps, thereby accelerating convergence.
The remaining sections of this paper are structured as follows: Section 2 presents relevant studies on the utilization of metaheuristic algorithms for FS. The WO algorithm is presented in Section 3. Section 4 offers a detailed introduction to the proposed BGEPWO. Section 5 describes the experimental design and result analysis. Section 6 provides a summary of the paper and offers potential directions for future research.
2. Related Works
Feature selection methods typically search for feature subsets within the solution domain, which is an NP-hard problem. Metaheuristic algorithms provide an effective approach for addressing intricate optimization and NP-hard problems, enabling the discovery of acceptable solutions within a reasonable timeframe [26]. Metaheuristic algorithms can be divided into evolutionary-based, physics-based, human-based, and swarm intelligence-based feature selection algorithms.

To implement feature selection using these algorithms, the continuous search space is usually mapped to the feature space by means of transfer functions, logical operators, etc. [48]. In recent years, researchers have increasingly utilized metaheuristic algorithms to address feature selection in various fields. The related work is elaborated in this section.
Evolutionary algorithms take inspiration from the mechanisms of natural evolution and apply operations to the best solutions to create new individuals. Common evolutionary-based algorithms encompass genetic algorithms (GAs) [49], differential evolution (DE) [50], etc. To tackle the challenges of avoiding local pitfalls and reducing computational expense, Tarkhaneh et al. [51] proposed an improved MDEFS method using two novel mutation methodologies and demonstrated its superiority. Maleki et al. [52] combined a classic genetic algorithm with a KNN classifier to effectively reduce the dimensionality of patient disease datasets and enhance the precision of disease identification. FSGAS is a GA-based FS method studied by Berrhail et al. [53] for identifying the crucial and pertinent features of compounds in ligand-based virtual screening (LBVS), which effectively improves screening performance.
Physics-based methods are derived from physical laws of the universe [54], mainly including simulated annealing (SA) [55], the gravitational search algorithm (GSA) [56], and so on. After extracting candidate lesion features of diabetic retinopathy (DR), Sreng et al. [57] used hybrid simulated annealing for feature selection to enable automated DR screening. Albashish et al. [58] combined a proposed model based on binary biogeography optimization (BBO) with SVM to achieve better accuracy than existing BBO methods and other extant algorithms. Taradeh et al. [59] added evolutionary crossover and mutation operators to the GSA to implement FS; comparison experiments on the UCI datasets against GA, PSO, and GWO, using KNN and DT as classifiers, demonstrated its superiority in dealing with FS problems. To improve the efficiency of virtual screening (VS) in drug discovery campaigns, Mostafa et al. [60] recommended an FS framework consisting of a gradient-based optimizer (GBO) and KNN; its effectiveness was validated on real-world benchmark datasets of both high and low dimensionality. Dong et al. [19] improved the dandelion algorithm (DA) using the sine cosine operator, a restart strategy, and quick bit mutation. The suggested SCRBDA algorithm was evaluated against eight other classical FS algorithms on the UCI datasets, and the outcomes demonstrated its excellent feature reduction ability and performance.
Algorithms based on human behavior take inspiration from the actions and patterns of humans, in which each person influences the behavior of the group. Common human-based algorithms include the teaching–learning-based optimization (TLBO) algorithm [61], brainstorm optimization (BSO) [62], etc. To augment the exploratory prowess of BSO, Oliva et al. [63] improved it using chaotic mapping, opposition-based learning, and disruption operators; the new algorithm effectively improved the efficiency of FS by enhancing population diversity. Manonmani et al. [64] utilized an enhanced TLBO for the classification and prediction of chronic kidney disease (CKD), reducing the number of features needed for diagnosing CKD. To increase the variety of the search procedure and handle the binary nature of FS problems, Awadallah et al. [65] developed the BJAM algorithm using an adaptive mutation rate and a sinusoidal transfer function and validated its efficiency by employing KNN as a classifier on 22 datasets. Based on the gaining–sharing knowledge-based optimization algorithm (GSK), Agrawal et al. [66] proposed a binary version named FS-NBGSK and demonstrated its excellent performance in terms of accuracy, convergence, and robustness on benchmark datasets using the KNN classifier. In the study of Xu et al. [67], six distinct transfer functions were employed to produce six binary versions of the arithmetic optimization algorithm (AOA); an integrated transfer function and Lévy flight were then employed to optimize search performance, and the BAOA_S1LF algorithm demonstrated its superiority among the six methods tested on the UCI datasets.
Swarm intelligence (SI) algorithms are inspired by the collective behavior of groups of animals living in communities [68]. In swarm intelligence algorithms, individuals share their exploration of the search domain, making the whole group continuously move toward a better solution and ultimately approach the optimal solution [69].
To identify effective features in chemical compound datasets, Houssein et al. [70] combined Harris hawks optimization with SVM and KNN to formulate two classification algorithms, HHO-SVM and HHO-KNN. Experiments on chemical datasets of monoamine oxidase and QSAR biodegradation showed that the former achieved the best feature optimization ability among a group of competing algorithms.
In a newly proposed multi-population-based PSO method (MPPSO), Kılıç et al. [71] utilized stochastic and relief-based methods for initialization and used two populations for simultaneous search. Their experimental results on 29 datasets demonstrate that the mean classification accuracy of MPPSO consistently surpassed that of the other algorithms.
Wang et al. [72] suggested a hybrid sine–cosine chimp optimization algorithm (ChOA), which utilizes a multiloop iteration approach to enhance the integration of the sine–cosine algorithm (SCA) and ChOA and performs binary transformation through an S-shaped transfer function. Experiments using the KNN classifier on 16 datasets have shown that this approach demonstrates outstanding efficacy in addressing FS challenges, surpassing other algorithms in performance.
To minimize redundant features and enhance classification accuracy, Shen et al. [73] introduced a refined fireworks algorithm (FA) by designing an innovative fitness evaluation method and a fitness-based roulette wheel selection strategy, while introducing a differential mutation operator. The effectiveness of this strategy and the importance of joint optimization were validated through experiments on 14 UCI datasets.
Tubishat et al. [74] proposed the IWOA algorithm by integrating elite opposition-based learning (EOBL) and evolutionary operators with the whale optimization algorithm (WOA). Sentiment analysis was performed on four Arabic benchmark datasets using SVM classifiers. The experimental findings demonstrated the superior efficacy of the new algorithm over the alternatives in terms of classification accuracy and feature subset size reduction.
To analyze and address FS issues in biological data, Seyyedabbasi et al. [75] recommended bSCSO, a binary version of the sand cat swarm optimization (SCSO) algorithm. Evaluation on the datasets established the superiority of the bSCSO algorithm in terms of high predictive performance and small feature size.
In the newly proposed adaptive ranking moth-flame optimization (ARMFO) algorithm, Yu et al. [76] used ranking probability, adaptive chaotic mutation, and greedy selection to improve the search capability of the algorithm and achieved satisfactory outcomes on the UCI datasets.
However, the metaheuristic algorithms described above have some limitations when dealing with the feature selection problem. First, the algorithms used in some studies have many parameters, and parameter selection has a significant impact on both the performance and the computational cost of the algorithms. Second, some algorithms suffer an imbalance between exploration and exploitation in the search process, which reduces search performance and convergence depth. Third, many studies focus only on a single problem, so the generalization performance of the algorithms is not verified. Finally, some studies are limited in the dimensionality of the datasets chosen for feature selection, with no extensive coverage of data dimensions from low to high. To address these issues, this paper develops the BGEPWO algorithm for feature selection on datasets of different dimensions, aiming to improve the algorithm's performance, computational efficiency, robustness, diversity of solutions, and balance of search strategies.
4. The Proposed Approach
This part introduces a refined binary walrus optimization algorithm, BGEPWO, which utilizes a population regeneration mechanism, the EOBL strategy, and golden sine perturbation. Firstly, the population is initialized using ICMIC chaotic mapping, replacing the random initialization in the original algorithm. Secondly, the BGEPWO algorithm introduces an adaptive operator to obtain a new adaptive safety signal. Thirdly, a population regeneration mechanism is added to generate promising new individuals while eliminating old and weak ones. Fourthly, the EOBL strategy is adopted in the escape behavior of walruses to guide their escape direction. Fifthly, BGEPWO uses the golden sine strategy to perturb the population at the late stage of each iteration. Finally, BGEPWO is provided in a binary format for FS.
4.1. ICMIC Chaotic Mapping
In heuristic algorithms, the distribution of the initial population can affect the precision and efficiency of global optimization. In the WO algorithm, the initial population is obtained through random initialization, which reduces the diversity of solutions and makes the algorithm prone to getting stuck in local optimal traps. It has been experimentally shown that population initialization using chaotic mapping can benefit the entire algorithmic process and typically produces superior outcomes compared with random initialization [77].

ICMIC has a higher Lyapunov exponent and therefore exhibits more pronounced chaotic features compared with other chaotic maps [78].
In this study, ICMIC chaotic mapping is introduced in the initialization phase to expand the dispersion of the population and enhance the efficacy of the initial solution. The ergodicity of ICMIC chaotic mapping enables better diversity in the beginning phase of the walrus group, avoiding premature convergence and improving the accuracy and convergence of global optimization, overcoming the shortcomings of traditional optimization algorithms.
Equation (22) describes the mathematical expression of ICMIC chaotic mapping:

$$x_{n+1} = \sin\left(\frac{a}{x_n}\right) \quad (22)$$

where $a \in (0, \infty)$ is the control parameter and $x_n \in [-1, 1]$, $x_n \neq 0$.
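To make the initialization concrete, the sketch below iterates the ICMIC map of Equation (22) to fill a population matrix; the control parameter a = 2 and the linear rescaling of chaotic values from [-1, 1] to the search bounds are our assumptions for illustration.

```python
import numpy as np

def icmic_init(pop_size, dim, lb, ub, a=2.0, seed=None):
    """Initialize a walrus population with the ICMIC map of Equation (22),
    x_{n+1} = sin(a / x_n); a = 2 and the [-1, 1] -> [lb, ub] rescaling
    are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.1, 0.9, size=dim)     # nonzero chaotic seeds per dimension
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = np.sin(a / x)                   # one ICMIC iteration
        pop[i] = lb + (x + 1.0) / 2.0 * (ub - lb)   # map chaos into the bounds
    return pop

# Example: 20 walruses in a 10-dimensional search space bounded by [0, 1]
population = icmic_init(20, 10, lb=0.0, ub=1.0, seed=42)
```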
4.2. Adaptive Safety Signal
The safety signal in the original WO algorithm is determined by randomly generating a numerical value within [0, 1]. As the safety signal is an important control signal for selecting the operation process of the algorithm, its complete randomness seriously affects the stability and convergence capability of the algorithm. To achieve a better convergence trend, the algorithm should prioritize exploring the search space during the initial iteration phase and enhance its exploitation capacity in the later iteration phase. Therefore, this paper introduces an adaptive operator to attain a more robust and enhanced equilibrium between exploration and exploitation.
The adaptive operator is shown in Equation (24). After adding the adaptive operator, the safety signal is calculated by Equation (23), where $t$ signifies the current iteration count, $T_{max}$ signifies the maximum iteration count, and $r$ is a random number within $[0, 1]$.
The adaptive operator converges from 1 to 0 as the iterations proceed; the trend is shown in Figure 1. The random number $r$ is used to maintain the randomness of process selection, helping the algorithm avoid local optima. The change in the improved safety signal over the iterations is depicted in Figure 2.

It is evident from the trend presented in Figure 2 that the safety signal retains an overall trend of converging from 1 to 0. In the early stage of the iteration, the population can engage in more habitat (roosting) behavior, and in the later phase it chooses more foraging behavior, which achieves an equilibrium between algorithmic exploration and exploitation. Concurrently, the random number provides a degree of perturbation, preventing the algorithm from getting stuck in suboptimal solutions.
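Since Equations (23) and (24) are not reproduced here, the following sketch uses a linear decay from 1 to 0 as one plausible form of the adaptive operator; the exact operator defined in Equation (24) may differ.

```python
import numpy as np

def adaptive_safety_signal(t, t_max, rng=np.random.default_rng()):
    """Hypothetical adaptive safety signal: an adaptive operator decaying
    from 1 to 0 (a stand-in for Equation (24)) scales a random draw,
    giving the improved signal of Equation (23)."""
    lam = 1.0 - t / t_max    # assumed linear decay; the paper's operator may differ
    r = rng.random()         # random number in [0, 1] preserves stochasticity
    return lam * r           # high early (exploration), low late (exploitation)
```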
4.3. Population Regeneration Mechanism
The original WO algorithm simulates the migration, roosting, gathering, and fleeing behavior of a walrus population but ignores the regeneration of the entire population. The old and weak individuals in a walrus herd may be removed from the group due to their own condition or natural enemies, while new walrus individuals are generated during roosting and reproduction. In this paper, we simulate the population regeneration mechanism during the roosting process of the walrus population, as shown in Figure 3.

In the regeneration mechanism of the walrus population, the oldest and weakest walrus (i.e., the worst solution) among the adult walruses (consisting of male and female walruses) is removed, and the strongest (i.e., the best) juvenile walrus grows up to be an adult walrus. At the same time, a new juvenile walrus is bred from the optimal and suboptimal individuals and added to the population. The above process is shown in Figure 4.
The simulation of the regeneration mechanism is shown in Equations (25) and (26), where $X_{worst}$ is the worst individual among adult walruses, $X_{bestJuv}$ is the best individual among juvenile walruses, and $X_{best}$ and $X_{second}$ denote the optimal and suboptimal individuals under the current iteration, respectively.
The roosting process of the original WO algorithm updates the positions of male, female, and juvenile walruses, respectively. The addition of a population regeneration mechanism enhances the algorithm's exploitation capacity, making the population move continuously toward the optimal solution. Algorithm 1 describes the pseudocode of the population regeneration mechanism.
Algorithm 1. Pseudocode of population regeneration mechanism.
1: % Initialize the worst walrus position and its fitness value in adult walruses, and the best walrus position and its fitness value in juvenile walruses
2: Initialize X_worst, F_worst, X_bestJuv, F_bestJuv;
3: % Find the worst individual among adult walruses (consisting of males and females)
4: For i from 1 to N_adult
5: Calculate fitness value F(X_i);
6: If F(X_i) > F_worst then
7: X_worst = X_i; F_worst = F(X_i);
8: End If
9: End For
10: % Find the optimal individual among juvenile walruses
11: For i from N_adult + 1 to N
12: Calculate F(X_i);
13: If F(X_i) < F_bestJuv then
14: X_bestJuv = X_i;
15: F_bestJuv = F(X_i);
16: j = i;
17: End If
18: End For
19: % Reproduce a juvenile walrus by optimal and suboptimal individuals
20: If F_bestJuv < F_worst then
21: Replace X_worst with X_bestJuv by Equation (25) and breed a new juvenile X_j from X_best and X_second by Equation (26);
22: End If
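A compact Python rendering of Algorithm 1 might look as follows; since Equation (26) is not reproduced here, the midpoint crossover used to breed the new juvenile is an assumed stand-in for it.

```python
import numpy as np

def regenerate(pop, fitness, n_adult, x_best, x_second):
    """Sketch of Algorithm 1: rows [0, n_adult) of pop are adults, the rest
    juveniles; fitness is minimized. The midpoint crossover breeding the
    new juvenile is an assumed stand-in for Equation (26)."""
    worst_adult = np.argmax(fitness[:n_adult])         # weakest adult walrus
    best_juv = n_adult + np.argmin(fitness[n_adult:])  # strongest juvenile
    if fitness[best_juv] < fitness[worst_adult]:
        pop[worst_adult] = pop[best_juv]               # juvenile becomes an adult
        pop[best_juv] = (x_best + x_second) / 2.0      # breed a new juvenile
        # Caller should re-evaluate the fitness of the two modified rows.
    return pop
```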
4.4. Elite Opposition-Based Learning Strategy
If walruses encounter natural predators while foraging, they leave the immediate area to avoid danger. The escape behavior of the original WO algorithm is simulated as in Equation (17): the position of a walrus during the escape process is influenced by its current position and the position of the leader walrus. However, during the escape process, the location of the leader walrus is not necessarily the current safest position (i.e., the optimal solution).
In this research, EOBL is introduced to guide the escape behavior of walruses by searching for the better position between the leader walrus (i.e., the current best solution) and its reverse individual. The process is simulated in Equations (27)–(29): the current optimal solution after the EOBL strategy is obtained by Equation (28), and its reverse solution is calculated by Equation (29) using a dynamic coefficient within $(0, 1)$ and dynamic bounds given by the minimum and maximum values of the current population in each dimension.
By introducing the EOBL strategy, the algorithm can flee from the current local optimal trap, expand the scope of exploration, and enhance the diversity of the population.
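The escape-direction update can be sketched with the standard EOBL construction, in which the opposite of the leader is taken within dynamic population bounds; treating Equations (28) and (29) this way is an assumption based on the usual EOBL formulation.

```python
import numpy as np

def eobl_leader(pop, x_leader, f, rng=np.random.default_rng()):
    """Elite opposition-based learning for the leader walrus, a sketch of
    Equations (28)-(29) under the standard EOBL formulation: reflect the
    leader inside dynamic population bounds and keep the better position."""
    da = pop.min(axis=0)                    # dynamic lower bound per dimension
    db = pop.max(axis=0)                    # dynamic upper bound per dimension
    k = rng.random()                        # dynamic coefficient in (0, 1)
    x_reverse = np.clip(k * (da + db) - x_leader, da, db)
    return x_reverse if f(x_reverse) < f(x_leader) else x_leader
```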
4.5. Golden Sine Disturbance
Throughout the iteration process, to prevent the algorithm from getting stuck in local optimal solutions and to enhance the population's ability to explore, this research employs the golden sine strategy to disturb the population at the conclusion of each iteration [79]. The disturbance method is shown in Equation (30), where $r_1$ represents a randomly generated value within $[0, 2\pi]$ that dictates the distance an individual moves during each iteration, $r_2$ is a stochastic value in the range of $[0, \pi]$ that dictates the direction for updating the current individual in each iteration, and $x_1$ and $x_2$ are coefficients derived from the golden section.
Through the golden sine disturbance of individual positions, the algorithm can search the area more comprehensively during the whole iteration, improve its exploration capacity, and efficiently address the issue of being caught in local traps, thereby increasing the convergence rate.
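The disturbance can be sketched following the standard Gold-SA formulation; the sign convention and the golden-section coefficients x1 and x2 below are taken from that formulation and may differ in detail from the paper's Equation (30).

```python
import math
import numpy as np

TAU = (math.sqrt(5) - 1) / 2                 # golden ratio conjugate
X1 = -math.pi + (1 - TAU) * 2 * math.pi      # golden-section coefficient x1
X2 = -math.pi + TAU * 2 * math.pi            # golden-section coefficient x2

def golden_sine_disturb(x, x_best, rng=np.random.default_rng()):
    """Gold-SA style perturbation of one walrus position toward the current
    best solution x_best; a sketch standing in for Equation (30)."""
    r1 = rng.uniform(0, 2 * math.pi)         # controls the moving distance
    r2 = rng.uniform(0, math.pi)             # controls the update direction
    return x * abs(math.sin(r1)) - r2 * math.sin(r1) * np.abs(X1 * x_best - X2 * x)
```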
4.6. Fitness Function
The primary objective of FS is to identify significant features and decrease the dataset's dimensionality. The more effective an optimization algorithm is in dealing with FS problems, the higher the resulting classification accuracy. Equation (31) shows the fitness function used here, in which $Accuracy$ denotes the classification accuracy achieved with the selected feature subset.
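Equation (31) itself is not reproduced above; a form widely used in wrapper FS, combining the classification error with the selected-feature ratio via a weight alpha, is sketched below as a plausible stand-in.

```python
def fitness(mask, accuracy, alpha=0.99):
    """Hypothetical stand-in for Equation (31): weighted sum of the
    classification error and the selected-feature ratio, so lower fitness
    means higher accuracy with fewer features; alpha is an assumed weight."""
    n_selected = sum(mask)            # number of features switched on
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * n_selected / len(mask)
```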
4.7. Overall Procedure
The general procedure of the BGEPWO algorithm suggested in this research is depicted in Figure 5 and Algorithm 2. Firstly, the population is initialized using ICMIC chaotic mapping, and the relevant parameters are defined. Then, the fitness value is computed, and the current optimal solution is obtained. During the iteration, if the danger signal satisfies $|Danger| \geq 1$, the walrus herd needs to migrate to a safe area, and the algorithm enters the exploration stage, where the location of each walrus is revised; otherwise, it begins the exploitation stage. If the safety signal satisfies $Safety \geq 0.5$, the walrus herd chooses to roost, at which time the positions of male, female, and juvenile walruses are updated separately, and the population regeneration mechanism is then applied to eliminate and generate individuals in the group. If $Safety < 0.5$, the walrus population starts foraging. During the foraging behavior, if a danger signal is detected, walruses need to escape the current area; the EOBL strategy is used to obtain the position of the leader walrus that guides the escape route, and the walruses' positions are updated. Otherwise, the walruses move continuously toward the food-intensive area and update their positions. At the conclusion of each iteration, the golden sine strategy is utilized to perturb the positions of the walruses. Afterwards, the fitness value is calculated, and the current optimal solution is updated. The optimal solution is returned upon the conclusion of the iteration.
Algorithm 2. Pseudocode of BGEPWO
Input: Parameters
1: Initialize the population by ICMIC chaotic mapping (Equation (22));
2: Specify the relevant parameters;
3: Calculate the fitness values and acquire the optimal solution;
4: While (t ≤ T_max) do
5: If |Danger| ≥ 1 then {Exploration phase} % Migration
6: Utilize Equation (9) to calculate every individual's position;
7: Else {Exploitation phase} % Reproduction
8: If Safety ≥ 0.5 then % Roosting process
9: For every male individual
10: Update new position based on Halton sequence;
11: End For
12: For every female individual
13: Update new position by Equation (12);
14: End For
15: For every juvenile individual
16: Update new position by Equation (13);
17: End For
18: Update the population using the regeneration mechanism in Algorithm 1;
19: Else % Foraging process
20: If a danger signal is detected then % Fleeing process
21: Update new position of leader walrus by Equation (28);
22: Update new position of each walrus by Equation (27);
23: Else % Gathering process
24: Update new position of each walrus using Equation (18);
25: End If
26: End If
27: End If
28: Update the position;
29: Disturb the population using the golden sine strategy by Equation (30);
30: Compute the fitness value and refine the current optimal solution;
31: t = t + 1;
32: End While
Output: the optimal solution
4.8. Binary Mechanism Sigmoid
Metaheuristic algorithms are mostly applied to continuous optimization problems, whereas the solution space of a formulated FS problem is discrete. Thus, a transfer function must be introduced to convert solutions from the continuous domain to the discrete domain. In this paper, the Sigmoid function, a commonly used S-shaped transfer function [80], is employed to discretize the continuous algorithm into the binary BGEPWO, which is capable of addressing feature selection issues, using the following equation:

$$S(x_i) = \frac{1}{1 + e^{-x_i}} \quad (32)$$

where $x_i$ and $S(x_i)$, respectively, express the position of the $i$th individual and the probability of changing its binary position. Then, a threshold is set; in the present investigation, it is set to 0.5 to ensure the uniform distribution of 0 and 1 in the discrete space. The position is revised with the use of Equation (33).
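The two-step conversion, sigmoid transfer followed by thresholding at 0.5, can be sketched as follows; resolving the tie at exactly 0.5 with a strict comparison is our assumption.

```python
import numpy as np

def binarize(positions):
    """Sigmoid transfer (Equation (32)) plus thresholding at 0.5, a
    plausible reading of the position update in Equation (33); the strict
    comparison at exactly 0.5 is an assumption."""
    probs = 1.0 / (1.0 + np.exp(-positions))   # probability of keeping a feature
    return (probs > 0.5).astype(int)           # 1 = feature selected, 0 = dropped

# Example: a 6-dimensional continuous position becomes a feature mask
mask = binarize(np.array([-2.1, 0.3, 1.7, -0.4, 0.0, 2.5]))  # -> [0 1 1 0 0 1]
```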
5. Experiments and Discussion
This part first depicts the datasets, evaluation criteria, and experimental configuration utilized during the experiment and conducts experiments and discussions on parameter settings. Then, the BGEPWO is evaluated experimentally against the original BWO algorithm as well as the comparison algorithms. The experimental outcomes are analyzed, and the limitations are pointed out.
5.1. Datasets
We selected 21 datasets to evaluate the effectiveness of the BGEPWO algorithm on FS. These datasets were obtained from the UCI machine learning repository, OPENML, and Arizona State University (ASU) [81,82,83,84,85]. The details can be found in Table 1, which mainly describes the instances, features, classes, and sources.
5.2. Experimental Design
5.2.1. Evaluation Criteria
To assess the efficacy of the BGEPWO algorithm for FS, comparative experiments were executed on the BGEPWO algorithm, the BWO algorithm, and the binary versions of ten metaheuristic algorithms. Based on the criteria of being widely cited, well researched, classic, or recently proposed [86], we selected ten metaheuristic algorithms and used their binary versions as comparison algorithms: BABC, BPSO, BBA, BWOA, BKOA, BSSA, BNOA, BHHO, BCPO, and BCOA.

The evaluation indicators for the comparative experiments are the fitness function value, the feature subset size, and the F1-score. In addition, a Wilcoxon rank-sum test at the 5% significance level is used to verify statistical significance. The assessment metrics are presented as follows:
1. Fitness function value

As shown in Equation (31), $Accuracy$ signifies the classification accuracy of the selected feature subset, calculated by the following equation:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$, $TN$, $FP$, and $FN$ represent true positives, true negatives, false positives, and false negatives, respectively.

2. Number of selected features

It denotes the quantity of features selected during the FS process.
3. F1-score

It is a blend of precision and recall that is utilized for evaluating the quality of predictive models. The formulae of precision and recall are as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

The F1-score is the harmonic mean of recall and precision. For binary classification problems, the F1-score is determined using the formula below:

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

In multiclass classification problems, the Macro_F1-score is used as an evaluation indicator [19]. For an N-classification problem, the calculation formula is

$$Macro\_F1\text{-}score = \frac{1}{N}\sum_{i=1}^{N} F1_i$$

where $F1_i$ represents the F1-score for the $i$th category.
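For reference, the metrics above can be computed from raw predictions as in this illustrative sketch, which derives accuracy and the Macro_F1-score from one-vs-rest counts of TP, FP, and FN per class.

```python
import numpy as np

def evaluate_predictions(y_true, y_pred, n_classes):
    """Compute accuracy and Macro_F1 from predictions using one-vs-rest
    counts of TP, FP, and FN per class, mirroring the formulae above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return accuracy, float(np.mean(f1_scores))
```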
In addition, a Wilcoxon rank-sum test at the 5% significance level is applied to verify whether there is a significant difference between BGEPWO and the competing algorithms for feature selection on each dataset.
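One way to run this test over per-run fitness samples is SciPy's rank-sum implementation, sketched below; the +/=/− labeling mirrors the convention used in Table 7, and orienting the sign by comparing means is our assumption.

```python
from scipy.stats import ranksums
import numpy as np

def compare_runs(fit_bgepwo, fit_other, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test on per-run fitness samples
    (e.g., 20 independent runs per algorithm); p < alpha means significant."""
    stat, p_value = ranksums(fit_bgepwo, fit_other)
    if p_value >= alpha:
        return "="                          # no statistically significant gap
    # Fitness is minimized, so the lower mean indicates the better algorithm
    return "+" if np.mean(fit_bgepwo) < np.mean(fit_other) else "-"
```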
5.2.2. Experimental Configuration
The experiments were conducted under identical environmental settings on a computer with an Intel Core i7 2.8 GHz CPU and 16 GB of RAM, running the Windows 10 operating system. All experiments were implemented in MATLAB 2023a. KNN (k = 10) was used as the classifier [87].
In this study, we performed two sets of trials. The first set selects the most suitable parameter value for FS with the BWO and BGEPWO algorithms. The second set contrasts the effectiveness of BGEPWO with BWO and the comparison algorithms on the datasets.
5.3. Parameter Settings
We conducted trials on three datasets of different dimensions, Sonar, SPECT, and Hill Valley, to select the optimal parameter settings for the BGEPWO and BWO algorithms during FS. We examined four values for the proportion of male walruses in a herd, setting the maximum number of iterations to 200, the population size to 20, and the number of independent runs to 20. The experiment was analyzed based on the mean fitness values and convergence curves. Table 2 presents the mean fitness values of BWO and BGEPWO under the four values on the three datasets. Figure 6a,b displays the convergence curves of the BWO and BGEPWO algorithms, respectively, under the four values on the aforementioned datasets. The findings indicate that BWO and BGEPWO achieve their best results when the proportion is set to 0.3.
In the empirical assessment of each algorithm, the relevant parameters were set as depicted in Table 3. The parameter values were taken from the algorithms' original papers. To ensure fairness, the common settings for all algorithms are the same: 30 individuals in the population, a maximum of 200 iterations, and 20 independent runs.
5.4. Experimental Results and Discussions
In this section, the performance of BGEPWO was evaluated and compared with the original BWO and ten other representative metaheuristic algorithms, as shown in Table 4, Table 5, Table 6 and Table 7. Table 4 and Table 5 report the mean, standard deviation, and best value for each algorithm over 20 independent runs on the datasets, along with a ranking. The ranking is based on the mean of the evaluation criteria, and the best rankings are highlighted in bold.
Table 4 shows the performance of BGEPWO and the competing algorithms on the fitness function. Overall, BGEPWO achieved the best average fitness value on 18 of the 21 datasets, ranking second on the Arrhythmia and Micro Mass datasets and eighth on the Zoo dataset, indicating that BGEPWO exhibits strong FS ability. The standard deviations and best values show that BGEPWO performs consistently and attains the best optimum on the majority of the datasets. BGEPWO's mean rank in fitness value across all datasets was 1.4, the best among all evaluated algorithms, indicating a strong capability of searching the solution space.

In addition, the BGEPWO algorithm achieved better fitness values than the original BWO on all datasets, indicating that our improvements to the algorithm for FS applications are effective.
Figure 7 and Figure 8 show the convergence of these algorithms based on the fitness function on low- and high-dimensional datasets, respectively. The convergence curves exhibit that the BGEPWO algorithm demonstrates superior convergence capability among the competing algorithms on the 21 datasets. On the Wine, Dermatology, and SPECT datasets, the proposed algorithm converged quickly toward the optimal solution in the early iterations. On the Sonar, Isolet, Musk, Hill-Valley, and DLBCL datasets, the BGEPWO algorithm not only converged quickly in the early iterations but also showed strong convergence ability in the late iterations. The BGEPWO algorithm did not have the best early convergence on the Breast Cancer, Heart, Lymphography, Semeion, Ionosphere, Amazon, Arcene, Leukemia, Prostate, and Colon datasets, but its late convergence gave it superior performance compared with the other algorithms by the conclusion of the iterations. The proposed algorithm performed only slightly worse than BKOA on the Arrhythmia dataset, but its ability to continuously approach the global optimum in the later stages could still be observed. The BGEPWO algorithm also demonstrated excellent convergence ability on the Micro Mass dataset, though somewhat inferior to BPSO. On the Zoo dataset, the proposed BGEPWO performed poorly in both the early and late stages of iteration.
Table 5 shows the average number of features selected by each algorithm during the FS process, the average feature reduction rate, and the smallest number of selected features. The mean number of features chosen by BGEPWO over the 21 datasets ranked first among all algorithms. There are certain contradictions and trade-offs between accuracy and feature subset size in FS. In a general evaluation system, accuracy is the first consideration, and the quantity of selected features should be minimized on the basis of ensuring accuracy. The fitness function indicates that the lower its value, the higher the accuracy. Considering Table 4 and Table 5 together, the algorithms whose combined ranking of mean number of selected features was not worse than BGEPWO's had much worse average fitness values, which means that the algorithms that chose the fewest features achieved poorer accuracy.
Figure 9 and Figure 10 show the feature reduction rates obtained by all algorithms on low- and high-dimensional datasets, respectively. The BGEPWO algorithm had the best overall performance, with an average feature reduction rate of 67.07% across all datasets, although it selected the minimum number of features on only 5 datasets. Comprehensive analysis shows that on these 21 datasets, the BGEPWO algorithm selected a relatively small feature subset while ensuring the best relative classification ability, which proves that BGEPWO has strong feature reduction ability.
Table 6 shows the F1-scores procured in the FS process. BGEPWO achieved optimal performance on 16 datasets, marginally below BABC on the Amazon dataset, below the BCPO and BABC algorithms on the Isolet dataset, and below the BCPO and BPSO algorithms on the Musk dataset. The proposed algorithm achieved middling results on the Zoo and Lymphography datasets, ranking sixth and seventh, respectively. In addition, the BGEPWO algorithm outperformed BWO on all datasets except Zoo, indicating that the model using the improved BGEPWO algorithm is more robust. Across all datasets, BGEPWO had the best average ranking among all algorithms, signifying better performance and the ability to obtain better predictive classification models.
To measure the difference between the proposed BGEPWO and the competing algorithms from a statistical perspective, we applied a Wilcoxon rank-sum test at the 5% significance level. Table 7 displays the outcome of comparing the competing algorithms with BGEPWO, where the symbols +, =, and − indicate that the proposed BGEPWO performs significantly better than, similarly to, or worse than the compared algorithm, respectively. It can be seen from Table 7 that BGEPWO outperformed the comparison algorithms in the majority of cases. BGEPWO significantly outperformed the respective comparison algorithms on 19, 20, 18, 16, 19, 18, 20, 17, 19, 17, and 12 datasets and performed similarly to them on 2, 1, 2, 4, 1, 2, 0, 2, 1, 4, and 9 datasets. The BGEPWO algorithm was inferior to the BPSO algorithm only on the Micro Mass dataset, whereas on the Zoo dataset the proposed algorithm performed moderately and was worse than the BCOA, BCPO, BHHO, BKOA, BNOA, and BPSO algorithms.
The evaluation results indicate that BGEPWO is significantly superior to the original algorithm and the ten representative competing algorithms in terms of fitness function value, number of selected features, and F1-score. The proposed algorithm can fully search the solution space and find the optimal solution, thus achieving excellent performance on the fitness function and the selected features. The application of multiple strategies effectively improves the balance between exploration and exploitation, and the population regeneration mechanism enhances convergence ability; accordingly, the BGEPWO algorithm demonstrated strong convergence in the experiments. The use of the adaptive operator increases the stability of the algorithm, so the proposed algorithm achieves better F1-scores. In addition, the Wilcoxon rank-sum test findings at the 5% significance level indicate that BGEPWO is statistically significantly better than the comparison algorithms. The experimental results confirm that BGEPWO has excellent performance and feature reduction ability, as well as strong convergence ability, in feature selection applications.
Comparative experiments with the latest metaheuristic algorithms, such as KOA, NOA, CPO, and COA, showed that the original WO algorithm has certain advantages in handling feature selection tasks, and the proposed BGEPWO algorithm significantly outperforms these algorithms on all evaluation indicators. In addition, compared with recent and popular research dealing with feature selection on the same datasets, the BGEPWO algorithm achieved higher accuracy and performance than the MPPSO, HGSA (using KNN), and SDBA algorithms on 80%, 67%, and 80% of the datasets, respectively, indicating that the proposed algorithm is competitive and advantageous in feature selection applications.
5.5. Limitations of BGEPWO
Despite the experimental simulation results demonstrating the superiority of the suggested BGEPWO over some comparative algorithms in dealing with FS problems, there are still some limitations.
Compared with the competing algorithms, BGEPWO struggled to identify the smallest feature subset on the majority of datasets. Although subset size is not the most important indicator in FS, and blindly reducing it may seriously affect classification accuracy, it remains an important reference when dealing with FS problems. Therefore, a novel strategy may be employed to improve BGEPWO so that it selects fewer features while ensuring classification accuracy.
In addition, the algorithm does not always converge fastest in early iterations, although it exhibits robust convergence ability in later iterations. This indicates that the algorithm requires sufficient iterations to achieve impressive results. Consequently, the algorithm is well suited to settings that allow sufficient computational time for iteration and less suited to environments requiring rapid results.