1. Introduction
The huge amounts of data generated in many areas lead to the formation of high-dimensional datasets. Large amounts of data and massive numbers of attributes increase computational complexity and decrease the efficiency of data mining [1,2] and learning processes [3]. The large amount of data also leads to a problem in classification tasks known as the curse of dimensionality [2,4]. Feature selection (FS) is a crucial process for shrinking the size of a dataset by removing duplicated and useless features to enhance classification performance and/or reduce computational costs [2].
The FS process consists of four steps, as shown in Figure 1 [5]: subset generation, subset evaluation, checking a stopping criterion, and result validation. Subset generation is the process of searching for feasible solutions to generate candidate feature subsets for evaluation. Subset evaluation is then used to evaluate the newly generated subset by comparing it with the previous best one in terms of a certain evaluation criterion. The new subset may replace the previous best subset if the former is found to be better than the latter. The stopping criterion determines whether the feature selection process must be stopped, for example, upon achieving a certain minimum number of features or reaching a certain maximum number of iterations. Finally, result validation measures the accuracy of the result; it is responsible for evaluating the FS approach by validating the selected subset of features using a testing dataset.
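As a minimal illustration of how these four steps interact, the following Python skeleton sketches a generic wrapper-style FS loop; the function names generate, evaluate, stop, and validate are hypothetical placeholders, not components of any specific algorithm discussed in this paper.

```python
def feature_selection(generate, evaluate, stop, validate):
    """Skeleton of the four-step FS process (illustrative sketch only)."""
    best_subset, best_score = None, float("-inf")
    while not stop():
        candidate = generate()          # step 1: subset generation
        score = evaluate(candidate)     # step 2: subset evaluation
        if score > best_score:          # keep the new subset only if it is better
            best_subset, best_score = candidate, score
        # step 3: stop() checks the stopping criterion on each pass
    return validate(best_subset)        # step 4: result validation on a testing set
```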
The search algorithms that are commonly used to search for the best representative features in a dataset can be categorized into three different classes, namely, exhaustive, random, and heuristic search techniques [2,6]. An exhaustive search technique is time-consuming and, consequently, exhibits high costs since it investigates all possible solutions [2,7]. A random search technique generates a random subset of features from the dataset [8]. A heuristic search technique is faster than the former two search techniques since it allows the user to obtain a near-optimal subset by applying some heuristic behavior without having to check all possible feature subsets [9].
To evaluate the generated feature subsets, filter, wrapper, and hybrid approaches can be used [10]. The filter approach evaluates a new subset of features based on a certain ranking method to return the most important attributes without applying any learning algorithm [11]. The wrapper approach evaluates the chosen features based on the feedback from a learning algorithm, which takes, for example, classification accuracy into account [12]. The hybrid approach benefits from the advantages of both the filter and wrapper methods [13].
Metaheuristics are frequently used in the literature to solve optimization problems, especially those with broad search spaces, since they can find good solutions in a reasonable time [14]. This paper focuses on metaheuristic search strategies to deal with the FS optimization problem. Swarm intelligence (SI) algorithms are metaheuristic methods that have been broadly used to resolve a variety of optimization issues [10]. These algorithms imitate the social behaviors of animals, insects, or birds when seeking food. SI algorithms are prone to falling into local optima, especially when searching in a high-dimensional search space. Such algorithms require a balance between global search, i.e., exploration, and local search, i.e., exploitation, to discover promising areas in the search space and find a global optimum. The focus of this paper is to provide a heuristic solution to achieve this balance.
In this paper, three hybrid FS optimization approaches are proposed. The proposed approaches combine the Grey Wolf Optimization (GWO) [15,16] and Particle Swarm Optimization (PSO) [17,18] algorithms by applying GWO followed by PSO, and then each approach manipulates the solutions obtained by both algorithms in a different way. This combination aims to overcome the stuck-in-local-optima problem that might occur in GWO by leveraging PSO's wide search space exploration ability on the solutions obtained by GWO. The three proposed approaches share the name Improved Binary Grey Wolf Optimization (IBGWO), and each is abbreviated as IBGWO followed by its approach number. The work proposed in this paper adds significant extensions to the work proposed in [19].
The FS optimization problem considered in this paper is a binary problem since the features are represented as a bit string of 0’s and 1’s to reflect the non-selected and selected features, respectively [20,21]. Transfer functions (TFs) are the mechanisms used to transform continuous solutions into binaries [22]. Two types of TF, namely, S-shaped and V-shaped, are used in this work. The three proposed hybrid approaches are based on the binary versions of both PSO and GWO to resolve the FS optimization problem.
As noted above, each of the three proposed approaches applies GWO followed by PSO and then manipulates the solutions obtained by both algorithms in a different way. The first approach, IBGWO2, performs peer-to-peer (P2P) comparisons between the PSO solutions and the corresponding GWO solutions and keeps the fittest solutions for the next iteration. The second approach, IBGWO3, ranks all the solutions obtained from both GWO and PSO together and keeps the fittest ones. The third approach, IBGWO4, is like IBGWO3 in ranking all the solutions obtained from both GWO and PSO, but it additionally includes the initially generated solutions, selecting the optimal solutions from these three populations, i.e., those obtained by GWO, PSO, and the initial population.
The performances of the three approaches were evaluated and compared with each other, with the original binary PSO (BPSO) and binary GWO (BGWO), and with other state-of-the-art metaheuristics used for FS optimization in the literature. The comparisons were performed using nine high-dimensional, cancer-related human gene expression datasets. The evaluations were performed in terms of a set of evaluation metrics, namely, average classification accuracy, average number of selected features, average fitness values, average computational time, and computational complexity.
The results showed that the proposed S-shaped optimizer, IBGWO3-S, achieved more than 0.9 average classification accuracy while selecting the fewest number of features in most of the datasets. The proposed V-shaped optimizer named IBGWO4-V achieved more than 0.95 average classification accuracy while also selecting the fewest number of features in most of the datasets. The results also confirmed the superiority of the proposed IBGWO4-V approach when compared with the original BGWO and BPSO. Moreover, the results confirmed the superiority of the proposed IBGWO4-V approach over other state-of-the-art metaheuristics selected from the literature in most of the datasets. Finally, the computational complexity analysis of the three proposed approaches showed that the best approach in terms of classification accuracy, fitness value, and number of selected features had the highest computational complexity.
This paper is organized as follows. Related works are discussed in Section 2. Section 3 introduces GWO, PSO, and their binary versions. The details of the proposed approaches are provided in Section 4. The evaluation metrics are described in Section 5. The datasets and experimental setup are presented in Section 6. The evaluation results are discussed in Section 7. A discussion of the evaluation results is summarized in Section 8. Conclusions and future work directions are presented in Section 9. Finally, Appendix A lists all abbreviations used throughout the paper.
2. Related Works
Metaheuristic algorithms are broadly employed to resolve optimization issues [23]. Metaheuristics are frequently used to determine the appropriate subset of features in a reasonable time. Such methods have been grouped into several classes in [24], including physics-based [25,26], social-based [27], music-based, chemistry-based, sport-based, math-based, biology-based [28], and swarm-based [20] metaheuristics.
Since the focus of this work is on swarm-based metaheuristic optimizers, a more detailed listing of such metaheuristics is provided as follows. Examples of swarm-based metaheuristics include PSO [17,18,20,29], Honey Bee Mating Optimization (HBMO) [30], Whale Optimization Algorithm (WOA) [31], Harris Hawks Optimizer (HHO) [32], Dragonfly Algorithm (DA) [33], Ant Colony Optimization (ACO) [34,35], Ant Lion Optimizer (ALO) [36,37,38], GWO [15,16,21,39,40,41], Bacterial Foraging Optimization (BFO) [42], Spotted Hyena Optimization (SHO) [43], and many others.
Many researchers have proposed hybrid metaheuristics to improve the exploration and exploitation aspects of searching techniques and avoid local optima. For example, a hybrid firefly algorithm with a mutation operator was used for FS optimization to detect intrusions in a network [44]. This hybrid approach was utilized to choose the best features without becoming stuck in local optima. A hybrid algorithm that combines WOA with the flower pollination algorithm for feature selection to detect spam emails was proposed in [45].
Genetical Swarm Optimization (GSO), a combination of Genetic Algorithms (GAs) and PSO, was proposed in [46]. This technique was used to optimize the FS problem for digital mammogram datasets. Another hybrid approach, proposed in [47], combined GWO and WOA to create a wrapper-based FS method. A hybrid model called SHOSA, which combined SHO with Simulated Annealing (SA) for FS optimization, was proposed in [43]. The findings indicated the superiority of SHOSA in finding the best features. A hybrid algorithm called MBA–SA, which combined the Mine Blast Algorithm (MBA) with SA to optimize the FS problem, was presented in [48].
A binary algorithm that combined GWO with HHO to select the optimal features from the datasets was proposed in [49]. This hybrid algorithm is called HBGWOHHO. The results indicated that the proposed algorithm outperformed some well-known FS optimization algorithms. A hybrid algorithm based on PSO to boost SHO’s hunting methodology was proposed in [50]. The findings showed that the proposed algorithm outperformed four metaheuristic algorithms when tested on thirteen well-known benchmark functions, including unimodal and multimodal functions.
An improved binary GWO algorithm was proposed in [51]. This work attempted to improve the accuracy of the classification process relative to the original BGWO algorithm. The improved BGWO algorithm was developed by tuning a set of parameters in the original GWO algorithm to achieve an acceptable balance between the exploration and exploitation capabilities of the original GWO. A set of approaches based on this sensitivity analysis were implemented and compared against each other. This work is similar to ours in the sense that it aimed to solve the GWO exploration-exploitation balance problem. However, our approach combines the binary PSO algorithm with the binary GWO algorithm instead of implementing a sensitivity analysis like the one proposed in that work.
An improved BPSO algorithm was proposed in [52]. This work aimed to enhance the exploration ability of the original PSO by modifying its population search process. This modification was based on applying two trained surrogate models, where each model was used to approximate the fitness values of the individuals in a separate sub-population. This work is also similar to ours in its aim to enhance the exploration capability of the original BPSO algorithm. However, the main objective of our work is different in the sense that it attempts to combine the BGWO and BPSO algorithms to enhance the exploration capability of the original BGWO algorithm by leveraging the original BPSO algorithm.
A multi-objective differential evolutionary algorithm for optimizing FS was proposed in [53]. This algorithm was designed to provide a tradeoff between the diversity and convergence of non-dominated solutions. Like our proposed work, this work addressed the FS optimization problem. However, it did not include SI in its proposed mechanism.
An improved binary pigeon-inspired optimizer was proposed in [54] and applied to the FS optimization problem. This optimizer was proposed to overcome the stuck-in-local-optima problem of the original binary pigeon-inspired optimizer [55]. The objective of this work is similar to that of our proposed approaches in the sense that it tried to overcome the effect of local optimal solutions on the optimization process. Our work differs in applying the improvement to the BGWO algorithm by combining it with the BPSO algorithm.
A hybrid SI approach that combined the BGWO and BPSO algorithms was proposed in [56]. This approach is very similar to our work since it combined the same two metaheuristics applied in our work. Like our work, the objective of this study was to provide a balance between exploration and exploitation of the search space. However, the way of combining the two metaheuristics, i.e., GWO and PSO, differs from our proposed work since the two approaches were used in an alternating fashion, where BGWO was applied in the even iterations while BPSO was applied in the odd ones. The approaches proposed in this paper combine the BGWO and BPSO algorithms together in each iteration, which, logically speaking, results in lower classification errors and better fitness values.
A recent hybrid swarm intelligence metaheuristic optimization approach was proposed in [57] to optimize the FS process. This work combined the BPSO, BGWO, and tournament selection [58] algorithms. Like our work, the objective of this combination was to provide a balance between exploration and exploitation of the solution space to achieve a near-global optimal solution.
Another recent hybrid SI metaheuristic optimization approach was proposed in [59] to optimize the FS problem. This work combined ACO with GWO to enhance the exploration power of GWO while preserving its strong exploitation characteristic. This matches the objective of our work, with the difference that GWO is combined with ACO instead of PSO.
In general, no single algorithm can resolve all FS optimization issues across all datasets [60]. Therefore, there is always a strong need to design novel algorithms, approaches, and techniques to address the FS optimization problem. This emphasizes the significance of the work proposed in this paper.
3. Background
- a.
Particle Swarm Optimization
Kennedy and Eberhart proposed the original PSO algorithm in 1995 [17]. It imitates the social interactions of flocking birds when seeking food [17,61]. PSO is an SI metaheuristic algorithm that explores the problem space for the optimal solution using a set of particles called a swarm. The position of a particle represents a probable solution in the problem space, and the velocity is utilized to adjust the particle to move at a particular speed and direction during the search process. The new velocity of each particle, v_new, is updated as shown in the following equation:

$v_{new} = iw \cdot v_{old} + c_1 \cdot rand_1 \cdot (pbest - x_{old}) + c_2 \cdot rand_2 \cdot (gbest - x_{old})$ (1)
where v_old is the particle’s old velocity and x_old is the particle’s old position; pbest and gbest are the personal and global best positions, respectively. rand_1 and rand_2 are two random numbers in the interval [0, 1]. The acceleration factors c_1 and c_2 refer to the particle’s confidence in itself and its neighbors, respectively. Both factors are set to 2, a generally accepted setting for most optimization problems, as mentioned in [17,62,63,64,65]. iw is an inertia weight that decreases linearly or non-linearly to regulate the exploration and exploitation phases during the search process [66,67,68,69]. A new position, x_new, for each particle is calculated by the following equation:

$x_{new} = x_{old} + v_{new}$ (2)
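As a hedged illustration of Equations (1) and (2), the following minimal Python sketch performs one PSO update step for a single particle; the default c1 = c2 = 2 follows the description above, while the use of 1-D NumPy arrays is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def pso_update(x_old, v_old, pbest, gbest, iw, c1=2.0, c2=2.0):
    """One PSO update step per Equations (1) and (2) (illustrative sketch)."""
    rand1 = np.random.rand(*x_old.shape)  # random numbers in [0, 1]
    rand2 = np.random.rand(*x_old.shape)
    # Equation (1): inertia term + cognitive term + social term
    v_new = iw * v_old + c1 * rand1 * (pbest - x_old) + c2 * rand2 * (gbest - x_old)
    # Equation (2): move the particle
    x_new = x_old + v_new
    return x_new, v_new
```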
- b.
Grey Wolf Optimization
GWO is a metaheuristic algorithm that was proposed by Mirjalili et al. [15] to resolve optimization problems in different fields. It simulates the hunting technique of grey wolves in detecting prey positions by applying an intelligent searching methodology. The social hierarchy of the grey wolves is based on the hunting skills of the individuals. This hierarchy classifies the wolves into four types: α, β, δ, and ω. In the GWO algorithm, the initial grey wolves are set randomly. Each wolf represents a candidate solution in the problem space. The best solutions are considered as α, β, and δ, while the remaining candidate solutions are considered as ω. The hunting techniques of the grey wolves are explained in more detail in [15,16]. These techniques are summarized mathematically by the following set of equations:

$D_\alpha = |C_1 \cdot X_\alpha - X|, \quad D_\beta = |C_2 \cdot X_\beta - X|, \quad D_\delta = |C_3 \cdot X_\delta - X|$ (3)–(5)

$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta$ (6)–(8)

$X(t+1) = (X_1 + X_2 + X_3)/3$ (9)

where X(t+1) is the new location of each wolf in the pack, depending on the three best locations X_α, X_β, and X_δ. This is determined by the distance D between each wolf and the locations of α, β, and δ. The two coefficient vectors A and C are calculated by the following two equations:

$A = 2a \cdot r_1 - a$ (10)

$C = 2 \cdot r_2$ (11)
where r_1 and r_2 are random vectors whose values are in the interval [0, 1]. Vector a is decreased linearly during the iterations from 2 to 0 and is calculated using the following equation:

$a = 2 - 2 \cdot (currIter / maxIter)$ (12)

where a is a vector that tunes the exploration and exploitation of the GWO algorithm. It is linearly reduced from 2 to 0 over the iterations, depending on the current iteration, currIter, and the maximum number of iterations, maxIter [15,16].
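The following minimal Python sketch applies Equations (3)–(12) to update one wolf's position from the three leaders; it is an illustrative reading of the equations above, assuming 1-D NumPy position vectors, and not the authors' exact implementation.

```python
import numpy as np

def gwo_update(x, x_alpha, x_beta, x_delta, curr_iter, max_iter):
    """One GWO position update per Equations (3)-(12) (illustrative sketch)."""
    a = 2.0 - 2.0 * curr_iter / max_iter          # Equation (12): decay from 2 to 0
    dim = x.shape[0]
    x_new = np.zeros(dim)
    for leader in (x_alpha, x_beta, x_delta):
        A = 2.0 * a * np.random.rand(dim) - a     # Equation (10)
        C = 2.0 * np.random.rand(dim)             # Equation (11)
        D = np.abs(C * leader - x)                # Equations (3)-(5)
        x_new += leader - A * D                   # Equations (6)-(8)
    return x_new / 3.0                            # Equation (9): average of X1, X2, X3
```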
GWO offers a natural transition from exploration to exploitation through its adaptive encircling and shrinking mechanisms. However, GWO’s exploration can be limited in certain cases because the wolves tend to converge prematurely around the best solutions, i.e., α, β, and δ, thus reducing diversity in later stages of the search. On the other hand, PSO includes mechanisms that help escape local optima. Its velocity update formula incorporates both cognitive and social components, with random coefficients adding stochasticity to the search. This enables particles to explore new regions even after they cluster around promising solutions. Therefore, the combination of GWO and PSO is motivated by the need to overcome the limitations of GWO in maintaining exploration and diversity while leveraging PSO’s ability to escape local optima and refine solutions.
- c.
Binary Optimization Algorithms for the FS Problem
FS can be considered a binary problem where all features in a dataset can be represented by a series of zeros and ones, where zeros represent the unselected features and ones represent the selected features. According to Mirjalili and Lewis [22], the conversion from continuous solutions to their corresponding binary solutions is based on two types of TFs, namely, S-shaped and V-shaped. The mathematical formulas of these TFs are shown in [22,33]. The binary versions of both PSO and GWO are discussed in [37,62]. S2-TF and V2-TF, defined in Equations (13) and (14), are utilized in this work to convert the attributes into binaries. The two equations are as follows:

$S2(x) = 1/(1 + e^{-x})$ (13)

$V2(x) = |\tanh(x)|$ (14)
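A hedged sketch of how Equations (13) and (14) can drive binarization is shown below; the stochastic rules used here (sampling the bit for the S-shaped TF, flipping the previous bit for the V-shaped TF) are common conventions from [22], and the NumPy realization is illustrative rather than the authors' exact implementation.

```python
import numpy as np

def s2_binarize(x_cont):
    """S2 TF, Equation (13): set the bit to 1 with probability S2(x) (sketch)."""
    prob = 1.0 / (1.0 + np.exp(-x_cont))
    return (np.random.rand(*x_cont.shape) < prob).astype(int)

def v2_binarize(x_cont, x_bin_old):
    """V2 TF, Equation (14): flip the previous bit with probability V2(x) (sketch)."""
    prob = np.abs(np.tanh(x_cont))
    flip = np.random.rand(*x_cont.shape) < prob
    return np.where(flip, 1 - x_bin_old, x_bin_old)
```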
4. Proposed Approaches
An optimization problem entails determining the most effective and optimal solution from all possible solutions. The FS process is an optimization problem that aims to obtain the best and most optimized features in a given dataset. Four different SI approaches are described in this section as wrapper-based FS methods. These approaches are used to identify the most significant feature subsets that optimize the classification task. Searching strategies must provide a balance between the exploration and exploitation phases to identify the optimum solution. Throughout this paper, the PSO algorithm, with its powerful exploration potential, is combined with the GWO algorithm, which has a good exploitation capability. This combination is employed in four different approaches to increase search efficiency and decrease the chances of falling into a local optimum. In this work, the continuous solutions are converted to binary solutions using the S2 and V2 TFs mentioned in [22,37].
The GWO algorithm has a good exploitation ability; however, new wolves may drop into a local optimum [70]. According to the GWO algorithm’s searching strategy, the new wolves in the pack follow the three best solutions, α, β, and δ. These best solutions may not be the optimal ones, particularly when the search is in a high-dimensional space. This increases the probability of being trapped in a local optimum. To improve GWO’s searching capability and the accuracy of its solutions, GWO is combined with PSO. This combination is proposed and implemented in the form of four different approaches, which are explained in more detail in the following subsections.
- a.
IBGWO1
The first approach is named Improved Binary Grey Wolf Optimization 1 (IBGWO1). This approach was previously proposed in [19] and is presented here for completeness and comparison with the three approaches newly proposed in this paper. Figure 2 shows the flowchart of IBGWO1. As shown in the figure, IBGWO1 applies GWO followed by PSO to obtain a new population in each iteration. The initial population of GWO is generated randomly, and GWO then improves the solutions of the obtained population according to its algorithm. The improved solutions of GWO are used as the initial population of the PSO algorithm, which, in turn, improves the solutions based on its algorithm. These steps are repeated until either the optimal solution or the maximum number of iterations is reached.
- b.
IBGWO2
As previously described in the IBGWO1 approach, GWO’s enhanced population serves as the initial population of the PSO algorithm. Some of GWO’s generated solutions may have higher fitness than PSO’s generated solutions; however, these robust solutions are overwritten after the PSO algorithm is applied. Therefore, IBGWO2 employs P2P comparisons to preserve the best solutions. The P2P comparisons compare GWO’s solutions to those of their PSO peers, and the fittest solutions are stored. In this approach, the enhanced GWO population is saved before performing PSO. Then, P2P comparisons between GWO’s solutions and PSO’s solutions are performed. These steps are repeated if the stopping criteria are not met.
Algorithm 1 presents the IBGWO2 pseudo-code, while Figure 3 illustrates its flowchart. In lines 1 to 6, the maximum number of iterations, T, and the population size, P, are initialized. Afterwards, the positions are generated randomly and all GWO and PSO parameters are initialized. In lines 9 to 17, the solutions are updated according to the GWO algorithm and the new positions are saved in a new population. The new population obtained by GWO is then updated according to the PSO equations and another new population is obtained, as shown in lines 18 to 24. In lines 25 to 30, P2P comparisons are performed and the best solutions are kept for the next iteration. These steps are repeated until either the optimal solution or the maximum number of iterations is achieved.
Algorithm 1: IBGWO2
1: Start
2: Set the maximum number of iterations T
3: Set the population size P
4: Initialize the individuals’ positions and velocities
5: Initialize the GWO parameters a, A, C
6: Initialize the PSO parameters c1 and c2
7: t ← 1
8: While t < T do
9:   Calculate individuals’ fitness values
10:  Find the best three solutions Xα, Xβ, and Xδ
11:  p ← 1
12:  For p < P do
13:   Update GWO individuals’ positions using Equations (3)–(9)
14:   Convert the new individuals’ positions to binary using Equations (13) and (14)
15:   Update a, A, C using Equations (10)–(12)
16:  End For
17:  Save the GWO population
18:  p ← 1
19:  For p < P do
20:   Find pbest
21:   Update individuals’ velocities using Equation (1)
22:   Update individuals’ positions using Equation (2)
23:   Convert the new individuals’ positions to binary using Equations (13) and (14)
24:  End For
25:  Save the PSO population
26:  p ← 1
27:  For p < P do
28:   Compare each GWO solution with its corresponding PSO solution
29:   Save the fittest solutions for the next iteration
30:  End For
31:  t++
32: End While
33: Return the best solution
34: End
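The selection step in lines 26–30 can be sketched as follows; this is a minimal illustration assuming a user-supplied fitness function to be minimized, as in Equation (15), not the exact implementation.

```python
def p2p_select(gwo_pop, pso_pop, fitness):
    """IBGWO2 selection sketch: for each index p, keep whichever of the
    GWO or PSO solution has the better (lower) fitness value."""
    next_pop = []
    for gwo_sol, pso_sol in zip(gwo_pop, pso_pop):
        next_pop.append(gwo_sol if fitness(gwo_sol) <= fitness(pso_sol) else pso_sol)
    return next_pop
```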
- c.
IBGWO3
Some strong solutions may be omitted due to the peer comparisons applied in IBGWO2. Therefore, IBGWO3 employs ranking to retain the good solutions that might otherwise be lost. The solutions, whether generated by GWO or PSO, are ranked based on their fitness values, and the fittest solutions are kept and used in the next iteration. These solutions may be obtained from GWO only, PSO only, or both optimizers. Algorithm 2 presents the pseudo-code of IBGWO3, while Figure 4 illustrates its flowchart. The same steps implemented in IBGWO2 are also implemented in IBGWO3, except for the steps in lines 19 and 20. In these lines, all solutions obtained by both GWO and PSO are ranked together based on their fitness values and the fittest solutions are retained for the next iteration.
Algorithm 2: IBGWO3
1: Start
2: Set the maximum number of iterations T
3: Set the population size P
4: Initialize the individuals’ positions and velocities
5: Initialize the GWO parameters a, A, C
6: Initialize the PSO parameters c1 and c2
7: t ← 1
8: While t < T do
9:   p ← 1
10:  For p < P do
11:   Update the GWO positions as in IBGWO2
12:  End For
13:  Save the GWO population
14:  p ← 1
15:  For p < P do
16:   Update the PSO positions as in IBGWO2
17:  End For
18:  Save the PSO population
19:  Rank the GWO and PSO populations
20:  Save the fittest solutions for the next iteration
21:  t++
22: End While
23: Return the best solution
24: End
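The ranking step in lines 19 and 20 can be sketched as follows, again as a minimal illustration assuming a fitness function to be minimized.

```python
def rank_select(gwo_pop, pso_pop, fitness, pop_size):
    """IBGWO3 selection sketch: pool all GWO and PSO solutions, sort the
    pool by fitness (lower is better), and keep the best pop_size solutions."""
    pool = list(gwo_pop) + list(pso_pop)
    pool.sort(key=fitness)
    return pool[:pop_size]
```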
- d.
IBGWO4
The IBGWO4 approach augments IBGWO3 with an additional step in which it ranks the solutions of three populations based on their fitness values to find the best solutions. Algorithm 3 lists the pseudo-code of IBGWO4, while Figure 5 illustrates its flowchart. The ranking is applied to the solutions generated by both PSO and GWO, in addition to the initial population used in each iteration. The best solutions are kept as the initial population for the next iteration. Again, the same steps are implemented as those in IBGWO3, except for the step in line 19. In this line, the ranking is performed over the three populations, i.e., the initial population, the GWO population, and the PSO population, and the best solutions are retained for the next iteration.
Algorithm 3: IBGWO4
1: Start
2: Set the maximum number of iterations T
3: Set the population size P
4: Initialize the individuals’ positions and velocities
5: Initialize the GWO parameters a, A, C
6: Initialize the PSO parameters c1 and c2
7: t ← 1
8: While t < T do
9:   p ← 1
10:  For p < P do
11:   Update the GWO positions as in IBGWO2
12:  End For
13:  Save the GWO population
14:  p ← 1
15:  For p < P do
16:   Update the PSO positions as in IBGWO2
17:  End For
18:  Save the PSO population
19:  Rank the initial, GWO, and PSO populations
20:  Save the fittest solutions for the next iteration
21:  t++
22: End While
23: Return the best solution
24: End
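The three-population ranking in line 19 can be sketched analogously; again, this is an illustrative sketch, not the exact implementation.

```python
def rank_select_three(init_pop, gwo_pop, pso_pop, fitness, pop_size):
    """IBGWO4 selection sketch: pool the iteration's initial population with
    the GWO and PSO populations, then keep the best pop_size solutions
    by fitness (lower is better)."""
    pool = list(init_pop) + list(gwo_pop) + list(pso_pop)
    pool.sort(key=fitness)
    return pool[:pop_size]
```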
5. Evaluation Metrics
The objective function, which is used to evaluate the different solutions, combines the classification accuracy and the number of selected features. The near-optimal solution is the one that exhibits the highest classification accuracy, i.e., the lowest classification error, and the smallest number of features. The fitness function that evaluates each obtained solution is computed as follows:

$Fit = v \cdot ClassErr + (1 - v) \cdot |q|/|Q|$ (15)

where ClassErr is the classification error provided by any classifier, such as K-Nearest Neighbor (KNN) [71], which was used in this study; |q| is the number of selected features; and |Q| is the number of all features in the dataset. The parameter v ∈ [0, 1] controls the relative weights of the classification error and the feature subset length when computing the fitness of the solution, as mentioned in [41,72].
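A hedged Python sketch of Equation (15) follows; the scikit-learn KNN, the choice of k = 5, and the weight v = 0.99 are illustrative assumptions rather than the exact experimental configuration, while the 80%/20% split mirrors the data splitting described later in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def fitness(solution, X, y, v=0.99):
    """Equation (15) sketch: Fit = v * ClassErr + (1 - v) * |q| / |Q|.
    `solution` is a 0/1 vector over the |Q| features of dataset X."""
    selected = np.flatnonzero(solution)
    if selected.size == 0:          # no features selected: worst possible fitness
        return 1.0
    X_sel = X[:, selected]
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.2, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    class_err = 1.0 - knn.score(X_te, y_te)           # ClassErr from the classifier
    return v * class_err + (1.0 - v) * selected.size / X.shape[1]
```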
The proposed approaches were evaluated based on a set of metrics, namely, average classification accuracy (ACA), average number of selected features (ASF), average fitness value (AFV), and average computational time (ACT). The mathematical formulas of these metrics are given in Equations (16)–(19), respectively.
The ACA evaluates the classifier’s prediction ability using the group of features that is selected. It is defined by the following equation:

$ACA = \frac{1}{R} \sum_{k=1}^{R} \frac{1}{S} \sum_{i=1}^{S} match(Pred_i, Act_i)$ (16)

where R is the number of runs, S is the number of instances, and Pred_i and Act_i are the predicted and actual classes, respectively; match equals 1 when the two classes agree and 0 otherwise. The ASF evaluates how well each proposed approach performs in terms of the number of relevant features that are selected when tackling the FS issue. It is defined by the following equation:

$ASF = \frac{1}{R} \sum_{k=1}^{R} \frac{f_k}{F}$ (17)

where R is the number of runs, f_k is the number of features selected in run k, and F is the total number of features in the dataset. The AFV illustrates the average fitness value obtained over a set of runs. AFV is defined by the following equation:

$AFV = \frac{1}{R} \sum_{k=1}^{R} Fit_k$ (18)

where R is the number of runs and Fit was previously defined in Equation (15). Finally, the ACT is calculated using the following equation, where Time_k is the total computational time of run k and R is the number of runs:

$ACT = \frac{1}{R} \sum_{k=1}^{R} Time_k$ (19)
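The four averages in Equations (16)–(19) can be computed from per-run records as in the following sketch; the record field names are hypothetical, and the predicted/actual labels are assumed to be NumPy arrays.

```python
import numpy as np

def average_metrics(runs):
    """Sketch of Equations (16)-(19). `runs` is a list of per-run records:
    dicts holding predicted/actual label arrays, the selected-feature count,
    the total feature count, the final fitness, and the elapsed time."""
    aca = np.mean([np.mean(r["pred"] == r["act"]) for r in runs])         # Eq. (16)
    asf = np.mean([r["num_selected"] / r["num_features"] for r in runs])  # Eq. (17)
    afv = np.mean([r["fitness"] for r in runs])                           # Eq. (18)
    act = np.mean([r["time"] for r in runs])                              # Eq. (19)
    return aca, asf, afv, act
```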
7. Simulation Results
A set of comparisons is introduced in this section to evaluate the performance of the proposed approaches. Firstly, the evaluation results of the proposed approaches based on the S-shaped TF are compared with the original BGWO and BPSO algorithms. Secondly, the results of the proposed approaches based on the V-shaped TF are compared with the original BGWO and BPSO algorithms. Finally, the best-performing S-shaped and V-shaped approaches are compared, first with eight state-of-the-art FS optimizers and then with a recent hybrid FS optimizer that combines the BGWO and BPSO algorithms.
The classification accuracy of the proposed approaches was measured using the KNN classifier [72]. KNN is one of the most widely used classifiers and is employed to evaluate and compare the performance of different feature selection algorithms [12]. This is due to its simplicity, ease of implementation, and low computational complexity when compared with other classifiers. The authors would like to note that other common classifiers could also be employed but are deferred to a future extension of this work.
- a.
Evaluating the proposed approaches based on the S-shaped TF
The proposed approaches based on the S-shaped TF were compared against the original BGWO and BPSO algorithms. These comparisons are illustrated in terms of ACA, ASF, and AFV, as shown in Figure 6, Figure 7 and Figure 8, respectively. Figure 9 shows a statistical analysis of the proposed approaches based on the S-shaped TF in terms of the same three evaluation metrics. This figure shows the mean and overall ranks of each approach across all datasets under consideration. The rank is generally a number from 1 to 6 since there are six approaches compared against each other, where 6 indicates the highest-ranked approach. Additionally, Figure 9 shows the number of datasets in which each approach obtained the best rank.
According to the ACA evaluation results shown in Figure 6 and Figure 9, IBGWO3-S exceeded the other approaches in five datasets. IBGWO3-S achieved an average ACA value of 0.9 across all datasets. IBGWO4-S yielded the best performance in two datasets. It was expected that the IBGWO3-S and IBGWO4-S optimization approaches would result in the best solutions. The reason for this was that the IBGWO3-S approach took into consideration, in each iteration, the fittest of all populations that resulted from applying BGWO alone and from applying BGWO followed by BPSO. Also, IBGWO4-S added to this ranking process the initial population that resulted from the previous iteration. However, using the S-shaped TF was found to negatively affect the results of IBGWO4-S relative to the results of IBGWO3-S. In general, all proposed approaches were found to outperform the original BGWO-S and BPSO-S since the hybrid approaches combined the exploration and exploitation properties of both approaches.
Figure 7 shows that IBGWO1-S, IBGWO2-S, and IBGWO3-S achieved the best ASF results by selecting the fewest number of features in most of the datasets. As shown in Figure 9, the mean and overall ranks show that IBGWO1-S, IBGWO2-S, and IBGWO3-S achieved the best ranks and outperformed the other approaches in terms of selecting the most suitable features. IBGWO4-S was negatively affected by applying the S-shaped TF relative to the other three proposed approaches; however, it achieved better mean and overall ranks relative to the original BGWO-S and BPSO-S approaches.
Figure 8 and Figure 9 summarize the findings in terms of AFV, which combines the classification error and the number of selected features. Note that the AFV value must be minimized to improve the performance of the proposed optimizer. According to the results, IBGWO3-S achieved an average AFV value of 0.1 across all datasets. This optimizer provided the fittest outcomes when compared with all other optimizers, including the original BGWO-S and BPSO-S optimizers. This means that this approach was able to provide the most accurate balance between the minimum possible classification error and the minimum possible number of selected features in all datasets. Using the S-shaped TF negatively affected the IBGWO4-S optimizer in terms of the AFV metric.
Figure 10 shows the evaluation results of the proposed FS optimizers based on the S-shaped TF in terms of the ACT. As expected, the figure shows that the most complicated approach, i.e., IBGWO4-S, had the highest ACT, followed by IBGWO3-S and IBGWO2-S. The two latter approaches were less complex in the ranking process of the different solutions investigated in each iteration when compared to IBGWO4-S. This difference was more apparent for the 11_Tumors and 14_Tumors datasets than for the other datasets. The reason for this was that, as shown previously in Table 1, these two datasets contained the largest numbers of instances, features, and classes relative to the other datasets. This naturally increased the ACT of IBGWO4-S relative to the other two approaches. The original BPSO-S and BGWO-S had lower ACT than the three proposed approaches since the three proposed approaches were hybrids of the former two approaches.
Figure 11 and Figure 12 show the convergence curves for all FS optimizers based on the S-shaped TF over all iterations. IBGWO3-S had the best performance because the other approaches converged early and then became stuck at local optima. IBGWO3-S exhibited good behavior and obtained the lowest fitness values for most of the datasets. Since the fitness value combines the ratio of the number of selected features to the overall number of features with the classification error, as shown in Equation (15), the solution with the smallest ratio and the smallest error is the fittest. The convergence curves clarify that IBGWO3-S was the best optimizer based on the S-shaped TF. This means that IBGWO3-S had a high tuning ability for local and global searches, which allowed it to achieve the global optimal solution. Moreover, it had a high ability to escape areas of local optima and search for the most promising area until it found the best global solution.
- b.
Evaluating the proposed approaches based on the V-shaped TF
The proposed approaches based on the V-shaped TF were compared against the original BGWO and BPSO algorithms. These comparisons are illustrated in terms of ACA, ASF, and AFV, as shown in Figure 13, Figure 14 and Figure 15, respectively. Figure 16 shows a statistical analysis of the proposed approaches based on the V-shaped TF in terms of the same three evaluation metrics.
As shown in Figure 13 and Figure 16, the ACA results show the superiority of IBGWO4-V compared with the other approaches. It achieved the best ACA for six datasets, with five of them having an ACA equal to 1. IBGWO4-V achieved an average ACA value of 0.95 across all datasets. Moreover, IBGWO1-V obtained the best results for four datasets. In general, it was expected that IBGWO4-V would outperform the other approaches since it considered, in each iteration, the ranking of the population that resulted from the previous iteration, the population that resulted from applying BGWO, and the population that resulted from both BGWO and BPSO. It was expected that IBGWO2-V and IBGWO3-V would outperform IBGWO1-V; however, both were negatively affected by applying the V-shaped TF when compared with either BGWO-V or IBGWO4-V. This was not the case when using the S-shaped TF.
Figure 14 shows that IBGWO4-V achieved the best, i.e., lowest, ASF results when compared with the other approaches. Also, BGWO-V and IBGWO1-V, respectively, obtained the best results after IBGWO4-V. The same reasons apply as described previously when discussing Figure 13.
Figure 16 confirms the results obtained in Figure 14 and Figure 15. IBGWO4-V achieved an average AFV value of 0.05 across all datasets. This confirms that IBGWO4-V had the best ability of all the proposed approaches to scan the search space and achieve the global optimal solution without getting stuck in local optima.
Figure 17 shows the evaluation results of the proposed FS optimizers based on the V-shaped TF in terms of the ACT. This figure confirms the results obtained previously in Figure 10 and shows that using either S-shaped or V-shaped TFs did not affect the computational time of the proposed approaches.
Figure 18 and Figure 19 show the convergence curves for all FS optimizers based on the V-shaped TF over all iterations. The two figures show that, unlike many optimizers that get stuck at local optima in early iterations, IBGWO4-V could converge to the optimal solution. These curves show that IBGWO4-V explored the search space before exploiting the optimal solution.
In summary, the results shown in Figures 6–19 confirm the importance of selecting a suitable TF to increase the performance of the optimizers. Moreover, the good performance of an optimizer reflects a strong searching ability that avoids being stuck in a local optimum by maintaining a good balance between global and local searching. Overall, IBGWO3-S and IBGWO4-V were the fittest optimizers based on the S-shaped and V-shaped TFs, respectively.
- c.
Comparison with state-of-the-art metaheuristics
As confirmed in the previous two subsections, IBGWO3-S and IBGWO4-V outperformed the other approaches based on S-shaped and V-shaped TFs, respectively. This subsection presents comparisons of these two approaches with eight state-of-the-art metaheuristic-based FS optimizers. The metaheuristics used in the comparisons were as follows: Binary Whale Optimization Algorithm (BWOA) [31], Binary Dragonfly Algorithm (BDA) [33], Binary Ant Lion Optimization (BALO) [36,38], Binary Gravitational Search Algorithm (BGSA) [26], Binary Teaching–Learning-Based Optimization (BTLBO) [27], Binary Harris Hawk Optimization (BHHO) [32], Binary Bat Algorithm (BBA) [77], and Binary Salp Swarm Algorithm (BSSA) [78].
These state-of-the-art feature selection optimizers were selected for comparison because they have previously been applied to the same research objective, i.e., optimizing the feature selection process. The next subsection presents the comparison of our proposed approaches with a recent, advanced optimizer that combines binary PSO and binary GWO with a tournament selection strategy. The authors also plan to compare the proposed approaches with a set of more advanced metaheuristic methods, such as those proposed in [51,52,58,59] and other works that combine GWO with PSO, in a future extension of this work.
Figure 20 shows a statistical analysis of the comparison between IBGWO3-S and IBGWO4-V on one side and the eight selected state-of-the-art metaheuristic-based FS optimizers on the other side. This comparison was made in terms of the same three evaluation metrics, i.e., ACA, ASF, and AFV. Figure 20 shows the mean and overall ranks of each approach across all datasets under consideration. The rank is generally a number from 1 to 10 since ten approaches were compared against each other, where 10 indicates the highest-ranked approach. Additionally, Figure 20 shows the number of datasets in which each approach obtained the best rank.
Figure 20 shows that IBGWO4-V achieved the best ACA results in nine datasets. The mean and overall rank values showed the superiority of IBGWO4-V over the other optimizers, followed by IBGWO3-S. In terms of ASF, IBGWO4-V also outperformed the other optimizers in all datasets. Additionally, the mean and overall rank ASF results showed that IBGWO4-V attained the best rank compared to the other optimizers, followed by IBGWO3-S. In terms of AFV, the figure shows the superiority of IBGWO4-V in seven datasets. The mean and overall rank AFV results revealed that IBGWO4-V and IBGWO3-S also outperformed the other optimizers. This confirms the potential of these two proposed approaches to jump out of areas that contain local optima and orient themselves towards the most promising areas of relevant features.
Figure 21 and Figure 22 show the convergence curves for all ten optimizers over all iterations. The two figures illustrate that IBGWO4-V outperformed the other optimizers in terms of fitness values. The convergence curves reveal the ability of this optimizer to avoid local optima and premature convergence. If an optimizer cannot make a stable trade-off between exploration and exploitation, then it will converge to local optima, which leads to premature convergence.
- d.
Comparison with a recent metaheuristic method
The best two proposed approaches, i.e., IBGWO3-S and IBGWO4-V, were compared with one of the recent metaheuristic FS optimizers that enhanced the performance of BGWO [57]. This approach, named Hybrid Tournament Grey Wolf Particle Swarm (HTGWPS), combined the BPSO, BGWO, and tournament selection (TS) [58] optimizers. A set of S-shaped and V-shaped binary transfer functions were used to develop different variants of this approach. The two best-performing variants were abbreviated in [57] as HTGWPS-S1 and HTGWPS-V1. These two variants were compared with our two best-performing approaches, i.e., IBGWO3-S and IBGWO4-V.
Figure 23, Figure 24, Figure 25 and Figure 26 show the results of the comparisons of IBGWO3-S and IBGWO4-V with the two recent FS optimizers from the literature, i.e., HTGWPS-S1 and HTGWPS-V1. The four figures show comparisons in terms of the four previously described evaluation metrics, i.e., ACA, ASF, AFV, and ACT, respectively.
Figure 23 shows that the proposed IBGWO3-S and IBGWO4-V optimizers achieved the highest ACA in three and seven datasets, respectively. Figure 24 shows that IBGWO4-V achieved the lowest ASF in six datasets. Figure 25 shows that IBGWO4-V achieved the lowest AFV in six datasets. Finally, Figure 26 shows that IBGWO3-S and IBGWO4-V had the lowest ACT in three and four datasets, respectively. In general, IBGWO4 obtained better results than HTGWPS because the latter was based on selecting only one of three optimizers, i.e., BGWO, BPSO, or TS. On the contrary, IBGWO applied both BGWO and BPSO in each iteration with a ranking process among all solutions obtained from both optimizers. This enhanced the results obtained from IBGWO when compared with HTGWPS.
- e.
Computational complexity of the proposed approaches
In this section, the computational complexity of the proposed approaches is analyzed and presented. Table 3 shows the computational complexity of the proposed approaches. The complexity was analyzed by computing the estimated total number of iterations for each approach. Moreover, a Big-O analysis in terms of the maximum number of iterations (T) and the population size (P) is presented. In general, all approaches passed through a set of T iterations. In each iteration, two independent loops were run, where each loop counter was equal to P. In each iteration of IBGWO2, a P2P comparison was conducted between the two corresponding solutions produced by BGWO and BPSO. This comparison was not performed in IBGWO3 or IBGWO4. A ranking process, i.e., sorting, was applied at the end of each iteration in both IBGWO3 and IBGWO4 but not in IBGWO2. The complexity of this ranking process depended on the complexity of the applied sorting algorithm, which was a function of P. If the applied sorting algorithm had linear or quadratic complexity, then it exhibited O(P) or O(P²) complexity, respectively. This led to a total complexity for both IBGWO3 and IBGWO4 of O(T·P) or O(T·P²), depending on the complexity of the ranking process.
8. Discussion
In this section, the strong and weak points of each of the three proposed approaches are discussed. In general, hybrid metaheuristics are classified into two levels in terms of design issues: low-level and high-level [23]. Low-level hybridization means that one optimizer is embedded inside another optimizer, while high-level hybridization means that one optimizer is performed after applying another optimizer. The three proposed approaches can be classified as high-level optimizers. In general, such hybridization adds more computational complexity, which is considered one of the limitations of such hybrid approaches.
In each iteration of IBGWO2, a P2P comparison was performed between each individual obtained from applying the BGWO optimizer and its corresponding individual obtained after applying BGWO and then BPSO. This P2P comparison was not applied collectively to all individuals in both populations. Therefore, the best solution among both populations could have been missed.
The previously mentioned limitation of IBGWO2 was overcome in IBGWO3 by applying the ranking process across all individuals in both populations obtained by BGWO and BGWO followed by BPSO. A further enhancement of IBGWO3 was achieved with IBGWO4. This enhancement considered the initial population obtained from every previous iteration in addition to the two populations obtained by BGWO and BGWO followed by BPSO. This clarified the reason why IBGWO4 generally provided better evaluation results than IBGWO3 and IBGWO2. However, the complexity of the ranking process increased when IBGWO4 or IBGWO3 was applied compared with IBGWO2.
Since high-dimensional datasets were used in the evaluation of the three proposed approaches, the authors would like to discuss the suitability of each of the three approaches as the dimensionality of the dataset increases. As previously mentioned, IBGWO4 provided the best general performance, followed by IBGWO3 and IBGWO2, at the expense of computational complexity. Therefore, IBGWO4 can be used to provide the best classification accuracy with the least number of selected features for datasets with very high feature dimensionality. IBGWO4 will be suitable for this type of dataset provided that computational complexity is not a concern or the target system is not time-constrained. However, if the computational complexity of analyzing this dataset type has priority over the classification accuracy and fitness values, IBGWO2 or IBGWO3 will be better choices than IBGWO4. Selecting between IBGWO2 and IBGWO3 will depend on the required classification accuracy and fitness values: IBGWO2 provides the lowest classification accuracy and the worst fitness values, but at the lowest computational complexity, compared with the other two approaches.
An extensive numerical analysis needs to be implemented to provide insights into the relation between the performance of each of the three proposed approaches and the degree of dimensionality of the different datasets. This analysis will be implemented in a future extension of this work.
9. Conclusions and Future Directions
In this paper, three hybrid approaches were proposed to solve the FS optimization problem by combining BGWO with BPSO, using the S2 and V2 binary TFs to convert continuous solutions to binaries. The objective of this combination was to resolve the GWO stuck-in-local-optima problem by leveraging PSO's wide search space exploration ability on the solutions obtained by GWO.
A set of high-dimensional, small-instance medical datasets was utilized to evaluate the three proposed approaches. The datasets included multiclass, cancer-related human gene expression datasets. The evaluation experiments were performed to compare the three proposed approaches against the original BPSO and BGWO, eight state-of-the-art FS optimizers, and, finally, one of the recent binary FS optimizers that combines the BPSO and BGWO algorithms.
The evaluation results showed the superiority of IBGWO4-V compared with other investigated approaches in terms of a set of evaluation metrics, which were ACA, ASF, AFV, and ACT. The computational complexity of all proposed approaches was investigated and it was shown that the most complex approaches were IBGWO3 and IBGWO4. In summary, the results showed that IBGWO4-V is the most effective approach and provides the best balance between global and local searching techniques for FS optimization tasks. The results also confirmed the importance of selecting a suitable TF to increase the performance of the optimizers.
One of the salient limitations of the best-proposed approach, i.e., IBGWO4, is its computational complexity relative to the native BGWO, the native BPSO, and the other two proposed approaches, i.e., IBGWO2 and IBGWO3. When selecting one of the three proposed approaches, there should be a trade-off between the classification accuracy and fitness on one side and the computational complexity on the other side.
Future work will include a set of extensions as follows. Firstly, 10-fold cross-validation data splitting will be applied instead of the 80% training/20% testing data splitting approach. Secondly, the proposed algorithms will be further tested using unimodal, multimodal, and fixed-dimensional benchmark functions [79]. This evaluation will provide a deeper analysis of the performance of the three proposed optimizers and show how these optimizers can overcome the issue of entrapment in local optima. Thirdly, a numerical analysis that shows the relation between the performance of the three proposed approaches and the degree of dimensionality of the different datasets will be implemented in a future extension of this work. Fourthly, the authors plan to compare the proposed approaches with more advanced metaheuristic methods, such as those proposed in [51,52,58,59].
A fifth extension of this work will utilize PSO's searching mechanism to improve other SI methods, i.e., methods other than GWO, that suffer from the problem of being stuck in local optimum solutions. Finally, the three proposed approaches can also be evaluated using other common classifiers, such as Support Vector Machines [80] and Artificial Neural Networks [81], to ensure that the proposed approaches maintain their superiority and stability.