Improved Equilibrium Optimization Algorithm Using Elite Opposition-Based Learning and New Local Search Strategy for Feature Selection in Medical Datasets

Abstract: The rapid growth of biomedical datasets has produced high-dimensional feature sets that negatively affect machine learning classifiers. In machine learning, feature selection (FS) is an essential process for selecting the most significant features and discarding redundant and irrelevant ones. In this study, an equilibrium optimization algorithm (EOA) is used to minimize the number of selected features from high-dimensional medical datasets. EOA is a recently proposed physics-based metaheuristic designed for unimodal, multi-modal, and engineering problems, and it is considered one of the most powerful, fast, and best-performing population-based optimization algorithms. However, EOA suffers from local optima stagnation and limited population diversity when dealing with high-dimensional features, such as those in biomedical datasets. To overcome these limitations and adapt EOA to feature selection problems, a novel metaheuristic optimizer, the improved equilibrium optimization algorithm (IEOA), is proposed. IEOA includes two main improvements. The first applies elite opposition-based learning (EOBL) to improve population diversity. The second integrates three novel local search strategies to prevent the algorithm from becoming stuck in local optima: mutation search, mutation-neighborhood search, and a backup strategy. IEOA improves the population diversity, the classification accuracy, the number of selected features, and the convergence speed. To evaluate the performance of IEOA, we conducted experiments on 21 biomedical benchmark datasets gathered from the UCI repository. Four standard metrics were used to evaluate IEOA's performance: the number of selected features, classification accuracy, fitness value, and the p-value of a statistical test. Moreover, the proposed IEOA was compared with the original EOA and other well-known optimization algorithms. Based on the experimental results, IEOA outperformed the original EOA and the other optimization algorithms on the majority of the datasets used.


Introduction
The classification of biomedical datasets is a critical procedure for disease detection and diagnosis. Classifying such datasets could allow the control and prevention of certain non-treatable diseases, such as tumors and cancer. Most biomedical datasets use several features to describe disease symptoms and patient histories. Some features could be redundant, ineffective, or have a classification impact similar to that of other features. Such high-dimensional features require a large amount of storage and computation time, and could negatively affect the classifier's accuracy. Moreover, these challenges can affect classification accuracy, pattern recognition, and data analysis, since all of them depend mainly on the machine learning (ML) classifier. To classify such data accurately, feature selection (FS) techniques need to be considered [1].
FS techniques play a significant role in ML as a pre-processing step that reduces irrelevant and redundant features [2]. They work by excluding the features that may negatively affect the classifier's performance, such as irrelevant, redundant, and less informative features. FS refers to selecting the smallest subset of the available features that is relevant to the problem [3]. Therefore, FS techniques improve the performance of the classifier in the majority of cases [4]. FS techniques are categorized into two primary types: filter-based techniques (FBT) and wrapper-based techniques (WBT).
FBT employ linear functions to select and rank the feature subsets before the classifier is applied. FBT, such as information gain (IG), Pearson correlation, and chi-square, have no explicit connection to the classifier or the fitness function [5]. By contrast, WBT have an explicit connection to the applied classifier [6]. Several studies have employed WBT in optimization algorithms for FS, such as [7,8]. Computationally, WBT are more expensive than FBT, but they can achieve better scores [9]. In optimization algorithms, WBT are usually applied to FS problems because of their ability to cooperate with the classifier. Moreover, WBT are used to reduce the search space, which improves the classification performance and minimizes the number of selected features, such as in [10,11].
In WBT, the fitness function guides the search process in an FS problem, taking the classification accuracy into consideration. Several studies have applied optimization-algorithm-based wrapper methods, such as [7,12,13,14,15], in order to increase the classification accuracy in the FS problem. Applying optimization algorithms to FS finds the optimum feature set, or a set near the optimum, within a reasonable time. By contrast, a complete search over all possible combinations of features is time-consuming, and the problem is NP-hard [16]. However, depending on the type of problem to be solved, some optimization algorithms suffer from local optima and population diversity problems, specifically when they are applied to datasets with high dimensionality, such as biomedical datasets.
EOA is a novel meta-heuristic algorithm proposed by [17]. EOA is inspired by control-volume mass balance models used to estimate both dynamic and equilibrium states. EOA has been classified as one of the most powerful, fast, and best-performing population-based optimization algorithms in many studies, such as [18,19,20]. In EOA, each solution with its position represents a search agent. The search agents randomly update their positions with respect to the best-so-far solutions, called equilibrium candidates, to reach the optimal result (equilibrium state). According to the authors of EOA, the algorithm outperforms several well-known meta-heuristic algorithms, such as the grey wolf optimizer (GWO), gravitational search algorithm (GSA), salp swarm algorithm (SSA), genetic algorithm (GA), and particle swarm optimization (PSO). In addition, EOA was benchmarked on 58 unimodal, multi-modal, and mathematical functions and engineering problems, and the study reported very promising results. However, like other optimization algorithms, EOA has limitations, including solution diversity and local optima problems. Furthermore, according to the no-free-lunch (NFL) theorem [21], there is no perfect optimization algorithm for all kinds of problems. This means that an algorithm can outperform other algorithms on some types of problems, but not on all of them. The above-mentioned limitations of EOA and the NFL theorem motivated the research presented in this paper.
This research proposes a novel algorithm, named the improved equilibrium optimization algorithm (IEOA). IEOA aims at improving the classification performance of the FS problem in biomedical datasets. IEOA employs elite opposition-based learning (EOBL) to improve the diversity of solutions during the exploration phase of EOA. Employing EOBL adds several advantages to IEOA, including improving the search agents' distribution in the search space, enhancing the computational performance, and accelerating the convergence speed. Furthermore, IEOA employs a local dynamic search mechanism during the exploitation phase to avoid becoming stuck in a local optimum. The dynamic search is conducted using three strategies, namely mutation search, mutation-neighborhood search, and a backup strategy. In the literature, different improvements to EOA have been proposed in order to enhance its performance on the feature selection problem. However, as far as the authors are aware, this is the first time a hybrid EOA algorithm with the EOBL method and new local search approaches has been utilized for the feature selection problem. IEOA will be used to improve the classification performance for the FS problem in biomedical datasets. The main contributions of this study are listed as follows:
1. An improved version of the original EOA, named IEOA, is proposed for FS problems in wrapper mode.
2. Two main improvements were introduced to the original EOA to address its limitations:
• The EOBL technique is applied at the initialization phase of EOA to improve its population diversity.
• A novel local search mechanism is proposed and integrated with EOA to prevent trapping in local optima and to improve the EOA exploitation search.
3. The performance of IEOA was evaluated using classification accuracy, number of selected features, fitness value, and p-value. In addition, IEOA results were compared with the results of other well-known and recent optimization algorithms, including particle swarm optimization (PSO), genetic algorithm (GA), whale optimization algorithm (WOA), grasshopper optimization algorithm (GOA), ant lion optimizer (ALO), slime mould algorithm (SMA), and butterfly optimization algorithm (BOA). In these experiments, 21 benchmark biomedical datasets from the UCI repository were used. The conducted experiments revealed the superior performance of IEOA in comparison to these baseline algorithms.
The rest of the paper is structured as follows: Section 2 reviews related works. Section 3 briefly describes the EOA, EOBL, and the local search strategies, and Section 4 shows the proposed IEOA. Section 5 details the used datasets and the conducted experiments, and Section 6 presents the experimental results and analysis. Finally, Section 7 concludes the paper.

Related Works
Recently, optimization algorithms have been used to solve high-dimensional feature selection problems in many fields. These algorithms have proven efficient at improving classification accuracy and reducing the number of selected features. Examples of recent implementations are PSO [22], BOA [23], SSA [8], ALO [24], WOA [21,25], GOA [26], and GA [27]. Despite the unique construction of each optimization algorithm, they share some characteristics: initializing a random population (solutions) as the opening step, evaluating the solutions in each iteration based on the fitness function, updating the solutions, and determining the best solution once a termination criterion is met. The search behavior of optimization algorithms includes exploration and exploitation stages. During these stages, an optimization algorithm tries to search the promising regions of the search space. Additionally, the stochastic search of optimization algorithms scans the promising areas of the feature space. However, some of these optimization algorithms suffer from population diversity and local optima limitations when they are applied to high-dimensional features, such as in [28,29]. Thus, many methods have been applied to optimization algorithms to improve the local search and the population diversity and make them suitable for such high-dimensional features.
Meta-heuristics are divided into three main classes: evolutionary algorithms, swarm intelligence, and physics-based algorithms. The equilibrium optimization algorithm (EOA) is a physics-based algorithm. Physics-based algorithms rely on physical laws, which are often used to characterize the interactions of search agents. One of the most widely used algorithms in this class is simulated annealing [30], which applies the laws of thermodynamics to the heating and then controlled cooling of a material in order to increase the size of its crystals. The gravitational search algorithm [31] employs Newton's laws of gravitation between masses and their interactions to update positions toward the optimum point. Henry gas solubility optimization (HGSO) mimics the behavior governed by Henry's law to solve challenging optimization problems. Henry's law is an essential gas law relating the amount of a given gas that is dissolved to a given type and volume of liquid at a fixed temperature. EOA was recently developed by Faramarzi et al. [17] and has been applied to many benchmark problems, such as in [18,20,32,33].
According to the NFL theorem, no single method is best for every problem, so there is still room for alternative methods for the FS problem. This prompted us to improve EOA for use in FS. EOA has been applied to FS problems in several studies. For example, [32] applied S-shaped and V-shaped transfer functions for selecting the optimal feature set in classification problems. In [34], the authors implemented a general learning strategy in EOA, helping the search agents to avoid local optima and improving their capability for discovering promising areas. Moreover, [35] integrated simulated annealing with the equilibrium algorithm to improve its local search. In our proposed algorithm, IEOA, EOBL is used at the initialization phase to improve the initial solutions created in the standard EOA. To the best of our knowledge, this is the first time that an improvement to EOA with EOBL and the new local search strategies has been integrated and applied to FS problems.
EOBL is an enhanced version of the opposition-based learning (OBL) technique proposed by Tizhoosh in 2005 [36]. The primary purpose of EOBL is to produce more promising solutions by considering the opposite solutions of the best solutions [21]. The opposite solutions may lie closer to the position where the global optimum is located [37]. The EOBL method has been integrated into many optimization algorithms to improve their population diversity. For example, [38] applied EOBL to improve the flower pollination algorithm (FPA). Reference [37] utilized EOBL to enhance the population diversity of Harris hawk optimization (HHO). EOBL was used in [39] to increase the population diversity of the grey wolf optimizer. In [21,40], EOBL was applied at the initialization phase to improve the quality of the initial solutions of WOA. Moreover, in [41] EOBL improved the population diversity of the cuckoo search algorithm (CSA). Additionally, EOBL was used to improve the convergence speed of particle swarm optimization [42].
In the literature, optimization algorithms have been hybridized with multiple types of local search approaches to improve their exploitation capabilities. For example, study [23] improved BOA by applying a local search method based on a mutation (LSAM) operator to avoid the local optimum problem. Study [43] hybridized the PSO algorithm with a variable neighborhood search (VNS) technique to improve the local search. Study [6] also hybridized a PSO algorithm with a novel local search strategy for FS problems. Study [44] enhanced the harmony search (HS) algorithm with stochastic local search (SLS) for the FS problem. Study [45] combined WOA with a local search strategy to escape from the local optimum problem. Study [46] combined simulated annealing with a binary coral reefs optimization (BCRO) algorithm as a local search strategy. Study [47] hybridized the ant colony optimization (ACO) algorithm with iterated local search (ILS) as a stochastic local search method. Study [8] included a novel local search algorithm with SSA to improve the exploitation capability of the algorithm. Study [48] hybridized WOA with simulated annealing as a local search to enhance the best solution discovered after each iteration. Study [49] improved WOA with a new local search algorithm (LSA) to address the local optima problem of WOA. Study [37] improved HHO using EOBL and a novel search mechanism to avoid the local optima problem. These studies, among others, motivated our research into hybridizing EOA with a dynamic local search. This dynamic search is based on a group of strategies: the mutation method, the mutation neighborhood method, and the backup method. It is intended to improve both the exploration and exploitation capabilities of EOA.

Equilibrium Optimization Algorithm (EOA)
This section explains the mathematical model and algorithm of the equilibrium optimization algorithm (EOA). EOA is a novel physics-inspired population-based optimization algorithm introduced in 2020 by Faramarzi et al. [17]. EOA is based on the conservation of mass principle in physics. That is, a mass balance equation is used to describe the concentration of a non-reactive component in a control volume. In this sense, the mass balance equation models the conservation of mass entering, leaving, and generated in a control volume. The generic mass balance equation is a first-order ordinary differential equation, formulated in (1). In this equation, the change in mass over time equals the amount of mass entering the system plus the amount being generated inside the system minus the amount leaving the system.
$$V \frac{dC}{dt} = Q C_{eq} - Q C + G \qquad (1)$$
where $V$ is the control volume and $V \, dC/dt$ is the rate of change of mass in the control volume. $Q$ is the volumetric flow rate into and out of the control volume. $C$ is the concentration within the control volume, and $C_{eq}$ stands for the concentration at the equilibrium state, where there is no generation inside the control volume. $G$ is the mass generation rate inside the control volume. When $V \, dC/dt$ equals zero, a stable equilibrium state is achieved.
In EOA, three terms govern how a particle updates its position, and each particle updates its concentration through these three terms. The first term is the equilibrium concentration, defined as one of the best-so-far solutions, randomly chosen from the equilibrium pool. The second term is related to the difference between the particle concentration and the equilibrium state, which works as a direct search mechanism. This term helps particles explore the search space. The third term is related to the generation rate, which mainly performs the exploitation search. These terms and how they affect the search pattern are described in the following.

Initialization Phase
In this phase, the initial particles and their concentrations are constructed. The initial population is defined over the solution space as in (2):
$$C_i^{initial} = C_{min} + rand_i \, (C_{max} - C_{min}), \quad i = 1, 2, \ldots, N \qquad (2)$$
where $C_i^{initial}$ denotes the initial concentration vector of the $i$th particle. In addition, $C_{max}$ and $C_{min}$ denote the maximum and minimum values of the dimensions, $rand_i$ is a random vector over the interval [0, 1], and $N$ is the number of particles in the population. The solutions (particles) are evaluated with the fitness function and stored to determine the equilibrium candidates.
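The initialization step can be illustrated with a minimal sketch. It assumes Equation (2) has the usual uniform-sampling form, $C_{min} + rand_i (C_{max} - C_{min})$, with scalar bounds shared by all dimensions; the function name `init_population` and its parameters are hypothetical and not taken from the paper.

```python
import random


def init_population(n_particles, dim, c_min, c_max, rng=None):
    """Sample each concentration uniformly in [c_min, c_max] (assumed form of Eq. (2))."""
    rng = rng or random.Random()
    return [
        [c_min + rng.random() * (c_max - c_min) for _ in range(dim)]
        for _ in range(n_particles)
    ]


# Example: 5 particles in a 4-dimensional search space bounded by [0, 1].
population = init_population(n_particles=5, dim=4, c_min=0.0, c_max=1.0,
                             rng=random.Random(0))
```

For a feature selection problem, these continuous values would still have to be mapped to a binary selection mask; that step is not shown here.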
The equilibrium position is the final convergence state of the algorithm, which is expected to be the global optimum. The optimization process starts with no information about the equilibrium position, and hence the first equilibrium candidates are generated to provide a search pattern for the particles. These candidates are the four best-so-far particles identified during the entire optimization process, together with a fifth particle whose concentration is the arithmetic mean of those four, as in Equation (3):
$$C_{eq(ave)} = \frac{C_{eq(1)} + C_{eq(2)} + C_{eq(3)} + C_{eq(4)}}{4} \qquad (3)$$
The four best candidates increase the exploration capability of EOA, while the average candidate improves the exploitation. These five particles are chosen as equilibrium candidates, and they are used to create a vector named the equilibrium pool, as in Equation (4):
$$C_{eq,pool} = \{ C_{eq(1)}, C_{eq(2)}, C_{eq(3)}, C_{eq(4)}, C_{eq(ave)} \} \qquad (4)$$
Furthermore, the velocity term $F$ is used to balance exploration and exploitation. Here, $\lambda$ models the turnover rate, which may vary over time in a real control volume. To this end, $\lambda$ is taken as a random vector in the interval [0, 1], as formulated in (5):
$$F = e^{-\lambda (t - t_0)} \qquad (5)$$
where $t$ is the time, defined as a function of the iteration ($Iter$), so that it decreases with the number of iterations, as formulated in (6):
$$t = \left( 1 - \frac{Iter}{T} \right)^{a_2 \frac{Iter}{T}} \qquad (6)$$
where $Iter$ and $T$ denote the current and maximum number of iterations, respectively, and $a_2$ is a constant value that controls the exploitation capability. To guarantee convergence by slowing down the search speed while improving the global and local search ability of the algorithm, the algorithm uses formulation (7):
$$t_0 = \frac{1}{\lambda} \ln \left( -a_1 \, \mathrm{sign}(r - 0.5) \left[ 1 - e^{-\lambda t} \right] \right) + t \qquad (7)$$
where $a_1$ is a constant value that controls the exploration ability. Furthermore, the $\mathrm{sign}(r - 0.5)$ term determines the direction of the global and local search. For all the experiments executed in this paper, $a_1$ and $a_2$ are equal to 2 and 1, respectively, and $r$ is a random vector in the interval [0, 1].
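The construction of the equilibrium pool can be sketched as follows. This is an illustrative simplification: it ranks a single population snapshot, whereas the text describes the four candidates as the best-so-far particles over the whole run, and it assumes that lower fitness is better. The function name `build_equilibrium_pool` is hypothetical.

```python
def build_equilibrium_pool(population, fitness_values):
    """Return the four best particles plus their arithmetic mean (minimization assumed)."""
    ranked = sorted(zip(fitness_values, population), key=lambda pair: pair[0])
    best_four = [list(particle) for _, particle in ranked[:4]]
    # Fifth candidate: element-wise mean of the four best particles.
    average = [sum(column) / len(best_four) for column in zip(*best_four)]
    return best_four + [average]


particles = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
scores = [5.0, 1.0, 2.0, 3.0, 4.0]
pool = build_equilibrium_pool(particles, scores)
```

In each update, a particle would then draw one of the five pool members at random as its equilibrium candidate.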
Substituting Equation (7) into Equation (5) gives the modified version in (8):
$$F = a_1 \, \mathrm{sign}(r - 0.5) \left[ e^{-\lambda t} - 1 \right] \qquad (8)$$
The generation rate $G$ is also one of the most important terms for obtaining an accurate solution, because it improves the local search. The generation rate equations are presented in (9), (10), and (11):
$$G = G_0 \, e^{-\lambda (t - t_0)} = G_0 F \qquad (9)$$
where
$$G_0 = GCP \, (C_{eq} - \lambda C) \qquad (10)$$
$$GCP = \begin{cases} 0.5 \, r_1, & r_2 \geq GP \\ 0, & r_2 < GP \end{cases} \qquad (11)$$
where $r_1$ and $r_2$ are random numbers in [0, 1], and $GCP$ is the generation rate control parameter, which governs whether the generation term contributes to the update. The probability of this contribution, which determines how many particles use the generation rate to update their concentration, is set by another parameter called the generation probability, $GP$. The mechanism of this contribution is determined by Equations (10) and (11). Equation (11) is applied at the level of each particle. A good balance between exploration and exploitation is achieved when $GP = 0.5$. Finally, the updating rule of EOA is as in (12):
$$C = C_{eq} + (C - C_{eq}) \, F + \frac{G}{\lambda V} (1 - F) \qquad (12)$$
where $C_{eq}$ is an equilibrium concentration chosen from the equilibrium pool, $F$ is calculated as in Equation (8), and $V$ is the control volume, as in Equation (1). The first term is the equilibrium concentration, while the second and third terms correspond to the differences in concentration. Figure 1 shows a theoretical drawing of the cooperation of all equilibrium candidates and how they update the concentration one by one in the algorithm.
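A single particle update can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the direction term is sign(r − 0.5) as in the original EOA formulation, that the control volume is V = 1, that λ and r are drawn per dimension while r1 and r2 are drawn once per particle, and that Equations (8) to (12) take their standard EOA forms. The function names are hypothetical.

```python
import math
import random


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def eoa_update(c, c_eq, iteration, max_iter, a1=2.0, a2=1.0, gp=0.5,
               volume=1.0, rng=None):
    """Update one particle toward the equilibrium candidate c_eq (assumed Eqs. (6)-(12))."""
    rng = rng or random.Random()
    # Eq. (6): the time variable shrinks as the iterations progress.
    t = (1.0 - iteration / max_iter) ** (a2 * iteration / max_iter)
    # Eq. (11): the generation term is active only when r2 >= GP.
    r1, r2 = rng.random(), rng.random()
    gcp = 0.5 * r1 if r2 >= gp else 0.0
    new_c = []
    for c_j, ceq_j in zip(c, c_eq):
        lam = max(rng.random(), 1e-12)  # turnover rate; guard against division by zero
        r = rng.random()
        f = a1 * sign(r - 0.5) * (math.exp(-lam * t) - 1.0)  # Eq. (8)
        g0 = gcp * (ceq_j - lam * c_j)                       # Eq. (10)
        g = g0 * f                                           # Eq. (9)
        # Eq. (12): equilibrium term + concentration difference + generation term.
        new_c.append(ceq_j + (c_j - ceq_j) * f + g / (lam * volume) * (1.0 - f))
    return new_c


updated = eoa_update([0.2, 0.8], [0.5, 0.5], iteration=3, max_iter=10,
                     rng=random.Random(42))
```

At the final iteration, t becomes zero, so F vanishes and the particle collapses onto its equilibrium candidate, which matches the description of exploitation dominating late in the run.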

Exploration Phase
Four parameters and techniques in EOA direct the exploration process, summarized as follows. First, the $a_1$ value controls the exploration magnitude by determining how far the new position can be from the equilibrium candidate. The higher the value of $a_1$, the higher the exploration power. However, if $a_1$ is larger than three, the exploration performance degrades considerably. Second, the $\mathrm{sign}(r - 0.5)$ term controls the exploration direction. Since $r$ is a uniformly distributed random vector in the interval [0, 1], the sign is equally likely to be negative or positive.
Third, the $GP$ value controls the probability that the generation rate takes part in the concentration update. When $GP = 1$, no generation rate is involved in the optimization process. This condition ensures a high level of exploration capability, but it often leads to inaccurate results. When $GP = 0$, the generation rate always participates in the optimization process, which increases the probability of stagnation in a local optimum. Based on the experimental analysis, $GP = 0.5$ provides a good balance between global and local search. Fourth, the equilibrium pool: this vector contains five particles, and the selection of these candidates is based on experimental testing. In the initial iterations, the particles are far away from each other in the search space. Updating the concentration according to these candidates improves the ability of the algorithm to search globally. The average candidate also helps discover unknown regions of the search space in the initial iterations, when particles are far apart from each other.

Exploitation Phase
Four parameters and techniques in EOA affect the exploitation process, summarized as follows. First, the $a_2$ value works like $a_1$, but controls the local search by determining the magnitude of exploitation, that is, how closely the search digs around the best solution. Second, the $\mathrm{sign}(r - 0.5)$ term is also responsible for controlling the direction of the local search. Third, the memory saving mechanism keeps the best-so-far particles and uses them to replace the poorer ones. This feature clearly improves the exploitation of EOA. Fourth, the equilibrium pool: as the iterations progress, exploration gradually decreases and exploitation gradually increases. Therefore, in the last iterations, the candidates' positions are close to each other, and the concentration update process performs a local search near the candidate positions, leading to exploitation.
To compute the fitness value, both the classification error and the number of selected features need to be incorporated into the fitness function, which is mathematically formulated as in (13):
$$\downarrow Fitness = \alpha \, \gamma(R) + \beta \, \frac{|F|}{|N|} \qquad (13)$$
where $\gamma(R)$ is the classifier error rate, $|F|$ is the number of selected features, and $|N|$ is the total number of features. In addition, $\alpha$ and $\beta$ are two weighting factors, where $\alpha \in [0, 1]$ and $\beta = (1 - \alpha)$.
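As a worked illustration of Equation (13), the following sketch computes the fitness from an error rate and a feature count. The default value `alpha=0.99` is only an assumption chosen for the example, since this section does not state the weights used in the experiments; the function name `fitness` is likewise illustrative.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classifier error and selected-feature ratio; lower is better."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total


# A subset with 10% error using 5 of 10 features, weighted with alpha = 0.9.
example = fitness(error_rate=0.1, n_selected=5, n_total=10, alpha=0.9)
```

With these inputs the two terms are 0.9 × 0.1 = 0.09 and 0.1 × 0.5 = 0.05, so the fitness is approximately 0.14. Because the function is minimized, a solution improves by lowering the error rate, by selecting fewer features, or both.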

Elite Opposition Based-Learning (EOBL)
EOBL is an improved version of the opposition-based learning (OBL) technique proposed by Tizhoosh in 2005 [36]. OBL is a machine intelligence approach designed to improve the performance of optimization algorithms. The technique searches for a more useful solution by considering both a current individual, usually initialized randomly by the optimization algorithm, and its corresponding opposite solution. The evaluation function is applied to both solutions, and the better one is selected for the next iteration. Mathematically, OBL can be formulated as follows: let $x = (x_1, x_2, \ldots, x_D)$ be the location of a current particle, where $D$ is the problem dimension and $x_k \in [y_k, z_k]$, $k = 1, 2, \ldots, D$. The opposite location $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_D)$ is then formulated as in (14):
$$\tilde{x}_k = y_k + z_k - x_k \qquad (14)$$
EOBL employs an elite individual to lead the population toward the global optimum. The elite individual is likely to carry more helpful information than other individuals. Basically, EOBL uses the elite individual in the current population to generate the corresponding opposites of the current particles located within the search space. Thus, the elite guides the particles so that they finally reach a promising area, where the best solution could be found. Consequently, utilizing the EOBL method improves the population diversity and enhances the exploration of EOA. As stated, EOBL was previously applied in the literature to improve several optimization algorithms.
In this paper, the EOBL method was utilized to improve the exploration ability of EOA. The opposite position is defined as follows: for an individual $X_k = (x_{k,1}, x_{k,2}, \ldots, x_{k,D})$ in the current population, whose members are $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,D})$, the elite opposite position $\tilde{X}_k = (\tilde{x}_{k,1}, \tilde{x}_{k,2}, \ldots, \tilde{x}_{k,D})$ is formulated as in (15):
$$\tilde{x}_{k,j} = F \, (dy_j + dz_j) - x_{k,j} \qquad (15)$$
where $F \in [0, 1]$ is a generalization factor, and $dy_j$ and $dz_j$ are dynamic boundaries, which can be formulated as in (16):
$$dy_j = \min_i (x_{i,j}), \quad dz_j = \max_i (x_{i,j}) \qquad (16)$$
However, the resulting opposite can exceed the search boundaries $[y_j, z_j]$. To solve this problem, a random value in $[y_j, z_j]$ is assigned to the transformed individual, as in (17):
$$\tilde{x}_{k,j} = rand(y_j, z_j), \quad \text{if } \tilde{x}_{k,j} < y_j \text{ or } \tilde{x}_{k,j} > z_j \qquad (17)$$
In this way, EOBL improves population diversity by generating a different population from the opposite solutions. Consequently, the exploration ability of EOA is improved.
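The EOBL step can be sketched as follows, under stated assumptions: the opposite of each coordinate is taken as F(dy_j + dz_j) − x_{k,j}, the dynamic boundaries dy_j and dz_j are the per-dimension minimum and maximum over the current population, and one generalization factor F is drawn per call. The extracted equations do not settle these details, so they are assumptions, and the function name `elite_opposition` is hypothetical.

```python
import random


def elite_opposition(population, lower, upper, rng=None):
    """Generate an opposite population, resampling coordinates that leave the bounds."""
    rng = rng or random.Random()
    dim = len(population[0])
    # Assumed Eq. (16): dynamic bounds are the per-dimension min and max of the population.
    dy = [min(p[j] for p in population) for j in range(dim)]
    dz = [max(p[j] for p in population) for j in range(dim)]
    factor = rng.random()  # generalization factor F in [0, 1]
    opposites = []
    for particle in population:
        opposite = []
        for j, x in enumerate(particle):
            value = factor * (dy[j] + dz[j]) - x  # assumed Eq. (15)
            if value < lower[j] or value > upper[j]:
                # Eq. (17): replace an out-of-range value with a random one in bounds.
                value = rng.uniform(lower[j], upper[j])
            opposite.append(value)
        opposites.append(opposite)
    return opposites


current = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
opposed = elite_opposition(current, lower=[0.0, 0.0], upper=[1.0, 1.0],
                           rng=random.Random(3))
```

A typical use, again as an assumption, would merge `current` and `opposed`, evaluate the fitness of every member, and keep the best N individuals as the initial population.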

The Mutation Search Strategies (MSS)
EOA employs various search mechanisms, both exploratory and exploitative, to randomly change the solutions. The search agents are the particles with their concentrations, and the optimal result is the equilibrium state. The concentrations are randomly updated with respect to the best-so-far solutions, called equilibrium candidates. This random updating, along with a suitable generation rate value, enhances EOA's exploratory behavior in the initial iterations and its exploitative search in the final iterations, which helps the search avoid being trapped in local optima. In addition, balancing exploration and exploitation provides an adaptive value for the control parameter, and thus reduces the magnitude of the particles' movements. EOA depends on $G$ to move from exploration to exploitation and to select the current exploitation method. Additionally, $G$ is used to prevent the particles from becoming trapped in local optima. However, $G$ might quickly change the convergence speed towards the optimal solution, which may cause the particles to fall into a local optimum [21]. In this subsection, we explain the three proposed MSS, which enhance both the global and local search in EOA and help it avoid being stuck in local optima, to some extent.

Mutation
The mutation method is used in GA to improve the diversity of the chromosome population. The mutation operator is employed to avoid being trapped in local optima by creating new and more diverse solutions to the problem. There are many types of mutation, depending on the algorithm used and the problem at hand. In this study, we applied a bit chain mutation that works by twisting (flipping) features at arbitrary positions. For example, assuming $X = (x_1, x_2, \ldots, x_D)$ is the location of the current particle, the bit chain mutation can be mathematically formulated as in (18), where $MU$ is the particle (solution) after applying the bit chain mutation and $I \subseteq \{1, 2, \ldots, D\}$ is an array of randomly selected positions that are twisted in solution $X$. Figure 1 shows an example of solution $X$, where the third and sixth positions are twisted. Based on empirical observations and trial-and-error tests, the mutation rate is randomly selected between 10% and 25% in the exploration phase, and between 1% and 9% in the exploitation phase. EOA relies on the generation rate $G$ to switch from exploration to exploitation search. The ratio of $G$ selects the global search when it is greater than 0.5, and the exploitation phase when it is less than 0.5. Based on Equations (10) and (11), the value of $G$ depends on $G_0$ and $F$, as in Equation (9). Therefore, in the first fifty percent of the iterations, the value of $G$ varies within [0, 2], and in the second fifty percent it fluctuates within [0, 1]. Consequently, EOA can perform both exploration and exploitation in the first part of the iterations. However, in the second part, it can only perform exploitation.
In IEOA, we used the $G$ value to select the number of features to be twisted. Generally, in the global search, more features of the current best solution need to be twisted to improve the power of the exploration. However, in the local search, the particles are supposed to be closer to the equilibrium state (optimal solution). Therefore, fewer features are twisted to improve the exploitation. Thus, the mutation rate (MR) is mathematically formulated as in (19):
$$MR = \begin{cases} NoOfFeatures \times \dfrac{rand[10, 25]}{100}, & \text{if } G \geq 1 \\ NoOfFeatures \times \dfrac{rand[1, 9]}{100}, & \text{if } G < 1 \end{cases} \qquad (19)$$
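The mutation step can be sketched as follows. Several details are assumptions made for illustration: G is treated as a single scalar summary of the generation rate, the exploration branch is taken when G ≥ 1, the percentages are drawn as integers, and at least one bit is always flipped. The function names `mutation_count` and `bit_mutation` are hypothetical.

```python
import random


def mutation_count(n_features, g, rng):
    """Number of bits to flip: 10-25% of features if g >= 1, otherwise 1-9%."""
    percent = rng.randint(10, 25) if g >= 1 else rng.randint(1, 9)
    # Assumption: always flip at least one bit so the mutation has an effect.
    return max(1, round(n_features * percent / 100))


def bit_mutation(solution, n_flips, rng):
    """Return a copy of a binary solution with n_flips randomly chosen bits flipped."""
    positions = rng.sample(range(len(solution)), n_flips)
    mutated = list(solution)
    for i in positions:
        mutated[i] = 1 - mutated[i]
    return mutated


rng = random.Random(0)
best = [0] * 20
flips = mutation_count(len(best), g=1.5, rng=rng)
candidate = bit_mutation(best, flips, rng)
```

For a 20-feature solution, the exploration branch flips between 2 and 5 bits, while the exploitation branch flips 1 or 2, which reflects the intent that fewer features are twisted near the equilibrium state.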

Mutation Neighborhood Method (MNM)
MNM was applied by Das et al. in 2009 [50] in order to balance global and local search in differential evolution. The idea of the neighborhood search is to use the mutation operator to search a small region around the current best solution instead of searching the whole population. In this work, we applied MNM. The MNM search is triggered by the current best solution found by the mutation method. In other words, whenever a mutation causes a change in the position of the current best solution (equilibrium state), MNM is applied. After the current best position is mutated, the fitness value is recalculated in every iteration. If the fitness value of the newly found location is better than that of the current location, the current best solution is replaced with the new mutated solution, and the MNM search is then performed.
Essentially, MNM considers the two neighbors contiguous to the switched feature. First, in the forward switched technique, the feature to the right is mutated, and then the fitness values of the two solutions (the best solution and the current switched solution) are evaluated. Second, in the backward switched technique, the same procedure is applied, but the feature to the left is mutated. Consequently, two solutions are created, and the one with the best value is ranked as the best solution. Furthermore, an MNM circle is used: the last feature is connected to the first feature so that every feature has two contiguous neighbors, one on each side. Figure 2 explains the technique of the MNM circle.
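The circular neighborhood can be sketched as follows. This is one interpretation of the description above, assuming that the forward and backward candidates each flip a single neighboring bit of the current best solution and that lower fitness is better. The function names and the toy fitness function are hypothetical.

```python
def neighborhood_candidates(solution, position):
    """Flip the right and then the left neighbor of `position`, wrapping at the ends."""
    n = len(solution)
    candidates = []
    for offset in (1, -1):  # forward (right) neighbor first, then backward (left)
        neighbor = (position + offset) % n  # circular indexing joins last and first
        candidate = list(solution)
        candidate[neighbor] = 1 - candidate[neighbor]
        candidates.append(candidate)
    return candidates


def mnm_step(best, position, fitness_fn):
    """Keep the best of the current solution and its two neighborhood candidates."""
    pool = [best] + neighborhood_candidates(best, position)
    return min(pool, key=fitness_fn)


current_best = [1, 0, 0, 1]
neighbors = neighborhood_candidates(current_best, position=3)
# Toy fitness for illustration only: fewer selected features is better.
improved = mnm_step(current_best, position=3, fitness_fn=sum)
```

With the mutated position at the last index, the forward neighbor wraps around to index 0, which is the behavior the MNM circle is meant to provide.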

Backup Method (BM)
Mutation is a powerful strategy that can effectively improve both exploration and exploitation. However, it might change the direction of the optimization algorithm and lead to a local optimum, which is one of the common challenges for optimization algorithms. Therefore, BM is included in the proposed IEOA. BM is a straightforward and functional method. In BM, if the new mutated solution has a better fitness value than the current solution, it is not immediately accepted as the current best solution. Instead, it is tentatively saved as a possible solution for the next iteration. If EOA finds a better solution at the next iteration, the current best solution is updated as well. Then, the possible solution (BM solution) is compared with the current best solution, and the one with the better fitness value becomes the current best solution. In other words, MSS accepts the new location resulting from mutation or MNM only if it still has the best fitness value after two consecutive iterations.
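The backup decision can be illustrated with a short Python sketch. It assumes that lower fitness is better, that the held-back mutated solution is passed in as `potential`, and that `eoa_best` is the best solution produced by the following EOA iteration. The function name `apply_backup` and the toy fitness `n_selected` are hypothetical.

```python
def apply_backup(potential, eoa_best, fitness):
    """Choose between the held-back mutated solution and the new EOA best.

    potential: mutated solution saved in the previous iteration (or None).
    eoa_best:  best solution produced by the current EOA iteration.
    Assumption: lower fitness is better.
    """
    if potential is not None and fitness(potential) < fitness(eoa_best):
        return list(potential)     # the mutated solution survived a second iteration
    return list(eoa_best)          # otherwise keep what EOA found

# Hypothetical toy fitness: fewer selected features is better.
def n_selected(mask):
    return sum(mask)

held_back = [1, 0, 0, 1]
kept = apply_backup(held_back, [1, 1, 0, 1], n_selected)       # potential wins
replaced = apply_backup(held_back, [0, 0, 0, 1], n_selected)   # EOA result wins
first = apply_backup(None, [1, 1, 1, 1], n_selected)           # nothing held back yet
```

Delaying acceptance by one iteration means that a mutation is adopted only when the regular EOA update cannot already do better.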

Improved Equilibrium Optimization Algorithm (IEOA)
This section introduces IEOA, which is an improved version of EOA. IEOA builds on the strengths of EOA and tunes it for the FS problem. In particular, two main improvements to EOA are introduced. The first employs the EOBL method at the initialization phase, which enhances the diversity of the population. The second employs the enhanced MSS, which strengthens the search abilities of the algorithm in both local and global search. In IEOA, a feature subset in the FS problem is represented as a binary vector of "1" and "0" values. A value of "1" indicates that the corresponding feature is selected, while "0" indicates that it is not selected, as in Equation (13). The framework of the proposed IEOA using the EOBL and MSS strategies is illustrated in Figure 3. The steps of the proposed IEOA are as follows:

1. In the first step: the particle population C is initialized using the random generation function with size N, as defined in Equation (2), and the fitness of the equilibrium candidates is assigned a large number. In this step, each generated particle (search agent) is regarded as a possible solution, which includes a random set of features from the complete set of features.

2. In the second step: the fitness value of each solution is computed and the elite position is found from the initial population. After that, the EOBL method creates the opposite elite solutions, as defined in Equation (15), and then selects the best N solutions.

3. In the third step: the EOA algorithm is executed to update the location of each particle in the population and to find the current best location based on the best fitness value, as defined in Equation (13). IEOA works based on KNN classification accuracy, and the feature selection is based on the wrapper mode.

4. In the fourth step: the MSS strategies are employed to improve the current location. Here, a potential best solution is considered; if the fitness value of a new location is better than the current one, then MNM is executed for further improvement.

5. In the fifth step: the next iteration of EOA is executed and the current best solution is compared with the potential solution. Here, the BM strategy is used: the current best solution is kept if it is better than the potential solution; otherwise, the current best solution is set equal to the potential solution.

6. In the sixth step: the algorithm proceeds with the iterations until the stopping criterion is met.

The pseudocode of the proposed IEOA is illustrated in Algorithm 1.

Algorithm 1. Pseudocode of the proposed IEOA.
Input: Initialize the particle population randomly, C_i (i = 1, 2, ..., N); T: the maximum number of iterations.
Output: The equilibrium state and its fitness value.
Apply the EOBL method to find the best N opposite solutions, then select the fittest N solutions, according to Equations (14)-(16)
Assign free parameters a_1 = 2; a_2 = 1; GP = 0.5;
While (maximum iteration not reached (Iter < T)) do
    Calculate the fitness of the particle locations.
    ... from Equation (3)
    C_eq.pool = {C_eq(1), C_eq(2), C_eq(3), C_eq(4), C_eq(ave)} from Equation (4)
    If (fitness(C_potential) < fitness(C_Best_solution(Iter+1))) then
        C_Best_solution = C_potential
    Else
        C_Best_solution = C_Best_solution(Iter+1)    % BM
    Apply the mutation strategy to the current best location C_Best_solution using Equations (17) and (18)
    If (fitness(C_mutation) < fitness(C_Best_solution)) then
        Apply MNM search on C_mutation
        Set C_potential = C_mutation
    Return the best location (C_Best_solution)
    Iter = Iter + 1
End While

Platform
The performance of IEOA was evaluated and compared with the original EOA and some popular and new optimization algorithms, including the GOA, GA, PSO, ALO, WOA, BOA, and SMA algorithms. All the experiments were executed using MATLAB R2020b 9.9 (Natick, MA, USA), and operated on a PC running with an Intel Core i7-8550U, 1.80 GHz, 16 GB of RAM, and Windows 10 version 20H2 operating system.
Equations (20)-(23) give the computation of the average classification accuracy, the average fitness value, and the average number of selected features.
Avg_acc is the average classification accuracy obtained by running the algorithm independently 30 times, and acc_i is the classification accuracy obtained in run i. acc_i is computed as in (21), where N is the total number of test cases, CL_c is the class label of the expected class data, and AL_c is the existing class in the labeled data. In addition, match(CL_c, AL_c) is a discrimination function: when CL_c and AL_c are equal, match(CL_c, AL_c) = 1; otherwise, match(CL_c, AL_c) = 0.
Avg_fitness is the average fitness value obtained over the 30 runs of the algorithm, and fitness_i is the best fitness value obtained in run i.
Avg_feature is the average number of selected features obtained over the 30 runs of the algorithm, and f_i is the number of selected features obtained in run i.
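Read from the definitions above, Equations (20)-(23) can be sketched as plain arithmetic means over the 30 runs. The typeset form, the index names, and the assignment of equation numbers to the individual formulas are assumptions made for illustration.

```latex
\begin{align}
\mathrm{Avg\_acc} &= \frac{1}{30}\sum_{i=1}^{30} \mathrm{acc}_i \tag{20}\\
\mathrm{acc}_i &= \frac{1}{N}\sum_{c=1}^{N} \mathrm{match}(CL_c, AL_c) \tag{21}\\
\mathrm{Avg\_fitness} &= \frac{1}{30}\sum_{i=1}^{30} \mathrm{fitness}_i \tag{22}\\
\mathrm{Avg\_feature} &= \frac{1}{30}\sum_{i=1}^{30} f_i \tag{23}
\end{align}
```

Under this reading, acc_i is the fraction of the N test cases whose two labels match, so Avg_acc lies between 0 and 1 unless it is reported as a percentage.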

Benchmark Datasets
To validate the efficiency of the proposed IEOA algorithm, experiments were conducted on 21 benchmark medical datasets from the UCI repository. The selected datasets were used to assess the capabilities of IEOA. In addition, to confirm the robustness of IEOA, datasets of two feature dimensionalities were used: average and high. The selected datasets have been used in many feature selection studies, such as [37,51,52]. Table 1 presents the details of the selected datasets.

Algorithms and Experiment Parameter Setting
In this work, the parameters were set after many experimental observations and similarly to [32]. Additionally, it has been noted that adjusting the control parameters can improve the performance of the algorithm. Therefore, the random parameter settings are very important and should be chosen carefully. In this experiment, the K-nearest-neighbors (KNN) classifier (wrapper mode) with 10-fold cross-validation was used to evaluate the performance of the algorithms. For validation, each dataset was divided into ten equal parts (folds). Nine folds were used in the training phase, and the remaining fold was used for testing.
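The wrapper evaluation can be illustrated with a small, dependency-free Python sketch that scores a binary feature mask by the 10-fold cross-validated accuracy of a KNN classifier. This is not the authors' MATLAB code. The number of neighbors (k = 5), the Euclidean distance, the fold assignment, and the synthetic data are all assumptions, and `knn_predict` and `cv_accuracy` are hypothetical helper names.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)

def cv_accuracy(X, y, mask, k=5, folds=10, seed=0):
    """Mean k-fold accuracy of KNN restricted to the features where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0                        # an empty subset cannot classify
    Xs = [[row[j] for j in cols] for row in X]
    order = list(range(len(Xs)))
    random.Random(seed).shuffle(order)
    scores = []
    for f in range(folds):
        test_idx = order[f::folds]        # every folds-th sample forms one fold
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        tr_X = [Xs[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(tr_X, tr_y, Xs[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds

# Synthetic data (assumption): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(1)
X = [[float(label) + rng.gauss(0, 0.1), rng.uniform(-5, 5)]
     for label in (0, 1) for _ in range(30)]
y = [label for label in (0, 1) for _ in range(30)]
acc_informative = cv_accuracy(X, y, mask=[1, 0])
acc_noise = cv_accuracy(X, y, mask=[0, 1])
```

In a wrapper setting, a score of this kind is recomputed for every candidate mask, which is why the classifier evaluation tends to dominate the running time.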
Furthermore, to ensure a fair comparison, the maximum number of iterations of each algorithm was set to 50. Moreover, the experiments were repeated 30 times, following the settings used in [12,37], so the reported results are averages over 30 runs. The parameter settings for the proposed IEOA are presented in Table 2. In addition, the general parameter settings for the baseline algorithms are displayed in Table 3.
Moreover, the computational complexity of the MSS strategy can be computed as O(T*I*M), where T is the maximum number of iterations, I is the number of MSS iterations, and M is the number of MSS search strategies (mutation and MNM). Consequently, the computational complexity of IEOA is presented in (25)

Results and Analysis
This section demonstrates the effectiveness of the proposed IEOA through two main experiments. The first experiment compared the proposed IEOA with the standard EOA. The second experiment compared IEOA with state-of-the-art algorithms, namely GOA, GA, PSO, ALO, WOA, BOA, and SMA. In all experiments, each algorithm was run on all the datasets to verify its robustness across feature dimensionalities. Additionally, the reported results are averages over 30 runs for every experiment.

Comparison of EOA and IEOA
This section compares the proposed IEOA with the original EOA. The comparison is based on four metrics: the average classification accuracy, the average number of selected features, the average fitness value, and the p-value of the Wilcoxon statistical test. Table 4 presents the experimental results of IEOA in comparison with the original EOA; the best results are underlined. For the statistical tests, if the p-value was lower than 0.05, the improvement was considered significant; otherwise, it was not. The p-value was used to determine whether the classification accuracy of IEOA improved significantly.
According to the results, IEOA outperformed EOA for the majority of the datasets in terms of classification accuracy, while it provided similar accuracy to EOA in two datasets. This indicates that the use of EOBL and MSS improves the performance of IEOA. In terms of the number of selected features, IEOA outperformed the standard EOA by decreasing the number of selected features in 15 datasets, while it was comparable with EOA in two datasets, and EOA was better in six datasets. In terms of fitness value, IEOA outperformed EOA in all datasets. Statistically, the p-value shows that IEOA significantly outperformed EOA in 15 datasets. Therefore, IEOA significantly improved the classification accuracy, feature selection, and fitness value across datasets of different dimensionality.
In addition, it can be observed from the results in Table 4 that the use of EOBL, as given in Equation (15), improved the choice of solutions compared with the random methods of the original EOA. A possible reason is that EOBL chooses the best obtainable solutions. Thus, compared with solutions produced by random methods, there are fewer opportunities to choose weak solutions. Furthermore, the use of the MSS method improves the algorithm's ability to balance exploration and exploitation. The algorithm uses the current best location to update the positions of the other search agents. Therefore, the proposed MSS enhanced the algorithm's exploration capability when looking for promising areas. Moreover, by using the mutation methods in Equations (18) and (19), the algorithm avoided becoming trapped in a local optimum. Furthermore, both the proposed mutation method and the MNM search increased the algorithm's exploitation capability when searching for the best solution in a specified local area. Consequently, the superiority of IEOA was demonstrated in three main aspects: the number of selected features, the classification accuracy, and the fitness value.

Comparison of IEOA Algorithm with Other Optimization Algorithms
The previous experiments demonstrated the superiority of IEOA over the original EOA, especially in terms of classification accuracy and fitness value. This superiority results from improving the population diversity and achieving an appropriate balance between exploration and exploitation to avoid local optima. Therefore, to validate the advantage of IEOA, an additional comparison was made between IEOA and highly cited and recent optimization algorithms: GOA, GA, PSO, ALO, WOA, BOA, and SMA. Here, we also used the four evaluation indicators to compare IEOA with the other optimization algorithms. First, the classification accuracy of the considered algorithms was evaluated, as shown in Table 5, where the significant results are underlined. According to the results, IEOA outperformed the other algorithms on the selected datasets in terms of classification accuracy, and it gave a similar accuracy to WOA in one dataset. The average accuracy of IEOA was 9.52% higher than that of GOA, 8.8% higher than BOA, 8.1% higher than SMA, 5.64% higher than GA, 5.14% higher than ALO, 5.04% higher than WOA, and 4.1% higher than PSO. The Wilcoxon test was applied to verify the significance of the classification accuracy results, as displayed in Table 6, where the best results are underlined. The results were significant, with a p-value < 0.05, for all algorithms and datasets, except for GA, PSO, and ALO on a single dataset, Fertility. Therefore, these significant results support the superiority of IEOA over all the other algorithms. The results indicate the capability of IEOA to balance exploration and exploitation. Moreover, IEOA has a better chance of avoiding the trap of local optima, which ultimately led to a significant improvement in its classification accuracy.
The average number of selected features over 30 runs is displayed in Table 7 for all algorithms; the best results are underlined. It can be observed that IEOA outperformed all the algorithms in terms of selected features. IEOA ranked first by selecting fewer features in 21 datasets. The average number of features selected by IEOA was 7.5, followed by WOA with 9.85, ALO with 12.03, PSO with 16.59, and GA with 19.003. GOA, SMA, and BOA followed, in that order, with a lower performance in terms of selected features. These results validate the effectiveness of EOBL and MSS for decreasing the number of selected features and increasing the classification accuracy. In addition, IEOA concentrates on promising regions in the search space to select the critical features and discard irrelevant ones. Table 8 compares IEOA with all the optimization algorithms in terms of the average fitness value. The results show the superiority of IEOA, which outperformed all the other algorithms in all datasets. The superiority in fitness values shows the reliable capabilities of IEOA. Furthermore, applying the MSS methods accelerates the search for a promising region and the best solution.
Moreover, as can be noticed from Tables 5 and 6, the datasets have a plurality of local optima, which poses a challenge to all optimization algorithms. Therefore, the ability of an algorithm to balance exploration and exploitation can be distinguished. For example, the classification accuracy on the "Cortex_Nuclear" dataset differed among the algorithms. The best accuracy (underlined) was achieved by IEOA with 99%, followed by PSO with 95%, WOA with 93%, ALO with 92%, and GA with 90%; SMA and BOA gave a similar accuracy of 81%, and GOA came last with 80%. The proposed IEOA is an adaptable algorithm that searches for new promising areas, which is achieved by using the mutation method in Equation (18). This method prevents the algorithm from becoming trapped in a local optimum. Furthermore, the MNM strategy improved the local search of IEOA by mining the promising area and exploring for a superior solution.
Figures 4 and 5 display graphical representations of the convergence curves. The convergence curves also need to be considered to evaluate the convergence speed of IEOA and the other optimization algorithms. If an optimization algorithm cannot balance exploration and exploitation in all iterations, it is likely to converge to a local optimum. It can be observed from the convergence curves that IEOA achieved a superior speed to all other algorithms, which implies the superiority of IEOA in processing datasets of different dimensions. Moreover, the effectiveness of the proposed MSS search strategies was notable: they switched from exploration to exploitation at the midpoint of the iterations (from iteration 25 to the maximum iteration 50) and increased the convergence speed in all cases. Table 9 gives a brief comparison of IEOA with the other algorithms in terms of the average classification accuracy, selected features, and fitness value over all experiments.

Limitations of the Proposed IEOA
Our proposed algorithm, IEOA, can solve high-dimensional and complex optimization problems. It has an edge over the original EOA in improving the classification accuracy and fitness value and in reducing the number of selected features. However, similarly to other optimization algorithms, IEOA has some limitations. The main limitation is its comparatively high time consumption relative to the other algorithms. Nonetheless, the high time consumption originates from the original EOA, and the proposed improvements had a marginal impact on the computational complexity of IEOA. An additional limitation is associated with the number of iterations in the proposed MSS. We believe that the time complexity of IEOA can be reduced by replacing the ten iterations of MSS with a less complicated solution.

Conclusions and Future Work
The equilibrium optimization algorithm (EOA) is a novel population-based optimization algorithm inspired by the physics-based equation of mass balance. This study introduces an improved version of EOA, named IEOA, which adds two main improvements to the original EOA: (1) applying the EOBL method, and (2) employing MSS search strategies, including the mutation method, the MNM search, and the backup strategy. These improvements significantly enhance the exploration and exploitation searches of IEOA. In particular, the use of EOBL improves the population diversity, whereas the MSS strategies prevent trapping in local optima. Furthermore, IEOA maintains a good balance when transferring between global and local search. We used 21 medical benchmark datasets from the UCI repository to evaluate the performance of IEOA: ten average-dimensional and eleven high-dimensional datasets. Furthermore, we compared IEOA with well-regarded and recent optimization algorithms, namely GOA, GA, PSO, ALO, WOA, BOA, and SMA. The comparison considered four evaluation metrics: classification accuracy, fitness value, number of selected features, and p-value. The experimental results confirmed the superiority of IEOA over all other algorithms by these metrics. Furthermore, the results showed the capability of IEOA to improve the computational accuracy and to speed up the convergence rate. Additionally, the results demonstrated the ability of IEOA to minimize the number of features selected for the majority of the twenty-one datasets. These results indicate that IEOA can be employed as a capable technique for real-world feature selection datasets having average and high-dimensional features. IEOA may also be applicable in many other fields, such as engineering problems, data science, and data mining.
For future work, there are several ways in which IEOA could be expanded to deal with different real-world datasets, for example, using IEOA along with a filter feature selection method. The performance of IEOA could also be enhanced by utilizing other classifiers, such as support vector machines (SVM) or artificial neural networks (ANN). Improving the computational time can likewise be considered in future work. The performance of the proposed IEOA could be tested on the CEC 2017 and CEC 2020 benchmark problems [58]. Finally, EOBL and the proposed MSS techniques could be applied to develop other optimization algorithms.