Mutational Slime Mould Algorithm for Gene Selection

A large volume of high-dimensional genetic data has been produced in modern medicine and biology, and data-driven decision-making is particularly crucial to clinical practice and related procedures. However, high-dimensional data in these fields increase processing complexity and scale, and identifying representative genes to reduce the data's dimensionality is often challenging. The purpose of gene selection is to eliminate irrelevant or redundant features, reducing computational cost and improving classification accuracy. Wrapper gene selection models evaluate candidate feature subsets directly, which can reduce the number of features while improving classification accuracy. This paper proposes a wrapper gene selection method based on the slime mould algorithm (SMA), a recent algorithm with considerable potential in the feature selection field. We improve the original SMA by combining a Cauchy mutation mechanism with a crossover mutation strategy based on differential evolution (DE); a transfer function then converts the continuous optimizer into a binary version for the gene selection problem. First, the continuous version of the method, ISMA, is tested on 33 classical continuous optimization problems. Then, the discrete version, BISMA, is thoroughly studied by comparing it with other gene selection methods on 14 gene expression datasets. Experimental results show that the continuous version achieves an optimal balance between local exploitation and global exploration, and that the discrete version attains the highest classification accuracy while selecting the smallest number of genes.


Introduction
Microarray technology [1,2] is a new analytical tool that simultaneously measures the expression levels of thousands of genes in a single experiment, greatly helping researchers understand disease at the genetic level. However, gene expression data are high-dimensional, and the number of features is much larger than the number of samples [3,4]. A large number of irrelevant and complex features reduces computational performance and wastes computational resources, which is not conducive to the classification of gene expression [5][6][7]. The application of feature selection to genes, namely gene selection, is a screening technology that removes irrelevant genes and reduces gene dimensionality [8][9][10]. Through this technology, the feature size can be effectively reduced, and classification performance can be improved [11][12][13].
Feature selection is an essential technology in data processing and machine learning [7,14]. Its essence is to pick out the relatively optimal features from the raw data so that the data go from high to low dimensions [15,16]. The commonly used (classical) feature selection methods can be divided into filter, wrapper, embedded, and hybrid methods [17]. Filter methods typically select features independently and evaluate individual features without assessing feature subsets as a whole, which may ignore the correlation between feature combinations [18][19][20][21]. Because filter methods do not involve a learning algorithm, their computational cost is low, but they may fail to find the optimal gene subset. The wrapper method relies on a classification algorithm to evaluate feature subsets, which can achieve good results, but the computational cost is high [22][23][24]. Embedded methods use a machine learning model for training and select the best feature subset through the classifier itself [25]; during training, the model automatically learns the corresponding threshold values, so feature selection is built into the algorithm. The hybrid method combines the advantages of the filter and wrapper methods: it determines candidate subsets of a given cardinality by an independent measure and then selects the final subset from these candidates using a mining algorithm [26][27][28][29].
Optimization methods can be approximate or deterministic [30], and their models can be single-objective or multi-objective, including algorithms that handle multiple objectives at once [31,32]. In recent years, because wrapper methods based on meta-heuristic algorithms or their variants can find an acceptable solution, that is, an approximately optimal subset of features, they have been widely used in feature selection [33,34]. In this study, we use an improved slime mould algorithm (SMA), called ISMA, to develop an efficient wrapper gene selection method for finding the smallest feature subset. The proposed optimization algorithm targets the shortcomings and characteristics of the original SMA: it retains the main operators of the SMA, but some operators use a binary conversion to adapt to the gene selection problem, because the original algorithm was designed for continuous problems. SMA is a meta-heuristic algorithm recently proposed by Li et al. [35] for continuous global optimization and engineering design problems. It is an optimization algorithm that simulates the dynamic oscillation behavior of slime mould during dispersive foraging and food searching. The method consists of three search patterns with different morphological variations, expressed mathematically in a unified model. The mathematical model of SMA adopts adaptive weights to simulate the propagation wave of the biological oscillator and generates positive feedback during the optimization process, which helps form an exploration trajectory toward the optimal solution with good search ability. In addition, surveys and results confirm that SMA achieves a balanced competitive advantage between global exploration and local exploitation; notably, it shows a superior tendency towards local exploitation.
With the help of adaptive weighting and an efficient and reasonable structure, SMA can provide significantly enhanced performance compared to many recognized advanced algorithms, such as the whale optimization algorithm (WOA), grey wolf optimization (GWO), grasshopper optimization algorithm (GOA), moth-flame optimization (MFO), ant lion optimizer (ALO), bat algorithm (BA), salp swarm algorithm (SSA), sine cosine algorithm (SCA), particle swarm optimization (PSO), and differential evolution (DE) [36]. Other examples include biogeography-based learning particle swarm optimization (BLPSO) [37], comprehensive learning particle swarm optimizer (CLPSO) [38], improved grey wolf optimization (IGWO) [39], and binary whale optimization algorithm (BWOA) [40], etc. Accordingly, SMA has been applied to engineering design problems [35,41], solar photovoltaic cell parameter estimation [42,43], multi-spectral image segmentation [44], numerical optimization [45], prediction problems [46,47], support vector regression parameter tuning [48], and other areas. It is an effective meta-heuristic optimization algorithm, but it may suffer from premature convergence to local optima and slow convergence speed when dealing with some complex problems. Therefore, improving the optimization capability of SMA and expanding its application value remain challenging.
In order to alleviate the shortcomings of traditional SMA and strengthen the coordination between global exploration and local exploitation, this paper proposes an advanced SMA variant based on the integration of Cauchy mutation (CM) and crossover mutation (MC). After the initial search agents are generated, the solution is updated in three phases. First, the SMA search process is executed and the search agents are updated. In the second phase, the Cauchy mutation strategy adjusts the SMA-based search agents. Finally, the optimal search agent is selected from the previous generation of search agents through a crossover mutation strategy. In addition, we convert the continuous ISMA into a discrete version using a transfer function. Tests on gene expression datasets show that BISMA has significant advantages over several advanced gene selection methods and is very effective. This indicates that ISMA can effectively solve high-dimensional, complex gene selection problems, which makes the improvement of SMA more valuable.
The main contributions in this paper can be summarized as follows: (1) An improved slime mould algorithm (ISMA) is proposed to solve continuous global optimization problems and high-dimensional gene selection problems. (2) The performance of the ISMA algorithm is verified by comparing it with several famous optimization algorithms. (3) Different transfer functions are used to transform the proposed ISMA into a discrete version of BISMA, and they are compared to choose the most suitable transfer function for the binary ISMA optimizer. (4) The optimal BISMA version was selected as a gene selection optimizer to select the optimal gene subset from the gene expression data set. (5) The performance of the selected method is verified by comparing it with several other advanced optimizers.
The rest of this article is organized as follows: The second section reviews related work on gene selection and meta-heuristic algorithms. The third section introduces the Cauchy mutation and the crossover mutation strategy based on the DE algorithm in detail and proposes ISMA. The fourth section presents a series of comparative experiments between ISMA and other similar algorithms. In the fifth section, we design the wrapper gene selection structure for the discrete ISMA. In the sixth section, we discuss the application of BISMA and other related algorithms to gene selection. The seventh section summarizes the proposed work as well as its shortcomings and implications. The eighth section briefly concludes the paper and points out future directions.

Related Works
The dimensions of microarray data are often extremely asymmetric and highly redundant, and most genes are considered to be irrelevant to the category under study. Traditional classification methods cannot effectively process such data. Many researchers have achieved good results using machine learning techniques to process gene expression data sets.

Machine Learning for Gene Selection
Singh et al. [49] proposed a hybrid improved chaotic emperor penguin (CEPO) algorithm based on the Fisher criterion, ReliefF, and extreme learning machine (ELM) for microarray data analysis. In this paper, the Fisher criterion and ReliefF method were first used as gene selection filters, and then relevant data were used to train the ELM to obtain a better model. Banu et al. [50] used the fuzzy clustering method to assign initial values to each gene and then predicted the likelihood of belonging to each cluster to carry out gene selection. The comparative experimental results show that the fuzzy clustering algorithm performs well in gene prediction and selection. Chen et al. [51] proposed a support vector machine for binary tumor diagnosis, extending the three kinds of support vector machines to improve the performance of gene selection. At the same time, lasso, elastic net, and other sparse regression methods were introduced for cancer classification and gene selection. Mahendran et al. [52] conducted an extensive review of recent work on machine learning-based selection and its performance analysis, classified various feature selection algorithms under supervised, unsupervised and semi-supervised learning, and discussed the problems in dealing with high and low sample data. Tan et al. [53] proposed an integrated machine learning approach to analyze multiple gene expression profiles of cervical cancer to find the genomes associated with it, with the expectation that it could help in diagnosis and prognosis. The gene expression data were identified effectively through the analysis of three steps.
Zhou et al. [54] proposed an improved discretized particle swarm optimization algorithm for feature selection. In their work, a modest pre-screening process is first applied to obtain fewer features; then, a better cutting combination is found through the encoding and decoding method based on PSO and the local search strategy guided by probability to obtain the desired feature subset. Zohre Sadeghian et al. [55] proposed a three-stage feature selection method based on the S-BBOA algorithm. In the first stage, the minimum redundancy-maximum new classification information (MRMNCI) feature selection was used to remove 80% of the irrelevant and redundant features. The best feature subset was chosen using IG-BBOA in the second step. Furthermore, the similarity ranking approach was used to choose the final feature subset. Veredas Coleto-Alcudia et al. [56] proposed a new hybridization method based on the dominance degree artificial bee colony algorithm (ABCD) to investigate the problem of gene selection. The method combines the first step of gene screening with the second part of the optimization algorithm to find the optimal subset of genes for the classification task. The first step is to use the Analytic Hierarchy Process (AHP) to select the most relevant genes in the dataset through five sequencing methods. In this way, gene filtering reduces the number of genes that need to be managed. For the second step, gene selection can be divided into two objectives: minimizing the number of selected genes and maximizing classification accuracy. Lee et al. [57] embedded the formal definition of correlation into Markov coverage (MB) and established a new multi-feature sequencing method, which was applied to high-dimensional microarray data, enhancing the efficiency of gene selection and, as a result, the accuracy of microarray data classification.

Swarm Intelligence for Gene Selection
Alok Kumar Shukla et al. [4] created TLBOGSA, a hybrid wrapper approach that combines the features of the Teaching Learning based Optimization (TLBO) and the Gravity Search Algorithm (GSA). TLBOGSA was updated with a new encoding approach that transformed the continuous search space into the binary search space, resulting in the binary TBSA. First, significant genes from the gene expression dataset were chosen using the minimal redundancy and maximum correlation (mRMR) feature selection approach. Then, using a wrapper strategy, information genes were chosen from the reduced data generated by the mRMR. They developed the gravitational seeking mechanism in the teaching stage to boost the evolutionary process's searching capabilities. The technique selected the most reasonable genes using a Naive Bayes classifier as a fitness function, which is useful for accurate cancer classification. Based on the phase diagram approach, Elahe Khani et al. [58] suggested a unique gene selection algorithm, and Ridge logistic regression analysis was performed to evaluate the likelihood that the genes belong to a stable group of genes with excellent classification ability. To address the problems, a methodology for the final selection of the selected set is suggested. The model's performance was assessed using the B632+ error estimation approach. To identify genes from gene expression data and valuable information genes from cancer data genes, a decision tree optimizer based on particle swarm optimization was presented by Chen et al. [59]. Experimental results demonstrate that this strategy outperforms different popular classifiers, including support vector machines, self-organizing mapping, and back propagation neural networks. Dabba et al. [10] developed the Quantum MFO (QMFOA), a swarm intelligent gene selection technique based on the fusion of quantum computing with the MFO, to discover a relatively small subset of genes for high-precision sample classification. 
The QMFOA gene selection algorithm has two stages: the first is preprocessing, which acquires a preprocessed gene set by measuring the redundancy and correlation of genes, and the second is hybrid combination and gene selection, which utilizes several techniques such as MFO, quantum computing, and support vector machines. To select a limited, representative fraction of cancer-related genetic information, Mohamad et al. [60] developed an enhanced binary particle swarm optimization for gene selection. In this approach, particle velocity gives the rate of particle position change, and an updated particle position rule is presented. The experimental findings show that the suggested technique outperforms the classic binary PSO (BPSO) in terms of classification accuracy while selecting fewer genes.

SMA
Several swarm intelligence optimization techniques have appeared successively in recent years, such as the slime mould algorithm (SMA) [35], Harris hawks optimization (HHO) [61], hunger games search (HGS) [62], Runge Kutta optimizer (RUN) [63], colony predation algorithm (CPA) [64], and weighted mean of vectors (INFO) [65]. Due to the simplicity and efficiency of swarm intelligence algorithms, they have been widely used in many different fields, such as image segmentation [66,67], the traveling salesman problem [68], feature selection [69,70], practical engineering problems [71,72], fault diagnosis [73], scheduling problems [74][75][76], multi-objective problems [77,78], medical diagnosis [79,80], economic emission dispatch problems [81], robust optimization [82,83], solar cell parameter identification [84], and optimization of machine learning models [85]. Among them, SMA is a new bionic stochastic optimization algorithm that simulates the behavior and morphological changes of slime mould during foraging. SMA uses weights to simulate the positive and negative feedback of slime mould propagation waves during foraging, constructing a venous network with different thicknesses. The morphology of the slime mould changes with three search patterns: approaching food, wrapping around food, and oscillation.
From the brief description of SMA shown in Figure 1, the random value rand helps to find the optimal solution. The slime moulds are randomly distributed in any direction to search for solutions (food), and when rand < z, there is no venous structure. During the search phase, when rand ≥ z and r < p, individuals form diffuse venous structures to approach food. The adaptive change of the decision parameter p ensures a smooth transition from the exploration stage to the exploitation stage. During the exploitation phase, when r ≥ p, the individual wraps the solution (food) through venous contraction. Based on these parameters, the mathematical model of SMA representing the three contraction modes of slime mould can be written as:

X(t + 1) =
  rand · (UB − LB) + LB,                 rand < z
  X_b(t) + vb · (W · X_A(t) − X_B(t)),   rand ≥ z, r < p     (1)
  vc · X(t),                             rand ≥ z, r ≥ p

where X(t) and X(t + 1) represent the position vectors of the slime mould at iterations t and (t + 1), respectively. UB and LB indicate the upper and lower boundaries of the search space. X_b denotes the position vector of the individual with the highest fitness (highest food concentration). X_A(t) and X_B(t) indicate the position vectors of random individuals selected from the slime mould at iteration t. rand and r are random values between 0 and 1. The parameter z is set to 0.03 as in the original literature.
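The three-branch update above can be sketched in NumPy as follows. This is an illustrative reimplementation under simplifying assumptions (the best fitness DF is approximated by the current population minimum, and the parameter a is passed in directly), not the authors' code.

```python
import numpy as np

def sma_position_update(X, fitness, X_b, W, a, t, max_iter,
                        lb, ub, z=0.03, rng=None):
    """One SMA position update (illustrative sketch, minimization assumed).

    X       : (N, D) current positions
    fitness : (N,) fitness values
    X_b     : (D,) best position found so far
    W       : (N, D) slime-mould weight matrix
    a       : scalar controlling the vb interval [-a, a]
    """
    rng = rng or np.random.default_rng()
    N, D = X.shape
    X_new = X.copy()
    for i in range(N):
        if rng.random() < z:
            # re-initialize randomly within the search space
            X_new[i] = lb + rng.random(D) * (ub - lb)
        else:
            # p adapts with the gap to the best fitness seen so far
            p = np.tanh(abs(fitness[i] - fitness.min()))
            vb = rng.uniform(-a, a, D)
            vc = rng.uniform(-(1 - t / max_iter), 1 - t / max_iter, D)
            A, B = rng.integers(0, N, 2)   # two random individuals
            if rng.random() < p:
                X_new[i] = X_b + vb * (W[i] * X[A] - X[B])
            else:
                X_new[i] = vc * X[i]
    return np.clip(X_new, lb, ub)
```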
In addition, the decision parameter p can be calculated as follows:

p = tanh |S(i) − DF|

where S(i) indicates the fitness of the ith individual in the slime mould X, i ∈ {1, 2, · · · , N}, and N denotes the size of the population. DF represents the best fitness attained over all iterations. W is the weight vector of the slime mould, which can be obtained from the following equation; this vector mimics the rate at which slime mould shrinks around food of different masses.
[SmellOrder, SmellIndex] = sort(S)

W(SmellIndex(i)) =
  1 + r · log( (bF − S(i)) / (bF − wF) + 1 ),   condition
  1 − r · log( (bF − S(i)) / (bF − wF) + 1 ),   others

where bF and wF are the best and worst fitness obtained in the current iteration, respectively. SmellIndex and SmellOrder denote, respectively, the fitness sort order (for minimization problems, sorted in ascending order) and the corresponding sorted fitness values. condition indicates that S(i) ranks in the first half of the population and simulates individuals dynamically adjusting their search patterns according to the quality of food. The collaborative interaction between the parameters vb and vc simulates the selection behavior of slime mould. vb is a random value in the interval [−a, a], with a = arctanh(−(t / Max_iter) + 1). The parameter vc decreases toward zero over the iterations, being drawn from an interval that shrinks as t approaches Max_iter, where Max_iter indicates the maximum number of iterations. The simplified pseudo-code of SMA is listed in Algorithm 1; more detailed descriptions can be found in the original literature.
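The weight computation can be sketched as below. Following common SMA implementations, the sketch assumes a base-10 logarithm and adds a small constant to guard against division by zero when bF equals wF; both details are assumptions rather than statements from the text.

```python
import numpy as np

def sma_weights(fitness, rng=None):
    """Compute the SMA weight vector W from a population's fitness
    (illustrative sketch, minimization assumed)."""
    rng = rng or np.random.default_rng()
    N = fitness.size
    order = np.argsort(fitness)            # SmellIndex: ascending fitness
    S = fitness[order]                     # SmellOrder: sorted fitness
    bF, wF = S[0], S[-1]
    ratio = (bF - S) / (bF - wF + 1e-300)  # guard against bF == wF
    W = np.empty(N)
    half = N // 2
    r = rng.random(N)
    # better half ('condition'): weights >= 1; remaining half: weights <= 1
    W[order[:half]] = 1 + r[:half] * np.log10(ratio[:half] + 1)
    W[order[half:]] = 1 - r[half:] * np.log10(ratio[half:] + 1)
    return W
```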

The Cauchy Mutation Operator
In this section, we briefly introduce the Cauchy mutation. The Cauchy density function can be described as:

f(x) = (1/π) · t / (t² + x²),   −∞ < x < ∞

where t > 0 is the scale parameter, and the corresponding distribution function is expressed as follows:

F(x) = 1/2 + (1/π) · arctan(x / t)

By increasing the search range in each generation, individuals are able to find better solutions over a wider range, thus avoiding local optima. Therefore, the Cauchy mutation was selected as an improvement mechanism.
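Since the distribution function F(x) is invertible in closed form, Cauchy random numbers can be drawn by inverse-transform sampling. A minimal sketch:

```python
import numpy as np

def cauchy_sample(size, t=1.0, rng=None):
    """Draw Cauchy(0, t) samples by inverting the distribution function:
    F(x) = 1/2 + arctan(x/t)/pi  =>  x = t * tan(pi * (u - 1/2))."""
    rng = rng or np.random.default_rng()
    u = rng.random(size)
    return t * np.tan(np.pi * (u - 0.5))
```

The heavy tails of this distribution are what let a mutation occasionally make large jumps out of a local basin.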
In the original SMA based on Equations (6) and (7), the version using the Cauchy mutation operation is expressed as:

x_i_cauchy = x_i + x_i × Cauchy(0, 1)

where Cauchy(0, 1) is a random number drawn from the standard Cauchy distribution, x_i is a position in the SMA at the current iteration, and x_i_cauchy is the corresponding position of x_i after Cauchy mutation. The introduction of the Cauchy mutation mechanism improves the foraging behavior of slime mould when searching unknown space, so the quality of SMA solutions can be enhanced by using the Cauchy operator during the simulation process.
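A brief sketch of the mutation step, assuming the additive-multiplicative form x_i + x_i · Cauchy(0, 1) described above:

```python
import numpy as np

def cauchy_mutation(x, rng=None):
    """Perturb a position vector with standard Cauchy noise scaled by the
    position itself (illustrative sketch of the mutation step)."""
    rng = rng or np.random.default_rng()
    c = rng.standard_cauchy(x.shape)   # heavy-tailed Cauchy(0, 1) noise
    return x + x * c
```

In practice the mutated position would be kept only if its fitness improves on the original, so the operator cannot degrade the current best solution.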

The Mutation and Crossover Strategy in DE
During the optimization procedure, the major operations of DE are mutation and crossover. Each solution is represented as a vector that is evolved in turn by these two operators.

A. Mutation
A mutant vector u_i can be generated via the mutation operator from components of randomly nominated vectors x_a, x_b, and x_c, where a ≠ b ≠ c ≠ i. The mathematical equation can be represented as follows:

u_i = x_a + F · (x_b − x_c)

where F is a random number that controls the mutation's perturbation size.
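A minimal sketch of the DE/rand/1 mutation described above; the function name and the default F value are illustrative choices, not the paper's settings:

```python
import numpy as np

def de_mutation(X, i, F=0.5, rng=None):
    """DE/rand/1 mutation: u_i = x_a + F * (x_b - x_c), with a, b, c
    distinct from each other and from i (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    N = X.shape[0]
    candidates = [k for k in range(N) if k != i]
    a, b, c = rng.choice(candidates, 3, replace=False)
    return X[a] + F * (X[b] - X[c])
```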

B. Crossover
The crossover operator constructs a trial vector v_i by applying crossover to the mutant vector, selecting components from the mutant u_i and the target vector x_i depending on the probability P_c. The formula is as follows:

v_i,j = u_i,j,   if rand_j ≤ P_c or j = j_0
v_i,j = x_i,j,   otherwise

The probability parameter P_c controls the diversity of the swarm and relieves the risk of local optima, and j_0 is an index in {1, 2, 3, . . . , N_p} that guarantees that v_i obtains at least one component from u_i.
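The binomial crossover can be sketched as follows; here j_0 is chosen uniformly at random, and the P_c values in the test are illustrative rather than the paper's settings:

```python
import numpy as np

def de_crossover(x, u, P_c=0.9, rng=None):
    """Binomial crossover: take each component from the mutant u with
    probability P_c, forcing at least one mutant component via j0."""
    rng = rng or np.random.default_rng()
    D = x.size
    mask = rng.random(D) <= P_c
    mask[rng.integers(D)] = True   # j0 guarantees one component from u
    return np.where(mask, u, x)
```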

The Hybrid Structure of the Proposed ISMA
Considering that the original SMA may converge prematurely to suboptimal solutions or risk falling into local optima, the improved algorithm proposed in this paper combines two strategies, Cauchy mutation and DE-based crossover mutation, to promote the coordination of global exploration and local exploitation, forming a new SMA variant, namely ISMA. The structure of the proposed ISMA is shown in Figure 2 and demonstrated in Algorithm 2. Under the ISMA framework, these two strategies are used, in turn, to generate the new search agents and the best agent with the best solution in the current iteration. Figure 2 depicts the ISMA process. As illustrated in the figure, the position of each agent may be rebuilt when its location is updated according to Equation (1), implying that each agent seeks the best solution in a larger search space. The SMA-based position update solves the position vector of the slime mould according to the optimization rules of SMA, as detailed in Section 2.1; this phase produces an SMA-based population. The Cauchy-based mechanism and the crossover mutation mechanism then apply the Cauchy mutation operation and the crossover mutation operation described in Section 2.2 to adjust the position vectors of SMA-based individuals and produce a new SMA-based population. In this stage, the advantages of the Cauchy and crossover mutation mechanisms in the exploration stage compensate for the shortcomings of SMA's exploration; considering both mechanisms' effects on search ability, this effectively enlarges the pool of candidate positions and thus population diversity. The research shows that this stage not only helps to coordinate exploration and exploitation capabilities but also helps to improve the quality of solutions and accelerate the convergence rate.
    Update the position of each search agent by Equation (1)
  EndFor
  Use the Cauchy mutation strategy to update the best individual and the best fitness
  Adopt the MC strategy to update the best individual and the best fitness
  iteration = iteration + 1
EndWhile
Return the best fitness and X_b as the best solution
End
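The overall three-phase loop of Algorithm 2 might look like the following simplified NumPy version. It omits the weight vector W and other details for brevity, applies the Cauchy and MC phases only to the best agent, and uses illustrative parameter values (F = 0.5, P_c = 0.9), so it is a sketch of the structure rather than the authors' implementation.

```python
import numpy as np

def isma_sketch(f, lb, ub, D, N=20, T=100, z=0.03, seed=0):
    """Simplified three-phase ISMA loop on a continuous function f."""
    rng = np.random.default_rng(seed)
    X = lb + rng.random((N, D)) * (ub - lb)
    fit = np.apply_along_axis(f, 1, X)
    best = X[fit.argmin()].copy()
    best_f = float(fit.min())
    for t in range(1, T + 1):
        a = np.arctanh(1.0 - t / (T + 1))
        # phase 1: SMA-style position update (weight vector W omitted)
        for i in range(N):
            if rng.random() < z:
                X[i] = lb + rng.random(D) * (ub - lb)
            elif rng.random() < np.tanh(abs(fit[i] - best_f)):
                A, B = rng.integers(0, N, 2)
                X[i] = best + rng.uniform(-a, a, D) * (X[A] - X[B])
            else:
                X[i] = X[i] * rng.uniform(-(1.0 - t / T), 1.0 - t / T, D)
        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(f, 1, X)
        if fit.min() < best_f:
            best, best_f = X[fit.argmin()].copy(), float(fit.min())
        # phase 2: Cauchy mutation of the best agent, kept only if better
        cand = np.clip(best + best * rng.standard_cauchy(D), lb, ub)
        fc = float(f(cand))
        if fc < best_f:
            best, best_f = cand, fc
        # phase 3: DE-style mutation + crossover against the best agent
        ra, rb, rc = rng.choice(N, 3, replace=False)
        mut = X[ra] + 0.5 * (X[rb] - X[rc])
        mask = rng.random(D) <= 0.9
        mask[rng.integers(D)] = True
        cand = np.clip(np.where(mask, mut, best), lb, ub)
        fc = float(f(cand))
        if fc < best_f:
            best, best_f = cand, fc
    return best, best_f
```

The greedy acceptance in phases 2 and 3 guarantees that the best solution never worsens across iterations.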

Computational Complexity
The proposed ISMA structure mainly includes the following parts: initialization, fitness evaluation, fitness sorting, weight updating, position updating based on the SMA strategy, position updating based on the Cauchy mutation strategy, and position updating based on the crossover mutation strategy, where N is the number of slime mould cells, D is the dimension of the function, and T is the maximum number of iterations. The computational complexity of initialization is O(D). The evaluation and sorting of fitness cost O(N + N log N). Updating the weights costs O(N × D), as does the SMA-based position update. Similarly, the position updates based on the Cauchy mutation mechanism and the crossover mutation mechanism each cost O(N × D). Therefore, the total computational complexity of ISMA is O(D + T × N × (1 + 4D + log N)).

Experimental Design and Analysis of Global Optimization Problem
To evaluate the continuous version of ISMA, we designed two experiments comparing the proposed method with several competitors. We used 23 continuous benchmark functions (7 unimodal, 6 multimodal, and 10 fixed-dimensional multimodal functions) and 10 typical CEC2014 benchmark functions (2 hybrid functions and 8 composition functions), for a total of 33 benchmark cases. Experiment 1 compares a series of SMA variants with different update strategies (ISMA, CSMA, and MCSMA) against the original SMA and the DE algorithm to identify the best variant. Experiment 2 compares ISMA with 8 other advanced optimization algorithms, including multi-population ensemble differential evolution (MPEDE) [86], success-history based adaptive DE with linear population size reduction (LSHADE) [87], particle swarm optimization with an aging leader and challengers (ALCPSO) [88], comprehensive learning particle swarm optimizer (CLPSO) [38], chaos-enhanced sine cosine-inspired algorithm (CESCA) [89], improved grey wolf optimization (IGWO) [39], whale optimization algorithm with the β-hill climbing (BHC) algorithm and associative learning and memory (BMWOA) [90], and modified GWO with random spiral motions, simplified hierarchy, random leaders, opposition-based learning (OBL), Levy flight (LF) with a random decreasing stability index, and greedy selection (GS) mechanisms (OBLGWO) [91]. All experimental evaluations were conducted on a Windows 10 (64-bit) operating system with 32 GB RAM and an Intel(R) Xeon(R) Silver 4110 CPU @ 2.40 GHz (dual processor), using MATLAB R2014a.
Tables A1-A4 contain information on the 23 benchmark functions and the 10 classic CEC2014 benchmark functions. As can be seen, the 33 functions used in the experiments cover a wide variety of problems. These functions can be used not only to verify local exploitation ability and global exploration ability but also to verify the ability to balance the two. In addition, to reduce the impact of algorithmic randomness on the experiment [92], we conducted 30 independent runs for each test case. To exclude the influence of other factors, all tested algorithms were run under the same settings and conditions [93][94][95]. The maximum number of function evaluations was set to 300,000, and the population size was 30.
In addition, statistical results such as the mean and standard deviation (std) are used to represent the global optimization ability and robustness of the evaluated methods. The Wilcoxon signed-rank test at the 0.05 significance level was used to measure whether the degree of improvement is statistically significant. The label '+/=/−' in the results indicates that ISMA is significantly superior to, equal to, or worse than the respective competitor. For a comprehensive statistical comparison, the Friedman test was used to determine whether the performance differences among all compared algorithms on the benchmark functions are statistically significant. The average ranking value (ARV) of the Friedman test was used to evaluate the average performance of each investigated method. It is worth noting that a reliable comparison should involve more than 5 algorithms over more than 10 test cases [96].
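For reference, both tests are available in SciPy. The sketch below runs them on synthetic score vectors, which are illustrative data only, not results from the paper:

```python
import numpy as np
from scipy import stats

# Synthetic per-function scores for three optimizers over 12 benchmark
# functions (lower is better; illustrative data only).
rng = np.random.default_rng(0)
isma = rng.normal(0.10, 0.02, 12)
rival = isma + rng.normal(0.05, 0.01, 12)   # systematically worse
third = isma + rng.normal(0.03, 0.01, 12)

# Pairwise Wilcoxon signed-rank test at the 0.05 level ('+' = ISMA wins)
stat, p = stats.wilcoxon(isma, rival)
mark = "+" if p < 0.05 and isma.mean() < rival.mean() else "="
print(f"Wilcoxon p = {p:.4f}, mark: {mark}")

# Friedman test across all three algorithms (needs >= 3 paired samples)
chi2, p_f = stats.friedmanchisquare(isma, rival, third)
print(f"Friedman chi2 = {chi2:.3f}, p = {p_f:.4f}")
```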

Comparison between SMA Variant and Original SMA and DE Algorithm
In this section, to prove the superiority of the Cauchy mutation mechanism and the combination of mutation and crossover strategies in DE, we compare the three combinations of the two mechanisms and the original SMA with the DE algorithm. The comparison results are shown in Tables A5-A7, and the algorithm convergence curve is shown in Figure 3.
As the results in Tables A5 and A6 show, ISMA clearly outperforms the other mechanism combinations as well as the original SMA and DE algorithms, handling most of the test functions better than all competitors. As can be seen from the ARV of the Friedman test in Table A7, ISMA ranks first among the five compared algorithms. The mean and std values in Table A5 also indicate the superiority of ISMA on the F1-F6, F9-F14, F26-F28, and F30-F33 functions; ISMA ranks 2nd on F7, F15-F17, F19-F25, and F29. According to the p-values in Table A6, almost all values in the SMA column are less than 0.05, indicating that ISMA significantly improves upon the original SMA algorithm. On F1-F3, F9-F11, and F26-F28, the final optimization results of CSMA and MCSMA are the same as those of ISMA. In summary, the results of the Wilcoxon signed-rank test show that, statistically, ISMA has significantly improved performance compared with the other algorithms. The results show that adding the Cauchy mutation strategy and the DE-based crossover mutation strategy benefits ISMA's exploitation ability, its exploration ability, and the balance between the two.
The convergence analysis can show which optimizer as an iterative method can reach better quality results within a shorter time [97,98]. Figure 3 shows the convergence curves of the comparison method on 12 functions. We can intuitively find that, compared with the original SMA, DE, and other two SMA variants, the ISMA using the two mechanisms has a better effect. Combining the two mechanisms makes the SMA avoid falling into the local optimal solution and can obtain the global optimal solution. The overall advantage of ISMA is significant because of the positive effect of the Cauchy mutation mechanism and the crossover mutation strategy on SMA, which highlights the optimization capability of the proposed method.
Tables A8-A10 record the results of the comparison between ISMA and eight advanced algorithms. As can be seen from Table A10, among ISMA and the 8 other advanced meta-heuristic algorithms, the average Friedman test result of ISMA is 3.7075758, ranking first, followed by CLPSO. The statistical results in Table A8 show that, among all compared algorithms, ISMA attains a std of 0 on more test functions, so ISMA is more stable. In addition, the results on specific functions show that ISMA has a stronger ability than the other advanced algorithms to deal with composition functions and hybrid functions. The mean and std in Table A8 also indicate the superiority of ISMA on the F1-F6, F9-F15, F26-F28, and F30-F33 functions; ISMA also ranks high on F7 and F21-F23. In addition, Table A9 shows the Wilcoxon signed-rank test results between ISMA and the other advanced algorithms. ISMA outperforms the other algorithms on most of the benchmark functions, especially CESCA, which it beats on 90.9% of the functions. As a result, ISMA is superior to these strong competitors.
The convergence curves of all nine algorithms on 12 functions, shown in Figure 4, indicate that the convergence rate of ISMA is competitive with the other, more advanced methods, which tend to converge to a local optimum earlier than ISMA. This demonstrates that ISMA has a strong ability to escape local optima during the global search and can produce more accurate solutions. To sum up, the optimization power of ISMA is reflected in its overall superior performance on different types of functions compared with these challenging advanced methods. The combination of the Cauchy mutation mechanism and the DE-based crossover mutation strategy enables the proposed ISMA to obtain higher-quality solutions during optimization and keeps exploration and exploitation in better equilibrium.

The Proposed Technique for Gene Selection
In this section, the proposed ISMA is applied to the gene selection problem, which makes the proposed improvement more practical. For this purpose, we transform the continuous ISMA into a discrete, wrapper-based version, namely BISMA, to solve the gene selection problem as a binary optimization task.

System Architecture of Gene Selection Based on ISMA
Feature selection is the procedure of selecting or generating the most significant features from a feature set in order to lower the dimension of the training dataset. Many fields with large datasets need to reduce the dimensionality of their data; gene selection for high-dimensional gene expression datasets in the medical field is one example. The task of gene selection is to remove irrelevant and unimportant genes, identify the most relevant genes with the greatest classification accuracy, reduce the high computational cost, and improve the accuracy of disease analysis. The continuous ISMA optimizer is converted to a binary ISMA (BISMA) using a transfer function (TF) for the gene selection problem. A machine learning algorithm is used as a classifier to evaluate the ability of BISMA to identify discriminant genes and eliminate irrelevant, redundant genes in high-dimensional gene expression datasets. In addition, cross-validation (CV) is used during the evaluation process to assess the optimality of the selected gene subsets for classification.

Fitness Function
Gene selection aims to obtain the optimal classification accuracy with the smallest subset of genes, and both goals need to be pursued simultaneously. Therefore, the fitness function in Equation (11) combines classification accuracy and the number of selected genes to comprehensively evaluate the candidate solutions:

Fitness = α × (1 − Acc) + β × (R/D) (11)

where Acc indicates the classification accuracy of the classifier (a machine learning method), so (1 − Acc) is its error rate. The weighting factors α and β reflect the importance of the error rate and the number of selected genes, respectively, with α ∈ [0,1] and β = 1 − α. D is the total number of genes in the gene expression dataset, and R is the number of genes selected by the proposed gene selection optimizer. In this study, α and β were set to 0.95 and 0.05, respectively.
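The fitness computation described above can be sketched as follows; the function and argument names are illustrative, but the weights match the paper's setting (α = 0.95, β = 0.05).

```python
def fitness(accuracy, n_selected, n_total, alpha=0.95):
    """Weighted wrapper fitness: classification error plus selected-gene ratio.

    alpha weights the error rate (1 - accuracy); beta = 1 - alpha weights
    the fraction of genes kept, so smaller fitness is better.
    """
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * (n_selected / n_total)
```

For example, a solution with perfect accuracy that keeps 10 of 2000 genes scores 0.95 × 0 + 0.05 × (10/2000) = 0.00025, so among equally accurate solutions the smaller gene subset always wins.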

Implementation of the Discrete BISMA
In the preceding sections, the proposed ISMA optimizer searched for the optimal solution in a continuous search space. Gene selection, however, is a binary problem. The transfer function restricts the continuous search space to 0 or 1: a value of 0 means the gene is not selected, and a value of 1 means it is selected.
Individuals with binary position vectors are initialized through a random threshold, as shown below:

x_i^d = 1 if rand > 0.5, and 0 otherwise,

where x_i^d is the i-th gene on the d-th dimension of the slime mould's position vector and rand is a uniform random number in [0,1]. In addition, the transfer function (TF) is a suitable converter that can turn a continuous optimization algorithm into a discrete version without changing the algorithm's structure, and it is convenient and efficient [99]. There are 8 types of TFs, which can be divided into S-shaped and V-shaped families according to their shapes. Their mathematical formulae and graphical descriptions are shown in Table A11.
For the S-shaped family, a gene of the position vector at the next iteration is converted according to TFS1-TFS4 in Table A11 as follows:

x_i^d(t + 1) = 1 if rand < T(x_i^d(t + 1)), and 0 otherwise,

where T(x_i^d(t + 1)) represents the probability value of the i-th gene on the d-th dimension at the next iteration.
For the V-shaped family, the gene of the position vector at the next iteration is converted according to TFV1-TFV4 in Table A11 as follows:

x_i^d(t + 1) = ¬x_i^d(t) if rand < T(x_i^d(t + 1)), and x_i^d(t) otherwise,

where ¬ denotes the complement of the bit, i.e., the V-shaped rule flips the current bit with probability T rather than setting it directly.
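The two binarization rules can be sketched as follows. The exact TFS1-TFS4 and TFV1-TFV4 formulae are given in Table A11 (not reproduced here); the sigmoid and |tanh| used below are common representatives of the S-shaped and V-shaped families, chosen only for illustration.

```python
import math
import random

def tf_s(x):
    """A typical S-shaped TF: the sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tf_v(x):
    """A typical V-shaped TF: |tanh(x)|."""
    return abs(math.tanh(x))

def binarize_s(x, rng=random.random):
    """S-shaped rule: set the bit to 1 with probability T(x)."""
    return 1 if rng() < tf_s(x) else 0

def binarize_v(x, current_bit, rng=random.random):
    """V-shaped rule: flip the current bit with probability T(x)."""
    return 1 - current_bit if rng() < tf_v(x) else current_bit
```

Note the qualitative difference: near x = 0 the S-shaped rule sets the bit to 1 about half the time, while the V-shaped rule almost never flips, so V-shaped TFs preserve more of the current solution — consistent with their tendency to select fewer genes in the experiments below.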

Experimental Design
In this experiment, two kinds of comparison results are used to evaluate the optimization ability of the proposed algorithm. In the first assessment, we studied BISMA with different TFs to determine the best version of BISMA out of the eight TFs. The resulting BISMA is compared with other mature meta-heuristic optimizers in the second evaluation. Fourteen gene expression datasets were used in the two case studies. Table A12 lists the detailed characteristics of these microarray datasets, including the number of samples, the number of genes per sample, and the number of categories. These 14 representative gene datasets have been widely used to test a variety of gene selection optimizers to evaluate their performance.
In addition, to obtain more convincing results, this paper uses Leave-One-Out cross-validation (LOOCV) to validate the gene selection process. One sample in the dataset is taken as the test set to verify the classification accuracy of the classifier, while the remaining samples are taken as the training set. The number of validations per dataset therefore equals the number of samples. The KNN classifier is used for the classification tasks, with the neighbourhood size k set to 1, and the distance D between samples is measured by the Euclidean distance. For a fair comparison [100][101][102], every evaluation and comparison involving BISMA was performed in the same computing environment, namely an Intel(R) Xeon(R) Silver 4110 CPU @ 2.40 GHz (two processors) and 8 GB RAM on 64-bit Windows 10. MATLAB R2014a was used to test the algorithms. For each algorithm, we set the maximum number of iterations to 50 and the number of search agents to 20, and each algorithm was run 10 times independently. The initial parameters of all algorithms were set to the values in their original references.
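The LOOCV evaluation with a 1-NN classifier described above can be sketched as follows. This is a minimal illustration, not the paper's MATLAB implementation; Euclidean distance is assumed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def loocv_1nn(X, y):
    """Leave-one-out CV accuracy of a 1-NN classifier.

    Each sample is held out once; the label of its nearest neighbour
    among the remaining samples is the prediction, so the number of
    validation rounds equals the number of samples.
    """
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: euclidean(X[i], X[j]))
        correct += (y[nearest] == y[i])
    return correct / len(X)
```

In the wrapper, X would contain only the columns whose bits are 1 in the candidate binary position vector, and the returned accuracy feeds the fitness function of Equation (11).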

The Proposed BISMA with Different TFs
Considering the effect of the TF on the performance of the gene selection optimizer, we developed eight BISMA optimizers using eight different TFs and evaluated their effectiveness in finding the optimal genes on each gene dataset listed in Table A12. These eight TFs comprise four S-shaped and four V-shaped TFs, as shown in Table A11. This assessment helps to obtain the best binary version of BISMA for the gene selection problem. Tables A13-A16 show the average number of selected genes, the average error rate, the average fitness, the average computation time, and the corresponding std and ARV of the eight developed versions of the BISMA optimizer.
The average number of selected genes produced by each version of BISMA on the 14 datasets is shown in Table A13. The V-shaped versions of BISMA required the fewest genes of all versions. As can be seen from the ARV values, BISMA based on TFV4 selected the fewest genes on average and ranked first, the four V-shaped versions occupied the first four ranks, and their numbers of selected genes were significantly lower than those of the S-shaped versions. Table A14 records the average classification error rates of the eight versions of BISMA on the benchmark gene datasets. Judging by the average ranking value, BISMA with TFV4 is significantly better than the other competitors. The four V-shaped versions obtained an average error of 0 on 57% of the gene datasets, indicating the stability of V-shaped-based feature selection, and BISMA based on TFV4 obtained an error of 0 and a standard deviation of 0 on 85.7% of the gene datasets. Therefore, in terms of the average error rate, BISMA with V-shaped TFs is superior to its S-shaped counterpart in solving the gene selection task.
According to the average fitness results reported in Table A15, BISMA_V3 achieved the best fitness on about 42.9% of the benchmark gene datasets, slightly better than BISMA_V4 and significantly better than the other competitors. However, from the ranking mean, BISMA_V4 ranked first, followed by BISMA_V3, BISMA_V1, BISMA_V2, BISMA_S1, BISMA_S2, BISMA_S3, and BISMA_S4. The fitness results also show that the V-shaped family of TFs outperforms the S-shaped family.
Similarly, the computation times show that, except for TFV1, the V-shaped versions run faster than the S-shaped versions. In particular, the first-placed TFV4 takes much less time on average than the second-placed TFV3. BISMA_V4, with the best average ranking value, has a lower computational overhead than the other versions across all the benchmark datasets.
As shown in Tables A13-A16, the BISMA version with TFV4 was superior to the other versions in terms of the average number of selected genes, average error rate, average fitness, and average time cost, and in average time cost it was far ahead of the second-placed version. Comparing the two families, V-shaped TFs achieve better results than S-shaped TFs. Therefore, the transfer function TFV4 was chosen to build a more stable BISMA optimizer for gene selection problems. Hereafter, BISMA refers to BISMA_V4, which is further evaluated by comparison in the following sections.

Comparative Evaluation with Other Optimizers
In this section, the superiority of the proposed BISMA optimizer is evaluated by comparing it with several state-of-the-art meta-heuristic approaches: bGWO [103], BGSA [104], BPSO [99], bALO [105], BBA [106], BSSA [107], bWOA [108], and BSMA, the binary form of the original SMA [35]; BISMA denotes the discrete version of the improved ISMA. Table A17 shows the parameter settings of the compared optimizers.
Tables A18-A21 show the statistical results for the selected genes in terms of number, error rate, fitness, and computation time. According to the average numbers of selected genes in Table A18, the proposed BISMA selected the fewest genes on 57.1% of the gene datasets, while bWOA selected the fewest on 42.9%. Thus, across the 14 datasets, BISMA and bWOA are far more competitive than the other algorithms in reducing the data dimensionality.
The mean error rates are shown in Table A19, which demonstrates the superiority of the proposed BISMA. BISMA achieves the minimum mean error rate on 85.7% of the gene datasets and performs only slightly worse on Lung_Cancer and Tumor_14. bGWO achieved the best error rate on the Tumor_14 dataset, while bWOA was competitive on the Lung_Cancer dataset. In terms of the ARV index, BISMA ranked first, followed by bWOA, bGWO, BSMA, BGSA, BPSO, bALO, BSSA, and BBA.
The fitness measure shown in Table A20 combines the weighted error rate and the weighted number of selected genes. The proposed BISMA clearly outperforms the other competitors on 64.3% of the gene datasets, and the average fitness of BISMA and bWOA on the 14 gene datasets was significantly better than that of the other algorithms.
In addition, according to the std values in Tables A18-A20, BISMA showed satisfactory standard deviations and excellent average fitness values on most of the tested gene datasets, indicating that BISMA is more stable than bALO, BSSA, BBA, and the others. There is a large gap between the overall performance of BISMA, BSMA, bWOA, and bGWO and that of BGSA, BPSO, bALO, BBA, and BSSA; the first four optimizers are clearly better than the last five.
As can be seen from the average computation times in Table A21, the proposed BISMA has the highest time cost, and the time complexity of the well-performing BSMA and bWOA is also relatively high, showing the computational price paid for better performance. The time cost of BISMA stems from the introduced Cauchy mutation and the DE-based crossover mutation strategy; as Table A21 also shows, the original SMA is itself relatively expensive, which further contributes to BISMA's high time cost.
Compared with the other gene selection optimizers, BISMA is found to be the best. Although its computation time is not ideal, BISMA can be expected to select the optimal gene subset on the vast majority of microarray datasets, obtaining the best fitness and the best classification error rate without losing meaningful genes. This demonstrates that the combination of the Cauchy mutation and the DE-based crossover mutation strategy improves global exploration in the proposed BISMA and achieves a more effective balance between local exploitation and global exploration.

Discussions
In this part, the proposed ISMA algorithm is discussed, covering its advantages and the points that can still be improved. In the original SMA, the global exploration ability of the slime moulds is not strong, and on some problems they fall into local optima, limiting the algorithm's use. In this paper, Cauchy mutation (CM) and crossover mutation are introduced to update the population, enlarging the global exploration space and helping the search avoid local optima. Experiments show that the dual mechanism works better than either single mechanism and that ISMA outperforms several advanced optimization algorithms.
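The two mechanisms named above can be sketched in isolation. The paper's exact update rules (how the mutants enter the SMA population) are defined in its method section; the code below shows only a generic Cauchy perturbation and the standard DE/rand/1/bin crossover mutation as an assumed, illustrative form.

```python
import math
import random

def cauchy_mutation(position, scale=1.0, rng=random):
    """Cauchy mutation (sketch): perturb each dimension with a standard
    Cauchy deviate via inverse-CDF sampling. The heavy tails of the
    Cauchy distribution produce occasional long jumps, which help
    escape local optima."""
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5))
            for x in position]

def de_crossover_mutation(pop, i, F=0.5, CR=0.9, rng=random):
    """DE/rand/1/bin-style mutation and binomial crossover (sketch).

    Three distinct individuals (none equal to the target i) form a
    mutant vector, which is then mixed with the target to build a
    trial vector; index j_rand guarantees at least one mutant gene.
    """
    a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
    dims = len(pop[i])
    mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dims)]
    j_rand = rng.randrange(dims)
    return [mutant[d] if rng.random() < CR or d == j_rand else pop[i][d]
            for d in range(dims)]
```

The randomness discussed above is visible here: both the Cauchy step length and the choice of donors a, b, c are stochastic, which explains the occasionally poor behaviour on some multimodal functions as well as the extra per-iteration cost.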
However, ISMA exhibits some shortcomings common to stochastic optimizers in certain cases. As seen in Tables A5 and A8, on some multimodal functions the algorithm's performance is occasionally poor because of the randomness of the crossover mutation mechanism, and the search speed is slow in both global exploration and local exploitation.
The binary algorithm BISMA was applied to the feature selection problem on 14 datasets. The experimental results show that the proposed algorithm attains smaller average fitness and lower classification error rates while selecting fewer features. However, although the Cauchy mutation and crossover mutation mechanisms bring clear benefits, they also lengthen the algorithm's running time, which is the highest among all the compared algorithms.
Ornek et al. [109] combined the position update of the sine cosine algorithm with the slime mould algorithm, using various sine and cosine updates to modify the oscillation process of the slime moulds; experimental results show that the algorithm has good exploration and exploitation ability. Gurses et al. [110] applied a new hybrid of the slime mould algorithm and the simulated annealing algorithm (HSMA-SA) to structural engineering design problems, demonstrating its feasibility for shape optimization. Cai et al. [111] proposed an artificial slime mould algorithm for the traffic network node selection problem, with experimental results of great significance to the study of traffic node selection and artificial learning mechanisms. These ideas can serve as references for remedying the shortcomings of ISMA in the future so that it can be applied in more fields, such as dynamic module detection [112,113], road network planning [114], information retrieval services [115][116][117], drug discovery [118,119], microgrid planning [120], image dehazing [121], location-based services [122,123], power flow optimization [124], disease identification and diagnosis [125,126], recommender systems [127][128][129][130], human activity recognition [131], and image-to-image translation [132].

Conclusions
In this study, an improved version of the basic SMA, named ISMA, is proposed; the combination of the Cauchy mutation and a DE-based crossover mutation strategy is used to improve the SMA so as to achieve coordination between global exploration and local exploitation. We first evaluated the effectiveness of the continuous ISMA on 33 benchmark functions for global optimization, comparing it with several advanced swarm intelligence algorithms; the results show that ISMA has a strong global exploration capability. To verify ISMA's performance in practical applications, BISMA was obtained by mapping ISMA into binary space through a transfer function and was then applied to the gene selection problem on 14 commonly used gene expression datasets. To determine the best transfer function for the ISMA variant, we compared the number of selected genes, average error rate, average fitness, and computational cost, finding BISMA_V4 superior to the other versions; BISMA_V4 is therefore taken as the final method for the gene selection problem. We compared BISMA_V4 with binary SMA, binary GWO, and several other advanced methods, and the experimental results show that BISMA selects fewer features while obtaining higher classification accuracy.
Therefore, we believe the proposed BISMA is a promising gene selection technique. This work can be extended in several ways. First, BISMA can be applied to other high-dimensional datasets to study its effectiveness there. Secondly, other strategies can be used to improve SMA's coordination between global exploration and local exploitation. Thirdly, interested researchers can apply SMA to more areas, such as financial forecasting, photovoltaic parameter optimization, and other engineering applications. Finally, ISMA can be extended to multi-objective optimization, image segmentation, machine learning, and other fields.

Data Availability Statement:
The data involved in this study are all public data, which can be downloaded through public channels.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A
See Tables A1-A21.
Table A1. Descriptions of unimodal benchmark functions (columns: Function, Dim, Range, f_min).
Table A11. The descriptions of the two types of TFs (columns: Name, TFs, Graphs). Note: x_i^j(t) denotes the i-th element on the j-th dimension of the position vector.