Classification of Gene Expression Data Using Multiobjective Differential Evolution

Gene expression data are usually redundant, and only a subset of them presents distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in bioinformatics. In this paper, a multiobjective binary differential evolution method (MOBDE) is proposed to select a small subset of informative genes relevant to the classification. In the proposed method, firstly, the Fisher-Markov selector is used to choose the top features of the gene expression data. Secondly, to make differential evolution suitable for the binary problem, a novel binary mutation method is proposed to balance the exploration and exploitation ability. Thirdly, the multiobjective binary differential evolution is proposed by integrating the summation of normalized objectives and diversity selection into the binary differential evolution algorithm. Finally, the MOBDE algorithm is used for feature selection, and support vector machine (SVM) is used as the classifier with the leave-one-out cross-validation method (LOOCV). In order to show the effectiveness and efficiency of the algorithm, the proposed method is tested on ten gene expression datasets. Experimental results demonstrate that the proposed method is very effective.


Introduction
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples, which gives rise to difficulties for many classifiers [1,2]. Therefore, feature selection in the computational intelligence field [3,4] plays an important role in gene array-based cancer classification, because gene selection can help to remove the irrelevant and redundant features and choose a small subset of features to carry out the classification task in an optimal way. In general, feature selection can be categorized into wrappers and filters according to whether or not it is done independently of the learning algorithm [3,4]. By using the filter and wrapper techniques, many feature selection methods [5][6][7][8] have been proposed to optimize the efficiency of the search and selection process. For example, a novel correlation-based memetic framework (MA-C), which is a combination of genetic algorithm (GA) and local search (LS) using correlation-based filter ranking, was proposed [9]. The local filter method used there fine-tunes the population of GA solutions by adding or deleting features based on the symmetrical uncertainty (SU) measure. In order to take into account the experimental conditions and the time points simultaneously, Gutiérrez-Avilés D. et al. [10] presented the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression. Their results show that TriGen is capable of extracting groups of genes. In [11], Xue B. et al. proposed three new initialization strategies and three new personal best and global best updating mechanisms in particle swarm optimization to develop novel feature selection approaches with the goals of maximizing the classification performance, minimizing the number of features and reducing the computational time.
The superior performance of this algorithm is due mainly to two factors: the proposed initialization strategy, which takes advantage of both forward selection and backward selection to decrease the number of features and the computational time, and the new updating mechanism, which overcomes the limitations of traditional updating mechanisms by taking the number of features into account. Based on the above analysis, the main purpose of a feature selection method is to maximize the model performance and to minimize the number of genes selected at the same time. That is to say, feature selection has two different objectives: maximizing the classification performance and minimizing the number of genes selected. In some cases, these two objectives conflict. Therefore, feature selection may be better formulated as a multiobjective problem than as a single-objective problem.
Recently, many multiobjective optimization approaches based on different evolutionary algorithms have been reported to solve feature selection [12,13]. For example, a hybrid multiobjective optimization method based on particle swarm optimization was proposed [5] to find a small set of non-redundant disease-related genes. Two objectives, sensitivity and specificity, are simultaneously evaluated by the artificial neural network (ANN) classifier. On real-life datasets of various types of cancers, the multiobjective particle swarm optimization performs better than sequential feature selection (SFS), the t-test and rank-sum. Xue B. et al. [14] proposed other multiobjective particle swarm optimizations, namely multiobjective binary particle swarm optimization (PSO) using the idea of non-dominated sorting (NSBPSO) and multiobjective binary PSO using the ideas of crowding, mutation and dominance (CMDBPSO). The proposed algorithms are examined and compared with a single-objective method on eight benchmark datasets. Experimental results show that the proposed multiobjective algorithms can evolve a set of solutions that use a smaller number of features and achieve better classification performance than using all features. Different from particle swarm optimization, a multiobjective genetic algorithm [15] was proposed to select the optimum gene subset and then classify the gene expression data. Support vector machine with the radial basis function (RBF) kernel is used to measure the accuracy of the classification. This approach was tried on two benchmark gene expression datasets. It obtained encouraging results on those datasets as compared with an approach that used a single-objective strategy in a genetic algorithm.
In [16], a different optimization algorithm based on an artificial immune system was used to solve feature selection in classification problems, aiming at minimizing both the classification error and the cardinality of the subset of features. The algorithm is able to perform a multimodal search, maintaining population diversity and automatically controlling the population size according to the problem. The experimental results show that the parsimonious subsets of features and the resulting classifiers produced a significant improvement in the accuracy. Another multiobjective artificial immune algorithm [17] was used to optimize the kernel and penalty parameters of support vector machine (SVM). In the training stage of SVM, multiple solutions are found by using a multiobjective artificial immune algorithm, and then, these parameters are evaluated in the test stage. The proposed algorithm is applied to fault diagnosis of induction motors and anomaly detection problems, and successful results are obtained. Rubio-Escudero C. et al. [18] used EMO-CC (evolutionary multiobjective conceptual clustering), which retrieves meaningful substructures from network databases, to obtain gene product information. The experimental results show that the EMO-CC algorithm performs better than other algorithms for the analysis of microarray data. Romero-Zaliz R. et al. [19] proposed a multiobjective methodology to combine state-of-the-art algorithms into an aggregation scheme in order to obtain the optimal aggregations of methods. The results obtained by the multiobjective algorithm show a major improvement in sensitivity when their methodology is compared to the performance of individual methods for gene finding and gene expression problems. Based on the above discussion, many different multiobjective evolutionary algorithms have been used to handle the feature selection problem. However, these algorithms still have some drawbacks, such as low optimization efficiency, easily falling into local optima and premature convergence. Moreover, this field of study is still in its early days; much further research is necessary in order to develop a multiobjective algorithm for feature selection.
Recently, the differential evolution algorithm was proposed as a powerful evolutionary algorithm [20][21][22][23], which has good global and local search capabilities and can explore the solution space quickly. Several variations of differential evolution (DE) have also been proposed to enhance the performance of the standard DE [24][25][26][27][28][29][30][31]. The algorithm is considered an intelligent optimization method for heuristic random search in a continuous space. It consists of three different operators: mutation, cross-over and selection. With these operators, the differential evolution algorithm can generate new individuals by combining the target vector and the trial vector. However, it should be noted that most of these algorithms work in continuous space rather than in discrete space. Therefore, in this paper, we propose a novel multiobjective binary differential evolution algorithm (MOBDE) to solve the binary problem posed by feature selection.
This paper uses a novel multiobjective differential evolution algorithm for the feature selection problem, and support vector machine (SVM) is used as the classifier with leave-one-out cross-validation (LOOCV). The Fisher-Markov selector is used to choose a fixed number of the top gene expression data features, and then, a multiobjective binary differential evolution algorithm based on the summation of normalized objectives and diversity selection is adopted to select the most important gene subsets. Finally, an SVM classifier is trained on the gene subset and then used to predict the test sample. Numerical results on ten gene expression datasets are reported and compared with other algorithms. As is shown, the solutions obtained by the proposed approach compare favorably with the best solutions obtained by other algorithms in the literature.

Computational Methods
In this part, we shall introduce a hybrid multiobjective binary differential evolution and support vector machine method (MOBDE) for feature selection. The flowchart of the proposed method is shown in Figure 1. As can be seen in this figure, there are three main components, i.e., the Fisher-Markov selector component, the multiobjective binary differential evolution component and the support vector machine component.
In the first component, the Fisher-Markov selector method is used to select the 180 top genes with the highest scores. These selected genes are then passed to the second component, the multiobjective binary differential evolution component. In this component, at first, a randomly-generated initial solution is represented by a binary (0/1) string. Then, a novel binary mutation method is proposed to balance the exploration and exploitation ability during the search process. After that, the multiobjective binary differential evolution is obtained by integrating the summation of normalized objectives and diversity selection into the algorithm.
By using MOBDE, the parameters of the support vector machine (SVM) in the third component and the feature subset are dynamically optimized. Specifically, for feature selection, each gene is represented as one bit of a binary encoded individual, where one denotes a selected gene and zero denotes a non-selected gene. For SVM, two important parameters of RBF kernels, i.e., C and γ, are taken into account. In this sense, the length of each individual is equal to D + 2, where D is the number of genes in the initial microarray dataset. Table 1 shows the solution representation of the algorithm.
Figure 1. The framework of the multiobjective binary differential evolution method (MOBDE) with support vector machine (SVM).
From the above solution representation, $P_c$ is the parameter C of the SVM, and $P_\gamma$ denotes the parameter γ of the SVM. In this paper, we use the evolutionary algorithm to optimize the parameters of SVM and the feature subset in each individual. The multiobjective function is defined over two quantities, where $SVM_{accuracy}$ denotes the classification accuracy of SVM and $R$ denotes the number of selected genes. Finally, the fitness values of each individual are assessed by the accuracy of LOOCV.
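The solution representation can be illustrated with a minimal Python sketch. It assumes the individual is stored as a flat sequence whose first D entries are the gene bits and whose last two entries are the real-coded values $P_c$ and $P_\gamma$; the function names `decode_individual` and `objective_pair` are hypothetical and only serve the illustration.

```python
from typing import List, Sequence, Tuple


def decode_individual(individual: Sequence[float],
                      num_genes: int) -> Tuple[List[int], float, float]:
    """Split an individual of length D + 2 into (selected gene indices, P_c, P_gamma).

    Assumption: the gene bits come first, followed by the two SVM parameters.
    """
    if len(individual) != num_genes + 2:
        raise ValueError("individual must have D + 2 entries")
    gene_bits = individual[:num_genes]
    # A bit equal to one marks a selected gene, zero a non-selected gene.
    selected = [j for j, bit in enumerate(gene_bits) if bit == 1]
    p_c = individual[num_genes]
    p_gamma = individual[num_genes + 1]
    return selected, p_c, p_gamma


def objective_pair(svm_accuracy: float, selected: Sequence[int]) -> Tuple[float, int]:
    """Return the two quantities named in the text: SVM accuracy and R."""
    return svm_accuracy, len(selected)


# Example with D = 5 genes followed by the two SVM parameters.
selected, p_c, p_gamma = decode_individual([1, 0, 1, 1, 0, 2.0, -3.0], num_genes=5)
```

In this example, `selected` holds the indices of the three genes whose bit is one, so R = 3.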

Fisher-Markov Selector
In the field of machine learning, selecting suitable features is very important for classification. The Fisher-Markov selector was proposed by Cheng et al. [32] to identify the features that are most useful in describing essential differences among the possible groups. The authors present a way to represent essential discriminating characteristics together with sparsity as an optimization problem. In this paper, we use this method; a detailed description can be found in [32].

Multiobjective Differential Evolution Component
In this part, we shall introduce the proposed multiobjective binary differential evolution algorithm in detail.
As we know, differential evolution (DE) is a fairly novel population-based search heuristic, which is simple to implement and requires little parameter tuning compared with other search heuristics in continuous space.
The process of DE can be summarized into three major steps: mutation, cross-over and selection. In the mutation operator, the mutation vectors $V_{i,G} = [V_{1,i,G}, V_{2,i,G}, \ldots, V_{D,i,G}]$ are generated from the current population $X_{i,G} = [X_{1,i,G}, X_{2,i,G}, \ldots, X_{D,i,G}]$, where D denotes the dimension of the individual, i denotes the i-th individual and G denotes the current iteration of the algorithm. In the DE algorithm, "DE/rand/1/bin" is the most common mutation strategy, as below:

$$V_{i,G} = X_{r_1,G} + F \cdot \left( X_{r_2,G} - X_{r_3,G} \right)$$

where $r_1, r_2, r_3 \in \{1, \ldots, NP\}$, $r_1 \neq r_2 \neq r_3 \neq i$, and F is the mutation factor of the differential evolution. NP is the size of the population.
In the cross-over operation, a recombination of the candidate solution $V_{i,G}$ and the parent $X_{i,G}$ produces an offspring solution $U_{i,G} = [U_{1,i,G}, U_{2,i,G}, \ldots, U_{D,i,G}]$. Usually, the binomial cross-over is adopted, which is defined as follows:

$$U_{j,i,G} = \begin{cases} V_{j,i,G}, & \text{if } rand_j \le CR \text{ or } j = j_{rand} \\ X_{j,i,G}, & \text{otherwise} \end{cases}$$

where $j \in \{1, \ldots, D\}$; $rand_j \in [0, 1]$ is a random number between zero and one; $j_{rand} \in \{1, \ldots, D\}$ is a randomly chosen index. CR is the cross-over rate.
A greedy selection is used to choose the next population (i.e., G = G + 1) between the parent population and the offspring population. The selection operation is described as follows:

$$X_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) \le f(X_{i,G}) \\ X_{i,G}, & \text{otherwise} \end{cases}$$

where f is the objective function to be minimized.
As we know, the original differential evolution algorithm is a continuous optimization algorithm, but the feature selection problem is a classic binary optimization problem. Therefore, the original continuous encoding scheme of DE cannot be used directly for gene selection problems. In order to make DE suitable for the gene selection problem, a binary differential evolution (BDE) algorithm is proposed first. In the proposed method, each individual of the initial population is represented as a vector in which each bit is a binary value of zero or one, where one denotes that the gene is selected and zero denotes a non-selected gene. The objective function values are calculated, and then, the binary population is passed to the mutation operator. The binary cross-over operation is used to generate the trial solution. Finally, the greedy selection method is used to choose the better solutions for the next generation.
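The three operators of continuous DE described above (DE/rand/1/bin mutation, binomial cross-over and greedy selection) can be sketched in a few lines of Python. This is a minimal illustration of standard DE, not the binary variant proposed in this paper; it assumes a minimization problem, a population of at least four individuals, and uses the values F = 0.5 and CR = 0.7 only as defaults. The function name `de_rand_1_bin_step` is hypothetical.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def de_rand_1_bin_step(population: List[Vector],
                       fitness: Callable[[Vector], float],
                       F: float = 0.5,
                       CR: float = 0.7,
                       rng: Optional[random.Random] = None) -> List[Vector]:
    """Run one generation of DE/rand/1/bin and return the next population."""
    rng = rng or random.Random()
    NP, D = len(population), len(population[0])
    next_population = []
    for i, target in enumerate(population):
        # Mutation: three mutually distinct indices, all different from i.
        r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
        mutant = [population[r1][j] + F * (population[r2][j] - population[r3][j])
                  for j in range(D)]
        # Binomial cross-over: j_rand guarantees at least one mutant component.
        j_rand = rng.randrange(D)
        trial = [mutant[j] if (rng.random() <= CR or j == j_rand) else target[j]
                 for j in range(D)]
        # Greedy selection: keep the trial only if it is not worse (minimization).
        next_population.append(trial if fitness(trial) <= fitness(target) else target)
    return next_population
```

Because of the greedy selection, no individual in the returned population has a worse objective value than the individual it replaces.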
During the reconstruction of the mutation operation, the key idea is to use some appropriate operators in place of the arithmetic operators. In [33], He and Han used the XOR, AND and OR operations instead of the subtraction, multiplication and addition operations in the formula, which can be described as follows:

$$V_{i,G+1} = X_{r_1,G} \odot \left( F \otimes \left( X_{r_2,G} \oplus X_{r_3,G} \right) \right) \tag{5}$$

where $\oplus$ denotes the XOR operation, $\otimes$ represents the AND operation and $\odot$ denotes the OR operation. Note that in Formula (5), the OR operation makes a true result more likely: the probability of a binary "1" is three times the probability of a binary "0". In other words, binary "1" values accumulate easily in the binary string $V_{j,i,G+1}$ of the trial solution after the OR operation. This decreases the diversity of the algorithm. Accordingly, in [34], another novel mutation operation is proposed by considering the distance between $X_{r_1,G}$ and $X_{r_2,G}$ for each dimension (the formula is given in [34]). Compared with Formula (5), the new mutation strategy can enhance the diversity of the algorithm because it does not use the OR operation. However, in this formula, the value of the previous generation is discarded. Therefore, it cannot inherit the advantage of the original individual from the previous population.
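The bias of the OR operation can be checked by enumerating its truth table. The short Python sketch below assumes the two operand bits are independent and equally likely to be zero or one; the helper name `count_outcomes` is hypothetical.

```python
from itertools import product
from typing import Callable, Tuple


def count_outcomes(op: Callable[[int, int], int]) -> Tuple[int, int]:
    """Return (number of 1 results, number of 0 results) over all bit pairs."""
    ones = sum(op(a, b) for a, b in product((0, 1), repeat=2))
    return ones, 4 - ones


or_counts = count_outcomes(lambda a, b: a | b)    # OR
xor_counts = count_outcomes(lambda a, b: a ^ b)   # XOR
and_counts = count_outcomes(lambda a, b: a & b)   # AND
```

Under this assumption, OR yields "1" in three of the four input combinations, which is the three-to-one ratio mentioned in the text, whereas XOR is balanced.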
Therefore, in this paper, we propose a new mutation strategy, which can both increase the diversity of the algorithm and take advantage of the original population. First, it does not use the OR operation, so it does not harm the diversity of the algorithm. Second, the values of the previous generation, e.g., $X_{j,r_1,G}$, $X_{j,r_2,G}$ and $X_{j,r_3,G}$, are kept with a probability. In this way, the algorithm can inherit the advantage of the original individual from the previous population. Following the binary mutation strategy, a binary cross-over operator is used to build a trial solution $U_{j,i,G+1}$ by combining the mutation vector and the target vector. The binary cross-over mechanism of BDE is similar to that of the original DE, although the component data type differs. In BDE, the binary value is taken from the mutation vector if a random number is smaller than the cross-over rate; otherwise, the value of the original solution is used to generate the trial solution. After the binary mutation and cross-over operators, the better solution between the trial solution and the target solution is retained for the next generation.
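The binary cross-over and the greedy retention step described above can be sketched as follows. This is only an illustration under stated assumptions: the mutation vector is taken as given (the proposed mutation formula is not reproduced here), the fitness is a single scalar to be minimized, and the function names `binary_crossover` and `greedy_select` are hypothetical.

```python
import random
from typing import Callable, List, Optional

Bits = List[int]


def binary_crossover(target: Bits, mutant: Bits, CR: float = 0.7,
                     rng: Optional[random.Random] = None) -> Bits:
    """Take each bit from the mutant if a random number is smaller than CR,
    otherwise keep the bit of the target (original) solution."""
    rng = rng or random.Random()
    return [m if rng.random() < CR else t for t, m in zip(target, mutant)]


def greedy_select(target: Bits, trial: Bits,
                  fitness: Callable[[Bits], float]) -> Bits:
    """Retain the better of the trial and target solutions (minimization)."""
    return trial if fitness(trial) <= fitness(target) else target


# Example: the fitness here is simply the number of selected genes.
trial = binary_crossover([1, 1, 0, 1], [0, 1, 0, 0], CR=0.7, rng=random.Random(1))
survivor = greedy_select([1, 1, 0, 1], trial, fitness=sum)
```

With CR = 1 the trial solution equals the mutation vector, and with CR = 0 it equals the target vector, which gives a quick sanity check of the operator.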
Based on the binary differential evolution algorithm, we propose our multiobjective binary differential evolution algorithm (MOBDE). Specifically, in our method, two fitness objectives are taken into account for optimization. One is the accuracy of the classification, and the other is the number of selected genes. In order to tackle the feature selection problem, a non-dominated sorting process is often used to find the Pareto front. However, the non-dominated sorting process is complex and time consuming. In order to solve this problem, Qu and Suganthan [35] used the summation of the normalized objectives and diversity selection, and in this paper, we use a very similar method for the feature selection problem. For the summation of the normalized objectives, first, we find the maximum and minimum value of every objective and calculate the range of every objective; then, we sum all normalized objective values to obtain a single value. In this way, the multiobjective problem can be regarded as a single-objective optimization problem. However, this kind of transformation may reduce the diversity of the population. Therefore, the diversity selection method is used to maintain the diversity of the algorithm.
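The summation of the normalized objectives can be sketched in Python as follows. The sketch assumes that every objective is expressed in minimization form (for example, classification error instead of accuracy) and that an objective with zero range contributes zero; both choices, and the function name `normalized_objective_sums`, are assumptions made for illustration rather than details taken from [35].

```python
from typing import List, Sequence


def normalized_objective_sums(objective_values: Sequence[Sequence[float]]) -> List[float]:
    """For each individual, sum its objectives after min-max normalization.

    objective_values[i][m] is the value of objective m for individual i.
    """
    M = len(objective_values[0])
    mins = [min(row[m] for row in objective_values) for m in range(M)]
    maxs = [max(row[m] for row in objective_values) for m in range(M)]
    sums = []
    for row in objective_values:
        total = 0.0
        for m in range(M):
            span = maxs[m] - mins[m]  # range of objective m
            total += (row[m] - mins[m]) / span if span > 0 else 0.0
        sums.append(total)
    return sums


# Example: (classification error, number of selected genes) for three individuals.
sums = normalized_objective_sums([(0.0, 10), (0.1, 5), (0.2, 20)])
```

A smaller sum indicates an individual that is good on both normalized objectives at once; the diversity selection step is still needed because the sum alone discards information about the trade-off.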
The preferential set and the backup set are generated from the current population, and three rules are used to select solutions for the next process:
1. The preferential set is selected for the next process first.
2. The backup set is chosen based on the summation of the normalized objectives and diversity selection if the preferential set does not provide enough solutions.
3. When the number of individuals in the store exceeds the maximum size, the required number of solutions is randomly chosen from the preferential set.
Based on the above discussion, we can show the framework of our multiobjective binary differential evolution algorithm as follows in Algorithm 1.

Algorithm 1 Algorithm description of the MOBDE algorithm
Set the generation counter G = 0; and randomly initialize a population of NP individuals X_i.
Initialize the parameters F, CR.
Evaluate the fitness for each individual in P.
Return the non-dominated solutions Ar_0 from the individual P.
while stopping criteria is not satisfied do
    for i = 1 to NP do
        select randomly

Support Vector Machines
In our system, the support vector machine with leave-one-out cross-validation serves as the evaluator of the multiobjective binary differential evolution algorithm. Let $\{q_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$, with $y_i \in \{-1, +1\}$, be a set of training samples and the corresponding labels, respectively. Vapnik and Cortes [36] defined the SVM method as follows:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( \omega^T q_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n$$

where ω is a normal vector to the hyperplane and b is a constant, such that $|b| / \|\omega\|$ represents the Euclidean distance between the hyperplane and the origin of the feature space. The $\xi_i$ are the slack variables that control the training errors, and C is a penalty parameter of SVM. In this paper, the radial basis function (RBF) is used in SVM to obtain the optimal solution for classification. Considering two samples $q_i = [q_{i1}, \ldots, q_{id}]^T$ and $q_j = [q_{j1}, \ldots, q_{jd}]^T$, where $i \neq j$, the RBF function is calculated by using $K(q_i, q_j) = \exp(-\gamma \|q_i - q_j\|^2)$, where γ > 0 controls the width of the Gaussian and $K(q_i, q_j)$ is the kernel function.
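The RBF kernel value for two samples can be computed directly from its definition. The following Python sketch uses only the standard library; the function name `rbf_kernel` is hypothetical, and a real experiment would rely on the kernel implementation of the SVM library in use.

```python
import math
from typing import Sequence


def rbf_kernel(q_i: Sequence[float], q_j: Sequence[float], gamma: float) -> float:
    """Return K(q_i, q_j) = exp(-gamma * ||q_i - q_j||^2) for gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(q_i) != len(q_j):
        raise ValueError("samples must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(q_i, q_j))
    return math.exp(-gamma * squared_distance)


# Identical samples give the maximum kernel value of one.
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
# Samples at squared distance 2 with gamma = 0.5 give exp(-1).
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

A larger γ makes the kernel value decay faster with distance, so each training sample influences a smaller neighbourhood.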
For the RBF kernel function, C and γ are very important parameters, and the performance of SVM depends on the choice of the kernel function in terms of the parameters C and γ. If the value of C is large, the training accuracy will be higher, but the test accuracy will be worse.
Meanwhile, if the value of C is small, the training accuracy will be unsatisfactory, though the test accuracy may be high. Sometimes, the parameter γ has a stronger effect on the test phase than the parameter C. In order to optimize the feature selection and the parameters simultaneously, in the modified MOBDE, each individual is encoded as a string of binary bits associated with the number of genes, and the parameters C and γ of the SVM are dynamically optimized by a real-coded differential evolution as in Equations (3) and (4). Specifically, the constrained ranges of the values of C and γ are [−5, 15] and [−15, 5], respectively. In our method, the classification accuracy of the prediction models and the number of selected genes derived from all datasets are measured by the LOOCV procedure discussed in Section 2.
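The LOOCV procedure itself is independent of the classifier and can be sketched as follows. To keep the example self-contained, a one-nearest-neighbour rule stands in for the SVM; this stand-in, and the function names `loocv_accuracy` and `nearest_neighbour`, are assumptions for illustration only and are not part of the proposed method.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]


def loocv_accuracy(samples: List[Sample], labels: List[int],
                   fit_predict: Callable[[List[Sample], List[int], Sample], int]) -> float:
    """Leave each sample out once, train on the rest, and predict the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x: List[Sample], train_y: List[int], query: Sample) -> int:
    """Stand-in classifier (NOT an SVM): label of the closest training sample."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


# Toy data with two well-separated classes.
toy_x = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(toy_x, toy_y, nearest_neighbour)
```

In the setting of this paper, `fit_predict` would train an RBF-kernel SVM with the decoded values of C and γ on the selected genes only.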

Computational Complexity of the Multiobjective Binary Differential Evolution with Support Vector Machine
In this part, we will analyze the time complexity of the multiobjective binary differential evolution with support vector machine model. In the beginning of the algorithm, the Fisher-Markov selector is used to choose the suitable features. Cheng Q. et al. [32] show that the complexity of the Fisher-Markov selector is $O(n^2)$, where n is the size of the dataset. Then, for each iteration of MOBDE, the SVM needs to be called. Tsang et al. [37] show that the data subroutines of standard SVM are $O(n^3)$, and the summation of the normalized objective values method is $O(M \cdot NP)$, where NP denotes the population size and M is the number of objectives. Therefore, for each iteration, the runtime complexity is $O(M \cdot NP \cdot n^3 + n^2)$. Suppose the total number of iterations is I; the time complexity of the algorithm is then

Why Use Each Finding in the Algorithm and the Strong Impact of the Finding
The first question is why we use the Fisher-Markov selector. The reason is that the Fisher-Markov selector selects the features that best describe the essential differences among the possible groups. This method uses Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. The algorithm has been used successfully on high-dimensional microarray gene expression datasets. Therefore, in this paper, we first use the Fisher-Markov selector to select the features.
The second question is why we use the multiobjective binary differential evolution to solve this problem. As we know, the original differential evolution algorithm is a continuous optimization algorithm, but the feature selection problem is a classic binary optimization problem. Therefore, the original continuous encoding scheme of DE cannot be used directly for gene selection problems. In order to make DE applicable to the gene selection problem, a binary differential evolution (BDE) algorithm is proposed first. As shown in Section 2.2, the mutation strategies of previous work either decrease the diversity of the algorithm or prevent the new individual from inheriting the advantage of the original individual from the previous population. Therefore, in this paper, we propose a new mutation strategy, which can both increase the diversity of the algorithm and take advantage of the original population.

Experimental Setup
To demonstrate the effectiveness of the MOBDE algorithm, experiments are performed on 10 well-known benchmark gene expression datasets, whose characteristics are listed in Table 2. These datasets have been widely used by researchers as a primary source of feature selection datasets. The datasets are classified with the library for support vector machines (LIBSVM) proposed by Chang and Lin [38], based on LOOCV. We compared our method with several binary algorithms: binary DE [33], binary differential evolution (BDE) [34], binary differential evolution with artificial immune system (BDEAIS) [39], binary particle swarm optimization (BPSO) [40], binary genetic algorithm (BGA) [41] and binary estimation distribution algorithm (BEDA) [42]. In each comparison, we replace our binary method in MOBDE with one of these methods in order to show the effectiveness of the algorithm. That is to say, all of these methods use the same multiobjective framework. Meanwhile, we compare our algorithm with the nondominated sorting genetic algorithm II (NSGAII) in order to show the difference between the summation of the normalized objectives with diversity selection and the non-dominated sorting process. At the same time, we also compare our method with some optimization methods, including SVM + grid search, improved binary particle swarm optimization (IBPSO), hybrid binary particle swarm optimization and tabu search (HPSOTS), PSO/GA [5][6][7][8] and some different versions of support vector machines.
The parameters are as follows. For all algorithms, the population size is 50, and the maximum number of iterations is 100. For the different versions of the binary DE algorithms [33,34,39], the value of F is 0.5, and the value of CR is 0.7. For the genetic algorithm, the cross-over rate is 0.7, and the mutation rate is 0.5. For the binary PSO, the values of c1 and c2 are both 2. For the binary estimation of distribution algorithm, the probability of selection is 0.3. The parameters were selected (after some preliminary experiments) so as to give roughly the best results for the algorithms used for comparison. However, because each algorithm uses different strategies, it is very difficult to guarantee that the parameters used in the experiments are the most suitable ones.

Discussions and Analysis
As discussed in the previous section, LOOCV is used in our algorithm. As the training set and test set change under the LOOCV strategy, the genes selected and the test accuracy are different each time. Tables 3 and 4 show the test accuracy and the number of genes selected in 10 runs on the ten datasets. As we can observe in Table 3, the results of the proposed method are almost consistent on all datasets. Moreover, MOBDE can obtain 100% LOOCV accuracy with fewer than 10 selected genes for the Leukemia1, Leukemia2, small, round blue cell tumors (SRBCT) and diffuse large B-cell lymphomas (DLBCL) datasets. For another dataset, Brain_Tumor2, Table 3 shows that MOBDE obtains 100% accuracy with fewer selected genes. The average accuracy obtained by MOBDE is 99%, and the average number of selected genes is 7.5. For the gene expression data 11_Tumors, the MOBDE algorithm provides 97.19% accuracy with fewer than 40 selected genes. It is noted that MOBDE obtains more than 98% accuracy four times. For the dataset Lung Cancer in Table 4, MOBDE provides 100% LOOCV accuracy two times. Its average accuracy is 99.12% with fewer than 30 selected genes, and the average number of selected genes is 15.5. For the dataset Prostate_Tumor, the MOBDE algorithm provides 98.63% average LOOCV accuracy with 10.9 selected genes. For the dataset Brain_Tumor1, the MOBDE algorithm also provides more than 97% classification accuracy. Among them, for Lung Cancer and Prostate Tumor, the algorithm also finds 100% classification accuracy two times and one time, respectively. The average LOOCV accuracy and the average number of selected genes obtained by MOBDE over the independent runs are shown in Figures 2 and 3.
Results for 10 runs are listed in this table. The best subset is shown in shaded cells. In this work, the accuracy is more important than the number of selected genes. Therefore, a solution with the best accuracy can be chosen from the final Pareto front. "Acc" denotes the accuracy of the classifications, and "Selected genes" represents the number of selected genes. The bolding denotes the best solutions.
From the results in Table 5, we can find that the average percentage of genes selected is 0.0016. For the Leukemia1, Leukemia2, SRBCT and DLBCL datasets, our algorithm provides 100% LOOCV accuracy, even though the percentage of genes selected for these datasets is reduced to 0.0011, 0.0005, 0.0023 and 0.0010 of the total available, respectively. This demonstrates that not all features are necessary for achieving better classification accuracy. Figure 4 shows the percentage of genes selected.
The results presented in Table 6 show that both MOBDE with the Fisher-Markov selector and MOBDE without the Fisher-Markov selector provide 100% classification accuracy for Leukemia2 and DLBCL. MOBDE with the Fisher-Markov selector selects fewer genes for all datasets. For the 9_Tumors, Brain_Tumors1 and Brain_Tumors2 datasets, MOBDE provides not only better classification accuracy, but also a lower number of genes selected. However, for 11_Tumors, Lung_cancer and Prostate_Tumor, MOBDE with the Fisher-Markov selector cannot obtain better classification accuracy than MOBDE without it, which demonstrates that the Fisher-Markov selector is not equally suitable for every problem. Overall, the Fisher-Markov selector is very effective in feature selection for the bioinformatics datasets.
For the second experiment, we compare our algorithm with three different versions of binary differential evolution: binary DE, BDE and BDEAIS. We replace our binary differential evolution in MOBDE with binary DE, BDE and BDEAIS [33,34,39]. That is to say, all of these methods use the same multiobjective framework to conduct a fair comparison. Therefore, the purpose of this experiment is to show the effectiveness of the new mutation strategy. Table 7 shows the results obtained by the different binary differential evolution algorithms in terms of the mean and standard deviation (S.D.) of the classification accuracy and the number of genes selected. As can be seen in Table 7, for Leukemia2, SRBCT and DLBCL, all algorithms obtain 100% LOOCV accuracy. However, MOBDE selects fewer genes. For the 11_Tumors dataset, BDE provides the best accuracy of 97.24% with 48.4 genes selected, whereas MOBDE provides a similar accuracy of 97.19% with a lower number of genes selected, 27.5. For Leukemia1, three out of four algorithms reach the best LOOCV accuracy of 100%. For the rest of the datasets, the best classification accuracy and the smallest number of genes selected are provided by MOBDE. Therefore, we can draw the conclusion that MOBDE obtains a better performance compared with the other binary differential evolution algorithms.
In order to show the effectiveness of the binary differential evolution, we also compare our algorithm with other well-known metaheuristics, such as the genetic algorithm [40], particle swarm optimization [41] and the estimation of distribution algorithm [42]. In order to conduct a fair comparison, we replace our binary differential evolution in MOBDE by these metaheuristics. All of these methods use the same multiobjective framework. Table 8 shows the results obtained by binary differential evolution, the genetic algorithm, particle swarm optimization and the estimation of distribution algorithm in terms of the mean and standard deviation (S.D.) of the classification accuracy and the number of genes selected. We can observe in this table that MOBDE clearly outperforms the other algorithms in all of the datasets. Therefore, we can conclude that our proposed algorithm shows an efficient and better performance in comparison with these algorithms. In addition, we also list the running time of these algorithms in Table 9.
In the test experiment, we compare our proposed method MOBDE with the grid-search SVM without feature selection. The results are listed in Table 10. In the table, the better results between the two algorithms are shown in shaded cells. It is easy to see that both the classification accuracy and the number of selected genes of MOBDE are superior to those of the grid-search SVM. This also demonstrates the effectiveness of MOBDE.

Compared with Some Single-Objective Algorithms
In order to demonstrate the effectiveness of the proposed method, we also compared our work with some single-objective algorithms. It is worth mentioning that in previous research, many single-objective algorithms focused only on the accuracy rate of the classification. Therefore, in this paper, we also use the accuracy rate as the comparison criterion. Tables 11 and 12 show the results obtained by the MOBDE and by other single-objective algorithms, including IBPSO1 [8], IBPSO2 [6] and hybrid binary particle swarm optimization and tabu search (HPSOTS). As can be seen in Tables 11 and 12, the MOBDE algorithm provides a higher LOOCV classification accuracy on all datasets compared with the other PSO algorithms [6][7][8] and the other SVM-based algorithms [43,44], except on the Leukemia1 data. For the Leukemia1 data, MOBDE, IBPSO1 [8] and IBPSO2 [6] can all obtain a 100% accuracy rate, while IBPSO1 [8] selects fewer genes than MOBDE. Based on the above analysis, we can conclude that when only considering the accuracy of classification, the MOBDE algorithm can also perform better than the other algorithms.

Compared with A Multiobjective Algorithm
In this section, we compare our algorithm with a well-known multiobjective optimization algorithm (NSGAII) [9]. The NSGAII algorithm is based on the non-dominated sorting and crowding distance method. As shown in Table 13, the MOBDE algorithm provides better accuracy and fewer selected genes than NSGAII for most datasets. In Table 13, we also show the computation time comparison of these two algorithms. As can be seen in the table, in all instances, the computational time of our algorithm is less than that of NSGAII. The reason is that our algorithm is still efficient, and the time complexity is $O(INT^3)$, as discussed in Section 2. As for NSGAII, we can simply analyze its time complexity here. In NSGAII, for each iteration, the non-dominated sorting is $O(M(2N)^2)$, and the crowding-distance assignment is $O(M(2N)\log(2N))$, where $N$ is the population size and $M$ is the number of objectives; the data subroutines of standard SVM are $O(T^3)$; so the overall complexity of one iteration is $O(4MN^2T^3)$. Given $I$ iterations, the total time complexity of NSGAII is $O(4IMN^2T^3)$, i.e., $O(IN^2T^3)$. Based on the above analysis, our algorithm is more efficient than NSGAII. There may be two reasons that our algorithm performs better than NSGAII with fewer selected genes. The first reason is that the new binary mutation strategy used in MOBDE tends to enhance the diversity of the population and share the previous good individuals with the next generation. The second reason is that there may be only very few genes that are necessary for achieving the better classification accuracy, and our method seems more efficient for selecting such genes.
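To make the complexity comparison above concrete, the two dominant terms quoted in the text can be evaluated side by side. This is only a rough cost model: the values of I, N and T below are hypothetical, and reading T as the number of training samples is an assumption, since Section 2 is not reproduced here.

```python
def mobde_cost(iterations: int, pop_size: int, samples: int) -> int:
    """Dominant term I * N * T^3 quoted in the text for MOBDE."""
    return iterations * pop_size * samples ** 3


def nsga2_cost(iterations: int, pop_size: int, samples: int, objectives: int = 2) -> int:
    """Dominant term 4 * I * M * N^2 * T^3 quoted in the text for NSGAII."""
    return 4 * iterations * objectives * pop_size ** 2 * samples ** 3


# Hypothetical settings: 100 iterations, population of 50, 72 samples.
I, N, T = 100, 50, 72
ratio = nsga2_cost(I, N, T) / mobde_cost(I, N, T)
print(ratio)  # equals 4 * M * N for M = 2 objectives
```

Under this model the ratio depends only on the number of objectives and the population size, so the gap between the two algorithms widens as the population grows.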

The Paired Wilcoxon's Signed Rank Test of Our Algorithm with Other Algorithms
In this part, the paired Wilcoxon's signed rank test is adopted to compare MOBDE with other algorithms to verify whether the experiment results of MOBDE are better than those of the other algorithms [45]. The Wilcoxon's signed-rank test is a non-parametric statistical hypothesis test, which can be used as an alternative to the paired t-test when the results cannot be assumed to be normally distributed. In the paired Wilcoxon's signed rank test, the null hypothesis represents that there [...] proteins is 100%, for membrane proteins is 92.31% and for other proteins is 83.33%. From the compared results, the feature selection method can reduce the data dimensionality and find an optimal number of features that results in better performance of the prediction model. We hope that the results using the new feature selection method can improve the performance of protein subcellular location prediction.
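The paired Wilcoxon signed-rank test adopted in this part can be sketched in plain Python as below. Its usual null hypothesis is that the paired differences are symmetric around zero. The accuracy values are hypothetical placeholders, not figures from the tables, and the exact p-value is obtained by enumerating all sign patterns, which is only practical for a small number of pairs such as the ten datasets used here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def signed_ranks(x: Sequence[float], y: Sequence[float]) -> List[Tuple[float, int]]:
    """Return (rank of |difference|, sign) pairs, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Group tied absolute differences and give them the average rank.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r, 1 if d > 0 else -1) for r, d in zip(ranks, diffs)]


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W+, W-, exact two-sided p-value)."""
    pairs = signed_ranks(x, y)
    w_plus = sum(r for r, s in pairs if s > 0)
    w_minus = sum(r for r, s in pairs if s < 0)
    observed = min(w_plus, w_minus)
    total = w_plus + w_minus
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for keep in product((0, 1), repeat=len(pairs)):
        pos = sum(r for (r, _), k in zip(pairs, keep) if k)
        if min(pos, total - pos) <= observed:
            hits += 1
    return w_plus, w_minus, hits / 2 ** len(pairs)


# Hypothetical per-dataset accuracies for two algorithms.
acc_a = [100.0, 98.5, 97.0, 95.5, 99.0, 96.0]
acc_b = [99.0, 96.0, 97.0, 96.0, 95.0, 93.5]
w_plus, w_minus, p_value = wilcoxon_signed_rank(acc_a, acc_b)
print(w_plus, w_minus, p_value)
```

A large W+ with a small W- indicates that the first algorithm wins on most datasets; whether this is significant depends on the p-value and the chosen significance level.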

Conclusions
The objective of this study is to provide a multiobjective optimization method for feature selection. Our proposed method, called MOBDE, embraces the strength of binary differential evolution for classification and finds smaller feature subsets. In the first stage, we use the Fisher-Markov selector to rank the scores of the features and select the top 180 features as the input of the binary differential evolution. Then, a novel binary differential evolution is proposed to select the feature subset. Following that, we propose a multiobjective differential evolution method for feature selection, based on the summation of normalized objectives and diversity selection, and test it on ten gene expression datasets. According to the experiments, the following can be concluded.
1. The proposed method can find useful informative features in terms of classification accuracies.
2. By using this feature selection method, there is no need to set the number of selected features, since the proposed algorithm can automatically select the most useful features in terms of classification accuracies.
3. To show the effectiveness of the Fisher-Markov selector, the experiment of MOBDE with the Fisher-Markov selector and MOBDE without the Fisher-Markov selector is designed. The experimental results show that the Fisher-Markov selector is very effective in feature selection for the bioinformatics dataset.
4. To show the effectiveness of the proposed differential evolution, we compare our algorithm with three different versions of binary differential evolution: binary DE, BDE and BDEAIS. It is better than these binary differential evolution algorithms in terms of classification accuracy and the number of selected features. Meanwhile, our algorithm also provides better solutions than other binary evolutionary algorithms, including BGA, BPSO and BEDA.
5. Compared with some single-objective algorithms, our algorithm outperforms the best algorithm so far on these problems.
The proposed MOBDE algorithm is not only suitable for feature selection and classification in gene expression data, but also for other application domains, such as electricity load forecasting, face recognition and vehicle detection or any other high dimensional data classification.
,j,G
      end if
    end for
  end for
  Calculate the objective function for the new population
  Select the better individual based on the summation of normalized objectives and diversified selection
  Update the archive Ar_{G+1} based on the new individual
end while

Figure 2. The accuracy obtained by MOBDE in each independent run.

Figure 3. The number of selected genes obtained by MOBDE in each independent run.

Figure 4. The percentage of genes selected.

Table 1. The solution representation.

Table 2. Format of gene expression classification data.

Table 5. The genes, selected genes and percentage of genes selected.

Table 6. Comparative experimental results of the binary differential evolution algorithm with and without the Fisher-Markov selector.
The bolding denotes the best solutions.

Table 7. Comparative experimental results of different binary differential evolution algorithms.

Table 8. Comparative experimental results of the binary differential evolution algorithm with the binary genetic algorithm, binary particle swarm optimization and the binary estimation of distribution algorithm.

Table 9. Comparative experimental times of the differential evolution algorithm with different multiobjective optimization algorithms.

Table 10. Comparative experimental results of the binary differential evolution algorithm with grid search support vector machine (SVM).

Table 11. Comparative experimental results of the binary differential evolution algorithm with some single-objective methods.

Table 12. Comparative experimental results of the binary differential evolution algorithm (MOBDE) with the maximum margin criterion and support vector machine-based recursive feature elimination (MMC + SVM-RFE), support vector machine-based recursive feature elimination (SVM-RFE) and minimum-redundancy maximum-relevancy (MRMR).

Table 13. Comparative experimental results of the binary differential evolution algorithm with a different multiobjective optimization algorithm.

Table 15. Prediction results with different models on the 317 apoptosis proteins dataset and 98 independent proteins dataset.