A Novel Binary QUasi-Affine TRansformation Evolutionary (QUATRE) Algorithm

Abstract: The QUasi-Affine TRansformation Evolutionary (QUATRE) algorithm generalizes the differential evolution (DE) algorithm to matrix form. QUATRE was originally designed for continuous search spaces, but many practical applications are binary optimization problems. We therefore designed a novel binary version of QUATRE. The proposed binary algorithm is implemented using two different approaches. In the first approach, the new individuals produced by the mutation and crossover operations are binarized. In the second approach, binarization is done after mutation, and the crossover operation with other individuals is then performed. Transfer functions are critical to binarization, so four families of transfer functions are introduced for the proposed algorithm. These functions are then analyzed and improved transfer functions are proposed. Furthermore, in order to balance exploration and exploitation, a new linearly increasing scale factor is proposed. Experiments on 23 benchmark functions show that the two proposed approaches are superior to state-of-the-art algorithms. Moreover, we applied the algorithm to dimensionality reduction of hyperspectral images (HSI) in order to test its ability to solve practical problems. The experimental results on HSI imply that the proposed methods are better than Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).


Introduction
An optimization problem is the task of determining the values of decision variables under given constraints so that the objective functions reach their optimal values. Traditional methods for solving optimization problems include the simplex method, the steepest descent method, the trust region method, the penalty function method, etc. However, traditional optimization algorithms usually require specific conditions to be satisfied. For example, some algorithms require the objective function to be continuous or differentiable, some require the problem to be a convex optimization problem, and others require the constraints to be linear. It is sometimes difficult to satisfy these conditions in practical applications. Therefore, in the 1940s, heuristic algorithms emerged, which construct a model highly similar to the problem to be solved based on intuition or experience, and give a feasible solution to the optimization problem at an acceptable cost. However, the deviation of the obtained feasible solution from the optimal solution cannot be predicted: heuristic algorithms reduce computational complexity at the expense of computational accuracy. In the 1960s, under the inspiration of bionics, meta-heuristic algorithms appeared, which draw inspiration from random phenomena in nature. Meta-heuristic algorithms combined random algorithms with local [...]
(3) A new scaling factor is proposed in order to balance exploration and exploitation.
(4) The proposed algorithm is evaluated on 23 benchmark functions and HSI.
(5) A new fitness function for dimensionality reduction of HSI is proposed.
The rest of the paper is organized as follows. Section 2 describes related work, including QUATRE and transfer functions. Section 3 describes the two proposed binary QUATRE approaches. Section 4 presents the experimental results and analysis on benchmark test functions and HSI. Section 5 summarizes the main work of the paper and gives some suggestions for further work.

QUATRE
The algorithm is named QUATRE because its evolution equation has an affine-transformation-like form. In geometry, an affine transformation maps one vector space into another by a linear transformation followed by a translation; it can be written as $X \mapsto MX + B$. Let $X = [x_1, x_2, \ldots, x_{ps}]^T$ denote the position matrix of the particle population, where $ps$ is the population size. A particle can be written as $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$, where $i = 1, 2, \ldots, ps$ and $D$ is the dimension of the problem. The exact evolution equation used in QUATRE is shown in Equation (1):

$$X_{G+1} = M \otimes X_G + \bar{M} \otimes B, \quad (1)$$

where $\otimes$ means component-wise multiplication, the same as the ".*" operation in Matlab, $\bar{M}$ is the binary inverse of $M$, and $X_G$, $X_{G+1}$ are the position matrices at the $G$th and $(G+1)$th generations.
Matrix $B$ denotes the mutation matrix of the particles, which can be generated by the six schemes shown in Table 1. Here, $F \in (0, \infty)$ is a scaling factor, although we usually constrain $F \in [0, 2]$. $F$ represents the variation rate, and its value determines the balance between exploration and exploitation. $X_{r1,G}$, $X_{r2,G}$, $X_{r3,G}$, $X_{r4,G}$ and $X_{r5,G}$ are random matrices generated by permuting the row vectors of the position matrix $X_G$. $X_{gbest,G}$ is defined in Equation (2), where $x_{gbest,G}$ is a vector that denotes the global best particle in the $G$th generation. As shown in Table 1, the first mutation strategy is $B = X_{gbest,G} + F \cdot (X_{r1,G} - X_{r2,G})$, which is denoted as QUATRE/best/1, meaning that the vector to be perturbed is the global best solution $X_{gbest,G}$ and that only one difference vector $(X_{r1,G} - X_{r2,G})$ is included.
The number of rows in $M$ is the population size $ps$, while the number of columns in $M$ is the dimension $D$ of the problem. If $ps > D$, we extend $M_{tmp}$ according to $ps$. As shown in Equation (5), if $ps = k \times D + i$, then the first $k \times D$ rows of $M_{tmp}$ are made up of $k$ lower triangular $D \times D$ matrices, and the last $i$ rows of $M_{tmp}$ are the first $i$ rows of such a $D \times D$ matrix. $M$ is then obtained by randomly swapping each row and each column. If $ps < D$, similar operations can be used to extend the matrix $M_{tmp}$ according to $D$.
The pseudo code of the QUATRE algorithm is shown in Algorithm 1, where $X_{pbest,G} = [x_{1pb,G}, x_{2pb,G}, \ldots, x_{ipb,G}, \ldots, x_{pspb,G}]$ is the personal best position matrix, and $x_{ipb,G}$, $i = 1, 2, \ldots, ps$, is the vector holding the personal best position of particle $i$ up to the $G$th generation.
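As a rough illustration of the construction described above (not the authors' code), the following Python sketch builds $M_{tmp}$ by stacking lower triangular $D \times D$ blocks and then shuffles it. The function name `make_cooperative_matrix` is hypothetical. "Randomly swapping each row and each column" is interpreted here as shuffling the entries inside each row and then shuffling the order of the rows, which is an assumption; only the case $ps \geq D$ is covered.

```python
import random

def make_cooperative_matrix(ps, D, rng):
    """Build a ps x D binary matrix M from stacked lower triangular blocks.

    Assumes ps >= D. Row r of M_tmp is row (r mod D) of the D x D lower
    triangular matrix of ones, so ps = k*D + i gives k full blocks plus
    the first i rows of one more block.
    """
    tril = [[1 if col <= row else 0 for col in range(D)] for row in range(D)]
    M = [list(tril[r % D]) for r in range(ps)]
    for row in M:        # assumed interpretation: shuffle entries within each row
        rng.shuffle(row)
    rng.shuffle(M)       # then shuffle the order of the rows
    return M

M = make_cooperative_matrix(ps=7, D=3, rng=random.Random(0))
M_bar = [[1 - m for m in row] for row in M]   # binary inverse of M
```

Each row of `M` keeps the number of ones of the triangular row it came from, so the shuffling changes which dimensions are inherited but not how many.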

Algorithm 1: Pseudo code of QUATRE.
Input: the dimension $D$, population size $ps$, maximum number of iterations $MAXG$, the mutation scheme used to calculate $B$, and the fitness function $f(X)$.
Initialization: initialize the search space $V$, set $G = 1$, initialize the position matrix $X_G$, set $X_{pbest,G} = X_G$ and calculate $X_{gbest,G}$.
Iteration:
1: while $G < MAXG$ | !stopCriterion do
2: Generate matrix $M$ by Equation (5) and calculate $\bar{M}$.
3: Calculate the mutation matrix $B$ according to the mutation scheme.
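To make the flow of the continuous algorithm concrete, the following Python sketch runs a QUATRE/best/1 loop on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: the steps after step 3 are not available in the text, so the greedy replacement of a particle by its trial, the clipping to the bounds `lo` and `hi`, the fixed value `F=0.7`, and all function names are assumptions. The update itself follows the form $M \otimes X_G + \bar{M} \otimes B$.

```python
import random

def make_M(ps, D, rng):
    # Stacked lower triangular blocks, shuffled (assumes ps >= D).
    tril = [[1 if c <= r else 0 for c in range(D)] for r in range(D)]
    M = [list(tril[r % D]) for r in range(ps)]
    for row in M:
        rng.shuffle(row)
    rng.shuffle(M)
    return M

def quatre_best_1(f, D, ps, max_gen, lo, hi, F=0.7, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(ps)]
    fit = [f(x) for x in X]
    for _ in range(max_gen):
        X_old = list(X)                       # snapshot of X_G (rows are never mutated)
        M = make_M(ps, D, rng)
        gbest = X_old[fit.index(min(fit))]
        r1 = rng.sample(range(ps), ps)        # row permutations of X_G
        r2 = rng.sample(range(ps), ps)
        for i in range(ps):
            trial = []
            for k in range(D):
                b = gbest[k] + F * (X_old[r1[i]][k] - X_old[r2[i]][k])
                v = X_old[i][k] if M[i][k] == 1 else b   # M keeps x, M_bar takes b
                trial.append(min(max(v, lo), hi))        # assumed bound handling
            f_trial = f(trial)
            if f_trial <= fit[i]:             # assumed greedy selection
                X[i], fit[i] = trial, f_trial
    best = fit.index(min(fit))
    return X[best], fit[best]

def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = quatre_best_1(sphere, D=5, ps=20, max_gen=100, lo=-10.0, hi=10.0)
```

Because a particle is only replaced when its trial is at least as good, the best fitness in the population never gets worse from one generation to the next.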

Transfer Function
In the binary version of a meta-heuristic algorithm, the transfer function plays a very important role, because its value is either the probability that an element of the position vector takes the value 0 or 1, or the probability that the element changes its value. Therefore, the transfer function must be bounded in [0, 1]. In this section, four families of transfer functions are introduced.
In Reference [62], Kennedy, J. first proposed the sigmoid transfer function for binary PSO. The position of a particle in binary PSO can only be 0 or 1 in each dimension, and the velocity is mapped to a probability as shown in Equation (6), where $v_i^k(t)$ is the velocity of particle $i$ at iteration $t$ in dimension $k$.
After converting the continuous velocity to a probability, the position of particle $i$ at iteration $t + 1$ in dimension $k$, $x_i^k(t + 1)$, is updated with this probability by Equation (7), where $rand \in [0, 1]$ is a random variable. In this strategy, the sigmoid function forces particles to take the value 0 or 1 according to their velocities.
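The sigmoid-based rule described above can be sketched in Python as follows. The standard logistic function is used for the sigmoid, and the bit is set to 1 when the random number falls below the transfer value; the function names are illustrative only.

```python
import math
import random

def sigmoid(v):
    """Map a continuous velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def s_shaped_update(v, rng):
    """Return 1 with probability sigmoid(v), otherwise 0."""
    return 1 if rng.random() < sigmoid(v) else 0

rng = random.Random(1)
bits = [s_shaped_update(2.0, rng) for _ in range(10000)]
frac_ones = sum(bits) / len(bits)   # close to sigmoid(2.0), about 0.88
```

Note that the new bit depends only on the velocity, not on the previous bit, which is what distinguishes this rule from the flip-based rule used with V-shaped functions.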
Subsequently, in Reference [63], Mirjalili, S. proposed the S-shaped and V-shaped families of transfer functions. Their expressions are shown in Tables 2 and 3. The S-shaped families of transfer functions are extensions of the sigmoid function and therefore use Equation (7) to update the position. The V-shaped transfer functions, however, use a different position update strategy, shown in Equation (8), where $(x_i^k(t))^{-1}$ denotes the binary inverse (complement) of $x_i^k(t)$. In this strategy, particles stay in their current positions when their velocities are low and switch to their complements when their velocities are high.

Table 3. The expressions of the V-shaped families of transfer functions.

The new U-shaped families of transfer functions [63] and Z-shaped families of transfer functions [64] show good performance in binary PSO. Their expressions are shown in Tables 4 and 5. The position update strategy used with the U-shaped and Z-shaped transfer functions is Equation (8), the same as for the V-shaped families of transfer functions.

Table 4. The expressions of the U-shaped families of transfer functions.
Name | Transfer Function
... | $T_4(x) = \min(|x^4|, 1)$

Table 5. The expressions of the Z-shaped families of transfer functions.
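The flip-based update shared by the V-shaped, U-shaped and Z-shaped families can be sketched in Python as below. Since the table entries are not reproduced here, $|\tanh(v)|$ is used as a representative V-shaped function; that choice and the function names are assumptions made for illustration.

```python
import math
import random

def v_shaped(v):
    """A representative V-shaped transfer function: 0 at v = 0, near 1 for large |v|."""
    return abs(math.tanh(v))

def flip_update(x, v, rng, transfer=v_shaped):
    """Complement the bit x with probability transfer(v); otherwise keep it."""
    return 1 - x if rng.random() < transfer(v) else x

rng = random.Random(0)
kept = flip_update(1, 0.0, rng)       # transfer value 0: the bit is kept
flipped = flip_update(1, 50.0, rng)   # transfer value 1.0: the bit is complemented
```

Unlike the sigmoid rule, the outcome here depends on the current bit, so a small velocity leaves the particle where it is instead of pushing it towards 0.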

Proposed Binary QUATRE Algorithm
In this section, a novel binary QUasi-Affine TRansformation Evolutionary (BQUATRE) algorithm is proposed for dimensionality reduction of HSI. First, a mathematical analysis of BQUATRE is performed, and then improved versions of the four families of transfer functions are proposed. Furthermore, in order to balance exploration and exploitation, a new linearly increasing scale factor is presented. Finally, the two approaches of the BQUATRE algorithm are described in detail.

Mathematical Analysis
Suppose that the position matrix $X_G$ of BQUATRE at generation $G$ consists only of 0s and 1s. Then $X_{r1,G}$, $X_{r2,G}$, $X_{r3,G}$, $X_{r4,G}$, $X_{r5,G}$ and $X_{gbest,G}$ at the $G$th generation are also binary matrices. The mutation matrix $B$ is calculated according to the schemes shown in Table 1. However, the range of the scale factor $F$ is [0, 2]. Therefore, the mutation matrix $B$ will in general not be a binary matrix, and the position matrix $X_{G+1}$ at the $(G+1)$th generation obtained by Equation (1) will also not be a binary matrix. For further analysis, we take the first mutation strategy in Table 1 as an example, which is Equation (9):

$$B = X_{gbest,G} + F \cdot (X_{r1,G} - X_{r2,G}). \quad (9)$$
There are eight combinations of the element values of $X_{gbest,G}$, $X_{r1,G}$ and $X_{r2,G}$, which are described as follows: (1) if $X_{gbest,G} = 0$, $X_{r1,G} = 0$ and $X_{r2,G} = 0$.
From the above analysis, it can be concluded that $B \in [-2, 3]$. The same conclusion can be drawn for the mutation strategies QUATRE/target/1 and QUATRE/best/1. For the last three mutation strategies, QUATRE/target/2, QUATRE/rand/2 and QUATRE/best/2, it can be concluded in the same way that $B \in [-4, 5]$.
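The ranges stated above can be checked by brute force. The Python sketch below enumerates every combination of binary element values and evaluates $b$ at the two ends of $F \in [0, 2]$. It assumes that a two-difference strategy has the form of a binary base element plus $F$ times two binary differences; that form and the helper name `b_range` are assumptions for illustration.

```python
from itertools import product

def b_range(n_diff, f_min=0.0, f_max=2.0):
    """Range of b = base + F * (sum of n_diff binary differences), F in [f_min, f_max]."""
    values = []
    for bits in product((0, 1), repeat=1 + 2 * n_diff):
        base, rest = bits[0], bits[1:]
        diff = sum(rest[0::2]) - sum(rest[1::2])
        # b is linear in F, so its extremes occur at F = f_min or F = f_max
        values += [base + f_min * diff, base + f_max * diff]
    return min(values), max(values)

one_difference = b_range(1)    # (-2.0, 3.0)
two_differences = b_range(2)   # (-4.0, 5.0)
```

For one difference vector the loop visits exactly the eight combinations of the three binary elements; the lower bound is reached with base 0 and difference $-1$, the upper bound with base 1 and difference $+1$.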

Improved Four Families of Transfer Functions
In this section, the results obtained in Section 3.1 are discussed further. As in Section 3.1, we take Equation (9) as an example, so that $B \in [-2, 3]$. In Figure 1, the four dotted lines represent the original S-shaped families of transfer functions defined in Table 2; they do not fit the search space [−2, 3]. The center of the search space is 0.5, while the center of the S-shaped families of transfer functions is 0. Another problem is that the maximal and minimal values of these functions on [−2, 3] do not reach 1 and 0, whereas the position matrix is updated by Equation (7), in which rand is generated in [0, 1]. Therefore, we shift the functions to the center of the search space and then stretch their values to [0, 1]. The exact expressions of the four improved S-shaped families of transfer functions are given in Table 6. The solid lines in Figure 1 show the curves of the four improved functions.

Table 6. The expressions of the improved S-shaped transfer functions.

Table 8. The expressions of the improved U-shaped transfer functions.
Table 9. The expressions of the improved Z-shaped transfer functions.
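The shift-and-stretch idea used for the S-shaped family can be sketched generically in Python. The exact expressions of Table 6 are not reproduced here, so this is only one plausible way to realize the idea: the function is re-centered on the midpoint of [−2, 3] and then min-max rescaled so that it reaches 0 and 1 at the ends of that interval. The logistic function `s2` is used as a representative S-shaped function, and all names are hypothetical.

```python
import math

def s2(x):
    """A representative S-shaped transfer function (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))

def shift_and_stretch(t, lo=-2.0, hi=3.0):
    """Re-center an increasing function t on [lo, hi] and rescale its values to [0, 1]."""
    centre = (lo + hi) / 2.0
    t_lo, t_hi = t(lo - centre), t(hi - centre)
    def improved(x):
        return (t(x - centre) - t_lo) / (t_hi - t_lo)
    return improved

s2_improved = shift_and_stretch(s2)
```

With this construction the value at the center of the search space, 0.5, is one half, so elements of $B$ near the center are equally likely to become 0 or 1.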

New Scaling Factor Based on Exploration and Exploitation
The convergence process of a meta-heuristic algorithm can be regarded as two stages: exploration and exploitation. In the early stage of the algorithm, the exploration ability is conducive to searching more of the space and escaping from local optima. In the later stage, an approximately optimal solution has been found, and the exploitation ability helps to refine it towards the optimal solution. Similarly, for the binary version, the position needs to be able to switch between 0 and 1 quickly at the beginning and slowly at the end. For continuous QUATRE, Liu, N. proposed a linearly decreasing scale factor, shown in Equation (10) [48], where $F_{max}$ and $F_{min}$ are the predetermined maximum and minimum values of the scale factor $F$. Usually, we set $F \in [0, 2]$, so that $F_{max} = 2$ and $F_{min} = 0$. $G$ is the current generation number, and $MAXG$ is the maximum generation number.
However, in the binary version things are different, as shown by Hu, P. [54] and Liu, J. [65], because the transfer function also influences exploration and exploitation. The slope of the transfer function can be used to evaluate the velocity trend, since the slope indicates how quickly the position switches. Figure 5 shows the curves of the original $S_1(x)$, $V_1(x)$, $U_1(x)$ and $Z_1(x)$ functions described in Tables 2-5. The slope is small when $|x|$ is large and large when $|x|$ is small.
Since the transfer function and the scale factor $F$ together determine the switching probability in the proposed binary QUATRE, a linearly increasing scale factor is presented in this manuscript, as shown in Equation (11).
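The two schedules can be illustrated with plain linear ramps in Python. The exact forms of Equations (10) and (11) are not shown in the text, so the formulas below are assumptions that merely match the description: one decreases from $F_{max}$ to $F_{min}$ and the other increases from $F_{min}$ to $F_{max}$ as $G$ runs from 0 to $MAXG$.

```python
def f_linear_decrease(G, max_g, f_min=0.0, f_max=2.0):
    """Scale factor falling linearly from f_max (G = 0) to f_min (G = max_g)."""
    return f_max - (f_max - f_min) * G / max_g

def f_linear_increase(G, max_g, f_min=0.0, f_max=2.0):
    """Scale factor rising linearly from f_min (G = 0) to f_max (G = max_g)."""
    return f_min + (f_max - f_min) * G / max_g

schedule = [f_linear_increase(G, 500) for G in (0, 250, 500)]   # [0.0, 1.0, 2.0]
```

With an increasing $F$, the elements of $B$ spread further from the binary values late in the run, which moves them towards the flatter parts of the transfer functions.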

Binary QUATRE Algorithm-Approach 1 (BQUATRE1)
The first approach of the binary QUATRE algorithm (BQUATRE1) converts the continuous position matrix $X_{G+1}$ at the $(G+1)$th generation to binary. In this section, BQUATRE1 is described in detail.
Suppose that the position matrix $X_G$ at the $G$th generation has been obtained. First, the cooperative search matrix $M$ for this generation is generated from $M_{tmp}$ according to Equation (5), and $\bar{M}$ is obtained by applying the binary inverse operation to $M$.
After that, the scale factor $F$ is obtained by Equation (11), and the mutation matrix $B$ is calculated according to a mutation strategy in Table 1. For example, with the first mutation strategy, Equation (9), $B = X_{gbest,G} + F \cdot (X_{r1,G} - X_{r2,G})$. Then $X_{G+1}$ can be calculated by Equation (1). The element $x_{ik,G+1} \in [-2, 3]$ of $X_{G+1}$ refers to the $i$th particle in dimension $k$. To distinguish the two, the continuous position matrix at the $(G+1)$th generation is denoted as $X^{cont}_{G+1}$ and the binary matrix at the $(G+1)$th generation is denoted as $X^{bin}_{G+1}$. Then, Equation (1) can be rewritten as Equation (12).
The element of $X^{cont}_{G+1}$ for particle $i$ in dimension $k$ is denoted as $x^{cont}_{ik,G+1}$. Every element $x^{cont}_{ik,G+1}$ is converted to a probability by a specified transfer function, from which the whole matrix $X^{bin}_{G+1}$ can be obtained. In detail, if the S-shaped families of transfer functions are selected, the elements of $X^{bin}_{G+1}$ are updated with the probability by Equation (13).
However, if the V-shaped, U-shaped or Z-shaped families of transfer functions are selected, the position matrix $X^{bin}_{G+1}$ is updated with the probability by Equation (14).
As shown in Equation (12), some elements of $X^{cont}_{G+1}$ come from $X^{bin}_G$, while the others come from $B$. Since the elements that come from $X^{bin}_G$ are inherently binary, the four improved families of transfer functions do not fit $X^{cont}_{G+1}$. In this situation, the four original families of transfer functions $T(x)$ are used to convert $X^{cont}_{G+1}$ from a continuous matrix to a binary one. The pseudo code of BQUATRE1 is given in Algorithm 2. Since most matrices are binary, we only mark the continuous matrices, vectors or numbers; for example, $X^{cont}$ means that $X$ is a continuous matrix.

Algorithm 2: Pseudo code of the binary QUATRE algorithm, Approach 1 (BQUATRE1).
Input: the dimension $D$, population size $ps$, maximum number of iterations $MAXG$, fitness function $f(X)$, the mutation scheme used to calculate $B$, and the original transfer function $T(x)$.
Initialization: initialize the search space $V$, set $G = 1$, initialize the position matrix $X_G$, set $X_{pbest,G} = X_G$ and calculate $X_{gbest,G}$.
Iteration:
Generate matrix $M$ by Equation (5) and calculate $\bar{M}$.
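One generation of the first approach can be sketched in Python as follows. This is a hedged illustration, not the authors' Algorithm 2: Equations (13) and (14) are not available in the text, so the S-style rule (set the bit to 1 with probability $T$) and the flip rule (complement the previous bit with probability $T$) are assumed forms, the fitness-based selection step is omitted, and the matrix `M` in the usage lines is a random stand-in rather than the matrix of Equation (5).

```python
import math
import random

def bquatre1_step(X, gbest, M, F, rng, transfer, flip):
    """One BQUATRE1-style generation on a binary population X (list of rows)."""
    ps, D = len(X), len(X[0])
    r1 = rng.sample(range(ps), ps)      # row permutations of X
    r2 = rng.sample(range(ps), ps)
    X_new = []
    for i in range(ps):
        row = []
        for k in range(D):
            b = gbest[k] + F * (X[r1[i]][k] - X[r2[i]][k])   # best/1 mutation
            x_cont = X[i][k] if M[i][k] == 1 else b          # crossover through M
            p = transfer(x_cont)
            if flip:    # V/U/Z-style: complement the old bit with probability p
                row.append(1 - X[i][k] if rng.random() < p else X[i][k])
            else:       # S-style: set the bit to 1 with probability p
                row.append(1 if rng.random() < p else 0)
        X_new.append(row)
    return X_new

rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(4)] for _ in range(6)]
M = [[rng.randint(0, 1) for _ in range(4)] for _ in range(6)]   # stand-in for M
X_next = bquatre1_step(X, X[0], M, 1.0, rng, lambda v: abs(math.tanh(v)), flip=True)
```

The binarization is applied after crossover, so elements inherited unchanged from $X_G$ also pass through the transfer function, which is why the original rather than the improved transfer functions are used in this approach.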

Binary QUATRE Algorithm-Approach 2 (BQUATRE2)
The second approach of the binary QUATRE algorithm (BQUATRE2) only converts the mutation matrix $B$ to binary. In this section, BQUATRE2 is described in detail.
Again, we suppose that the position matrix $X_G$ at the $G$th generation has been obtained. First, $M$ and $\bar{M}$ are obtained in the same way as before.
The mutation matrix $B$ is calculated according to Equations (9) and (11). The element of $B$ for particle $i$ in dimension $k$ is denoted as $b_{ik}$, where $i = 1, 2, \ldots, ps$ and $k = 1, 2, \ldots, D$. Since all elements in $B$ are continuous values, the improved families of transfer functions $T_i(x)$ are used in this approach. To distinguish the two, the continuous mutation matrix is denoted as $B^{cont}$, while the binary mutation matrix is denoted as $B^{bin}$.
Then, the probability value $T_i(x)$ must be converted to a binary value. Suppose the S-shaped families of transfer functions are selected, and consider the second case in Section 3.1: if $X_{gbest,G} = 0$, $X_{r1,G} = 0$ and $X_{r2,G} = 1$, then $B \in [-2, 0]$. In this situation, $b_{ik}$ should have a high probability of being 0, because only the value of $X_{r2,G}$ is not 0. Similarly, consider the seventh case in Section 3.1: if $X_{gbest,G} = 1$, $X_{r1,G} = 1$ and $X_{r2,G} = 0$, then $B \in [1, 3]$. If $b_{ik}$ has a high probability of being 1 in this case, the design is reasonable. Therefore, if the S-shaped functions are selected, Equation (7) is replaced by Equation (15). If the improved V-shaped, U-shaped or Z-shaped families of functions described in Tables 7-9 are chosen, we still adopt the strategy of Equation (8) to update the value. In this approach, the update function is Equation (16), where $x_{ik,G}$ is a binary value giving the position of particle $i$ in dimension $k$ at generation $G$. In this way, the whole binary mutation matrix $B^{bin}$ is obtained. Finally, the position matrix $X_{G+1}$ can be obtained.
The pseudo code of BQUATRE2 is described in Algorithm 3.

Input: the dimension $D$, population size $ps$, maximum number of iterations $MAXG$, fitness function $f(X)$, the mutation scheme used to calculate $B$, and the improved transfer function $T_i(x)$.
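The order of operations in the second approach (binarize $B$ first, then cross it with $X_G$) can be sketched in Python as below. Equations (15) and (16) are not available in the text, so both update rules are assumed forms: for S-shaped functions the element becomes 1 with probability $T_i(b)$, and for the other families the bit $x_{ik,G}$ is complemented with probability $T_i(b)$. The function `stand_in_transfer` is a simple clamp used only to keep the example self-contained; it is not one of the improved functions from the tables.

```python
import random

def binarize_mutation(B_cont, X, transfer, rng, flip):
    """Turn the continuous mutation matrix into a binary one, element by element."""
    B_bin = []
    for b_row, x_row in zip(B_cont, X):
        row = []
        for b, x in zip(b_row, x_row):
            p = transfer(b)
            if flip:    # V/U/Z-style: complement x with probability p
                row.append(1 - x if rng.random() < p else x)
            else:       # S-style: 1 with probability p
                row.append(1 if rng.random() < p else 0)
        B_bin.append(row)
    return B_bin

def crossover(X, B_bin, M):
    """Component-wise M * X + M_bar * B_bin on binary matrices."""
    return [[x if m == 1 else b for x, b, m in zip(xr, br, mr)]
            for xr, br, mr in zip(X, B_bin, M)]

def stand_in_transfer(b):
    """Hypothetical increasing map from [-2, 3] onto [0, 1]."""
    return min(max((b + 2.0) / 5.0, 0.0), 1.0)

rng = random.Random(0)
X = [[0, 1, 1], [1, 0, 0]]
B_cont = [[-2.0, 3.0, 0.5], [3.0, -2.0, 1.0]]
M = [[1, 0, 0], [0, 1, 0]]
B_bin = binarize_mutation(B_cont, X, stand_in_transfer, rng, flip=False)
X_next = crossover(X, B_bin, M)
```

Because both operands of the crossover are already binary, $X_{G+1}$ is binary without any further conversion, and the elements inherited from $X_G$ are never passed through a transfer function.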

Benchmark Function
In this section, the proposed BQUATRE1 and BQUATRE2 are evaluated on 23 benchmark functions. The mathematical formulas and properties of these functions are described in Tables 10-12, where $D$ is the dimension of the function and $f_{min}$ is the optimum. In detail, Table 10 contains the unimodal functions (denoted as f1-f7), Table 11 shows the common multimodal functions (denoted as f8-f13) and Table 12 shows the low-dimensional multimodal functions (denoted as f14-f23).
Unimodal functions have a single global optimum and no other local optima, so they can verify whether an algorithm can find the global optimum within a finite population size and number of iterations. Multimodal functions have many local optima, so they can verify whether an algorithm is able to avoid being trapped in local optima. When the function dimension is low, the algorithms are prone to premature convergence and to falling into local optima. Therefore, the low-dimensional multimodal functions verify the convergence results under more stringent conditions.
Tables 10-12 (surviving entries): Generalized penalized 1, $\ldots[1 + 10\sin^2(\pi y_{i+1})] + (y_n - 1)^2\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4)$; Generalized penalized 2; Goldstein-Price.

Two approaches of the BQUATRE algorithm with four families of transfer functions are examined in the experiment. Due to space limitations, we take the second function of each family as an example, that is, $S_2$, $V_2$, $U_2$, $Z_2$, $S_{2i}$, $V_{2i}$, $U_{2i}$ and $Z_{2i}$. First, the S-shaped and V-shaped families of transfer functions are compared, that is, the first approach of BQUATRE with the $S_2$ transfer function (denoted as BQUATRE1-S2), the first approach of BQUATRE with the $V_2$ transfer function (denoted as BQUATRE1-V2), the second approach of BQUATRE with the improved transfer function $S_{2i}$ (denoted as BQUATRE2-S2i) and the second approach of BQUATRE with the improved transfer function $V_{2i}$ (denoted as BQUATRE2-V2i). Since QUATRE is an improved DE, the proposed binary QUATRE is compared with binary DE (denoted as BDE). Binary PSO is the most basic binary algorithm, so it is also used in our experiment (denoted as BPSO). The third compared algorithm is the Advanced Binary Grey Wolf Optimizer with the V3a transfer function (denoted as ABGWO-V3a), proposed in Reference [54] by Hu, P.; ABGWO-V3a can obtain better performance than many other binary meta-heuristic algorithms.
Table 13 shows the experimental results of BDE, ABGWO-V3a, BQUATRE1-S2, BQUATRE1-V2, BQUATRE2-S2i, BQUATRE2-V2i and BPSO. In detail, every algorithm is run 30 times on each benchmark function, with 500 iterations and 30 individuals. AVG and STD are the mean and standard deviation of the results over the 30 runs, respectively. The results in red and in blue are the best results among the seven algorithms; blue means that all algorithms obtained the optimal solution. The last row gives the number of times each algorithm obtains the best result.
From Table 13, we can see that the seven algorithms obtain the same solutions on 6 benchmark functions, all of which are low-dimensional multimodal functions. BQUATRE1-V2, BQUATRE2-S2i and BQUATRE2-V2i obtained 14, 15 and 14 red results, respectively, whereas BDE, ABGWO-V3a and BPSO only obtained 6, 3 and 1 red results, respectively. Therefore, BQUATRE1-V2, BQUATRE2-S2i and BQUATRE2-V2i are superior to BDE, ABGWO-V3a and BPSO. At the same time, we can see that BQUATRE1-S2 does not perform well, whereas BQUATRE2-S2i obtained the best results compared to the other five algorithms. This result shows that Equation (14) is effective for BQUATRE2. Figures 6-8 visualize the results of Table 13.
Moreover, the U-shaped and Z-shaped families of transfer functions are compared with BDE, ABGWO-V3a and BPSO, that is, the first approach of BQUATRE with the $U_2$ transfer function (denoted as BQUATRE1-U2), the first approach of BQUATRE with the $Z_2$ transfer function (denoted as BQUATRE1-Z2), the second approach of BQUATRE with the improved transfer function $U_{2i}$ (denoted as BQUATRE2-U2i) and the second approach of BQUATRE with the improved transfer function $Z_{2i}$ (denoted as BQUATRE2-Z2i). The results are shown in Table 14.
From Table 14, it can be seen that the seven algorithms obtain the same solutions on six benchmark functions, while BQUATRE1-U2 obtains the best results on 12 other functions. Furthermore, BQUATRE1-Z2, BQUATRE2-U2i and BQUATRE2-Z2i obtained 16, 15 and 7 red results, respectively, whereas BDE, ABGWO-V3a and BPSO only obtained 6, 3 and 1 red results, respectively. On the whole, BQUATRE1-U2, BQUATRE1-Z2, BQUATRE2-U2i and BQUATRE2-Z2i are superior to BDE, ABGWO-V3a and BPSO. From Tables 13 and 14, we can conclude that BQUATRE1 and BQUATRE2 with V-shaped, U-shaped and Z-shaped functions are superior to BDE, ABGWO-V3a and BPSO. BQUATRE2 with the S-shaped transfer function also performs well. The proposed second approach of BQUATRE performs well on the unimodal functions and the common multimodal functions. At the same time, the improved algorithm does not improve much on the low-dimensional multimodal functions. Figures 9-11 visualize the results of Table 14.
To further compare the results, a t-test is used as a significance test. The t-test compares the mean values of two groups of data and is suitable for normally distributed data with a small sample size and unknown population standard deviation. In the experiments, a two-tailed t-test is used with a significance level of 0.01, which indicates a very significant difference. Table 15 shows the results between ABGWO-V3a and BQUATRE1-S2, BQUATRE1-V2, BQUATRE1-U2, BQUATRE1-Z2, BQUATRE2-S2i, BQUATRE2-V2i, BQUATRE2-U2i and BQUATRE2-Z2i. "+" indicates that the compared algorithm is superior to ABGWO-V3a, "−" indicates that it is inferior to ABGWO-V3a, and "=" indicates that the performance of the two algorithms is consistent.
We can see from Table 15 that the four BQUATRE2 algorithms and BQUATRE1 with the Z2 transfer function are not inferior to ABGWO-V3a on any of the 23 benchmark functions. BQUATRE1 with the V2 and U2 transfer functions is superior to ABGWO-V3a on 11 functions and inferior on 2 functions. Table 16 shows the results between BDE and the eight BQUATRE variants. From Table 16, BQUATRE1-Z2, BQUATRE2-S2i, BQUATRE2-U2i and BQUATRE2-Z2i are not inferior to BDE on any of the 23 benchmark functions. BQUATRE1-V2, BQUATRE1-U2 and BQUATRE2-V2i are superior to BDE on 9 functions and inferior on 1 or 2 functions.
Finally, we compare the runtime of the 10 algorithms in this experiment, as shown in Table 17. The BDE algorithm has the shortest running time and ABGWO-V3a has the longest. On the whole, the four BQUATRE2 variants take longer to run than BQUATRE1, because the transfer functions they use are more complex.

Hyperspectral Imagery
In this section, the proposed BQUATRE1 and BQUATRE2 are used for dimensionality reduction on a hyperspectral dataset. The dataset used in the experiment is the Indian Pines image, which was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1992. The dataset covers the Indian Pines agricultural site in northwestern Indiana and consists of 145 × 145 pixels with 220 spectral bands. The spatial resolution of this image is 20 m per pixel. The water absorption bands of HSI are seriously polluted by noise and are not suitable for classification. Therefore, twenty water absorption bands have been deleted from Indian Pines. The image contains 16 classes of ground features, including corn, soybean, alfalfa, wheat, oat, grass, tree, etc. Some classes are subdivided; for example, corn is divided into no-tillage, low-tillage and traditional tillage. Figure 12 shows the false-color composite image and the ground truth data of Indian Pines.
As is known, an HSI cannot be displayed directly on a computer because it has too many bands. A false-color image is a color image synthesized from different wavebands and is often used in the classification of remote sensing images. False-color synthesis is an image enhancement method that converts images composed of many bands (more than four) into three-band or four-band composite images. Figure 12a is synthesized with the three-band technique.
Classification performance is the main index of dimensionality reduction for HSI. In this manuscript, three common metrics are used to evaluate classifier performance, namely overall accuracy (OA), average accuracy (AA) and the Kappa coefficient. Accordingly, the fitness function is Equation (17), where $OA_{kfoldLoss}$, $AA_{kfoldLoss}$ and $Kappa_{kfoldLoss}$ are the OA, AA and Kappa coefficient errors of k-fold cross validation, and $W_i$, $i = 1, 2, 3$, are weight coefficients.
In our experiment, we set $k = 5$ and $W_1 = 0.4$, $W_2 = 0.3$, $W_3 = 0.3$, because OA is the most important of the three classification performance indexes.
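The fitness described above can be sketched in Python as a weighted sum of the three errors. Equation (17) itself is not shown in the text, so the weighted-sum form is an assumption, as is the reading of each "error" as one minus the corresponding metric. The sketch works on a single confusion matrix (rows are true classes, columns are predicted classes, every class has at least one sample); in a k-fold setting the errors of the folds would be combined before weighting, which is not shown here.

```python
def classification_losses(conf):
    """Return (1 - OA, 1 - AA, 1 - kappa) for a square confusion matrix."""
    k = len(conf)
    n = sum(sum(row) for row in conf)
    diag = [conf[i][i] for i in range(k)]
    row_tot = [sum(conf[i]) for i in range(k)]
    col_tot = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    oa = sum(diag) / n                                    # overall accuracy
    aa = sum(d / r for d, r in zip(diag, row_tot)) / k    # mean per-class accuracy
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    kappa = (oa - pe) / (1 - pe)                          # Cohen's kappa
    return 1 - oa, 1 - aa, 1 - kappa

def fitness(conf, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the OA, AA and Kappa errors (smaller is better)."""
    return sum(w * loss for w, loss in zip(weights, classification_losses(conf)))

example = fitness([[3, 1], [1, 3]])   # 0.4 * 0.25 + 0.3 * 0.25 + 0.3 * 0.5 = 0.325
```

In a band-selection setting, each binary particle would choose a subset of bands, a classifier would be trained on that subset, and the resulting confusion matrix would be scored this way.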
In this experiment, we use the first function of each family of transfer functions. In order to compare the S-shaped, V-shaped, U-shaped and Z-shaped transfer functions, we adopt BQUATRE1 with the U-shaped and Z-shaped transfer functions and BQUATRE2 with the S-shaped and V-shaped ones. The compared dimensionality reduction algorithms are the classic PCA and LDA. The number of dimensions selected by PCA and LDA is 10, which is computed by the "intrinsic dimensionality estimation" function of the "Matlab toolbox for dimensionality reduction" [66]. The classifier used in the experiments is the Support Vector Machine (SVM), which is implemented with the LIBSVM library and uses a Gaussian kernel [67]. Since it is difficult to obtain samples of hyperspectral images, classification with fewer samples is of more practical significance. Therefore, in this experiment, the training samples are randomly chosen and account for 5% of the ground truth.
Seven algorithms are compared in this experiment: the traditional SVM without dimensionality reduction (denoted as SVM), PCA for dimensionality reduction before SVM (denoted as PCA-SVM), LDA for dimensionality reduction before SVM (denoted as LDA-SVM), BQUATRE1 with the U-shaped function for dimensionality reduction before SVM (denoted as BQUATRE1-U1-SVM), BQUATRE1 with the Z-shaped function for dimensionality reduction before SVM (denoted as BQUATRE1-Z1-SVM), BQUATRE2 with the improved S-shaped function for dimensionality reduction before SVM (denoted as BQUATRE2-S1i-SVM), and BQUATRE2 with the improved V-shaped function for dimensionality reduction before SVM (denoted as BQUATRE2-V1i-SVM). Furthermore, the proposed algorithms run for 500 iterations with a population size of 20. We performed random 5% sampling 500 times before the experiment in order to ensure robustness. Table 18 describes the classification performance of these algorithms on the Indian Pines dataset.
The red color means that the proposed algorithm performs better than all three compared algorithms on this index. We can see that all four proposed algorithms obtain higher OA and Kappa values than the three compared algorithms. However, the AA values of all proposed algorithms are slightly lower than that of LDA-SVM. This is due to the poor performance of the proposed algorithms in classifying the "oats" and "alfalfa" classes. It can be seen from Figure 12 that these two classes have fewer samples than the others. LDA-SVM correctly classified a few samples in these two classes, which has a greater impact on the value of AA but a smaller impact on the value of OA. Figure 13 visualizes the average OA, AA and Kappa values of BQUATRE2-S1i-SVM, BQUATRE2-V1i-SVM, BQUATRE1-U1-SVM, BQUATRE1-Z1-SVM, SVM, PCA-SVM and LDA-SVM on Indian Pines.
In order to visually see the classification results, we paint the different ground features with different colors. Figure 14 shows the classification maps. As a whole, we can see that the proposed algorithms perform better for dimensionality reduction of HSI than PCA and LDA.

Conclusions
In this manuscript, we convert the QUATRE algorithm to a binary version by two approaches in order to enable it to solve practical binary problems. In the first approach, the new individuals produced by the mutation and crossover operations are binarized. In the second approach, binarization is done after mutation, and the crossover operation with other individuals is then performed. A mathematical analysis of the proposed algorithm is performed, and improved transfer functions are proposed in order to improve its performance. Furthermore, in order to balance exploration and exploitation, a new linearly increasing scale factor is proposed. The proposed algorithm performs well on benchmark functions and is then applied to practical dimensionality reduction of HSI. The results show that the proposed algorithm is superior to state-of-the-art algorithms and is helpful for solving practical problems. The proposed algorithm cannot yet handle low-dimensional multimodal functions well; this will be the main focus of future work.

Conflicts of Interest:
The authors declare no conflict of interest.