DA-Based Parameter Optimization of Combined Kernel Support Vector Machine for Cancer Diagnosis

As is well known, the correct diagnosis for cancer is critical to save patients’ lives. Support vector machine (SVM) has already made an important contribution to the field of cancer classification. However, different kernel function configurations and their parameters will significantly affect the performance of SVM classifier. To improve the classification accuracy of SVM classifier for cancer diagnosis, this paper proposed a novel cancer classification algorithm based on the dragonfly algorithm and SVM with a combined kernel function (DA-CKSVM) which was constructed from a radial basis function (RBF) kernel and a polynomial kernel. Experiments were performed on six cancer data sets from University of California, Irvine (UCI) machine learning repository and two cancer data sets from Cancer Program Legacy Publication Resources to evaluate the validity of the proposed algorithm. Compared with four well-known algorithms: dragonfly algorithm-SVM (DA-SVM), particle swarm optimization-SVM (PSO-SVM), bat algorithm-SVM (BA-SVM), and genetic algorithm-SVM (GA-SVM), the proposed algorithm was able to find the optimal parameters of SVM classifier and achieved better classification accuracy on cancer datasets.


Introduction
In the 21st century, cancer is expected to be the major cause of death worldwide. The GLOBOCAN 2018 cancer morbidity and mortality estimates published by the International Agency for Research on Cancer showed that there were 18.1 million new cancer cases and 9.6 million cancer deaths in 2018 [1]. The correct diagnosis of cancer is essential for patients to receive timely and correct treatment. Machine learning plays an important role in the field of cancer treatment. For example, some researchers applied neural networks to the classification of breast cancer [2,3], and Dongmei Ai et al. [4] identified intestinal microorganisms associated with colorectal cancer by means of decision tree aggregation with a random forest model. Support vector machine (SVM) is a supervised machine learning method used to solve classification and regression problems, first proposed by Vapnik on the basis of statistical learning theory [5]. SVM has been applied in many fields, such as economics [6], electrics [7], and medical science [8]. In the field of cancer diagnosis in particular, many studies have already demonstrated the excellent performance of the SVM classifier [9][10][11]. SVM uses the principle of structural risk minimization instead of empirical risk minimization, so it can obtain better generalization ability from limited samples. Moreover, when the data cannot be linearly separated, SVM can use a kernel function to map nonlinear features to a high-dimensional space while avoiding the "curse of dimensionality".
The performance of an SVM classifier depends on three aspects: the penalty parameter C, the type of kernel function, and the kernel parameters. To improve the classification accuracy of the SVM classifier, several approaches have been presented to search for the optimal parameters, such as grid search [12] and gradient descent [13]. Although these methods proved effective in the experiments reported in the corresponding literature, they easily fall into local optima and are inefficient. In recent years, some meta-heuristic algorithms, such as the dragonfly algorithm (DA) [14], particle swarm optimization (PSO) [15], bat algorithm (BA) [16], and genetic algorithm (GA) [17][18][19], have achieved competitive results when used to tune the SVM classifier's parameters. However, most of this research focused only on the SVM classifier with a single kernel function. Though some literature [20,21] indicates that combining multiple kernel functions can obtain better performance than a single kernel function, little research has provided an in-depth analysis of the performance of the SVM classifier with a combined kernel function. There therefore seems to be a definite need to systematically study the complex optimization problem in the SVM classifier with a combined kernel.
In 2015, Mirjalili proposed a new meta-heuristic algorithm called the dragonfly algorithm (DA) [22], which has already been used to solve different optimization problems, such as feature selection [23,24], the knapsack problem [25], and image processing [26]. Considering that DA has an excellent global search ability and that there are few studies on SVM classifiers with combined kernels in the field of cancer classification, this paper proposed a novel classification algorithm based on DA and an SVM classifier with a combined kernel function (DA-CKSVM) to improve the classification ability for cancer diagnosis. The objective of this research was to construct an SVM classifier with two different kernel functions and use DA to optimize all the parameters in this SVM classifier, namely the parameters in both kernel functions, the weight coefficient of the combined kernel, and the SVM's penalty parameter C.
The study is organized into eight sections, including this introduction. The remainder of the paper proceeds as follows: related work on cancer classification is described briefly in Section 2. Section 3 introduces the basic idea of SVM. Section 4 deals with the construction of the combined kernel. Then, DA is introduced in Section 5. Section 6 presents the proposed algorithm. Section 7 focuses on the experimental results and discussion. Finally, conclusions and future work are provided in Section 8.

Related Work
Since SVM was proposed, many researchers have used it to conduct research on cancer classification. For example, Guangru Xu et al. [27] used SVM analysis to predict the recurrence risk and prognosis for patients with colon cancer. Youlin Tuo et al. [28] constructed an SVM classifier to evaluate the possibility of breast cancer metastasis and obtained high classification accuracy on several independent data sets. Yuanpeng Li et al. [29] used both SVM models and partial-least-square discriminant analysis to diagnose early gastric cancer. The experimental results showed that the diagnostic model obtained from SVM was evidently better than partial-least-square discriminant analysis.
It should be noted that a great deal of research has already recognized that the parameters of the SVM classifier play a vital role in improving classification performance. The kernel function and its parameters define a nonlinear mapping from the input space to the high-dimensional space, while the trade-off between minimizing the training error and maximizing the classification margin is determined by the penalty parameter C of SVM. Different parameter configurations lead to different classification results. Therefore, selecting appropriate parameters becomes the main challenge in improving the classification ability of the SVM classifier.
A number of researchers have focused on optimizing the parameters of kernel functions to obtain better classification accuracy in cancer diagnosis. M. Prabukumar et al. [30] used an SVM classifier to identify lung cancer and achieved more than 98% accuracy. The grid search method was employed to search for the optimal parameters in that study. Himar Fabelo et al. [31] used the radial basis function (RBF) kernel, linear kernel, polynomial kernel, and sigmoid kernel to construct four SVM classifiers for recognizing brain tumors, and cross-validation was used to find the optimum parameters of each classifier. A comparison of the four models showed that the polynomial kernel had advantages in a few evaluation metrics, but in general the RBF kernel gave the best classification results. The study in [17] applied linear, quadratic, RBF, and third-order polynomial SVMs to the screening and classification of prostate cancer, in which the parameter of the RBF kernel and the parameter C were optimized by the exhaustive method. However, the four models did not perform very well. The authors then combined SVM with principal component analysis (PCA), the successive projections algorithm (SPA), and GA to automatically optimize the penalty parameter C and the parameters in the RBF kernel. The experimental results showed that the combination of the meta-heuristic algorithm GA and SVM achieved the best performance. Overall, there seems to be some evidence that the polynomial kernel and the RBF kernel perform better than other kernel functions, and that meta-heuristic algorithms have a better optimization ability than conventional optimization methods such as grid search and gradient descent.
Further understanding of the nature of kernel functions will help to build a more powerful SVM classifier. In 2001, Schölkopf et al. first divided kernel functions into local kernel functions and global kernel functions [32]. Studies conducted by Smits and Jordaan [33] showed that local kernels have a good interpolation ability, while global kernels have a good extrapolation ability. They linearly combined a local kernel function with a global kernel function to obtain a novel kernel function that exploited the advantages of both and made SVM classifiers achieve better performance. Another important finding was that the performance of an SVM classifier with a combined kernel function is influenced by the weight coefficient of the kernel function. To stratify and predict clinical outcomes in patients with ovarian cancer, Jaya Thomas and Lee Sael [34] combined a linear kernel function and an RBF kernel function in a weighted linear combination. They achieved satisfactory results after optimizing the parameters and the weight coefficient. However, they used only a separate data set that accounted for 25% of the total sample size to determine the parameters of the kernels, which may have led to less precise parameters. For the weight coefficient, they used the optimization method proposed by Zien et al. [35]. Optimizing the weight coefficient and the kernel parameters separately in this way may cause uncoordinated results. Nguyen et al. [36] combined three kernel functions, namely the inverse multi-quadric kernel, the RBF kernel, and the sigmoid kernel, and optimized the parameters with an evolutionary algorithm to construct an SVM classifier. The classification results on cancer data sets indicated that the methodology was superior to a single kernel function. However, the weight coefficients of the kernel functions were not taken into account.
The traditional optimization methodologies such as grid search and gradient descent are not only time-consuming, but also insufficient to simultaneously optimize all parameters in an SVM classifier with multiple kernel functions. Meta-heuristic algorithms have already been applied to search for the optimal parameters of SVM classifiers with a single kernel function and have been shown to give high classification accuracy and stability. However, up until now, far too little attention has been paid to the use of meta-heuristic algorithms to optimize the parameters of an SVM classifier with multiple kernel functions by solving the complex optimization problem.

The Basic Idea of SVM and the Construction of the Combined Kernel Function
As a powerful supervised learning method, SVM is suitable for both classification and regression problems. On the one hand, SVM has a high accuracy rate for linearly inseparable data. On the other hand, linearly inseparable data can be mapped into a high-dimensional space through kernel functions in SVM. This section briefly introduces the basic principles of SVM, using classical binary classification as an example.

Linear SVM
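Suppose that the training data $U = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ are linearly separable, where $x_i \in \mathbb{R}^m$ represents the i-th training sample described by m features, and $y_i \in \{-1, 1\}$ represents the corresponding class label. The hyperplane $\omega^T x + b = 0$ is the decision boundary between the two classes, where the weight vector $\omega$ is a normal vector of the hyperplane, b is the bias, and x is a training sample. The goal of SVM is to determine the appropriate $\omega$ and b to make the hyperplane as far as possible from the nearest samples. Therefore, the training samples can be correctly classified by:
$$\omega^T x_i + b \geq 1, \quad y_i = 1 \tag{1}$$
$$\omega^T x_i + b \leq -1, \quad y_i = -1 \tag{2}$$
One inequality can be obtained from the two formulas above:
$$y_i(\omega^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, n \tag{3}$$
In order to get the optimal hyperplane $\omega^T x + b = 0$, SVM needs to solve the optimization problem shown below:
$$\min \Phi(\omega) = \frac{1}{2}\|\omega\|^2 \quad \text{s.t.} \quad y_i(\omega^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, n \tag{4}$$
where $\min \Phi(\omega) = \frac{1}{2}\|\omega\|^2$ is the objective function and $y_i(\omega^T x_i + b) \geq 1$ is the constraint. The optimal solution of Equation (4) is the saddle point of the following Lagrange function:
$$L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\omega^T x_i + b) - 1 \right] \tag{5}$$
where $\alpha_i \geq 0$ are the Lagrange multipliers. Since the gradients of L with respect to $\omega$ and b are zero at the saddle point:
$$\omega = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{6}$$
$$\sum_{i=1}^{n} \alpha_i y_i = 0 \tag{7}$$
Substituting Equations (6) and (7) into Equation (5), the problem of constructing an optimal hyperplane is translated into a dual quadratic programming problem:
$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \geq 0 \tag{8}$$
$\omega$, b and $\alpha$ can be obtained by solving Equations (6)-(8). Only a small number of the $\alpha_i$ are greater than zero, and the corresponding samples, which are closest to the hyperplane, are called support vectors (SVs).
For an unknown sample x, the following formula can be used to determine its class:
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right) \tag{9}$$
Processes 2019, 7, 263
However, in practical applications, data are usually not linearly separable, which results in a large number of misclassified samples. Hence, a slack variable $\xi_i \geq 0$ needs to be added to the linear SVM to relax the constraints as follows:
$$y_i(\omega^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \ldots, n \tag{10}$$
The corresponding objective function is:
$$\min \Phi(\omega, \xi) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \tag{11}$$
where the penalty parameter C represents the degree of punishment for errors. The larger C is, the heavier the penalty.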

Nonlinear SVM
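For data that are not linearly separable, a kernel function is used to map them into a high-dimensional space in which they become linearly separable. The kernel function is defined as follows:
$$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) \tag{12}$$
where $\varphi$ denotes the mapping from the input space to the high-dimensional feature space. The optimization problem of SVM is shown in Equation (13):
$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C \tag{13}$$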
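As a minimal sketch of how a trained kernel SVM classifies an unknown sample, the snippet below evaluates the sign of the kernel-weighted sum over the support vectors, which is the linear decision rule with the inner product replaced by K. The support vectors, multipliers, bias, and the helper names `rbf_kernel` and `decide` are illustrative assumptions chosen by hand, not values or code from the paper.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decide(x, support_vectors, labels, alphas, b, kernel):
    """Return +1 or -1 from sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hand-picked toy values for illustration only (not a trained model).
svs = [(0.0, 0.0), (1.0, 1.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]
k = lambda u, v: rbf_kernel(u, v, gamma=1.0)

print(decide((0.9, 0.9), svs, ys, alphas, 0.0, k))  # sample near the +1 support vector
print(decide((0.1, 0.1), svs, ys, alphas, 0.0, k))  # sample near the -1 support vector
```

Only samples with nonzero multipliers enter the sum, which is why prediction cost depends on the number of support vectors rather than on the size of the training set.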

The Construction of the Combined Kernel Function
Various types of kernel functions are commonly applied in SVM classifiers, such as the RBF kernel function, linear kernel function, polynomial kernel function, and sigmoid kernel function. Each type of kernel function has its own advantages and disadvantages. Studies conducted by Smits and Jordaan [33] showed that in local kernels, only data points close to each other affect the value of the kernel, while in global kernels, data points far from each other also affect the value of the kernel. This means that the learning ability of local kernels is stronger than that of global kernels, but the generalization ability of local kernels is weaker than that of global kernels. In order to improve the classification accuracy of the SVM classifier, this paper linearly combined a local kernel function with a global kernel function to take advantage of both.
Four alternative kernel functions that are commonly used in SVM classifiers are listed below:
(1) Linear kernel function
$$K(x, x_i) = x \cdot x_i \tag{14}$$
(2) Polynomial kernel function
$$K(x, x_i) = \left[ (x \cdot x_i) + 1 \right]^d \tag{15}$$
(3) RBF kernel function
$$K(x, x_i) = \exp\left( -\gamma \|x - x_i\|^2 \right) \tag{16}$$
(4) Sigmoid kernel function
$$K(x, x_i) = \tanh\left( \gamma (x \cdot x_i) + c \right) \tag{17}$$
The RBF kernel function is a local kernel function, while the other three are all global kernel functions. According to the composition conditions of kernel functions, a combined kernel function that is linearly constructed from existing kernels is still a kernel function [37] and can be used in an SVM classifier. In view of the local kernel function's predominant learning ability and the global kernel function's outstanding generalization ability, this paper combined an RBF kernel with a polynomial kernel to obtain a novel combined kernel function, which is shown in Equation (18):
$$K(x, x_i) = \lambda \exp\left( -\gamma \|x - x_i\|^2 \right) + (1 - \lambda) \left[ (x \cdot x_i) + 1 \right]^d \tag{18}$$
where λ is the weight coefficient in the range from 0 to 1. In this SVM classifier, the parameter tuning problem is to search for the optimal combination of the parameter set (C, gamma, d, λ). The dragonfly algorithm is applied in this paper to solve this complex optimization problem.
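The combined kernel can be sketched in a few lines of Python. The snippet assumes that λ weights the RBF term and (1 − λ) the polynomial term, and that the polynomial kernel has the form [(x · x_i) + 1]^d with d as its only parameter. The function names `rbf_kernel`, `poly_kernel`, `combined_kernel`, and `gram_matrix` are illustrative and do not come from the paper or from LIBSVM.

```python
import math

def rbf_kernel(u, v, gamma):
    """Local kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def poly_kernel(u, v, d):
    """Global kernel: ((u . v) + 1)^d (assumed form, offset fixed to 1)."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** d

def combined_kernel(u, v, gamma, d, lam):
    """Convex combination lam * RBF + (1 - lam) * polynomial."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * rbf_kernel(u, v, gamma) + (1.0 - lam) * poly_kernel(u, v, d)

def gram_matrix(samples, gamma, d, lam):
    """Pairwise kernel values for a list of samples."""
    return [[combined_kernel(u, v, gamma, d, lam) for v in samples]
            for u in samples]

samples = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
for row in gram_matrix(samples, gamma=0.5, d=2, lam=0.7):
    print([round(value, 4) for value in row])
```

Setting `lam=1.0` recovers the pure RBF kernel and `lam=0.0` the pure polynomial kernel, so the single-kernel classifiers are special cases of the combined one. A Gram matrix of this kind is the sort of input accepted by SVM solvers that support precomputed kernels; the text does not state whether the authors used that route.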

Dragonfly Algorithm (DA)
The dragonfly algorithm (DA) is a novel swarm intelligence algorithm presented by Mirjalili in 2015 [22]. The distinctive swarming behaviors of dragonflies, namely hunting and migration, are the main source of inspiration for the algorithm. The hunting swarm is called the static swarm, in which dragonflies gather into small groups and fly back and forth over a small area to hunt prey. In the dynamic swarm, on the other hand, a large swarm of dragonflies flies a long distance in one direction. These two states are very similar to the exploration phase and the exploitation phase of meta-heuristic algorithms. The flight of the static swarm in a small area is similar to the exploration stage of the optimization algorithm, while the flight of the dynamic swarm along one direction is beneficial to exploitation.
The behavior of any swarm follows the principles given by Reynolds [38]:
• Separation, whose aim is to avoid collisions between individuals and their neighbors in the static swarm.
• Alignment, whose purpose is to match the velocity of an individual with that of others in the same group.
• Cohesion, which indicates the tendency of individuals to move towards the center of the group.
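In order to survive, individuals in the group should be attracted to food sources and distracted away from enemies. Considering these behaviors, five primary factors are used to update the positions of individuals in a swarm. The mathematical models for these behaviors are as follows:
(1) Separation
$$S_i = -\sum_{j=1}^{N} (X - X_j) \tag{19}$$
where X represents the position of the current individual, $X_j$ is the position of the j-th neighboring individual, and N is the number of neighboring individuals.
(2) Alignment
$$A_i = \frac{\sum_{j=1}^{N} V_j}{N} \tag{20}$$
where $V_j$ is the velocity of the j-th neighboring individual.
(3) Cohesion
$$C_i = \frac{\sum_{j=1}^{N} X_j}{N} - X \tag{21}$$
(4) Attraction to food
$$F_i = X^{+} - X \tag{22}$$
where $X^{+}$ represents the position of the food source.
(5) Distraction from the enemy
$$E_i = X^{-} + X \tag{23}$$
where $X^{-}$ represents the position of the enemy.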
The behavior of dragonflies is represented by the combination of the behaviors above. Similar to the velocity vector in PSO, the step vector ∆X is used in DA to update the position of dragonflies, which is defined as follows:
$$\Delta X_{t+1} = (sS_i + aA_i + cC_i + fF_i + eE_i) + w\Delta X_t \tag{24}$$
where s and $S_i$ represent the separation weight and the separation of the i-th individual, respectively; a and $A_i$ are the alignment weight and the alignment of the i-th individual, respectively; c and $C_i$ indicate the cohesion weight and the cohesion of the i-th individual, respectively; f and $F_i$ are the food factor and the food source of the i-th individual, respectively; e and $E_i$ are the enemy factor and the position of the enemy of the i-th individual, respectively; w represents the inertia weight, and t is the iteration counter. The position vector is updated by the following formula:
$$X_{t+1} = X_t + \Delta X_{t+1} \tag{25}$$
where t is the current iteration.
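The update of one dragonfly that has neighbors can be sketched as follows. The formulas for the five behaviors are assumed to be the standard definitions from Mirjalili's DA paper [22]: S = −Σ(X − X_j), A = ΣV_j / N, C = ΣX_j / N − X, F = X⁺ − X, and E = X⁻ + X. The function names and the toy numbers are illustrative only.

```python
def behaviours(x, neighbours_x, neighbours_v, food, enemy):
    """Return the five behaviour vectors (S, A, C, F, E) for one dragonfly."""
    n = len(neighbours_x)
    dims = range(len(x))
    S = [-sum(x[k] - nx[k] for nx in neighbours_x) for k in dims]    # separation
    A = [sum(nv[k] for nv in neighbours_v) / n for k in dims]        # alignment
    C = [sum(nx[k] for nx in neighbours_x) / n - x[k] for k in dims] # cohesion
    F = [food[k] - x[k] for k in dims]                               # attraction to food
    E = [enemy[k] + x[k] for k in dims]                              # distraction from enemy
    return S, A, C, F, E

def step_and_move(x, dx, S, A, C, F, E, s, a, c, f, e, w):
    """Weighted sum of behaviours plus inertia gives the new step; then move."""
    new_dx = [s * S[k] + a * A[k] + c * C[k] + f * F[k] + e * E[k] + w * dx[k]
              for k in range(len(x))]
    new_x = [x[k] + new_dx[k] for k in range(len(x))]
    return new_dx, new_x

# One-dimensional toy example with two neighbours.
S, A, C, F, E = behaviours([1.0], [[2.0], [4.0]], [[0.5], [1.5]], [3.0], [-1.0])
new_dx, new_x = step_and_move([1.0], [2.0], S, A, C, F, E,
                              s=1.0, a=1.0, c=1.0, f=1.0, e=1.0, w=0.5)
print(S, A, C, F, E, new_dx, new_x)
```

In the full algorithm the weights s, a, c, f, e, and w change over the iterations to shift the swarm from exploration towards exploitation; here they are fixed for readability.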
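In the absence of neighboring solutions, DA uses a random walk (Levy flight) [39] to improve the randomness, stochastic behavior, and exploration of dragonflies. In this case, the dragonflies update their position as follows:
$$X_{t+1} = X_t + \text{Levy}(d) \times X_t \tag{26}$$
where d is the dimension of the position vector. The Levy flight can be calculated according to Equation (27):
$$\text{Levy}(x) = 0.01 \times \frac{r_1 \times \sigma}{|r_2|^{1/\beta}} \tag{27}$$
where $r_1$ and $r_2$ are two random numbers in [0, 1], β is a constant, and σ is calculated by Equation (28):
$$\sigma = \left( \frac{\Gamma(1 + \beta) \times \sin\left( \frac{\pi \beta}{2} \right)}{\Gamma\left( \frac{1 + \beta}{2} \right) \times \beta \times 2^{\left( \frac{\beta - 1}{2} \right)}} \right)^{1/\beta} \tag{28}$$
where Γ(x) = (x − 1)!.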
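A minimal sketch of the Levy flight step is given below, assuming the standard DA formulation Levy = 0.01 · r₁ · σ / |r₂|^(1/β) and the position update X ← X + Levy(d) · X applied elementwise. The value β = 1.5 is a common choice and an assumption here, since the text only says that β is a constant; `math.gamma` generalizes the factorial form of Γ to non-integer arguments.

```python
import math
import random

def levy_sigma(beta):
    """Scale factor sigma of the Levy step for a given beta."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)

def levy(dim, beta=1.5, rng=random):
    """Draw one Levy step per dimension using uniform random numbers."""
    sigma = levy_sigma(beta)
    steps = []
    for _ in range(dim):
        r1 = rng.random()
        r2 = 1.0 - rng.random()  # in (0, 1], avoids division by zero
        steps.append(0.01 * r1 * sigma / abs(r2) ** (1 / beta))
    return steps

def levy_move(x, beta=1.5, rng=random):
    """Position update for a dragonfly without neighbours."""
    return [xi + li * xi for xi, li in zip(x, levy(len(x), beta, rng))]

print(round(levy_sigma(1.5), 4))
print(levy_move([10.0, 0.5, 3.0, 0.8], rng=random.Random(0)))
```

Because the step is multiplied by the current position, a coordinate that is exactly zero stays at zero under this update, which is one reason the bounds and the initialization matter in practice.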

Proposed Algorithm: DA-CKSVM
This section elaborates on the proposed DA-CKSVM algorithm, which uses DA to optimize the parameters of an SVM with a combined kernel.

The Basic Process of DA-CKSVM
In the DA-CKSVM algorithm, each dragonfly represents a solution, that is, a combination of the parameter set (C, gamma, d, λ), which defines a four-dimensional search space for the optimization problem. The main process of the proposed algorithm is given below:
Algorithm 1: The main process of DA-CKSVM
Step 1: Set the maximum number of iterations, the number of dragonflies, and the upper and lower bounds of each parameter in the parameter set (C, gamma, d, λ).
Step 2: Initialize the step vectors, the values of s, a, c, f, e and w in Equation (24), and the position of each individual.
Step 3: Train the SVM classifier with the training set and test it with the testing set.
Step 4: Evaluate the fitness value of each individual and update the enemy and food source.
Step 5: Update the values of s, a, c, f, e and w.
Step 6: Calculate S, A, C, F and E by Equations (19)-(23).
Step 7: Update the neighboring radius.
Step 8: If the dragonfly has at least one neighbor, the step vector and the position vector of the dragonfly are calculated according to Equations (24) and (25). If not, the position vector is updated by Equation (26).
Step 9: Adjust the new position based on the boundaries of the parameters.
Step 10: If the maximum number of iterations is reached, go to Step 11. Otherwise, loop to Step 3.
Step 11: Output the final SVM classifier with the optimal parameters.
The schematic diagram in Figure 1 shows the whole process of the DA-CKSVM algorithm. As can be seen from the diagram, the DA-CKSVM algorithm first initializes the relevant parameters. After that, it uses the normalized data set to train and test the SVM classifier with a combined kernel function by means of cross-validation. In the optimization process, DA is employed to search for the best solution of the parameter set (C, gamma, d, λ). When the maximum number of iterations is reached, the algorithm terminates and the final SVM classifier with the optimal parameters is obtained.
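The boundary adjustment in Step 9 and the mapping from a dragonfly position to SVM parameters can be sketched as below. The bounds in `LOWER` and `UPPER` are hypothetical placeholders, since the text does not list the search ranges, and rounding d to an integer is an assumption made here because a polynomial degree is normally a whole number.

```python
# Hypothetical search bounds for (C, gamma, d, lam); not taken from the paper.
LOWER = [0.01, 0.001, 1.0, 0.0]
UPPER = [100.0, 10.0, 5.0, 1.0]

def clamp_position(position, lower, upper):
    """Pull every coordinate that left the search space back to its bound."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(position, lower, upper)]

def decode(position):
    """Map a 4-dimensional position to named SVM parameters."""
    C, gamma, d, lam = position
    return {"C": C, "gamma": gamma, "d": int(round(d)), "lam": lam}

raw = [500.0, -1.0, 3.4, 1.2]  # a position after an overly large step
clamped = clamp_position(raw, LOWER, UPPER)
print(clamped)
print(decode(clamped))
```

Clamping is only one possible way to handle bound violations; reflecting the position back into the range or re-sampling it are alternatives, and the text does not say which variant was used.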

Fitness Function
The fitness function is used to evaluate the quality of each candidate parameter set during the search. The fitness function in this paper is defined as the classification error rate on the test sets, as shown in Equation (29):
$$\text{fitness}_t = 1 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N} \sum_{j=1}^{N} \delta\left( c(x_j), y_j \right) \tag{29}$$
The performance of the individual at iteration t is evaluated by the above equation. K and N denote the number of folds of cross-validation and the number of samples in the test set, respectively. $c(x_j)$ represents the classification result of the j-th sample in the test set, $y_j$ indicates the label of that sample, and δ expresses the relationship between $c(x_j)$ and $y_j$, as shown in the following formula:
$$\delta\left( c(x_j), y_j \right) = \begin{cases} 1, & c(x_j) = y_j \\ 0, & \text{otherwise} \end{cases} \tag{30}$$
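The fitness value can be sketched as the error rate averaged over the cross-validation folds. The snippet assumes that the predictions of each fold are already available as a pair of label lists; the names `fold_error_rate` and `fitness` are illustrative.

```python
def fold_error_rate(predicted, actual):
    """Fraction of test samples in one fold whose prediction differs from the label."""
    wrong = sum(1 for p, y in zip(predicted, actual) if p != y)
    return wrong / len(actual)

def fitness(folds):
    """Average error rate over folds; each fold is (predicted_labels, true_labels)."""
    return sum(fold_error_rate(p, y) for p, y in folds) / len(folds)

# Two toy folds: one mistake out of four samples, then no mistakes.
folds = [([1, -1, 1, 1], [1, -1, -1, 1]),
         ([1, 1], [1, 1])]
print(fitness(folds))
```

A lower value is better, so the food source in DA corresponds to the position with the smallest fitness and the enemy to the position with the largest.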

Data Sets and Experimental Platform
In order to validate the performance of the proposed DA-CKSVM algorithm in cancer classification, experiments were carried out on six cancer data sets from the University of California, Irvine (UCI) machine learning repository and two cancer data sets from Cancer Program Legacy Publication Resources. Breast Cancer Coimbra (BCC), Haberman's Survival (HS), Hepatocellular Carcinoma (HCC), Thoracic Surgery (TS), Breast Cancer Wisconsin Diagnostic (BCWD), and Breast Cancer Wisconsin Prognostic (BCWP) are from the UCI machine learning repository; Diffuse Large B-cell Lymphoma (DLBCL_D) and Breast_A (B_A) come from Cancer Program Legacy Publication Resources. Table 1 lists the descriptions of all data sets. All the experiments in this paper were implemented in Matlab 2014(a), and the SVM classifier was trained with a library for support vector machines (LIBSVM) [40]. The details of the experimental platform are given in Table 2.

Data Preprocessing
Given that features with larger numerical ranges may dominate those with smaller numerical ranges [27], each feature is normalized and scaled to the range [0, 1] in this paper:
$$f' = \frac{f - \min}{\max - \min} \tag{31}$$
where f is the original value, f′ is the scaled value, "min" represents the minimum value of the feature, and "max" represents the maximum value of the feature.
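Min-max scaling of a single feature column can be sketched as follows. Mapping a constant feature to 0 is an assumption added here to avoid division by zero; the text does not say how such features were handled.

```python
def min_max_scale(column):
    """Scale one feature column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]

print(min_max_scale([2.0, 4.0, 6.0]))
print(min_max_scale([5.0, 5.0, 5.0]))
```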

Cross-Validation
All experiments used k-fold cross-validation, in which the original data set was randomly divided into k subsets of (approximately) equal size. Each time, k − 1 subsets were selected as the training set and the remaining subset was used as the test set. This process was repeated k times. Finally, the average classification accuracy on the test sets was used as the evaluation value. In this paper, k was set to 10.
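The splitting procedure can be sketched with the standard library alone. The generator below shuffles the sample indices once and then yields k train/test index pairs; the function name `k_fold_indices` and the fixed seed are illustrative choices, not details from the paper.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives folds whose sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

Every sample appears in exactly one test fold, so averaging the per-fold results uses each sample for testing once and for training k − 1 times.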

Experimental Results
To validate the performance of the proposed DA-CKSVM algorithm, the experiments in this section compared it with dragonfly algorithm-SVM (DA-SVM) [14], particle swarm optimization-SVM (PSO-SVM) [15], bat algorithm-SVM (BA-SVM) [16], and genetic algorithm-SVM (GA-SVM) [19]. In line with the literature, all the comparison algorithms employed a single RBF kernel to implement the SVM classifier. Table 3 shows the initial parameters of each algorithm. The parameters of the proposed algorithm were consistent with those of the DA algorithm. The number of iterations and the population size in all algorithms were set to 300 and 30, respectively, to obtain fair and reliable experimental results. Furthermore, 10 experimental trials were carried out for each algorithm, and the final results were evaluated by the average classification accuracy and standard deviation to reduce the influence of randomness.
Table 4 shows that, in comparison with the PSO-SVM algorithm and the GA-SVM algorithm, the DA-CKSVM algorithm achieves the highest classification accuracy on all data sets. The DA-CKSVM algorithm outperforms the BA-SVM algorithm and obtains the best performance on seven of the eight data sets. The DA-CKSVM algorithm achieves better accuracy than the DA-SVM algorithm on six of the eight data sets. The best accuracy over 10 trials for each algorithm is shown in Table 5. As shown, the DA-CKSVM algorithm achieves the best result on seven of the eight data sets. Tables 4 and 5 reveal that better overall classification accuracy on cancer data sets can be obtained by constructing a combined kernel function for the SVM classifier and searching for the optimal kernel parameters. Bold underlined values indicate that the corresponding algorithm obtains the highest result.
One noteworthy problem can be found by analyzing Tables 4 and 5. The DA-CKSVM algorithm shows excellent performance on most cancer data sets. But for two data sets, namely HCC and BCWD, the DA-CKSVM algorithm has no advantage over the comparison algorithms. In particular, Table 5 indicates that the best result of the DA-CKSVM algorithm on the HCC data set equals that of the comparison algorithms, but the DA-CKSVM algorithm does not perform better than the others when the average results over 10 trials are compared. This implies that although the complementary characteristics of two different kernels in the combined kernel function may improve the SVM classifier's classification ability on most data sets, for certain data sets the best result can be obtained only when optimizing the SVM classifier with a single RBF kernel. Even if the optimal weight coefficient λ is very close to 1, the presence of the polynomial kernel in the combined kernel function may slightly reduce the classification accuracy in several trials.


Figure 1. Flow chart of the proposed DA-CKSVM algorithm.


Figure 2a-h shows the fitness curves for each algorithm. The curves are drawn from the average fitness values over 10 trials.
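The averaging step behind such curves can be sketched as follows, assuming each trial records one fitness value per iteration and all trials run for the same number of iterations. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def mean_fitness_curve(trials):
    """Average fitness per iteration over several trials.

    `trials` is a list of equal-length lists, one list per trial,
    holding the fitness value recorded at each iteration.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_iter = len(trials[0])
    if any(len(t) != n_iter for t in trials):
        raise ValueError("all trials must have the same number of iterations")
    return [sum(t[i] for t in trials) / len(trials) for i in range(n_iter)]


# Three hypothetical trials of four iterations each (illustrative values only).
example_trials = [
    [0.90, 0.92, 0.95, 0.95],
    [0.88, 0.93, 0.94, 0.96],
    [0.91, 0.91, 0.95, 0.97],
]
example_curve = mean_fitness_curve(example_trials)
```

Plotting `example_curve` against the iteration index would give one line of the kind shown in each panel.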

Figure 2a displays the fitness values of the BCC data set. As shown, the DA-CKSVM algorithm achieves the best result and the GA-SVM algorithm the worst. Figure 2b shows the optimization process of the HS data set, where the DA-CKSVM algorithm achieves the best result and similar results are obtained by the DA-SVM algorithm and the BA-SVM algorithm. As can be seen from the figure, the DA-CKSVM algorithm gains an advantage in the early iterations and further improves classification accuracy in the subsequent optimization process. The fitness curves of HCC are displayed in Figure 2c. The DA-SVM algorithm is the best of all the algorithms, followed by the BA-SVM algorithm, and the DA-CKSVM algorithm achieves the third best result. Figure 2d illustrates the fitness values of TS. In this data set, the DA-CKSVM algorithm obtains the best result. The DA-SVM algorithm and the BA-SVM algorithm have the same result. Figure 2e shows the fitness curves of BCWD. As can be seen, the DA-SVM algorithm obtains the best accuracy for this data set and the DA-CKSVM algorithm ranks second. Figure 2f represents the fitness values of BCWP. The DA-CKSVM algorithm gains an advantage in the early iterations and performs best in the end. The result of the DA-SVM algorithm is similar to that of the PSO-SVM algorithm. Figure 2g demonstrates the fitness values of DLBCL_D. The DA-CKSVM algorithm obtains the best fitness values, while the other algorithms, except GA-SVM, obtain the same result. Figure 2h represents the fitness values of B_A. The conclusion is consistent with that in Figure 2g. The Wilcoxon rank sum test with a 5% significance level was applied to the average accuracy results to further evaluate the overall performance of the DA-CKSVM algorithm and the comparison algorithms. The Wilcoxon rank sum test is a nonparametric statistical test used to determine whether the results of one algorithm differ significantly from those of another.

Figure 2. Fitness curves of all algorithms on six data sets listed in Table 1. (a) BCC data set; (b) HS data set; (c) HCC data set; (d) TS data set; (e) BCWD data set; (f) BCWP data set; (g) DLBCL_D data set; (h) B_A data set.


Table 3. Initial parameters of algorithms.

Table 4 lists each algorithm's final classification accuracy over 10 trials in the form of average ± standard deviation. It is shown in Table

Table 4. Classification accuracy and standard deviation for all data sets.

Table 5. The highest result of each algorithm.

Table 6 lists the Wilcoxon test p-values between the