A New Swarm Intelligence Approach for Clustering Based on Krill Herd with Elitism Strategy

As one of the most popular and well-recognized clustering methods, the fuzzy C-means (FCM) clustering algorithm is the theoretical and practical basis of other fuzzy clustering analysis methods. However, FCM is essentially a local search optimization algorithm, so it may fail to find the global optimum. To overcome this drawback, a new version of the krill herd (KH) algorithm with an elitism strategy, called KHE, is proposed to solve the clustering problem. The elitism strategy strongly prevents the krill population from degrading. In addition, the KHE method uses well-selected parameters instead of values taken from nature. The results of a series of simulation experiments show that KHE is a good choice for solving general benchmark problems and fuzzy clustering analyses.


Introduction
Currently, fuzzy clustering is an important research branch in many fields, such as knowledge discovery, image processing, machine learning, and pattern recognition. As the scope of these studies expands, more accurate clustering results are required in both scientific research and practical applications. Fuzzy C-means (FCM) clustering is one of the most popular and well-recognized clustering methods. It uses the geometric closeness of data points in Euclidean space to allocate the data to different clusters, and the distances between the data points and the cluster centers are then determined. The FCM clustering algorithm is the theoretical and practical basis of other fuzzy clustering analysis methods, and it is therefore the most widely used among the various clustering algorithms. However, the FCM algorithm is essentially a local search optimization algorithm: if its initial values are selected improperly, it may converge to a local minimum. This drawback limits the use of the FCM algorithm in many applications.
Section 2 provides basic knowledge of the FCM clustering algorithm. Section 3 reviews the optimization process of KH, presents the framework of the KHE method, and then describes how KHE is used to solve the clustering problem. To show the performance of the KHE method, Section 4 presents several simulation results comparing KHE with other methods on general benchmark functions and on clustering. The discussion and future work are given in Section 5.

Fuzzy C-Means (FCM) Clustering Algorithm
Let $X = \{x_1, x_2, \ldots, x_n\}$ be $n$ data samples; $c$ ($2 \le c \le n$) is the number of categories into which these samples are divided; $\{A_1, A_2, \ldots, A_c\}$ denotes the corresponding $c$ categories, and $U$ is their similarity classification matrix, whose cluster centers are $\{v_1, v_2, \ldots, v_c\}$; $\mu_k(x_i)$ is the membership degree of $x_i$ in category $A_k$ (abbreviated as $\mu_{ik}$). The objective function $J_b$ can be expressed as follows:
$$J_b(U, v) = \sum_{i=1}^{n} \sum_{k=1}^{c} (\mu_{ik})^{b} (d_{ik})^{2} \quad (1)$$
where $d_{ik}$ is the Euclidean distance between the $i$-th sample $x_i$ and the center $v_k$ of the $k$-th category. It can be calculated as follows:
$$d_{ik} = d(x_i, v_k) = \sqrt{\sum_{j=1}^{m} (x_{ij} - v_{kj})^{2}} \quad (2)$$
where $m$ is the number of features of each data sample and $b$ is the weighting exponent, with $1 < b < \infty$. The FCM clustering algorithm seeks the optimal classification, that is, the one that yields the smallest value of $J_b$. The membership degrees of each sample over all the clusters are required to sum to 1:
$$\sum_{k=1}^{c} \mu_{ik} = 1, \quad i = 1, 2, \ldots, n \quad (3)$$
As stated before, $\mu_{ik}$ is the membership degree of $x_i$ in category $A_k$, and it is updated as
$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{ij} \right)^{2/(b-1)}} \quad (4)$$
Subsequently, all the cluster centers $\{v_k\}$ are calculated as
$$v_k = \frac{\sum_{i=1}^{n} (\mu_{ik})^{b} x_i}{\sum_{i=1}^{n} (\mu_{ik})^{b}}, \quad k = 1, 2, \ldots, c \quad (5)$$
Equation (4) requires every $d_{ij}$ to be nonzero. Here, we suppose $I_i = \{k \mid 2 \le c < n;\ d_{ik} = 0\}$; if $I_i$ is nonempty, then $\mu_{ik} = 0$ for every category $k \notin I_i$, and the membership of $x_i$ is shared among the categories in $I_i$ so that Equation (3) still holds. The updating process is repeated using Equations (4) and (5) until the method converges. When the algorithm converges, the cluster centers and the membership degree of each sample in each category are obtained, and the fuzzy clustering partition is complete. Although FCM searches quickly, it is essentially a local search algorithm and is therefore very sensitive to the initial cluster centers. If the initial cluster centers are chosen poorly, it may converge to a local minimum.
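To make the alternating updates concrete, here is a minimal pure-Python sketch of the FCM loop described above: Equations (4) and (5) are applied in turn until the cluster centers stop moving. The function names, the default $b = 2$, the stopping tolerance, and the use of $c$ randomly drawn samples as initial centers are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def euclidean(x, v):
    """Equation (2): Euclidean distance d_ik between sample x and center v."""
    return math.sqrt(sum((xj - vj) ** 2 for xj, vj in zip(x, v)))


def update_memberships(X, V, b):
    """Equation (4): membership of every sample in every cluster."""
    U = []
    for x in X:
        d = [euclidean(x, v) for v in V]
        zero = [k for k, dk in enumerate(d) if dk == 0.0]
        if zero:
            # Sample sits exactly on one or more centers: share its membership
            # equally among them (one valid choice that satisfies Equation (3)).
            row = [1.0 / len(zero) if k in zero else 0.0 for k in range(len(V))]
        else:
            p = 2.0 / (b - 1.0)
            row = [1.0 / sum((d[k] / d[j]) ** p for j in range(len(V)))
                   for k in range(len(V))]
        U.append(row)
    return U


def update_centers(X, U, b):
    """Equation (5): each center is the (mu_ik)^b-weighted mean of the samples."""
    n, c, m = len(X), len(U[0]), len(X[0])
    V = []
    for k in range(c):
        w = [U[i][k] ** b for i in range(n)]
        total = sum(w)
        V.append([sum(w[i] * X[i][j] for i in range(n)) / total for j in range(m)])
    return V


def objective(X, V, U, b):
    """Equation (1): J_b = sum_i sum_k (mu_ik)^b * (d_ik)^2."""
    return sum(U[i][k] ** b * euclidean(X[i], V[k]) ** 2
               for i in range(len(X)) for k in range(len(V)))


def fcm(X, c, b=2.0, max_iter=100, tol=1e-6, init=None, seed=0):
    """Alternate Equations (4) and (5) until the centers stop moving.

    Assumption: if no initial centers are given, c samples are drawn at random;
    this is where FCM's sensitivity to initialization enters.
    """
    rng = random.Random(seed)
    V = [list(v) for v in (init if init is not None else rng.sample(X, c))]
    U = update_memberships(X, V, b)
    for _ in range(max_iter):
        V_new = update_centers(X, U, b)
        U = update_memberships(X, V_new, b)
        shift = max(euclidean(v, w) for v, w in zip(V, V_new))
        V = V_new
        if shift < tol:
            break
    return V, U, objective(X, V, U, b)
```

On harder datasets, running `fcm` with different `seed` or `init` values can end at different objective values, which is the local-minimum behavior that motivates replacing the local search with KHE.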

KH Method
Krill herd (KH) [55] is a novel swarm intelligence method for solving optimization problems. It is a simplified and idealized model of the herding behavior of krill swarms in the sea. The position of an individual krill is determined by three motions: (i) movement induced by other krill individuals; (ii) foraging; and (iii) random diffusion. In KH, the Lagrangian model in a $d$-dimensional decision space is used, as shown in Equation (6):
$$\frac{dX_i}{dt} = N_i + F_i + D_i \quad (6)$$
where $N_i$ is the motion induced by other krill individuals, $F_i$ is the foraging motion, and $D_i$ is the physical diffusion of the $i$-th krill individual.

Motion Induced by Other Krill Individuals
The direction of the induced motion, $\alpha_i$, is estimated from the target effect, the local effect, and the repulsive effect. For krill $i$, the induced motion can be defined as:
$$N_i^{new} = N^{max} \alpha_i + \omega_n N_i^{old} \quad (7)$$
where
$$\alpha_i = \alpha_i^{local} + \alpha_i^{target} \quad (8)$$
and $N^{max}$ is the maximum induced speed, $\omega_n$ is the inertia weight of the induced motion, $N_i^{old}$ is the last induced motion, $\alpha_i^{local}$ is the local effect provided by the neighbors, and $\alpha_i^{target}$ is the target direction effect provided by the best krill individual.
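A minimal sketch of Equations (7) and (8) for a single krill, applied component-wise. It treats the local and target effects as precomputed vectors, since how they are derived from the neighbors and the best krill is left to [55]. The default $N^{max} = 0.01$ matches the value used in the experiments; the inertia weight $\omega_n = 0.5$ and the function name are illustrative assumptions.

```python
def induced_motion(alpha_local, alpha_target, n_old, n_max=0.01, omega_n=0.5):
    """Equations (7)-(8) for one krill, component-wise.

    alpha_local, alpha_target: precomputed effect vectors (assumed given).
    omega_n = 0.5 is an illustrative value, not one taken from the paper.
    """
    alpha = [a_l + a_t for a_l, a_t in zip(alpha_local, alpha_target)]  # Eq. (8)
    return [n_max * a + omega_n * n for a, n in zip(alpha, n_old)]      # Eq. (7)
```

For example, `induced_motion([1.0, 0.0], [0.0, 1.0], [0.1, 0.2])` gives approximately `[0.06, 0.11]`: the new direction contributes $0.01$ per component and the inertia term keeps half of the previous motion.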

Foraging Motion
The foraging motion is influenced by two main factors: the previous and the current food location. For the $i$-th krill individual, this motion can be expressed as follows:
$$F_i = V_f \beta_i + \omega_f F_i^{old} \quad (9)$$
where
$$\beta_i = \beta_i^{food} + \beta_i^{best} \quad (10)$$
and $V_f$ is the foraging speed, $\omega_f$ is the inertia weight of the foraging motion, $F_i^{old}$ is the last foraging motion, $\beta_i^{food}$ is the food attractiveness, and $\beta_i^{best}$ is the effect of the best fitness found so far by the $i$-th krill.
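The foraging update has the same shape as the induced motion, so a sketch of Equations (9) and (10) looks almost identical. Again the food and best-fitness effects are assumed to be precomputed vectors; $V_f = 0.02$ is the value used in the experiments, while $\omega_f = 0.5$ and the function name are illustrative assumptions.

```python
def foraging_motion(beta_food, beta_best, f_old, v_f=0.02, omega_f=0.5):
    """Equations (9)-(10) for one krill, component-wise.

    beta_food, beta_best: precomputed effect vectors (assumed given).
    omega_f = 0.5 is an illustrative value, not one taken from the paper.
    """
    beta = [b_f + b_b for b_f, b_b in zip(beta_food, beta_best)]  # Eq. (10)
    return [v_f * b + omega_f * f for b, f in zip(beta, f_old)]   # Eq. (9)
```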

Random Diffusion
This motion can be expressed in terms of a maximum diffusion speed and a random directional vector:
$$D_i = D^{max} \delta \quad (11)$$
where $D^{max}$ is the maximum diffusion speed and $\delta$ is the random directional vector. Based on the three motions above, the position of a krill individual from $t$ to $t + \Delta t$ is given by:
$$X_i(t + \Delta t) = X_i(t) + \Delta t \, \frac{dX_i}{dt} \quad (12)$$
Note that $\Delta t$ is a constant that depends on the problem of interest. More details about the KH algorithm can be found in [55].
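The last two pieces, random diffusion and the position update, can be sketched as follows. Drawing each component of $\delta$ uniformly from $[-1, 1]$ is an assumption on our part, as are the function names; $D^{max} = 0.005$ is the value used in the experiments. Equation (12) combines the three motions through Equation (6).

```python
import random


def diffusion(dim, d_max=0.005, rng=None):
    """Equation (11): D_i = D_max * delta.

    Assumption: each component of delta is drawn uniformly from [-1, 1].
    """
    rng = rng or random.Random()
    return [d_max * rng.uniform(-1.0, 1.0) for _ in range(dim)]


def update_position(x, n_new, f_new, d_new, dt):
    """Equations (6) and (12): X(t + dt) = X(t) + dt * (N_i + F_i + D_i)."""
    return [xi + dt * (n + f + d) for xi, n, f, d in zip(x, n_new, f_new, d_new)]
```

In one KH iteration, each krill would compute its induced and foraging motions, draw `diffusion(len(x))`, and pass all three to `update_position` with the problem-dependent `dt`.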

KH Method with Elitism Strategy (KHE)
The basic KH method always keeps the best krill individual in the population. However, the positions of all the krill individuals are updated during the optimization process, regardless of whether they are good or bad. When the best individual is updated, it may become worse. If this happens, the whole population may deteriorate, which can slow down convergence.
To prevent the krill population from degrading, an elitism strategy is incorporated into the basic KH method, yielding a new version of KH with elitism strategy (abbreviated as KHE). In the KHE method, a number of the best krill individuals are memorized, and then all the krill are updated by the three motions. Finally, the same number of the worst krill individuals in the new population are replaced by the best ones memorized from the previous generation. The elitism strategy prevents the best individuals from being destroyed by the three krill motions and ensures that the best fitness in the population never gets worse from one generation to the next. Due to space limitations, readers are referred to [4,63] for a more detailed description of the elitism strategy.
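The elitism step can be sketched as a function that, for a minimization problem, copies the best `n_elite` krill of the previous generation over the worst `n_elite` krill of the updated population. The function name and the default `n_elite = 2` are hypothetical choices for illustration; the exact procedure used in KHE is the one described in [4,63].

```python
def apply_elitism(old_pop, old_fit, new_pop, new_fit, n_elite=2):
    """Replace the n_elite worst krill of the new generation with the n_elite
    best krill memorized from the previous generation (minimization).

    n_elite = 2 is an illustrative value, not one taken from the paper.
    """
    # Memorize the best individuals before they are overwritten.
    elite_idx = sorted(range(len(old_pop)), key=lambda i: old_fit[i])[:n_elite]
    elites = [(list(old_pop[i]), old_fit[i]) for i in elite_idx]
    # Locate the worst individuals after the three krill motions.
    worst_idx = sorted(range(len(new_pop)), key=lambda i: new_fit[i],
                       reverse=True)[:n_elite]
    pop = [list(x) for x in new_pop]
    fit = list(new_fit)
    for (x, f), j in zip(elites, worst_idx):
        pop[j], fit[j] = x, f
    return pop, fit
```

Because the previous best individual is always reinserted, the minimum fitness in the population is non-increasing across generations, which is the property the paragraph above relies on.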

KHE Method for Clustering Problem
Since the clustering problem is essentially an optimization problem, it can be solved by the KHE method. Based on Sections 2, 3.1, and 3.2, the KHE procedure for the clustering problem can be summarized as follows:
(1) Initialize the control parameters: all the parameters used in KHE are initialized first.
(2) Randomly initialize $c$ cluster centers, generate the initial population, and calculate the membership degree of each sample to each cluster center by Equation (4).
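Steps (1) and (2) need a concrete way to represent a candidate solution and to score it. A natural choice, consistent with the eight-element krill used later for four two-dimensional centers, is to flatten the $c$ cluster centers into one vector and to use $J_b$ from Equation (1), with memberships from Equation (4), as the fitness to minimize. The sketch below assumes that encoding; `decode_krill` and `clustering_fitness` are hypothetical names.

```python
import math


def decode_krill(krill, c, m):
    """Split a flat krill vector of length c*m into c cluster centers (assumed encoding)."""
    return [krill[k * m:(k + 1) * m] for k in range(c)]


def clustering_fitness(krill, X, c, b=2.0):
    """Fitness of one krill: J_b (Equation (1)) with memberships from Equation (4)."""
    m = len(X[0])
    V = decode_krill(krill, c, m)
    p = 2.0 / (b - 1.0)
    J = 0.0
    for x in X:
        d = [math.dist(x, v) for v in V]  # Equation (2)
        if min(d) == 0.0:
            # Sample lies on a center: its membership goes to zero-distance
            # centers only, so every term mu^b * d^2 for this sample is zero.
            continue
        for k in range(c):
            mu = 1.0 / sum((d[k] / d[j]) ** p for j in range(c))
            J += mu ** b * d[k] ** 2
    return J
```

With four categories and two-dimensional data, each krill has $4 \times 2 = 8$ elements, and KHE moves these vectors through the search space while minimizing `clustering_fitness`.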

Simulation Results
In this section, the KHE method is first evaluated on a set of benchmark functions (see Table 1) and then applied to the clustering problem. More detailed descriptions of all the benchmarks can be found in [4,64,65]. All the benchmark functions are 30-dimensional. To obtain fair results, all the implementations are run under the same conditions as in [59].

Table 1. Benchmark functions (columns: No., Name, Definition).
A parametric study of KH was carried out in [61]. The parameters of the KHE method are the same as in [61]: $V_f = 0.02$, $D^{max} = 0.005$, and $N^{max} = 0.01$. For the parameter settings of the other methods, see [4,63].
To reduce the influence of randomness and obtain representative statistical results, 200 independent runs are performed on each benchmark. The population size and the maximum generation number are both set to 50 in the experiments of Section 4.1. In the following tables, the best result for each test problem is shown in bold.
From Table 2, it can be seen that KHE performs best on average over all seven benchmarks in terms of the best, mean, and worst function values. The other methods obtain similar function values to one another; generally speaking, SGA obtains somewhat better final values than the other five methods. These results indicate that KHE is a suitable method for most of the optimization problems tested here.

Clustering Problem: Comparison of KHE with Seven Other Methods
As stated before, a clustering problem is essentially an optimization problem, so it can be solved by the KHE method. Here, KHE is compared with pure FCM and the other five metaheuristic methods, including the basic KH method. The dataset used in this paper is the same as that in [67]: it contains 400 two-dimensional samples. These samples are to be divided into four categories, so each krill encodes four two-dimensional cluster centers, that is, eight elements. The population size and the maximum generation number are set to 16 and 25, respectively. For the other algorithms, the parameter settings are the same as in Section 4.1.
Figure 2 shows the clustering result of pure FCM, whose final objective function value is 3.620176. Figure 3 shows the optimization process of the KHE method on the clustering problem; KHE converges quickly. Figure 4 shows the clustering result of KHE, whose final objective function value is 3.303485. Comparing Figures 2 and 4, KHE obtains more accurate clustering results than pure FCM, as reflected by its lower objective value. More results are listed in Table 3. On average, KHE gives the most accurate clustering results, and SGA and KHE tie for the best result. For the worst result, all the methods except FCM give similar clustering results that are clearly better than those of pure FCM. In terms of standard deviation (STD), KHE ranks second, behind only HS. From Table 3 and Figures 2-4, it can be seen that the KHE method solves the clustering problem better than the other methods in most cases.
Note that different runs may produce quite different results, because the clustering results depend on the initial cluster centers.

Discussion and Conclusions
In many application fields, fuzzy clustering, especially fuzzy C-means (FCM) clustering, is an important and active research branch. The FCM clustering algorithm is the most widely used among the various clustering algorithms and has been applied successfully to several practical problems. However, the FCM algorithm is essentially a local search optimization algorithm: if its initial values are selected improperly, it may converge to a local minimum. To overcome this drawback, a new swarm-based metaheuristic search method, called KHE, is proposed for the clustering problem. The elitism strategy used in KHE prevents the krill population from degrading, and KHE uses well-selected parameters instead of values taken from nature. KHE is applied to the clustering problem so that the search can escape local minima. To show its performance, KHE is compared with six other metaheuristic algorithms on seven complicated benchmark problems. The results show that the KHE method performs well on the given benchmark problems and on fuzzy clustering analysis.
Moreover, apart from the elitism strategy, no additional operators are added to the basic KH method, so KHE remains simple and easy to implement.
Despite these advantages, two directions for future research remain. First, the computational requirements of KHE were not studied in this work and should be investigated in the future. Second, only a few test problems and one clustering problem were solved by KHE in the present work. KHE should be tested more extensively on other problems and then applied to further applications, such as image segmentation, constrained optimization, knapsack problems, scheduling, dynamic optimization, antenna and microwave design problems, and water, geotechnical, and transport engineering.

Figure 3. Optimization process of the KHE method for the clustering problem.

Figure 4. Clustering results of the KHE method.

Table 2. Mean, best, and worst function values obtained by different methods.

Table 3. Optimization results for the fuzzy C-means (FCM) problem.