Clustering Using an Improved Krill Herd Algorithm

Abstract: In recent years, metaheuristic algorithms have been widely used to solve clustering problems because of their good performance. The krill herd algorithm (KHA) is a recent and effective algorithm for solving optimization problems that imitates the behavior of individual krill, and it has been shown to perform better than other swarm intelligence algorithms. However, it still has some weaknesses. In this paper, an improved krill herd algorithm (IKHA) is studied. A modified mutation operator and an update mechanism are applied to improve global optimization, and the proposed IKHA overcomes the weaknesses of KHA and performs better than KHA on optimization problems. KHA and IKHA are then introduced into the clustering problem, where they are used to find appropriate cluster centers. Experiments were conducted on University of California Irvine (UCI) standard datasets, and the results showed that the IKHA clustering algorithm is the most effective.


Introduction
Clustering is an important research direction in data analysis. This method does not make any statistical hypothesis about the data and is therefore called unsupervised learning in pattern recognition and data mining. Clustering is mainly used in text clustering [1], search engine optimization [2], landmark selection [3], face recognition [4], and medicine and biology [5].
Clustering is one of the most difficult and challenging problems in machine learning. Clustering algorithms are roughly divided into three main types, namely, overlapping (so-called non-exclusive) [6], partitional [7], and hierarchical [8]. Regardless of the type of clustering algorithm applied, the main goal is to maximize homogeneity within each cluster and heterogeneity among different clusters. In other words, objects that belong to the same cluster should be more similar to each other than objects that belong to different clusters.
Although present algorithms have their own advantages, they are sensitive to the initialization parameters, and it is difficult for them to find the optimal clusters. In recent years, optimization methods inspired by natural phenomena have provided new ways to solve clustering problems. A swarm of individuals is employed to explore the search space and obtain an optimal solution, as in genetic algorithms (GA) [9], particle swarm optimization algorithms (PSO) [10], and ant colony optimization (ACO) [11], among others. Other novel swarm intelligence algorithms have been proposed, such as harmony search (HS) [12], honeybee mating optimization algorithm (HBMO) [13], artificial fish swarm algorithm (AFSA) [14], artificial bee colony (ABC) [15], firefly algorithm (FA) [16], monkey algorithm (MA) [17], bat algorithm (BA) [18], and many others.
The krill herd algorithm (KHA) [19] is a novel swarm algorithm that is based on the simulation of the herding behavior of krill individuals and the minimum distances of each individual krill from

Introduction to Krill Herd Algorithm
KHA is based on the simulation of the herding of krill swarms in response to specific biological and environmental processes. Nearly all necessary coefficients for KHA are obtained from real-world empirical studies [19].
In nature, the adaptability of an individual is judged by its distance to food and the maximum density of the krill population. Thus, based on the assumption of an imaginary distance, the fitness is the value of the objective function. Within a two-dimensional space, the specific location of the individual krill varies with time depending on the following three actions [19]:
• movement induced by other krill individuals;
• foraging activity; and
• random diffusion.
KHA uses the Lagrangian model to extend the search space to an n-dimensional decision space as:

$$\frac{dX_i}{dt} = N_i + F_i + D_i$$

where $N_i$ is the motion of the $i$th krill induced by other krill individuals, $F_i$ represents the foraging activity, and $D_i$ denotes the physical diffusion of the krill individuals. The explanations for basic KHA are given as follows:
(1) Motion induced by other krill individuals
According to theoretical arguments, individual krill maintain a high density and move due to mutual effects. The direction of the induced motion, $\alpha_i$, is estimated from the local swarm density (local effect), target swarm density (target effect), and repulsive swarm density (repulsive effect). For an individual krill, the motion can be defined as:

$$N_i^{new} = N^{max}\alpha_i + \omega_n N_i^{old}$$

where:

$$\alpha_i = \alpha_i^{local} + \alpha_i^{target}$$

Here, $N^{max}$ is the maximum induced speed, $\omega_n$ is the inertia weight of the induced motion in the range [0, 1], $N_i^{old}$ is the last induced motion, $\alpha_i^{local}$ is the local effect provided by the neighbors, and $\alpha_i^{target}$ is the target direction effect provided by the best individual krill. According to the measured values of the maximum induced speed, $N^{max}$ is taken as 0.01 m s$^{-1}$ in [19].
Different strategies can be used in choosing the neighbors. Based on the actual behavior of krill individuals, a sensing distance ($d_s$) should be determined around a krill individual and the neighbors should be found.
The sensing distance for each krill individual can be determined by using different heuristic methods. Here, the sensing distance is determined by using the following formula for each iteration:

$$d_{s,i} = \frac{1}{5N}\sum_{j=1}^{N}\lVert X_i - X_j\rVert$$

where $d_{s,i}$ is the sensing distance for the $i$th krill individual, $N$ is the number of krill individuals, and $X_i$ represents the position of the $i$th krill. If the distance between $X_i$ and $X_j$ is less than the defined sensing distance ($d_{s,i}$), $X_j$ is a neighbor of $X_i$.
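The neighbor rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the sensing-distance formula of the original KHA paper [19], $d_{s,i} = \frac{1}{5N}\sum_j \lVert X_i - X_j\rVert$, and the function names `sensing_distance` and `neighbors` are hypothetical.

```python
import math

def sensing_distance(positions, i):
    """Sensing distance of krill i: (1 / (5 N)) * sum_j ||X_i - X_j||.

    The factor 1/(5N) is taken from the original KHA paper [19] (assumption).
    """
    n = len(positions)
    total = sum(math.dist(positions[i], x_j) for x_j in positions)
    return total / (5 * n)

def neighbors(positions, i):
    """Indices j != i whose distance to krill i is below its sensing distance."""
    d_s = sensing_distance(positions, i)
    return [j for j, x_j in enumerate(positions)
            if j != i and math.dist(positions[i], x_j) < d_s]

# Example: four krill in a two-dimensional search space.
swarm = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(sensing_distance(swarm, 0), neighbors(swarm, 0))
```

Because the sensing distance is recomputed from the current positions, the neighborhood of each krill shrinks automatically as the swarm contracts.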
(2) Foraging motion
This movement is intended to comply with two criteria. The first is the food location, and the second is previous experience about the food location. For the $i$th krill, the foraging motion can be expressed as:

$$F_i = V_f\beta_i + \omega_f F_i^{old}$$

where:

$$\beta_i = \beta_i^{food} + \beta_i^{best}$$

Here, $V_f$ is the foraging speed, $\omega_f$ is the inertia weight of the foraging motion in the range [0, 1], $\beta_i^{food}$ is the food attraction, and $\beta_i^{best}$ is the effect of the best fitness of the $i$th krill so far. According to measured values of the foraging speed, $V_f$ is taken as 0.02 m s$^{-1}$ in [19].
The food effect is defined in terms of its location. The center of food should be found and then formulated for food attraction. This location cannot be determined exactly, but it can be estimated. In this study, the virtual center of food concentration is estimated according to the fitness distribution of krill individuals, which is inspired by the "center of mass" concept. The center of food for each iteration is formulated as:

$$X^{food} = \frac{\sum_{i=1}^{N}\frac{1}{K_i}X_i}{\sum_{i=1}^{N}\frac{1}{K_i}}$$

where $K_i$ is the objective function value of the $i$th krill individual.

(3) Physical diffusion
The physical diffusion of the krill individuals is considered a random process. This motion can be expressed in terms of a maximum diffusion speed and a random directional vector. The formula is as follows:

$$D_i = D^{max}\left(1 - \frac{I}{I_{max}}\right)\delta$$

where $D^{max}$ is the maximum diffusion speed, $\delta$ is the random directional vector whose entries are random values between −1 and 1, $I$ is the current iteration number, and $I_{max}$ is the maximum number of iterations.
(4) Motion process of KHA
Defined motions regularly change the krill position toward the best fitness. The foraging motion and the motion induced by other krill individuals contain two local ($\alpha_i^{local}$, $\beta_i^{best}$) and two global ($\alpha_i^{target}$, $\beta_i^{food}$) strategies, which work simultaneously and create a powerful algorithm. Using the different motion parameters over time, the position vector of a krill individual during the interval $t$ to $t + \Delta t$ is expressed by the following equation:

$$X_i(t + \Delta t) = X_i(t) + \Delta t\,\frac{dX_i}{dt}$$

where $X_i(t + \Delta t)$ represents the updated krill individual position, and $X_i(t)$ represents the current position. Note that $\Delta t$ is considered the most important constant and should be tuned carefully based on the optimization problem, because this parameter works as a scale factor of the speed vector. $\Delta t$ can be obtained from the following formula:

$$\Delta t = C_t\sum_{j=1}^{NV}(UB_j - LB_j)$$

where $NV$ is the total number of variables, and $LB_j$ and $UB_j$ are the lower and upper bounds of the $j$th variable ($j = 1, 2, \ldots, NV$), respectively. Therefore, the absolute value of their difference gives the extent of the search space. It is empirically found that $C_t$ is a constant in the range [0, 2]. Low values of $C_t$ let the krill individuals search the space more carefully.
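The food center, diffusion, time step, and position update described in steps (2) to (4) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the formulas are those of the original KHA paper [19], the function names are hypothetical, the problem is assumed to be a minimization with strictly positive objective values $K_i$, and the induced and foraging motion vectors are taken as given inputs.

```python
import random

def food_center(positions, fitness):
    """Virtual food center: sum_i (X_i / K_i) / sum_i (1 / K_i)."""
    weights = [1.0 / k for k in fitness]   # assumes every K_i is non-zero
    total = sum(weights)
    dim = len(positions[0])
    return [sum(w * x[d] for w, x in zip(weights, positions)) / total
            for d in range(dim)]

def diffusion(d_max, iteration, max_iter, dim, rng=random):
    """D_i = D_max * (1 - I / I_max) * delta, with delta uniform in [-1, 1]."""
    scale = d_max * (1.0 - iteration / max_iter)
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(dim)]

def time_step(c_t, lower, upper):
    """Delta t = C_t * sum_j (UB_j - LB_j)."""
    return c_t * sum(ub - lb for lb, ub in zip(lower, upper))

def update_position(x, induced, foraging, diffusive, dt):
    """X_i(t + dt) = X_i(t) + dt * (N_i + F_i + D_i)."""
    return [xi + dt * (n + f + d)
            for xi, n, f, d in zip(x, induced, foraging, diffusive)]

# Example: one position update in a two-dimensional box [0, 1] x [0, 3].
dt = time_step(0.5, lower=[0.0, 0.0], upper=[1.0, 3.0])
d_i = diffusion(0.005, iteration=10, max_iter=100, dim=2)
print(update_position([0.5, 1.5], [0.01, 0.0], [0.0, 0.02], d_i, dt))
```

The diffusion term shrinks linearly to zero at the last iteration, so the random component matters most early in the run.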
(5) Genetic operators
The crossover operation uses a binomial crossover scheme to update the $m$th component of the $i$th krill by the following formula:

$$x_{i,m} = \begin{cases} x_{r,m}, & rand_{i,m} < Cr \\ x_{i,m}, & \text{otherwise} \end{cases}$$

where $Cr$ is the crossover probability, which is a random number between 0 and 1, and $r \in \{1, 2, \ldots, i-1, i+1, \ldots, N\}$. Mutation is controlled by the mutation probability ($Mu$). The adaptive mutation scheme used is formulated as

$$x_{i,m} = \begin{cases} x_{gbest,m} + \mu\,(x_{p,m} - x_{q,m}), & rand_{i,m} < Mu \\ x_{i,m}, & \text{otherwise} \end{cases}$$

where $p, q \in \{1, 2, \ldots, i-1, i+1, \ldots, N\}$ and $\mu$ is a number between 0 and 1. In $\hat{K}_{i,best}$, the numerator is $K_i - K_{best}$. Based on this new mutation probability, the mutation probability for the global best is equal to zero, and it increases as fitness decreases.
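The two genetic operators can be sketched component by component. The sketch below assumes the binomial crossover and adaptive mutation forms of the original KHA paper [19]; the function names are hypothetical, and the probabilities `cr` and `mu_prob` are passed in rather than computed from the fitness.

```python
import random

def crossover(x_i, x_r, cr, rng=random):
    """Binomial crossover: take component m from krill r with probability Cr."""
    return [xr if rng.random() < cr else xi for xi, xr in zip(x_i, x_r)]

def mutate(x_i, x_gbest, x_p, x_q, mu_prob, mu, rng=random):
    """With probability Mu, set component m to gbest_m + mu * (p_m - q_m)."""
    return [g + mu * (p - q) if rng.random() < mu_prob else xi
            for xi, g, p, q in zip(x_i, x_gbest, x_p, x_q)]

# Example: krill i is crossed with krill r, then mutated around the global best.
x_i, x_r = [1.0, 2.0], [3.0, 4.0]
child = crossover(x_i, x_r, cr=0.3)
print(mutate(child, x_gbest=[0.0, 0.0], x_p=[5.0, 5.0], x_q=[3.0, 3.0],
             mu_prob=0.1, mu=0.5))
```

Each component is decided independently, so a single operator call can change anywhere from none to all of the components of a krill.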

Improved KHA
The KHA algorithm considers various motion characteristics of individual krill, as well as the global exploration and local exploitation ability. Through simulation and experiments [19], the performance of the algorithm has been shown to be better than that of the majority of swarm intelligence algorithms. However, recent studies show that the KHA algorithm has excellent local exploitation ability, but its global exploration ability is not as strong, especially in high-dimensional multimodal function optimization [29], because the algorithm cannot always converge rapidly. To solve this problem, selection and crossover operators are added to the basic KHA in [29], and [30] uses a local search to explore around the solution obtained by the KHA. Inspired by these developments, we propose the improved KHA algorithm (IKHA) based on a modified mutation scheme and a new update mechanism.
The main ideas of IKHA are as follows. First, we sort the individuals of each generation according to the fitness value in ascending order. The first part includes individuals with good fitness (individuals with a fitness value among the top 10%, but apart from the global best), and the rest comprise the second part. For the first part, which we call sub-optimal individuals, the fitness value is close to that of the optimal individual, but worse than the optimal solution. In the original optimization process, these individuals do not have much effect. Another noteworthy point, based on Equation (14) in the previous section, is that the mutation probability ($Mu$) for the global best is equal to zero and increases with decreasing fitness. In other words, the smaller the fitness value, the higher the probability of mutation. Thus, we can improve the mutation mechanism to use these individuals and allow them to find potential solutions in the vicinity of the optimal solution.
For the first part, the sub-optimal individuals, we use the individual's own neighbor $x_a$ (a neighbor of $x_i$) in the mutation scheme instead of the original stochastic selection of $x_p$ and $x_q$. The specific operation follows the formula below, where SN denotes the set of sub-optimal individuals and $\mu_{nn}$ is a number between 0 and 1: For the second part of the individuals, we only have to use good individuals to guide them toward a better direction of evolution. Therefore, we choose sub-optimal individuals for the mutation scheme. The specific formula is as follows: where $x_b, x_c \in SN$ are sub-optimal individuals.
Beyond the modified mutation mechanism, an update operator is added in our approach. After many iterations, the KHA tends to stagnate. To avoid premature convergence, we added an update mechanism to escape the local extremum. In our approach, a parameter, the maximum number of stalls ($S_{max}$), is added. Suppose that $K_{gbest}$ (the fitness value of the global best individual of the population) remains unchanged and $num_{samebest}$ (the number of unchanged iterations) is greater than $S_{max}$; then the update formula is as follows: where $X_{SN}$ is the average position of the SN, and $\nu_{best}$ is a number between 0 and 1. If the fitness value of $X_{best}^{new1}$ or $X_{best}^{new2}$ is less than $K_{gbest}$, we replace the old position with the new position. $S_{max}$ is a positive integer that decreases as the iteration number increases, and it is defined as follows:
In IKHA, the optimized mutation scheme abandons the original randomly selected individuals for mutations, and uses different mutations for individuals with different fitness values. With such a divide-and-conquer strategy, we take full advantage of all individuals, as opposed to the KHA. For example, sub-optimal individuals can be used to find potentially better values, thereby preventing the algorithm from falling into a local optimum. For the remaining individuals, excellent individuals can guide them, thereby speeding up optimization. The purpose of the update operation is to give the algorithm a chance to escape from a local solution in the later phase of the run.
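The partition of each generation into the global best, the sub-optimal individuals (SN), and the remaining individuals can be sketched as below. This is a minimal sketch that assumes a minimization problem; the function name `split_population`, the rounding rule for the 10% threshold, and the guarantee that SN is never empty are assumptions of this illustration, not details given in the text.

```python
import math

def split_population(fitness, top_fraction=0.10):
    """Return (best, sn, rest) as index lists, sorted by ascending fitness.

    best: index of the global best individual
    sn:   the other individuals within the top `top_fraction` of the ranking
    rest: every remaining individual
    """
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    # Assumed rounding: round the top-10% count up, and keep at least two
    # individuals in the top group so that SN is never empty.
    n_top = max(2, math.ceil(top_fraction * len(fitness)))
    best = order[0]
    sn = order[1:n_top]
    rest = order[n_top:]
    return best, sn, rest

# Example: 20 krill whose fitness decreases with the index.
fitness = [float(20 - i) for i in range(20)]
best, sn, rest = split_population(fitness)
print(best, sn, len(rest))
```

With this split, the neighbor-based mutation would be applied to the indices in `sn` and the SN-guided mutation to the indices in `rest`.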
The computational time complexity of IKHA is the same as that of KHA. The analysis is as follows. In KHA, for each krill in an iteration, the time complexity of calculating the sensing distance $d_{s,i}$ is $O(N)$, so the time complexity of KHA is $O(I_{max} \cdot N^2)$. In IKHA, the added update operation follows Equations (17) and (18), and its time complexity is $O(1)$. Moreover, the improved mutation mechanism requires sorting the individuals according to their fitness value. We use a quicksort algorithm, whose time complexity is $O(N \log N)$ in the average case and $O(N^2)$ in the worst case. Because only one sorting operation is added per generation, the time complexity of IKHA is still $O(I_{max} \cdot N^2)$.
To test IKHA further, we conducted the following experiments by using the Ackley function [31]. The Ackley function is defined as follows and its graph is shown in Figure 1:

$$f(x) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D}x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + e$$

where $D$ is the number of dimensions. The convergence graphs for the Ackley function are shown in Figure 2. In our experiment, the number of iterations is set to 100, the population size is 50, and the results are obtained after 50 trials. For the KHA and the proposed IKHA, we set the same parameters $N^{max}$ = 0.01, $V_f$ = 0.02, $D^{max}$ = 0.005, $S_{max}$ = 5, and $\nu_{best}$ = 0.5 at the beginning, linearly decreased to 0.1 at the end in IKHA [32,33]. Both IKHA and KHA converged quickly in the early phase of the run, but IKHA converged faster than KHA. In the later phase, KHA began to stagnate after rapid convergence, but IKHA continued to find better values. Thus, IKHA can quickly converge in the early iterations and jump out of the local optimum to find a better solution.
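For reference, the benchmark can be written as a short function. The sketch below uses the standard form of the Ackley function with the usual constants 20, 0.2, and 2π, which is assumed to be the variant used in the experiment; the function name `ackley` is illustrative.

```python
import math

def ackley(x):
    """Standard Ackley function; the global minimum is 0 at x = (0, ..., 0)."""
    d = len(x)
    sum_sq = sum(v * v for v in x)
    sum_cos = sum(math.cos(2.0 * math.pi * v) for v in x)
    return (-20.0 * math.exp(-0.2 * math.sqrt(sum_sq / d))
            - math.exp(sum_cos / d) + 20.0 + math.e)

# Example: the 20-dimensional case used for the convergence comparison.
print(ackley([0.0] * 20), ackley([1.0] * 20))
```

The many local minima created by the cosine term are what make this function a useful test of whether an optimizer can escape stagnation.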
Algorithms 2017, 10, 56

Basic Idea of Clustering
Data clustering, which is an NP-complete problem, finds heterogeneous data by minimizing some measure of dissimilarity. Given Dataset = {$data_1$, $data_2$, ..., $data_n$}, clustering aims to divide the whole data into $K$ clusters ($K \le n$), where $n$ is the total number of data objects, such that the data objects of the same cluster are similar according to the similarity criteria. The similarity measure uses Euclidean distance:

$$dis(data_i, data_j) = \sqrt{\sum_{d=1}^{D}(data_{i,d} - data_{j,d})^2}$$

where $i, j \in \{1, 2, \ldots, n\}$, $data_{i,d}$ is the $d$th attribute of the $i$th datum in $\mathbb{R}^D$, $dis(data_i, data_j)$ denotes the distance between $data_i$ and $data_j$, and $D$ is the number of attributes for each data object.

Clustering Based on IKHA
Clustering searches for an optimal partition according to appropriate indexes, so the essence of clustering is an optimization process. What is important is finding ways to combine the optimization algorithm IKHA with clustering. By representing each krill in the IKHA as a clustering scheme, we find the optimal clustering scheme by choosing the appropriate objective function. A clustering scheme can be expressed by all of its clustering centers. That is, every krill $X_i$ represents the $K$ clustering centers:

$$X_i = (C_1, C_2, \ldots, C_K), \qquad C_k = (C_k^1, C_k^2, \ldots, C_k^d)$$

where $d$ denotes the number of parameters of the data that will be clustered, and $C_k^1$ represents the first parameter of the $k$th cluster center. Each krill individual can be expressed as the following matrix:

$$X_i = \begin{pmatrix} C_1^1 & C_1^2 & \cdots & C_1^d \\ C_2^1 & C_2^2 & \cdots & C_2^d \\ \vdots & \vdots & \ddots & \vdots \\ C_K^1 & C_K^2 & \cdots & C_K^d \end{pmatrix}$$

In this study, one krill is used to represent a candidate solution to a problem, and the selected $K$ initial cluster centers are potential solutions. One krill and $K$ initial clustering centers play similar roles in our algorithms. Thus, the mapping between a krill individual and $K$ initial clustering centers can be established. In the coding method of the krill location structure, a set of initial cluster centers is generated randomly from the dataset points.
The whole krill population represents a variety of clustering schemes. In this manner, our aim is to find the optimal clustering centers. According to the principle of minimal distance, each datum is assigned to the cluster with the nearest center. The description of the improved krill-herd clustering algorithm (IKHCA) is shown in Algorithm 1.
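The encoding of a krill as $K$ cluster centers and the minimal-distance assignment can be sketched as follows. This is an illustrative sketch: the function names `init_krill` and `assign` are hypothetical, and a krill is stored as a list of $K$ center vectors, which corresponds to the rows of the $K \times d$ matrix.

```python
import math
import random

def init_krill(data, k, rng=random):
    """One krill = K cluster centers drawn at random from the data points."""
    return [list(point) for point in rng.sample(data, k)]

def assign(data, centers):
    """Label each datum with the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))
            for point in data]

# Example: four two-dimensional points and two candidate centers.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(assign(data, [[0.0, 0.5], [10.0, 10.5]]))
print(init_krill(data, k=2))
```

Drawing the initial centers from the data points keeps every krill inside the region actually occupied by the dataset.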

Algorithm 1. Improved Krill Herd Clustering Algorithm (IKHCA)
(1) Define the parameters (K, I max , N, N max , V f , D max , and so on).
(2) Initialize N krill randomly as the initial clustering centers.
(3) Evaluate each krill individual by the fitness function.
(4) For each krill individual: perform the three motions (motion induced by other individuals, foraging motion, and physical diffusion). Then, implement the crossover operator and the modified mutation operator (two mutation schemes are performed for individuals with different fitness levels). Calculate the fitness according to the krill's new position; if the new fitness is better than the old one, update the krill individual's position in the search space.
(5) Use the update mechanism to update the krill's position if the new position is superior to the old position.
(6) Repeat Steps 4 and 5 until the stopping criteria are satisfied.
(7) Return the best clustering solution.
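The control flow of Algorithm 1 can be sketched as a generic loop. This is a skeleton under stated assumptions rather than the authors' implementation: the motions and genetic operators of step 4 are hidden behind a hypothetical callable `move`, the stall-triggered update of step 5 behind a hypothetical callable `update_best` (which may simply return the current position when no update is due), and a fixed iteration count stands in for the stopping criteria.

```python
def run_search(population, objective, move, update_best, max_iter):
    """Greedy-acceptance loop following steps 3 to 7 of Algorithm 1."""
    fitness = [objective(x) for x in population]               # step 3
    for iteration in range(max_iter):                          # step 6
        for i in range(len(population)):                       # step 4
            candidate = move(population, fitness, i, iteration)
            f_new = objective(candidate)
            if f_new < fitness[i]:                             # keep only improvements
                population[i], fitness[i] = candidate, f_new
        best = min(range(len(population)), key=fitness.__getitem__)
        candidate = update_best(population, fitness, best, iteration)  # step 5
        f_new = objective(candidate)
        if f_new < fitness[best]:
            population[best], fitness[best] = candidate, f_new
    best = min(range(len(population)), key=fitness.__getitem__)
    return population[best], fitness[best]                     # step 7

# Toy usage: a stand-in move that halves the distance to the optimum x = 3.
solution, value = run_search(
    population=[[0.0], [10.0]],
    objective=lambda x: (x[0] - 3.0) ** 2,
    move=lambda pop, fit, i, it: [pop[i][0] + 0.5 * (3.0 - pop[i][0])],
    update_best=lambda pop, fit, b, it: pop[b],
    max_iter=20,
)
print(solution, value)
```

For clustering, `objective` would evaluate the clustering scheme encoded by a krill, and `move` would combine the three motions with the crossover and modified mutation operators.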

Simulation and Experiment
To investigate the performance of IKHCA, it was compared with five clustering algorithms, namely, K-means [34], ACO [35], PSO [36], KHCA I in [30], and KHCA II. KHCA II is a clustering algorithm based on KHA [19]. Five datasets obtained from the UCI Machine Learning Repository [37] were used in our experiment. The details of the datasets, including the name, number of classes, attributes, and records, are presented in Table 1. Our experiments were conducted in Eclipse 4.6.0 under Windows 7 on an Intel Core i7 (3.40 GHz) with 4 GB RAM. Before the experiment, the parameter settings and the objective functions in KHCA II and IKHCA were specified. In KHCA II and IKHCA, we used the sum of squared error ($I_{SSE}$) directly as the objective function, as given in Equation (25). The lower the value of $I_{SSE}$, the higher the quality of the clustering:

$$I_{SSE} = \sum_{k=1}^{K}\sum_{data_i \in \text{cluster } k} dis(data_i, C_k)^2 \qquad (25)$$

where $C_k$ is the center of the $k$th cluster. The parameters are set in accordance with [19,38]: $N^{max}$ = 0.01; $V_f$ = 0.02; and $D^{max}$ = 0.005.
Here, $C_t$ is set to 0.5, and the inertia weights $\omega_n$ and $\omega_f$ are equal to 0.9 at the beginning of the search and linearly decreased to 0.1 at the end to encourage exploitation. The size of the population is set to 25, $S_{max}$ = 5, and $\nu_{best}$ = 0.5 at the beginning, linearly decreased to 0.1 at the end in IKHCA.
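The objective function and the linearly decreasing parameters can be sketched as below. Two assumptions are made in this illustration: the sum of squared error is taken literally as the sum of squared Euclidean distances from each datum to its nearest center, and the linear decrease runs from the first to the last iteration. The function names `sse` and `linear_decay` are hypothetical.

```python
import math

def sse(data, centers):
    """Sum over all data of the squared distance to the nearest center."""
    return sum(min(math.dist(point, c) for c in centers) ** 2 for point in data)

def linear_decay(start, end, iteration, max_iter):
    """Linear schedule from `start` at iteration 0 to `end` at `max_iter`."""
    return start + (end - start) * iteration / max_iter

# Example: inertia weight falling from 0.9 to 0.1 over 200 generations.
for it in (0, 100, 200):
    print(it, round(linear_decay(0.9, 0.1, it, 200), 3))
print(sse([(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 0.0)]))
```

A large inertia weight early in the run favors exploration, and the decay toward 0.1 shifts the search toward exploitation, which matches the stated intent of the schedule.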
We compared the performance of the different clustering algorithms from two aspects. First, we compared the objective function values of the different clustering algorithms in Table 2, and then we compared the accuracy of the different clustering algorithms in Table 3. Accuracy is expressed as follows:

$$accuracy = \frac{\text{number of correctly placed data}}{\text{total number of data}} \times 100$$

Table 2 lists the best, worst, and mean values of the solutions, and ranks the algorithms based on the mean values for all datasets in Table 1. The results of the compared algorithms are taken directly from [30]. The KHCA II and IKHCA algorithms were executed 100 times independently with the same parameters described in this paper, except that the maximum number of generations was set to 200. As shown in Table 2, IKHCA obtained better best and worst solutions than the other algorithms on the Wine, Glass, Cancer, and CMC datasets, but not on Iris. KHCA II ranked first for the best solution on the Iris dataset. However, KHCA II generated a poor worst solution on the Iris dataset. IKHCA also achieved the best mean values on all datasets except Glass, where it is very close to the result obtained by the KHCA II algorithm. From the experimental results, our proposed algorithm achieved better optimal solutions with improved stability in a limited number of iterations. IKHCA ranked first among all algorithms.
In Table 3, the clustering accuracies of IKHCA and the other clustering algorithms are given; part of the results were obtained directly from [30], and the bold font indicates the best results. The last three clustering algorithms (KHCA I, KHCA II, and IKHCA), which use KHA, are clearly better than the K-means, ACO, and PSO algorithms. This shows that introducing KHA into the clustering problem is reasonable and effective. Based on these results, IKHCA is shown to be the best algorithm with respect to objective function value and accuracy.
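The accuracy measure can be sketched as follows. The text does not state how a cluster is matched to a class, so this sketch assumes that each cluster is labeled with the majority class among its members and that a datum is correctly placed when its class equals that majority; the function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter

def clustering_accuracy(cluster_labels, true_labels):
    """Percentage of data whose class matches the majority class of its cluster."""
    members = {}
    for cluster, label in zip(cluster_labels, true_labels):
        members.setdefault(cluster, []).append(label)
    # Majority-vote matching between clusters and classes is an assumption.
    correct = sum(Counter(labels).most_common(1)[0][1]
                  for labels in members.values())
    return 100.0 * correct / len(true_labels)

# Example: one datum of class "b" falls into the cluster dominated by "a".
print(clustering_accuracy([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))
```

Other matching rules, such as a one-to-one assignment between clusters and classes, can give lower values on the same partition, so the rule should be fixed before comparing algorithms.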

Conclusions and Future Work
KHA is a good swarm intelligence heuristic algorithm that can be applied to real-world problems. Because the original KHA algorithm cannot always converge rapidly or search globally particularly well, we proposed IKHA, which improves the original mutation mechanism by providing two different mutation schemes and introduces an update mechanism. In IKHA, different mutation schemes are applied according to the fitness of the individuals: outstanding individuals look for better solutions, and the rest move closer to the good individuals. Then, through the update mechanism, optimal individuals look for potential solutions in the surrounding space to avoid being stuck in a local optimum. Experimental results showed that IKHA performed better than KHA.
Several clustering algorithms depend highly on the initial states and always converge to the local optimum nearest to the starting position of the search. In order to find the optimal clustering centers, we applied the IKHA to an actual clustering problem and proposed the improved krill-herd clustering algorithm (IKHCA). According to the experiments, the IKHCA outperformed other well-known clustering approaches. Moreover, the results of the experiments show that the IKHA can be successfully introduced into clustering problems and performs best on almost all experimental datasets. In the future, several issues can be studied further, such as using the optimization ability of the IKHA to find the optimal number of clusters and applying the IKHA to other scenarios to solve a wide range of real-world problems.


Figure 2. Comparison of convergence of the KHA and IKHA for Ackley (D = 20).

Table 1. The details of selected datasets.

Table 2. Objective function values obtained by the algorithms.

Table 3. Accuracy obtained by the algorithms.