*Algorithms* **2017**, *10*(2), 56; https://doi.org/10.3390/a10020056

Article

Clustering Using an Improved Krill Herd Algorithm

^{1} School of Information Science and Technology, Jinan University, Guangzhou 510630, China

^{2} Department of Information Science and Technology, Jinan University, Guangzhou 510630, China

^{*} Author to whom correspondence should be addressed.

Academic Editor: Javier Del Ser Lorente

Received: 27 March 2017 / Accepted: 12 May 2017 / Published: 17 May 2017

## Abstract

In recent years, metaheuristic algorithms have been widely used to solve clustering problems because of their good performance. The krill herd algorithm (KHA) is a recent, effective optimization algorithm that imitates the behavior of individual krill, and it has been shown to perform better than other swarm intelligence algorithms. However, KHA still has some weaknesses. In this paper, an improved krill herd algorithm (IKHA) is studied. A modified mutation operator and an update mechanism are applied to improve global optimization, so that the proposed IKHA overcomes the weaknesses of KHA and performs better than KHA on optimization problems. Then, KHA and IKHA are introduced into the clustering problem. In our proposed clustering algorithm, KHA and IKHA are used to find appropriate cluster centers. Experiments were conducted on University of California Irvine (UCI) standard datasets, and the results showed that the IKHA clustering algorithm is the most effective.

Keywords: data clustering; krill herd; improved algorithm; mutation operators

## 1. Introduction

Clustering is an important research direction in data analysis. This method does not make any statistical hypothesis on data and, thus, is called unsupervised learning in pattern recognition and data mining. Clustering is mainly used in text clustering [1], search engine optimization [2], landmark selection [3], face recognition [4], and medicine and biology [5].

Clustering is one of the most difficult and challenging problems in machine learning. The variety of clustering algorithms is roughly divided into three main types, namely, overlapping (so-called non-exclusive) [6], partitional [7], and hierarchical [8]. Regardless of the type of clustering algorithm applied, the main goal is to maximize homogeneity within each cluster and heterogeneity among different clusters. In other words, objects that belong to the same cluster should be more similar to each other than objects that belong to different clusters.

Although existing algorithms have their own advantages, they are sensitive to initialization parameters and often fail to find the optimal clusters. In recent years, optimization methods inspired by natural phenomena have provided new ways to solve clustering problems. These methods employ a swarm of individuals to explore the search space and obtain an optimal solution; examples include genetic algorithms (GA) [9], particle swarm optimization (PSO) [10], and ant colony optimization (ACO) [11]. Other novel swarm intelligence algorithms have also been proposed, such as harmony search (HS) [12], the honeybee mating optimization algorithm (HBMO) [13], the artificial fish swarm algorithm (AFSA) [14], artificial bee colony (ABC) [15], the firefly algorithm (FA) [16], the monkey algorithm (MA) [17], the bat algorithm (BA) [18], and many others.

The krill herd algorithm (KHA) [19] is a novel swarm algorithm that is based on the simulation of the herding behavior of krill individuals and the minimum distances of each individual krill from food and from the highest density of the herd, which are considered as the objective functions for krill movement. Although proposed recently, KHA has quickly been applied to multiple scenarios. Amudhavel et al. [20] used KHA to optimize a peer-to-peer network. KHA is applied in a smartphone ad hoc network [21]. Kowalski et al. [22] used KHA for learning an artificial neural network. In [23], KHA demonstrated better performance compared with well-known algorithms, such as PSO and GA. Gandomi and Alavi [19] illustrated that KHA with a crossover operator is superior to other well-known algorithms, including differential evolution (DE) [24], biogeography-based optimization (BBO) [25], and ACO.

Although KHA outperforms many other swarm intelligence algorithms [19], its global search ability is limited [26]. In [27], a free search KHA for function optimization was proposed to improve the feasibility and effectiveness of KHA. An improved KHA with a linearly decreasing step was proposed by Li et al. [28]. In this paper, we present a new KHA that improves the original genetic operator by modifying the mutation mechanism and adding a new update scheme.

In this paper, we apply KHA as an optimization means to transform clustering into an optimization problem. In other words, we use individual krill to represent K cluster centers (K is the number of clusters), and KHA is used to search for the optimal clustering center. According to the principle of minimum distance of centers, all of the objects of the dataset are divided into different clusters, which leads to obtaining clustering results.

The rest of the paper is organized as follows: In Section 2, details of KHA are introduced. Section 3 briefly explains the improved KHA. In Section 4 clustering with the IKHA approach is proposed. Section 5 presents the experimental results of our proposed algorithm. Finally, the summary and future works are provided in Section 6.

## 2. Introduction to Krill Herd Algorithm

KHA is based on the simulation of the herding of krill swarms in response to specific biological and environmental processes. Nearly all necessary coefficients for KHA are obtained from real-world empirical studies [19].

In nature, the adaptability of an individual is judged by its distance to food and the maximum density of the krill population. Thus, based on the assumption of an imaginary distance, the fitness is the value of the objective function. Within a two-dimensional space, the specific location of the individual krill varies with time depending on the following three actions [19]:

- movement induced by other krill individuals;
- foraging activity; and
- random diffusion.

KHA uses the Lagrangian model to extend the search space to an n-dimensional decision space as:

$$\frac{d{X}_{i}}{dt}={N}_{i}+{F}_{i}+{D}_{i}$$

where ${N}_{i}$ is the motion of the ith krill induced by other krill individuals, ${F}_{i}$ represents the foraging activity, and ${D}_{i}$ denotes the physical diffusion of the krill individuals.

The explanations for basic KHA are given as follows:

(1) Motion induced by other krill individuals

According to theoretical arguments, individual krill maintain a high density and move due to mutual effects. The direction of motion induced, ${\alpha}_{i}$, is estimated from the local swarm density (local effect), target swarm density (target effect), and repulsive swarm density (repulsive effect). For an individual krill, the motion can be defined as:

$${N}_{i}^{new}={N}^{\mathrm{max}}{\alpha}_{i}+{\omega}_{n}{N}_{i}^{old}$$

where:

$${\alpha}_{i}={\alpha}_{i}^{local}+{\alpha}_{i}^{target}$$

${N}^{\mathrm{max}}$ is the maximum induced speed, ${\omega}_{n}$ is the inertia weight of the motion induced in the range [0, 1], ${N}_{i}^{old}$ is the last motion induced, ${\alpha}_{i}^{local}$ is the local effect provided by the neighbors, and ${\alpha}_{i}^{target}$ is the target direction effect provided by the best individual krill. According to the measured values of the maximum induced speed, ${N}^{\mathrm{max}}$ is taken as 0.01 (m s$^{-1}$) in [19].

Different strategies can be used in choosing the neighbors. Based on the actual behavior of krill individuals, a sensing distance (${d}_{s}$) should be determined around a krill individual and the neighbors should be found.

The sensing distance for each krill individual can be determined by using different heuristic methods. Here, the sensing distance is determined by using the following formula for each iteration:

$${d}_{s,i}=\frac{1}{5N}{\displaystyle \sum _{j=1}^{N}\Vert {X}_{i}-{X}_{j}\Vert}$$

where ${d}_{s,i}$ is the sensing distance for the ith krill individual, N is the number of krill individuals, and ${X}_{i}$ represents the position of the ith krill. If the distance between ${X}_{i}$ and ${X}_{j}$ is less than the defined sensing distance (${d}_{s,i}$), ${X}_{j}$ is a neighbor of ${X}_{i}$.
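The neighbor rule can be illustrated with a minimal sketch, assuming positions are stored as plain tuples of coordinates. The function names `sensing_distances` and `neighbors` are hypothetical and do not come from the paper.

```python
import math

def sensing_distances(positions):
    """d_{s,i} = (1 / (5N)) * sum_j ||X_i - X_j|| for every krill i."""
    n = len(positions)
    return [
        sum(math.dist(xi, xj) for xj in positions) / (5 * n)
        for xi in positions
    ]

def neighbors(positions, i, d_s_i):
    """Indices j != i whose distance to krill i is below its sensing distance."""
    return [
        j for j, xj in enumerate(positions)
        if j != i and math.dist(positions[i], xj) < d_s_i
    ]

# Three krill on a line: two close together, one far away.
positions = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)]
d_s = sensing_distances(positions)
print(d_s[0], neighbors(positions, 0, d_s[0]))
```

Because the sensing distance is one fifth of the mean distance to the whole herd, only krill that are much closer than average count as neighbors.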

(2) Foraging motion

This movement is intended to comply with two criteria. The first is food location, and the second is previous experience about the food location. For the ith krill, the foraging motion can be expressed as:

$${F}_{i}={V}_{f}{\beta}_{i}+{\omega}_{f}{F}_{i}^{old}$$

where:

$${\beta}_{i}={\beta}_{i}^{food}+{\beta}_{i}^{best}$$

${V}_{f}$ is the foraging speed, ${\omega}_{f}$ is the inertia weight of the foraging motion in the range [0, 1], ${\beta}_{i}^{food}$ is the food attraction, and ${\beta}_{i}^{best}$ is the effect of the best fitness of the ith krill so far. According to measured values of the foraging speed, ${V}_{f}$ is taken as 0.02 m s$^{-1}$ in [19].

Food effect is defined in terms of its location. The center of food should be found and then formulated for food attraction. This solution cannot be determined, but can be estimated. In this study, the virtual center of food concentration is estimated according to the fitness distribution of krill individuals, which is inspired by the "center of mass" concept. The center of food for each iteration is formulated as:

$${X}^{food}=\frac{{\displaystyle {\sum}_{i=1}^{N}\frac{1}{{K}_{i}}{X}_{i}}}{{\displaystyle {\sum}_{i=1}^{N}\frac{1}{{K}_{i}}}}$$

where ${K}_{i}$ is the objective function value of the ith krill individual.
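As a small illustration of the "center of mass" idea, the following sketch computes the food center as a weighted mean of positions with weights $1/{K}_{i}$. It assumes a minimization problem with strictly positive objective values; the name `food_center` is hypothetical.

```python
def food_center(positions, fitness):
    """X_food = sum_i (X_i / K_i) / sum_i (1 / K_i)."""
    weights = [1.0 / k for k in fitness]  # assumes every K_i is non-zero
    total = sum(weights)
    dim = len(positions[0])
    return [
        sum(w * x[m] for w, x in zip(weights, positions)) / total
        for m in range(dim)
    ]

positions = [(0.0, 0.0), (4.0, 2.0)]
fitness = [1.0, 3.0]  # the first krill has the lower (better) objective value
center = food_center(positions, fitness)
print(center)
```

The first krill carries three times the weight of the second, so the estimated food center lies closer to it than the plain midpoint would.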

(3) Physical diffusion

The physical diffusion of the krill individuals is considered a random process. This motion can be expressed in terms of a maximum diffusion speed and a random directional vector. The formula is as follows:

$${D}_{i}={D}^{\mathrm{max}}\left(1-\frac{I}{{I}_{\mathrm{max}}}\right)\delta $$

where ${D}^{\mathrm{max}}$ is the maximum diffusion speed, and $\delta $ is the random directional vector whose components are random values between −1 and 1. $I$ is the current iteration number and ${I}_{\mathrm{max}}$ is the maximum number of iterations.

(4) Motion process of KHA

Defined motions regularly change the krill position toward the best fitness. The foraging motion and motion induced by other krill individuals contain two local (${\alpha}_{i}^{local},{\beta}_{i}^{best}$) and two global strategies (${\alpha}_{i}^{target},{\beta}_{i}^{food}$), which work simultaneously and create a powerful algorithm. Using diverse operative parameters of the motion over time, the position vector of a krill individual during the interval $t$ to $t+\Delta t$ is expressed by the following equation:

$${X}_{i}(t+\Delta t)={X}_{i}(t)+\Delta t\frac{d{X}_{i}}{dt}$$

where ${X}_{i}(t+\Delta t)$ represents the updated krill individual position, and ${X}_{i}(t)$ represents the current position. Note that $\Delta t$ is considered the most important constant and should be tuned carefully based on the optimization problem, because this parameter works as a scale factor of the speed vector. $\Delta t$ can be obtained from the following formula:

$$\Delta t={C}_{t}{\displaystyle \sum _{j=1}^{NV}\left(U{B}_{j}-L{B}_{j}\right)}$$

where $NV$ is the total number of variables, and $L{B}_{j}$ and $U{B}_{j}$ are the lower and upper bounds of the jth variable ($j=1,2,\dots ,NV$), respectively. Therefore, the absolute value of their difference gives the extent of the search space. It is empirically found that ${C}_{t}$ is a constant in the range [0, 2]. Low values of ${C}_{t}$ let the krill individuals search the space carefully.
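The diffusion term, the time interval, and the position update can be sketched together as follows. This is a minimal illustration under the assumption that the induced and foraging motions have already been computed; all function names are hypothetical.

```python
import random

def diffusion(d_max, iteration, max_iterations, dim, rng):
    """D_i = D_max * (1 - I / I_max) * delta, with delta uniform in [-1, 1]."""
    scale = d_max * (1.0 - iteration / max_iterations)
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(dim)]

def time_interval(c_t, lower, upper):
    """Delta t = C_t * sum_j (UB_j - LB_j)."""
    return c_t * sum(ub - lb for lb, ub in zip(lower, upper))

def update_position(x, induced, foraging, diffuse, dt):
    """X_i(t + dt) = X_i(t) + dt * (N_i + F_i + D_i), component-wise."""
    return [
        xm + dt * (n + f + d)
        for xm, n, f, d in zip(x, induced, foraging, diffuse)
    ]

rng = random.Random(0)
d_i = diffusion(0.005, iteration=50, max_iterations=100, dim=2, rng=rng)
dt = time_interval(0.5, lower=[-1.0, -2.0], upper=[1.0, 2.0])
new_x = update_position([0.0, 0.0], [0.01, 0.0], [0.02, 0.0], [0.0, 0.005], dt)
print(dt, new_x)
```

Because $\Delta t$ multiplies the whole speed vector, a smaller ${C}_{t}$ shrinks every step by the same factor, which is why low values produce a finer search.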

(5) Genetic operators

The crossover operation uses a binomial crossover scheme to update the mth component of the ith krill by the following formula:

$${X}_{i,m}=\{\begin{array}{cc}\hfill {\mathrm{x}}_{\mathrm{r},\mathrm{m}}\hfill & {rand}_{i,m}<{C}_{r}\hfill \\ \hfill {\mathrm{x}}_{\mathrm{i},\mathrm{m}}\hfill & else\hfill \end{array}$$

$$Cr=0.2{\widehat{K}}_{i,best}$$

where Cr is the crossover probability, which lies between 0 and 1, and $r\in \{1,2,\dots ,i-1,i+1,\dots ,N\}$. Mutation is controlled by the mutation probability (${M}_{u}$). The adaptive mutation scheme used is formulated as:

$${X}_{i,m}=\{\begin{array}{cc}{\mathrm{x}}_{\mathrm{gbest},\mathrm{m}}+\mathsf{\mu}({\mathrm{x}}_{\mathrm{p},\mathrm{m}}-{\mathrm{x}}_{\mathrm{q},\mathrm{m}})\hfill & {rand}_{i,m}<Mu\hfill \\ {\mathrm{x}}_{\mathrm{i},\mathrm{m}}\hfill & else\hfill \end{array}$$

$$Mu=0.05/{\widehat{K}}_{i,best}$$

where $p,q\in \left\{1,2,\dots ,i-1,i+1,\dots ,N\right\}$ and $\mu $ is a number between 0 and 1. In ${\widehat{K}}_{i,best}$, the numerator is ${K}_{i}-{K}_{best}$. Based on this new mutation probability, the mutation probability for the global best is equal to zero, and it increases as fitness decreases.
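The two genetic operators can be sketched as below. Two points are assumptions rather than statements from the text: the denominator of ${\widehat{K}}_{i,best}$ is taken as ${K}_{worst}-{K}_{best}$, and the mutation probability is set to zero when ${\widehat{K}}_{i,best}=0$, following the remark about the global best. All function names are hypothetical.

```python
import random

def normalized_fitness(k_i, k_best, k_worst):
    """K_hat_{i,best}; the denominator (K_worst - K_best) is an assumption."""
    span = k_worst - k_best
    return (k_i - k_best) / span if span > 0 else 0.0

def operator_probabilities(k_hat):
    """Cr = 0.2 * K_hat and Mu = 0.05 / K_hat (Mu set to 0 for the global best)."""
    cr = 0.2 * k_hat
    mu_prob = 0.05 / k_hat if k_hat > 0 else 0.0
    return cr, mu_prob

def crossover(x_i, x_r, cr, rng):
    """Binomial crossover: take component m from x_r when rand < Cr."""
    return [xr if rng.random() < cr else xi for xi, xr in zip(x_i, x_r)]

def mutate(x_i, x_gbest, x_p, x_q, mu_prob, mu, rng):
    """Replace component m by x_gbest + mu * (x_p - x_q) when rand < Mu."""
    return [
        g + mu * (p - q) if rng.random() < mu_prob else xi
        for xi, g, p, q in zip(x_i, x_gbest, x_p, x_q)
    ]

rng = random.Random(1)
k_hat = normalized_fitness(k_i=2.0, k_best=1.0, k_worst=5.0)
cr, mu_prob = operator_probabilities(k_hat)
child = crossover([1.0, 2.0], [9.0, 9.0], cr, rng)
print(k_hat, cr, mu_prob, child)
```

In this sketch the ratio 0.05 / K_hat can exceed 1 for individuals very close to the best, in which case every component is mutated.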

## 3. Improved KHA

The KHA algorithm considers various motion characteristics of individual krill, as well as global exploration and local exploitation ability. Simulations and experiments [19] show that it performs better than the majority of swarm intelligence algorithms. However, recent studies show that KHA has excellent local exploitation ability but weaker global exploration ability, especially in high-dimensional multimodal function optimization [29], because the algorithm cannot always converge rapidly. To address this problem, selection and crossover operators were added to the basic KHA in [29], and [30] used a local search to explore around the solution obtained by KHA. Inspired by these developments, we propose the improved KHA (IKHA), based on a modified mutation scheme and a new update mechanism.

The main ideas of IKHA are as follows: First, we sort the individuals of each generation by fitness value in ascending order. The first part consists of individuals with good fitness (those with fitness values among the top 10%, excluding the global best), and the remaining individuals form the second part. For the first part, which we call sub-optimal individuals, the fitness value is close to, but worse than, that of the optimal individual. In the original optimization process, these individuals do not have much effect. Another noteworthy point, based on Equation (14) in the previous section, is that the mutation probability (${M}_{u}$) for the global best is equal to zero and increases with decreasing fitness. In other words, the smaller the fitness value, the higher the probability of mutation. Thus, we can improve the mutation mechanism to make use of these individuals and allow them to find potential solutions in the vicinity of the optimal solution.

For the first part, the sub-optimal individuals, we use one of the individual's own neighbors ${x}_{a}$ (a neighbor of ${x}_{i}$) in the mutation scheme instead of the original stochastic selection of ${x}_{p},{x}_{q}$. The operation follows the formula below, where SN denotes the set of sub-optimal individuals and ${\mu}_{nn}$ is a number between 0 and 1:

$${X}_{i,m}=\{\begin{array}{cc}{x}_{i,m}+{\mu}_{nn}({x}_{a,m}-{x}_{i,m})\hfill & \hfill {x}_{i}\in SN,{rand}_{i,m}\le Mu\hfill \\ {x}_{i,m}\hfill & \hfill {x}_{i}\in SN,{rand}_{i,m}>Mu\hfill \end{array}$$

For the second part of the individuals, we only need to use good individuals to guide them toward a better direction of evolution. Therefore, we chose sub-optimal individuals to drive the mutation scheme. The specific formula is as follows:

$${X}_{i,m}=\{\begin{array}{cc}{x}_{gbest,m}+\mu ({x}_{b,m}-{x}_{c,m})\hfill & \hfill {x}_{i}\notin SN,{rand}_{i,m}\le Mu\hfill \\ {x}_{i,m}\hfill & \hfill {x}_{i}\notin SN,{rand}_{i,m}>Mu\hfill \end{array}$$

where ${x}_{b},{x}_{c}\in SN$ are sub-optimal individuals.
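The two-part mutation can be sketched as follows. The split keeps at least two sub-optimal individuals so that ${x}_{b}$ and ${x}_{c}$ can differ; that floor, the way the caller picks ${x}_{a}$, ${x}_{b}$, and ${x}_{c}$, and all function names are assumptions made for illustration.

```python
import math
import random

def split_population(fitness, fraction=0.1):
    """Return (global best index, SN indices, remaining indices), best first."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    top = max(3, math.ceil(fraction * len(fitness)))  # assumed floor of 2 SN members
    return order[0], order[1:top], order[top:]

def ikha_mutate(x_i, in_sn, x_a, x_gbest, x_b, x_c, mu_prob, mu_nn, mu, rng):
    """Apply the SN rule (move toward neighbor x_a) or the non-SN rule."""
    out = []
    for m, xim in enumerate(x_i):
        if rng.random() > mu_prob:      # rand > Mu: component unchanged
            out.append(xim)
        elif in_sn:                     # sub-optimal individual
            out.append(xim + mu_nn * (x_a[m] - xim))
        else:                           # remaining individuals
            out.append(x_gbest[m] + mu * (x_b[m] - x_c[m]))
    return out

fitness = [5.0, 1.0, 3.0, 2.0, 4.0, 9.0, 8.0, 7.0, 6.0, 10.0]
gbest, sn, rest = split_population(fitness)
print(gbest, sn, rest)
```

With ten individuals the top 10% would contain only the global best, so the assumed floor places the next two individuals in SN.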

Beyond the modified mutation mechanism, an update operator is added in our approach. After many iterations, KHA tends to stagnate. To avoid premature convergence in the early run phase, we added an update mechanism to escape the local extremum. In our approach, a parameter, the maximum number of stalls (${S}_{\mathrm{max}}$), is added. If ${K}_{gbest}$ (the fitness value of the global best individual of the population) remains unchanged and $nu{m}_{samebest}$ (the number of iterations without change) is greater than ${S}_{\mathrm{max}}$, the update formulas are as follows:

$${X}_{best}^{new1}={X}_{best}+{\nu}_{best}({X}_{best}-{X}_{\overline{SN}}),\quad \text{if } nu{m}_{samebest}>{S}_{\mathrm{max}}$$

$${X}_{best}^{new2}={X}_{best}-{\nu}_{best}({X}_{best}-{X}_{\overline{SN}}),\quad \text{if } nu{m}_{samebest}>{S}_{\mathrm{max}}$$

where ${X}_{\overline{SN}}$ is the average position of the SN, and ${\nu}_{best}$ is a number between 0 and 1. If the fitness value of ${X}_{best}^{new1}$ or ${X}_{best}^{new2}$ is less than ${K}_{gbest}$, we replace the old position with the new position. ${S}_{\mathrm{max}}$ is a positive integer that decreases as the iteration number increases, and is defined as follows:

$${S}_{\mathrm{max}}=\lceil {s}_{\mathrm{max}}(1-\frac{I}{{I}_{\mathrm{max}}})\rceil $$
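A minimal sketch of the stall limit and the two candidate positions is given below. Keeping the better of the two candidates when both improve on ${K}_{gbest}$ is an assumption, as are the function names and the sum-of-squares objective used in the example.

```python
import math

def stall_limit(s_max, iteration, max_iterations):
    """S_max = ceil(s_max * (1 - I / I_max))."""
    return math.ceil(s_max * (1.0 - iteration / max_iterations))

def try_escape(x_best, k_best, sn_positions, nu_best, objective):
    """Evaluate X_best +/- nu_best * (X_best - mean(SN)); keep any improvement."""
    dim = len(x_best)
    mean_sn = [
        sum(p[m] for p in sn_positions) / len(sn_positions) for m in range(dim)
    ]
    step = [nu_best * (xb - ms) for xb, ms in zip(x_best, mean_sn)]
    candidates = [
        [xb + s for xb, s in zip(x_best, step)],  # X_best^{new1}
        [xb - s for xb, s in zip(x_best, step)],  # X_best^{new2}
    ]
    for cand in candidates:
        k = objective(cand)
        if k < k_best:  # minimization: a smaller value is better
            x_best, k_best = cand, k
    return x_best, k_best

def sphere(x):
    """Toy objective used only for this example."""
    return sum(v * v for v in x)

x_new, k_new = try_escape([1.0, 1.0], 2.0, [[2.0, 2.0], [4.0, 4.0]], 0.5, sphere)
print(stall_limit(5, 50, 100), x_new, k_new)
```

The two candidates lie on opposite sides of the current best along the line through the SN average, so at least one of them moves away from the region the sub-optimal individuals already cover.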

In IKHA, the optimized mutation scheme abandons the original randomly selected individuals for mutation and uses different mutations for individuals with different fitness values. With such a divide-and-rule strategy, we take full advantage of all individuals, as opposed to KHA. For example, sub-optimal individuals can be used to find potentially better values, thereby preventing the algorithm from falling into a local optimum. For the remaining individuals, excellent individuals can guide them, thereby speeding up optimization. The purpose of the update operation is to find a way to escape from the local solution in the later run phase of the process.

The time complexity of IKHA is the same as that of KHA, and the analysis is as follows: In KHA, for each krill in an iteration, the time complexity of calculating the sensing distance ${d}_{s,i}$ is ${\rm O}\left(N\right)$, so the time complexity of KHA is ${\rm O}({I}_{\mathrm{max}}\cdot {N}^{2})$. In IKHA, the added update operation follows Equations (17) and (18), and its time complexity is $O\left(1\right)$. Moreover, the improved mutation mechanism requires sorting the individuals by fitness value; we use the quick sort algorithm, whose time complexity is ${\rm O}(N\mathrm{log}N)$ in the average case and ${\rm O}({N}^{2})$ in the worst case. Because only one sorting operation is added per generation, the time complexity of IKHA is still ${\rm O}({I}_{\mathrm{max}}\cdot {N}^{2})$.

To test IKHA further, we conducted the following experiments by using the Ackley function [31]. The Ackley function is defined as follows and its graph is shown in Figure 1:

$$f(x)=-20\mathrm{exp}\left(-0.2\sqrt{\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{x}_{i}^{2}}}\right)-\mathrm{exp}\left(\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\mathrm{cos}\left(2\pi {x}_{i}\right)}\right)+20+e$$
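For reference, the standard form of the Ackley function can be written in a few lines. This sketch uses the usual constants (20, 0.2, and $2\pi$) and is not taken from the authors' implementation.

```python
import math

def ackley(x):
    """Standard Ackley function; its global minimum is at the origin."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return (
        -20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
        - math.exp(mean_cos)
        + 20.0
        + math.e
    )

print(ackley([0.0, 0.0]), ackley([1.0, 1.0]))
```

The cosine term creates many regularly spaced local minima around the single global minimum, which is what makes the function a common test for escaping local optima.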

The convergence graphs for the Ackley function are drawn in Figure 2. In our experiment, the number of iterations is set to 100, the population size is 50, and the results are obtained after 50 trials. For KHA and the proposed IKHA, we set the same parameters: ${N}^{\mathrm{max}}$ = 0.01, ${V}_{f}$ = 0.02, and ${D}^{\mathrm{max}}$ = 0.005. In IKHA, ${s}_{\mathrm{max}}$ = 5, and ${\nu}_{best}$ = 0.5 at the beginning and linearly decreased to 0.1 at the end [32,33]. Both IKHA and KHA converged quickly in the early run phase, but IKHA converged faster than KHA. In the later run phase, KHA began to stagnate after rapid convergence, whereas IKHA continued to find better values. Thus, IKHA can converge quickly in the early iterations and jump out of the local optimum to find a better solution.

## 4. Clustering Algorithms with IKHA

#### 4.1. Basic Idea of Clustering

Data clustering, which is an NP-complete problem, partitions heterogeneous data into groups by minimizing some measure of dissimilarity. Given $Dataset=\{dat{a}_{1},dat{a}_{2},\dots ,dat{a}_{n}\}$, where n is the total number of data objects, clustering aims to divide the whole dataset into K clusters (K ≤ n) such that the data objects of the same cluster are similar according to the similarity criterion. The similarity measure uses Euclidean distance:

$$dis\left(dat{a}_{i},dat{a}_{j}\right)=\sqrt{{\displaystyle \sum _{d=1}^{D}{(dat{a}_{i,d}-dat{a}_{j,d})}^{2}}}$$

where $i,j\in \left\{1,2,\dots ,n\right\}$, $dat{a}_{i,d}$ is the dth attribute of the ith datum in ${\Re}^{D}$, $dis\left(dat{a}_{i},dat{a}_{j}\right)$ denotes the distance between $dat{a}_{i}$ and $dat{a}_{j}$, and $D$ is the number of attributes of each data object.

#### 4.2. Clustering Based on IKHA

Clustering searches for an optimal partition according to appropriate indexes; in essence, it is an optimization process. The key is to find a way to combine the optimization algorithm IKHA with clustering. By representing each krill in IKHA as a clustering scheme, we find the optimal clustering scheme by choosing an appropriate objective function. A clustering scheme can be expressed by all of its clustering centers. That is, every krill ${X}_{i}$ represents K clustering centers:

$${X}_{i}=\{{C}_{1},{C}_{2},\dots ,{C}_{k},\dots {C}_{K-1},{C}_{K}\}$$

$${C}_{k}=\left\{\begin{array}{c}{C}_{k}^{1}\\ {C}_{k}^{2}\\ \vdots \\ {C}_{k}^{d}\\ \vdots \\ {C}_{k}^{D-1}\\ {C}_{k}^{D}\end{array}\right\}$$

where D denotes the number of parameters of the data that will be clustered, and ${C}_{k}^{1}$ represents the first parameter of the kth cluster center. Each krill individual can be expressed as the following matrix:

$${X}_{i}=\left\{\begin{array}{ccccccc}{C}_{1}^{1}& {C}_{2}^{1}& \cdots & {C}_{k}^{1}& \cdots & {C}_{K-1}^{1}& {C}_{K}^{1}\\ {C}_{1}^{2}& {C}_{2}^{2}& \cdots & {C}_{k}^{2}& \cdots & {C}_{K-1}^{2}& {C}_{K}^{2}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {C}_{1}^{d}& {C}_{2}^{d}& \cdots & {C}_{k}^{d}& \cdots & {C}_{K-1}^{d}& {C}_{K}^{d}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {C}_{1}^{D-1}& {C}_{2}^{D-1}& \cdots & {C}_{k}^{D-1}& \cdots & {C}_{K-1}^{D-1}& {C}_{K}^{D-1}\\ {C}_{1}^{D}& {C}_{2}^{D}& \cdots & {C}_{k}^{D}& \cdots & {C}_{K-1}^{D}& {C}_{K}^{D}\end{array}\right\}$$

In this study, one krill is used to represent a candidate solution to a problem, and the selected K initial cluster centers are potential solutions. One krill and K initial clustering centers play similar roles in our algorithm. Thus, the mapping between a krill individual and K initial clustering centers can be established. In the coding method of the krill location structure, a set of initial cluster centers is generated randomly from the dataset points.

The whole krill population represents a variety of clustering schemes. In this manner, our aim is to find the optimal clustering centers. According to the principle of the minimal distance, data are categorized into the appropriate cluster. The description of the improved krill-herd clustering algorithm (IKHCA) is shown in Algorithm 1.

**Algorithm 1.** Improved Krill Herd Clustering Algorithm (IKHCA)

(1) Define the parameters (K, I_{max}, N, N^{max}, V_{f}, D^{max}, and so on).

(2) Initialize N krill randomly as the initial clustering centers.

(3) Evaluate each krill individual by the fitness function.

(4) For each krill individual:

- Perform three motions (motion induced by other individuals, foraging motion, and physical diffusion).
- Then, implement the crossover operator and the modified mutation operator (two mutation schemes are performed for individuals with different fitness levels).
- Calculate the fitness according to the krill's new position; if the new fitness is better than the old one, update the krill individual's position in the search space.

(6) Repeat Steps 4 and 5 until the stopping criteria are satisfied.

(7) Return the best clustering solution.
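The encoding and the minimum-distance assignment used by the algorithm can be illustrated with a short sketch. It assumes a krill is stored as a list of K centers, each a list of D coordinates; the names `init_krill` and `assign` are hypothetical.

```python
import math
import random

def init_krill(data, k, rng):
    """One krill = K cluster centers drawn at random from the data points."""
    return [list(p) for p in rng.sample(data, k)]

def assign(data, centers):
    """Index of the nearest center (Euclidean distance) for every data point."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in data
    ]

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
krill = init_krill(data, k=2, rng=random.Random(0))
labels = assign(data, [[0.0, 0.5], [10.0, 10.5]])
print(krill, labels)
```

Storing each krill as K × D numbers means the motion and genetic operators can treat it as an ordinary position vector, while `assign` decodes it back into a clustering.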

## 5. Simulation and Experiment

To investigate the performance of IKHCA, we compared it with five clustering algorithms, namely, K-means [34], ACO [35], PSO [36], KHCA I from [30], and KHCA II. KHCA II is a clustering algorithm based on KHA [19]. Five datasets obtained from the UCI Machine Learning Repository [37] were used in our experiment. The details of the datasets, including the name and the number of classes, attributes, and records, are presented in Table 1. Our experiments were conducted in Eclipse 4.6.0 on Windows 7, using an Intel Core i7 at 3.40 GHz with 4 GB RAM.

Before the experiment, the parameter settings and the objective function of KHCA II and IKHCA were specified. In KHCA II and IKHCA, we used the sum of squared errors (${I}_{SSE}$) directly as the objective function; the formula is given in Equation (25). The lower the value of ${I}_{SSE}$, the higher the quality of the clustering:

$${I}_{SSE}={\displaystyle \sum _{k=1}^{K}{\displaystyle \sum _{i=1}^{n}{\omega}_{ik}}}{\left(dis(dat{a}_{i},{C}_{k})\right)}^{2}$$

$${\omega}_{ik}=\left\{\begin{array}{ll}1,& \text{if } dat{a}_{i} \text{ belongs to cluster } k\\ 0,& \text{otherwise}\end{array}\right.$$
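The objective can be computed directly once every data point has a cluster label. The sketch below assumes labels are given as a list of cluster indices, which plays the role of ${\omega}_{ik}$; the function name `sse` is hypothetical.

```python
import math

def sse(data, centers, labels):
    """I_SSE: sum of squared distances from each point to its own cluster center."""
    return sum(math.dist(p, centers[k]) ** 2 for p, k in zip(data, labels))

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [[0.0, 0.5], [10.0, 10.5]]
labels = [0, 0, 1, 1]
print(sse(data, centers, labels))
```

Each of the four points lies 0.5 away from its center, so every term contributes 0.25 to the sum.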

The parameters were set as follows:

- ${N}^{\mathrm{max}}$ = 0.01;
- ${V}_{f}$ = 0.02; and
- ${D}^{\mathrm{max}}$ = 0.005.

Here, ${C}_{t}$ is set to 0.5, and the inertia weights $({\omega}_{n},{\omega}_{f})$ are equal to 0.9 at the beginning of the search and linearly decreased to 0.1 at the end to encourage exploitation. The size of the population is set to 25, ${s}_{\mathrm{max}}=5$, and ${\nu}_{best}$ = 0.5 at the beginning and linearly decreased to 0.1 at the end in IKHCA.

We compared the performance of different clustering algorithms from two aspects. First, we compared the objective function values of the different clustering algorithms in Table 2, and then we compared the accuracy of the different clustering algorithms in Table 3. Accuracy is specifically expressed as follows:

$$\mathrm{accuracy}=\left(\frac{\mathrm{number}\text{}\mathrm{of}\text{}\mathrm{correctly}\text{}\mathrm{placed}\text{}\mathrm{data}}{\mathrm{total}\text{}\mathrm{number}\text{}\mathrm{of}\text{}\mathrm{data}}\right)\times 100$$
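The text does not state how a data object is judged to be correctly placed. One common convention, assumed here purely for illustration, labels each cluster with its majority class and counts the members that match it. The function name `accuracy` and the toy labels are hypothetical.

```python
from collections import Counter

def accuracy(true_labels, cluster_labels):
    """Percentage of points whose class equals the majority class of their cluster."""
    correct = 0
    for c in set(cluster_labels):
        members = [t for t, k in zip(true_labels, cluster_labels) if k == c]
        correct += Counter(members).most_common(1)[0][1]  # size of majority class
    return 100.0 * correct / len(true_labels)

true_labels = ["a", "a", "a", "b", "b"]
cluster_labels = [0, 0, 1, 1, 1]
print(accuracy(true_labels, cluster_labels))
```

Under this convention the result does not depend on how clusters are numbered, which matters because a clustering algorithm has no access to the class names.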

Table 2 lists the best, worst, and mean solutions, and ranks the algorithms based on the mean values for all datasets in Table 1. The results of the compared algorithms are taken directly from [30]. KHCA II and IKHCA were each executed 100 times independently with the parameters described in this paper, except that the maximum number of generations was set to 200. As shown in Table 2, IKHCA obtained better best and worst solutions than the other algorithms on the Wine, Glass, Cancer, and CMC datasets, but not on Iris. KHCA II obtained the best solution on the Iris dataset; however, its worst solution on Iris was poor. IKHCA achieved the best mean values on all datasets except Glass, where its result is very close to that of KHCA II. These results indicate that our proposed algorithm achieves better solutions with improved stability in a limited number of iterations. IKHCA ranked first among all algorithms.

Table 3 gives the clustering accuracies of IKHCA and the other clustering algorithms; part of the results were obtained directly from [30], and bold font indicates the best results. The last three clustering algorithms (KHCA I, KHCA II, and IKHCA), which use KHA, are clearly better than the K-means, ACO, and PSO algorithms. This shows that introducing KHA into the clustering problem is reasonable and effective. Based on these results, IKHCA is shown to be the best algorithm with respect to both objective function value and accuracy.

## 6. Conclusions and Future Work

KHA is a good swarm intelligence heuristic algorithm that can be applied to real-world problems. Because the original KHA cannot always converge rapidly or search globally particularly well, we proposed IKHA, which improves the original mutation mechanism by providing two different mutation schemes and introduces an update mechanism. In IKHA, mutation schemes are set according to the fitness of the individuals: outstanding individuals look for better solutions, and the rest move closer to the good individuals. Then, through the update mechanism, the optimal individual looks for potential solutions in the surrounding space to avoid being stuck in a local optimum. Experimental results showed that IKHA performed better than KHA.

Several clustering algorithms depend highly on the initial states and always converge to the nearest local optimum from the starting position of the search. To find the optimal clustering centers, we applied IKHA to an actual clustering problem and proposed the improved krill-herd clustering algorithm (IKHCA). In the experiments, IKHCA was more effective than other well-known clustering approaches. Moreover, the results show that IKHA can be successfully introduced into clustering problems and performs best on almost all experimental datasets. In the future, several issues can be studied further, such as using the optimization ability of IKHA to find the optimal number of clusters and applying IKHA to other scenarios to solve a wide range of real-world problems.

## Acknowledgments

This work was supported by the National Natural Science Foundation of China (U1431227); and the Guangzhou Scientific and Technological Project (201604010037).

## Author Contributions

All of the authors contributed to the content of this paper. Qin Li participated in the algorithm analyses, design, algorithm implementation and draft preparation. Bo Liu analyzed the experimental data and revised this paper. All authors read and approved the manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

Name | Number of Clusters | Number of Parameters | Number of Elements
---|---|---|---
Iris | 3 | 4 | 150
Wine | 3 | 13 | 178
Glass | 6 | 9 | 214
Cancer | 2 | 9 | 683
CMC | 3 | 10 | 1473

Data Set | Criteria | K-means | ACO | PSO | KHCA I | KHCA II | IKHCA
---|---|---|---|---|---|---|---
Iris | Best | 98.5 | 97.4 | 97.1 | 96.4 | 96.65 | 96.66
Iris | Worst | 117.4 | 99.2 | 100.5 | 103.1 | 97.68 | 96.67
Iris | Mean | 104.7 | 97.8 | 98.8 | 98.6 | 96.67 | 96.66
Iris | Rank | 6 | 3 | 5 | 4 | 2 | 1
Wine | Best | 16,562.6 | 16,510.3 | 16,336.4 | 16,328.1 | 16,293.01 | 16,292.12
Wine | Worst | 17,995.9 | 16,535.8 | 16,426.4 | 16,430.9 | 17,710.16 | 16,589.23
Wine | Mean | 17,101.5 | 16,528.5 | 16,396.3 | 16,384.2 | 16,490.17 | 16,305.51
Wine | Rank | 6 | 5 | 3 | 2 | 4 | 1
Glass | Best | 225.3 | 219.8 | 271.1 | 216.2 | 210.85 | 210.30
Glass | Worst | 263.1 | 258.1 | 286.8 | 255.9 | 246.16 | 223.03
Glass | Mean | 248.2 | 241.3 | 279.3 | 238.3 | 215.86 | 215.90
Glass | Rank | 5 | 4 | 6 | 3 | 1 | 2
Cancer | Best | 2994.9 | 2966.6 | 2974.4 | 2945.8 | 2964.39 | 2964.39
Cancer | Worst | 3651.5 | 3098.9 | 3289.1 | 3088.8 | 3571.53 | 2971.15
Cancer | Mean | 3131.3 | 2984.9 | 3102.8 | 2981.4 | 2995.87 | 2968.16
Cancer | Rank | 6 | 3 | 5 | 2 | 4 | 1
CMC | Best | 5891.3 | 5721.8 | 5795.5 | 5711.2 | 5700.16 | 5692.20
CMC | Worst | 5989.4 | 5836 | 5866 | 5821.3 | 5791.52 | 5695.02
CMC | Mean | 5945.1 | 5773 | 5823.1 | 5759.4 | 5760.29 | 5694.91
CMC | Rank | 6 | 4 | 5 | 2 | 3 | 1
Mean Rank | | 5.8 | 3.8 | 4.8 | 2.6 | 2.8 | 1.2
Final Rank | | 6 | 4 | 5 | 2 | 3 | 1
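
The Mean Rank and Final Rank rows of the comparison table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the per-dataset ranks are copied from the Rank rows of the table, while the procedure (average the five ranks, then order the algorithms by that average) and the helper names `mean_ranks` and `final_ranks` are assumptions rather than the authors' code. No ties occur in these data, so tie handling is left unspecified.

```python
# Per-dataset ranks copied from the "Rank" rows of the comparison table
# (order: Iris, Wine, Glass, Cancer, CMC).
ranks = {
    "K-means": [6, 6, 5, 6, 6],
    "ACO":     [3, 5, 4, 3, 4],
    "PSO":     [5, 3, 6, 5, 5],
    "KHCA I":  [4, 2, 3, 2, 2],
    "KHCA II": [2, 4, 1, 4, 3],
    "IKHCA":   [1, 1, 2, 1, 1],
}


def mean_ranks(ranks):
    """Average each algorithm's rank over all datasets (assumed procedure)."""
    return {name: sum(r) / len(r) for name, r in ranks.items()}


def final_ranks(means):
    """Order algorithms by mean rank; the lowest mean rank gets final rank 1."""
    ordered = sorted(means, key=means.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


means = mean_ranks(ranks)
final = final_ranks(means)

for name in ranks:
    print(f"{name:8s} mean rank = {means[name]:.1f}, final rank = {final[name]}")
```

Under these assumptions the computed values agree with the table: IKHCA has a mean rank of 1.2 and final rank 1, and K-means has a mean rank of 5.8 and final rank 6.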

Data Set | K-means | ACO | PSO | KHCA I | KHCA II | IKHCA
---|---|---|---|---|---|---
Iris | 83.3 | 88.5 | 87.7 | 89.07 | 89.67 | 90.67
Wine | 63.62 | 70.6 | 70.4 | 71.12 | 70.99 | 73.03
Glass | 60.8 | 64.7 | 56.65 | 64.98 | 65.01 | 65.88
Cancer | 93.37 | 94.1 | 94.62 | 95.01 | 95.02 | 95.16
CMC | 41.8 | 45.5 | 45.2 | 45.5 | 45.55 | 45.62

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).