An Information Entropy-Based Animal Migration Optimization Algorithm for Data Clustering

Data clustering is useful in a wide range of application areas. The Animal Migration Optimization (AMO) algorithm is a recently introduced swarm-based algorithm that has demonstrated good performance in solving numeric optimization problems. In this paper, we present a modified AMO algorithm with an entropy-based heuristic strategy for data clustering. The main contribution is that we calculate the information entropy of each attribute for a given data set and propose an adaptive strategy that automatically balances convergence speed and global search effort according to this entropy in both the migration and updating steps. A series of well-known benchmark clustering problems is employed to evaluate the performance of our approach. We compare experimental results with k-means, Artificial Bee Colony (ABC), AMO, and state-of-the-art clustering algorithms, and show that the proposed algorithm generally performs better than the compared algorithms on the considered clustering problems.


Introduction
The clustering problem is a basic research topic in data mining [1][2][3]; it is encountered in a number of academic and practical fields such as text document analysis, web data analysis, image processing, data compression, and bioinformatics. In recent years, an increasing number of publications have addressed the models and algorithms of data clustering [4][5][6][7], since the topic plays an important role in these fields. The task of clustering is to recognize natural groupings in multidimensional data based on certain similarity measures. For example, Euclidean distance is a measurement for evaluating similarities between clusters, and it is one of the most frequently used distances in clustering problems. Specifically, given N objects, one should allocate each object to one of K clusters with the aim of minimizing the sum of squared Euclidean distances between each object and the centroid of its cluster. Formally, the quantity to be minimized can be described as follows [8,9]:
$$\sum_{i=1}^{N} \sum_{j=1}^{K} w_{ij} \left\| x_i - z_j \right\|^2, \quad i = 1, 2, \ldots, N, \; j = 1, 2, \ldots, K, \tag{1}$$
where N is the number of patterns and K is the number of clusters; $x_i$ is the location of the i-th pattern, and $z_j$ is the center of the j-th cluster, which is obtained by Equation (2):
$$z_j = \frac{1}{N_j} \sum_{i=1}^{N} w_{ij} x_i, \tag{2}$$
where $N_j$ is the number of patterns in the j-th cluster, and $w_{ij}$ is the association weight of pattern $x_i$ to cluster j, i.e., $w_{ij}$ is 1 if pattern i belongs to cluster j and 0 otherwise.
Generally, the clustering problem is computationally difficult, namely NP-hard [10], so investigating efficient optimization algorithms to find better clusters is still an important task. The difficulty in designing and improving such algorithms lies in proposing effective optimization strategies and finding suitable values of control parameters. Over the last several decades, a wide variety of algorithms and improvements have been presented and analyzed, which are mainly divided into two kinds: hierarchical methods and partitional methods. The k-means algorithm [11] is a basic and well-known partitional method. It starts from k random positions (centroids) and iterates by updating these centroids until they no longer move. The algorithm aims to minimize an objective function, Equation (3), in which $x_i$ is the location of the i-th pattern and $c_k$ is the k-th centroid. This method is easy to handle, but it often converges to a local optimum, so the quality of results depends heavily on the initial positions.
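The sum of squared Euclidean distances described above can be illustrated with a minimal Python sketch. It assumes hard assignment, i.e., each pattern is associated with its nearest centroid, which is the choice of association weights that minimizes the sum for fixed centroids. The function name `clustering_cost` and the toy data are illustrative and do not come from the paper.

```python
def clustering_cost(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid.

    Assumption: each point is assigned to the closest centroid (weight 1),
    and to no other centroid (weight 0).
    """
    total = 0.0
    for x in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(x, z))
            for z in centroids
        )
    return total


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = clustering_cost(points, [(0.0, 1.0), (10.0, 1.0)])  # centroids at the pair means
bad = clustering_cost(points, [(5.0, 1.0), (5.0, 1.0)])    # both centroids in the middle
print(good, bad)
```

Placing the centroids at the means of the two pairs gives a much lower cost than placing both in the middle, which is the kind of difference an optimizer over centroid positions is meant to exploit.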
In order to overcome this issue, an alternative approach is to use swarm-based or metaheuristic-based optimization algorithms, of which the genetic algorithm [12][13][14], particle swarm optimization [15][16][17], and the Artificial Bee Colony (ABC) algorithm [18,19] are typical examples. Very recently, some new metaheuristic algorithms have been proposed, such as Monarch Butterfly Optimization (MBO) [20], Elephant Herding Optimization (EHO) [21], and the Animal Migration Optimization (AMO) algorithm [22]. Among these, the AMO algorithm, proposed by Li et al., is an efficient one [22]. Recent studies have shown that the AMO algorithm is good at solving many numeric optimization problems, and it performs well on benchmark instances. These metaheuristic-based algorithms have shown excellent performance on many optimization problems [23][24][25][26], and they often achieve good solutions compared to other heuristic algorithms. Metaheuristic-based algorithms have also been employed to deal with clustering problems [27][28][29][30][31][32], since such problems are naturally optimization problems.
In addition, information entropy is a good measure for data clustering, and it is often used as a heuristic. To cluster high-dimensional objects in subspaces and determine the importance of each dimension, Jing et al. [33] proposed a method that combines the weight entropy into the objective function to be minimized, and they also introduced an extra step to compute the contribution of each dimension to each cluster. Furthermore, information entropy has been used for determining the optimal number of clusters. Liang et al. [34] proposed an approach to measure within-cluster entropy and between-cluster entropy with the aim of effectively determining the number of clusters in a given data set. Cheung and Jia [35] investigated clustering on mixed data composed of numerical and categorical attributes, and proposed an iterative clustering algorithm. To analyze similarities between objects, they provided a method to estimate the significance of categorical attributes using information theory, and then showed that the algorithm can determine the number of clusters automatically without presetting control parameters.
Though several clustering algorithms employ information entropy to analyze similarities of clusters, few algorithms use entropy as heuristic information for optimization. In this paper, we propose an information entropy-based AMO algorithm for solving clustering problems. The key feature of the algorithm is that a new migration method as well as a new population updating method, both controlled by information entropy, are proposed. The original migration method is designed for common optimization problems, but clustering data possess their own distribution, and values of attributes always have cluster properties. The new migration method therefore employs entropy heuristics to control the searching direction of each attribute, and a similar strategy is incorporated into our population updating method. We perform intensive experiments on benchmark instances. Experimental results show that our new approach can find better clustering results than other recently presented approaches, and further indicate that our new approach accelerates the convergence of optimization.
The remainder of this paper is organized as follows: In Section 2, we briefly review the AMO algorithm, and the entropy-based AMO algorithm for clustering problems is introduced in Section 3. Benchmarks for evaluating algorithms and experimental results are given in Section 4. Finally, conclusions are provided in Section 5.

The AMO Algorithm
In this section, we briefly introduce the AMO algorithm. The AMO algorithm is a swarm-based algorithm inspired by the migration phenomenon of animals. In the algorithm, individuals are regarded as positions of animals, and positions are moved mainly by two operations: animal migration and population updating. The operation of animal migration simulates the behavior of animal groups moving from the current area to a new area. New positions of individuals are produced according to the direction of animal migration, where three migration rules are considered: individuals move in the same direction as their neighboring individuals; individuals remain near their neighboring individuals; and individuals avoid collisions with their neighboring individuals. Using the three migration rules, a probabilistic approach is introduced to yield new positions of individuals. The algorithm begins with a randomly initialized population, which comprises NP feature vectors with $D_x$ dimensions and can be stated as follows:
$$x_{i,j,0} = x_{j,\min} + rand_{i,j} \cdot \left( x_{j,\max} - x_{j,\min} \right), \tag{4}$$
where $x_{j,\max}$ and $x_{j,\min}$ are the maximum value and the minimum value of the j-th dimension, $x_{i,j,0}$ is the j-th dimension value of the i-th individual in the initial population, and $rand_{i,j}$ is a uniformly random number between 0 and 1, with $i = 1, \ldots, NP$ and $j = 1, \ldots, D_x$.
After producing the initial population, animal migration and population updating operations are performed iteratively. During the animal migration, the individuals are supposed to move to new positions according to the positions of their neighboring individuals, which can be described as follows:
$$x_{i,j,G+1} = x_{i,j,G} + \delta \cdot \left( x_{neighborhood,j,G} - x_{i,j,G} \right), \tag{5}$$
where $x_{i,j,G}$ is the j-th dimension value of the i-th individual in the current population G, and $x_{i,j,G+1}$ is the j-th dimension value of the i-th individual in the new population G + 1; $x_{neighborhood,j,G}$ is the j-th dimension value of the neighboring individual of $x_{i,j,G}$, which is defined using a ring topology scheme illustrated in Figure 1. In AMO, Li et al. [22] employ the four nearest individuals and the i-th individual itself for each dimension as its neighborhood and choose one individual in the neighborhood randomly as $x_{neighborhood,j,G}$. As an example, Figure 1 shows the neighborhood of the i-th individual. For the j-th dimension, $x_{neighborhood,j,G}$ is selected from the (i − 2)-th individual, the (i − 1)-th individual, the i-th individual, the (i + 1)-th individual, and the (i + 2)-th individual. $\delta$ is a random number drawn from the Gaussian distribution N(0, 1).
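The initialization and migration rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the reference AMO implementation: the ring is assumed to wrap around at both ends of the population, a neighbor and a Gaussian step are drawn independently for every dimension, and no bound handling is applied to migrated positions. The function names are hypothetical.

```python
import random


def init_population(np_size, lower, upper, rng):
    """Uniform initialization: x[i][j] = lower[j] + rand * (upper[j] - lower[j])."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(np_size)
    ]


def ring_neighbors(i, np_size):
    """Indices i-2 .. i+2 on a ring (assumption: indices wrap around)."""
    return [(i + k) % np_size for k in (-2, -1, 0, 1, 2)]


def migrate(population, rng):
    """Move every dimension relative to a randomly chosen ring neighbor."""
    np_size = len(population)
    new_pop = []
    for i, x in enumerate(population):
        child = []
        for j, x_ij in enumerate(x):
            nbr = population[rng.choice(ring_neighbors(i, np_size))]
            delta = rng.gauss(0.0, 1.0)  # Gaussian step, N(0, 1)
            child.append(x_ij + delta * (nbr[j] - x_ij))
        new_pop.append(child)
    return new_pop


rng = random.Random(0)
pop = init_population(6, [0.0, -1.0], [1.0, 1.0], rng)
new_pop = migrate(pop, rng)
```

Because the Gaussian step can be negative or larger than 1, an individual may move towards, away from, or past its neighbor, which is what keeps the population diverse.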
The population updating simulates how animals leave the group and new individuals join the population, as Equation (6) describes:
$$x_{i,j,G+1} = x_{r1,j,G} + rand_1 \cdot \left( x_{best,j,G} - x_{i,j,G} \right) + rand_2 \cdot \left( x_{r2,j,G} - x_{i,j,G} \right), \tag{6}$$
where $x_{r1,j,G}$ is the j-th dimension value of the individual to be updated, which is chosen randomly in the current population; moreover, different from $x_{i,j,G}$, $x_{r2,j,G}$ is the j-th dimension value of another random individual, and $x_{best,j,G}$ is the j-th dimension value of the best individual that has been found. $rand_1$ and $rand_2$ are two uniformly random numbers between 0 and 1. The algorithm assumes that the number of animals in the population remains unchanged. Therefore, in the updating, it replaces some of the animals with new individuals according to a probability $Pa_i$, which is related to the fitness of individuals and is given by Equation (7), where $Pa_i$ is the probability value of the i-th individual, NP is the number of individuals in the population, and $sn_i$ is the sequence number of the i-th individual after the individuals are sorted by fitness in descending order, with $i = 1, 2, \ldots, NP$. According to Equation (7), $Pa_i$ is 1 if the i-th individual has the best fitness, whereas $Pa_i$ is 1/NP if the i-th individual has the worst fitness.
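The population updating step can be sketched in Python as below. The update rule is an assumption: a new value is built from a random individual r1 plus uniformly random steps towards the best individual and towards a second random individual r2, which is the form used in the original AMO. The probabilities `pa` are taken as an input list rather than computed, and r1 and r2 are drawn once per individual; both choices are simplifications of this sketch.

```python
import random


def update_population(population, best, pa, rng):
    """One population-updating pass (requires at least three individuals).

    pa[i] is the retention probability of individual i (1 for the best,
    1/NP for the worst); a dimension is regenerated when rand > pa[i].
    """
    np_size = len(population)
    new_pop = []
    for i, x in enumerate(population):
        # Assumption: r1 and r2 are distinct from each other and from i.
        r1, r2 = rng.sample([k for k in range(np_size) if k != i], 2)
        child = []
        for j, x_ij in enumerate(x):
            if rng.random() > pa[i]:
                child.append(
                    population[r1][j]
                    + rng.random() * (best[j] - x_ij)
                    + rng.random() * (population[r2][j] - x_ij)
                )
            else:
                child.append(x_ij)  # dimension is kept unchanged
        new_pop.append(child)
    return new_pop


rng = random.Random(1)
pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
kept = update_population(pop, pop[0], [1.0] * 4, rng)     # pa = 1: nothing changes
changed = update_population(pop, pop[0], [0.0] * 4, rng)  # pa = 0: nearly every dimension regenerated
```

With a retention probability of 1 an individual is never altered, so the best individual survives each updating pass.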

Algorithm 1. Animal Migration Optimization (AMO) algorithm
begin
    set the generation counter $G = 0$, and randomly initialize NP individuals denoted as $X_i$ ($1 \le i \le NP$) with $D_x$ dimensions
    evaluate the fitness for each individual
    while stopping criteria is not satisfied do
        for i = 1 to NP do
            for j = 1 to $D_x$ do
                $x_{i,j,G+1} = x_{i,j,G} + \delta \cdot (x_{neighborhood,j,G} - x_{i,j,G})$
            end for
        end for
        for i = 1 to NP do
            evaluate the fitness of the offspring $X_{i,G+1}$; let $X_i = X_{i,G+1}$ if $X_{i,G+1}$ is better than $X_i$
        end for
        select r1 and r2 randomly ($r1 \ne r2 \ne i$)
        for i = 1 to NP do
            for j = 1 to $D_x$ do
                if rand > $Pa_i$ then
                    produce $x_{i,j,G+1}$ by Equation (6)
                end if
            end for
        end for
        for i = 1 to NP do
            evaluate the fitness of the offspring $X_{i,G+1}$; let $X_i = X_{i,G+1}$ if $X_{i,G+1}$ is better than $X_i$
        end for
        memorize the best solution achieved so far
    end while
end

For each individual and each dimension, a uniformly random number between 0 and 1, denoted by rand, is produced and compared with the probability to determine whether the individual is retained or replaced by a new individual. Therefore, individuals with better fitness are retained with higher probability in the next generation, while those with worse fitness are likely to be replaced by new individuals. Moreover, the animal with the best position is retained in the next generation. The entire AMO algorithm is described in Algorithm 1 [22].

The Information Entropy-Based AMO
In this section, we present a modified AMO algorithm. The original AMO algorithm is good at global searching and local searching, and can lead to a satisfactory solution for numeric optimization. However, the data clustering problem is quite different from the benchmarks of numeric optimization problems, since data in a clustering problem usually have their own distributions. To adapt the AMO algorithm for data clustering, we investigate inherent features and propose an entropy-based AMO algorithm.

Attribute Information Entropy
It is clear that data in a clustering problem are a collection of points in a multi-dimensional space. In the general case, those points are not randomly positioned, so an attribute in the data may obey a certain distribution. Therefore, it is more reasonable to use different strategies according to the distribution of the attribute rather than to use a single strategy. Information entropy can be used to evaluate the degree of disorder of a stochastic variable, so it is a suitable measure for evaluating the distribution of attributes. We discuss the information entropy of attributes in this subsection.
We use the method proposed by Shannon [36] to calculate information entropy, which is usually called Shannon's entropy. Shannon's entropy is widely used in information measures. Given a clustering data set, we first record the number of attributes as D and the number of classes (centroids) as k. Then, to evaluate Shannon's entropy, we use $h_i$ ($i = 1, 2, \ldots, D$) to denote the entropy of the i-th attribute. We discretize the attribute values, approximating each value to its nearest integer, and then calculate the entropy of the i-th attribute as follows:
$$h_i = -\sum_{j=low_i}^{high_i} p_j \log p_j,$$
where $low_i$ is the minimum integer and $high_i$ is the maximum integer after discretization of the attribute values, and $p_j$ is the percentage of the j-th integer of the attribute. Because attribute entropy will be used to control the migration process of the animal population, which is discussed in the next subsection, we use the maximum possible entropy of the i-th attribute to normalize Shannon's entropy $h_i$. The maximum possible entropy of the i-th attribute, denoted as $maxh_i$, is reached when all integers between $low_i$ and $high_i$ are equally likely, and can be calculated as follows:
$$maxh_i = \log \left( high_i - low_i + 1 \right).$$
Finally, the normalized entropy of the i-th attribute, $normh_i$, can be described as
$$normh_i = \frac{h_i}{maxh_i},$$
where $h_i$ and $maxh_i$ are obtained from Equations (9) and (10), respectively. In the case that $high_i = low_i$, we set $normh_i$ to 1. Thus, a normalized information entropy vector $NormH = (normh_1, normh_2, \ldots, normh_D)$ is obtained.
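The normalized attribute entropy can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the paper: logarithms are taken to base 2 (the base cancels in the normalized value), the maximum possible entropy is taken as the logarithm of the number of integers between the minimum and maximum discretized values, and Python's built-in `round` is used for discretization, which rounds exact halves to the nearest even integer.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Normalized Shannon entropy of one attribute after rounding to integers."""
    ints = [round(v) for v in values]
    low, high = min(ints), max(ints)
    if high == low:
        return 1.0  # special case described in the text
    n = len(ints)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(ints).values())
    max_h = math.log2(high - low + 1)  # assumption: uniform over low..high
    return h / max_h


def normalized_entropy_vector(data):
    """One normalized entropy per attribute (column) of a row-major data set."""
    return [normalized_entropy(column) for column in zip(*data)]


data = [(1.2, 0.0), (2.1, 0.1), (2.9, 0.2), (4.0, 3.0)]
norm_h = normalized_entropy_vector(data)
print(norm_h)
```

In this toy data set the first attribute spreads evenly over four integers and receives the maximum normalized entropy, while the second attribute concentrates on a single integer with one outlier and receives a much lower value.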

The New Animal Migration Method
In this subsection, we present our new migration method. In the original AMO algorithm, an individual may move towards one of its neighbors or move away from the neighbor in the migration operation. This method guarantees diversification of the population. To enhance the convergence speed of the migration step and to improve the effectiveness of global searching, we propose a new method in which two strategies are employed for migration. One is the original method, where a neighbor is picked with the method used in the original AMO [22]. The other is newly proposed. It selects S individuals randomly from the current population ($1 \le S \le NP$), and the best one among those S individuals is picked as a reference individual, denoted as $X_{ref,G}$. Usually, we set S to 5. This reference individual as well as the selected neighbor are taken as candidate migration directions, rather than individuals only moving towards their neighbors. The migration direction (moving according to the reference individual or according to the neighbor) is controlled by a randomized approach. To balance diversification of the population against convergence speed, information entropy is used to make this decision. An attribute with large information entropy implies that its values are uncertain and disordered, so searching on this attribute converges slowly. We therefore accelerate convergence on such an attribute by moving it according to the position of the global reference individual with a higher probability than attributes with lower information entropy. The new migration method is presented as Procedure 1.
In Procedure 1, as in the original method, $X_{i,G}$ is the current position of the i-th individual, and $X_{neighborhood,G}$ is the current position of the selected neighbor. The selection method is the same as in the original AMO; NP is the number of individuals in the population, and $D_X$ is the dimension of individuals; $\delta$ is a random number drawn from the Gaussian distribution N(0, 1). Different from the original method, the dimensions of an individual are not only processed with its neighbor but may also be processed with $X_{ref,G}$, where the attribute entropy NormH controls the direction. $X_{ref,G}$ is used if rand (a uniformly distributed random number between 0 and 1) is smaller than the normalized entropy of the attribute. Otherwise, the individual moves towards or away from $X_{neighborhood,G}$.

Procedure 1. The new animal migration operation
for i = 1 to NP do
    select the best one from S random individuals as $X_{ref,G}$
    for j = 1 to $D_X$ do
        if rand < $normh_j$ then
            $x_{i,j,G+1} = x_{i,j,G} + \delta \cdot (x_{ref,j,G} - x_{i,j,G})$
        else
            $x_{i,j,G+1} = x_{i,j,G} + \delta \cdot (x_{neighborhood,j,G} - x_{i,j,G})$
        end if
    end for
end for
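Procedure 1 can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the procedure itself: lower fitness values are better, the reference individual is re-selected for every individual, the neighbor is drawn per dimension from a wrapping ring of indices i − 2 to i + 2, and the entropy of dimension j is looked up as `norm_h[j % D]` so that a vector of D attribute entropies can serve individuals that encode several centroids. The function name is hypothetical.

```python
import random


def entropy_migrate(population, fitness, norm_h, rng, s=5):
    """Entropy-controlled migration; fitness[k] is the cost of individual k."""
    np_size = len(population)
    d = len(norm_h)
    new_pop = []
    for i, x in enumerate(population):
        # Reference individual: best of S randomly selected individuals.
        sample = rng.sample(range(np_size), min(s, np_size))
        ref = population[min(sample, key=lambda k: fitness[k])]
        child = []
        for j, x_ij in enumerate(x):
            delta = rng.gauss(0.0, 1.0)
            if rng.random() < norm_h[j % d]:
                target = ref[j]  # high entropy: follow the reference more often
            else:
                nbr = (i + rng.choice((-2, -1, 0, 1, 2))) % np_size
                target = population[nbr][j]
            child.append(x_ij + delta * (target - x_ij))
        new_pop.append(child)
    return new_pop


rng = random.Random(2)
pop = [[rng.random() for _ in range(4)] for _ in range(8)]
cost = [sum(v * v for v in x) for x in pop]  # toy cost, for illustration only
offspring = entropy_migrate(pop, cost, [0.9, 0.1], rng)
```

With `norm_h = [0.9, 0.1]`, dimensions belonging to the first attribute follow the reference individual about 90% of the time, whereas those of the second attribute mostly follow a ring neighbor.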

The New Population Updating Method
During the population updating of AMO, animals are replaced by a new individual according to a probability, and the new individual is produced by Equation (6). To further enhance the convergence speed, we propose a new method to decide how new individuals are produced, using attribute entropy in a way similar to our migration method. Two updating manners are involved in the method: moving towards the best individual, and moving towards both the best individual and a random individual. Attributes with a higher entropy are more likely to move close to those of the best individual. The details of the method are shown as follows:
Procedure 2. The new population updating operation
select random integers $r1 \ne r2 \ne i$
for i = 1 to NP do
    for j = 1 to $D_X$ do
        if $rand_1 > Pa$ then
            if $rand_2 < normh_j$ then
                $x_{i,j,G+1} = x_{r1,j,G} + rand_a \cdot (x_{best,j,G} - x_{i,j,G})$
            else
                produce $x_{i,j,G+1}$ by moving towards both the best individual and a random individual (using $rand_b$ and $rand_n$)
            end if
        end if
    end for
end for
where NP is the number of individuals in the population, and $D_X$ is the dimension of individuals; $rand_1$ and $rand_2$ are both uniformly distributed random numbers between 0 and 1. $rand_a$, $rand_b$, and $rand_n$ are random numbers between 0 and 1 that control the convergence speed.

The Entire Algorithm for Solving Clustering Problems
With discussions and newly proposed strategies in the above subsections, we present our Entropy-based Animal Migration Optimization (EAMO) combining both the new migration method and the new population updating method.
For solving clustering problems by the EAMO, initializing the population is the first operation. During this process, we set an initial population with NP animal individuals, where an individual is a vector of length $D_x = D \times K$, D is the number of attributes of the input data set, and K is the number of clustering centroids. Positions of the K clustering centroids are encoded into the vector, where the first centroid corresponds to the first D elements, the second centroid corresponds to the second D elements, and so on. Each value in the initial individual vector is produced randomly and uniformly between the maximum value and the minimum value of the corresponding attribute in the input data set. After initialization of the population, the EAMO performs optimization iteratively, where Equation (3) is used to evaluate the fitness of individuals, until the stopping criterion is satisfied. The detailed description of the algorithm framework is listed as follows.
As shown in Algorithm 2, the algorithm starts by generating initial individuals randomly and uniformly in the ranges of attributes from the input data set, and it then calculates the normalized entropy vector NormH. Afterwards, the algorithm performs optimization iteratively, where the proposed entropy-based migration operation and population updating operation are employed. Attributes are updated by the population updating operation with probability Pa, which is the same as in the original AMO algorithm. After each migration of an individual, we calculate the fitness of the new location by Equation (3) in Section 1. The new location replaces the current one if the new fitness is better than that of the old location. As in the migration operation, new better individuals replace old ones after each population updating step. In addition, the best individual found so far is recorded after all individuals are processed by the migration and population updating operations. The algorithm terminates when it reaches the maximum number of iterations, which is the same as in the original AMO. The flowchart of the information entropy-based AMO algorithm is shown in Figure 2.
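The encoding of K centroids into one individual vector, and the uniform initialization within each attribute's range, can be illustrated with a short Python sketch. The function names and the example ranges are hypothetical.

```python
import random


def random_individual(lower, upper, k, rng):
    """Flat vector of K centroids; each value is uniform within its attribute's range."""
    return [
        lo + rng.random() * (hi - lo)
        for _ in range(k)
        for lo, hi in zip(lower, upper)
    ]


def decode_centroids(individual, d):
    """Split a flat vector of length D*K into K centroids of D values each."""
    if len(individual) % d != 0:
        raise ValueError("vector length must be a multiple of D")
    return [individual[start:start + d] for start in range(0, len(individual), d)]


rng = random.Random(0)
lower, upper = [0.0, 10.0], [1.0, 20.0]  # per-attribute minimum and maximum (D = 2)
individual = random_individual(lower, upper, 3, rng)  # K = 3, so length 6
centroids = decode_centroids(individual, 2)
```

Decoding is only needed when the fitness of an individual is evaluated; the migration and updating operations work directly on the flat vector.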

Algorithm 2. Information Entropy-based Animal Migration Optimization (EAMO) algorithm
begin
    evaluate the information entropy for attributes, and calculate the entropy vector NormH
    set the generation counter $G = 0$, and randomly initialize NP individuals denoted as $X_i$ ($1 \le i \le NP$)
    evaluate the fitness for each individual in the population
    while stopping criteria is not satisfied do
        perform the new animal migration operation (Procedure 1)
        for i = 1 to NP do
            evaluate the fitness of the offspring $X_{i,G+1}$; let $X_i = X_{i,G+1}$ if $X_{i,G+1}$ is better than $X_i$
        end for
        perform the new population updating operation (Procedure 2)
        for i = 1 to NP do
            evaluate the fitness of the offspring $X_{i,G+1}$; let $X_i = X_{i,G+1}$ if $X_{i,G+1}$ is better than $X_i$
        end for
        memorize the best solution achieved so far
    end while
end

Experiments
In this section, we carry out computational experiments to evaluate our algorithm. First, we introduce the benchmark data sets we used; intensive experiments are then performed and compared to other well-known clustering algorithms. We also analyze the effectiveness of our migration methods by comparing with the EAMO algorithm without entropy heuristics.

Data Set
We select 12 well-known benchmark problems from the University of California Irvine (UCI) Machine Learning Repository to test those algorithms. The data sets we choose are frequently used as benchmarks for clustering, and the numbers of attributes and classes differ considerably across data sets. Table 1 summarizes the main characteristics of those data sets.
Here, we give a brief description of those data sets. All of them come from real-world applications ranging from health and medicine to education and criminological investigation. TAE and CMC in Table 1 are the abbreviations of Teaching Assistant Evaluation Data Set and Contraceptive Method Choice Data Set, respectively. The largest data set consists of 1473 objects characterized by 10 attributes, and the data sets with the most attributes are the Wine Data Set and the StatLog (Heart) Data Set, with 13 attributes each. Most data sets have two or three categories, whereas Glass Identification has six categories, the largest number among all data sets. Some of the data sets consist of both categorical and numerical attributes.

Comparisons with Other Algorithms
In order to demonstrate the effectiveness and performance of the proposed EAMO algorithm, we incorporate our entropy-based method into the source code of AMO and implement the new clustering algorithm in MATLAB (version 7.8, The MathWorks, Inc., Natick, MA, USA). The k-means, ABC [18,19], and AMO [22] algorithms are considered for comparison. In addition, we compare our algorithm with recent algorithms such as GSA-KM [37], BH [38], and WK-means [39].
To make a fair comparison, we set the population size to 100 for ABC, AMO, and EAMO, and use a maximum of 100 iterations as the stopping criterion for each algorithm. We perform all experiments on a laptop with an Intel(R) Core(TM) i5-4200M 2.50 GHz CPU and 4 GB RAM, running Windows 10. Each data set is tested 30 times with random initial solutions. We record the result of each run and compute the best and average results of the 30 runs to evaluate optimization ability. The standard deviation is also calculated to show the robustness of the algorithms. The results are listed in Table 2. As can be seen in Table 2, EAMO achieves better average performance than the other algorithms (k-means, ABC, and AMO) for all data sets. For the Survival data set, ABC and EAMO both find the smallest best value, i.e., 2566.9889, but the standard deviation of EAMO is $4.6754 \times 10^{-7}$, which is at least 4 orders of magnitude smaller than those of the other algorithms. Similar results can also be found on some other data sets, such as Iris, TAE, Seeds, Cancer, and Heart. For those data sets, EAMO achieves the smallest mean and best values compared to the other algorithms, and its standard deviations are all several orders of magnitude smaller than those of the other algorithms. For the Glass data set, EAMO achieves better performance, except for the standard deviation. Furthermore, the results in Table 2 also indicate that EAMO has the strongest robustness with competitive mean values for most clustering problems.
Moreover, we compare our algorithm with recently proposed clustering algorithms, namely GSA-KM [37], BH [38], and WK-means [39]. The results of those three algorithms are taken from the experimental results in [37][38][39], respectively. Table 3 shows the comparison results of those algorithms and EAMO.
The results for the five data sets in Table 3 are taken from those works [37][38][39]. Compared with the GSA-KM algorithm, EAMO performs better on four (Iris, Cancer, Wine, and CMC) of the five compared data sets. Compared with the BH algorithm, EAMO finds better results on three (Iris, Cancer, and Wine) of the four compared data sets, whereas the BH algorithm achieves the best solution for the Glass data set among these four algorithms. In addition, the results of EAMO are better than those of WK-means on all three compared data sets. These comparison results indicate that EAMO can perform better than other existing algorithms on most of the compared clustering problems. The best values are indicated in bold type. A dashed line is filled in the cell if no result can be found.

Analysis of Entropy-Based Heuristics
In this subsection, we further investigate the contribution of the entropy-based heuristics in our EAMO algorithm. Two AMO-based algorithms are selected for comparison. The first one is a modified EAMO, denoted by EAMO1, which uses the newly proposed animal migration operation and the original population updating operation of the AMO. The second one, denoted by EAMO2, uses the newly proposed population updating operation and the original animal migration operation of the AMO.
Similar to the experiment in the previous section, 12 data sets are tested with 30 runs for EAMO, EAMO1, and EAMO2. Parameter settings are the same as in the previous experiment. We calculate the best and average results as well as the standard deviation to compare with the results of EAMO. Table 4 shows the results of those experiments.
From Table 4, it is clear that EAMO1 and EAMO2 are better than AMO, as they obtain better solutions for most data sets. Between EAMO1 and EAMO2, EAMO1 performs better, as both the mean and best solutions produced by EAMO1 are better than those of EAMO2. On the other hand, EAMO is better than EAMO1, since it performs best on mean results for 10 data sets, whereas EAMO1 performs best for only 3 data sets, and the same mean result is obtained for the Survival data set. Furthermore, EAMO has a good ability to find best results, as it finds the best solutions for 11 data sets, while EAMO1 finds the best solutions for only 6 data sets. Therefore, it can be concluded that both entropy-based heuristic operations contribute to improving searching effectiveness.
To show statistical results across all data sets, we record the best run of all algorithms for each data set and calculate the relative percentage deviation of each run by using Equation (11):
$$RPD = \frac{R - R_b}{R_b} \times 100\%, \tag{11}$$
where R is the clustering result of a run of an algorithm for a data set, $R_b$ is the best result of all runs of all algorithms for the data set, and RPD is the percentage result of R. By doing so, the results of all data sets can be compared together. After that, we show that the differences between the results of those algorithms are statistically significant by plotting 95% confidence intervals for the algorithm factor, as depicted in Figure 3. From it, we can clearly observe that EAMO performs very well, outperforming all the remaining methods, such as ABC and AMO.
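The relative percentage deviation can be illustrated with a small Python sketch. It assumes the usual definition, the difference between a run's result and the best known result, divided by the best known result and expressed as a percentage. The numbers below are made up for illustration and are not experimental results.

```python
def rpd(result, best):
    """Relative percentage deviation of one run from the best known result."""
    return (result - best) / best * 100.0


# Hypothetical results of three runs on one data set (lower is better).
runs = [100.0, 150.0, 125.0]
best = min(runs)
deviations = [rpd(r, best) for r in runs]
print(deviations)
```

Because each deviation is expressed relative to the best result on the same data set, values from data sets with very different objective scales become comparable.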

Conclusions
In this paper, we present a new AMO algorithm for clustering problems. The information entropy of the data in a clustering problem is employed as a heuristic for optimization in the new algorithm. To speed up the convergence of the proposed algorithm and improve its overall search performance, we use an alternative scheme in the migration step to generate new positions of individuals: individuals not only move toward or away from their neighborhood, but also move according to a good individual selected from the entire population. We employ the information entropy of each attribute to determine the probability of each moving direction (moving according to the neighbor or according to the good individual). Furthermore, the population updating method is modified with techniques similar to those used in migration. Extensive experiments were performed to evaluate the effectiveness of the approach. The proposed EAMO is tested on 12 well-known benchmark problems from the UCI Machine Learning Repository, and the results are analyzed in detail by comparison with both basic algorithms and recently proposed algorithms. The comparison shows that the proposed EAMO algorithm generally obtains better solutions than the compared algorithms on the considered problems. In future work, we will consider using information entropy to measure the correlation between attributes in order to improve clustering algorithms, and applying recent metaheuristic algorithms, such as MBO and EHO, to clustering problems.
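As an illustration of the entropy heuristic summarized above, the sketch below computes a Shannon entropy per attribute and uses it as a per-attribute probability for choosing between the two moving directions. The equal-width binning, the normalization by log2 of the number of bins, the direction in which entropy maps to probability, and all function names are assumptions made for this example; they are not the paper's exact definitions.

```python
import math
import random
from collections import Counter


def attribute_entropy(values, bins=10):
    """Shannon entropy (in bits) of one attribute after equal-width binning.

    The binning scheme is an assumption made for this sketch.
    """
    low, high = min(values), max(values)
    if high == low:
        return 0.0  # a constant attribute carries no information
    width = (high - low) / bins
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_vector(data, bins=10):
    """Normalized entropy in [0, 1] for each attribute (column) of the data."""
    columns = zip(*data)
    return [attribute_entropy(col, bins) / math.log2(bins) for col in columns]


def choose_guide(p_neighbor, rng):
    """Hypothetical rule: follow the neighbor with probability p_neighbor,
    otherwise follow the good individual selected from the population."""
    return "neighbor" if rng.random() < p_neighbor else "good individual"


# Toy data: the first attribute is spread out, the second is constant.
data = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
probabilities = entropy_vector(data, bins=4)
rng = random.Random(0)
for j, p in enumerate(probabilities):
    print(f"attribute {j}: p = {p:.2f}, guide = {choose_guide(p, rng)}")
```

In this toy setup the spread-out attribute always follows the neighbor and the constant attribute always follows the good individual, which shows how a per-attribute probability can trade off local search against convergence toward a good solution.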

Figure 1. The concept of the local neighborhood of an individual.

Figure 2. The flowchart of the information entropy-based animal migration optimization algorithm.

Figure 3. ANOVA tests of the results of all algorithms.
better than X_i
end for
memorize the best solution achieved so far
end while
end

Algorithm 2. Information Entropy-based Animal Migration Optimization (EAMO) algorithm
begin
evaluate the information entropy for attributes, and calculate the entropy vector
set the generation counter = 0, and randomly initialize

Table 1. Main characteristics of benchmark data sets.

Table 2. Comparison of the results of basic algorithms and Entropy-based Animal Migration Optimization (EAMO).
The best values are indicated in bold type.

Table 3. Comparison of the results of recent clustering algorithms and EAMO.

Table 4. The results for analyzing entropy-based heuristics in EAMO.