A Novel Fuzzy Entropy-Based Method to Improve the Performance of the Fuzzy C-Means Algorithm

One of the main drawbacks of the well-known Fuzzy C-means clustering algorithm (FCM) is the random initialization of the cluster centers, which can significantly affect the performance of the algorithm: it does not guarantee an optimal solution and can increase execution time. In this paper we propose a variation of FCM in which the initial cluster centers are obtained by running a weighted FCM algorithm whose weights are assigned by calculating a Shannon fuzzy entropy function. Comparison tests on various classification datasets of the UCI Machine Learning Repository show that our algorithm improved on the performance of FCM in all cases.


Introduction
The Fuzzy C-means (FCM) algorithm [1,2] is a well-known partitive fuzzy clustering algorithm which adopts the Euclidean metric for calculating distances and detects cluster centers as points. Unlike partitive crisp algorithms such as K-means [3], FCM allows objects to belong to more than one cluster and can therefore handle uncertainty in assigning them to a cluster.
FCM applies an iterative process with the aim of minimizing an objective function in order to detect the cluster centers assigning the membership degrees of the objects to them. As in K-means, the number of clusters is set a priori and initially the membership degrees of the objects to the clusters are randomly assigned. At each iteration the cluster centers and the partition matrix are recalculated, minimizing an objective function. The process ends when the absolute difference between the partition matrix and the one calculated in the previous cycle is less than a prefixed threshold.
The initialization phase significantly affects the algorithm's performance, influencing its accuracy and running time. In fact, depending on the initial choice of the partition matrix or cluster centers, the algorithm can quickly converge towards a global optimum or get trapped in a local optimum.
Many variations of FCM have been developed in order to optimize the initialization phase. Some authors [4][5][6] apply the Subtractive Clustering algorithm to find the initial clusters; however, this method requires the mountain peak and mountain radius parameters to be set a priori.
In [7], an initialization method based on a density cluster algorithm is proposed. Recently metaheuristic methods have been proposed in the literature to optimize the initialization of the cluster centers. A Genetic Algorithm (GA) is applied to a kernel FCM algorithm in [8] in order to find the initial cluster centers. In [9], a GA is applied to find the optimal initial cluster centers in image segmentation. In [10], a Particle Swarm Optimization algorithm is applied to find the optimal FCM initial cluster centers for sentiment clustering. Three metaheuristic hybrid FCM algorithms based, respectively, on Differential Evolution, GA, and Particle Swarm Optimization (PSO) are used in [11] to find the optimal initial cluster centers.
However, the use of metaheuristic algorithms, although it helps to improve the accuracy of the solutions obtained by running FCM, can heavily increase the computational load.
In this research we propose a hybrid FCM algorithm in which a fuzzy entropy algorithm is applied to initialize the cluster centers.
The aim of this research is to improve FCM in classification problems by finding the initial clusters with a weighted FCM algorithm in which a fuzzy entropy function is calculated to set the value of each weight. This fuzzy entropy function measures the fuzziness of an object in its assignment to the clusters.
Some variations of the FCM algorithm that include entropy functions have been proposed in the literature in order to improve the performance of FCM.
In [12], a fuzzy clustering method combining possibilistic C-means clustering [13] and a fuzzy entropy function is proposed in order to make the algorithm robust to noise. A variation of FCM based on information entropy is proposed in [14] in order to find the optimal number of clusters. The authors introduce a weighting parameter in the objective function to adjust the location of cluster centers and reduce the influence of noise. Two entropy-based FCM variations in which an entropy function is applied for feature extraction are proposed in [15,16]. In [16], the entropy weight method is applied to delete features with less information; then the K-means and One-Hot encoding algorithms are used to initialize the membership matrix. A new entropy-based FCM algorithm is applied in [17] for the segmentation of brain MR images. The objective function is modified by adding a Shannon entropy component that redresses the uncertainty among the pixels. In [18], the authors apply the maximum entropy method to find the optimal kernel weights in a kernel FCM algorithm. An entropy-based FCM algorithm is applied in [19] to cluster webpages in a PageRank process. The weight of each cluster is set by using an information entropy method and is then used to adjust the average weight. An improved entropy-based FCM algorithm is applied in [20] to detect epistasis, with a cross-entropy method used to calculate the distances.
In this research, we used the De Luca and Termini fuzzy entropy definition given in [21,22] to measure the fuzziness that models the membership degree of an object to the clusters. We applied a weighted FCM algorithm in the initialization phase to optimize the initial choice of the cluster centers whereby the weight assigned to an object was greater the lower its fuzziness. We tested our algorithm on various classification training sets to measure its performances and compare the results with the ones obtained using FCM.
In Section 2 the FCM algorithm is presented and in Section 3 the Entropy weighted FCM (EwFCM) algorithm is presented. Section 4 shows the results of our experiments. Final considerations are discussed in Section 5.

FCM Algorithm
Let $X = \{x_1, \ldots, x_N\} \subset R^n$ be a set of N objects in the n-dimensional space $R^n$, where $x_j = (x_{j1}, \ldots, x_{jn})$, and let $V = \{v_1, \ldots, v_C\} \subset R^n$ be the set of centers of the C clusters. Let U be the C × N partition matrix, where $u_{ij}$ is the membership degree of the jth object $x_j$ to the ith cluster.
The FCM algorithm [1,2] is based on the minimization of the following objective function:

$$J(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \, d_{ij}^{2} \qquad (1)$$

where $d_{ij} = \lVert x_j - v_i \rVert$ is the Euclidean distance between the center $v_i$ of the ith cluster and the jth object $x_j$, and $m \in [1, +\infty)$ is the fuzzifier parameter (a constant which affects the membership values and defines the degree of fuzziness of the partition). For m = 1, FCM becomes Hard C-means clustering; the more m tends towards $+\infty$, the more the fuzziness of the clusters grows.
Considering the following constraints:

$$\sum_{i=1}^{C} u_{ij} = 1, \quad j = 1, \ldots, N \qquad (2)$$

and

$$0 < \sum_{j=1}^{N} u_{ij} < N, \quad i = 1, \ldots, C \qquad (3)$$

and applying the Lagrange multipliers, we obtain the following solutions for Equation (1):

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} \, x_j}{\sum_{j=1}^{N} u_{ij}^{m}}, \quad i = 1, \ldots, C \qquad (4)$$

and

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}, \quad i = 1, \ldots, C; \; j = 1, \ldots, N \qquad (5)$$

An iterative process is proposed in [2] as follows: initially the membership degrees are assigned randomly; in each iteration the cluster centers are calculated by Equation (4), then the membership degrees are calculated by Equation (5). The iterative process stops at the tth iteration when:

$$\lVert U^{(t)} - U^{(t-1)} \rVert < \varepsilon \qquad (6)$$

where $\varepsilon > 0$ is a parameter assigned a priori to stop the iteration process and

$$\lVert U^{(t)} - U^{(t-1)} \rVert = \max_{i,j} \left| u_{ij}^{(t)} - u_{ij}^{(t-1)} \right| \qquad (7)$$
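The iteration described above (centers from Equation (4), memberships from Equation (5), stop when the largest change in the partition matrix falls below ε) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `max_iter` cap, the `seed` argument, and the handling of an object that coincides exactly with a center are all assumptions introduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update_centers(X, U, m):
    """Equation (4): each center is the mean of the objects weighted by u_ij^m."""
    n = len(X[0])
    centers = []
    for row in U:  # one row of U per cluster
        coef = [u ** m for u in row]
        total = sum(coef)
        centers.append([sum(c * x[k] for c, x in zip(coef, X)) / total
                        for k in range(n)])
    return centers


def update_memberships(X, V, m):
    """Equation (5), written with squared distances: (d^2)^(-1/(m-1))."""
    C, N = len(V), len(X)
    U = [[0.0] * N for _ in range(C)]
    for j, x in enumerate(X):
        d2 = [sq_dist(x, v) for v in V]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:  # assumption: split membership among coinciding centers
            for i in zeros:
                U[i][j] = 1.0 / len(zeros)
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        s = sum(inv)
        for i in range(C):
            U[i][j] = inv[i] / s
    return U


def fcm(X, C, m=2.0, eps=0.01, max_iter=300, seed=0):
    """Plain FCM with a random initial partition matrix (requires m > 1)."""
    rng = random.Random(seed)
    N = len(X)
    U = [[rng.random() + 1e-9 for _ in range(N)] for _ in range(C)]
    for j in range(N):  # normalize columns so that constraint (2) holds
        s = sum(U[i][j] for i in range(C))
        for i in range(C):
            U[i][j] /= s
    for t in range(1, max_iter + 1):
        V = update_centers(X, U, m)
        U_new = update_memberships(X, V, m)
        # Equations (6)-(7): largest absolute change of a membership degree
        change = max(abs(U_new[i][j] - U[i][j])
                     for i in range(C) for j in range(N))
        U = U_new
        if change < eps:
            break
    return U, V, t
```

Calling `fcm(X, C=2)` on a list of points returns the partition matrix, the centers, and the number of iterations used; different seeds give different starting partitions, which is the sensitivity the paper sets out to reduce.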
A variation of the FCM algorithm is the weighted FCM (wFCM) algorithm, in which a weight defines the influence of each object on the solution.
The objective function in wFCM is given by:

$$J_w(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} w_j \, u_{ij}^{m} \, d_{ij}^{2} \qquad (8)$$

The partition matrix U and the cluster centers V are obtained by minimizing this objective function using the Lagrange multipliers. The components $u_{ij}$ of the partition matrix are given by Equation (5). The cluster centers are given by:

$$v_i = \frac{\sum_{j=1}^{N} w_j \, u_{ij}^{m} \, x_j}{\sum_{j=1}^{N} w_j \, u_{ij}^{m}}, \quad i = 1, \ldots, C \qquad (9)$$

in which the weight $w_j$ gives the degree of influence of the jth object in finding the cluster centers. Several wFCM-based clustering methods have been proposed (see, for example, [23][24][25][26]) in order to reduce the number of objects in massive datasets, by assigning to an object a weight based on the density of nearby data [23], and to encode pixels' local information in image segmentation [24,25]. In [26] some variations of wFCM are proposed in order to handle very large data.
Formally, at any cycle the weight $w_j$ assigned to the object $x_j$, j = 1, . . . , N, is calculated by using a weight function $w(x_j)$.
The pseudocode of the FCM (Algorithm 1) and wFCM (Algorithm 2) algorithms is shown below.
2. Initialize randomly the partition matrix U
3. Repeat
4. Calculate $w_j$, j = 1, . . . , N by using a weight function $w(x_j)$
. . .
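The only step in which wFCM differs from FCM is the center update, where each object's contribution is scaled by its weight. The sketch below illustrates this weighted center formula (Equation (9) in the paper's numbering) in plain Python. The function name `weighted_centers` and the toy data in the usage note are assumptions made for illustration.

```python
def weighted_centers(X, U, w, m=2.0):
    """Cluster centers as means of the objects weighted by w_j * u_ij^m.

    X: list of N points, U: C x N partition matrix, w: list of N weights.
    """
    n = len(X[0])
    centers = []
    for row in U:  # one row of U per cluster
        coef = [wj * (u ** m) for wj, u in zip(w, row)]
        total = sum(coef)
        centers.append([sum(c * x[k] for c, x in zip(coef, X)) / total
                        for k in range(n)])
    return centers
```

With all weights equal to 1 the result coincides with the unweighted FCM centers. For the one-dimensional points 0, 1, and 10 in a single cluster with full membership, unit weights give the center 11/3, while setting the weight of the third point to 0 moves the center to 0.5, so that point no longer influences the solution.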

The Entropy Weighted FCM Algorithm
De Luca and Termini [21,22] introduce the concept of fuzzy entropy as a measure of the degree of fuzziness of a fuzzy set and they define the properties of any fuzzy entropy.
Let $X = \{(x_s, u_s), s = 1, \ldots, S\}$ be a fuzzy set, where $u_s$ is the membership degree of $x_s$ to X. The fuzzy entropy of X is given by:

$$H(X) = K \sum_{s=1}^{S} h(u_s)$$

where K is a positive constant and h measures the fuzziness of a single membership degree. De Luca and Termini in [21] propose the following fuzzy entropy function as a measure of fuzziness of a membership degree:

$$h(u) = -u \log u - (1 - u) \log (1 - u) \qquad (12)$$

We use Equation (12) to measure the fuzziness in assigning the object $x_j = (x_{j1}, \ldots, x_{jn})$ to the ith cluster, given by:

$$h(u_{ij}) = -u_{ij} \log u_{ij} - (1 - u_{ij}) \log (1 - u_{ij})$$

where $u_{ij}$ is the membership degree of the jth object to the ith cluster. The mean fuzziness assigned to $x_j$ is given by:

$$H(x_j) = K \sum_{i=1}^{C} h(u_{ij})$$

$H(x_j)$ is calculated to assign the weight $w_j$ to the jth object in Equation (8). The constant K has been set to the value 1/C in order to normalize the mean fuzziness $H(x_j)$.
Due to the constraint in Equation (2), the maximum fuzziness value $h_{max} \le 1$ is obtained when the membership degree of the object to each cluster is equal to 1/C, while the minimum value is 0, obtained when the object belongs to one cluster with a membership degree equal to 1 and to all the others with membership degrees equal to 0.
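The behavior of the mean fuzziness at its extremes can be checked numerically. The sketch below assumes the De Luca and Termini function h(u) = −u log u − (1 − u) log(1 − u) with base-2 logarithms and the convention h(0) = h(1) = 0. The logarithm base and the function names `h` and `mean_fuzziness` are assumptions made here, not details confirmed by the text.

```python
import math


def h(u, base=2.0):
    """Fuzziness of a single membership degree u in [0, 1]."""
    if u <= 0.0 or u >= 1.0:
        return 0.0  # convention: a crisp degree carries no fuzziness
    return -(u * math.log(u, base) + (1.0 - u) * math.log(1.0 - u, base))


def mean_fuzziness(memberships, base=2.0):
    """H(x_j) = (1/C) * sum_i h(u_ij) for the C membership degrees of one object."""
    C = len(memberships)
    return sum(h(u, base) for u in memberships) / C
```

Under these assumptions a crisp assignment such as (1, 0, 0) has mean fuzziness 0, the uniform assignment (1/3, 1/3, 1/3) has mean fuzziness of about 0.918, and any other assignment summing to 1 over three clusters, for example (0.5, 0.3, 0.2), falls below the uniform value. With two clusters the uniform assignment reaches exactly 1, which is the only case in which the upper bound is attained.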
Since the relevance of the object must be greater the lower its fuzziness, we assign the following weight value:

$$w_j = 1 - H(x_j)$$

This formula for $w_j$ reflects the choice of a linear dependence of the weight on the fuzziness. Many other choices are possible, such as $w_j = 1/H(x_j)$; however, this choice risks assigning very high weight values to objects with low or vanishing fuzziness, among which outliers or noisy objects may be included. Since FCM is sensitive to the presence of noise and outliers in the data, the risk of increasing their influence on the detection of the final clusters may not be negligible.
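The difference between a linear and a reciprocal weight can be seen on a nearly crisp object. The sketch below assumes that the linear weight has the form w_j = 1 − H(x_j), with H(x_j) the mean fuzziness normalized by 1/C and computed with base-2 logarithms. Both the form of the weight and the logarithm base are assumptions, and the function names are illustrative.

```python
import math


def h(u):
    """Fuzziness of one membership degree (base-2 logarithm assumed)."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -(u * math.log(u, 2) + (1.0 - u) * math.log(1.0 - u, 2))


def mean_fuzziness(memberships):
    """H(x_j): mean of h over the C membership degrees of one object."""
    return sum(h(u) for u in memberships) / len(memberships)


def linear_weight(memberships):
    """Assumed linear weight: bounded in [0, 1], larger for crisper objects."""
    return 1.0 - mean_fuzziness(memberships)


def reciprocal_weight(memberships):
    """Alternative weight 1/H: unbounded as the fuzziness tends to 0."""
    return 1.0 / mean_fuzziness(memberships)
```

For the membership degrees (0.999, 0.0005, 0.0005) the linear weight stays just below 1, whereas the reciprocal weight exceeds 100, so a single almost crisp object, possibly an outlier, would dominate the center update.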
We define the clustering mean fuzziness H, given by:

$$H = \frac{1}{N} \sum_{j=1}^{N} H(x_j)$$

H takes values in $[0, h_{max}]$; it is initially high, but its value decreases along the iterations, in which more importance is given to objects closer to the cluster centers.
The initial cluster centers are found when the absolute difference between the mean fuzziness calculated in the current cycle and the one calculated in the previous cycle is below a fixed threshold η, or when the number of iterations reaches a maximum number of iterations $i_{max}$.
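The initialization phase with its two stopping conditions (change in mean fuzziness below η, or i_max cycles reached) can be sketched as follows. This is a hedged reading of the procedure, not the authors' listing: the order of the steps inside the loop, the weight w_j = 1 − H(x_j), the base-2 logarithm, the small floor on the weights, and all function names are assumptions.

```python
import math
import random


def h(u):
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -(u * math.log(u, 2) + (1.0 - u) * math.log(1.0 - u, 2))


def memberships(X, V, m):
    """Membership update of FCM (Equation (5)) using squared distances."""
    C, N = len(V), len(X)
    U = [[0.0] * N for _ in range(C)]
    for j, x in enumerate(X):
        d2 = [sum((p - q) ** 2 for p, q in zip(x, v)) for v in V]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:  # object coincides with a center
            for i in zeros:
                U[i][j] = 1.0 / len(zeros)
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        s = sum(inv)
        for i in range(C):
            U[i][j] = inv[i] / s
    return U


def weighted_centers(X, U, w, m):
    """Weighted center update of wFCM (Equation (9))."""
    n = len(X[0])
    V = []
    for row in U:
        coef = [wj * (u ** m) for wj, u in zip(w, row)]
        total = sum(coef)
        V.append([sum(c * x[k] for c, x in zip(coef, X)) / total
                  for k in range(n)])
    return V


def ewfcm_init(X, C, m=2.0, eta=0.05, i_max=50, seed=0):
    """Entropy-weighted search for initial centers; returns (V, U, n_iter)."""
    rng = random.Random(seed)
    N = len(X)
    U = [[rng.random() + 1e-9 for _ in range(N)] for _ in range(C)]
    for j in range(N):  # columns of U must sum to 1
        s = sum(U[i][j] for i in range(C))
        for i in range(C):
            U[i][j] /= s
    H_prev = None
    for n_iter in range(1, i_max + 1):
        # mean fuzziness of each object, normalized by K = 1/C
        Hx = [sum(h(U[i][j]) for i in range(C)) / C for j in range(N)]
        # assumed linear weight; the floor avoids an all-zero denominator
        w = [max(1.0 - v, 1e-12) for v in Hx]
        H = sum(Hx) / N  # clustering mean fuzziness
        V = weighted_centers(X, U, w, m)
        U = memberships(X, V, m)
        if H_prev is not None and abs(H - H_prev) < eta:
            break
        H_prev = H
    return V, U, n_iter
```

The centers returned by `ewfcm_init` would then be used as the starting point of a standard FCM run in place of a random partition matrix. The defaults η = 0.05 and i_max = 50 mirror the values reported in the experiments.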
The EwFCM algorithm (Algorithm 3) is schematized below:
2. Initialize randomly the partition matrix U
. . .
11. Calculate $u_{ij}$, i = 1, . . . , C; j = 1, . . . , N by using Equation (5)
12. $n_{iter}$ := $n_{iter}$ + 1
. . .
We tested the EwFCM algorithm on two well-known datasets of the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html), Iris and Wine, comparing its performance with that obtained by executing the FCM algorithm. In order to make a complete assessment of classification performance, we measured the accuracy, precision, recall (or sensitivity), and F1 score indexes, where the F1 score is the harmonic mean of precision and recall.
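The four indexes are given by [27,28]:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{precision} = \frac{TP}{TP + FP}$$

$$\text{recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where:
• TP (True Positive) is the number of data correctly assigned to the class;
• TN (True Negative) is the number of data correctly not assigned to the class;
• FP (False Positive) is the number of data incorrectly assigned to the class;
• FN (False Negative) is the number of data belonging to the class but not assigned to it.
To measure the four indices, each fuzzy cluster is labeled with the name of the class in the training dataset to which the largest number of objects assigned to the cluster belong.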
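The labeling rule and the four indexes can be illustrated with a small example. The sketch below assumes that each object has already been assigned to a single cluster (for instance the one with the highest membership degree, which is an assumption here) and computes the indexes for one class in a one-vs-rest fashion. The function names are illustrative.

```python
from collections import Counter


def label_clusters(cluster_ids, true_labels):
    """Map each cluster to the class most frequent among its objects."""
    counts = {}
    for c, y in zip(cluster_ids, true_labels):
        counts.setdefault(c, Counter())[y] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}


def class_metrics(predicted, actual, cls):
    """Accuracy, precision, recall and F1 score of one class versus the rest."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == cls and a == cls)
    tn = sum(1 for p, a in pairs if p != cls and a != cls)
    fp = sum(1 for p, a in pairs if p == cls and a != cls)
    fn = sum(1 for p, a in pairs if p != cls and a == cls)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For nine objects with cluster assignments [0, 0, 0, 1, 1, 1, 1, 2, 2] and true classes [a, a, b, b, b, b, c, c, c], the clusters are labeled a, b, and c. For class b this gives TP = 3, TN = 4, FP = 1, and FN = 1, hence accuracy 7/9 and precision, recall, and F1 score all equal to 0.75.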

Simulation Results
We compared the performance of FCM and EwFCM on over 40 datasets of the UCI Machine Learning Repository. The fuzzifier parameter m was set to 2 and the stopping parameter ε was set to 0.01. Furthermore, after calibration performed on various datasets by varying the values of the parameters η and $i_{max}$, the entropy threshold η was set to 0.05 and the maximum number of cycles $i_{max}$ was set to 50. For our experiments, we used an Intel Core i5 3.2 GHz processor. The FCM algorithm was executed 10 times, and the results obtained by running EwFCM were compared with the best results obtained by running FCM.
For brevity we present the complete results obtained for two datasets: Iris and Wine. The Iris dataset was composed of 150 objects with four numerical features describing three types of iris flowers: Iris setosa, Iris virginica, and Iris versicolor. Although Iris setosa is linearly separable from the other two classes, Iris virginica and Iris versicolor are not linearly separable from each other.
We executed the two algorithms setting the number of clusters C to 3.
Tables 1 and 2 show the results obtained by executing FCM and EwFCM, respectively. The four metrics are calculated for each class. The last column shows, for each metric, the average of the values calculated for each class. As the two tables show, the mean values of all the performance metrics obtained by running the EwFCM algorithm were higher than those obtained by running the FCM algorithm. In particular, all the metrics calculated by using EwFCM for the classes Iris versicolor and Iris virginica were better than the corresponding ones obtained by using the FCM algorithm. Table 3 compares EwFCM with other FCM variations aimed at finding optimal initial cluster centers: it shows the mean accuracy, precision, and recall obtained by using FCM, EwFCM, Density-based FCM [7], Kernel-based FCM [8], and PSO FCM [10]. The best results were obtained by using EwFCM and PSO FCM; the values of accuracy, precision, recall, and F1 score obtained by running these two algorithms were very similar.
To evaluate the computational cost, we measured the number of iterations and the running time. Table 4 shows the number of iterations and running time measured using FCM, EwFCM, Density FCM, Kernel FCM, and PSO FCM. The number of iterations and the running time reported for FCM were the minimum and average values obtained by executing the algorithm 10 times. These results show that EwFCM had a lower computational cost than FCM. While the number of iterations obtained by running EwFCM was approximately half that obtained by running FCM, the running time was only slightly shorter, because of the time taken by the EwFCM initialization phase. EwFCM and PSO FCM required the fewest iterations, but the running time of PSO FCM was slightly longer than that of EwFCM, as PSO FCM took longer to complete the initialization phase. Now we present the results of the comparison tests performed on the Wine dataset.
The Wine dataset consists of 178 objects with 13 numerical features representing the chemical composition of Italian wines derived from three different crops. It is partitioned into three classes, to which 59, 71, and 48 objects belong, respectively. We executed the two algorithms setting the number of clusters C to 3. Tables 5 and 6 show the four indicators obtained by using FCM and EwFCM, respectively. The four metrics are calculated for each class, and the last column shows, for each metric, the average of the values calculated for each class.
These results confirm the ones obtained by applying FCM and EwFCM to the Iris dataset: the values of all the performance indicators obtained by running the EwFCM algorithm were higher than those obtained by running the FCM algorithm. This trend was observed for all training sets used in our experiments. Table 7 shows the mean accuracy, precision, and recall obtained for the Wine dataset by using FCM, EwFCM, Density-based FCM, Kernel-based FCM, and PSO FCM. As for the Iris dataset, the best values of mean accuracy, precision, recall, and F1 score on the Wine dataset were obtained using EwFCM and PSO FCM. Table 8 shows the computational cost results obtained by running all algorithms. As in the previous test, the fewest iterations were required by EwFCM and PSO FCM: almost half the number required by FCM. The running time of EwFCM was slightly lower than that of FCM, Density FCM, Kernel FCM, and PSO FCM. Table 9 shows, for each metric, the mean difference between the value obtained by applying EwFCM, Density FCM, Kernel FCM, and PSO FCM, respectively, and the one obtained by applying FCM. These results show that the EwFCM, Density FCM, Kernel FCM, and PSO FCM algorithms, applied to classification datasets of different sizes and numbers of features, performed better than FCM, with accuracy, precision, recall, and F1 score exceeding those of FCM by more than 2% on average. In particular, EwFCM and PSO FCM had the best performance, with a mean difference in accuracy near 3% and a mean difference in precision, recall, and F1 score near 4%. Table 10 summarizes the computational cost results obtained for all datasets by running EwFCM, Density FCM, Kernel FCM, and PSO FCM, respectively.
It reports the minimum, mean, and maximum values of the ratio between the number of iterations of each algorithm and that obtained by running FCM, as well as of the ratio between the running time and that obtained by running FCM. For each experiment we considered the minimum number of iterations and the shortest running time obtained by executing the FCM algorithm ten times. The results in Table 10 show that the number of iterations in EwFCM and PSO FCM was about half that of FCM. Furthermore, EwFCM had a slightly shorter running time than Density FCM, Kernel FCM, and PSO FCM.

Conclusions
We presented a variation of FCM in which a fuzzy entropy function is used in the initialization phase to optimize the assignment of the cluster centers. This allowed us to mitigate the tendency of FCM to converge to local solutions when the initial positions of the clusters are assigned randomly. To set the initial cluster centers we used the weighted FCM algorithm, in which the weight assigned to each object is greater the smaller its fuzziness. The results of the comparative tests performed on over 40 datasets in the UCI Machine Learning Repository showed that the performance of the proposed algorithm was better than that obtained by applying the FCM algorithm. The mean difference in accuracy was about 3% and the mean difference in precision, recall, and F1 score was near 4%. The results of comparisons with other FCM-based algorithms aimed at optimizing the initialization phase showed that EwFCM and PSO FCM provided the best performance in terms of accuracy, precision, recall, and F1 score. Furthermore, the number of iterations in EwFCM and PSO FCM was about half that in FCM. Finally, EwFCM had a slightly shorter running time than Density FCM, Kernel FCM, and PSO FCM.
Further tests are needed in the future. We intend to improve the proposed algorithm by exploring its use on massive datasets and testing the use of robust weight functions with respect to the presence of noise and outliers in the data. We also intend to carry out performance comparison tests of EwFCM with respect to other well-known machine learning clustering algorithms.