A New Validity Index Based on Fuzzy Energy and Fuzzy Entropy Measures in Fuzzy Clustering Problems

Two well-known drawbacks of fuzzy clustering are the requirement of assigning the number of clusters in advance and the random initialization of the cluster centers. The quality of the final fuzzy clusters depends heavily on these initial choices; it is therefore necessary to apply a validity index measuring the compactness and the separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is applied to find the optimal number of clusters and the initial cluster centers, in order to obtain a good clustering quality without increasing time consumption. We test our algorithm on UCI (University of California at Irvine) machine learning classification datasets, comparing the results with those obtained by using well-known validity indices and with variations of fuzzy C-means that use optimization algorithms in the initialization phase. The comparison results show that our algorithm represents an optimal trade-off between the quality of clustering and time consumption.


Introduction
A validity index is a measure applied in fuzzy clustering to evaluate the compactness of clusters and the separability among clusters.
Numerous validity indices have been applied to measure the compactness and separateness of clusters detected by applying the fuzzy C-means (FCM) algorithm [1,2].
The two main well-known drawbacks of FCM are the random setting of the initial cluster centers and the requirement of assigning the number of clusters in advance. The initial selection of the cluster centers can affect the performance of the algorithm in terms of efficiency and of the number of iterations needed to reach convergence. Moreover, the quality of the final fuzzy clusters depends on the choice of the number of clusters; therefore, it is necessary to use a validity index to determine the optimal number of clusters.
A simple technique applied to solve these problems is to execute the clustering algorithm several times, varying the initial centers of the clusters and the number of clusters, and to choose the optimal clustering using a validity index to measure the quality of the final clustering. However, this technique can be computationally expensive as the clustering algorithm has to be run many times.
In References [3,4], a technique is proposed which is based on the subtractive clustering algorithm to initialize the clusters, but this method needs to set the maximum peak and the maximum radius parameters.
In Reference [5], a technique, called Fuzzy Silhouette, is proposed: this method generalizes the Average Silhouette Width Criterion [6] applied for evaluating the quality of crisp clustering. The authors of Reference [5] show that the proposed validity measure, unlike other well-known validity measures, such as Fuzzy Hypervolume and Average Partition Density [7] and the Xie-Beni [8] index, can be used as an objective function of an evolutionary algorithm to automatically find the number of clusters; however, this approach requires running FCM many times for each cluster number selection.
In Reference [9], a new optimization method based on the density of the grid cells is proposed to find the optimal initial cluster centers and number of clusters: this approach can reduce run times in high-dimensional clustering.
The K-means algorithm is used in Reference [10] to initialize the centers of the clusters; then, the Partition Coefficient [1,11] and Partition Entropy [12] validity measures are calculated to find the optimal number of clusters. The drawback of this method is that it is highly time consuming and it can be unsuitable for managing massive datasets.
Some authors propose hybrid FCM variations in which meta-heuristic approaches are applied to optimize the initialization of the cluster centers. In Reference [13], a kernel FCM algorithm is proposed in which an evolutive method is applied in order to find the initial cluster centers. A Genetic Algorithm (GA) is proposed in Reference [14] to find the optimal initial FCM cluster centers in image segmentation problems. A Particle Swarm Optimization (PSO) algorithm is proposed in Reference [15] to find the optimal initial FCM cluster centers for sentiment clustering. Three hybrid FCM algorithms, based on Differential Evolution, GA, and PSO methods, are proposed in Reference [16] to optimize the initialization of the cluster centers. These algorithms, while guaranteeing a higher quality of results, require long execution times, and they too are unsuitable for handling high-dimensional data.
In this paper, we propose a FCM variation in which a new validity index based on the De Luca and Termini Fuzzy Entropy and Fuzzy Energy concepts [17,18] is used to optimize the initialization of the clusters and to find the optimal number of clusters. Our aim is to reach a trade-off between the time consumption and the quality of the clustering algorithm.
Recently, a weighted FCM variation based on the De Luca and Termini fuzzy entropy was proposed in Reference [19] to optimize the initialization of the cluster centers. The authors first execute a weighted FCM algorithm in which the weight assigned to a data point is a fuzziness measure obtained by calculating the mean fuzzy entropy of the data point; the initial cluster centers are then found when the mean fuzzy entropy of the clustering converges.
The algorithm proposed in Reference [19] is less time-consuming than hybrid algorithms using meta-heuristic approaches, but like the algorithm proposed in Reference [10], it applies an iterative method of pre-processing to initialize cluster centers. Furthermore, it does not detect the optimal number of clusters that must be set in advance.
In the proposed algorithm, the validity measure of the quality of clustering based on fuzzy energy and fuzzy entropy is calculated both in the pre-processing phase, to find the optimal initial cluster centers, and afterwards, to determine the optimal number of clusters. For each setting of the number of clusters, we randomly assign the cluster centers several times, choosing as initial cluster centers those for which the clustering validity index is greatest; finally, the FCM algorithm runs. We repeat this process, increasing the number of clusters up to a maximum number. After obtaining the final clusters for each setting of the number of clusters, we choose the clustering with the largest validity index.
In Section 2, we give a brief review of the fuzzy energy and fuzzy entropy measures of a fuzzy set and of the FCM algorithm. In Section 3, we introduce the proposed FCM algorithm based on the fuzzy energy- and entropy-based validity index. In Section 4, we present several experimental results that demonstrate the features of the proposed index by applying it to FCM. In Section 5, we present our conclusions.

Fuzzy Energy and Entropy Measures
Let X be a universe of discourse and F(X) = {A: X → [0, 1]} be the set of all fuzzy sets defined on X. Moreover, let A ∈ F(X) and B ∈ F(Y) be two fuzzy sets defined on the sets X and Y respectively, and let R ⊆ F[X × Y] be a fuzzy relation on X × Y.
In References [17,18], two categories of fuzziness measures of fuzzy sets are defined: fuzzy energy and fuzzy entropy. If X = {x_1, ..., x_m} is a discrete set with cardinality m, the energy measure of fuzziness of the fuzzy set A ∈ F(X) is given by:

E(A) = ∑_{i=1}^{m} e(A(x_i)) (1)

where e: [0, 1] → [0, 1] is a continuous function called the fuzzy energy function. The following restrictions are required for the function e: (1) e(0) = 0; (2) e(1) = 1; (3) e is monotonically increasing.
The simplest fuzzy energy function is the identity e(u) = u, with u ∈ [0, 1]. A more general formula for e(u) is:

e(u) = u^p (2)

where p > 0 is a positive number. The minimal value of the fuzzy energy measure is 0 and the maximal value is E(A) = Card(X) = m, where Card(X) is the cardinality of the set X.
The energy measure of a fuzzy set A can be seen as a measure of the information contained in this fuzzy set. If E(A) = 0, then A coincides with the empty set; if E(A) = m, then A coincides with the whole set X.
The entropy measure of fuzziness of the fuzzy set A is given by:

H(A) = ∑_{i=1}^{m} h(A(x_i)) (3)

where h: [0, 1] → [0, 1] is a continuous function called the fuzzy entropy function. The following restrictions are required for the function h: (1) h(0) = h(1) = 0; (2) h(1/2) = 1; (3) h is monotonically increasing on [0, 1/2] and monotonically decreasing on [1/2, 1]. The simplest fuzzy entropy function is the piecewise linear function:

h(u) = 2u if u ∈ [0, 1/2], h(u) = 2(1 − u) if u ∈ (1/2, 1] (4)

This fuzzy entropy function has a minimal value of 0 when u is 0 or 1 and a maximal value of 1 when u = 1/2. De Luca and Termini in Reference [18] propose the following fuzzy entropy function:

h(u) = −u log_2 u − (1 − u) log_2 (1 − u) (5)

This fuzzy entropy function has a maximal value of 1 when u = 1/2, and it is called Shannon's function. The entropy measure of a fuzzy set A can be seen as a measure of the fuzziness contained in this fuzzy set. If H(A) = 0, then A(x_i) = 0 or A(x_i) = 1 for each element x_i, i = 1, ..., m, and A coincides with a crisp subset of the set X; if H(A) = m, then A(x_i) = 1/2 for each element x_i, i = 1, ..., m, and the fuzziness of A is maximal.
A problem is to find the fuzzy set from a family of fuzzy sets of F(X) with the highest information content and the lowest fuzziness.
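To make these measures concrete, the following Python sketch (an illustration of ours, not part of References [17,18]; it uses the energy function e(u) = u^p from (2) and the base-2 Shannon function (5)) computes the energy and entropy of a discrete fuzzy set:

```python
import math

def fuzzy_energy(A, p=2.0):
    """Energy measure E(A) = sum_i e(A(x_i)) with e(u) = u**p."""
    return sum(u ** p for u in A)

def shannon_h(u):
    """De Luca-Termini Shannon entropy function: 0 at u = 0 or 1, maximal (1) at u = 1/2."""
    if u in (0.0, 1.0):
        return 0.0
    return -u * math.log2(u) - (1 - u) * math.log2(1 - u)

def fuzzy_entropy(A):
    """Entropy measure H(A) = sum_i h(A(x_i)) with h the Shannon function."""
    return sum(shannon_h(u) for u in A)

# A crisp subset of X carries maximal information per member and zero fuzziness:
crisp = [0.0, 1.0, 1.0, 0.0]
print(fuzzy_energy(crisp, p=1.0))  # 2.0: with the identity energy (p = 1), the number of members
print(fuzzy_entropy(crisp))        # 0.0: no fuzziness

# The maximally fuzzy set A(x_i) = 1/2 has entropy equal to the cardinality m:
print(fuzzy_entropy([0.5] * 4))    # 4.0
```

A crisp subset thus maximizes the information content while carrying zero fuzziness, which is exactly the combination sought above.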

Fuzzy C-Means Algorithm
Let X = {x_1, ..., x_N} ⊂ R^n be a set of N data points in the n-dimensional space R^n, where x_j = (x_{j1}, ..., x_{jn}), and let V = {v_1, ..., v_C} ⊂ R^n be the set of centers of the C clusters. Let U be the C × N partition matrix, where u_{ij} is the membership degree of the jth data point x_j to the ith cluster v_i.
The FCM algorithm [1,2] is based on the minimization of the following objective function:

J_m(U, V) = ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^m d_{ij}^2 (6)

where d_{ij} = ‖x_j − v_i‖ is the Euclidean distance between the center v_i of the ith cluster and the jth object x_j, and m ∈ (1, +∞) is the fuzzifier parameter, a constant which affects the membership values and defines the degree of fuzziness of the partition. For m = 1, FCM becomes hard C-means clustering; the more m tends towards +∞, the more the fuzziness level of the clusters grows. By considering the following constraints:

u_{ij} ∈ [0, 1], ∑_{i=1}^{C} u_{ij} = 1, j = 1, ..., N (7)

0 < ∑_{j=1}^{N} u_{ij} < N, i = 1, ..., C (8)

and applying the Lagrange multipliers, we obtain the following solutions for (6):

v_i = ∑_{j=1}^{N} u_{ij}^m x_j / ∑_{j=1}^{N} u_{ij}^m (9)

and

u_{ij} = 1 / ∑_{k=1}^{C} (d_{ij} / d_{kj})^{2/(m−1)} (10)

An iterative process is proposed in Reference [2] as follows: initially, the membership degrees are assigned randomly; in each iteration, the cluster centers are calculated by (9), then the membership degrees are calculated by (10). The iterative process stops at the tth iteration when

‖U^(t) − U^(t−1)‖ < ε (11)

where ε > 0 is a parameter assigned a priori to stop the iteration process and

‖U^(t) − U^(t−1)‖ = max_{i,j} |u_{ij}^(t) − u_{ij}^(t−1)| (12)

The pseudocode of the FCM algorithm (Algorithm 1) is shown below.

Algorithm 1 FCM
1. Initialize randomly the partition matrix U
2. Repeat
3.   Calculate the cluster centers v_i by (9), i = 1, ..., C
4.   Update the membership degrees u_{ij} by (10)
5. Until (11) holds
6. Return V, U
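As an illustration, the iterative process above can be sketched in plain Python as follows (a minimal sketch of ours, not a reference implementation; for robustness it initializes the centers with distinct data points rather than with a random partition matrix, and distances are clamped away from zero to avoid division by zero):

```python
import random

def fcm(X, C, m=2.0, eps=0.01, max_iter=100, seed=0):
    """Minimal FCM sketch: alternate center and membership updates
    until the max absolute change of the memberships drops below eps."""
    rnd = random.Random(seed)
    N, n = len(X), len(X[0])

    def memberships(V):
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)), with distances clamped > 0
        d = [[max(1e-12, sum((X[j][k] - V[i][k]) ** 2 for k in range(n)) ** 0.5)
              for j in range(N)] for i in range(C)]
        return [[1.0 / sum((d[i][j] / d[r][j]) ** (2.0 / (m - 1)) for r in range(C))
                 for j in range(N)] for i in range(C)]

    # C distinct data points as initial centers (assumes the points are distinct)
    V = [list(x) for x in rnd.sample(X, C)]
    U = memberships(V)
    for _ in range(max_iter):
        # centers: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        V = []
        for i in range(C):
            w = [U[i][j] ** m for j in range(N)]
            tot = sum(w)
            V.append([sum(w[j] * X[j][k] for j in range(N)) / tot for k in range(n)])
        newU = memberships(V)
        delta = max(abs(newU[i][j] - U[i][j]) for i in range(C) for j in range(N))
        U = newU
        if delta < eps:
            break
    return V, U
```

On two well-separated groups of points, the returned memberships assign each point predominantly to the cluster of its own group.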

The Proposed FCM Algorithm Based on a Fuzzy Energy and Entropy Validity Index
Let X = {x 1 , . . . , x N } be the set of data points with cardinality N. We consider the fuzzy set A i ∈ F(X), where A i (x j ) = u ij is the membership degree of the jth data point to the ith cluster.
We propose a new validity index based on the fuzzy energy and fuzzy entropy measures to evaluate the compactness of clusters and the separability among clusters.
By using (1) and (3), respectively, we can evaluate the fuzzy energy and the fuzzy entropy of the ith cluster by measuring the fuzzy energy and the fuzzy entropy of the fuzzy set A_i:

E_i = (1/N) ∑_{j=1}^{N} e(u_{ij}) (13)

H_i = (1/N) ∑_{j=1}^{N} h(u_{ij}) (14)

where the fuzzy energy and entropy are normalized by dividing them by the cardinality N of the dataset. The fuzzy energy (13) measures the quantity of information contained in the ith cluster, and the fuzzy entropy (14) measures the fuzziness of the ith cluster, namely the quality of the information contained therein.
For example, a cluster with low fuzzy entropy has low fuzziness, so it is compact; however, if it also has a low fuzzy energy, then the information which it contains is low. Hence, even if compact, a very small number of data points will belong to this cluster and this could be due to the presence of noise or outliers in the data. Moreover, a cluster with a high value of fuzzy entropy has high fuzziness and low compactness.
We set the function (2) as the fuzzy energy function, where the exponent p is given by the value of the fuzzifier parameter m. The fuzzy entropy function h(u) is given by the Shannon function (5).
We measure the energy and the entropy of the clustering by the averages of the energy and entropy of the C clusters:

E = (1/C) ∑_{i=1}^{C} E_i (15)

and

H = (1/C) ∑_{i=1}^{C} H_i (16)

respectively. The proposed validity index, called Partition Energy-Entropy (PEH), is given by the difference between the energy and the entropy of the clustering:

PEH = E − H (17)

This index varies in the range [−1, 1]; the optimal clustering is the one that maximizes PEH, and the greater the value of PEH, the more compact and well separated from each other the clusters are.
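The PEH index can be computed directly from the partition matrix. The following Python sketch (illustrative, of ours; it assumes the Shannon entropy function and the energy function e(u) = u^p with p = 2, matching the fuzzifier value used in our experiments) evaluates PEH:

```python
import math

def peh(U, p=2.0):
    """Partition Energy-Entropy index: mean normalized cluster energy
    (with e(u) = u**p) minus mean normalized cluster entropy (Shannon function)."""
    def h(u):
        return 0.0 if u in (0.0, 1.0) else -u * math.log2(u) - (1 - u) * math.log2(1 - u)
    C, N = len(U), len(U[0])
    E = sum(sum(u ** p for u in row) / N for row in U) / C  # mean cluster energy
    H = sum(sum(h(u) for u in row) / N for row in U) / C    # mean cluster entropy
    return E - H

# A crisp, balanced 2-cluster partition: compact and well separated
U_crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(peh(U_crisp))   # 0.5 - 0.0 = 0.5 with p = 2

# The maximally fuzzy partition u_ij = 1/2 scores much lower
U_fuzzy = [[0.5] * 4, [0.5] * 4]
print(peh(U_fuzzy))   # 0.25 - 1.0 = -0.75
```

As expected, the crisp partition scores higher than the maximally fuzzy one.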
We propose a new algorithm, called PEHFCM, in which the PEH index is used to initialize the cluster centers and to find the optimal number of clusters.
In addition to the fuzzifier and the iteration error threshold, further arguments of the algorithm are the maximum number of clusters, Cmax, and the number of random selections of the initial C cluster centers, Smax. The PEHFCM algorithm is composed of a For loop in which the number of clusters is initially set to 2 and then increased at each cycle until Cmax is reached. In each cycle, Smax sets of cluster centers are randomly selected, and the PEH index is calculated for each of them. The optimal set of initial cluster centers is the one for which the PEH index is maximum. Subsequently, a variation of the FCM algorithm, called FCMV, is performed, which, unlike FCM, takes the set of initial cluster centers V0 as a further argument instead of setting it randomly. Finally, the PEH index of the final clustering is calculated.
The PEHFCM algorithm returns the optimal number of clusters C* and the respective set of cluster centers V* and partition matrix U* corresponding to the highest PEH validity index.
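The loop described above can be summarized in pseudocode as follows (a sketch in the style of Algorithm 1; this is our reconstruction, and the step numbering of the original Algorithm 2 listing may differ):

Algorithm 2 PEHFCM (sketch)
1.  For c = 2 to Cmax
2.    For s = 1 to Smax
3.      Select randomly a set V(s) of c cluster centers
4.      Calculate the partition matrix U(s) from V(s) and its index PEH(s)
5.    End For
6.    Set V0 = V(s*), where s* is the selection maximizing PEH(s)
7.    (V(c), U(c)) = FCMV(X, c, V0)
8.    Calculate the index PEH(c) of the final clustering
9.  End For
10. Return C*, V*, U* corresponding to the maximum PEH(c)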
We can evaluate the computational complexity of PEHFCM, considering that the computational complexity of the FCM algorithm is O(N·n·C²·I), where N is the number of data points, n is their dimension, C is the number of clusters, and I is the number of iterations.
In PEHFCM, for moderate values of Smax, it is possible to neglect the complexity of computing the energy and entropy measures of the Smax initial sets of cluster centers, approximating the computational complexity by O(N·n·C²·I·Cmax), where Cmax is the maximum number of clusters and I is the mean number of iterations of each FCM execution.
Then, PEHFCM has the same computational complexity as an FCM in which a validity index is measured to calculate the optimal number of clusters.
Moreover, due to the problem of initialization of cluster centers, FCM is generally performed several times, increasing its computational complexity; on the other hand, PEHFCM does not need to be executed several times as the algorithm determines the initial centers of the optimal clusters.
We compare the performance of the PEH index with that of well-known validity indices: the Partition Coefficient (PC) [1,11], the Partition Entropy (PE) [12], the Fukuyama-Sugeno (FS) index, the Xie-Beni (XB) index [8], and the PCAES index.

The PC validity index is given by the formula:

PC = (1/N) ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^2 (18)

It measures the crispness of the clusters; the optimal number of clusters, C*, is obtained when PC is maximum. The PE validity index is given by:

PE = −(1/N) ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij} log u_{ij} (19)

It measures the mean fuzziness of the clusters, and the optimal number of clusters, C*, is obtained when PE is minimum. The FS validity index is given by:

FS = ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^m (‖x_j − v_i‖² − ‖v_i − v̄‖²) (20)

where v̄ is the average of the cluster centers. The first term in (20) measures the compactness of the clusters, the other one the separability among the same clusters; the optimal number of clusters, C*, is obtained when FS is minimum. The XB validity index is given by the formula:

XB = ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^2 ‖x_j − v_i‖² / (N · min_{i≠k} ‖v_i − v_k‖²) (21)

The numerator measures the compactness of the clusters, and the denominator indicates the separability between clusters. The optimal number of clusters, C*, is obtained when XB assumes the minimum value.
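For reference, the PC, PE, and XB indices can be computed from the partition matrix (and, for XB, from the data points and cluster centers) as in the following Python sketch (illustrative, of ours; the natural logarithm in PE is an assumption, as the logarithm base is not fixed above):

```python
import math

def pc(U):
    """Partition Coefficient: mean of u_ij^2; maximal for crisp partitions."""
    N = len(U[0])
    return sum(u * u for row in U for u in row) / N

def pe(U):
    """Partition Entropy: mean of -u_ij * ln(u_ij); minimal for crisp partitions."""
    N = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / N

def xb(U, V, X):
    """Xie-Beni index: compactness over N times the minimal center separation."""
    def d2(a, b):
        return sum((ak - bk) ** 2 for ak, bk in zip(a, b))
    N, C = len(X), len(V)
    num = sum(U[i][j] ** 2 * d2(X[j], V[i]) for i in range(C) for j in range(N))
    sep = min(d2(V[i], V[k]) for i in range(C) for k in range(C) if i != k)
    return num / (N * sep)

# A nearly crisp partition of two 1-D groups around the centers 0 and 10:
U = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
X = [[0.0], [0.0], [10.0], [10.0]]
V = [[0.0], [10.0]]
print(pc(U))        # ~0.82 (close to the crisp maximum 1)
print(pe(U))        # ~0.325 (low fuzziness)
print(xb(U, V, X))  # ~0.01 (compact, well-separated clusters)
```

For a perfectly crisp partition, PC reaches 1 and PE reaches 0, in line with the optimality criteria stated above.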
The PCAES validity index is given by the formula:

PCAES = ∑_{i=1}^{C} ( ∑_{j=1}^{N} u_{ij}^2 / u_M − exp(−min_{k≠i} ‖v_i − v_k‖² / β_T) ) (22)

where u_M = min_{1≤i≤C} ∑_{j=1}^{N} u_{ij}^2, β_T = (1/C) ∑_{i=1}^{C} ‖v_i − v̄‖², and the vector v̄ is the average of the cluster centers. The first term in (22) measures the compactness of the clusters, and the last term the separability among the clusters. The optimal number of clusters, C*, is obtained when PCAES assumes the maximum value. We complete our comparisons by comparing our method with hybrid metaheuristic algorithms.
The comparison tests are performed on well-known UC Irvine (UCI) machine learning classification datasets (http://archive.ics.uci.edu/ml/datasets.html). We measure the quality of the results in terms of accuracy, precision, recall, and F1-score [22,23].

Results
We show the results obtained on a set of over 40 classification UCI machine learning datasets. In all experiments, we used an Intel Core i5 3.2 GHz processor, m = 2, ε = 0.01, and Smax = 100.
For brevity, we only show in detail the results obtained on the well-known Iris flower dataset. This dataset contains 150 data points with 4 features, given by the length and the width of the sepals and petals measured in centimeters: 50 data points are classified as belonging to the Iris Setosa type, 50 to the Iris Versicolor type, and 50 to the Iris Virginica type. Only the class Iris Setosa is linearly separable from the other two, which are not linearly separable from each other. We set the maximum number of clusters, Cmax, to 10. In Figure 1, we show the values of the PEH index of the best initial cluster centers obtained for each setting of the number of clusters.

As can be seen from Figure 1, the maximum value of the PEH index is obtained for C = 3. Figure 2 shows that the number of iterations increases as the PEH value of the initial clustering decreases. Figure 3 shows the trend of the number of iterations necessary to reach convergence by varying the number of clusters in PEHFCM; the least number of iterations (12) is obtained for C = 3. Like the PEH index of the final clustering, the number of iterations increases as the PEH value of the initial clustering decreases. In Figure 4, we show the trend of the PEH index at each iteration for C = 3.
The PEH index increases slightly at first, then increases rapidly after the 8th iteration and reaches a plateau at the 12th iteration. We compare the performance of the PEH index with that of the PC, PE, FS, and XB validity indices. Table 1 shows the optimal number of clusters found using each validity index, the number of iterations necessary for convergence, and the running time.
The best results are obtained by executing PEHFCM with respect to FCM + PC and FCM + PE (resp., FCM + FS and FCM + XB) when the optimal number of clusters obtained is 2 (resp., 3). In both cases, the least number of iterations and the shortest execution time are achieved using PEHFCM. In addition, we compare the results obtained by executing PEHFCM with those obtained via the entropy-based weighted FCM algorithm (EwFCM) [19] and the metaheuristic PSOFCM proposed in Reference [15]. Table 2 shows the running time, accuracy, precision, recall, and F1-score obtained by executing FCM + FS, FCM + XB, FCM + PCAES, PEHFCM, EwFCM, and PSOFCM. The results in Table 2 show that the best classification performances are given by EwFCM and PSOFCM; PEHFCM has the shortest running time and classification performances comparable with those of EwFCM and PSOFCM.
These results are confirmed by testing other UCI machine learning datasets. Here, we present the results obtained on the Wine dataset. This dataset consists of 178 data points with 13 features: each data point represents an Italian wine derived from one of three cultivars, and its features provide information on the wine's chemical composition. The dataset is partitioned into three classes, corresponding to the three cultivars.
In Table 3, we show the results obtained by considering the five validity indices. Even in this case, PEHFCM provides the best number of iterations and running time. Table 4 shows the running time and the classification performances of all the compared algorithms. Here too, the results obtained on the Wine dataset show that PEHFCM provides the shortest execution time and classification performances comparable to those obtained by using EwFCM and PSOFCM. In Table 5, the accuracy values obtained for some datasets used in our comparison tests are shown. These results confirm that the accuracy performances provided by PEHFCM are better than the ones provided by FCM + FS, FCM + XB, and FCM + PCAES, and are comparable to those provided by EwFCM and PSOFCM. We summarize the results obtained on all the classification UCI machine learning datasets used in our tests, calculating:
- The mean percent gain (or loss) of running time. If T_C is the running time of a compared FCM-based method and T_PEH is that of PEHFCM, this index is given by the average of the percentages (T_PEH − T_C)/T_PEH. This value is equal to 0 for PEHFCM.
- The mean percent gain (or loss) of a classification index. If I_C is a classification index value obtained by running a compared FCM-based method and I_PEH is that obtained with PEHFCM, this index is given by the average of the percentages (I_C − I_PEH)/I_PEH. This value is equal to 0 for PEHFCM.
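These two summary indices are simple averages over the datasets; as an illustration (with hypothetical per-dataset values, not taken from our tables), they can be computed as:

```python
def mean_time_gain(T_other, T_peh):
    """Mean percent gain (loss if negative) of running time of a compared
    method over the datasets: average of (T_PEH - T_C) / T_PEH."""
    return 100.0 * sum((tp - tc) / tp for tc, tp in zip(T_other, T_peh)) / len(T_peh)

def mean_index_gain(I_other, I_peh):
    """Mean percent gain (loss if negative) of a classification index of a
    compared method: average of (I_C - I_PEH) / I_PEH."""
    return 100.0 * sum((ic - ip) / ip for ic, ip in zip(I_other, I_peh)) / len(I_peh)

# Hypothetical values for two datasets, for illustration only:
print(mean_time_gain([1.3, 2.6], [1.0, 2.0]))     # -30.0: the compared method is 30% slower
print(mean_index_gain([0.99, 1.01], [1.0, 1.0]))  # 0.0: same mean classification quality
```

A negative time-gain value for a compared method thus means longer running times than PEHFCM, while a positive index-gain value means better classification quality.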
If the value of a summarized index is positive, then, by executing the algorithm, we obtain a gain in terms of running time or of the classification index; conversely, we get a loss if that value is negative. In Table 6, we show these results. The results in Table 6 show that PEHFCM provides the best running time; indeed, the running times measured executing the other FCM-based algorithms were more than 28% longer than the one obtained by executing PEHFCM. The gain of accuracy, precision, recall, and F1-score obtained executing EwFCM and PSOFCM was less than 2%.

Conclusions
We proposed a variation of FCM in which a validity index based on the fuzzy energy and fuzzy entropy of the clustering is used to find an optimal initialization of the cluster centers and the optimal number of clusters.
The proposed method represents a trade-off between the running time and the clustering performances: it aims to overcome the problems of initializing cluster centers and setting the number of clusters a priori, without, at the same time, requiring long execution times due to the pre-processing phase necessary to optimize the initialization of cluster centers.
The results of experimental tests on well-known UCI machine learning classification datasets showed that the PEHFCM algorithm provides shorter running times than the EwFCM and PSOFCM algorithms, which use, respectively, an optimization method based on fuzzy entropy and a metaheuristic PSO-based method to determine the initial cluster centers. Furthermore, PEHFCM provides classification performance comparable to that of EwFCM and PSOFCM.
Since the proposed method is FCM-based, its computational complexity depends on the number of data points and the number of features, and it may be unsuitable for managing massive and high-dimensional datasets. In the future, we intend to adapt PEHFCM to manage high-dimensional and massive datasets, in which it is essential to guarantee high performances both in terms of quality of results and execution times.