1. Introduction
A validity index is a measure applied in fuzzy clustering to evaluate the compactness of clusters and the separability among clusters.
Numerous validity indices have been applied to measure the compactness and separateness of clusters detected by applying the fuzzy C-means (FCM) algorithm [1,2].
The two well-known main drawbacks of FCM are the random setting of the initial cluster centers and the requirement to assign the number of clusters in advance. The initial selection of the cluster centers can affect the performance of the algorithm in terms of efficiency and the number of iterations needed to reach convergence. Moreover, the quality of the final fuzzy clusters depends on the choice of the number of clusters; it is therefore necessary to use a validity index to evaluate the optimal number of clusters.
A simple technique applied to solve these problems is to execute the clustering algorithm several times, varying the initial centers of the clusters and the number of clusters, and to choose the optimal clustering using a validity index to measure the quality of the final clustering. However, this technique can be computationally expensive as the clustering algorithm has to be run many times.
In References [3,4], a technique based on the subtractive clustering algorithm is proposed to initialize the clusters, but this method requires setting the maximum peak and maximum radius parameters.
In Reference [5], a technique called Fuzzy Silhouette is proposed: this method generalizes the Average Silhouette Width Criterion [6] used for evaluating the quality of crisp clustering. The authors of Reference [5] show that the proposed validity measure, unlike other well-known validity measures, such as Fuzzy Hypervolume and Average Partition Density [7] and the Xie-Beni [8] index, can be used as the objective function of an evolutionary algorithm to automatically find the number of clusters; however, this approach requires running FCM many times for each selection of the cluster number.
In Reference [9], a new optimization method based on the density of grid cells is proposed to find the optimal initial cluster centers and number of clusters: this approach can reduce run times in high-dimensional clustering.
The K-means algorithm is used in Reference [10] to initialize the cluster centers; then, the Partition Coefficient [1,11] and Partition Entropy [12] validity measures are calculated to find the optimal number of clusters. The drawback of this method is that it is highly time-consuming, and it can be unsuitable for managing massive datasets.
Some authors propose hybrid FCM variations in which metaheuristic approaches are applied to optimize the initialization of the cluster centers. In Reference [13], a kernel FCM algorithm is proposed in which an evolutionary method is applied to find the initial cluster centers. A Genetic Algorithm (GA) is proposed in Reference [14] to find the optimal initial FCM cluster centers in image segmentation problems. A Particle Swarm Optimization (PSO) algorithm is proposed in Reference [15] to find the optimal FCM initial cluster centers for sentiment clustering. Three hybrid FCM algorithms, based on Differential Evolution, GA, and PSO, are proposed in Reference [16] to optimize the initialization of the cluster centers. These algorithms, while guaranteeing higher-quality results, require long execution times, and they too are unsuitable for handling high-dimensional data.
In this paper, we propose an FCM variation in which a new validity index based on the De Luca and Termini fuzzy entropy and fuzzy energy concepts [17,18] is used to optimize the initialization of the clusters and to find the optimal number of clusters. Our aim is to reach a trade-off between the time consumption and the quality of the clustering algorithm.
Recently, a weighted FCM variation based on the De Luca and Termini fuzzy entropy was proposed in Reference [19] to optimize the initialization of the cluster centers. To initialize the cluster centers, the authors first execute a weighted FCM algorithm in which the weight assigned to a data point is a fuzziness measure obtained by calculating the mean fuzzy entropy of the data point; the initial cluster centers are found when the mean fuzzy entropy of the clustering converges as well.
The algorithm proposed in Reference [19] is less time-consuming than hybrid algorithms using metaheuristic approaches, but, like the algorithm proposed in Reference [10], it applies an iterative preprocessing method to initialize the cluster centers. Furthermore, it does not detect the optimal number of clusters, which must be set in advance.
In the proposed algorithm, the validity measure of clustering quality based on the fuzzy energy and fuzzy entropy is calculated both in a preprocessing phase, to find the optimal initial cluster centers, and afterwards, to determine the optimal number of clusters. For a fixed number of clusters, we randomly assign the cluster centers several times, choosing as initial cluster centers those for which the clustering validity index is greatest; finally, the FCM algorithm runs. We repeat this process, increasing the number of clusters up to a maximum. After obtaining the final clusters for each setting of the number of clusters, we choose the one with the largest validity index.
In Section 2, we give a brief review of the fuzzy energy and fuzzy entropy measures of a fuzzy set and of the FCM algorithm. In Section 3, we introduce the proposed FCM algorithm based on the fuzzy energy and entropy-based validity index. In Section 4, we present several experimental results that demonstrate the features of the proposed index when applied to FCM. In Section 5, we present our conclusions.
3. The Proposed FCM Algorithm Based on a Fuzzy Energy and Entropy Validity Index
Let X = {x_{1}, …, x_{N}} be the set of data points with cardinality N. We consider the fuzzy set A_{i} ∊ F(X), where A_{i}(x_{j}) = u_{ij} is the membership degree of the jth data point to the ith cluster.
We propose a new validity index based on the fuzzy energy and fuzzy entropy measures to evaluate the compactness of clusters and the separability among clusters.
By using (1) and (3), respectively, we can evaluate the fuzzy energy and the fuzzy entropy of the ith cluster by measuring the fuzzy energy and the fuzzy entropy of the fuzzy set A_{i}, given by

E_{i} = (1/N) ∑_{j=1}^{N} e(u_{ij})  (13)

H_{i} = (1/N) ∑_{j=1}^{N} h(u_{ij})  (14)

where the fuzzy energy and entropy are normalized by dividing them by the cardinality N of the dataset.
Fuzzy energy (13) measures the quantity of information contained in the ith cluster, and fuzzy entropy (14) measures the fuzziness of the ith cluster, namely the quality of the information contained therein.
For example, a cluster with low fuzzy entropy has low fuzziness, so it is compact; however, if it also has low fuzzy energy, then the information it contains is low. Hence, even if compact, only a very small number of data points will belong to this cluster, which could be due to the presence of noise or outliers in the data. Conversely, a cluster with a high value of fuzzy entropy has high fuzziness and low compactness.
We set the function (2) as the fuzzy energy function, where p is the value of the fuzzifier parameter. The fuzzy entropy function h(u) is given by the Shannon function (5).
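As an illustration of the per-cluster measures (13) and (14), the sketch below assumes e(u) = u^p for the energy function (2) and a base-2 Shannon function for h(u), so that both measures lie in [0, 1]; the log base and the helper names are assumptions of this sketch, since Equations (1)-(5) are defined in Section 2.

```python
import numpy as np

def cluster_energy(u, p=2.0):
    # Normalized fuzzy energy of one cluster, as in Eq. (13):
    # the mean of e(u) = u^p over the N data points (p = fuzzifier).
    return np.mean(u ** p)

def cluster_entropy(u, eps=1e-12):
    # Normalized fuzzy entropy of one cluster, as in Eq. (14):
    # the mean of the Shannon function h(u) over the N data points.
    # Base-2 logs keep h(u), and hence the mean, in [0, 1].
    u = np.clip(u, eps, 1.0 - eps)  # avoid log(0)
    h = -u * np.log2(u) - (1.0 - u) * np.log2(1.0 - u)
    return np.mean(h)

# Memberships close to 0 or 1 give a compact (low-entropy) cluster;
# memberships near 0.5 give a fuzzy (high-entropy) one.
u_compact = np.array([0.95, 0.90, 0.05, 0.02, 0.98])
u_fuzzy = np.array([0.50, 0.55, 0.45, 0.50, 0.52])
assert cluster_entropy(u_compact) < cluster_entropy(u_fuzzy)
```

A cluster whose memberships are all near 0 has low entropy but also low energy, matching the noise/outlier case described above.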
We measure the energy and the entropy of the clustering, given by the averages of the energy and entropy of the C clusters:

E = (1/C) ∑_{i=1}^{C} E_{i}  (15)

and

H = (1/C) ∑_{i=1}^{C} H_{i}  (16)

respectively. The proposed validity index, called Partition Energy-Entropy (PEH), is given by the difference between the energy and the entropy of the clustering:

PEH = E − H  (17)
This index varies in the range [−1, 1]; the optimal clustering is the one that maximizes PEH: the greater the value of PEH, the more compact and well separated the clusters.
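A minimal sketch of the PEH computation from a partition matrix U (rows are clusters, columns are data points), under the same assumptions as above (e(u) = u^p, base-2 Shannon function):

```python
import numpy as np

def peh_index(U, p=2.0, eps=1e-12):
    # Partition Energy-Entropy index, Eqs. (15)-(17): PEH = E - H,
    # where E and H average the normalized fuzzy energy and entropy
    # of the C clusters (the rows of U). Since every cluster is
    # averaged over the same N points, the overall mean equals the
    # average of the per-cluster means.
    Uc = np.clip(U, eps, 1.0 - eps)
    E = np.mean(U ** p)
    H = np.mean(-Uc * np.log2(Uc) - (1.0 - Uc) * np.log2(1.0 - Uc))
    return E - H

# A crisp 2-cluster partition of 4 points scores higher than a
# maximally fuzzy one, as expected for compact, separated clusters.
U_crisp = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
U_fuzzy = np.full((2, 4), 0.5)
assert peh_index(U_crisp) > peh_index(U_fuzzy)
```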
We propose a new algorithm, called PEHFCM, in which the PEH index is used to initialize the cluster centers and to find the optimal number of clusters.
In addition to the fuzzifier and iteration error threshold parameters, further arguments of the algorithm are the maximum number of clusters, Cmax, and the number of random selections of the initial C clusters, Smax. The PEHFCM algorithm is composed of a For loop in which the number of clusters is initially set to 2 and then incremented at each cycle until the value Cmax is reached. In each cycle, Smax sets of cluster centers are initially selected, and the PEH index is calculated for each of them. The optimal set of initial cluster centers is the one for which the PEH index is maximum. Subsequently, a variation of the FCM algorithm, called FCMV, is performed, which, unlike FCM, takes the set of initial cluster centers V^{0} as a further argument instead of setting it randomly. Finally, the PEH index of the final clustering is calculated.
The PEHFCM algorithm returns the optimal number of clusters C* and the respective set of cluster centers V* and partition matrix U* corresponding to the highest PEH validity index.
Below, we show the PEHFCM algorithm (Algorithm 2) and the FCMV algorithm (Algorithm 3).
Algorithm 2: PEHFCM

Input: Dataset X = {x_{1}, …, x_{N}}
Output: Cluster centers V* = {v_{1}, …, v_{C*}}; partition matrix U*; optimal number of clusters C*
Arguments: max number of clusters Cmax; number of random selections of the initial cluster centers Smax; fuzzifier p; stop iteration threshold ε

Set p, ε, Cmax, Smax to the values of the arguments
C* := 1
PEH* := −1
For c = 2 to Cmax
    PEH_{0} := −1
    For k = 1 to Smax
        Set randomly the partition matrix U
        Calculate the cluster centers v_{i}, i = 1, …, c, by (9)
        Calculate E by (15)
        Calculate H by (16)
        PEH := E − H
        If PEH > PEH_{0} Then
            PEH_{0} := PEH
            V^{0} := V_{k}
        End If
    Next k
    (V, U) := FCMV(X, V^{0}, p, ε, c)
    Calculate E by (15)
    Calculate H by (16)
    PEH := E − H
    If PEH > PEH* Then
        PEH* := PEH
        C* := c
        V* := V
        U* := U
    End If
Next c
Return U*, V*, C*
Algorithm 3: FCMV

Input: Dataset X = {x_{1}, …, x_{N}}
Output: Cluster centers V = {v_{1}, …, v_{C}}; partition matrix U
Arguments: initial cluster centers V^{0} = {v_{1}^{0}, …, v_{C}^{0}}; number of clusters C; fuzzifier p; stop iteration threshold ε

Set p, ε, C to the values of the arguments
v_{i} := v_{i}^{0}, i = 1, …, C
Calculate u_{ij}, i = 1, …, C, j = 1, …, N, by using (10)
Repeat
    Calculate v_{i}, i = 1, …, C, by using (9)
    Calculate u_{ij}, i = 1, …, C, j = 1, …, N, by using (10)
Until ‖U^{(t)} − U^{(t−1)}‖ ≤ ε
Return V, U
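The outer PEH-based selection loop and the FCM iteration started from given centers can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances, the standard FCM updates for the center formula (9) and the membership formula (10), a base-2 Shannon function in the PEH computation, and helper names of our own choosing.

```python
import numpy as np

def _peh(U, p, eps=1e-12):
    # PEH = mean fuzzy energy minus mean fuzzy entropy (Eqs. (15)-(17));
    # the base-2 Shannon function is an assumption of this sketch.
    Uc = np.clip(U, eps, 1.0 - eps)
    energy = np.mean(U ** p)
    entropy = np.mean(-Uc * np.log2(Uc) - (1 - Uc) * np.log2(1 - Uc))
    return energy - entropy

def _centers(X, U, p):
    # Cluster centers as in Eq. (9): means weighted by u_ij^p.
    W = U ** p
    return (W @ X) / W.sum(axis=1, keepdims=True)

def _memberships(X, V, p):
    # Partition matrix as in Eq. (10): inverse-distance update.
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12
    inv = d2 ** (-1.0 / (p - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def fcm_v(X, V0, p, eps, max_iter=300):
    # Algorithm 3 (FCMV): FCM started from the given centers V0.
    U = _memberships(X, V0, p)
    V = V0
    for _ in range(max_iter):
        V = _centers(X, U, p)
        U_new = _memberships(X, V, p)
        done = np.abs(U_new - U).max() <= eps  # ||U(t) - U(t-1)|| <= eps
        U = U_new
        if done:
            break
    return V, U

def peh_fcm(X, c_max, s_max, p=2.0, eps=0.01, seed=0):
    # Algorithm 2 (PEHFCM): for each C, keep the best of s_max random
    # partition matrices by PEH, run FCMV from its centers, and track
    # the number of clusters with the highest final PEH.
    rng = np.random.default_rng(seed)
    peh_star, c_star, V_star, U_star = -1.0, None, None, None
    for c in range(2, c_max + 1):
        peh0, U0 = -1.0, None
        for _ in range(s_max):
            U = rng.random((c, X.shape[0]))
            U /= U.sum(axis=0, keepdims=True)  # columns sum to 1
            score = _peh(U, p)
            if score > peh0:
                peh0, U0 = score, U
        V, U = fcm_v(X, _centers(X, U0, p), p, eps)
        score = _peh(U, p)
        if score > peh_star:
            peh_star, c_star, V_star, U_star = score, c, V, U
    return peh_star, c_star, V_star, U_star
```

For example, `peh_fcm(X, c_max=10, s_max=100)` mirrors the experimental setup of Section 4 and returns the best PEH value together with C*, V*, and U*.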
We can evaluate the computational complexity of PEHFCM, considering that the computational complexity of the FCM algorithm is O(N·n·c^{2}·I), where N is the number of objects, n is their dimension, c is the number of clusters, and I is the number of iterations.
In PEHFCM, for moderate values of Smax, it is possible to neglect the complexity of computing the energy and entropy measures of the Smax initial sets of cluster centers, approximating the computational complexity by O(N·n·c^{2}·I·Cmax), where Cmax is the maximum number of clusters and I is the mean number of iterations of each FCM execution.
Then, PEHFCM has the same computational complexity as FCM when a validity index is measured to determine the optimal number of clusters.
Moreover, due to the problem of the initialization of the cluster centers, FCM is generally executed several times, increasing its computational complexity; PEHFCM, on the other hand, does not need to be executed several times, as the algorithm itself determines the optimal initial cluster centers.
To measure the performance of the proposed algorithm, we compare its results with the ones obtained by applying FCM and by applying our method with other well-known validity indices: Partition Coefficient (PC) [1,11], Partition Entropy (PE) [12], Fukuyama and Sugeno (FS) [20], Xie-Beni (XB) [8], and Partition Coefficient And Exponential Separation (PCAES) [21], described below.
The PC validity index is given by the formula:

PC = (1/N) ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^{2}  (18)

It measures the crispness of the clusters. The optimal number of clusters, C*, is obtained when PC is maximum.
The PE validity index is given by:

PE = −(1/N) ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij} log(u_{ij})  (19)

It measures the mean fuzziness of the clusters, and the optimal number of clusters, C*, is obtained when PE is minimum.
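The PC and PE indices can be sketched directly from their standard definitions; the natural logarithm in PE is an assumption of this sketch.

```python
import numpy as np

def partition_coefficient(U):
    # PC, Eq. (18): (1/N) * sum of u_ij^2; larger is crisper (max = 1).
    return np.sum(U ** 2) / U.shape[1]

def partition_entropy(U, eps=1e-12):
    # PE, Eq. (19): -(1/N) * sum of u_ij * log(u_ij); smaller is crisper.
    Uc = np.clip(U, eps, 1.0)  # avoid log(0)
    return -np.sum(Uc * np.log(Uc)) / U.shape[1]

# A crisp partition maximizes PC and minimizes PE.
U_crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
U_fuzzy = np.full((2, 2), 0.5)
assert partition_coefficient(U_crisp) > partition_coefficient(U_fuzzy)
assert partition_entropy(U_crisp) < partition_entropy(U_fuzzy)
```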
The FS validity index is given by:

FS = ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^{p} ‖x_{j} − v_{i}‖^{2} − ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^{p} ‖v_{i} − $\overline{v}$‖^{2}  (20)

where $\overline{v}$ is the average of the cluster centers. The first term in (20) measures the compactness of the clusters, the second the separability among them. The optimal number of clusters, C*, is obtained when FS is minimum.
The XB validity index is given by the formula:

XB = (∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^{2} ‖x_{j} − v_{i}‖^{2}) / (N · min_{i≠k} ‖v_{i} − v_{k}‖^{2})  (21)

The numerator measures the compactness of the clusters, and the denominator indicates the separability between clusters. The optimal number of clusters, C*, is obtained when XB assumes the minimum value.
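Both indices can be sketched from their standard definitions, assuming the Euclidean norm; here X holds the data points by rows, V the cluster centers, and U the partition matrix.

```python
import numpy as np

def fs_index(X, V, U, p=2.0):
    # Fukuyama-Sugeno, Eq. (20): fuzzy compactness minus separation.
    v_bar = V.mean(axis=0)
    d2_xv = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)  # ||x_j - v_i||^2
    d2_vv = ((V - v_bar) ** 2).sum(-1)                      # ||v_i - v_bar||^2
    W = U ** p
    return np.sum(W * d2_xv) - np.sum(W * d2_vv[:, None])

def xb_index(X, V, U):
    # Xie-Beni, Eq. (21): compactness over minimal center separation.
    N = X.shape[0]
    d2_xv = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)
    num = np.sum((U ** 2) * d2_xv)
    d2_cc = ((V[None, :, :] - V[:, None, :]) ** 2).sum(-1)
    np.fill_diagonal(d2_cc, np.inf)  # ignore the i == k pairs
    return num / (N * d2_cc.min())

# Two tight, well-separated groups: small XB, negative FS.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
V = np.array([[0.05, 0.0], [5.05, 0.0]])
U = np.array([[0.99, 0.99, 0.01, 0.01],
              [0.01, 0.01, 0.99, 0.99]])
```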
The PCAES validity index is given by the formula:

PCAES = ∑_{i=1}^{C} ∑_{j=1}^{N} u_{ij}^{2}/u_{M} − ∑_{i=1}^{C} exp(−min_{k≠i} ‖v_{i} − v_{k}‖^{2}/β_{T})  (22)

where u_{M} = min_{i} ∑_{j=1}^{N} u_{ij}^{2}, β_{T} = (1/C) ∑_{i=1}^{C} ‖v_{i} − $\overline{\mathit{v}}$‖^{2}, and the vector $\overline{\mathit{v}}$ is the average of the cluster centers. The first term in (22) measures the compactness of the clusters, and the second term the separability among clusters. The optimal number of clusters, C*, is obtained when PCAES assumes the maximum value.
We complete our comparisons by comparing our method with hybrid metaheuristic algorithms.
The comparison tests are performed on well-known UC Irvine (UCI) machine learning classification datasets (http://archive.ics.uci.edu/ml/datasets.html). We measure the quality of the results in terms of accuracy, precision, recall, and F1-score [22,23].
4. Results
We show the results obtained on a set of over 40 UCI machine learning classification datasets. In all experiments, we used an Intel Core i5 3.2 GHz processor and set the fuzzifier p = 2, ε = 0.01, and Smax = 100.
For brevity, we only show in detail the results obtained on the well-known Iris flower dataset. This dataset contains 150 data points with 4 features, given by the length and the width of the sepals and petals measured in centimeters: 50 data points are classified as belonging to the Iris Setosa type, 50 to the Iris Versicolor type, and 50 to the Iris Virginica type. Only the Iris Setosa class is linearly separable from the other two, which are not linearly separable from each other. We set the maximum number of clusters, Cmax, to 10. In Figure 1, we show the values of the PEH index of the best initial cluster centers obtained for each setting of the number of clusters.
As can be seen from Figure 1, when varying the number of clusters, the maximum value of the PEH index is obtained for C = 3.
Figure 2 shows that the number of iterations increases as the PEH value of the initial clustering decreases.
Figure 3 shows the trend of the number of iterations necessary to reach the convergence by varying the number of clusters in PEHFCM. The least number of iterations (12) is obtained for C = 3.
Like the PEH index of the final clustering, the number of iterations increases as the PEH value of the initial clustering decreases. In Figure 4, we show the trend of the PEH at each iteration for C = 3.
The PEH index increases slightly at first, then rapidly after the 8th iteration, and reaches a plateau at the 12th iteration. We compare the performance of the PEH index with that of the PC, PE, FS, and XB validity indices.
Table 1 shows the optimal number of clusters found using the validity index, the number of iterations necessary for the convergence, and the running time.
The best results are obtained by executing PEHFCM, both with respect to FCM + PC and FCM + PE, for which the optimal number of clusters obtained is 2, and with respect to FCM + FS and FCM + XB, for which it is 3. In both cases, the least number of iterations and the shortest execution time are achieved using PEHFCM. In addition, we compare the results obtained by executing PEHFCM with the ones obtained via the entropy-based weighted FCM algorithm (EwFCM) [19] and the metaheuristic PSOFCM proposed in Reference [15].
Table 2 shows the running time and the accuracy, precision, recall, and F1-score obtained by executing FCM + FS, FCM + XB, FCM + PCAES, PEHFCM, EwFCM, and PSOFCM.
The results in Table 2 show that the best classification performances are given by EwFCM and PSOFCM. PEHFCM has the shortest running time and classification performances comparable with those of EwFCM and PSOFCM.
These results are confirmed by testing other UCI machine learning datasets. Here, we present the results obtained on the Wine dataset. This dataset consists of 178 data points with 13 features: each data point represents an Italian wine derived from a specific crop, and its features provide information on the wine's chemical composition. The dataset is partitioned into three classes, corresponding to three crops.
In Table 3, we show the results obtained by considering the five validity indices.
Even in this case, PEHFCM provides the best number of iterations and running time.
Table 4 shows the running time and the classification performances of all the compared algorithms.
Here too, the results obtained on the Wine dataset show that PEHFCM provides the shortest execution time and classification performances comparable to those obtained by using EwFCM and PSOFCM.
In Table 5, the accuracy values obtained for some of the datasets used in our comparison tests are shown. These results confirm that the accuracy performances provided by PEHFCM are better than those provided by FCM + FS, FCM + XB, and FCM + PCAES, and comparable to those provided by EwFCM and PSOFCM.
We summarize the results obtained on all the classification UCI machine learning datasets used in our tests by calculating:

- The mean percentage gain (or loss) of running time. If T_{C} is the running time measured by running an FCM-based method and T_{CPEH} is the one measured with PEHFCM, this index is given by the average of the percentages (T_{CPEH} − T_{C})/T_{CPEH}. This value is equal to 0 for PEHFCM.
- The mean percentage gain (or loss) of a classification index. If I_{C} is a classification index value obtained by running an FCM-based method and I_{CPEH} is the one obtained with PEHFCM, this index is given by the average of the percentages (I_{C} − I_{CPEH})/I_{CPEH}. This value is equal to 0 for PEHFCM.
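The two summary indices above can be sketched as follows; the dataset values are hypothetical and only illustrate the sign conventions.

```python
def mean_time_gain(t_other, t_peh):
    # Mean percentage gain (or loss) of running time for a compared
    # FCM-based method: average of 100 * (T_CPEH - T_C) / T_CPEH.
    return sum(100.0 * (p - o) / p for o, p in zip(t_other, t_peh)) / len(t_peh)

def mean_score_gain(i_other, i_peh):
    # Mean percentage gain (or loss) of a classification index:
    # average of 100 * (I_C - I_CPEH) / I_CPEH.
    return sum(100.0 * (o - p) / p for o, p in zip(i_other, i_peh)) / len(i_peh)

# Hypothetical values on three datasets: the compared method is slower
# (negative time gain) and slightly more accurate (positive score gain).
t_other, t_peh = [1.5, 2.5, 6.0], [1.0, 2.0, 4.0]
acc_other, acc_peh = [0.95, 0.90, 0.92], [0.94, 0.89, 0.91]
assert mean_time_gain(t_other, t_peh) < 0
assert mean_score_gain(acc_other, acc_peh) > 0
```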
If the value of a summarized index is positive, then, by executing the corresponding algorithm, we obtain a gain in terms of running time or of the classification index; conversely, a negative value indicates a loss. In Table 6, we show these results.
The results in Table 6 show that PEHFCM provides the best running time; indeed, the running times measured when executing the other FCM-based algorithms were more than 28% longer than the one obtained by executing PEHFCM. The gain in accuracy, precision, recall, and F1-score obtained by executing EwFCM and PSOFCM was less than 2%.