Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients

Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means clustering (FCM) is also a common clustering technique. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency. These improvements focus on adjusting the membership representation of elements in the clusters, or on fuzzifying and defuzzifying techniques, as well as the distance function between elements. This study proposes a novel fuzzy clustering algorithm using multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed fuzzy clustering method has similar calculation steps to FCM with some modifications. The formulas are derived to ensure convergence. The main contribution of this approach is the utilization of multiple fuzzification coefficients as opposed to only one coefficient in the original FCM algorithm. The new algorithm is then evaluated with experiments on several common datasets and the results show that the proposed algorithm is more efficient compared to the original FCM as well as other clustering methods.


Introduction
Clustering is a machine learning technique that involves the grouping of data points into different clusters, where data points in the same cluster have a higher degree of similarity and any two data points in two different clusters have a lower degree of similarity. This technique improves the effectiveness of data mining, especially in big-data problems, as the data points are grouped into clusters with distinctive characteristics [1]. There are many different clustering techniques, such as centroid-based, hierarchical, density-based, distribution-based and grid-based clustering.
In practical problems, some data elements may be missing information or contain uncertain information. Fuzzy set theory provides an appropriate method of representing and processing these types of data elements, through the concept of a membership function defined in the range [0,1]. In this concept, each element can belong to more than one cluster. The fuzzy C-means (FCM) algorithm [2], a method of fuzzy clustering, is an efficient algorithm for extracting rules and mining data from a dataset in which fuzzy properties are highly common [3]. Many new methods based on the FCM algorithm have been introduced to overcome the limitations and improve the clustering ability of this algorithm in different cases. The performance of these new methods is summarized in [4]. The new and improved FCM-based methods extend or modify the distance metric between elements and cluster centers, use different fuzzy measures for the membership of an element in a cluster, or modify the exponential parameter for fuzzifying [3,5].
An improvement direction is given by Hwang et al. in [6]: the interval type-2 fuzzy C-means clustering (FCMT2I) algorithm. That study gives a fuzzy extension of the fuzzification coefficient m, thereby applying the type-2 fuzzy set to the membership matrix U. The parameter m in the FCMT2I algorithm can take any value in the range [m_L, m_U]. A few articles [7-10] discuss this approach further; after every iteration, an element's membership in the clusters is represented by an interval type-2 fuzzy set, and further calculations are needed for type-reduction when computing the cluster centers for the next steps. According to the type-reduction algorithm [6,10], the cluster centers are calculated separately with the exponential parameters m_L and m_U, and then combined through the defuzzification step. In essence, after the type-reduction algorithm is executed, the resulting cluster center value is the same as the value calculated using an exponential parameter m in the range [m_L, m_U].
The exponential parameter m is used to adjust the membership level of the elements in the clusters, and its value can vary with clusters and steps. Thus, the FCMT2I algorithm allows the use of different exponential parameters.
This study presents an approach that utilizes multiple different fuzzification coefficients as opposed to only one coefficient in the FCM algorithm. While in past papers the determination of fuzzification coefficients corresponds to the type-reduction calculations for the cluster centers, in this study the fuzzification coefficients are determined for each element prior to the calculations, depending on the characteristics of that element within the entire dataset. In a dataset, the density around each element, represented by its distance from its "neighboring" elements, is a useful measurement [6,11,12]. This study uses that measurement to determine the exponential parameter for each element during the clustering process. The main contribution of this study is the proposal of using and manipulating multiple fuzzification coefficients to enhance the performance of the original FCM algorithm. The experimental evaluation of the efficiency of the proposed algorithm is conducted on several common datasets from the University of California, Irvine (UCI) machine learning repository.
The rest of this study is organized as follows: Section 2 presents the FCM Algorithm and improvement ideas; Section 3 describes the proposed novel MC-FCM algorithm; and Section 4 provides the experimental results for algorithm evaluation.

Standard Fuzzy C-Means (FCM) Algorithm
The FCM algorithm [2] partitions a finite collection of N elements X = {X_1, X_2, ..., X_N} into a collection of C clusters with respect to some given criteria. Each element X_i ∈ X, i = 1, 2, ..., N, is a vector with d dimensions. We define a way to divide X into C clusters with cluster centers V_1, V_2, ..., V_C in the centroid set V.
In the FCM algorithm, U is a matrix representing the membership of each element in each cluster. The matrix U has the following characteristics:

•	U(i, k) ∈ [0, 1] and Σ_{k=1}^{C} U(i, k) = 1 for every i = 1, 2, ..., N;
•	The larger U(i, k) is, the higher the degree of confidence that the element X_i belongs to cluster k.

An objective function is defined such that the clustering algorithm minimizes it:

J = Σ_{i=1}^{N} Σ_{k=1}^{C} U(i, k)^m · D(i, k)^2,   (1)

where:

•	D(i, k) = ||X_i − V_k||, so D(i, k)^2 is the squared distance between the element X_i and the cluster center V_k;
•	m is the fuzzification coefficient of the algorithm.

Summary of steps for the FCM algorithm:
Input: the dataset X = {X_1, X_2, ..., X_N}, the fuzzification coefficient m.
Output: the partition of X into C clusters.
Steps:
•	Step 1: Initialize values for V, let l = 0, set ε > 0 and m > 1.
•	Step 2: At the l-th loop, update U according to the formula:

U(i, k) = 1 / Σ_{j=1}^{C} (D(i, k)/D(i, j))^{2/(m−1)}.   (2)

•	Step 3: Update the cluster centers:

V_k = Σ_{i=1}^{N} U(i, k)^m · X_i / Σ_{i=1}^{N} U(i, k)^m.   (3)

•	Step 4: If ||V^(l+1) − V^(l)|| < ε, stop; otherwise, set l = l + 1 and return to Step 2.
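The four steps above can be sketched in a few lines of NumPy (a minimal illustration for reference, not the paper's implementation; the function name and the optional V0 initializer are my own):

```python
import numpy as np

def fcm(X, C, m=2.0, eps=1e-5, max_iter=100, V0=None):
    """Minimal FCM sketch. X: (N, d) array; C: number of clusters;
    V0: optional (C, d) array of initial centers (random elements if None)."""
    N = X.shape[0]
    # Step 1: initialize the cluster centers.
    if V0 is None:
        V0 = X[np.random.default_rng().choice(N, size=C, replace=False)]
    V = V0.astype(float)
    for _ in range(max_iter):
        # D(i, k): distance between element i and center k.
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Step 2: membership update, Eq. (2).
        U = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        # Step 3: center update, Eq. (3).
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Step 4: stop when the centers no longer move.
        if np.linalg.norm(V_new - V) < eps:
            return U, V_new
        V = V_new
    return U, V
```

The small constant added to D avoids division by zero when an element coincides with a center.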

Interval Type-2 Fuzzy C-Means (FCMT2I) Clustering
FCM is a breakthrough algorithm compared to other well-known clustering algorithms because it has a more general representation and produces more accurate clustering results in many cases. However, the FCM algorithm has some disadvantages:

•	The algorithm is sensitive to noise and outlying elements;
•	Clustering is not accurate or valid for elements located at the boundary between clusters;
•	There is no specific criterion for selecting the value of the parameter m, which is often chosen after multiple trials.
With these disadvantages, the FCM algorithm requires more research and development to be further improved.
The research in [6] extends or modifies the distance metric D(i, k), uses different fuzzy measures for U(i, k), or adjusts the exponential parameter m. The clustering decisions in FCM are based on Equation (2), which assigns the fuzzy membership of element X_i according to the relative distance between X_i and V_k. This formula has the characteristic that the smaller m is, the higher the absolute membership U, as shown in Figure 1 for the case C = 2.

For the elements on the cluster boundary, the absolute degree of U should be lower than for the elements near the center of the cluster. The reason is that we cannot make a clustering decision for elements at the immediate boundary, so the U graph needs to be more flexible in terms of cluster selection for these elements. Meanwhile, the elements near the center of the cluster should have a steeper U graph to ensure that clustering control is always in place for the nearest cluster. Since FCM uses only a single value of m, the absolute level of U is the same for the elements near the center of the cluster and the elements at the boundary, which is not reasonable. Hwang et al. [6] studied the elements on the boundary of two clusters. It was found that when choosing a large m for the degree of U to cluster center V_1 and a small m for the degree of U to cluster center V_2, the cluster boundary would be distributed with a wide left region and a narrow right region, as shown in Figure 2.

As seen from the above analysis, different exponential parameters m can be determined depending on the distance between the element and the cluster center. The interval type-2 fuzzy C-means clustering (FCMT2I) method, proposed in [6], uses two parameters m_L and m_U. Its calculations are derived from Equations (2) and (3). However, the membership of X_i in cluster k becomes a range of values; hence, the calculation of the cluster center in FCMT2I requires additional steps from the type-reduction algorithm to obtain a single final value. Therefore, even though the performance of the FCMT2I algorithm is, in general, better than that of FCM with one exponential parameter, the FCMT2I algorithm requires more calculations, because the exponential parameters depend on the distance to the cluster center and the cluster center changes after every iteration.
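The flatter-versus-steeper behavior of U as m varies can be verified numerically from Equation (2); the following toy calculation (made-up distances, C = 2) shows that a smaller m yields a higher absolute membership:

```python
def membership(d1, d2, m):
    """U(i, 1) from Eq. (2) for C = 2 clusters at distances d1, d2."""
    return 1.0 / ((d1 / d1) ** (2 / (m - 1)) + (d1 / d2) ** (2 / (m - 1)))

# An element close to center 1 (d1 = 1, d2 = 4):
near_steep = membership(1, 4, m=1.5)  # small m: steep U graph, close to 1
near_flat = membership(1, 4, m=3.0)   # large m: flat U graph, pulled toward 0.5
```

With m = 1.5 the membership is about 0.996; with m = 3.0 it drops to 0.8, illustrating why boundary elements call for a larger exponent.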
The proposed idea in this study is to assign each element X_i an exponential parameter m_i that remains constant throughout the entire clustering process. The determination of m_i depends on the characteristics of the corresponding element X_i in the dataset.

The Fuzzification Coefficients
Each element is assigned its own fuzzification coefficient, calculated based on the concentration of other elements surrounding it. If the concentration is high, meaning there is a high chance of forming a cluster, the fuzzification coefficient is small for faster convergence. If the concentration is low, the fuzzification coefficient is large, increasing the element's flexibility to select a cluster through the iterations.
With each element X_i, 1 ≤ i ≤ N, let δ_ij be the distance between the elements X_i and X_j, 1 ≤ j ≤ N. In the dataset, we can treat the total distance between X_i and the N/C elements closest to X_i as a heuristic metric for the concentration surrounding X_i, denoted δ*_i. Afterward, the fuzzification coefficient m_i can be calculated from δ*_i, as shown below.

Summary of steps for computing m_i
Input: the dataset X = {X_1, X_2, ..., X_N}, the interval for fuzzification coefficients [m_L, m_U] with m_L > 1, and a parameter α.

Output: the fuzzification coefficient m_i for each element X_i.
Steps:

•	Step 1: For each element X_i, compute the distances δ_ij to every element X_j, j = 1, 2, ..., N.
•	Step 2: Rearrange the δ_ij with index j so that δ_i1 ≤ δ_i2 ≤ ... ≤ δ_iN, in non-decreasing order, and compute δ*_i as the sum of the N/C smallest distances.
•	Step 3: Compute the fuzzification coefficient

m_i = m_L + (m_U − m_L) · ((δ*_i − min_j δ*_j)/(max_j δ*_j − min_j δ*_j))^α.   (4)

The algorithm to calculate m_i is carried out once before clustering, with a complexity of O(N^2). The parameter α adjusts the mapping of the distance between X_i and its neighboring elements to the exponential parameter in the range [m_L, m_U]. If α = 1, the mapping is linear; if α < 1, the distribution is skewed toward m_U; and if α > 1, the distribution is skewed toward m_L.
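As a sketch, this procedure can be implemented as follows (a hypothetical implementation consistent with the description above; the min-max normalization used to map δ*_i into [m_L, m_U] is one natural choice):

```python
import numpy as np

def fuzzification_coefficients(X, C, m_l, m_u, alpha):
    """Assign each element X_i its own coefficient m_i from the density
    measure delta*_i: the total distance from X_i to its N/C nearest
    neighbours, min-max normalized and mapped into [m_l, m_u] with
    exponent alpha (alpha = 1 gives a linear map)."""
    N = X.shape[0]
    n_near = max(1, N // C)
    # Pairwise distances delta_ij; this is the O(N^2) step.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Sort each row and sum the n_near smallest distances to *other* points.
    delta_star = np.sort(D, axis=1)[:, 1:n_near + 1].sum(axis=1)
    # Min-max normalize, skew with alpha, and map into [m_l, m_u].
    span = delta_star.max() - delta_star.min()
    t = (delta_star - delta_star.min()) / span if span > 0 else np.zeros(N)
    return m_l + (m_u - m_l) * t ** alpha
```

Dense elements (small δ*_i) receive coefficients near m_L, isolated ones near m_U, matching the stated intent.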

Derivation of the MC-FCM Clustering Algorithm
In this section, we derive the fuzzy C-means clustering algorithm with multiple fuzzification coefficients (MC-FCM). The objective function to be minimized in the MC-FCM algorithm is formulated as follows:

J = Σ_{i=1}^{N} Σ_{k=1}^{C} U(i, k)^{m_i} · D(i, k)^2,   (5)

where X = {X_1, X_2, ..., X_N} is the dataset, N is the number of elements, C is the number of clusters, U(i, k) is the membership value of element X_i in the cluster with center V_k, m_i is the fuzzification coefficient of X_i, and D(i, k) = ||X_i − V_k|| is the distance between the two vectors X_i and V_k.
To solve the optimization problem shown in Equation (5) under the constraint Σ_{k=1}^{C} U(i, k) = 1 for every i, we employ the Lagrange multiplier method. Setting the partial derivative of the Lagrangian with respect to U(i, k) to zero expresses U(i, k) in terms of the multiplier λ_i (8); substituting this expression into the constraint determines λ_i (9). Replacing (9) into (8), we have

U(i, k) = 1 / Σ_{j=1}^{C} (D(i, k)/D(i, j))^{2/(m_i−1)}.   (10)

Summary of steps for the MC-FCM algorithm
Input: the dataset X = {X_1, X_2, ..., X_N}, the fuzzification coefficient m_i of each X_i.
Output: the partition of X into C clusters.
Steps:
•	Step 1: Initialize values for V, let l = 0 and set ε > 0.
•	Step 2: At the l-th loop, update U according to Equation (10).
•	Step 3: Update the cluster centers: V_k = Σ_{i=1}^{N} U(i, k)^{m_i} · X_i / Σ_{i=1}^{N} U(i, k)^{m_i}.
•	Step 4: If ||V^(l+1) − V^(l)|| < ε, stop; otherwise, set l = l + 1 and return to Step 2.
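Only the membership and center updates change relative to FCM, since the exponent now varies per element. One MC-FCM iteration can be sketched as follows (the function name is my own; m is the vector of per-element coefficients):

```python
import numpy as np

def mcfcm_step(X, V, m):
    """One MC-FCM iteration. X: (N, d) data; V: (C, d) centers;
    m: (N,) per-element fuzzification coefficients.
    Returns the updated membership matrix U (Eq. (10)) and centers."""
    D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
    e = (2.0 / (m - 1.0))[:, None, None]          # per-element exponent
    U = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** e, axis=2)
    Um = U ** m[:, None]                          # U(i, k)^{m_i}
    V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, V_new
```

Iterating this step until the centers stabilize implements Steps 2-4 above.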

Evaluation of the Proposed MC-FCM Algorithm
In order to evaluate the performance of the proposed MC-FCM algorithm, we tested it on several UCI benchmark datasets [13], described in Table 1, and compared it with FCM and FCMT2I. The algorithms were built and tested in C#. To make our comparison objective, we adopted the following performance indices outlined in [14]:

•
The Davies-Bouldin (DB) index is based on a ratio involving within-group and between-group distances: DB = (1/C) Σ_{j=1}^{C} D_j, where D_j = max_{k≠j} D_{j,k} and D_{j,k} is the within-to-between cluster spread for the j-th and k-th clusters, i.e., D_{j,k} = (d_j + d_k)/d_{j,k}. Here, d_j and d_k are the average within-group distances for the j-th and k-th clusters, respectively, and d_{j,k} is the between-group distance between these clusters. These distances are defined as d_j = (1/N_j) Σ_{X_i ∈ C_j} ||X_i − X̄_j|| and d_{j,k} = ||X̄_j − X̄_k||, where X̄_j is the centroid of cluster j and N_j its size.
Thus, D_j represents the worst-case within-to-between cluster spread involving the j-th cluster. Minimizing D_j for all clusters minimizes the DB index. Hence, good partitions, composed of compact and separated clusters, are distinguished by small values of DB.
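A direct transcription of these definitions (an illustrative sketch, not the evaluation code used in the study):

```python
import numpy as np

def davies_bouldin(X, labels):
    """DB index: the mean over clusters of the worst within-to-between
    spread D_{j,k} = (d_j + d_k) / d_{j,k}."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # d_j: average distance of the members of cluster j to its centroid.
    d = np.array([np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
                  for i, c in enumerate(clusters)])
    C = len(clusters)
    worst = []
    for j in range(C):
        worst.append(max((d[j] + d[k]) / np.linalg.norm(centroids[j] - centroids[k])
                         for k in range(C) if k != j))
    return float(np.mean(worst))
```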

•
The Alternative Silhouette Width Criterion (ASWC) is the ratio between the between-group distance and the within-group distance: ASWC = (1/N) Σ_{i=1}^{N} b_{k,i}/(a_{k,i} + ε). Consider that the i-th element of the dataset, X_i, belongs to a given cluster k ∈ {1, 2, ..., C}; then a_{k,i} is the average distance of X_i to all other elements in this cluster, d_{q,i} is the average distance of X_i to all elements in another cluster q, with q ≠ k, b_{k,i} is the minimum of d_{q,i} computed over q = 1, 2, ..., C with q ≠ k, and ε is a small constant (e.g., 10^−6 for normalized data) used to avoid division by zero when a_{k,i} = 0. Large ASWC values indicate good partitions.
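These definitions also transcribe directly (an illustrative sketch):

```python
import numpy as np

def aswc(X, labels, eps=1e-6):
    """ASWC: mean of b/(a + eps) over all elements, with a the average
    intra-cluster distance and b the smallest average distance to
    another cluster."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    ratios = []
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False                      # exclude X_i itself
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        ratios.append(b / (a + eps))
    return float(np.mean(ratios))
```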

•
The PBM index is also based on the within-group and between-group distances: PBM = ((1/C) · (E_1/E_C) · D_C)^2, where E_1 = Σ_{i=1}^{N} ||X_i − X̄|| denotes the sum of distances between the elements and the grand mean of the data, E_C = Σ_{j=1}^{C} Σ_{X_i ∈ C_j} ||X_i − X̄_j|| represents the sum of within-group distances, and D_C = max_{j,k=1,...,C} ||X̄_j − X̄_k|| is the maximum distance between group centroids. The best partition is indicated when PBM is maximized.
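As a sketch (assuming the standard squared form ((1/C)(E_1/E_C)D_C)^2 of the PBM index):

```python
import numpy as np

def pbm(X, labels):
    """PBM index: ((1/C) * (E1/EC) * DC)^2, with E1 the spread around the
    grand mean, EC the total within-group spread and DC the largest
    distance between group centroids."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    C = len(clusters)
    E1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    EC = sum(np.linalg.norm(X[labels == c] - centroids[i], axis=1).sum()
             for i, c in enumerate(clusters))
    DC = max(np.linalg.norm(centroids[j] - centroids[k])
             for j in range(C) for k in range(j + 1, C))
    return ((E1 / EC) * DC / C) ** 2
```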

•
The Rand index (RI) can be seen as an absolute criterion that allows the use of properly labeled datasets for performance assessment of clustering results. This simple and intuitive index compares two hard partition matrices (R and Q) of the same dataset. The reference partition, R, encodes the class labels, while the partition Q divides the data into C clusters and is the one to be evaluated. We have RI = (f_11 + f_00)/(N(N − 1)/2), where f_11 denotes the number of pairs of data elements belonging to the same class in R and to the same cluster in Q, and f_00 denotes the number of pairs of data elements belonging to different classes in R and to different clusters in Q. Large RI values indicate clustering compatible with the given class labels.
•	Mean accuracy (MA): MA = min_{j=1,...,C} (a_j/b_j), where a_j is the number of elements in cluster j after clustering and b_j is the actual number of elements in cluster j. Large MA values often indicate good clustering.
The DB, ASWC and PBM indices show how compact and separated the clusters are, while the MA and RI indices measure the quality of the clusters for the labeled datasets, i.e., the compatibility between the clusters and the labeled groups. When performing the evaluation in this study, we focused on the MA and DB indices. Since the clustering algorithm is affected by the initialization step, each run with a given parameter set was conducted several times.
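The pair-counting definition of RI transcribes directly (an illustrative sketch):

```python
from itertools import combinations

def rand_index(R, Q):
    """Rand index: agreeing pairs (same-same f11 plus different-different
    f00) over all N(N-1)/2 pairs of elements."""
    N = len(R)
    f11 = f00 = 0
    for i, j in combinations(range(N), 2):
        same_r = R[i] == R[j]
        same_q = Q[i] == Q[j]
        if same_r and same_q:
            f11 += 1
        elif not same_r and not same_q:
            f00 += 1
    return (f11 + f00) / (N * (N - 1) / 2)
```

Because RI counts pairs, it is invariant to a permutation of the cluster labels in Q.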
We considered the following scenarios:
(i) Perform FCM with m = 2 several times and record the run with the best MA index result;
(ii) Perform FCM with varying m and record the run with the best MA index result;
(iii) Perform MC-FCM with varying m_L, m_U and α and record the run with the best MA index result;
(iv) Perform MC-FCM with the same m_L and m_U as in (iii), adjust α, and record the run with the best DB index result;
(v) Perform FCMT2I several times with the same m_L and m_U as in (iii) and record the run with the best MA index result.
The comparison results of various clustering algorithms on the five UCI-benchmark datasets are shown below.
The results of the five indices for each algorithm on the five datasets are shown in Tables 2-6, with the best result for each index shown in bold. The results show that the proposed MC-FCM algorithm gives better results for most of the indices in all five datasets. The MA index results using MC-FCM are consistently the best when compared to FCM and FCMT2I. For the DB index, MC-FCM gives significantly better results in four out of five datasets. The other indices are similar for all cases. Regarding the number of iterations in the clustering process using MC-FCM, after averaging multiple runs, the WINE and IRIS datasets require fewer iterations than FCM and a similar number of iterations compared to FCMT2I. The HEART dataset has similar numbers of iterations

Figure 1. Membership of an element to a cluster for different values of m.

Figure 2. The cluster boundary between two clusters for different values of m.



Table 1. Summarized descriptions of the experimental datasets. (The WINE dataset consists of 13 chemical constituents of 3 types of Italian wine, used to identify them.)