Article

How KLFCM Works—Convergence and Parameter Analysis for KLFCM Clustering Algorithm

Chaomurilige 1,2
1 School of Information Engineering, Minzu University of China, Beijing 100081, China
2 Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance, Ministry of Education, Minzu University of China, Beijing 100081, China
Mathematics 2023, 11(10), 2285; https://doi.org/10.3390/math11102285
Submission received: 13 April 2023 / Revised: 7 May 2023 / Accepted: 8 May 2023 / Published: 14 May 2023

Abstract:
KLFCM is a clustering algorithm obtained by introducing the K-L divergence into FCM, and it has been widely used in the field of fuzzy clustering. Although many studies have focused on improving its accuracy and efficiency, little attention has been paid to its convergence properties and parameter selection. Like other fuzzy clustering algorithms, the output of the KLFCM algorithm is affected by fuzzy parameters. Furthermore, some researchers have noted that the KLFCM algorithm is equivalent to the EM algorithm for Gaussian mixture models when the fuzzifier λ is equal to 2. In practical applications, the KLFCM algorithm may also exhibit self-annealing properties similar to those of the EM algorithm. To address these issues, this paper uses Jacobian matrix analysis to investigate the KLFCM algorithm's parameter selection and convergence properties. We first derive a formula for the Jacobian matrix of KLFCM with respect to the membership function. Then, we demonstrate the self-annealing behavior of this algorithm through theoretical analysis based on the Jacobian matrix. We also provide a reference strategy for determining appropriate values of the fuzzy parameter in the KLFCM algorithm. Finally, we use Jacobian matrix analysis to investigate the relationship between the convergence rate and different parameter values of the KLFCM algorithm. Our experimental results validate our theoretical findings, demonstrating that when appropriate values of the parameter λ are selected, the KLFCM clustering algorithm exhibits self-annealing properties that reduce the impact of the initial clustering centers on the clustering results. Moreover, using our proposed strategy for selecting the fuzzy parameter λ effectively prevents the algorithm from producing coincident clustering results.

1. Introduction

Clustering and fuzzy-based clustering have become popular techniques in data mining and machine learning due to their ability to identify patterns and group similar data points without the need for training data [1,2]. Clustering algorithms are a type of unsupervised approach in which data are partitioned into subgroups based on their similarities or distances from each other [3]. The partition matrix represents the degree of membership of each data point in each cluster [4]. After Zadeh [5] introduced fuzzy set theory, the hard c-means clustering algorithm was extended to the FCM (fuzzy c-means) clustering algorithm in [6]. Clustering algorithms can be applied to a wide range of applications, such as image segmentation, customer segmentation, and anomaly detection.
In the literature, there are numerous improvements to the FCM algorithm that aim to address various clustering problems or enhance clustering performance [7,8,9,10,11]. The KLFCM (FCM with a K-L information term) algorithm is one of the better-known methods. Honda and Ichihashi proposed a modified objective function for fuzzy c-means (FCM) clustering that includes a regularizer based on K-L information [7]. In KLFCM, a regularization term based on the K-L divergence is added to the objective function to encourage cluster centers to be spaced further apart. The K-L divergence is normally used to measure the difference between two probability distributions; here, however, it serves to measure the degree of separation between cluster centers. By including the regularization term, the algorithm encourages the distance between cluster centers to be large, which can help prevent the formation of overlapping clusters and ensure that the resulting clusters are well separated. These characteristics have inspired the development of many clustering algorithms based on its principles. Honda applied probabilistic principal component analysis (PCA) mixture models to linear clustering and proposed a constrained model, KLFCV [12]. Gharieb and Gendy [13] modified the regularization term of the original KLFCM algorithm using the Kullback-Leibler (KL) divergence, which measures the proximity between a pixel membership and the local average of this membership in the immediate neighborhood. Zhang et al. [14] combined the benefits of KLFCM and Student's t-distribution to propose a new algorithm for image segmentation. A novel image segmentation algorithm based on KLFCM was proposed in [15] to increase the ability to overcome noise and describe segmentation uncertainty. Amira et al. [16] incorporated conditional probability distributions and a probabilistic dissimilarity functional into the conventional KLFCM algorithm and proposed a new model called CKLFCM.
While numerous clustering methods based on the KLFCM algorithm have been proposed in the literature, few provide clear explanations for how and why this algorithm works. Furthermore, there is a lack of theoretical research into its convergence properties and optimal parameter selection. Similar to the FCM clustering algorithm, the degree of fuzziness in KLFCM's membership values is regulated by the fuzzifier parameter: larger values lead to fuzzier memberships [17], which can result in coincident clustering when the fuzziness approaches infinity. In our research, "coincident clustering result" refers to a specific type of coincident clustering in which all cluster centers coincide with the dataset's mass center and merge into a single center, resulting in a loss of clustering information and decreased accuracy in data partitioning. Hence, selecting a proper fuzzifier value is crucial for obtaining accurate clustering results. Nevertheless, the use of the K-L divergence as a penalty term helps prevent the overlapping of cluster centers by spreading them throughout the data space. Consequently, the algorithm should, in theory, be able to avoid the situation in which all cluster centers collapse onto the dataset's mass center. We have addressed the parameter selection of clustering algorithms using Jacobian matrix analysis in previous papers [18,19,20,21]. In those papers, we revealed the relationship between the stable fixed points of a clustering algorithm and the dataset using Jacobian matrix analysis. In [18], we provided an explanation of the self-annealing behavior observed in the EM algorithm for Gaussian mixtures, along with the initialization lower bound of the temperature parameter in the DA-EM algorithm. In addition, Ref. [21] demonstrated that coincident clustering results are not stable fixed points of the GG clustering algorithm and discussed the correlation between the clustering algorithm's convergence rate and the fuzziness index. In this paper, we further analyze the parameter selection and convergence properties of the KLFCM clustering algorithm through Jacobian matrix analysis, building on our previous work.
The primary contributions of this paper can be summarized as follows:
  • Firstly, we constructed the Jacobian matrix of the KLFCM algorithm with respect to the membership function. Then, we provided theoretical proof for the self-annealing property of the KLFCM algorithm.
  • We discussed reference methods for selecting fuzzy parameters in practical applications of the KLFCM algorithm. Specifically, we discussed how to choose appropriate values of the parameter λ to ensure that poor clustering results are avoided.
  • Additionally, similar to the Hessian matrix, the Jacobian matrix can be utilized to estimate the convergence rate of an algorithm. Since computing the Jacobian matrix is simpler than computing the Hessian matrix, the third contribution of this paper is to estimate the convergence rate of the KLFCM algorithm under different parameter conditions using the Jacobian matrix.
  • Finally, we conducted experiments to verify the accuracy and effectiveness of the theoretical derivation.
The experimental results indicate that the fuzzy parameter lambda has a significant impact on the clustering outcome of the algorithm, and inappropriate parameter selection can result in poor clustering performance. The research also demonstrates that the coincident clustering solution is not a stable fixed point of the KLFCM algorithm. Therefore, under certain parameter conditions (i.e., where the chosen λ results in the spectral radius of the Jacobian matrix at the coincident clustering center being greater than 1), even if the initial clustering center selection is suboptimal, the algorithm may still produce good clustering results. Meanwhile, we used the spectral radius of the Jacobian matrix to estimate the convergence rate of the KLFCM algorithm under different parameter conditions in the experiment and further explained the relationship between the parameters and convergence rate.
In this research, we provide an introduction to the KLFCM clustering algorithm with a brief overview in Section 2. We then analyze the Jacobian matrix and discuss the theoretical behavior of the KLFCM algorithm in Section 3. To validate our theoretical findings, we present various experimental results in Section 4. Additionally, we include a discussion on the experimental outcomes in Section 5. Finally, we summarize our research in Section 6.

2. The KLFCM Clustering Algorithm

This section provides a concise overview of the KLFCM clustering algorithm.
Firstly, we focus on the original FCM clustering algorithm. Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^s$ be a dataset from an s-dimensional Euclidean space. The aim of clustering is to find structure in the data and to partition the n data points into c clusters. The assignment of all items to the clusters is determined by their membership values, indicating the degree to which each item belongs to each cluster. The membership matrix $U = [u_{ik}]_{c\times n}$ represents these values, where $u_{ik}$ denotes the membership value of the kth data sample in the ith cluster. All membership values must adhere to the following constraints:
$$u_{ik} \ge 0,\ \forall i,k \qquad \text{and} \qquad \sum_{i=1}^{c} u_{ik} = 1$$
We denote the set of fuzzy partition matrices as
$$M_{fc} = \left\{ U = [u_{ik}]_{c\times n} \;\middle|\; \forall i,k,\ u_{ik} \ge 0,\ n > \sum_{k=1}^{n} u_{ik} > 0,\ \sum_{i=1}^{c} u_{ik} = 1 \right\}.$$
The objective function of the FCM algorithm [6] is formulated as follows:
$$J_m(U,V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m}\, d_{ik}^{2}$$
where $U \in M_{fc}$ and $d_{ik} = \|x_k - v_i\| = \left((x_k - v_i)^T (x_k - v_i)\right)^{1/2}$ is the Euclidean distance from the kth object $x_k$ to the ith cluster center $v_i$. m is the weighting exponent which determines the degree of fuzziness, with $1 < m < +\infty$. The necessary conditions for optimality of (1) are derived as follows: for $i = 1, \ldots, c$ and $k = 1, \ldots, n$,
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}$$
$$u_{ik} = \frac{\|x_k - v_i\|^{-\frac{2}{m-1}}}{\sum_{j=1}^{c} \|x_k - v_j\|^{-\frac{2}{m-1}}}$$
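To make the two update formulas above concrete, the following is a minimal NumPy sketch of one FCM iteration; the function name fcm_step and the small numerical floor on the distances are our own choices and are not part of the original algorithm description.

```python
import numpy as np

def fcm_step(X, U, m=2.0):
    # One FCM iteration: update centers from the current memberships,
    # then recompute memberships from the new centers.
    Um = U ** m                                       # u_ik^m, shape (c, n)
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)      # cluster centers v_i, shape (c, s)
    diff = X[None, :, :] - V[:, None, :]              # (c, n, s)
    d2 = np.maximum((diff ** 2).sum(axis=2), 1e-12)   # squared Euclidean distances d_ik^2
    U_new = d2 ** (-1.0 / (m - 1.0))                  # ||x_k - v_i||^(-2/(m-1))
    U_new /= U_new.sum(axis=0, keepdims=True)         # each column of U sums to one
    return U_new, V
```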
Miyamoto et al. [22] proposed the introduction of an entropy term and a positive parameter λ , resulting in the minimization of a new objective function J λ ( U , V ) instead of J m ( U , V ) . This approach is commonly known as entropy regularization.
$$J_\lambda(U,V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}\, d^{2}(x_k, v_i) + \lambda \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log u_{ik}$$
The objective function of the FCM clustering method with regularization by K-L information (KLFCM) is obtained by substituting the entropy term in Equation (4) with K-L information. The objective function is given by the following equation:
$$J_{klfcm}(U,V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}\, d_{ik} + \lambda \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \log \frac{u_{ik}}{\alpha_i} + \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \log |F_i|$$
where $\alpha_i$ represents the proportion of samples belonging to the ith cluster. In the KLFCM algorithm, the Mahalanobis distance is utilized to quantify the dissimilarity between each data point and the cluster centers during the clustering process. The Mahalanobis distance takes into account the covariance structure of the data, which makes it a more precise distance measure than the Euclidean distance. By considering the distribution of the data and the correlation between variables, it can provide better estimates of similarity or dissimilarity between data points. In the update equations derived below, the Mahalanobis distance and the covariance determinant are combined into the following dissimilarity measure:
$$d_{ik} = \det\!\left(F_i^{-1}\right) \exp\!\left(-(x_k - v_i)^{T} F_i^{-1} (x_k - v_i)\right)$$
The objective function of KLFCM (5) is minimized under the constraints $\alpha_i > 0$, $\sum_{i=1}^{c} \alpha_i = 1$, and $\sum_{i=1}^{c} u_{ik} = 1$. The updating rules of the KLFCM clustering algorithm are then as follows:
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}\, x_k}{\sum_{k=1}^{n} u_{ik}}$$
$$u_{ik} = \frac{\alpha_i\, d_{ik}^{\frac{1}{\lambda}}}{\sum_{j=1}^{c} \alpha_j\, d_{jk}^{\frac{1}{\lambda}}}$$
$$\alpha_i = \frac{1}{n} \sum_{k=1}^{n} u_{ik}$$
$$F_i = \frac{\sum_{k=1}^{n} u_{ik}\, (x_k - v_i)(x_k - v_i)^{T}}{\sum_{k=1}^{n} u_{ik}}$$
where $d_{ik} = \det\!\left(F_i^{-1}\right) \exp\!\left(-(x_k - v_i)^{T} F_i^{-1} (x_k - v_i)\right)$. The KLFCM clustering algorithm is equivalent to the Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs) only when the value of λ is equal to 2. This relationship between the two algorithms is well established in the literature.
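To make this equivalence concrete, substituting λ = 2 into Equation (7) and expanding $d_{ik}$ gives the familiar Gaussian posterior (this short restatement is ours, not part of the original derivation):

$$u_{ik}\big|_{\lambda=2} = \frac{\alpha_i\, |F_i|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}(x_k - v_i)^{T} F_i^{-1} (x_k - v_i)\right)}{\sum_{j=1}^{c} \alpha_j\, |F_j|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}(x_k - v_j)^{T} F_j^{-1} (x_k - v_j)\right)} = \frac{\alpha_i\, \mathcal{N}(x_k \mid v_i, F_i)}{\sum_{j=1}^{c} \alpha_j\, \mathcal{N}(x_k \mid v_j, F_j)},$$

since the common factor $(2\pi)^{-s/2}$ cancels. This is exactly the E-step posterior of a Gaussian mixture model, while Equations (6), (8) and (9) coincide with the corresponding M-step updates.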
The KLFCM Algorithm is summarized in Algorithm 1:
Algorithm 1: KLFCM algorithm.
  • Step 1: Fix the number of clusters c, with $2 \le c \le n$, and select the parameter λ. Choose an initial membership matrix $u^{(0)} = \left(u_1^{(0)}, \ldots, u_c^{(0)}\right) \in M_{fc}$. The algorithm starts at t = 1.
  • Step 2: Calculate the cluster centers v ( t ) using u ( t 1 ) through the utilization of Equation (6). The notation v ( t ) denotes the cluster centers obtained in the tth iteration, while u ( t 1 ) represents the membership matrix from the previous t−1th iteration of the clustering algorithm.
  • Step 3: Calculate the cluster covariance matrix F ( t ) and the matrix α ( t ) by applying Equation (9) for F ( t ) and Equation (8) for α ( t ) in the iterative process of the KLFCM algorithm.
  • Step 4: Using Equation (7), revise the membership matrix u ( t ) by incorporating the current cluster centers v ( t ) in the iterative procedure of the KLFCM algorithm.
  • Step 5: If $\left\|u^{(t)} - u^{(t-1)}\right\| < \varepsilon$, then stop; otherwise, set t = t + 1 and return to Step 2.
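Algorithm 1 can be sketched in a few lines of NumPy. The code below is only an illustration of the update Equations (6)–(9), not the implementation used for the experiments in Section 4 (which were run in MATLAB); the function name, the log-space normalization, and the stopping test are our own choices, and nonsingular covariance matrices are assumed.

```python
import numpy as np

def klfcm(X, lam, U0, max_iter=100, eps=1e-6):
    # X: (n, s) data matrix, U0: (c, n) initial membership matrix, lam: fuzzifier lambda
    n, s = X.shape
    c = U0.shape[0]
    U = U0.copy()
    for _ in range(max_iter):
        U_old = U.copy()
        alpha = U.mean(axis=1)                            # Eq. (8): mixing proportions alpha_i
        V = (U @ X) / U.sum(axis=1, keepdims=True)        # Eq. (6): cluster centers v_i
        log_d = np.zeros((c, n))
        for i in range(c):
            diff = X - V[i]                               # (n, s)
            Fi = (U[i][:, None] * diff).T @ diff / U[i].sum()   # Eq. (9): covariance F_i
            Fi_inv = np.linalg.inv(Fi)
            maha = np.einsum('ns,st,nt->n', diff, Fi_inv, diff)
            log_d[i] = -np.linalg.slogdet(Fi)[1] - maha   # log d_ik = -log|F_i| - Mahalanobis
        # Eq. (7): u_ik proportional to alpha_i * d_ik^(1/lambda), computed in log space
        log_u = np.log(alpha)[:, None] + log_d / lam
        log_u -= log_u.max(axis=0, keepdims=True)         # shift for numerical stability
        U = np.exp(log_u)
        U /= U.sum(axis=0, keepdims=True)
        if np.abs(U - U_old).max() < eps:
            break
    return U, V
```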
For a better understanding of the impact of the fuzzy parameter λ on the output of the KLFCM algorithm, we conducted experiments on the Iris dataset and observed the clustering results obtained with different fuzzy parameters. In the clustering results, we use * (green asterisks), Δ (red triangles), and ⋆ (blue pentagrams) to represent sample points that belong to different clusters. The sample cluster centers are represented by black circles ●. When using the KLFCM clustering algorithm, it is generally necessary to initialize the cluster centers and the membership matrix. To accomplish this, we utilize the K-means clustering algorithm: we use K-means to divide the sample data into c clusters and use each cluster's centroid as an initial cluster center for the KLFCM algorithm. Additionally, based on the K-means clustering results, we calculate the distance between each sample point and the various cluster centers, which allows us to establish the initial membership matrix of the KLFCM algorithm.
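One simple way to realize this initialization is sketched below. The inverse-distance weighting used to turn the K-means distances into an initial membership matrix is our own assumption, since the exact formula is not specified above.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_init(X, c, seed=0):
    # Run K-means and derive initial cluster centers and memberships for KLFCM.
    km = KMeans(n_clusters=c, n_init=10, random_state=seed).fit(X)
    V0 = km.cluster_centers_                                      # initial cluster centers
    d2 = ((X[:, None, :] - V0[None, :, :]) ** 2).sum(axis=2).T    # squared distances, (c, n)
    U0 = 1.0 / (d2 + 1e-12)                                       # inverse-distance weights
    U0 /= U0.sum(axis=0, keepdims=True)                           # each column sums to one
    return U0, V0

# Example usage: U0, V0 = kmeans_init(X, c=3); U, V = klfcm(X, lam=2.0, U0=U0)
```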
The simulation shown in Figure 1 highlights the importance of choosing an appropriate value for the parameter λ in the KLFCM clustering algorithm. The results demonstrate that different values of λ can lead to significantly different clustering outcomes, and a poor choice of λ can result in an invalid or uninformative clustering solution. When λ is set to 2 or 4, reasonable clustering outcomes are obtained, with error counts of 5 and 44, respectively. This indicates that the algorithm was able to produce meaningful clusters with acceptable levels of misclassification. However, when λ is improperly initialized to 8 or 36, as illustrated in Figure 1e,f, the clustering algorithm fails to produce informative results. Specifically, the algorithm outputs a single cluster, which indicates that the clustering solution is invalid and uninformative.
Following that, we manually designate the initial cluster centers of the KLFCM algorithm under identical conditions for the fuzzy parameter. The initial cluster centers are closely situated to each other, but they do not completely overlap. We then apply the KLFCM algorithm to cluster the Iris dataset and display the clustering results in Figure 2. We use ○ (magenta circles) to represent the initial cluster centers.
Even though the initial cluster centers are close to each other (though not completely overlapping), the algorithm can still avoid converging towards overlapping clustering centers as long as the fuzzy parameter λ is appropriately chosen. That means the KLFCM clustering algorithm possesses the capability of evading these kinds of erroneous clustering outcomes, which highlights its potential self-annealing properties. When the fuzzy parameter λ is set to 2, the KLFCM clustering algorithm delivers satisfactory clustering results despite the less-than-ideal initial clustering centers. However, when the fuzzy parameter λ is set to 5, the KLFCM clustering algorithm fails to avoid producing clustering results in which all samples are assigned to a single class. The self-annealing property refers to the ability of an algorithm to adapt and improve its performance without explicit external intervention. In the context of the KLFCM clustering algorithm, it means that the algorithm has the ability to adjust its estimates during the iterative process to achieve better clustering results. The algorithm appears to "self-anneal" toward a more meaningful clustering outcome, even when the initial selection may be inappropriate.
In the upcoming section, we will perform a theoretical analysis of the KLFCM clustering algorithm using Jacobian matrix analysis.

3. Convergence and Parameter Analysis Based on Jacobian Matrix

It is a well-known fact that when partitioning a dataset into clusters, the clusters should have distinct centers. Otherwise, if every sample has the same degree of membership in every cluster, we cannot meaningfully divide the dataset into subsets based on the membership matrix. Similarly, we would expect the KLFCM algorithm to circumvent this potential drawback; otherwise, it cannot be considered successful as a clustering algorithm.
As we mentioned in Section 2, the KLFCM cluster centers and the membership values of the data points are updated through the following iterations:
$$v_i^{(t)} = \frac{\sum_{k=1}^{n} u_{ik}^{(t-1)}\, x_k}{\sum_{k=1}^{n} u_{ik}^{(t-1)}}$$
$$\alpha_i^{(t)} = \frac{1}{n} \sum_{k=1}^{n} u_{ik}^{(t-1)}$$
$$F_i^{(t)} = \frac{\sum_{k=1}^{n} u_{ik}^{(t-1)}\, \left(x_k - v_i^{(t)}\right)\left(x_k - v_i^{(t)}\right)^{T}}{\sum_{k=1}^{n} u_{ik}^{(t-1)}}$$
$$u_{ik}^{(t)} = \frac{\alpha_i^{(t)} \left(d_{ik}^{(t)}\right)^{\frac{1}{\lambda}}}{\sum_{j=1}^{c} \alpha_j^{(t)} \left(d_{jk}^{(t)}\right)^{\frac{1}{\lambda}}}$$
where $d_{ik}^{(t)} = \det\!\left(\left(F_i^{(t)}\right)^{-1}\right) \exp\!\left(-\left(x_k - v_i^{(t)}\right)^{T} \left(F_i^{(t)}\right)^{-1} \left(x_k - v_i^{(t)}\right)\right)$. Here, $v_i^{(t)}$ and $u_{ik}^{(t)}$ are the ith cluster center and the membership value of the kth sample in the ith cluster obtained in the tth iteration, and so on.
We consider the KLFCM clustering algorithm as a map $U^{(t)} = \theta(U^{(t-1)}) = H \circ G(U^{(t-1)})$, where the mapping functions $G: U = [u_{ik}]_{c\times n} \in M_{fc} \to V = (v_1, v_2, \ldots, v_c)^T \in \mathbb{R}^{c\times s}$ and $H: V = (v_1, v_2, \ldots, v_c)^T \in \mathbb{R}^{c\times s} \to U = [u_{ik}]_{c\times n} \in M_{fc}$ satisfy $V^{(t)} = G(U^{(t-1)})$ and $U^{(t)} = H(V^{(t)})$. Then $U^{(t)}$, $t = 1, 2, \ldots$, is called the iteration sequence of the KLFCM algorithm. If the iteration sequence converges to a point $U^* \in \Omega$, this point should be a fixed point of the algorithm, which satisfies $U^* = \theta(U^*)$. Set the convergence domain of the KLFCM clustering algorithm as
$$\Omega = \left\{ U^* \in M_{fc} \;\middle|\; J_{klfcm}\left(U^*, G(U^*)\right) \le J_{klfcm}\left(U, G(U^*)\right),\ \forall U \in M_{fc},\ U \ne U^*,\ \text{and}\ J_{klfcm}\left(U^*, G(U^*)\right) < J_{klfcm}\left(U^*, G(U)\right),\ \forall G(U) \in \mathbb{R}^{c\times s},\ G(U) \ne G(U^*) \right\}$$
Clearly, if the iteration process starts from a point $U^{(0)} \in M_{fc}$, then it will either terminate at a point in the convergence domain or contain a subsequence that converges to a point in $\Omega$.
If the initial membership matrix is $U = U^* = [c^{-1}]_{c\times n} \in M_{fc}$, then all KLFCM clustering centers are equal to the mass center of the dataset, $v_i = \bar{x} = \sum_{k=1}^{n} x_k / n$ for every i. In the next iteration, we again obtain $\bar{x}$ and $U = [c^{-1}]_{c\times n}$. That is, $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is a fixed point of the KLFCM algorithm. If the KLFCM algorithm converges to this point, it fails to produce meaningful clusters. Moreover, if $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ were a stable fixed point of the KLFCM clustering algorithm, the algorithm could not escape from it. Of course, this kind of situation should be avoided.
The KLFCM clustering result may be heavily influenced by the value of the parameter λ, as shown in Figure 1. However, the KLFCM clustering algorithm can avoid outputting the coincident clustering result $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$, which means that this point is not a stable fixed point of the algorithm. Next, we address the convergence and parameter analysis of the KLFCM clustering algorithm using the Jacobian matrix. Our theoretical analysis is based on Olver's corollary [23]. According to Olver ([23], p. 143), for the Jacobian matrix $g'(\mu^*) = \frac{dg(\mu)}{d\mu}\big|_{\mu = \mu^*}$, if the spectral radius (i.e., the maximum of the absolute eigenvalues of the matrix) $r\left(g'(\mu^*)\right)$ is less than one, then the fixed point $\mu^*$ is asymptotically stable. That is, for KLFCM, if the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the point $U = [c^{-1}]_{c\times n}$ is not less than 1, then $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is not a stable fixed point of the clustering algorithm.
Next, we construct the formula for the elements of the Jacobian matrix. The element $\frac{\partial \theta_{ik}}{\partial u_{jr}}$, $i = 1, \ldots, c$, $k = 1, \ldots, n$, $j = 1, \ldots, c-1$, $r = 1, \ldots, n$, of the Jacobian matrix is obtained by taking the derivative of $\theta_{ik}$ with respect to $u_{jr}$.
Theorem 1.
For $i = 1, \ldots, c$, $k = 1, \ldots, n$ and $j = 1, \ldots, c-1$, $r = 1, \ldots, n$, each element of the Jacobian matrix is
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} = -\frac{u_{ik} u_{jk}}{\lambda n \alpha_j} H_{jkr} + \frac{u_{ik} u_{ck}}{\lambda n \alpha_c} H_{ckr} + \frac{\delta_{ij} u_{ik}}{\lambda n \alpha_i} H_{ikr}$$
where
$$H_{jkr} = \lambda + s - (x_r - v_j)^T F_j^{-1} (x_r - v_j) + \left( (x_k - v_j)^T F_j^{-1} (x_r - v_j) \right)^2 - (x_k - v_j)^T F_j^{-1} (x_k - v_j) + 2 (x_r - v_j)^T F_j^{-1} (x_k - v_j)$$
and $\delta_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \ne j \end{cases}$ is the Kronecker delta function.
Proof. 
Each element of the Jacobian matrix is obtained as follows:
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} = \frac{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right) \frac{\partial \left( \alpha_i d_{ik}^{\frac{1}{\lambda}} \right)}{\partial u_{jr}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} - \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} \frac{\partial}{\partial u_{jr}} \left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2}$$
Recall that the membership matrix of the KLFCM clustering algorithm satisfies $\sum_{i=1}^{c} u_{ik} = 1$, so the cth row is determined by the others, $u_{ck} = 1 - \sum_{j=1}^{c-1} u_{jk}$. Thus, we have:
$$\begin{aligned}
\frac{\partial \theta_{ik}}{\partial u_{jr}} &= \frac{\frac{\partial \left( \alpha_i d_{ik}^{\frac{1}{\lambda}} \right)}{\partial u_{jr}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} - \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} \left( \sum_{l=1}^{c-1} \frac{\partial \left( \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)}{\partial u_{jr}} + \frac{\partial \left( \alpha_c d_{ck}^{\frac{1}{\lambda}} \right)}{\partial u_{jr}} \right)}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} \\
&= \frac{\delta_{ij} \frac{\partial \alpha_i}{\partial u_{jr}} d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} + \frac{\delta_{ij} \alpha_i \frac{\partial d_{ik}^{\frac{1}{\lambda}}}{\partial u_{jr}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} - \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{n \left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} - \frac{\alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} \frac{\partial d_{jk}^{\frac{1}{\lambda}}}{\partial u_{jr}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} + \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{n \left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} + \frac{\alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} \frac{\partial d_{ck}^{\frac{1}{\lambda}}}{\partial u_{cr}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2}
\end{aligned}$$
For each term in Equation (16), we can obtain the following results by direct computation:
$$\begin{aligned}
\frac{\delta_{ij} \frac{\partial \alpha_i}{\partial u_{jr}} d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} + \frac{\delta_{ij} \alpha_i \frac{\partial d_{ik}^{\frac{1}{\lambda}}}{\partial u_{jr}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} ={}& \frac{\delta_{ij} d_{ik}^{\frac{1}{\lambda}}}{n \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} - \frac{\delta_{ij} \alpha_i \frac{1}{\lambda} d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} \cdot \frac{1}{\det F_i} \frac{\partial \det F_i}{\partial u_{jr}} - \frac{\frac{1}{\lambda} \delta_{ij} \alpha_i d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} (x_k - v_i)^T \frac{\partial F_i^{-1}}{\partial u_{jr}} (x_k - v_i) \\
& + \frac{2 \delta_{ij} \alpha_i \frac{1}{\lambda} d_{ik}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right) \sum_{k=1}^{n} u_{ik}} (x_r - v_i)^T F_i^{-1} (x_k - v_i)
\end{aligned}$$
$$\begin{aligned}
\frac{\alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} \frac{\partial d_{jk}^{\frac{1}{\lambda}}}{\partial u_{jr}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} ={}& \frac{2 \frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2 \sum_{k=1}^{n} u_{jk}} (x_r - v_j)^T F_j^{-1} (x_k - v_j) - \frac{\frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} (x_k - v_j)^T \frac{\partial F_j^{-1}}{\partial u_{jr}} (x_k - v_j) \\
& - \frac{\frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} \cdot \frac{1}{\det F_j} \frac{\partial \det F_j}{\partial u_{jr}}
\end{aligned}$$
$$\begin{aligned}
\frac{\alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} \frac{\partial d_{ck}^{\frac{1}{\lambda}}}{\partial u_{cr}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} ={}& \frac{2 \frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2 \sum_{k=1}^{n} u_{ck}} (x_r - v_c)^T F_c^{-1} (x_k - v_c) - \frac{\frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} (x_k - v_c)^T \frac{\partial F_c^{-1}}{\partial u_{cr}} (x_k - v_c) \\
& - \frac{\frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} \cdot \frac{1}{\det F_c} \frac{\partial \det F_c}{\partial u_{cr}}
\end{aligned}$$
We have that $F_i = \frac{\sum_{k=1}^{n} u_{ik} (x_k - v_i)(x_k - v_i)^T}{\sum_{k=1}^{n} u_{ik}}$, so
$$\begin{aligned}
\frac{\partial F_i^{-1}}{\partial u_{ir}} &= -F_i^{-1} \frac{\partial F_i}{\partial u_{ir}} F_i^{-1} = -\frac{F_i^{-1} (x_r - v_i)(x_r - v_i)^T F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} + \frac{F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} \\
\frac{\partial \det F_i}{\partial u_{ir}} &= \frac{\det F_i \left( (x_r - v_i)^T F_i^{-1} (x_r - v_i) - s \right)}{\sum_{k=1}^{n} u_{ik}}
\end{aligned}$$
Finally, we substitute the above equations into Equation (16). Then the element of the KLFCM Jacobian matrix can be rewritten as
$$\begin{aligned}
\frac{\partial \theta_{ik}}{\partial u_{jr}} ={}& \frac{\delta_{ij} d_{ik}^{\frac{1}{\lambda}}}{n \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} - \frac{\delta_{ij} \alpha_i \frac{1}{\lambda} d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} \cdot \frac{(x_r - v_i)^T F_i^{-1} (x_r - v_i) - s}{\sum_{k=1}^{n} u_{ik}} \\
& - \frac{\delta_{ij} \alpha_i \frac{1}{\lambda} d_{ik}^{\frac{1}{\lambda}}}{\sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}}} (x_k - v_i)^T \left( \frac{F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} - \frac{F_i^{-1} (x_r - v_i)(x_r - v_i)^T F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} \right) (x_k - v_i) + \frac{2 \delta_{ij} \alpha_i \frac{1}{\lambda} d_{ik}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right) \sum_{k=1}^{n} u_{ik}} (x_r - v_i)^T F_i^{-1} (x_k - v_i) \\
& - \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{n \left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} - \frac{2 \frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2 \sum_{k=1}^{n} u_{jk}} (x_r - v_j)^T F_j^{-1} (x_k - v_j) \\
& + \frac{\frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} (x_k - v_j)^T \left( \frac{F_j^{-1}}{\sum_{k=1}^{n} u_{jk}} - \frac{F_j^{-1} (x_r - v_j)(x_r - v_j)^T F_j^{-1}}{\sum_{k=1}^{n} u_{jk}} \right) (x_k - v_j) + \frac{\frac{1}{\lambda} \alpha_i \alpha_j d_{ik}^{\frac{1}{\lambda}} d_{jk}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} \cdot \frac{(x_r - v_j)^T F_j^{-1} (x_r - v_j) - s}{\sum_{k=1}^{n} u_{jk}} \\
& + \frac{\alpha_i d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{n \left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} + \frac{2 \frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2 \sum_{k=1}^{n} u_{ck}} (x_r - v_c)^T F_c^{-1} (x_k - v_c) \\
& - \frac{\frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} (x_k - v_c)^T \left( \frac{F_c^{-1}}{\sum_{k=1}^{n} u_{ck}} - \frac{F_c^{-1} (x_r - v_c)(x_r - v_c)^T F_c^{-1}}{\sum_{k=1}^{n} u_{ck}} \right) (x_k - v_c) - \frac{\frac{1}{\lambda} \alpha_i \alpha_c d_{ik}^{\frac{1}{\lambda}} d_{ck}^{\frac{1}{\lambda}}}{\left( \sum_{l=1}^{c} \alpha_l d_{lk}^{\frac{1}{\lambda}} \right)^2} \cdot \frac{(x_r - v_c)^T F_c^{-1} (x_r - v_c) - s}{\sum_{k=1}^{n} u_{ck}}
\end{aligned}$$
We further simplify the formula and get
$$\begin{aligned}
\frac{\partial \theta_{ik}}{\partial u_{jr}} ={}& \frac{\delta_{ij} u_{ik}}{n \alpha_i} - \frac{\delta_{ij} u_{ik}}{\lambda} \cdot \frac{(x_r - v_i)^T F_i^{-1} (x_r - v_i) - s}{\sum_{k=1}^{n} u_{ik}} - \frac{\delta_{ij} u_{ik}}{\lambda} (x_k - v_i)^T \left( \frac{F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} - \frac{F_i^{-1} (x_r - v_i)(x_r - v_i)^T F_i^{-1}}{\sum_{k=1}^{n} u_{ik}} \right) (x_k - v_i) \\
& + \frac{2 \delta_{ij} u_{ik}}{\lambda \sum_{k=1}^{n} u_{ik}} (x_r - v_i)^T F_i^{-1} (x_k - v_i) - \frac{u_{ik} u_{jk}}{n \alpha_j} - \frac{2 u_{ik} u_{jk}}{\lambda \sum_{k=1}^{n} u_{jk}} (x_r - v_j)^T F_j^{-1} (x_k - v_j) \\
& + \frac{u_{ik} u_{jk}}{\lambda} (x_k - v_j)^T \left( \frac{F_j^{-1}}{\sum_{k=1}^{n} u_{jk}} - \frac{F_j^{-1} (x_r - v_j)(x_r - v_j)^T F_j^{-1}}{\sum_{k=1}^{n} u_{jk}} \right) (x_k - v_j) + \frac{u_{ik} u_{jk}}{\lambda} \cdot \frac{(x_r - v_j)^T F_j^{-1} (x_r - v_j) - s}{\sum_{k=1}^{n} u_{jk}} \\
& + \frac{u_{ik} u_{ck}}{n \alpha_c} + \frac{2 u_{ik} u_{ck}}{\lambda \sum_{k=1}^{n} u_{ck}} (x_r - v_c)^T F_c^{-1} (x_k - v_c) - \frac{u_{ik} u_{ck}}{\lambda} (x_k - v_c)^T \left( \frac{F_c^{-1}}{\sum_{k=1}^{n} u_{ck}} - \frac{F_c^{-1} (x_r - v_c)(x_r - v_c)^T F_c^{-1}}{\sum_{k=1}^{n} u_{ck}} \right) (x_k - v_c) \\
& - \frac{u_{ik} u_{ck}}{\lambda} \cdot \frac{(x_r - v_c)^T F_c^{-1} (x_r - v_c) - s}{\sum_{k=1}^{n} u_{ck}} \\
={}& \frac{\delta_{ij} u_{ik}}{\lambda n \alpha_i} \Big( \lambda + s - (x_r - v_i)^T F_i^{-1} (x_r - v_i) + \big( (x_k - v_i)^T F_i^{-1} (x_r - v_i) \big)^2 - (x_k - v_i)^T F_i^{-1} (x_k - v_i) + 2 (x_r - v_i)^T F_i^{-1} (x_k - v_i) \Big) \\
& - \frac{u_{ik} u_{jk}}{\lambda n \alpha_j} \Big( \lambda + s - (x_r - v_j)^T F_j^{-1} (x_r - v_j) + \big( (x_k - v_j)^T F_j^{-1} (x_r - v_j) \big)^2 - (x_k - v_j)^T F_j^{-1} (x_k - v_j) + 2 (x_r - v_j)^T F_j^{-1} (x_k - v_j) \Big) \\
& + \frac{u_{ik} u_{ck}}{\lambda n \alpha_c} \Big( \lambda + s - (x_r - v_c)^T F_c^{-1} (x_r - v_c) + \big( (x_k - v_c)^T F_c^{-1} (x_r - v_c) \big)^2 - (x_k - v_c)^T F_c^{-1} (x_k - v_c) + 2 (x_r - v_c)^T F_c^{-1} (x_k - v_c) \Big)
\end{aligned}$$
Set
$$H_{jkr} = \lambda + s - (x_r - v_j)^T F_j^{-1} (x_r - v_j) + \left( (x_k - v_j)^T F_j^{-1} (x_r - v_j) \right)^2 - (x_k - v_j)^T F_j^{-1} (x_k - v_j) + 2 (x_r - v_j)^T F_j^{-1} (x_k - v_j).$$
Then each element of the Jacobian matrix is:
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} = -\frac{u_{ik} u_{jk}}{\lambda n \alpha_j} H_{jkr} + \frac{u_{ik} u_{ck}}{\lambda n \alpha_c} H_{ckr} + \frac{\delta_{ij} u_{ik}}{\lambda n \alpha_i} H_{ikr}$$
The proof is completed. □
Now, we have a general form for the Jacobian matrix. To discuss the theoretical behavior of the KLFCM clustering algorithm, we should consider the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the special point $U = [c^{-1}]_{c\times n}$. We define the following notation: for any matrix $M = (m_1, \ldots, m_q) \in \mathbb{R}^{p\times q}$, $\mathrm{vec}(M) = \left(m_1^T, \ldots, m_q^T\right)^T$.
Theorem 2.
Each element of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the special point $U = [c^{-1}]_{c\times n}$ is
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} \bigg|_{\forall i,k,\, u_{ik} = c^{-1}} = \frac{\delta_{ij}}{n \lambda} \left( A^T A \right)_{kr}$$
where
$$A = \begin{pmatrix} \sqrt{\lambda} & \cdots & \sqrt{\lambda} \\ \sqrt{2}\, \sigma_X^{-\frac{1}{2}} (x_1 - \bar{x}) & \cdots & \sqrt{2}\, \sigma_X^{-\frac{1}{2}} (x_n - \bar{x}) \\ \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_1 - \bar{x})(x_1 - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big) & \cdots & \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_n - \bar{x})(x_n - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big) \end{pmatrix}$$
and $\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$, $\sigma_X = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^T$, and $\delta_{ij}$ is the Kronecker delta function.
Proof. 
If $U = [c^{-1}]_{c\times n}$, then $u_{ik} = \frac{1}{c}$, $\alpha_i = \frac{1}{c}$, $v_i = \frac{1}{n}\sum_{k=1}^{n} x_k = \bar{x}$ and $F_i = \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^T = \sigma_X$. Thus, the Jacobian matrix at this special point becomes
$$\begin{aligned}
\frac{\partial \theta_{ik}}{\partial u_{jr}} \bigg|_{\forall i,k,\, u_{ik} = c^{-1}} ={}& \frac{\delta_{ij} \lambda}{n \lambda} - \frac{\delta_{ij}}{\lambda n} (x_r - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) + \frac{\delta_{ij} s}{\lambda n} + \frac{\delta_{ij}}{n \lambda} \left( (x_k - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) \right)^2 \\
& - \frac{\delta_{ij}}{n \lambda} (x_k - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \frac{2 \delta_{ij}}{\lambda n} (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) \\
={}& \frac{\delta_{ij}}{n \lambda} \Big( \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) - (x_r - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) + s \\
& \quad + \left( (x_k - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) \right)^2 - (x_k - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) \Big)
\end{aligned}$$
Considering that $\mathrm{tr}(a^T b) = \mathrm{tr}(b^T a)$, where a and b are column vectors, and that $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$ for matrices A, B, and C, the following equations can be obtained by simple computation:
$$\begin{aligned}
(x_r - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) &= \mathrm{tr}\Big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} \Big) \\
(x_k - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) &= \mathrm{tr}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} \Big) \\
\left( (x_k - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) \right)^2 &= (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-1} (x_r - \bar{x}) \\
&= (x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} \sigma_X^{-\frac{1}{2}} (x_r - \bar{x}) \\
&= \mathrm{tr}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} \; \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} \Big)
\end{aligned}$$
Equation (18) can be further simplified as
$$\begin{aligned}
\frac{\partial \theta_{ik}}{\partial u_{jr}} \bigg|_{\forall i,k,\, u_{ik} = c^{-1}} ={}& \frac{\delta_{ij}}{n \lambda} \Big( \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) - \mathrm{tr}\big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} \big) + \mathrm{tr}(I_s) \\
& \quad + \mathrm{tr}\big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} \; \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} \big) - \mathrm{tr}\big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} \big) \Big) \\
={}& \frac{\delta_{ij}}{n \lambda} \Big( \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \mathrm{tr}\big( \big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \big) \big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \big) \big) \Big) \\
={}& \frac{\delta_{ij}}{n \lambda} \Big( \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \mathrm{tr}\big( \big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \big)^T \big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \big) \big) \Big)
\end{aligned}$$
The trace of the matrix $A^T B$ can be written as $\mathrm{tr}(A^T B) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}$, where $a_{ij}$ and $b_{ij}$ denote the elements in row i and column j of $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times n}$, respectively. Moreover, we have $\mathrm{vec}(A) = (a_{11}, \ldots, a_{n1}, a_{12}, \ldots, a_{n2}, \ldots, a_{1n}, \ldots, a_{nn})^T$ and $\mathrm{vec}(B) = (b_{11}, \ldots, b_{n1}, b_{12}, \ldots, b_{n2}, \ldots, b_{1n}, \ldots, b_{nn})^T$. It follows that $\mathrm{tr}(A^T B) = \mathrm{vec}(A)^T \mathrm{vec}(B)$. Finally, we have
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} \bigg|_{\forall i,k,\, u_{ik} = c^{-1}} = \frac{\delta_{ij}}{n \lambda} \left( \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big) \right)$$
Let
$$H_{kr} = \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)$$
We have that
$$\left( A^T A \right)_{kr} = \lambda + 2 (x_r - \bar{x})^T \sigma_X^{-1} (x_k - \bar{x}) + \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_r - \bar{x})(x_r - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big) = H_{kr}$$
Then the element of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the special point $U = [c^{-1}]_{c\times n}$ is
$$\frac{\partial \theta_{ik}}{\partial u_{jr}} \bigg|_{\forall i,k,\, u_{ik} = c^{-1}} = \frac{\delta_{ij}}{n \lambda} H_{kr} = \frac{\delta_{ij}}{n \lambda} \left( A^T A \right)_{kr}$$
The proof is completed. □
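The matrix A in Theorem 2 is built directly from the centered data, so the spectral radius of $\frac{1}{n\lambda}A^T A$ can be computed numerically. The sketch below is our own NumPy illustration of this computation (it is not code from the paper and assumes that $\sigma_X$ is nonsingular); the returned value is the quantity denoted $r^*$ in the following sections.

```python
import numpy as np

def spectral_radius_at_uniform(X, lam):
    # Build the matrix A from Theorem 2 and return the spectral radius of
    # (1 / (n * lam)) * A^T A, i.e. r* at the coincident solution (x_bar, U = [1/c]).
    n, s = X.shape
    xbar = X.mean(axis=0)
    D = X - xbar                                    # centered data, (n, s)
    sigma = D.T @ D / n                             # biased sample covariance sigma_X
    w, Q = np.linalg.eigh(sigma)                    # eigendecomposition of sigma_X
    sigma_inv_half = Q @ np.diag(w ** -0.5) @ Q.T   # sigma_X^(-1/2), assumes positive eigenvalues
    Z = D @ sigma_inv_half                          # row k is sigma^(-1/2) (x_k - x_bar)
    cols = []
    for z in Z:
        outer = np.outer(z, z) - np.eye(s)
        cols.append(np.concatenate(([np.sqrt(lam)], np.sqrt(2.0) * z, outer.ravel())))
    A = np.stack(cols, axis=1)                      # shape (1 + s + s^2, n)
    M = A.T @ A / (n * lam)                         # symmetric positive semidefinite
    return np.max(np.abs(np.linalg.eigvalsh(M)))

# Example usage (hypothetical): r_star = spectral_radius_at_uniform(X, lam=2.0)
# By Theorem 3 below, the returned value should never be smaller than 1.
```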
We have mentioned that the spectral radius of the Jacobian matrix can reflect the theoretical behavior of the algorithm. For the KLFCM algorithm, if the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the point $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is not less than 1, then this point is not a stable fixed point of the algorithm. Next, we focus on the spectral radius of the Jacobian matrix calculated by Equation (17).
Theorem 3.
Let $r^*$ denote the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}\big|_{\forall i,k,\, u_{ik} = c^{-1}}$; then we have that $r^* \ge 1$.
Proof. 
Because the nonzero eigenvalues of the matrix $A^T A$ are equal to the nonzero eigenvalues of the matrix $AA^T$, the spectral radius of $AA^T$ is the same as that of $A^T A$. That is, the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}\big|_{\forall i,k,\, u_{ik} = c^{-1}}$ is equal to the spectral radius of the symmetric matrix $\frac{1}{n\lambda} A A^T$ computed by Equation (19).
$$\frac{1}{n\lambda} A A^T = \frac{1}{n\lambda} \begin{pmatrix} L_{11} & L_{12} & L_{13} \\ L_{21} & L_{22} & L_{23} \\ L_{31} & L_{32} & L_{33} \end{pmatrix} = \frac{1}{n\lambda} \begin{pmatrix} \lambda n & 0 & 0 \\ 0 & 2n I_s & L_{23} \\ 0 & L_{32} & L_{33} \end{pmatrix}$$
where
$$\begin{aligned}
L_{11} &= \lambda n, \qquad L_{12} = \sqrt{2\lambda} \sum_{k=1}^{n} \left( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x}) \right)^T = 0, \qquad L_{13} = \sqrt{\lambda} \sum_{k=1}^{n} \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T = 0, \\
L_{21} &= L_{12}^T, \qquad L_{31} = L_{13}^T, \qquad L_{22} = 2 \sum_{k=1}^{n} \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} = 2n I_s, \\
L_{23} &= \sqrt{2} \sum_{k=1}^{n} \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})\, \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T, \qquad L_{32} = L_{23}^T, \\
L_{33} &= \sum_{k=1}^{n} \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)\, \mathrm{vec}\Big( \sigma_X^{-\frac{1}{2}} (x_k - \bar{x})(x_k - \bar{x})^T \sigma_X^{-\frac{1}{2}} - I_s \Big)^T
\end{aligned}$$
It is true that for the symmetric matrix $\frac{1}{n\lambda} A A^T$, the following statement holds.
$$k_{\max}\left( \frac{1}{n\lambda} A A^T \right) = \max_{x \ne 0} \frac{x^T \left( \frac{1}{n\lambda} A A^T \right) x}{x^T x}.$$
where the symbol $k_{\max}$ represents the maximum eigenvalue of the matrix. Let $e_i = (0, \ldots, 1, \ldots, 0)^T$ be the vector whose ith element is one and whose other elements are zero. Obviously, the following inequality holds:
$$\max_{x \ne 0} \frac{x^T \left( \frac{1}{n\lambda} A A^T \right) x}{x^T x} \ge \frac{e_i^T \left( \frac{1}{n\lambda} A A^T \right) e_i}{e_i^T e_i} = \left( \frac{1}{n\lambda} A A^T \right)_{ii}.$$
In particular, taking $i = 1$ gives $\left( \frac{1}{n\lambda} A A^T \right)_{11} = \frac{\lambda n}{n\lambda} = 1$, so the largest eigenvalue of $\frac{1}{n\lambda} A A^T$ is at least 1. Based on the above analysis, we can conclude that $r^* \ge 1$. □
The study reveals that the coincident clustering results x ¯ , U = [ c 1 ] c × n are not stable fixed points of the KLFCM algorithm. As an example, when analyzing the Iris dataset, inappropriate selection of the fuzziness parameter may lead to all data points being assigned to a single cluster in the clustering result, as shown in Figure 1. Despite this clustering outcome being incorrect, the KLFCM algorithm avoids outputting the coincident clustering result, where all cluster centers are equal to the sample mean.

4. Experimental Results

In this section, we validate our theoretical results through experimental examples. We use both artificial and real datasets to demonstrate that the KLFCM clustering algorithm may exhibit self-annealing properties when a suitable fuzzy parameter λ is selected. We calculate the spectral radius of the Jacobian matrix at the coincident clustering result for the KLFCM algorithm under different λ values. A spectral radius greater than 1 indicates that the coincident clustering result is not a stable fixed point of the clustering algorithm. In addition, we found that the spectral radius of the Jacobian matrix can be applied to analyzing the convergence rate of the KLFCM algorithm. In all examples, the results of the K-means algorithm are used as the initialization for the KLFCM clustering algorithm. Let $r^*$ denote the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}$ at the point $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$. All experiments were run in MATLAB R2022a on Windows 11.
Example 1.
First, we synthesized GMM data with three clusters. The mixing proportions, mean values, and variances are listed in Table 1. The total number of data points is 300. The artificial dataset, named Data-art, is shown in Figure 3a. Data points generated from different models are denoted by different shapes: * (green), Δ (red) and ⋆ (blue).
After initializing the KLFCM clustering algorithm with manually given cluster centers, we choose different λ values to observe their influence on the clustering result. The clustering results corresponding to different λ values are listed in Figure 3b–f.
To illustrate the clustering outcome, we use different colors and shapes to signify data points that belong to different clusters. Furthermore, we represent the initial cluster centers with ○ (magenta circles). It can be seen from Figure 3 that, although the three clustering centers are very close during initialization, as long as we choose appropriate fuzzy parameters, the KLFCM algorithm can still produce relatively good clustering results through iteration. This is demonstrated in Figure 3b–d. We observed that as the value of the fuzzy parameter λ increases, the KLFCM algorithm may produce clustering results in which all samples belong to the same cluster; this occurs, for example, when λ = 3.7. We further calculated that when λ = 3.7, the spectral radius of the Jacobian matrix at the coincident clustering result $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is equal to 1, i.e., $r^* = 1$. In other words, since the KLFCM algorithm inherently exhibits self-annealing properties, choosing appropriate parameters that satisfy $r^* > 1$ will ensure that the algorithm produces interpretable and acceptable clustering results with any initial cluster centers, except for the case where all initial cluster centers are set equal to the sample mean. This finding is particularly intriguing. Next, we present more experimental results on real datasets.
Example 2.
We conduct experiments on seven datasets from the UCI Machine Learning Repository. The datasets we used in our experiments are described in Table 2.
We have theoretically proved that the spectral radius of the Jacobian matrix at the special point $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is not less than 1, i.e., $r^* \ge 1$. That is, $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ is not a stable fixed point of the KLFCM clustering algorithm. Our previous analysis reveals that, for the KLFCM algorithm, the spectral radius of its Jacobian matrix at $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$ depends only on the fuzziness parameter λ and the data, and it is unaffected by the initial clustering centers of the algorithm. If the chosen λ value ensures $r^* > 1$, then the clustering algorithm is likely to output good clustering results despite poorly chosen initial clustering centers, thanks to its self-annealing property.
By employing Equation (19), we have computed the spectral radius r * for different λ values, and the corresponding results are showcased in Table 3.
It can be seen from Table 3 that the spectral radius of Jacobian matrix at the special point x ¯ , U = [ c 1 ] c × n is not smaller than 1 for any fuzziness index value, which is consistent with the result of our theoretical analysis. We select parameter values that satisfy r * = 1 , such as λ = 5 or λ = 50 , and employ the K-means clustering algorithm and manual initialization to set the initial cluster centers of the KLFCM algorithm. Next, we apply the KLFCM clustering algorithm with different initialization methods to cluster the Iris dataset.
The clustering results are depicted in Figure 4. The ● (magenta circles) and ● (black circles), respectively, represent the initial cluster centers and the cluster centers obtained after clustering.
We have observed that when the spectral radius is equal to 1, several clusters are merged into a new cluster. However, different results can be obtained using different initialization methods. For instance, when λ = 5 and we use the K-means algorithm to initialize KLFCM, two clusters are obtained in the clustering results, which also preserve some structural information of the original data. On the other hand, if we initialize the KLFCM algorithm manually, all the samples in the clustering results will belong to the same cluster. If the value of the lambda parameter is large enough, for example, λ = 50 , regardless of the initialization method used, the KLFCM algorithm may output clustering results where all samples belong to the same cluster. Interestingly enough, the KLFCM algorithm is not suitable for the Balance Scale clustering issue because r * = 1 under any parameter condition.
Therefore, for the cases where the spectral radius is equal to 1, as shown in Table 3, does the clustering algorithm avoid outputting overlapping clustering results? Next, we borrow the non-fuzzy index (NFI) to make a further analysis. The NFI index was proposed by Roubens [24].
$$NFI(c, U, \lambda) = \frac{\frac{c}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{2} - 1}{c - 1}$$
The NFI can be used to compare the performances of the clustering results. It is obvious that if the membership values are close to 0 or 1, then the NFI index will be close to 1. Otherwise, if $U = [c^{-1}]_{c\times n}$, the NFI index will be close to 0. In other words, $NFI(c, U, \lambda) = 0$ indicates that the algorithm outputs the coincident clustering result $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$. That is,
$$NFI(c, U, \lambda) = \begin{cases} 0, & \text{if } U = [c^{-1}]_{c\times n} \text{ and } V = \bar{x} \\ 1, & \text{for hard clustering results} \end{cases}$$
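For reference, the NFI is straightforward to compute from a membership matrix; the following short function is our own NumPy transcription of the formula above.

```python
import numpy as np

def nfi(U):
    # Non-fuzzy index of Roubens for a (c, n) membership matrix:
    # close to 1 for crisp memberships, 0 for the uniform matrix U = [1/c].
    c, n = U.shape
    return (c * np.sum(U ** 2) / n - 1.0) / (c - 1.0)
```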
We initialized the KLFCM clustering algorithm with the K-means clustering algorithm and calculated the NFI value of the resulting clustering. The results are shown in Table 4.
Table 4 shows that, under the parameter conditions listed, the NFI value of the KLFCM clustering results is almost always greater than 0. We further find that, as the value of λ increases, the NFI values of the KLFCM clustering results show a decreasing trend. For the Iris dataset, if λ = 8, then the spectral radius $r^*$ is equal to 1 (see Table 3). We found that all data points are assigned to one cluster in this situation (see Figure 1). However, the NFI value is greater than 0, which means that the KLFCM clustering algorithm did not output the coincident clustering result $\left(\bar{x}, U = [c^{-1}]_{c\times n}\right)$. In fact, the KLFCM algorithm may output two cluster centers.
The above results indicate that even if the fuzzy parameters we choose are not optimal enough, the KLFCM algorithm has self-annealing property, which enables the algorithm to avoid producing coincident clustering centers at convergence. Even in some cases where the spectral radius r * is equal to 1, the KLFCM clustering results can still capture aspects of the underlying data structure because K-means initialization allows initial clustering centers to be distributed across different regions of the data. However, when the value of parameter λ used in the KLFCM clustering algorithm is too large, for example, λ = 50 , when clustering the Iris dataset, the algorithm may fail to distinguish between overlapping clusters and produce inaccurate clustering results.
Example 3.
In this example, we further discuss the impact of the parameter λ on the convergence rate of the KLFCM clustering algorithm. We have previously regarded the KLFCM algorithm as a mapping $U^{(t)} = \theta(U^{(t-1)})$. If we assume that $U^{(t)}$ converges to $U^*$ and that the mapping is differentiable in a neighborhood of $U^*$, then we can use a simple Taylor expansion to derive an expression.
$$U^{(t+1)} - U^* = \left( U^{(t)} - U^* \right) \frac{\partial \theta(U)}{\partial U} \bigg|_{U = U^*} + O\left( \left\| U^{(t)} - U^* \right\|^2 \right)$$
where · is the usual Euclidean norm. Within a certain neighborhood of U * , the behavior of the KLFCM algorithm can be well approximated by a linear iteration using the Jacobian matrix θ U U | U = U * .
Our focus now is on investigating the convergence rate of linear iterations in the KLFCM algorithm. Specifically, we define the global rate of convergence for this algorithm as $c_r = \lim_{t\to\infty} \left\| U^{(t+1)} - U^* \right\| / \left\| U^{(t)} - U^* \right\|$. A higher value of $c_r$ corresponds to a slower rate of convergence. To estimate the convergence rate of KLFCM, we calculate the spectral radius of $\frac{\partial \theta(U)}{\partial U}\big|_{U = U^*}$ at the point of convergence $(V^*, U^*)$. This is because the convergence rate of KLFCM is determined by the spectral radius of $\frac{\partial \theta(U)}{\partial U}\big|_{U = U^*}$, as indicated by Equation (20). By evaluating the spectral radius, we can approximate how quickly the algorithm will converge to its solution.
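In practice, $c_r$ can also be estimated directly from the stored iterates of the algorithm rather than from the Jacobian matrix. The helper below is our own illustration of the definition above; it simply takes the last stored iterate as a stand-in for $U^*$ and uses the Frobenius norm.

```python
import numpy as np

def empirical_rate(U_iterates):
    # Estimate c_r = lim ||U^(t+1) - U*|| / ||U^(t) - U*|| from a list of
    # membership matrices produced during the iteration.
    U_star = U_iterates[-1]
    errs = [np.linalg.norm(U - U_star) for U in U_iterates[:-1]]
    ratios = [errs[t + 1] / errs[t] for t in range(len(errs) - 1) if errs[t] > 0]
    return ratios[-1] if ratios else 0.0
```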
We varied the parameter λ in the KLFCM clustering algorithm, and for each value, we computed the spectral radius of the Jacobian matrix at the convergence point. This was done to explore how different parameter values influence the convergence rate of the algorithm. At the point of convergence, we can use Equation (14) to calculate the spectral radius of the Jacobian matrix for KLFCM and denote it as r S R . The results are shown in Table 5.
We know that the larger the value of r S R , the slower the convergence rate. Table 5 demonstrates that as the values of λ increase, the KLFCM algorithm exhibits a decreasing trend in convergence rates due to an increasing trend in the spectral radius. In the case of the Iris dataset, setting λ = 2 results in r S R = 0.5655 . However, if we increase the value of λ to 3, we get a spectral radius of 0.9221. Clearly, the convergence rate is observed to decrease in response to higher parameter values. This trend aligns with most experimental findings of the KLFCM algorithm. Specifically, setting a large value for the parameter λ in the KLFCM clustering algorithm can result in a fuzzier output, potentially causing slower convergence. Our demonstration shows that the Jacobian matrix can also be utilized to estimate the convergence rate.

5. Discussion

5.1. Main Results

Our experimental results indicate the following facts:
(1) The fuzzy parameter λ is a critical factor in the KLFCM algorithm. If the value is excessively large, it could lead to clustering failure.
(2) The KLFCM algorithm exhibits self-annealing properties. Let $r^*$ denote the spectral radius of the Jacobian matrix $\frac{\partial \theta(U)}{\partial U}\big|_{\forall i,k,\, u_{ik} = c^{-1}}$. If we choose a parameter that satisfies the condition $r^* > 1$, even with suboptimal initial clustering center selection, the algorithm can still produce satisfactory clustering results through its self-regulating nature.
(3) This study suggests that selecting parameter values leading to r * greater than 1 is optimal due to the possibility of some clusters merging or overlapping when r * = 1 .
(4) The convergence rate of the KLFCM clustering algorithm can be estimated using the spectral radius of the Jacobian matrix at the convergence point.

5.2. Discussion of λ

As with other clustering algorithms, the KLFCM algorithm's outcomes can be influenced by parameter values. When the λ value is too large, clustering results may exhibit partial cluster overlap. In cases where the parameter values are exceedingly large, the algorithm may produce outputs where all samples belong to one cluster due to its inability to differentiate significantly overlapping clusters. This issue arises regardless of the initial class center selection of the KLFCM algorithm. Hence, discussing the algorithm's self-annealing behavior becomes irrelevant when the parameter values are excessively large (see Figure 1 and Figure 2).

5.3. Discussion of Self-Annealing Property

Based on the experimental results, it was observed that selecting parameters leading to $r^*$ greater than 1 allows the KLFCM algorithm to produce satisfactory clustering outcomes, even with suboptimal initial center selection or cases where the cluster centers are very close to each other. For instance, in Example 1, when λ is less than 3.4, all values satisfy the condition $r^* > 1$. As a result, in these scenarios, the data were perfectly partitioned into different clusters. The experimental results suggest that the KLFCM clustering algorithm can exhibit self-annealing properties when certain conditions are met in parameter selection (see Figure 2 and Figure 3).

5.4. Discussion of Parameter Selection

It is evident that when the chosen parameter values meet the condition $r^* > 1$, the clustering algorithm's output results exhibit strong interpretability and acceptability. When the selected parameter leads to $r^* = 1$, the initial cluster center selection can impact the algorithm's clustering results. For example, in the case of the Iris dataset, we employed two different cluster center initialization methods under the same parameter conditions (e.g., λ = 5). Initializing the clustering centers using the K-means algorithm ensured that they were distributed across various data regions, resulting in more reliable clustering outcomes. Therefore, when investigating parameter selection issues, we used the K-means algorithm output as the initialization for the KLFCM algorithm. From the findings presented in Table 4, we discovered that even in some cases where $r^* = 1$, we could still identify partial underlying data features through the clustering results (see Figure 4, Table 3 and Table 4).
It should be emphasized that for datasets from UCI databases in Table 2, the majority of them meet the condition of r * > 1 when λ = 2 . In previous literature, it has been suggested that when λ = 2 , the KLFCM algorithm is equivalent to the EM algorithm for Gaussian mixtures. Our prior work has illustrated the self-annealing property of the EM algorithm, which aligns with the research outcomes presented in this paper.

5.5. Discussion of Convergence Rate

The degree of fuzziness in KLFCM algorithm clusters can be managed by adjusting the value of lambda. When λ increases, the fuzziness of the clusters also increases, leading to less distinct and more blurred cluster results that are harder to interpret. Furthermore, as lambda increases, the algorithm takes longer to converge due to the greater complexity of the optimization problem. Our experimental findings align with this phenomenon, (see Table 5).

6. Conclusions

Since its inception by Ichihashi and Honda, the KLFCM algorithm has become one of the most widely used clustering models. In this article, we perform convergence and parameter analysis of the KLFCM clustering algorithm using the Jacobian matrix. Our findings suggest that the coincident clustering result is not a stable fixed point of KLFCM clustering. Additionally, we discovered an interesting pattern in most datasets where λ = 2 results in $r^* > 1$. While our research includes theoretical analysis, it also has practical implications for the application of the KLFCM algorithm. Not only did we provide a theoretical basis for the self-annealing behavior of the KLFCM algorithm, but we also proposed an initialization selection strategy for the algorithm parameters: the ideal parameter value should ensure $r^* > 1$ rather than $r^* = 1$.
The primary advantage of our proposed analysis method is that Jacobian matrix-based analysis can be used to evaluate any clustering algorithm with an iterative update formula. However, this method has its limitations. For example, we only considered coincident clustering results, but when the parameters are not appropriately selected, some clusters may merge in the output. Clearly, while the KLFCM algorithm has self-annealing properties, it may not always produce good results, especially when the parameters are not initialized properly. Therefore, future research must focus on developing better parameter selection strategies to ensure that clustering results accurately reflect the underlying structure of the data.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62006257).

Data Availability Statement

The dataset used for this work is sourced from the UCI database. It can be found at https://archive.ics.uci.edu/ml/index.php (accessed on 12 January 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sathiyasek, K.; Karthick, S.; Kanniya, P.R. The survey on various clustering technique for image segmentation. Int. J. Adv. Res. Electron. Comm. Eng. 2014, 3, 1. [Google Scholar]
  2. Ghosal, A.; Nandy, A.; Das, A.K.; Goswami, S.; Panday, M. A short review on different clustering techniques and their applications. Emerg. Technol. Model. Graph. 2018, 2020, 69–83. [Google Scholar]
  3. Ghosh, S.; Dubey, S.K. Comparative analysis of K-means and fuzzy C-means algorithms. Int. J. Adv. Comput. Sci. Appl. 2013, 4, 35–39. [Google Scholar] [CrossRef]
  4. Everitt, B.S. Cluster Analysis; Halstead Press: London, UK, 1974. [Google Scholar]
  5. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  6. Dunn, J.C. A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters. J. Cybern. 1973, 3, 32–57. [Google Scholar] [CrossRef]
  7. Ichihashi, H.; Miyagishi, K.; Honda, K. Fuzzy c-means clustering with regularization by KL information. In Proceedings of the 10th IEEE International Conference on Fuzzy Systems (Cat. No. 01CH37297), Melbourne, VIC, Australia, 2–5 December 2001; Volume 2, pp. 924–927. [Google Scholar]
  8. Karlekar, A.; Seal, A.; Krejcar, O.; Gonzalo-Martin, C. Fuzzy k-means using non-linear s-distance. IEEE Access 2019, 7, 55121–55131. [Google Scholar] [CrossRef]
  9. Zhao, X.; Nie, F.; Wang, R.; Li, X. Improving projected fuzzy K-means clustering via robust learning. Neurocomputing 2022, 491, 34–43. [Google Scholar] [CrossRef]
  10. Yang, X.; Zhu, M.; Sun, B.; Wang, Z.; Nie, F. Fuzzy C-Multiple-Means Clustering for Hyperspectral Image. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5503205. [Google Scholar] [CrossRef]
  11. Seal, A.; Karlekar, A.; Krejcar, O.; Gonzalo-Martin, C. Fuzzy c-means clustering using Jeffreys-divergence based similarity measure. Appl. Soft Comput. 2020, 88, 106016. [Google Scholar] [CrossRef]
  12. Honda, K.; Ichihashi, H. Regularized linear fuzzy clustering and probabilistic PCA mixture models. IEEE Trans. Fuzzy Syst. 2005, 13, 508–516. [Google Scholar] [CrossRef]
  13. Gharieb, R.R.; Gendy, G. Fuzzy C-means with a local membership KL distance for medical image segmentation. In Proceedings of the Cairo International Biomedical Engineering Conference (CIBEC), Giza, Egypt, 11–13 December 2014; pp. 47–50. [Google Scholar]
  14. Zhang, H.; Wu, Q.M.J.; Nguyen, T.M. A robust fuzzy algorithm based on student’s t-distribution and mean template for image segmentation application. IEEE Signal Process. Lett. 2012, 20, 117–120. [Google Scholar] [CrossRef]
  15. Li, X.L.; Chen, J.S. Region-Based Fuzzy Clustering Image Segmentation Algorithm with Kullback-Leibler Distance. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 4, 27–31. [Google Scholar] [CrossRef]
  16. Amira, O.; Zhang, J.S.; Liu, J. Fuzzy c-means clustering with conditional probability based K–L information regularization. J. Stat. Comput. Simul. 2021, 91, 2699–2716. [Google Scholar] [CrossRef]
  17. Yun, S.; Zanetti, R. Clustering Methods for Particle Filters with Gaussian Mixture Models. IEEE Trans. Aerosp. Electron. Syst. 2021, 58, 1109–1118. [Google Scholar] [CrossRef]
  18. Yu, J.; Chaomurilige, C.; Yang, M.S. On convergence and parameter selection of the EM and DA-EM algorithms for Gaussian mixtures. Pattern Recognit. 2018, 77, 188–203. [Google Scholar] [CrossRef]
  19. Chaomurilige; Yu, J.; Yang, M.S. Analysis of parameter selection for Gustafson–Kessel fuzzy clustering using Jacobian matrix. IEEE Trans. Fuzzy Syst. 2015, 23, 2329–2342. [Google Scholar] [CrossRef]
  20. Chaomurilige; Yu, J.; Yang, M.S. Deterministic annealing Gustafson-Kessel fuzzy clustering algorithm. Inf. Sci. 2017, 417, 435–453. [Google Scholar] [CrossRef]
  21. Chaomurilige, C.; Yu, J.; Zhu, J. Analysis of Convergence Properties for Gath-Geva Clustering Using Jacobian Matrix. In Proceedings of the Chinese Conference on Pattern Recognition, Chengdu, China, 5–7 November 2016. [Google Scholar]
  22. Miyamoto, S.; Mukaidono, M. Fuzzy c-means as a regularization and maximum entropy approach. In Proceedings of the 7th International Fuzzy Systems Association World Congress (IFSA’97), Prague, Czech Republic, 25–29 June 1997; Volume 2, pp. 86–92. [Google Scholar]
  23. Olver, P.J. Lecture Notes on Numerical Analysis. Available online: http://www.math.umn.edu/~olver/num.html (accessed on 1 March 2023).
  24. Roubens, M. Pattern classification problems and fuzzy sets. Fuzzy Sets Syst. 1978, 1, 239–253. [Google Scholar] [CrossRef]
Figure 1. KLFCM Clustering Results with Different λ for Iris Dataset.
Figure 2. KLFCM Clustering Results with Inappropriate Cluster Center Initialization for Iris Dataset.
Figure 3. Clustering Results of the Data-art Dataset with Different λ.
Figure 4. The Clustering Results of KLFCM Algorithm with Different Initialization Methods on the Iris Dataset.
Table 1. Mixing Proportions, Mean Values and Variances of Gaussian Mixture Models to Generate Data-art.
Mixing Proportions | Mean Values | Variances
α_1 = 1/3 | m_1 = (1, 1, 2) | Σ_1 = diag(1, 1, 1)
α_2 = 1/3 | m_2 = (5, 3, 0.5) | Σ_2 = diag(0.5, 0.1, 1)
α_3 = 1/3 | m_3 = (2, 6, 5) | Σ_3 = diag(0.5, 0.1, 2)
Table 2. Experiments on Real Datasets from UCI Databases.
Datasets | Sample No. n | Feature No. s | Cluster No. c | r* (λ = 2)
Iris | 150 | 4 | 3 | 2.08364
Cloud | 1024 | 10 | 3 | 50.07438
Wine | 178 | 13 | 3 | 9.94855
Haberman's Survival | 306 | 3 | 2 | 1.49047
seeds | 210 | 7 | 3 | 4.76313
Sonar | 208 | 60 | 2 | 9.07394
Balance Scale | 625 | 4 | 3 | 1.00000
Table 3. Spectral Radius r* of the Jacobian Matrix Corresponding to Different λ.
λ | 2 | 3 | 3.5 | 4 | 4.5 | 5 | 6 | 7 | 8 | 9 | 10
Iris | 2.08364 | 1.38909 | 1.19065 | 1.04182 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
Cloud | 50.07438 | 33.38292 | 28.61393 | 25.03719 | 22.25528 | 20.02975 | 16.69146 | 14.30697 | 12.51860 | 11.12764 | 10.01488
Wine * | 9.94855 | 6.63236 | 5.68488 | 4.97427 | 4.42158 | 3.97942 | – | – | – | – | –
Haberman's Survival | 7.45235 | 4.96823 | 4.25848 | 3.72617 | 3.31215 | 2.98094 | 2.48412 | 2.12924 | 1.86309 | 1.65608 | 1.49047
seeds | 4.76313 | 3.17542 | 2.72179 | 2.38157 | 2.11695 | 1.90525 | 1.58771 | 1.36089 | 1.19078 | 1.05847 | 1.0000
Sonar | 45.36970 | 30.24647 | 25.92554 | 22.68485 | 20.16431 | 18.14788 | 15.12323 | 12.96277 | 11.34243 | 10.08216 | 9.07394
Balance Scale | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
* It should be mentioned that the KLFCM clustering algorithm may fail to cluster the Wine dataset when λ ≥ 6.
Table 4. NFI Values Corresponding to Different λ for Different Real Datasets.
λ | 2 | 3 | 3.5 | 4 | 4.5 | 5 | 6 | 7 | 8 | 9 | 10
Iris | 0.9749 | 0.9811 | 0.6978 | 0.5871 | 0.5079 | 0.5049 | 0.4956 | 0.3656 | 0.2515 | 0.2512 | 0.2511
Cloud | 0.8846 | 0.8170 | 0.7769 | 0.7440 | 0.7110 | 0.6771 | 0.7064 | 0.6571 | 0.5952 | 0.7368 | 0.5581
Wine * | 0.9618 | 0.8998 | 0.9256 | 0.9587 | 0.9600 | 0.9806 | – | – | – | – | –
Haberman's Survival | 0.8861 | 0.7323 | 0.6548 | 0.5827 | 0.5179 | 0.3752 | 0.2784 | 0.2372 | 0.2256 | 0.1973 | 0.4192
seeds | 0.9956 | 0.9673 | 0.9832 | 0.9773 | 0.9691 | 0.9458 | 0.9192 | 0.9417 | 0.7818 | 0.9546 | 0.8540
Sonar | 0.9999 | 0.9990 | 0.9998 | 0.9996 | 1.0000 | 0.9996 | 0.9969 | 0.9999 | 0.9949 | 0.9932 | 0.9946
Balance Scale | 0.7794 | 0.0895 | 0.1249 | 0.0791 | 0.0709 | 0.1258 | 0.0197 | 0.0053 | 0.0057 | 0.0458 | 0.0026
* It should be mentioned that the KLFCM clustering algorithm may fail to cluster the Wine dataset when λ ≥ 6.
Table 5. Spectral radius r_SR of the Jacobian matrix corresponding to different values of λ.
Data | λ and the corresponding r_SR
Iris | λ = 2, 2.5, 3, 3.5, 4, 4.5 → r_SR = 0.5655, 0.8622, 0.9221, 0.9484, 0.9857, 1.0000
Cloud | λ = 2, 3, 4, 4.5, 5, 6 → r_SR = 0.8505, 0.8713, 0.8945, 0.9164, 0.9554, 0.9701
Wine | λ = 2, 2.5, 3, 3.5, 4, 4.5 → r_SR = 0.6995, 0.7983, 0.7198, 0.9559, 0.7526, 0.8223
Haberman's Survival | λ = 2, 3, 4, 5, 6, 7 → r_SR = 0.3786, 0.6212, 0.7940, 0.9469, 0.9067, 0.9339
seeds | λ = 2, 3, 4, 5, 6, 7 → r_SR = 0.4051, 0.8357, 0.8255, 0.8093, 0.8632, 0.8777
Sonar | λ = 2, 3, 4, 5, 6, 7 → r_SR = 0.1066, 0.5920, 0.1118, 0.2586, 0.4763, 0.8928
Balance Scale | λ = 2, 3, 4, 5, 6, 7 → r_SR = 0.8963, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000