Revisiting Possibilistic Fuzzy C-Means Clustering Using the Majorization-Minimization Method

Possibilistic fuzzy c-means (PFCM) clustering is a hybrid clustering method based on fuzzy c-means (FCM) and possibilistic c-means (PCM), which not only has the stability of FCM but also partly inherits the robustness of PCM. However, as an extension of FCM on the objective function, PFCM tends to find a suboptimal local minimum, which affects its performance. In this paper, we rederive PFCM using the majorization-minimization (MM) method, which is a new derivation approach not seen in other studies. In addition, we propose an effective optimization method to solve the above problem, called MMPFCM. Firstly, by eliminating the variable $V \in \mathbb{R}^{p \times c}$, the original optimization problem is transformed into a simplified model with fewer variables but a proportional term. Therefore, we introduce a new intermediate variable $s \in \mathbb{R}^{c}$ to convert the model with the proportional term into an easily solvable equivalent form. Subsequently, we design an iterative sub-problem using the MM method. The complexity analysis indicates that MMPFCM and PFCM share the same computational complexity. However, MMPFCM requires less memory per iteration. Extensive experiments, including objective function value comparisons and clustering performance comparisons, demonstrate that MMPFCM converges to a better local minimum compared to PFCM.


Introduction
Cluster analysis is one of the most important topics in machine learning [1], and it has been widely applied in various fields, such as data mining [2], image processing [3], and pattern recognition [4]. Clustering algorithms are a type of unsupervised learning method, aiming to partition datasets into multiple clusters based on similarity measures, ensuring that samples within the same cluster are similar [5,6].
Generally, clustering algorithms are divided into hard and soft clustering schemes [7]. The c-means algorithm [8] is the most widely used hard clustering method due to its speed and simplicity. However, it is sensitive to initial points and often falls short of finding an optimal solution. Xu et al. [9] explored an alternative to the c-means algorithm, called power k-means, that retains its simplicity while mitigating its tendency to become trapped in local minima. The optimization objective of this method is a power mean function of the distances between sample points and cluster centers, leading to non-convex and high-dimensional characteristics that make it difficult to solve directly. Therefore, the majorization-minimization (MM) method is used to derive a descent scheme. In this paper, the MM method will be employed for the first time to solve the possibilistic fuzzy c-means clustering problem.
As one of the most typical soft clustering methods, fuzzy c-means (FCM) [10,11] is a fuzzy extension of c-means, which divides n data points into c clusters through the membership grade matrix U. Compared to c-means, FCM achieves better clustering performance on datasets with overlapping clusters. Due to its flexibility and robustness, FCM has been favored by researchers up to now. However, it also suffers from some drawbacks, the first being that it is sensitive to initialization and may easily become trapped in a poor local minimum.
The main contributions of this paper are summarized as follows:
1. We rederive PFCM using the MM method: by eliminating the variable V, the original optimization problem is transformed into a simplified model with fewer variables but a proportional term.
2. Due to the presence of the proportional term in the derived simplified optimization problem, we further transform it into an easily solvable equivalent form by introducing a new intermediate variable s. Then, the MM method is employed to design an iterative sub-problem. We refer to this method as MMPFCM.
3. The complexity analysis indicates that MMPFCM and PFCM share the same computational complexity. However, MMPFCM utilizes the intermediate variable s of size c × 1 instead of the variable V of size p × c to update U and T, resulting in smaller space complexity.
4. It is theoretically proven that when the inner loop of MMPFCM is executed only once, MMPFCM degenerates to the original PFCM method.
5. Experimental studies show that MMPFCM obtains better local minima than PFCM. In addition, compared with other state-of-the-art clustering methods, MMPFCM also shows its superiority.
The rest of this paper is organized as follows. Some preliminaries, including notations, PFCM, and the MM method, are given in Section 2. In Section 3, we rederive PFCM using the MM method, and another effective optimization method (MMPFCM) is presented in Section 4. A theoretical analysis is given in Section 5. In Section 6, the experimental results and discussions are reported. Section 7 concludes this paper.

Related Works

Notations
Let $X = [x_1, x_2, \cdots, x_n] \in \mathbb{R}^{p \times n}$ denote the data matrix, where $x_i \in \mathbb{R}^{p}$ is the $i$-th sample for $i = 1, 2, \cdots, n$, and $x_i^{T}$ denotes the transpose of the vector $x_i$. Let $V = [v_1, v_2, \cdots, v_c] \in \mathbb{R}^{p \times c}$ denote the cluster center matrix, where $v_j \in \mathbb{R}^{p}$ is the centroid of the cluster $C_j$ for $j = 1, 2, \cdots, c$. The fuzzy membership matrix is denoted by $U = [u_{ij}]_{n \times c}$, where $u_{ij}$ represents the fuzzy membership degree of the $i$-th sample to the $j$-th cluster. The possibilistic membership matrix is denoted by $T = [t_{ij}]_{n \times c}$, where $t_{ij}$ represents the possibilistic membership degree of the $i$-th sample to the $j$-th cluster. $u_j$ and $t_j$ are the $j$-th columns of $U$ and $T$, respectively. $u_j^{(k)}$ and $t_j^{(k)}$ denote $u_j$ and $t_j$ at the $k$-th iteration, respectively. In addition, $e$ is a column vector with all elements equal to 1, and $f(\theta|\theta_t)$ represents the function $f$ evaluated at $\theta$ given the current iterate $\theta_t$.

Possibilistic Fuzzy C-Means Clustering
PFCM [20] is a hybrid clustering method based on PCM and FCM, which contains both membership and typicality components. Its objective function is as follows:

$$J_{\text{PFCM}}(U, T, V) = \sum_{j=1}^{c}\sum_{i=1}^{n}\left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right)\left\|x_i - v_j\right\|^{2} + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q}. \tag{1}$$

Here, the membership weights $a \in (0, +\infty)$ and $b \in (0, +\infty)$ control the roles of the fuzzy membership degree and the possibilistic membership degree. Furthermore, the exponential factors of the fuzzy membership and the possibilistic membership are $m \in (1, +\infty)$ and $q \in (1, +\infty)$, respectively. $\gamma_j$ is a penalty factor that is usually determined by [20]

$$\gamma_j = K\,\frac{\sum_{i=1}^{n} u_{ij}^{m}\left\|x_i - v_j\right\|^{2}}{\sum_{i=1}^{n} u_{ij}^{m}},$$

where $K > 0$, with the most common choice being $K = 1$. PFCM usually updates the fuzzy membership $u_{ij}$, the possibilistic membership $t_{ij}$, and the center $v_j$ alternately by

$$u_{ij} = \left[\sum_{l=1}^{c}\left(\frac{\left\|x_i - v_j\right\|^{2}}{\left\|x_i - v_l\right\|^{2}}\right)^{\frac{1}{m-1}}\right]^{-1}, \tag{2}$$

$$t_{ij} = \frac{1}{1 + \left(\frac{b}{\gamma_j}\left\|x_i - v_j\right\|^{2}\right)^{\frac{1}{q-1}}}, \tag{3}$$

$$v_j = \frac{\sum_{i=1}^{n}\left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right)x_i}{\sum_{i=1}^{n}\left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right)}. \tag{4}$$

When the membership weight $a$ in the objective function of PFCM is 0, we recover PCM within the PFCM framework, and Equations (3) and (4) are the corresponding iterative functions. When the membership weight $b$ in the objective function of PFCM is 0, we recover FCM within the PFCM framework, and Equations (2) and (4) are the corresponding iterative functions.
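For concreteness, the alternating updates (2)-(4) can be written compactly in NumPy. The following is a minimal sketch of one PFCM pass under our own (hypothetical) variable names, assuming squared Euclidean distances and a precomputed penalty vector gamma; it is an illustration, not the authors' implementation:

```python
import numpy as np

def pfcm_step(X, U, T, a, b, m, q, gamma):
    """One alternating PFCM pass following Equations (2)-(4).

    X: (p, n) data; U, T: (n, c) fuzzy/possibilistic memberships;
    gamma: (c,) penalty factors. Returns updated U, T and the centers V.
    """
    Z = a * U**m + b * T**q                        # combined weights, (n, c)
    V = (X @ Z) / Z.sum(axis=0)                    # Equation (4): centers, (p, c)
    D = ((X[:, :, None] - V[:, None, :])**2).sum(axis=0)  # ||x_i - v_j||^2, (n, c)
    D = np.maximum(D, 1e-12)                       # guard against zero distances
    W = D ** (-1.0 / (m - 1.0))
    U_new = W / W.sum(axis=1, keepdims=True)       # Equation (2)
    T_new = 1.0 / (1.0 + (b * D / gamma) ** (1.0 / (q - 1.0)))  # Equation (3)
    return U_new, T_new, V
```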

Majorization-Minimization Method
The majorization-minimization (MM) method [31] is an iterative optimization algorithm. It works by constructing a surrogate function that is simpler to minimize.
Consider the following optimization problem:

$$\min_{\theta \in \Theta} \; g(\theta),$$

where $\Theta$ is a nonempty closed set in $\mathbb{R}^{n}$ and $g: \Theta \to \mathbb{R}$ is a continuous function. The MM method iteratively minimizes a series of surrogate functions $f(\theta|\theta_t)$ majorizing the objective function $g(\theta)$ at the current iterate $\theta_t$. The principle of majorization involves two key aspects: first, equality is achieved between the surrogate function and the objective function at the current iterate $\theta_t$, that is, $f(\theta_t|\theta_t) = g(\theta_t)$; and second, the surrogate function dominates the objective function, meaning $f(\theta|\theta_t) \ge g(\theta)$ for all $\theta$.
The update rule of the MM method is defined as follows [9]:

$$\theta_{t+1} = \arg\min_{\theta \in \Theta} f(\theta|\theta_t),$$

which implies the descent property

$$g(\theta_{t+1}) \le f(\theta_{t+1}|\theta_t) \le f(\theta_t|\theta_t) = g(\theta_t).$$

By successively minimizing these surrogate functions, we aim to converge to a local minimum of the original optimization problem. This process continues until convergence is achieved or another termination criterion is met. The MM method finds wide applications in various fields, including machine learning [9], signal processing [32], and statistics [33].
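As a toy illustration of these two conditions and the resulting descent, consider minimizing $g(\theta) = \sum_i |\theta - y_i|$ with the classical quadratic majorizer $|u| \le u^{2}/(2|u_t|) + |u_t|/2$ (equality at $u = u_t$). The sketch below, with names of our own choosing, is unrelated to PFCM and only demonstrates the MM template:

```python
import numpy as np

def mm_median(y, theta0, iters=100, eps=1e-9):
    """MM for g(theta) = sum_i |theta - y_i| (1-D least absolute deviations).

    Surrogate: f(theta|theta_t) = sum_i (theta - y_i)**2 / (2|theta_t - y_i|) + const,
    which touches g at theta_t and dominates it elsewhere, so each exact
    minimization of f cannot increase g (the descent property above).
    """
    theta = theta0
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(theta - y), eps)  # majorizer weights
        theta = np.sum(w * y) / np.sum(w)             # argmin of the surrogate
    return theta

y = np.array([1.0, 2.0, 3.0, 10.0])
print(mm_median(y, theta0=y.mean()))  # approaches a minimizer of g (any point in [2, 3])
```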

Alternative Derivation Method for Possibilistic Fuzzy C-Means Clustering
In this section, we propose an alternative derivation method for PFCM, which interprets its iterative process using the MM method. Firstly, a simplified optimization model, which only contains the variables U and T, is obtained by eliminating V. Secondly, the MM method is applied to solve the new problem. Finally, we provide proof to demonstrate that this new derivation method is the same as the original PFCM method.

Formulation
Let $F = U^{m}$, $G = T^{q}$ (elementwise powers), and $Z = aF + bG$, and then according to Equation (4), we have

$$v_j = \frac{X z_j}{z_j^{T} e}, \quad j = 1, 2, \cdots, c.$$

Substituting $v_j$ into Equation (1), we have

$$\min_{U,T} \; \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i - \frac{z_j^{T} X^{T} X z_j}{z_j^{T} e}\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q} \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1. \tag{5}$$

Because the second term in problem (5) involves a proportional term, problem (5) is difficult to solve directly. Therefore, we introduce the MM method to optimize this equivalent optimization problem, and the next primary issue is to find a surrogate function that is easier to optimize. Considering that

$$\phi(z_j) = \frac{z_j^{T} X^{T} X z_j}{z_j^{T} e}$$

is a convex function (as proven in Appendix A), its opposite function $-\phi(z_j)$ is concave.
Based on the property of concave functions, which states that the tangent plane of a concave function at any point lies above the graph, the following inequality holds:

$$-\phi(z_j) \le -\phi\!\left(z_j^{(k)}\right) - \left(\omega_j^{(k)}\right)^{T}\!\left(z_j - z_j^{(k)}\right), \tag{6}$$

where

$$\omega_j^{(k)} = \nabla\phi\!\left(z_j^{(k)}\right) = \frac{2\left(z_j^{(k)T} e\right) X^{T} X z_j^{(k)} - \left(z_j^{(k)T} X^{T} X z_j^{(k)}\right) e}{\left(z_j^{(k)T} e\right)^{2}} \tag{7}$$

is the first-order derivative of $\phi(z_j)$ with respect to $z_j$ at the current iterate $z_j^{(k)}$. Therefore, dropping the terms that are constant with respect to $z_j$, $-\left(\omega_j^{(k)}\right)^{T} z_j$ is adopted as the surrogate function for $-\phi(z_j)$. Substituting this surrogate function into problem (5), we have

$$\min_{U,T} \; \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i - \left(\omega_j^{(k)}\right)^{T} z_j\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q} \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1. \tag{8}$$

Further, the update formulas for the variables U and T are derived using the alternating iteration method.
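The inequality (6) is easy to sanity-check numerically. The snippet below (our own illustrative code) samples random positive vectors and verifies that the tangent expression upper-bounds $-\phi$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6
X = rng.standard_normal((p, n))
A = X.T @ X                                   # X^T X, positive semidefinite

def phi(z):                                   # the proportional term of problem (5)
    return (z @ A @ z) / z.sum()

zk = rng.random(n) + 0.1                      # current iterate z_j^(k) (positive entries)
omega = (2.0 * zk.sum() * (A @ zk) - (zk @ A @ zk)) / zk.sum()**2  # Equation (7)

for _ in range(1000):                         # -phi(z) <= -phi(zk) - omega.(z - zk)
    z = rng.random(n) + 0.1
    assert -phi(z) <= -phi(zk) - omega @ (z - zk) + 1e-9
print("tangent majorization of -phi verified on random samples")
```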

Optimization Procedure
We optimize problem (8) with respect to one variable, with the other variables being fixed, which leads to the following two sub-problems.
Firstly, fixing the variable T, the U-update step involves minimizing problem (8) with respect to the variable U, which can be expressed as follows:

$$\min_{U} \; \sum_{i=1}^{n}\sum_{j=1}^{c} a\, u_{ij}^{m}\left(x_i^{T} x_i - \omega_{ij}^{(k)}\right) \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1, \tag{9}$$

where $\omega_{ij}^{(k)}$ denotes the $i$-th entry of $\omega_j^{(k)}$. We use the Lagrange multiplier method to derive the iterative function of $u_{ij}$ as follows:

$$u_{ij} = \left[\sum_{l=1}^{c}\left(\frac{x_i^{T} x_i - \omega_{ij}^{(k)}}{x_i^{T} x_i - \omega_{il}^{(k)}}\right)^{\frac{1}{m-1}}\right]^{-1}. \tag{10}$$

Secondly, fixing the variable U, the T-update step involves minimizing problem (8) with respect to the variable T, which can be expressed as follows:

$$\min_{T} \; \sum_{i=1}^{n}\sum_{j=1}^{c} b\, t_{ij}^{q}\left(x_i^{T} x_i - \omega_{ij}^{(k)}\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q}. \tag{11}$$

Taking the derivative of Equation (11) with respect to $t_{ij}$ and setting it to zero, we have

$$t_{ij} = \frac{1}{1 + \left(\frac{b\left(x_i^{T} x_i - \omega_{ij}^{(k)}\right)}{\gamma_j}\right)^{\frac{1}{q-1}}}. \tag{12}$$

Obviously, according to Equation (7), we know that $\omega_j$ depends on U and T. When we obtain U and T by calculating Equations (10) and (12), U and T are in turn utilized to update $\omega_j$ using Equation (7). This means that we can calculate U, T, and $\omega_j$ iteratively. The detailed process for solving problem (5) is summarized in Algorithm 1.

Algorithm 1 Alternative derivation method for PFCM
1: Input X and c.
2: Initialize U and T.
3: repeat
4: Update $\omega_j$ by Equation (7) for $j = 1, 2, \cdots, c$.
5: Update $u_{ij}$ by Equation (10).
6: Update $t_{ij}$ by Equation (12).
7: until convergence
8: Output U and T.

Next, we provide proof to demonstrate that this new derivation method is the same as the original PFCM method, as shown in Theorem 1.
Theorem 1. Algorithm 1 is equivalent to the original PFCM algorithm.
Proof. From Equation (7), we can further deduce the value of each entry of the matrix $W = [\omega_1, \omega_2, \cdots, \omega_c]$ as follows:

$$\omega_{ij}^{(k)} = \frac{2\left(z_j^{(k)T} e\right) x_i^{T} X z_j^{(k)} - z_j^{(k)T} X^{T} X z_j^{(k)}}{\left(z_j^{(k)T} e\right)^{2}} = 2\, x_i^{T} v_j^{(k)} - v_j^{(k)T} v_j^{(k)}, \tag{13}$$

and then we have

$$x_i^{T} x_i - \omega_{ij}^{(k)} = x_i^{T} x_i - 2\, x_i^{T} v_j^{(k)} + v_j^{(k)T} v_j^{(k)} = \left\|x_i - v_j^{(k)}\right\|^{2}. \tag{14}$$

Substituting Equation (14) into Equation (10), we observe that Equation (10) is the same as Equation (2). Similarly, substituting Equation (14) into Equation (12), we observe that Equation (12) is the same as Equation (3). Therefore, based on the above analysis, Algorithm 1 is equivalent to the original PFCM algorithm, as their final iterative functions of U and T are the same.
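Putting Theorem 1 to work, one pass of Algorithm 1 can be sketched in NumPy as follows. This is our reading of steps 4-6 above (variable names are ours), where the surrogate "distances" $x_i^{T} x_i - \omega_{ij}$ play the role of $\|x_i - v_j\|^{2}$ without ever forming V; note that for clarity this sketch builds the n × n Gram matrix, which a memory-conscious implementation would avoid:

```python
import numpy as np

def algorithm1_step(X, U, T, a, b, m, q, gamma):
    """One pass of Algorithm 1: omega via Equation (7), then U (10) and T (12)."""
    Z = a * U**m + b * T**q                        # (n, c)
    G = X.T @ X                                    # Gram matrix, (n, n)
    mu = Z.sum(axis=0)                             # z_j^T e, (c,)
    GZ = G @ Z                                     # column j holds X^T X z_j
    eta = (Z * GZ).sum(axis=0)                     # z_j^T X^T X z_j, (c,)
    omega = (2.0 * mu * GZ - eta) / mu**2          # Equation (7), (n, c)
    D = np.maximum(np.diag(G)[:, None] - omega, 1e-12)  # x_i^T x_i - omega_ij, cf. (14)
    W = D ** (-1.0 / (m - 1.0))
    U_new = W / W.sum(axis=1, keepdims=True)       # Equation (10)
    T_new = 1.0 / (1.0 + (b * D / gamma) ** (1.0 / (q - 1.0)))  # Equation (12)
    return U_new, T_new
```

By Theorem 1, this pass produces the same iterates as the classical updates (2) and (3).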
To sum up, for the original PFCM problem, we provide a solution using the MM method by proving the convexity of the proportional term in problem (5). This represents a new derivation approach not found in other studies.

Majorization-Minimization Method for Possibilistic Fuzzy C-Means Clustering
In this section, another method is proposed to optimize problem (5), called MMPFCM. Compared to PFCM, MMPFCM obtains better local minima.

Formulation
Here, we first present a lemma that introduces a simple optimization problem. In this lemma, a new variable s is introduced to eliminate the ratio term in the optimization problem, transforming the original optimization problem into an easily solvable equivalent form. The technique of introducing new variables in this lemma provides a basis for the equivalent transformation of the optimization model in this section.

Lemma 1. For any $\eta > 0$ and $\mu > 0$, it holds that

$$-\frac{\eta}{\mu} = \min_{s \in \mathbb{R}} \left(\mu s^{2} - 2 s \sqrt{\eta}\right),$$

and the minimum is attained at $s = \sqrt{\eta}/\mu$.
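A one-line verification of the lemma as reconstructed above is completing the square:

$$\mu s^{2} - 2 s \sqrt{\eta} \;=\; \mu\left(s - \frac{\sqrt{\eta}}{\mu}\right)^{2} - \frac{\eta}{\mu} \;\ge\; -\frac{\eta}{\mu},$$

with equality if and only if $s = \sqrt{\eta}/\mu$.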
According to Lemma 1, let $\mu_j = z_j^{T} e$ and $\eta_j = z_j^{T} X^{T} X z_j$, and we have

$$s_j = \frac{\sqrt{z_j^{T} X^{T} X z_j}}{z_j^{T} e}. \tag{15}$$

Therefore, in order to eliminate the proportional term in problem (5), we introduce a new variable $s = (s_1, s_2, \cdots, s_c)^{T}$, and then problem (5) can be transformed into

$$\min_{(U,T,s)} \; \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i + s_j^{2}\, z_j^{T} e - 2 s_j \sqrt{z_j^{T} X^{T} X z_j}\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q} \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1. \tag{16}$$

Before solving problem (16), the following theorem provides the equivalence proof between problem (5) and problem (16).

Theorem 2. Problem (16) is equivalent to problem (5).
Proof. In order to prove that the two problems are equivalent, we substitute the optimal $s_j$ in Equation (15) into problem (16), and it can be immediately concluded that problem (16) is equivalent to problem (5).
Problem (16) also involves three variables: U, T, and s. However, compared to the variable V of size p × c in PFCM, the variable s in MMPFCM is a vector with c elements, resulting in smaller space complexity. Next, the alternating iteration method is used to solve problem (16).

Optimization Procedure
We optimize problem (16) with respect to one variable, with the other variables being fixed, which leads to the following three sub-problems.
Firstly, when the variables U and T are fixed, the s-update step is the minimization of problem (16) with respect to the variable s. Because the c components of s are separable, we have the following optimization problem for each j:

$$\min_{s_j} \; s_j^{2}\, z_j^{T} e - 2 s_j \sqrt{z_j^{T} X^{T} X z_j}. \tag{17}$$

Taking the derivative of Equation (17) with respect to $s_j$ and setting the derivative to zero, we obtain $s_j$, as shown in Equation (15). Secondly, when the variable s is fixed, the optimization problem involving the variables U and T can be denoted as follows:

$$\min_{U,T} \; \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i + s_j^{2}\, z_j^{T} e - 2 s_j \sqrt{z_j^{T} X^{T} X z_j}\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q} \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1. \tag{18}$$

Since problem (18) involves an optimization objective with a square root, the derivative for direct optimization is quite complex. Therefore, we continue to rely on the principles of the MM method to search for surrogate functions as a further optimization model. It is obvious that $X^{T} X$ is a positive semidefinite matrix, so $z_j^{T} X^{T} X z_j$ is a convex function of $z_j$. Then, we can immediately conclude that $-\sqrt{z_j^{T} X^{T} X z_j}$ is a concave function of $z_j$. Concavity supplies the linear majorization

$$-\sqrt{z_j^{T} X^{T} X z_j} \le -\left(\alpha_j^{(k)}\right)^{T} z_j,$$

where

$$\alpha_j^{(k)} = \frac{X^{T} X z_j^{(k)}}{\sqrt{z_j^{(k)T} X^{T} X z_j^{(k)}}} \tag{19}$$

is the first-order derivative of $\sqrt{z_j^{T} X^{T} X z_j}$ with respect to $z_j$ at the current iterate $z_j^{(k)}$. Therefore, $-\left(\alpha_j^{(k)}\right)^{T} z_j$ is chosen as the surrogate function for $-\sqrt{z_j^{T} X^{T} X z_j}$. Substituting this surrogate function into problem (18), we have

$$\min_{U,T} \; \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i + s_j^{2}\, z_j^{T} e - 2 s_j \left(\alpha_j^{(k)}\right)^{T} z_j\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q} \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1. \tag{20}$$

Further, fixing the variable T, the U-update step involves minimizing problem (20) with respect to the variable U, which can be expressed as follows:

$$\min_{U} \; \sum_{i=1}^{n}\sum_{j=1}^{c} a\, u_{ij}^{m}\left(x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij}^{(k)}\right) \quad \text{s.t.}\ \sum_{j=1}^{c} u_{ij} = 1, \tag{21}$$

where $\alpha_{ij}^{(k)}$ denotes the $i$-th entry of $\alpha_j^{(k)}$. $u_{ij}$ is solved using the Lagrange multiplier method, and we have

$$u_{ij} = \left[\sum_{l=1}^{c}\left(\frac{x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij}^{(k)}}{x_i^{T} x_i + s_l^{2} - 2 s_l \alpha_{il}^{(k)}}\right)^{\frac{1}{m-1}}\right]^{-1}. \tag{22}$$

Then, fixing the variable U, the T-update step involves minimizing problem (20) with respect to the variable T, which can be expressed as follows:

$$\min_{T} \; \sum_{i=1}^{n}\sum_{j=1}^{c} b\, t_{ij}^{q}\left(x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij}^{(k)}\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q}. \tag{23}$$

Taking the derivative of Equation (23) with respect to $t_{ij}$ and setting it to zero, we have

$$t_{ij} = \frac{1}{1 + \left(\frac{b\left(x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij}^{(k)}\right)}{\gamma_j}\right)^{\frac{1}{q-1}}}. \tag{24}$$

According to Equation (19), $\alpha_j$ changes with U and T. So, when we calculate U using Equation (22) and T using Equation (24), $\alpha_j$ is updated accordingly, which means that we can calculate $\alpha_j$, U, and T iteratively. This process continues until convergence is achieved or another termination criterion is met.
What needs to be pointed out is that this loop is nested within the outer loop of the proposed method. Therefore, we refer to it as the inner loop of the algorithm, with its termination condition being that the number of inner iterations reaches the maximum value K. It should be noted that this paper selects K = 5 as the reference value for the comparative experiments, as detailed in Section 6.
In summary, a new optimization method, called MMPFCM, is summarized in Algorithm 2.
Algorithm 2 Majorization-minimization method for possibilistic fuzzy c-means clustering (MMPFCM)
1: Input X, c, and K.
2: Initialize U and T.
3: repeat
4: Update $s_j$ by Equation (15) for $j = 1, 2, \cdots, c$.
5: for $k = 1, 2, \cdots, K$ do
6: Update $\alpha_j$ by Equation (19) for $j = 1, 2, \cdots, c$.
7: Update $u_{ij}$ by Equation (22).
8: Update $t_{ij}$ by Equation (24).
9: end for
10: until convergence
11: Output U and T.

As is well known, PFCM is an extension of FCM and PCM. Therefore, MMPFCM can also be interpreted as a generalization of MMFCM and MMPCM, where MMFCM and MMPCM are the two special cases obtained when b = 0 or a = 0 in MMPFCM, respectively. Relevant experiments regarding these two methods are provided in Appendix B.
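Under the reconstruction of Algorithm 2 above, the method can be sketched in NumPy as follows. This is our own illustrative code (not the authors'); gamma is assumed given, the n × n Gram matrix is formed only for clarity, and the convergence test follows the criterion used in Section 6:

```python
import numpy as np

def mmpfcm(X, U, T, a, b, m, q, gamma, K=5, outer_iters=100, tol=1e-5):
    """Sketch of MMPFCM (Algorithm 2): outer s-update, inner MM loop of length K.

    X: (p, n); U, T: (n, c); gamma: (c,). Returns the final U and T.
    """
    G = X.T @ X                                    # Gram matrix X^T X
    xx = np.diag(G).copy()                         # x_i^T x_i for each sample
    prev = np.inf
    for _ in range(outer_iters):
        Z = a * U**m + b * T**q
        eta = (Z * (G @ Z)).sum(axis=0)            # eta_j = z_j^T X^T X z_j
        mu = Z.sum(axis=0)                         # mu_j = z_j^T e
        s = np.sqrt(eta) / mu                      # Step 4: Equation (15)
        for _ in range(K):                         # Steps 5-9: inner MM loop
            Z = a * U**m + b * T**q
            GZ = G @ Z
            eta = (Z * GZ).sum(axis=0)
            alpha = GZ / np.sqrt(eta)              # Step 6: Equation (19)
            D = np.maximum(xx[:, None] + s**2 - 2.0 * s * alpha, 1e-12)
            W = D ** (-1.0 / (m - 1.0))
            U = W / W.sum(axis=1, keepdims=True)   # Step 7: Equation (22)
            T = 1.0 / (1.0 + (b * D / gamma) ** (1.0 / (q - 1.0)))  # Step 8: Eq. (24)
        Z = a * U**m + b * T**q                    # evaluate the objective of (16)
        eta, mu = (Z * (G @ Z)).sum(axis=0), Z.sum(axis=0)
        obj = (Z * xx[:, None]).sum() + (s**2 * mu).sum() \
              - 2.0 * (s * np.sqrt(eta)).sum() \
              + (gamma * ((1.0 - T) ** q).sum(axis=0)).sum()
        if abs(prev - obj) < tol:                  # Step 10: convergence criterion
            break
        prev = obj
    return U, T
```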

An Interesting Observation
Algorithm 2 consists of two nested loops: an inner loop and an outer loop. Through analysis, we have made an interesting observation: when the inner loop is executed only once, Algorithm 2 degenerates to the original PFCM method. For a detailed analysis, refer to Theorem 3.

Theorem 3. When K = 1 in Algorithm 2, Algorithm 2 is equivalent to the original PFCM method.
Proof.If the inner loop is executed only once, then Algorithm 2 contains one loop, and its detailed steps are as follows: calculate s j using Equation (15), calculate α j using Equation (19), calculate u ij using Equation (22), and calculate t ij using Equation (24).
From Equation (19), we can further deduce the value of each entry of the matrix $A = [\alpha_1, \alpha_2, \cdots, \alpha_c]$ as follows:

$$\alpha_{ij}^{(k)} = \frac{x_i^{T} X z_j^{(k)}}{\sqrt{z_j^{(k)T} X^{T} X z_j^{(k)}}} = \frac{\left(z_j^{(k)T} e\right) x_i^{T} v_j^{(k)}}{\sqrt{z_j^{(k)T} X^{T} X z_j^{(k)}}}. \tag{25}$$

Then, substituting Equations (15) and (25) into Equation (22), and noting that $s_j^{2} = v_j^{(k)T} v_j^{(k)}$ and $2 s_j \alpha_{ij}^{(k)} = 2\, x_i^{T} v_j^{(k)}$, so that $x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij}^{(k)} = \left\|x_i - v_j^{(k)}\right\|^{2}$, we have

$$u_{ij}^{(k+1)} = \left[\sum_{l=1}^{c}\left(\frac{\left\|x_i - v_j^{(k)}\right\|^{2}}{\left\|x_i - v_l^{(k)}\right\|^{2}}\right)^{\frac{1}{m-1}}\right]^{-1},$$

which is the same as Equation (2). Similarly, substituting Equations (15) and (25) into Equation (24), we observe that the obtained $t_{ij}^{(k+1)}$ coincides with Equation (3). Therefore, based on the above analysis, when the inner loop is executed only once in Algorithm 2, Algorithm 2 degenerates to the original PFCM method.
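The key identity behind Theorem 3, $x_i^{T} x_i + s_j^{2} - 2 s_j \alpha_{ij} = \|x_i - v_j\|^{2}$ when s and α are freshly computed from Equations (15) and (19), can also be confirmed numerically with a few lines of our own test code:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, c = 4, 20, 3
X = rng.standard_normal((p, n))
Z = rng.random((n, c)) + 0.1               # stands in for a*U^m + b*T^q
G = X.T @ X
GZ = G @ Z
mu = Z.sum(axis=0)                          # z_j^T e
eta = (Z * GZ).sum(axis=0)                  # z_j^T X^T X z_j
s = np.sqrt(eta) / mu                       # Equation (15)
alpha = GZ / np.sqrt(eta)                   # Equation (19)
V = (X @ Z) / mu                            # centers from Equation (4)
D_mm = np.diag(G)[:, None] + s**2 - 2.0 * s * alpha
D_true = ((X[:, :, None] - V[:, None, :])**2).sum(axis=0)
assert np.allclose(D_mm, D_true)            # hence the K = 1 updates reduce to (2)-(3)
print("K = 1 surrogate distances equal the true squared distances")
```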
In addition, Theorem 1 proves that Algorithm 1 is equivalent to the original PFCM method, so when K = 1 in Algorithm 2, Algorithm 2 is also equivalent to Algorithm 1.

Convergence Analysis
If we want to prove the convergence of Algorithm 2, it is essential to first prove the convergence of the inner loop in Algorithm 2. Therefore, let us direct our attention to proving the convergence of the inner loop.

Theorem 4. The inner loop in Algorithm 2 decreases the objective value of problem (18) in each iteration until it converges.
Proof. In problem (20), let

$$h(U, T) = \sum_{j=1}^{c}\left(\sum_{i=1}^{n} z_{ij}\, x_i^{T} x_i + s_j^{2}\, z_j^{T} e - 2 s_j \left(\alpha_j^{(k)}\right)^{T} z_j\right) + \sum_{j=1}^{c}\gamma_j\sum_{i=1}^{n}\left(1 - t_{ij}\right)^{q}$$

denote the surrogate objective, and let $g(U, T)$ denote the objective of problem (18). Let $\bar{U}$ and $\bar{T}$ be the updated U and T in each iteration. Since $(\bar{U}, \bar{T})$ minimizes the surrogate, and since $\left(\alpha_j^{(k)}\right)^{T} z_j^{(k)} = \sqrt{z_j^{(k)T} X^{T} X z_j^{(k)}}$, we have

$$h(\bar{U}, \bar{T}) \le h(U, T) = g(U, T). \tag{26}$$

Furthermore, since $\sqrt{z_j^{T} X^{T} X z_j}$ is convex with respect to $z_j$, convexity supplies the linear minorization

$$\sqrt{\bar{z}_j^{T} X^{T} X \bar{z}_j} \ge \left(\alpha_j^{(k)}\right)^{T} \bar{z}_j.$$

Further, multiplying both sides by $2 s_j$ and summing over j, we have

$$-\sum_{j=1}^{c} 2 s_j \sqrt{\bar{z}_j^{T} X^{T} X \bar{z}_j} \le -\sum_{j=1}^{c} 2 s_j \left(\alpha_j^{(k)}\right)^{T} \bar{z}_j. \tag{27}$$

Adding the corresponding sides of Inequalities (26) and (27) yields

$$g(\bar{U}, \bar{T}) \le h(\bar{U}, \bar{T}) \le g(U, T).$$

Therefore, the inner loop in Algorithm 2 decreases the objective value of problem (18) in each iteration until it converges.
In conclusion, since the inner loop in Algorithm 2 converges, Algorithm 2 also converges.

Complexity Analysis
For the computational complexity of MMPFCM, since the multiplication operation of matrices generally requires more time than simple addition operations, we focus solely on the multiplication operation.
In Algorithm 2, for Step 4, computing s requires $O(npc + pc + nc)$ multiplications. For each pass of the inner loop (Steps 6-8), updating $\alpha$, U, and T requires $O(2npc + 2nc + 2np + pc)$ multiplications; in particular, for Step 8, we need $O(np + nc)$ to calculate T. Therefore, the total computational complexity of MMPFCM is $O(((2npc + 2nc + 2np + pc)t_1 + npc + pc + nc)t_2)$, where $t_1$ and $t_2$ are defined as the numbers of iterations of the inner loop and the outer loop in Algorithm 2, respectively.
As a consequence, MMPFCM has the same linear complexity with respect to the number of samples as PFCM, i.e., $O(npc\, t_1 t_2)$. Next, we verify the effectiveness of the proposed method through experiments.

Experiments
To verify the effectiveness and clustering performance of the proposed algorithm, experimental studies are conducted on twelve real-world datasets, all selected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php, accessed on 26 February 2024). The specifics of these datasets are outlined in Table 1. All the experiments are run on a personal computer with an Intel Core i5-6500 processor and a maximum memory of 16 GB for all processes. The computer runs Windows 10 with MATLAB R2017b. The convergence criterion is set as $|obj(it) - obj(it-1)| < 10^{-5}$, where $obj(it)$ represents the objective function value at the end of the $it$-th iteration. The results are provided in tables and figures to verify the superiority of the proposed method.

Evaluation Metrics
To evaluate the performance of the proposed algorithm, four external metrics, including the overall F-measure for the entire dataset (F*), Normalized Mutual Information (NMI), the Adjusted Rand Index (ARI), and purity, are used to measure the agreement between the ground truth and the clustering results produced by the algorithm [14,34,35]. For all four metrics, higher scores correspond to better clustering quality.
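For reference, NMI and ARI are available in scikit-learn, and purity takes only a few lines; the snippet below (our own helper, with toy labels) shows how hardened memberships can be scored:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def purity(y_true, y_pred):
    """Fraction of samples landing in the majority true class of their cluster."""
    total = 0
    for k in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == k], return_counts=True)
        total += counts.max()
    return total / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])      # toy ground truth
y_pred = np.array([1, 1, 0, 0, 2, 2])      # e.g., U.argmax(axis=1) after clustering
print(normalized_mutual_info_score(y_true, y_pred),
      adjusted_rand_score(y_true, y_pred),
      purity(y_true, y_pred))
```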
Metrics that do not require the labels of the data are used for performance evaluation and are called internal metrics. Two internal validity metrics, the DBI [36] and XB [37], are selected here. It is worth noting that for both metrics, smaller values indicate better clustering performance [5].
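As a concrete reference, the DBI can be computed with sklearn.metrics.davies_bouldin_score on hardened labels, and one common form of the XB index can be computed directly from U and V; the following is a sketch under that (assumed) definition:

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index: fuzzy compactness over minimum center separation.

    X: (p, n) data; U: (n, c) memberships; V: (p, c) centers. Smaller is better.
    """
    D = ((X[:, :, None] - V[:, None, :])**2).sum(axis=0)    # ||x_i - v_j||^2
    compactness = (U**m * D).sum()
    sep = ((V[:, :, None] - V[:, None, :])**2).sum(axis=0)  # ||v_j - v_k||^2
    np.fill_diagonal(sep, np.inf)                           # ignore j == k
    return compactness / (X.shape[1] * sep.min())
```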
The objective function value, time, and number of iterations are the remaining three evaluation metrics used to indicate the efficiency of the algorithms.

Setting of the Iterations in the Inner Loop
Since the number of iterations K of the inner loop in Algorithm 2 must be set before the method can be executed, in this subsection we investigate the impact of different values of K on the clustering performance of the algorithm in order to select an appropriate number of iterations for the subsequent comparative experiments.
To study the effect of K on MMPFCM, we set K to 1, 2, 5, 8, 10, and 30, respectively. MMPFCM is executed 10 times under random initializations, with the clarification that the initializations are the same for different values of K. We record the mean and standard deviation of F*, as well as the purity, across the 10 experiments for each value of K. The results are shown in Figure 1. Although the curves in Figure 1 exhibit minor fluctuations under different values of K, they generally show an upward trend. Additionally, it can be observed that the clustering performance of the algorithm improves significantly when the number of iterations in the inner loop is set within the range of 5 to 10. Therefore, we can choose any number within this range to execute MMPFCM. This new update scheme increases the flexibility of the original model, allowing similar fine-tuning to achieve better performance. It should be noted that, for convenience, this paper selects K = 5 as the reference value for all subsequent comparative experiments.

Comparison between PFCM and MMPFCM
Nie et al. [38] mentioned that a bad local minimum makes the objective value not small enough, which limits the algorithm's performance. Based on this observation, the first set of experiments is carried out to evaluate the performance of MMPFCM, focusing on the objective $J_{\text{PFCM}}(U, T, V)$ in Equation (1). To compare the performance of MMPFCM and PFCM more intuitively, we run both algorithms 10 times under the same random initializations and record their objective function values. Then, we calculate the difference in the objective function values, defined as $J_{\text{MMPFCM}} - J_{\text{PFCM}}$. Figure 2 presents a box plot of these differences across twelve real-world datasets. The green plus sign on each box plot indicates the mean of the differences. The red plus signs indicate outliers. The red horizontal line represents the median. The length of the box represents the interquartile range (IQR), i.e., the range from the first quartile (Q1) to the third quartile (Q3), covering the middle 50% of the data distribution. The whiskers extend from the box to the highest and lowest values, excluding outliers; typically, they extend up to 1.5 times the IQR from the edges of the box. As shown in Figure 2, under the same optimization objective and the same random initialization conditions, the green plus signs are all below the horizontal zero line, indicating that the mean difference in objective function values between MMPFCM and PFCM is negative for these datasets. In other words, the average objective function value of MMPFCM is lower than that of PFCM. The red horizontal lines (medians) are also below the zero line for most datasets, further demonstrating that the median objective function value of MMPFCM is lower than that of PFCM for most datasets, indicating better performance. Additionally, except for the COIL20 dataset, the boxes for the other datasets lie below the zero line. Since the difference is defined as $J_{\text{MMPFCM}} - J_{\text{PFCM}}$, the boxes being mostly below zero indicates that the objective function value of MMPFCM is generally lower than that of PFCM. From this analysis, we can conclude that the proposed method achieves better local optimal solutions under the same initialization conditions.
The second set of experiments involves comparing the clustering performance of these two methods. Given that both the fuzzy membership U and the possibilistic membership T in PFCM can reflect the degree to which a point belongs to a particular cluster, we record the DBI and XB calculated from U and from T for PFCM and MMPFCM in this set of experiments. The results are listed in Tables 2 and 3, respectively. From Tables 2 and 3, it is evident that, regardless of whether the results are computed from U or T, the DBI and XB of MMPFCM are either less than or equal to those of PFCM, indicating that the proposed method exhibits better clustering performance on these twelve datasets. Moreover, the smaller standard deviations of MMPFCM also demonstrate its stability under the same initialization conditions.
The convergence curves of PFCM and MMPFCM on twelve real-world datasets are shown in Figure 3. The figures illustrate that both algorithms exhibit monotonically decreasing objective values over time. However, MMPFCM obtains better local minima on the SCADI, COIL20, ORL, Yale64, Isolet5, and Urban datasets. Furthermore, the time taken by both methods is on the same linear scale due to their identical linear complexities.
The third set of experiments compares MMPFCM with several state-of-the-art methods, namely IRWFCM, IRWERFCM, EPFCMR, and FW-S-PFCM. IRWFCM and IRWERFCM are chosen because they utilize a novel optimization technique, the iteratively re-weighted method, to solve FCM-type problems, and their advantage lies in achieving better local minima with fewer iterations. EPFCMR is a generalized entropy-based PFCM, which utilizes functions of the distance, rather than the distance itself, to decrease the contribution of noise to the cluster centers. FW-S-PFCM introduces a feature-weighted method and a "suppressed competitive learning" strategy into the PFCM model, resulting in improved clustering performance; additionally, it reduces the number of iterations, the sensitivity to membership weights, and the sensitivity to initializations of PFCM.
To facilitate a comprehensive comparison between MMPFCM and the other algorithms, four external metrics on twelve real-world datasets are presented in Table 4. The values are averaged over 10 trials with random initializations. The standard deviations are given after the means, and the best results are shown in bold. Note that all experimental results are calculated from the obtained U for the different methods. In addition, the corresponding running times of the PFCM-type algorithms are illustrated in Figure 4. From the experimental results in Table 4 and Figure 4, the following conclusions can be drawn:
1. Comparing the fourth and last columns for each dataset, MMPFCM consistently outperforms PFCM across all four clustering evaluation metrics on ten datasets. In addition, MMPFCM outperforms PFCM in terms of the ARI and F* on the ORL and USPS datasets, and in terms of purity on the ORL dataset. These results indicate the superiority of the proposed method under the same initialization conditions.
2. PFCM-type clustering algorithms achieve better clustering results than FCM-type clustering algorithms on the SCADI, balance, Yale32, Yale64, Iris, and USPS datasets. This is because PFCM-type clustering algorithms are better equipped to handle data with noise and outliers.
3. The total running time of FW-S-PFCM is the lowest on these twelve datasets, but this is achieved under the condition of tuning more hyperparameters. The time taken by MMPFCM and PFCM is at the same linear level, which confirms that their time complexities have the same linear relationship.

Conclusions
This article revisits PFCM using the MM method. By eliminating the variable V, we obtain a simplified model with fewer variables, and we then provide a solution using the MM method by proving the convexity of the proportional term in this model. Through analysis, the new derivation method is shown to be equivalent to the original PFCM. In addition, we introduce a new intermediate variable s to transform the simplified model with a proportional term into an easily solvable equivalent form. Then, we design an iterative sub-problem using the MM method. For convenience, we refer to this method as MMPFCM. The complexity analysis indicates that MMPFCM and PFCM share the same computational complexity. However, MMPFCM uses the intermediate variable s of size c × 1 instead of the variable V of size p × c to update U and T, resulting in smaller space complexity. Extensive experiments have shown that MMPFCM converges to a better local minimum compared to PFCM. In addition, this new updating approach enhances the flexibility of the original model, allowing the number of iterations in the inner loop to be fine-tuned for better performance.
In future work, we will try to apply this strategy to other PFCM-type clustering algorithms for the purpose of obtaining a better local minimum.In addition, since this method has the same computational complexity as the original PFCM, we are also committed to researching an accelerated version of MMPFCM to reduce its running time.
Appendix B

The first part of the experiments involves comparing PCM and MMPCM. To assess the performance of both PCM and MMPCM in finding local minima, we measure the objective function values of both algorithms under the same random initializations. A smaller value indicates that the corresponding method achieves a better local minimum. We run each method 10 times and record the Mean_obj and Std_obj, as shown in Table A1. Note that we use the terminal outputs of FCM as the initialization of PCM and MMPCM here. In Table A1, it can be observed that the objective values of MMPCM on the seven real-world datasets are all less than or equal to those of PCM, which leads to the conclusion that MMPCM achieves a better local minimum compared to PCM.
Then, we compare their clustering performance under the same random initializations. Table A2 lists the DBI on the real-world datasets for PCM and MMPCM. As shown in Table A2, the clustering results of MMPCM in terms of the DBI are either superior or equal to those of PCM, validating that MMPCM has better clustering performance. The second part of the experiments involves comparing FCM and MMFCM. To assess the performance of both FCM and MMFCM in finding local minima, we measure the objective function values of both algorithms under the same random initializations. The experimental results are shown in Table A3. As shown in Table A3, the objective values of MMFCM on the eight real-world datasets are all less than or equal to those of FCM. Therefore, we can conclude that MMFCM converges to a better local optimum compared to FCM. Moreover, the smaller standard deviations of MMFCM also demonstrate its stability under the same initialization conditions.
Table A4 lists the DBI and XB for FCM and MMFCM on the real-world datasets. As illustrated in the table, the clustering results of MMFCM in terms of the DBI and XB are all superior to those of FCM, validating that MMFCM has better clustering performance compared to FCM.


Figure 1. Mean and standard deviation of F*, as well as purity, for different values of K on five real-world datasets, where K is the number of iterations in the inner loop. (a) Mean and standard deviation of F*. (b) Mean and standard deviation of purity.

Figure 2. Box plot of the differences in objective function values across twelve real-world datasets under the same initialization conditions.

Figure 3. Convergence curves of PFCM and MMPFCM on twelve real-world datasets, where the two methods share the same initialization.

Figure 4. Plot of the corresponding running times of the different algorithms on twelve real-world datasets.

Table 2. DBI and XB calculated by U for PFCM and MMPFCM on real-world datasets. The values are averaged over 10 trials with random initializations. The standard deviations are given below the means, and the best results are shown in bold.

Table 3. DBI and XB calculated by T for PFCM and MMPFCM on real-world datasets. The values are averaged over 10 trials with random initializations. The standard deviations are given below the means, and the best results are shown in bold.

Table 4. Experimental results of different algorithms on real-world datasets. The values are averaged over 10 trials with random initializations. The standard deviations are given after the means, and the best results are shown in bold.

Table A2. DBI for PCM and MMPCM on real-world datasets. The values are averaged over 10 trials with random initializations. The standard deviations are given after the means, and the best results are shown in bold.

Table A4. DBI and XB for FCM and MMFCM on real-world datasets. The values are averaged over 10 trials with random initializations. The standard deviations are given after the means, and the best results are shown in bold.