A Clustering Algorithm based on Feature Weighting Fuzzy Compactness and Separation

To improve the well-known fuzzy compactness and separation (FCS) algorithm, this paper proposes a new clustering algorithm based on feature weighting fuzzy compactness and separation (WFCS). To account for the contribution of individual features to clustering, the proposed algorithm introduces feature weighting into the objective function. We first formulate the membership and feature weighting, analyze the membership of data points falling on the crisp boundary, and then give the adjustment strategy. The proposed WFCS is validated on both simulated and real datasets. The experimental results demonstrate that the proposed WFCS has the characteristics of both hard clustering and fuzzy clustering, and outperforms many existing clustering algorithms with respect to three metrics: the Rand Index, the Xie-Beni Index and the Within-Between (WB) Index.


Introduction
Clustering groups similar data points into the same cluster and dissimilar data points into different clusters [1][2][3]. The fuzzy C-means (FCM) algorithm is a classical pattern recognition method [4], and many FCM-type clustering algorithms have been proposed [5,6]. However, these clustering techniques ignore the between-cluster separation because they partition data points only by minimizing the distances between data points and cluster centers (i.e., the within-cluster compactness). Therefore, Wu et al. proposed the fuzzy compactness and separation (FCS) algorithm [7]. The FCS algorithm assigns a crisp boundary to each cluster so that hard memberships and fuzzy memberships can coexist in the clustering results.
For clustering high-dimensional datasets, the features are assigned weights that reflect their degree of importance. A major problem of un-weighted clustering algorithms is that they treat all features equally in the clustering process. Therefore, many contributions attempt to weight features with various methods and to optimize the FCM-type algorithms [8][9][10][11][12][13]. Frigui and Nasraoui [8] proposed the simultaneous clustering and attribute discrimination algorithm, in which clustering and feature weighting are performed simultaneously in an unsupervised manner; Wang et al. [9] showed that the weight assignment can be learned by the gradient descent technique; Jing et al. proposed EWkmeans [10], which minimizes the within-cluster compactness and maximizes the negative weight entropy to stimulate more features to contribute to the identification of a cluster; Wang et al. [11] presented a new fuzzy C-means algorithm with variable weighting (WFCM) for high-dimensional data analysis; Wang et al. [12] put forward a feature weighting fuzzy clustering algorithm integrating rough sets and shadowed sets (WSRFCM); Deng et al. [13] introduced the between-cluster separation into EWkmeans and proposed the enhanced soft subspace clustering (ESSC) algorithm. WFCM and WSRFCM employ only the within-cluster compactness while updating the membership matrix and feature weights. ESSC uses a parameter to balance the within-cluster compactness and the between-cluster separation. However, negative values may be produced in the membership matrix if the balancing parameter is too large. To avoid negative membership values, the balancing parameter could be set to zero; in this case, however, ESSC would degrade to EWkmeans.
In the real world, some data points belong to a cluster strictly (i.e., hard clustering) and others belong to a cluster ambiguously (i.e., fuzzy clustering). To maximize the between-cluster separation and minimize the within-cluster compactness, we propose a new feature weighting fuzzy compactness and separation (WFCS) algorithm that fuses hard clustering and fuzzy clustering. The rest of this paper is organized as follows. Section 2 introduces both the FCS and the WFCS algorithms, addresses the flaw of FCS and discusses the adjustment of the membership and feature weighting of WFCS. The proposed algorithm is evaluated in Section 3. Finally, Section 4 concludes the paper and discusses future work.
Table 1 illustrates the main symbols that appear in the following formulas.

The FCS and WFCS Algorithms
In this section, the FCS algorithm is reviewed and data points on the crisp boundary are discussed. We then present the WFCS algorithm, derive the formulas for the membership and the feature weight, and give the adjustment strategy for these formulas.
 


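FCS Algorithm

Following [7], the fuzzy within-cluster compactness $S_{FW}$ and the fuzzy between-cluster separation $S_{FB}$ are defined as:

$$S_{FW} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} (x_j - a_i)(x_j - a_i)^{T} \qquad (1)$$

$$S_{FB} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} (a_i - \bar{x})(a_i - \bar{x})^{T} \qquad (2)$$

where $\bar{x}$ denotes the mean of all data points. The objective function is formulated as:

$$J_{FCS} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \eta_i \mu_{ij}^{m} \|a_i - \bar{x}\|^2 \qquad (3)$$

Under this objective, each cluster has a crisp kernel: a data point $x_j$ with $\|x_j - a_i\|^2 \le \eta_i \|a_i - \bar{x}\|^2$ lies inside the kernel of the $i$-th cluster, and the points for which equality holds form the crisp boundary (shown in Figure 1). The parameter $\eta_i$ controls the influence of the between-cluster separation, guarantees that no two crisp kernels will overlap [7], and is given by:

$$\eta_i = \frac{\beta}{4} \cdot \frac{\min_{i' \neq i} \|a_i - a_{i'}\|^2}{\max_{k} \|a_k - \bar{x}\|^2} \qquad (4)$$

where $0 \le \beta \le 1$.

By minimizing $J_{FCS}$, we have:

$$a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} (x_j - \eta_i \bar{x})}{(1 - \eta_i) \sum_{j=1}^{n} \mu_{ij}^{m}} \qquad (5)$$

$$\mu_{ij} = \frac{\left(\|x_j - a_i\|^2 - \eta_i \|a_i - \bar{x}\|^2\right)^{-1/(m-1)}}{\sum_{k=1}^{c} \left(\|x_j - a_k\|^2 - \eta_k \|a_k - \bar{x}\|^2\right)^{-1/(m-1)}} \qquad (6)$$

According to Equations (5) and (6), dataset $X$ can be partitioned into $c$ clusters by iteratively updating the cluster centers and membership values. A data point in the $i$-th crisp kernel belongs strictly to the $i$-th cluster, which is called hard clustering. However, if a data point falls on the crisp boundary (see Figure 1), the membership value $\mu_{ij}$ given by Equation (6) becomes infinite, and the FCS algorithm fails.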
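The alternating updates of Equations (5) and (6) can be sketched in Python. This is a minimal sketch that assumes the standard FCS formulation of [7], restated in the code comments. The function name `fcs_fit`, its arguments and the stopping rule on center movement are hypothetical choices made for illustration. Assigning membership 1 to a point that lies exactly on a crisp boundary is likewise an assumption, made here only to sidestep the infinite membership value.

```python
import numpy as np


def fcs_fit(X, A0, m=2.0, beta=0.5, max_iter=100, tol=1e-6):
    """Illustrative FCS loop. X: (n, s) data; A0: (c, s) initial centers, c >= 2."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A0, dtype=float).copy()
    xbar = X.mean(axis=0)  # grand mean of the data
    for _ in range(max_iter):
        # eta_i = (beta / 4) * min_{i' != i} ||a_i - a_i'||^2 / max_k ||a_k - xbar||^2
        cd2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(cd2, np.inf)
        sep = ((A - xbar) ** 2).sum(axis=-1)
        eta = beta / 4.0 * cd2.min(axis=1) / sep.max()
        # delta_ij = ||x_j - a_i||^2 - eta_i * ||a_i - xbar||^2 (<= 0 in or on kernel i)
        d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        delta = d2 - (eta * sep)[:, None]
        hard = (delta <= 0).any(axis=0)
        U = np.zeros_like(delta)
        # Assumption: points in or on a crisp kernel get membership 1 for that cluster.
        U[delta[:, hard].argmin(axis=0), np.flatnonzero(hard)] = 1.0
        # Remaining points: fuzzy memberships proportional to delta^(-1/(m-1)).
        w = delta[:, ~hard] ** (-1.0 / (m - 1.0))
        U[:, ~hard] = w / w.sum(axis=0)
        # a_i = sum_j u_ij^m (x_j - eta_i * xbar) / ((1 - eta_i) * sum_j u_ij^m)
        mass = (U ** m).sum(axis=1)
        A_new = ((U ** m) @ X - (eta * mass)[:, None] * xbar) / ((1.0 - eta) * mass)[:, None]
        shift = np.abs(A_new - A).max()
        A = A_new
        if shift < tol:
            break
    return U, A, eta
```

The sketch keeps `beta` below 1 by default so that the factor `1 - eta` in the center update stays positive. The returned memberships are the ones computed from the centers of the last completed iteration.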

The Principle of WFCS
To cluster data more reasonably, we introduce feature weights into FCS. First, we define the feature weighting fuzzy within-cluster matrix $S_{WFW}$ and between-cluster matrix $S_{WFB}$ (Equations (7) and (8)), and we extend the formula of $\eta_i$ accordingly (Equation (9)). Based on Equations (7) and (8), the objective function is obtained, so that WFCS can be formulated as a constrained optimization problem (Equation (11)). Equation (11) can be solved via the Lagrange multiplier method: setting the partial derivatives of the Lagrangian $L$ equal to zero yields the update formulas for the cluster centers $a_i$ (Equation (13)), the feature weights (Equation (14)) and the memberships $\mu_{ij}$ (Equation (15)). According to Equations (13)-(15), dataset $X$ can be partitioned into $c$ clusters by iteratively updating the cluster centers, the feature weights and the memberships.
Note that the objective functions of both WFCS and ESSC include the within-cluster compactness and the between-cluster separation. In ESSC, however, the parameter $\eta$ is assigned a value at the beginning of the iteration procedure and then remains fixed. Furthermore, if $\eta = 0$, ESSC degrades to the entropy weighting clustering algorithm, which uses no between-cluster information.
In the proposed WFCS, by contrast, $\eta_i$ is calculated automatically from the between-cluster information and will not be zero as long as the parameter $\beta$ is nonzero.

The Adjustment Strategies
(1) Adjustment of the feature weight
Equation (14) can be rewritten in terms of an auxiliary quantity defined for each feature $k$, namely the grand fuzzy distance between data points and crisp kernels on the $k$-th feature. If this quantity is zero, the $k$-th feature has exactly the same effect on all clusters, so the weight of the $k$-th feature should be zero.
When the distribution of data points is balanced, this quantity is non-negative, and so is the feature weight. Conversely, when the distribution of data points is imbalanced, the quantity is negative and the feature weight could be negative. Consequently, an adjustment is needed, and a projection function is introduced (Equation (18)), where $t = 1, \ldots, s$ and $p = 1, \ldots, s$. After the adjustment, the feature weight is given by Equation (14).
(2) Adjustment of the membership $\mu_{ij}$
Equation (15) can likewise be rewritten in terms of an auxiliary quantity defined for each pair of cluster $i$ and data point $x_j$. If $x_j$ falls on the $i$-th crisp boundary, this quantity is zero, and the membership value of $x_j$ is therefore infinite.
In fact, the membership value of $x_j$ is fuzzier than that of a data point inside the crisp kernel.
Furthermore, the membership value of $x_j$ is greater than that of a data point lying outside the crisp kernel.
Based on the discussion above, we define the projection function as Equation (21). After the adjustment, $\mu_{ij}$ is given by Equation (15).

The Implementation of WFCS
Step 1. Choose the parameters, including $m$ and $\beta$, and the iterative error threshold. Assign a random membership partition matrix $[\mu_{ij}]$ and random initial values between 0 and 1. Set the initial iteration counter to $l = 1$.
Step 2. Update $a_i^{(l)}$ with $\mu_{ij}^{(l-1)}$ and $\eta_i^{(l-1)}$ according to Equation (13).
Step 3. Update the feature weights at iteration $l$ with $\mu_{ij}^{(l-1)}$, $a_i^{(l-1)}$ and $\eta_i^{(l-1)}$ based on Equations (14) and (18).
Step 4. Update $\mu_{ij}^{(l)}$ with the feature weights from iteration $l-1$, $a_i^{(l-1)}$ and $\eta_i^{(l-1)}$ according to Equations (15) and (21).
Step 5. Compute $\eta_i^{(l)}$ with $\beta$ and $a_i^{(l)}$ according to Equation (9).
Step 6. Set $l = l + 1$ and return to Step 2 until convergence has been reached.
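The control flow of Steps 2 to 6 can be sketched as a small driver loop. This is only a skeleton under stated assumptions: the function name `run_alternating_updates`, the dictionary keys and the four callables are hypothetical placeholders standing in for Equations (13), (14)/(18), (15)/(21) and (9), which the sketch does not implement. The stopping rule (largest change in any membership below the threshold) is also an assumption, since the steps only say "until convergence has been reached". For simplicity each update receives the most recent values, which is a simplification of the iteration superscripts in the steps.

```python
import numpy as np


def run_alternating_updates(state, update_centers, update_weights,
                            update_memberships, update_eta,
                            tol=1e-5, max_iter=100):
    """Driver for Steps 2-6.

    `state` holds the Step 1 initial values under the keys "U" (memberships),
    "w" (feature weights) and "eta". Each callable takes the state dictionary
    and returns the new value of one quantity.
    """
    state = dict(state)
    for iteration in range(1, max_iter + 1):
        previous_U = state["U"]
        state["A"] = update_centers(state)        # Step 2
        state["w"] = update_weights(state)        # Step 3
        state["U"] = update_memberships(state)    # Step 4
        state["eta"] = update_eta(state)          # Step 5
        # Step 6 (assumed stopping rule): largest membership change below tol.
        if np.abs(state["U"] - previous_U).max() < tol:
            break
    return state, iteration
```

Passing the updates in as callables keeps the loop independent of the exact update formulas, so the same driver can be reused to compare variants of the update rules.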

Performance Evaluation and Analysis
In this section, the proposed WFCS algorithm is evaluated through a large number of experiments on simulated and real datasets. The real datasets include eight UCI benchmark datasets [14] and a CFM56-type engine dataset with measurement noise (named ENGINE), collected from an airline; the ENGINE data can be provided by sending an email to the corresponding author. The simulated dataset is aero-engine gas path data with Gaussian noise (named LTT), obtained from simulation software developed by the Laboratory of Thermal Turbomachines at the National Technical University of Athens (downloaded from http://www.ltt.mech.ntua.gr/index.php/softwaremn/teachesmn). The ENGINE and LTT datasets describe the aero-engine's states, including healthy states and degraded states. In these experiments, all datasets are normalized into (0, 1) [13].
First, the dataset information, validation criteria and parameter settings are described. Then, the properties of WFCS are investigated based on the experimental results on the Iris dataset. Finally, a detailed comparison is made with three other feature weighting fuzzy clustering algorithms (ESSC, WFCM, WSRFCM) and one un-weighted fuzzy clustering algorithm (FCS).
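The normalization step can be illustrated with a short Python sketch. It assumes per-feature min-max scaling, which is one common way to map data into the unit range; the text does not state which scheme is used, and min-max scaling reaches the end points 0 and 1 rather than staying strictly inside the open interval. The function name `min_max_normalize` is hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature (column) of X linearly so its values span [0, 1]."""
    X = np.asarray(X, dtype=float)
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0  # constant features are mapped to 0 instead of dividing by zero
    return (X - low) / span
```

Scaling each feature separately keeps features with large numeric ranges from dominating the distance computations used by the clustering algorithms.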

Datasets Information, Validation Criteria and Experimental Setting
The information on the 10 datasets is summarized in Table 2. The Rand index (RI) [15], the Xie-Beni index (XB) [16] and the Within-Between index (WB) [17] are used to evaluate the performance of the proposed WFCS algorithm; the WB index is a recently proposed one. The RI evaluates the accuracy of a partition: the higher its value, the higher the accuracy. The XB and WB indexes evaluate the within-cluster compactness and the between-cluster separation: the smaller their values, the better the clustering result.
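Two of these criteria can be sketched in Python for concreteness. The sketch assumes the usual pair-counting definition of the Rand index and the common form of the Xie-Beni index (fuzzy within-cluster scatter divided by the number of points times the smallest squared distance between two centers). The exact variants used in [15] and [16] may differ, and the function names and the exponent argument `m` are hypothetical. The WB index is not shown.

```python
from itertools import combinations

import numpy as np


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def xie_beni(X, U, A, m=2.0):
    """Xie-Beni index for data X (n, s), memberships U (c, n), centers A (c, s)."""
    X, U, A = (np.asarray(v, dtype=float) for v in (X, U, A))
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # (c, n) squared distances
    compactness = ((U ** m) * d2).sum()
    cd2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)  # center-to-center
    np.fill_diagonal(cd2, np.inf)                              # ignore self-distances
    return compactness / (X.shape[0] * cd2.min())
```

The Rand index compares a partition with known class labels and is invariant to how the clusters are numbered, whereas the Xie-Beni index needs only the data, the memberships and the centers.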
The parameter setting is:

 
The parameter values used in the experiments are tabulated in Table 3; they were selected based on the best clustering results in terms of the means and standard deviations of the RI. Each algorithm was run 10 times. All experiments were implemented on a computer with a 2.5 GHz CPU and 8 GB of RAM.

Property Analysis of WFCS
Figure 2 shows the original distribution of the Iris dataset and the clustering results of the five algorithms. As shown in Figure 2a, the Iris dataset contains three clusters of 50 data points each, where each cluster refers to a type of iris plant. Cluster1 is linearly separable from the other two, while the latter two overlap. Hence, it is more reasonable for data points in Cluster1 to be hard clustered than to be fuzzy clustered.
(1) Clustering performance
Figure 2 shows that the clustering results of the feature weighted clustering algorithms (WFCS, ESSC, WFCM and WSRFCM) are similar to the distribution of the original data (shown in Figure 2a). Data points in Cluster1 are recognized very well by all five algorithms. Moreover, most data points in Cluster2 and Cluster3 are recognized by the four feature weighted algorithms. In Figure 2f, some data points in Cluster3 are clearly misclassified into Cluster2 by FCS.
The cluster centers obtained by the five algorithms differ from each other. The distances between the centers of Cluster1, Cluster2 and Cluster3 obtained by the five algorithms are shown in Figure 3.
For WFCS, ESSC and FCS, which integrate the within-cluster compactness and the between-cluster separation, the distances between the centers of the overlapping Cluster2 and Cluster3 are larger than those of WSRFCM and WFCM. However, although FCS obtains the largest distance, it cannot correctly partition the data points belonging to Cluster2 or Cluster3 because it does not include feature weights.
Since different clustering algorithms have different objective functions, we introduce an iteration function in order to evaluate the convergence of the algorithms.
Figure 4 shows the convergence curves of the five algorithms. The five curves descend quickly in the first two iterations and vary slowly after three iterations. A smaller number of iterations means a higher convergence speed. Overall, WFCS shows a higher convergence speed. The convergence speed of WFCM is lower than that of WFCS and ESSC, while FCS has the lowest convergence speed.
(2) Hard clustering
Figure 5 shows the fuzzy membership values for Cluster1 of the 150 data points in WFCS when $\beta$ is 1, 0.5, 0.05 and 0.005, respectively. A membership value of 1 means that the data point is hard clustered into Cluster1; a membership value of 0 means that it is hard clustered into one of the other two clusters.
In Figure 5a-c, there are 50, 31 and 12 data points hard clustered into Cluster1, respectively. In Figure 5d, all membership values are smaller than 1, so all data points are fuzzy clustered into Cluster1. As seen in Figure 5, the memberships become fuzzier as $\beta$ decreases. Hence, WFCS has the characteristics of both hard clustering and fuzzy clustering.

Clustering Evaluation
The best RI values of the five algorithms are presented in Table 4. Table 4 shows that WFCS achieves the best performance on all datasets except Breast-cancer, Vehicle and ENGINE. The performance of WFCM and WSRFCM is mostly comparable to or better than that of ESSC and FCS. Although FCS is not a feature weighted clustering algorithm, it achieves the best clustering result on the Wine dataset. Tables 5 and 6 list the XB and WB index values of the five algorithms, respectively. Comparing Tables 4-6, we find that the best clustering performance as indicated by the RI does not always coincide with the smallest XB or WB value. Therefore, no single algorithm is always superior to the others on all datasets.
The average performances of the five algorithms are shown in Figure 6.

Figure 1. Illustration of the crisp kernel.

Figure 2. (a) The original data distribution; (b) The clustering results of the weighting fuzzy compactness and separation algorithm (WFCS); (c) The clustering results of the enhanced soft subspace clustering algorithm (ESSC); (d) The clustering results of the feature weighting fuzzy clustering algorithm integrating rough sets and shadowed sets (WSRFCM); (e) The clustering results of the feature weighting fuzzy c-means algorithm (WFCM); (f) The clustering results of the fuzzy compactness and separation algorithm (FCS).

Figure 3. The distance between three cluster centers.

Figure 4. Convergence of the five algorithms.