An Ensemble of Locally Reliable Cluster Solutions

Abstract: Clustering ensemble refers to an approach in which a number of (usually weak) base clusterings are generated and their consensus clustering is used as the final clustering. Although it is tempting to conclude, by analogy with democratic versus dictatorial decisions, that ensemble (here, clustering ensemble) decisions are better than single-model (here, single-clustering) decisions, it is not guaranteed that every ensemble outperforms a single model. An ensemble tends to be better when its members are valid or of high quality, and when they participate in constructing the consensus clustering according to their qualities. In this paper, we propose a clustering ensemble framework that uses a simple clustering algorithm based on the kmedoids clustering algorithm. Our simple clustering algorithm guarantees that the discovered clusters are valid. Moreover, our clustering ensemble framework is guaranteed to use a mechanism that exploits each discovered cluster according to its quality. To implement this mechanism, an auxiliary ensemble named the reference set is created by running several kmeans clustering algorithms.


Introduction
Clustering, as a task in statistics, pattern recognition, data mining, and machine learning, is considered to be very important [1][2][3][4][5]. Its purpose is to assign a set of data points to several groups: data points placed in the same group must be very similar to each other, while being very different from the data points located in other groups. In other words, the purpose of clustering is to group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters) [6]. It is often assumed in the definition of clustering that each data object must belong to a minimum of one cluster (i.e., all of the data must be clustered rather than only part of it) and a maximum of one cluster (i.e., clusters must be non-overlapping). Each group is known as a cluster, and the whole process of finding a set of clusters is known as the clustering process. All clusters together are called a clustering result or, for short, a clustering. A clustering algorithm is defined as an algorithm that takes a set of data objects and returns a clustering. Various categorizations have been proposed for clustering algorithms, including hierarchical approaches, flat approaches, density-based approaches, network-based approaches, partition-based approaches, and graph-based approaches [7].
Consensus-based learning is one of the most important research topics in data mining, pattern recognition, machine learning, and artificial intelligence. In this approach, several simple (often weak) learners are trained to solve a single problem. Instead of learning the data directly with a strong learner (which is usually slow), a set of weak learners (which are usually fast) is trained, and their results are combined with an agreement-function mechanism (such as voting) [8]. In supervised learning, the evaluation of each simple learner is straightforward because labels exist for the data objects. This is not true in unsupervised learning, and consequently, without the use of side information, it is very difficult to assess the weaknesses and strengths of a clustering result (or algorithm) on a dataset. Several ensemble clustering methods now exist to improve the strength and quality of the clustering task. Each clustering in the ensemble is considered to be a base learner.
This study tries to solve all of the aforementioned sub-problems by defining valid local clusters. In fact, this study calls the data around a cluster center in a kmedoids clustering a valid local cluster. In order to generate diverse clusterings, a repeated strategy of producing weak clustering results (i.e., using the kmedoids clustering algorithm as the base clustering algorithm) is applied to the data that has not yet appeared in previously found valid local clusters. Then, an inter-cluster similarity criterion is used to measure the similarity between the valid local clusters. In the next step, the proposed algorithm forms a weighted graph whose vertices are the valid local clusters; the weight of an edge in this graph is the degree of similarity between the two valid local clusters sharing the edge. A minimum graph cut is then applied to partition this graph into a predetermined number of final clusters. In the final step, the consensus clusters are extracted from this partition so that the average reliability of, and agreement among, the final clusters are maximized. It should be noted that any other base clustering algorithm can also be used (for example, the fuzzy c-means algorithm (FCM)). Likewise, other conventional consensus function methods can be used as the agreement function for combining the base clustering results.
The second section of this study reviews the related literature. The proposed method is presented in the third section. In Section 4, experimental results are presented. In the final section, conclusions and future works are discussed.

Related Works
There are two very important problems in ensemble clustering: (1) how to create an ensemble of valid and diverse base clustering results; and (2) how to produce the best consensus clustering result from an available ensemble. Although each of these two problems has a significant impact on the other, they are widely known and studied as two completely independent problems. That is why research papers typically address only one of the two, and it is rarely seen that both issues are considered together.
The first problem, which is called ensemble generation, tries to generate a set of valid and diverse base clustering results. This has been done through a variety of methods. For example, such a set can be generated by applying an unstable base clustering algorithm to a given data set with changes in the parameters of the algorithm [9][10][11]. It can also be generated by applying different base clustering algorithms to a given data set [12][13][14]. Another way to create a set of valid and diverse base clustering results is to apply a base clustering algorithm to various mappings of the given dataset [15][16][17][18][19][20][21][22][23]. Finally, such a set can be created by applying a base clustering algorithm to various subsets (which can be generated with or without replacement) of the given data set [24].
Many solutions have been proposed in order to solve the second problem. The first is an approach based on the co-occurrence matrix: for each pair of data objects, the number of base clusterings in which the pair appears in the same cluster is stored in a matrix called the co-occurrence matrix. Then, the final consensus clusters are obtained by treating this matrix as a similarity matrix and applying a clustering method (usually a hierarchical clustering method).
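As a minimal sketch of this co-occurrence idea (the function names here are our own, not from the cited works, and the final step uses a single-link cut at a fixed threshold as a simple stand-in for a full hierarchical clustering):

```python
import numpy as np

def co_occurrence_matrix(labelings):
    """For each pair of objects, the fraction of base clusterings
    in which they appear in the same cluster."""
    labelings = [np.asarray(l) for l in labelings]
    n = len(labelings[0])
    co = np.zeros((n, n))
    for labels in labelings:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(labelings)

def single_link_cut(co, threshold):
    """Single-link consensus: objects whose co-occurrence is at least
    `threshold` end up in the same final cluster (connected components)."""
    n = len(co)
    labels = np.arange(n)
    adj = co >= threshold
    # Propagate the minimum label across connected vertices to a fixed point.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if adj[i, j] and labels[i] != labels[j]:
                    labels[i] = labels[j] = min(labels[i], labels[j])
                    changed = True
    return labels

# Three toy base clusterings of six objects.
ensemble = [[0, 0, 0, 1, 1, 1],
            [0, 0, 1, 1, 2, 2],
            [1, 1, 1, 0, 0, 0]]
co = co_occurrence_matrix(ensemble)
consensus = single_link_cut(co, threshold=0.5)
```

Here the first three objects co-occur often enough to be merged into one consensus cluster, and the last three into another.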
This approach is known as the most traditional method [25][26][27][28]. Another approach is based on graph partitioning. In this approach, the problem of finding a consensus clustering is first transformed into a graph partitioning problem; then, the final clusters are obtained using graph partitioning or graph cutting algorithms [29][30][31][32]. Four well-known graph-based ensemble clustering algorithms are CSPA, HGPA, MCLA, and HBGF.
Another approach is the voting approach [16,17,[33][34][35]. For this purpose, a re-labeling must first be done; re-labeling aligns the labels of the various clusterings so that they match. Other important approaches include [36][37][38][39][40][41]: (1) an approach that treats the primary clusters as an intermediate space (or new data set) and partitions this new space using a basic clustering algorithm such as the expectation maximization algorithm [37]; (2) an approach that uses evolutionary algorithms to find the most consistent clustering as the consensus clustering [36]; and (3) an approach that uses the kmodes clustering algorithm to find an agreement clustering [40,41] (note that the kmodes clustering algorithm is the equivalent of the kmeans clustering algorithm for categorical data).
Furthermore, an innovative clustering ensemble framework based on the idea of cluster weighting was proposed in [42]. Using a certainty criterion, the reliability of clusters is first computed; then, the clusters with the highest reliability values are chosen to make the final ensemble. Bagherinia et al. introduced an original fuzzy clustering ensemble framework in which the effect of the diversity and quality of base clusterings was studied [43]. Following Alizadeh et al. [44,45], a newer study claims that edited NMI (ENMI), which is computed over a subset of the primary clusters, performs better than NMI for cluster evaluation [46].
Moreover, multiple clusterings have been aggregated with attention to cluster uncertainty, using locally weighted evidence accumulation and locally weighted graph partitioning approaches; however, the proposed uncertainty measure depends on the cluster size [47].
Ensemble clustering methods are considered capable of clustering data of arbitrary shape; clustering ensembles are therefore one of the methods for discovering arbitrarily shaped clusters. Consequently, we also compare our method with some methods designed for this purpose, such as CURE [48] and CHAMELEON [49,50]. These belong to a family of hierarchical clustering algorithms that aim to extract clusterings with arbitrarily shaped clusters; they use sophisticated techniques and involve a number of parameters. The CURE clustering algorithm takes a number of sampled subsets of the data and partitions them; after that, a predefined number of well-scattered sample points are chosen per partition, and then the single-link clustering algorithm is employed to merge similar clusters. Because of the randomness of its sampling, CURE is an unstable clustering algorithm. The CHAMELEON clustering algorithm first transforms the dataset into a k-nearest-neighbors graph and divides it into m smaller subgraphs by graph partitioning methods; after that, the basic clusters represented by these subgraphs are clustered hierarchically. According to the experimental results reported in [49], the CHAMELEON algorithm has higher accuracy than the CURE and DBSCAN algorithms.

Proposed Ensemble Clustering
This section provides definitions and the necessary notations. Then, we define the ensemble clustering problem. Next, the proposed algorithm is presented, and finally the algorithm is analyzed. Table 1 shows all the symbols used in this study.

Notations and Definitions

Table 1. The symbols used in this study.
D_i: — The i-th data object in dataset D
L_D_i: — The real label of the i-th data object in dataset D
D_ij — The j-th feature of the i-th data object
|D_:1| — The size of data set D
sim(u, v) — The similarity between two clusters u and v
T_q(u, v) — The q-th hypothetical cluster between the center points of clusters u and v
p_q:(u, v) — The center of the q-th hypothetical cluster between two clusters u and v
B — The size of the ensemble clustering Φ
C_i — The number of clusters in the i-th clustering result Φ^i
G(Φ) — The graph defined on the ensemble clustering Φ
V(Φ) — The nodes of the graph defined on the ensemble clustering Φ
E(Φ) — The edges of the graph defined on the ensemble clustering Φ
λ — A clustering result similar to the real labels

3.1.1. Clustering

A set of C non-overlapping subsets of a data set can be called a clustering result (or, for short, a clustering) or a partitioning result (or, for short, a partitioning) if the union of the subsets is the entire data set and the intersection of each pair of subsets is empty; any subset of a data set is called a cluster. A clustering is shown by Φ, a binary matrix, where Φ_:i, a vector of size |D_:1|, represents the i-th cluster, and Φ_i:^T, a vector of size C, represents which cluster the i-th data point belongs to. Obviously, Σ_{j=1}^{C} Φ_ij = 1 for any i in {1, 2, . . . , |D_:1|}, and Σ_{i=1}^{|D_:1|} Φ_ij > 0 for any j in {1, 2, . . . , C}. The center of each cluster Φ_:i is a data point shown by M(Φ_:i), and its j-th feature is defined as in Equation (1) [51].

A Valid Sub-Cluster from a Cluster
A valid sub-cluster of a cluster Φ_:i is shown by R(Φ_:i), and the k-th data point belongs to it if R(Φ_:i)_k is one. R(Φ_:i)_k is defined according to Equation (2).
where γ is a parameter. It should be noted that a sub-cluster can be considered to be a cluster.
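Equation (2) is not reproduced here, but its intent — keeping only the members of a cluster that lie within radius γ of the cluster center — can be sketched as follows (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def valid_sub_cluster(points, member_mask, center, gamma):
    """Return a binary vector marking the members of a cluster that lie
    within distance gamma of the cluster center, i.e., the locally
    reliable part of the cluster."""
    d = np.linalg.norm(points - center, axis=1)
    return (member_mask & (d <= gamma)).astype(int)

# Four 2D points; the third one is far from the cluster center.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.2, 0.1]])
members = np.array([True, True, True, True])
R = valid_sub_cluster(pts, members, center=pts[0], gamma=0.5)
```

With γ = 0.5, the distant point is excluded from the valid sub-cluster while the three nearby points are kept.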

Ensemble of Clustering Results
A set of B clustering results from a given data set is called an ensemble clustering and is shown by Φ = {Φ^1, Φ^2, . . . , Φ^B}, where Φ^i represents the i-th clustering result in the ensemble Φ. Obviously, Φ^k, as a clustering, has C_k clusters and is therefore a binary matrix of size |D_:1| × C_k. The j-th cluster of the k-th clustering result of the ensemble Φ is shown by Φ^k_:j. The objective clustering, or the best clustering, is shown by Φ*, which includes C clusters.

Similarity Between a Pair of Clusters
There are different distance/similarity criteria between two clusters. In this study, we define the similarity between two clusters Φ^k1_:i and Φ^k2_:j, which is shown by sim(Φ^k1_:i, Φ^k2_:j) and defined as in Equation (3).
where T_q(Φ^k1_:i, Φ^k2_:j) is calculated using Equation (4), and p_q:(Φ^k1_:i, Φ^k2_:j) is a point whose w-th feature is defined as in Equation (5). The regions T_q for all q ∈ {1, 2, . . . , 9} for two arbitrary clusters (the two circles depicted at the corners) are shown in the top picture of Figure 1. Indeed, each T_q is an assumptive region, or an assumptive cluster, or an assumptive circle. The term ∪_{q=1}^{9} T_q(Φ^k1_:i, Φ^k2_:j) − (R(Φ^k1_:i) ∪ R(Φ^k2_:j)) in Equation (3) is the set of all data points in the grey (blue in the online version) region of the picture presented in Figure 1. The term p_q: is the center of the assumptive cluster T_q.
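The geometric construction behind Equations (3)–(5) — nine equally spaced hypothetical centers on the segment joining the two cluster centers, each defining an assumptive circular cluster of nearby points — can be sketched roughly as follows. The radius choice and the final score here are illustrative simplifications, not the paper's exact Equation (3):

```python
import numpy as np

def hypothetical_centers(m_u, m_v):
    """Nine equally spaced points p_1..p_9 on the segment joining the
    two cluster centers (p_5 is the midpoint)."""
    return [m_u + q / 10.0 * (m_v - m_u) for q in range(1, 10)]

def dense_bridge_score(points, m_u, m_v, radius):
    """Fraction of the nine assumptive clusters T_q that contain at
    least one data point within `radius` of their center p_q."""
    occupied = 0
    for p in hypothetical_centers(m_u, m_v):
        if np.any(np.linalg.norm(points - p, axis=1) <= radius):
            occupied += 1
    return occupied / 9.0

# A dense "bridge" of points between two centers yields a high score.
bridge = np.array([[x, 0.0] for x in np.linspace(0.0, 4.0, 41)])
score = dense_bridge_score(bridge, np.array([0.0, 0.0]),
                           np.array([4.0, 0.0]), radius=0.3)

# With only the two endpoints present, no assumptive cluster is occupied.
sparse = np.array([[0.0, 0.0], [4.0, 0.0]])
low = dense_bridge_score(sparse, np.array([0.0, 0.0]),
                         np.array([4.0, 0.0]), radius=0.3)
```

This mirrors the intuition of Figure 1: a populated chain of assumptive clusters between two centers is evidence that the two clusters belong together.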


An Undirected Weighting Graph Corresponding to an Ensemble Clustering
A weighting graph corresponding to an ensemble Φ of clustering results is shown by G(Φ) and is defined as G(Φ) = (V(Φ), E(Φ)). The vertex set of this graph is the set of all of the valid sub-clusters extracted from all of the clusters of the ensemble members, namely V(Φ) = {R(Φ^1_:1), . . . , R(Φ^1_:C_1), R(Φ^2_:1), . . . , R(Φ^2_:C_2), . . . , R(Φ^B_:1), . . . , R(Φ^B_:C_B)}. In this graph, the weight of the edge between a pair of vertices, i.e., between a pair of clusters, is their similarity, obtained in accordance with Equation (6).

Production of Multiple Base Clustering Results
The set of base clustering results is generated by the algorithm presented in Algorithm 1. In this pseudocode, the indices of the whole data set are first stored as T, and then, step by step, the modified kmedoids algorithm [52] is applied and the result of the clustering is stored. The i-th clustering result has at least C_i clusters, where C_i is a positive random integer in the interval [2, √|D_:1|]. Each base clustering result is an output of the base locally reliable clustering algorithm presented in Algorithm 2. This method repeats until the number of objects outside the so-far reliable clusters is less than C². The final cluster centers obtained at each repetition are extracted using a repeated method so as to cover different subsets of the data and also to ensure that the multiple clustering results describe the entire data set. Here, we explain why this termination condition was chosen. Many researchers [53,54] have argued that the maximum number of clusters in a data set should be less than √|D_:1|. Thus, as soon as the number of objects outside the so-far reliable clusters is less than C², we assume that the remaining data can no longer be divided into C clusters; therefore, the loop ends.
The time complexity of the kmedoids clustering algorithm (a version of the kmeans clustering algorithm) is O(|D_:1|CI), where I is the number of iterations. It should be noted that the kmeans or kmedoids clustering algorithm is a weak learner whose performance is affected by many factors. For example, the algorithm is very sensitive to the initial cluster centers, so that selecting different initial cluster centers often leads to different clustering results. In addition, the kmeans or kmedoids clustering algorithm has a tendency to find spherical clusters of relatively uniform size, which is not suitable for data with other distributions. Therefore, instead of using a strong clustering algorithm, we generate multiple clustering results with the kmedoids clustering algorithm, over differently distributed subsets of the data, in order to create an ensemble of good clustering results over the data set.
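A rough sketch of this generation loop might look as follows. It is a simplification under stated assumptions: a tiny Lloyd-style kmeans stands in for the modified kmedoids of [52], a plain radius rule (distance ≤ γ from the center) stands in for Equation (2), and the loop stops when fewer than C·C objects remain uncovered, as described above. All names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_kmeans(points, k, iters=20):
    """A minimal Lloyd-style clusterer standing in for the modified
    kmedoids base algorithm used by the paper."""
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def generate_base_clusterings(points, C, gamma, B=3):
    """Sketch of Algorithm 1: for each ensemble member, repeatedly
    cluster the not-yet-covered points, keep only the locally reliable
    part of each cluster (members within gamma of the center), and
    remove those points, until fewer than C*C objects remain."""
    ensemble = []
    for _ in range(B):
        remaining = np.arange(len(points))
        clustering = []
        while len(remaining) >= C * C:
            # Random cluster count in [2, sqrt(#remaining)].
            k = int(rng.integers(2, max(3, int(np.sqrt(len(remaining))) + 1)))
            labels, centers = tiny_kmeans(points[remaining], k)
            still_out = []
            for j in range(k):
                idx = remaining[labels == j]
                d = np.linalg.norm(points[idx] - centers[j], axis=1)
                if np.any(d <= gamma):
                    clustering.append(idx[d <= gamma])
                still_out.extend(idx[d > gamma].tolist())
            if len(still_out) == len(remaining):  # no progress; stop early
                break
            remaining = np.array(still_out, dtype=int)
        ensemble.append(clustering)
    return ensemble

# Two well-separated Gaussian blobs as toy data.
pts = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
ens = generate_base_clusterings(pts, C=2, gamma=1.0)
```

Each ensemble member is then a list of disjoint, locally reliable clusters covering most of the data, which is what the later graph construction consumes.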

Time Complexity of Production of Multiple Base Clustering Results
The incremental method repeatedly calls the kmedoids clustering algorithm and is named the base locally reliable clustering algorithm; it is presented in Algorithm 2. Its time complexity is O(|D_:1|IC), which in the worst case is O(|D_:1|^1.5 I), where I is the number of iterations the kmedoids clustering algorithm needs to converge. The time complexity of the algorithm generating the ensemble of clustering results presented in Algorithm 1 is therefore O(B|D_:1|^1.5 I), where B is the number of base clustering results generated. The outputs of the algorithm presented in Algorithm 1 are the clustering set Φ and the set of the clusters' numbers of the ensemble members.

Construction of Clusters' Relations
Class labels denote specific classes in classification, while cluster labels express only the grouping of the data and are not directly comparable across different clustering results. Therefore, the labels of different clusterings must be aligned in ensemble clustering. Additionally, since the kmeans and kmedoids clustering algorithms can only detect spherical and uniform clusters, a number of clusters in the same clustering result may inherently belong to the same cluster. Therefore, an analysis of the relationship between clusters through a between-cluster similarity measure is needed. A large number of criteria have been proposed in the literature [49,[55][56][57] to measure the similarity among clusters. For example, in linkage clustering algorithms, the distance between the closest or farthest data objects of two clusters is used for measuring cluster separation [56,57]; these measures are sensitive to noise because of their dependence on a few objects. In center-based clustering algorithms, the distance between the centers of clusters measures the lack of correlation between two clusters. Although this measure is computationally efficient and robust to noise, it cannot reflect the boundary between the two clusters.
In cluster grouping algorithms, the number of objects shared by two clusters is used to represent their similarity. This measure does not consider the fact that the cluster labels of some objects may be incorrect, so such objects can have a significant impact on the measurement. Additionally, since two clusters of the same clustering do not share any objects, this measure cannot be used to measure their similarity. Although these measures work well in other settings, they are not suitable for our ensemble clustering. The base clusterings in Φ generated in the previous section carry only locally valid labels, which means that the labels of each cluster are valid only in its local region. Therefore, we need to measure the difference between two clusters on their locally valid labels instead of all the labels. However, because of the base clustering generation mechanism, the overlap between the local spaces of two clusters is typically very small. Therefore, we consider an "indirect" overlap between the two clusters in order to measure their similarity.
Let us assume we have two clusters Φ_:i and Φ_:j, with M(Φ_:i) and M(Φ_:j) as their cluster centers and, consequently, p_5:(Φ_:i, Φ_:j) as the middle point of the two centers. We assume there may be a hidden dense region between the reliable sections of the cluster pair Φ_:i and Φ_:j, i.e., R(Φ_:i) and R(Φ_:j). We define nine points p_k:(Φ_:i, Φ_:j) for k ∈ {1, 2, . . . , 9} at equal distances on the line connecting M(Φ_:i) to M(Φ_:j). We assume that the more objects there are in all of the valid local spaces, the more likely it is that the two clusters are the same. If all of the valid local spaces are dense and the distance between M(Φ_:i) and M(Φ_:j) is not greater than 4γ, the likelihood that the two clusters are the same should be high, as shown in Figure 1. For clusters Φ_:i and Φ_:j, we examine the following two factors in order to measure their similarity: (1) the distance between their cluster centers, and (2) the possibility of the existence of a dense region between them. As we know, the smaller the distance between their cluster centers, the more likely it is that they are the same cluster; therefore, we assume that their similarity must be inversely proportional to this distance. Additionally, since the kmedoids clustering algorithm is a linear clustering algorithm, the spaces of the two clusters are separated by the middle line between their cluster centers. If the areas around this boundary contain few objects, i.e., they are sparse, the clusters can be clearly separated. An example is presented in Figure 2. The distance between the centers of clusters B and C is not greater than the distance between the centers of clusters A and B, yet the boundary between clusters B and C is clearer than the boundary between clusters A and B. Therefore, if the clarity of the boundary between clusters is taken into account, clusters B and C should be considered farther apart than clusters A and B.
Based on the above analyses, we assume that the similarity of two clusters should also be proportional to the probability of the existence of a dense region between their centers. The similarity between two clusters is therefore formally measured by Equation (3).
Based on this similarity criterion, we generate a weighted undirected graph (WUG), denoted by G(Φ) = (V(Φ), E(Φ)), to show the relationships between the clusters. In the graph G(Φ), V(Φ) is the set of vertices, which represent the clusters of the ensemble Φ; therefore, each vertex is also seen as a cluster of the ensemble Φ. E(Φ) holds the weights of the edges between the vertices, i.e., the clusters. For a pair of clusters, their similarity is used as the weight of the edge between them, i.e., the weight is calculated according to Equation (6); the greater the similarity between them, the more likely they represent the same cluster. After obtaining the WUG, the problem of determining the cluster relationships can be transformed into a standard graph partitioning problem [58]. A partition of the vertices of the graph G(Φ) is then obtained; it is denoted by CC, a binary matrix of size (Σ_{t=1}^{B} C_t) × C in which CC_qi = 1 if the q-th vertex belongs to the i-th consensus cluster. We want to obtain such a partitioning by minimizing an objective function under which vertices in the same subset are very similar to each other and very different from the vertices in other subsets. In order to solve this optimization problem, we apply a normalized spectral clustering algorithm [59] to obtain the final partition CC. The vertices in the same subset are used to represent one cluster. Therefore, we define a new ensemble with aligned clusters, denoted by Λ, based on Equation (7).
The time complexity of constructing the cluster relationships is O(|D_:1| Σ_{t=1}^{B} C_t).

Extraction of Consensus Clustering Result
After obtaining the ensemble of aligned (or relabeled) clustering results from the main ensemble of clustering results, Λ, where Λ^k is a matrix of size |D_:1| × C for any k ∈ {1, 2, . . . , B}, is available for the extraction of the consensus clustering result. Based on the ensemble Λ, the consensus function can be written as Equation (8).
where Λ_ij = Σ_{k=1}^{B} Λ^k_ij. The time complexity of the final cluster generation is O(|D_:1|CB).
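In effect, Equation (8) amounts to a per-object majority vote over the aligned ensemble. A minimal sketch, assuming each aligned member Λ^k is an n × C binary membership matrix (with all-zero rows for objects that member does not cover):

```python
import numpy as np

def consensus_labels(aligned_members):
    """Sum the aligned binary membership matrices and assign each
    object to the consensus cluster with the most votes."""
    votes = np.sum(aligned_members, axis=0)  # shape (n, C)
    return votes.argmax(axis=1)

# Three aligned members over four objects and C = 2 consensus clusters.
L1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
L2 = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
L3 = np.array([[1, 0], [1, 0], [0, 0], [0, 1]])  # object 2 not covered here
labels = consensus_labels([L1, L2, L3])
```

Objects covered by fewer members (such as object 2 above) are still assigned by the votes of the members that do cover them.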

Overall Implementation Complexity
The general complexity of the proposed algorithm is O(|D_:1| I Σ_{t=1}^{B} C_t + |D_:1| (Σ_{t=1}^{B} C_t)²). We observe that the time complexity is linearly proportional to the number of objects; moreover, for ensemble learning, a greater number of base clusters, i.e., a larger Σ_{t=1}^{B} C_t, does not necessarily mean better ensemble performance. Therefore, we can keep Σ_{t=1}^{B} C_t ≪ |D_:1|, so that the proposed algorithm is suitable for dealing with large-scale data sets. However, if there are enough computational resources, we can increase C_t up to |D_:1|^0.5; consequently, assuming Σ_{t=1}^{B} C_t ≈ |D_:1|^0.5, the general complexity of the proposed algorithm becomes O(|D_:1|^1.5 I + |D_:1|²). If there are not enough computational resources, the complexity of the proposed algorithm remains linear in the data size.

Experimental Analysis
In this section, we test the proposed algorithm on four artificial datasets and five real-world datasets and evaluate its efficacy using (1) external validation criteria and (2) time assessment costs.

Benchmark Datasets
Experimental evaluations have been performed on nine benchmark datasets. The details of these datasets are shown in Table 2. The cluster distributions of the artificial 2D datasets are shown in Figure 3. The real-world datasets are derived from the UCI dataset repository [60].
Table 2. Description of the benchmark datasets: the number of data objects (|D_:1|), the number of attributes (|D_1:|), and the number of clusters (C).


Evaluation Criteria
Two external criteria have been used to measure the similarity between the output labels predicted by different clustering algorithms and the correct labels of the benchmark datasets. Let us denote the clustering similar to the real labels of dataset by λ. It is defined according to Equation (9).
where L_D_i: is the real label of the i-th data point. Given a dataset D and two partitioning results over its objects, namely π* (the consensus clustering result) and λ (the clustering corresponding to the real labels of the dataset), the overlaps between π* and λ can be summarized in a contingency table, presented in Table 3, where n_ij shows the number of common data objects in the groups π*_:i and λ_:j.
The adjusted rand index (ARI) is defined based on Equation (10).
where the variables are defined in Table 3. Normalized mutual information (NMI) [61] is defined based on Equation (11).
The more similar the clustering result π* (i.e., the consensus clustering result) and the ground-truth clustering λ (the real labels of the dataset), the higher the value of their NMI (and ARI).
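Since Equation (10) is the standard Hubert–Arabie adjusted Rand index, it can be computed directly from the contingency table of Table 3. A self-contained sketch (variable names are ours):

```python
import numpy as np

def comb2(x):
    """Number of unordered pairs, x choose 2, applied elementwise."""
    return x * (x - 1) / 2.0

def adjusted_rand_index(labels_a, labels_b):
    """ARI computed from the contingency table n_ij of two labelings."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    n_ij = np.zeros((ia.max() + 1, ib.max() + 1))
    for x, y in zip(ia, ib):
        n_ij[x, y] += 1
    sum_ij = comb2(n_ij).sum()          # pairs together in both labelings
    sum_a = comb2(n_ij.sum(axis=1)).sum()
    sum_b = comb2(n_ij.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)
```

The index is invariant to label permutations (identical partitions score 1.0 regardless of label names) and is near zero, or negative, for unrelated partitions.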
In addition, we compared the proposed method with other "strong" base clustering algorithms including: (1) the normal spectral clustering algorithm (NSC) [59], (2) "density-based spatial clustering of applications with noise" algorithm (DBSCAN) [62], (3) "clustering by fast search and find of density peaks" algorithm (CFSFDP) [63]. The purpose of this comparison is to test whether the proposed method is a "strong" clustering or not.

Experimental Settings
A number of settings for the different ensemble clustering algorithms are listed in the following to ensure reproducibility. In the proposed ensemble clustering algorithm, the number of clusters in each base clustering result is set randomly, and the kmedoids clustering algorithm is used to produce the base clustering results. In each of the state-of-the-art clustering algorithms, the number of clusters in each base clustering result of the ensemble is set according to the method the authors used, and the kmeans clustering algorithm is used to produce their base clustering results. The parameter B is always set to 40. For the compared methods, we set their parameters based on their authors' suggestions. The quality of each clustering algorithm is reported as an average over 50 independent runs. A Gaussian kernel has been employed for the NSC algorithm, with the kernel parameter σ² chosen from the range [0.1, 2] with step size 0.1; among these parameter values, the best clustering result has been selected for comparison.
The DBSCAN and CFSFDP algorithms also require an input parameter ε. We estimated ε using the average distance between all data points and their mean point, denoted ASE. However, each of these algorithms may require a different value of ε. Therefore, we evaluated each of them with ten different values from the set {ASE_1, ASE_2, …, ASE_10}, and the best clustering result is used for comparison.
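The ε heuristic can be sketched as follows. Note two assumptions of ours: the helper name is invented, and we read the ten candidates ASE_1, …, ASE_10 as the fractions ASE/1, …, ASE/10, which the text does not state explicitly:

```python
# Hedged sketch of the epsilon-estimation heuristic described above:
# ASE is the average distance between all data points and their mean
# ("average point"). Interpreting the ten candidate values as
# ASE/1 ... ASE/10 is our assumption, not the paper's statement.
import numpy as np

def ase(X):
    """Average distance of all points to the dataset mean (hypothetical helper)."""
    center = X.mean(axis=0)
    return float(np.linalg.norm(X - center, axis=1).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # illustrative data only

candidates = [ase(X) / k for k in range(1, 11)]  # ten candidate eps values
print(candidates)
```

Each candidate would then be passed as `eps` to DBSCAN (and analogously to CFSFDP), keeping whichever value yields the best clustering result.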

Comparison with the State of the Art Ensemble Methods
Different consensus functions were first used to extract the final clustering result from the output ensemble of Algorithm 1. Since the clustering results in this ensemble are not complete, applying EAC to it requires the edited EAC (EEAC) [64,65]. CSPA, HGPA, MCLA, and EM were also applied to the same ensemble. In total, seven methods, namely PC + EEAC + SL, PC + EEAC + AL, PC + CSPA, PC + HGPA, PC + MCLA, PC + EM, and the proposed mechanism presented in Section 3.4, were used as consensus functions, where PC stands for the proposed base clustering presented in Algorithm 1. Experimental results of the different ensemble clustering methods on the different datasets in terms of ARI and NMI are presented in Figures 4 and 5, respectively; the last seven bars show the performances of the seven methods, and all these results are summarized in the last seven rows of Table 4. The proposed consensus function of Section 3.4 performs best, and PC + EEAC + SL is second. According to Table 4, PC + EEAC + AL and PC + MCLA are third and fourth. Therefore, the mechanism presented in Section 3.4 is used as our main consensus function.
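To make the PC + EEAC + SL pipeline concrete, the following is a minimal sketch of evidence accumulation with single linkage as we understand it. The handling of incomplete base clusterings (the EEAC case), where each pair's co-association is normalized by how often the pair was actually clustered together, is our reading of [64,65], and the partition lists are illustrative:

```python
# Hedged sketch of an EEAC-style consensus: accumulate pairwise
# co-association evidence over (possibly partial) base clusterings,
# normalize by co-occurrence counts, then cut a single-linkage dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def eeac_consensus(partitions, n_points, n_clusters):
    """partitions: list of dicts mapping point index -> cluster id (may be partial)."""
    co = np.zeros((n_points, n_points))    # times a pair shared a cluster
    seen = np.zeros((n_points, n_points))  # times a pair was clustered at all
    for part in partitions:
        idx = list(part)
        for a in idx:
            for b in idx:
                seen[a, b] += 1
                if part[a] == part[b]:
                    co[a, b] += 1
    # Normalized co-association; unseen pairs get similarity 0 (distance 1).
    sim = np.divide(co, seen, out=np.zeros_like(co), where=seen > 0)
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="single")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

parts = [{0: 0, 1: 0, 2: 1, 3: 1},          # base clustering 1 (partial)
         {0: 0, 1: 0, 2: 1, 3: 1, 4: 1},    # base clustering 2
         {1: 0, 2: 1, 3: 1, 4: 1}]          # base clustering 3 (partial)
labels = eeac_consensus(parts, n_points=5, n_clusters=2)
print(labels)
```

Swapping `method="single"` for `"average"` would give the PC + EEAC + AL variant of the same pipeline.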


Based on the ARI and NMI criteria, the performances of the different ensemble clustering algorithms on the artificial and real-world benchmark datasets are compared in Figures 4 and 5, respectively, and summarized in Table 4. As Figures 4 and 5 show, the proposed ensemble clustering algorithm achieves high clustering accuracy on both the synthetic and real-world benchmark datasets compared with the existing ensemble clustering algorithms. According to the experimental results, the proposed ensemble clustering algorithm detects different clusters effectively and improves on the performance of the state-of-the-art ensemble clustering algorithms.
Also, as shown in Table 4, the proposed ensemble clustering algorithm is significantly more accurate than the other ensemble clustering algorithms on the artificial datasets, while on the real-world datasets its improvement is only marginal. The main reason is that the real-world datasets are more complex than the artificial ones. Treating the performance of each method across the different datasets as a variable, a Friedman test shows (with a p-value of about 0.006) a significant difference between the methods. Post-hoc analysis shows that this difference is mostly due to the gap between PC + MCLA and the proposed method, with a p-value of about 0.047. Therefore, even against PC + MCLA, the most effective rival method, the proposed method's advantage remains significant (p = 0.047).
Table 4. Summary of the results presented in Figure 5. The column L-D-W indicates the number of datasets on which the proposed method Loses to / Draws with / Wins against a rival, validated by a paired t-test [66] at the 95% confidence level.

Comparison with Strong Clustering Algorithms
The results of the proposed ensemble clustering algorithm compared with four "strong" clustering algorithms on the different benchmark datasets are depicted in Figure 6, whose last two columns indicate the mean and standard deviation of the clustering validity of each algorithm on the different datasets. We observe that the clustering validity of the proposed ensemble clustering algorithm is superior or close to the best results of the other four algorithms. According to these experiments, the proposed algorithm can compete with "strong" clustering algorithms, which supports the claim that "a number of weak clusterings is equal to a strong clustering."
Parameter analysis: How to set the parameter γ is an important issue for the proposed ensemble clustering algorithm, since it regulates the number of base clusters produced in each base clustering result. The number of base clusters generated by the proposed ensemble clustering algorithm increases exponentially as γ decreases. However, the accuracy of the clustering task does not keep increasing as γ decreases, so γ must not be set too small. According to the empirical results, a clustering result with a very large or very small number of base clusters can be considered a bad clustering. Therefore, γ is selected such that the number of clusters in each clustering result is less than √|D:1| and more than √|D:1|/2.
Time analysis: Finally, the efficiency of the proposed ensemble clustering algorithm is evaluated on the KDD-CUP99 dataset, with γ set to 0.14. The proposed ensemble clustering algorithm is implemented in Matlab 2018. Its runtimes for different numbers of objects are shown in Table 5. We observe that the number of base clusters in the base clustering results increases with the number of objects.
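The cluster-count bound used to choose γ can be sketched as a simple predicate; the helper name is ours, not the paper's:

```python
# Hedged sketch of the cluster-count sanity bound described above:
# a clustering result is only considered good if its number of clusters
# lies strictly between sqrt(n)/2 and sqrt(n), where n = |D:1| is the
# number of data points. The function name is hypothetical.
import math

def acceptable_cluster_count(n_points, n_clusters):
    upper = math.sqrt(n_points)   # sqrt(|D:1|)
    lower = upper / 2.0           # sqrt(|D:1|) / 2
    return lower < n_clusters < upper

print(acceptable_cluster_count(400, 15))  # 10 < 15 < 20 -> True
print(acceptable_cluster_count(400, 25))  # 25 >= 20     -> False
```

In the proposed algorithm, γ would then be tuned until the base clustering results fall inside this band.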

Final Decisive Experimental Results
In this subsection, a set of six real-world datasets is employed to evaluate the efficacy of the proposed method in comparison with some recently published methods. Three of these six datasets are the Wine, Iris, and Breast datasets presented in Table 2. To make the final conclusion fairer and more general, three additional datasets, whose details are presented in Table 6, are used as benchmarks in this subsection.
Table 6. Description of the benchmark datasets: the number of data objects (|D:1|), the number of attributes (|D1:|), and the number of clusters (C).

Based on the NMI criterion, the performances of the different ensemble clustering algorithms on the real-world benchmark datasets are compared in Table 7. According to the experimental results, the proposed ensemble clustering algorithm can detect different clusters in an effective way and improve on the performance of the state-of-the-art ensemble clustering algorithms.

Conclusions and Future Works
The kmedoids clustering algorithm, as a fundamental clustering algorithm, is widely considered to be computationally cheap. However, it is also considered a weak clustering method, because its performance is affected by many factors, such as an unsuitable selection of the initial cluster centers and a dissimilar distribution of the data. This study proposes a new ensemble clustering algorithm built from multiple runs of the kmedoids clustering algorithm. The proposed ensemble clustering method retains the advantages of the kmedoids clustering algorithm, including its high speed, while avoiding its major weakness, i.e., the inability to detect non-spherical and non-uniform clusters. Indeed, the new ensemble clustering algorithm improves the stability and quality of the kmedoids clustering algorithm and supports the claim that "aggregating several weak clustering results is better than or equal to a strong clustering result." This study approaches the ensemble clustering problem by defining valid local clusters: the data around a cluster center in a kmedoids clustering are treated as a valid local cluster. To generate diverse clustering results, a weak clustering algorithm (kmedoids as the base clustering algorithm) is applied sequentially to the data not yet covered by previously found valid local clusters. Empirical analysis compares the proposed ensemble clustering algorithm with several existing ensemble clustering algorithms and three strong fundamental clustering algorithms on a set of artificial and real-world benchmark datasets. According to the empirical results, the proposed ensemble clustering algorithm is considerably more effective than the state-of-the-art ensemble clustering methods. In addition, we examined its efficiency and found it suitable even for large-scale datasets.
This method works because it concentrates on finding local structures as small valid clusters and then merging them. It also benefits from the instability of the kmedoids clustering algorithm to build a diverse ensemble. The main limitation of the paper is its reliance on kmedoids as the base clustering algorithm; other base clustering algorithms, such as the fuzzy c-means clustering algorithm, may be better alternatives and will be investigated in future work.