An Adaptive Spectral Clustering Algorithm Based on the Importance of Shared Nearest Neighbors

The construction of a similarity matrix is a significant step in the spectral clustering algorithm, and the Gaussian kernel function is one of the most common measures for constructing it. However, with a fixed scaling parameter, the similarity between two data points is not adaptive and is not appropriate for multi-scale datasets. In this paper, by quantifying the importance of each vertex of the similarity graph, the Gaussian kernel function is scaled, and an adaptive Gaussian kernel similarity measure is proposed. An adaptive spectral clustering algorithm is then obtained based on the importance of shared nearest neighbors. The idea is that the greater the importance of the shared neighbors between two vertices, the more likely it is that these two vertices belong to the same cluster. The importance value of the shared neighbors is obtained with an iterative method that considers both the local structural information and the distance similarity information, so as to improve the algorithm’s performance. Experimental results on different datasets show that our spectral clustering algorithm outperforms other spectral clustering algorithms, such as the self-tuning spectral clustering and the adaptive spectral clustering based on shared nearest neighbors, in clustering accuracy on most datasets.
OPEN ACCESS Algorithms 2015, 8 178


Introduction
Over the past several decades, the spectral clustering algorithm has attracted a great amount of attention in the field of pattern recognition and has become a research hot spot [1]. It makes no assumption about the global structure of the dataset, but directly finds the global optimal solution on a relaxed continuous domain through decomposition of the Laplacian matrix of the graph. Therefore, it is simple to implement and can be solved efficiently by standard linear algebra, and it often outperforms traditional clustering algorithms, such as the k-means algorithm [2].
A significant step in spectral clustering is the construction of a similarity matrix (graph) with some similarity measure. The main goal of constructing the similarity matrix is to model the local neighborhood relationships between the data vertices. The quality of the similarity matrix largely determines the performance of spectral clustering algorithms [3].
The Gaussian kernel function is one of the most common similarity measures for spectral clustering, in which a scaling parameter $\sigma$ controls how quickly the similarity falls off with the distance between the vertices. Although its computation is simple and the resulting positive definite similarity matrix simplifies the analysis of eigenvalues, it does not work well on some complex datasets, e.g., a multi-scale dataset [4]. Moreover, the scaling parameter $\sigma$ is specified manually, so the similarity between two vertices is determined only by their Euclidean distance.
In recent years, several new construction methods for the similarity matrix have appeared. Fischer et al. [5] proposed a path-based clustering algorithm for texture segmentation. Their algorithm utilizes a connectedness criterion, which considers two objects as similar if there exists a mediating intra-cluster path without an edge with large cost, and it is used for spectral clustering. The construction method mainly combines the Gaussian kernel function with the shortest path, which is effective on some datasets, but sensitive to outliers. Chang et al. [6] utilized the idea of M-estimation and developed a robust path-based spectral clustering method by defining a robust path-based similarity measure, which can effectively reduce the influence of outliers. Yang et al. defined adjustable line segment lengths, which squeeze the distances in high density regions but widen them in low density regions, and proposed a density-sensitive distance similarity function for spectral clustering [7]. Assuming that each data point can be linearly reconstructed from its local neighborhood, Gong et al. obtained the similarity from the contributions between different vertices in neighborhoods, computed through $n$ standard quadratic programming problems rather than the Gaussian kernel function, and achieved a better clustering performance [8]. Zhang et al. adopted multiple methods of vector similarity measurement to produce diverse similarity matrices, combined them into a new similarity matrix through particle swarm optimization and proposed a new similarity measure [9]. This construction method utilizes the idea of ensemble learning, which helps to improve the clustering performance. Cao et al. used the maximum flow between data points as the new similarity, which carries the global and local relations between data and works well on datasets with a nonlinear and elongated structure [10].
The multi-scale self-tuned kernel function for spectral clustering is also a significant research direction. Erdal Yenialp et al. proposed a multi-scale density-based spatial clustering algorithm with noise. The proposed algorithm represents the images in multiple scales by using Gaussian smoothing functions and evaluates a density matrix for each scale. The density matrices in each scale are then fused to capture salient features in each scale. The developed algorithm does not include a training phase, so computationally efficient solutions can be reached to segment the region of interest [11]. Hsieh Fushing et al. developed a new methodology, called data cloud geometry-tree, which derives from the empirical similarity measurements a hierarchy of clustering configurations that captures the geometric structure of the data. It has a built-in mechanism for self-correcting clustering membership across multiple scales, which provides a better quantification of the multi-scale geometric structures of the data [12]. Raghvendra et al. created a parameter-free kernel spectral clustering model and exploited the structure of the projections in the eigenspace to automatically identify the number of clusters, which proved efficient for large-scale complex networks [13]. Zelnik-Manor et al. introduced a self-tuning scaling parameter for the Gaussian kernel function, and on that basis, Li et al. introduced a shared nearest neighbors parameter into the self-tuning Gaussian kernel function and proposed an adaptive spectral clustering algorithm based on the shared nearest neighbors. This algorithm exploits the information about local density embedded in the shared nearest neighbors, thereby learning the implicit information of the cluster structure and improving the algorithm's performance [14,15].
Due to the non-homogeneity of the network topology, each node in the network is of different importance. The similarity of two vertices relates not only to the number of neighbors shared, but also closely to the importance of the shared neighbor vertices. In a graph, the importance of a vertex is related to the vertex's out-degree, in-degree and the importance of its neighboring vertices. The greater the importance of the shared neighbors between two vertices, the more likely it is that these two vertices belong to the same cluster. Blondel et al. introduced hubs and authorities based on the idea of characterizing the most important vertices in a graph representing the connections between vertices [16]. From an implicit relation, an "authority score" and a "hub score" for each vertex of a given graph can be obtained as the limit of a converging iterative process, and these scores can be used to represent the importance of the vertices [17].
In this paper, we propose a similarity measure based on the importance of shared nearest neighbors for constructing the similarity matrix, originating from the idea of the "authority score" and "hub score". In this measure, we first find the importance of every vertex as the limit of a converging iterative process and then look for the maximal importance among the shared nearest neighbors of each pair of vertices. The greater the maximal importance, the more similar the two vertices are. In this way, we obtain structural information between every two vertices and then utilize this information to self-tune the Gaussian kernel function. Finally, we get the similarity measure based on the importance of shared nearest neighbors.
The rest of this paper is organized as follows. In Section 2, we give a brief outline of similarity graphs. In Section 3, we propose a new similarity measure and apply it to the construction of the similarity matrix. In Section 4, we present the experimental results for the proposed algorithm on some datasets, followed by the concluding remarks given in Section 5.

Similarity Graphs
Given a set of data points $x_1, \dots, x_n$ and some notion of similarity $s_{ij} \ge 0$ between all pairs of data points $x_i$ and $x_j$, the intuitive goal of clustering is to divide the data points into several groups, so that points in the same group are similar and points in different groups are dissimilar to each other. If we do not have more information than similarities between data points, a nice way of representing the data is in the form of the similarity graph $G = (V, E)$. Each vertex $v_i$ in this graph represents a data point $x_i$. Two vertices are connected if the similarity $s_{ij}$ between the corresponding data points $x_i$ and $x_j$ is positive or larger than a certain threshold, and the edge is weighted by $s_{ij}$. The problem of clustering can now be reformulated by using the similarity graph: we want to find a partition of the graph so that the edges between different groups have very low weights and the edges within a group have high weights.
The goal of constructing similarity graphs is to model the local neighborhood relationships between the data points. The Gaussian kernel function remains an important construction method; it is defined as Equation (1):

$$S_{ij} = \exp\left(-\frac{d^2(i, j)}{2\sigma^2}\right) \tag{1}$$

where $d(i, j)$ is the Euclidean distance between $x_i$ and $x_j$, and $\sigma$ is the kernel parameter, which is fixed and cannot vary with the surroundings. Zelnik-Manor et al. proposed a local scale parameter $\sigma_i$ for each point to replace the fixed parameter $\sigma$ [14], which gives the similarity a self-tuning capability. Usually, $\sigma_i = d(x_i, x_m)$, where $x_m$ is the $m$-th closest neighbor of the point $x_i$, and the similarity function is defined as Equation (2):

$$S_{ij} = \exp\left(-\frac{d^2(x_i, x_j)}{\sigma_i \sigma_j}\right) \tag{2}$$

Jarvis et al. proposed the concept of the shared nearest neighbor, which is used to characterize the local density of different vertices [18]. Suppose that the $kd$ nearest neighbors of point $x_i$ form a set $N(x_i)$ and those of point $x_j$ form a set $N(x_j)$; then the shared neighbor vertices between $x_i$ and $x_j$ are defined as Equation (3):

$$SNN(x_i, x_j) = N(x_i) \cap N(x_j) \tag{3}$$

Li et al. assumed that vertices in the same manifold have a higher similarity and lie in a region of higher local density than those in different manifolds. They used the number of shared nearest neighbors to characterize the similarity between vertices $x_i$ and $x_j$ [15]. The resulting similarity function is defined as Equation (4):

$$S_{ij} = \exp\left(-\frac{d^2(x_i, x_j)}{\sigma_i \sigma_j \left(|SNN(x_i, x_j)| + 1\right)}\right) \tag{4}$$

According to this method, the similarity between two vertices is higher if they have more shared nearest neighbors. Due to the non-homogeneity of the network topology, the importance of each node in the network is different, and the similarity of two vertices relates not only to the number of neighbors shared, but also closely to the importance of the shared neighbor vertices.
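The three kernels of Equations (1), (2) and (4) differ only in the denominator of the exponent, which the following minimal NumPy sketch makes explicit. The function names are hypothetical, and the sketch assumes that the data points are distinct (so each point is its own nearest neighbor at distance zero) and that Equation (1) uses the common $2\sigma^2$ normalization.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix d(x_i, x_j) for the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_similarity(D, sigma):
    """Equation (1): Gaussian kernel with one fixed scale sigma."""
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))


def local_scales(D, m):
    """sigma_i: distance from x_i to its m-th closest neighbor.

    Column 0 of each sorted row is the zero distance of x_i to itself.
    """
    return np.sort(D, axis=1)[:, m]


def self_tuning_similarity(D, m):
    """Equation (2): each pair is scaled by sigma_i * sigma_j."""
    sigma = local_scales(D, m)
    return np.exp(-D ** 2 / np.outer(sigma, sigma))


def shared_neighbor_counts(D, kd):
    """Size of N(x_i) ∩ N(x_j) (Equation (3)) for every pair, using kd neighbors."""
    n = D.shape[0]
    # Assumes distinct points, so that column 0 of argsort is the point itself.
    neighbors = np.argsort(D, axis=1)[:, 1:kd + 1]
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], neighbors] = 1.0
    return member @ member.T


def snn_similarity(D, m, kd):
    """Equation (4): more shared neighbors widen the kernel, raising similarity."""
    sigma = local_scales(D, m)
    snn = shared_neighbor_counts(D, kd)
    return np.exp(-D ** 2 / (np.outer(sigma, sigma) * (snn + 1.0)))
```

Because the shared-neighbor count only enlarges the denominator, the similarity of Equation (4) is never smaller than that of Equation (2) for the same pair of points.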

The Importance of Node
Some efficient web search engines, such as Google, are based on the idea of characterizing the most important vertices in a graph representing the connections or links between pages on the web. Because the links between pages can be interpreted as relations of mutual support between pages, the importance of a page can be determined according to its links. Kleinberg et al. proposed a similar method to identify, in a set of pages relevant to a query search, the subset of pages that are good hubs or the subset of pages that are good authorities [17]. Good hubs are pages that point to good authorities, and good authorities are pages that are pointed to by good hubs. From these implicit relations, Kleinberg derived an iterative method that assigns an "authority score" and a "hub score" to every vertex of a given graph.
Given a graph $G = (V, E)$ with vertex set $V$ and edge set $E$, let $h_i$ and $a_i$ be the hub and authority scores of vertex $i$. The hub score of vertex $i$ is set equal to the sum of the authority scores of all vertices pointed to by $i$, and similarly, the authority score of vertex $i$ is set equal to the sum of the hub scores of all vertices pointing to $i$. The scores $h_i$ and $a_i$ can be calculated as Equation (5):

$$h_i = \sum_{j:(i,j) \in E} a_j, \qquad a_i = \sum_{j:(j,i) \in E} h_j \tag{5}$$

Let these scores be initialized by some positive values and then updated simultaneously for all vertices; the "authority score" and "hub score" can be obtained as the limit of a converging iterative process according to Equation (6):

$$\begin{bmatrix} h \\ a \end{bmatrix}_{k+1} = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} h \\ a \end{bmatrix}_{k}, \qquad k = 0, 1, \dots \tag{6}$$

where $B$ is the matrix whose entry $(i, j)$ is equal to the number of edges between the vertices $i$ and $j$ in $G$ (the adjacency matrix of $G$). Blondel et al. proved that, from this initial condition, the iteration of Equation (6) converges along the odd iterates and along the even iterates, respectively [16]. Once the "authority score" and "hub score" of every vertex are available, the importance score of a vertex can be calculated as $\mathrm{Im} = (h + a)$. The importance of a vertex is thus related to the vertex's out-degree, in-degree and the importance of its neighboring vertices, and it represents the structure and properties of the network. Similarly, we can utilize the vertex importance score to construct a similarity matrix in graph $G$.
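The mutual reinforcement of hub and authority scores can be sketched in a few lines of NumPy. The function names below are hypothetical. Two choices are assumptions rather than statements of the paper: the score vector is rescaled to unit length after every step (as in the formulation of Blondel et al., so that the iterates stay bounded), and the importance is read as the sum $h + a$. The steps are always taken in pairs, so the result is a limit of the even iterates.

```python
import numpy as np


def hub_authority_scores(B, max_pairs=500, tol=1e-12):
    """Iterate h <- B a, a <- B^T h and return the even-iterate limit (h, a)."""
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    h, a = np.ones(n), np.ones(n)          # positive initial scores
    for _ in range(max_pairs):
        h_prev, a_prev = h, a
        for _step in range(2):             # two steps per pass keep the count even
            h, a = B @ a, B.T @ h          # simultaneous update of Equation (5)
            norm = np.sqrt(h @ h + a @ a)
            if norm == 0.0:                # graph without edges: all scores vanish
                return h, a
            h, a = h / norm, a / norm      # rescaling is an assumption of this sketch
        if max(np.abs(h - h_prev).max(), np.abs(a - a_prev).max()) < tol:
            break
    return h, a


def importance(B):
    """Vertex importance, read here as Im = h + a (an assumed reading)."""
    h, a = hub_authority_scores(B)
    return h + a
```

For the small directed graph with edges 0→2 and 1→2, the single-step iterates alternate between two different vectors while the even iterates are constant, which illustrates why the number of iterations is taken to be even.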

Similarity Matrix Based on the Importance of Shared Nearest Neighbors
In this section, we propose a new similarity matrix construction method based on the importance of shared nearest neighbors. Within a cluster there exists a local high density area, which can be expressed by the number of shared nearest neighbors. The role of every node in this local area is different, so among the shared nearest neighbors, the more important the role of a node, the greater its impact on the graph. Though we cannot give an explicit expression of the role of each node, we hold the opinion that the importance of a vertex helps to find potential "critical nodes" and reflects the global and local importance of the node. The greater the importance of a node, the closer it is to the center of the network. Among the shared nearest neighbors of two vertices, the greater the neighbors' scores are, the more similar the two vertices are. On the basis of this idea, a new kind of similarity measure based on the importance of shared nearest neighbors is proposed. The steps of computing the similarity matrix are described in Table 1.
Assume the matrix SNEW to be the similarity matrix based on the importance of shared nearest neighbors. Its construction is similar to the adaptive Gaussian kernel function based on shared nearest neighbors, SNN; the difference is that the maximal importance among the shared nearest neighbor vertices replaces the number of shared neighbor vertices. In fact, by adjusting the parameters, SNEW can become the Gaussian kernel function described in self-tuning spectral clustering or SNN. It is worth noting that there are many possible choices among the shared neighbors, but we choose the vertex with the maximal importance, not only because the importance of a vertex expresses the structural information of the global graph, but also because the maximal importance affects the similarity between vertices in the local structure of the graph. Moreover, the shared nearest neighbors reflect the local density information, so the matrix SNEW considers both the structural attributes of the graph and the local density information, and the measure can represent the inner link between vertices more reasonably.

The adjacency matrix $B$ of Step 1 in Table 1 is given by Equation (7):

$$B(i, j) = \begin{cases} 1, & d(x_i, x_j) < TH \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $d(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$, and $TH$ is an ordinary threshold on the Euclidean distance $d$, set as the mean value of $d$.

Step 2. Set the initial hub and authority scores to positive values, and iterate an even number of times with Equation (6). Stop upon convergence and get the importance score of every vertex, $\mathrm{Im}_i$, $i = 1, \dots, n$.

Step 3. Look for the shared nearest neighbor vertices between $x_i$ and $x_j$, and find the maximal importance among them; set it as $\max\mathrm{Im}(x_i, x_j)$, $i = 1, \dots, n$; $j = 1, \dots, n$.

Step 4. Get a new kind of similarity matrix by Equation (8):

$$\mathrm{SNEW}(i, j) = \exp\left(-\frac{d^2(x_i, x_j)}{\sigma_i \sigma_j \left(\alpha \cdot \max\mathrm{Im}(x_i, x_j) + 1\right)}\right) \tag{8}$$

where $\alpha$ is a regulation parameter with $\alpha \ge 0$; 1 is added to make sure that the denominator is never zero.
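The four steps of the construction can be put together in one short NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names and default parameter values are hypothetical, the threshold TH is taken as the mean distance over distinct pairs, self-loops are removed from $B$, the importance is read as $h + a$ with the scores rescaled after each step, and the kernel has the form $\exp(-d^2 / (\sigma_i \sigma_j (\alpha \cdot \max\mathrm{Im} + 1)))$ suggested by the description of Step 4.

```python
import numpy as np


def importance_scores(B, max_pairs=500, tol=1e-12):
    """Hub plus authority score per vertex, iterating in pairs of steps."""
    h, a = np.ones(len(B)), np.ones(len(B))
    for _ in range(max_pairs):
        h_prev, a_prev = h, a
        for _step in range(2):
            h, a = B @ a, B.T @ h                    # simultaneous update
            norm = np.sqrt(h @ h + a @ a)
            if norm == 0.0:                          # no edges at all
                return h + a
            h, a = h / norm, a / norm
        if max(np.abs(h - h_prev).max(), np.abs(a - a_prev).max()) < tol:
            break
    return h + a


def snew_similarity(X, m=2, kd=3, alpha=10.0):
    """Similarity matrix scaled by the most important shared neighbor."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Step 1: threshold graph; TH is the mean pairwise distance (assumption:
    # averaged over distinct pairs, and no self-loops in B).
    TH = D[np.triu_indices(n, k=1)].mean()
    B = (D < TH).astype(float)
    np.fill_diagonal(B, 0.0)

    # Step 2: importance of every vertex.
    im = importance_scores(B)

    # Step 3: maximal importance among shared kd-nearest neighbors.
    # Assumes distinct points, so column 0 of argsort is the point itself.
    neighbors = np.argsort(D, axis=1)[:, 1:kd + 1]
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], neighbors] = True
    # shared[i, j, k] is True when x_k is a neighbor of both x_i and x_j.
    shared = member[:, None, :] & member[None, :, :]
    max_im = np.where(shared, im[None, None, :], 0.0).max(axis=2)

    # Step 4: locally scaled Gaussian kernel widened by alpha * max_im + 1.
    sigma = np.sort(D, axis=1)[:, m]
    return np.exp(-D ** 2 / (np.outer(sigma, sigma) * (alpha * max_im + 1.0)))
```

With `alpha=0.0` the sketch reduces to the self-tuning kernel, which matches the remark that SNEW can become that kernel by adjusting the parameters. The three-dimensional `shared` array costs memory cubic in the number of points, so this form is only suitable for small examples.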

An Improved Adaptive Spectral Clustering Algorithm
Spectral clustering is a clustering method that is based on graph theory and uses the eigenvectors of a data similarity matrix to perform the clustering. It can identify clusters in a data space of arbitrary shape and converges to the global optimal solution.
Let us consider a set $V$ of $N$ data points, or vertices. We write $S_{ij}$ for the similarity between the $i$-th and the $j$-th data point, and $S = (S_{ij})$ for the $N \times N$ similarity matrix. Let us define the degree $D_{ii}$ of vertex $i \in V$ by Equation (9):

$$D_{ii} = \sum_{j} S_{ij} \tag{9}$$

Without loss of generality, we assume that all vertices have non-zero degrees. Then, we write $D$ for the diagonal matrix of the degrees $D_{ii}$. One spectral clustering technique, commonly used for image segmentation, is the normalized cuts algorithm or Shi-Malik algorithm introduced by Shi and Malik [19]. It partitions points into $k$ sets based on the eigenvectors corresponding to the first $k$ biggest eigenvalues of the symmetric normalized Laplacian, defined as

$$L = D^{-1/2} S D^{-1/2}$$

We introduce the proposed similarity matrix SNEW into standard spectral clustering and thus get a new adaptive spectral clustering algorithm based on the importance of shared nearest neighbors. The steps of the improved adaptive spectral clustering algorithm are described in Table 2.

Table 2. Adaptive spectral clustering algorithm based on the importance of shared nearest neighbors.

Adaptive spectral clustering algorithm based on the importance of shared nearest neighbors:
Input: $n$ data vertices $x_1, \dots, x_n$; clustering number $K$.
Step 1. Get the similarity matrix SNEW according to the calculation steps of Table 1.
Step 2. Define $D$ to be the diagonal matrix, where $D_{ii} = \sum_j \mathrm{SNEW}(i, j)$, and compute the Laplacian matrix $L = D^{-1/2}\,\mathrm{SNEW}\,D^{-1/2}$.
Step 3. Compute the first $K$ largest eigenvalues of the Laplacian matrix and their corresponding eigenvectors, and form the matrix $U$ with these eigenvectors as columns.
Step 4. Construct the matrix $Y$ by normalizing each row in $U$, where $Y_{ij} = U_{ij} / \left(\sum_j U_{ij}^2\right)^{1/2}$.
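The steps of Table 2, including the final k-means step, can be sketched as follows for any precomputed symmetric similarity matrix. The function names are hypothetical. The k-means routine is a deliberately plain stand-in with a deterministic farthest-first initialization, chosen so the sketch has no dependency beyond NumPy; it is not the initialization used in the experiments.

```python
import numpy as np


def kmeans(Y, K, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centers = [Y[0]]
    for _ in range(1, K):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d2.argmax()])               # farthest point so far
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        updated = np.array([
            Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def spectral_clustering(S, K):
    """Cluster the vertices of a symmetric similarity matrix S into K groups."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                                   # Step 2: degrees D_ii
    d_inv_sqrt = 1.0 / np.sqrt(d)                       # assumes non-zero degrees
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]   # D^{-1/2} S D^{-1/2}
    _, eigvecs = np.linalg.eigh(L)                      # Step 3: ascending eigenvalues
    U = eigvecs[:, -K:]                                 # eigenvectors of the K largest
    Y = U / np.linalg.norm(U, axis=1, keepdims=True)    # Step 4: unit-length rows
    return kmeans(Y, K)                                 # Step 5: cluster the rows
```

Since `numpy.linalg.eigh` returns eigenvalues in ascending order, the last `K` columns of the eigenvector matrix correspond to the `K` largest eigenvalues.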

Experiments
To evaluate the performance of the adaptive spectral clustering algorithm based on the importance of shared nearest neighbors (SNNISC), experiments are conducted on synthetic datasets, on datasets from the UCI Machine Learning Repository (UCI) and on the MNIST database of handwritten digits (MNIST), in comparison with two other spectral clustering algorithms: the self-tuning spectral clustering (SSC) [14] and the adaptive spectral clustering based on shared nearest neighbors (SNNSC) [15].

Evaluation Metric
Given a dataset with $n$ samples, a clustering can be regarded as a relationship between samples: two samples are assigned either to the same cluster or to different clusters. In the following experiments, we adopt the adjusted Rand index (ARI) as the performance metric.
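The adjusted Rand index assumes the generalized hypergeometric distribution as the model of randomness, i.e., the different partitions of the objects are picked at random, such that the number of objects in the partitions to compare is fixed. The general form of ARI can be simplified as Equation (10):

$$ARI = \frac{\sum_{i,j} \binom{n_{ij}}{2} - \left[\sum_i \binom{n_{i.}}{2} \sum_j \binom{n_{.j}}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{n_{i.}}{2} + \sum_j \binom{n_{.j}}{2}\right] - \left[\sum_i \binom{n_{i.}}{2} \sum_j \binom{n_{.j}}{2}\right] \Big/ \binom{n}{2}} \tag{10}$$

where $n_{ij}$ is the number of objects that are in both cluster $i$ of the first partition and cluster $j$ of the second partition, and $n_{i.}$ and $n_{.j}$ are the numbers of objects in cluster $i$ of the first partition and in cluster $j$ of the second partition, respectively. The ARI has an expected value of zero for random partitions and reaches its maximum of one for identical partitions; it takes a wider range of values than the unadjusted Rand index, which increases the sensitivity of the index.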
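As a small worked example, the index can be computed directly from the contingency counts with the Python standard library. The function name is hypothetical, and the handling of the degenerate cases (fewer than two objects, or a zero denominator) is a convention chosen for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same objects."""
    n = len(labels_a)
    if n < 2:
        return 1.0                                   # convention for this sketch
    n_ij = Counter(zip(labels_a, labels_b))          # contingency table entries
    n_i = Counter(labels_a)                          # row sums n_i.
    n_j = Counter(labels_b)                          # column sums n_.j
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_i = sum(comb(c, 2) for c in n_i.values())
    sum_j = sum(comb(c, 2) for c in n_j.values())
    expected = sum_i * sum_j / comb(n, 2)            # value expected by chance
    maximum = 0.5 * (sum_i + sum_j)
    if maximum == expected:                          # e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (maximum - expected)
```

For the labelings `[0, 0, 1, 1]` and `[1, 1, 0, 0]` the index is 1, because the partitions are identical up to renaming of the clusters. For `[0, 0, 1, 1]` and `[0, 1, 0, 1]` it is -0.5, which shows that the index can fall below zero when the agreement is worse than expected by chance.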

Parameter Settings
In SSC, a local scale parameter $\sigma_i$ is used, which is computed as the distance from the point to its $M$-th neighbor. In our experiments, the range of $M$ is $[2, 20]$, and the value that gets the best ARI is used. SNNSC involves the parameter $kd$, the number of nearest neighbors used to find the shared nearest neighbors. The range of $kd$ is $[5, 50]$, and the value that gets the best ARI is picked. The range of $\alpha$ is $[10, 20]$. The value of $TH$ is set as the mean Euclidean distance between all vertices.

Experiments on Synthetic Datasets
As shown in Figure 1, six synthetic datasets [20] with different structures are used in the experiments, and the results are shown in Table 3. This example is used to test the ability to identify different structures on synthetic datasets. In Table 3, the average value shows the average performance of the algorithms on the different datasets, and the best value is marked in boldface. It can be seen from Table 3 that SSC, SNNSC and SNNISC get similar results on all the datasets (about 97%, except on the fourth dataset), which indicates that the proposed similarity measure can effectively identify different synthetic datasets.

Experiments on UCI Datasets
To test the performance of SNNISC further, eight real-world datasets for classification and clustering are adopted from the UCI datasets [21][22][23][24][25][26][27][28][29], and the results are shown in Table 4. From the boldface values in Table 4, we observe that the clustering performance of SNNISC is superior to SSC and SNNSC on four datasets in addition to "Breast Tissue" and "Data Bank". In particular, in the dataset "Iris", one cluster is linearly separable from the other two, which are not linearly separable from each other; this is challenging for clustering algorithms. Although the ARI values of SSC and SNNSC reach about 83%, SNNISC achieves 92%. On dataset "Seeds", SNNISC, SNNSC and SSC get the same ARI value (71%).
On dataset "Glass", the ARI value of SNNISC (24%) is lower than that of SSC (27%), but better than that of SNNSC (23%). Meanwhile, SNNISC appears more stable: its result is lower than the best result by only 0.2% to 0.3%. Therefore, we conclude that SNNISC can improve the performance of the spectral clustering algorithm.

Conclusions
The construction of a similarity matrix is important for spectral clustering algorithms. In this paper, we propose an adaptive Gaussian kernel similarity measure and its corresponding spectral clustering algorithm. The algorithm borrows the notion of node importance from complex networks and uses an iterative method to obtain the numerical importance of the different vertices, which is then used to scale the Gaussian kernel function. The new measure exploits the structural information of the neighborhood and the local density information, and it reflects the idea that the greater the importance of the shared neighbors between two vertices is, the more likely these two vertices are to belong to the same cluster. From the experiments on different datasets, we observe that it achieves improvements over the self-tuning spectral clustering algorithm and the adaptive spectral clustering algorithm based on shared nearest neighbors on most datasets and that it is less sensitive to the parameters. In this paper, we mainly consider the impact on the similarity of the vertex with maximal importance among the shared nearest neighbors; one important direction for future work is to investigate the impact of the other vertices among the shared nearest neighbors.

Step 5. Treat each row of $Y$ as a vertex in the space $R^K$ and cluster the rows into $K$ clusters via k-means or other clustering algorithms to obtain the final clustering results.

Table 1. The algorithm of similarity matrix based on the importance of shared nearest neighbors.

Similarity matrix based on the importance of shared nearest neighbors:
Output: similarity matrix SNEW.
Step 1. Construct an adjacency matrix $B$ of graph $G$ according to Equation (7). The construction of the adjacency matrix $B$ can be similar to the $\varepsilon$-neighborhood technique.

Table 3. The results of adjusted Rand index (ARI) on synthetic datasets. SSC, self-tuning spectral clustering; SNNSC, spectral clustering based on shared nearest neighbors; SNNISC, the adaptive spectral clustering algorithm based on the importance of shared nearest neighbors.

Table 4. The results of ARI on the UCI Machine Learning Repository.

Table 5. Mean and standard deviation of ARIs of different spectral clustering methods.