Multiview Data Clustering with Similarity Graph Learning Guided Unsupervised Feature Selection

In multiview data clustering, exploiting the consistent and complementary information across views can yield better clustering results. However, the high dimensionality, lack of labels, and redundancy of multiview data degrade the clustering effect, posing a challenge to multiview clustering. This study designs a clustering algorithm based on multiview feature selection clustering (MFSC), which combines similarity graph learning and unsupervised feature selection. In the MFSC implementation, local manifold regularization is integrated into similarity graph learning, and the clustering label from similarity graph learning serves as the criterion for unsupervised feature selection. MFSC can retain the characteristics of the clustering label while maintaining the manifold structure of the multiview data. The algorithm is systematically evaluated on benchmark multiview and simulated data. The clustering experiment results show that the MFSC algorithm is more effective than traditional algorithms.


Introduction
Various application types correspond to various network attributes that describe individuals and groups from different perspectives. These networks are represented as multiview feature spaces. For example, when uploading photos to Flickr, users are required to provide labels and related text. In other words, a photo can be represented in three view feature spaces: photo content, label, and text description.
Multiview data can integrate these view spaces and exploit their correlations to obtain more accurate network representations. Currently, multiview data are usually described in the form of graphs, such as Gaussian function graphs, k-nearest-neighbor graphs [1], and graphs based on subspace clustering [2,3]. In terms of selecting the correct neighborhood size and handling points near the intersection of subspaces, subspace clustering based on self-representation is superior to other graph-based representation methods. Nie et al. developed multiview clustering algorithms [4,5] that perform spectral clustering of an information network with multiple views by constructing a multiview similarity matrix. The multiview clustering algorithm [6] proposed by Bickel et al. uses spherical k-means for multiview clustering. Pu et al. proposed a multiview clustering algorithm [7] based on matrix decomposition, which regularizes the similarity matrix using multiview manifold regularization to merge the inherent nonlinear structure of the network in every view. The aforementioned methods improve clustering performance [8] by modeling the relationships between multiview data through multiview similarity matrix clustering. However, the redundancy of multiview data has not yet been resolved. In addition, constructing a multiview similarity matrix is computationally expensive and unsuitable for large-scale multiview data.
Feature selection [9] obtains a low-dimensional feature subspace representation of the network by selecting informative features and removing noisy, irrelevant, and redundant ones while preserving the inherent data structure. It is an effective method for handling large-scale high-dimensional networks. Most existing feature selection methods are designed for single-view networks. Recently, unsupervised feature selection research has focused on multiview data. Zhang et al. [10] proposed a formulation that learns an adaptive neighbor graph for unsupervised multiview feature selection; it collaborates multiview features and discriminates between different views. Fang et al. [11] proposed a novel approach that incorporates both the cluster structure and a similarity graph; their method combines multiview feature selection with an orthogonal decomposition technique that factorizes each objective matrix into a basis matrix and a clustering index matrix for each view. Cao et al. [12] presented a cluster-learning-guided multiview unsupervised feature selection that unifies subspace learning, cluster learning, and feature selection in one framework. Tang et al. [13] proposed a feature selection method for multiview data that maintains diversity and enhances consensus learning by utilizing cross-view local structures. Liu et al. [14] proposed a guided unsupervised feature selection framework that utilizes consensus clustering to generate pseudo cluster indexes for feature selection.
There are two modes of feature selection in multiview networks. One is the serial mode, which concatenates the multiview feature spaces into a single feature space and then selects features. The other is the parallel mode, which performs feature selection on each view simultaneously. The serial mode ignores the differences between heterogeneous feature spaces, so its performance is relatively poor. The parallel mode considers the correlation between the view spaces and performs relatively better. Unsupervised feature selection for multiview data without labels remains a significant challenge. Among traditional unsupervised feature selection methods, the Laplacian score [15] selects features whose distribution agrees with the sample distribution, which enables good regional classification and reflects the inherent manifold structure of the data. However, it does not evaluate the correlation between features, resulting in the selection of redundant features. In the MFSC method, spectral analysis retains the internal structure, and the l2,1 norm on the feature selection coefficients selects the best features. Therefore, the selected features retain the clustering structure of the data.
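As a concrete illustration of the Laplacian score criterion discussed above, the following sketch computes the standard score L_r = (f̃ᵀLf̃)/(f̃ᵀDf̃) for each feature over a given similarity graph; the graph construction and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def laplacian_score(X, W):
    """Laplacian score of each feature (lower = better).

    X: (n, d) array of samples; W: (n, n) nonnegative similarity graph."""
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    d = np.diag(D)
    scores = []
    for f in X.T:
        f_t = f - (f @ d) / d.sum()     # remove the degree-weighted mean
        scores.append((f_t @ L @ f_t) / (f_t @ D @ f_t + 1e-12))
    return np.array(scores)
```

A feature that is piecewise constant over the graph's connected components gets a score of 0, matching the intuition that its distribution agrees with the sample distribution; note that nothing in the score penalizes correlated features, which is exactly the redundancy problem described above.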
The MFSC algorithm proposed in this study makes the following contributions:
1. Compared with a single-view dataset that concatenates multiview data, the parallel use of multiview datasets from real-world social media sites significantly improves the accuracy of data representation.
2. In the integrated subspace clustering and feature selection, manifold regularization is applied to the clustering label and the representation coefficient matrix. Furthermore, to obtain a more suitable feature selection matrix, the manifold-structure prior is embedded in the feature selection model.
3. In the construction of the parallel-mode multiview feature selection algorithm, noisy, irrelevant, and redundant features are removed to preserve the inherent data structure and improve the efficiency and quality of clustering-based feature selection, which is more suitable for multiview data.
The rest of this paper is organized as follows. Section 2 introduces the basic studies related to the MFSC algorithm. Section 3 presents the MFSC model and its iterative optimization process in detail and theoretically analyzes the convergence and complexity of the algorithm. Section 4 reports the parameter sensitivity and performance analysis of MFSC on typical datasets, as well as comparison experiments with several single-view and multiview feature selection methods. Section 5 summarizes this study and outlines future work.

Multiview Subspace Representation
Let X^(v) be the data sample matrix of the v-th view and S^(v) its representation coefficient matrix. Each data point in a union of subspaces can be reconstructed effectively as a combination of the other points in the dataset. Given data X, the grouping effect [16] implies that if samples X_i and X_j are similar, then their representation coefficients S_i and S_j are also similar. The multiview representation of traditional sparse subspace clustering (MVSC) [17] is defined accordingly. MVSC can capture the self-representation matrix in the multi-subspace k-nearest-neighbor graph structure, and the similarity graph of the v-th view can learn the multi-subspace structure even when the data contain noise, outliers, or damaged or missing entries.
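The MVSC objective itself is not rendered in this extraction; as a sketch, the standard per-view sparse self-representation problem that such formulations build on can be written as

```latex
\min_{S^{(v)}} \; \bigl\| X^{(v)} - X^{(v)} S^{(v)} \bigr\|_F^2
+ \lambda \bigl\| S^{(v)} \bigr\|_1
\quad \text{s.t.} \quad \operatorname{diag}\bigl(S^{(v)}\bigr) = 0,
```

where each sample of view v is reconstructed from the other samples of the same view and λ controls sparsity; the exact MVSC formulation in [17] may add cross-view coupling terms.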

Multiview Unsupervised Feature Selection
Most existing studies on multiview learning [18] assume that all views share the same label space and are related to each other through it. The main difficulty of unsupervised feature selection is the lack of class labels. Consequently, the concept of a pseudo-class label is introduced to guide the framework using the relationships between views: the v-th view has a mapping matrix W^(v) that assigns the pseudo-class label C to the data points. Based on the assumption that each view is associated with the shared label space, each pseudo-class label allocation matrix (X^(v))^T W^(v) is approximated so that it is close to the pseudo-class label matrix. The l2,1 norm [19] is added on W^(v) to ensure row sparsity and enable feature selection. In addition, the l2,1 norm is convex, which eases the optimization.
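The pseudo-class label formulation described above is consistent with the following sketch (the regularization weight β and the exact constraint set are assumptions inferred from the surrounding text):

```latex
\min_{\{W^{(v)}\},\, C} \;
\sum_{v=1}^{m} \Bigl\| \bigl(X^{(v)}\bigr)^{T} W^{(v)} - C \Bigr\|_F^2
+ \beta \sum_{v=1}^{m} \bigl\| W^{(v)} \bigr\|_{2,1}
\quad \text{s.t.} \quad C^{T} C = I_k,\; C \ge 0,
```

where the first term ties each view's label assignment to the shared pseudo-label matrix C and the l2,1 term zeroes out entire rows of W^(v), i.e., discards features.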

Multiview Manifold Structure
The greater the similarity value of two data points, the more likely they belong to the same cluster. A similarity structure graph with k unconnected cluster subspaces can be learned directly, where the clustering label C ∈ R^{n×k} and the Laplacian matrix is L = D − (S + S^T)/2, with D the diagonal degree matrix whose i-th diagonal element is Σ_j (s_ij + s_ji)/2. MHOAR [20] points out that the properties of the Laplacian L of a nonnegative matrix S are given in Theorem 1.
Theorem 1. The number of zero eigenvalues of the normalized Laplacian L equals the number of connected subspaces of S; therefore, rank(L) = n − k. According to the Ky Fan theorem [21], letting σ_i(L) denote the i-th smallest eigenvalue of L, we have σ_i(L) ≥ 0 and
$$\sum_{i=1}^{k} \sigma_i(L) \;=\; \min_{C \in \mathbb{R}^{n \times k},\; C^{T} C = I_k} \operatorname{tr}\bigl(C^{T} L C\bigr).$$
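The similarity graph learning objective this section builds toward is not rendered in the extraction; a sketch of the usual form, consistent with the Laplacian and trace terms above (γ and the simplex constraint are assumptions), is

```latex
\min_{S,\,C} \; \sum_{i,j} \bigl\| x_i - x_j \bigr\|_2^2 \, s_{ij}
+ \gamma \, \|S\|_F^2
+ \alpha \operatorname{tr}\bigl(C^{T} L C\bigr)
\quad \text{s.t.} \quad s_{ij} \ge 0,\; S\mathbf{1} = \mathbf{1},\; C^{T} C = I_k,
```

where L = D − (S + Sᵀ)/2. By Theorem 1 and the Ky Fan theorem, driving tr(CᵀLC) toward zero pushes rank(L) toward n − k, i.e., toward a graph with exactly k connected components.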

Proposed Model
This section introduces the MFSC model and explains the iterative optimization procedure, the algorithm, the proof of convergence, and the complexity analysis. An illustration of the MFSC model is shown in Figure 1. Multiview unsupervised feature selection, similarity graph learning, and clustering index learning are achieved in the parallel mode. MFSC reduces the influence of redundant and irrelevant information in multiview data and uses the clustering index as the feature selection criterion to ensure that the clustering structure remains unchanged.

MFSC Model
Let {X^(v)}_{v=1}^m denote the multiview data, where X^(v) denotes the data of the v-th view, d^(v) denotes the number of features of the v-th view, and n denotes the number of data points; W^(v) ∈ R^{d^(v)×k} is the feature selection matrix of the v-th view. The model independently learns the S^(v) of each view instead of directly using an S^(v) computed by a kernel function. Using the similarity graph of self-representation learning based on the manifold structure, the multi-subspace structure of the data can be effectively reflected. By integrating subspace similarity graph learning and feature selection, the pseudo-class label C can capture the relationships between the views to obtain a robust and clean pseudo-class label. Row sparsity is achieved by applying the l2,1 norm [22] constraint to W^(v). Figure 1 shows the parallel-mode feature selection, which iteratively updates the similarity matrices {S^(v)}_{v=1}^m, the feature selection matrices {W^(v)}_{v=1}^m, and the pseudo label matrix C.
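Equation (4) is not rendered in this extraction; a plausible composite of the three components described above (self-representation graph learning, pseudo-label regression with l2,1 sparsity, and manifold regularization), matching the parameters α and β analyzed in the experiments, is

```latex
\min_{\{S^{(v)}\},\,\{W^{(v)}\},\,C} \;
\sum_{v=1}^{m} \Bigl(
\bigl\| X^{(v)} - X^{(v)} S^{(v)} \bigr\|_F^2
+ \bigl\| (X^{(v)})^{T} W^{(v)} - C \bigr\|_F^2
+ \beta \bigl\| W^{(v)} \bigr\|_{2,1}
+ \alpha \operatorname{tr}\bigl( C^{T} L^{(v)} C \bigr)
\Bigr)
\quad \text{s.t.} \quad C^{T} C = I_k,\; C \ge 0,
```

where L^(v) is the Laplacian of S^(v). This should be read as a reconstruction consistent with the surrounding description, not the paper's verbatim Equation (4).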

Optimization Calculation Process and Algorithm Representation
This section introduces an effective iterative method for solving the optimization problem in Equation (4). In the implementation, W^(v), S^(v), and C are updated iteratively, yielding the specific implementation of the MFSC algorithm.

Update W^(v):
To effectively compute the feature selection matrix W^(v), the irrelevant terms S^(v) and C are fixed, and the objective function is rewritten with respect to W^(v) alone. Because the l2,1 term is non-differentiable [19], the problem is transformed into a weighted form, where D^(v) denotes a diagonal matrix whose j-th diagonal element is d_jj = 1/(2||w_j||_2), with w_j the j-th row of W^(v). The update rule for W^(v) then follows in closed form, where f_i denotes the i-th row vector of a matrix F.
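The W^(v) update described above is an instance of iteratively reweighted least squares (IRLS) for an l2,1-regularized regression. The following sketch solves min_W ||XᵀW − C||²_F + β||W||_{2,1} for a single view; the closed-form step and the diagonal reweighting follow the standard derivation, and all names are illustrative rather than the paper's.

```python
import numpy as np

def l21_regression(X, C, beta=1.0, n_iter=30, eps=1e-8):
    """IRLS sketch for min_W ||X^T W - C||_F^2 + beta * ||W||_{2,1}.

    X: (d, n) features-by-samples; C: (n, k) pseudo-label matrix.
    Returns W: (d, k); rows with large l2 norm mark selected features."""
    d = X.shape[0]
    XXt, XC = X @ X.T, X @ C
    W = np.linalg.solve(XXt + beta * np.eye(d), XC)   # ridge warm start
    for _ in range(n_iter):
        # reweighting: D_jj = 1 / (2 * ||w_j||_2), smoothed by eps
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(XXt + beta * D, XC)       # closed-form step
    return W
```

Ranking features by `np.linalg.norm(W, axis=1)` then gives per-view feature scores of the kind used in the parallel mode.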
Update S^(v):
To effectively compute the similarity matrix S^(v), the irrelevant terms W^(v) and C are fixed, and the objective function is rewritten accordingly. Based on the property of the matrix trace tr(X^T Y) = tr(X Y^T) and Theorem 2, P is a symmetric matrix. Given that tr((S^(v))^T P) = Σ_i S^(v)(i,:) P(:,i), where S^(v)(i,:) denotes the i-th row vector of S^(v) and P(:,i) denotes the i-th column vector of P, the objective decomposes row-wise, yielding an objective vector expression for each S^(v)(i,:); an equivalent form differs from it only by a constant. Setting the derivative with respect to S^(v)(i,:) to zero gives the closed-form row update.

Update C:
To effectively compute the clustering label C, W^(v) and S^(v) are fixed and the irrelevant terms are ignored, so the optimization problem can be rewritten with respect to C alone. To remove the orthogonality constraint, a penalty term p||C^T C − I_k||_F^2 is added to the objective. The Lagrange multiplier φ is then introduced to handle the nonnegativity constraint, giving the Lagrangian function ω(C, φ). Taking the derivative of ω(C, φ) with respect to C and setting it to zero yields φ. Based on the Karush-Kuhn-Tucker condition [23] φ_ij C_ij = 0, the multiplicative update formula for C is obtained. After updating C, it must be normalized to satisfy the constraint C^T C = I_k.
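The multiplicative update obtained from the KKT condition follows the standard pattern: splitting the gradient of ω(C, φ) into nonnegative parts, ∇_C ω = [∇]⁺ − [∇]⁻ (the specific numerator and denominator matrices depend on the exact objective, which is not fully recoverable from this extraction), the condition φ_ij C_ij = 0 yields

```latex
C_{ij} \;\leftarrow\; C_{ij} \, \frac{[\nabla]^{-}_{ij}}{[\nabla]^{+}_{ij}},
```

which keeps C entrywise nonnegative and drives the KKT residual to zero at a stationary point.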

Convergence
Theorem 3. The iterative update of W^(v) monotonically decreases the objective value J1(W^(v)) until convergence.
Proof. With the other variables fixed, the objective function J1(W^(v)) depends only on W^(v). The Hessian of the first term of J1(W^(v)) is positive semidefinite, and the Hessian of the second term is also positive semidefinite; hence, under the update rule, each update of W^(v) does not increase J1(W^(v)), and the objective value decreases monotonically. The proof is completed.
Similarly, with the other variables fixed, the objective function J2(S^(v)) depends only on S^(v); its Hessian is positive semidefinite, so the row-wise update of S^(v) does not increase J2(S^(v)). Finally, fixing the other variables and updating C, the Hessian of the corresponding objective J3(C) is positive semidefinite as well, so the update of C does not increase the objective. The proof is complete.

Complexity Analysis
This section analyzes the time complexity of the three subproblems in the optimization model. In subproblem J1(W^(v)), forming the required matrix and computing its inverse requires O(n(d^(v))^3). In subproblem J2(S^(v)), each row of S^(v) requires matrix multiplications with time complexity O(n^2 d^(v)); therefore, the total time complexity of this subproblem is O(Σ_{v=1}^m n^3 d^(v)). In subproblem J3(C), computing the term (X^(v))^T W^(v) requires O(d^(v) × n × k), and computing the terms L^(v) C and C C^T C requires O(n^2 k).

Experiment
This section evaluates the MFSC algorithm on several benchmark multiview datasets and compares its performance with those of related algorithms.

Dataset
The evaluation experiments of the MFSC algorithm were conducted on 5 real multiview datasets: the news dataset 3sources, the paper dataset Cora, the information retrieval and research dataset CiteSeer, the website dataset BBCSport, and the blog website dataset BlogCatalog. Table 1 summarizes the 5 datasets. The specific information is as follows:
1. 3sources. This news dataset comes from three online news sources: BBC, Reuters, and Guardian. All articles are represented in text form. Of a total of 948 articles from the three sources, 169 are adopted. Each article in the dataset has a main theme.
2. Cora. The paper dataset contains a total of 2708 sample points divided into 7 categories. Each sample point is a scientific paper represented by a 1433-dimensional word vector.
3. CiteSeer. The papers in this information retrieval and research dataset are divided into six categories, containing a total of 3312 papers, and the dataset records the citation relationships between papers. After preprocessing, 3703 unique words are obtained.
4. BBCSport. The website dataset consists of 544 data points from the BBC sports website, covering sports news in 5 subject areas (athletics, cricket, football, rugby, and tennis) with 2 related views.
5. BlogCatalog. BlogCatalog is a social blog directory that manages bloggers and their blogs. The data consist of 10,312 articles divided into 6 categories; each article has two views: the blog content and its related tags.

Benchmark Method
MFSC is compared with the following algorithms:
1. LapScore (the Laplacian score) selects features with strong separability, where the distribution of feature vector values is consistent with the sample distribution, thereby reflecting the inherent manifold structure of the data.
2. Relief is a multiclass feature selection algorithm. The larger the weight of a feature, the stronger its classification ability; features with weights below a certain threshold are removed.
3. MCFS [24] (multicluster feature selection) uses the spectral method to preserve the local manifold topology and selects features with a method that preserves the clustering topology.
4. PRMA [7] (probabilistic robust matrix approximation) is a multiview clustering algorithm with robust regularized matrix approximation. Robust norms and manifold regularization are used to regularize the matrix factorization, making the model more discriminative in multiview data clustering.
5. SCFS [3] (subspace-clustering-based feature selection) is an unsupervised feature selection method based on subspace clustering that maintains similarity relations by learning a low-dimensional subspace representation of the samples.
6. JMVFG [11] (joint multiview unsupervised feature selection and graph learning) proposes a unified objective function that simultaneously learns the clustering structure and the global and local similarity graphs.
7. CCSFS [12] (consensus cluster structure guided multiview unsupervised feature selection) unifies subspace learning, clustering learning, consensus learning, and unsupervised feature selection into one optimization framework for mutual optimization.

Evaluation Metrics
ACC (accuracy) compares the obtained cluster labels cluster_label_i with the real cluster labels truth_label_i, where m denotes the total number of data samples. NMI (normalized mutual information) is the mutual information entropy between the obtained and real cluster labels, where n_i denotes the number of samples in cluster C_i (1 ≤ i ≤ K) and n_{i,j} denotes the number of samples in both cluster C_i and category C_j.
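Since the metric equations are not rendered in this extraction, the following sketch implements the two standard definitions: ACC under the best one-to-one mapping between cluster and class labels, and NMI from the contingency counts n_{i,j}. The brute-force label mapping assumes a small, equal number of cluster and class labels.

```python
import numpy as np
from itertools import permutations

def clustering_acc(truth, pred):
    """ACC: fraction of samples correct under the best one-to-one label
    mapping (enumeration is fine for small cluster counts)."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    pred_labels, truth_labels = np.unique(pred), np.unique(truth)
    best = 0.0
    for perm in permutations(truth_labels):
        mapping = dict(zip(pred_labels, perm))
        mapped = np.array([mapping[p] for p in pred])
        best = max(best, float(np.mean(mapped == truth)))
    return best

def clustering_nmi(truth, pred):
    """NMI: mutual information of the two labelings over the geometric
    mean of their entropies, computed from contingency counts n_{i,j}."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    n = len(truth)
    mi = 0.0
    for ci in np.unique(pred):
        for cj in np.unique(truth):
            n_ij = np.sum((pred == ci) & (truth == cj))
            if n_ij > 0:
                n_i, n_j = np.sum(pred == ci), np.sum(truth == cj)
                mi += (n_ij / n) * np.log(n * n_ij / (n_i * n_j))
    ent = lambda lab: -sum(
        (np.sum(lab == c) / n) * np.log(np.sum(lab == c) / n)
        for c in np.unique(lab))
    return mi / np.sqrt(ent(pred) * ent(truth) + 1e-12)
```

Both metrics are invariant to how the clusters are numbered, which is why the best label mapping (for ACC) and mutual information (for NMI) are used instead of raw label agreement.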

Results of Multiview Clustering
Tables 2 and 3 show the ACC and NMI values of the different feature selection and multiview clustering methods. To determine the impact of each benchmark feature selection method on clustering, this experiment first merges the selected multiview features into new data and then runs k-means. The final value is the average over clustering runs with different numbers of selected features. Based on the experimental results, MFSC performs well on both ACC and NMI, which demonstrates the effectiveness of the algorithm.
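The evaluation protocol described above (rank features per view, keep the top ones, merge the reduced views, then run k-means) can be sketched as follows; the ranking by l2 row norms of W^(v) and the plain Lloyd's k-means with farthest-point initialization are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):           # spread initial centers apart
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def cluster_selected_features(views, weights, top, k):
    """Keep the `top` features with the largest l2 row norms of W^(v)
    in each view, concatenate the reduced views, and run k-means."""
    kept = [Xv[np.argsort(-np.linalg.norm(Wv, axis=1))[:top]]
            for Xv, Wv in zip(views, weights)]
    Z = np.vstack(kept).T            # samples x selected features
    return kmeans(Z, k)
```

The resulting labels would then be scored with ACC and NMI against the ground truth, averaged over several feature-count settings as described above.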

Parameter Analysis
To achieve peak clustering performance, we tune the parameters α, β, and feature#. We vary their values to observe how they affect the clustering ACC and NMI on the 3sources, Cora, CiteSeer, BBCSport, and BlogCatalog datasets.
Figures 2-4 show the clustering experiment results for parameters α, β, and feature# on the 3sources dataset.
Figure 2 describes how α and β affect the clustering indexes ACC and NMI on 3sources. The average value is taken as the final result. Based on the ACC and NMI results on 3sources, the MFSC algorithm is sensitive to parameters α and β: when α is small, the ACC is relatively high, and when β is large, the NMI is relatively high. Figure 3 describes how α and feature# affect the ACC and NMI on 3sources. In most cases, when α = 0.001, the ACC and NMI of MFSC perform better across feature selection dimensions, which shows the importance of capturing the multiview manifold structure and embedding it into the feature selection model.
Figure 4 describes how β and the feature number affect the clustering ACC and NMI on 3sources. MFSC is sensitive to the number of selected features: as it increases, the ACC and NMI increase. In most cases, when β = 10,000, the ACC and NMI of MFSC perform better. To ensure the sparsity of matrix W, the larger the feature selection value, the greater the importance of the selected features and the stronger the clustering performance. Figure 5a shows that on Cora the ACC is insensitive to parameter α and insensitive to parameter β on the intervals β ∈ [0.000001, 0.001] and β ∈ {0.01, 1, 10, 100, 10000}. Figure 5b shows that the NMI is insensitive to α but sensitive to β; when β ≥ 1, the NMI value is larger. Figure 6 shows the clustering results for α and feature# on the Cora dataset. As depicted in Figure 6a, the ACC increases as α and feature# increase. In Figure 6b, the NMI is sensitive to feature#: larger feature# values generally give a larger NMI. Figure 8a shows that on CiteSeer the ACC is insensitive to α and β, while Figure 8b shows that the NMI is insensitive to α ∈ {0.001, 0.01, 1, 10, 100} and β ∈ {0.01, 0.1, 1, 100}; when the two parameters are larger or smaller, the NMI fluctuates slightly. Figure 9 shows the clustering results for α and feature# on the CiteSeer dataset. As illustrated in Figure 9a, the magnitude of the ACC variation is 0.05 and the ACC is insensitive to α; for feature# = 300, the ACC performs better. As shown in Figure 9b, the NMI is slightly sensitive to the parameters; for α = 1 and feature# = 200, a larger NMI is achieved. Figure 10 shows the clustering results for β and feature# on the CiteSeer dataset. As demonstrated in Figure 10a, the magnitude of the ACC variation is 0.05 and the ACC is insensitive to β; in general, performance is better when β ∈ {0.001, 0.01, 1}, and the ACC is more stable when feature# > 200. Figure 10b shows that the NMI is almost insensitive to β and feature# on CiteSeer; when β = 0.001, the NMI is larger. Figure 11a shows that the ACC is insensitive to α and β on the BBCSport dataset. Figure 11b shows that the NMI is insensitive to α but changes slightly when β ≥ 1; the NMI peaks at β = 1. Figure 14a shows that on BlogCatalog the ACC is insensitive to α but performs better when β = 1000. Figure 14b shows the NMI for α and β: the NMI is not very sensitive to either, and when β is larger and α is smaller, the NMI is relatively larger.
Figure 15a shows the ACC for α and feature# on BlogCatalog: when feature# = 1500 and α > 10, the ACC performs better. Figure 15b shows that the NMI decreases with feature# and is not sensitive to α; Figure 15 indicates that a higher feature# is not necessarily better for the BlogCatalog data. Figure 16 shows that β and feature# affect the clustering performance. In Figure 16a, when feature# ≥ 1500 and β ≥ 1, the ACC is better. In Figure 16b, the NMI increases with β; when β ≥ 10, the NMI is larger. Parameter sensitivity remains a challenging and unsolved problem in feature selection. This experiment analyzed the sensitivity of parameters α, β, and feature#, performing similar analyses across the data sources. The results show that the ACC of MFSC is almost insensitive to α and β, which shows the importance of capturing the multiview manifold structure embedded in the feature selection model. However, MFSC is sensitive to feature#, because the network size affects the appropriate number of selected features.

Convergence Analysis
The convergence behavior on the datasets is shown in Figure 17; the convergence behavior on the remaining data is similar. Based on the experimental results, the convergence is relatively good: the objective function value decreases monotonically as the number of iterations increases and converges within a small number of iterations.

Conclusions and Future Work
This study proposes a multiview clustering-guided feature selection algorithm for multiview data, which integrates subspace learning and feature selection and embeds the manifold regularization norm. The feature selection algorithm reduces the influence of redundant and irrelevant information in the multiview data. In addition, clustering is used as the criterion for feature selection, so the algorithm can perform feature selection while keeping the clustering structure unchanged. Notably, the complementary contribution of each view is fully considered. The optimization process is derived and theoretically analyzed, and experiments are performed on multiview datasets. The algorithm proves effective and superior to many existing feature selection and multiview clustering algorithms.
Although our method achieves good clustering performance, it has two limitations: on the one hand, we mainly consider social network data, leaving the graph structures of other types of multimodal data unaddressed; on the other hand, some parameters need to be tuned manually. Recently, deep learning has demonstrated excellent feature extraction capabilities on multiview data such as images and natural language. In the future, we will study how to integrate deep learning with the MFSC model to process multiview data and accurately describe semantic information.

Figure 1.
Figure 1. Overall framework based on the parallel mode in MFSC.
The clustering label is C ∈ R^{n×k} and the subspace representation coefficient is S^(v) ∈ R^{n×n}, where k denotes the cluster number. The MFSC model is defined in Equation (4).

Theorem 4.
The iterative optimization process of Algorithm 1 automatically reduces the value of the objective function (4) until it converges.

Figure 2.
Figure 2. (a) ACC values of parameter α and parameter β for 3sources data. (b) NMI values of parameter α and parameter β for 3sources data.

Figure 3.
Figure 3. (a) ACC values of parameter α and parameter feature# for 3sources data. (b) NMI values of parameter α and parameter feature# for 3sources data.

Figure 4.
Figure 4. (a) ACC values of parameter β and parameter feature# for 3sources data. (b) NMI values of parameter β and parameter feature# for 3sources data.

Figure 5.
Figure 5. (a) ACC values of parameter α and parameter β for Cora data. (b) NMI values of parameter β and parameter feature# for Cora data.

Figure 6.
Figure 6. (a) ACC values of parameter α and parameter feature# for Cora data. (b) NMI values of parameter β and parameter feature# for Cora data.

Figure 7.
Figure 7 depicts the clustering results of parameters β and feature# in the Cora dataset. As depicted in Figure 7a, for β ≤ 0.01, the ACC increases as feature# increases; otherwise, for β > 0.01, the ACC value remains basically unchanged. Figure 7b shows that, when feature# = 100 and 500, the NMI value is larger.

Figure 8.
Figure 8. (a) ACC values of parameter α and parameter β for CiteSeer data. (b) NMI values of parameter α and parameter β for CiteSeer data.

Figure 9.
Figure 9. (a) ACC values of parameter α and parameter feature# for CiteSeer data. (b) NMI values of parameter α and parameter feature# for CiteSeer data.

Figure 10.
Figure 10. (a) ACC values of parameter β and parameter feature# for CiteSeer data. (b) NMI values of parameter β and parameter feature# for CiteSeer data.

Figures 11-13.
Figures 11-13 show the clustering experiment results of parameters α, β, and feature# in the BBCSport dataset.

Figure 11.
Figure 11. (a) ACC values of parameter α and parameter β for BBCSport data. (b) NMI values of parameter α and parameter β for BBCSport data.

Figure 12.
Figure 12a shows the clustering ACC of parameters α and feature# in the BBCSport dataset. The magnitude of the ACC value difference in this figure is 0.05, and the ACC is insensitive to parameters α and feature#.

Figure 13.
Figure 13a shows the clustering ACC of parameters β and feature# in the BBCSport dataset. The magnitude of the ACC value difference in this figure is 0.05, and the ACC is insensitive to parameters β and feature# in the BBCSport dataset. Comparatively, the ACC is high when feature# ≥ 300 and β = 10,000. Figure 13b shows the clustering NMI of parameters β and feature# in the BBCSport dataset. The NMI is sensitive to feature#, and the NMI is larger when feature# ≥ 300 and β ≥ 100.

Figures 14-16.
Figures 14-16 show the clustering results of parameters α, β, and feature# in the BlogCatalog dataset.

Figure 14.
Figure 14. (a) ACC values of parameter α and parameter β for BlogCatalog data. (b) NMI values of parameter α and parameter β for BlogCatalog data.

Figure 15.
Figure 15. (a) ACC values of parameter α and parameter feature# for BlogCatalog data. (b) NMI values of parameter α and parameter feature# for BlogCatalog data.

Figure 16.
Figure 16. (a) ACC values of parameter β and parameter feature# for BlogCatalog data. (b) NMI values of parameter β and parameter feature# for BlogCatalog data.

Table 1.
Statistical table of typical datasets.

Table 2.
ACC of different methods on typical datasets.

Table 3.
NMI of different methods on typical datasets.