Kernel Block Diagonal Representation Subspace Clustering with Similarity Preservation

Abstract: Subspace clustering methods based on the low-rank and sparse model are effective for high-dimensional data clustering. However, most existing low-rank and sparse methods with self-expression can effectively deal only with linearly structured data and cannot handle data with complex nonlinear structure. Although kernel subspace clustering methods can efficiently deal with nonlinearly structured data, some similarity information between samples may be lost when the original data are reconstructed in the kernel space. Moreover, these kernel subspace clustering methods may not obtain an affinity matrix with an optimal block diagonal structure. In this paper, we propose a novel subspace clustering method termed kernel block diagonal representation subspace clustering with similarity preservation (KBDSP). KBDSP makes three contributions: (1) an affinity matrix with block diagonal structure is generated by introducing a block diagonal representation term; (2) a similarity-preserving regularizer is constructed and embedded into our model by minimizing the discrepancy between inner products of the original data and inner products of the reconstructed data in the kernel space, which better preserves the similarity information between the original data; (3) the KBDSP model is formed by integrating the block diagonal representation term and the similarity-preserving regularizer into the kernel self-expression framework. The resulting optimization problem is solved efficiently with the alternating direction method of multipliers (ADMM). Experimental results on nine datasets demonstrate the effectiveness of the proposed method.


Introduction
In real applications, high-dimensional datasets need to be processed [1,2]. In recent years, subspace clustering has become a hot research topic and has been used to process high-dimensional data efficiently [3][4][5]. Subspace clustering has been widely applied in many areas, such as computer vision [6], image representation and compression [7], motion segmentation [8], and face clustering [9]. Existing subspace clustering methods can be divided into five categories: iterative models [10]; statistical models [11]; algebraic models [12]; spectral clustering-based models [1,[13][14][15]; and deep learning-based models [15,16]. Notably, the spectral clustering-based subspace clustering approaches have achieved outstanding performance. These methods have two main parts: (1) affinity matrix construction; (2) spectral clustering [17]. Recently, many spectral clustering methods have emerged [13,14,[18][19][20][21], and the two most representative models are sparse subspace clustering (SSC) [14] and low-rank representation (LRR) [22]. SSC and LRR have achieved great success in many research areas. Methods based on sparse or low-rank representation obtain a coefficient matrix by using a self-expression model; the coefficient matrix is called the similarity matrix [23]. The similarity matrix is computed adaptively, which avoids the difficulty of setting the number of neighbors for a graph in subspace clustering methods. However, all these methods can only process linear subspaces and cannot handle nonlinear data, whereas most real-world data have nonlinear structures [24,25]. In addition, although many recent clustering studies based on deep learning have been proposed [15,16], they are not the emphasis of this paper.
To overcome the drawback that existing linear subspace clustering methods cannot deal with nonlinear data, some kernel self-expression methods [26][27][28][29][30][31] extend linear subspace clustering to nonlinear subspace clustering via the "kernel trick", so that linear subspace clustering can be carried out in the mapped feature space. Two typical methods are kernelized SSC (KSSC) [30] and kernelized LRR (KLRR) [14]. KSSC and KLRR capture nonlinear structure information in the input space and have achieved considerable progress in many applications. However, although these kernel self-expression methods can efficiently process nonlinearly structured data, some similarity information between samples may be lost when the original data are reconstructed in the kernel space. In practice, real data with manifold structures present complex structures beyond sparsity or low rank [32]. Hence, it is essential to construct a representation that can sufficiently embed the rich structure information of the original data.
Many methods discover the underlying structure by exploring data relations [33][34][35]. Some subspace clustering methods based on structure learning have been proposed, such as similarity learning via kernel-preserving embedding (SLKE) [35] and structure learning with similarity preserving (SLSP) [36]. SLKE builds its model by retaining the similarity information between data points, which yields better performance. SLSP constructs a structure learning framework that incorporates the similarity information of the original data, overcoming the drawback that SLKE might lose some low-order information of the original data. Although these methods achieve good performance, the similarity matrix they learn does not have an optimal block diagonal structure. Hence, constructing a block diagonal similarity matrix has recently emerged as a hot topic in subspace clustering.
In the literature [8,[14][15][16][17][18][19][20][21][22], various norm regularization terms in self-expressive models have been used to learn a block diagonal coefficient matrix, such as the $\ell_1$ norm, $\ell_2$ norm, and nuclear norm. However, such regularizers have two drawbacks: the number of blocks in the coefficient matrix cannot be controlled, and the learned coefficient matrix may not have an optimal block diagonal structure because of noise in the data. To remedy these drawbacks, block diagonal representation (BDR) subspace clustering algorithms [13,27,[37][38][39][40][41] have been proposed to pursue a "good" block diagonal structure of the coefficient matrix, such as implicit block diagonal low-rank representation (IBDLR) [40]. IBDLR constructs a novel model by integrating the block diagonal prior and implicit feature representation into the low-rank representation model. These BDR-based algorithms gradually improve clustering performance. However, they do not incorporate a similarity-preserving mechanism.
To resolve the above problems, this paper first constructs a representation that better captures the rich structure information of the original data. Second, to encourage the affinity matrix learned by the self-expression framework to obey a "good" block diagonal structure for spectral clustering, we introduce the block diagonal representation (BDR) term into our model. Third, the kernel self-expression framework is introduced into our model, which can efficiently process nonlinearly structured data. By fully considering these issues, we propose a novel robust subspace clustering method termed kernel block diagonal representation subspace clustering with similarity preservation (KBDSP), which embeds the block diagonal representation term and the similarity-preserving regularizer into the kernel self-expression framework. Experiments on nine datasets demonstrate the effectiveness and robustness of the proposed KBDSP method. The main contributions of this work can be summarized as follows: (1) To capture the nonlinear structure of the original data, a subspace clustering framework based on kernel self-expression is introduced into our model.
(2) To better preserve the similarity information between the original data, a similarity-preserving regularizer is constructed and introduced into our model by minimizing the difference between the inner products of the original data and the inner products of the reconstructed data in the kernel space. (3) To obtain a similarity matrix with an optimal block diagonal structure, a block diagonal representation term is introduced into our model. The optimal block diagonal matrix is then captured directly by optimizing the objective function, overcoming the drawback that sparse- or low-rank-based subspace clustering methods cannot obtain an optimal block diagonal matrix. (4) Our KBDSP model is formed by integrating the block diagonal representation term and the similarity-preserving regularizer into the kernel self-expression framework. The resulting optimization problem is solved by applying the alternating direction method of multipliers (ADMM).
This paper includes the following five sections. Self-expression-based subspace clustering, kernel-based subspace clustering, and block diagonal representation term are briefly introduced in Section 2. Section 3 describes the proposed KBDSP method in detail. Section 4 presents the experimental results on nine datasets. Section 5 gives the conclusions.

Self-Expression-Based Subspace Clustering
The self-expression-based subspace clustering methods aim to express each data point as a linear combination of all the other data points in the same subspace [11]. The general model is defined as:

$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \lambda R(Z), \quad \text{s.t. } \mathrm{diag}(Z) = 0, \qquad (1)$$

where $X$ is the data matrix, $R(Z)$ denotes the regularization term, and $\lambda > 0$ is a trade-off parameter. $Z$ is obtained by solving the specific model, and a balanced affinity matrix is constructed as $(Z^\top + Z)/2$; designing a proper regularizer is therefore very important for promoting clustering performance. $\|Z\|_*$, $\|Z\|_1$, and $\|Z\|_F^2$ are three common regularization terms [39]. Ideally, the coefficient matrix $Z$ is an optimal block diagonal matrix. However, $Z$ is influenced by noise and is usually not block diagonal.
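To make the self-expression model concrete, here is a minimal sketch using the Frobenius-norm regularizer $R(Z) = \|Z\|_F^2$, for which $Z$ has a closed form; the diagonal zeroing and the toy two-subspace data are illustrative choices, not the paper's setup:

```python
import numpy as np

def self_expression_lsr(X, lam=0.1):
    """Self-expression with a Frobenius regularizer:
    min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2,
    whose closed form is Z = (X^T X + lam*I)^{-1} X^T X.
    X is d x n with samples as columns."""
    n = X.shape[1]
    G = X.T @ X                                 # Gram matrix (n x n)
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(Z, 0.0)                    # discourage trivial self-representation
    W = (np.abs(Z) + np.abs(Z).T) / 2           # balanced affinity matrix
    return Z, W

# toy data: five points on each of two orthogonal 1-D subspaces of R^3
rng = np.random.default_rng(0)
X = np.hstack([np.outer([1.0, 0.0, 0.0], rng.standard_normal(5)),
               np.outer([0.0, 1.0, 0.0], rng.standard_normal(5))])
Z, W = self_expression_lsr(X)
# points only represent points from their own subspace, so W is block diagonal
```

On clean independent subspaces the closed-form $Z$ is exactly block diagonal; noise destroys this property, which motivates the block diagonal regularizer discussed later.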

Kernel-Based Subspace Clustering
To make our framework more general, we introduce the kernel self-expression model for subspace clustering. Based on a kernel mapping, Equation (1) can be lifted into a higher-dimensional kernel space, where nonlinearly related data can become linearly related. The optimization model of kernel self-expression subspace clustering is expressed as:

$$\min_Z \; \frac{1}{2}\|\phi(X) - \phi(X)Z\|_F^2 + \lambda R(Z) = \min_Z \; \frac{1}{2}\mathrm{Tr}\big(K - 2KZ + Z^\top K Z\big) + \lambda R(Z), \qquad (2)$$

where $\phi(X)$ is the kernel-induced feature mapping, $K$ is the kernel matrix with elements $K_{i,j} = \phi(X_i)^\top \phi(X_j)$, and $R(Z)$ is the regularization term. In Equation (2), the Frobenius norm is used to measure the representation error. Nevertheless, when there are outliers in the data, the Frobenius norm is not robust. To address robustness, robust kernel low-rank representation (RKLRR) [14] was put forward, which provides a closed-form solution for this challenging problem. Although RKLRR improves robustness, the learned affinity matrix is still not guaranteed to have an optimal block diagonal structure.
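For instance, with the Frobenius regularizer $R(Z) = \|Z\|_F^2$ the kernel self-expression problem again has a closed form that depends on the data only through $K$; this is an illustrative special case, not the regularizer adopted later in the paper:

```python
import numpy as np

def kernel_self_expression(K, lam=0.1):
    """Kernel self-expression with a Frobenius regularizer:
    min_Z 1/2 * Tr(K - 2 K Z + Z^T K Z) + lam * ||Z||_F^2.
    Setting the gradient K Z - K + 2*lam*Z to zero gives
    Z = (K + 2*lam*I)^{-1} K."""
    n = K.shape[0]
    return np.linalg.solve(K + 2 * lam * np.eye(n), K)

# Gaussian kernel on two well-separated 1-D clusters
X = np.array([[0.0], [0.1], [5.0], [5.1]])
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                      # bandwidth sigma = 1
Z = kernel_self_expression(K)
# Z is (nearly) block diagonal: within-cluster coefficients dominate
```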

Block Diagonal Regularizer
The BDR algorithm directly pursues a block diagonal coefficient matrix by introducing a block diagonal regularization term, and obtains better clustering performance [27]. The optimization model of the BDR algorithm can be expressed as:

$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \gamma \|Z\|_k, \quad \text{s.t. } \mathrm{diag}(Z) = 0,\; Z \ge 0,\; Z = Z^\top, \qquad (3)$$

where $X$ represents the data matrix, $Z$ represents the coefficient matrix, and $\|Z\|_k$ represents the k-block diagonal regularizer, i.e., the sum of the $k$ smallest eigenvalues of the Laplacian $\mathrm{Diag}(Z\mathbf{1}) - Z$.
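The k-block diagonal regularizer can be evaluated directly as the sum of the k smallest Laplacian eigenvalues; a small sketch (the example matrix is ours):

```python
import numpy as np

def k_block_diag_reg(Z, k):
    """k-block diagonal regularizer ||Z||_k: sum of the k smallest
    eigenvalues of the Laplacian Diag(Z 1) - Z. It is zero iff the
    (symmetric, nonnegative) Z has at least k connected components,
    i.e., a k-block diagonal structure."""
    L = np.diag(Z.sum(axis=1)) - Z
    return np.linalg.eigvalsh(L)[:k].sum()   # eigvalsh returns ascending order

# a perfectly 2-block diagonal affinity: the Laplacian has two zero eigenvalues
Z = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(k_block_diag_reg(Z, 2))   # sums the two zero eigenvalues -> (numerically) 0
```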

KBDSP Model
To preserve the similarity information between samples and simultaneously obtain a similarity matrix with an optimal block diagonal structure, we propose a novel subspace clustering method (KBDSP). Inspired by Kang et al. [39], we preserve the similarity information by minimizing the difference between two sets of inner products: the inner products of the original data in the kernel space, and the inner products of the reconstructed data in the kernel space. Based on this, the optimization problem is expressed as:

$$\min_Z \; \big\|\phi(X)^\top \phi(X) - (\phi(X)Z)^\top (\phi(X)Z)\big\|_F^2. \qquad (4)$$

Using $K = \phi(X)^\top \phi(X)$, this can be simplified as:

$$\min_Z \; \|K - Z^\top K Z\|_F^2. \qquad (5)$$

To effectively deal with nonlinearly structured data, we adopt the kernel self-expression subspace clustering framework. Meanwhile, we introduce the block diagonal regularization term into the kernel self-expression framework to obtain a similarity matrix with block diagonal structure. Thus, our proposed kernel block diagonal representation subspace clustering with similarity preservation (KBDSP) model is:

$$\min_Z \; \frac{1}{2}\mathrm{Tr}\big(K - 2KZ + Z^\top K Z\big) + \alpha \|K - Z^\top K Z\|_F^2 + \gamma \|Z\|_k, \quad \text{s.t. } \mathrm{diag}(Z)=0,\; Z \ge 0,\; Z = Z^\top. \qquad (6)$$

An auxiliary matrix $B$ and a regularization term $\|Z - B\|_F^2$ are introduced into our model to separate the variables. Thus, the optimization problem (6) can be rewritten as:

$$\min_{Z,B} \; \frac{1}{2}\mathrm{Tr}\big(K - 2KZ + Z^\top K Z\big) + \alpha \|K - Z^\top K Z\|_F^2 + \frac{\beta}{2}\|Z - B\|_F^2 + \gamma \|B\|_k, \quad \text{s.t. } \mathrm{diag}(B)=0,\; B \ge 0,\; B = B^\top, \qquad (7)$$

where $\alpha$, $\beta$, and $\gamma$ are non-negative trade-off parameters.
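A direct transcription of the objective with the auxiliary matrix B can serve as a sanity check during implementation. This is a hedged sketch: the similarity-preserving term follows the $\|K - Z^\top K Z\|_F^2$ expression used in the parameter analysis, but the exact weighting (e.g., the 1/2 factors) is our reconstruction and may differ from the paper's formulation:

```python
import numpy as np

def kbdsp_objective(K, Z, B, alpha, beta, gamma, k):
    """Sketch of the KBDSP objective: kernel self-expression error,
    similarity-preserving term ||K - Z^T K Z||_F^2, variable-splitting
    penalty ||Z - B||_F^2, and the k-block diagonal regularizer on B."""
    self_expr = 0.5 * np.trace(K - 2 * K @ Z + Z.T @ K @ Z)
    sim_preserve = np.linalg.norm(K - Z.T @ K @ Z, 'fro') ** 2
    split = 0.5 * np.linalg.norm(Z - B, 'fro') ** 2
    L = np.diag(B.sum(axis=1)) - B                 # Laplacian of B
    block_diag = np.linalg.eigvalsh(L)[:k].sum()   # k smallest eigenvalues
    return self_expr + alpha * sim_preserve + beta * split + gamma * block_diag

n = 4
obj = kbdsp_objective(np.eye(n), np.zeros((n, n)), np.zeros((n, n)),
                      alpha=1.0, beta=1.0, gamma=1.0, k=2)
```

Such a function is useful for monitoring monotone decrease of the objective over ADMM iterations.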

Optimization of KBDSP
To facilitate solving problem (7), we transform it into the following equivalent problem (8) by introducing three auxiliary variables. Problem (8) is solved by using ADMM. The corresponding augmented Lagrangian function is (9), where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are Lagrangian multipliers and $\mu > 0$ is a penalty parameter. These variables can be updated alternately.
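ADMM alternates closed-form subproblem solutions with multiplier updates. As a self-contained illustration of the same mechanics (explicitly not the KBDSP subproblems), here is ADMM on the toy split problem $\min_x \frac{1}{2}\|x - a\|^2 + \lambda\|z\|_1$ s.t. $x = z$:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso_toy(a, lam=0.5, mu=1.0, n_iter=200):
    """ADMM on min_x 1/2||x - a||^2 + lam*||z||_1  s.t. x = z,
    with scaled multiplier u. Illustrates the alternating update
    pattern only; this is NOT the KBDSP subproblem."""
    x = np.zeros_like(a); z = np.zeros_like(a); u = np.zeros_like(a)
    for _ in range(n_iter):
        x = (a + mu * (z - u)) / (1.0 + mu)   # quadratic x-subproblem
        z = soft_threshold(x + u, lam / mu)   # l1 z-subproblem
        u = u + x - z                         # multiplier (dual) update
    return z

print(admm_lasso_toy(np.array([2.0, 0.3, -1.0])))  # converges to about [1.5, 0.0, -0.5]
```

Each KBDSP subproblem below is solved in the same spirit: minimize over one variable with the others fixed, then ascend on the multipliers.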
(1) Updating $J$: after discarding the terms irrelevant to $J$, problem (9) reduces to a quadratic subproblem in $J$; setting its first derivative to zero gives a closed-form update. The subproblems for the remaining variables are handled in the same way: for each of $G$, $H$, and $Z$, setting the first derivative of the corresponding subproblem to zero yields a closed-form update. For $B$, the subproblem involves the k-block diagonal regularizer (problem (18)). According to the Ky Fan theorem [42], problem (18) can be rewritten as problem (19) with $A = UU^\top$, where $U$ consists of the $k$ eigenvectors associated with the $k$ smallest eigenvalues of $\mathrm{Diag}(B\mathbf{1}) - B$. $I$ denotes the identity matrix and $\mathbf{1}$ denotes a column vector with all elements equal to 1.
Problem (19) can be rewritten equivalently as problem (20). Once we obtain the matrix $B$, we calculate the similarity matrix as $(|B| + |B|^\top)/2$ and then obtain the clustering results by applying the spectral clustering algorithm [20]. The updating process terminates when the convergence condition is reached, where $\varepsilon$ is the stopping threshold. To make the process clear, the complete procedure for solving problem (7) is outlined in Algorithm 1.
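The final clustering step described above can be sketched as follows; `kmeans2` on the row-normalized spectral embedding stands in for the spectral clustering algorithm of [20], and the toy coefficient matrix is illustrative:

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.sparse.csgraph import laplacian

def cluster_from_coefficients(B, n_clusters):
    """Build the affinity (|B| + |B|^T)/2 from the learned coefficient
    matrix B, then run standard spectral clustering on it."""
    W = (np.abs(B) + np.abs(B).T) / 2
    L = laplacian(W, normed=True)                 # normalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :n_clusters]                   # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalize
    _, labels = kmeans2(U, n_clusters, minit='++', seed=0)
    return labels

# a perfectly 2-block coefficient matrix yields a clean 2-way partition
B = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = cluster_from_coefficients(B, 2)
```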

Experimental Results and Analysis
To verify the effectiveness of our KBDSP algorithm, we perform experiments on nine widely used benchmark datasets and compare the results with state-of-the-art methods. The images in the face datasets are resized to 26 × 26. The AR database is composed of more than 4000 frontal images from 126 individuals, 120 of whom (65 men and 55 women) are used in our experiment [37]. The COIL20 dataset consists of 1440 images of 20 objects (72 images per object), and all the images are resized to 32 × 32 pixels. The BA dataset contains 1404 handwritten character images, covering the digits "0" to "9" and the uppercase letters "A" to "Z"; all these images are 20 × 16 pixels. The text datasets include TR11, TR41, and TR45. Detailed information on these datasets is presented in Table 1.

Comparison of the Methods
We compare our KBDSP method with several state-of-the-art methods, including SC [17], KSSC [30], KLRR [14], IBDLR [40], SLKEs [35], SLKEr [35], SLSPs [36], and SLSPr [36]. These methods can be classified into three types: clustering methods based on similarity preservation, clustering methods based on kernel self-expression, and clustering methods based on block diagonal representation. For a fair comparison, all parameters of these methods are manually tuned to their best values. The parameter settings of all experimental methods are shown in Table 2, in which the recommended parameters are indicated in bold.


Evaluation Metrics
To effectively evaluate the KBDSP algorithm and other advanced algorithms quantitatively, three public evaluation metrics are adopted, which are clustering accuracy (ACC), normalized mutual information (NMI), and Purity [37].
Suppose $n$ is the total number of sample points. We use $y_i$ and $\hat{y}_i$ to denote the predicted cluster label and the ground-truth cluster label, respectively. The accuracy is defined as:

$$\mathrm{ACC} = \frac{1}{n}\sum_{i=1}^{n} \delta\big(\hat{y}_i, \mathrm{map}(y_i)\big),$$

where $\delta(x, y)$ denotes the Kronecker delta function, which takes the value 1 if $x$ and $y$ are equal and 0 otherwise, and $\mathrm{map}(\cdot)$ is a mapping function that maps each cluster label to a ground-truth cluster label according to the Kuhn-Munkres algorithm.
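The ACC computation, with the Kuhn-Munkres mapping implemented via `scipy.optimize.linear_sum_assignment`, can be sketched as:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: find the best one-to-one mapping from predicted cluster labels
    to ground-truth labels (Kuhn-Munkres), then count correct points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels_t, labels_p = np.unique(y_true), np.unique(y_pred)
    # contingency table: C[i, j] = #points with pred label i and true label j
    C = np.zeros((labels_p.size, labels_t.size), dtype=int)
    for i, lp in enumerate(labels_p):
        for j, lt in enumerate(labels_t):
            C[i, j] = np.sum((y_pred == lp) & (y_true == lt))
    row, col = linear_sum_assignment(-C)   # maximize total matched count
    return C[row, col].sum() / y_true.size

print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```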
We use $S$ and $\hat{S}$ to denote the clustering generated by an algorithm and the ground-truth clustering, respectively. Then NMI is defined as:

$$\mathrm{NMI}(S, \hat{S}) = \frac{\sum_{s \in S}\sum_{\hat{s} \in \hat{S}} p(s, \hat{s}) \log \dfrac{p(s, \hat{s})}{p(s)\,p(\hat{s})}}{\max\big(H(S), H(\hat{S})\big)},$$

where $p(s)$ and $p(\hat{s})$ are the marginal probability distribution functions of $S$ and $\hat{S}$, respectively, $p(s, \hat{s})$ denotes their joint probability distribution function, and $H(\cdot)$ represents the entropy function. The larger the value of NMI, the more similar $S$ and $\hat{S}$ are.
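A small sketch of NMI estimated from label counts; normalizing by the maximum of the two entropies is one of several common conventions and is an assumption here:

```python
import numpy as np

def nmi(y_true, y_pred, eps=1e-12):
    """NMI = I(S; S_hat) / max(H(S), H(S_hat)), estimated from counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)

    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / y.size
        return -np.sum(p * np.log(p))

    mi = 0.0
    for s in np.unique(y_pred):
        for t in np.unique(y_true):
            p_st = np.mean((y_pred == s) & (y_true == t))  # joint probability
            if p_st > eps:
                mi += p_st * np.log(p_st / (np.mean(y_pred == s) * np.mean(y_true == t)))
    return mi / max(entropy(y_true), entropy(y_pred), eps)

score = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # perfect clustering up to relabeling
```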
Purity measures the overall precision of the clusters, and it is defined as:

$$\mathrm{Purity} = \frac{1}{n}\sum_{k} \max_{j} |c_k \cap t_j|,$$

where $c_k$ is the set of points assigned to cluster $k$ and $t_j$ is the set of points with ground-truth label $j$. The higher these three metrics are, the better the clustering performance.
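Purity can be computed by assigning each cluster its majority ground-truth class:

```python
import numpy as np

def purity(y_true, y_pred):
    """Purity: label each cluster with its most frequent ground-truth
    class; purity is the fraction of points receiving the right label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for s in np.unique(y_pred):
        members = y_true[y_pred == s]            # true labels inside cluster s
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()                    # majority class count
    return total / y_true.size

print(purity([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))  # -> 0.8
```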

Kernel Design
Following the literature [14], we design 12 kernels in this work: 7 Gaussian kernels, 4 polynomial kernels, and a linear kernel. The Gaussian kernel function is defined as $k(x_i, x_j) = \exp\!\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$.
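A sketch of such a kernel bank; the specific bandwidth and degree grids below are placeholder assumptions (the paper uses 7 Gaussian, 4 polynomial, and 1 linear kernel):

```python
import numpy as np

def build_kernels(X, gauss_scales=(0.1, 1.0, 10.0), poly_degrees=(2, 4)):
    """Sketch of a kernel bank: Gaussian kernels at several bandwidths,
    polynomial kernels, and a linear kernel. X is n x d (rows = samples).
    The bandwidth and degree grids here are placeholder assumptions."""
    G = X @ X.T                                       # linear kernel
    d = np.diag(G)
    sq = d[:, None] + d[None, :] - 2 * G              # pairwise squared distances
    d_max = np.sqrt(np.maximum(sq, 0.0)).max()        # max pairwise distance
    kernels = [np.exp(-sq / (2 * (t * d_max) ** 2 + 1e-12)) for t in gauss_scales]
    kernels += [(G + 1.0) ** a for a in poly_degrees] # polynomial kernels
    kernels.append(G)
    return kernels

rng = np.random.default_rng(0)
kernels = build_kernels(rng.standard_normal((5, 3)))
```

Scaling the Gaussian bandwidth by the maximum pairwise distance is a common heuristic that keeps the kernel well-conditioned across datasets.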

Experimental Results and Analysis
We perform the proposed method on the nine datasets including Yale, JAFFE, ORL, AR, COIL20, BA, TR11, TR41, and TR45. We conduct all the experimental methods on the 12 kernels, and the average experimental results over 12 kernels are computed at each run of the experiment. The experiment is repeated ten times for each experimental method, and the average experimental results are displayed in Table 3. The best results are highlighted with boldface. As can be seen in Table 3, the proposed method presents the best performance in comparison with other methods in most cases, which shows the effectiveness of introducing block diagonal representation term and the similarity-preserving term in our model. Specifically, we obtain the following eight findings from the results listed in Table 3.
(1) SC is a representative clustering approach. Compared to SC, our proposed KBDSP obtains better results in terms of all three evaluation metrics, namely ACC, NMI, and Purity. From Table 3, the average values of the three evaluation metrics of KBDSP are 19%, 18%, and 22% higher than those of SC, respectively. This is attributed to the fact that KBDSP feeds the learned Z into spectral clustering instead of the kernel matrix used by SC.

(2) KSSC and KLRR have achieved satisfactory results in many fields because they exploit the nonlinear structure of the original data. It is worth mentioning that our method outperforms both in most cases; the superiority of KBDSP derives from our similarity-preserving strategy.

(3) Compared to SLKEs and SLKEr, KBDSP achieves higher performance, for two reasons: first, our kernel self-expression framework preserves low-order information of the input data that is lost by SLKEs and SLKEr; second, we learn a similarity matrix with block diagonal structure by introducing the block diagonal representation term.

(4) SLSPs and SLSPr can both handle nonlinear datasets and preserve similarity information. As Table 3 shows, they improve over SC, KSSC, KLRR, SLKEs, and SLKEr, but KBDSP still outperforms them in most instances. For example, the average values of the three evaluation metrics of KBDSP are 14%, 15%, and 2% higher than those of SLSPs, respectively. These results show that the introduced block diagonal representation term helps boost performance.

(5) IBDLR and KBDSP can learn the desired affinity matrix with an optimal block diagonal structure through the block diagonal representation term. From Table 3, KBDSP and IBDLR perform better than the other algorithms except SLSPs on all datasets, which verifies the effectiveness of the block diagonal representation term; both methods are beneficial for datasets with many classes, since they can capture the block diagonal structure of the data. For the COIL20 and BA datasets, which have larger numbers of instances, Table 3 shows that KBDSP achieves nearly 13% higher performance than the other compared methods apart from IBDLR on COIL20. Moreover, KBDSP outperforms IBDLR on COIL20 because our method uses the similarity-preserving strategy, while IBDLR does not.

(6) For the TR11, TR41, and TR45 datasets, which have high feature dimensions, SLSPr performs better than IBDLR because it introduces a similarity-preserving mechanism. Benefiting from both the similarity-preserving strategy and the block diagonal representation term, the proposed KBDSP consistently outperforms IBDLR and even SLSPr in most cases on these datasets. This indicates that KBDSP makes good use of the rich features of such datasets and can exploit the intrinsic structure of the data.

(7) From Table 3, our proposed KBDSP has the smallest standard deviation in almost all cases, which means that KBDSP has good stability.

(8) One-way ANOVA was used to test for significant differences in the Purity of the compared methods. From Figure 1, the p-values for the Purity metric are less than 0.05, showing a significant difference between the performance of the proposed and existing methods.
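The significance test in point (8) can be reproduced with `scipy.stats.f_oneway`; the Purity scores below are placeholder numbers, not the paper's results:

```python
from scipy.stats import f_oneway

# Hypothetical Purity scores of three methods over repeated runs;
# the numbers are placeholders, not results from the paper.
method_a = [0.71, 0.73, 0.70, 0.72]
method_b = [0.64, 0.66, 0.65, 0.63]
method_c = [0.58, 0.57, 0.59, 0.60]
stat, p_value = f_oneway(method_a, method_b, method_c)
print(p_value < 0.05)  # -> True: the mean Purity values differ significantly
```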
In summary, the experimental results demonstrate that our method beats the others in almost all experiments. This is because our method simultaneously introduces the similarity-preserving mechanism, the block diagonal representation term, and the kernel self-expression model. The experimental results show that the proposed KBDSP can not only exploit the intrinsic and nonlinear structure of the original data, but also obtain a coefficient matrix with an optimal block diagonal structure. Therefore, KBDSP achieves better clustering performance than the other state-of-the-art methods.

Computational Complexity Analysis
We analyze the computational complexity of the proposed KBDSP method. KBDSP mainly includes two steps. The first step constructs the kernel matrix, which has $O(n^2)$ complexity. The second step runs Algorithm 1, whose computational complexity is mainly determined by the cost of updating $J$, $G$, $H$, $A$, $Z$, and $B$; each update involves $n \times n$ matrix operations such as inversions or eigendecompositions, so one iteration costs at most $O(n^3)$.

Robustness Experiments
In this section, we verify the robustness of our KBDSP. As shown in Figure 2, a certain percentage of pixels in each image of the Yale dataset were corrupted by Gaussian noise with mean 0 and variance 0.1. The percentage of corrupted pixels in each image varies from 10% to 90% with a step of 20%. To ensure the reliability of the results, we repeat the experiments for each method 10 times and report the average results; the ACC metric is used for evaluation. The results are shown in Figure 3. As can be seen from Figure 3, the ACC of all methods decreases significantly as more pixels are corrupted by noise. In addition, our KBDSP method obtains the best results. This is mainly because KBDSP uses the similarity-preserving strategy, which can effectively suppress noise. The asterisk (*) in the table represents the proposed algorithm.
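The corruption protocol can be sketched as follows; the image here is synthetic and the helper name is ours:

```python
import numpy as np

def corrupt_pixels(img, fraction, rng, var=0.1):
    """Add zero-mean Gaussian noise with the given variance to a randomly
    chosen fraction of the pixels, then clip back to [0, 1]."""
    out = img.copy()
    idx = rng.choice(img.size, size=round(fraction * img.size), replace=False)
    out.flat[idx] += rng.normal(0.0, np.sqrt(var), size=idx.size)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
img = np.full((20, 16), 0.5)        # synthetic 20 x 16 "image"
noisy = corrupt_pixels(img, 0.3, rng)
```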

Parameter Sensitivity Analysis and Convergence Analysis
There are three parameters in the proposed KBDSP, i.e., $\alpha$, $\beta$, and $\gamma$. The parameter $\alpha$ balances the similarity-preserving term $\|K - Z^\top K Z\|_F^2$, $\beta$ controls the term $\|Z - B\|_F^2$, and $\gamma$ controls the block diagonal representation term $\|B\|_k$. The Yale and JAFFE datasets are selected for this evaluation. Figures 4 and 5 show the NMI for different values of $\alpha$, $\beta$, and $\gamma$, which take values in the sets $\{1 \times 10^{-5}, 1 \times 10^{-4}, 1 \times 10^{-3}, 1 \times 10^{-2}, 0.1, 1\}$, $\{1 \times 10^{-5}, 1 \times 10^{-3}, 0.1, 1\}$, and $\{1 \times 10^{-2}, 1 \times 10^{-1}, 1, 10, 30, 50\}$, respectively. From Figures 4 and 5, KBDSP performs well over a wide range of $\alpha$, $\beta$, and $\gamma$ on the two datasets and is not very sensitive to these parameters. Moreover, when $\alpha$, $\beta$, and $\gamma$ are set to 0.01, 0.001, and 0.1, respectively, the clustering performance of KBDSP is better than that of the comparison methods. Therefore, we fix $\alpha = 0.01$, $\beta = 0.001$, and $\gamma = 0.1$ for KBDSP in all experiments in this paper.
For convergence analysis, we show the empirical convergence curves of the KBDSP algorithm on four datasets in Figure 6. The four datasets are YALE, ORL, JAFFE, and TR45, respectively. It is observed that our method converges in objective value within a few iterations and reaches a steady state with more iterations, which indicates fast convergence and efficiency of KBDSP.

Conclusions
In this paper, a novel subspace clustering method based on kernel block diagonal representation and similarity-preserving strategy is proposed. The proposed KBDSP method has three steps: (1) capture the nonlinear structure of input data by introducing the kernel self-expressing frame to our model; (2) generate a similarity matrix with block diagonal structure by introducing the block diagonal representation term to our model; (3) capture the pairwise similarity information between data points by introducing the similarity-preserving term to our model. In this study, we conducted experiments on nine benchmark datasets. Experimental results have well demonstrated the effectiveness and superiority of the proposed KBDSP method. In future work, we will consider extending KBDSP to the deep framework to further improve its performance utilizing nonlinear information. In the proposed KBDSP, we only use single kernel methods to conduct the algorithm. Therefore, in the future, we will also research multiple kernel learning methods to improve the KBDSP method.

Data Availability Statement: The codes and data used in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflicts of interest.