Article

Block Diagonal Least Squares Regression for Subspace Clustering

Department of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(15), 2375; https://doi.org/10.3390/electronics11152375
Submission received: 27 June 2022 / Revised: 20 July 2022 / Accepted: 26 July 2022 / Published: 29 July 2022
(This article belongs to the Special Issue Pattern Recognition and Machine Learning Applications)

Abstract

Least squares regression (LSR) is an effective method that has been widely used for subspace clustering. Under the conditions of independent subspaces and noise-free data, the coefficient matrices satisfy the enforced block diagonal (EBD) structure and achieve good clustering results. More importantly, LSR produces closed-form solutions, which are easy to compute. However, the block diagonal structures of the solutions obtained using LSR are fragile and easily destroyed by noise or corruption. Moreover, when using real datasets, these structures cannot always guarantee satisfactory clustering results. Considering that block diagonal representation has excellent clustering performance, the idea of block diagonal constraints was introduced into LSR and a new subspace clustering method, named block diagonal least squares regression (BDLSR), was proposed. By using a block diagonal regularizer, BDLSR can effectively reinforce the fragile block diagonal structures of the obtained matrices and improve the clustering performance. Our experiments using several real datasets illustrated that BDLSR produced a higher clustering performance compared with other algorithms.

1. Introduction

High-dimensional data are increasingly common in everyday life, which has greatly affected the methods that are used for data analysis and processing. Performing clustering analyses on high-dimensional data has become one of the hot spots in current research. It is usually assumed that high-dimensional data are distributed within a joint subspace, which is composed of multiple low-dimensional subspaces. This supposition has become the premise of the study of subspace clustering [1].
Over the last several decades, researchers have proposed many useful approaches for solving clustering problems [2,3,4,5,6,7]. Among these methods, spectral-type subspace clustering has become the most popular for use in subspace algorithms because of its outstanding clustering effects when processing high-dimensional data. The spectral-type method mainly utilizes local or global information from data samples to construct affinity matrices. In general, the representation matrices greatly affect the performance of spectral-type subspace clustering. In recent years, a vast number of different methods have been proposed in the attempt to find better representation matrices. Two typical representative methods in linear spectral-type clustering are sparse subspace clustering (SSC) [2,8] and low-rank representation (LRR) [3,9], which are based on linear representation. They emphasize sparseness and low rank when solving coefficient matrices.
Initially, SSC used the L0 norm to represent the sparsity of the matrices. Since the optimization of the L0 norm is NP-hard, it was replaced by the L1 norm. SSC pursues a sparse representation by minimizing the L1 norm of the coefficient matrices, which can better classify the data. However, the obtained matrices are often too sparse and miss the important correlation structures of the data. Thus, SSC leads to unsatisfactory clustering results for highly correlated data. To make up for the shortcomings of SSC, LRR employs the nuclear norm to search for low-rank representations with the goal of better analyzing the relevant data. By describing global structures, LRR improves the clustering performance for highly correlated data. However, the coefficient matrices that are obtained using the LRR model may not be sparse enough. To this end, many improved methods have been proposed.
Luo et al. [10] made use of the priors of sparsity and low rank and presented a multi-subspace representation (MSR) model. Zhuang et al. [11] improved the MSR model by adding non-negative constraints to the coefficient matrices and proposed a non-negative low-rank sparse representation (NNLRS) model. Both the MSR and NNLRS models ensure that the coefficient matrices have both sparsity and grouping effects by combining sparsity and low rank. To explore local and global structures, Zheng et al. [12] incorporated local constraints into LRR and presented a locally constrained low-rank representation (LRR-LC) model. Chen et al. [13] developed another new model by adding symmetric constraints to LRR so that highly correlated data in the subspaces had consistent representations. To better reveal the essential structures of subspaces, Lu et al. [14] utilized the Frobenius norm (F norm) to constrain the coefficient matrices and proposed least squares regression (LSR) for subspace clustering. This method demonstrated that enforced block diagonal (EBD) conditions are helpful for block diagonalization, i.e., under EBD conditions, the structures of the obtained coefficient matrices are block diagonal. In the meantime, LSR encourages a grouping effect by clustering highly correlated data together. It has been theoretically proven that LSR tends to shrink the coefficients of related data and is robust to noise. More importantly, the objective function of LSR has a closed-form solution, which is easy to compute and reduces the time complexity of the algorithm. Generally, under the independent subspace assumption, models that use the L1 norm, nuclear norm or F norm as regularization terms can satisfy the EBD conditions, which helps them to learn affinity matrices with block diagonal properties.
Assuming that the data samples are not contaminated or damaged and that the subspaces are independent of each other, the above methods can obtain coefficient matrices with block diagonal structures. However, the structures that are obtained using these methods are unstable because the required assumptions often do not hold. When a fragile block diagonal structure is broken, the clustering performance declines to a great extent. For this reason, many subspace algorithms have been proposed that extend the idea of block diagonal structures. Zhang et al. [15] introduced an effective structure into LRR and proposed a discriminative block diagonal low-rank representation (BDLRR) method for recognition, which learned discriminative data representations by shrinking the non-block diagonal parts and highlighting the block diagonal representations to improve the clustering effects. Feng et al. [16] devised two new subspace clustering methods based on a graph Laplacian constraint. Using the above methods, block diagonal coefficient matrices can be precisely constructed. However, their optimization processes are inefficient because each updating cycle contains an iterative projection. Hence, the hard constraints on the coefficient matrices are difficult to satisfy precisely. To settle this matter, Lu et al. [17] used a block diagonal regularizer to constrain the solutions and put forward the block diagonal representation (BDR) model. This regularization, which is more flexible than a direct hard constraint, helps the BDR model to easily obtain solutions with block diagonal properties; therefore, it is called a soft constraint. The block diagonal regularizer makes BDR non-convex, but the model can still be solved via simple and valid means. BDR imposes block diagonal structures on the coefficient matrices, which is more likely to lead to accurate clustering results. Due to the good performance of the k-block diagonal structure within subspace clustering, numerous corresponding extended algorithms [18,19,20] have been proposed.
Under the conditions of independent subspaces and noise-free data, LSR can obtain coefficient matrices that have block diagonal properties, which usually produce exact clustering results. However, the coefficient matrices that are obtained using the LSR method are sensitive to noise or corruption. In real environments, data samples are often contaminated by noise. In this case, the fragile block diagonal structures can become damaged. When the block diagonal structures are violated, the LSR method does not necessarily lead to satisfactory clustering results.
Inspired by the BDR method, we constructed a novel block diagonal least squares regression (BDLSR) subspace clustering method by combining LSR and BDR. Considering that the LSR method meets EBD conditions and has a grouping effect, we directly pursued block diagonal matrices on the basis of LSR. The block diagonal regularizer was a soft constraint that encouraged the block diagonalization of the coefficient matrices and could improve the accuracy of the clustering. However, the appearance of the block diagonal regularizer resulted in the non-convexity of BDLSR. Fortunately, the alternating minimization method could be used to solve the objective function. Meanwhile, the convergence of the objective function could also be proven. Our experimental results from using several public datasets indicated that BDLSR was more effective.
The rest of the paper is organized as follows. The least squares regression model and the block diagonal regularizer are briefly reviewed in Section 2. The BDLSR model is proposed, optimized and discussed explicitly in Section 3. In Section 4, the experimental effectiveness of BDLSR is demonstrated using three datasets. Finally, the conclusions are summarized.

2. Notations and Preliminaries

2.1. Notations

Many notations were used in this study, which are shown in Table 1.

2.2. Least Squares Regression

Most real-world data exhibit strong correlations. However, SSC enforces the solutions to be sparse and LRR causes the solutions to have a low-rank representation; therefore, these methods still cannot capture the intrinsic structures of the data very well. To better explore these correlations, Lu et al. [14] obtained the coefficient matrices using an F norm constraint and proposed the least squares regression (LSR) subspace algorithm. LSR has a grouping effect, which makes it helpful for the better clustering of highly correlated data.
In a given data matrix Y, the data samples are drawn from multiple independent linear subspaces, and the task is to divide the data samples into their respective subspaces. Ideally, the data are clean. By solving the following optimization problem, LSR can obtain the coefficient matrix Z:
\min_{Z} \|Z\|_F \quad \mathrm{s.t.} \quad Y = YZ, \ \mathrm{diag}(Z) = 0
where \|Z\|_F denotes the F norm of Z, which is defined as \|Z\|_F = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n} Z_{ij}^2}. Note that the constraint diag(Z) = 0 in the above model helps to avoid the trivial solution; in other words, it prevents each data sample from being represented only by itself, i.e., the case of Z being the identity matrix.
When the data contain noise, a noise term is added to enhance the robustness of the model. The extended function of LSR is shown below:
\min_{Z} \|Y - YZ\|_F^2 + \lambda \|Z\|_F^2 \quad \mathrm{s.t.} \quad \mathrm{diag}(Z) = 0
where λ > 0 is a regularization parameter, which is used to balance the two terms within the above objective function. Note that LSR produces analytical solutions, which are easy to solve. Due to these advantages, LSR has attracted the attention of a growing number of researchers.
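To make the computational appeal concrete, the following minimal sketch (ours, in Python/NumPy rather than the MATLAB used in our experiments) computes the closed-form solution of the above problem when the diag(Z) = 0 constraint is omitted; the function and variable names are illustrative only.

```python
import numpy as np

def lsr_coefficients(Y, lam):
    """Closed-form solution of min_Z ||Y - YZ||_F^2 + lam * ||Z||_F^2
    (the diag(Z) = 0 constraint is dropped for simplicity)."""
    n = Y.shape[1]                                  # number of samples (columns of Y)
    G = Y.T @ Y                                     # Gram matrix of the samples
    return np.linalg.solve(G + lam * np.eye(n), G)  # (Y^T Y + lam*I)^{-1} Y^T Y

# Toy usage: 50 ten-dimensional samples
Y = np.random.randn(10, 50)
Z = lsr_coefficients(Y, lam=0.1)
print(Z.shape)  # (50, 50)
```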

2.3. The Block Diagonal Regularizer

Existing studies [2,3,14] have shown that block diagonal structures can better reveal the essential attributes of subspaces. Ideally, under the assumption of independent subspaces and noise-free data, the obtained coefficient matrix Z has a k-block diagonal structure, where k represents the number of blocks or subspaces. Then, Z can be expressed as follows:
Z = \begin{bmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_k \end{bmatrix}
Here, each block represents a subspace, the size of each block corresponds to the number of samples in that subspace and the data within the same block belong to the same subspace.
However, due to noise in the data, the assumption of independent subspaces usually does not hold. Hence, the solutions of the model are often far from the k-block diagonal structure. To obtain the above structure, a block diagonal regularizer was proposed in [17]:
\|B\|_K = \sum_{i=n-k+1}^{n} \lambda_i(L_B), \qquad (2)
where B \in \mathbb{R}^{n \times n} is an affinity matrix satisfying the constraints B \geq 0 and B = B^T, \|B\|_K denotes the k-block diagonal regularizer (which is defined as the sum of the k smallest eigenvalues of L_B) and L_B denotes the corresponding Laplacian matrix (which is defined as L_B = \mathrm{Diag}(B\mathbf{1}) - B, where B\mathbf{1} represents the product of the matrix B and the all-one vector \mathbf{1}). Here, \lambda_i(L_B) denotes the i-th eigenvalue of L_B when arranged in decreasing order. Due to L_B \succeq 0, L_B is positive semidefinite and \lambda_i(L_B) \geq 0 for all values of i. More precisely, B is k-block diagonal when (and only when):
\lambda_i(L_B) \begin{cases} > 0, & i = 1, \ldots, n-k, \\ = 0, & i = n-k+1, \ldots, n. \end{cases}
The term in Equation (2) is a soft constraint, which does not directly require exact solutions with block diagonal structures. However, the number of blocks can be controlled and correct clustering results can still be obtained. Hence, the regularizer is more flexible and produces accurate clustering results more easily than hard constraints [16].
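To make the definition concrete, the sketch below (our own illustration in Python/NumPy, not code from the original work) evaluates \|B\|_K by summing the k smallest eigenvalues of the Laplacian L_B; for a perfectly k-block diagonal affinity matrix this value is exactly zero, since the number of zero eigenvalues of L_B equals the number of diagonal blocks.

```python
import numpy as np

def block_diag_regularizer(B, k):
    """||B||_K: sum of the k smallest eigenvalues of L_B = Diag(B @ 1) - B.
    B is assumed non-negative and symmetric."""
    L = np.diag(B.sum(axis=1)) - B
    eigvals = np.linalg.eigvalsh(L)   # eigenvalues in ascending order
    return eigvals[:k].sum()

# A perfectly 2-block diagonal affinity matrix gives a value of (numerically) zero
B = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
print(block_diag_regularizer(B, k=2))
```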

3. Subspace Clustering Using BDLSR

In this section, our BDLSR method, which is a combination of least squares regression and block diagonal representation, is proposed for subspace clustering. Then, the objective function of the BDLSR model is presented and the optimization process is described. Finally, the convergence of the BDLSR model is proven.

3.1. The Proposed Model

In theory, under the independent subspace assumption, LSR can obtain coefficient matrices that have block diagonal attributes. In other words, when the similarity between classes is zero and the similarity within classes is more than zero, perfect data clustering can usually be achieved. However, in real environments, the block diagonal structures of the solutions can be easily broken by data noise or corruptions. Considering the advantages of block diagonal representation, we integrated it into the objective function of LSR to obtain solutions with strengthened block diagonal structures, which required the matrices to be simultaneously non-negative and symmetric. Then, we proposed the BDLSR method for subspace clustering. The objective function of BDLSR for handling noisy data was as follows (where α > 0 , γ > 0 and Y is a data matrix):
\min_{B} \frac{1}{2}\|Y - YB\|_F^2 + \frac{\alpha}{2}\|B\|_F^2 + \gamma\|B\|_K, \quad \mathrm{s.t.} \quad \mathrm{diag}(B) = 0, \ B \geq 0, \ B = B^T. \qquad (3)
where the coefficient matrix B is non-negative and symmetric and its diagonal elements are all zero. The above conditions had to be satisfied to use a block diagonal regularizer. However, the conditions limited the representation capability of B. To alleviate this issue, we needed to relax the restrictions on B. Thus, we introduced an intermediary term Z into the above formula. The new model was rewritten equivalently as follows (where λ > 0 ):
\min_{Z, B} \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\alpha}{2}\|Z\|_F^2 + \frac{\lambda}{2}\|Z - B\|_F^2 + \gamma\|B\|_K, \quad \mathrm{s.t.} \quad \mathrm{diag}(B) = 0, \ B \geq 0, \ B = B^T. \qquad (4)
where α, λ and γ are the balance parameters. When λ was sufficiently large, Equation (4) was equivalent to Equation (3). The newly added term \frac{\lambda}{2}\|Z - B\|_F^2 was strongly convex with respect to both Z and B; thus, it was easy to obtain closed-form solutions and perform the convergence analysis. Then, we could alternately optimize the variables Z and B, together with the auxiliary variable V introduced in the next subsection.

3.2. The Optimization of BDLSR

Next, we solved the minimization problem of the BDLSR model in Equation (4). Because of the block diagonal regularizer, it was observed that Equation (4) was non-convex. The key to solving this issue was the non-convex term \|B\|_K. Fortunately, we could utilize the related theorem in [17,21] to reformulate \|B\|_K. The theorem in [17] was presented as follows.
We let L \in \mathbb{R}^{n \times n} with L \succeq 0. Then:
\sum_{i=n-k+1}^{n} \lambda_i(L) = \min_{V} \langle L, V \rangle, \quad \mathrm{s.t.} \quad 0 \preceq V \preceq I, \ \mathrm{Tr}(V) = k.
where \langle L, V \rangle = \mathrm{Tr}(L^T V).
Then, \|B\|_K could be reformulated as a convex problem:
\|B\|_K = \min_{V} \langle L_B, V \rangle, \quad \mathrm{s.t.} \quad 0 \preceq V \preceq I, \ \mathrm{Tr}(V) = k.
Due to L_B being defined as \mathrm{Diag}(B\mathbf{1}) - B, Equation (4) was equivalent to the following problem:
\min_{Z, B, V} \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\alpha}{2}\|Z\|_F^2 + \frac{\lambda}{2}\|Z - B\|_F^2 + \gamma\langle \mathrm{Diag}(B\mathbf{1}) - B, V \rangle, \quad \mathrm{s.t.} \quad \mathrm{diag}(B) = 0, \ B \geq 0, \ B = B^T, \ 0 \preceq V \preceq I, \ \mathrm{Tr}(V) = k. \qquad (5)
The above problem involved three variables, which could be solved via alternating minimization. Then, we decomposed the problem into three subproblems and developed the update steps separately.
By fixing the variables B and V, we could update Z using:
Z_{m+1} = \arg\min_{Z} \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\alpha}{2}\|Z\|_F^2 + \frac{\lambda}{2}\|Z - B\|_F^2, \qquad (6)
Note that Equation (6) was a convex program. By setting the partial derivative with respect to Z to 0, we could obtain its closed-form solution. The solution was:
Z_{m+1} = (Y^T Y + \lambda I + \alpha I)^{-1}(\lambda B + Y^T Y) \qquad (7)
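As an illustration, the Z update can be written in a few lines of Python/NumPy (a sketch of ours with hypothetical names, not the implementation used in our experiments):

```python
import numpy as np

def update_Z(Y, B, alpha, lam):
    """Closed-form Z update of Equation (7): (Y^T Y + lam*I + alpha*I)^{-1} (lam*B + Y^T Y)."""
    n = Y.shape[1]
    G = Y.T @ Y
    return np.linalg.solve(G + (lam + alpha) * np.eye(n), lam * B + G)
```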
Similarly, by fixing the variables B and Z, we could update V using:
V_{m+1} = \arg\min_{V} \langle \mathrm{Diag}(B\mathbf{1}) - B, V \rangle, \quad \mathrm{s.t.} \quad 0 \preceq V \preceq I, \ \mathrm{Tr}(V) = k. \qquad (8)
According to Equation (8), we could obtain the solution for V as follows:
V_{m+1} = U U^T, \qquad (9)
where U \in \mathbb{R}^{n \times k} consists of the k eigenvectors that are associated with the k smallest eigenvalues of \mathrm{Diag}(B\mathbf{1}) - B.
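A corresponding sketch of the V update (again ours, assuming NumPy's symmetric eigendecomposition):

```python
import numpy as np

def update_V(B, k):
    """V update of Equation (9): V = U U^T, where the columns of U are the
    eigenvectors of Diag(B @ 1) - B belonging to its k smallest eigenvalues."""
    L = np.diag(B.sum(axis=1)) - B
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    U = eigvecs[:, :k]
    return U @ U.T
```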
Then, we could solve B. By fixing the variables Z and V, we could update B using:
B_{m+1} = \arg\min_{B} \frac{\lambda}{2}\|Z - B\|_F^2 + \gamma\langle \mathrm{Diag}(B\mathbf{1}) - B, V \rangle, \quad \mathrm{s.t.} \quad \mathrm{diag}(B) = 0, \ B \geq 0, \ B = B^T. \qquad (10)
By converting Equation (10), the following function was obtained:
B_{m+1} = \arg\min_{B} \frac{1}{2}\left\|B - Z + \frac{\gamma}{\lambda}\left(\mathrm{diag}(V)\mathbf{1}^T - V\right)\right\|_F^2, \quad \mathrm{s.t.} \quad \mathrm{diag}(B) = 0, \ B \geq 0, \ B = B^T. \qquad (11)
According to the proposition in [17], we could obtain the closed-form solution of B:
B_{m+1} = \left[\frac{1}{2}\left(A - \mathrm{Diag}(\mathrm{diag}(A))\right) + \frac{1}{2}\left(A - \mathrm{Diag}(\mathrm{diag}(A))\right)^T\right]_+ \qquad (12)
where:
A = Z - \frac{\gamma}{\lambda}\left(\mathrm{diag}(V)\mathbf{1}^T - V\right), \qquad \left([A]_+\right)_{ij} = \begin{cases} a_{ij}, & \text{if } a_{ij} > 0 \\ 0, & \text{if } a_{ij} \leq 0 \end{cases}
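The B update can likewise be sketched in a few lines (ours): A is formed as above, its diagonal is removed, the result is symmetrized and the negative entries are set to zero.

```python
import numpy as np

def update_B(Z, V, gamma, lam):
    """Closed-form B update of Equations (11) and (12)."""
    n = Z.shape[0]
    A = Z - (gamma / lam) * (np.outer(np.diag(V), np.ones(n)) - V)  # diag(V) 1^T - V
    A = A - np.diag(np.diag(A))               # A - Diag(diag(A)): zero the diagonal
    return np.maximum((A + A.T) / 2.0, 0.0)   # symmetrize and apply [.]_+
```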
The complete optimization procedure for the BDLSR model for subspace clustering is summarized in Algorithm 1.
Algorithm 1 The optimization process for solving Equation (5).
Input: data matrix Y, λ > 0, α > 0, γ > 0.
Initialize: m = 0, Z_m = 0, V_m = 0, B_m = 0.
while not converged do
 1: Update Z_{m+1} by solving Equation (6).
 2: Update V_{m+1} by solving Equation (8).
 3: Update B_{m+1} by solving Equation (11).
 4: m = m + 1.
end while
Output: Z*, V*, B*.
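For completeness, a compact, self-contained sketch of the whole procedure is given below (Python/NumPy; this is our illustration of Algorithm 1, not the MATLAB code used for the reported experiments, and the stopping rule based on successive iterates is our own choice):

```python
import numpy as np

def bdlsr(Y, k, alpha, lam, gamma, max_iter=200, tol=1e-6):
    """Alternating minimization for the BDLSR model of Equation (5).

    Y : d x n data matrix whose columns are the samples.
    k : number of subspaces (diagonal blocks).
    Returns the representation matrices Z and B.
    """
    n = Y.shape[1]
    G = Y.T @ Y
    Z = np.zeros((n, n))
    B = np.zeros((n, n))

    for _ in range(max_iter):
        Z_old, B_old = Z, B

        # Z update, Equation (7)
        Z = np.linalg.solve(G + (lam + alpha) * np.eye(n), lam * B + G)

        # V update, Equation (9): k smallest eigenvectors of the Laplacian of B
        L = np.diag(B.sum(axis=1)) - B
        _, eigvecs = np.linalg.eigh(L)
        U = eigvecs[:, :k]
        V = U @ U.T

        # B update, Equations (11) and (12)
        A = Z - (gamma / lam) * (np.outer(np.diag(V), np.ones(n)) - V)
        A = A - np.diag(np.diag(A))
        B = np.maximum((A + A.T) / 2.0, 0.0)

        # Stop when both Z and B have (numerically) stopped changing
        if max(np.linalg.norm(Z - Z_old), np.linalg.norm(B - B_old)) < tol:
            break

    return Z, B
```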

3.3. Discussion

To solve Equation (4), we developed an iterative update algorithm, which is shown in this section. For a given data matrix, we could settle the proposed BDLSR problem by using the representation matrices Z and B, which could be obtained using Algorithm 1. Then, we used Z and B to construct the affinity matrices, e.g., P = (|B| + |B^T|)/2 or P = (|Z| + |Z^T|)/2. Finally, we performed spectral-type subspace clustering on the constructed affinity matrices.
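A sketch of this last step (ours; it assumes scikit-learn's spectral clustering with a precomputed affinity, which may differ in minor details from the spectral clustering routine used for the reported results):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(B, k):
    """Build the affinity P = (|B| + |B^T|) / 2 and run spectral clustering."""
    P = (np.abs(B) + np.abs(B.T)) / 2.0
    model = SpectralClustering(n_clusters=k, affinity='precomputed', random_state=0)
    return model.fit_predict(P)   # cluster label for each sample
```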
We noted that the BDLSR algorithm contained three parameters, which could be utilized to balance the different terms in Equation (4). We could first run BDR for the suitable λ and γ values or run LSR for the suitable α and λ values. After that, we could search the last remaining parameter to achieve the highest clustering accuracy.
Note that the number of subspaces had to be known to construct the affinity matrices. Since this requirement applies to all spectral-type subspace algorithms, we assumed that the number of subspaces was always known. Our proposed BDLSR algorithm used the block diagonal structure and its clustering effectiveness is shown in the following experiments.

3.4. Convergence Analysis

Because of the block diagonal regularizer, the objective function in Equation (4) was non-convex. Fortunately, Equation (4) could be solved by alternating minimization and its solution could be proven to converge to a stationary point. The alternating minimization method is summarized in Algorithm 1. Now, we explain the convergence of Algorithm 1 theoretically.
To guarantee the convergence of BDLSR, the sequence \{Z_m, V_m, B_m\} that was generated by Algorithm 1 needed to be proven to have at least one limit point. In addition, any limit point (Z^*, V^*, B^*) of \{Z_m, V_m, B_m\} was a stationary point of Equation (5). Now, with regard to Algorithm 1, we detail the convergence analysis by referring to [17].
The objective function of Equation (5) was denoted as G(Z_m, V_m, B_m), where m represents the m-th iteration. We let S_V = \{V \mid 0 \preceq V \preceq I, \mathrm{Tr}(V) = k\} and S_B = \{B \mid \mathrm{diag}(B) = 0, B \geq 0, B = B^T\}. The indicator functions of S_V and S_B were denoted as g_{S_V}(V) and g_{S_B}(B), respectively.
First, according to the update process of Z_{m+1} in Equation (6), we obtained:
Z_{m+1} = \arg\min_{Z} G(Z, V_m, B_m)
Note that G(Z, V_m, B_m) is \lambda-strongly convex with respect to Z. We obtained:
G(Z_{m+1}, V_m, B_m) \leq G(Z_m, V_m, B_m) - \frac{\lambda}{2}\|Z_{m+1} - Z_m\|_F^2 \qquad (13)
Then, due to the optimality of V_{m+1} in Equation (8), we obtained:
G(Z_{m+1}, V_{m+1}, B_m) + g_{S_V}(V_{m+1}) \leq G(Z_{m+1}, V_m, B_m) + g_{S_V}(V_m) \qquad (14)
Next, according to the update process of B_{m+1} in Equation (11), we obtained:
B_{m+1} = \arg\min_{B} G(Z_{m+1}, V_{m+1}, B) + g_{S_B}(B).
Note that G(Z_{m+1}, V_{m+1}, B) + g_{S_B}(B) was \lambda-strongly convex with respect to B. We obtained:
G(Z_{m+1}, V_{m+1}, B_{m+1}) + g_{S_B}(B_{m+1}) \leq G(Z_{m+1}, V_{m+1}, B_m) + g_{S_B}(B_m) - \frac{\lambda}{2}\|B_{m+1} - B_m\|_F^2 \qquad (15)
By combining and simplifying (13)–(15), we obtained:
G(Z_{m+1}, V_{m+1}, B_{m+1}) + g_{S_V}(V_{m+1}) + g_{S_B}(B_{m+1}) \leq G(Z_m, V_m, B_m) + g_{S_V}(V_m) + g_{S_B}(B_m) - \frac{\lambda}{2}\|Z_{m+1} - Z_m\|_F^2 - \frac{\lambda}{2}\|B_{m+1} - B_m\|_F^2 \qquad (16)
Hence, G(Z_m, V_m, B_m) + g_{S_V}(V_m) + g_{S_B}(B_m) monotonically decreased and was therefore bounded above by its initial value. This implied that \{Z_m\} and \{B_m\} were bounded. Additionally, V_m \in S_V implied that \|V_m\|_2 \leq 1; hence, \{V_m\} was also bounded.
Since V_m and \mathrm{Diag}(B_m\mathbf{1}) - B_m were positive semidefinite, we obtained \langle \mathrm{Diag}(B_m\mathbf{1}) - B_m, V_m \rangle \geq 0. Thus, G(Z_m, V_m, B_m) + g_{S_V}(V_m) + g_{S_B}(B_m) \geq 0. Then, by summing Equation (16) over m = 0, 1, 2, …, we obtained:
\sum_{m=0}^{+\infty} \frac{\lambda}{2}\left(\|Z_{m+1} - Z_m\|_F^2 + \|B_{m+1} - B_m\|_F^2\right) \leq G(Z_0, V_0, B_0)
This implied:
\|Z_{m+1} - Z_m\|_F \to 0 \qquad (17)
and
\|B_{m+1} - B_m\|_F \to 0 \qquad (18)
Using Equation (18) and the update process of V_{m+1} in Equation (8), we obtained:
\|V_{m+1} - V_m\|_F \to 0 \qquad (19)
Then, since \{Z_m\}, \{V_m\} and \{B_m\} were bounded, there existed a point (Z^*, V^*, B^*) and a subsequence \{Z_{m_i}, V_{m_i}, B_{m_i}\} such that Z_{m_i} \to Z^*, V_{m_i} \to V^* and B_{m_i} \to B^*. Then, according to Equations (17)–(19), we obtained Z_{m_i+1} \to Z^*, V_{m_i+1} \to V^* and B_{m_i+1} \to B^*. On the other hand, by considering the optimality of V_{m_i+1} in Equation (8), Z_{m_i+1} in Equation (6) and B_{m_i+1} in Equation (10), we obtained:
0 \in \partial_Z G(Z_{m_i+1}, V_{m_i}, B_{m_i}), \quad 0 \in \partial_V G(Z_{m_i+1}, V_{m_i+1}, B_{m_i}) + \partial_V g_{S_V}(V_{m_i+1}), \quad 0 \in \partial_B G(Z_{m_i+1}, V_{m_i+1}, B_{m_i+1}) + \partial_B g_{S_B}(B_{m_i+1}). \qquad (20)
By letting m \to +\infty in Equation (20), we obtained:
0 \in \partial_Z G(Z^*, V^*, B^*), \quad 0 \in \partial_V G(Z^*, V^*, B^*) + \partial_V g_{S_V}(V^*), \quad 0 \in \partial_B G(Z^*, V^*, B^*) + \partial_B g_{S_B}(B^*).
Thus, (Z^*, V^*, B^*) was a stationary point of Equation (5).

4. Experiments

4.1. Experimental Settings

4.1.1. Data

To evaluate the BDLSR model, we performed experiments using different datasets: Hopkins 155 [22], ORL [23] and Extended Yale B [17].
The Hopkins 155 dataset contains 156 different motion sequences, each consisting of feature point trajectories extracted from videos of moving objects. Some samples from the Hopkins 155 dataset are shown in Figure 1.
The ORL dataset, which was created by Cambridge University, contains 400 facial photos of 40 subjects with a dark uniform background under different conditions for time, light and facial expression. We resized each photo to 32 × 32. Some samples from the ORL dataset are shown in Figure 2.
The Extended Yale B dataset is composed of images of 38 individuals and can be used for facial recognition. The dataset contains 2414 images, with 64 photos of each individual under different conditions and angles. We resized each photo to 32 × 32. Some samples from the Extended Yale B dataset are shown in Figure 3.

4.1.2. Baseline

We chose representative subspace clustering algorithms for comparison to verify the effectiveness of BDLSR, including SSC [8], LRR [9], LSR [14] and BDR-B(BDR-Z) [17]. These algorithms are spectral-type algorithms and construct affinity matrices by applying coefficient matrices to achieve the final results.
SSC (sparse subspace clustering) is based on sparse representation, which ensures the sparsity of the coefficient matrices by minimizing the L1 norm. LRR (low-rank representation) is a typical subspace algorithm that searches for low-rank representations of coefficient matrices using the nuclear norm. LSR (least squares regression) is an easy-to-solve clustering subspace algorithm that uses the F norm to constrain coefficients so that there is a grouping effect between the coefficients. BDR (block diagonal representation) is a novel spectral clustering subspace algorithm that uses a block diagonal regularizer, which provides two methods for solving coefficient matrices: BDR-B and BDR-Z.
In this paper, we chose the clustering accuracy (AC) and normalized mutual information (NMI) [24] as the evaluation criteria to quantitatively evaluate BDLSR. AC was used to evaluate the accuracy of the clustering and NMI was used to evaluate the correlation between the clustering results and the ground truth. The larger the AC and NMI values, the better the clustering results.
We supposed that k_i and k_i^* were the ground truth label and the obtained label of a data sample y_i, respectively. Then, AC is defined as follows (where n is the number of data samples):
\mathrm{AC} = \frac{1}{n}\sum_{i=1}^{n}\delta(k_i, k_i^*)
where the delta function \delta(a, b) = 1 when (and only when) a = b, and 0 otherwise.
We supposed that the two clustering results were D and D'. Then, NMI is defined as follows:
\mathrm{NMI}(D, D') = \frac{\mathrm{MI}(D, D')}{\max(H(D), H(D'))}
where H(\cdot) and \mathrm{MI}(D, D') represent the entropy and the mutual information of the clustering results, respectively.
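The two criteria can be computed as in the following sketch (ours; note that the predicted cluster labels must first be matched to the ground truth labels, which we do here with the Hungarian algorithm, and we assume scikit-learn's max-normalized mutual information for NMI):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(labels_true, labels_pred):
    """AC: fraction of correctly labeled samples under the best cluster-to-class matching."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    classes, clusters = np.unique(labels_true), np.unique(labels_pred)
    # Contingency table between predicted clusters and ground truth classes
    table = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            table[i, j] = np.sum((labels_pred == c) & (labels_true == t))
    # Hungarian algorithm finds the matching that maximizes the agreement
    row, col = linear_sum_assignment(-table)
    return table[row, col].sum() / labels_true.size

def clustering_nmi(labels_true, labels_pred):
    """NMI normalized by max(H(D), H(D')), as in the definition above."""
    return normalized_mutual_info_score(labels_true, labels_pred, average_method='max')
```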
All algorithms were implemented using the MATLAB2016a program on Windows 10.

4.2. Experimental Results

Here, we discuss the clustering effects of BDLSR during our experiments using the Hopkins 155, ORL and Extended Yale B datasets. The comparison algorithms in our experiments were SSC, LRR, LSR and BDR-B (BDR-Z). The code was provided by the original authors to ensure the accuracy of the algorithms. The parameters were adjusted to the optimal values, according to the settings in the original papers.
In our experiments, the specific parameter adjustment process was as follows. We first determined the initial value ranges of the parameters λ and γ, according to the parameter settings in the BDR algorithm. Then, we used the parameter values of λ and γ that were provided in [17] as the initial values to try for the remaining parameter α and determined the value range of this parameter according to the changes in the clustering results. Finally, the grid search method was used for an exhaustive search over the value ranges of the three determined parameters and the values of the parameters were selected when the BDLSR algorithm produced the best performance. In this paper, α ∈ [10^-6, 10^-3], λ ∈ [10^-3, 10^0] and γ ∈ [10^-4, 10^-1] were selected for Hopkins 155; α ∈ [10^-3, 10^0], λ ∈ [10^-3, 10^0] and γ ∈ [10^-1, 10^2] were selected for ORL; and α ∈ [10^-5, 10^-2], λ ∈ [10^1, 10^2] and γ ∈ [10^-1, 10^2] were selected for Extended Yale B.
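A minimal sketch of such a grid search is shown below (our illustration; `evaluate` stands for any routine that runs BDLSR with one parameter setting and returns the resulting clustering accuracy, and the example grids mirror the ranges reported for the ORL dataset):

```python
import itertools

def grid_search(evaluate, alphas, lams, gammas):
    """Exhaustively try every (alpha, lambda, gamma) triple and keep the best one."""
    best_score, best_params = -1.0, None
    for alpha, lam, gamma in itertools.product(alphas, lams, gammas):
        score = evaluate(alpha, lam, gamma)
        if score > best_score:
            best_score, best_params = score, (alpha, lam, gamma)
    return best_params, best_score

# Example grids in the spirit of the ORL setting
alphas = [7e-2, 8e-2, 9e-2, 1e-1]
lams = [7e-2]
gammas = [1.61, 1.64, 1.67, 1.70]
# A dummy evaluate is used here only so that the snippet runs on its own
params, score = grid_search(lambda a, l, g: 0.0, alphas, lams, gammas)
```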
The Hopkins 155 dataset contains 156 motion sequences, which are classified into groups of two and three motions. In this experiment, we used the mean and median values of AC to evaluate the different algorithms. We adopted two settings to construct the data matrices: 2F-dimensional and 4k-dimensional data points. Table 2 and Table 3 show the AC values of each algorithm when using the Hopkins 155 dataset. The partial comparative data in Table 2 and Table 3 originated from [17]. In the experiment, we set α = 4 × 10^-4, λ = 9 × 10^-2 and γ = 9 × 10^-3 for the 2F-dimensional data and α = 6.5 × 10^-5, λ = 7 × 10^-2 and γ = 7 × 10^-3 for the 4k-dimensional data. Using the selected parameter settings, BDLSR could perform well in most cases. From Table 2 and Table 3, it can be seen that the AC value of BDLSR was higher than those of the other algorithms under the two different settings, except for the three motions in Table 2. These results showed that BDLSR had some advantages over other representative clustering algorithms when using the Hopkins 155 dataset.
The ORL dataset includes 40 subjects, with 10 photos of each. We selected the first k (10, 20, 30, 40) subjects of the dataset for the face clustering experiments. The comparison results from using the ORL dataset are shown in Table 4. We set different parameters for the different numbers of subjects in the experiments. For instance, we set α = 9 × 10^-2, λ = 7 × 10^-2 and γ = 1.67 for 40 subjects. From Table 4, it can be seen that BDLSR improved the AC value from 74.75% to 81.00% and the NMI value from 87.53% to 88.68% when using BDLSR-Z. This showed that BDLSR produced the best clustering results out of all of the algorithms for 10, 20, 30 and 40 subjects.
The Extended Yale B dataset includes 2414 photos of 38 individuals. We selected the first k (3, 5, 8, 10) individuals of the dataset to compare the AC and NMI values of the different algorithms. The comparison results from using the Extended Yale B dataset are shown in Table 5. We set α = 1 × 10^-4, λ = 11 and γ = 1 for 10 subjects in these experiments. From Table 5, it can be seen that BDLSR improved the AC value from 93.59% to 94.06% and the NMI value from 89.82% to 90.10% when using BDLSR-B. This showed that BDLSR achieved better results than the other algorithms for 3, 5, 8 and 10 subjects.
From the results, it can be seen that the BDR algorithm produced good clustering results. However, the proposed BDLSR algorithm had higher AC and NMI values than the BDR algorithm. This demonstrated that the combination of LSR and BDR could enhance the grouping effect on related data and improve the clustering results.

4.3. Parameter Analysis

The BDLSR algorithm mainly involved three parameters α ,   λ and γ . Now, we further discuss the effects of the parameter settings on the performance of BDLSR. Considering that BDLSR mainly used block diagonal regularization and least squares regression to improve the clustering effect, we mainly discuss the influence of the parameters α and γ on the AC and NMI values. Then, we present the clustering results that were achieved by fixing λ and setting different values for α and γ .
Figure 4 displays the AC values when using the Hopkins 155 dataset with different parameter settings. For the 2F-dimensional data, λ = 0.09 and α ∈ {3 × 10^-4, 4 × 10^-4, 5 × 10^-4, 6 × 10^-4}. For the 4k-dimensional data, λ = 0.07 and α ∈ {6.2 × 10^-5, 6.5 × 10^-5, 6.8 × 10^-5, 7.0 × 10^-5}. For both settings, γ ∈ {7 × 10^-3, 8 × 10^-3, 9 × 10^-3, 1 × 10^-2}. By taking the results of all motions in the Hopkins 155 dataset as the example, we could observe the changes in the AC values that were caused by the BDLSR-B method for the 2F-dimensional data and those that were caused by the BDLSR-Z method for the 4k-dimensional data. From Figure 4, it can be seen that the AC values fluctuated very little with the changes to α and γ. When α = 4 × 10^-4 and γ = 9 × 10^-3 (2F-dimensional data) and α = 6.5 × 10^-5 and γ = 7 × 10^-3 (4k-dimensional data), the clustering effect was the best.
Figure 5 displays the AC and NMI values when using the ORL dataset with different parameter settings. We set λ = 0.07, α ∈ {7 × 10^-2, 8 × 10^-2, 9 × 10^-2, 1 × 10^-1} and γ ∈ {1.61, 1.64, 1.67, 1.70}. To observe the changes in the AC and NMI values that were caused by the BDLSR-Z method, we took 40 individuals from the ORL dataset as the example. From Figure 5, it can be seen that the AC values changed significantly and the NMI values changed slightly within a certain value range. When α = 9 × 10^-2 and γ = 1.67, BDLSR-Z produced the best clustering effect.
Figure 6 displays the AC and NMI values when using the Extended Yale B dataset with different parameter settings, where λ = 50, α ∈ {2 × 10^-4, 5 × 10^-4, 7 × 10^-4, 9 × 10^-4} and γ ∈ {0.7, 0.8, 0.9, 1.0}. To observe the changes in the AC and NMI values that were caused by the BDLSR-B method, we took 5 individuals from the Extended Yale B dataset as the example. From Figure 6, it can be seen that the AC and NMI values were relatively stable. On the whole, changes in the parameters had little impact on the clustering effect.

4.4. Ablation Studies

In this section, we reveal the influence of the various terms of the objective function on the clustering results of the algorithm using ablation studies. Specifically, the original parameter settings remained unchanged when using the three standard datasets and then α = 0, λ = 0 or γ = 0 was set in turn to perform the ablation studies. The comparison results are shown in Table 6, Table 7, Table 8 and Table 9.
From the comparative data in Table 6, Table 7, Table 8 and Table 9, it can be seen that after the ablation studies, the clustering effects when using the three standard datasets decreased significantly (except the three motions in the Hopkins 155 dataset with 2F-dimensional data). Our experimental results showed that each component of the objective function of the BDLSR algorithm played an effective role. This also showed that the combination of block diagonal structures and LSR could effectively improve the performance of subspace clustering.

5. Conclusions

In this work, to obtain the advantages of both the LSR and BDR algorithms, block diagonal structures were introduced into the LSR algorithm and the BDLSR method was proposed for subspace clustering. The algorithm makes use of a regularization term to reinforce the fragile block diagonal structures that are obtained using LSR, which produces better clustering results and demonstrates strong competitiveness. We solved the objective function using the alternating minimization method. In addition, we also theoretically proved the convergence of this algorithm and verified its effectiveness using three standard datasets. Our experimental results showed that the BDLSR method achieved the best clustering results compared with the other algorithms in almost all cases when using the three standard datasets. However, the algorithm also has some shortcomings: it involves more parameters (three parameters), which need more time to be adjusted; additionally, in a few cases, its clustering results were not the best out of the compared algorithms. To address these shortcomings, in our future work, we will explore parameter reduction and how to set the minimum number of parameters while ensuring the best clustering accuracy, so as to obtain a clustering algorithm with an even better performance.

Author Contributions

Conceptualization, methodology and software, L.F. and G.L.; writing—original draft preparation, L.F.; writing—review and editing, G.L.; validation, T.L.; data curation, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by projects from the National Natural Science Foundation of China (grant number: 61976005), the Science Research Project from Anhui Polytechnic University (grant number: Xjky2022155) and the Industry Collaborative Innovation Fund from Anhui Polytechnic University and Jiujiang District (grant number: 2021cyxtb4).

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study did not report any data.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LSR: Least squares regression
EBD: Enforced block diagonal
BDLSR: Block diagonal least squares regression
SSC: Sparse subspace clustering
LRR: Low-rank representation
MSR: Multi-subspace representation
NNLRS: Non-negative low-rank sparse representation
LRR-LC: Locally constrained low-rank representation
BDR: Block diagonal representation

References

  1. Parsons, L.; Haque, E.; Liu, H. Subspace clustering for high dimensional data: A review. ACM SIGKDD Explor. Newsl. 2004, 6, 90–105. [Google Scholar] [CrossRef]
  2. Elhamifar, E.; Vidal, R. Sparse subspace clustering. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 2790–2797. [Google Scholar]
  3. Liu, G.; Lin, Z.; Yu, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 663–670. [Google Scholar]
  4. Yang, C.; Liu, T.; Lu, G.; Wang, Z. Improved nonnegative matrix factorization algorithm for sparse graph regularization. In Data Science, Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE 2021), Taiyuan, China, 17–20 September 2021; Springer: Singapore, 2021; Volume 1451, pp. 221–232. [Google Scholar]
  5. Qian, C.; Brechon, T.; Xu, Z. Clustering in pursuit of temporal correlation for human motion segmentation. Multimed. Tools Appl. 2018, 77, 19615–19631. [Google Scholar] [CrossRef] [Green Version]
  6. Wang, H.; Yang, Y.; Liu, B.; Fujita, H. A study of graph-based system for multiview clustering. Knowl.-Based Syst. 2019, 163, 1009–1019. [Google Scholar] [CrossRef]
  7. Chen, Y.; Wang, S.; Zheng, F.; Cen, Y. Graph-regularized least squares regression for multiview subspace clustering. Knowl.-Based Syst. 2020, 194, 105482. [Google Scholar] [CrossRef]
  8. Elhamifar, E.; Vidal, R. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Luo, D.; Nie, F.; Ding, C.; Huang, H. Multisubspace representation and discovery. In Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 405–420. [Google Scholar]
  11. Zhuang, L.; Gao, H.; Lin, Z.; Ma, Y.; Zhang, X.; Yu, N. Nonnegative low rank and sparse graph for semisupervised learning. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 2328–2335. [Google Scholar]
  12. Zheng, Y.; Zhang, X.; Yang, S.; Jiao, L. Low-rank representation with local constraint for graph construction. Neurocomputing 2013, 122, 398–405. [Google Scholar] [CrossRef]
  13. Chen, J.; Zhang, Y. Subspace Clustering by Exploiting a Low-Rank Representation with a Symmetric Constraint. Available online: http://arxiv.org/pdf/1403.2330v2.pdf (accessed on 23 April 2015).
  14. Lu, C.; Min, H.; Zhao, Z.; Zhu, L.; Huang, D.; Yan, S. Robust and efficient subspace segmentation via least squares regression. In Computer Vision—ECCV 2012, Proceedings of the 2012 Computer Vision European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 347–360. [Google Scholar]
  15. Zhang, Z.; Xu, Y.; Shao, L.; Yang, J. Discriminative block-diagonal representation learning for image recognition. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3111–3125. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Feng, J.; Lin, Z.; Xu, H.; Yan, S. Robust subspace segmentation with block-diagonal prior. In Proceedings of the 2014 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 3818–3825. [Google Scholar]
  17. Lu, C.; Feng, J.; Lin, Z.; Mei, T.; Yan, S. Subspace clustering by block diagonal representation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 487–501. [Google Scholar] [CrossRef] [Green Version]
  18. Xie, X.; Guo, X.; Liu, G.; Wang, J. Implicit block diagonal low rank representation. IEEE Trans. Image Process. 2018, 27, 477–489. [Google Scholar] [CrossRef] [PubMed]
  19. Liu, M.; Wang, Y.; Sun, J.; Ji, Z. Structured block diagonal representation for subspace clustering. Appl. Intell. 2020, 50, 2523–2536. [Google Scholar] [CrossRef]
  20. Wang, L.; Huang, J.; Yin, M.; Cai, R.; Hao, Z. Block diagonal representation learning for robust subspace clustering. Inf. Sci. 2020, 526, 54–67. [Google Scholar] [CrossRef]
  21. Dattorro, J. Convex Optimization & Euclidean Distance Geometry 2016. p. 515. Available online: http://meboo.Convexoptimization.com/Meboo.html (accessed on 1 March 2021).
  22. Tron, R.; Vidal, R. A benchmark for the comparison of 3-D motion segmentation algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
  23. Samaria, F.; Harter, A. Parameterisation of a stochastic model for human face identification. In Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA, 5–7 December 1994; pp. 138–142. [Google Scholar]
  24. Cai, D.; He, X.; Wang, X.; Bao, H. Locality preserving nonnegative matrix factorization. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009; pp. 1010–1015. [Google Scholar]
Figure 1. Samples from the Hopkins 155 dataset.
Figure 2. Samples from the ORL dataset.
Figure 3. Samples from the Extended Yale B dataset.
Figure 4. The AC values when using the Hopkins 155 dataset with 2F- and 4k-dimensional data.
Figure 5. The AC and NMI values when using the ORL dataset.
Figure 6. The AC and NMI values when using the Extended Yale B dataset.
Table 1. The notations that were used in this study.
Notation | Meaning
A | Any matrix, represented by capital letters
I | Identity matrix
a | Any vector, represented by boldface lowercase letters
1 | The all-one vector
A_{i,:} | The i-th row of A
A_{:,j} | The j-th column of A
A_{i,j} | The entry in the i-th row and j-th column of A
diag(A) | A vector whose i-th entry is the i-th diagonal element of A
Diag(a) | A diagonal matrix whose i-th diagonal entry is the i-th element of a
A ⪰ 0 | A is positive semidefinite
A ⪯ B | B - A ⪰ 0
A ≥ 0 | All entries of A are non-negative
Tr(A) | The trace of a square matrix
⟨A, B⟩ | The inner product between A and B
[A]_+ | max(A, 0)
||A||_F = sqrt(Σ_{ij} a_{ij}^2) | The F norm of A
Table 2. The accuracy (%) when using the Hopkins 155 dataset with 2F-dimensional data.
Algorithm | 2 Motions Mean | 2 Motions Median | 3 Motions Mean | 3 Motions Median | All Motions Mean | All Motions Median
SSC | 98.48 | 100.00 | 95.60 | 98.37 | 97.82 | 100.00
LRR | 96.35 | 99.78 | 90.60 | 96.01 | 95.05 | 99.47
LSR | 96.76 | 100.00 | 94.06 | 97.95 | 96.15 | 99.55
BDR-B | 99.00 | 100.00 | 98.05 | 99.79 | 98.78 | 100.00
BDR-Z | 99.05 | 100.00 | 99.15 | 99.79 | 99.07 | 100.00
BDLSR-B | 99.19 | 100.00 | 98.87 | 99.79 | 99.12 | 100.00
BDLSR-Z | 98.95 | 100.00 | 98.32 | 99.79 | 98.81 | 100.00
Table 3. The accuracy (%) when using the Hopkins 155 dataset with 4k-dimensional data.
Algorithm | 2 Motions Mean | 2 Motions Median | 3 Motions Mean | 3 Motions Median | All Motions Mean | All Motions Median
SSC | 99.17 | 100.00 | 95.60 | 99.44 | 97.59 | 100.00
LRR | 95.78 | 99.71 | 90.57 | 96.30 | 94.60 | 99.47
LSR | 96.65 | 99.71 | 93.87 | 97.95 | 96.03 | 99.47
BDR-B | 98.74 | 100.00 | 98.78 | 99.79 | 98.75 | 100.00
BDR-Z | 98.96 | 100.00 | 98.78 | 99.80 | 98.92 | 100.00
BDLSR-B | 98.76 | 100.00 | 97.73 | 99.79 | 98.53 | 100.00
BDLSR-Z | 98.98 | 100.00 | 98.95 | 99.81 | 98.97 | 100.00
Table 4. The results from the different algorithms when using the ORL dataset (%).
Number of Subjects | Algorithm | AC | NMI
10 | SSC | 72.00 | 77.09
10 | LRR | 63.60 | 70.06
10 | LSR | 76.00 | 78.11
10 | BDR-B | 75.00 | 76.81
10 | BDR-Z | 78.00 | 79.44
10 | BDLSR-B | 76.00 | 75.65
10 | BDLSR-Z | 85.00 | 83.04
20 | SSC | 66.50 | 82.55
20 | LRR | 63.50 | 76.23
20 | LSR | 73.50 | 84.07
20 | BDR-B | 76.00 | 85.74
20 | BDR-Z | 74.00 | 83.35
20 | BDLSR-B | 82.50 | 86.43
20 | BDLSR-Z | 86.50 | 89.35
30 | SSC | 70.00 | 84.56
30 | LRR | 65.90 | 78.77
30 | LSR | 76.67 | 84.41
30 | BDR-B | 79.67 | 88.95
30 | BDR-Z | 75.33 | 87.21
30 | BDLSR-B | 86.33 | 91.36
30 | BDLSR-Z | 81.33 | 88.36
40 | SSC | 72.75 | 84.56
40 | LRR | 66.30 | 81.07
40 | LSR | 73.25 | 70.30
40 | BDR-B | 65.75 | 83.20
40 | BDR-Z | 74.75 | 87.53
40 | BDLSR-B | 70.25 | 85.36
40 | BDLSR-Z | 81.00 | 88.68
Table 5. The results from the different algorithms when using the Extended Yale B dataset (%).
Number of Subjects | Algorithm | AC | NMI
3 | SSC | 95.31 | 87.34
3 | LRR | 92.71 | 79.07
3 | LSR | 90.10 | 80.14
3 | BDR-B | 96.35 | 89.32
3 | BDR-Z | 95.31 | 87.34
3 | BDLSR-B | 95.31 | 87.34
3 | BDLSR-Z | 98.96 | 95.11
5 | SSC | 94.69 | 90.39
5 | LRR | 90.94 | 79.43
5 | LSR | 88.75 | 78.58
5 | BDR-B | 97.13 | 93.49
5 | BDR-Z | 95.56 | 92.27
5 | BDLSR-B | 98.44 | 95.27
5 | BDLSR-Z | 96.88 | 93.20
8 | SSC | 88.48 | 83.50
8 | LRR | 60.55 | 61.48
8 | LSR | 78.91 | 67.22
8 | BDR-B | 95.90 | 91.85
8 | BDR-Z | 92.19 | 86.85
8 | BDLSR-B | 96.68 | 93.21
8 | BDLSR-Z | 92.38 | 87.46
10 | SSC | 81.56 | 76.90
10 | LRR | 64.38 | 65.72
10 | LSR | 76.25 | 65.70
10 | BDR-B | 93.59 | 89.82
10 | BDR-Z | 73.13 | 71.37
10 | BDLSR-B | 94.06 | 90.10
10 | BDLSR-Z | 73.59 | 72.52
Table 6. A comparison of the accuracy (%) results when using the Hopkins 155 dataset with 2F-dimensional data in the ablation studies.
Algorithm | 2 Motions Mean | 2 Motions Median | 3 Motions Mean | 3 Motions Median | All Motions Mean | All Motions Median
BDLSR-B (α = 0) | 98.69 | 100.00 | 98.02 | 99.80 | 98.53 | 100.00
BDLSR-Z (α = 0) | 98.90 | 100.00 | 99.06 | 99.79 | 98.94 | 100.00
BDLSR-B (λ = 0) | 69.03 | 67.09 | 56.87 | 56.30 | 66.29 | 64.95
BDLSR-Z (λ = 0) | 91.26 | 99.19 | 89.13 | 98.00 | 90.78 | 98.95
BDLSR-B (γ = 0) | 98.82 | 100.00 | 97.91 | 99.72 | 98.61 | 100.00
BDLSR-Z (γ = 0) | 98.20 | 100.00 | 96.49 | 99.33 | 97.81 | 100.00
BDLSR-B | 99.19 | 100.00 | 98.87 | 99.79 | 99.12 | 100.00
BDLSR-Z | 98.95 | 100.00 | 98.32 | 99.79 | 98.81 | 100.00
Table 7. A comparison of the accuracy (%) results when using the Hopkins 155 dataset with 4k-dimensional data in the ablation studies.
Algorithm | 2 Motions Mean | 2 Motions Median | 3 Motions Mean | 3 Motions Median | All Motions Mean | All Motions Median
BDLSR-B (α = 0) | 98.74 | 100.00 | 98.78 | 99.79 | 98.75 | 100.00
BDLSR-Z (α = 0) | 98.96 | 100.00 | 98.78 | 99.80 | 98.92 | 100.00
BDLSR-B (λ = 0) | 69.01 | 67.09 | 56.92 | 56.51 | 66.28 | 64.95
BDLSR-Z (λ = 0) | 90.24 | 98.44 | 84.71 | 93.16 | 88.99 | 98.25
BDLSR-B (γ = 0) | 98.43 | 100.00 | 96.49 | 99.72 | 97.99 | 100.00
BDLSR-Z (γ = 0) | 98.04 | 100.00 | 96.20 | 99.33 | 97.99 | 100.00
BDLSR-B | 98.76 | 100.00 | 97.73 | 99.79 | 98.53 | 100.00
BDLSR-Z | 98.98 | 100.00 | 98.95 | 99.81 | 98.97 | 100.00
Table 8. A comparison of the results from the different algorithms when using the ORL dataset (%).
Number of Subjects | Algorithm | AC | NMI
40 | BDLSR-B (α = 0) | 68.00 | 83.74
40 | BDLSR-Z (α = 0) | 74.75 | 86.86
40 | BDLSR-B (λ = 0) | 3.50 | 20.45
40 | BDLSR-Z (λ = 0) | 75.75 | 85.99
40 | BDLSR-B (γ = 0) | 74.00 | 85.36
40 | BDLSR-Z (γ = 0) | 74.75 | 86.28
40 | BDLSR-B | 70.25 | 85.36
40 | BDLSR-Z | 81.00 | 88.68
Table 9. A comparison of the results from the different algorithms when using the Extended Yale B dataset in the ablation studies (%).
Number of Subjects | Algorithm | AC | NMI
10 | BDLSR-B (α = 0) | 93.91 | 89.94
10 | BDLSR-Z (α = 0) | 73.59 | 72.74
10 | BDLSR-B (λ = 0) | 10.16 | 2.77
10 | BDLSR-Z (λ = 0) | 11.72 | 4.51
10 | BDLSR-B (γ = 0) | 90.16 | 85.44
10 | BDLSR-Z (γ = 0) | 69.53 | 65.73
10 | BDLSR-B | 94.06 | 90.10
10 | BDLSR-Z | 73.59 | 72.52