Block-Diagonal Constrained Low-Rank and Sparse Graph for Discriminant Analysis of Image Data

Recently, low-rank and sparse model-based dimensionality reduction (DR) methods have aroused lots of interest. In this paper, we propose an effective supervised DR technique named block-diagonal constrained low-rank and sparse-based embedding (BLSE). BLSE has two steps, i.e., block-diagonal constrained low-rank and sparse representation (BLSR) and block-diagonal constrained low-rank and sparse graph embedding (BLSGE). Firstly, the BLSR model is developed to reveal the intrinsic intra-class and inter-class adjacent relationships as well as the local neighborhood relations and global structure of data. Particularly, there are mainly three items considered in BLSR. First, a sparse constraint is required to discover the local data structure. Second, a low-rank criterion is incorporated to capture the global structure in data. Third, a block-diagonal regularization is imposed on the representation to promote discrimination between different classes. Based on BLSR, informative and discriminative intra-class and inter-class graphs are constructed. With the graphs, BLSGE seeks a low-dimensional embedding subspace by simultaneously minimizing the intra-class scatter and maximizing the inter-class scatter. Experiments on public benchmark face and object image datasets demonstrate the effectiveness of the proposed approach.


Introduction
With the rapid development of information technology, high-precision sensors now acquire large-scale data, especially image data, all the time. These data often feature high dimensionality and contain redundant information or noise. How to analyze such high-dimensional data has attracted the interest of many researchers. Dimensionality reduction (DR) is a practical way to deal with this problem. DR aims to find a lower-dimensional embedding subspace where some desired properties can be preserved as much as possible [1][2][3]. The most well-known DR methods are principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5]. PCA is unsupervised and applies an orthogonal projection to maximize data variance. As a supervised method, LDA finds the linear projection axes on which the ratio between the between-class and within-class scatters is maximized. LDA cannot be directly applied to the small sample size (SSS) problem because the within-class scatter matrix is singular. To avoid this problem, Li et al. [6] adopted the difference between the between-class scatter and the within-class scatter as the discriminating criterion for embedding learning. The method, termed maximum margin criterion (MMC), is simple and effective. However, these linear methods cannot reveal the essential structure of data with non-linear distributions. With kernel tricks, kernel principal component analysis (KPCA) [7] and kernel Fisher discriminant analysis (KFD) [8] were developed to handle data with non-linear structures. Some manifold learning algorithms such as LLE [9], Isomap [10], and Laplacian eigenmaps (LE) [11] have also been presented to discover the intrinsic manifold structure in data.
Then, as shown in Figure 2, the block-diagonal constrained low-rank and sparse graph embedding (BLSGE) method finds a low-dimensional subspace with enhanced intra-class compactness and inter-class separation using the graphs.
It is worth noting that there are some major differences between our BLSR model and the model presented in [23], although they have similar formulations. In that method, a weight matrix is defined to provide a moderate amount of correct information for the solution. Different from their strategy, we separately optimize the solution to be sparse, and develop an iterative method to explicitly optimize the diagonal-block elements of the solution to be large and the remaining ones to be small via a predefined block-diagonal mask matrix. Since our aim is to induce intra-class and inter-class graphs from the solution for further embedding learning, our strategy of making intra-class affinity weights large and inter-class affinity weights small is more applicable to the problem of interest. In sum, the main contributions of this paper are as follows: (1) A self-expressive model, i.e., BLSR, is devised by incorporating sparsity, low-rankness and a novel block-diagonal constraint. BLSR not only captures the local and global structures simultaneously, but also highlights both the intra-class similarities and inter-class differences of samples. (2) With the intra-class and inter-class graphs derived from BLSR, BLSGE seeks an optimal feature space by simultaneously minimizing the intra-class scatter and maximizing the inter-class scatter. Overall, a novel supervised dimensionality reduction method, namely BLSE, is developed by taking advantage of BLSR and the GE framework. (3) BLSE is applied to the dimensionality reduction and classification of visual data. Extensive experiments on public face and object datasets verify the effectiveness of the proposed method.

Figure 2.
Illustration of the block-diagonal constrained low-rank and sparse-based embedding (BLSE) model. ① BLSR is applied to get the block-diagonal constrained low-rank and sparse representation of data. ② The representation results of BLSR are utilized to construct the intra-class and inter-class graphs. ③ BLSGE finds a low-dimensional embedding with enhanced intra-class compactness and inter-class separation using the graphs.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce some related works. In Section 3, we introduce our BLSE model; its two steps, i.e., BLSR and BLSGE, are presented in detail. The experimental results are given in Section 4. Finally, we provide the discussion and conclusions in Section 5.
Figure 1.
An example of the desired block-diagonal constrained low-rank and sparse representation (BLSR). Samples in dataset X are encouraged to be represented by samples from the same class with noise E removed, and the representation matrix Z tends to have a block-diagonal structure.
Sensors 2017, 17, 1475
The intra-class and inter-class representations obtained by BLSR are utilized to construct the corresponding intra-class and inter-class graphs.

Related Works
Suppose a labeled dataset $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$, where $D$ is the dimension of the samples in the original space and $n$ is the number of training samples, with $n_k$ ($k = 1, 2, \ldots, C$) samples in the $k$-th class. The label of $x_i$ ($i = 1, 2, \ldots, n$) is denoted as $l(x_i)$. Data points in $X$ are ordered, as is common, in terms of their class labels. $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$ (typically $d \ll D$) is the lower-dimensional projection of $X$ obtained with the projection matrix $V \in \mathbb{R}^{D \times d}$.

Low Rank and Sparse Representation
Low-rank and sparse models have seen a surge of interest in recent years, and have been successfully exploited in many applications, such as subspace clustering [16,17], face recognition [25,31,32], head pose estimation [33], information processing [34][35][36][37], transfer learning [38,39], and extreme learning machines [40]. Low-rankness is an appropriate criterion to capture the low-dimensional structure in high-dimensional data, and low-rank representation (LRR) is robust to sparse noise. Sparse representation (SR) has shown good discriminative capacity. Low-rank and sparse representation is pursued to combine the merits of both. Learning the low-rank and sparse representation $Z$ of dataset $X$ on dictionary $D$ can be formulated as follows:
$$\min_{Z,E} \; \operatorname{rank}(Z) + \lambda \|E\|_l + \beta \|Z\|_0 \quad \text{s.t.} \quad X = DZ + E \qquad (1)$$
where $\|\cdot\|_l$ is used to characterize the noise $E$. It can be sparse noise $\|E\|_0$ or sample-specific noise $\|E\|_{2,1}$. $\lambda$ and $\beta$ control the effects of the noise term $E$ and the sparse representation term $Z$. Since the $l_0$-norm and rank minimization problems are non-convex, the problem is NP-hard. Alternatively, the rank function can be relaxed with the nuclear norm $\|\cdot\|_*$, which is defined as the sum of the singular values of a matrix, and the $l_0$-norm can be replaced with the $l_1$-norm. Thus, one obtains the following relaxed optimization problem:
$$\min_{Z,E} \; \|Z\|_* + \lambda \|E\|_l + \beta \|Z\|_1 \quad \text{s.t.} \quad X = DZ + E \qquad (2)$$
With dataset $X$ itself as the dictionary, Reference [28] proposed the following optimization problem with sample-specific noise:
$$\min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1} + \beta \|Z\|_1 \quad \text{s.t.} \quad X = XZ + E, \; \operatorname{diag}(Z) = 0 \qquad (3)$$
where $\operatorname{diag}(Z)$ represents the vector containing the diagonal elements of $Z$, and $0$ is a zero vector. The obtained low-rank and sparse representation matrix $Z$ can be utilized to construct the intrinsic graph for DR [28].
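The nuclear-norm and $l_1$-norm relaxations are convenient because both admit closed-form proximal operators, which is what makes iterative solvers for such problems practical. The following is a minimal NumPy sketch of these two standard operators; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_*: shrink the singular values by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

shrunk = soft_threshold(np.array([[3.0, -0.5], [-2.0, 0.2]]), 1.0)
low_rank = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

Entries (or singular values) smaller than the threshold are set exactly to zero, which is how these relaxations produce sparse and low-rank solutions.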

Graph Embedding
The graph embedding (GE) framework provides a unified perspective for understanding many DR algorithms [12]. In GE, two graphs need to be constructed: an intrinsic graph $G = \{X, W\}$ that describes certain desired statistical or geometrical properties of the data, and a penalty graph $G^P = \{X, W^P\}$ that characterizes a statistical or geometric property which should be avoided. Both $G$ and $G^P$ are undirected weighted graphs. $X$ is the vertex set, and $W \in \mathbb{R}^{n \times n}$ and $W^P \in \mathbb{R}^{n \times n}$ are the weight matrices.
Assume that the low-dimensional vector representations of the vertices are obtained from a linear projection $y = V^T x$. The purpose of GE is to map each vertex of the graph into a low-dimensional space that preserves the similarity between vertex pairs. An optimal low-dimensional embedding is then given by the graph-preserving criterion:
$$V^* = \arg\min_{V^T X L^P X^T V = I} \sum_{i \neq j} \|V^T x_i - V^T x_j\|^2 W_{ij} = \arg\min_{V^T X L^P X^T V = I} \operatorname{tr}(V^T X L X^T V) \qquad (4)$$
where $L = D - W$ is the Laplacian matrix of the intrinsic graph $G$, and $D$ is here a diagonal matrix with $D_{ii} = \sum_{j \neq i} W_{ij}$. The weight $W_{ij}$ measures the similarity of the edge connecting vertices $i$ and $j$. $L^P$ is the Laplacian matrix of the penalty graph $G^P$ or a simple scale normalization constraint. The linearization extension of graph embedding is computationally efficient for both projection learning and final classification. The construction of the intrinsic graph and the penalty graph is the crux of most dimensionality reduction methods. Besides, the intrinsic and penalty graphs could be, as our method shows, the intra-class and inter-class graphs.
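The step from the weighted pairwise sum to the trace form relies on the standard identity $\sum_{i,j} W_{ij}\|y_i - y_j\|^2 = 2\,\operatorname{tr}(Y L Y^T)$ for a symmetric $W$; the factor 2 does not change the minimizer. A small numerical check on toy data (random values chosen only for illustration) can be sketched as:

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, with D the diagonal matrix of row sums of a symmetric W."""
    return np.diag(W.sum(axis=1)) - W

def pairwise_scatter(Y, W):
    """sum_{i,j} W_ij * ||y_i - y_j||^2 over the columns y_i of Y."""
    diff = Y[:, :, None] - Y[:, None, :]      # shape (d, n, n)
    return float((W * (diff ** 2).sum(axis=0)).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # toy data: 5 features, 8 samples
V = rng.normal(size=(5, 2))                   # arbitrary projection to 2 dimensions
W = rng.random((8, 8))
W = (W + W.T) / 2.0                           # symmetric weights
np.fill_diagonal(W, 0.0)                      # no self-loops

Y = V.T @ X
lhs = pairwise_scatter(Y, W)
rhs = 2.0 * np.trace(V.T @ X @ graph_laplacian(W) @ X.T @ V)
```

Here `lhs` and `rhs` agree up to floating-point error.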

Proposed Method
In Section 3.1, we will detail the two steps of BLSE, i.e., BLSR and BLSGE. The optimization processes for BLSR and BLSGE will be given in Section 3.2. Section 3.3 describes the classification process.

Block-Diagonal Constrained Low-Rank and Sparse Representation (BLSR)
To reveal the intra-class and inter-class adjacent relationships and discover the local and global structures in data, a self-expressive model, i.e., BLSR, is first developed. The label information of the data is harnessed by introducing a block-diagonal constraint to pursue a block-diagonal solution. Specifically, the BLSR model is formulated as problem (5), where α, β and λ are the trade-off parameters for each component, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. In (5), we try to discover the block-diagonal structure of the solution via the block-diagonal regularization $\|Z - Z \odot M\|_F^2$, where $\odot$ is the Hadamard product operator of matrices and $M$ is a predefined mask matrix with an ideal block-diagonal structure. Figure 3 shows an example for the definition of $M$. $Z \odot M$ is used to extract the intra-class representation coefficients for each sample. By minimizing $\|Z - Z \odot M\|_F^2$, the representation coefficients of each sample corresponding to the inter-class samples are promoted to be small, but not necessarily zero. Each sample is encouraged to be represented by the intra-class samples. The obtained block-diagonal representation matrix $Z$ has good identification capability, highlighting both the intra-class similarities and inter-class differences.
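To make the block-diagonal regularization concrete, the sketch below builds a mask matrix from class labels and evaluates $\|Z - Z \odot M\|_F^2$. It assumes that $M$ has ones where two samples share a label and zeros elsewhere; the function names are illustrative.

```python
import numpy as np

def block_diagonal_mask(labels):
    """M_ij = 1 if samples i and j share a label, else 0.
    With samples ordered by class, M is block-diagonal."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def block_diagonal_penalty(Z, M):
    """||Z - Z * M||_F^2: squared mass of the inter-class coefficients of Z."""
    return float(np.linalg.norm(Z - Z * M, "fro") ** 2)

labels = [0, 0, 1, 1, 1]                      # two classes with 2 and 3 samples
M = block_diagonal_mask(labels)
Z = np.full((5, 5), 0.5)                      # toy representation matrix
penalty = block_diagonal_penalty(Z, M)        # 12 off-block entries, each 0.5**2
```

Only the 12 inter-class entries contribute to the penalty, so it equals 3.0 for this toy matrix; the intra-class coefficients are left untouched by the regularizer.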

Figure 3.
An example of the mask matrix M. The diagonal-block elements of M are all ones, and the rest are zeros.

Block-Diagonal Constrained Low-Rank and Sparse Graph Embedding (BLSGE)
The representation matrix $Z$ of BLSR is then employed as the affinity weight matrix to construct the intra-class graph $G^{intra} = \{X, W^{intra}\}$ and the inter-class graph $G^{inter} = \{X, W^{inter}\}$. First, we define the affinity matrix as $W = (|Z| + |Z^T|)/2$; the connecting weights for the intra-class and inter-class graphs are then defined in (6) and (7). Whether two arbitrary points in the graphs are connected, and with what weight, is adaptively determined by our BLSR model. It is desired that samples from the same class should be as close as possible in the feature space, and those from different classes as far apart as possible. With projection matrix $V$, the optimization objective functions are defined in (8). For ease of classification, a large $W^{intra}_{ij}$ is required in (8) to make the projected samples from the same class close to each other, and a small $W^{inter}_{ij}$ is needed in (8) to make the projected samples from different classes far away from each other. These requirements can be satisfied by the representation strategy in BLSR. With some mathematical operations, the objectives in (8) can be rewritten in the trace form (9), and the objective function of BLSGE can then be formulated as (10).
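The graph construction can be sketched as follows. This is an illustration under the assumption that the intra-class weights keep $W_{ij}$ when $l(x_i) = l(x_j)$ and the inter-class weights keep $W_{ij}$ when the labels differ, with all other entries set to zero; `build_graphs` is a hypothetical helper name.

```python
import numpy as np

def build_graphs(Z, labels):
    """Split the symmetric affinity W = (|Z| + |Z^T|) / 2 by class membership."""
    labels = np.asarray(labels)
    W = (np.abs(Z) + np.abs(Z.T)) / 2.0
    same = labels[:, None] == labels[None, :]
    W_intra = np.where(same, W, 0.0)          # edges between same-class samples
    W_inter = np.where(same, 0.0, W)          # edges between different-class samples
    return W_intra, W_inter

Z = np.array([[0.0, 1.0, -2.0],
              [3.0, 0.0, 4.0],
              [0.0, 0.0, 0.0]])
W_intra, W_inter = build_graphs(Z, labels=[0, 0, 1])
```

Under this assumption the two weight matrices partition the affinity matrix, so no neighborhood size has to be chosen by hand.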

Optimization for BLSR
We convert problem (5) into the equivalent problem (11) by introducing auxiliary variables $J$ and $L$, and then form the corresponding Augmented Lagrange Multipliers (ALM) [41] function (12), where $Y_1 \in \mathbb{R}^{D \times n}$, $Y_2 \in \mathbb{R}^{n \times n}$ and $Y_3 \in \mathbb{R}^{n \times n}$ are Lagrange multipliers and $\mu > 0$ is a penalty parameter. The problem can be solved iteratively by updating each variable with the others fixed. The steps to solve the problem in the $(k+1)$-th iteration are as follows:
Step 1 (Update Z): $Z$ is updated by solving the optimization problem (13). The sub-problem for $Z$ involves the Hadamard product operator, which makes it hard to optimize directly. Alternatively, the $Z$ in $Z \odot M$ can be taken from the former iteration, so that an iterative algorithm can be formed to solve the sub-problem of $Z$, which gives the update (14).
Step 2 (Update E): $E$ is updated by solving the optimization problem (15).
Step 3 (Update J): $J$ is updated by solving the optimization problem (16) through a singular value decomposition and the soft-thresholding (shrinkage) operator given by (17).
Step 4 (Update L): $L$ is updated by solving the optimization problem (18).
Step 5: Update the multipliers and $\mu$ as in (19).
The iterations are repeated until the $\ell_\infty$-norm convergence conditions fall below a tolerance $\varepsilon$, and the output is $Z$. We outline the optimization process of BLSR in Algorithm 1. The major computational burden of BLSR is solving (14) and (16) because they involve matrix inversion and singular value decomposition (SVD). The overall computational complexity of BLSR is $O(\tau(n^2 D + n^3))$, where $\tau$ is the iteration number, and $n$ is the number of training samples.
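The five steps can be illustrated with the following inexact-ALM sketch. It is not the authors' Algorithm 1: it assumes the objective $\|J\|_* + \lambda\|E\|_1 + \beta\|L\|_1 + \alpha\|Z - Z \odot M\|_F^2$ subject to $X = XZ + E$, $Z = J$, $Z = L$, and the growth factor `rho`, the cap `mu_max` and all default values are hypothetical choices.

```python
import numpy as np

def soft_threshold(A, tau):
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def blsr_admm(X, labels, alpha=10.0, beta=0.1, lam=1.0,
              mu=1e-2, rho=1.2, mu_max=1e8, tol=1e-5, max_iter=300):
    """Sketch of the alternating updates under the ASSUMED objective above."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    M = (labels[:, None] == labels[None, :]).astype(float)   # mask matrix
    Z = np.zeros((n, n)); J = Z.copy(); L = Z.copy()
    E = np.zeros_like(X)
    Y1 = np.zeros_like(X); Y2 = np.zeros((n, n)); Y3 = np.zeros((n, n))
    XtX, I = X.T @ X, np.eye(n)
    for _ in range(max_iter):
        # Step 1: Z-update, with Z * M taken from the former iteration
        B = Z * M
        rhs = 2 * alpha * B + X.T @ (mu * (X - E) + Y1) + mu * (J + L) - Y2 - Y3
        Z = np.linalg.solve(mu * XtX + (2 * alpha + 2 * mu) * I, rhs)
        # Step 2: E-update by entry-wise shrinkage (l1 noise assumed)
        E = soft_threshold(X - X @ Z + Y1 / mu, lam / mu)
        # Step 3: J-update by singular value thresholding
        J = svt(Z + Y2 / mu, 1.0 / mu)
        # Step 4: L-update by entry-wise shrinkage
        L = soft_threshold(Z + Y3 / mu, beta / mu)
        # Step 5: multiplier update, convergence check, then penalty growth
        R1, R2, R3 = X - X @ Z - E, Z - J, Z - L
        Y1 += mu * R1; Y2 += mu * R2; Y3 += mu * R3
        if max(np.abs(R1).max(), np.abs(R2).max(), np.abs(R3).max()) < tol:
            break
        mu = min(rho * mu, mu_max)
    return Z, E

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 6)                 # toy data: 2 classes, 6 samples each
bases = [rng.normal(size=(20, 2)) for _ in range(2)]
X = np.hstack([bases[c] @ rng.normal(size=(2, 6)) for c in range(2)])
Z, E = blsr_admm(X, labels)
```

The linear solve in Step 1 and the SVD in Step 3 dominate the cost per iteration, in line with the complexity stated above.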

Optimization for BLSGE
The trace-ratio problem in the form of (10) does not have a closed-form solution [27]. Consequently, it is approximately solved as a determinant-ratio problem, so we turn to solving problem (20). With the method of Lagrange multipliers, solving problem (20) is transformed into solving the generalized eigenvalue problem (21). We then obtain the eigenvectors corresponding to the $d$ minimum eigenvalues, and the projection matrix is given by $V = [v_1, v_2, \ldots, v_d]$. The detailed process of BLSGE is given in Algorithm 2. The complete procedure of BLSE is outlined in Algorithm 3.

Algorithm 2. BLSGE
Input: Affinity weight matrix Z, reduced dimension d.
1: Compute the weights of the inter-class graph (6) and the intra-class graph (7) from the affinity matrix Z.
2: Solve the generalized eigenvalue problem (21), and get the eigenvectors corresponding to the d minimum eigenvalues.
Output: Projection matrix V.
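Step 2 of Algorithm 2 can be sketched in NumPy as below. The sketch assumes the generalized eigenvalue problem has the form $X L^{intra} X^T v = \eta\, X L^{inter} X^T v$, and it adds a small ridge term to the inter-class matrix so that a Cholesky factorization exists; the ridge and the function names are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def blsge_projection(X, W_intra, W_inter, d, ridge=1e-6):
    """Eigenvectors of A v = eta * B v for the d smallest eta (assumed form)."""
    D = X.shape[0]
    A = X @ laplacian(W_intra) @ X.T
    B = X @ laplacian(W_inter) @ X.T
    A = (A + A.T) / 2.0
    B = (B + B.T) / 2.0 + ridge * np.trace(B) / D * np.eye(D)   # keep B positive definite
    C = np.linalg.cholesky(B)                 # B = C C^T
    C_inv = np.linalg.inv(C)
    S = C_inv @ A @ C_inv.T                   # equivalent symmetric standard problem
    evals, U = np.linalg.eigh((S + S.T) / 2.0)   # eigenvalues in ascending order
    V = C_inv.T @ U[:, :d]                    # map back to generalized eigenvectors
    return V, evals[:d]

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 5)
X = rng.normal(size=(4, 10)) + 3.0 * labels   # toy data, class 1 shifted
W = rng.random((10, 10))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
same = labels[:, None] == labels[None, :]
W_intra, W_inter = np.where(same, W, 0.0), np.where(same, 0.0, W)
V, evals = blsge_projection(X, W_intra, W_inter, d=2)
```

Whitening with the Cholesky factor turns the generalized problem into an ordinary symmetric one, so only `np.linalg.eigh` is needed.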

Algorithm 3. BLSE
Input: Labeled training data $X \in \mathbb{R}^{D \times n}$, reduced dimension d, trade-off parameters α, β and λ.
1: Run Algorithm 1 to get the affinity weight matrix Z of X.
2: Run Algorithm 2 to obtain the optimal projection matrix V.
Output: Projection matrix V.

Classification
For classification, we directly use the learned projection V to transform the training and testing data. One can then apply an existing classifier, such as the 1-nearest neighbor (1-NN) classifier, to classify the projected testing data.
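A minimal sketch of this classification step, assuming samples are stored as columns and Euclidean distance is used in the projected space (the function name is illustrative):

```python
import numpy as np

def nn_classify(V, X_train, y_train, X_test):
    """Project with V, then assign each test sample the label of its nearest
    projected training sample (1-NN, Euclidean distance)."""
    Y_train = V.T @ X_train                   # shape (d, n_train)
    Y_test = V.T @ X_test                     # shape (d, n_test)
    diff = Y_test[:, :, None] - Y_train[:, None, :]
    dist2 = (diff ** 2).sum(axis=0)           # shape (n_test, n_train)
    return np.asarray(y_train)[dist2.argmin(axis=1)]

V = np.eye(2)                                 # identity projection for the toy case
X_train = np.array([[0.0, 10.0], [0.0, 10.0]])   # columns (0, 0) and (10, 10)
X_test = np.array([[1.0, 9.0], [1.0, 8.0]])      # columns (1, 1) and (9, 8)
predicted = nn_classify(V, X_train, [0, 1], X_test)
```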

Analysis of BLSE
The representation results of BLSR have a great influence on the graph construction and the performance of BLSE. There are three parameters in BLSR, i.e., the regularization parameters α, β and λ. λ and β control the sparse noise term E and the sparse representation term Z, respectively, and α regularizes the representation to be block-diagonal. We conduct experiments to study the sensitivity of the proposed BLSE over a wide range of these parameters. The ORL data [42] are divided into training and testing samples for tuning these parameters, and the reduced dimension is 50. The experimental results are used to find an effective range of parameters that ensures reliable performance. The results are reported in Figure 4. From the results, one can observe that the performance of BLSE is not especially sensitive to λ and is robust over a quite large range of α and β.
Figure 5 further shows the graph weight matrix obtained by BLSR and the corresponding recognition accuracy obtained by BLSE with α = 1, 10, and 100, respectively. A larger penalty is imposed on the block-diagonal regularization term as α increases, so the obtained graph weight matrix tends to show a better block-diagonal structure. Benefiting from the block-diagonal graph weight matrix, BLSE achieves higher recognition accuracy. The results demonstrate that the proposed method can enhance the block-diagonal structure of the graph weight matrix, which helps achieve better recognition performance. However, an extremely large α will regularize the inter-class representation in Z to be zero, in which case the inter-class adjacent relationship among samples might not be revealed well. To achieve reliable and stable performance, the suggested parameter settings are 1 < α < 100, 0.01 < β < 50, and 1 < λ < 50.

2-D Visualization Experiment on CMU PIE Dataset
In this part, a subset of the CMU PIE face database [43] (120 images of five persons) is used to intuitively show the discriminative ability of different methods using t-SNE [44]. In the experiment, seven images per person are randomly selected for training, and the remaining images (about 17 per person) are used for testing. Figure 6a-g visualizes the testing data distributions along the first two dimensions obtained by the different methods. From Figure 6, we may draw several conclusions. First, as classical supervised DR methods, LDA and MMC yield performance superior to that of unsupervised PCA. Second, SGDA [27] only exploits the local neighborhood structure via sparse representation, and it does not perform well, as shown in Figure 6d: some parts of classes 1, 2, 3 and 4 mix together. By introducing global low-rank regularization, LGDA [28] shows better separation ability. Nevertheless, there are overlaps between class 2 and class 5, and between class 1 and class 3. With both sparse and low-rank constraints, SLGDA [28] performs better than SGDA and LGDA. However, class 2 and class 5 still overlap significantly, as shown in Figure 6f. In contrast, the proposed BLSE successfully separates all the classes with clear boundaries between them, which can be explained by the simultaneously imposed local sparsity, global low-rankness and discriminative block-diagonal structure constraints. The experiment shows that BLSE has the capacity to separate complex face data distributions.

Experimental Results on Image Datasets
We conducted extensive experiments to evaluate the performance of the proposed method on widely used face and object databases (ORL [42], Yale [45], CMU PIE [43], and COIL 20 [46]). Figure 7 visually demonstrates the characteristics of each database. Our approach is compared with several state-of-the-art subspace learning approaches including PCA, LDA, MMC [6], SGDA [27], LGDA [28], and SLGDA [28]. To make the comparison fair, for all the evaluated algorithms we first apply PCA as a preprocessing step, retaining 99% of the energy. A nearest neighbor classifier is employed in the projected feature space for all the methods.
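The PCA preprocessing step can be sketched as follows, under the assumption that "energy" means the fraction of total variance, i.e., the cumulative sum of squared singular values of the centered data; `pca_energy` is an illustrative name.

```python
import numpy as np

def pca_energy(X, energy=0.99):
    """Return the PCA basis that retains at least `energy` of the variance.
    X holds one sample per column."""
    mean = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, energy)) + 1, s.size)
    return U[:, :k], mean

rng = np.random.default_rng(0)
scales = np.array([[100.0], [1.0], [0.01]])   # one dominant direction
X = rng.normal(size=(3, 200)) * scales
P, mean = pca_energy(X, energy=0.99)
X_reduced = P.T @ (X - mean)                  # data passed on to the DR method
```

In this toy case a single component already carries more than 99% of the variance, so `P` has one column.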

The ORL face database consists of a total of 400 face images from 40 individuals with 10 images per person. The images were taken at different times, with variations in lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses), against a dark homogeneous background. In the experiments, each image in the ORL database is manually cropped and resized to 32 × 32. Using five samples per person from the ORL database as the training set, we present the first five basis vectors of Eigenfaces, Fisherfaces, and our BLSEFaces in Figure 8. A random subset with t (= 3, 4, 5, 6) images of each individual is selected for training and the rest for testing. For each t, we run the programs 10 times and calculate the recognition rates as well as the standard deviations with different reduced dimensions.
The Yale face database contains 165 grayscale images of 15 individuals, with 11 images per individual. The images exhibit variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). In our experiments, each image in the Yale database was manually cropped and resized to 32 × 32. A random subset with t (= 4, 5, 6, 7) images of each individual is selected for learning the embedding and the rest for testing. For each given t, we run each program 10 times with randomly chosen training sets and report the average recognition rates as well as the standard deviations with different reduced dimensions.
The CMU PIE dataset contains over 40,000 face images of 68 individuals. Images of each individual were acquired across 13 different poses, under 43 different illumination conditions, and with four different expressions. Here we use a near-frontal pose subset, namely C07, for the experiments, which contains 1629 images of 68 individuals. Each individual has about 24 images. All images are manually cropped and resized to 32 × 32 pixels. A random subset with t (= 4, 5, 6, 7) images of each individual is selected for learning the embedding and the rest for testing. For each given t, we repeat the random selection of the training set 10 times and report the average recognition rates as well as the standard deviations under different dimensions.
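The evaluation protocol shared by these experiments (t random training images per individual, 10 repetitions, mean and standard deviation of the recognition rate) can be sketched as below. The `evaluate` callable is hypothetical: it stands for any routine that trains on the given training indices and returns the accuracy on the test indices.

```python
import numpy as np

def random_split(labels, t, rng):
    """Pick t random samples per class for training; the rest are for testing."""
    labels = np.asarray(labels)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.extend(rng.choice(idx, size=t, replace=False))
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

def repeated_evaluation(evaluate, labels, t, runs=10, seed=0):
    """Mean and standard deviation of evaluate(train_idx, test_idx) over runs."""
    rng = np.random.default_rng(seed)
    accs = [evaluate(*random_split(labels, t, rng)) for _ in range(runs)]
    return float(np.mean(accs)), float(np.std(accs))

labels = np.repeat(np.arange(15), 11)         # Yale-like layout: 15 people x 11 images
train_idx, test_idx = random_split(labels, t=4, rng=np.random.default_rng(0))
```

With t = 4 on this layout, each run uses 60 training and 105 testing images.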

Discussion and Conclusions
Based on the experimental results on the face and object image datasets, one can conclude that all the methods tend to achieve better performance as the number of dimensions and the number of training samples per class increase. PCA is simple to compute and performs well in some cases, but its unsupervised nature restricts its performance. By introducing supervised information with different discrimination criteria, LDA and MMC can achieve better performance. SGDA, LGDA and SLGDA can adaptively select neighbors for graph construction, and find the representation of each sample using the labeled samples in the same class to pursue block-diagonal representations. However, this process may result in a large representation error because of the limited number of samples per class, and may therefore fail to reveal the intra-class adjacent relationship well. Besides, SGDA, LGDA and SLGDA disconnect inter-class samples in graph construction, so they cannot capture the inter-class adjacent relationship well. As a result, SGDA, LGDA and SLGDA do not perform well, as the experimental results show. By comparison, the proposed BLSE model achieves better performance. The reason is twofold. Firstly, the developed BLSR method can capture both the local neighborhood relations and the global structure latent in data through its low-rank and sparse constraints. Different from SGDA, LGDA and SLGDA, BLSR employs all samples when finding the representation of each sample. The block-diagonal regularization captures the intra-class and inter-class adjacent relationships hidden in data and enhances the discriminative capability of BLSR. Secondly, benefiting from BLSR and the GE framework, the discriminative capacity of the low-dimensional subspace learned by BLSGE is further boosted by simultaneously minimizing the intra-class scatter and maximizing the inter-class scatter.
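To make the scatter criterion concrete, the sketch below shows one common way to minimize intra-class scatter while maximizing inter-class scatter in a graph embedding framework: a difference-of-Laplacians objective solved by eigendecomposition. It is an illustration under stated assumptions, not the exact BLSGE formulation. The difference form is a swapped-in, MMC-like criterion, the function names are hypothetical, and in the test data the graph weights are plain label indicators rather than BLSR coefficients.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W of a symmetrized weight matrix."""
    W = 0.5 * (W + W.T)
    return np.diag(W.sum(axis=1)) - W

def graph_embedding(X, W_intra, W_inter, dim):
    """Return an orthonormal projection P (n_features x dim).

    X holds one sample per column. For a Laplacian L of weights W,
    tr(P^T X L X^T P) = 1/2 * sum_ij W_ij ||P^T x_i - P^T x_j||^2,
    so maximizing tr(P^T X (L_inter - L_intra) X^T P) subject to
    P^T P = I enlarges inter-class scatter and shrinks intra-class scatter.
    """
    L_w = laplacian(W_intra)
    L_b = laplacian(W_inter)
    M = X @ (L_b - L_w) @ X.T
    M = 0.5 * (M + M.T)                # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order]              # top-`dim` eigenvectors
```

The difference form avoids inverting a possibly singular intra-class matrix, which is convenient when few training samples per class are available; a ratio (generalized eigenvalue) form is an equally plausible alternative.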
To conclude, we have proposed a novel block-diagonal constrained low-rank and sparse-based embedding (BLSE) model for the dimensionality reduction and classification of image data. The two procedures of BLSE, namely block-diagonal constrained low-rank and sparse representation (BLSR) and block-diagonal constrained low-rank and sparse graph embedding (BLSGE), are detailed. BLSR combines the local discriminative capacity of SR with the global low-rank property of LRR. Meanwhile, a novel block-diagonal regularization term is introduced to fully harness the label information and pursue a block-diagonal representation. The affinity weight matrix obtained by BLSR can well reveal the intra-class similarities and inter-class differences of data. With the intra-class and inter-class graphs derived from BLSR, BLSGE finds a low-dimensional subspace with enhanced intra-class compactness and inter-class separation. Experimental results on public face and object datasets validate the effectiveness of the BLSE model.