Graph Regularized Within-Class Sparsity Preserving Projection for Face Recognition



Introduction
Face recognition is an important but complicated problem in computer vision. It has broad applications ranging from human-computer interfaces to surveillance. Many algorithms have been proposed in the literature, including two-dimensional and three-dimensional face recognition methods [1][2][3][4]. Three-dimensional face recognition, which uses the 3D geometry of the face for identification, proves to be more robust to illumination, pose and disguise. However, facial expression is a major issue in 3D face recognition, since the geometry of the face changes significantly with different expressions. Most face images can be seen as two-dimensional matrices, so 2D face recognition has also received tremendous attention in computer vision and pattern recognition. Subspace learning methods, such as principal component analysis (PCA) [5] and linear discriminant analysis (LDA) [6], have been extensively studied. Both seek a low-dimensional representation of the original high-dimensional data that preserves some kind of intrinsic structure.
PCA is an unsupervised method whose projections are obtained by maximizing the total scatter of the data, while LDA is a supervised method that maximizes the ratio of between-class scatter to within-class scatter. Experiments show that LDA outperforms PCA in face recognition. However, it has been reported that face images probably reside on some sort of manifold [7]. One problem with these two algorithms is that they only exploit the linear global Euclidean structure and ignore the local geometric structure. Although they have been extended to nonlinear methods such as KPCA [8] and KLDA [9] via the kernel trick, it is hard to choose a suitable kernel function and the computation is expensive.
Manifold learning tries to find an embedding that projects high-dimensional data onto a low-dimensional space while preserving the intrinsic geometry of the data, especially the local geometry. Representative methods are Isomap [10], locally linear embedding (LLE) [11] and Laplacian eigenmaps (LE) [12]. However, manifold learning algorithms are affected by two critical problems [13]: (i) the construction of the adjacency graph, and (ii) the embedding of new test data, also called the out-of-sample problem. For the latter problem, He proposed a linear method named locality preserving projections (LPP) [14] to approximate the eigenfunctions of the Laplace-Beltrami operator on the manifold; that is, LPP is a linear version of LE. By considering local information and class label information, many variants [15][16][17][18] have been proposed and can achieve good performance. One critical step in these methods is the construction of the adjacency graph; however, how to define sparse adjacency weight matrices is still an open problem.
The traditional way to build the adjacency graph is to use the k-nearest-neighbor graph or the ε-neighborhood graph. However, both methods are parametric, and the performance of the resulting algorithms depends heavily on the choice of the parameter. In [19], it is reported that the adjacency graph structure and the graph weights are highly interrelated and should not be treated separately. It is therefore desirable to design a method that can carry out both tasks in a single step. To this end, two ℓ1-graph construction methods [20,21] have recently been proposed, which complete the adjacency graph construction and the graph weight calculation within one step.
Recently, sparse representation (SR) [22] has been extensively studied and has found wide application in computer vision and image processing. The main idea of SR is that a query image can be sparsely represented as a linear combination of all the training samples; its non-zero representation coefficients are naturally sparse and mostly come from the same class as the query image. SR is an unsupervised method, but it exploits the discriminative nature of sparse representation for classification. Based on this idea, Qiao proposed sparsity preserving projections (SPP) [23] for feature extraction, which tries to preserve the sparse reconstructive relationship of the samples in the low-dimensional space by minimizing the distance between each sparsely reconstructed sample and the original sample. However, some issues remain. First, SPP is an unsupervised method and does not make use of class information. Second, when the dictionary is large, SPP is very time-consuming.
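As an illustration of the sparse coding step at the heart of SR, the ℓ1-regularized reconstruction can be sketched with a plain iterative soft-thresholding (ISTA) solver. The Lagrangian formulation, step size and regularization weight below are our own assumptions for the sketch, not the solver used in [22]:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(X, x, lam=0.05, n_iter=500):
    """Approximately solve min_s 0.5*||x - X s||^2 + lam*||s||_1 by ISTA.

    X: (D, n) dictionary of training samples (one sample per column),
    x: (D,) query sample. Returns the sparse coefficient vector s.
    """
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    s = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ s - x)   # gradient of the least-squares term
        s = soft_threshold(s - grad / L, lam / L)
    return s
```

When the query is itself a training sample, the recovered coefficients concentrate on that sample, which is the discriminative behavior the text describes.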
To this end, this paper proposes a method called graph regularized within-class sparsity preserving projection analysis (GRWSPA). Like SPP, it preserves the sparse reconstructive relationship, but only within each class, by minimizing the distance between each sample and its sparse reconstruction from same-class samples; this reduces the computation time, since the number of samples in each class is usually much smaller than the total number of training samples. At the same time, assuming that samples of different classes lie on different sub-manifolds, it maximizes the scatter of inter-class samples by constructing a between-class adjacency graph, pulling samples from different classes as far apart as possible.
The rest of the paper is organized as follows. In Section 2, SPP is briefly reviewed. The proposed algorithm is presented in Section 3. In Section 4, experiments are carried out to evaluate the proposed algorithm. Finally, conclusions are drawn in Section 5.

Sparsity Preserving Projections
Let X = [x_1, x_2, …, x_n] ∈ R^{D×n} be the training samples. In real applications, the dimensionality D is often very high. One reasonable way is to use dimensionality reduction to map the data from the high-dimensional space to a low-dimensional one, which can be expressed mathematically as y_i = A^T x_i, where A is the projection matrix. SPP first seeks, for each sample x_i, the sparsest reconstruction from the remaining training samples:

min_{s_i} ||s_i||_0  s.t.  x_i = X s_i,

where the i-th entry of s_i is constrained to be zero. Since this ℓ0 problem is NP-hard, provided the solution is sparse enough, the above optimization can be replaced by its ℓ1 relaxation

min_{s_i} ||s_i||_1  s.t.  x_i = X s_i.

This can be solved by standard convex programming methods [24]. Suppose s_i is the optimal solution to the above optimization; SPP then tries to preserve the sparse reconstruction relationship, which can be expressed as the following optimization:

min_a Σ_{i=1}^{n} ||a^T x_i − a^T X s_i||^2  s.t.  a^T X X^T a = 1,

which can be simplified by simple algebra to

max_a a^T X S_β X^T a  s.t.  a^T X X^T a = 1,  where S_β = S + S^T − S S^T and S = [s_1, s_2, …, s_n].

So the optimal projections are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem

X S_β X^T a = λ X X^T a.
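A minimal sketch of this eigenvalue step, assuming the sparse coefficient matrix has already been computed. The small ridge term eps is our addition to keep X X^T positive definite in the sketch; the original formulation does not include it:

```python
import numpy as np
from scipy.linalg import eigh

def spp_projections(X, W, d, eps=1e-6):
    """Top-d SPP directions from X S X^T a = lambda X X^T a.

    X: (D, n) data (one sample per column); W: (n, n) raw sparse
    reconstruction coefficients (column i reconstructs x_i);
    S = W + W^T - W W^T is the symmetrized weight matrix.
    """
    S = W + W.T - W @ W.T
    M = X @ S @ X.T
    M = (M + M.T) / 2            # enforce exact symmetry for eigh
    B = X @ X.T + eps * np.eye(X.shape[0])
    evals, V = eigh(M, B)        # ascending eigenvalues, B-orthonormal V
    return V[:, -d:][:, ::-1]    # d leading directions, largest first
```

The returned directions are B-orthonormal, matching the constraint a^T X X^T a = 1 of the formulation above.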

Graph Regularized Within-Class Sparsity Preserving Analysis
From the above section, we can see that SPP is an unsupervised method and does not use the label information. Moreover, the sparse representations are obtained over the whole set of training samples; if the number of training samples is large, the process is computationally expensive. In this section we present an improved SPP algorithm.

Preserve the Sparsity Structure for Within-Class Samples
In sparse representation, a test sample x_i can be represented as a linear combination of all training samples; the non-zero sparse coefficients w_{ij} reflect the contribution of x_j to the reconstruction of x_i. The higher the value of w_{ij}, the more similar x_j and x_i are, and the large coefficients are expected to concentrate on the training samples from the same class as the test sample. A small value of w_{ij} means that x_j contributes little to reconstructing x_i and is probably from a different class.
However, SPP does not consider the class information: its adjacency graph weights are computed by sparse representation with the whole set of training samples as the dictionary, which is very time-consuming when the number of training samples is large. One solution to this problem is to take only the samples of the same class as the dictionary to reconstruct x_i. In analogy to SPP, this can be formulated as

min_{s_{k,i}} ||s_{k,i}||_1  s.t.  x_{k,i} = X_k s_{k,i},

where x_{k,i} denotes the i-th sample of the k-th class and X_k = [x_{k,1}, …, x_{k,n_k}] collects the samples of class k. Suppose s'_{k,i} = (s_{k,1}, …, 0, …, s_{k,n_k})^T is the optimal solution, where the i-th entry is zero; then the overall weight matrix has the block-diagonal form

S̄ = diag(S_1, S_2, …, S_C),  with S_k = [s'_{k,1}, s'_{k,2}, …, s'_{k,n_k}].

Like SPP, we hope that the sparse structure is well preserved in the low-dimensional space:

min_A Σ_{k,i} ||A^T x_{k,i} − A^T X_k s'_{k,i}||^2,

which can be reduced to the following problem:

max_A tr(A^T X S X^T A)  s.t.  A^T X X^T A = I,

where S = S̄ + S̄^T − S̄ S̄^T.
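Because each sample is coded only over the other samples of its own class, the resulting weight matrix is block diagonal. The sketch below assembles it class by class; purely for brevity, an ordinary least-squares fit stands in for the ℓ1 sparse coding solver:

```python
import numpy as np

def within_class_weight_matrix(X, labels):
    """Assemble the n-by-n block-diagonal within-class weight matrix.

    X: (D, n) data (one sample per column), labels: (n,) class labels.
    Each sample is reconstructed only from the other samples of its own
    class; a least-squares fit stands in here for the l1 solver.
    Column j of the result holds the coefficients reconstructing x_j.
    """
    n = X.shape[1]
    S = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for j in idx:
            others = idx[idx != j]
            # reconstruct x_j from the remaining samples of class c
            w, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
            S[others, j] = w
    return S
```

By construction, all cross-class entries and the diagonal are exactly zero, which is what makes this step much cheaper than coding over the full dictionary.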

Discover the Discriminant Structure for Between-Class Samples
It is assumed that samples from different classes lie on different sub-manifolds; one reasonable strategy for classification is to map these sub-manifolds as far apart as possible. We construct an adjacency graph G = (X, B) over the training data X to characterize the relationship between different classes. The elements of the weight matrix B are defined as

B_ij = 1 if x_i and x_j are k nearest neighbors but have different labels, and B_ij = 0 otherwise.

In order to guarantee the discriminant ability of the low-dimensional representation, as in Unsupervised Discriminant Projection (UDP) [25], we require the connected points in the adjacency graph to stay as distant as possible, which can be expressed as the following optimization:

max_A Σ_{ij} ||y_i − y_j||^2 B_ij,

where y_i = A^T x_i is the low-dimensional representation of x_i. The objective incurs a heavy penalty if nearby points x_i and x_j belonging to different classes are mapped close together; it thus tries to ensure that if x_i and x_j are close but from different classes, then y_i and y_j are far apart, which encodes the local discriminant information and is helpful for classification. The above optimization can be simplified to

max_A 2 tr(A^T S_B A),  with S_B = X L X^T,

where S_B is called the Laplacian difference scatter matrix, D is a diagonal matrix with D_ii = Σ_j B_ij, and L = D − B is the Laplacian matrix.
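A small sketch of this graph construction, assuming Euclidean distances and unit weights on edges between k-nearest neighbors with different labels (the neighbor rule and symmetrization are our reading of the definition above):

```python
import numpy as np

def between_class_laplacian(X, labels, k=3):
    """Build B (edges between k-NN pairs with different labels) and L = D - B.

    X: (D, n) data (one sample per column), labels: (n,) class labels.
    B_ij is set to 1 when x_j is among the k nearest neighbors of x_i
    but carries a different label; B is symmetrized.
    """
    n = X.shape[1]
    # pairwise Euclidean distances between columns
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    np.fill_diagonal(dist, np.inf)   # exclude self-matches
    B = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            if labels[i] != labels[j]:
                B[i, j] = B[j, i] = 1.0
    D = np.diag(B.sum(axis=1))
    return B, D - B
```

The Laplacian L = D − B has zero row sums, which is what makes the identity Σ_{ij} ||y_i − y_j||^2 B_ij = 2 tr(Y L Y^T) hold.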

GRWSPA
To take both the within-class reconstruction relationship and the between-class separability into account, it is desirable to keep the reconstruction weights within each class as in SPP while maximizing the local discriminant information. Combining Sections 3.1 and 3.2 easily yields the following optimization:

max_A [ tr(A^T X S X^T A) + μ tr(A^T X L X^T A) ]  s.t.  A^T X X^T A = I,
where μ is a factor to balance the sparse representation and the discriminant ability.
For a compact expression, the maximization problem can further be transformed into

max_A tr(A^T X (S + μL) X^T A)  s.t.  A^T X X^T A = I.

Then the optimal A consists of the eigenvectors corresponding to the d largest eigenvalues of the following generalized eigenvalue problem:

X (S + μL) X^T a = λ X X^T a.
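Putting the two terms together, the GRWSPA projection step can be sketched as follows, where W is the raw within-class coefficient matrix, L is the between-class graph Laplacian, and the ridge term eps is our own addition to keep X X^T invertible:

```python
import numpy as np
from scipy.linalg import eigh

def grwspa(X, W, L, d, mu=1.0, eps=1e-6):
    """Top-d GRWSPA directions from X (S + mu*L) X^T a = lambda X X^T a.

    X: (D, n) data (one sample per column); W: (n, n) raw within-class
    sparse coefficients (column i reconstructs x_i); L: (n, n) Laplacian
    of the between-class graph; mu: balance factor.
    """
    S = W + W.T - W @ W.T          # symmetrized sparsity-preserving term
    M = X @ (S + mu * L) @ X.T
    M = (M + M.T) / 2              # enforce exact symmetry for eigh
    B = X @ X.T + eps * np.eye(X.shape[0])
    evals, V = eigh(M, B)          # ascending eigenvalues, B-orthonormal V
    return V[:, -d:][:, ::-1]      # d leading directions, largest first
```

Setting mu = 0 recovers the within-class sparsity term alone, so the balance factor directly trades reconstruction fidelity against between-class separation.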

Experimental Section
In this section, several experiments are carried out to show the effectiveness of the proposed algorithm on the ORL and YALE databases. We compare our method with several classic methods, including LDA, LPP, UDP and SPP. For classification, we use the nearest neighbor classifier for its easy implementation. For the parameter μ, we set μ = λ_max(X S X^T) / λ_max(X L X^T), where λ_max(·) denotes the maximum eigenvalue of a matrix. Note that during feature extraction some matrices are singular, so PCA is employed as a preprocessing step, keeping 98% of the image energy. For UDP, the neighborhood size k needs to be determined; here we set k = n_i − 1, where n_i is the number of samples in the i-th class.

The ORL database contains 40 individuals, each with 10 sample images exhibiting variations in pose, facial expression and other details. The images were taken at different times with variations including open or closed eyes and smiling or non-smiling expressions; some were captured with a tolerance for tilting and rotation of the face of up to 20 degrees. Figure 1 shows some samples of one subject from the ORL database. For each training size l, we run each algorithm 10 times and report the average as the recognition rate. Table 1 gives the classification accuracy rates (%) for each algorithm under different training set sizes. To see how the dimensionality affects the recognition rate, Figure 2 shows the recognition rates of the different methods with respect to the dimensionality on the ORL database with four training samples per person.

The YALE database contains 165 images of 15 subjects, with 11 images each. The images are captured with variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). Figure 3 shows some samples of one subject from the YALE database. For each l, we run each algorithm 10 times and report the average as the recognition rate. Table 2 gives the classification accuracy rates (%) for each algorithm under different training set sizes.

From Figure 2 and the tables above, we can see that all the algorithms perform better on ORL than on YALE, probably because the ORL images have less variation than the YALE images. LDA and UDP outperform PCA; this is probably because PCA is representative in the low-dimensional space and helpful for reconstruction, while LDA is a supervised method and takes the class information into account. UDP, as a manifold learning algorithm, makes use of both the local and non-local information of the face images, demonstrating its effectiveness in feature extraction. SPP is based on sparse representation, which preserves the sparse reconstructive relationship of the data and contains natural discriminant information even though it is unsupervised. The proposed algorithm, on the one hand, preserves the within-class sparse reconstructive relationships like SPP, and on the other hand, maximizes the scatter of samples from different classes; after projection, data from the same class are compact while data from different classes are far apart. As a result, the proposed algorithm performs much better than PCA, LDA, LPP, UDP and SPP.
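The preprocessing and classification pipeline described above can be sketched as follows. The 98% energy criterion and the 1-nearest-neighbor rule come from the text; the function names and the SVD-based PCA are our own choices for the sketch:

```python
import numpy as np

def pca_98(X, energy=0.98):
    """PCA preprocessing: keep enough components for `energy` of the variance.

    X: (D, n) data with one sample per column. Returns the projection
    matrix P of shape (D, d) and the centered, projected data (d, n).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    # smallest d whose cumulative variance ratio reaches the target energy
    d = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    return U[:, :d], U[:, :d].T @ Xc

def nn_classify(Y_train, labels, y):
    """1-nearest-neighbor rule: return the label of the closest training sample."""
    i = np.argmin(np.linalg.norm(Y_train - y[:, None], axis=0))
    return labels[i]
```

In the actual experiments the PCA output would be further projected by the learned GRWSPA directions before the nearest-neighbor step.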

Conclusions
In this paper, based on sparsity preserving projections, we propose a new algorithm called Graph Regularized Within-class Sparsity Preserving Analysis (GRWSPA). GRWSPA preserves the within-class sparse reconstruction weights so as to discover the intrinsic information, while maximizing the between-class scatter so that after projection the samples from different classes are far apart. Experiments were carried out on the ORL and YALE face databases, and the results demonstrate the performance advantage of the proposed algorithm over the others.

Figure 1. Samples of one subject from ORL database.

Figure 3. Samples of one subject from YALE database.

Table 1. Recognition Rates on ORL.

Table 2. Recognition Rates on YALE.