Palmprint and Face Multi-Modal Biometric Recognition Based on SDA-GSVD and Its Kernelization

When extracting discriminative features from multimodal data, current methods rarely consider the data distribution. In this paper, we present an assumption that is consistent with the viewpoint of discrimination: a person's overall biometric data should be regarded as one class in the input space, and his different biometric data can form different Gaussian distributions, i.e., different subclasses. Hence, we propose a novel multimodal feature extraction and recognition approach based on subclass discriminant analysis (SDA). Specifically, one person's different bio-data are treated as different subclasses of one class, and a transformed space is calculated in which the difference among subclasses belonging to different persons is maximized and the difference within each subclass is minimized. The obtained multimodal features are then used for classification. Two solutions are presented to overcome the singularity problem encountered in the calculation: PCA preprocessing, and the generalized singular value decomposition (GSVD) technique. Further, we provide nonlinear extensions of SDA based multimodal feature extraction, namely feature fusion based on KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first apply kernel PCA to each single modal before performing SDA, whereas in KSDA-GSVD we directly perform kernel SDA to fuse multimodal data, applying GSVD to avoid the singularity problem. For simplicity, two typical types of biometric data are considered in this paper, i.e., palmprint data and face data. Experimental results show that our approaches outperform several representative multimodal biometric recognition methods, and that KSDA-GSVD achieves the best recognition performance.


Introduction
Multimodal biometric recognition techniques use multi-source features together to obtain integrated and more essential information about the same object. This is an active research direction in the biometric community, because it can overcome many problems that trouble traditional single-modal biometric systems, such as instability in feature extraction, noisy sensor data, restricted degrees of freedom, and unacceptable error rates. Information fusion is usually conducted on three levels, i.e., pixel level [1,2], feature level [3][4][5] and decision level [6][7][8][9]. The former two levels mainly aim at learning descriptive features, while the last level aims at finding a more effective way to use the learned features for decision making. In particular, at the pixel level and feature level, discriminant analysis techniques play an important role in acquiring more descriptive or more discriminative features.
In this paper, we have developed a novel multimodal feature extraction and recognition approach based on linear and nonlinear discriminant analysis technique. We adopt the feature fusion strategy, as features play a critical role in multimodal biometric recognition. More specifically, we try to answer the question of how to effectively obtain discriminative features from multimodal biometric data. Some related works have appeared in the literature. In [1,2], multimodal data vectors are firstly stacked into a higher dimensional vector to form a new sample set, from which discriminative features are extracted for classification. Yang [3] discussed the feature fusion strategy, that is, parallel strategy and serial strategy. The former uses complex vectors to fuse multimodal features, i.e., one modal feature is represented as the real part, and the other modal feature is represented as the imaginary part; while the latter stacks features of two modals into one feature, which is used for classification. Sun [4] proposed a method to learn features from data of two modalities based on CCA, but it has not been utilized in biometric recognition, and is not convenient to learn features from more than two modes of data.
While current methods generally extract discriminative features from multimodal data, they have rarely considered the data distribution. In this paper, we present an assumption that is consistent with the viewpoint of discrimination: in the same feature space, one person's different biometric identifier data can form different Gaussians, and thus his overall biometric data can be described using a mixture-of-Gaussians model. Although LDA has been widely used in biometrics to extract discriminative features, it is limited in that it can only handle the case where the data of one person form a single Gaussian distribution. However, as pointed out above, in multimodal analysis the different biometric identifier data of one person can form a mixture of Gaussians. Fortunately, subclass discriminant analysis (SDA) [30] has been proposed to remove this limitation of LDA, and can therefore be used to describe multimodal data that lie in the same input space.
Based on the analysis above, in this paper we propose a novel multimodal biometric data feature extraction scheme based on subclass discriminant analysis (SDA) [30]. For simplicity, we consider two typical types of biometric data, that is, face data and palmprint data. For one person, his face data and palmprint data are regarded as two subclasses of one class, and discriminative features are extracted by seeking an embedded space in which the difference among subclasses belonging to different persons is maximized and the difference within each subclass is minimized. Then, since the parallel fusion strategy is not suitable for fusing features from multiple modals, we fuse the obtained features by adopting the serial fusion strategy and use them for classification.
Two solutions are presented to solve the small sample size problem encountered in calculating the optimal transform. One is to first perform PCA preprocessing, and the other is to employ the generalized singular value decomposition (GSVD) [31,32] technique. Moreover, it is still worthwhile to explore the nonlinear discriminant capability of SDA in multimodal feature fusion, in particular when some single modals still show complicated and non-linearly separable data distributions. Hence, in this paper, we further extend the SDA feature fusion approach to the kernel space and present two solutions to the small sample size problem, namely KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first use KPCA to transform each single modal input space $\mathbb{R}^n$ into an $m$-dimensional space, where $m = \mathrm{rank}(K)$ and $K$ is the centralized Gram matrix. Then SDA is used to fuse the two transformed features and extract discriminative features. In KSDA-GSVD, we directly perform kernel SDA to fuse multimodal data, applying GSVD to avoid the singularity problem.
We evaluate the proposed approaches on two face databases (AR and FRGC) and the PolyU palmprint database, and compare the results with related methods that also aim to extract descriptive features from multimodal data. Experimental results show that our approaches achieve higher recognition rates and better verification performance than the compared methods. It is worthwhile to point out that, although the proposed approaches are validated on data of two modalities, they could be easily extended to recognition with more than two biometric modalities.
The rest of this paper is organized as follows: Section 2 describes the related work. Section 3 presents our approach. In Section 4, we present the kernelization of our approach. Experiments and results are given in Section 5 and conclusions are drawn in Section 6.

Related Work
In this section, we first briefly introduce some typical multimodal biometrics fusion techniques such as pixel level fusion [1,2], Yang's serial and parallel feature level fusion methods [3]. Further, three related methods, which are SDA, KSDA and KPCA, are also briefly reviewed.

Multimodal Fusion Scheme at the Pixel Level
The general idea of pixel level fusion [1,2] is to fuse the input data from multiple modalities as early as the pixel level, which may lead to less information loss. The pixel level fusion scheme fuses the original input face data vector and palmprint data vector of one person, and then discriminant features are extracted from the fused dataset. For simplicity and fair comparison, we tested the effectiveness of this scheme by extracting LDA features from the fused set in this paper.

Serial Fusion Strategy and Parallel Fusion Strategy
In [3], Yang et al. discussed two strategies to fuse features of two data modes. One is called the serial strategy and the other the parallel strategy. Let $x_i$, $y_i$ denote the face feature vector and palmprint feature vector of the $i$th person, respectively. The serial fusion strategy obtains the fused features by stacking the two vectors into one higher dimensional vector $\alpha_i$, i.e.:

$$\alpha_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix} \quad (1)$$

On the other hand, the parallel fusion strategy combines the features into a complex vector $\beta_i$, i.e.:

$$\beta_i = x_i + \mathrm{i}\, y_i \quad (2)$$

where $\mathrm{i}$ denotes the imaginary unit. Yang et al. also pointed out that the fused feature sets $\{\alpha_i\}$ and $\{\beta_i\}$ can either be used directly for classification, which is called feature combination, or be input into a feature extractor to further extract more descriptive features with less redundant information, which is called feature fusion.
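The two strategies described above can be illustrated with a minimal pure-Python sketch. The function names, the toy feature values, and the zero-padding of the shorter vector in the parallel case are assumptions of this sketch rather than details given in the text.

```python
from typing import List, Sequence


def serial_fusion(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Stack two feature vectors into one higher-dimensional vector."""
    return list(x) + list(y)


def parallel_fusion(x: Sequence[float], y: Sequence[float]) -> List[complex]:
    """Combine two feature vectors into one complex vector x + i*y.

    The shorter vector is zero-padded so both have the same length
    (an assumption of this sketch).
    """
    n = max(len(x), len(y))
    x_padded = list(x) + [0.0] * (n - len(x))
    y_padded = list(y) + [0.0] * (n - len(y))
    return [complex(a, b) for a, b in zip(x_padded, y_padded)]


face = [0.2, 0.5, 0.1]  # hypothetical face feature vector x_i
palm = [0.7, 0.3]       # hypothetical palmprint feature vector y_i

alpha = serial_fusion(face, palm)    # 5-dimensional real vector
beta = parallel_fusion(face, palm)   # 3-dimensional complex vector
```

Serial fusion grows the dimension with every added modal, while the complex representation has only a real and an imaginary slot, so it accommodates exactly two modals.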

Subclass Discriminant Analysis (SDA) and Its Kernelization
Subclass discriminant analysis (SDA) [30] is an extension of LDA that aims at processing data whose classes follow a mixture-of-Gaussians distribution. It divides each class into a number of subclasses, and calculates a transform space in which the distances between both class means and subclass means are maximized and the distances between samples of each subclass are minimized. SDA redefines the between-class scatter $\Sigma_B$ and the within-class scatter $\Sigma_W$ in terms of subclass quantities, where $H_i$ is the number of subclasses of class $i$, $p_{ij} = n_{ij}/n$ is the prior of the $j$th subclass of class $i$, and $\mu_{ij}$ is the mean of the $j$th subclass of class $i$. The advantage of this new definition of the between-class scatter is that it emphasizes the role of class separability over that of intra-subclass scatter. The optimal solution of SDA is given by the eigenvectors of the matrix $(\Sigma_W)^{-1} \Sigma_B$ associated with the largest eigenvalues.
Kernel subclass discriminant analysis (KSDA) is the nonlinear extension of SDA based on kernel functions [26]. The main idea of the kernel method is that, without knowing the nonlinear feature mapping explicitly, we can work in the feature space through kernel functions. KSDA first maps the input data $x$ into a feature space $F$ by using a nonlinear mapping $\phi$, and adopts a nonlinear clustering technique to find the underlying distributions of the datasets in the kernel space. The between-class scatter matrix and within-class scatter matrix of KSDA are defined analogously in $F$, in terms of the mean vector of the $j$th subclass of the $i$th class and the global mean of the mapped data. Like SDA, KSDA tries to maximize the ratio of the between-class scatter to the within-class scatter to find a transformation matrix $V$. The columns of $V$ are the eigenvectors corresponding to the largest eigenvalues of the product of the inverse within-class scatter matrix and the between-class scatter matrix in $F$.
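As an illustration of the subclass scatter matrices, the following NumPy sketch assumes the commonly used SDA form: the between-subclass scatter sums $p_{ij} p_{kl} (\mu_{ij} - \mu_{kl})(\mu_{ij} - \mu_{kl})^T$ over pairs of subclasses that belong to different classes, and the within-subclass scatter averages the outer products of samples centered on their own subclass mean. The function name `sda_scatter`, the $1/n$ normalization, and the toy data are assumptions of this sketch.

```python
import numpy as np


def sda_scatter(X, class_ids, subclass_ids):
    """Return (Sigma_B, Sigma_W) for samples stored in the rows of X."""
    X = np.asarray(X, dtype=float)
    class_ids = np.asarray(class_ids)
    subclass_ids = np.asarray(subclass_ids)
    n, d = X.shape

    # Collect (class, prior p_ij = n_ij / n, mean mu_ij) for every subclass
    # and accumulate the within-subclass scatter on the way.
    groups = []
    sigma_w = np.zeros((d, d))
    keys = sorted(set(zip(class_ids.tolist(), subclass_ids.tolist())))
    for cls, sub in keys:
        members = X[(class_ids == cls) & (subclass_ids == sub)]
        mu = members.mean(axis=0)
        groups.append((cls, len(members) / n, mu))
        centered = members - mu
        sigma_w += centered.T @ centered
    sigma_w /= n

    # Only pairs of subclasses from different classes contribute.
    sigma_b = np.zeros((d, d))
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            if groups[a][0] == groups[b][0]:
                continue
            diff = groups[a][2] - groups[b][2]
            sigma_b += groups[a][1] * groups[b][1] * np.outer(diff, diff)
    return sigma_b, sigma_w


# Toy data: two classes, each with two well-separated subclasses.
X_toy = np.array([[0.0, 0.0], [0.0, 2.0],     # class 0, subclass 0
                  [4.0, 0.0], [4.0, 2.0],     # class 0, subclass 1
                  [0.0, 10.0], [0.0, 12.0],   # class 1, subclass 0
                  [4.0, 10.0], [4.0, 12.0]])  # class 1, subclass 1
classes = [0, 0, 0, 0, 1, 1, 1, 1]
subclasses = [0, 0, 1, 1, 0, 0, 1, 1]
sigma_b, sigma_w = sda_scatter(X_toy, classes, subclasses)
```

In this toy example the gap between the two subclasses of the same class (along the first axis) is not penalized by the within-subclass scatter, which is the behavior that distinguishes SDA from plain LDA.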

Kernel Principal Component Analysis
In kernel PCA [33], the input data $x$ are mapped into a feature space $F$ via a nonlinear mapping $\phi$, and a linear PCA is then performed in $F$. To be specific, we first centralize the mapped data so that $\sum_{i=1}^{M} \phi(x_i) = 0$, where $M$ is the number of input data. Then the covariance matrix of the mapped data $\phi(x_i)$ is defined as follows:

$$C = \frac{1}{M} \sum_{i=1}^{M} \phi(x_i)\, \phi(x_i)^{T} \quad (7)$$

Like PCA, the eigenvalue equation $\lambda V = CV$ must be solved for eigenvalues $\lambda \geq 0$ and eigenvectors $V \in F \setminus \{0\}$. We can prove that all the solutions $V$ lie in the space spanned by $\phi(x_1), \ldots, \phi(x_M)$. Therefore, we may consider the equivalent system:

$$\lambda\, (\phi(x_k) \cdot V) = (\phi(x_k) \cdot CV) \quad \text{for all } k = 1, \ldots, M \quad (8)$$

and $V$ can be represented as a linear combination of the mapped data $\phi(x_i)$:

$$V = \sum_{i=1}^{M} \alpha_i\, \phi(x_i) \quad (9)$$

where $\alpha_1, \ldots, \alpha_M$ denote the coefficients. Combining Equations (7), (8) and (9), and defining an $M \times M$ matrix $K$ by:

$$K_{ij} = (\phi(x_i) \cdot \phi(x_j)) \quad (10)$$

we arrive at:

$$M \lambda K \alpha = K^{2} \alpha \quad (11)$$

where α den solutions of for nonzero using α to ge
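The kernel PCA steps above (build the Gram matrix, centralize it, solve the eigenproblem for the coefficients, project) can be sketched with NumPy as follows. The Gaussian kernel, the value of `gamma`, the function names, and the random toy data are assumptions of this sketch; the text does not fix a kernel at this point.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_pca(X, n_components, gamma=0.5):
    """Project training samples onto the leading kernel principal axes.

    n_components must not exceed the number of strictly positive
    eigenvalues of the centered Gram matrix.
    """
    M = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Centering in feature space: Kc = K - 1K - K1 + 1K1, with 1 = ones / M.
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigenvectors of the centered Gram matrix give the coefficients alpha.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale alpha so that each axis V has unit norm in feature space.
    alphas = eigvecs / np.sqrt(eigvals)
    return Kc @ alphas, eigvals


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(10, 3))   # 10 hypothetical samples, 3 features
Z, lam = kernel_pca(X_toy, n_components=2)
```

Here `lam` holds eigenvalues of the centered Gram matrix, which correspond to $M\lambda$ in the notation of the text.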

Subclass
In this se Two solutio GSVD. The approaches.

Problem
For simp face data, an o assume t palmprint an An example As can be seen from Figure 1, identifier samples of one person show a typical mixture-Gaussian distribution, i.e., the face data cluster together and form a Gaussian, while the palmprint data form another Gaussian. If we apply traditional LDA, which forces both the face and palmprint data of one person to cluster together, then the data of two persons would very likely overlap in the embedded space. It is apparent from Figure 1 that SDA is a better descriptor of such a data distribution.
Consider the $k$th face sample and the $k$th palmprint sample of person $i$, and let $n_c$ represent the sample number of each subclass. We construct the between-subclass scatter matrix $S_B$ and the within-subclass scatter matrix $S_W$ as in Equations (13) and (14), and the optimal transform vector is then obtained by solving the optimization problem (15). The within-class matrix $S_W$ is usually singular, so the solution cannot be calculated directly. We present two solutions below to solve this problem, i.e., SDA-PCA and SDA-GSVD.

SDA-PCA
The first solution is to first apply PCA to project each image into a lower dimensional space, and then apply SDA for feature extraction. By employing the Lagrange multiplier method to solve the optimization problem (15), we obtain the optimal solution $W_{SDA}$, i.e., the eigenvectors of the matrix $(S_W)^{-1} S_B$ associated with the largest eigenvalues. Based on Formula (14), the rank of $S_W$ is at most $n - 2c$, where $n$ represents the total number of training samples (including face and palmprint images) and $c$ represents the number of persons. Therefore, we can project the original samples into a subspace whose dimension is no more than $n - 2c$, and then apply SDA to extract features.
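Once $S_W$ has been made nonsingular (for example by the PCA step), the eigenvectors of $(S_W)^{-1} S_B$ can be computed through a symmetric eigenproblem. The NumPy sketch below uses a Cholesky factorization for this; the function name `sda_transform` and the small diagonal test matrices are assumptions chosen only for illustration.

```python
import numpy as np


def sda_transform(S_B, S_W, k):
    """Return the k leading eigenvectors of inv(S_W) @ S_B and their eigenvalues.

    S_W must be symmetric positive definite, S_B symmetric.
    """
    L = np.linalg.cholesky(S_W)           # S_W = L L^T
    L_inv = np.linalg.inv(L)
    A = L_inv @ S_B @ L_inv.T             # symmetric, same eigenvalues
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1][:k]
    # Map eigenvectors of A back to eigenvectors of inv(S_W) @ S_B.
    return L_inv.T @ vecs[:, order], vals[order]


S_B_toy = np.array([[2.0, 0.0], [0.0, 25.0]])   # hypothetical scatter matrices
S_W_toy = np.array([[1.0, 0.0], [0.0, 4.0]])
W_sda, ratios = sda_transform(S_B_toy, S_W_toy, k=1)
```

Working with the symmetric matrix `A` keeps the eigenvalues real and avoids forming the explicit inverse of $S_W$ on the data side.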
Let the initial PCA transformations of the sample set of each modal be given, and let $W_{SDA}$ denote the later SDA transform. The final transformation for each modal is then the composition of its PCA transformation with $W_{SDA}$. After the optimal transformations are obtained, we project the face sample and the palmprint sample on them. Then, the features derived from face and palmprint are fused using the serial fusion strategy and used for classification.

SDA-GSVD
While PCA is a popular way to overcome the singularity problem and accelerate computation, it may cause information loss. Therefore, we present a second way to overcome the singularity problem by employing GSVD. First, we rewrite the between-class scatter matrix and within-class scatter matrix as follows:

$$S_B = H_b H_b^{T}, \qquad S_W = H_w H_w^{T} \quad (19)$$

$H_b$ is obtained by transforming formula (13), as given in Equations (20) and (21).
According to Equation (14), $H_w$ can be obtained in the same way. Then, we employ GSVD [31,32] to calculate the optimal transform, and the procedures are given in Algorithm 1.

Algorithm 1. Procedures of GSVD based LDA.
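Step 1: Define matrix $K = [H_b, H_w]^{T}$, and compute the complete orthogonal decomposition $P^{T} K Q = \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$.
Step 2: Compute $G$ by performing SVD on matrix $P(1:c, 1:t)$, i.e., $U^{T} P(1:c, 1:t)\, G = \Sigma_A$, where $t$ is the rank of $K$.
Step 3: Compute matrix $M = Q \begin{pmatrix} R^{-1} G & 0 \\ 0 & I \end{pmatrix}$. Put the first $c - 1$ columns of $M$ into matrix $W$. Then, $W$ is the optimal transform matrix.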
Then, face data and palmprint data are separately projected on $W$ and fused using the serial fusion strategy; the fused feature is then used for classification.
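The GSVD-based procedure can be sketched in NumPy as below. This is an illustration under stated assumptions, not the exact implementation: it uses the classical class-wise LDA definitions of $H_b$ and $H_w$ as stand-ins for the subclass-based ones, obtains the complete orthogonal decomposition from an ordinary SVD, and the function name `lda_gsvd` and the random toy data are hypothetical.

```python
import numpy as np


def lda_gsvd(X, labels, n_vectors=None):
    """GSVD-based discriminant transform for samples in the rows of X."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    c = len(classes)
    mean_all = X.mean(axis=0)

    # Rows of Hb_T: sqrt(n_i) * (class mean - global mean).
    # Rows of Hw_T: samples centered on their own class mean.
    hb_rows, hw_rows = [], []
    for k in classes:
        members = X[labels == k]
        mu = members.mean(axis=0)
        hb_rows.append(np.sqrt(len(members)) * (mu - mean_all))
        hw_rows.append(members - mu)
    K = np.vstack([np.vstack(hb_rows), np.vstack(hw_rows)])

    # Step 1: complete orthogonal decomposition, here taken from an SVD,
    # so that R is the diagonal matrix of the nonzero singular values.
    P, s, Qt = np.linalg.svd(K, full_matrices=False)
    t = int((s > 1e-10 * s[0]).sum())          # numerical rank of K
    R_inv = np.diag(1.0 / s[:t])

    # Step 2: SVD of the block of P that belongs to Hb_T (first c rows).
    _, _, Gt = np.linalg.svd(P[:c, :t])
    G = Gt.T

    # Step 3: leading columns of Q [[R^-1 G, 0], [0, I]].
    k_out = c - 1 if n_vectors is None else n_vectors
    return Qt[:t].T @ R_inv @ G[:, :k_out]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 10))        # 6 samples, 10 features (n < d)
y_demo = np.array([0, 0, 1, 1, 2, 2])
W = lda_gsvd(X_demo, y_demo)
Y = X_demo @ W
```

In this undersampled toy case the within-class scatter is singular, yet the transform is still well defined, and samples of the same class collapse onto a common point in the projected space.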

Algorithmic Procedures
In this section, we summarize the complete algorithmic procedures of the proposed approach. In practice, if the dimensions of the two biometric data vectors are not equal, we can simply pad the lower-dimensional vector with zeros until its dimension is equal to that of the other one before fusing them using SDA. In the case of SDA-PCA, it is easy to guarantee that the two modals have the same dimension after PCA projection by selecting the same number of principal components for them. Figure 2 displays the complete procedure of the proposed approach for multimodal biometric recognition. It is worthwhile to note that, on the one hand, our approach outputs the features of each modal separately, which is convenient for later processing; on the other hand, the discriminative information of the different modals has already been fused in the extraction process, since their features are extracted from the same input space and the transformed space also considers the distribution of the data of the other modals. Therefore, we think this approach can effectively obtain fused discriminative information from multimodal data.

SDA Kernelization Based Multimodal Biometric Feature Extraction
In this section, we provide the nonlinear extensions of the two SDA based multimodal feature extraction approaches, which are named KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first apply kernel PCA to each single modal before performing SDA, whereas in KSDA-GSVD we directly perform kernel SDA to fuse multimodal data, applying GSVD to avoid the singularity problem.

KPCA-SDA
In this subsection, the SDA-PCA approach is performed in a high dimension space by using the kernel trick. We realized the KPCA-SDA in the following steps:
(2) Perform KPCA for each single modal database .
For the j th modal, we perform KPCA by maximizing the following equation: where , and is the global mean of the j th modal database in the kernel space.
According to the kernel reproducing theory [34], the projection transformation in F can be linearly expressed by using all the mapped samples: where is a coefficient matrix.
Substituting Equation (26) into Equation (25), we have: where K j = Ψ Ψ , which indicates an N × N non-symmetric kernel matrix whose element is , where denotes the total number of the samples, denotes the m th sample of the j th modal database. The solution of Equation (27) is equivalent to the eigenvalue problem: The optimal solutions α j = (α j1 , α j2 ,…, α j(N-c) ) T are the eigenvectors corresponding to N − c largest eigenvalues of . We project the mapped training sample set Ψ j on by:
(3) Calculate kernel discriminant vectors in the KPCA transformed space.
By using the KPCA transformed sample set , we reformulate Equations (13) and (14) as: where is the sample in , and .
We can obtain a set of nonlinear discriminant vectors , i.e., the eigenvector of matrix ( ) −1 associated with the largest eigenvalues.
(4) Construct the nonlinear projection transformation and do classification.
We then construct the nonlinear projection transformation as: (31) After the optimal transform is obtained, the fused features can be generated as:

KSDA-GSVD
In this subsection, the SDA-GSVD is performed in a high dimension space by using the kernel trick.
Then, we apply GSVD to calculate the optimal transformation so that the singularity problem is avoided. The procedures are given in Algorithm 1. When the optimal transformation is obtained, the fused features can be generated as in Equation (36). Finally, the nearest neighbor classifier with cosine distance is employed to perform classification.
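The final classification step, a nearest neighbor classifier with cosine distance, can be sketched in pure Python as follows. The gallery contents, labels, and function names are hypothetical and serve only to show the decision rule.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); a zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbor(query: Sequence[float],
                     gallery: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the label of the gallery feature closest to the query."""
    return min(gallery, key=lambda item: cosine_distance(query, item[1]))[0]


# Hypothetical fused features of two enrolled persons and one probe.
gallery = [("person_a", [1.0, 0.0, 0.2]),
           ("person_b", [0.0, 1.0, 0.1])]
probe = [0.9, 0.1, 0.2]
predicted = nearest_neighbor(probe, gallery)
```

Cosine distance depends only on the direction of the fused feature vector, so it is insensitive to a global rescaling of the features.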

Experiments
In this se modal metho verification databases an

Experimental Identification Results
Firstly, the identification experiments are conducted. Identification is a one-to-many comparison which aims to answer the question "who is this person?" We compare the identification performance of the two proposed approaches, i.e., SDA-PCA (abbreviated to SDA here) and SDA-GSVD, with single modal recognition using traditional LDA, a representative pixel level fusion method [1], parallel and serial feature level fusion [3], and score level fusion using the sum rule [7]. Further, we compare the proposed kernelization methods (KPCA-SDA and KSDA-GSVD) with single modal recognition using KDA (see Figures 6 and 7). Table 1 shows that on the AR and PolyU palmprint databases, SDA and SDA-GSVD perform better than the other compared linear methods. It also shows that KPCA-SDA and KSDA-GSVD achieve better recognition results than KDA (single modal). Compared with single modal LDA, pixel level fusion, parallel feature fusion, serial feature fusion and score level fusion, SDA improves the average recognition rate by at least 3.53% (=96.52%-92.99%), and SDA-GSVD improves the average recognition rate by at least 5.24% (=98.23%-92.99%). The average recognition rate of KPCA-SDA is at least 15.29% (=98.74%-83.45%) higher than that of KDA (single modal), and the average recognition rate of KSDA-GSVD is at least 15.70% (=99.15%-83.45%) higher than that of KDA (single modal). Table 2 shows a similar phenomenon on the FRGC and PolyU palmprint databases. Compared with the other linear methods, SDA boosts the average recognition rate by at least 0.85% (=98.06%-97.21%), and SDA-GSVD boosts it by at least 1.40% (=98.61%-97.21%). The average recognition rate of KPCA-SDA is at least 17.59% (=98.82%-81.23%) higher than that of KDA (single modal), and the average recognition rate of KSDA-GSVD is at least 17.79% (=99.02%-81.23%) higher than that of KDA (single modal).
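The margins quoted above are simple differences between average recognition rates. The short check below recomputes them from pairs of rates stated in the text; the dictionary layout and key names are illustrative assumptions.

```python
# (proposed method's average rate, best compared rate), both in percent,
# taken from the figures quoted in the text.
rates = {
    "AR/SDA-GSVD": (98.23, 92.99),
    "AR/KPCA-SDA": (98.74, 83.45),
    "AR/KSDA-GSVD": (99.15, 83.45),
    "FRGC/SDA": (98.06, 97.21),
    "FRGC/SDA-GSVD": (98.61, 97.21),
    "FRGC/KPCA-SDA": (98.82, 81.23),
    "FRGC/KSDA-GSVD": (99.02, 81.23),
}

# Improvement in percentage points, rounded to two decimals.
margins = {name: round(ours - other, 2) for name, (ours, other) in rates.items()}

# The 3.53-point margin reported for SDA on AR over the 92.99% baseline
# implies the following average rate for SDA.
implied_sda_rate_ar = round(92.99 + 3.53, 2)

for name, margin in margins.items():
    print(f"{name}: +{margin:.2f} points")
```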

Experimental Results of Verification
Verification is a one-to-one comparison which aims to answer the question "is this person who he/she claims to be?" In the verification experiments, we report the performance with receiver operating characteristic (ROC) curves, which plot the false rejection rate (FRR) as a function of the false accept rate (FAR). There is a tradeoff between the FRR and the FAR: reducing one of them risks increasing the other, and the ROC curve reflects this tradeoff. Figures 8 and 9 show the ROC curves of our approaches and the other compared methods on the different databases. Table 3 shows the equal error rate (EER) of all compared methods. From the ROC curves shown in Figures 8-9 and the results listed in Table 3, we can see that our SDA based feature extraction approaches attain a significantly lower EER (the point on the ROC curve where FAR is equal to FRR) than other representative multimodal fusion methods, including the pixel level fusion method, the score level fusion method and the feature level fusion methods. On the AR face and PolyU palmprint databases, the lowest EER of the related methods is 3.71%, while the EERs of our approaches are all below 1%, and our KSDA-GSVD approach obtains the lowest EER (0.56%) among all compared methods. On the FRGC face and PolyU palmprint databases, the lowest EER of the other methods is 2.62%, while the EERs of ours are all below 2%; in particular, the proposed SDA-GSVD approach obtains the lowest EER, 0.28%. The above experimental results demonstrate the superiority of our approaches.
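To make the FAR, FRR and EER definitions concrete, the following pure-Python sketch sweeps a decision threshold over matching distances and reports the operating point where FAR and FRR are closest. The distance values, the accept rule (accept when the distance is at or below the threshold), and the function names are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """FAR and FRR when a claim is accepted if distance <= threshold."""
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate EER: mean of FAR and FRR where they are closest."""
    best_gap, best_rate = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2.0
    return best_rate


# Hypothetical matching distances for genuine and impostor comparisons.
genuine_distances = [0.10, 0.15, 0.20, 0.45]
impostor_distances = [0.30, 0.50, 0.60, 0.70]
eer = equal_error_rate(genuine_distances, impostor_distances)
```

With a finite set of scores FAR and FRR change in steps, so the sketch reports the midpoint at the closest threshold rather than an interpolated crossing.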

Conclusions
In this paper, we present novel multimodal biometric feature extraction approaches using subclass discriminant analysis (SDA). Since SDA requires a nonsingular within-class scatter matrix, we present two ways to overcome the singularity problem. The first is to perform principal component analysis before SDA, and the second is to employ the generalized singular value decomposition (GSVD) to obtain the solution directly. Further, we present the kernel extensions (KPCA-SDA and KSDA-GSVD) for multimodal biometric feature extraction. We perform the experiments on two public face databases (i.e., the AR face database and the FRGC database) and the PolyU palmprint database. In the experiments, we first perform feature extraction on the AR and palmprint databases, and then on the FRGC and palmprint databases. Compared with several representative linear and nonlinear multimodal biometric recognition methods, the proposed approaches achieve better identification and verification performance. In particular, the proposed KSDA-GSVD approach performs best on all the databases.