This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In the field of face recognition, principal component analysis (PCA) is essential for reducing the image dimension. Despite its frequent use, it is commonly assumed that the basis faces with the largest eigenvalues form the best subset for nearest neighbor classifiers. We propose an alternative that predicts the classification error during the training step and finds the useful basis faces for the similarity metrics of classical pattern recognition algorithms. In addition, we show that an eye-aligned dataset is needed to obtain the pure face. Experiments on face images verify that our method reduces the negative effect of misaligned face images and adjusts the weights of the useful basis faces so as to improve the classification accuracy.

Pattern recognition algorithms usually take face images of a Lambertian surface, captured under varying brightness, as the training set. However, these images cannot be fed directly to a classifier, since their dimension is too large. Thus, the algorithms generally need compressed images or a Dimension Reduction (DR) stage via image representations such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA). These techniques reduce the dimension by keeping only the linearly independent column spaces, as the following methods show.

First, Turk's Eigenface represents faces in a face space spanned by the eigenvectors obtained through PCA […].
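As a concrete illustration, the eigenface construction described above (PCA on vectorized face images, using the small-matrix trick of Turk and Pentland) can be sketched as follows; the array shapes and function names are illustrative, not the paper's:

```python
import numpy as np

def eigenfaces(X, k):
    """Compute the top-k eigenfaces of a training set.

    X : (d, n) matrix with one vectorized face image per column.
    Returns the mean face, a (d, k) eigenface basis sorted by
    descending eigenvalue, and the k eigenvalues themselves.
    """
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean                          # centred data
    # Small-matrix trick: eigenvectors of the (n, n) matrix A^T A
    # give the eigenfaces of the huge (d, d) covariance A A^T.
    S = A.T @ A
    vals, vecs = np.linalg.eigh(S)        # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]    # keep the k largest eigenvalues
    U = A @ vecs[:, order]                # map back to image space
    U /= np.linalg.norm(U, axis=0)        # unit-norm eigenfaces
    return mean, U, vals[order]

def project(x, mean, U):
    """Project a vectorized face onto the eigenface basis."""
    return U.T @ (x - mean)
```

A projected image is then just the vector of its coordinates in this basis; all the distances in the later sections operate on these projections.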

Second, a reduction is also needed when classical LDA is applied, since classical LDA frequently suffers from the singularity problem of the total scatter matrix. Therefore, Belhumeur proposed the subspace-based LDA, which includes a DR stage so that this matrix becomes full rank. This is achieved by eliminating the null spaces of the within-class scatter matrix with the DR stage before the classical LDA is applied […].

Thus, Yu proposed the direct LDA, which eliminates the null spaces of the between-class scatter matrix instead of those of the within-class scatter matrix […].

However, even now the classical PCA is used for obtaining a subspace and reducing data noise, because of its simple calculation and high performance. For example, Wang proposed a unified subspace using a 3-dimensional parameter space comprising the PCA, Bayes, and LDA […].

First, our method can be thought of as a sub-sampling technique like partitioning, since the best basis features are partitions of the prototype. However, the partitioning method divides the human face into nine equal partitions, and each partition is then assigned a different weight based on psychology and neuropathic psychology. As a result, the weights of the eyes, nose and mouth are strengthened, while those of the cheeks and forehead are weakened. The recognition rate is increased by giving different weights to the different parts of the face image […].

Second, our method is also similar to the approach of AdaBoost, the famous face detection algorithm, since feature weights are likewise calculated and applied […].

Third, the proposed method is somewhat similar to the WkNN algorithm, since that algorithm inserts weights into the Nearest Neighbor (NN) classifier. However, these weights,

This paper is organized as follows: Section 2 describes the state of the art in the field of classical pattern algorithms. Section 3 describes the efficiency and advantages of the best basis selection method using the learning metrics for face recognition. Section 4 presents the technical implementation of the proposed method and validates its efficiency through various experiments. We show the high accuracy of the proposed method by reducing the negative effect of the outliers. Finally, Section 5 provides our research conclusions, weighing the benefits and drawbacks of the proposed method.

In pattern recognition, the image dataset has a dramatic effect on the classification result. Thus, the well-known face datasets were obtained in a limited environment with stationary lighting and backgrounds. Additionally, these images were captured over a limited region that includes the front and side of the face, the shoulders, and the hair. These limitations exist because researchers believed that a pattern recognition algorithm would find only the important features of a pure face, such as the eyes and nose. However, in 1998 Chen raised questions about this belief and changed the measured region and environment for his experiments […].

To meet this demand, Yale University published CroppedYaleB, a set of cropped pure-face images; these images were also captured under varying lighting and expressions for use in various pattern classification experiments […].

However, recent studies can still easily be found that use these outline facial images instead of pure faces […].

The face recognition algorithm commonly uses high-resolution images for the training steps, because the algorithm needs clearly discriminant information for each person. Therefore, it is unfortunately inevitable that the small sample size problem occurs, since the image resolution is larger than the number of training images […]. Let x_{1}, …, x_{N} denote the training images.

However, researchers do not know exactly how the eigenvalues and eigenvectors relate to the similarity metrics. That is, it is simply taken as normal that the basis faces with the larger eigenvalues are chosen as the best feature subset […].

These false impressions unfortunately persist. For example, in 2012 Huang proposed an improved principal component regression classification algorithm […].

As mentioned in the previous section, the classical face recognition algorithms (PCA, LDA) depend entirely on the basis faces. Thus, this paper proposes similarity metrics for obtaining the best basis subset by adjusting the weights of the basis faces and then selecting only the useful basis faces. For this reason, the proposed method belongs to variable selection or feature selection […].

The basis faces are not only the features but also the results of the eigenface algorithm; that is, the projected face images consist of these basis faces. Thus, it is natural to insert the weights into the projected face images, for example as the weighted distance d(x, p_{i}) = Σ_{j} w_{cj}^{2}(x_{j} − p_{ij})^{2}, where x is a projected test image, p_{i} is the i-th projected prototype of class c, and w_{cj} is the weight of basis face j for class c.
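A minimal sketch of such a weighted distance over projected images, assuming one weight per basis face (the function name is illustrative):

```python
import numpy as np

def weighted_dist2(x, p, w):
    """Squared weighted Euclidean distance between a projected test
    image x and a projected prototype p; w holds one weight per
    basis face (class- or prototype-dependent, as in the text)."""
    return np.sum((w * (x - p)) ** 2)
```

Setting all weights to 1 recovers the ordinary squared Euclidean distance, so the learning described next only ever reshapes this baseline metric.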

To learn the weights, NN= and NN≠ are defined as the nearest neighbors that belong to the same class and to a different class as the training image x, respectively. Let NN= be the same-class nearest neighbor of x; the weighted distance between them can be indicated as d(x, NN=). In the same way, let NN≠ be the different-class nearest neighbor of x; d(x, NN≠) can be indicated as the distance between x and NN≠.

The second technique is to define the classification error as follows. As in the definition of the 0–1 loss function, a classification error (loss 1) occurs when the argument of the step function is larger than 0; when its argument is smaller than 0, the classification result is regarded as loss 0. That is, the argument of the step function is d(x, NN=) − d(x, NN≠). For example, d(x, NN=) > d(x, NN≠) means that the classification result is incorrect. Thus, the learning metric tries to reduce the weights associated with NN= in order to decrease the distance between x and NN=.

The third technique is to make the error independent of the scale of the distances; to this end, division by d(x, NN=) + d(x, NN≠) is necessary. This keeps the output of the step function within a fixed boundary even if the distances grow. Finally, the step function is approximated by the differentiable sigmoid S_{β}(z) = 1/(1 + e^{−βz}).
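Combining the three techniques, one plausible reading of the cost function is a sigmoid-smoothed error over the normalized difference of the two NN distances. The sketch below assumes z = (d(x, NN=) − d(x, NN≠)) / (d(x, NN=) + d(x, NN≠)), which is one way to realize the scale normalization described above; β and the function names are illustrative:

```python
import numpy as np

def sigmoid(z, beta):
    # Smooth surrogate for the 0-1 step: S_beta(z) = 1 / (1 + exp(-beta * z))
    return 1.0 / (1.0 + np.exp(-beta * z))

def cost(d_same, d_diff, beta=10.0):
    """Smoothed classification error over all training images.

    d_same[i], d_diff[i]: weighted distances from image i to its
    same-class and different-class nearest neighbours.  The ratio
    z = (d_same - d_diff) / (d_same + d_diff) stays in [-1, 1], so an
    error (z > 0) maps to S_beta(z) > 0.5 regardless of distance scale.
    """
    z = (d_same - d_diff) / (d_same + d_diff)
    return np.mean(sigmoid(z, beta))
```

Because the sigmoid is differentiable, this cost can be minimized by gradient descent, which is exactly what the next technique exploits.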

To minimize this cost function, its gradient with respect to each weight w_{ij} is derived.

Then, by separating the gradient into two terms, one involving NN= and one involving NN≠, the analysis becomes easy: each term indicates how the corresponding weight w_{ij} should be updated.
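The two-term gradient step can be sketched as follows, under the same assumptions as above (per-prototype weight vectors, squared weighted distances, and the normalized ratio z); the paper's exact update rule may differ in its normalization:

```python
import numpy as np

def update_weights(x, nn_same, nn_diff, w_same, w_diff, mu=0.01, beta=10.0):
    """One gradient-descent step on the smoothed error for a single
    training image x.  w_same / w_diff are the weight vectors of the
    same-class and different-class nearest-neighbour prototypes."""
    d_same = np.sum((w_same * (x - nn_same)) ** 2)
    d_diff = np.sum((w_diff * (x - nn_diff)) ** 2)
    s = d_same + d_diff
    z = (d_same - d_diff) / s
    sig = 1.0 / (1.0 + np.exp(-beta * z))
    dz = beta * sig * (1.0 - sig)              # S'_beta(z)
    # chain rule: dz/dd= = 2*d_diff/s^2,  dz/dd!= = -2*d_same/s^2
    g_same = dz * (2 * d_diff / s**2) * 2 * w_same * (x - nn_same) ** 2
    g_diff = dz * (-2 * d_same / s**2) * 2 * w_diff * (x - nn_diff) ** 2
    # the NN= weights shrink and the NN!= weights grow, as in the text
    return w_same - mu * g_same, w_diff - mu * g_diff
```

Note that the two terms have opposite signs, so each step simultaneously pulls the same-class neighbour closer and pushes the different-class neighbour away.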

As shown in the gradient, the first term, related to NN=, reduces its weight, while the second term, related to NN≠, increases its weight. These terms directly affect the distances between a training image and its two nearest neighbors (NN=, NN≠) per prototype. The learned weights can also be used to select the best basis subset. The selection criterion is as follows:

Thus, for each basis face j, the weighted sums over the prototypes, R_{j}(=) = Σ_{m} w_{mj} d_{m}(NN=) and R_{j}(≠) = Σ_{i} w_{ij} d_{i}(NN≠), indicate how basis face j contributes to the same-class and different-class NN distances, respectively. We call this method RR, the relationship of the two quantities R_{j}(=) and R_{j}(≠). A basis face with R_{j}(=) > R_{j}(≠) contributes more to the same-class distance than to the different-class distance; this means that the basis face is of little use for discrimination and is ranked low in the selection.
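A hedged sketch of this selection criterion: per basis face, the accumulated weighted contribution to the same-class NN distances is compared against the contribution to the different-class NN distances, and basis faces are ranked by the resulting ratio (the exact ratio and all names are assumptions, not the paper's notation):

```python
import numpy as np

def rr_scores(X, NN_same, NN_diff, W_same, W_diff):
    """Per-basis-face RR criterion.

    X, NN_same, NN_diff : (n, k) arrays of projected training images
    and their same-/different-class nearest neighbours; W_* are the
    matching (n, k) weight arrays.  Returns one score per basis face:
    larger means the basis face separates classes better.
    """
    r_same = np.sum((W_same * (X - NN_same)) ** 2, axis=0)  # per basis j
    r_diff = np.sum((W_diff * (X - NN_diff)) ** 2, axis=0)
    return r_diff / r_same

def select_best(X, NN_same, NN_diff, W_same, W_diff, m):
    """Indices of the m highest-scoring basis faces."""
    scores = rr_scores(X, NN_same, NN_diff, W_same, W_diff)
    return np.argsort(scores)[::-1][:m]
```

Under this reading, a basis face dominated by same-class scatter (R_j(=) > R_j(≠)) gets a low score and is dropped from the best subset.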

Our method was implemented in three parts: acquisition of the image dataset, conversion into the images projected by the Eigenface, and application of the learned weights using the gradient descent algorithm.

First, the acquisition part is implemented as follows; detailed dataset specifications for all the experiments are given in the dataset specification table.

Second, the conversion part is implemented in order to obtain the images projected by Eigenface; we simply applied these datasets to the Eigenface algorithm.

Third, the gradient descent algorithm is implemented in order to learn the weights for both the basis faces and the prototypes. First, one image set among the 10 sets from the 10-fold cross validation is loaded. Next, we needed to distinguish clearly between NN= itself and its index in order to implement this part correctly, so we stored both NN= and NN≠ explicitly for easy implementation.
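Finding NN= and NN≠ (and their indices) for each training image, as described above, can be sketched like this; the names and data layout are illustrative:

```python
import numpy as np

def nearest_neighbours(i, P, labels, W):
    """For projected image P[i], return the indices of its weighted
    same-class NN (NN=) and different-class NN (NN!=) among the other
    rows of P.  W holds one weight per basis face."""
    x, c = P[i], labels[i]
    d = np.sum((W * (x - P)) ** 2, axis=1)   # weighted distances to all rows
    d[i] = np.inf                            # exclude the image itself
    same = np.where(labels == c)[0]
    diff = np.where(labels != c)[0]
    nn_same = same[np.argmin(d[same])]
    nn_diff = diff[np.argmin(d[diff])]
    return nn_same, nn_diff
```

Keeping the indices (rather than only the neighbour vectors) is what makes it possible to update exactly the weights of the chosen prototypes, as Section 3 requires.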

The following experimental parameters were used for learning with the gradient descent algorithm in all our experiments; we set these tuning parameters via initial experiments. An update amount,

This dataset included not only the facial outlines but also faces aligned by the eyes. In addition, it had to be converted to gray images, since it contained red, green and blue channels. When using 10-fold cross validation, we needed to consider whether the number of images in the dataset is sufficient for the validation; if not, one of the 10 sets is empty. This experiment tested the effect of the basis faces on the classification error. We refer to Paredes' distance as the weight-based distance in our experiments.

The variations of the weights when various basis faces (Or1, Or2, RR1, and RR2) are used are shown in the corresponding figures, which plot d(x, NN=) and d(x, NN=)/d(x, NN≠) − 1 as the number of basis faces used grows.

d(x, NN=) > d(x, NN≠). On the contrary, with the RR-based distance, d(x, NN=) decreases and the added basis faces have a positive effect on it. Additionally, this indicates that the 8th test image needs more basis faces, or that its weight needs to be updated further. In other words, the RR-based distance shows different results from the other two distances, since those applied the eigenfaces with large eigenvalues (Or1, Or2). These differences directly affected the classification error.

The second experiment was based on the CroppedYaleB dataset, which was introduced in Section 2. We also added misaligned face images in

These results occur for the following reasons. First, the sigmoid function of the proposed method in

The third experiment was designed to reveal the differences between the Paredes method and the proposed method in

The fourth experiment shows the classification results using the various datasets, as shown in

The last experiment compares the classification results with two classification algorithms and two principal component selection algorithms, as shown in

The proposed method is based on Paredes' NN learning algorithm. The main difference from Paredes' algorithm is the application field: Paredes tried to improve NN classification, whereas we applied the learned weights to find the best basis subset for face recognition. This can be confirmed by

Various variables are used in the proposed method. It is particularly important to know the exact meanings of NN= and its index, since these variables are closely related to each other: NN= indicates the same-class nearest prototype itself, while its index indicates which prototype it is. The index is also needed in order to know which weights are updated. As mentioned in Section 3, this method does not update the weights of all the prototypes, but only those of the NNs, which are chosen by the similarity metrics in

Updating the weights has to be repeated until the method reaches the minimum of the cost function in

In the DR stage, we have to decide how far our method reduces an image's dimension; in other words, we decide the maximum number of basis faces in the best subset,

In this study, we have proposed a best basis selection method for NN classifiers in classical recognition algorithms. The primary contribution of the proposed method is to help face recognition algorithms find correct faces using only a small number of basis faces. This improvement is possible because the proposed method provides a simple scheme for learning weights via a cost function related to the classification error, and for choosing the best subset among the basis faces via these weights. This is validated by our experimental results, which reveal that basis faces with large eigenvalues do not necessarily carry highly discriminant information for face recognition; in other words, the important basis faces are not those related to the illumination or the face outline. In addition, the results show that face recognition algorithms need proper training images, which are aligned by the eyes and restricted to the pure face. This is necessary because our method predicts the classification error from these training images via the proposed cost function, and checks whether prototypes are outliers via the sigmoid function of the cost function. However, this is not enough to detect all of the outliers in the training set; in further research, it would be desirable to combine our method with other outlier detection algorithms.

This research is the result of a study on the “Leaders INdustry-university Cooperation” Project, supported by the Ministry of Education (MOE). I would like to thank Haeun Kong for her lovely support and beautiful prayer. Additionally, I would like to thank Jeong-ik Seo (A&T) in the Republic of Korea for his technical advice.

The authors declare no conflict of interest.


Face images for Chen's experiments.

Eigenfaces in descending order of eigenvalue size.

Comparison of eigenfaces chosen by

Variation of weights using the Euclidean, weight-based and RR-based distances: d(x, NN=) and d(x, NN=)/d(x, NN≠) − 1.


Face images of CroppedYaleB.

Prototypes projected by two basis faces.

Classification results of the Euclidean, weight-based and RR-based distances.

Cost function according to the learning step.

Dataset specifications for our experiments.

Dataset | Classes | Images per Class | Total Images | Training Images | Test Images | Image Dimension × Images per Class
---|---|---|---|---|---|---
Yale Face | 15 | 11 | 165 | 149 | 16 | 77,760 × 11
CroppedYaleB | 17 | 18 | 306 | 276 | 30 | 32,256 × 18
CroppedYaleB + Misaligned | 17 | 28 | 476 | 429 | 47 | 32,256 × 28
Pain Crop + HSF images | 12 | 14 | 168 | 152 | 16 | 43,621 × 14
Pain Inner + HSF images | 12 | 14 | 168 | 152 | 16 | 21,463 × 14
ORL face database + HSF images | 40 | 20 | 800 | 720 | 80 | 10,304 × 20
FERET + HSF images | 100 | 10 | 1000 | 667 | 333 | 24,576 × 10

Comparison of the Paredes method and the proposed RR-based method.

| ||||||
---|---|---|---|---|---|---|

CroppedYaleB + Misaligned | 1 | 10 | 0.6596 | 0.1277 | 0.1064 | |

20 | 0.4681 | 0.0426 | 0.0426 | |||

30 | 0.3191 | 0.3191 | 0.0213 | 0.0213 | ||

40 | 0.2340 | 0.0213 | ||||

| ||||||

2 | 10 | 0.6042 | 0.1250 | |||

20 | 0.4167 | 0.0208 | ||||

30 | 0.2708 | 0.0000 | 0.0000 | |||

40 | 0.1875 | 0.0000 | 0.0000 | |||

| ||||||

Yale Face | 4 | 10 | 0.3529 | 0.3529 | 0.2941 | |

30 | 0.3529 | 0.2941 | 0.2941 | |||

40 | 0.2941 | 0.2353 | 0.2353 | |||

| ||||||

8 | 10 | 0.1875 | 0.1875 | 0.1875 | ||

20 | 0.1875 | 0.1250 | ||||

30 | 0.1250 | 0.1250 | ||||

40 | 0.0625 | 0.0625 | 0.0625 | 0.0625 |

Classification results using various distance metrics.

| ||||||||
---|---|---|---|---|---|---|---|---|

Yale Face | Subset 1 | 0.2500 | 0.2500 | 0.1250 | 0.1250 | 0.1250 | ||

Subset 2 | 0.2941 | 0.2941 | 0.2941 | |||||

Subset 3 | 0.2353 | 0.2353 | 0.2353 | 0.2941 | 0.2941 | |||

CroppedYaleB | Subset 1 | 0.7234 | 0.7021 | 0.3830 | 0.3404 | |||

Subset 2 | 0.6458 | 0.5833 | 0.2917 | 0.2292 | ||||

Subset 3 | 0.5417 | 0.5625 | 0.2708 | 0.2292 | ||||

CroppedYaleB + Misaligned | Subset 1 | 0.6596 | 0.6596 | 0.3830 | 0.3191 | |||

Subset 2 | 0.5833 | 0.5208 | 0.2500 | 0.1458 | ||||

Subset 3 | 0.6042 | 0.6250 | 0.3542 | 0.3333 | ||||

Pain Crop + HSF images | Subset 1 | 0.1875 | 0.1875 | 0.1250 | 0.1250 | |||

Subset 2 | 0.1765 | 0.2353 | 0.0000 | 0.0000 | 0.0000 | |||

Subset 3 | 0.1176 | 0.1176 | 0.0588 | 0.0588 | ||||

Pain Inner + HSF images | Subset 1 | 0.3750 | 0.3125 | 0.1250 | 0.0625 | |||

Subset 2 | 0.1765 | 0.1765 | 0.0000 | 0.0000 | 0.0000 | |||

Subset 3 | 0.2353 | 0.1176 | 0.1176 | 0.0000 | ||||

ORL face database + HSF images | Subset 1 | 0.0500 | 0.0500 | 0.0125 | 0.0125 | |||

Subset 2 | 0.0125 | 0.0125 | 0.0000 | 0.0000 | 0.0000 | |||

Subset 3 | 0.0250 | 0.0250 | 0.0375 | 0.0375 | ||||

FERET database+ HSF images | Subset 1 | 0.3874 | 0.3874 | 0.3874 | 0.3724 | |||

Subset 2 | 0.3743 | 0.3743 | 0.4102 | 0.4042 | ||||

Subset 3 | 0.3964 | 0.3964 | 0.4264 | 0.4144 |

Accuracy comparison of Information Gain, Sequential Forward Selection, and the RR-based method.

Yale Face | Subset 1 | 0.2500 | 0.1875 | × | 0.1250 | 0.1250 |

Subset 3 | 0.1765 | 0.3529 | × | 0.2352 | ||

Subset 10 | 0.1176 | 0.1176 | × | 0.1176 | ||

CroppedYaleB + Misaligned | Subset 1 | 0.2128 | 0.2979 | 0.0425 | 0.0851 | |

Subset 4 | 0.1458 | 0.1042 | 0.0833 | 0.0625 | 0.0625 | |

Subset 5 | 0.1042 | 0.2292 | 0.0625 | 0.0625 | ||

Pain Crop + HSF images | Subset 1 | 0.2500 | 0.0000 | 0.0000 | 0.1250 | 0.0000 |

Subset 2 | 0.1176 | 0.0588 | 0.0000 | 0.0000 | 0.0000 | |

Subset 7 | 0.0588 | 0.0588 | 0.0588 | 0.0588 | ||

Pain Inner + HSF images | Subset 4 | 0.0000 | 0.0000 | 0.1176 | 0.1176 | 0.0000 |

Subset 5 | 0.1176 | 0.1176 | 0.1176 | 0.1764 | 0.1176 | |

Subset 10 | 0.0000 | 0.0000 | 0.1250 | 0.1875 | 0.0000 | |

ORL face database + HSF images | Subset 2 | 0.0000 | 0.0000 | 0.0125 | 0.0125 | 0.0000 |

Subset 6 | 0.0125 | 0.0375 | 0.0250 | 0.0125 | ||

Subset 9 | 0.0250 | 0.0250 | 0.0125 | 0.0125 | ||

FERET database + HSF images | Subset 1 | 0.2312 | × | 0.4264 | 0.4234 | |

Subset 2 | 0.1138 | × | 0.3682 | 0.3562 | 0.1138 | |

Subset 3 | 0.2132 | × | 0.4084 | 0.4024 |