The Georgia Tech dataset consists of 750 color images, 15 images for each of 50 subjects, and is widely used in face recognition. The images were taken in the Georgia Tech lab, and most of them include changes in illumination, expression, and pose.
Color FERET is a color version of the grayscale FERET face dataset, with 11,338 color face images from 994 people. Since the number of images varies from person to person, 200 subjects with seven images each were selected; these 1400 images form the Color FERET subset used here. The seven images of each person cover changes in illumination, expression, and pose.
AR is a face dataset composed of 3120 color face images, 26 images for each of 120 subjects. The images were taken from the front, so pose changes are relatively few. In addition to illumination and expression changes, the influence of occlusion factors is also considered.
LFW-A is a version of the LFW face dataset after face alignment processing, comprising 13,233 color images of 5749 subjects. As the images were collected from the Internet, the dataset is suitable for face recognition research in natural scenes. The face images vary in illumination, age, pose, expression, and occlusion, making the dataset very challenging. To distribute samples evenly, subjects with more than nine images were selected to construct a subset of LFW-A.
The images of each person in each dataset were randomly divided into a training set and a test set in a 2:1 ratio. Since the split of training and test samples was random, each experiment was repeated ten times and the average result was taken. For efficient testing, all images were uniformly resized to a fixed size.
Meanwhile, based on the experimental setting in this paper, the SVM classifier was implemented with the LIBLINEAR library using a linear kernel. The linear kernel is suitable when the number of samples is much smaller than the number of features (no mapping to a higher dimension is needed), or when both the number of samples and the number of features are large (mainly for training speed).
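This evaluation protocol can be sketched as follows. It is a minimal illustration with synthetic features: `LinearSVC` is scikit-learn's wrapper around LIBLINEAR, and all names, shapes, and the 10-subject layout here are assumptions for the sketch, not values from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical features standing in for the extracted face descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 64))       # 150 samples, 64-dim features
y = np.repeat(np.arange(10), 15)     # 10 subjects, 15 images each

accuracies = []
for seed in range(10):               # repeat the random split ten times
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=seed)  # 2:1 per subject
    clf = LinearSVC().fit(X_tr, y_tr)        # linear-kernel SVM via LIBLINEAR
    accuracies.append(clf.score(X_te, y_te))

print(f"mean accuracy over 10 runs: {np.mean(accuracies):.3f}")
```

With stratified splitting, each subject contributes ten training and five test images per run, matching the 2:1 division described above.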
3.1. Comparison of Algorithm Performance under Different Occlusion Conditions
Firstly, the QNSPCANet proposed in this paper was compared with three related algorithms, PCANet, QPCANet, and QSPCANet, under different occlusion conditions: occlusion contained in the dataset, self-added pure color occlusion, and salt-and-pepper noise occlusion. On the Color FERET, AR, and LFW-A datasets, recognition accuracy was also compared with other recent occlusion-robust algorithms on the same data. The Georgia Tech dataset was only compared against algorithms with a similar structure, since few occlusion methods have been applied to it.
Since there are no large-area occlusions in the face images of the Georgia Tech, Color FERET, and LFW-A datasets, the experiments on these three datasets are divided into three groups: (1) no occlusion, with training and test images randomly split 2:1; (2) self-added pure color occlusion, where a blue block covering 20% of the pixel area is added to randomly selected face images in the test set; and (3) salt-and-pepper noise occlusion, where noise blocks are added to randomly selected face images in the test set.
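The two self-added occlusion types can be sketched as below. This is a minimal NumPy sketch under stated assumptions: the paper does not specify block placement or the noise density inside the block, so random placement, a square block, and a 50% corruption probability are illustrative choices.

```python
import numpy as np

def add_color_block(img, area_frac=0.2, color=(0, 0, 255), rng=None):
    """Overlay a square solid-color block covering ~area_frac of the pixels."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    side = int(round(np.sqrt(area_frac * h * w)))   # square with the target area
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out = img.copy()
    out[top:top + side, left:left + side] = color
    return out

def add_salt_pepper_block(img, area_frac=0.2, noise_prob=0.5, rng=None):
    """Corrupt a square region with salt-and-pepper noise."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    side = int(round(np.sqrt(area_frac * h * w)))
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out = img.copy()
    block = out[top:top + side, left:left + side]
    mask = rng.random(block.shape[:2]) < noise_prob  # pixels to corrupt
    salt = rng.random(block.shape[:2]) < 0.5         # half salt, half pepper
    block[mask & salt] = 255
    block[mask & ~salt] = 0
    return out

img = np.full((60, 60, 3), 128, dtype=np.uint8)      # dummy gray image
occluded = add_color_block(img, rng=np.random.default_rng(0))
noisy = add_salt_pepper_block(img, rng=np.random.default_rng(1))
```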
For the AR dataset, each person has 26 face images: eight without occlusion, six with illumination changes, six with sunglasses occlusion, and six with scarf occlusion. The experiment is divided into five groups: (1) no occlusion, where nine images are randomly selected from the 14 occlusion-free images (including those with illumination changes) as the training set and the remaining five form the test set; (2) sunglasses occlusion, with the 14 occlusion-free images as the training set and the six sunglasses images as the test set; (3) scarf occlusion, with the 14 occlusion-free images as the training set and the six scarf images as the test set; (4) self-added pure color occlusion, where the training set consists of nine randomly selected occlusion-free images, and blue occlusion blocks (about 20% occlusion area) are added to the remaining five images to form the test set; and (5) salt-and-pepper noise occlusion, where nine of the 14 occlusion-free images are randomly selected as the training set and noise blocks are added to the remaining five as the test set. Color face samples from the AR dataset, images with self-added solid color occlusion, and salt-and-pepper noise occlusion of different areas are shown in Figure 3.
The convolution order of each principal component analysis network is set to 2, the number of convolution kernels in each layer and the histogram window size are fixed, the sampling matrix is set to the optimal size for each dataset, and the corresponding overlap rate is set to 0.5. For the quaternion sparse optimization problem in the convolution layer, an appropriate sparse parameter is selected for each feature vector, the update rate threshold is fixed, and the maximum number of iterations is set to 1000. Finally, a trained SVM classifier performs color face recognition on the extracted features.
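For intuition, the PCA filter-learning step at the core of PCANet-style networks can be sketched as follows. This is a real-valued simplification under stated assumptions: the patch size `k=7` and filter count `n_filters=8` are illustrative, and the paper's quaternion representation and non-convex sparse optimization are not reproduced here.

```python
import numpy as np

def learn_pca_filters(images, k=7, n_filters=8):
    """Learn PCANet-style convolution filters: collect all k-by-k patches,
    remove the per-patch mean, and take the leading eigenvectors of the
    patch covariance matrix as filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patches.append(img[i:i + k, j:j + k].ravel())
    X = np.array(patches, dtype=float)
    X -= X.mean(axis=1, keepdims=True)        # remove per-patch mean
    cov = X.T @ X / len(X)                    # patch covariance (k*k by k*k)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    filters = vecs[:, ::-1][:, :n_filters]    # top components as filters
    return filters.T.reshape(n_filters, k, k)

rng = np.random.default_rng(0)
imgs = rng.random((5, 20, 20))                # dummy grayscale images
filters = learn_pca_filters(imgs)
```

Each learned filter is then applied by convolution; stacking two such stages and hashing the responses into block histograms yields the PCANet feature.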
Table 1 shows the correct recognition rate of each algorithm on the Georgia Tech dataset under different occlusion conditions.
It can be seen from Table 1 that the algorithm introducing non-convex regularization achieves a relatively high recognition rate under different occlusion conditions. When there is no occlusion, the recognition rates of the two sparsity-constrained algorithms are close. With 20% pure color occlusion and with salt-and-pepper noise occlusion, QNSPCANet achieves the highest recognition rate. Because few occlusion methods have been applied to the Georgia Tech dataset, this part is only compared with the related PCANet methods.
The correct recognition rates of each algorithm on the Color FERET dataset under different occlusion conditions are shown in Table 2.
As can be seen from Table 2, on the Color FERET dataset QNSPCANet achieves the highest recognition rate under non-noise occlusion. With 20% salt-and-pepper noise, its recognition rate is slightly lower than that of GMSRC, but the overall recognition performance is still superior.
The correct recognition rates of each algorithm on the AR dataset under different occlusion conditions are shown in Table 3. The recognition rates of all algorithms on the AR dataset are generally higher. QNSPCANet performs best under all occlusion conditions, and the gap in recognition rate widens when occlusion is present, indicating that the non-convex regularization suppresses both outliers and occlusion well.
The correct recognition rates of each algorithm on the LFW-A dataset under different occlusion conditions are shown in Table 4. It can be seen from the table that QNSPCANet, which introduces non-convex regularization, achieves a higher recognition rate than PCANet and QPCANet under all occlusion conditions. Compared with QSPCANet, its recognition rate is close without occlusion, while with occlusion it is greatly improved. Compared with the other existing methods in the table, the proposed algorithm also achieves a better recognition effect. Although there is still a gap between QNSPCANet and MobileFaceNet without occlusion, QNSPCANet performs better under occlusion and noise.
Figure 4 shows the recognition rate curves of each algorithm under different solid color occlusion areas. As can be seen from the figure, as the occlusion area increases, the gap between the four algorithms' recognition accuracy gradually widens, and QNSPCANet performs better across all occlusion areas.
Since the literature on the other occlusion methods does not report specific training times, this paper only compares the training times of the four PCANet algorithms, as shown in Table 5.
As can be seen from Table 5, the overall training times of the four PCANet-related algorithms are short. The training time of the proposed QNSPCANet is longer, mainly because the non-convex sparse optimization incurs additional computation; however, the recognition accuracy is improved, and the overall recognition performance remains superior.
Based on the above comparative results, in the case of no occlusion the non-convex sparse convolution kernels improve the recognition rate, but the effect is not obvious: the images then contain few outliers, which have only a slight impact on the recognition results. Under occlusion, the strong sparsity of the non-convex regularization effectively reduces the influence of occlusion-induced outliers and thus improves the recognition accuracy of the model.
3.2. Algorithm Sparsity Verification
Secondly, the sparsity of the QNSPCANet proposed in this paper is verified. The validation experiment is carried out on the AR dataset, comparing the proposed non-convex regularization with the sparse regularization used in QSPCANet and with SCAD regularization [25]. To test the sparsity of the non-convex regularization, this section adopts two sparsity measures: the proportion of non-zero elements in the sparse matrix and Hoyer's sparsity measure. Hoyer's measure also accounts for components with small values and is therefore a more refined measure than the non-zero proportion; it is defined as

Hoyer(W_i) = (√N − ‖W_i‖₁ / ‖W_i‖₂) / (√N − 1),

where W_i represents the quaternion sparse vector matrix in the i-th convolution layer, and N represents the number of elements of the matrix.
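The two sparsity measures can be sketched as follows. This is a real-valued sketch assuming the standard Hoyer definition; for a quaternion matrix, the moduli of the entries would be used, and the tolerance for "non-zero" is an illustrative choice.

```python
import numpy as np

def hoyer_sparsity(W):
    """Hoyer's sparsity measure: 1 for a one-hot vector, 0 when all entries
    have the same magnitude."""
    x = np.abs(np.asarray(W, dtype=float)).ravel()
    n = x.size
    return (np.sqrt(n) - x.sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)

def nonzero_fraction(W, tol=1e-8):
    """Proportion of entries whose magnitude exceeds a small tolerance."""
    x = np.asarray(W, dtype=float)
    return float(np.mean(np.abs(x) > tol))
```

For example, a one-hot vector gives a Hoyer sparsity of 1, while a constant vector gives 0, so larger values indicate a sparser matrix.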
The convergence curves of the proportion of non-zero elements in the sparse matrix and of Hoyer's sparsity are shown in Figure 5.
As can be seen from Figure 5, the proportion of non-zero elements converges to 0.56 under the sparse regularization of QSPCANet, with a Hoyer's sparsity of about 0.70; to 0.41 under the proposed non-convex regularization, with a Hoyer's sparsity of about 0.77; and to 0.46 under SCAD regularization, with a Hoyer's sparsity of about 0.75. Relative to the QSPCANet regularization, the non-convex regularization reduces the proportion of non-zero elements by 26.8% and SCAD reduces it by 17.9%, while Hoyer's sparsity is improved by 10.0% and 7.1%, respectively. Therefore, the proposed non-convex regularization has stronger sparsity than both SCAD and the QSPCANet regularization.
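The relative changes reported above can be verified directly from the converged values; the variable names below are illustrative.

```python
# Converged values read from Figure 5 as reported in the text.
base_nz, ncvx_nz, scad_nz = 0.56, 0.41, 0.46   # non-zero proportions
base_h, ncvx_h, scad_h = 0.70, 0.77, 0.75      # Hoyer's sparsity

nz_drop_ncvx = (base_nz - ncvx_nz) / base_nz   # relative drop, non-convex
nz_drop_scad = (base_nz - scad_nz) / base_nz   # relative drop, SCAD
h_gain_ncvx = (ncvx_h - base_h) / base_h       # relative gain, non-convex
h_gain_scad = (scad_h - base_h) / base_h       # relative gain, SCAD

print(f"{nz_drop_ncvx:.1%} {nz_drop_scad:.1%} {h_gain_ncvx:.1%} {h_gain_scad:.1%}")
```

The computed ratios match the 26.8%, 17.9%, 10.0%, and 7.1% figures quoted in the text.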