1. Introduction
Face recognition is a branch of visual pattern recognition. Humans usually understand the world through visual pattern recognition, because humans receive visual information through the naked eye. However, both photos and videos are recognized by computers as matrices, and the elements of the matrices are individual pixels. In contrast to human perception, computers process images by analyzing these matrices, which is a fundamental difference in how recognition tasks are approached. The challenge in face recognition lies in the computer’s ability to identify and distinguish between different faces, despite variations in lighting, poses and expressions. This is achieved through the use of algorithms that can extract features from the pixel matrices and compare them to a database of known faces. The development of such algorithms has been a significant area of research in the field of computer vision and machine learning.
Facial recognition technology achieves the identification of an individual’s identity by comparing the digital image of a human face with the known face database. In fact, face recognition technology has experienced many stages in its development, such as the initial template-matching-based approach, the machine learning-based approach [
1] and the deep learning approach [
2] nowadays. Early face recognition methods primarily relied on template-based matching to identify individuals. This approach achieved face recognition by comparing the similarity between a given face image and a predefined template. However, this method is sensitive to the interference of illumination, expressions and some other factors, and it is difficult to achieve accurate face recognition [
1]. With the development of machine learning technology, face recognition algorithms based on machine learning have received much attention, such as the Eigenface [
3] and the Fisherface [
4]. These algorithms extract the features of face images and use machine learning algorithms for classification and recognition [
4]. Compared to template-matching methods, machine learning-based face recognition algorithms seem to achieve higher accuracy and be more robust. Furthermore, deep learning technology can extract and classify high-dimensional features of face images by training deep neural network models [
5], such as FaceNet [
6], VGGFace [
7], etc.
However, with the progress and development of technology, the sparse sampling method has been widely used in the field of face recognition. In this method, it is assumed that the signal can be represented as a linear combination of a sparse coefficient vector and an atomic dictionary, where the atomic dictionary is a set of sample points that can represent various parts of the signal. Therefore, the aim of this method is to extract the sparse coefficient vector based on as few sampling data as possible. After that, the signal can be reconstructed by the sparse coefficient vector and the atomic dictionary. Therefore, for a signal
, it can be expressed as
where
denotes the measurement noise,
A is the atomic dictionary and
x is the sparse coefficient vector.
In 2009, Wright et al. proposed the theory of compressed perception for face recognition [
8]. The key point of the sparse sampling problem is how to reconstruct a sparse signal. Given the sparse nature of the signal, the reconstruction process involves identifying the non-zero elements of the sparse coefficient vector
x. This is typically achieved through optimization techniques that aim to find the sparsest solution that satisfies the equation
. Therefore, the reconstruction algorithm must effectively exploit the structure of the atomic dictionary
A and the sparsity of the signal to recover the original signal from the noisy measurements. Sparsity means that most of the elements in the signal vector
x are assumed to be zero, which leads to the following optimization problem:
where
denotes the
-norm, counting the non-zero entries of
x, and
denotes the level of sampling noise. However, the above optimization problems is an NP-hard problem. To reconstruct the sparse signal, one alternative widely used approach is to replace
-norm with
-norm. Therefore, this optimization problem (
2) can be rewritten as follows:
where
denoting the
-norm as a concave approximation of the
-norm. This approximation allows for the relaxation of the original NP-hard problem into a tractable convex optimization problem, which can be efficiently solved using various algorithms such as linear programming or iterative thresholding methods. The
-norm minimization has been shown to be effective in recovering sparse signals in the presence of noise, which is particularly useful in signal processing and compressed sensing applications. Finally, it is obvious that we only need to solve the following unconstrained minimization problem:
Various algorithms have been proposed in the literature to deal with this well-known LASSO problem, such as the alternating direction method of multipliers (ADMM) [
9], Bregman methods [
10], the Frank–Wolfe algorithm [
11] and the iterative thresholding methods [
12]. When the measurement matrix
A satisfies the restricted isometry property (RIP) with a sufficient high order, the sparse solution can be obtained through these methods. Therefore, the
-norm regularization method is particularly useful in scenarios where the signal of interest is sparse. In the meantime, this method has also been widely used in some other situations, including signal processing, image reconstruction and machine learning, where the goal is to extract meaningful information from noisy and incomplete data [
13].
However, most of the methods mentioned above cannot achieve efficient recognition. On the one hand, this failure is due to the similarity between faces and the variability in the faces. Images of faces from different individuals have similar biometric characteristics. This similarity makes it difficult to identify a face image. Also, for different postures, illuminations, angles and expressions, the face images are unstable, which would lead to variability. On the other hand, the face images may be partly blocked or damaged, which is not conducive to face recognition. Similarly, there may exist noises in face images, which can also cause difficulties in face recognition. Therefore, it is necessary to study robust face recognition methods [
14].
In recent years, many robust face recognition methods are proposed in the literature. Zhang et al. (2011) explained the possibility of sparse representation in the case of sufficient samples and verified the advantages of sparse representation for face recognition. They introduced a collaborative representation classifier (CRC) based on
-norm constraints and further proposed its robust version (RCRC) [
15]. Zhou et al. (2015) considered a classification method for face recognition based on
regularization, which balanced the sparse representation classifier (SRC) and the CRC through an iterative Tikhonov regularization (ITR) [
16]. Zhang et al. (2017) solved the robust face recognition problem via the iterative re-constrained group sparse classifier (IRGSC) with adaptive weights learning [
17]. Lei et al. (2020) proposed weighted Huber constrained sparse coding (WHCSC) and established a robust weighted regression model with sparse constraints for face recognition [
14]. Zhang et al. (2022) proposed Enhanced Group Sparse Regularized Nonconvex Regression (EGSNR). This algorithm mitigates the bias problem and reduces the adverse impact of outliers by introducing a nonconvex function to replace the traditional
-norm [
18]. Liu et al. (2025) proposed Graph Regularized Discriminative Nonnegative Matrix Factorization (GR-DNMF) which is an advanced variant of NMF that incorporates both graph regularization and discriminative constraints to improve feature extraction and classification performance [
19].
In this paper, a new robust face recognition method based on weighted wing loss constrained sparse coding and its weighted version are proposed. For these models, the wing loss function can adaptively balances robustness and discriminability. The adaptive weighting mechanism demonstrates effective outlier suppression, significantly reducing their impact on model performance. Performances of the proposed method are examined by various experiments, and it is demonstrated to outperform some other robust face recognition methods in the literature, especially when face images are partly blocked or damaged.
2. The Recognition Method Based on Sparse Robust Coding
In the face recognition problem, each gray image of a face can be represented as a gray matrix. For ease of representation, we can stack the gray matrix as a column vector. It is assumed that there are
k classes of face images and
k is known. Let
, which denotes training samples of class-
i, where
is the number of training samples of class-
i and each column of
represents a face image from class-
i. Therefore, if given enough training samples from class-
i, any new test sample from the same class can be represented linearly by the columns of the training samples
, i.e.,
where
are the corresponding coefficients.
Furthermore, let
, where
is the total number of training samples, and assume
k is very large; then, for a given new test sample
, it can be linearly represented as follows:
where the vector
must be sparse since the test image only belongs to a certain class. To obtain the sparse vector, we need to consider the following sparse optimization problem:
where
is the penalty coefficient for the
-norm. The essence of Formula (
7) is the sparse constraint when the residuals of least squares estimation obey the Gaussian distribution. However, when residuals obey the Laplacian distribution, the sparse coding problem is as follows:
Sparse sampling is a technique which can capture high-level correlated structures in images and represent signals with as few atoms as possible in a given over-complete dictionary. In a word, sparse sampling attempts to construct or approximately represent complex signals or images using the minimum number of basic elements (atoms in the dictionary), while retaining important features and structural information in the image. However, the problem is whether the fidelity term (
or
) is sufficiently effective to describe the fidelity of the signal, especially when the signal has noise or abnormal values [
14]. Here, fidelity generally refers to the degree of similarity between the original signal and the signal which has been processed or encoded. For Formulas (
7) and (
8), the
norm (Euclidean distance) or
norm (Manhattan distance) are employed to define fidelity. This definition is based on the Maximum A Posteriori Probability (MAP) and assumes that the residuals after encoding (i.e., the difference between the original signal and the encoded signal) follow a Gaussian distribution (normal distribution) or a Laplace distribution. However, in real practice, the distribution of residuals is unknown, and it may not be a good choice to follow a certain distribution of a single hypothetical residual, especially when occlusion, camouflage or corruption occurs in facial images. Therefore, a fidelity item that uses a single norm in a sparse coding model may not be robust in these cases [
14].
To solve this problem, we use the wing loss to replace the
-norm loss or the
-norm loss. This type of loss was proposed by Feng et al. [
20], which can be defined as follows:
where the non-negative parameter
limits the value range of the nonlinear part to (-
,
), the parameter
limits the curvature of the nonlinear region and
is a constant that smoothly connects the linear and nonlinear parts defined in the piecewise function [
20]. Building upon the theoretical framework established by Boyd et al. [
21] and leveraging the alternating direction method of multipliers (ADMM) algorithm, we propose a novel wing-constrained sparse coding (WCSC) model defined as follows:
where
Under specific circumstances, as the value of
increases, the sparsity of
x becomes more pronounced.
In real life, the sample data may contain outliers. To further diminish the impact of noise or outliers in the training sample, an effective weighted approach is to assign lower weights to the outliers. In robust sparse representation-based classifier (RSRC), the corresponding minimization problem could be converted into an iteratively reweighted sparse coding problem [
22].
Based on the weight vector in RSRC and the wing-constrained sparse coding model (WCSC) mentioned above, we also consider a weighted wing-constrained sparse coding model (WWCSC) in this paper. The proposed WCSC model, combined with its adaptive weight vector, demonstrates remarkable capability in mitigating the adverse effects of noise and outliers. Subsequently, the -norm minimization can be addressed by the ADMM algorithm. Numerous data experiments conducted in some open face databases demonstrate that the WWCSC model exhibits an excellent classification effect, particularly when confronted with complex facial images such as occlusion, corrosion and so on.
The WWCSC model can be expressed as
where
is the weight vector. In particular, the weight of the
i-th sample
is defined by the following sigmoid function:
where
is the residual and
is the residual threshold. Obviously,
represents the distance between the residual and the threshold. Furthermore, the parameter
q influences the penalty rate of the weight. Thus, the sigmoid function constrains the weight values within the range of
. Also, when the residual is greater than
, the weight is less than 0.5, while, if the residual is less than
, the weight is greater than 0.5. Let
, and then rearrange
to
in a descending order ranging from the smallest to the largest. In addition, if we assume
, where
and
represents the largest integer less than
, then the parameter
can be expressed as
following the article [
22]. For the sake of facilitating the calculation, Formula (
13) can be rewritten as
where the parameter
.
For the optimization problem (
12), we can use the ADMM algorithm to solve it. The Lagrange function corresponding to the above optimization problem is
where
and
are Lagrange multipliers and
denotes the inner product.
In the ADMM algorithm, while one variable is updated, the others are assumed to be fixed as constants, which gradually approximates the optimal solution by minimizing the Lagrange function. This method has been widely applied in statistical learning and machine learning due to its effective handling of convex optimization problems with equality constraints, fast processing speed and good convergence. Furthermore, by adding a quadratic penalty term to the original Lagrange function, the improved augmented Lagrange function is defined as
where
. In fact, the augmented Lagrange function could also be rewritten as
where
,
. By the well-known ADMM algorithm, the above augmented Lagrange function could be iterated as follows:
where
. Subsequently, the aforementioned sub-optimization problems can be addressed individually:
where the S operator is defined as
Within the theoretical framework of the alternating direction method of multipliers (ADMM), the algorithm is guaranteed to achieve global convergence for convex optimization problems (Boyd et al., 2011) [
21]. This fundamental property ensures that the iterative sequence will inevitably converge to the optimal solution. Given ADMM’s proven convergence, we establish a termination criterion where the algorithm exits upon satisfying
, guaranteeing solution feasibility. Leveraging ADMM’s theoretical convergence guarantees, we implement threshold-based termination (
) when above residual meets the precision requirements.
For a given test sample , which is assumed to belong to one category in the training set, the sparse representation can be calculated by Algorithm 1. Ideally, the non-zero terms of the estimator are associated only with a certain category in the training set. In such cases, it is quite easy to determine the category to which the test sample belongs. However, in practical applications, noise or modeling errors may result in numerous non-zero terms in the obtained estimator, and these non-zero terms are associated with multiple categories in the training set. For such situations, numerous possible classifiers are designed to address this issue. For example, we can pick out the largest one among the estimator and attribute the test sample y to the category associated with it. However, the aforementioned method does not take into account the utilization of the subspace structure associated with the images in face recognition. As can be seen from the previous description, there exists a linear structure among the images in the model. To better utilize this linear structure, the test samples are reconstructed using the training samples of each category at first, and then classification is carried out based on the differences between the reconstructed samples and the test samples.
For each class
i, suppose
is the characteristic function that selects the coefficients related to the
i-th class. Then, for given
, non-zero elements of
are just the items in
x associated with category
i. Based on the characteristic functions, we can reconstruct the test sample by using the coefficients associated with all the training samples of the
i-th category. Therefore, the given test sample
y can be estimated as
. Then, the recognition of
y can be realized based on these approximations by assigning it to the object class that minimizes the residual between
y and
:
The computational costs of most classification algorithms are associated with the dimensions of the input samples. In numerous scenarios of practical applications, the dimensions of the data might be extremely high, particularly in issues related to image classification. Thus, it is quite meaningful to reduce the dimensions of the data. Various dimensional reduction methods have been proposed in the literature, such as principal component analysis (PCA), linear discriminant analysis (LDA), marginal Fisher analysis (MFA), maximum margin criterion (MMC) [
23], locality preserving projections (LPP) [
24], sparsity preserving projection (SSP) [
25], semi-supervised dimensionality reduction (SSDR), semi-supervised discriminant analysis (SDA) [
26] and random projection (RP) [
27,
28]. In particular, random projection is a commonly used technique in data mining and machine learning. It reduces the dimensionality of high-dimensional data by mapping it to a lower dimensional space while preserving the structure of the original data as much as possible. This method is particularly useful when dealing with large-scale datasets, as it can significantly reduce computational and storage requirements. Wright et al. [
8] examined the applications of RP in the dimension reduction of face images. In the subsequent experiments, we shall also employ random projection technology to conduct dimensionality reduction of the data and realize face recognition.
Algorithm 1: Weighted wing-constrained sparse coding model |
- Input:
The atomic dictionary A, test sample y - Output:
The estimate - 1:
Given the atomic dictionary A and test sample y, select the appropriate parameter - 2:
Initialize - 3:
Calculate the weight of the initialization - 4:
for all do - 5:
Update x based on Equation ( 23) - 6:
Update z based on Equation ( 24) - 7:
Update based on Equation ( 25) - 8:
Update based on Equation ( 26) - 9:
Update based on Equation ( 27) - 10:
Update the weight w based on Equation - 11:
end for - 12:
Criteria for exiting a loop: If , then exit loop - 13:
return The estimator
|
4. Experimental Results
In this section, we will conduct experiments based on some public face recognition datasets. These experiments can not only demonstrate the effectiveness of the proposed classification algorithm but also verify the claims made in the previous chapters. Secondly, the robustness of the proposed algorithm against distortion and occlusion shall also be discussed. Particularly, we consider the following four face datasets: the ORL face dataset [
30], the Yale face dataset [
31], the AR face dataset [
32] and the FERET face dataset [
33]. In these experiments, the ORL face dataset contains a total of 400 images of 40 different individuals. The Yale face dataset was created by the Yale University’s Center for Computational Vision and Control. The dataset comprises 165 images from 15 volunteers, exhibiting variations in lighting conditions, facial expressions and body poses. The AR dataset comprises over 4000 frontal images of 126 individuals, with each individual contributing 26 photographs. For this experiment, 10 photos of 40 individuals from this database are selected for recognition. The FERET face dataset contains a total of 1400 face images from 200 individuals, with each person having 7 images. The face images in the dataset include variations in expressions, lighting and poses. In addition, all the images from the FERET dataset are stored in the TIFF format as grayscale images, with a width of 80 and a height of 80.
The WCSC and WWCSC models incorporate several tunable parameters (e.g., regularization coefficients , wing function parameters and ). To ensure optimal model performance, we adopted a systematic cross-validation approach for parameter selection. Specifically, we implemented 10-fold stratified cross-validation to comprehensively evaluate parameter combinations, selecting the configuration that simultaneously maximized recognition accuracy (measured by F1-score) and maintained robust generalization performance across validation sets. This rigorous selection process helps mitigate overfitting while preserving model stability under varying occlusion conditions. The parameter optimization process employed k-fold cross-validation (k = 10) with early stopping, where we evaluated the regularization parameters at 0.1 intervals and wing parameters through grid search. The selected configuration minimized the reconstruction error while satisfying convergence criteria.
Firstly, for the case in the absence of damage and occlusion of the face images, the corresponding recognition rates of various methods are listed in
Table 1, where the proposed WWCSC is compared with some existing competitors, such as the SRC, RSRC, Sparse Huber (SH), IRGSC and WHCSC.
It can be inferred from
Table 1 that the WWCSC method outperforms the WCSC for the four datasets, which implies that adding weights to the loss function can enhance the recognition rate of the model in face recognition. Compared with the SRC, WHCSC and SH methods, the WWCSC method demonstrates superior performance across all four aforementioned datasets, particularly on the FERET dataset. As can be seen from
Table 1, on the above four datasets, the performance of IRGSC is the best compared with other methods. This result is attributed to the superiority of the IRGSC method itself. However, the performance of the WWCSC method we proposed is only slightly inferior to that of IRGSC. This result encourages us to explore the robustness of the WWCSC method. Therefore, in the following experiments, we mainly consider the case when there exists loss or occlusion in the face images. In particular, the robustness of the WWCSC is investigated under various types of occlusions, such as Gaussian noise random pixel corruption and random block occlusion.
Secondly, our main research focuses on the robustness of the WWCSC method when face images have different degrees of loss. In this experiment, we specifically conduct our analysis using the AR dataset. For the AR dataset, we artificially damage the face images under varying degrees following Wright, J. [
8], where different percentages of randomly chosen pixels from each of the test images are replaced with simulated values from a uniform distribution. For example, the effects of a specific face image with various pixel noises are shown in
Figure 1.
In the following, in order to validate the robustness of the WWCSC method, we imposed damage ranging from 10% to 90% on the test images. The recognition rates of face recognition for different methods with different degrees of damages are presented in
Table 2.
From
Table 2, it can be seen that, as the degree of image damage increases, the face recognition rates of most models show a downward trend. Among them, the performance of the WWCSC is relatively stable. When the loss of face images is 10% or 20%, the recognition rate of the WWCSC method is marginally lower than those of the IRGSC and RSRC methods. However, when face images suffer from a more than 30% degree of damage, the performance of the WWCSC method surpasses most of the other methods. In particular, in the case of damage to face images reaching 90%, the recognition rate of the WWCSC method is 28.55%, 23.71%, 33.85%, 21.67%, 27.45% and 18.90% higher than SRC, RSRC, SH, IRGSC, WHCSC and WCSC. Furthermore, comparative experiments demonstrate that the proposed WWCSC exhibits significantly greater robustness in processing occluded and corrupted images compared to existing state-of-the-art methods.
Thirdly, we consider the case when the test images have imposed white or black block occlusions of varying sizes, as shown in
Figure 2 and
Figure 3.
Also, the face recognition rates of different methods under different degrees of damage are listed in
Table 3.
It can be observed from
Table 3 that, as the occluded part of the image increases, the face recognition rate of all models shows a gradually decreasing trend. This result indicates that, as the proportion of face images that are occluded increases, the difficulty in face recognition also increases. When images exhibit 10% to 20% block occlusion, the IRGSC model demonstrates the optimal performance among all evaluated models, with the WWCSC model ranking second in effectiveness. However, when images are subject to 30% to 70% block occlusion, recognition rates of the WWCSC distinctly exceed those of other alternative methods. In particularly, in the case of block occlusion of face images reaching 70%, the recognition rate of the WWCSC method is 27.71%, 25.56%, 29.69%, 18.33%, 20.08% and 1.66% higher than SRC, RSRC, SH, IRGSC, WHCSC and WCSC. Moreover, this result demonstrates the superiority of the WWCSC method in the case of high block occlusion in face images.