Article

Robust Face Recognition Based on the Wing Loss and the $\ell_1$ Penalty

1 College of Mathematics and Statistics, Chongqing University, Chongqing 400044, China
2 National Elite Institute of Engineering, Chongqing University, Chongqing 400044, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2025, 14(9), 1736; https://doi.org/10.3390/electronics14091736
Submission received: 4 March 2025 / Revised: 18 April 2025 / Accepted: 22 April 2025 / Published: 24 April 2025

Abstract

In recent years, face recognition under occluded or corrupted conditions has emerged as a prominent research topic. The advancement in sparse sampling techniques based on regression analysis has provided a novel solution to this challenge. Currently, numerous regression-based sparse sampling models have been investigated by researchers to address this problem. However, the recognition accuracy of most existing models deteriorates significantly when handling heavily occluded or severely corrupted facial images. To overcome this limitation, this paper proposes a wing-constrained sparse coding (WCSC) model and its weighted variant (weighted wing-constrained sparse coding, WWCSC) for robust face recognition in complex scenarios. The corresponding minimization problems are solved using the alternating direction method of multipliers (ADMM) algorithm. Extensive experiments are conducted on four benchmark face databases: the Olivetti Research Laboratory (ORL) database, the Yale database, the AR database and the Face Recognition Technology (FERET) database, to evaluate the proposed method’s performance. Comparative results demonstrate that the WWCSC model maintains superior recognition rates even under challenging conditions involving significant occlusion or corruption, highlighting its remarkable robustness in face recognition tasks. This study provides both theoretical and empirical validation for the effectiveness of the proposed approach.

1. Introduction

Face recognition is a branch of visual pattern recognition. Humans rely heavily on visual pattern recognition, since most of the information they receive about the world arrives through the eyes. Computers, by contrast, represent photos and videos as matrices whose elements are individual pixels, and they perform recognition by analyzing these matrices, which is a fundamental difference in how the task is approached. The challenge in face recognition lies in the computer's ability to identify and distinguish between different faces despite variations in lighting, pose and expression. This is achieved through algorithms that extract features from the pixel matrices and compare them against a database of known faces. The development of such algorithms has been a significant area of research in computer vision and machine learning.
Facial recognition technology identifies an individual by comparing a digital image of a human face against a database of known faces. Face recognition technology has passed through several stages in its development: the initial template-matching-based approach, the machine learning-based approach [1] and, nowadays, the deep learning approach [2]. Early face recognition methods relied primarily on template matching, comparing the similarity between a given face image and a predefined template. However, this method is sensitive to interference from illumination, expression and other factors, which makes accurate face recognition difficult [1]. With the development of machine learning technology, face recognition algorithms based on machine learning have received much attention, such as the Eigenface [3] and the Fisherface [4]. These algorithms extract features from face images and apply machine learning algorithms for classification and recognition [4]. Compared with template-matching methods, machine learning-based face recognition algorithms achieve higher accuracy and are more robust. Furthermore, deep learning technology can extract and classify high-dimensional features of face images by training deep neural network models [5], such as FaceNet [6] and VGGFace [7].
With the progress of technology, the sparse sampling method has also been widely used in the field of face recognition. This method assumes that a signal can be represented as a linear combination of the atoms of a dictionary with a sparse coefficient vector, where the atomic dictionary is a set of sample points that can represent various parts of the signal. The aim of the method is to recover the sparse coefficient vector from as few sampled data as possible; the signal can then be reconstructed from the sparse coefficient vector and the atomic dictionary. Therefore, a signal $y \in \mathbb{R}^m$ can be expressed as
$$ y = Ax + \varepsilon, \tag{1} $$
where $\varepsilon \in \mathbb{R}^m$ denotes the measurement noise, $A$ is the atomic dictionary and $x$ is the sparse coefficient vector.
In 2009, Wright et al. applied the theory of compressed sensing to face recognition [8]. The key point of the sparse sampling problem is how to reconstruct a sparse signal. Given the sparse nature of the signal, the reconstruction process involves identifying the non-zero elements of the sparse coefficient vector $x$. This is typically achieved through optimization techniques that seek the sparsest solution satisfying $y = Ax$. The reconstruction algorithm must therefore exploit the structure of the atomic dictionary $A$ and the sparsity of the signal to recover the original signal from the noisy measurements. Sparsity means that most of the elements of the signal vector $x$ are assumed to be zero, which leads to the following optimization problem:
$$ \min_x \|x\|_0 \quad \text{s.t.} \quad \|Ax - y\|_2 \le \epsilon, \tag{2} $$
where $\|x\|_0$ denotes the $\ell_0$-norm, counting the non-zero entries of $x$, and $\epsilon \ge 0$ denotes the level of sampling noise. However, the above optimization problem is NP-hard. To reconstruct the sparse signal, one widely used alternative is to replace the $\ell_0$-norm with the $\ell_1$-norm, so that problem (2) can be rewritten as follows:
$$ \min_x \|x\|_1 \quad \text{s.t.} \quad \|Ax - y\|_2 \le \epsilon, \tag{3} $$
where $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ denotes the $\ell_1$-norm, a convex relaxation of the $\ell_0$-norm. This relaxation turns the original NP-hard problem into a tractable convex optimization problem, which can be solved efficiently by various algorithms such as linear programming or iterative thresholding methods. The $\ell_1$-norm minimization has been shown to recover sparse signals effectively in the presence of noise, which is particularly useful in signal processing and compressed sensing applications. Finally, it suffices to solve the following unconstrained minimization problem:
$$ \min_x \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1 \right\}. \tag{4} $$
Various algorithms have been proposed in the literature to deal with this well-known LASSO problem, such as the alternating direction method of multipliers (ADMM) [9], Bregman methods [10], the Frank–Wolfe algorithm [11] and iterative thresholding methods [12]. When the measurement matrix $A$ satisfies the restricted isometry property (RIP) of sufficiently high order, the sparse solution can be obtained by these methods. The $\ell_1$-norm regularization method is therefore particularly useful when the signal of interest is sparse. It has also been widely used in other settings, including signal processing, image reconstruction and machine learning, where the goal is to extract meaningful information from noisy and incomplete data [13].
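To make problem (4) concrete, here is a minimal NumPy sketch of one of the iterative thresholding methods mentioned above (ISTA); the step size and iteration count are illustrative choices, not settings from this paper. The `soft_threshold` helper is reused in later sketches.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Solve min_x 0.5 * ||Ax - y||_2^2 + lam * ||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient A^T(Ax - y)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)           # gradient of the smooth fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```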
However, most of the methods mentioned above cannot achieve efficient recognition. On the one hand, this failure is due to the similarity between faces and the variability within faces. Face images of different individuals share similar biometric characteristics, and this similarity makes it difficult to identify a face image. Moreover, under different postures, illuminations, angles and expressions, face images are unstable, which leads to variability. On the other hand, face images may be partly occluded or damaged, which hinders face recognition, and noise in face images causes further difficulties. Therefore, it is necessary to study robust face recognition methods [14].
In recent years, many robust face recognition methods have been proposed in the literature. Zhang et al. (2011) explained the possibility of sparse representation in the case of sufficient samples and verified the advantages of sparse representation for face recognition; they introduced a collaborative representation classifier (CRC) based on $\ell_2$-norm constraints and further proposed its robust version (RCRC) [15]. Zhong et al. (2015) considered a classification method for face recognition based on $\ell_{1/2}$ regularization, which balanced the sparse representation classifier (SRC) and the CRC through iterative Tikhonov regularization (ITR) [16]. Zheng et al. (2017) solved the robust face recognition problem via the iterative re-constrained group sparse classifier (IRGSC) with adaptive weights learning [17]. Lei et al. (2020) proposed weighted Huber constrained sparse coding (WHCSC) and established a robust weighted regression model with sparse constraints for face recognition [14]. Zhang et al. (2022) proposed Enhanced Group Sparse Regularized Nonconvex Regression (EGSNR); this algorithm mitigates the bias problem and reduces the adverse impact of outliers by introducing a nonconvex function in place of the traditional $\ell_1$-norm [18]. Liu et al. (2025) proposed Graph Regularized Discriminative Nonnegative Matrix Factorization (GR-DNMF), an advanced variant of NMF that incorporates both graph regularization and discriminative constraints to improve feature extraction and classification performance [19].
In this paper, a new robust face recognition method based on wing-constrained sparse coding and its weighted version are proposed. In these models, the wing loss function adaptively balances robustness and discriminability, and the adaptive weighting mechanism effectively suppresses outliers, significantly reducing their impact on model performance. The performance of the proposed method is examined in various experiments, and it is demonstrated to outperform several other robust face recognition methods in the literature, especially when face images are partly occluded or damaged.

2. The Recognition Method Based on Sparse Robust Coding

In the face recognition problem, each gray image of a face can be represented as a gray matrix. For ease of representation, we stack the gray matrix as a column vector. It is assumed that there are $k$ classes of face images and that $k$ is known. Let $A_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ denote the training samples of class $i$, where $n_i$ is the number of training samples of class $i$ and each column of $A_i$ represents a face image from class $i$. Therefore, given enough training samples from class $i$, any new test sample $y$ from the same class can be represented linearly by the columns of $A_i$, i.e.,
$$ y = x_{i,1} v_{i,1} + x_{i,2} v_{i,2} + \cdots + x_{i,n_i} v_{i,n_i}, \tag{5} $$
where $x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}$ are the corresponding coefficients.
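As a concrete illustration of this construction, the following sketch stacks grayscale images (as NumPy arrays) into the class dictionaries and concatenates them into $A$; the unit normalization of the columns is a common SRC convention assumed here, not a step stated in the text.

```python
import numpy as np

def build_dictionary(classes):
    """Stack each h x w grayscale image into a column and concatenate all classes.

    `classes` is a list of lists: classes[i] holds the training images of class i.
    Returns A (m x n) and a label array mapping each column to its class index.
    """
    columns, labels = [], []
    for i, images in enumerate(classes):
        for img in images:
            v = img.reshape(-1).astype(float)       # stack the gray matrix as a column vector
            columns.append(v / np.linalg.norm(v))   # unit-norm columns (assumed convention)
            labels.append(i)
    return np.column_stack(columns), np.array(labels)
```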
Furthermore, let $A = [A_1, A_2, \ldots, A_k] \in \mathbb{R}^{m \times n}$, where $n = \sum_{i=1}^{k} n_i$ is the total number of training samples, and assume $k$ is very large; then, a given new test sample $y \in \mathbb{R}^{m \times 1}$ can be linearly represented as follows:
$$ y = A x_0, \tag{6} $$
where the vector $x_0$ must be sparse, since the test image belongs to only one class. To obtain the sparse vector, we consider the following sparse optimization problem:
$$ \min_x \left\{ \|Ax - y\|_2^2 + \lambda \|\alpha\|_1 \right\} \quad \text{s.t.} \quad \alpha = x, \tag{7} $$
where $\lambda$ is the penalty coefficient for the $\ell_1$-norm. In essence, Formula (7) is the sparse constraint when the residuals of least squares estimation obey a Gaussian distribution. When the residuals instead obey a Laplacian distribution, the sparse coding problem becomes
$$ \min_x \left\{ \|Ax - y\|_1 + \lambda \|\alpha\|_1 \right\} \quad \text{s.t.} \quad \alpha = x. \tag{8} $$
Sparse sampling is a technique that captures high-level correlated structures in images and represents signals with as few atoms as possible from a given over-complete dictionary. In short, sparse sampling attempts to construct, or approximately represent, complex signals or images using the minimum number of basic elements (atoms of the dictionary) while retaining the important features and structural information in the image. The question, however, is whether the fidelity term ($\|Ax - y\|_1$ or $\|Ax - y\|_2$) describes the fidelity of the signal effectively enough, especially when the signal contains noise or abnormal values [14]. Here, fidelity refers to the degree of similarity between the original signal and the processed or encoded signal. In Formulas (7) and (8), the $\ell_2$-norm (Euclidean distance) or the $\ell_1$-norm (Manhattan distance) is employed to define fidelity. This definition is based on maximum a posteriori (MAP) estimation and assumes that the residuals after encoding (i.e., the difference between the original signal and the encoded signal) follow a Gaussian (normal) distribution or a Laplace distribution. In practice, however, the distribution of the residuals is unknown, and assuming a single fixed residual distribution may not be a good choice, especially when occlusion, camouflage or corruption occurs in facial images. Therefore, a fidelity term built on a single norm in a sparse coding model may not be robust in these cases [14].
To address this problem, we replace the $\ell_1$-norm or $\ell_2$-norm loss with the wing loss. This loss was proposed by Feng et al. [20] and is defined as follows:
$$ \mathrm{wing}(x) = \begin{cases} \omega \ln\!\left(1 + |x|/\epsilon\right), & \text{if } |x| < \omega, \\ |x| - C, & \text{otherwise}, \end{cases} \tag{9} $$
where the non-negative parameter $\omega$ limits the range of the nonlinear part to $(-\omega, \omega)$, the parameter $\epsilon$ controls the curvature of the nonlinear region and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant that smoothly connects the linear and nonlinear pieces of the function [20]. Building upon the theoretical framework established by Boyd et al. [21] and leveraging the alternating direction method of multipliers (ADMM), we propose a novel wing-constrained sparse coding (WCSC) model defined as follows:
$$ \min_x \; g(z) + \lambda \|\alpha\|_1 \quad \text{s.t.} \quad z = Ax - y, \; \alpha = x, \tag{10} $$
where
$$ g(z) = \begin{cases} \omega \ln\!\left(1 + \|z\|_1/\epsilon\right), & \text{if } \|z\|_1 < \omega, \\ \|z\|_1 - C, & \text{otherwise}. \end{cases} \tag{11} $$
Under specific circumstances, as the value of λ increases, the sparsity of x becomes more pronounced.
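A minimal NumPy rendering of the wing loss in Equation (9); it is easy to verify numerically that the constant $C$ makes the two branches meet continuously at $|x| = \omega$. The parameter values are illustrative only.

```python
import numpy as np

def wing_loss(x, omega=10.0, eps=2.0):
    """Wing loss of Feng et al. [20]: logarithmic near zero, linear in the tails."""
    C = omega - omega * np.log(1.0 + omega / eps)   # joins the two pieces at |x| = omega
    a = np.abs(x)
    return np.where(a < omega, omega * np.log(1.0 + a / eps), a - C)
```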
In practice, the sample data may contain outliers. To further diminish the impact of noise or outliers in the training samples, an effective weighting approach is to assign lower weights to the outliers. In the robust sparse representation-based classifier (RSRC), the corresponding minimization problem can be converted into an iteratively reweighted sparse coding problem [22].
Based on the weight vector in RSRC and the wing-constrained sparse coding (WCSC) model above, we also consider a weighted wing-constrained sparse coding (WWCSC) model in this paper. The proposed model, combined with its adaptive weight vector, demonstrates remarkable capability in mitigating the adverse effects of noise and outliers. The resulting $\ell_1$-norm minimization can again be addressed by the ADMM algorithm. Numerous experiments conducted on open face databases demonstrate that the WWCSC model achieves an excellent classification effect, particularly when confronted with complex facial images involving occlusion, corruption and so on.
The WWCSC model can be expressed as
$$ \min_x \; g(z) + \lambda \|\alpha\|_1 \quad \text{s.t.} \quad z = w^{T}(Ax - y), \; \alpha = x, \tag{12} $$
where $w = (w_1, w_2, \ldots, w_m) \in \mathbb{R}^{m \times 1}$ is the weight vector. In particular, the weight $w_i$ of the $i$-th sample is defined by the following sigmoid function:
$$ w_i(e_i) = \frac{1}{1 + \exp\!\big(-q(\delta - e_i^2)/\delta\big)}, \tag{13} $$
where $e_i$ is the residual and $\delta$ is the residual threshold. Clearly, $\delta - e_i^2$ measures the distance between the squared residual and the threshold, and the parameter $q$ controls the penalty rate of the weight. The sigmoid function constrains the weight values to the range $[0, 1]$: when the squared residual is greater than $\delta$, the weight is less than 0.5, while, if it is less than $\delta$, the weight is greater than 0.5. Let $\Psi = (e_1^2, e_2^2, \ldots, e_m^2)$, and rearrange $\Psi$ into $\Psi_a$ in ascending order, from the smallest value to the largest. If we set $k = \lfloor \tau m \rfloor$, where $\tau \in [0, 1]$ and $\lfloor \tau m \rfloor$ is the largest integer not exceeding $\tau m$, then the parameter $\delta$ can be taken as $\delta = \Psi_a(k)$, following [22]. To facilitate the calculation, Formula (13) can be rewritten as
$$ w_i(e_i) = \frac{\exp(-\mu e_i^2 + \mu\delta)}{1 + \exp(-\mu e_i^2 + \mu\delta)}, \tag{14} $$
where the parameter $\mu = q/\delta$.
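The weighting scheme of Equations (13) and (14) can be sketched as follows, with $\delta$ taken as the $\lfloor \tau m \rfloor$-th smallest squared residual as described above; the defaults for $q$ and $\tau$ are placeholders, and non-degenerate (non-zero) residuals are assumed so that $\mu = q/\delta$ is well defined.

```python
import numpy as np

def adaptive_weights(e, q=1.0, tau=0.8):
    """Sigmoid weights of Eq. (14): w_i = exp(-mu*e_i^2 + mu*delta) / (1 + exp(...))."""
    psi_a = np.sort(e ** 2)                        # squared residuals in ascending order
    k = max(int(np.floor(tau * len(e))) - 1, 0)    # 0-based index of Psi_a(k)
    delta = psi_a[k]                               # residual threshold
    mu = q / delta                                 # mu = q / delta
    t = -mu * e ** 2 + mu * delta
    return np.exp(t) / (1.0 + np.exp(t))           # in (0, 1); exactly 0.5 at e_i^2 = delta
```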
The optimization problem (12) can be solved with the ADMM algorithm. The Lagrange function corresponding to the above optimization problem is
$$ L(x, z, \alpha, h_1, h_2) = g(z) + \lambda \|\alpha\|_1 + \langle h_1, w^{T}(Ax - y) - z \rangle + \langle h_2, \alpha - x \rangle, \tag{15} $$
where $h_1$ and $h_2$ are Lagrange multipliers and $\langle \cdot, \cdot \rangle$ denotes the inner product.
In the ADMM algorithm, while one variable is updated, the others are held fixed as constants, and the optimal solution is approached gradually by minimizing the Lagrange function. The method has been widely applied in statistical learning and machine learning thanks to its effective handling of convex optimization problems with equality constraints, its fast processing speed and its good convergence. By adding quadratic penalty terms to the original Lagrange function, the augmented Lagrange function is defined as
$$ L_{\rho_1,\rho_2}(x, z, \alpha, h_1, h_2) = g(z) + \lambda \|\alpha\|_1 + \langle h_1, w^{T}(Ax - y) - z \rangle + \langle h_2, \alpha - x \rangle + \frac{\rho_1}{2}\big\|w^{T}(Ax - y) - z\big\|_2^2 + \frac{\rho_2}{2}\|\alpha - x\|_2^2, \tag{16} $$
where $\rho_1 > 0$ and $\rho_2 > 0$. In fact, the augmented Lagrange function can also be rewritten as
$$ L_{\rho_1,\rho_2}(x, z, \alpha, h_1, h_2) = g(z) + \lambda \|\alpha\|_1 + \frac{\rho_1}{2}\big\|w^{T}(Ax - y) - z + u_1\big\|_2^2 + \frac{\rho_2}{2}\|\alpha - x + u_2\|_2^2, \tag{17} $$
where $u_1 = h_1/\rho_1$ and $u_2 = h_2/\rho_2$. By the well-known ADMM algorithm, the above augmented Lagrange function is minimized through the following iterations:
$$ x^{(k+1)} = \arg\min_x \; \frac{\rho_1}{2}\big\|W^{T}(Ax - y) - z^{(k)} + u_1^{(k)}\big\|_2^2 + \frac{\rho_2}{2}\big\|\alpha^{(k)} - x + u_2^{(k)}\big\|_2^2, \tag{18} $$
$$ z^{(k+1)} = \arg\min_z \; g(z) + \frac{\rho_1}{2}\big\|W^{T}(Ax^{(k+1)} - y) - z + u_1^{(k)}\big\|_2^2, \tag{19} $$
$$ \alpha^{(k+1)} = \arg\min_\alpha \; \lambda\|\alpha\|_1 + \frac{\rho_2}{2}\big\|\alpha - x^{(k+1)} + u_2^{(k)}\big\|_2^2, \tag{20} $$
$$ u_1^{(k+1)} = u_1^{(k)} + W^{T}(Ax^{(k+1)} - y) - z^{(k+1)}, \tag{21} $$
$$ u_2^{(k+1)} = u_2^{(k)} + \alpha^{(k+1)} - x^{(k+1)}, \tag{22} $$
where $W = \mathrm{diag}(w) = \mathrm{diag}(w_1, w_2, \ldots, w_m)$. The sub-optimization problems above can then be solved individually:
$$ x^{(k+1)} = \big[\rho_1 A^{T} W W^{T} A + \rho_2 I\big]^{-1}\big[\rho_1 A^{T} W \big(W^{T} y + z^{(k)} - u_1^{(k)}\big) + \rho_2\big(\alpha^{(k)} + u_2^{(k)}\big)\big], \tag{23} $$
$$ z^{(k+1)} = \begin{cases} S_{\frac{\omega}{\rho_1(\epsilon + \|z^{(k)}\|_1)}}\big(W^{T}(Ax^{(k+1)} - y) + u_1^{(k)}\big), & \|z^{(k)}\|_1 < \omega, \\[4pt] S_{\frac{1}{\rho_1}}\big(W^{T}(Ax^{(k+1)} - y) + u_1^{(k)}\big), & \|z^{(k)}\|_1 \ge \omega, \end{cases} \tag{24} $$
$$ \alpha^{(k+1)} = S_{\frac{\lambda}{\rho_2}}\big(x^{(k+1)} - u_2^{(k)}\big), \tag{25} $$
$$ u_1^{(k+1)} = u_1^{(k)} + W^{T}(Ax^{(k+1)} - y) - z^{(k+1)}, \tag{26} $$
$$ u_2^{(k+1)} = u_2^{(k)} + \alpha^{(k+1)} - x^{(k+1)}, \tag{27} $$
where the soft-thresholding operator $S$ is defined as
$$ S_k(a) = \begin{cases} a - k, & a > k, \\ 0, & |a| \le k, \\ a + k, & a < -k. \end{cases} \tag{28} $$
Within the theoretical framework of the alternating direction method of multipliers, the algorithm is guaranteed to achieve global convergence for convex optimization problems (Boyd et al., 2011) [21], which ensures that the iterative sequence converges to the optimal solution. Relying on this convergence guarantee, we terminate the algorithm once $\|x^{(k+1)} - x^{(k)}\|_2 / \|x^{(k)}\|_2 < \tau_0$, with the threshold set to $\tau_0 = 10^{-5}$, which guarantees the feasibility and precision of the returned solution.
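Putting the updates together, the following sketch implements the iteration (23)–(27) with the relative-change stopping rule described above. It reuses `soft_threshold` and `adaptive_weights` from the earlier sketches; all parameter defaults are placeholders rather than the authors' settings.

```python
import numpy as np

def wwcsc_solve(A, y, lam=0.5, omega=10.0, eps=2.0, rho1=1.0, rho2=1.0,
                q=1.0, tau=0.8, n_iter=200, tol=1e-5):
    """ADMM iteration for the WWCSC model, following Eqs. (23)-(27) as a sketch."""
    m, n = A.shape
    x, alpha, u2 = np.zeros(n), np.zeros(n), np.zeros(n)
    z, u1 = np.zeros(m), np.zeros(m)
    w = np.ones(m)                                     # initial weights
    for _ in range(n_iter):
        W = np.diag(w)
        # x-update, Eq. (23): a regularized least-squares solve
        lhs = rho1 * A.T @ W @ W.T @ A + rho2 * np.eye(n)
        rhs = rho1 * A.T @ W @ (W.T @ y + z - u1) + rho2 * (alpha + u2)
        x_new = np.linalg.solve(lhs, rhs)
        # z-update, Eq. (24): soft-thresholding with a wing-dependent threshold
        r = W.T @ (A @ x_new - y) + u1
        if np.linalg.norm(z, 1) < omega:
            thr = omega / (rho1 * (eps + np.linalg.norm(z, 1)))
        else:
            thr = 1.0 / rho1
        z = soft_threshold(r, thr)
        # alpha-update, Eq. (25), and dual updates, Eqs. (26)-(27)
        alpha = soft_threshold(x_new - u2, lam / rho2)
        u1 = u1 + W.T @ (A @ x_new - y) - z
        u2 = u2 + alpha - x_new
        # reweighting step, as in Algorithm 1 below
        w = adaptive_weights(A @ x_new - y, q=q, tau=tau)
        # stopping rule: relative change of x below tau_0
        if np.linalg.norm(x_new - x) / max(np.linalg.norm(x), 1e-12) < tol:
            x = x_new
            break
        x = x_new
    return x
```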
For a given test sample $y \in \mathbb{R}^{m \times 1}$, assumed to belong to one of the categories in the training set, the sparse representation $\hat{x}$ can be calculated by Algorithm 1. Ideally, the non-zero entries of the estimator are associated with only one category in the training set, in which case it is easy to determine the category to which the test sample belongs. In practical applications, however, noise or modeling errors may produce numerous non-zero entries associated with multiple categories. Various classifiers have been designed for this situation; for example, one can pick the largest coefficient of the estimator and assign the test sample $y$ to the category associated with it. This approach, however, does not exploit the subspace structure of the images in face recognition. As described above, a linear structure exists among the images in the model. To make better use of this linear structure, the test sample is first reconstructed from the training samples of each category, and classification is then carried out based on the differences between the reconstructed samples and the test sample.
For each class $i$, let $\delta_i : \mathbb{R}^n \to \mathbb{R}^n$ be the characteristic function that selects the coefficients related to the $i$-th class; that is, for a given $x \in \mathbb{R}^n$, the non-zero elements of $\delta_i(x)$ are exactly the entries of $x$ associated with category $i$. Using these characteristic functions, the test sample can be reconstructed from the coefficients associated with the training samples of the $i$-th category as $\hat{y}_i = A\,\delta_i(\hat{x})$. Recognition of $y$ is then realized by assigning it to the class that minimizes the residual between $y$ and $\hat{y}_i$:
$$ \min_i \; r_i(y) = \big\|y - A\,\delta_i(\hat{x})\big\|_2. \tag{29} $$
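The classification rule of Equation (29) then takes only a few lines; `labels` is the column-to-class map produced by the dictionary-building sketch above.

```python
import numpy as np

def classify(A, labels, y, x_hat):
    """Assign y to the class whose coefficients reconstruct it with minimal residual."""
    class_ids = np.unique(labels)
    residuals = []
    for i in class_ids:
        delta_i = np.where(labels == i, x_hat, 0.0)   # keep only the class-i coefficients
        residuals.append(np.linalg.norm(y - A @ delta_i))
    return class_ids[int(np.argmin(residuals))]
```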
The computational cost of most classification algorithms grows with the dimension of the input samples. In many practical applications, the dimension of the data can be extremely high, particularly in image classification problems, so reducing the dimension of the data is worthwhile. Various dimension reduction methods have been proposed in the literature, such as principal component analysis (PCA), linear discriminant analysis (LDA), marginal Fisher analysis (MFA), the maximum margin criterion (MMC) [23], locality preserving projections (LPP) [24], sparsity preserving projections (SPP) [25], semi-supervised dimensionality reduction (SSDR), semi-supervised discriminant analysis (SDA) [26] and random projection (RP) [27,28]. In particular, random projection is a commonly used technique in data mining and machine learning. It reduces the dimensionality of high-dimensional data by mapping the data to a lower dimensional space while preserving the structure of the original data as much as possible. The method is particularly useful for large-scale datasets, since it can significantly reduce computational and storage requirements. Wright et al. [8] examined applications of RP to the dimension reduction of face images. In the subsequent experiments, we also employ random projection to reduce the dimension of the data before face recognition.
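A minimal random-projection sketch in the spirit of [27,28]; the Gaussian projection matrix and the $1/\sqrt{d}$ scaling are standard choices assumed here, and the target dimension $d$ is illustrative.

```python
import numpy as np

def random_project(X, d, seed=0):
    """Map the columns of X from R^m to R^d with a Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, X.shape[0])) / np.sqrt(d)  # scaling roughly preserves norms
    return R @ X   # project the dictionary (and, with the same R, any test sample)
```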
Algorithm 1: Weighted wing-constrained sparse coding model
Input: The atomic dictionary $A$, test sample $y$
Output: The estimate $\hat{x}$
1: Given $A$ and $y$, select appropriate parameters $\lambda, q, \tau, \rho_1, \rho_2, N_{iter}, \tau_0$
2: Initialize $x^{(0)}, z^{(0)}, \alpha^{(0)}, u_1^{(0)}, u_2^{(0)}$
3: Calculate the initial weight vector
4: for $k = 0, 1, \ldots, N_{iter}$ do
5:   Update $x$ by Equation (23)
6:   Update $z$ by Equation (24)
7:   Update $\alpha$ by Equation (25)
8:   Update $u_1$ by Equation (26)
9:   Update $u_2$ by Equation (27)
10:  Update the weights by $w_i^{(k+1)}\big(e_i^{(k+1)}\big) = \dfrac{\exp\!\big(-\mu (e_i^{(k+1)})^2 + \mu\delta\big)}{1 + \exp\!\big(-\mu (e_i^{(k+1)})^2 + \mu\delta\big)}$
11:  Compute $\tau^{(k+1)} = \|x^{(k+1)} - x^{(k)}\|_2 / \|x^{(k)}\|_2$; if $\tau^{(k+1)} < \tau_0$, exit the loop
12: end for
13: return The estimator $\hat{x}$

3. Convergence Analysis

Before the proof of convergence, the standard ADMM form of the objective function in Formula (12) is given as follows:
$$ \min_x \; f(x) + g(z) + l(\alpha) \quad \text{s.t.} \quad z = w^{T}(Ax - y), \; \alpha = x, \tag{30} $$
where $f(x) = 0$, $l(\alpha) = \lambda\|\alpha\|_1$ and
$$ g(z) = \begin{cases} \omega \ln\!\left(1 + \|z\|_1/\epsilon\right), & \text{if } \|z\|_1 < \omega, \\ \|z\|_1 - C, & \text{otherwise}. \end{cases} $$
The following theorem concerns the functions $f(x)$, $l(\alpha)$ and $g(z)$.
Theorem 1. 
The unaugmented Lagrangian
$$ L(x, z, \alpha, h_1, h_2) = g(z) + \lambda\|\alpha\|_1 + \langle h_1, w^{T}(Ax - y) - z \rangle + \langle h_2, \alpha - x \rangle $$
has a saddle point. Explicitly, there exists a point $(x^*, z^*, \alpha^*, h_1^*, h_2^*)$, not necessarily unique, for which
$$ L(x^*, z^*, \alpha^*, h_1, h_2) \le L(x^*, z^*, \alpha^*, h_1^*, h_2^*) \le L(x, z, \alpha, h_1^*, h_2^*) $$
holds for all $x, z, \alpha, h_1, h_2$.
Proof. 
The primal problem is $\min_{x,z,\alpha} \sup_{h_1,h_2} L(x, z, \alpha, h_1, h_2)$, denoted by $P_l$. The dual problem is $\max_{h_1,h_2} \inf_{x,z,\alpha} L(x, z, \alpha, h_1, h_2)$, denoted by $D_l$. For $L(x, z, \alpha, h_1, h_2)$, since $f(x) + l(\alpha) + g(z)$ is a proper closed convex function, the constraints $z - w^{T}(Ax - y) = 0$ and $\alpha - x = 0$ are affine, and there exist points $(x^*, z^*, \alpha^*, h_1^*, h_2^*)$ satisfying the Karush–Kuhn–Tucker (KKT) conditions. According to strong and weak duality and the optimality conditions of the Lagrange multiplier method [29], the following conclusions can be drawn:
The optimal value of the primal problem $P_l$ equals that of the dual problem $D_l$, i.e., $\mathrm{val}(P_l) = \mathrm{val}(D_l)$. The duality gap between the primal and dual problems is zero, which means the strong max–min property is satisfied and $P_l$ and $D_l$ share the same optimal solution, where $\mathrm{val}(\cdot)$ denotes the optimal value of a problem.
Any point $(x^*, z^*, \alpha^*, h_1^*, h_2^*)$ that satisfies the KKT conditions of $L(x, z, \alpha, h_1, h_2)$ obeys
$$ \inf_{x,z,\alpha} L(x, z, \alpha, h_1^*, h_2^*) \le L(x^*, z^*, \alpha^*, h_1^*, h_2^*) \le \sup_{h_1,h_2} L(x^*, z^*, \alpha^*, h_1, h_2), $$
i.e.,
$$ \mathrm{val}(D_l) \le L(x^*, z^*, \alpha^*, h_1^*, h_2^*) \le \mathrm{val}(P_l). $$
When the duality gap between the primal and dual problems is zero, i.e., $\mathrm{val}(P_l) = \mathrm{val}(D_l)$, we obtain
$$ L(x^*, z^*, \alpha^*, h_1^*, h_2^*) = \inf_{x,z,\alpha} L(x, z, \alpha, h_1^*, h_2^*) \le L(x, z, \alpha, h_1^*, h_2^*), \quad \forall\, x, z, \alpha \in \mathbb{R}^n. $$
By the same reasoning,
$$ L(x^*, z^*, \alpha^*, h_1^*, h_2^*) = \sup_{h_1,h_2} L(x^*, z^*, \alpha^*, h_1, h_2) \ge L(x^*, z^*, \alpha^*, h_1, h_2), \quad \forall\, h_1, h_2 \in \mathbb{R}^n. $$
In summary,
$$ L(x^*, z^*, \alpha^*, h_1, h_2) \le L(x^*, z^*, \alpha^*, h_1^*, h_2^*) \le L(x, z, \alpha, h_1^*, h_2^*). $$
That is, $L(x, z, \alpha, h_1, h_2)$ has a saddle point $(x^*, z^*, \alpha^*, h_1^*, h_2^*)$, not necessarily unique, so the standard Lagrangian of Equation (30) satisfies Theorem 1. □
By Theorem 1, the ADMM iterates satisfy the following convergence properties (for the proof, see Ref. [21], Appendix A):
Residual convergence: $r^k \to 0$ as $k \to \infty$, i.e., the iterates approach feasibility.
Objective convergence: $f(x^k) + g(z^k) + l(\alpha^k) \to f(x^*) + g(z^*) + l(\alpha^*)$ as $k \to \infty$, i.e., the objective value of the iterates approaches the optimal value.
Dual variable convergence: $h_1^k \to h_1^*$ and $h_2^k \to h_2^*$ as $k \to \infty$, where $(h_1^*, h_2^*)$ is a dual optimal point.

4. Experimental Results

In this section, we conduct experiments on several public face recognition datasets. These experiments both demonstrate the effectiveness of the proposed classification algorithm and verify the claims made in the previous sections. The robustness of the proposed algorithm against corruption and occlusion is also discussed. In particular, we consider the following four face datasets: the ORL face dataset [30], the Yale face dataset [31], the AR face dataset [32] and the FERET face dataset [33]. The ORL face dataset contains a total of 400 images of 40 different individuals. The Yale face dataset, created by Yale University's Center for Computational Vision and Control, comprises 165 images of 15 volunteers, with variations in lighting conditions, facial expressions and poses. The AR dataset comprises over 4000 frontal images of 126 individuals, with each individual contributing 26 photographs; for this experiment, 10 photos of each of 40 individuals from this database are selected for recognition. The FERET face dataset contains a total of 1400 face images of 200 individuals, with 7 images per person; the images include variations in expression, lighting and pose. All images from the FERET dataset are stored in the TIFF format as 80 × 80 grayscale images.
The WCSC and WWCSC models involve several tunable parameters (e.g., the regularization coefficient $\lambda$ and the wing function parameters $\omega$ and $\epsilon$). To ensure good performance, we adopted a systematic cross-validation approach for parameter selection. Specifically, we used 10-fold stratified cross-validation to evaluate parameter combinations, selecting the configuration that maximized recognition accuracy (measured by the F1-score) while maintaining robust generalization across the validation sets; this selection process helps mitigate overfitting and preserves model stability under varying occlusion conditions. The regularization parameter was evaluated over $\lambda \in [0.1, 1.0]$ at intervals of 0.1 and the wing parameters $\omega$ and $\epsilon$ were chosen by grid search, with early stopping; the selected configuration minimized the reconstruction error $\xi$ subject to the convergence criterion $|\xi| < 0.001$.
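A sketch of this kind of selection loop is shown below, using scikit-learn's StratifiedKFold for the 10-fold stratified splits; `wwcsc_solve` and `classify` refer to the earlier sketches, plain accuracy stands in for the paper's F1 criterion, and the grid is a placeholder. It is illustrative rather than optimized.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def select_lambda(A_train, labels, lambdas=np.arange(0.1, 1.01, 0.1), n_splits=10):
    """Pick the regularization parameter with the best mean fold accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        fold_accs = []
        for tr, va in skf.split(labels.reshape(-1, 1), labels):
            A_tr = A_train[:, tr]   # dictionary restricted to the training folds
            preds = [classify(A_tr, labels[tr], A_train[:, j],
                              wwcsc_solve(A_tr, A_train[:, j], lam=lam))
                     for j in va]
            fold_accs.append(np.mean(np.array(preds) == labels[va]))
        if np.mean(fold_accs) > best_acc:
            best_lam, best_acc = float(lam), float(np.mean(fold_accs))
    return best_lam
```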
Firstly, for the case without damage or occlusion of the face images, the recognition rates of the various methods are listed in Table 1, where the proposed WWCSC is compared with several existing competitors: SRC, RSRC, Sparse Huber (SH), IRGSC and WHCSC.
Table 1 shows that the WWCSC method outperforms the WCSC on all four datasets, which implies that adding weights to the loss function enhances the recognition rate of the model. Compared with the SRC, WHCSC and SH methods, the WWCSC method performs better across all four datasets, particularly on the FERET dataset. As can also be seen from Table 1, IRGSC performs best on these datasets, which is attributable to the strength of the IRGSC method itself; however, the performance of the proposed WWCSC method is only slightly inferior to that of IRGSC. This result encourages us to explore the robustness of the WWCSC method. In the following experiments, we therefore mainly consider the case in which face images suffer loss or occlusion; in particular, the robustness of the WWCSC is investigated under various types of degradation, such as random pixel corruption and random block occlusion.
Secondly, our main interest is the robustness of the WWCSC method when face images suffer different degrees of loss. In this experiment, the analysis is conducted on the AR dataset. The face images are artificially damaged to varying degrees following Wright et al. [8]: a given percentage of randomly chosen pixels in each test image is replaced with simulated values drawn from a uniform distribution. The effect of various levels of pixel noise on a specific face image is shown in Figure 1.
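The corruption protocol can be reproduced with a few lines; pixel values are assumed to lie in [0, 255].

```python
import numpy as np

def corrupt_pixels(img, fraction, seed=0):
    """Replace `fraction` of randomly chosen pixels with values from a uniform distribution."""
    rng = np.random.default_rng(seed)
    out = img.astype(float)
    idx = rng.choice(out.size, size=int(fraction * out.size), replace=False)
    out.reshape(-1)[idx] = rng.uniform(0.0, 255.0, size=idx.size)  # simulated uniform values
    return out
```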
To validate the robustness of the WWCSC method, we impose damage levels ranging from 10% to 90% on the test images. The recognition rates of the different methods under the different degrees of damage are presented in Table 2.
From Table 2, it can be seen that, as the degree of image damage increases, the face recognition rates of most models show a downward trend, with the WWCSC remaining comparatively stable. When the loss of the face images is 10% or 20%, the recognition rate of the WWCSC method is marginally lower than those of the IRGSC and RSRC methods. However, once more than 30% of a face image is damaged, the WWCSC method surpasses most of the other methods. In particular, at 90% damage, the recognition rate of the WWCSC method is 28.55, 23.71, 33.85, 21.67, 27.45 and 18.90 percentage points higher than those of SRC, RSRC, SH, IRGSC, WHCSC and WCSC, respectively. These comparative experiments demonstrate that the proposed WWCSC is significantly more robust on occluded and corrupted images than the existing methods considered.
Thirdly, we consider the case in which the test images carry white or black block occlusions of varying sizes, as shown in Figure 2 and Figure 3.
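A sketch of this occlusion, with a square block placed at a random position; the random placement and square shape are assumptions for illustration.

```python
import numpy as np

def occlude_block(img, fraction, value=255, seed=0):
    """Overwrite a square block covering about `fraction` of the image with a constant."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    side = int(np.sqrt(fraction * h * w))       # side length of the square block
    top = int(rng.integers(0, h - side + 1))
    left = int(rng.integers(0, w - side + 1))
    out = img.astype(float)
    out[top:top + side, left:left + side] = value   # 255 = white block, 0 = black block
    return out
```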
The face recognition rates of the different methods under the different degrees of occlusion are listed in Table 3.
Table 3 shows that, as the occluded portion of the image grows, the face recognition rates of all models decrease gradually: as the proportion of the face image that is occluded increases, so does the difficulty of recognition. With 10% to 20% block occlusion, the IRGSC model performs best among all evaluated models, with the WWCSC model ranking second. However, with 30% to 70% block occlusion, the recognition rates of the WWCSC distinctly exceed those of the other methods. In particular, at 70% block occlusion, the recognition rate of the WWCSC method is 27.71, 25.56, 29.69, 18.33, 20.08 and 1.66 percentage points higher than those of SRC, RSRC, SH, IRGSC, WHCSC and WCSC, respectively. This result demonstrates the superiority of the WWCSC method under heavy block occlusion of face images.

5. Conclusions

In this paper, we propose a new wing-constrained sparse coding model and its weighted version. Compared with other methods, the advantage of the WWCSC lies in its robustness and effectiveness on difficult problems such as occlusion and damage in images. On the one hand, the wing loss function used by the model greatly diminishes the effects of noise and outliers; on the other hand, the model distinguishes face images of different classes by applying weights to the loss function, decreasing the intra-class variance and increasing the inter-class variance. Experiments also show that the WWCSC is superior to IRGSC, RSRC, SRC and other competitors under heavy corruption and occlusion. The robustness and effectiveness of the WWCSC method make it an attractive choice for face recognition applications.
This study has several limitations that warrant further investigation. Primarily, the experimental evaluation was limited to comparative analyses between the proposed WCSC/WWCSC models and conventional sparse sampling-based approaches, without benchmarking against deep learning methods or other state-of-the-art alternatives. Additionally, the paper is narrowly focused on algorithmic performance, without addressing system-level deployment or real-world applicability. Future investigations will examine the proposed model’s deployment potential across diverse application scenarios, particularly in (a) resource-constrained embedded systems, (b) real-time robotic control architectures and (c) human–robot interaction (HRI) frameworks.

Author Contributions

Conceptualization, Y.Y. and J.X.; methodology, Y.Y. and J.X.; analysis, Y.Y. and J.X.; writing and original draft preparation, Y.Y.; review and editing, Y.Y. and J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the Natural Science Foundation of Chongqing, China under Grant No. CSTB2024NSCQ-MSX0551.

Data Availability Statement

To verify the results of this paper, four publicly available datasets are used, including the ORL dataset, the Yale dataset, the AR dataset and the FERET dataset. The ORL dataset is available from the Computer Laboratory at the University of Cambridge and contains a set of facial images captured in the laboratory from April 1992 to April 1994. The ORL database is accessible at https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html (accessed on 4 March 2025). The Yale dataset is a face recognition database created by Yale University. This database contains multi-angle facial images under various lighting conditions, which is accessible at http://vision.ucsd.edu/datasets/yale-face-database (accessed on 4 March 2025). The AR dataset which was created by Aleix Martinez and Robert Benavente at the Computer Vision Center (CVC) at the Universitat Autònoma de Barcelona (UAB) is accessible at https://download.csdn.net/download/yhsbzl/86083027 (accessed on 4 March 2025). The FERET database was sponsored by the U.S. Department of Defense Counterdrug Technology Development Program to support the research on and evaluation of facial recognition technology. The database was primarily created by Dr. Harry Wechsler and Dr. Phillips from George Mason University. The FERET database is accessible at http://www.nist.gov/itl/iad/ig/colorferet.cfm (accessed on 4 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADMM   Alternating direction method of multipliers
ORL    Olivetti Research Laboratory
FERET  Face Recognition Technology (program)
LASSO  Least absolute shrinkage and selection operator
CRC    Collaborative representation classifier
SRC    Sparse representation-based classifier
IRGSC  Iterative re-constrained group sparse classifier
WHCSC  Weighted Huber constrained sparse coding

References

  1. Brunelli, R.; Poggio, T. Face recognition: Features versus templates. IEEE Trans. Pattern Anal. Mach. Intell. 1993, 15, 1042–1052. [Google Scholar] [CrossRef]
  2. Al-Waisy, A.S.; Qahwaji, R.; Ipson, S. A multimodal deep learning framework using local feature representations for face recognition. Mach. Vis. Appl. 2018, 29, 35–54. [Google Scholar] [CrossRef]
  3. Turk, M.; Pentland, A. Eigenfaces for Recognition. J. Cogn. Neurosci. 1991, 3, 71–86. [Google Scholar] [CrossRef] [PubMed]
  4. Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [Google Scholar] [CrossRef]
  5. Wickrama, K.; Arachchilage, S.P.; Izquierdo, E. Deep-learned faces: A survey. EURASIP J. Image Video Process. 2020, 25, 35–54. [Google Scholar]
  6. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  7. Madarkar, J.; Sharma, P. Sparse Representation Based Face Recognition Using VGGFace. In Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021); Misra, R., Shyamasundar, R.K., Chaturvedi, A., Omer, R., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 280–288. [Google Scholar] [CrossRef]
  8. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [PubMed]
  9. Yang, J.; Zhang, Y. Alternating Direction Algorithms for $\ell_1$-Problems in Compressive Sensing. SIAM J. Sci. Comput. 2011, 33, 250–278. [Google Scholar] [CrossRef]
  10. Goldstein, T.; Osher, S. The Split Bregman Method for L1-Regularized Problems. SIAM J. Imag. Sci. 2009, 2, 323–343. [Google Scholar] [CrossRef]
  11. Jarret, A.; Fageot, J.; Simeoni, M. A Fast and Scalable Polyatomic Frank-Wolfe Algorithm for the LASSO. IEEE Signal Process. Lett. 2022, 29, 637–641. [Google Scholar] [CrossRef]
  12. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  13. Candes, E.J.; Romberg, J.; Tao, T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52, 489–509. [Google Scholar] [CrossRef]
  14. Lei, D.; Jiang, Z.; Wu, Y. Weighted Huber constrained sparse face recognition. Neural Comput. Appl. 2020, 32, 5235–5253. [Google Scholar] [CrossRef]
  15. Zhang, L.; Yang, M.; Xiangchu, F. Sparse representation or collaborative representation: Which helps face recognition? In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 471–478. [Google Scholar]
  16. Zhong, D.; Xie, Z.; Li, Y.; Han, J. Loose L1/2 regularised sparse representation for face recognition. IET Comput. Vis. 2015, 9, 251–258. [Google Scholar] [CrossRef]
  17. Zheng, J.; Yang, P.; Chen, S.; Shen, G.; Wang, W. Iterative Re-Constrained Group Sparse Face Recognition With Adaptive Weights Learning. IEEE Trans. Image Process. 2017, 26, 2408–2423. [Google Scholar] [CrossRef] [PubMed]
  18. Zhang, C.; Li, H.; Chen, C.; Qian, Y.; Zhou, X. Enhanced Group Sparse Regularized Nonconvex Regression for Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2438–2452. [Google Scholar] [CrossRef] [PubMed]
  19. Liu, Z.H.; Zhu, F.; Xiong, H.; Chen, X.C.; Pelusi, D.; Vasilakos, A.V. Graph regularized discriminative nonnegative matrix factorization. Eng. Appl. Artif. Intell. 2025, 139, 109629. [Google Scholar] [CrossRef]
  20. Feng, Z.H.; Kittler, J.; Awais, M.; Huber, P.; Wu, X.J. Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2235–2245. [Google Scholar]
  21. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  22. Yang, M.; Zhang, L.; Yang, J.; Zhang, D. Robust sparse coding for face recognition. In Proceedings of the Computer Vision & Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 625–632. [Google Scholar]
  23. Xin, A.; Yang, W.; Zheng, X.J. Sub-pattern based Maximum Margin Criterion for face Recognition. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 218–222. [Google Scholar]
  24. Cai, X.F.; Wen, G.H.; Wei, J.; Li, J. Enhanced supervised locality preserving projections for face recognition. In Proceedings of the 2011 International Conference on Machine Learning and Cybernetics, Guilin, China, 10–13 July 2011; pp. 1762–1766. [Google Scholar]
  25. Qiao, L.; Chen, S.; Tan, X. Sparsity preserving projections with applications to face recognition. Pattern Recognit. 2010, 43, 331–341. [Google Scholar] [CrossRef]
  26. Ling, G.F.; Han, P.Y.; Yee, K.E.; Yin, O.S. Face recognition via semi-supervised discriminant local analysis. In Proceedings of the 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 19–21 October 2015; pp. 292–297. [Google Scholar]
  27. Majumdar, A.; Ward, R.K. Robust Classifiers for Data Reduced via Random Projections. IEEE Trans. Syst. Man, Cybern. Part B (Cybern.) 2010, 40, 1359–1371. [Google Scholar] [CrossRef] [PubMed]
  28. Lestariningati, S.I.; Suksmono, A.B.; Edward, I.J.M.; Usman, K. Group Class Residual $\ell_1$-Minimization on Random Projection Sparse Representation Classifier for Face Recognition. Electronics 2022, 11, 2723. [Google Scholar] [CrossRef]
  29. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  30. The ORL Database. Available online: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html (accessed on 4 March 2025).
  31. The Yale Database. Available online: http://vision.ucsd.edu/datasets/yale-face-database (accessed on 4 March 2025).
  32. The Aleix Martinez and Robert Benavente Database. Available online: https://www2.ece.ohio-state.edu/aleix/ARdatabase.html (accessed on 4 March 2025).
  33. The FERET Database. Available online: http://www.nist.gov/itl/iad/ig/colorferet.cfm (accessed on 4 March 2025).
Figure 1. Face images with different percentages of pixel corruption (from 0% to 50%).
Figure 2. Face images of a white block occlusion (from 50% to 90%).
Figure 3. Face images of a black block occlusion (from 50% to 90%).
Table 1. Comparison of recognition rates on different datasets (unit: percentage).

Method   AR     ORL    YALE   FERET
SRC      94.17  89.17  95.56  85.36
RSRC     95.38  90.21  96.34  88.42
SH       94.26  88.32  93.12  83.27
IRGSC    96.88  90.62  100    93.35
WHCSC    93.21  89.57  93.36  90.42
WCSC     95.00  89.00  94.57  92.27
WWCSC    95.00  90.00  95.56  92.78

Bold values indicate the best performance in each column.
Table 2. Comparison of recognition rates under different degrees of corruption (unit: percentage).

Method   10%    20%    30%    40%    50%    60%    70%    80%    90%
SRC      94.16  93.33  93.33  87.50  87.50  84.38  84.38  81.25  65.62
RSRC     95.21  94.37  93.89  90.65  88.47  85.36  84.89  82.37  70.46
SH       93.98  93.41  92.06  89.39  85.72  83.61  80.42  70.15  60.32
IRGSC    96.88  96.88  94.62  90.38  89.25  89.25  89.25  80.12  72.50
WHCSC    93.06  92.58  91.34  88.49  86.67  84.52  81.94  78.61  66.72
WCSC     95.00  95.00  95.00  93.36  92.34  90.25  88.23  86.34  75.27
WWCSC    95.00  95.00  95.00  94.17  94.17  94.17  94.17  94.17  94.17

Bold values indicate the best performance in each column.
Table 3. Comparison of facial recognition accuracy under varying levels of occlusion (unit: percentage).

Method   10%    20%    30%    40%    50%    60%    70%
SRC      90.62  90.62  84.38  84.38  71.88  65.00  53.12
RSRC     91.37  91.37  86.43  82.31  70.37  64.35  55.27
SH       90.54  90.54  85.61  80.47  68.35  60.59  51.14
IRGSC    96.88  93.75  87.50  84.38  78.12  68.75  62.50
WHCSC    92.48  92.48  86.87  83.94  75.46  69.68  60.75
WCSC     94.17  93.33  88.33  87.50  86.67  85.83  79.17
WWCSC    95.00  93.33  90.83  90.83  90.00  87.50  80.83

Bold values indicate the best performance in each column.