Robust Face Recognition Based on the Wing Loss and the ℓ1 Penalty
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors introduce a wing-constrained sparse coding model (WCSC) and its weighted version (WWCSC). The reported experiments demonstrate the performance of the proposed method. My comments are given as follows:
- It would be beneficial if the authors could outline the innovative aspects of the proposed methodology in the first section.
- The authors have proposed two algorithms: WCSC and WWCSC. However, only the steps for the WWCSC algorithm are provided in the text.
- In the experimental section, the authors should analyze how the values of the parameters in the objective function are chosen.
- The author should provide convergence experiments for the proposed method in the experimental section.
- The limitations of this paper should be discussed in detail.
- Some of the references cited in the article are outdated, and the authors should discuss some of the latest literature instead, such as "Graph regularized discriminative nonnegative matrix factorization", Engineering Applications of Artificial Intelligence, 2025.
Author Response
Comment 1: It would be beneficial if the authors could outline the innovative aspects of the proposed methodology in the first section.
Response 1: We appreciate this suggestion and have revised the introduction section to explicitly highlight the innovative aspects of the Wing-Constrained Sparse Coding (WCSC) model and its weighted version (WWCSC). Specifically, we have emphasized the novel constraints and weighting mechanisms that distinguish our approach from some existing sparse coding methods.
Comment 2: The authors have proposed two algorithms: WCSC and WWCSC. However, only the steps for the WWCSC algorithm are provided in the text.
Response 2: In this manuscript, the proposed WWCSC model actually degenerates to the WCSC model when the weight matrix becomes an identity matrix with non-updatable weights. Therefore, the manuscript exclusively elaborates on the algorithmic workflow of the WWCSC framework, as it represents the more general case encompassing both scenarios.
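The reduction described in Response 2 can be illustrated with a minimal sketch. The manuscript's objective function is not reproduced in this review record, so the form used here is an assumption: a wing loss (as defined by Feng et al., 2018) applied to element-wise weighted residuals, plus an ℓ1 penalty on the coding coefficients. The function names, the placement of the weights inside the loss, and the default values of `w`, `eps`, and `lam` are all hypothetical.

```python
import math

def wing(x, w=10.0, eps=2.0):
    """Wing loss (Feng et al., 2018) of a scalar residual x."""
    a = abs(x)
    if a < w:
        return w * math.log(1.0 + a / eps)
    # Offset that makes the logarithmic and linear pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    return a - c

def wwcsc_objective(residual, coef, weights, lam=0.1, w=10.0, eps=2.0):
    """Assumed weighted objective: sum_i wing(weights_i * residual_i) + lam * ||coef||_1."""
    data_term = sum(wing(wi * ri, w, eps) for wi, ri in zip(weights, residual))
    return data_term + lam * sum(abs(c) for c in coef)

def wcsc_objective(residual, coef, lam=0.1, w=10.0, eps=2.0):
    """Unweighted special case: every weight fixed at 1 (identity weight matrix)."""
    return wwcsc_objective(residual, coef, [1.0] * len(residual), lam, w, eps)
```

Under this assumed form, calling `wwcsc_objective` with all weights equal to 1 and never updating them gives exactly the value of `wcsc_objective`, which is the sense in which one algorithm description can cover both models.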
Comment 3: In the experimental section, the authors should analyze how the values of the parameters in the objective function are chosen.
Response 3: As suggested, we have expanded the experimental section to include a detailed analysis of how the parameter values in the objective function are chosen. This includes a discussion of parameter selection and its impact on the performance of the proposed method. The WCSC and WWCSC models incorporate several tunable parameters (e.g., the regularization coefficients and the wing function parameters). To ensure optimal model performance, we adopted a systematic cross-validation approach for parameter selection. Specifically, we implemented 10-fold stratified cross-validation to comprehensively evaluate parameter combinations, selecting the configuration that simultaneously maximized recognition accuracy (measured by F1-score) and maintained robust generalization performance across validation sets. This rigorous selection process helps mitigate overfitting while preserving model stability under varying occlusion conditions. The parameter optimization process employed k-fold cross-validation (k=10) with early stopping, where we evaluated the regularization parameters at 0.1 intervals and the wing parameters through grid search. The selected configuration minimized the reconstruction error while satisfying the convergence criteria.
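The selection procedure outlined in Response 3 could look roughly like the following sketch, which builds stratified folds and scores every combination in a parameter grid. It is not the authors' code: `stratified_folds`, `grid_search_cv`, and the user-supplied `score_fn` callback are hypothetical names, and early stopping and the F1 computation are left out.

```python
import itertools
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def grid_search_cv(labels, grid, score_fn, k=10, seed=0):
    """Return (best_params, best_mean_score) over the Cartesian product of the grid."""
    folds = stratified_folds(labels, k, seed)
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(labels)) if i not in held]
            # score_fn is supplied by the caller: fit on train, evaluate on held_out.
            scores.append(score_fn(params, train, held_out))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```

A grid such as `{"lam": [0.1 * i for i in range(1, 11)]}` would correspond to sweeping a regularization parameter at 0.1 intervals; the upper bound of 1.0 here is an assumption, since the response does not state the range.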
Comment 4: The author should provide convergence experiments for the proposed method in the experimental section.
Response 4: In response to this issue, we have incorporated a convergence proof in the revised manuscript.
Comment 5: The limitations of this paper should be discussed in detail.
Response 5: We sincerely regret this oversight. In response, we have incorporated a dedicated subsection in the conclusion to comprehensively discuss the limitations of our work, including computational complexity and application constraints.
Comment 6: Some of the references cited in the article are outdated, and the authors should discuss some of the latest literature instead, such as "Graph regularized discriminative nonnegative matrix factorization", Engineering Applications of Artificial Intelligence, 2025.
Response 6: We thank the reviewer very much for pointing out this problem. We have updated the references to include the latest literature, such as the suggested paper "Graph regularized discriminative nonnegative matrix factorization", Engineering Applications of Artificial Intelligence, 2025, and other recent works relevant to our study.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, Jianwen et al. (Submission ID-Electronics-3538014) propose a novel face recognition framework that integrates wing-constrained sparse coding (WCSC) and its weighted version (WWCSC) to enhance robustness against occlusion and image corruption. The alternating direction method of multipliers (ADMM) is employed for optimization, and experiments conducted on ORL, Yale, AR, and FERET datasets demonstrate the effectiveness of the proposed approach. The study is methodologically rigorous and provides a structured evaluation of simulation techniques. However, I have a few concerns that need to be addressed. Once these revisions are made, I will recommend the article for acceptance in Electronics.
Major Revisions:
- Abstract: The problem statement and key results are unclear. Please revise the abstract to clearly define the research problem and summarize the main findings.
- Methodology: The hyperparameter selection lacks justification. Provide a detailed explanation of the chosen hyperparameters in the methodology section.
- Comparison with Deep Learning Approaches: The study does not include a comparison with deep learning-based face recognition models. It is recommended to include such a comparison to demonstrate the advantages and limitations of the proposed method.
Minor Revisions:
Title:
- The title is overly technical and does not clearly highlight the main contribution.
- Avoid using technical jargon like "ℓ1 regularization" unless absolutely necessary.
- Consider a simplified title that emphasizes robustness and practical impact.
Keywords:
- Ensure that each keyword begins with a capital letter.
Introduction:
- The literature review should be updated to include recent studies from 2023 and 2024.
- Proper citation formatting:
- Correct: text [1, 2]
- Incorrect: text[1,2]
- Journal name abbreviations should follow standard conventions. Use the following resources for guidance:
- UBC Journal Abbreviations
- CAS Source Index (CASSI)
- Resurchify Journal Abbreviations
By implementing these revisions, the manuscript will be significantly improved in clarity, structure, and alignment with journal standards.
Author Response
Comment 1: Abstract: The problem statement and key results are unclear. Please revise the abstract to clearly define the research problem and summarize the main findings.
Response 1: Following the suggestion by the reviewer, we have restructured the abstract to clearly state the research problem (e.g., robustness challenges in occluded/corrupted face recognition) and succinctly summarize the key contributions and results (e.g., performance gains over baseline methods on ORL, Yale, AR, and FERET datasets).
Comment 2: Methodology: The hyperparameter selection lacks justification. Provide a detailed explanation of the chosen hyperparameters in the methodology section.
Response 2: A detailed explanation of hyperparameter selection (e.g., λ for ℓ1 regularization, weights in WWCSC) has been added to the Experimental Section, including empirical and theoretical rationale.
Comment 3: Comparison with Deep Learning Approaches: The study does not include a comparison with deep learning-based face recognition models. It is recommended to include such a comparison to demonstrate the advantages and limitations of the proposed method.
Response 3: We thank the reviewer for this valuable suggestion. In this study, we intentionally focused on sparse sampling paradigms and performance comparisons with conventional sparse coding techniques (e.g., SRC, SH). While this provides a rigorous evaluation within the designated framework, we recognize that benchmarking against deep learning models (e.g., CNNs, Transformers) would offer valuable perspectives. Such a comparative analysis warrants further investigation in future studies.
Comment 4: The title is overly technical and does not clearly highlight the main contribution. Avoid using technical jargon like "ℓ1 regularization" unless absolutely necessary. Consider a simplified title that emphasizes robustness and practical impact.
Response 4: The title has been simplified to emphasize practical impact. The original title: "Robust face recognition based on the wing loss and the ℓ1 regularization"; The revised title: "Robust face recognition based on the wing loss and the ℓ1 penalty".
Comment 5: Keywords: Ensure that each keyword begins with a capital letter.
Response 5: All keywords are revised to ensure that each one begins with a capital letter.
Comment 6: The literature review should be updated to include recent studies from 2023 and 2024. Proper citation formatting: Correct: text [1, 2]; Incorrect: text[1,2]. Journal name abbreviations should follow standard conventions. Use the following resources for guidance: UBC Journal Abbreviations, CAS Source Index (CASSI), Resurchify Journal Abbreviations.
Response 6: Added 2 recent references (2020–2024) on sparse coding for face recognition. Corrected citation formatting (e.g., "text [1, 2]" instead of "text[1,2]"). Journal abbreviations are standardized based on CAS Source Index (CASSI).
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes a novel approach for robust face recognition based on a weighted wing loss-constrained sparse coding model (WWCSC). The authors claim that the proposed method demonstrates superior performance, particularly under challenging conditions such as occlusion and corruption, as validated on several public face datasets. While the topic is relevant and the model has potential merit, the manuscript in its current form requires significant revision in several aspects before it can be considered for publication.
First, the language and writing quality of the paper need considerable improvement. Although the technical descriptions are generally clear, the manuscript suffers from awkward phrasing, grammatical inconsistencies, and direct translations from Chinese academic writing style, which make it difficult to follow in places. Expressions such as "impacts of the noise," "based on the theory by Byod," or "the WWCSC is more robust" should be revised for fluency and scientific clarity. A thorough proofreading by a native English speaker or a professional language editing service is strongly recommended.
Second, while the WWCSC model is a reasonable extension of existing sparse coding methods, its novelty is somewhat incremental. The authors combine wing loss—originally developed for landmark detection—with a sparse coding framework, and add a dynamic weighting mechanism inspired by existing robust sparse representation classifiers. The paper would benefit from a clearer articulation of its unique theoretical contribution. For instance, how does the integration of wing loss fundamentally differ from previous use cases such as Huber or ℓ₁/ℓ₂ loss? How does the adaptive weighting strategy advance the state of the art beyond WHCSC and IRGSC? These distinctions should be emphasized more explicitly in the introduction and conclusion.
In terms of experimental evaluation, the authors provide extensive results across several datasets and under various distortion settings, which is commendable. However, the current results are presented mostly in tabular format with little accompanying analysis. It would be helpful to include visual comparisons (e.g., reconstructed images under different occlusion levels), as well as statistical significance testing (e.g., confidence intervals, t-tests) to support claims of robustness. Moreover, runtime performance and computational complexity of WWCSC versus the compared methods are not reported. Since robustness often comes at the cost of speed or memory usage, a discussion of trade-offs would improve the practical relevance of the work.
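The kind of significance testing requested here can be sketched as a paired t-test on per-split accuracies of two methods evaluated on the same splits, together with a confidence interval for the mean accuracy difference. This is only an illustrative outline: the function names are hypothetical, no accuracies from the manuscript are used, and the critical value of the t distribution has to be looked up by the caller for the desired confidence level and degrees of freedom.

```python
import math
import statistics

def paired_t(acc_a, acc_b):
    """Return (mean_diff, t_stat, dof, std_err) for paired per-split accuracies."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample std uses n - 1).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    if std_err == 0.0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return mean_diff, mean_diff / std_err, n - 1, std_err

def confidence_interval(mean_diff, std_err, t_crit):
    """Two-sided interval mean_diff +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return mean_diff - half_width, mean_diff + half_width
```

For ten paired splits (nine degrees of freedom), the two-sided 95% critical value is approximately 2.262; an interval that excludes zero would support a claim that one method is more accurate than the other on those splits.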
Additionally, the paper is narrowly focused on algorithmic performance, without addressing system-level deployment or real-world applicability. It would be valuable for the authors to situate their method in broader application contexts such as embedded systems, real-time robotics, or human-robot interaction (HRI). Recent works such as "Distributed Real-Time Control Architecture for Electro-Hydraulic Humanoid Robots" and "Multi-sensor Guided Hand Gesture Recognition for a Teleoperated Robot Using a Recurrent Neural Network" provide compelling directions for integrating computer vision with real-time, multi-modal robotic systems. The authors may consider extending the WWCSC model in these directions, such as applying it to dynamic or multi-modal face recognition tasks, or benchmarking it on edge AI platforms to assess its real-time capability.
Author Response
Comment 1: First, the language and writing quality of the paper need considerable improvement. Although the technical descriptions are generally clear, the manuscript suffers from awkward phrasing, grammatical inconsistencies, and direct translations from Chinese academic writing style, which make it difficult to follow in places. Expressions such as "impacts of the noise," "based on the theory by Byod," or "the WWCSC is more robust" should be revised for fluency and scientific clarity. A thorough proofreading by a native English speaker or a professional language editing service is strongly recommended.
Response 1: We thank the reviewer for this suggestion about the language and have taken the following steps to revise the manuscript. Firstly, we have revised all problematic expressions (e.g., "impacts of the noise" → "effects of noise," "based on the theory by Byod" → "Building upon the theoretical framework established by Boyd et al.," "the WWCSC is more robust" → "WWCSC demonstrates superior robustness"). Secondly, we conducted a line-by-line proofreading to eliminate grammatical errors and improve the clarity of the manuscript.
Comment 2: Second, while the WWCSC model is a reasonable extension of existing sparse coding methods, its novelty is somewhat incremental. The authors combine wing loss—originally developed for landmark detection—with a sparse coding framework, and add a dynamic weighting mechanism inspired by existing robust sparse representation classifiers. The paper would benefit from a clearer articulation of its unique theoretical contribution. For instance, how does the integration of wing loss fundamentally differ from previous use cases such as Huber or ℓ₁/ℓ₂ loss? How does the adaptive weighting strategy advance the state of the art beyond WHCSC and IRGSC? These distinctions should be emphasized more explicitly in the introduction and conclusion.
Response 2: We have revised the Introduction and Conclusion sections to explicitly highlight our contributions. In particular, unlike Huber or ℓ₁/ℓ₂ losses, the wing loss applies a nonlinear penalty that could better preserve the small errors critical for occlusion robustness. In addition, compared with WHCSC's fixed weights, our adaptive strategy optimizes feature-specific contributions, which could improve accuracy under heterogeneous corruptions.
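One way to read the claim about small errors is in terms of the slope (gradient magnitude) of each loss near zero. The sketch below compares the slopes of the wing loss (Feng et al., 2018), the Huber loss, and the ℓ1 loss at a given residual. The parameter defaults `w`, `eps`, and `delta` are assumptions for illustration, not values from the manuscript.

```python
def wing_slope(x, w=10.0, eps=2.0):
    """Slope of the wing loss: w / (eps + |x|) in the log region, 1 in the linear region."""
    a = abs(x)
    return w / (eps + a) if a < w else 1.0

def huber_slope(x, delta=1.0):
    """Slope of the Huber loss: |x| in the quadratic region, delta beyond it."""
    a = abs(x)
    return a if a <= delta else delta

def l1_slope(x):
    """Slope of |x|: constant 1 away from zero (0 is chosen as the subgradient at zero)."""
    return 1.0 if x != 0 else 0.0

for r in (0.01, 0.5, 5.0, 50.0):
    print(f"residual={r:>5}: wing={wing_slope(r):.3f}  "
          f"huber={huber_slope(r):.3f}  l1={l1_slope(r):.3f}")
```

With these defaults, the wing slope approaches `w / eps` as the residual tends to zero, whereas the Huber slope vanishes there and the ℓ1 slope stays at 1, so small residuals exert relatively more influence under the wing loss. For large residuals all three losses grow linearly.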
Comment 3: In terms of experimental evaluation, the authors provide extensive results across several datasets and under various distortion settings, which is commendable. However, the current results are presented mostly in tabular format with little accompanying analysis. It would be helpful to include visual comparisons (e.g., reconstructed images under different occlusion levels), as well as statistical significance testing (e.g., confidence intervals, t-tests) to support claims of robustness. Moreover, runtime performance and computational complexity of WWCSC versus the compared methods are not reported. Since robustness often comes at the cost of speed or memory usage, a discussion of trade-offs would improve the practical relevance of the work.
Response 3: As pointed out by the reviewer, visual comparisons are quite interesting, and the related image reconstruction problem warrants further investigation in future studies. However, this study primarily investigates the application of sparse sampling techniques to enhance face recognition performance under severe occlusion or corruption. The core objective is to improve recognition accuracy in challenging scenarios rather than to address image reconstruction. This deliberate focus stems from the following considerations: (1) The proposed WCSC/WWCSC framework is specifically designed for discriminative feature extraction, optimizing recognition robustness. (2) Reconstruction-oriented approaches often introduce computational overhead without commensurate gains in recognition accuracy. (3) Real-world applications (e.g., surveillance, access control) prioritize recognition reliability over pixel-level reconstruction.
As for computational complexity, the current implementations of our proposed WCSC and WWCSC models are indeed slower at runtime than established methods such as IRGSC and RSRC. This limitation stems primarily from: (1) Algorithmic complexity factors. The adaptive weighting mechanism in WWCSC incurs additional computational overhead, and the optimization framework requires more iterations to achieve convergence. (2) Implementation constraints. The current prototype implementation has not yet incorporated hardware acceleration, and the matrix operations require further optimization.
To address these computational challenges, our future work will focus on algorithmic refinements and computational optimizations. We plan to develop an accelerated version that employs approximate optimization techniques and reduces unnecessary iterations through adaptive convergence criteria. Moreover, a systematic evaluation of accuracy-efficiency trade-offs and a computational complexity analysis could be performed.
Comment 4: Additionally, the paper is narrowly focused on algorithmic performance, without addressing system-level deployment or real-world applicability. It would be valuable for the authors to situate their method in broader application contexts such as embedded systems, real-time robotics, or human-robot interaction (HRI). Recent works such as "Distributed Real-Time Control Architecture for Electro-Hydraulic Humanoid Robots" and "Multi-sensor Guided Hand Gesture Recognition for a Teleoperated Robot Using a Recurrent Neural Network" provide compelling directions for integrating computer vision with real-time, multi-modal robotic systems. The authors may consider extending the WWCSC model in these directions, such as applying it to dynamic or multi-modal face recognition tasks, or benchmarking it on edge AI platforms to assess its real-time capability.
Response 4: As pointed out by the reviewer, it would be quite interesting to examine the proposed model's deployment potential across diverse application scenarios, particularly in: (a) resource-constrained embedded systems, (b) real-time robotic control architectures, and (c) human-robot interaction (HRI) frameworks. In this manuscript, we focused only on algorithmic performance. The system-level deployment and real-world applicability of the proposed methods warrant further investigation in future studies.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed the concerns I raised regarding this study. Based on the satisfactory responses and the overall quality of the research, I recommend accepting this paper for publication.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors answered all my questions, and I recommend the article for acceptance in the journal.
Reviewer 3 Report
Comments and Suggestions for Authors
The current version can be accepted.