Article
Peer-Review Record

Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework

Appl. Sci. 2025, 15(22), 11902; https://doi.org/10.3390/app152211902
by Qiran Wang 1,2 and Jinshi Cui 1,*
Reviewer 2: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 6 October 2025 / Revised: 1 November 2025 / Accepted: 3 November 2025 / Published: 9 November 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The article proposes a deep residual learning–based framework optimized for hyperspectral camouflage detection.  

The paper is generally well written, with only minor formatting and content issues to be addressed:  
  1. Replace all occurrences of “Figure. X” with “Figure X” (remove the period).
  2. Figure 1(a) is missing on line 136.
  3. The caption “Figure 1.”: the image resolution is too low to distinguish differences; improve the image quality and provide a clearer explanation in the text.
  4. Several images have low resolution.
  5. Many abbreviations are used without definition.
  6. Line 146: correct “data [6].Additio...” to “data [6]. Additio...”.
  7. Line 175: the summation symbol (Σ) appears to be missing from the equation.
  8. Add short introductory paragraphs at the beginning of Sections 2 and 3.
  9. The caption for Figure 4 should appear on the same page; also, this figure requires further discussion in the text.
  10. Include a Future Work section or at least mention future directions in the conclusion.

Author Response

Dear Reviewer,

 

On behalf of my coauthors, we would like to express our sincere gratitude to all reviewers for your valuable time, constructive comments, and insightful suggestions on our manuscript titled “Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework.”

In response to your thoughtful feedback, we have carefully revised the manuscript, with all modifications marked in red font. Your comments have been instrumental in helping us improve the clarity, quality, and overall contribution of our work.

We truly appreciate the time and effort you devoted to reviewing our paper, and we are grateful for your professional guidance and support.

 

Thank you again for your valuable feedback and consideration.

With sincere appreciation,


Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. Replace all occurrences of “Figure. X” with “Figure X” (remove the period).

Response: We have carefully checked the entire manuscript and replaced all occurrences of “Figure. X” with “Figure X”.

 

  2. Figure 1(a) is missing on line 136.

Response: The missing figure has been properly included, and the numbering has been updated due to the addition of a preceding image. The previously missing Figure 1(a) now appears as Figure 2(a). The corresponding description has also been added to clarify its content. Specifically, Figure 2(a) presents a representative region of interest (ROI) selected from the measured hyperspectral images of natural grass. The new paragraph (Page 5, Lines 137–145) describes the ROI characteristics, key spectral features in the 680–750 nm red-edge region, and the ENVI-based multi-point annotation approach. This correction ensures that Figure 2 now contains both (a) the natural grass ROI and (b) the camouflage fabric ROI, with consistent context and updated captions in the text.

  3. The caption “Figure 1.”: the image resolution is too low to distinguish differences; improve the image quality and provide a clearer explanation in the text.

Response: We thank the reviewer for the valuable suggestion regarding the clarity of Figure 2. In the revised manuscript, the resolution of Figure 2 has been significantly improved to ensure that the spectral–spatial details of both the natural grass and camouflage fabric ROIs are clearly distinguishable.

In addition, the corresponding description in the text has been rewritten for greater clarity and detail. Specifically, Figure 2(a) depicts a representative ROI from the measured hyperspectral images of natural grass, encompassing a typical mixture of leaf and substrate structures, exhibiting relatively uniform green reflectance and pronounced red-edge amplitude variations in the 680–750 nm range, which are critical for differentiating vegetation from fabric materials. ROIs were annotated in ENVI using a multi-point, multi-region approach to mitigate single-point noise effects, and the prominent spatial details and red-edge features indicate that this ROI effectively captures the spectral heterogeneity of natural grass, providing reliable samples for SPXY sampling and model training.

Figure 2(b) shows the corresponding ROI extracted from the camouflage fabric using ENVI software. Compared with the natural grass ROI, the camouflage area displays a more uniform texture distribution and weaker red-edge reflectance, reflecting the absence of chlorophyll absorption. This ROI provides clearer spatial boundaries and enhanced material contrast, serving as the basis for reliable spectral extraction and subsequent classification analysis.

 

  4. Several images have low resolution.

Response: We appreciate your comment regarding the image resolution. In the revised manuscript, all figures have been updated and replaced with high-resolution versions to ensure clarity and readability.

 

  5. Many abbreviations are used without definition.

Response: All abbreviations (e.g., SNV, ReLU, PCs, SG, SVM, RF, KNN, CNN, and ResNet) have been defined at their first appearance to improve clarity and readability.

 

  6. Line 146: correct “data [6].Additio...” to “data [6]. Additio...”.

Response: The typo has been corrected as suggested.

 

  7. Line 175: the summation symbol (Σ) appears to be missing from the equation.

Response: The PCA transformation equation has been revised to explicitly include the summation symbol for clarity.
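For reference, a standard form of the PCA score computation with the summation written out explicitly is given below; the notation here is illustrative and may differ from the manuscript’s symbols:

t_{ik} = \sum_{j=1}^{p} x_{ij} w_{jk}, k = 1, …, K,

where x_{ij} is the preprocessed reflectance of sample i at band j, w_{jk} is the loading of band j on principal component k, and t_{ik} is the score of sample i on component k.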

 

  8. Add short introductory paragraphs at the beginning of Sections 2 and 3.

Response: In response to your comment, short introductory paragraphs have been added at the beginning of Sections 2 and 3 to provide a clearer overview of the content and structure of each section. These introductions serve to guide the reader through the methodology and results, enhancing the clarity of the paper’s organization. The revised text now reads as follows:

In Section 1, an overview of the background and challenges in hyperspectral camouflage detection is presented, stressing the need for advanced techniques to differentiate artificial materials from natural vegetation. The main components of the proposed framework are introduced, highlighting its potential advantages over existing methods. In Section 2, the materials, experimental setup, and methods for hyperspectral data acquisition, preprocessing, and classification are described, along with the algorithms used to improve classification accuracy. In Section 3, experimental results are presented, including an evaluation of preprocessing techniques, dataset partitioning strategies, and the performance of various classification models. The proposed SPXY-ResNet model is compared with other approaches, demonstrating its superior accuracy and efficiency. In Section 4, the findings are discussed, comparing the proposed framework to other methods in terms of performance and computational efficiency. Challenges and future improvements are also addressed. In Section 5, the study’s conclusions are summarized, and the practical applications of the proposed framework in camouflage detection, agriculture, and environmental monitoring are highlighted.

These additions help to outline the key aspects of each section in a concise and structured manner, improving the readability and organization of the paper. We hope this meets your expectations and enhances the clarity of the manuscript.

 

  9. The caption for Figure 4 should appear on the same page; also, this figure requires further discussion in the text.

Response: Due to the insertion of a new figure earlier in the text, the original Figure 4 has been renumbered as Figure 5. The caption for Figure 5 has been adjusted to appear on the same page as the figure, addressing the formatting concern.

Furthermore, we have expanded the discussion in the text to provide a clearer explanation of the figure’s role and relevance within the classification pipeline. The updated text now reads as follows:

"Figure 5 presents the system workflow of the classification pipeline, illustrating the sequential steps involved in hyperspectral data processing, model training, and evaluation. The workflow begins with the acquisition of hyperspectral data, followed by preprocessing steps such as dimensionality reduction using PCA, normalization, and noise suppression techniques. These preprocessing steps are crucial in enhancing the quality of the spectral data by reducing redundancy and improving the separability of material classes. The dataset is then partitioned using the SPXY algorithm, ensuring representative sampling for training and testing, which is essential for improving the generalization performance of the model by mitigating potential biases introduced by non-representative training subsets. Several classification models, including SPXY-RF, SPXY-SVM, SPXY-CNN, and SPXY-ResNet, are applied to the preprocessed data, with each model evaluated based on standard performance metrics such as accuracy, precision, recall, and F1-score."

We hope these revisions sufficiently address your concerns. Please do not hesitate to let us know if any further adjustments are needed.

 

  10. Include a Future Work section or at least mention future directions in the conclusion.

Response: We have revised the manuscript to explicitly include a detailed discussion of future research directions at the end of the Conclusion section. The added text highlights plans for extending the dataset to diverse environments, incorporating multiple camouflage materials, testing under variable illumination conditions, and applying Explainable AI techniques (e.g., SHAP, Grad-CAM) to improve model interpretability. Additionally, future work will explore large-scale deployment, automated hyperparameter optimization, and implementation on portable hyperspectral devices. These additions clarify the potential future development and practical expansion of the proposed SPXY–ResNet framework.

Changes in Manuscript:

A new paragraph describing these future directions has been added at the end of the Conclusion section (Section 5, Page 18, Lines 614–625). The revised passage reads:

“Future work will address these limitations while further improving the framework's interpretability and scalability. Planned efforts include expanding the dataset to cover forested and mixed vegetation environments, incorporating multiple camouflage fabric types with distinct spectral characteristics, and conducting outdoor measurements under varying illumination and weather conditions to evaluate model robustness and transferability. Concurrently, Explainable AI techniques such as SHAP and Grad-CAM will be applied to visualize the contribution of individual spectral bands and enhance model interpretability. Additionally, large-scale deployment, automated hyperparameter optimization, and implementation on portable hyperspectral devices will be explored to improve operational efficiency and practical adaptability.”

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The article presents a study of the use of deep learning for the task of recognising camouflage materials. The topic of the study is relevant, and the article fills a gap in existing research by proposing an improved approach to camouflage recognition. In the study, the authors used several classifiers, such as PCA, SVM, RF, KNN, CNN, and ResNet. Based on a comparison of the results, the authors found that ResNet is the most effective, achieving the best performance. The figures, tables, and references used in the article have corresponding links in the text. The article is well structured, and the authors have analysed the research results in detail. The authors should pay attention to the following minor comments:

At the end of the Introduction section, it would be advisable to provide a brief description of the content of the following sections of the article.

In Figure 9, it would be advisable to improve the clarity of the labels “training set” and “test set”, as even when enlarged, these labels are blurred.

The height of the label “Iteration” in Figure 10(c) should be corrected so that the text is not cut off.

In Table 1, it is advisable to highlight the best data indicators in each column of the table in bold to improve the visual perception of the results.

In the article, the authors presented a significant number of results, but used only two short formulas. It would be advisable to provide more formulas to describe the methods presented, although the authors do use references to literature sources.

 

Author Response

Dear Reviewer,

 

On behalf of my coauthors, I would like to sincerely thank you for your valuable time and constructive comments on our manuscript titled “Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework.”

Your insightful suggestions have been extremely helpful in improving the quality, clarity, and overall rigor of our work. We have carefully revised the manuscript in accordance with your comments, and all changes have been marked in red. We truly appreciate your professional input, which has greatly contributed to strengthening our study.

 

Thank you again for your thoughtful review and for helping us enhance the quality of our research.


Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. In Figure 9, it would be advisable to improve the clarity of the labels “training set” and “test set”, as even when enlarged, these labels are blurred.

Response: We have improved the clarity of the labels “training set” and “test set” in Figure 11 (previously Figure 9). The updated figure now has higher-resolution text to ensure that all labels remain clear and legible, even when enlarged. Additionally, we have included two new figures to further support our results.

 

  2. The height of the label “Iteration” in Figure 10(c) should be corrected so that the text is not cut off.

Response: The height of the label “Iteration” in Figure 12(c) (previously Figure 10(c)) has been corrected so that the text is now fully visible.

 

  3. In Table 1, it is advisable to highlight the best data indicators in each column of the table in bold to improve the visual perception of the results.

Response: The best data indicators in each column of Table 1 have been highlighted in bold to improve the visual clarity of the results.

 

  4. In the article, the authors presented a significant number of results, but used only two short formulas. It would be advisable to provide more formulas to describe the methods presented, although the authors do use references to literature sources.

Response: We sincerely thank the reviewer for this helpful suggestion. In the revised manuscript, additional mathematical formulations have been incorporated to provide clearer descriptions of the methods used. Specifically, formulas have been added for the PCA transformation, SNV normalization, SPXY sample partitioning, and ResNet residual mapping. These additions enhance the methodological transparency and mathematical completeness of the paper (see Section 2, lines 237–238, 265, 283, 318–319).
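For context, standard textbook forms of three of these operations are sketched below (the PCA projection with its explicit summation appears under the Reviewer 1 responses above); the notation is illustrative and may differ from the manuscript’s:

SNV normalization of a spectrum x over p bands: x_j^{SNV} = (x_j − \bar{x}) / s_x, where \bar{x} = (1/p) \sum_{j=1}^{p} x_j and s_x is the standard deviation over the bands.

SPXY combined distance between samples p and q: d_{xy}(p, q) = d_x(p, q) / \max_{p,q} d_x(p, q) + d_y(p, q) / \max_{p,q} d_y(p, q), i.e., the spectral and response distances are each max-normalized before being summed.

ResNet residual mapping: y = F(x, {W_i}) + x, where F is the residual function learned by the stacked layers and x is passed through the identity shortcut.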

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  • The figure captions, particularly for Figures 3–6, could be made more concise. Rather than restating what each image shows, they should emphasize the technical insight or analytical outcome each figure conveys.
  • There are a few small typographical and formatting inconsistencies, especially around equations and figure references, that should be reviewed and corrected for smoother readability.
  • The related work section would benefit from the inclusion of a few recent studies (2024–2025) focusing on transformer-based or hybrid spectral–spatial hyperspectral imaging models. This would better position the paper within the current research landscape.
  • In the results section, it would strengthen the statistical credibility of the findings if confidence intervals or standard deviations were provided for the reported performance metrics.

Author Response

Dear Reviewer,

 

We would like to express our heartfelt gratitude for the time and effort you dedicated to reviewing our manuscript. Your detailed and constructive comments were very helpful in improving the overall quality and readability of our work.

We carefully considered each of your suggestions and made corresponding revisions, which are marked in red in the revised version. Your insightful feedback not only enhanced the clarity of our analysis but also strengthened the contribution of our research.

 

Thank you once again for your kind and valuable guidance.

With best regards,
Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. The figure captions, particularly for Figures 3–6, could be made more concise. Rather than restating what each image shows, they should emphasize the technical insight or analytical outcome each figure conveys.

Response: We have revised all figure captions, particularly for Figures 4–7 (previously Figures 3–6), to make them more concise. The updated captions now focus on the technical insights and analytical outcomes conveyed by each figure, rather than simply describing the content. The figure numbering has changed due to the addition of a new figure earlier in the manuscript.

  2. There are a few small typographical and formatting inconsistencies, especially around equations and figure references, that should be reviewed and corrected for smoother readability.

Response: We have reviewed and corrected all typographical and formatting inconsistencies, including those around equations and figure references, to improve readability.

  3. The related work section would benefit from the inclusion of a few recent studies (2024–2025) focusing on transformer-based or hybrid spectral–spatial hyperspectral imaging models. This would better position the paper within the current research landscape.

Response: In response, we have further enriched the Introduction section with several recent and well-reviewed papers (2022–2024) that provide additional methodological and literature review perspectives. These references strengthen the contextual foundation and highlight current trends in hyperspectral imaging and methodological transparency.

Specifically, we have added six new citations:

  • [17] Sundaram & Berleant (2022) — automation of systematic literature reviews using NLP and text mining.
  • [18] Wang et al. (2022) — methodological entity extraction and evaluation.
  • [23] de la Torre-López et al. (2023) — AI-assisted systematic literature review.
  • [24] Smela et al. (2023) — definition and methodology of rapid literature reviews.
  • [26] Springer Review Team (2024) — NLP/ML/DL automation in evidence synthesis.
  • [31] ICASR Collaboration (2024) — large language models in systematic review automation.

These papers have been integrated into the Introduction to emphasize the growing methodological rigor and automation trends in hyperspectral data analysis and literature review workflows.

 

  4. In the results section, it would strengthen the statistical credibility of the findings if confidence intervals or standard deviations were provided for the reported performance metrics.

Response: We have conducted 100 independent runs for each SPXY-based model with randomly partitioned datasets. The reported Mean Accuracy (%) and corresponding standard deviation (SD) are now included in Table 1. This provides a quantitative measure of the model’s stability and robustness.

Specifically, SPXY-ResNet achieves the highest mean accuracy of 99.17% with a small SD of 0.79%, indicating that its superior performance is consistently reproducible. All other models exhibit SD values below 1%, confirming the high repeatability and reliability of the experimental results.

 

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

 

  • This is a more or less well-written submission on camouflage (cammo) detection, combining Hyperspectral Imaging with deep learning techniques.

 

  • The contributions are stated as the proposal of a hyperspectral classification framework specific for cammo recognition. This is attained by maximizing class separability and some degree of generalization (this should be better explained). Besides achieving accuracy similar to other methods, the main advantage is maintaining a low computational overhead.

Paper Strengths, plus some recommendations

  • The introduction explains, through a reasonable review of published works, useful and important features of Hyperspectral Imaging and classification, as well as some classical classification techniques for military cammo detection, including those based on deep learning. Several methods are unified (please check whether this is the best descriptive term by having an English-speaking expert read the paper, and in particular this paragraph). The explanations on maximizing class separability seem feasible but must be better illustrated and presented with a minimal mathematical basis.

 

  • The explanations and method design are qualitatively sound, but see below.

 

  • Bibliography is mostly recent and well-reviewed in the introduction and in the methods, based on similar works. This reference review may prove very useful for some readers.

 

Paper Weaknesses – some are pointed out along with the recommendations below.

  • No formulation of the methods or the statistical analysis is presented, but just a trivial linear matrix relation (equation 2). Most of the exposition explains arguments of how the authors selected and combined existing deep-learning and hyperspectral classifiers. They claim to have “constructed” some (I guess that they mean to have produced software implementations). The exposition is poorly illustrated by unclear figures, supposed to be representative of grass and cammo uniforms (a forest is a more general context, but there is cammo for rocky background, desert, etc.). See details below.

 

  • Apparently, the only clear, final contribution is an expected low computational overhead. I suggest making explicit non-evident contributions, if any, such as how the authors have dealt with the combination and integration of the selected techniques, or with the method-steps design summarized in Figure 4. More on the paper’s weaknesses in the following:

Recommendations, together with comments on the figures.

Figure 1 is not clear, and it does not show natural grass regions and camouflage uniforms at all; (b) is, at least at first sight, the same image as (a), with displaced and rotated red, green, and blue rhomboids. Most readers will not consider them as cammo, military or other. Please use more representative image examples: recognizable forest or grass, with uniforms or military hardware hidden under actual cammo, usually difficult for humans to recognize quickly. A benchmark or reference delay in detection (say, less than 8 seconds, after a project I participated in on the synthesis of several cammo patterns) may be included in your assessment of existing methods against your approach. This represents more work, but it is justified to better explain and quantitatively stress your contribution through an operational criterion, which readers will also better appreciate.

Figure 2 does not give structure information of the method but is just a linear list of textually described steps which can be clearer and more compact, if written in the main text as numbered paragraphs. Intercalating mathematical details would enrich the method description, allowing readers to better understand and appreciate your contributions (also please make them more explicit, other than lower costs).

Figure 3 reveals information not visible in Figure 1, but the difference between the geometric figures and the background can be obtained with much simpler methods, without making evident the advantage of hyperspectral imaging. No camouflaged uniforms or objects appear. Please design and perform your experiments on actual grass or forest and cammo, such as those found in:

https://airsoftmilsimnews.com/best-camo-pattern-for-thickly-wooded-terrain/

https://addmagazine.co.uk/tarnplanen-camouflage-tarpaulins/

https://phys.org/news/2021-01-camouflage-arbitrary-environment.html

But please search for more technical and scientific web pages; these are only the first three I found.

Figure 4 is more informative of the method’s structure than Figure 1 (just a list), despite the long-text boxes. If you make the boxes here twice as large or more, remaining within the page width, the lines in the largest paragraphs would halve, becoming more readable. I believe this improvement is easy to make, but there remains the lack of corresponding mathematical background.

 

Figure 5. Same comments as those for Figure 3: no visible grass or forest, nor cammo uniforms representative of those found in databases and on the internet, appear, and no better figures are included to support the analysis, results, and conclusions. Since the results, tables, graphics, and analysis are based neither on shown examples nor on the samples you mention in Section 2, Materials and Methods, you must resort to much better and more representative images, evidencing a forest background with camouflaged uniforms, if not hidden soldiers, as a common application. You did mention a source of samples in Materials and Methods. Please be more explicit about whether they were used in your experiments and how; also show at the very least 4 different clear image examples and their corresponding hyperspectral images, as you explained was done in Figures 1 and 3, and relate the results to them (whether they come from the database, from another source, or from your own samples, which must then be shown instead of Figures 1 and 3).

I must mention that other reviewed submissions on image recognition problems present many pictures, organized into three or more cases or scenarios, with zoom-in details, graphic features, and reference images, followed by in-progress image results, images from other methods for comparison, and still others. A camouflaged object or person in the middle of a natural scenario requires much more image-heavy work.

Finally, after addressing the above recommendations, please briefly list and explain the limitations of your approach, suggesting improvements and further future work.

Author Response

Dear Reviewer,

 

On behalf of my coauthors, I would like to express our deepest gratitude for your detailed and constructive review of our manuscript. Your insightful comments and thoughtful suggestions have been invaluable in improving both the scientific rigor and presentation quality of our work.

We have carefully addressed each of your comments and revised the manuscript accordingly, marking all modifications in red. The revisions have significantly enhanced the clarity, accuracy, and overall contribution of our research.

Thank you once again for your valuable time, effort, and expertise in helping us improve this study.

 

Yours faithfully,
Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. The explanations on maximizing class separability seem feasible but must be better illustrated and presented with a minimal mathematical basis.

Response: We have removed the linear model expression, as it does not accurately reflect the methodology adopted in this work. Instead, we have added four more relevant mathematical formulations to clarify the methodological process, namely:

  • the PCA projection (Section 2.3, lines 237–238, 243);
  • the SNV transformation (Section 2.3, line 265);
  • the SPXY combined distance (Section 2.4, line 283);
  • the residual unit equation (Section 2.4, lines 318–319).

These additions better illustrate the workflow of “dimensionality reduction and preprocessing, sample partitioning based on SPXY, and final classification using the ResNet model.”

 

  2. Bibliography is mostly recent and well-reviewed in the introduction and in the methods, based on similar works. This reference review may prove very useful for some readers.

Response: We have further enriched the Introduction section with several recent and well-reviewed papers (2022–2024) that provide additional methodological and literature review perspectives. These references strengthen the contextual foundation and highlight current trends in hyperspectral imaging and methodological transparency.

Specifically, we have added six new citations:

  • [17] Sundaram & Berleant (2022) — automation of systematic literature reviews using NLP and text mining.
  • [18] Wang et al. (2022) — methodological entity extraction and evaluation.
  • [23] de la Torre-López et al. (2023) — AI-assisted systematic literature review.
  • [24] Smela et al. (2023) — definition and methodology of rapid literature reviews.
  • [26] Springer Review Team (2024) — NLP/ML/DL automation in evidence synthesis.
  • [31] ICASR Collaboration (2024) — large language models in systematic review automation.

These papers have been integrated into the Introduction to emphasize the growing methodological rigor and automation trends in hyperspectral data analysis and literature review workflows.

 

  3. No formulation of the methods or the statistical analysis is presented, but just a trivial linear matrix relation (equation 2).

Response: The previous equation (Eq. 2) has been removed from the revised manuscript, as it indeed represented only a generic linear model and was not logically consistent with our proposed workflow.

  4. The exposition is poorly illustrated by unclear figures.

Response: The visual appearance of several figures may seem less sharp compared with standard RGB images because the hyperspectral camera used in this study (FigSpec® FS23) acquires data in narrow spectral bands (2.5 nm resolution, 389.06–1005.10 nm range) rather than broad-band color channels. As a result, the reconstructed pseudo-color and PCA-processed images inevitably appear less vivid or lower in contrast than conventional photographs.

Furthermore, the hyperspectral images were captured at different acquisition distances (3 m, 5 m, and 10 m) to simulate realistic field conditions and to evaluate model robustness under varying spatial scales. This variation, together with atmospheric scattering and sensor characteristics, slightly affects spatial sharpness but provides more representative and generalizable data.

We have clarified these aspects in the revised manuscript (Section 2.2) and slightly enhanced figure contrast for better readability while preserving the original spectral integrity. We hope this explanation clarifies that the figure's appearance results from the intrinsic optical and spectral characteristics of the imaging system, not from experimental or presentation errors.

 

  5. I suggest making explicit non-evident contributions, if any, such as how the authors have dealt with the combination and integration of the selected techniques, or with the method-steps design summarized in Figure 4.

Response: We have explicitly pointed out the non-evident methodological contributions in the Conclusion section. Specifically, we highlight how the integration of SPXY-based sampling with residual deep learning forms a joint optimization chain that ensures spectral–label balance and strengthens model generalization. We also clarify the method-step design summarized in Figure 5 (previously Figure 4), showing how the sequential preprocessing and stepwise architecture unify data acquisition, feature enhancement, and classification within a lightweight and reproducible workflow. These points now make the unique contributions of our framework more explicit.

 

  6. Figure 1 is not clear, and it does not show natural grass regions and camouflage uniforms at all; (b) is, at least at first sight, the same image as (a), with displaced and rotated red, green, and blue rhomboids. Most readers will not consider them as camo, military or other. Please use more representative image examples: recognizable forest or grass, with uniforms or military hardware hidden under actual camo, usually difficult for humans to recognize quickly.

Response: Firstly, regarding the representativeness of the selected grass: the samples were collected on the South Campus of Changchun University of Science and Technology in September 2024. The grass species in the scene is Poa pratensis L. (Kentucky bluegrass), which is extensively used in northern Chinese cities and university campuses as a common lawn species due to its cold resistance and strong durability. Its widespread application as a standard turf material is well documented in horticultural and urban greening references. We have described this representativeness in the manuscript and clearly indicated the sampling location and date in the Materials and Methods section, ensuring reproducibility and ecological relevance of the data.

Secondly, regarding the comment that Figure 1(a) and (b) might appear visually similar except for displaced/rotated colors: we appreciate the reviewer’s intuitive observation. In fact, Panel (a) shows the original grass ROI, with the vegetation red-edge and texture characteristics visible. Panel (b) depicts the camouflage fabric ROI processed by principal component analysis (PCA). PCA does not simply add or move visual elements but projects high-dimensional spectral information into principal components to highlight spectral differences between materials. Since our core purpose is to distinguish natural vegetation (e.g., grass) from artificial camouflage fabrics using hyperspectral sensing and residual learning, we focus on discriminability in the spectral domain (especially in the red-edge and near-infrared regions), rather than producing visually artistic effects in the RGB domain.

Finally, regarding the representativeness of camouflage samples: the camouflage uniforms used in this study are made from one of the most widely deployed military camouflage fabrics in China (cotton/polyester blended textile). Samples were collected locally and confirmed through chemical and spectral properties as typical camouflage material, as stated in our Materials and Methods section. The primary objective of this study is not to classify different types of vegetation or backgrounds but to answer a more practical question: under conditions where human observers struggle to identify concealed soldiers or disguised targets quickly, can hyperspectral sensing combined with residual networks automatically detect such camouflage? This practical motivation guided our ROI selection and experimental design.

We sincerely appreciate the reviewer’s valuable suggestions regarding more complex natural scenes and operational detection benchmarks. Although these aspects are beyond the current scope of this study, they are highly meaningful, and we will actively incorporate them into our future research to further enhance the practical applicability of the proposed method.

 

  7. Figure 2 does not give structure information of the method but is just a linear list of textually described steps, which could be clearer and more compact if written in the main text as numbered paragraphs. Intercalating mathematical details would enrich the method description.

Response: Both Figure 3 (previously Figure 2) and its caption have been updated to more clearly present the structure of the preprocessing workflow. The mathematical formulations associated with the corresponding steps are now explicitly referenced in the caption, where PCA is linked to Eqs. (1)–(3) and SNV to Eq. (4). These changes enhance the technical clarity of the figure and allow readers to easily trace each processing operation to its theoretical foundation in the main text.

 

  8. No camouflaged uniforms or objects appear in Figure 3. Please design and perform your experiments on actual grass or forest and camo.

Response: In our study, all hyperspectral data were collected in a real outdoor grass environment on the lawn of the South Campus of Changchun University of Science and Technology (43.82°N, 125.42°E) in September 2024. Camouflaged uniforms and natural grass were sampled to ensure that the experiments reflect realistic operational backgrounds. To illustrate this, we have added images of the grass environment in Figure 1: Figure 1(a) shows a wide-area view of the natural outdoor environment, Figure 1(b) presents a close-range view of the camouflage target within the grass, and Figure 1(c) displays a representative scene within ENVI used for hyperspectral experiments. These images demonstrate the complexity of the natural grass scene and highlight the necessity of hyperspectral imaging for effective camouflage detection.

As suggested, forest environments and additional camouflage patterns are of interest; however, these will be included in our future research plans to further extend the applicability of our method.

 

  9. Figure 4 is more informative of the method’s structure than Figure 1 (just a list), despite the long-text boxes. If you make the boxes here twice as large or more, remaining within the page width, the lines in the largest paragraphs would halve, becoming more readable.

Response: We have revised Figure 5 (previously Figure 4) accordingly; the boxes have been enlarged to improve readability as suggested. The updated figure now fits within the page width and presents the method’s structure more clearly.

 

  10. Figure 5: same comments as those for Figure 3; no visible grass or forest, nor camo uniforms.

Response: In response, we have added three representative grassland images to Figure 1 to better illustrate the natural background and camouflage scenarios. These additional images provide clearer visual support for the analysis and results discussed in the manuscript. The descriptions in Section 2: Materials and Methods have also been updated to explicitly indicate that these images were included in our experiments.

 

  11. After addressing the above recommendations, please briefly list and explain the limitations of your approach, suggesting improvements and further future work.

Response: We would like to briefly outline the limitations of our approach and potential future improvements:

  1. Limited environmental diversity:
     • Limitation: Experiments were conducted only in a single grassland environment, which may limit the model’s applicability to other natural backgrounds such as forests, shrubs, or mixed vegetation.
     • Improvement/Future Work: We plan to expand the dataset to include a variety of ecological environments to enhance the model’s generalization and robustness across different natural settings.
  2. Single camouflage material type:
     • Limitation: The study used only one type of camouflage fabric under controlled illumination, which does not fully capture the spectral and textural complexity of real-world targets.
     • Improvement/Future Work: Future work will incorporate multiple camouflage materials with diverse spectral and textural characteristics, as well as measurements under varying illumination and weather conditions, to better simulate real operational scenarios.

 

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

The study presents a valuable framework integrating SPXY partitioning and residual deep learning; however, several technical weaknesses limit its generalizability and reproducibility.
1. First, model validation relies solely on a single 70/30 train–test split without any cross-validation or repeated trials. Although the SPXY algorithm improves the representativeness of the partitioned subsets, it does not replace the need for cross-validation, which remains essential to statistically verify model robustness and to mitigate potential bias arising from a single data division.
2. Second, there is no evidence of hyperparameter optimization (e.g., learning-rate tuning, batch-size adjustment, or network-depth evaluation). All models appear to use default settings, which may unfairly favor or disadvantage specific algorithms in the comparison.
3. Third, the work omits any Explainable AI (XAI) analysis, such as SHAP, Grad-CAM, or spectral-band importance visualization, which are increasingly necessary in hyperspectral studies to interpret model behavior and identify dominant spectral features.
As a result, while the reported accuracy of 99.17% is impressive, the lack of cross-validation, hyperparameter tuning, and explainability significantly weakens the technical credibility and practical interpretability of the proposed SPXY-ResNet framework.

4. Finally, the authors should explicitly acknowledge that the proposed SPXY-ResNet framework was evaluated only on a self-collected dataset and not validated on any publicly available camouflage or hyperspectral benchmark datasets.
This limitation restricts the generalizability and external validity of the model’s performance.

Minor Comments

The authors should unify the style of figure presentation and explanation throughout the manuscript. For example, the figures currently vary in formatting and descriptive depth. Captions, numbering, and in-text references should follow a consistent structure — for instance, each figure should include a clear title, concise methodological context, and a brief explanation of the key observation.

  1. Figure 10(a) – The inner font size is too small and difficult to read. Figure 10(c) – The figure is not properly cropped; the lower portion of the plot (loss curve area) is cut off and not fully visible.

  2. Figures 8 and 9 [optional] – These two figures could be replaced with a brief descriptive paragraph summarizing the statistical comparison between SPXY and K–S algorithms.

  3. Figure captions – Many figures lack adequate descriptive explanations. Captions should clearly describe the purpose and key observation of each figure.

  4. Figure 4 – The current diagram should be revised to clearly show the sequential workflow from preprocessing to classification. Consider labeling each stage (PCA, SNV, SG filtering, SPXY sampling, and ResNet model) with directional arrows for better visual flow.

  5. Figure 2 – The figure does not align well with the explanation provided in the methodology section. It should be updated to match the described preprocessing sequence.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

 

On behalf of my coauthors, I would like to express our heartfelt gratitude for your careful and insightful review of our manuscript titled “Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework.” Your constructive comments not only helped us identify areas for improvement but also inspired us to think more deeply about the presentation and rigor of our work. We have carefully addressed each of your suggestions and revised the manuscript accordingly, with all modifications clearly marked in red. Your feedback has significantly improved the clarity, depth, and overall quality of our paper.

 

We truly appreciate the time, effort, and expertise you devoted to reviewing our work. Your thoughtful guidance has been invaluable, and we hope that the revised manuscript now reflects the high standards you helped us achieve.

 

Thank you again for your generous and constructive input.

 

With sincere appreciation and respect,
Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. Model validation relies solely on a single 70/30 train–test split without any cross-validation or repeated trials. Although the SPXY algorithm improves the representativeness of the partitioned subsets, it does not replace the need for cross-validation, which remains essential to statistically verify model robustness and to mitigate potential bias arising from a single data division.

Response: We sincerely appreciate the reviewer’s insightful comment. To comprehensively address this concern and eliminate the potential bias from a single train–test split, we have added a robustness validation experiment to the revised manuscript. The major improvements are as follows:

  1. Repeated SPXY resampling

We performed 1000 independent SPXY-based dataset partitions, rather than relying on a single split.

  2. Cross-validation performed in every trial

Each generated training subset was validated using 5-fold cross-validation to statistically verify generalization performance.

  3. Stability analysis added

Performance statistics over 1000 trials were calculated:

  • RMSE Mean: 2.8231
  • RMSE Standard Deviation: 0.1264

These results demonstrate strong robustness and low sensitivity to sample division variation.

  4. Corresponding visualization added

A new figure has been included (Figure 7 in Section 3.2), showing PCA distributions of training and test subsets across all 1000 trials, confirming consistent representativeness and distributional alignment.

The revised manuscript now clarifies that SPXY enhances representativeness but does not replace the necessity of cross-validation, and the added results statistically confirm the reliability of the proposed framework.
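To make the repeated-partition procedure concrete, the following is a minimal, hypothetical Python sketch of SPXY partitioning combined with 5-fold cross-validation (NumPy and scikit-learn only). It is not the authors’ implementation: the helper name spxy_split, the randomized seed point (canonical SPXY selection is deterministic, so repeated partitions need some source of variation), and the stand-in data and classifier are all assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

def spxy_split(X, y, train_frac=0.7, rng=None):
    # Combined, max-normalized x-y distance (SPXY), followed by a
    # Kennard-Stone-style max-min selection of the training samples.
    n = len(X)
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.abs(y[:, None] - y[None, :]).astype(float)
    d = dx / dx.max() + dy / max(dy.max(), 1e-12)
    rng = rng or np.random.default_rng()
    start = int(rng.integers(n))               # randomized seed point (assumption)
    train = [start, int(np.argmax(d[start]))]  # plus the sample farthest from it
    rest = [i for i in range(n) if i not in train]
    while len(train) < int(train_frac * n):
        # Pick the remaining sample farthest from its nearest selected sample.
        nearest = d[np.ix_(rest, train)].min(axis=1)
        train.append(rest.pop(int(np.argmax(nearest))))
    return np.array(train), np.array(rest)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))        # stand-in for PCA-reduced spectra
y = rng.integers(0, 2, size=120)      # stand-in labels (grass vs. fabric)
scores = []
for _ in range(20):                   # the paper reports 1000 trials
    tr, _ = spxy_split(X, y, rng=rng)
    cv = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                         X[tr], y[tr], cv=KFold(n_splits=5, shuffle=True))
    scores.append(cv.mean())
print(f"mean = {np.mean(scores):.4f}, sd = {np.std(scores):.4f}")

The printed mean and standard deviation across trials play the same role as the stability statistics reported above: a small spread indicates low sensitivity to how the data happen to be divided.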

 

  2. There is no evidence of hyperparameter optimization (e.g., learning-rate tuning, batch-size adjustment, or network-depth evaluation). All models appear to use default settings.

Response: We sincerely appreciate the reviewer’s insightful comment. In the revised manuscript, we have added a detailed description of the hyperparameter optimization process for both the ResNet and conventional machine-learning models (see Section 2.4, lines 333–344).

Specifically, the learning rate (initially set to 0.001) was tuned using a piecewise decay schedule with a decay factor of 0.5 every 25 epochs, and the batch size was adaptively selected within the range of 8–32 depending on dataset size. Network depth and dropout ratios (0.25–0.5) were also adjusted through pilot experiments to balance model capacity and overfitting risk.

For traditional models (SVM, KNN, Random Forest, and PCA), the core parameters were likewise optimized through empirical testing: the SVM penalty factor C (0.1–10), the KNN neighbor number k (1–5), the Random Forest tree count (10–100), and the number of PCA components, determined by 90% cumulative variance. This information has been integrated into the text to clarify that all models were trained using optimized, not default, configurations.
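For illustration, the piecewise decay schedule described above (initial learning rate 0.001, decayed by a factor of 0.5 every 25 epochs) maps directly onto a step-decay scheduler; the sketch below uses PyTorch, and the placeholder network, input width, and epoch count are our assumptions rather than the authors’ training code.

import torch

# Placeholder network and optimizer; the 247-band input width is an assumption
# (2.5 nm sampling over 389–1005 nm gives roughly that many bands).
model = torch.nn.Linear(247, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Piecewise decay: halve the learning rate every 25 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

for epoch in range(100):
    # ... iterate mini-batches here (batch size adaptively chosen within 8-32),
    # compute the loss, call loss.backward() and optimizer.step() ...
    scheduler.step()  # lr becomes 5e-4 after epoch 25, 2.5e-4 after 50, and so on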

 

  3. The work omits any Explainable AI (XAI) analysis, such as SHAP, Grad-CAM, or spectral-band importance visualization, which are increasingly necessary in hyperspectral studies to interpret model behavior and identify dominant spectral features. As a result, while the reported accuracy of 99.17% is impressive, the lack of cross-validation, hyperparameter tuning, and explainability significantly weakens the technical credibility and practical interpretability of the proposed SPXY-ResNet framework.

Response: We sincerely thank the reviewer for this valuable and constructive comment. We fully agree that Explainable AI (XAI) methods such as SHAP or Grad-CAM play an important role in improving the interpretability of hyperspectral models. In this study, our primary objective was to evaluate the overall effectiveness of the SPXY–ResNet framework using full-spectrum data, rather than selecting or emphasizing individual spectral bands. Therefore, explainability analyses were not included in the current version to maintain the focus on full-band classification performance.

Nevertheless, we appreciate the reviewer’s suggestion and plan to incorporate XAI-based visualization (e.g., spectral-band importance and Grad-CAM feature maps) in our future work to better interpret the model’s decision-making process and highlight key spectral regions related to camouflage detection.

In addition, the revised manuscript now includes cross-validation experiments (see Section 3.2, lines 416–432), which further confirm the stability and reliability of the proposed model. We believe these additions strengthen both the technical credibility and reproducibility of the results.

 

  4. The authors should explicitly acknowledge that the proposed SPXY-ResNet framework was evaluated only on a self-collected dataset and not validated on any publicly available camouflage or hyperspectral benchmark datasets. This limitation restricts the generalizability and external validity of the model’s performance.

Response: We fully agree that the lack of external dataset validation limits the generalizability of the proposed framework. In the revised manuscript, we have explicitly acknowledged this limitation in the Discussion section (see Section 4, lines 577–581).

Specifically, we added the following statement:

“Although the SPXY–ResNet framework achieved high accuracy, it was trained and evaluated only on a self-collected hyperspectral dataset. No publicly available camouflage datasets were used, which may limit its external validity. Future work will involve evaluation on public benchmarks to verify the model’s robustness.”

This addition clearly states the limitation and our plan to address it in future work.

 

  5. Figure 10(a) – The inner font size is too small and difficult to read. Figure 10(c) – The figure is not properly cropped; the lower portion of the plot (loss curve area) is cut off and not fully visible.

Response: The font size within Figure 12(a) (previously Figure 10(a)) has been enlarged to ensure better readability. In addition, Figure 12(c) (previously Figure 10(c)) has been properly cropped and adjusted so that the lower portion of the plot (loss curve area) is now fully visible.

 

  6. Figures 8 and 9 [optional] – These two figures could be replaced with a brief descriptive paragraph summarizing the statistical comparison between SPXY and K–S algorithms.

Response: We sincerely appreciate the reviewer’s thoughtful suggestion. After careful consideration, we have decided to retain Figures 8 and 9, as they provide a more intuitive and visual comparison between the SPXY and K–S algorithms, which we believe helps readers better understand the statistical differences discussed in the text.

 

  7. Figure captions – Many figures lack adequate descriptive explanations. Captions should clearly describe the purpose and key observation of each figure.

Response: Thank you very much for your constructive suggestion. We fully agree that figure captions should explicitly describe the purpose and the key observations conveyed by each figure. Following your recommendation, all figure captions have been thoroughly revised and expanded to improve clarity and informative value.

The updated captions now:

  1. Clearly state what each figure represents.
  2. Highlight the most important observations or findings that the reader should focus on.
  3. Provide explicit definitions of visual elements such as colors, symbols, ROI labels, and class distinctions.
  4. Ensure standalone interpretability without requiring the reader to refer back to the main text.

 

  8. Figure 4 – The current diagram should be revised to clearly show the sequential workflow from preprocessing to classification. Consider labeling each stage (PCA, SNV, SG filtering, SPXY sampling, and ResNet model) with directional arrows for better visual flow.

Response: Figure 5 (previously Figure 4) has been completely redrawn to illustrate the sequential workflow from hyperspectral data acquisition to classification. Each stage—PCA, SNV, SG filtering, SPXY sampling, and ResNet model training—is now clearly labeled and connected with directional arrows to emphasize the step-by-step process. The revised figure provides improved visual flow and aligns with the overall methodological framework.

 

  9. Figure 2 – The figure does not align well with the explanation provided in the methodology section. It should be updated to match the described preprocessing sequence.

Response: Figure 3 (previously Figure 2) has been revised to align precisely with the preprocessing sequence described in Section 2.3. The updated diagram now clearly presents the sequential operations of PCA, SNV normalization, and SG filtering, followed by the enhancement step, ensuring consistency with the methodology description. The caption has also been refined for clarity.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

The authors addressed most of my comments and recommendations, greatly improved the clarity and presentation quality, added the convincing mathematical basis that was missing before, and made sufficiently explicit their contributions, the limitations identified, and future work.

Not addressing my recommendation on Figure 3 (simplifying it into a list incorporated in the main text) and some other minor suggestions is not serious; these may remain as they are in this revision, if no other reviewer concurs with similar comments.

The new Figure 1 is very welcome, allowing readers to better understand the figures based on this one. It could be less dark, but I guess it corresponds to real raw acquisitions. A representative figure from the database could be added, without having to be processed. The new data analysis also improved the results presentation.

The colored rhomboids of Figure 2 still remain unclear to me. From the new explanation added above, I suppose that these were manually defined as ROIs for analysis, but I recommend explicitly mentioning their inclusion (use the word “rhomboids”) and also improving the interpretation of their colors (just to distinguish what?), or stating their source and purpose, if my supposition is wrong. You also mention “red-edge”; without careful reading (the possible case of other readers) I did not identify red boundaries anywhere, if the color refers to visible features. I infer, from the explanation mentioning the “680–750 nm range” and the “near-infrared reflectance” term, that the red-edge is a spectral band range of 680–750 nm. Try clarifying this further, from its first mention.

I have to insist, for line 351, on the unsound use of the term “constructed” (see my first review), especially if you have not had a technical English-speaking person read this paragraph. I believe the editors provide a language correction service, if required.

Author Response

Dear Reviewer,

 

On behalf of my coauthors, we would like to express our sincere gratitude to all reviewers for your valuable time, constructive comments, and insightful suggestions on our manuscript titled “Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework.”

In response to your thoughtful feedback, we have carefully revised the manuscript, with all modifications marked in red font. Your comments have been instrumental in helping us improve the clarity, quality, and overall contribution of our work.

We truly appreciate the time and effort you devoted to reviewing our paper, and we are grateful for your professional guidance and support.

 

Thank you again for your valuable feedback and consideration.

With sincere appreciation,


Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. Not addressing my recommendation on Figure 3 (simplifying it into a list incorporated in the main text)

Response: We sincerely thank the reviewer for this valuable observation. The figure previously referred to as Figure 2 in the first review round corresponds to Figure 3 in the current version. We fully understand the reviewer’s intention to make the methodological steps clearer and more concise. After careful consideration, we decided to retain Figure 3 because it provides an intuitive visualization of the sequential preprocessing workflow (PCA, SNV, SG filtering, and derivative normalization). This visual layout helps readers quickly grasp the structure and data flow of the method, which would be less apparent in a purely textual list. At the same time, we have addressed the reviewer’s suggestion on enriching the mathematical background by adding detailed equations (Eqs. (1)–(4)) and explanatory text in Section 2.3, corresponding to each step shown in Figure 3. This revision ensures that the figure serves as a concise graphical summary, while the main text provides the full mathematical formulation and rationale.

 

  2. The new Figure 1 is helpful for understanding. It looks a bit dark, and a representative unprocessed image from the database could be added.

Response: We sincerely thank the reviewer for the positive feedback. Figure 1 indeed represents the real outdoor acquisition environment where hyperspectral data were collected. The slightly dark tone reflects the natural illumination conditions during field imaging and was intentionally preserved to ensure the authenticity of the acquisition scene. The figure already depicts a representative raw view of the experimental site and target placement (grass background and camouflage fabric), corresponding directly to the database used in this study. Therefore, we respectfully maintain the current version of Figure 1 to preserve the fidelity of the experimental conditions.

 

  3. The meaning of the colored rhomboids in Figure 2 is unclear. Please clarify whether they are manually defined ROIs, explain the color purpose, and clearly define the term “red-edge”.

Response: We have explicitly clarified the meaning of the colored rhomboids in Figure 2. The text now specifies that the colored rhomboids were manually defined in ENVI software to indicate the ROI boundaries for spectral extraction, and that different colors are used only to distinguish individual ROIs of each material type (grass or camouflage fabric), without representing any physical or spectral difference. We have also improved the definition of “red-edge” to clarify that it refers to the spectral range between 680 nm and 750 nm, where vegetation reflectance increases sharply due to chlorophyll absorption. The updated explanation appears in Section 2.2 (lines 181–200) of the revised manuscript.

 

  4. I have to insist, for line 351, on the unsound use of the term “constructed”.

Response: The wording has been revised for better technical accuracy and fluency. The term “constructed” has been replaced with “developed”, and the surrounding sentences have been refined for clearer and more natural expression (Section 2.4, line 357 in the revised manuscript).

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

The authors have addressed most of the concerns or indicated that they will do so in future work. The new version also improves readability and provides a more justifiable claim for the proposed study.

Author Response

Dear Reviewer,

 

We sincerely thank you for your valuable comments and positive evaluation of our revised manuscript entitled “Deep Residual Learning for Hyperspectral Imaging Camouflage Detection with SPXY-Optimized Feature Fusion Framework.”

We truly appreciate your encouraging remark: “The authors have addressed most of the concerns or indicated that they will do so in future work. The new version also improves readability and provides a more justifiable claim for the proposed study.”

 

Your recognition greatly encourages us. We carefully reviewed all points indicated in the review form and have made corresponding improvements to ensure the manuscript is clearer, more rigorous, and methodologically consistent.

 

With sincere appreciation,
Qiran Wang
wangqiran0129@gmail.com
2023004925@mails.cust.edu.cn

  1. Introduction – Can be improved

Response: We have revised the Introduction to better highlight the research gap and clarify the motivation for developing a unified and lightweight SPXY–ResNet framework.

A concise transition sentence was added after the paragraph discussing previous studies that lacked integrated sampling and deep learning strategies: “In this context, a more systematic framework that balances data representativeness, model efficiency, and generalization is still needed for practical hyperspectral camouflage detection.”

 

  2. Research Design – Can be improved

Response: We have enhanced the Experimental Design in Section 2.1 to provide clearer information on repeatability and controlled acquisition conditions. Specifically, we added: “All experiments were repeated under identical illumination and geometry to ensure data reliability and reproducibility.”

 

  3. Methods – Can be improved

Response: To strengthen methodological transparency and reproducibility, we added a short clarification after Figure 5 in Section 2.4: “For consistency and robustness, all classifiers were optimized under unified hyperparameter settings and assessed through repeated SPXY-based cross-validation experiments.”

Author Response File: Author Response.pdf
