Article
Peer-Review Record

Infrared Dim Small Target Detection Algorithm with Large-Size Receptive Fields

Remote Sens. 2025, 17(2), 307; https://doi.org/10.3390/rs17020307
by Xiaozhen Wang 1,2, Chengshan Han 1, Jiaqi Li 1,2, Ting Nie 1, Mingxuan Li 1, Xiaofeng Wang 1,2 and Liang Huang 1,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 4 December 2024 / Revised: 6 January 2025 / Accepted: 13 January 2025 / Published: 16 January 2025

Round 1

Reviewer 1 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

Accept in the present form. 

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

See Attachment

Comments for author File: Comments.pdf

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

1. The description of RIPS and LRIB in the paper is not detailed enough; in particular, their relationship to the U-Net skip connections is unclear. It is recommended to supplement the information on the design and innovation of the RIPS and LRIB modules.

2. Although the functions of the CHAM and SPAM modules are described relatively clearly, the combination of the two and their role in the overall architecture are not emphasized enough.

3. The experimental section is somewhat incomplete. For example, the paper does not explain why the maximum convolution kernel size is 11, or how the inverted-bottleneck structure of LRIB effectively enhances the feature-extraction ability. It is recommended to add detailed explanations on these points.

4. The time and space complexity of the proposed algorithm are not discussed in the paper, which is crucial for model deployment in practical applications. It is recommended to add a complexity analysis.

5. The comparative-experiment section does not clearly state whether the compared methods were trained under the same conditions (data augmentation, hyperparameter optimization, etc.).

6. The paper uses PD and FA when introducing the metrics, but the formats of these two metrics are inconsistent in the experimental-results tables.
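Regarding the complexity analysis requested in point 4: a minimal sketch of how parameter counts and multiply-accumulate (MAC) costs for convolutional layers can be tallied. The channel widths and feature-map sizes below are illustrative assumptions, not values from the paper; `conv2d_params` and `conv2d_macs` are hypothetical helper names.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters in a k x k 2-D convolution."""
    return c_out * (c_in * k * k + (1 if bias else 0))

def conv2d_macs(c_in, c_out, k, h, w):
    """Multiply-accumulate operations for an h x w output feature map
    (stride 1, 'same' padding assumed)."""
    return conv2d_params(c_in, c_out, k, bias=False) * h * w

# Illustration: at equal channel widths, an 11 x 11 convolution costs
# (11*11)/(3*3) = 121/9, roughly 13.4x a 3 x 3 convolution.
ratio = conv2d_macs(64, 64, 11, 256, 256) / conv2d_macs(64, 64, 3, 256, 256)
print(round(ratio, 1))  # 13.4
```

A table of per-module parameter counts and MACs built this way would directly answer the reviewer's deployment concern.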

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 4 Report (New Reviewer)

Comments and Suggestions for Authors

It would be interesting to see how the latest system deals with the MFIRST dataset.

 

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The revised version of the manuscript has responded to all my concerns and has also made great improvements in other aspects.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

An initial reading of the paper is uncomfortable, as the authors do not define a clear task. The characteristics of the detection target are not given; only after carefully studying the entire article does it become clear what the authors are doing.

Even when the authors begin to describe their method, the purpose of the algorithm is unclear, because U-Net is usually used to segment specific objects, and in this paper the objects are not defined.

Only after reading the entire article does it become clear that small objects are distinguished by local brightness features and that this has nothing to do with the geometry of the object. In practice, the task being solved is one of contrasting or emphasizing a specific object.

I therefore strongly recommend defining, in the introduction, the object of detection and what is highlighted in it. The authors should also add a more detailed description of the image set on which the network was trained and tested, describing in more detail what we see and what needs to be highlighted.

Reviewer 2 Report

Comments and Suggestions for Authors

Recommendation: Reject

 

Comments:

The manuscript submitted by Wang et al. proposes a method for infrared small target detection. The authors introduce a large-size receptive field and utilize a residual network. However, the quality of the paper is low, and there are many serious problems with the academic description, academic writing, language, method design, and experimental analysis. It is therefore recommended that the paper be rejected for publication. Here are some details.

1. First of all, the introduction is ambiguous and does not describe the relationship between the sections, especially the contributions, which are described subjectively.

2. Many existing methods already achieve superior performance with low false-alarm rates and high detection rates, so the statement in the abstract, "Today's detection algorithms do not meet the needs of false alarm rate and detection rate in real scenes", is neither suitable nor reasonable.

3. In the related-work section, many deep-learning-based algorithms are not reviewed, and the logic is confused and does not lead toward the proposed network; it seems the authors merely introduce modules from other fields. Moreover, why does the paper not review approaches that have already combined attention mechanisms with infrared small target detection?

4. All abbreviations need explanations. The proposed modules should be shown in Figure 1 to make the paper clearer, and the details of the overall computation should be illustrated. Moreover, the naming of the modules is not consistent throughout the paper.

5. In Figure 2, what are the relationships between the subgraphs? Where is this module located in the network? Which parts of the network are designed according to it? What part of the original ResNet was replaced? What does "different sizes" mean? More details should be added to make the paper readable and scientific.

6. In Figure 3, the description does not agree with the figure, and "dirate" should be "dilate".

7. Regarding Section 2.2.4, the structure is not exactly a transformer; at most it is a transformer-encoder structure. It is merely an "attention plus MLP" pattern that resembles a transformer encoder and has nothing to do with the transformer's core attention mechanism.

8. In Figure 5, why is the kernel of the conv1d K*K? To which dimensionality does this convolution belong? Also, "sigmod" should be "sigmoid".

9. In the ablation study, no ablation experiments were performed on each individual module. In addition, there does not appear to be a baseline network serving as the basis for the ablation experiments and module analysis. The ablation experiments do not seem to exclude the influence of the other modules, which affects the evaluation of each module's contribution.

10. Beyond the comparison between Inception and ResLNet, the branch design and parameter choices of this module also need experimental analysis.

11. In Section 3.2.3, the experimental analysis of this module needs to be further enriched. Beyond the role of the MLP, which has been discussed, more details of the module need to be discussed.

12. For Section 3.3, more deep-learning-based approaches should be compared, such as ACM, IAA, ISNet, APAFNet, and so on.

13. Overall, the writing and description are not academic or scientific, with many problems such as unexplained abbreviations, grammatical mistakes, and inconsistent capitalization.
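On point 8: the kernel of a 1-D convolution slides a length-K window, so it has K weights per channel pair, not K*K; a K*K kernel belongs to a 2-D convolution. A minimal sketch, with illustrative channel counts and a hypothetical `conv_params` helper not taken from the paper:

```python
def conv_params(c_in, c_out, k, dims):
    """Parameters of a convolution whose kernel is k^dims (no bias)."""
    return c_out * c_in * k ** dims

# conv1d: kernel is K        -> c_in * c_out * K weights
# conv2d: kernel is K x K    -> c_in * c_out * K*K weights
print(conv_params(64, 64, 5, dims=1))  # 64 * 64 * 5
print(conv_params(64, 64, 5, dims=2))  # 64 * 64 * 25
```

Labeling the kernel in Figure 5 as K (for conv1d) or K x K (for conv2d) would resolve the ambiguity the reviewer raises.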

 

 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

 Extensive editing of English language required.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors introduce an algorithm for detecting dim and small infrared targets, named LRF-Net, which leverages a large-size receptive field. This algorithm incorporates a residual network composed of large-sized convolutional layers (ResLNet), enabling a larger effective receptive field and enhancing the model's robustness. Additionally, by integrating a transformer structure attention mechanism (TSAM), the network can more precisely localize target regions and improve overall detection performance.

Regarding feedback on the text, the following points need to be addressed:

  1. A reference to ResLNet is necessary.
2. The claim that the effective receptive field of a large number of stacked small convolutional layers is far smaller than that of large convolutional layers is debatable and should be re-evaluated.
  3. The use of the transformer structure attention mechanism (TSAM) is not essential for improving overall detection performance in binary segmentation scenarios.
  4. The model's details are insufficiently explained and need to be elaborated.
  5. There are several textual errors, such as the missing period in "et al.", and many such errors require correction.
  6. The characteristics of the dataset are not presented, which may lead to overfitting issues with unbalanced data.
  7. In section 3.3.2, Visual Comparison, there are numerous unnecessary figures; in deep learning, comparisons should typically be made using metrics rather than visual figures.
  8. The discussion in section 4 is overly brief and needs to be expanded.
  9. Future work is not mentioned in the conclusion, which should be rectified.
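A sketch relevant to point 2: the *theoretical* receptive field of stacked small kernels can match that of a single large kernel (the debate concerns the *effective* receptive field, which is typically smaller than the theoretical one). The helper below uses the standard receptive-field recurrence and is an editorial illustration, not code from the paper.

```python
def receptive_field(kernel_sizes, strides=None):
    """Theoretical receptive field of a stack of convolutions.
    rf grows by (k - 1) * jump per layer; jump is the product of strides."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

# Five stacked 3x3 convolutions match one 11x11 convolution in
# theoretical receptive field:
print(receptive_field([3, 3, 3, 3, 3]))  # 11
print(receptive_field([11]))             # 11
```

So the claim in the manuscript holds, if at all, only for the effective receptive field, and should be stated and evidenced accordingly.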
Comments on the Quality of English Language

 Minor editing of English language required.
