Article
Peer-Review Record

Infrared Small Target Detection Based on a Temporally-Aware Fully Convolutional Neural Network

Remote Sens. 2023, 15(17), 4198; https://doi.org/10.3390/rs15174198
by Lei Zhang 1,2, Peng Han 3, Jiahua Xi 3 and Zhengrong Zuo 3,*
Submission received: 17 July 2023 / Revised: 13 August 2023 / Accepted: 21 August 2023 / Published: 26 August 2023

Round 1

Reviewer 1 Report

The authors have designed the FCST algorithm, which leverages techniques such as FCOS and ConvLSTM to build a lightweight feature extraction network and incorporate spatiotemporal information, improving the detection performance for infrared small targets. The paper is of good quality overall; however, there are still some issues:

 

1. How did the authors arrive at the abbreviation "FCST" for "fully convolutional-based detection algorithm"?

2. Unclear figures: Some supplementary figures are blurry, making it difficult for readers to understand the experimental results. Improving the image quality is necessary.

3. Ineffective explanations: The explanations for the figures lack detail and fail to highlight the core ideas of the experiments. More comprehensive and explicit explanations are needed.

4. There are some problems with the typography in this paper, which need to be revised and improved.

5. My main concern is the explanation of the implementation principle of the FRM module; please discuss how this module suppresses background information.

6. Some infrared target detection methods are missing: [1] Deep-IRTarget: An automatic target detector in infrared imagery using dual-domain feature extraction and allocation; [2] Graph-based few-shot learning with transformed feature propagation and optimal class allocation.


Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

1. The authors should specify, in the abstract, the problem statement derived from existing infrared small target detection methods.

2. The abstract mentions that both spatial and temporal information are used in the proposed method for detection. The authors should clarify what kind of temporal information is taken from the input data.

3. Justify the results in the abstract in comparison with existing standard multiscale detection frameworks.

4. In the introduction (Lines 45 and 46), the authors state: "where existing deep learning-based algorithms may lose feature information". Are you trying to solve this issue without incurring information loss? Justification is required. Similarly, how does the feature refinement mechanism work here? An explanation is required.

5. Fig 1: The information on the input should be included. 

6. Fig 3: The size of the final block is required.

7. Fig 8: Mention its LSTM cell.

8. Preamble information is required (Section 4 - 4.1)

9. The system and software requirements should be mentioned in Section 4.

10. The results of the proposed methodology should be compared and analysed against existing standard multiscale infrared small target detection methods.

11. Feature extraction is performed with ResNet, DenseNet, and VoVNet. Why were these backbones chosen, and what is the impact of each when taken individually?

12. How does the feature refinement module work, and how does it differ from layers such as batch normalization, global average pooling, etc.?

13. The authors mention that there is no learning stage in the proposed method; in that case, how do network training and loss optimization work with the model?

Proofread the paper to avoid grammatical errors.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Dear authors, 

After reviewing your manuscript, I found it really interesting and original, but I consider it important to share some inquiries that I would like you to explain a bit further:

- You mention that your method enhances detection accuracy in more complex scenarios with a wide diversity of infrared small targets in practical settings. Could you describe what kind of diversity you are referring to?

- Could you explain this paragraph further: "the trajectory data of the targets are generated by a space target simulation system, with the flat trajectory data being obtained by calculating the target’s mapping position on the imaging plane. The simulation system’s imaging cycle is set between 0.1 s and 2.5 s to simulate differences in target movement speed"? I read it several times, and I feel as if something is missing there.

- I wonder whether your network would work the same way a) when the target is moving and the camera is still, b) when the target is still and the camera is moving, c) when both are moving, and d) when both are still.

- I guess your references 14 and 35 are the same; am I wrong?

That is all for now,

Best regards, 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

When both the object and the backdrop are moving, differentiating them based exclusively on temporal information may be challenging. How do you deal with this?

When the input data has a low spatial resolution, the algorithm may struggle to capture fine features and differentiate small objects from the background.

The noise in the image might cause the detection algorithm to malfunction, resulting in false positives or missed detections.

Line 213: With anchor-box-free techniques, precisely localizing objects may be difficult, especially when working with objects of varied sizes, shapes, or aspect ratios. How do you handle different scales and unseen aspect ratios in novel scenarios?
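For background on this point, the sketch below is an illustrative PyTorch example of FCOS-style anchor-free regression targets (not the authors' implementation; the function names and the toy box are my own assumptions): each location regresses its distances to the four sides of the box, so no anchor aspect ratio has to be assumed in advance.

```python
import torch

def fcos_regression_targets(points, gt_box):
    """Compute FCOS-style (l, t, r, b) distances from feature-map
    locations to the sides of a ground-truth box.

    points: (N, 2) tensor of (x, y) image coordinates.
    gt_box: (4,) tensor (x1, y1, x2, y2).
    """
    x, y = points[:, 0], points[:, 1]
    x1, y1, x2, y2 = gt_box
    # Distances to the left, top, right, and bottom box sides.
    return torch.stack([x - x1, y - y1, x2 - x, y2 - y], dim=-1)

def decode_boxes(points, ltrb):
    """Recover (x1, y1, x2, y2) boxes from predicted distances."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb.unbind(dim=-1)
    return torch.stack([x - l, y - t, x + r, y + b], dim=-1)

# Toy usage: two locations inside a 30x10 box (elongated aspect ratio).
points = torch.tensor([[12.0, 7.0], [25.0, 9.0]])
gt_box = torch.tensor([5.0, 4.0, 35.0, 14.0])
targets = fcos_regression_targets(points, gt_box)
print(decode_boxes(points, targets))  # both rows recover the original box
```

Because the four distances are predicted independently, arbitrary (including previously unseen) aspect ratios can be represented, although localization quality still depends on the regression itself, which is precisely the concern raised in the comment above.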

Line 231: Low-level features such as edges and textures are frequently captured in the earliest layers of a feature extraction network and are critical for successful target recognition. These crucial low-level features may be lost if the Focus structure excessively downsamples the input, resulting in lower detection performance.
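For reference, below is a minimal sketch of a YOLOv5-style Focus block (assumed here to be representative of the structure the comment refers to; it is not the authors' exact code, and the channel sizes are arbitrary). The slicing rearranges pixels into channels rather than discarding them, but the spatial resolution of the earliest feature maps is still halved, which is the concern raised above.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Illustrative YOLOv5-style Focus block.

    The input is split into four pixel-interleaved sub-images that are
    concatenated along the channel axis, halving spatial resolution while
    keeping every input pixel, then fused by a convolution.
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * in_ch, out_ch, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        # x: (B, C, H, W) -> (B, 4C, H/2, W/2) via interleaved slicing.
        patches = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]],
            dim=1,
        )
        return self.conv(patches)

x = torch.randn(1, 3, 64, 64)
print(Focus(3, 32)(x).shape)  # torch.Size([1, 32, 32, 32])
```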

 

Line 357: While RNNs can capture short-term temporal dependencies, they may fail to capture long-term dependencies, which are important for some sequential tasks. This constraint may impair the model's ability to forecast accurately over longer sequences.
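For context, a minimal ConvLSTM cell sketch is shown below (an illustrative PyTorch example, not the authors' code; the layer sizes and gate ordering are assumptions). The gated cell state lets evidence persist across frames longer than a plain RNN hidden state, though very long sequences can still be difficult, as the comment notes.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: gates are computed by one convolution over
    the concatenation of the current frame features and the previous
    hidden state, so spatial structure is preserved while temporal state
    is carried across frames."""
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g      # cell state accumulates temporal evidence
        h = o * c.tanh()       # hidden state passed to the next frame
        return h, (h, c)

# Toy usage: run the cell over a short sequence of feature maps.
cell = ConvLSTMCell(in_ch=8, hidden_ch=16)
h = torch.zeros(1, 16, 32, 32)
c = torch.zeros(1, 16, 32, 32)
for t in range(5):
    frame_feat = torch.randn(1, 8, 32, 32)
    out, (h, c) = cell(frame_feat, (h, c))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```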

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Accept


Reviewer 2 Report

The revised version looks fine to me.
