DSAD: Multi-Directional Contrast Spatial Attention-Driven Feature Distillation for Infrared Small Target Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents comprehensive experiments and further demonstrates the effectiveness of the method on edge computing devices. To strengthen the work, I recommend adding analyses of representative failure cases, which would help clarify the method’s limitations and provide insights for future research directions.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
In this manuscript, an infrared small target detection method based on multi-directional contrast spatial attention is proposed. Some problems and suggestions are as follows.
1. From Figure 1, it appears that the decoding outputs of the teacher network and the student network do not play any role.
2. It is suggested to illustrate and discuss the convergence of the model training.
3. In Tables 1-3, the network parameters within each table are the same, but the parameters of the same network differ between tables. How was this achieved?
4. It is suggested to show some examples of erroneous detections.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
In this paper, the authors propose a novel Multi-Directional Contrast Spatial Attention-driven Feature Distillation (DSAD) method for achieving fast, high-performance IRSTD. A Multi-Directional Contrast Spatial Attention (DSA) module and a Perceptual Weighted Mean Square Error (PWMSE) method are proposed to achieve multi-directional spatial information extraction and teacher/student consistent learning, respectively. Experimental results verify the effectiveness of the method on benchmark datasets and on edge devices (i.e., NVIDIA AGX and HUAWEI Ascend-310B).
Overall, the experiments in this paper are well organized, the structure is complete, the innovative aspects are strong, and the hardware experiments enhance the persuasiveness of the study. I have several concerns about this paper:
1. I suggest that the authors provide a brief figure illustrating the motivation of this paper at the beginning.
2. Regarding the design of the distillation loss, specifically the “Perceptual Weighted Mean Square Error (PWMSE)” mentioned in Section 2.5 (Loss function), the paper only introduces the term without providing an intuitive explanation of its function. It is recommended to supplement such an explanation.
3. Consistency of terminology and uniformity of formula expressions. For example, the term “student network” is sometimes written as “student network” and sometimes as “student networks”. Similarly, other content expressions should also be checked. In formula (10), the symbol notations should be unified with those in the previous formulas.
4. Figure 5 depicts the composition of multi-directional convolution kernels. It would be beneficial to annotate the figure briefly with the corresponding degrees for the eight discrete directions (e.g., kd1 corresponds to 0°, kd2 corresponds to 45°), so that readers can immediately grasp the directional encoding.
5. The conclusion does not fully summarize potential constraints. For instance, the method is evaluated on single-frame IRSTD and does not consider the multi-frame correlation information of video sequences. Adding this discussion does not deny the effectiveness of the method; rather, it reflects the comprehensiveness of the research thinking and will not affect the value of the existing achievements.
6. The related work section mainly focuses on general feature fusion and network design. It would strengthen the paper to also include comparisons or discussions of recent works in infrared small target detection, such as fc3net, irsam, and irprune.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The paper proposes a Multi-Directional Contrast Spatial Attention-driven Feature Distillation (DSAD) method for achieving fast, high-performance IRSTD. The paper is well structured, methodologically sound, and offers significant contributions to the field. However, the manuscript requires some revisions before possible publication. Below are the reviewer's comments and suggestions aimed at enhancing the clarity, rigor, and overall quality of the manuscript.
1. The authors should simplify the abstract to ensure readability and communicability.
2. The related work section lacks a summary of the limitations of existing methods.
3. The authors should provide detailed explanations of the symbols used in the figures and ensure a clear correspondence between the illustrations and the text.
4. The manuscript does not adequately explain some of the symbols in the formulas, for instance
5. The compared algorithms are insufficient and should include more SOTA methods.
6. The authors should explain the rationale for the dataset division and make it interpretable.
7. The authors should provide sufficient explanations for the ablation experiment results.
8. The conclusion section lacks a discussion of the limitations and the prospects for future work.
9. The authors should ensure the standardization of references; for instance, Refs. 37 and 39 are exactly the same.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 2
Reviewer 4 Report
Comments and Suggestions for Authors
The authors have addressed all my comments, and I suggest accepting the manuscript in its present form.

