Multi-Scale Object Detection in Remote Sensing Images Based on Feature Interaction and Gaussian Distribution
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. The relationship between the GFI and RFI modules depicted in Figure 3 is ambiguous. It could be read in two ways. One reading is that the outputs of the two modules are summed by an adder and then enter the CLFI module. The other is that the GFI and RFI modules jointly constitute the CLFI module. The first reading conflicts with lines 218-220 of the article, which state that the GFI and RFI modules together form the CLFI module. The authors should revise Figure 3 to remove this ambiguity.
2. Figure 5(b) shows that the i-th layer feature maps from the GFI, RFI, and SFI modules are summed to obtain the overall i-th layer feature map. However, it remains unclear what intrinsic connection exists between the GFI and RFI modules, and why these two specifically form the CLFI module. The authors should clarify these points at the beginning of Section 3.2.
3. The unit of the angle deviation in Figure 6 needs clarification, since it is unclear whether it is expressed in degrees or radians. The authors should state the unit explicitly to avoid confusion.
4. The reasons for conducting experiments on the DOTA-v1.0 and HRSC2016 datasets should be clarified. Do these datasets adequately reflect the challenges of remote sensing images, such as large variations in object scale, diverse orientations, small and densely packed objects, and complex backgrounds?
5. In Figure 9, some small targets in the remote sensing images are not clearly visible. The authors could consider adding zoomed-in local views to show how well the proposed method detects these small targets. This would make the presentation clearer, particularly for readers interested in the method's performance on small objects.
6. Why does Table 4 not include ablation experiments that examine the individual effects of the GFI and RFI modules while keeping the SFI module active? The authors should explain this omission.
7. Please discuss more methods for infrared small target detection in the Introduction, for example:
[1] ISNet: Shape Matters for Infrared Small Target Detection, CVPR.
[2] RKformer: Runge-Kutta Transformer with Random-Connection Attention for Infrared Small Target Detection, ACM MM.
[3] Exploring Feature Compensation and Cross-Level Correlation for Infrared Small Target Detection, ACM MM.
[4] CHFNet: Curvature Half-Level Fusion Network for Single-Frame Infrared Small Target Detection, Remote Sensing.
[5] Thermodynamics-Inspired Multi-Feature Network for Infrared Small Target Detection, Remote Sensing.
[6] Dim2Clear Network for Infrared Small Target Detection, TGRS.
[7] IRPruneDet: Efficient Infrared Small Target Detection via Wavelet Structure-Regularized Soft Channel Pruning, AAAI.
Author Response
Our point-by-point response can be found in the attachment. Thanks.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
The authors propose a multi-scale object detection algorithm for remote sensing images based on feature interaction and Gaussian distribution, and verify its effectiveness on the DOTA-v1.0 and HRSC2016 datasets.
1. The first sentence of the abstract could be deleted; the abstract should be concise.
2. Remote sensing images include, at a minimum, optical and microwave images. Since this manuscript addresses optical remote sensing images, it is recommended that this be stated explicitly.
3. Ref. 21 shows little relevance to this research.
Author Response
Our point-by-point response can be found in the attachment. Thanks.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
1. The problem addressed in the abstract is not clearly stated. A multi-scale distribution is merely a characteristic of the data, not an object detection problem in itself. Please clarify the problem this paper aims to solve.
2. The terms "object" and "target" are used without a clear distinction throughout the paper. Please clarify how they are used.
3. The references in the related work section are not comprehensive; many Transformer and hybrid CNN-Transformer networks are not discussed.
4. The relationship between (a) and (b) in Figure 4 is not clear.
5. The proposed method was only evaluated with a ResNet backbone, which is not sufficient to demonstrate its generality. Please test it on different types of backbones.
Author Response
Our point-by-point response can be found in the attachment. Thanks.
Author Response File: Author Response.docx