Review Reports
- Jinhang Liu,
- Chenxu Yang,
- Jing Wang,
- et al.
Reviewer 1: Yong Song
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a remote sensing object detection network based on Mamba and Multi-Head Self-Attention, offering a detailed and logical explanation of the network structure and principles. The technical novelty of the proposed approach appears limited. However, the study is well-supported by sufficient data and effectively demonstrates the advantages of the FI-MambaNet.
Several issues should be addressed:
- Although FI-MambaNet is noted for its lower computational complexity and fewer parameters, it would be more intuitive to describe its efficiency using a direct performance metric. Adding an FPS (frames per second) comparison to Table 2 would provide a clearer assessment of its inference speed.
- The visual comparison of data in some figures (e.g., Figure 7) lacks sufficient clarity, and it is suggested to enhance the visual contrast for better readability.
- Regarding the detection results of the network on the DOTA 1.0 test set, although the proposed network performs well in most categories, its performance on the SH and ST categories is significantly weaker compared to mainstream networks.
- The confusion matrix reveals that many categories exhibit a certain probability of missed detections, indicating that further improvements are needed in this aspect.
- In Figure 6, connecting the mAP values of different networks with a line chart does not serve a meaningful purpose.
- After incorporating the MSA-Mamba and MCSA modules in the ablation study, the frame rate experiences a significant decline. Is there any approach to improve the frame rate?
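For illustration only, the FPS figure requested above could be obtained along the following lines. This is a minimal sketch, not the authors' benchmarking procedure: `measure_fps`, `infer`, `warmup`, and `repeats` are hypothetical names, and the detector is assumed to be callable on one frame at a time.

```python
import time
from typing import Callable, Sequence


def measure_fps(
    infer: Callable[[object], object],
    frames: Sequence[object],
    warmup: int = 5,
    repeats: int = 3,
) -> float:
    """Return the average frames per second of `infer` over `frames`.

    `infer` is a hypothetical stand-in for a detector's forward pass.
    """
    if not frames:
        raise ValueError("frames must not be empty")

    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for frame in frames[:warmup]:
        infer(frame)

    start = time.perf_counter()
    for _ in range(repeats):
        for frame in frames:
            infer(frame)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading for extremely fast callables.
    return (repeats * len(frames)) / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Dummy "model" that simply returns its input.
    print(measure_fps(lambda frame: frame, list(range(100))))
```

When the model runs on a GPU, the device generally has to be synchronised before each clock reading (for example with `torch.cuda.synchronize()` in PyTorch), otherwise asynchronous kernel launches make the reported FPS too optimistic. All compared networks would also need the same hardware, input size, and batch size for the numbers to be comparable.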
Author Response
We sincerely appreciate your valuable time and constructive comments on our manuscript. Your insightful suggestions have greatly helped us improve the quality and clarity of the paper.
We have carefully addressed all the comments, and the detailed point-by-point responses are provided in the attached PDF file for your review.
Thank you again for your constructive feedback and kind support.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
(1) The multi-scale feature extraction and fine-grained contextual information acquisition techniques employed in this paper, which utilize neural networks and deep learning algorithms for remote sensing images, are widely adopted methods in remote sensing target detection. Please further highlight the contributions of this paper by conducting comparative analyses with existing literature.
(2) A significant and prevalent challenge in remote sensing imagery involves small targets that are prone to rotation. Have the authors considered and researched this critical challenge? Please supplement with relevant experiments and discussion.
(3) Table 1 presents the comparison results between the proposed method and existing methods across multiple evaluation metrics. However, for several metrics, such as SV, SH, ST, and SP, the proposed method significantly underperforms existing methods, failing to validate its performance advantages.
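As an aside, the per-category comparison raised in point (3) can be made explicit with a small helper like the one below. It is a sketch under stated assumptions: `categories_behind` and its arguments are hypothetical, and the AP values in the usage example are toy numbers, not figures from Table 1.

```python
from typing import Dict, List, Tuple


def categories_behind(
    proposed: Dict[str, float],
    baselines: Dict[str, Dict[str, float]],
    margin: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """List (category, best_baseline, gap) where the proposed method's AP
    trails the best baseline by more than `margin`, largest gap first."""
    behind = []
    for category, ap in proposed.items():
        # Collect every baseline that reports this category.
        candidates = [
            (scores[category], name)
            for name, scores in baselines.items()
            if category in scores
        ]
        if not candidates:
            continue
        best_ap, best_name = max(candidates)
        gap = best_ap - ap
        if gap > margin:
            behind.append((category, best_name, round(gap, 2)))
    return sorted(behind, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy values for illustration only.
    proposed = {"SV": 70.0, "SH": 80.0, "PL": 90.0}
    baselines = {
        "A": {"SV": 75.0, "SH": 85.5, "PL": 89.0},
        "B": {"SV": 72.0, "SH": 88.0, "PL": 88.0},
    }
    print(categories_behind(proposed, baselines))
```

Reporting the gap to the best competing method per category, rather than only the overall mAP, makes it easier to discuss where and why the proposed network falls short.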
Author Response
We sincerely appreciate your valuable time and constructive comments on our manuscript. Your insightful suggestions have greatly helped us improve the quality and clarity of the paper.
We have carefully addressed all the comments, and the detailed point-by-point responses are provided in the attached PDF file for your review.
Thank you again for your constructive feedback and kind support.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This article is technically correct, but it cannot be published in this version.
1) The abstract is too long, the issue is not well defined, and it is not self-consistent. The abstract must contain all the information about the article and, above all, it must contain the issue. There are also some unclear acronyms. In my opinion, the use of many acronyms should be avoided.
2) The introduction is OK.
Figure 1 is not self-consistent. The data flow diagram is unclear, the dimensions are not evident, and the computational steps are obscure.
Figure 2, ditto. Here, there is also the detail of MDSA, which needs to be explained better.
The use of formula nomenclature is fine, but punctuation must be used.
Figure 3, again, must be self-consistent. In addition, the output is unclear.
Experimental Results, Figure 6: use grid on.
Figure 10: Explain better and contextualize in more depth in the article.
The bibliography is fine.
Author Response
We sincerely appreciate your valuable time and constructive comments on our manuscript. Your insightful suggestions have greatly helped us improve the quality and clarity of the paper.
We have carefully addressed all the comments, and the detailed point-by-point responses are provided in the attached PDF file for your review.
Thank you again for your constructive feedback and kind support.
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
(1) The multi-scale feature extraction and fine-grained contextual information acquisition techniques employed in this paper, which utilize neural networks and deep learning algorithms for remote sensing images, are widely adopted methods in remote sensing target detection. Please further highlight the contributions of this paper by conducting comparative analyses with existing literature.
(2) A significant and prevalent challenge in remote sensing imagery involves small targets that are prone to rotation or whose detection is time-consuming, as addressed in "Gaussian-based R-CNN with large selective kernel for rotated object detection in remote sensing images" (2025) and "Lightweight Ship Object Detection Algorithm for Remote Sensing Images Based on Multi-scale Perception and Feature Enhancement" (2025). Have the authors considered and researched this critical challenge? Please supplement with relevant experiments and discussion, comparing with the algorithms mentioned above.
(3) Table 1 presents the comparison results between the proposed method and existing methods across multiple evaluation metrics. However, for several metrics, such as SV, SH, ST, and SP, the proposed method significantly underperforms existing methods, failing to validate its performance advantages.
Author Response
We sincerely thank the reviewers for their thoughtful and constructive feedback.
We have carefully revised the manuscript based on all suggestions and provided detailed responses to each comment in the accompanying response document.
We are deeply grateful to the reviewers for their time and effort, which have helped us enhance the quality of our work.
Author Response File:
Author Response.pdf