Target Positioning for Complex Scenes in Remote Sensing Frame Using Depth Estimation Based on Optical Flow Information
Round 1
Reviewer 1 Report
This work aims to improve the accuracy of target localization using only UAV information. With depth estimation methods, the localization errors can be effectively reduced. Experiments partly verify the effects of the proposed modules. There are some key concerns:
1. The system includes target detection and depth estimation. Most of the modules used come from previous works. The authors should state the key differences between the proposed methods and previous works. In addition, how the modules are combined should also be described. At present, the manuscript only shows the results.
2. For the experiments, I think depth estimation plays a vital role in the system. What is the effect of using different estimation methods? How is effectiveness ensured when missed detections occur? In addition, there are no full comparisons with other methods. I suggest the authors add more comparisons with similar systems or techniques.
Author Response
Please see the attachment.
Author Response File:  Author Response.docx
Reviewer 2 Report
This paper mainly discusses UAV-based target positioning methods based on video data. The idea is straightforward and technically sound. However, there are several issues that should be considered seriously.
1. The contributions should be reorganized; in particular, innovation 1 is not convincing. Otherwise, the detailed theoretical improvement should be added to the novelty description.
2. As a positioning system, the corresponding hardware should be disclosed, such as the platform, the processor, the sensors.
3. Are these video data processed onboard or on the ground? The hardware configuration should be listed in the experimental section.
4. What is the difference between the proposed method, which uses a series of images, and other monocular depth estimation methods that use only several images?
5. Is the positioning accuracy related to the UAV positioning accuracy, the flight height of the UAV, or the velocity of the UAV?
Author Response
Please see the attachment.
Author Response File:  Author Response.docx
Reviewer 3 Report
The manuscript proposes a method to improve target localization accuracy using depth estimation. The experiments in section 4 demonstrate, with quantitative metrics, that the proposed method improves the estimated depth, which seems promising. However, subsection 4.3 lacks a comparison of positioning accuracy errors between the proposed method and those using the techniques discussed in subsection 4.3. The positioning error of 8 m is not trivial. This is insufficient to support the authors’ claim that the proposed method improves target localization accuracy. A comparison with methods prevalent in the recent literature needs to be added. In addition, there are typos in the manuscript, such as in lines 176-177, 211, and 228, to name a few. A thorough proofreading of the entire manuscript should be carried out.
Author Response
Please see the attachment.
Author Response File:  Author Response.docx
Reviewer 4 Report
The manuscript proposes a positioning system for targets in complex scenes, which is composed of target detection, depth estimation, and coordinate transformation. The manuscript validates the new system in practical applications and analyzes the impact of the algorithm through experiments.
Some problems are listed below:
(1) In the introduction, the problem of applying binocular methods to localization is raised, and the scheme of the monocular localization method in remote sensing is described. However, relevant literature on monocular methods is not cited, while too much attention is paid to the description and comparison of binocular methods. In fact, the contribution statement does not even clarify whether the system in this manuscript is a monocular or a binocular method.
(2) In section 2.2.1, the related work does not show any flaws in recent works. The significance of the system in this manuscript is ignored in the current analysis.
(3) In section 3.1, a description of object detection needs to be introduced; the current content is too general, especially regarding the relationship between different frames.
(4) Regarding the algorithm proposed in section 3.2, the theoretical advantages of the proposed system are not reflected, even though this is the main innovation point of the manuscript. The corresponding methods, such as world coordinate calculation and optical flow correction, are already relatively common.
(5) In section 4.1, the superiority of the optical flow estimation model RAFT in the current scenario should be stated in the description of Fig. 4.
(6) In section 4.3, there is a lack of comparative experiments with similar methods regarding positioning in complex scenarios. Although this is the core of the manuscript, as reflected in the title, there are no quantitative data in the related experiment.
(7) Vocabulary and spelling problems need to be checked.
Author Response
Please see the attachment.
Author Response File:  Author Response.docx
Round 2
Reviewer 2 Report
Some comments have been considered seriously. However, there still exists an important issue that should be addressed.
1. In the experimental part, the intermediate results are partially absent, e.g., loss and accuracy curves. Only the experimental configuration and results are presented. Such a description will not convince readers.
Author Response
Please see the attachment.
Author Response File:  Author Response.docx
Reviewer 3 Report
The revised manuscript resolves the issues I raised previously. I suggest accepting it for publication.
Author Response
We appreciate your suggestions on our manuscript and your recognition of our work.