Open Access Article

Peer-Review Record

UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning

Remote Sens. 2020, 12(22), 3789; https://doi.org/10.3390/rs12223789

by Bo Li¹,*, Zhigang Gan¹, Daqing Chen² and Dyachenko Sergey Aleksandrovich³

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 20 October 2020 / Revised: 8 November 2020 / Accepted: 16 November 2020 / Published: 18 November 2020

Round 1

Reviewer 1 Report

This paper presents a Deep Reinforcement Learning (DRL) approach that enables UAVs to rapidly track moving targets in uncertain environments. A Meta-Learning approach is used alongside it to handle the tracking task. Using these methods, the UAV can sense its surroundings, map out areas, track objects, and respond directly and successfully to real-time changes in the surrounding environment. The paper is readable and well organized, the setup of the optimization problem is interesting, and the results are promising. However, revision is needed in some parts. The authors should address the following comments:

1) The authors should enrich the background information and more clearly highlight the main contributions of this paper. Some relevant and recent research efforts are not acknowledged in the introduction section, such as:

[R1] Wan, K.; Gao, X.; Hu, Z.; Wu, G. Robust Motion Control for UAV in Dynamic Uncertain Environments Using Deep Reinforcement Learning. Remote Sens. 2020, 12, 640.

[R2] S. Bhagat and P. B. Sujit, "UAV Target Tracking in Urban Environments Using Deep Reinforcement Learning," 2020 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece, 2020, pp. 694-701, doi: 10.1109/ICUAS48674.2020.9213856.

[R3] B. Yang, X. Cao, C. Yuen and L. Qian, "Offloading Optimization in Edge Computing for Deep Learning Enabled Target Tracking by Internet-of-UAVs," in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2020.3016694.

[R4] Mukherjee, A.; Misra, S.; Sukrutha, A.; Raghuwanshi, N.S. Distributed aerial processing for IoT-based edge UAV swarms in smart farming. Comput. Netw. 2020, 167, 107038.

2) A discussion about the complexity and the computation requirements of this approach should be included along with a discussion regarding the feasibility of this method in real-world scenarios. Although the selected methods seem appropriate, UAVs' limitations in terms of energy and computing resources are not considered. These limitations may restrict the application of these advanced methods.

3) The authors assume that the UAV's altitude is fixed. However, this altitude may change in practice over time, since climbing, diving, and hovering maneuvers may occur, and variations in air conditions and pressure may affect the position of the UAV. How does variation in altitude affect the results?

4) Lastly, the authors may more thoroughly discuss how this work can be expanded.

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewer’s constructive comments on our manuscript entitled “UAV Maneuvering Target Tracking in Uncertain Environments based on Deep Reinforcement Learning and Meta-learning” (remotesensing-989238). The comments are valuable and very helpful for us to revise and improve the manuscript. We have looked into the comments carefully and have made corrections and amendments accordingly.

Please see the attachment.

The comments from the Reviewer are highly appreciated.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear authors,

Please find attached your manuscript with notes, mainly comments on the English language. You need to thoroughly edit your manuscript for grammatical and syntax errors. I have made some corrections for the first paragraph only, but the whole manuscript needs rewriting.

The manuscript overall is scientifically sound and shows great potential. Thus, my recommendation is to accept it provided that the presentation improves. In the discussion section, please elaborate more on your findings and provide a thorough discussion, comparisons, the prerequisites for your system to perform, and its limitations.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer:

Please see the attachment.

The comments from the Reviewer are highly appreciated.

Author Response File: Author Response.pdf

Reviewer 3 Report

Title: “UAV Maneuvering Target Tracking in Uncertain Environments based on Deep Reinforcement Learning and Meta-learning”.

In this work the authors combine Deep Reinforcement Learning (DRL) with Meta-learning, proposing a novel approach, named Meta Twin Delayed Deep Deterministic policy gradient (Meta-TD3), to control an Unmanned Aerial Vehicle (UAV). This approach seems to allow a UAV to quickly track a target even in an environment where the motion of the target is uncertain. In addition, the authors claim that this approach could be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing.

General comment: The overall merit of this work cannot be clearly judged, since “formally” the style of this manuscript is not suitable for a standard scientific contribution. Indeed, after the “Introduction” section, a “(Materials and) Methods” section is needed, followed by a “Results” section and then “Discussion” and “Conclusion” sections. The “Results” section should be the main, crucial part of the manuscript. Nevertheless, in the current version of the manuscript it is not entirely clear what kind of results are presented to interested readers. Indeed, although the authors claim that “experimental results show that the Meta-TD3 algorithm has achieved a great improvement in terms of both convergence value and convergence rate”, only a section named “4. Simulation and Analysis” is found, where the authors claim that they “set up comparative experiments to verify the implementation effect, training efficiency and generalization ability of Meta-TD3 algorithm.” As a consequence, the current version of this manuscript could perhaps be seen as a “feasibility study” and an assessment of the effectiveness of the “Meta-TD3 algorithm”. In addition, several details in the mathematical description of the framework need to be carefully revised and improved to allow interested readers to fully appreciate, and eventually replicate, the presented work. In conclusion, this manuscript should be deeply reworked to improve its quality and impact.

Some further detailed comment:

Paragraph: “2. Problem Formulation”

*) This paragraph should be improved in order to increase its impact on the interested readership.

*) Perhaps the following paragraphs, “2.2. Target Tracking Model Based on Reinforcement Learning, 2.2.1. State Space, 2.2.2. Action Space”, should be placed in the “Methods” section of the reworked manuscript.

Lines: “In order to describe the UAV maneuvering target tracking, we construct the UAV motion model and target tracking model based on Markov Decision Process (MDP).”

*) The authors should better explain why an MDP in particular was chosen.

Lines: “The UAV motion model is the basis for completing navigation and target tracking missions. The UAV can be thought as a rigid body with forces and torques applied from the four rotors and gravity [16]. In navigation and target tracking scenario, we assume that UAV is flying at a fixed altitude. The experiments of this paper are set in a x-y plane of Cartesian inertial coordinates. The UAV motion model in Cartesian inertial coordinates is expressed as, etc.”

*) The authors should better specify the meaning of the mathematical formalism. Indeed, the vectors are neither clearly described nor plotted in Figure 1, and the time dependence of all components is not clearly explained in formulas (1). Physical dimensions should also be carefully checked. Figure 1 should be substantially reworked to make it easier to understand.

Paragraph “3. Meta-TD3 for Target Tracking Model”

*) This paragraph seems to be crucial for the whole work. However, it should be deeply reworked and improved. In particular, Figure 3 is currently not clear: please improve it to allow interested readers to understand this important part of the work. Similarly, Figures 4 and 5 are not clear: please rework and improve them. In particular, paragraph “3.2.2. Meta-learning Update Method” is quite hard to follow for non-experts. Please provide a more understandable version within the main text of the manuscript and, perhaps, a more detailed description in an appendix.

Paragraph: “4. Simulation and Analysis”

*) The authors claim that: “In this section, we set up comparative experiments to verify the implementation effect, training efficiency and generalization ability of Meta-TD3 algorithm.”

As a consequence, if I understand correctly, in this section the authors should present all the experimental results achieved in this work.

However, apparently only simulations have been performed. Therefore, the whole work seems to be a “big proof of concept” of the suitability of Meta-TD3 for the target tracking model with respect to other strategies.

In addition, the authors should provide more quantitative results by quantifying how well the tracking strategy performs under each of the different algorithms.

More specifically:

Paragraph: "4.1. Experimental Platform and Environment Setting"

*) This paragraph seems to be somewhat out of context. Perhaps it would be better suited as a description within the “Methods” section. All physical quantities should be stated precisely, with both values and units.

*) Figure 6 is not clear.

Paragraph: “4.2. Model Training and Testing”

All these figures, which should be the main results of this manuscript, are not very representative.

They should be reworked to add at least one quantitative metric related to the suitability of each strategy. Plots with respect to time could also be useful.

The 2D representation of a 3D task (tracking in a 3D environment) is sub-optimal. I understand that the authors limit this study to a constant altitude, but this is perhaps too strong a condition, one that greatly limits the interest of the presented results. I suggest, at least, a quantitative improvement of the following figures:

Figure 7. The tracking trajectory and tracking distance in Task 1.

Figure 8. Average reward trends with respect to training episodes in Task 1.

Figure 9. The tracking trajectory and tracking distance in Task 2.

Figure 10. Average reward trends with respect to training episodes in Task 2.

Figure 11. The tracking trajectory and tracking distance in Task 3.

Figure 12. Average reward trends with respect to training episodes in Task 3.

Author Response

Dear Reviewer:

Please see the attachment.

The comments from the Reviewer are highly appreciated.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The authors partially improved their work according to the comments of this reviewer.

No further comments.
