Article
Peer-Review Record

Three-Dimensional Trajectory and Resource Allocation Optimization in Multi-Unmanned Aerial Vehicle Multicast System: A Multi-Agent Reinforcement Learning Method

Drones 2023, 7(10), 641; https://doi.org/10.3390/drones7100641
by Dongyu Wang 1,*, Yue Liu 1, Hongda Yu 1 and Yanzhao Hou 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 30 August 2023 / Revised: 28 September 2023 / Accepted: 18 October 2023 / Published: 19 October 2023
(This article belongs to the Special Issue Resilient Networking and Task Allocation for Drone Swarms)

Round 1

Reviewer 1 Report

1. Authors need to discuss the limitations of the existing methods in more detail to highlight the research gaps.

2. The "K-medoids algorithm" needs a reference in the introduction section.

3. Authors need to highlight the novel contribution(s) of the paper in the introduction section.

4. Is any obstacle avoidance considered in the proposed approach?

5. Provide a flowchart of the methodology to make it easier for the reader to follow.

6. Figure 7 needs an explanation. What do the triangle, cross, and dot in Figure 7 mean?

7. How did you calculate the performance indicator of the system?

8. Authors need to include future directions of the work.

Author Response

Thank you for your valuable suggestions. We have provided a point-by-point response in the cover letter. Please see the attachment. Thank you very much.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a multi-UAV multicast system in which a multi-agent reinforcement learning method helps UAVs determine the optimal altitude and trajectory. However, I have the following comments:

1. The introduction is too long and lacks a clear motivation and statement of the paper's contributions. I recommend splitting the related-work part into a new section.

2. Many abbreviations are left undefined, such as MADDPG, DDQN, DDPG, etc.

3. Some papers are missing from the related-work part, such as:

a. Hosny, R.; Hashima, S.; Mohamed, E.M.; Zaki, R.M.; ElHalawany, B.M. Budgeted Bandits for Power Allocation and Trajectory Planning in UAV-NOMA Aided Networks. Drones 2023, 7, 518. https://doi.org/10.3390/drones7080518

b. T. L. Nguyen, G. Kaddoum, T. N. Do and Z. J. Haas, "Channel Characterization of UAV-RIS-Aided Systems With Adaptive Phase-Shift Configuration," IEEE Wireless Communications Letters, 2023, doi: 10.1109/LWC.2023.3306553.

c. Y. Ge, J. Fan and J. Zhang, "Active Reconfigurable Intelligent Surface Enhanced Secure and Energy-Efficient Communication of Jittering UAV," IEEE Internet of Things Journal, 2023, doi: 10.1109/JIOT.2023.3304004.

d. H. Son and M. Jung, "Phase Shift Design for RIS-Assisted Satellite-Aerial-Terrestrial Integrated Network," IEEE Transactions on Aerospace and Electronic Systems, 2023, doi: 10.1109/TAES.2023.3301464.

e. W. Feng et al., "Resource Allocation for Power Minimization in RIS-assisted Multi-UAV Networks with NOMA," IEEE Transactions on Communications, 2023, doi: 10.1109/TCOMM.2023.3298984.

4. At the end of the related-work part, the authors should explain how this paper differs from the related work.

5. It is unclear which RL technique is utilized in this paper: centralized or decentralized.

6. In Figs. 1 and 2, no IRS is indicated. The authors should mark and label it.

7. Some passages have a high plagiarism ratio, such as lines 76-78, 127-128, 161-164, 173-184, 188-190, 198-202, 219-224, 243-248, 356-358, 365-368, 401-404.

8. Comparison results with DDQN and bandits should be added.

 

Some typos and errors exist.

Author Response

Thank you for your valuable suggestions. We have provided a point-by-point response in the cover letter. Please see the attachment. Thank you very much.

Author Response File: Author Response.pdf

Reviewer 3 Report

The article addresses optimizing UAV trajectories and energy consumption in the three-dimensional problem of covering ground users with a wireless network, taking into account signal blockage or significant degradation caused by buildings and structures. The article cites current research, is well structured, proposes improvements to existing approaches to the optimization problem, and, in my opinion, contains no significant critical problems.

However, it is advisable to revise the article taking into account the following comments:

1. The problem statement needs clarification, since the problem can easily be split into two: static planning of the distribution of objects on the ground, and dynamic distribution of objects that accounts for the characteristics of the environment. For the first problem there are already many well-proven solutions, since areas with insufficient coverage are predominantly static. It is also evident that, in such a formulation, the dynamic distribution problem has a significantly smaller space of feasible solutions, and in most cases the position would only be adjusted along the vertical axis.

2. In light of Remark 1, additional justification is needed for the choice of multi-agent learning algorithms. It would also be appropriate to demonstrate the advantages of the proposed approach, including a comparison with classical algorithms.

3. The process of dividing users into groups is confusing. Given the possible movement of the GUs and of the UAVs, these groups may change constantly and the algorithm's cycle would never complete, unless the grouping is performed on data received once, without accounting for real-time changes in user locations. The article should also state how often the groups of ground users are redistributed, and how the correct value of k is chosen.

4. The analysis of the experimental results is confusing. The article claims optimization of resource consumption and trajectory. The results compare throughput and the reward function, which can reasonably be read as an analysis of the originally formulated optimization problem, but the trajectory comparison is unclear: the trajectory is shown only for the proposed method, not for the other two methods it is compared against. Please also clarify whether, in any of the compared algorithms, a user (or group of users) can be blocked by being located outside the access zone, and how much time (how many episodes) is needed to correct this situation during training.

5. Figure 5 is essentially a copy from the original work arXiv:1706.02275v4 and adds nothing new to the formulation of the solution.

6. In some formulas, not all symbols are defined.

7. Figure 8: the text uses "episode" throughout, but the caption says "epoch".

 

Minor editing of the English language is required.

Author Response

Thank you for your valuable suggestions. We have provided a point-by-point response in the cover letter. Please see the attachment. Thank you very much.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have successfully addressed my comments. I accept the paper for publication. Great work!

Minor English typos remain.

Reviewer 3 Report

The authors provided acceptable responses to the comments. The materials of the article have been supplemented in accordance with the recommendations.

-
