UAV-Mounted Base Station Coverage and Trajectory Optimization Using LSTM-A2C with Attention
Round 1
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
Concerns are answered.
Author Response
Thank you for your review.
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
This revised version of the manuscript has been substantially improved, with the authors addressing all of my previous comments.
Author Response
Thank you for your review.
Reviewer 3 Report (Previous Reviewer 3)
Comments and Suggestions for Authors
The manuscript has been significantly improved, and I am satisfied with the authors' detailed revisions.
I have two minor suggestions to further enhance the paper before publication.
First, a small point of clarification regarding the methodology. In Section 3.1, the model is simplified to a 2D trajectory optimization problem by assuming a constant UAV altitude. While this is a reasonable simplification and is noted in the future work, it might be beneficial to briefly state this assumption more explicitly in the methodology section itself, perhaps with a short sentence justifying why this is a common and valid approach for this type of problem.
Second, to further strengthen the literature review, I was pleased to see the discussion on the synergy with emerging physical-layer technologies like Intelligent Reflecting Surfaces (IRS) on page 4. To enhance this forward-looking point, the discussion could be slightly expanded to include recent advancements in how AI is used to jointly optimize these complex systems. For instance, the use of deep reinforcement learning (DRL) to manage resources and maximize energy efficiency in IRS-assisted networks is a highly relevant area, as explored in works like "Deep Reinforcement Learning for Energy Efficiency Maximization in RSMA-IRS-Assisted ISAC System". A brief mention of this line of research would nicely complement the paper's context within the broader 6G landscape.
Author Response
Comments 7: First, a small point of clarification regarding the methodology. In Section 3.1, the model is simplified to a 2D trajectory optimization problem by assuming a constant UAV altitude. While this is a reasonable simplification and is noted in the future work, it might be beneficial to briefly state this assumption more explicitly in the methodology section itself, perhaps with a short sentence justifying why this is a common and valid approach for this type of problem.
Response 7: Thank you for this precise and helpful note. We have inserted a clear, justified statement in Section 3.1 (page 5, line 216 in the revised manuscript), affirming the 2D simplification as standard practice that sharpens focus on horizontal dynamics while preserving line-of-sight reliability. Revised text (concise): "UAV-BS flies at fixed 50 m altitude... reducing 3D to 2D horizontal planning—a widely validated approach in UAV-BS literature that isolates maneuverability from vertical variations..."
Comments 8: Second, to further strengthen the literature review, I was pleased to see the discussion on the synergy with emerging physical-layer technologies like Intelligent Reflecting Surfaces (IRS) on page 4. To enhance this forward-looking point, the discussion could be slightly expanded to include recent advancements in how AI is used to jointly optimize these complex systems. For instance, the use of deep reinforcement learning (DRL) to manage resources and maximize energy efficiency in IRS-assisted networks is a highly relevant area, as explored in works like "Deep Reinforcement Learning for Energy Efficiency Maximization in RSMA-IRS-Assisted ISAC System". A brief mention of this line of research would nicely complement the paper's context within the broader 6G landscape.
Response 8: Thank you for this forward-looking suggestion. We have expanded the Related Work section (page 6, line 231 in the revised manuscript) with a paragraph discussing DRL-driven RSMA-IRS-ISAC synergy [11048955], positioning our trajectory framework as a complementary building block of resilient, AI-orchestrated 6G networks. Revised text (concise): "Our framework synergizes beautifully with IRS and NOMA... recent DRL breakthroughs in RSMA-IRS-ISAC for energy efficiency pave the way for integrated 6G systems where UAVs, reflection, and sensing converge..."
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
This paper proposes an attention-enhanced LSTM-A2C reinforcement learning framework for optimizing UAV-mounted base station trajectories in disaster scenarios. The method maximizes fair user coverage under energy, wind, and obstacle constraints by integrating temporal memory with attention-based focus on unserved users. A nine-direction mobility model and fairness-driven reward design further improve adaptability. Simulation results show better performance compared with baseline RL methods, demonstrating the framework’s efficiency, fairness, and scalability for 6G-enabled disaster communications.
Generally, this is an interesting and well-written paper. I only have a few suggestions before its publication.
1 The topic, coverage analysis and optimization in UAV networks using RL, is already a well-studied area. The authors should clarify the key distinctions between this study and prior works in UAV trajectory optimization and RL-based coverage control. I also suggest enriching the Introduction with recent and timely research directions, such as UAV networks for low-altitude economy [R1] and UAV-enabled Integrated Sensing and Communication (ISAC)[R2], to highlight the paper’s contemporary relevance.
[R1]Z. Lyu, Y. Gao, J. Chen, H. Du, J. Xu, K. Huang, D. Kim, "Empowering Intelligent Low-altitude Economy with Large AI Model Deployment", 2025. Online. Available: https://arxiv.org/pdf/2505.22343
[R2]X. Jing, F. Liu, C. Masouros, and Y. Zeng, "ISAC From the Sky: UAV Trajectory Design for Joint Communication and Target Localization," in IEEE Transactions on Wireless Communications, vol. 23, no. 10, pp. 12857-12872, Oct. 2024.
2 Please explicitly emphasize the technical contributions. Under the conventional LSTM-A2C framework, what unique architectural modifications or design enhancements are introduced here?
3 As traditional coverage optimization studies often rely on optimization theory, the rationale for employing RL instead should be justified, what specific challenges or dynamic conditions motivate this choice?
4 Fairness and energy are global mission-level objectives, while RL updates policies step-by-step through immediate rewards. How are these long-horizon constraints decomposed into the MDP formulation? Please explain how global fairness and energy objectives are incorporated into per-step design without misaligning with the optimization goal.
5 Please justify why considering the benchmark schemes (DDQN, A2C, etc.).
6 Please discuss how the grid resolution and mission area discretization affect both performance and computational complexity. This sensitivity discussion would strengthen the paper’s methodological robustness.
Author Response
Comments 1: The topic, coverage analysis and optimization in UAV networks using RL, is already a well-studied area. The authors should clarify the key distinctions between this study and prior works in UAV trajectory optimization and RL-based coverage control. I also suggest enriching the Introduction with recent and timely research directions, such as UAV networks for low-altitude economy [R1] and UAV-enabled Integrated Sensing and Communication (ISAC)[R2], to highlight the paper’s contemporary relevance.
Response 1: Thank you for this excellent suggestion. We have enriched the Introduction (page 1, line 30 in the revised manuscript) by integrating the recommended references [R1] and [R2], connecting our fairness-driven RL framework to the emerging low-altitude economy (LAE) and UAV-enabled ISAC research directions. This strengthens the paper's contemporary relevance and also sharpens our distinction from prior RL works through the joint emphasis on fairness-aware rewards, nine-direction mobility, and attention-guided adaptation in dynamic disaster environments. Revised text (concise): "Beyond disaster relief, UAV networks are increasingly pivotal in emerging paradigms such as the low-altitude economy (LAE)... Our work aligns with these timely directions by advancing equitable, adaptive UAV-BS control..."
Comments 2: Please explicitly emphasize the technical contributions. Under the conventional LSTM-A2C framework, what unique architectural modifications or design enhancements are introduced here?
Response 2: Thank you for highlighting this point. The original manuscript already listed the main novelties in the contribution section; we have therefore retained and strengthened the Key Contributions subsection (pages 1–3, starting from line 30 in the revised manuscript), which now explicitly enumerates three enhancements over the conventional LSTM-A2C framework: an attention-embedded LSTM-A2C for context-aware focus on unserved users, memory-tracked fairness rewards for equitable service, and multi-objective energy-aware optimization. Revised text (concise): "Novel LSTM-A2C with Attention Framework... Fairness-Aware Coverage Optimization... Comprehensive Evaluation and Scalable Insights..."
Comments 3: As traditional coverage optimization studies often rely on optimization theory, the rationale for employing RL instead should be justified, what specific challenges or dynamic conditions motivate this choice?
Response 3: Thank you for prompting this important justification. We have added a new paragraph in the Introduction (page 2, line 51 in the revised manuscript) explaining why RL is better suited than classical optimization to highly dynamic disaster environments: non-stationary winds and user distributions, partial observability, large action spaces, and fairness-energy trade-offs are difficult to capture in closed-form formulations. Revised text (concise): "We embrace reinforcement learning (RL) because disaster environments shatter traditional assumptions—non-stationary dynamics, partial observability, high-dimensional actions, and fairness under uncertainty demand adaptive, memory-rich policies..."
Comments 4: Fairness and energy are global mission-level objectives, while RL updates policies step-by-step through immediate rewards. How are these long-horizon constraints decomposed into the MDP formulation? Please explain how global fairness and energy objectives are incorporated into per-step design without misaligning with the optimization goal.
Response 4: Thank you for this deep technical insight. We have inserted a clear, explanatory paragraph in (page 10, line 357 in the revised manuscript) after Eq. (9), demonstrating how global fairness and energy are integrated into every step via direct reward inclusion, with memory and attention ensuring long-term harmony and convergence. Revised text (concise): "Global fairness and energy are seamlessly decomposed into per-step rewards... memory and attention prevent misalignment, guiding the agent toward equitable, efficient long-horizon outcomes..."
Comments 5: Please justify why considering the benchmark schemes (DDQN, A2C, etc.).
Response 5: Thank you for encouraging methodological rigor. We have introduced a dedicated paragraph at the start of Section V (page 18, line 575 in the revised manuscript), explaining the progression from DDQN (value-based learning) to A2C (actor-critic policy learning) to LSTM-A2C (temporal memory), which isolates the contribution of our attention and fairness mechanisms against increasingly capable baselines. Revised text (concise): "We benchmark against a progressive RL lineage—DDQN, A2C, and LSTM-A2C—under identical 1500-unit energy caps to isolate the transformative impact of our attention and fairness mechanisms..."
Comments 6: Please discuss how the grid resolution and mission area discretization affect both performance and computational complexity. This sensitivity discussion would strengthen the paper’s methodological robustness.
Response 6: Thank you for this valuable call for robustness. We have added a theoretical analysis (page 15, line 481 in the revised manuscript) discussing how finer grids improve targeting precision and fairness but sharply increase computational complexity, and justifying M=20 as a balanced choice for real-world scalability. Revised text (concise): "Grid resolution (M=20, c=200 m) strikes a golden balance: finer grids enhance fairness and targeting, but coarsening preserves tractable training—our design ensures fidelity without computational overload..."
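As a rough numerical illustration of this trade-off (reading M = 20 and c = 200 m from the quoted text; the manuscript's own complexity analysis is authoritative), halving the cell size while keeping the mission area fixed quadruples the number of grid positions:

```latex
L_{\text{area}} = M \cdot c = 20 \times 200\,\mathrm{m} = 4\,\mathrm{km},
\qquad
c \to \tfrac{c}{2} \;\Rightarrow\; M \to 2M \;\Rightarrow\; M^{2} \to 4M^{2}.
```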
Round 2
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
Thanks to the authors for their responses. I have no further questions. I suggest accepting this work.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this study, a reinforcement learning framework is designed to optimize UAV-BS trajectories and coverage within 6G-enabled networks. The objective of the approach is to maximize the reach of unique users while prioritizing fairness in service delivery.
- The proposed method combines existing LSTM and attention mechanisms, lacking sufficient novelty. How does the proposed method achieve fair coverage?
- No explanation was provided for Figure 1.
- Many formatting issues need to be revised. For example, the format of text below Equations (1), (2), (5).
- The planning problem addressed in this paper is a 2-D path-planning task, a class that has already been extensively studied. Where does the difficulty of the present work lie?
- For optimization problem (7) what are the optimization variables?
- The paper mixes the terms “path planning” and “trajectory planning.” They are not synonymous: the former refers to the geometric selection of way-points, whereas the latter prescribes the time-varying evolution of the vehicle’s state (including timing and dynamic-performance constraints).
- The manuscript’s presentation must be strengthened: it frequently resorts to itemized lists (e.g., pp. 8, 9,10,17,19,20), a device rarely employed in formal academic writing. Please recast these passages into rigorous, continuous paragraphs.
Author Response
We sincerely thank Reviewer 1 for their constructive comments and valuable insights, which have greatly strengthened the clarity, structure, and technical rigor of the manuscript.
Below, we provide detailed responses to each point.
1. Comment:
The proposed method combines existing LSTM and attention mechanisms, lacking sufficient novelty. How does the proposed method achieve fair coverage?
Response:
We clarified the novelty of the proposed framework in Sections 1.1 and 4.3. Our contribution extends beyond a straightforward integration of LSTM and attention:
- The framework introduces a fairness-aware reinforcement learning formulation that dynamically tracks previously unserved users through a memory-based fairness state, influencing both reward and policy updates.
- Fairness is explicitly quantified through Jain’s Fairness Index (JFI) and the Coverage Disparity Index (CDI), which are integrated into both the reward function (Eq. 11) and the optimization formulation (Eq. 7); an illustrative sketch of these indices is given after this list.
- The system prioritizes balanced service delivery among all users, including sparsely distributed nodes, outperforming prior RL-based UAV frameworks that lacked such fairness enforcement.
(See pages 2–3, 6–8, and 9 of the revised manuscript.)
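For reference, the short sketch below illustrates how these two indices can be computed over per-user service counts and per-region coverage ratios. Jain's index follows its standard definition; the CDI shown here (spread between the best- and worst-covered regions) is only one plausible reading, and the manuscript's Eqs. (7) and (11) define the exact forms used.

```python
import numpy as np

def jain_fairness_index(service_counts):
    """Jain's Fairness Index: ranges from 1/n (one user gets everything) to 1.0."""
    x = np.asarray(service_counts, dtype=float)
    return x.sum()**2 / (len(x) * np.sum(x**2) + 1e-9)

def coverage_disparity_index(region_coverage):
    """Assumed CDI: gap between the most- and least-covered regions (0 = uniform)."""
    r = np.asarray(region_coverage, dtype=float)
    return r.max() - r.min()

# Example: cumulative service counts of 8 users and coverage ratios of 4 regions
print(jain_fairness_index([5, 4, 0, 3, 5, 1, 2, 4]))   # 0.75
print(coverage_disparity_index([0.9, 0.6, 0.4, 0.8]))  # 0.5
```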
2. Comment:
No explanation was provided for Figure 1.
Response:
We revised the caption and the accompanying paragraph on page 5, now providing a detailed description of the operational area, grid partitioning, UAV movement model, wind/obstacle regions, and hovering points. The visual and mathematical elements are now explicitly aligned for reader clarity.
3. Comment:
Many formatting issues need to be revised. For example, the format of text below Equations (1), (2), (5).
Response:
All formatting inconsistencies have been corrected. Text following equations now adheres to MDPI mathematical layout standards, ensuring consistent spacing, alignment, and indentation.
4. Comment:
The planning problem addressed in this paper is a 2-D path-planning task, a class that has already been extensively studied. Where does the difficulty of the present work lie?
Response:
We appreciate this insightful question. While the UAV operates in a 3D environment, the altitude is fixed for tractability, effectively projecting motion onto a 2D plane. However, the challenge lies not in geometric path planning, but in temporal and stochastic trajectory optimization under coupled energy, fairness, and environmental constraints.
The UAV must make sequential decisions that jointly consider wind disturbances, dynamic user distributions, energy depletion, and fairness-driven service memory, all interacting through a multi-objective reward.
These interactions yield a high-dimensional, partially observable, and non-Markovian decision space, with effective complexity scaling as $O(M^2 \times 9^T)$.
Accordingly, Section 3.1 (Remark on Problem Complexity) now elaborates that, even with 2D motion, the problem remains computationally intensive and algorithmically novel relative to conventional path-planning studies.
5. Comment:
For optimization problem (7) what are the optimization variables?
Response:
All optimization variables and parameters are now explicitly defined below Equation (7) on page 8, including:
$\pi$ (policy), $p_t$ (UAV position), $E_t$ (energy), $K_t$ (attached users), and parameters $\lambda, \kappa, E, O, L, T$.
This addition improves transparency and completeness.
6. Comment:
The paper mixes the terms “path planning” and “trajectory planning.” They are not synonymous: the former refers to geometric selection of waypoints, whereas the latter prescribes time-varying evolution of state.
Response:
We thank the reviewer for catching this distinction. The manuscript was carefully revised to consistently use the term “trajectory optimization” throughout, aligning with the correct dynamic interpretation. A short clarifying note was added in Section 3.1.
7. Comment:
The manuscript’s presentation must be strengthened: it frequently resorts to itemized lists (e.g., pp. 8, 9,10,17,19,20), a device rarely employed in formal academic writing. Please recast these passages into rigorous, continuous paragraphs.
Response:
We revised the identified sections (pp. 8–20), rewriting all itemized lists as continuous narrative paragraphs. This significantly improved readability and ensured a formal academic style consistent with Drones formatting standards.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper introduces a novel LSTM-A2C reinforcement learning framework with attention for optimizing UAV-mounted base station trajectories and coverage in disaster scenarios. It effectively combines temporal modeling and selective focus to handle dynamic, partially observable environments, emphasizing fairness alongside coverage and energy efficiency. Supported by comprehensive simulations, the approach outperforms strong baselines in user coverage, mission completion, and spatial equity, demonstrating both technical innovation and practical relevance for real-time emergency communication systems. I have the following suggestions to improve the manuscript:
- I suggest improving Figure 1 because it is difficult to read the words since they are small and slanted.
- The domains of the variables in Equation (1) should be explicitly defined. For example, it would be helpful to indicate whether I_in_coverage is a binary variable.
- The authors should better explain the movements of the UAV, since if the UAV moves at a constant speed for each t and travels a distance of c/2, it will not reach the vertices of the small square in the diagonal movements as it appears in Fig. 1 with red dots. Specifically, the distance to the center would be sqrt(2)/2*c.
- The system model does not specify the mobility model of the ground users. It should be clarified whether users follow the same mobility model as the UAVs or a different one.
- The propulsion energy model should be explicitly presented, including the main parameters involved in its formulation. This clarification is critical, as energy consumption is incorporated as a constraint in the optimization problem. In Section V, energy is expressed in units per movement. This point should be clarified to avoid ambiguity.
- It is recommended to provide additional details about the optimization problem, as it represents the main objective of the work. In particular, the formulation of the objective function and the implications of the constraints should be discussed in greater depth.
- Current latency measurement is included in the state representation, but this parameter is not analyzed in the system model.
- The authors must investigate the optimal combination of hyperparameters, at least as part of future work. This aspect is important, as such an analysis could potentially enhance the reported results.
- The authors should clarify the criteria used to assign weights to the different components of the multi-objective problem presented in Equation (24), as this information is essential for understanding the formulation.
- Sections 5 and 6 are presented with the same title, which should be revised to avoid confusion.
Author Response
We thank Reviewer 2 for recognizing the technical depth of our study and for the constructive comments that enhanced model clarity and physical grounding.
1. Comment:
I suggest improving Figure 1 because it is difficult to read the words since they are small and slanted.
Response:
Figure 1 was redrawn with larger, non-slanted labels, higher-resolution text, and clear zone demarcations. The new figure meets MDPI graphical clarity standards.
2. Comment:
The domains of the variables in Equation (1) should be explicitly defined. For example, it would be helpful to indicate whether I_in_coverage is a binary variable.
Response:
We added explicit domain definitions:
$I_{\text{in\_coverage}}(i, t),\ I_{\text{served}}(i, t-1) \in \{0,1\}$ (page 6).
A note now explains that these binary indicators determine user coverage and fairness tracking.
3. Comment:
The authors should better explain the movements of the UAV, since if the UAV moves at a constant speed for each t and travels a distance of c/2, it will not reach the vertices of the small square in the diagonal movements as it appears in Fig. 1 with red dots. Specifically, the distance to the center would be sqrt(2)/2*c.
Response:
We incorporated this correction in Equation (3) (page 6) and the accompanying text, clarifying that diagonal displacements equal $\sqrt{2}/2 \cdot c$ rather than $c/2$, so the motion geometry in Fig. 1 is represented consistently across all directions.
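For completeness, the corrected diagonal step length follows directly from the half-cell displacements along both axes:

```latex
d_{\text{diag}} \;=\; \sqrt{\left(\tfrac{c}{2}\right)^{2} + \left(\tfrac{c}{2}\right)^{2}}
\;=\; \tfrac{\sqrt{2}}{2}\,c \;\approx\; 0.707\,c,
\qquad
d_{\text{cardinal}} \;=\; \tfrac{c}{2}.
```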
4. Comment:
The system model does not specify the mobility model of the ground users. It should be clarified whether users follow the same mobility model as the UAVs or a different one.
Response:
Section 5.4 (User Distribution and Mobility Model) now clarifies that ground users are distributed across five Gaussian clusters (residential, commercial, emergency, shelter, hospital).
User positions remain stationary within each episode but are re-sampled across episodes through a Gaussian random-walk process, representing slow population mobility.
This decoupled mobility modeling captures realistic user variability without adding excessive computational complexity.
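A minimal sketch of this decoupled mobility model is given below; the five cluster centres, spreads, and drift magnitude are placeholder values chosen for illustration, not the manuscript's calibrated parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five illustrative cluster centres (metres) inside the mission area; the actual
# residential/commercial/emergency/shelter/hospital locations may differ.
cluster_centres = np.array([[800, 900], [2500, 2700], [1500, 3200],
                            [3300, 1200], [2000, 500]], dtype=float)
cluster_sigma = 150.0      # assumed intra-cluster spread (m)
walk_sigma = 50.0          # assumed per-episode drift of cluster centres (m)
users_per_cluster = 20

def sample_episode_users(centres):
    """Users stay fixed within an episode: draw them once from Gaussian clusters."""
    return np.vstack([rng.normal(c, cluster_sigma, size=(users_per_cluster, 2))
                      for c in centres])

for episode in range(3):
    users = sample_episode_users(cluster_centres)   # stationary for this episode
    # Gaussian random walk of the cluster centres between episodes (slow mobility)
    cluster_centres += rng.normal(0.0, walk_sigma, size=cluster_centres.shape)
```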
5. Comment:
The propulsion energy model should be explicitly presented, including the main parameters involved in its formulation.
Response:
Sections 3.4 and 5.2 (Energy Consumption Model) now describe the physical basis of our energy model.
Drawing from established rotary-wing UAV propulsion studies [23–26], we reference realistic hovering power (≈200–300 W) and cruising power (≈400–600 W) at 50 m altitude and 10–15 m/s speeds.
These values were normalized into discrete RL energy units (hover = 2, cardinal = 5, diagonal = 7), maintaining physical meaning while ensuring computational tractability.
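A small sketch of how these normalized costs can be applied per action is shown below; the unit values (hover = 2, cardinal = 5, diagonal = 7) and the 1500-unit mission budget are taken from the responses in this report, while the action labels are illustrative.

```python
# Discrete per-step energy costs used by the RL agent; the relative ordering
# (hover < cardinal < diagonal) mirrors the physical power figures cited above.
ENERGY_UNITS = {"hover": 2, "cardinal": 5, "diagonal": 7}

def step_energy(action: str) -> int:
    """Return the normalized energy units consumed by one UAV step (illustrative)."""
    if action == "hover":
        return ENERGY_UNITS["hover"]
    if action in {"N", "E", "S", "W"}:
        return ENERGY_UNITS["cardinal"]
    if action in {"NE", "SE", "SW", "NW"}:
        return ENERGY_UNITS["diagonal"]   # longer sqrt(2)/2 * c displacement
    raise ValueError(f"unknown action: {action}")

remaining = 1500                           # mission energy budget (units)
for a in ["N", "NE", "hover", "SW"]:
    remaining -= step_energy(a)
print(remaining)                           # 1500 - 5 - 7 - 2 - 7 = 1479
```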
6. Comment:
Provide additional details about the optimization problem.
Response:
Section 3.6 has been expanded to elaborate on the multi-objective optimization process, showing how coverage ($N_t$), fairness ($J_t$), and disparity ($CDI_t$) interact under energy and safety constraints.
7. Comment:
Current latency measurement is included in the state representation, but this parameter is not analyzed in the system model.
Response:
We acknowledge this inconsistency and have removed latency ($L_t$) from the state vector to ensure internal consistency.
A note in the Conclusion highlights that latency modeling will be incorporated in future work to support delay-sensitive QoS analysis.
8. Comment:
The authors must investigate the optimal combination of hyperparameters, at least as part of future work.
Response:
A new paragraph in the Conclusion (Section 7) commits to future hyperparameter sensitivity analysis (learning rate, entropy coefficient, attention weights) using automated tuning (e.g., grid/Bayesian optimization).
9. Comment:
The authors should clarify the criteria used to assign weights to the different components of the multi-objective problem presented in Equation (24).
Response:
Section 5.6 (Reward Function Weights) was expanded to explain the rationale behind weight selection.
The coefficients ($\eta=1.0$, $\lambda=0.7$, $\kappa=0.5$, $\mu=0.3$, $\xi=5.0$) were empirically tuned via pilot simulations, reflecting mission priorities—coverage as the primary goal, fairness and disparity penalties as secondary objectives, moderate energy penalties, and strict safety enforcement.
We also note that a Pareto-based sensitivity study is planned in future work.
10. Comment:
Sections 5 and 6 are presented with the same title, which should be revised to avoid confusion.
Response:
Corrected. The section titles now read:
- Section 5: Simulation Framework and Experimental Setup
- Section 7: Performance Evaluation
Reviewer 3 Report
Comments and Suggestions for Authors
1. The multi-objective reward function is central to the agent's learning process (Eq. 11 and later in Section 5.3.2). For enhanced reproducibility and clarity, could the authors please add a brief sentence explaining the rationale or methodology for setting the specific weight values (e.g., Coverage=1.0, Fairness=0.7, etc.)? A short note clarifying if these were determined through empirical tuning or a sensitivity analysis would be very helpful for readers looking to build upon this work.
2. A key strength of the proposed model is the interpretability offered by the attention mechanism, as excellently visualized in Figure 4. To further emphasize the practical significance of this feature, the authors could briefly expand in Section 4.3.5 on how this interpretability could be leveraged by a human operator in a real-world disaster scenario (e.g., for post-mission analysis or for building trust in the autonomous system's decisions). This would strengthen the connection between the algorithmic innovation and its operational value.
3. The literature review in Section 2 provides a solid foundation. To further enrich the context, the authors might consider adding a brief comment on how their trajectory optimization approach could potentially synergize with other emerging physical-layer technologies aimed at improving UAV communication efficiency. For instance, recent research has explored the use of Intelligent Reflecting Surfaces (IRS) and advanced multiple access schemes to enhance energy efficiency and coverage in challenging non-line-of-sight conditions. Positioning the current work alongside these complementary research avenues would provide readers with a more holistic view of the innovation landscape.
4. The analysis of computational trade-offs in Section 6.4.2 is insightful, highlighting that the proposed method's increased training cost yields significant gains in deployment efficiency and performance. This is a crucial finding. The manuscript would benefit from explicitly restating this important trade-off in the Conclusion section. This would ensure that readers capture this key takeaway message about the value proposition of investing more computational resources during the offline training phase.
5. Table 1 provides a clear and effective comparison of the current work against the authors' prior work. To make the "State Space Complexity" row even more intuitive, consider adding a brief footnote explaining how the O(400 × 4²⁰) and O(841 × 9²⁵) expressions are derived (e.g., "(Number of Positions) × (Number of Actions)^(Episode Length)"). This is a minor addition that would enhance the table's accessibility for a broader audience.
Author Response
We thank Reviewer 3 for the encouraging comments and specific suggestions that helped refine both the clarity and interpretability of the manuscript.
1. Comment:
The multi-objective reward function is central to the agent’s learning process. For enhanced reproducibility and clarity, could the authors please add a brief explanation of the rationale for the specific weight values (Coverage=1.0, Fairness=0.7, etc.) and indicate if these were determined through empirical tuning or sensitivity analysis?
Response:
Section 5.6 (Reward Function Weights) has been expanded to detail the rationale for each coefficient in Equation (23).
Weights ($\eta=1.0$, $\lambda=0.7$, $\kappa=0.5$, $\mu=0.3$, $\xi=5.0$) were determined through pilot simulations, serving as an empirical sensitivity analysis to ensure stable convergence and balanced trade-offs among objectives.
The fairness term now explicitly combines Jain’s Index and the Coverage Disparity Index (CDI) for user- and region-level equity.
2. Comment:
A key strength of the model is interpretability via attention. Please expand on how this could aid real-world human operators.
Response:
We expanded Section 4.3.5 to explain that the attention mechanism enhances human interpretability by highlighting which regions or user clusters influenced UAV decisions.
These visual attention maps can support post-mission analysis, improve situational awareness, and foster trust in autonomous UAV systems during disaster response.
3. Comment:
Consider adding a discussion on potential synergy with emerging physical-layer technologies such as IRS or advanced multiple-access schemes.
Response:
We integrated this discussion at the end of Section 2 (Related Work), explaining how our RL-based trajectory optimization can synergize with Intelligent Reflecting Surfaces (IRS) and Non-Orthogonal Multiple Access (NOMA) to further enhance energy and spectrum efficiency in 6G networks.
4. Comment:
Restate the computational trade-off between training cost and performance in the conclusion.
Response:
Section 6 (Conclusion) now reiterates that the proposed model’s higher offline training cost is justified by significant deployment-time efficiency, fairness, and energy gains, underscoring the practical trade-off.
5. Comment:
Add a brief footnote explaining the derivation of the “State Space Complexity” expressions in Table 1.
Response:
A footnote was added below Table 1 clarifying that the complexity expressions follow (Number of Positions) × (Number of Actions)^(Episode Length), providing a clear derivation example for both the previous and current frameworks.
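Applying that footnote formula to the expressions quoted in the reviewer's comment, the derivation reads as follows (the position counts, action counts, and horizons are simply read off the stated numbers):

```latex
\text{Complexity} = |\mathcal{P}| \times |\mathcal{A}|^{T}
\;\Rightarrow\;
\underbrace{O(400 \times 4^{20})}_{\text{previous: 400 positions, 4 actions, } T=20}
\quad\text{and}\quad
\underbrace{O(841 \times 9^{25})}_{\text{current: 841 positions, 9 actions, } T=25}.
```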
