AoI-Aware Data Collection in Heterogeneous UAV-Assisted WSNs: Strong-Agent Coordinated Coverage and Vicsek-Driven Weak-Swarm Control
Abstract
1. Introduction
1.1. Motivations and Challenges
1.2. Solutions and Contributions
- Hierarchical heterogeneous UAV architecture: We design a two-tier framework in which a small number of high-capability H-UAVs learn to coordinate regional coverage and manage large swarms of resource-constrained L-UAVs. This architecture decomposes the global optimization problem into manageable subproblems, while coordination is maintained through power-Voronoi partitioning that adapts to workload dynamics and UAV density.
- MADRL-based intelligent coordination: We formulate the H-UAV coordination problem as a partially observable Markov decision process (POMDP) and employ multi-agent deep deterministic policy gradient (MADDPG) with centralized training and decentralized execution. This enables H-UAVs to learn coordinated policies for trajectory planning, partition management, and uplink transmission control without requiring complete environmental models, while adapting to time-varying GU demands and channel conditions.
- Scalable self-organized L-UAV swarm control: We develop a weighted Vicsek model that incorporates task-specific factors, including the GUs’ AoI, wireless link quality, and congestion avoidance, to guide L-UAV motion within Voronoi cells. This decentralized mechanism requires only local information exchange, so it scales efficiently to large L-UAV swarms while producing emergent collective behaviors such as coverage maximization and load balancing.
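To make the partitioning idea concrete, the following is a minimal sketch of a weighted (power) Voronoi assignment. It assumes the textbook power distance, squared Euclidean distance minus a per-site weight, which may differ from the exact weighting convention used in this paper. The function name `power_voronoi_assign` and all coordinates and weights are hypothetical.

```python
import numpy as np

def power_voronoi_assign(points, sites, weights):
    """Assign each point to the site with the smallest power distance.

    Assumed power distance (textbook form): ||x - p_k||^2 - w_k.
    A larger weight w_k therefore enlarges the cell of site k.
    """
    points = np.asarray(points, dtype=float)
    sites = np.asarray(sites, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise squared distances, shape (num_points, num_sites).
    diff = points[:, None, :] - sites[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    power_dist = sq_dist - weights[None, :]
    return np.argmin(power_dist, axis=1)

# Hypothetical layout: two H-UAVs and three GUs on a line.
h_uav_xy = [[0.0, 0.0], [10.0, 0.0]]
gu_xy = [[4.0, 0.0], [6.0, 0.0], [9.0, 0.0]]

# Equal weights reduce to an ordinary Voronoi partition.
equal = power_voronoi_assign(gu_xy, h_uav_xy, [0.0, 0.0])
# Raising the second weight moves the boundary toward the first H-UAV.
shifted = power_voronoi_assign(gu_xy, h_uav_xy, [0.0, 30.0])
```

With equal weights the first GU belongs to the first H-UAV. After the second weight is raised, the same GU falls into the second cell, which illustrates how adapting the weights can shift workload between regions without moving the H-UAVs.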
2. Related Work
2.1. UAV-Assisted Data Collection in Wireless Sensor Networks
2.2. Reinforcement Learning for UAV Control and Network Optimization
2.3. Swarm Intelligence and Bio-Inspired Control
2.4. Summary and Positioning of This Work
3. System Model
3.1. Heterogeneous UAV-Assisted Uplink Data Transmissions
3.1.1. Channel Model
3.1.2. GU-to-L-UAV Transmissions in Sub-Slot
3.1.3. NOMA Uplink from L-UAVs to H-UAVs in Sub-Slot
3.1.4. OFDMA Downlink from H-UAVs to RAP in Sub-Slot
3.2. Flow Conservation and Data Queue Dynamics
3.3. AoI Dynamics with L-UAV and H-UAV Queueing Under Full-Duplex Relaying
4. AoI-Aware Hierarchical MADRL for Coordinated Coverage and Collection with Hybrid UAV Swarms
4.1. AoI Minimization Problem Formulation
4.2. Power-Voronoi Partitioning with Adaptive Weights
4.3. Weighted Vicsek Model for L-UAV Mobility
- Bounded updates and numerical robustness: Our L-UAV motion update follows a bounded-step direction-field form, in which the speed is fixed (or upper-bounded) and only the heading is updated using the normalized resultant vector. Concretely, the normalization term includes a small positive constant, so the update remains well defined when the resultant vector is close to zero, prevents unbounded accelerations, and improves numerical robustness. This boundedness inherently limits abrupt changes in motion and reduces high-frequency oscillations.
- L-UAV swarm avoids excessive clustering: The resultant control vector is composed of complementary terms. In particular, the congestion avoidance force term introduces short-range repulsion among nearby L-UAVs, acting as a soft separation constraint. This mechanism prevents excessive clustering and alleviates local deadlock caused by overcrowding in the same area. In practice, the repulsion magnitude can be clipped to avoid overly stiff responses that may induce jitter.
- Feasibility gating reduces futile oscillations: The task-driven attraction components (e.g., toward high-AoI regions or relay opportunities) are gated by link feasibility indicators (such as SINR/connectivity conditions). Hence, targets that are temporarily unreachable do not generate attraction, which avoids chasing behaviors and reduces oscillations due to repeatedly switching to infeasible objectives.
- Handling dynamic Voronoi partitions: When Voronoi regions evolve due to H-UAV decisions, boundary movement can in principle cause chattering near partition edges. Our design addresses this in two ways. First, the boundary-keeping term is activated only within a buffer distance from the boundary, which introduces hysteresis and reduces sensitivity to small boundary shifts. Moreover, if an L-UAV approaches or crosses the boundary, a projection step keeps its position within the feasible region, guaranteeing region adherence. Second, we adopt an implementation-friendly time-scale separation: the H-UAVs update the partition weights (and thus the Voronoi boundaries) less frequently than the L-UAVs update their headings, or the weights are smoothed over time. This reduces high-frequency boundary fluctuations and improves stability without changing the overall framework.
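The four mechanisms above can be illustrated with a minimal sketch of one heading update for a single L-UAV. This is not the paper's exact update rule. The force forms, the parameter names (`r_sep`, `rep_max`, `eps`), and the axis-aligned box standing in for a Voronoi cell in the projection step are all assumptions made for illustration.

```python
import numpy as np

def clip_norm(vec, max_norm):
    """Scale vec down so its Euclidean norm does not exceed max_norm."""
    norm = np.linalg.norm(vec)
    return vec if norm <= max_norm else vec * (max_norm / norm)

def heading_step(pos, neighbor_headings, targets, feasible, neighbor_pos,
                 speed=1.0, r_sep=2.0, rep_max=1.0, eps=1e-6):
    """One bounded heading update (illustrative, assumed force forms)."""
    pos = np.asarray(pos, dtype=float)
    # Alignment: sum of neighbor headings, as in the classic Vicsek model.
    headings = np.asarray(neighbor_headings, dtype=float).reshape(-1, 2)
    resultant = headings.sum(axis=0)
    # Feasibility gating: infeasible targets contribute no attraction.
    for tgt, ok in zip(targets, feasible):
        if ok:
            d = np.asarray(tgt, dtype=float) - pos
            resultant = resultant + d / (np.linalg.norm(d) + eps)
    # Short-range repulsion, clipped to avoid overly stiff responses.
    repulsion = np.zeros(2)
    for q in neighbor_pos:
        d = pos - np.asarray(q, dtype=float)
        dist = np.linalg.norm(d)
        if dist < r_sep:
            repulsion = repulsion + (r_sep - dist) * d / (dist + eps)
    resultant = resultant + clip_norm(repulsion, rep_max)
    # Normalization with a small constant keeps the step length bounded.
    heading = resultant / (np.linalg.norm(resultant) + eps)
    return pos + speed * heading, heading

def project_to_box(pos, lower, upper):
    """Projection step; a box is a stand-in for the actual Voronoi cell."""
    return np.clip(np.asarray(pos, dtype=float), lower, upper)

# Hypothetical scenario: one feasible target, one infeasible target.
start = np.array([0.0, 0.0])
new_pos, heading = heading_step(
    start, [], [[10.0, 0.0], [0.0, 10.0]], [True, False], [])

# Hypothetical scenario: a single neighbor closer than r_sep.
_, away = heading_step(start, [], [], [], [[0.5, 0.0]])

projected = project_to_box([12.0, -1.0], [0.0, 0.0], [10.0, 10.0])
```

In the first scenario the infeasible target is ignored, so the L-UAV heads only toward the feasible one and moves at most `speed` per step. In the second, the clipped repulsion pushes it directly away from the nearby neighbor.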
4.4. H-UAVs’ Trajectory Planning via MADDPG
4.4.1. POMDP Formulation for H-UAV Coordination
4.4.2. DNN Updates in MADDPG
5. Numerical Results
5.1. Convergence Evaluation of SW-MADRL Framework
5.2. Trajectory Planning of the SW-MADRL Framework
5.3. AoI Performance Under Different Methods
6. Discussion
6.1. Comparison with Existing Paradigms
6.2. Advantages of the Hierarchical Architecture
6.3. Limitations and Challenges
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest







| Symbol | Description |
|---|---|
| System Parameters and Sets | |
| | Sets of H-UAVs, L-UAVs, and GUs, respectively |
| | Duration of one time slot |
| | Duration of sub-slots 1 and 2, where |
| Channel Model and Communication | |
| | Channel gain and distance between nodes a and b at time t |
| | Power of additive white Gaussian noise (AWGN) |
| | Transmit power of GU i, L-UAV j, and H-UAV k |
| UAV Dynamics and Constraints | |
| | Position and velocity of H-UAV k at time t |
| | Position and velocity of L-UAV l at time t |
| | Maximum velocity and acceleration of H-UAVs |
| | Minimum separation distance between H-UAVs and L-UAVs |
| Power-Voronoi Partitioning | |
| | Power-Voronoi cell associated with H-UAV k at time t |
| | Adaptive weight for H-UAV k in power-Voronoi diagram |
| | Set of GUs and L-UAVs within cell |
| | Total data generation rate within cell |
| | Area of power-Voronoi cell |
| SINR and Data Rates | |
| | SINR for GU i to L-UAV j link in sub-slot |
| | SINR for L-UAV j to H-UAV k link in sub-slot |
| | SINR for H-UAV k to RAP link |
| | Data rate from GU i to L-UAV j in sub-slot |
| | Data rate from L-UAV j to H-UAV k in sub-slot |
| | Data rate from H-UAV k to RAP |
| | Sum-throughput from all L-UAVs to H-UAV k |
| Queue Dynamics and Scheduling | |
| | Data queue length at L-UAV j and H-UAV k at time t |
| | Maximum buffer capacity |
| | Scheduling variable indicating GU i transmits to L-UAV j |
| | Packet size of GU i at time t |
| | Data generation rate of GU i |
| Age of Information (AoI) | |
| | Age of Information of GU i at time t |
| | End-to-end latency of GU i’s packet |
| | Indicator whether GU i’s packet was generated in slot t |
| | End-to-end success indicator for GU i |
| | Minimum and maximum AoI for normalization |
| Weighted Vicsek Model | |
| | Unnormalized direction vector for L-UAV l |
| | Alignment weight between L-UAVs l and j |
| | AoI-weighted attraction force |
| | Link quality enhancement force |
| | Congestion avoidance force |
| | Boundary confinement force |
| | AoI weight for GU i |
| | Link quality weight for L-UAV l |
| | Weight parameters for different forces |
| | Sensing radius, minimum separation radius, boundary threshold |
| MADRL Framework | |
| | Global state space |
| | Local observation space and action space of H-UAV k |
| | Local observation, action, and local reward of H-UAV k at time t |
| | Actor and critic networks for H-UAV k |
| | Parameters of actor and critic networks |
| | Discount factor for reinforcement learning |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, L.; Li, L.; Zhao, S.; Qu, D.; Xu, J. AoI-Aware Data Collection in Heterogeneous UAV-Assisted WSNs: Strong-Agent Coordinated Coverage and Vicsek-Driven Weak-Swarm Control. Sensors 2026, 26, 419. https://doi.org/10.3390/s26020419

