MA-PF-AD3PG: A Multi-Agent DRL Algorithm for Latency Minimization and Fairness Optimization in 6G IoV-Oriented UAV-Assisted MEC Systems
Highlights
- We develop a priority–fairness coupled optimization framework together with a multi-agent DRL algorithm (MA-PF-AD3PG) to jointly optimize latency, fairness, and task priority in UAV-assisted 6G IoV MEC systems.
- The proposed algorithm incorporates an occlusion-aware dynamic deadline model, fairness-aware preprocessing, and an adaptive delayed update mechanism, achieving significantly improved convergence stability and scheduling performance.
- The results demonstrate that fairness-driven multi-UAV cooperation can sustain near-perfect service fairness while reducing latency under dynamic vehicular environments.
- The findings offer practical insights for designing next-generation UAV-assisted communication systems that require balanced QoS, priority awareness, and system-wide efficiency.
Abstract
1. Introduction
2. Related Work
- Occlusion-Aware Dynamic Deadline Model: A novel environment-aware model is developed to capture the time-varying impact of occlusion on channel conditions, transforming real-time link variations into dynamic noise parameters. The model incorporates system-wide hard deadlines based on total task volume to better reflect vehicular service constraints.
- Priority–Fairness Coupled Optimization Framework: The latency minimization problem is reformulated into a two-level optimization structure, consisting of fairness-driven vehicular scheduling and UAV trajectory/resource allocation. This formulation enables adaptive decision-making while maintaining fairness and priority consistency among heterogeneous tasks.
- Multi-Agent Priority-based Fairness Adaptive Delayed DDPG (MA-PF-AD3PG) Algorithm: A novel MADRL algorithm is proposed by embedding a three-dimensional state representation—priority evaluation, connection history, and fairness factor—within an enhanced MADDPG architecture. The algorithm achieves adaptive trajectory control and offloading optimization, improving system fairness and reducing latency in dynamic 6G IoV environments.
3. System Model
3.1. Communication Model
3.2. Computation Model
3.2.1. Latency
3.2.2. Energy Consumption
3.3. Task Prioritization
3.4. User Fairness
3.4.1. Key Variable Definitions
- Service availability variable: This binary variable indicates whether a vehicle is served or not in a given time slot. It is derived from the connection variable.
- Cumulative service count: This variable represents the total number of times a vehicle has been served up to time slot t, capturing the history of service allocation.
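The two variables above can be illustrated with a minimal sketch. It assumes, for illustration only, that a vehicle counts as served in a slot when at least one UAV is connected to it. The function names and the layout of `connections` are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def service_availability(connections):
    """Return 1 for each slot in which any UAV is connected to the vehicle.

    connections[t][m] is 1 if UAV m is connected to the vehicle in slot t
    (hypothetical layout, assumed for this sketch).
    """
    return [1 if any(row) else 0 for row in connections]


def cumulative_service_count(availability):
    """Running total of served slots up to and including each slot t."""
    return list(accumulate(availability))


# Four time slots, two UAVs (illustrative data only).
connections = [[0, 1], [0, 0], [1, 0], [0, 0]]
served = service_availability(connections)   # [1, 0, 1, 0]
counts = cumulative_service_count(served)    # [1, 1, 2, 2]
```

The running total is the only history the fairness terms need here, so it can be updated incrementally each slot rather than recomputed.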
3.4.2. Modeling of Target Service Count
3.4.3. Fairness Evaluation Metrics
- Deviation variance: This metric measures the mean squared deviation between each user’s actual cumulative service count and its target service count, reflecting how well the system aligns with priority goals.
- Fairness index: This index intuitively reflects the fairness of the system at time slot t.
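As a rough illustration of the deviation variance described above, the sketch below computes the mean squared gap between actual and target service counts. The function name and the sample numbers are hypothetical. The fairness index is not sketched because its defining expression is not available in this text.

```python
def deviation_variance(actual_counts, target_counts):
    """Mean squared deviation between actual and target service counts."""
    if not actual_counts or len(actual_counts) != len(target_counts):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - g) ** 2 for a, g in zip(actual_counts, target_counts)]
    return sum(squared) / len(squared)


# Three users; targets would come from the priority-based model (values are illustrative).
actual = [4, 2, 3]
target = [3, 3, 3]
variance = deviation_variance(actual, target)  # (1 + 1 + 0) / 3
```

A value of zero means every user has received exactly its target number of services; larger values indicate that allocation has drifted away from the priority goals.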
3.5. Formulation of the Optimization Problem
- Vehicle positions are confined within the service area
- UAV flight control constraints
- Task priority constraint
- UAV energy constraint
- Unique association constraint
- Minimum task completion requirement
- UAV collision-avoidance constraint
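Two of the constraints listed above lend themselves to a simple feasibility check. The sketch below is an assumption-laden illustration: it treats collision avoidance as a minimum pairwise distance between UAVs and treats unique association as each vehicle being linked to at most one UAV. The threshold, the function names, and the data layout are hypothetical, and the paper's exact constraint expressions may differ.

```python
from itertools import combinations
from math import dist


def uavs_are_separated(positions, min_distance):
    """True if every pair of UAVs is at least min_distance apart."""
    return all(dist(p, q) >= min_distance for p, q in combinations(positions, 2))


def association_is_unique(connections):
    """True if each vehicle is associated with at most one UAV.

    connections[k][m] is 1 if vehicle k is associated with UAV m
    (hypothetical layout, assumed for this sketch).
    """
    return all(sum(row) <= 1 for row in connections)


# Two UAVs at 100 m altitude, 50 m apart horizontally (illustrative values).
uav_positions = [(0.0, 0.0, 100.0), (30.0, 40.0, 100.0)]
safe = uavs_are_separated(uav_positions, min_distance=50.0)    # True
unique = association_is_unique([[1, 0], [0, 1], [0, 0]])       # True
```

Checks like these are typically applied to a candidate action before it is executed, with violations either clipped or penalized in the reward.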
4. Optimization Framework and Algorithm Development
4.1. MDP Formulation
4.1.1. State Space
4.1.2. Action Space
4.1.3. Reward Function
4.2. The MADDPG Algorithm
4.2.1. Network Architecture
1. Actor Network and Actor Target Network
2. Critic Network and Critic Target Network
4.2.2. Experience Replay
4.2.3. Network Parameter Update
1. Critic Network Update
2. Actor Network Update
3. Soft Update of Target Networks
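The critic target and the soft update named above follow the standard DDPG pattern, which the sketch below illustrates on plain Python lists. It is a minimal sketch of the textbook rules, not the paper's implementation. The parameter names are hypothetical, and the values 0.5 and 0.01 are the discount factor and soft update coefficient listed in the hyperparameter table.

```python
def td_target(reward, next_q, discount, done=False):
    """One-step target for the critic: r + discount * Q_target(s', a')."""
    return reward if done else reward + discount * next_q


def soft_update(target_params, online_params, tau):
    """Blend online weights into the target weights: tau * w + (1 - tau) * w_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


# Illustrative numbers only.
y = td_target(reward=1.0, next_q=4.0, discount=0.5)         # 3.0
new_target = soft_update([0.0, 2.0], [1.0, 2.0], tau=0.01)  # first weight moves 1% toward 1.0
```

With a small coefficient the target networks trail the online networks slowly, which keeps the critic's regression target from shifting abruptly between updates.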
4.3. The MA-PF-AD3PG Algorithm
4.3.1. Discrete Service Selection Preprocessing
| Algorithm 1 Fairness-Guaranteed Scheduling Algorithm (FGSA) |
|---|
| Require: Task set; priority mapping rule; fairness factor; time horizon T. Ensure: Service object selection result. |
4.3.2. Network Delayed Update Mechanism
4.3.3. MA-PF-AD3PG Algorithm Overview
| Algorithm 2 MA-PF-AD3PG Training Procedure |
|---|
| Require: System parameters. Ensure: Optimized policies. |
4.3.4. Time Complexity of MA-PF-AD3PG
5. Simulation Analysis
5.1. Validation of the Priority-Based Fairness Mechanism
5.2. Impact of the Delayed Update Mechanism
5.3. Comparison with Other DRL Algorithms
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest











| Parameter | Value | Unit |
|---|---|---|
| Noise Power (LOS) | −100 | dB |
| Noise Power (NLOS) | −80 | dB |
| VE Computing Frequency | 0.2 | GHz |
| UAV Computing Frequency | 1.2 | GHz |
| CPU Cycles per bit | 1000 | cycles/bit |
| Uplink Transmission Power | 0.1 | W |
| Reference Channel Gain (1 m) | −50 | dB |
| UAV Weight | 9.65 | kg |
| UAV Battery Capacity | 500 | kJ |
| Maximum UAV Flight Speed | 20 | m/s |
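To give a sense of scale for the computing parameters in the table above, the sketch below applies the common cycle-count model, in which computation latency equals task size times cycles per bit divided by CPU frequency. This model is an assumption here, since the paper's latency expressions are not reproduced in this text. The 1 Mbit task size and the function name are hypothetical.

```python
CYCLES_PER_BIT = 1000   # from the parameter table
VE_FREQ_HZ = 0.2e9      # vehicle (VE) computing frequency, 0.2 GHz
UAV_FREQ_HZ = 1.2e9     # UAV computing frequency, 1.2 GHz


def compute_latency(task_bits, freq_hz, cycles_per_bit=CYCLES_PER_BIT):
    """Computation time in seconds under the assumed cycle-count model."""
    return task_bits * cycles_per_bit / freq_hz


task_bits = 1_000_000                                    # hypothetical 1 Mbit task
local_latency = compute_latency(task_bits, VE_FREQ_HZ)   # 5.0 s on the vehicle
uav_latency = compute_latency(task_bits, UAV_FREQ_HZ)    # under 1 s on the UAV
```

The UAV figure covers computation only. Offloading also incurs uplink transmission time, which depends on the channel model and is not included in this sketch.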

| Parameter | Value |
|---|---|
| Maximum Episodes | 1000 |
| Actor Network Learning Rate | 0.001 |
| Critic Network Learning Rate | 0.002 |
| Discount Factor | 0.5 |
| Soft Update Coefficient | 0.01 |
| Minimum Variance | 0.01 |
| Replay Buffer Size | 10,000 |
| Batch Size (B) | 64 |
| Delayed Update Parameter | 2 |
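The hyperparameters above can be gathered into a single configuration object, as in the sketch below. The field names are hypothetical. The helper shows one plausible reading of the delayed update parameter, namely a fixed TD3-style schedule in which the actor is updated once every `delay` critic updates. The paper describes its mechanism as adaptive, so this fixed schedule is an assumption rather than the authors' exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the hyperparameter table; field names are illustrative."""
    max_episodes: int = 1000
    actor_lr: float = 0.001
    critic_lr: float = 0.002
    discount: float = 0.5
    soft_update_coeff: float = 0.01
    min_variance: float = 0.01
    replay_buffer_size: int = 10_000
    batch_size: int = 64
    delayed_update: int = 2


def actor_update_steps(num_critic_updates, delay):
    """Critic-update indices (1-based) at which the actor is also updated."""
    return [step for step in range(1, num_critic_updates + 1) if step % delay == 0]


config = TrainingConfig()
schedule = actor_update_steps(6, config.delayed_update)  # [2, 4, 6]
```

Freezing the dataclass prevents accidental mutation during training, which makes a run easier to reproduce from its logged configuration.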
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, Y.; Wang, H.; Yu, H. MA-PF-AD3PG: A Multi-Agent DRL Algorithm for Latency Minimization and Fairness Optimization in 6G IoV-Oriented UAV-Assisted MEC Systems. Drones 2026, 10, 9. https://doi.org/10.3390/drones10010009

