Drones
  • Article
  • Open Access

25 December 2025

MA-PF-AD3PG: A Multi-Agent DRL Algorithm for Latency Minimization and Fairness Optimization in 6G IoV-Oriented UAV-Assisted MEC Systems

1 School of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China
2 State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
3 Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
4 University of Chinese Academy of Sciences, Beijing 100049, China
Drones 2026, 10(1), 9; https://doi.org/10.3390/drones10010009
(registering DOI)
This article belongs to the Section Drone Communications

Abstract

The rapid proliferation of connected and autonomous vehicles in the 6G era demands ultra-reliable and low-latency computation with intelligent resource coordination. Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC) provides a flexible and scalable solution to extend coverage and enhance offloading efficiency for dynamic Internet of Vehicles (IoV) environments. However, jointly optimizing task latency, user fairness, and service priority under time-varying channel conditions remains a fundamental challenge. To address this issue, this paper proposes a novel Multi-Agent Priority-based Fairness Adaptive Delayed Deep Deterministic Policy Gradient (MA-PF-AD3PG) algorithm for UAV-assisted MEC systems. An occlusion-aware dynamic deadline model is first established to capture real-time link blockage and channel fading. Based on this model, a priority–fairness coupled optimization framework is formulated to jointly minimize overall latency and balance service fairness across heterogeneous vehicular tasks. To efficiently solve this NP-hard problem, the proposed MA-PF-AD3PG integrates fairness-aware service preprocessing and an adaptive delayed update mechanism within a multi-agent deep reinforcement learning structure, enabling decentralized yet coordinated UAV decision-making. Extensive simulations demonstrate that MA-PF-AD3PG achieves superior convergence stability, 13–57% higher total rewards, up to 46% lower delay, and nearly perfect fairness compared with state-of-the-art Deep Reinforcement Learning (DRL) and heuristic methods.
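The abstract does not state how service fairness is quantified. As an illustration only, the sketch below computes Jain's fairness index, a widely used fairness measure that equals 1 when every user receives the same share and falls to 1/n when a single user receives everything. Its use here is an assumption, not the paper's stated metric, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def jain_fairness_index(allocations: Sequence[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1.

    Assumption: `allocations` holds one non-negative service share per
    vehicle (for example, served task count). The paper may define
    fairness differently.
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("need at least one allocation")
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    if squares == 0:
        # Convention chosen for this sketch: all-zero shares count as equal.
        return 1.0
    return (total * total) / (n * squares)


if __name__ == "__main__":
    # Hypothetical service shares for four vehicles.
    print(jain_fairness_index([1.0, 1.0, 1.0, 1.0]))  # equal shares
    print(jain_fairness_index([1.0, 0.0, 0.0, 0.0]))  # one vehicle served
```

Under this reading, "nearly perfect fairness" would correspond to an index close to 1.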
