Search Results (158)

Search Parameters:
Keywords = Multi-Agent Reinforcement Learning (MARL)

31 pages, 2467 KB  
Article
H-MAPPO-Based UAV–Satellite Cooperative Deployment for Space–Air–Ground–Sea Integrated Networks
by Hua Yang, Yalan Shi, Yanli Xu and Naoki Wakamiya
Drones 2026, 10(5), 333; https://doi.org/10.3390/drones10050333 - 29 Apr 2026
Abstract
To support intelligent maritime applications, space–air–ground–sea integrated networks (SAGSINs) have been introduced in maritime communications to provide wide coverage and reliable network services. In unmanned aerial vehicle (UAV)-assisted SAGSIN architectures, UAVs can flexibly extend coverage and provide on-demand communication and computing support. However, due to the high mobility of low Earth orbit (LEO) satellites and the limited endurance of UAVs, single-platform deployment strategies struggle to provide both flexibility and scalability in maritime communication networks. To mitigate the service instability caused by satellite orbital dynamics and limited UAV endurance, we propose a Hybrid Multi-Agent Proximal Policy Optimization (H-MAPPO)-based joint satellite–UAV deployment scheme for UAV-assisted SAGSIN systems. The proposed method optimizes joint UAV positioning and resource allocation to enhance communication coverage while reducing overall operational cost. By incorporating satellite orbital dynamics and UAV mobility into a multi-agent reinforcement learning (MARL) framework, adaptive resource scheduling can be achieved under time-varying maritime demands. Simulation results show that the proposed H-MAPPO algorithm achieves superior convergence performance, higher user coverage, and lower total system cost compared with learning-based, random, and heuristic methods while maintaining stable and robust performance under varying user densities and network scales. Full article
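The deployment objective above (maximize coverage while limiting operational cost) can be illustrated with a toy reward term. This is a hypothetical sketch only: the 2-D geometry, coverage radius, and cost weight are assumptions for illustration, not the paper's actual SAGSIN model.

```python
# Hypothetical coverage/cost reward of the kind a joint UAV-satellite
# deployment agent might optimize; radius, cost weight, and the flat 2-D
# geometry are illustrative assumptions, not the paper's model.
import numpy as np

def coverage_cost_reward(uav_pos, user_pos, radius=5.0, cost_per_uav=0.1):
    """Fraction of users within `radius` of any UAV, minus a deployment cost."""
    # Pairwise distances: (n_users, n_uavs)
    d = np.linalg.norm(user_pos[:, None, :] - uav_pos[None, :, :], axis=-1)
    covered = (d.min(axis=1) <= radius).mean()
    return covered - cost_per_uav * len(uav_pos)

rng = np.random.default_rng(0)
uavs = rng.uniform(0, 20, size=(3, 2))     # 3 UAVs over a 20 x 20 km area
users = rng.uniform(0, 20, size=(50, 2))   # 50 maritime users
print(coverage_cost_reward(uavs, users))
```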

34 pages, 2661 KB  
Article
Predictive Mamba-Enhanced Multi-Agent Reinforcement Learning Control for Virtual Coupling of High-Speed Trains
by Han Hu, Qingsheng Feng, Zhun Han, Wangyang Liu and Hong Li
Electronics 2026, 15(9), 1823; https://doi.org/10.3390/electronics15091823 (registering DOI) - 24 Apr 2026
Viewed by 124
Abstract
Virtual coupling control of trains is a promising technology for improving railway capacity and operational efficiency. However, existing multi-agent reinforcement learning (MARL) approaches struggle to capture long-sequence temporal dependencies among train states in complex multi-train interaction scenarios, resulting in limited robustness and coordination stability. To address this issue, this paper proposes a Predictive Mamba-based Multi-Agent Soft Actor–Critic (PM-MASAC) framework. A Mamba-based state prediction module is embedded into the centralized Critic network to model historical state sequences and generate predictive state representations, thereby enhancing value estimation accuracy. In addition, a multi-agent aggregated prioritized experience replay (PER) mechanism is introduced to improve the utilization of critical cooperative samples and stabilize training. A hierarchical local–global reward structure is further designed to ensure individual tracking performance while promoting overall formation coordination. Experimental results under realistic railway operating conditions demonstrate that PM-MASAC achieves superior robustness compared with baseline MARL methods. Velocity and spacing tracking errors are maintained within 3% and 1%, respectively, and the steady-state formation success rate exceeds 95.7% in the training environment. Full article
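The prioritized experience replay (PER) mechanism mentioned above can be sketched as a minimal proportional-priority buffer. The per-transition priorities, hyper-parameters, and list-based storage here are illustrative assumptions; the paper's multi-agent aggregated PER is more elaborate.

```python
# Minimal proportional prioritized-replay sketch: priorities are TD-error
# magnitudes raised to a power alpha; sampling is weighted by priority and
# corrected with importance weights. Values are illustrative, not PM-MASAC's.
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prios); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)   # importance weights
        return [self.data[i] for i in idx], weights / weights.max(), idx
```
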
30 pages, 961 KB  
Article
Semantic-Aware Resource Allocation for Massive Payload Data Backhaul in Space-Ground TT&C Networks
by Chenrui Song, Ziji Guo, Zhilong Zhang, Danpu Liu, Guixin Li and Yiguang Ren
Electronics 2026, 15(8), 1764; https://doi.org/10.3390/electronics15081764 - 21 Apr 2026
Viewed by 245
Abstract
The rapid development of space exploration demands real-time backhaul of massive sensing payload data in space-ground integrated telemetry, tracking, and command (TT&C) networks. However, traditional narrow-band TT&C links suffer from severe congestion during massive data backhaul. Since most TT&C applications are inherently task-oriented and do not require pixel-perfect data reconstruction, we propose a task-oriented joint resource allocation framework based on semantic communications. Specifically, we introduce an adaptive semantic split computing mechanism that extracts and transmits only compact, decision-critical features instead of raw bitstreams, fundamentally mitigating the bandwidth bottleneck. The joint optimization of computation offloading, semantic splitting, and continuous on-board computing allocation is formulated as a stochastic mixed-integer nonlinear programming (MINLP) problem. We propose a decoupled algorithm based on Hierarchical Multi-Agent Proximal Policy Optimization (HMAPPO) to solve it. An outer layer employs multi-agent reinforcement learning (MARL) for distributed discrete decision-making, while an inner layer utilizes a Karush–Kuhn–Tucker (KKT)-based solver for continuous space-based computing allocation. This bi-level architecture overcomes the curse of dimensionality and mathematically guarantees zero-violation of physical capacity constraints. Simulations demonstrate that HMAPPO rapidly converges and sustains a high weighted success rate under heavy traffic congestion, significantly improving system utility compared to state-of-the-art baselines. Full article
(This article belongs to the Section Microwave and Wireless Communications)
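The inner, KKT-based layer described above solves a continuous allocation subproblem in closed form. Below is a minimal sketch under an assumed weighted-log-utility objective; the symbols (w_i, capacity) and the utility form are illustrative assumptions, not the paper's formulation.

```python
# Illustrative KKT-style inner solver: split a fixed on-board computing budget
# across tasks with weighted logarithmic utility. Stationarity gives
# c_i = w_i / lam; the budget constraint fixes lam = sum(w) / capacity, so the
# optimum is proportional to the weights.
import numpy as np

def kkt_log_utility_allocation(weights, capacity):
    """Maximize sum_i w_i * log(c_i) s.t. sum_i c_i = capacity, c_i >= 0."""
    w = np.asarray(weights, dtype=float)
    return capacity * w / w.sum()

print(kkt_log_utility_allocation([1.0, 2.0, 3.0], capacity=6.0))  # [1. 2. 3.]
```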

37 pages, 4351 KB  
Article
Synthetic Learning and Control: MAPPO-Tuned MAADRC with Graph-Laplacian Enhancement for Resilient Multi-USV Formation in Dynamic Maritime Settings
by Xingda Li, Jianqiang Zhang, Yiping Liu, Pengfei Zhang and Jing Wang
Drones 2026, 10(4), 309; https://doi.org/10.3390/drones10040309 - 21 Apr 2026
Viewed by 169
Abstract
Formation control of unmanned surface vehicles (USVs) in complex marine environments is required to contend with strongly coupled, high-dimensional disturbances. A Multi-Agent Active Disturbance Rejection Control (MAADRC) framework is developed for this purpose. The design centers on a distributed extended state observer (DESO) coupled with a dual-channel feedback structure—NEFL-GCO and LGL-FC—that collectively maintains formation geometry. Three main ideas underpin the approach. First, a bandwidth-efficient distributed observation scheme enables agents to share disturbance estimates while using substantially less communication bandwidth. Second, an adaptive consensus compensation mechanism accommodates parameter variations as formations evolve. Third, a formation-compatible obstacle avoidance algorithm enhances reliability in congested waters. To evaluate the control structure and optimize its parameters, a multi-agent reinforcement learning (MARL) method—specifically Multi-Agent Proximal Policy Optimization (MAPPO)—is employed. The MARL agent tunes two critical parameters: observer bandwidth and nonlinear feedback gain, thereby establishing a performance baseline. After ten million training steps, the MAPPO-optimized MAADRC achieves a tracking root-mean-square error (RMSE) of 1.18 m. This value lies within 3% of the manually tuned result of 1.21 m, indicating that the bandwidth parameterization is near-optimal. Extensive simulations incorporating realistic wind, wave, and current disturbances demonstrate that the dynamic obstacle-avoidance success rate remains at the expected level, alongside consistently low formation tracking errors. Collectively, these findings confirm the resilience and practical utility of the proposed framework in demanding maritime settings. Full article
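The extended state observer and the bandwidth parameter that MAPPO tunes can be illustrated with a single-agent linear ESO. The sketch below uses the standard bandwidth-parameterized gains [3w, 3w^2, w^3] on an assumed double-integrator plant; it is not the paper's distributed NEFL-GCO/LGL-FC design.

```python
# Minimal bandwidth-parameterized extended state observer (ESO) for
# y'' = b0*u + f(t): it estimates position, velocity, and the lumped
# disturbance f. The plant and all numbers here are illustrative.
import numpy as np

def eso_step(z, y, u, b0, omega, dt):
    """One Euler step of a 3rd-order ESO with observer bandwidth `omega`."""
    l1, l2, l3 = 3 * omega, 3 * omega**2, omega**3
    e = y - z[0]
    dz = np.array([z[1] + l1 * e,
                   z[2] + b0 * u + l2 * e,
                   l3 * e])
    return z + dt * dz

z = np.zeros(3)
for k in range(100):                      # plant driven only by f = 0.5, u = 0
    t = k * 0.01
    y = 0.5 * 0.5 * t**2                  # analytic output of y'' = 0.5
    z = eso_step(z, y, u=0.0, b0=1.0, omega=20.0, dt=0.01)
print(z[2])                               # disturbance estimate, ~0.5
```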

24 pages, 21933 KB  
Article
Parametrized Graph Convolutional Multi-Agent Reinforcement Learning with Hybrid Action Spaces in Dynamic Topologies
by Pei Chi, Chen Liu, Jiang Zhao and Yingxun Wang
Biomimetics 2026, 11(4), 232; https://doi.org/10.3390/biomimetics11040232 - 1 Apr 2026
Viewed by 461
Abstract
Multi-agent swarm collaboration, inspired by the collective behaviors of biological swarms in nature, has wide applications in dynamic open environments. However, hybrid action spaces in multi-agent reinforcement learning (MARL) present a critical challenge: the inherent coupling between discrete and continuous actions severely undermines policy stability and convergence, especially under dynamic topologies. Existing methods fail to resolve this coupling, leading to suboptimal policies and unstable training. This paper addresses the core problem of action coupling under dynamic topologies, proposing a Parametrized Graph Convolution Reinforcement Learning (P-DGN) method. Operating within the actor–critic framework, P-DGN decouples the optimization pathways for hybrid actions, with a biomimetic observation design inspired by starling flock behaviors: each agent only observes the states of its seven nearest neighbors to achieve efficient local interaction and global collaboration. Its actor network uses multi-head attention to build dynamic relation kernels, applies temporal relation regularization (TRR) to improve policy consistency across time steps, and generates continuous actions with a Gaussian policy. Meanwhile, P-DGN’s critic network, based on deep Q-network (DQN), evaluates Q-values for discrete actions to guide optimal choices. We evaluate P-DGN in two different multi-agent cooperative environments. Experimental results show that, compared with the parametrized deep Q-network (P-DQN) and DQN baselines, the proposed method achieves faster convergence and stronger training stability. Moreover, with dense rewards, P-DGN agents learn emergent tactics like encirclement. Overall, P-DGN offers a new approach for optimizing hybrid action spaces in multi-agent systems within open, dynamic environments, balancing theoretical generality with practical utility, and its biomimetic design provides a biologically plausible framework for multi-agent swarm collaboration. Full article
(This article belongs to the Special Issue Bionic Intelligent Robots)
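The seven-nearest-neighbor observation scheme noted in the abstract above is easy to illustrate. The sketch below assumes a flat, position-only agent state; the paper's actual observation includes richer per-agent features.

```python
# Starling-inspired local observation: each agent sees only the relative
# positions of its k nearest neighbors (k = 7 in the paper). The position-only
# state is an illustrative assumption.
import numpy as np

def knn_observations(positions, k=7):
    """Return, for every agent, the relative positions of its k nearest neighbors."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # exclude self
    neighbors = np.argsort(d, axis=1)[:, :k]        # indices of k closest agents
    return positions[neighbors] - positions[:, None, :]

pos = np.random.default_rng(1).uniform(0, 10, size=(12, 2))
obs = knn_observations(pos, k=7)
print(obs.shape)   # (12, 7, 2)
```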

24 pages, 1013 KB  
Article
DEDMAC: Disentangling Environment and Decision Messages for Multi-Agent Communication
by Yihan Liang and Jinlong Li
Information 2026, 17(4), 332; https://doi.org/10.3390/info17040332 - 1 Apr 2026
Viewed by 298
Abstract
In cooperative multi-agent reinforcement learning (MARL), communication can address the challenges of partial observability and environmental non-stationarity by conveying environmental features and decision intents, respectively. However, existing methods either focus on only one type of information—failing to tackle both challenges simultaneously—or conflate these signals, causing agents to confuse environmental context with decision intents. This paper introduces Disentangling Environment and Decision messages for Multi-Agent Communication (DEDMAC), a framework that explicitly separates these two information types into two distinct message streams and processes them independently. Specifically, environment messages are integrated into long-term memory to resolve partial observability, while decision messages provide instantaneous intent signals to mitigate non-stationarity and facilitate coordination. To prevent semantic confusion between the two message streams, we employ mutual information constraints to ensure semantic disentanglement. Furthermore, we design a mechanism that leverages global information to correct intent biases in decision messages resulting from limited local perspectives during generation. Evaluations across complex multi-agent benchmarks demonstrate that DEDMAC significantly outperforms state-of-the-art communication-based methods. These findings indicate that the explicit separation and specialized processing of environment and decision semantics are critical for achieving optimal performance in dynamic, collaborative multi-agent systems. Full article
(This article belongs to the Section Artificial Intelligence)

22 pages, 28650 KB  
Article
Benchmarking MARL for UAV-Assisted Mobile Edge Computing Under Realistic 3D Collision Avoidance Navigation Constraints for Periodic Task Offloading
by Jiacheng Gu, Qingxu Meng, Qiurui Sun, Bing Zhu, Songnan Zhao and Shaode Yu
Technologies 2026, 14(4), 202; https://doi.org/10.3390/technologies14040202 - 27 Mar 2026
Viewed by 446
Abstract
The rapid growth of Internet of Things (IoT) and Industrial IoT applications has intensified the demand for low-latency and reliable computation support for deadline-constrained periodic real-time tasks. While unmanned aerial vehicles (UAVs) enabling mobile edge computing (MEC) can reduce latency by bringing compute closer to data sources, terrestrial MEC deployments often suffer from limited coverage and poor adaptability to spatially heterogeneous demand. In this paper, we study a multiple-UAV-assisted MEC system serving cluster-based IoT networks, where cluster heads generate deadline-constrained periodic tasks for offloading under strict deadlines. To ensure practical feasibility in dense urban environments, we benchmark UAV mobility using a realistic 3D collision avoidance navigation graph with shortest-path execution, rather than assuming unconstrained continuous UAV motion in free space. On top of this benchmark, we systematically compare three multi-agent reinforcement learning (MARL) paradigms for joint navigation and periodic task offloading: (i) continuous 3D control MARL that outputs motion commands directly; (ii) discrete graph-based MARL that selects collision-free shortest paths; and (iii) asynchronous macro-action MARL. Using a high-fidelity 3D digital twin of San Francisco, we evaluate these paradigms under a unified protocol in terms of offloading success, end-to-end latency, and energy consumption. The results reveal clear performance trade-offs induced by realistic 3D collision avoidance constraints and provide actionable insights for designing UAV-assisted MEC systems supporting periodic real-time task offloading. Full article
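The discrete graph-based paradigm above plans over a pre-built collision-free navigation graph, whose core primitive is a shortest-path query. A minimal sketch follows; the toy graph, node names, and edge costs are illustrative, not the San Francisco digital-twin graph.

```python
# Dijkstra shortest path over a navigation graph given as
# {node: [(neighbor, cost), ...]}; returns (total cost, path).
import heapq

def shortest_path(graph, src, dst):
    pq, seen = [(0.0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

nav = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.5), ("D", 5.0)],
       "C": [("D", 1.0)], "D": []}
print(shortest_path(nav, "A", "D"))   # (3.5, ['A', 'B', 'C', 'D'])
```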

35 pages, 6392 KB  
Article
EO-MADDPG: An Improved Reinforcement Learning Approach for Multi-UAV Pursuit–Evasion Games
by Xiao Wang, Mengyu Wang, Xueqian Bai, Zhe Ma, Kewu Sun and Jiake Li
Aerospace 2026, 13(3), 296; https://doi.org/10.3390/aerospace13030296 - 21 Mar 2026
Viewed by 440
Abstract
To advance research in multi-agent reinforcement learning (MARL) for pursuit–evasion scenarios, this paper introduces a novel algorithm called Expert Knowledge and Opponent Modeling Multi-UAV Deep Deterministic Policy Gradient (EO-MADDPG). EO-MADDPG consists of two key components: the integration of expert knowledge and real-time sampled data and the prediction of evader UAV actions. The expert knowledge includes a multi-UAV formation control algorithm and an encirclement strategy, which incorporates consensus algorithms and Apollonius circle guidance. Additionally, the network-training framework is optimized by integrating information about opponent actions under a fixed policy for improved prediction accuracy. The experiments focus on three vs. one and three vs. two scenarios, where pursuer UAVs utilize EO-MADDPG and evader UAVs follow fixed policies with Gaussian perturbations. Experimental results show that EO-MADDPG achieves success rates of 99.9 ± 0.3% and 97.5 ± 1.4% (mean ± std over five seeds) in three vs. one and three vs. two pursuit–evasion simulations, respectively, outperforming the baseline MADDPG (72.7 ± 6.0% and 64.4 ± 34.4%). Ablation studies and cooperative landmark tasks further demonstrate improved training stability and interpretability. Full article
(This article belongs to the Section Aeronautics)
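The Apollonius-circle guidance named among the expert knowledge above has a simple closed form. The sketch below assumes a speed ratio gamma = v_evader / v_pursuer < 1; the paper's full encirclement strategy builds on this primitive but is not reproduced here.

```python
# Apollonius circle for pursuit-evasion: with evader E, pursuer P, and speed
# ratio gamma < 1, the circle is the locus of points X reached by both at the
# same time (|X-E| / |X-P| = gamma), bounding where interception can occur.
import numpy as np

def apollonius_circle(evader, pursuer, gamma):
    evader, pursuer = np.asarray(evader, float), np.asarray(pursuer, float)
    center = (evader - gamma**2 * pursuer) / (1.0 - gamma**2)
    radius = gamma * np.linalg.norm(evader - pursuer) / (1.0 - gamma**2)
    return center, radius

center, radius = apollonius_circle(evader=[5.0, 0.0], pursuer=[0.0, 0.0], gamma=0.5)
print(center, radius)   # ~[6.67, 0.], ~3.33 -- the evader's reachable region
```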

18 pages, 1843 KB  
Article
Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning
by Jinshan Yuan, Xuncai Zhang and Kexin Gong
Future Internet 2026, 18(3), 168; https://doi.org/10.3390/fi18030168 - 20 Mar 2026
Viewed by 423
Abstract
The evolution toward 6G Computing Power Networks (CPN) aims to deeply integrate multi-tier computing resources across Cloud, Edge, and end devices. However, the significant heterogeneity of computing resources, characterized by varying hardware architectures such as CPUs, GPUs, and NPUs, coupled with the time-varying network topology caused by terminal mobility, poses severe challenges to realizing efficient integrated scheduling that satisfies Quality of Service (QoS). To address spatiotemporal mismatches between task requirements and hardware architectures, this paper proposes an integrated scheduling method combining Discrete Time-Varying Graph (DTVG) construction with Multi-Agent Reinforcement Learning (MARL). Specifically, we model the dynamic interaction between mobile tasks and heterogeneous nodes as a DTVG to capture spatiotemporal evolution and employ a QMIX-based algorithm to enable collaborative decision-making among distributed agents. Simulation results demonstrate that the proposed approach effectively solves the joint optimization problem of heterogeneous resource matching and dynamic path planning, significantly outperforming traditional baselines in terms of resource utilization and average latency. This study confirms that incorporating graph-theoretic modeling with reinforcement learning offers a robust solution for the complex coupling of communication and computation in dynamic 6G networks. Full article
(This article belongs to the Special Issue Collaborative Intelligence for Connected Agents)
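The QMIX-based collaborative decision-making mentioned above relies on a monotonic mixing network: per-agent Q-values are combined with state-conditioned, non-negative weights so the joint value is monotone in each agent's Q. A compact PyTorch sketch with illustrative layer sizes (not the paper's exact architecture):

```python
# QMIX-style monotonic mixer: hypernetworks map the global state to
# non-negative mixing weights over per-agent Q-values.
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).squeeze(-1).squeeze(-1)  # (batch,)

mixer = QMixer(n_agents=4, state_dim=16)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 16))
print(q_tot.shape)   # torch.Size([8])
```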

24 pages, 4009 KB  
Article
Spatiotemporal-Aware Multi-Agent Reinforcement Learning for Revisit-Oriented Multi-Satellite Observation Task Scheduling
by Wenbo Zhang, Xuanyu Liu, Wei Zhao, Qi He, Chongbin Guo and Binpin Su
Appl. Sci. 2026, 16(6), 2685; https://doi.org/10.3390/app16062685 - 11 Mar 2026
Viewed by 378
Abstract
The scheduling of Earth observation satellites presents a formidable multi-objective optimization challenge, characterized by inherent trade-offs among task completion rate, execution timeliness, and the temporal uniformity of revisits. To address this, we introduce the Multi-Satellite Observation Task Scheduling (MSOTS) framework, a novel end-to-end approach based on Multi-Agent Reinforcement Learning (MARL). This framework formulates the scheduling process as a Markov game, employing the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm within a Centralized Training, Decentralized Execution (CTDE) paradigm to effectively navigate these competing objectives. Furthermore, to ensure a balanced evaluation, we propose a Composite Multi-Objective Performance Score grounded in a weighted harmonic mean. Comprehensive empirical evaluations conducted on large-scale, simulated orbital scenarios demonstrate that MSOTS significantly outperforms both traditional heuristics and existing deep reinforcement learning methods in comprehensive performance and robust efficiency. This research provides a highly effective and intelligent approach to modern satellite task scheduling. Full article
(This article belongs to the Collection Space Applications)
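The Composite Multi-Objective Performance Score above is described as a weighted harmonic mean, which penalizes any single weak objective. A minimal sketch with placeholder metric names and equal weights; the paper's exact normalization and weighting are not reproduced.

```python
# Weighted harmonic mean of metrics in (0, 1]; low values dominate the score,
# so the composite rewards balanced performance across objectives.
def composite_score(metrics, weights):
    assert all(m > 0 for m in metrics.values())
    num = sum(weights[k] for k in metrics)
    den = sum(weights[k] / metrics[k] for k in metrics)
    return num / den

m = {"completion": 0.9, "timeliness": 0.7, "revisit_uniformity": 0.8}
w = {"completion": 1.0, "timeliness": 1.0, "revisit_uniformity": 1.0}
print(round(composite_score(m, w), 3))   # 0.792
```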

39 pages, 67440 KB  
Article
LLM-TOC: LLM-Driven Theory-of-Mind Adversarial Curriculum for Multi-Agent Generalization
by Chenxu Wang, Jiang Yuan, Tianqi Yu, Xinyue Jiang, Liuyu Xiang, Junge Zhang and Zhaofeng He
Mathematics 2026, 14(5), 915; https://doi.org/10.3390/math14050915 - 8 Mar 2026
Viewed by 719
Abstract
Zero-shot generalization to out-of-distribution (OOD) teammates and opponents in multi-agent systems (MASs) remains a fundamental challenge for general-purpose AI, especially in open-ended interaction scenarios. Existing multi-agent reinforcement learning (MARL) paradigms, such as self-play and population-based training, often collapse to a limited subset of Nash equilibria, leaving agents brittle when faced with semantically diverse, unseen behaviors. Recent approaches that invoke Large Language Models (LLMs) at run time can improve adaptability but introduce substantial latency and can become less reliable as task horizons grow; in contrast, LLM-assisted reward-shaping methods remain constrained by the inefficiency of the inner reinforcement-learning loop. To address these limitations, we propose LLM-TOC (LLM-Driven Theory-of-Mind Adversarial Curriculum), which casts generalization as a bi-level Stackelberg game: in the inner loop, a MARL agent (the follower) minimizes regret against a fixed population, while in the outer loop, an LLM serves as a semantic oracle that generates executable adversarial or cooperative strategies in a Turing-complete code space to maximize the agent’s regret. To cope with the absence of gradients in discrete code generation, we introduce Gradient Saliency Feedback, which transforms pixel-level value fluctuations into semantically meaningful causal cues to steer the LLM toward targeted strategy synthesis. We further provide motivating theoretical analysis via the PAC-Bayes framework, showing that LLM-TOC converges at rate O(1/K) and yields a tighter generalization error bound than parameter-space exploration under reasonable preconditions. Experiments on the Melting Pot benchmark demonstrate that, with expected cumulative collective return as the core zero-shot generalization metric, LLM-TOC consistently outperforms self-play baselines (IPPO and MAPPO) and the LLM-inference method Hypothetical Minds across all held-out test scenarios, reaching 75% to 85% of the upper-bound performance of Oracle PPO. Meanwhile, with the number of RL environment interaction steps to reach the target relative performance as the core efficiency metric, our framework reduces the total training computational cost by more than 60% compared with mainstream baselines. Full article
(This article belongs to the Special Issue Applications of Intelligent Game and Reinforcement Learning)

22 pages, 2995 KB  
Article
Energy-Efficient Distributed AUV Swarm for Target Tracking via LSTM-Assisted Offline-to-Online Reinforcement Learning
by Renbo Li, Denghui Li, Xiangxin Zhang and Weiming Ni
Drones 2026, 10(3), 158; https://doi.org/10.3390/drones10030158 - 26 Feb 2026
Viewed by 575
Abstract
In recent years, autonomous underwater vehicles (AUVs) have been increasingly employed for target surveillance and tracking. However, the limited performance and information-processing capability of a single AUV make it difficult to achieve high-precision tracking in practice. To address these challenges, this paper proposes an offline-to-online multi-agent reinforcement learning (MARL) framework that first performs offline training on historical data to obtain an expert policy. The optimal policy is then obtained through online fine-tuning, which enhances the training efficiency of reinforcement learning in new scenarios. To expand the surveillance range of AUV swarms, a distributed cooperative strategy based on area information entropy (AIE) is introduced. To reduce energy consumption in complex marine environments containing obstacles and vortices, ocean current and energy consumption models are introduced, together with an energy-efficiency optimization strategy. Furthermore, a long short-term memory (LSTM) network is integrated into the offline-to-online MARL framework to predict time-varying environmental states, thereby improving tracking accuracy and energy efficiency. Experimental results show that the proposed scheme is superior to the baseline schemes in terms of energy consumption, task success rate, and distance between AUVs. In addition, the extended AUV swarm also outperforms the baseline schemes across various performance indicators, demonstrating that the proposed scheme has excellent performance and scalability. Full article
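The area-information-entropy (AIE) driven cooperative strategy above can be loosely illustrated as steering toward grid cells whose target-presence probability is most uncertain. The grid model and binary-entropy form below are assumptions for illustration; the paper's AIE definition may differ.

```python
# Per-cell binary Shannon entropy of a target-presence probability grid;
# cells near p = 0.5 are most uncertain and hence most worth surveying.
import numpy as np

def cell_entropy(prob_grid, eps=1e-9):
    p = np.clip(prob_grid, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

grid = np.array([[0.5, 0.9], [0.1, 0.5]])        # 0.5 = completely uncertain
h = cell_entropy(grid)
print(np.unravel_index(np.argmax(h), h.shape))   # most uncertain cell -> (0, 0)
```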

40 pages, 8354 KB  
Article
System-Level Optimization of AUV Swarm Control and Perception: An Energy-Aware Federated Meta-Transfer Learning Framework with Digital Twin Validation
by Zinan Nie, Hongjun Tian, Yijie Yin, Yuhan Zhou, Wei Li, Yang Xiong, Yichen Wang, Zitong Zhang, Yang Yang, Dongxiao Xie, Manlin Wang and Shijie Huang
J. Mar. Sci. Eng. 2026, 14(4), 384; https://doi.org/10.3390/jmse14040384 - 18 Feb 2026
Viewed by 622
Abstract
Deep-sea exploration increasingly relies on Autonomous Underwater Vehicles (AUVs) to enable persistent, wide-area surveying in harsh and uncertain environments. In practice, however, deployments are constrained by tight energy budgets and bandwidth-limited, intermittent acoustic links, which complicate mission-level coordination. Moreover, many existing systems treat perception and control as loosely coupled modules, often resulting in redundant sensing, inefficient communication, and degraded overall performance—particularly under heterogeneous sensing modalities and shifting geological conditions. To address these challenges, we propose a hierarchical Federated Meta-Transfer Learning (FMTL) framework that tightly integrates collaborative perception with adaptive control for swarm optimization. The framework operates at three levels: (1) Representation Learning aligns heterogeneous sensors in a shared latent space via a physics-informed contrastive objective, substantially reducing communication overhead; (2) Meta-Learning Adaptation enables rapid transfer and convergence in new environments with minimal data exchange; and (3) Energy-Aware Control realizes closed-loop exploration by coupling Federated Explainable AI (FXAI) with decentralized multi-agent reinforcement learning (MARL) for path planning under energy constraints. Validated in high-fidelity hardware-in-the-loop simulations and a digital-twin environment, FMTL outperforms state-of-the-art baselines, achieving an AUC of 0.94 for target identification. Furthermore, an energy–intelligence Pareto analysis demonstrates a 4.5× improvement in information gain per Joule. Overall, this work provides a physically consistent and communication-efficient blueprint for the optimization and control of next-generation intelligent marine swarms. Full article
(This article belongs to the Special Issue System Optimization and Control of Unmanned Marine Vehicles)
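The contrastive alignment of heterogeneous sensors in a shared latent space, described in the Representation Learning level above, can be sketched with a standard InfoNCE loss. The physics-informed term and the actual encoders are omitted here, and the temperature is an assumed value.

```python
# InfoNCE-style contrastive alignment: embeddings of the same scene from two
# modalities (same row index) are positives; all other pairs are negatives.
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(len(z_a))                # matching rows are positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 64), torch.randn(16, 64))
print(loss.item())
```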

30 pages, 4914 KB  
Article
MSTAGNN-MARL: A Multi-Level Intelligent Decision Framework for Integrated Spatial-Temporal Conflict Resolution in High-Density Airspace
by Ershen Wang, Haolong Xu, Nan Yu, Fei Liu, Guipeng Ji, Song Xu, Pingping Qu and Yunhao Chen
Aerospace 2026, 13(2), 175; https://doi.org/10.3390/aerospace13020175 - 12 Feb 2026
Viewed by 527
Abstract
The spatial and temporal conflicts within terminal maneuvering areas, particularly in multi-airport systems, are growing increasingly complex. Traditional independent processing methods face inherent limitations when dealing with multi-source uncertainties, dynamic weather conditions, and high-density operations. This paper proposes MSTAGNN-MARL, a framework that systematically integrates the resolution of spatial conflicts and temporal scheduling issues. This framework is based on four crucial innovations: First, a strategic-tactical-execution hierarchical architecture is constructed that integrates multi-criteria decision optimization with graph neural network-based multi-agent reinforcement learning. Second, an uncertainty perception mechanism is designed that explicitly encodes conflict features as dynamic edge attributes in social graphs, incorporating a real-time dynamic weather model and a Gaussian noise-based perception uncertainty model. Third, an automated compliance system based on behavior cloning is developed that learns the decision preferences of controllers to achieve human–machine collaboration and provide transparent visualization. Fourth, a robustness assurance mechanism for abnormal scenarios is constructed, employing behavior tree-driven emergency strategies to handle unexpected situations. Experiments demonstrate that the proposed method achieves an 89.3% conflict resolution rate, reduces average delays by 6 min compared to existing methods, and exhibits robust performance under varying traffic densities and dynamic weather conditions. Ablation experiments validate the effectiveness of the four innovations. This framework provides a new research paradigm for scheduling and decision-making in Intelligent Transportation Systems (ITS). Full article
(This article belongs to the Section Air Traffic and Transportation)

35 pages, 2737 KB  
Article
Joint Trajectory and Power Optimization for Loosely Coupled Tasks: A Decoupled-Critic MAPPO Approach
by Xiangyu Wu, Changbo Hou, Guojing Meng, Zhichao Zhou and Qin Liu
Drones 2026, 10(2), 116; https://doi.org/10.3390/drones10020116 - 6 Feb 2026
Viewed by 593
Abstract
Multi-unmanned aerial vehicle (UAV) systems are crucial for establishing resilient communication networks in disaster-stricken areas, but their limited energy and dynamic characteristics pose significant challenges for sustained and reliable service provision. Optimizing resource allocation in this situation is a complex sequential decision-making problem, which is naturally suitable for multi-agent reinforcement learning (MARL). However, state-of-the-art MARL methods (e.g., multi-agent proximal policy optimization (MAPPO)) often encounter difficulties in the “loosely coupled” multi-UAV environment due to their overly centralized evaluation mechanism, resulting in unclear credit assignment and inhibiting personalized optimization. To overcome this, we propose a novel hierarchical framework supported by MAPPO with decoupled critics (MAPPO-DC). Our framework employs an efficient clustering algorithm for user association in the upper layer, while MAPPO-DC is used in the lower layer to enable each UAV to learn customized trajectories and power control strategies. MAPPO-DC achieves a careful balance between global coordination and personalized exploration by redesigning the update rules of the critic network, allowing for precise and personalized credit assignment in a loosely coupled environment. In addition, we design a composite reward function to guide the learning process towards the goal of proportional fairness. The simulation results show that our proposed MAPPO-DC outperforms existing baselines, including independent proximal policy optimization (IPPO) and standard MAPPO, in terms of communication performance and sample efficiency, validating the effectiveness of our tailored MARL architecture for the task. Robustness experiments further verify that the proposed MAPPO-DC retains an advantage even in strongly coupled environments. Full article
(This article belongs to the Section Drone Communications)
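The proportional-fairness goal that the composite reward above targets is commonly expressed as the sum of log user rates, which rewards allocations where no user is starved. A minimal sketch with illustrative rates:

```python
# Proportional-fairness utility: the sum of log rates prefers balanced
# allocations over unfair ones with the same total throughput.
import numpy as np

def proportional_fairness(rates_mbps, eps=1e-6):
    return float(np.sum(np.log(np.asarray(rates_mbps) + eps)))

print(proportional_fairness([10.0, 10.0, 10.0]))   # ~6.91 (balanced)
print(proportional_fairness([28.0, 1.0, 1.0]))     # ~3.33 (same total, unfair)
```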
