Search Results (134)

Search Parameters:
Keywords = multi-agent reinforcement learning (MARL)

26 pages, 1672 KB  
Article
Relaxed Monotonic QMIX (R-QMIX): A Regularized Value Factorization Approach to Decentralized Multi-Agent Reinforcement Learning
by Liam O’Brien and Hao Xu
Robotics 2026, 15(1), 28; https://doi.org/10.3390/robotics15010028 - 21 Jan 2026
Abstract
Value factorization methods have become a standard tool for cooperative multi-agent reinforcement learning (MARL) in the centralized-training, decentralized-execution (CTDE) setting. QMIX (a monotonic mixing network for value factorization), in particular, constrains the joint action–value function to be a monotonic mixing of per-agent utilities, which guarantees consistency with individual greedy policies but can severely limit expressiveness on tasks with non-monotonic agent interactions. This work revisits this design choice and proposes Relaxed Monotonic QMIX (R-QMIX), a simple regularized variant of QMIX that encourages but does not strictly enforce the monotonicity constraint. R-QMIX removes the sign constraints on the mixing network weights and introduces a differentiable penalty on negative partial derivatives of the joint value with respect to each agent’s utility. This preserves the computational benefits of value factorization while allowing the joint value to deviate from strict monotonicity when beneficial. R-QMIX is implemented in the standard PyMARL framework (an open-source MARL codebase) and evaluated on the StarCraft Multi-Agent Challenge (SMAC). On a simple map (3m), R-QMIX matches the asymptotic performance of QMIX while learning substantially faster. On more challenging maps (MMM2, 6h vs. 8z, and 27m vs. 30m), R-QMIX significantly improves both sample efficiency and final win rate (WR), for example increasing the final-quarter mean win rate from 42.3% to 97.1% on MMM2, from 0.0% to 57.5% on 6h vs. 8z, and from 58.0% to 96.6% on 27m vs. 30m. These results suggest that soft monotonicity regularization is a practical way to bridge the gap between strictly monotonic value factorization and fully unconstrained joint value functions. A further comparison against QTRAN (Q-value transformation), a more expressive value factorization method, shows that R-QMIX achieves higher and more reliably convergent win rates on the challenging SMAC maps considered.
(This article belongs to the Special Issue AI-Powered Robotic Systems: Learning, Perception and Decision-Making)
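
The soft monotonicity penalty described in the abstract is easy to make concrete. Below is a minimal PyTorch sketch, assuming a hypothetical unconstrained mixing network `mixer(qs, state)` that maps per-agent utilities and the global state to a joint value; the actual R-QMIX implementation details are not given in the abstract.

```python
import torch
import torch.nn as nn

def monotonicity_penalty(mixer: nn.Module, agent_qs: torch.Tensor,
                         state: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Differentiable penalty on negative partials dQ_tot/dq_i.

    agent_qs: (batch, n_agents) per-agent utilities; state: (batch, state_dim).
    """
    qs = agent_qs.detach().requires_grad_(True)   # differentiate w.r.t. utilities only
    q_tot = mixer(qs, state)                      # hypothetical unconstrained mixer
    (grads,) = torch.autograd.grad(q_tot.sum(), qs, create_graph=True)
    # relu(-grad) is nonzero exactly where monotonicity is violated.
    return lam * torch.relu(-grads).mean()

# Added to the usual TD objective during training:
# loss = td_loss + monotonicity_penalty(mixer, agent_qs, state)
```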
50 pages, 3712 KB  
Article
Explainable AI and Multi-Agent Systems for Energy Management in IoT-Edge Environments: A State of the Art Review
by Carlos Álvarez-López, Alfonso González-Briones and Tiancheng Li
Electronics 2026, 15(2), 385; https://doi.org/10.3390/electronics15020385 - 15 Jan 2026
Viewed by 126
Abstract
This paper reviews Artificial Intelligence techniques for distributed energy management, focusing on integrating machine learning, reinforcement learning, and multi-agent systems within IoT-Edge-Cloud architectures. As energy infrastructures become increasingly decentralized and heterogeneous, AI must operate under strict latency, privacy, and resource constraints while remaining transparent and auditable. The study examines predictive models ranging from statistical time series approaches to machine learning regressors and deep neural architectures, assessing their suitability for embedded deployment and federated learning. Optimization methods—including heuristic strategies, metaheuristics, model predictive control, and reinforcement learning—are analyzed in terms of computational feasibility and real-time responsiveness. Explainability is treated as a fundamental requirement, supported by model-agnostic techniques that enable trust, regulatory compliance, and interpretable coordination in multi-agent environments. The review synthesizes advances in MARL for decentralized control, communication protocols enabling interoperability, and hardware-aware design for low-power edge devices. Benchmarking guidelines and key performance indicators are introduced to evaluate accuracy, latency, robustness, and transparency across distributed deployments. Key challenges remain in stabilizing explanations for RL policies, balancing model complexity with latency budgets, and ensuring scalable, privacy-preserving learning under non-stationary conditions. The paper concludes by outlining a conceptual framework for explainable, distributed energy intelligence and identifying research opportunities to build resilient, transparent smart energy ecosystems.
23 pages, 3086 KB  
Article
MARL-Driven Decentralized Crowdsourcing Logistics for Time-Critical Multi-UAV Networks
by Juhyeong Han and Hyunbum Kim
Electronics 2026, 15(2), 331; https://doi.org/10.3390/electronics15020331 - 12 Jan 2026
Viewed by 119
Abstract
Centralized UAV logistics controllers can achieve strong navigation performance in controlled settings, but they do not capture key deployment factors in crowdsourcing-enabled emergency logistics, where heterogeneous UAV owners participate with unreliability and dropout, and incentive expenditure and fairness must be accounted for. This paper presents a decentralized crowdsourcing multi-UAV emergency logistics framework on an edge-orchestrated architecture that (i) performs urgency-aware dispatch under distance/energy/payload constraints, (ii) tracks reliability and participation dynamics under stress (unreliable agents and dropout), and (iii) quantifies incentive feasibility via total payment and payment inequality (Gini). We adopt a hybrid decision design in which PPO/DQN policies provide real-time navigation/control, while GA/ACO act as planning-level route refinement modules (not reinforcement learning) to improve global candidate quality under safety constraints. We evaluate the framework in a controlled grid-world simulator and explicitly report re-evaluation results under matched stress settings, where applicable. In the nominal comparison, centralized DQN attains high navigation-centric success (e.g., 0.970 ± 0.095) with short reach steps, but it omits incentives by construction, whereas the proposed crowdsourcing method reports measurable payment and fairness outcomes (e.g., payment and Gini) and remains evaluable under unreliability and dropout sweeps. We further provide a utility decomposition that attributes negative-utility regimes primarily to collision-related costs and secondarily to incentive expenditure, clarifying the operational trade-off between mission value, safety risk, and incentive cost. Overall, the results indicate that navigation-only baselines can appear strong when participation economics are ignored, while a deployable crowdsourcing system must explicitly expose incentive/fairness and robustness characteristics under stress.
(This article belongs to the Special Issue Parallel and Distributed Computing for Emerging Applications)
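
As a side note on the fairness metric above: the payment-inequality measure (Gini) can be computed directly from per-UAV payments. A minimal sketch follows; the exact normalization the paper uses is not stated, so this uses the standard mean-absolute-difference form.

```python
import numpy as np

def gini(payments: np.ndarray) -> float:
    """Gini coefficient of per-UAV incentive payments (0 = equal, 1 = maximally unequal)."""
    x = np.sort(np.asarray(payments, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0.0:
        return 0.0
    i = np.arange(1, n + 1)  # ranks of the sorted payments
    # Closed form of the mean absolute difference: G = sum_i (2i - n - 1) x_i / (n * sum x)
    return float(np.sum((2 * i - n - 1) * x) / (n * x.sum()))

# gini(np.array([5.0, 5.0, 5.0])) -> 0.0; skewed payments push the value toward 1.
```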
50 pages, 3579 KB  
Article
Safety-Aware Multi-Agent Deep Reinforcement Learning for Adaptive Fault-Tolerant Control in Sensor-Lean Industrial Systems: Validation in Beverage CIP
by Apolinar González-Potes, Ramón A. Félix-Cuadras, Luis J. Mena, Vanessa G. Félix, Rafael Martínez-Peláez, Rodolfo Ostos, Pablo Velarde-Alvarado and Alberto Ochoa-Brust
Technologies 2026, 14(1), 44; https://doi.org/10.3390/technologies14010044 - 7 Jan 2026
Viewed by 308
Abstract
Fault-tolerant control in safety-critical industrial systems demands adaptive responses to equipment degradation, parameter drift, and sensor failures while maintaining strict operational constraints. Traditional model-based controllers struggle under these conditions, requiring extensive retuning and dense instrumentation. Recent safe multi-agent reinforcement learning (MARL) frameworks with control barrier functions (CBFs) achieve real-time constraint satisfaction in robotics and power systems, yet assume comprehensive state observability—incompatible with sensor-hostile industrial environments where instrumentation degradation and contamination risks dominate design constraints. This work presents a safety-aware multi-agent deep reinforcement learning framework for adaptive fault-tolerant control in sensor-lean industrial environments, achieving formal safety through learned implicit barriers under partial observability. The framework integrates four synergistic mechanisms: (1) a multi-layer safety architecture combining constrained action projection, prioritized experience replay, conservative training margins, and curriculum-embedded verification, achieving zero constraint violations; (2) multi-agent coordination via decentralized execution with learned complementary policies; (3) curriculum-driven sim-to-real transfer through progressive four-stage learning, achieving 85–92% performance retention without fine-tuning; and (4) offline extended Kalman filter validation enabling 70% instrumentation reduction (91–96% reconstruction accuracy) for regulatory auditing without real-time estimation dependencies. Validated through sustained deployment in commercial beverage manufacturing clean-in-place (CIP) systems—a representative safety-critical testbed with hard flow constraints (≥1.5 L/s), harsh chemical environments, and zero-tolerance contamination requirements—the framework demonstrates superior control precision (coefficient of variation: 2.9–5.3% versus the 10% industrial standard) across three hydraulic configurations spanning a complexity range of 2.1–8.2/10. Comprehensive validation comprising 37+ controlled stress-test campaigns and hundreds of production cycles (accumulated over 6 months) confirms zero safety violations, high reproducibility (CV variation < 0.3% across replicates), predictable complexity–performance scaling (R² = 0.89), and zero-retuning cross-topology transferability. The system has operated autonomously in active production for over 6 months, establishing a reproducible methodology for safe MARL deployment in partially observable, sensor-hostile manufacturing environments where analytical CBF approaches are structurally infeasible.
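
To make the constrained-action-projection layer concrete, here is a minimal sketch for the hard flow constraint. The ≥ 1.5 L/s lower bound comes from the abstract; the upper bound, the conservative margin, and the setpoint semantics are assumptions for illustration only.

```python
import numpy as np

FLOW_MIN = 1.5   # L/s, hard lower bound stated for the CIP testbed
FLOW_MAX = 6.0   # assumed actuator ceiling, not given in the abstract

def project_action(proposed_flow: float, margin: float = 0.1) -> float:
    """Project the policy's proposed flow setpoint onto the safe interval,
    keeping a conservative training margin above the hard lower bound."""
    return float(np.clip(proposed_flow, FLOW_MIN + margin, FLOW_MAX))

# Applied before actuation: project_action(1.2) -> 1.6, project_action(3.0) -> 3.0
```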
19 pages, 1680 KB  
Article
A Hybrid Decision-Making Framework for Autonomous Vehicles in Urban Environments Based on Multi-Agent Reinforcement Learning with Explainable AI
by Ameni Ellouze, Mohamed Karray and Mohamed Ksantini
Vehicles 2026, 8(1), 8; https://doi.org/10.3390/vehicles8010008 - 2 Jan 2026
Viewed by 454
Abstract
Autonomous vehicles (AVs) are expected to operate safely and efficiently in complex urban environments characterized by dynamic and uncertain elements such as pedestrians, cyclists and adverse weather. Although current neural network-based decision-making algorithms, fuzzy logic and reinforcement learning have shown promise, they often struggle to handle ambiguous situations, such as partially hidden road signs or unpredictable human behavior. This paper proposes a new hybrid decision-making framework combining multi-agent reinforcement learning (MARL) and explainable artificial intelligence (XAI) to improve robustness, adaptability and transparency. Each agent of the MARL architecture is specialized in a specific sub-task (e.g., obstacle avoidance, trajectory planning, intention prediction), enabling modular and cooperative learning. XAI techniques are integrated to provide interpretable rationales for decisions, facilitating human understanding and regulatory compliance. The proposed system will be validated using the CARLA simulator, combined with reference data, to demonstrate improved performance in safety-critical and ambiguous driving scenarios.
(This article belongs to the Special Issue AI-Empowered Assisted and Autonomous Driving)
24 pages, 3711 KB  
Article
A Multi-Agent Regional Traffic Signal Control System Integrating Traffic Flow Prediction and Graph Attention Networks
by Chao Sun, Yuhao Yang, Jiacheng Li, Weiyi Fang and Peng Zhang
Systems 2026, 14(1), 47; https://doi.org/10.3390/systems14010047 - 31 Dec 2025
Viewed by 294
Abstract
Adaptive traffic signal control is a critical component of intelligent transportation systems, and multi-agent deep reinforcement learning (MARL) has attracted increasing interest due to its scalability and control efficiency. However, existing methods have two major drawbacks: (i) they are largely driven by current and historical traffic states, without explicit forecasting of upcoming traffic conditions, and (ii) their coordination mechanisms are often weak, making it difficult to model complex spatial dependencies in large-scale road networks and thereby limiting the benefits of coordinated control. To address these issues, we propose TG-MADDPG, which integrates short-term traffic prediction with a graph attention network (GAT) for regional signal control. A WT-GWO-CNN-LSTM traffic forecasting module predicts near-future states and injects them into the MARL framework to support anticipatory decision-making. Meanwhile, the GAT dynamically encodes road-network topology and adaptively captures inter-intersection spatial correlations. In addition, we design a reward based on normalized pressure difference to guide cooperative optimization of signal timing. Experiments on the SUMO simulator across synthetic and real-world networks under both off-peak and peak demands show that TG-MADDPG consistently achieves lower average waiting times, shorter queue lengths, and higher cumulative rewards than IQL, MADDPG, and GMADDPG, demonstrating strong effectiveness and generalization.
(This article belongs to the Section Systems Engineering)
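
For context, the normalized pressure-difference reward builds on the max-pressure idea: an intersection's pressure is the imbalance between upstream and downstream queues. A hedged sketch follows — queue lengths would come from SUMO (e.g., via TraCI), and the normalization scheme here is an assumption since the abstract does not specify it.

```python
import numpy as np

def pressure_reward(q_in: np.ndarray, q_out: np.ndarray, capacity: float) -> float:
    """Negative normalized pressure: a balanced intersection (small |pressure|)
    earns a higher (less negative) reward."""
    pressure = q_in.sum() - q_out.sum()   # classic max-pressure quantity
    return float(-abs(pressure) / capacity)

# Example: pressure_reward(np.array([4, 7, 2]), np.array([3, 1, 0]), capacity=40.0) -> -0.225
```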
33 pages, 4154 KB  
Article
A Reinforcement Learning Method for Automated Guided Vehicle Dispatching and Path Planning Considering Charging and Path Conflicts at an Automated Container Terminal
by Tianli Zuo, Huakun Liu, Shichun Yang, Wenyuan Wang, Yun Peng and Ruchong Wang
J. Mar. Sci. Eng. 2026, 14(1), 55; https://doi.org/10.3390/jmse14010055 - 28 Dec 2025
Viewed by 466
Abstract
The continued growth of international maritime trade has driven automated container terminals (ACTs) to pursue more efficient operational management strategies. In practice, the horizontal yard layout in ACTs significantly enhances transshipment efficiency. However, the more complex horizontal transporting system calls for an effective approach to enhance automated guided vehicle (AGV) scheduling. Considering AGV charging and path conflicts, this paper proposes a multi-agent reinforcement learning (MARL) approach to address the AGV dispatching and path planning (VD2P) problem under a horizontal layout. The VD2P problem is formulated as a Markov decision process model. To mitigate the challenges of high-dimensional state-action space, a multi-agent framework is developed to control the AGV dispatching and path planning separately. A mixed global–individual reward mechanism is tailored to enhance both exploration and cooperation. A proximal policy optimization method is used to train the scheduling policies. Experiments indicate that the proposed MARL approach can provide high-quality solutions for a real-world-sized scenario within tens of seconds. Compared with benchmark methods, the proposed approach achieves an improvement of 8.4% to 53.8%. Moreover, sensitivity analyses are conducted to explore the impact of different AGV configurations and charging strategies on scheduling. Managerial insights are obtained to support more efficient terminal operations.
(This article belongs to the Section Ocean Engineering)
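
The mixed global–individual reward can be read as a convex combination of a shared terminal-level term and each AGV's local term. A one-function sketch; the weight alpha and the semantics of both terms are assumptions, as the abstract does not give them.

```python
def mixed_reward(global_reward: float, individual_reward: float,
                 alpha: float = 0.5) -> float:
    """Blend a shared global signal (e.g., terminal throughput progress) with a
    local one (e.g., this AGV's task completion minus conflict and charging
    penalties); alpha trades cooperation against individual credit assignment."""
    return alpha * global_reward + (1.0 - alpha) * individual_reward
```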
18 pages, 3635 KB  
Article
Multi-Agent Reinforcement Learning for Sustainable Integration of Heterogeneous Resources in a Double-Sided Auction Market with Power Balance Incentive Mechanism
by Jian Huang, Ming Yang, Li Wang, Mingxing Mei, Jianfang Ye, Kejia Liu and Yaolong Bo
Sustainability 2026, 18(1), 141; https://doi.org/10.3390/su18010141 - 22 Dec 2025
Viewed by 364
Abstract
Traditional electricity market bidding typically focuses on unilateral structures, where independent energy storage units and flexible loads act merely as price takers. This reduces bidding motivation and weakens the balancing capability of regional power systems, thereby limiting the large-scale utilization of renewable energy. To address these challenges and support sustainable power system operation, this paper proposes a double-sided auction market strategy for heterogeneous multi-resource (HMR) participation based on multi-agent reinforcement learning (MARL). The framework explicitly considers the heterogeneous bidding and quantity reporting behaviors of renewable generation, flexible demand, and energy storage. An improved incentive mechanism is introduced to enhance real-time system power balance, thereby enabling higher renewable energy integration and reducing curtailment. To efficiently solve the market-clearing problem, an improved Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm is employed, along with a temporal-difference (TD) error-based prioritized experience replay mechanism to strengthen exploration. Case studies validate the effectiveness of the proposed approach in guiding heterogeneous resources toward cooperative bidding behaviors, improving market efficiency, and reinforcing the sustainable and resilient operation of future power systems.
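
The TD-error-based prioritized replay component follows proportional prioritization in the style of Schaul et al.: transitions with larger TD errors are sampled more often. A compact sketch; the exponent alpha, the constant eps, and the buffer wiring are assumptions not given in the abstract.

```python
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error: float) -> None:
        if len(self.buffer) >= self.capacity:      # FIFO eviction when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size: int):
        p = np.asarray(self.priorities)
        p = p / p.sum()                            # proportional sampling distribution
        idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
        return [self.buffer[i] for i in idx], idx  # idx lets the learner refresh priorities
```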
36 pages, 3105 KB  
Review
Reinforcement Learning for Industrial Automation: A Comprehensive Review of Adaptive Control and Decision-Making in Smart Factories
by Yasser M. Alginahi, Omar Sabri and Wael Said
Machines 2025, 13(12), 1140; https://doi.org/10.3390/machines13121140 - 15 Dec 2025
Viewed by 1400
Abstract
The accelerating integration of Artificial Intelligence (AI) in Industrial Automation has established Reinforcement Learning (RL) as a transformative paradigm for adaptive control, intelligent optimization, and autonomous decision-making in smart factories. Despite the growing literature, existing reviews often emphasize algorithmic performance or domain-specific applications, neglecting broader links between methodological evolution, technological maturity, and industrial readiness. To address this gap, this study presents a bibliometric review mapping the development of RL and Deep Reinforcement Learning (DRL) research in Industrial Automation and robotics. Following the PRISMA 2020 protocol to guide the data collection procedures and inclusion criteria, 672 peer-reviewed journal articles published between 2017 and 2026 were retrieved from Scopus, ensuring high-quality, interdisciplinary coverage. Quantitative bibliometric analyses were conducted in R using Bibliometrix and Biblioshiny, including co-authorship, co-citation, keyword co-occurrence, and thematic network analyses, to reveal collaboration patterns, influential works, and emerging research trends. Results indicate that 42% of studies employed DRL, 27% focused on Multi-Agent RL (MARL), and 31% relied on classical RL, with applications concentrated in robotic control (33%), process optimization (28%), and predictive maintenance (19%). However, only 22% of the studies reported real-world or pilot implementations, highlighting persistent challenges in scalability, safety validation, interpretability, and deployment readiness. By integrating a review with bibliometric mapping, this study provides a comprehensive taxonomy and a strategic roadmap linking theoretical RL research with practical industrial applications. This roadmap is structured across four critical dimensions: (1) Algorithmic Development (e.g., safe, explainable, and data-efficient RL), (2) Integration Technologies (e.g., digital twins and IoT), (3) Validation Maturity (from simulation to real-world pilots), and (4) Human-Centricity (addressing trust, collaboration, and workforce transition). These insights can guide researchers, engineers, and policymakers in developing scalable, safe, and human-centric RL solutions, prioritizing research directions, and informing the implementation of Industry 5.0–aligned intelligent automation systems emphasizing transparency, sustainability, and operational resilience.
25 pages, 43077 KB  
Article
Transformer-Based Soft Actor–Critic for UAV Path Planning in Precision Agriculture IoT Networks
by Guanting Ge, Mingde Sun, Yiyuan Xue and Svitlana Pavlova
Sensors 2025, 25(24), 7463; https://doi.org/10.3390/s25247463 - 8 Dec 2025
Viewed by 562
Abstract
Multi-agent path planning for Unmanned Aerial Vehicles (UAVs) in agricultural data collection tasks presents a significant challenge, requiring sophisticated coordination to ensure efficiency and avoid conflicts. Existing multi-agent reinforcement learning (MARL) algorithms often struggle with high-dimensional state spaces, continuous action domains, and complex inter-agent dependencies. To address these issues, we propose a novel algorithm, Multi-Agent Transformer-based Soft Actor–Critic (MATRS). Operating on the Centralized Training with Decentralized Execution (CTDE) paradigm, MATRS enables safe and efficient collaborative data collection and trajectory optimization. By integrating a Transformer encoder into its centralized critic network, our approach leverages the self-attention mechanism to explicitly model the intricate relationships between agents, thereby enabling a more accurate evaluation of the joint action–value function. Through comprehensive simulation experiments, we evaluated the performance of MATRS against established baseline algorithms (MADDPG, MATD3, and MASAC) in scenarios with varying data loads and problem scales. The results demonstrate that MATRS consistently achieves faster convergence and shorter task completion times. Furthermore, in scalability experiments, MATRS learned an efficient “task-space partitioning” strategy, where the UAV swarm autonomously divides the operational area for conflict-free coverage. These findings indicate that combining attention-based architectures with Soft Actor–Critic learning offers a potent and scalable solution for high-performance multi-UAV coordination in IoT data collection tasks.
(This article belongs to the Special Issue Unmanned Aerial Systems in Precision Agriculture)
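
The architectural core — a Transformer encoder inside the centralized critic — can be sketched in a few lines of PyTorch. Layer sizes, the pooling step, and the per-agent (observation, action) tokenization are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class TransformerCritic(nn.Module):
    """Centralized critic: self-attention across per-agent (obs, action) tokens."""
    def __init__(self, obs_dim: int, act_dim: int, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); act: (batch, n_agents, act_dim)
        tokens = self.embed(torch.cat([obs, act], dim=-1))
        encoded = self.encoder(tokens)           # attention models inter-agent dependencies
        return self.q_head(encoded.mean(dim=1))  # pooled joint action-value estimate
```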
33 pages, 5089 KB  
Article
Graph-Gated Relational Reasoning for Enhanced Coordination and Safety in Distributed Multi-Robot Systems: A Decentralized Reinforcement Learning Approach
by Tianshun Chang, Yiping Ma, Zhiqian Li, Shuai Huang, Zeqi Ma, Yang Xiong, Shijie Huang and Jingbo Qin
Sensors 2025, 25(23), 7335; https://doi.org/10.3390/s25237335 - 2 Dec 2025
Viewed by 664
Abstract
The autonomous coordination of multi-robot systems in complex environments remains a fundamental challenge. Current Multi-Agent Reinforcement Learning (MARL) methods often struggle to reason effectively about the dynamic, causal relationships between agents and their surroundings. To address this, we introduce the Graph-Gated Transformer (GGT), a novel neural architecture designed to inject explicit relational priors directly into the self-attention mechanism for multi-robot coordination. The core mechanism of the GGT involves dynamically constructing a Tactical Relational Graph that encodes high-priority relationships like collision risk and cooperative intent. This graph is then used to generate an explicit attention mask, compelling the Transformer to focus its reasoning exclusively on these tactically relevant entities rather than engaging in brute-force pattern matching across all perceived objects. Integrated into a Centralized Training with Decentralized Execution (CTDE) framework with QMIX, our approach demonstrates substantial improvements in high-fidelity simulations. In complex scenarios with dynamic obstacles and sensor noise, our GGT-based system achieves 95.3% coverage area efficiency with only 0.4 collisions per episode, a stark contrast to the 60.3% coverage and 20.7 collisions of standard QMIX. Ablation studies confirm that this structured, gated attention mechanism—not merely the presence of attention—is the key to unlocking robust collective autonomy. This work establishes that explicitly constraining the Transformer’s attention space with dynamic, domain-aware relational graphs is a powerful and effective architectural solution for engineering safe and intelligent multi-robot systems.
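
The graph-gating mechanism amounts to turning the Tactical Relational Graph into an additive attention mask. A minimal sketch, assuming a binary adjacency matrix with self-loops (so every row keeps at least one attendable position); the actual GGT masking details are not given in the abstract.

```python
import torch

def gated_attention_mask(adj: torch.Tensor) -> torch.Tensor:
    """adj: (n_entities, n_entities) binary relational graph (1 = collision-risk
    or cooperative-intent edge, including self-loops). Returns an additive mask
    (0 = allowed, -inf = blocked) usable as attn_mask in nn.MultiheadAttention."""
    mask = torch.zeros_like(adj, dtype=torch.float)
    mask[adj == 0] = float("-inf")
    return mask

# attn_out, _ = multihead_attn(q, k, v, attn_mask=gated_attention_mask(adj))
```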
32 pages, 18611 KB  
Article
Optimization of Multi-Intelligent Body Strategies for UAV Adversarial Tasks Based on MADDPG-SASP
by Zhenfei Xiao, Fuyong Liu and Qian Wang
Information 2025, 16(12), 1050; https://doi.org/10.3390/info16121050 - 1 Dec 2025
Viewed by 343
Abstract
In intelligent multi-agent systems, particularly in drone combat scenarios, the challenges posed by rapidly changing environments and incomplete information significantly hinder effective strategy optimization. Traditional multi-agent reinforcement learning (MARL) approaches often encounter difficulties in adapting to the dynamic nature of adversarial environments, especially when enemy strategies are subject to continuous evolution, complicating agents’ ability to respond effectively. To address these challenges, this paper introduces a novel enhanced MARL framework, MADDPG-SASP, which integrates an improved self-attention mechanism with self-play within the MADDPG algorithm, thereby facilitating superior strategy optimization. The self-attention mechanism empowers agents to adaptively extract critical environmental features, thereby enhancing both the speed and accuracy of perception and decision-making processes. Concurrently, the adaptive self-play mechanism iteratively refines agent strategies through continuous adversarial interactions, thereby bolstering the stability and flexibility of their responses. Empirical results indicate that after 600 rounds, the win rate of agents employing this framework rose substantially, from 26.17% with the original MADDPG to 100%. Further validation through comparative experiments underscores the method’s efficacy, demonstrating considerable advantages in strategy optimization and agent performance in complex, dynamic environments. Moreover, in the Predator–Prey combat scenario, when the enemy side employs a multi-agent strategy, the drone agents’ win rate reaches 98.5% and 100%.
15 pages, 1506 KB  
Review
Towards LLM Enhanced Decision: A Survey on Reinforcement Learning Based Ship Collision Avoidance
by Yizhou Wu, Jin Liu, Xingye Li, Junsheng Xiao, Tao Zhang, Haitong Xu and Lei Zhang
J. Mar. Sci. Eng. 2025, 13(12), 2275; https://doi.org/10.3390/jmse13122275 - 28 Nov 2025
Viewed by 670
Abstract
This comprehensive review examines work on reinforcement learning (RL) for ship collision avoidance (SCA) from 2014 to the present, analyzing the methods designed for both single-agent and multi-agent collaborative paradigms. While prior research has demonstrated RL’s advantages in environmental adaptability, autonomous decision-making, and online optimization over traditional control methods, this study systematically addresses the algorithmic improvements, implementation challenges, and functional roles of RL techniques in SCA, such as Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Multi-Agent Reinforcement Learning (MARL). It also highlights how these technologies address critical challenges in SCA, including dynamic obstacle avoidance, compliance with the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs), and coordination in dense traffic scenarios, while underscoring persistent limitations such as idealized assumptions, scalability issues, and robustness in uncertain environments. Contributions include a structured analysis of recent technological evolution and a Large Language Model (LLM)-based hierarchical architecture integrating perception, communication, decision-making, and execution layers for future SCA systems, which prioritizes the development of scalable, adaptive frameworks that ensure robust and compliant autonomous navigation in complex, real-world maritime environments.
36 pages, 10303 KB  
Article
Optimizing Evacuation for Disabled Pedestrians with Heterogeneous Speeds: A Floor Field Cellular Automaton and Reinforcement Learning Approach
by Yimiao Lyu and Hongchun Wang
Buildings 2025, 15(22), 4191; https://doi.org/10.3390/buildings15224191 - 20 Nov 2025
Viewed by 553
Abstract
Safe and efficient building evacuation for heterogeneous populations, particularly individuals with disabilities, remains a critical challenge in emergency management. This study proposes a hybrid evacuation framework that integrates Floor Field Cellular Automaton (FFCA) with reinforcement learning, specifically a Deep Q-Network (DQN), to enhance adaptive decision-making in dynamic and complex environments. The model incorporates velocity heterogeneity, friction-based conflict resolution, and real-time path planning to capture diverse mobility capabilities and interactions among evacuees. Simulation experiments were conducted under varying population densities, walking speeds, and exit configurations, considering four occupant groups: able-bodied individuals, wheelchair users, people with visual impairments, and people with hearing impairments. The results demonstrate that the DQN-enhanced model consistently outperforms the conventional SFF + DFF approach, achieving significant reductions in evacuation time, particularly under high-density and reduced-speed scenarios. Notably, the DQN dynamically adapts evacuation paths to mitigate congestion, thereby improving both system efficiency and the safety of vulnerable groups. These findings highlight the potential of combining CA-based environmental modeling with reinforcement learning to develop adaptive and inclusive evacuation strategies. The proposed framework provides practical insights for designing evacuation protocols and intelligent navigation systems in public buildings. Future work will extend the proposed FFCA + DQN framework to more complex and realistic environments, including multi-exit and multi-level buildings, and further integrate multi-agent reinforcement learning (MARL) architectures to enable decentralized adaptation among heterogeneous evacuees. Furthermore, lightweight DQN variants and distributed training schemes will be explored to enhance computational scalability, while empirical data from evacuation drills and real-world case studies will be used for model calibration and validation, thereby improving predictive accuracy and generalizability.
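
For readers new to FFCA, the static floor field (SFF) referenced above is simply each cell's shortest-path distance to an exit; evacuees (or the DQN state) read descending-gradient moves from it. A minimal BFS sketch, with the grid encoding (0 = free, 1 = wall) and 4-connectivity as assumptions.

```python
from collections import deque
import numpy as np

def static_floor_field(grid: np.ndarray, exits: list) -> np.ndarray:
    """BFS distance-to-nearest-exit over the free cells of a 0/1 occupancy grid."""
    sff = np.full(grid.shape, np.inf)
    queue = deque()
    for e in exits:                    # exits: list of (row, col) cells
        sff[e] = 0.0
        queue.append(e)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                    and grid[nr, nc] == 0 and sff[nr, nc] > sff[r, c] + 1):
                sff[nr, nc] = sff[r, c] + 1
                queue.append((nr, nc))
    return sff
```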
31 pages, 4356 KB  
Article
Dynamic Multi-Objective Controller Placement in SD-WAN: A GMM-MARL Hybrid Framework
by Abdulrahman M. Abdulghani, Azizol Abdullah, A. R. Rahiman, Nor Asilah Wati Abdul Hamid and Bilal Omar Akram
Network 2025, 5(4), 52; https://doi.org/10.3390/network5040052 - 11 Nov 2025
Viewed by 694
Abstract
Modern Software-Defined Wide Area Networks (SD-WANs) require adaptive controller placement addressing multi-objective optimization, where latency minimization, load balancing, and fault tolerance must be optimized simultaneously. Traditional static approaches fail under dynamic network conditions with evolving traffic patterns and topology changes. This paper presents a novel hybrid framework integrating Gaussian Mixture Model (GMM) clustering with Multi-Agent Reinforcement Learning (MARL) for dynamic controller placement. The approach leverages probabilistic clustering for intelligent MARL initialization, reducing exploration requirements. Centralized Training with Decentralized Execution (CTDE) enables distributed optimization through cooperative agents. Experimental evaluation using real-world topologies demonstrates a noticeable reduction in latency, improved network balance, and significant computational efficiency gains versus existing methods. Dynamic adaptation experiments confirm superior scalability during network changes. The hybrid architecture achieves linear scalability through problem decomposition while maintaining real-time responsiveness, establishing practical viability.
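
The GMM-for-initialization step is straightforward with scikit-learn: cluster switch positions (or a latency-space embedding) and hand the mixture means to the MARL agents as initial controller sites. A sketch under those assumptions — the paper's actual feature space is not given in the abstract.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def initial_placement(node_coords: np.ndarray, n_controllers: int) -> np.ndarray:
    """Warm-start controller sites from GMM cluster means over node coordinates."""
    gmm = GaussianMixture(n_components=n_controllers, covariance_type="full",
                          random_state=0).fit(node_coords)
    return gmm.means_   # each mean is a candidate site the MARL agents refine

# placement = initial_placement(np.random.rand(50, 2), n_controllers=3)
```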