Search Results (219)

Search Parameters:
Keywords = MARL

16 pages, 21475 KiB  
Article
Palynostratigraphy of the “Muschelkalk Sedimentary Cycle” in the NW Iberian Range, Central Spain
by Manuel García-Ávila, Soledad García-Gil and José B. Diez
Geosciences 2025, 15(8), 299; https://doi.org/10.3390/geosciences15080299 - 4 Aug 2025
Abstract
The Muschelkalk sedimentary cycle in the northwestern region of the Iberian Range (central Spain) lies within a transitional area between the Iberian and Hesperia type Triassic domains. To improve the understanding of its paleopalynological record, fifty samples were analyzed from ten stratigraphic sections corresponding to the Tramacastilla Dolostones Formation (TD Fm.), Cuesta del Castillo Sandstones and Siltstones Formation (CCSS Fm.), and Royuela Dolostones, Marls and Limestones Formation (RDML Fm.). Despite previous studies in the area, palynological data remain scarce or insufficiently detailed, highlighting the need for a systematic reassessment. Based on the identified palynological assemblages, the succession is assigned to an age spanning from the Fassanian to the Longobardian, with a possible extension into the base of the Julian (early Carnian). The results confirm that the siliciclastic unit (CCSS) represents a lateral facies change with respect to the carbonate formations of the upper Muschelkalk (TD and RDML). From a paleoecological perspective, the assemblages indicate a warm and predominantly dry environment, dominated by xerophytic conifers, although evidence of more humid local environments, such as marshes or coastal plains, is also observed.
(This article belongs to the Section Sedimentology, Stratigraphy and Palaeontology)

16 pages, 1823 KiB  
Article
Collaborative Target Tracking Algorithm for Multi-Agent Based on MAPPO and BCTD
by Yuebin Zhou, Yunling Yue, Bolun Yan, Linkun Li, Jinsheng Xiao and Yuan Yao
Drones 2025, 9(8), 521; https://doi.org/10.3390/drones9080521 - 24 Jul 2025
Viewed by 275
Abstract
Target tracking is a representative task in multi-agent reinforcement learning (MARL), where agents must collaborate effectively in environments with dense obstacles, evasive targets, and high-dimensional observations, conditions that often lead to local optima and training inefficiencies. To address these challenges, this paper proposes a collaborative tracking algorithm for UAVs that integrates behavior cloning with temporal difference (BCTD) and multi-agent proximal policy optimization (MAPPO). Expert trajectories are generated using the artificial potential field (APF) method, followed by policy pre-training via behavior cloning and TD-based value optimization. MAPPO is then employed for dynamic fine-tuning, enhancing robustness and coordination. Experiments in a simulated environment show that the proposed MAPPO+BCTD framework outperforms MAPPO, QMIX, and MADDPG in success rate, convergence speed, and tracking efficiency. The proposed method effectively alleviates both the local-optimum problem of APF and the training inefficiency of RL, offering a scalable and reliable solution for dynamic multi-agent coordination.
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)
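The abstract names MAPPO as the fine-tuning stage but does not spell out its loss. As a minimal sketch, and not the authors' implementation, the snippet below shows the standard PPO clipped surrogate that MAPPO optimizes for each agent. It assumes the advantages have already been computed by a centralized critic and passed in as plain numbers. The clip range `eps=0.2` is a common default rather than a value from the paper, and the BCTD pre-training stage is not shown.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)             # pi_new(a|o) / pi_old(a|o)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)   # clip(ratio, 1 - eps, 1 + eps)
    # Taking the minimum makes the objective pessimistic: large policy
    # changes cannot increase it beyond the clipped value.
    return min(ratio * advantage, clipped * advantage)

def mappo_actor_loss(batch, eps=0.2):
    """Negative mean surrogate over (new_logp, old_logp, advantage) samples
    pooled from all agents. Advantages are assumed to come from a
    centralized critic that is not shown here."""
    terms = [clipped_surrogate(n, o, a, eps) for n, o, a in batch]
    return -sum(terms) / len(terms)
```

With a probability ratio of 1.5 and an advantage of 2.0, the surrogate is capped at 1.2 × 2.0 = 2.4, so the update cannot exploit the large ratio.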

19 pages, 1942 KiB  
Article
Adaptive Multi-Agent Reinforcement Learning with Graph Neural Networks for Dynamic Optimization in Sports Buildings
by Sen Chen, Xiaolong Chen, Qian Bao, Hongfeng Zhang and Cora Un In Wong
Buildings 2025, 15(14), 2554; https://doi.org/10.3390/buildings15142554 - 20 Jul 2025
Viewed by 325
Abstract
The dynamic scheduling optimization of sports facilities faces challenges posed by real-time demand fluctuations and complex interdependencies between facilities. To address the adaptability limitations of traditional centralized approaches, this study proposes a decentralized multi-agent reinforcement learning framework based on graph neural networks (GNNs). Experimental results demonstrate that in a simulated environment comprising 12 heterogeneous sports facilities, the proposed method achieves an operational efficiency of 0.89 ± 0.02, representing a 13% improvement over Centralized PPO, while user satisfaction reaches 0.85 ± 0.03, a 9% enhancement. When confronted with a sudden 30% surge in demand, the system recovers in just 90 steps, 33% faster than centralized methods. The GNN attention mechanism successfully captures critical dependencies between facilities, such as the connection weight of 0.32 ± 0.04 between swimming pools and locker rooms. Computational efficiency tests show that the system maintains real-time decision-making capability within 800 ms even when scaled to 50 facilities. These results verify that the method effectively balances decentralized decision-making with global coordination while maintaining low communication overhead (0.09 ± 0.01), offering a scalable and practical solution for resource management in complex built environments.
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)
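The abstract reports attention "connection weights" between facilities, such as pools and locker rooms, without giving the attention formula. The sketch below uses generic scaled dot-product attention of one node over its graph neighbors to show how such normalized weights arise. This is an assumption for illustration, not the paper's GNN layer. The facility names and feature vectors are hypothetical.

```python
import math

def neighbor_attention(query, neighbors):
    """Softmax attention weights of one facility node over its neighbors.

    query: feature vector of the focal facility (hypothetical features)
    neighbors: dict mapping neighbor name -> feature vector
    Returns a dict of weights that sum to 1.
    """
    d = len(query)
    scores = {
        name: sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)
        for name, vec in neighbors.items()
    }
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def aggregate(neighbors, weights):
    """Attention-weighted sum of neighbor feature vectors (the message passed to the node)."""
    d = len(next(iter(neighbors.values())))
    out = [0.0] * d
    for name, vec in neighbors.items():
        for i, x in enumerate(vec):
            out[i] += weights[name] * x
    return out
```

For example, `neighbor_attention([1.0, 0.0], {"locker_room": [1.0, 0.0], "gym": [0.0, 1.0]})` gives the locker room the larger weight, because its features align with the query. In a trained GNN the query and key vectors would be learned projections rather than raw features.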

23 pages, 6037 KiB  
Article
Integrated Assessment of Groundwater Vulnerability and Drinking Water Quality in Rural Wells: Case Study from Ceanu Mare Commune, Northern Transylvanian Basin, Romania
by Nicolae-Leontin Petruța, Ioana Monica Sur, Tudor Andrei Rusu, Timea Gabor and Tiberiu Rusu
Sustainability 2025, 17(14), 6530; https://doi.org/10.3390/su17146530 - 17 Jul 2025
Viewed by 462
Abstract
Groundwater contamination by nitrates (NO₃⁻) and nitrites (NO₂⁻) is an urgent problem in rural areas of Eastern Europe, with profound public health and sustainability implications. This paper presents an integrated assessment of groundwater vulnerability and water quality in rural wells in the Ceanu Mare commune, Cluj County, Romania, a representative area of the Northern Transylvanian Basin characterized by diverse geological structures, intensive agricultural activity, and incomplete public water infrastructure. This study combines detailed hydrochemical analyses, household-level assessments, and geological context to identify and quantify the key factors influencing nitrate and microbial contamination in rural wells, providing a comprehensive perspective on water quality challenges in central Romania. The multidisciplinary approach integrates detailed geotechnical investigations conducted through four strategically located boreholes with extensive hydrogeological and lithological characterization and rigorous chemical and microbiological analyses of nearby wells. The results reveal persistently elevated concentrations of NO₃⁻ and NO₂⁻, commonly associated with inadequate livestock waste management and the proximity of manure storage areas. Microbiological contamination was also frequent. NO₃⁻ levels in well water ranged from 39.7 to 48 mg/L, reaching up to 96% of the EU/WHO threshold (50 mg/L), while NO₂⁻ concentrations ranged from 0.50 to 0.69 mg/L, exceeding the legal limit (0.5 mg/L) in 87% of the sampled wells. Ammonium (NH₄⁺) was detected in all wells (0.25–0.34 mg/L), below the maximum allowed limit (0.5 mg/L) but indicative of ongoing organic pollution. All well water samples were non-compliant for microbiological parameters, with E. coli detected in 100% of cases (5–13 CFU/100 mL).
The regional clay–marl substrate offers only limited natural protection against pollutant infiltration, primarily due to lithological heterogeneity and discontinuities observed within the clay–marl layers in the study area. This research delivers a replicable model for rural groundwater assessment and addresses a critical gap in regional and European water safety studies. It also provides actionable recommendations for sustainable groundwater management, infrastructure development, and community risk reduction in line with EU water directives.
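The threshold arithmetic in the abstract can be checked directly. 48 mg/L of nitrate is 48/50 = 96% of the 50 mg/L limit, and nitrite values from 0.50 mg/L upward sit at or above the 0.5 mg/L limit. The small helper below uses only the limits quoted in the abstract. The sample dictionary is hypothetical and is assembled from the reported range endpoints, not from any single measured well. "Exceeding" is interpreted as strictly above the limit.

```python
# Limits quoted in the abstract, in mg/L. ASCII keys stand in for NO3-, NO2-, NH4+.
LIMITS_MG_L = {"NO3-": 50.0, "NO2-": 0.5, "NH4+": 0.5}

def percent_of_limit(value_mg_l, parameter):
    """Concentration expressed as a percentage of its regulatory limit."""
    return 100.0 * value_mg_l / LIMITS_MG_L[parameter]

def exceedances(sample):
    """Parameters whose concentration is strictly above the limit."""
    return sorted(p for p, v in sample.items() if v > LIMITS_MG_L[p])

# Hypothetical well built from the upper ends of the reported ranges
worst_case = {"NO3-": 48.0, "NO2-": 0.69, "NH4+": 0.34}
```

Here `percent_of_limit(48.0, "NO3-")` returns 96.0, matching the abstract, and only nitrite is flagged for `worst_case`. A well at the lower ends of the ranges (0.50 mg/L nitrite) sits exactly on the limit. That is consistent with the abstract reporting exceedance in 87% of wells rather than in all of them.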

39 pages, 1775 KiB  
Article
A Survey on UAV Control with Multi-Agent Reinforcement Learning
by Chijioke C. Ekechi, Tarek Elfouly, Ali Alouani and Tamer Khattab
Drones 2025, 9(7), 484; https://doi.org/10.3390/drones9070484 - 9 Jul 2025
Viewed by 1407
Abstract
Unmanned Aerial Vehicles (UAVs) have become increasingly prevalent in both governmental and civilian applications, offering significant reductions in operational costs by minimizing human involvement. There is a growing demand for autonomous, scalable, and intelligent coordination strategies in complex aerial missions involving multiple UAVs. Traditional control techniques often fall short in dynamic, uncertain, or large-scale environments where decentralized decision-making and inter-agent cooperation are crucial. Multi-Agent Reinforcement Learning (MARL) is a potentially effective technique for UAV fleet operation: it offers a powerful framework for addressing these challenges by enabling UAVs to learn optimal behaviors through interaction with the environment and with each other. Despite significant progress, the field remains fragmented, with a wide variety of algorithms, architectures, and evaluation metrics spread across domains. This survey aims to systematically review and categorize state-of-the-art MARL approaches applied to UAV control, identify prevailing trends and research gaps, and provide a structured foundation for future advancements in cooperative aerial robotics. The advantages and limitations of these techniques are discussed, along with suggestions for further research to improve the effectiveness of MARL in UAV fleet management.

20 pages, 741 KiB  
Article
Long-Endurance Collaborative Search and Rescue Based on Maritime Unmanned Systems and Deep-Reinforcement Learning
by Pengyan Dong, Jiahong Liu, Hang Tao, Yang Zhao, Zhijie Feng and Hanjiang Luo
Sensors 2025, 25(13), 4025; https://doi.org/10.3390/s25134025 - 27 Jun 2025
Viewed by 325
Abstract
Maritime vision sensing can be applied to maritime unmanned systems to perform search and rescue (SAR) missions in complex marine environments, since multiple unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs) can conduct vision sensing through the air, at the water surface, and underwater. However, in these vision-based maritime SAR systems, collaboration between UAVs and USVs is critical to successful SAR operations. To address this challenge, we propose a long-endurance collaborative SAR scheme that exploits the complementary strengths of maritime unmanned systems. In this scheme, a swarm of UAVs uses a multi-agent reinforcement learning (MARL) method and probability maps to perform a cooperative first-phase search, exploiting the UAVs’ high altitude and wide field of view. Multiple USVs then conduct precise, real-time second-phase operations by refining the probability map. To cope with the energy constraints of UAVs and sustain long-endurance collaborative SAR missions, a MARL-based multi-USV charging scheduling method is proposed to prolong the UAVs’ flight time. Extensive simulations verified the effectiveness of the proposed scheme and its long-endurance search capability.
(This article belongs to the Special Issue Underwater Vision Sensing System: 2nd Edition)
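The abstract says the USVs "refine the probability map" but does not give the update rule. One common way to refine a search map is a Bayesian update after a cell is searched without finding the target. The sketch below assumes that rule, along with a single target, a prior that sums to 1, and a fixed detection probability. It is an illustration of the idea, not the authors' method.

```python
def update_after_miss(prob_map, searched, p_detect):
    """Bayesian update of a target-location probability map after a miss.

    prob_map: dict cell -> prior probability (assumed to sum to 1)
    searched: the cell that was searched without detecting the target
    p_detect: assumed probability the sensor detects the target if it is there
    """
    p_s = prob_map[searched]
    norm = 1.0 - p_detect * p_s   # total probability of observing "no detection"
    posterior = {}
    for cell, p in prob_map.items():
        if cell == searched:
            # The target may still be here if the sensor missed it
            posterior[cell] = p * (1.0 - p_detect) / norm
        else:
            posterior[cell] = p / norm
    return posterior
```

With two equally likely cells and `p_detect=0.8`, an unsuccessful search of cell A moves the belief to A = 1/6 and B = 5/6. Probability mass therefore shifts toward unsearched regions, which is what steers the next search step.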

28 pages, 1293 KiB  
Article
Research on Multi-Agent Collaborative Scheduling Planning Method for Time-Triggered Networks
by Changsheng Chen, Anrong Zhao, Zhihao Zhang, Tao Zhang and Chao Fan
Electronics 2025, 14(13), 2575; https://doi.org/10.3390/electronics14132575 - 26 Jun 2025
Viewed by 306
Abstract
Time-triggered Ethernet combines time-triggered and event-triggered communication and is suited to fields with strict real-time requirements. Because traditional scheduling algorithms handle event-triggered messages poorly, this paper proposes a message scheduling algorithm based on multi-agent reinforcement learning (MADDPG, Multi-Agent Deep Deterministic Policy Gradient), as well as a hybrid algorithm that combines an SMT (Satisfiability Modulo Theories) solver with MADDPG. The method aims to optimize the scheduling of event-triggered messages while keeping time-triggered messages uniformly scheduled, providing more time slots for event-triggered messages and reducing their waiting time and end-to-end delay. In experiments with the purpose-built scheduling software, the new method showed better load balance and lower message jitter than an SMT-based algorithm and a traditional DQN (Deep Q-Network) algorithm, and simulations in the OPNET environment verified that it effectively reduces the delay of event-triggered messages.
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)

27 pages, 3603 KiB  
Article
Dual-Layer Optimization for Supply–Demand Balance in Urban Taxi Systems: Multi-Agent Reinforcement Learning with Dual-Attention Mechanisms
by Liping Yan and Renjie Tang
Electronics 2025, 14(13), 2562; https://doi.org/10.3390/electronics14132562 - 24 Jun 2025
Viewed by 390
Abstract
With the rapid growth of urban transportation demand, traditional taxi systems face challenges such as supply–demand imbalances and low dispatch efficiency. Their dispatch methods, which rely on static data and predefined strategies, struggle to adapt to dynamic traffic environments. To address these issues, this paper proposes a dual-layer Taxi Dispatch and Empty-Vehicle Repositioning (TDEVR) optimization framework based on Multi-Agent Reinforcement Learning (MARL). The framework separates taxi matching from repositioning, enabling efficient coordination between the decision-making and execution layers. This design allows global and local supply–demand information to be integrated in real time, ensuring adaptability to complex urban traffic conditions. A Multi-Agent Dual-Attention Reinforcement Learning (MADARL) algorithm is proposed to enhance decision-making and coordination; it combines local and global attention mechanisms to improve local agents’ decision-making while optimizing global resource allocation. Experiments on a real-world New York City taxi dataset show that the TDEVR framework with MADARL yields an average improvement of 20.63% in the Order Response Rate (ORR), a 15.29 increase in Platform Cumulative Revenue (PCR), and a 22.07 improvement in the Composite Index (CI). These results highlight the significant performance gains of the proposed framework in dynamic scenarios and its ability to adapt efficiently to real-time fluctuations in supply and demand within urban traffic environments.
(This article belongs to the Section Artificial Intelligence)

28 pages, 1509 KiB  
Article
Adaptive Congestion Detection and Traffic Control in Software-Defined Networks via Data-Driven Multi-Agent Reinforcement Learning
by Kaoutar Boussaoud, Abdeslam En-Nouaary and Meryeme Ayache
Computers 2025, 14(6), 236; https://doi.org/10.3390/computers14060236 - 16 Jun 2025
Viewed by 552
Abstract
Efficient congestion management in Software-Defined Networks (SDNs) remains a significant challenge due to dynamic traffic patterns and complex topologies. Conventional congestion control techniques based on static or heuristic rules often fail to adapt effectively to real-time network variations. This paper proposes a data-driven framework based on Multi-Agent Reinforcement Learning (MARL) to enable intelligent, adaptive congestion control in SDNs. The framework integrates two collaborative agents: a Congestion Classification Agent that identifies congestion levels using metrics such as delay and packet loss, and a Decision-Making Agent based on Deep Q-Learning (DQN or its variants), which selects the optimal actions for routing and bandwidth management. The agents are trained offline using both synthetic and real network traces (e.g., the MAWI dataset), and deployed in a simulated SDN testbed using Mininet and the Ryu controller. Extensive experiments demonstrate the superiority of the proposed system across key performance metrics. Compared to baseline controllers, including standalone DQN and static heuristics, the MARL system achieves up to 3.0% higher throughput, maintains end-to-end delay below 10 ms, and reduces packet loss by over 10% in real traffic scenarios. Furthermore, the architecture exhibits stable cumulative reward progression and balanced action selection, reflecting effective learning and policy convergence. These results validate the benefit of agent specialization and modular learning in scalable and intelligent SDN traffic engineering.
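The Decision-Making Agent is described only as "DQN or its variants". The sketch below shows the one-step TD target that a standard DQN regresses toward, plus the Double DQN target as one example of a variant. It assumes discrete actions (for instance hypothetical "keep route", "reroute", and "throttle" choices) and Q-values supplied as plain lists. It does not reproduce the paper's networks, reward design, or action set.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward                     # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks a', the target network scores it."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[a_star]

def td_loss(q_sa, target):
    """Squared TD error for a single transition."""
    return (q_sa - target) ** 2
```

For a reward of -1.0, next-state Q-values `[0.5, 2.0, 1.0]`, and `gamma=0.9`, the DQN target is -1.0 + 0.9 × 2.0 = 0.8. Decoupling action selection from evaluation, as Double DQN does, is one way to reduce the overestimation that the plain max operator introduces.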

59 pages, 4517 KiB  
Review
Artificial Intelligence Empowering Dynamic Spectrum Access in Advanced Wireless Communications: A Comprehensive Overview
by Abiodun Gbenga-Ilori, Agbotiname Lucky Imoize, Kinzah Noor and Paul Oluwadara Adebolu-Ololade
AI 2025, 6(6), 126; https://doi.org/10.3390/ai6060126 - 13 Jun 2025
Viewed by 1879
Abstract
This review paper examines the integration of artificial intelligence (AI) in wireless communication, focusing on cognitive radio (CR), spectrum sensing, and dynamic spectrum access (DSA). As the demand for spectrum continues to rise with the expansion of mobile users and connected devices, cognitive radio networks (CRNs), leveraging AI-driven spectrum sensing and dynamic access, provide a promising solution to improve spectrum utilization. The paper reviews various deep learning (DL)-based spectrum-sensing methods, highlighting their advantages and challenges. It also explores the use of multi-agent reinforcement learning (MARL) for distributed DSA networks, where agents autonomously optimize power allocation (PA) to minimize interference and enhance quality of service. Additionally, the paper discusses the role of machine learning (ML) in predicting spectrum requirements, which is crucial for efficient frequency management in fifth-generation (5G) networks and beyond. Case studies show how ML can help networks self-optimize, reducing energy consumption while improving performance. The review also introduces the potential of generative AI (GenAI) for demand planning and network optimization, enhancing spectrum efficiency and energy conservation in wireless networks (WNs). Finally, the paper highlights future research directions, including improving AI-driven network resilience, refining predictive models, and addressing ethical considerations. Overall, AI is poised to transform wireless communication, offering innovative solutions for spectrum management (SM), security, and network performance.
(This article belongs to the Special Issue Artificial Intelligence for Network Management)

21 pages, 676 KiB  
Article
Service-Driven Dynamic Beam Hopping with Resource Allocation for LEO Satellites
by Huaixiu Xu, Lilan Liu and Zhizhong Zhang
Electronics 2025, 14(12), 2367; https://doi.org/10.3390/electronics14122367 - 10 Jun 2025
Viewed by 668
Abstract
In Low-Earth-Orbit (LEO) satellite communication systems, ground service demand is unevenly distributed and strongly time-varying, while on-board resource utilization is low. How to use limited beam resources efficiently to serve ground users flexibly and dynamically has therefore become a research hotspot. This paper studies dynamic resource allocation and interference suppression strategies for beam-hopping satellite communication systems. Specifically, in the full-frequency-reuse scenario, we adopt spatial isolation techniques to avoid co-channel interference between beams and construct a multi-objective optimization problem with weight coefficients that simultaneously maximizes user satisfaction and minimizes transmission delay. We model this optimization problem as a Markov decision process and apply a value decomposition network (VDN) algorithm based on cooperative multi-agent reinforcement learning (MARL-VDN) to reduce computational complexity. In this framework, each beam acts as an agent that independently decides its hopping pattern and power allocation strategy, while multi-agent cooperative optimization is achieved through a shared global state and a joint reward mechanism. Simulation results show that the algorithm effectively enhances user satisfaction, reduces delay, and maintains high resource utilization under dynamic service demand. Additionally, the offline-trained MARL-VDN model can be deployed on LEO satellites in a distributed manner to achieve real-time, on-demand on-board resource allocation.
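The core idea of VDN is that the joint action value is the sum of per-agent values, Q_tot = Σᵢ Qᵢ(oᵢ, aᵢ). That additivity is what lets each beam act on its own, as the abstract describes. The sketch below shows this property with hand-written Q-tables. Each agent is a hypothetical beam whose action indices stand in for (hopping pattern, power level) choices. The learned networks, state encoding, and reward from the paper are not modeled.

```python
from itertools import product

def vdn_q_total(per_agent_q, joint_action):
    """VDN: joint value is the sum of each agent's individual Q-value."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def decentralized_greedy(per_agent_q):
    """Each agent (beam) independently picks its own highest-valued action."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)

def centralized_greedy(per_agent_q):
    """Brute-force argmax over all joint actions (cost grows exponentially with agents)."""
    action_sets = [range(len(q)) for q in per_agent_q]
    return max(product(*action_sets), key=lambda ja: vdn_q_total(per_agent_q, ja))

# Hypothetical Q-values for two beams with 2 and 3 discrete actions
beam_q = [[1.0, 3.0], [2.0, 0.5, 4.0]]
```

Here both functions select `(1, 2)` with Q_tot = 7.0. Because the sum is monotone in every agent's term, independent per-beam maximization always recovers the joint maximizer (ties aside). This is why an offline-trained VDN model can be executed in a distributed way on the satellite.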

34 pages, 5161 KiB  
Article
Robust Adaptive Fractional-Order PID Controller Design for High-Power DC-DC Dual Active Bridge Converter Enhanced Using Multi-Agent Deep Deterministic Policy Gradient Algorithm for Electric Vehicles
by Seyyed Morteza Ghamari, Daryoush Habibi and Asma Aziz
Energies 2025, 18(12), 3046; https://doi.org/10.3390/en18123046 - 9 Jun 2025
Viewed by 727
Abstract
The Dual Active Bridge converter (DABC), known for its bidirectional power transfer capability and high efficiency, plays a crucial role in various applications, particularly in electric vehicles (EVs), where it facilitates energy storage, battery charging, and grid integration. When paired with a high-performance CLLC filter, the DABC is well regarded for its ability to transfer power bidirectionally with high efficiency, making it valuable across a range of energy applications. While these features make the DABC highly efficient, they also complicate controller design because of nonlinear behavior, fast switching, and sensitivity to component variations. We use a fractional-order PID (FOPID) controller, which retains the simple, low-complexity structure of classical PID controllers while gaining flexibility from the additional filtering gains it adopts. However, for a FOPID controller to operate effectively under real-time conditions, its parameters must adapt continuously to changes in the system. To achieve this adaptability, a Multi-Agent Reinforcement Learning (MARL) approach is adopted, in which each controller gain is tuned individually using the Deep Deterministic Policy Gradient (DDPG) algorithm. This structure enhances the controller’s ability to respond to external disturbances with greater robustness and adaptability. Meanwhile, poorly chosen initial gains in the RL structure can degrade the overall efficiency and tracking performance of the controller. To overcome this issue, the Grey Wolf Optimization (GWO) algorithm is proposed to identify the most suitable initial gains for each agent, providing faster adaptation and consistent performance during training. The complete approach is tested on a Hardware-in-the-Loop (HIL) platform, where the results confirm accurate voltage control and resilient dynamic behavior under practical conditions.
In addition, the controller’s performance was validated in a battery management scenario in which the DAB converter interacts with a nonlinear lithium-ion battery. The controller successfully regulated the State of Charge (SOC) through automated charging and discharging transitions, demonstrating its real-time adaptability for EV systems integrated with a battery management system (BMS). Overall, the proposed MARL-FOPID controller showed better disturbance rejection than conventional methods across different operating cases.
(This article belongs to the Special Issue Power Electronics for Smart Grids: Present and Future Perspectives II)
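The abstract does not state how the fractional-order terms are discretized. The sketch below assumes the common FOPID form u = Kp·e + Ki·D^(-λ)e + Kd·D^(μ)e and approximates the fractional operators with the Grünwald–Letnikov sum, D^α x(tₙ) ≈ h^(-α) Σₖ wₖ x(tₙ₋ₖ), where w₀ = 1 and wₖ = wₖ₋₁(1 − (α+1)/k). The symbols λ and μ and the fixed hand-picked gains are assumptions for illustration. In the paper the gains are adapted by DDPG agents and initialized with GWO, neither of which is shown here.

```python
def gl_weights(alpha, n):
    """Grünwald–Letnikov binomial weights w_0..w_{n-1} for order alpha."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def gl_derivative(history, alpha, h):
    """Approximate D^alpha of a sampled signal at its latest sample.

    history: samples x_0..x_n, oldest first; h: sample period.
    alpha > 0 gives a fractional derivative, alpha < 0 a fractional integral.
    Uses the full memory; practical controllers usually truncate it.
    """
    w = gl_weights(alpha, len(history))
    recent_first = reversed(history)   # pair w_0 with x_n, w_1 with x_{n-1}, ...
    return sum(wk * x for wk, x in zip(w, recent_first)) / h ** alpha

def fopid_output(errors, kp, ki, kd, lam, mu, h):
    """u = Kp*e + Ki*D^(-lam) e + Kd*D^(mu) e evaluated on the error history."""
    e_now = errors[-1]
    return (kp * e_now
            + ki * gl_derivative(errors, -lam, h)
            + kd * gl_derivative(errors, mu, h))
```

As a sanity check, integer orders reduce to familiar rules. With α = 1 the weights are [1, −1, 0, …], giving the backward difference (xₙ − xₙ₋₁)/h. With α = −1 all weights are 1, giving a rectangular-rule integral. Fractional λ and μ interpolate between these behaviors, which is the extra tuning freedom a FOPID offers over a classical PID.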

24 pages, 4899 KiB  
Article
A Coordination Optimization Framework for Multi-Agent Reinforcement Learning Based on Reward Redistribution and Experience Reutilization
by Bo Yang, Linghang Gao, Fangzheng Zhou, Hongge Yao, Yanfang Fu, Zelong Sun, Feng Tian and Haipeng Ren
Electronics 2025, 14(12), 2361; https://doi.org/10.3390/electronics14122361 - 9 Jun 2025
Viewed by 667
Abstract
Cooperative multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for addressing complex real-world challenges, including autonomous robot control, strategic decision-making, and decentralized coordination in unmanned swarm systems. However, it still faces challenges in learning proper coordination among multiple agents. The lack of effective knowledge sharing and experience interaction mechanisms among agents has led to substantial performance decline, especially in terms of low sampling efficiency and slow convergence rates, ultimately constraining the practical applicability of MARL. To address these challenges, this paper proposes a novel framework termed Reward redistribution and Experience reutilization based Coordination Optimization (RECO). This innovative approach employs a hierarchical experience pool mechanism that enhances exploration through strategic reward redistribution and experience reutilization. The RECO framework incorporates a sophisticated evaluation mechanism that assesses the quality of historical sampling data from individual agents and optimizes reward distribution by maximizing mutual information across hierarchical experience trajectories. Extensive comparative analyses of computational efficiency and performance metrics across diverse environments reveal that the proposed method not only enhances training efficiency in multi-agent gaming scenarios but also significantly strengthens algorithmic robustness and stability in dynamic environments.

44 pages, 4373 KiB  
Review
Recent Advances in Multi-Agent Reinforcement Learning for Intelligent Automation and Control of Water Environment Systems
by Lei Jia and Yan Pei
Machines 2025, 13(6), 503; https://doi.org/10.3390/machines13060503 - 9 Jun 2025
Viewed by 3164
Abstract
Multi-agent reinforcement learning (MARL) has demonstrated significant application potential in addressing cooperative control, policy optimization, and task allocation problems in complex systems. This paper focuses on its applications and development in water environment systems, providing a systematic review of the theoretical foundations of multi-agent systems and reinforcement learning and summarizing three representative categories of mainstream MARL algorithms. Typical control scenarios in water systems are also examined. From the perspective of cooperative control, this paper investigates the modeling mechanisms and policy coordination strategies of MARL in key tasks such as water supply scheduling, hydro-energy co-regulation, and autonomous monitoring. It further analyzes the challenges and solutions for improving global cooperative efficiency under practical constraints such as limited resources, system heterogeneity, and unstable communication. Additionally, recent progress in cross-domain generalization, integrated communication–perception frameworks, and system-level robustness enhancement is summarized. This work aims to provide a theoretical foundation and key insights for advancing research and practical applications of MARL-based intelligent control in water infrastructure systems.
(This article belongs to the Special Issue Recent Developments in Machine Design, Automation and Robotics)

23 pages, 1999 KiB  
Review
Multi-Agent Reinforcement Learning in Games: Research and Applications
by Haiyang Li, Ping Yang, Weidong Liu, Shaoqiang Yan, Xinyi Zhang and Donglin Zhu
Biomimetics 2025, 10(6), 375; https://doi.org/10.3390/biomimetics10060375 - 6 Jun 2025
Viewed by 1828
Abstract
Biological systems, ranging from ant colonies to neural ecosystems, exhibit remarkable self-organizing intelligence. Inspired by these phenomena, this study investigates how bio-inspired computing principles can bridge game-theoretic rationality and multi-agent adaptability. This study systematically reviews the convergence of multi-agent reinforcement learning (MARL) and game theory, elucidating the innovative potential of this integrated paradigm for collective intelligent decision-making in dynamic open environments. Building upon stochastic game and extensive-form game-theoretic frameworks, we establish a methodological taxonomy across three dimensions: value function optimization, policy gradient learning, and online search planning, thereby clarifying the evolutionary logic and innovation trajectories of algorithmic advancements. Focusing on complex smart city scenarios, including intelligent transportation coordination and UAV swarm scheduling, we identify technical breakthroughs in MARL applications for policy space modeling and distributed decision optimization. By incorporating bio-inspired optimization approaches, the investigation particularly highlights evolutionary computation mechanisms for dynamic strategy generation in search planning, alongside population-based learning paradigms for enhancing exploration efficiency in policy refinement. The findings reveal core principles governing how groups make optimal choices in complex environments while mapping the technological development pathways created by blending cross-disciplinary methods to enhance multi-agent systems.
