Search Results (22)

Search Parameters:
Keywords = StarCraft

18 pages, 3227 KiB  
Article
Optimized Adversarial Tactics for Disrupting Cooperative Multi-Agent Reinforcement Learning
by Guangze Yang, Xinyuan Miao, Yabin Peng, Wei Huang and Fan Zhang
Electronics 2025, 14(14), 2777; https://doi.org/10.3390/electronics14142777 - 10 Jul 2025
Viewed by 327
Abstract
Multi-agent reinforcement learning has demonstrated excellent performance in complex decision-making tasks such as video games, power grid management, and autonomous driving. However, its vulnerability to adversarial attacks may impede its widespread application. Current research on adversarial attacks in reinforcement learning focuses primarily on single-agent scenarios, while studies in multi-agent settings are relatively limited, especially regarding how to achieve optimized attacks with fewer steps. This paper aims to bridge that gap by proposing a heuristic exploration-based attack method named the Search for Key steps and Key agents Attack (SKKA). Unlike previous studies that train a reinforcement learning model to explore attack strategies, our approach relies on a constructed predictive model and a T-value function to search for the optimal attack strategy. The predictive model predicts the environment and agent states after executing the current attack for a certain period, based on simulated environment feedback. The T-value function then evaluates the effectiveness of the current attack. We select the strategy with the highest attack effectiveness from all possible attacks and execute it in the real environment. Experimental results demonstrate that our attack method ensures maximum attack effectiveness while greatly reducing the number of attack steps, thereby improving attack efficiency. In the StarCraft Multi-Agent Challenge (SMAC) scenario, attacking 5–15% of the time steps reduces the win rate from 99% to nearly 0%, and attacking approximately 20% of the agents and 24% of the time steps reduces the win rate to around 3%.
(This article belongs to the Special Issue AI Applications of Multi-Agent Systems)
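The abstract describes a search that scores candidate attacks with a predictive model and a T-value function instead of training an attacker policy. A minimal sketch of that loop, assuming hypothetical helpers `predict_rollout` and `t_value` standing in for the paper's predictive model and effectiveness score (not the authors' actual API):

```python
import numpy as np

def skka_style_search(state, candidate_attacks, predict_rollout, t_value, budget):
    """Greedy key-step/key-agent search (sketch with hypothetical helpers).

    predict_rollout(state, attack) -> predicted state after executing the attack
    t_value(state)                 -> scalar attack-effectiveness score
    """
    plan = []
    for _ in range(budget):
        # Score every candidate attack on the model-predicted outcome.
        scores = [t_value(predict_rollout(state, a)) for a in candidate_attacks]
        best = candidate_attacks[int(np.argmax(scores))]
        plan.append(best)
        # Roll the predictive model forward before choosing the next attack step.
        state = predict_rollout(state, best)
    return plan  # the selected attacks are then executed in the real environment
```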

21 pages, 9553 KiB  
Article
Assisted-Value Factorization with Latent Interaction in Cooperative Multi-Agent Reinforcement Learning
by Zhitong Zhao, Ya Zhang, Siying Wang, Yang Zhou, Ruoning Zhang and Wenyu Chen
Mathematics 2025, 13(9), 1429; https://doi.org/10.3390/math13091429 - 27 Apr 2025
Viewed by 496
Abstract
With the development of value decomposition methods, multi-agent reinforcement learning (MARL) has made significant progress in balancing autonomous decision making with collective cooperation. However, the collaborative dynamics among agents change continuously, and current value decomposition methods struggle to handle these dynamic changes adeptly, impairing the effectiveness of cooperative policies. In this paper, we introduce the concept of latent interaction, upon which an innovative weight-generation method is developed. The proposed method derives weights from historical information, thereby enhancing the accuracy of value estimations. Building upon this, we further propose a dynamic masking mechanism that recalibrates historical information in response to the activity level of agents, improving the precision of latent interaction assessments. Experimental results demonstrate the improved training speed and superior performance of the proposed method in both a multi-agent particle environment and the StarCraft Multi-Agent Challenge.
(This article belongs to the Section E1: Mathematics and Computer Science)
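As a rough illustration of deriving value-mixing weights from historical information with a dynamic mask, here is a QMIX-style sketch in PyTorch; the GRU summarizer, weight head, and masking below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class HistoryWeightedMixer(nn.Module):
    """Sketch: derive per-agent mixing weights from trajectory summaries."""

    def __init__(self, n_agents, obs_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)   # summarizes each agent's history
        self.weight_head = nn.Linear(hidden, 1)                # one mixing weight per agent

    def forward(self, agent_qs, obs_seq, active_mask):
        # obs_seq: (batch, n_agents, T, obs_dim); agent_qs, active_mask: (batch, n_agents)
        b, n, t, d = obs_seq.shape
        _, h = self.gru(obs_seq.reshape(b * n, t, d))           # final hidden state per agent
        w = torch.abs(self.weight_head(h[-1])).reshape(b, n)    # non-negative weights
        w = w * active_mask                                     # dynamic masking of inactive agents
        return (w * agent_qs).sum(dim=1)                        # mixed team value
```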

15 pages, 717 KiB  
Article
Integration of Causal Models and Deep Neural Networks for Recommendation Systems in Dynamic Environments: A Case Study in StarCraft II
by Fernando Moreira, Jairo Ivan Velez-Bedoya and Jeferson Arango-López
Appl. Sci. 2025, 15(8), 4263; https://doi.org/10.3390/app15084263 - 12 Apr 2025
Cited by 1 | Viewed by 675
Abstract
In the context of real-time strategy video games like StarCraft II, strategic decision-making is a complex challenge that requires adaptability and precision. This research builds a hybrid recommendation system that uses causal models and deep neural networks to suggest the best strategies given the resources and conditions of the game. Data were collected from 100 controlled matches via PySC2 and the official StarCraft II API, with conditions standardized to the Terran race. To address data scarcity, we generated synthetic data using a Conditional Tabular Generative Adversarial Network and validated them with Kolmogorov–Smirnov tests and correlation analysis. The causal model, implemented with PyMC, captured key causal relationships between variables such as resources, military units, and strategies. Its predictions were integrated as additional features into a deep neural network trained with PyTorch. The results show that the hybrid system is 1.1% more accurate and has a higher F1 score than a pure neural network, and that it adapts its suggestions to the resources available. However, certain limitations were identified, such as a bias toward offensive strategies in the original data. This approach highlights the potential of combining causal knowledge with machine learning for recommendation systems in dynamic environments.
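The pipeline described above feeds causal-model predictions into a neural network as extra input features. A minimal PyTorch sketch of that integration, where the layer sizes and feature names are illustrative and the causal predictions would come from the fitted PyMC model:

```python
import torch
import torch.nn as nn

class HybridRecommender(nn.Module):
    """Sketch: concatenate raw game features with causal-model predictions."""

    def __init__(self, n_game_feats, n_causal_feats, n_strategies):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_game_feats + n_causal_feats, 128),
            nn.ReLU(),
            nn.Linear(128, n_strategies),  # scores over candidate strategies
        )

    def forward(self, game_feats, causal_preds):
        # causal_preds: e.g. posterior-predictive means exported from the causal model
        return self.net(torch.cat([game_feats, causal_preds], dim=-1))
```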

21 pages, 6382 KiB  
Article
Hydrodynamic Performance of High-Speed Craft: A CFD Study on Spray Rails
by Muhammad Sulman, Simone Mancini, Rasul Niazmand Bilandi and Luigi Vitiello
J. Mar. Sci. Eng. 2025, 13(3), 438; https://doi.org/10.3390/jmse13030438 - 25 Feb 2025
Cited by 1 | Viewed by 1063
Abstract
In high-speed craft, whisker spray increases viscous resistance by enlarging the wetted surface near the stagnation line. Spray rails (SRs) mitigate this issue by redirecting water flow, reducing the wetted surface, and lowering overall resistance. This study investigates the effect of SRs on the hydrodynamic performance of the C1 hull of the Naples Systematic Series (NSS), focusing on systematic variations in their size, number, and placement. Numerical simulations, validated against towing tank results, were conducted using STAR-CCM+ 2306, and a mesh independence analysis was performed to optimize computational efficiency. Key findings highlight the critical role of SR design in performance optimization. Wider SRs (e.g., three per side, 0.96% LWL) reduced resistance by up to 8.5% at high speeds (Fr = 3.26) but slightly increased resistance (~2%) at lower speeds due to a larger wetted surface. Narrower SRs (e.g., three per side, 0.48% LWL) achieved resistance reductions of up to 4.6%, while configurations with multiple SRs (e.g., three per side, 0.72% LWL) outperformed single-rail designs by reducing resistance by up to 4%. Placement near the chine proved more effective than near the keel, offering a 4% additional reduction in resistance. Additionally, SRs generated lift, raising the hull and reducing immersion. The study underscores the importance of optimizing SR size, number, and placement to enhance hydrodynamic efficiency, particularly for high-speed operations.
(This article belongs to the Special Issue Ship Performance in Actual Seas)

20 pages, 1192 KiB  
Article
A Multitask-Based Transfer Framework for Cooperative Multi-Agent Reinforcement Learning
by Cheng Hu, Chenxu Wang, Weijun Luo, Chaowen Yang, Liuyu Xiang and Zhaofeng He
Appl. Sci. 2025, 15(4), 2216; https://doi.org/10.3390/app15042216 - 19 Feb 2025
Viewed by 1490
Abstract
Multi-agent reinforcement learning (MARL) has proven effective and promising in team collaboration tasks, and knowledge transfer in MARL has received increasing attention. Compared to single-agent tasks, knowledge transfer in multi-agent tasks is more complex because coordination among agents must be taken into account. Existing knowledge transfer-based methods, however, focus only on strategies or agent-level knowledge within a single task, and transferring knowledge from such a specific task to new and different types of tasks is likely to fail. In this paper, we propose a multitask-based training framework for cooperative MARL, termed MTT, which learns shared collaborative knowledge across multiple tasks simultaneously and then applies it to solve other related tasks. Models obtained from naive multitask learning may still fail on other tasks because the gradients from different tasks can conflict with each other. To obtain a model with shared knowledge, we provide conflict-free updates by ensuring a positive dot product between the final update and the gradient of each specific task, while maintaining a consistent optimization rate across tasks. Experiments conducted in two popular environments, StarCraft II Multi-Agent Challenge and Google Research Football, demonstrate that our method outperforms the baselines, significantly improving the efficiency of team collaboration.
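The conflict-free update is characterized by a non-negative dot product between the final update and every task gradient. The sketch below illustrates that idea with a PCGrad-style pairwise projection; the paper's exact update rule may differ:

```python
import torch

def conflict_free_update(task_grads):
    """Combine per-task gradients so the result does not oppose any task (sketch).

    Projects out the conflicting component pairwise; this only illustrates
    the positive-dot-product idea, not the paper's specific rule.
    """
    adjusted = []
    for i, g in enumerate(task_grads):
        g = g.clone()
        for j, other in enumerate(task_grads):
            if i != j and torch.dot(g, other) < 0:
                # Remove the component of g that points against task j's gradient.
                g -= torch.dot(g, other) / other.norm() ** 2 * other
        adjusted.append(g)
    return torch.stack(adjusted).mean(dim=0)

# Usage: flatten each task's gradient into a vector, combine, then write back.
g1, g2 = torch.randn(10), torch.randn(10)
update = conflict_free_update([g1, g2])
```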

15 pages, 2741 KiB  
Article
SC-Phi2: A Fine-Tuned Small Language Model for StarCraft II Build Order Prediction
by Muhammad Junaid Khan and Gita Sukthankar
AI 2024, 5(4), 2338-2352; https://doi.org/10.3390/ai5040115 - 13 Nov 2024
Cited by 2 | Viewed by 2032
Abstract
Background: This article introduces SC-Phi2, a fine-tuned StarCraft II small language model. Small language models, like Phi-2, Gemma, and DistilBERT, are streamlined versions of large language models (LLMs) with fewer parameters that require less computational power and memory to run. Method: To teach Microsoft's Phi-2 model about StarCraft, we create a new SC2 text dataset with information about StarCraft races, roles, and actions and use it to fine-tune Phi-2 with self-supervised learning. We pair this language model with a Vision Transformer (ViT) from the pre-trained BLIP-2 (Bootstrapping Language Image Pre-training) model, fine-tuning it on the StarCraft replay dataset MSC. This enables us to construct dynamic prompts that include visual game state information. Results: Unlike the large models used in StarCraft LLMs such as GPT-3.5, Phi-2 is trained primarily on textbook data and contains little inherent knowledge of StarCraft II beyond what our training process provides. By using LoRA (Low-Rank Adaptation) and quantization, our model can be trained on a single GPU. We demonstrate that the model performs well at build order prediction, an important StarCraft macromanagement task. Conclusions: Our research on the usage of small models is a step towards reducing the carbon footprint of AI agents.
(This article belongs to the Section AI Systems: Theory and Applications)
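For fine-tuning on a single GPU via LoRA and quantization, a sketch with Hugging Face `transformers` and `peft` might look as follows; the rank, target modules, and other hyperparameters are placeholders rather than the paper's settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load Phi-2 in 4-bit so fine-tuning fits on a single GPU (settings are placeholders).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", quantization_config=bnb)

# Wrap the attention projections with low-rank adapters; only these are trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a small fraction of the full model
```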

20 pages, 8941 KiB  
Article
Comprehensive Analysis of Improved Hunter–Prey Algorithms in MPPT for Photovoltaic Systems Under Complex Localized Shading Conditions
by Zhuoxuan Li, Changxin Fu, Lixin Zhang and Jiawei Zhao
Electronics 2024, 13(21), 4148; https://doi.org/10.3390/electronics13214148 - 22 Oct 2024
Viewed by 1119
Abstract
The Hunter–Prey Optimization (HPO) algorithm is a novel population-based optimization approach renowned for its efficacy on intricate optimization problems. Photovoltaic (PV) systems under multi-peaked shading conditions often prevent conventional maximum power point tracking (MPPT) techniques from accurately identifying the global maximum power point. In this research, an MPPT control strategy based on an improved Hunter–Prey Optimization (IHPO) algorithm is proposed. Eight distinct shading scenarios are crafted to assess the feasibility and effectiveness of the proposed MPPT method in capturing the maximum power point. A performance evaluation is conducted using both MATLAB simulation and an embedded system, alongside a comparative analysis with alternative power tracking methods under the diverse climatic conditions of different seasons. The simulation outcomes demonstrate that the proposed control strategy accurately tracks the global maximum power point, achieving 100% efficiency across seven shading conditions with a tracking response time of approximately 0.2 s. Verification on the experimental platform shows a tracking efficiency of 98.75% for the proposed method. Finally, the IHPO method's output performance is evaluated on the StarSim Rapid Control Prototyping (RCP) platform, indicating a substantial enhancement in the tracking efficiency of the photovoltaic system while maintaining rapid response times.

21 pages, 5729 KiB  
Article
An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm
by Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su and Minghao Liu
Appl. Sci. 2024, 14(18), 8383; https://doi.org/10.3390/app14188383 - 18 Sep 2024
Cited by 1 | Viewed by 1810
Abstract
Due to the advantages of a centralized critic for estimating the Q-function and decentralized actors for optimizing the agents' policies, counterfactual multi-agent policy gradients (COMA) stands out among multi-agent reinforcement learning (MARL) algorithms. Sharing policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity; balancing parameter sharing and diversity among agents in COMA has therefore been a persistent research topic. In this paper, an effective training method for the COMA policy network based on a differential evolution (DE) algorithm, named DE-COMA, is proposed. DE-COMA introduces individuals in a population as computational units to construct the policy network through mutation, crossover, and selection operations. The average return of DE-COMA serves as the fitness function, and the best individual of the policy network is chosen for the next generation. By maintaining parameter sharing while enhancing parameter diversity, multi-agent strategies become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment with the 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. Experimental results demonstrate that DE-COMA significantly outperforms traditional COMA and most other multi-agent reinforcement learning algorithms in terms of win rate and convergence speed.
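The mutation/crossover/selection loop with average return as the fitness function is standard differential evolution. A sketch over flattened policy-network parameters, with `fitness` standing in for a hypothetical episode-return evaluator:

```python
import numpy as np

def de_search(fitness, dim, pop_size=20, gens=50, f=0.5, cr=0.9, rng=None):
    """Differential evolution over flattened policy parameters (sketch).

    fitness(vec) -> average episode return of the policy encoded by vec.
    """
    rng = rng or np.random.default_rng(0)
    pop = rng.standard_normal((pop_size, dim))
    scores = np.array([fitness(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = a + f * (b - c)                 # mutation
            mask = rng.random(dim) < cr              # crossover
            trial = np.where(mask, mutant, pop[i])
            s = fitness(trial)
            if s > scores[i]:                        # selection: keep the better policy
                pop[i], scores[i] = trial, s
    return pop[np.argmax(scores)]
```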

16 pages, 1931 KiB  
Review
Film-Induced Tourism, Destination Branding and Game of Thrones: A Review of the Peñíscola de Cine Project
by Pablo Jesús Huerta-Viso, Germán Llorca Abad and Lourdes Canós-Darós
Sustainability 2024, 16(1), 186; https://doi.org/10.3390/su16010186 - 25 Dec 2023
Cited by 2 | Viewed by 4254
Abstract
This paper addresses an alternative perspective on tourism success, emphasising sustainability over traditional quantitative metrics such as arrival numbers. It explores the impact of fiction films and TV series on individuals' mental representations of the destinations featured on screen, as well as the capacity of film discourse to construct a brand aligned with local stakeholders' interests. Qualitative methods were employed: a literature review on sustainable film tourism and destination branding, complemented by local news coverage and an interview with the head of the Peñíscola Film Office. The primary goal is to examine the "Peñíscola de Cine" project, initiated by the city council of Peñíscola, Spain, as a paradigm of success. The project positions the municipality as a natural film set through productions like Game of Thrones (2011–2019), illustrating how film can contribute to destination branding and community engagement. The study highlights the positive contribution of film tourism to sustainability by diversifying and de-seasonalising a territory's offerings and by attracting a more educated and environmentally conscious audience. However, it also discusses the potential risks, as evidenced by the mismanaged cases of Goathland, England, and Skellig Michael, Ireland, following their appearances in Heartbeat (1992–2010) and Star Wars (1977–2019), respectively. The paper concludes by suggesting film-friendly measures for destination management organizations (DMOs), emphasising the pivotal role of film commissions and film offices in crafting effective marketing strategies and capturing the interest of audiovisual production companies.

25 pages, 49963 KiB  
Article
Three-Dimensional Flight Corridor: An Occupancy Checking Process for Unmanned Aerial Vehicle Motion Planning inside Confined Spaces
by Sherif Mostafa and Alejandro Ramirez-Serrano
Robotics 2023, 12(5), 134; https://doi.org/10.3390/robotics12050134 - 29 Sep 2023
Cited by 4 | Viewed by 2857
Abstract
To deploy Unmanned Aerial Vehicles (UAVs) inside heterogeneous, GPS-denied, confined (potentially unknown) spaces, such as those encountered in mining and Urban Search and Rescue (USAR), numerous technologies must be enhanced. Of special interest is the ability of UAVs to identify collision-free Safe Flight Corridors (SFC+) within highly cluttered convex- and non-convex-shaped environments, which requires UAVs to perform advanced flight maneuvers while exploiting their flying capabilities. Within this paper, a novel auxiliary occupancy checking process that augments traditional 3D flight corridor generation is proposed. The 3D flight corridor is established as a topological structure based on a hand-crafted path, either derived from a computer-generated environment or provided by the human operator, which captures the human's preferences and desired flight intentions for the given space. This corridor is formulated as a series of interconnected, overlapping convex polyhedra bounded by the perceived environmental geometries, which facilitates the generation of suitable 3D flight paths/trajectories that avoid local minima within the corridor boundaries. An occupancy check algorithm is employed to reduce the search space needed to identify 3D obstacle-free spaces, in which the constructed polyhedron geometries are replaced with alternate convex polyhedra. To assess the feasibility and efficiency of the proposed SFC+ methodology, a comparative study is conducted against the Star-Convex Method (SCM), a prominent algorithm in the field. The results reveal the superiority of the proposed SFC+ methodology in terms of computational efficiency and reduced search space for UAV maneuvering solutions. Various challenging confined-environment scenarios, each with different obstacle densities, are used to verify the outcomes.
(This article belongs to the Special Issue UAV Systems and Swarm Robotics)

16 pages, 5615 KiB  
Article
Research on Maneuverability Prediction of Double Waterjet Propulsion High Speed Planing Craft
by Hua-Wei Sun, Jing-Lei Yang, Bo Liu, Hong-Wei Li, Jia-Feng Xiao and Han-Bing Sun
J. Mar. Sci. Eng. 2022, 10(12), 1978; https://doi.org/10.3390/jmse10121978 - 12 Dec 2022
Cited by 1 | Viewed by 2042
Abstract
A mathematical model for predicting ship maneuvering motion is constructed, using a planing craft with dual waterjet propulsion as the object of study. The model follows the standard MMG (Maneuvering Modeling Group) approach and uses the Runge–Kutta algorithm to solve the differential equations. To simulate the turning and Z-shape maneuvering motions, the RANS equations are first solved using STAR-CCM+, and the PMM (planar motion mechanism) motion of the hull is then simulated using the overlapping grid approach to derive the hydrodynamic derivatives. The calculated results agree well with sea-trial data, showing that the established method for predicting the ship's maneuverability is feasible. The method was used to simulate the rudder rotation and Z-shape motion of the planing craft at medium and high speeds to predict its maneuverability indices.
(This article belongs to the Section Ocean Engineering)
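The MMG equations of motion are integrated with a Runge–Kutta scheme; a classic fourth-order step is sketched below, where `f` would return the surge/sway/yaw accelerations computed from the CFD-derived hydrodynamic derivatives (the dynamics function itself is not reproduced here):

```python
import numpy as np

def rk4_step(f, t, y, dt):
    """Classic fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Usage: y holds the maneuvering state (e.g., surge/sway velocities, yaw rate),
# advanced over the simulation horizon with repeated rk4_step calls.
```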

12 pages, 2519 KiB  
Article
Novel Reinforcement Learning Research Platform for Role-Playing Games
by Petra Csereoka, Bogdan-Ionuţ Roman, Mihai Victor Micea and Călin-Adrian Popa
Mathematics 2022, 10(22), 4363; https://doi.org/10.3390/math10224363 - 20 Nov 2022
Cited by 5 | Viewed by 2868
Abstract
The latest achievements in the field of reinforcement learning have encouraged the development of vision-based learning methods that rival human performance in various games and training environments. Convolutional neural networks together with Q-learning-based approaches have managed to solve and outperform human players in environments such as Atari 2600, Doom, or StarCraft II, but the niche of realistic 3D games with a high degree of freedom of movement and rich graphics remains unexplored, despite having the highest resemblance to real-world situations. In this paper, we propose a novel testbed to push the limits of deep learning methods: an OpenAI Gym-like environment based on Dark Souls III, a notoriously difficult role-playing game where even human players reportedly struggle. We explore two types of architectures, Deep Q-Network and Deep Recurrent Q-Network, providing the results of a first incursion into this new problem class. The source code for the training environment and baselines is made available.
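An OpenAI Gym-like environment exposes `reset` and `step`; a skeleton of such a wrapper is sketched below, where the observation/action spaces and frame grabbing are illustrative placeholders, not the published environment's code:

```python
import numpy as np
import gym
from gym import spaces

class DarkSoulsLikeEnv(gym.Env):
    """Skeleton of a Gym-style game wrapper (illustrative, not the authors' code)."""

    def __init__(self):
        # Screen pixels in, discrete controller actions out.
        self.observation_space = spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(10)

    def reset(self):
        return self._grab_frame()

    def step(self, action):
        # A real wrapper would inject the input into the game and read back state.
        obs, reward, done = self._grab_frame(), 0.0, False
        return obs, reward, done, {}

    def _grab_frame(self):
        return np.zeros(self.observation_space.shape, dtype=np.uint8)
```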

24 pages, 1102 KiB  
Article
Pruning the Communication Bandwidth between Reinforcement Learning Agents through Causal Inference: An Innovative Approach to Designing a Smart Grid Power System
by Xianjie Zhang, Yu Liu, Wenjun Li and Chen Gong
Sensors 2022, 22(20), 7785; https://doi.org/10.3390/s22207785 - 13 Oct 2022
Cited by 3 | Viewed by 2628
Abstract
Electricity demands are increasing significantly, and the traditional power grid system faces huge challenges. As the desired next-generation power grid, the smart grid can provide secure and reliable power generation and consumption and can realize coordinated, intelligent power distribution. Coordinating grid power distribution usually requires mutual communication between power distributors. However, the power network is complex, its nodes are far apart, and communication bandwidth is often expensive, so reducing the communication bandwidth required by cooperative power distribution tasks is crucially important. One way to tackle this problem is to build mechanisms that send communications selectively, allowing distributors to transmit information only at certain moments and in key states. We model the distributors in the power grid as reinforcement learning agents and reduce the grid's communication bandwidth by optimizing the communication frequency between agents. To this end, we propose the Causal Inference Communication Model (CICM), which decides whether to communicate based on causal inference. CICM regards whether to communicate as a binary intervention variable and determines which intervention is more effective by estimating the individual treatment effect (ITE), offering an optimal strategy for whether to send information while ensuring task completion. This method effectively reduces the communication frequency between grid distributors while maximizing the power distribution effect. In addition, we test the method in StarCraft II and a 3D habitat environment, which fully demonstrates its effectiveness.
(This article belongs to the Special Issue Wireless Sensor Networks in Smart Grid Communications)
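The communication decision reduces to comparing the estimated individual treatment effect of sending a message against its cost. A minimal sketch, with `ite_model` as a hypothetical estimator trained on (state, communicated?, return) data:

```python
def should_communicate(state, ite_model, bandwidth_cost=0.0):
    """Gate a message on the estimated individual treatment effect (sketch).

    ite_model.predict(state) -> E[return | communicate] - E[return | silent],
    i.e., the estimated ITE of the binary 'send a message' intervention.
    """
    return ite_model.predict(state) > bandwidth_cost
```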

15 pages, 1434 KiB  
Article
Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning
by Siying Wang, Wenyu Chen, Jian Hu, Siyue Hu and Liwei Huang
Mathematics 2022, 10(15), 2728; https://doi.org/10.3390/math10152728 - 2 Aug 2022
Cited by 7 | Viewed by 3157
Abstract
Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even with supplementary state information, however, agents still suffer from insufficient exploration during training. Moreover, training with batch-sampled examples from the replay buffer induces a policy overfitting problem: multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO), even with the additional information in the centralized critic. In this paper, we propose a novel noise-injection method to regularize the policies of agents and mitigate the overfitting issue. We analyze the cause of policy overfitting in actor–critic MARL and design two specific patterns of noise injection that apply random Gaussian noise to the advantage function to stabilize training and enhance performance. Experimental results on the Matrix Game and StarCraft II show the higher training efficiency and superior performance of our method, and ablation studies indicate that our method maintains higher entropy in the agents' policies during training, which leads to more exploration.
(This article belongs to the Special Issue Artificial Neural Networks: Design and Applications)
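The regularizer perturbs the advantage with zero-mean Gaussian noise before the policy update; the simplest additive variant is sketched below (the paper designs two specific injection patterns, which this does not reproduce):

```python
import torch

def noisy_advantage(advantages, sigma=0.1):
    """Regularize the advantage with zero-mean Gaussian noise (sketch)."""
    return advantages + sigma * torch.randn_like(advantages)

# In an actor-critic update, the policy gradient then uses the perturbed values:
#   loss = -(log_probs * noisy_advantage(adv)).mean()
```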

9 pages, 2235 KiB  
Article
The Important Role of Global State for Multi-Agent Reinforcement Learning
by Shuailong Li, Wei Zhang, Yuquan Leng and Xiaohui Wang
Future Internet 2022, 14(1), 17; https://doi.org/10.3390/fi14010017 - 30 Dec 2021
Cited by 1 | Viewed by 3044
Abstract
Environmental information plays an important role in deep reinforcement learning (DRL), yet many algorithms pay little attention to it. In multi-agent reinforcement learning it matters even more, because each agent must make decisions jointly with information about the other agents in the environment. To demonstrate its importance, we added environmental information to several algorithms and evaluated them on a challenging set of StarCraft II micromanagement tasks. Compared with the original algorithms, the standard deviation of our variants was smaller (except for VDN), which shows that our approach has better stability, and the average score was higher (except for VDN and COMA), which shows that our work significantly outperforms existing multi-agent RL methods.
(This article belongs to the Section Big Data and Augmented Intelligence)
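Adding environmental information typically amounts to concatenating the global state onto each agent's network input. A minimal PyTorch sketch of such a state-augmented value network follows (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class StateAugmentedCritic(nn.Module):
    """Sketch: concatenate the global environment state onto each agent's input."""

    def __init__(self, obs_dim, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # per-action values for one agent
        )

    def forward(self, obs, global_state):
        return self.net(torch.cat([obs, global_state], dim=-1))
```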
