Deep Reinforcement Learning for Multiagent Systems

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 February 2026) | Viewed by 8900

Special Issue Editors


Guest Editor
Centre for Future Transport and Cities, Coventry University, Priory St., Coventry CV1 5FB, UK
Interests: logics and formal verification; simulation and model-based testing; automotive systems; reinforcement learning; multi-agent context-aware systems

Guest Editor
Computer Science and Creative Technologies, University of the West of England, Bristol BS16 1QY, UK
Interests: metaheuristics; parallel computing; multi-agent systems; planning and scheduling

Special Issue Information

Dear Colleagues,

The field of Deep Reinforcement Learning for Multiagent Systems (DRL-MAS) is rapidly growing and dynamic, revolutionizing how autonomous agents learn and interact in complex environments. This area of research is pushing the boundaries of artificial intelligence, enabling the development of sophisticated, cooperative, and competitive behaviours among multiple agents across various real-world applications. This Special Issue aims to highlight and disseminate the latest advancements in DRL-MAS, with a specific focus on environments where multiple autonomous agents interact and make decisions. These interactions can be cooperative, competitive, or a mixture of both, posing unique challenges and opportunities for research and application.

This Special Issue invites submissions that explore innovative algorithms advancing the frontier of DRL-MAS in multiagent scenarios. This encompasses, but is not limited to, novel approaches for policy learning, value function approximation, and exploration–exploitation strategies specifically designed for multiagent environments. Additionally, contributions that offer new theoretical insights into the behaviour, convergence, and optimality of DRL-MAS methods in multiagent environments are highly encouraged. Such theoretical foundations are crucial for understanding the strengths and limitations of existing approaches and for guiding the development of more robust and efficient algorithms.

Practical implementations and case studies are also of great interest. These papers should demonstrate the application of DRL-MAS in real-world multiagent systems, such as robotic teams, autonomous driving fleets, smart grid management, financial markets, and complex gaming environments. These practical insights are valuable for showcasing the potential of DRL-MAS to solve real-world problems that involve multiple interacting agents.

Dr. Abdur Rakib
Dr. Mehmet Aydin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • cooperative and competitive machine learning in multiagent systems
  • cooperative multiagent reinforcement learning
  • competitive multiagent reinforcement learning
  • multiagent machine learning for policy fine-tuning and optimization
  • multiagent communication and coordination
  • multiagent value function approximation
  • transfer learning and generalization of experience
  • decentralized control and decision-making in multiagent systems
  • exploration and exploitation strategies in multiagent environments
  • robustness and adaptability of DRL-MAS to dynamic environments
  • theoretical foundations and analysis of DRL-MAS algorithms

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

20 pages, 9736 KB  
Article
Explaining Metastable Cooperation in Independent Multi-Agent Boltzmann Q-Learning—A Deterministic Approximation
by David Goll, Wolfram Barfuss and Jobst Heitzig
Appl. Sci. 2026, 16(7), 3524; https://doi.org/10.3390/app16073524 - 3 Apr 2026
Viewed by 262
Abstract
Multi-agent reinforcement learning involves interacting agents whose learning processes are coupled through a shared environment. This work introduces a discrete-time approximation model for multi-agent Boltzmann Q-learning that accounts for agents’ update frequencies. We demonstrate why previous models do not accurately represent the actual stochastic learning dynamics while our model can reproduce several complex emergent dynamic regimes, including transient cooperation and metastable states in social dilemmas like the Prisoner’s Dilemma. We show that increasing the discount factor can prevent convergence by inducing oscillations through a supercritical Neimark–Sacker bifurcation, which transforms the unique stable fixed point into a stable limit cycle. This analysis provides a deeper understanding of the complexities of multi-agent learning dynamics and the conditions under which convergence may not be achieved.
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
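
As a rough illustration of the learning rule studied in this paper, the Python sketch below runs independent Boltzmann (softmax) Q-learning in a repeated Prisoner's Dilemma. The payoff matrix, learning rate, temperature, and discount factor are illustrative choices only, not the paper's settings.

```python
import numpy as np

# Illustrative Prisoner's Dilemma payoffs for the row player:
# action 0 = cooperate, action 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

rng = np.random.default_rng(0)
alpha, gamma, tau = 0.05, 0.9, 0.5   # learning rate, discount, temperature
Q = [np.zeros(2), np.zeros(2)]       # one stateless Q-vector per agent

def boltzmann(q, tau):
    """Softmax action probabilities over Q-values."""
    z = np.exp((q - q.max()) / tau)  # subtract max for numerical stability
    return z / z.sum()

for step in range(50_000):
    # Each agent samples an action from its own softmax policy.
    probs = [boltzmann(Q[i], tau) for i in range(2)]
    acts = [rng.choice(2, p=probs[i]) for i in range(2)]
    rewards = [PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]]]
    # Independent Q-learning update: the only coupling between the agents
    # is through the rewards generated by the joint action.
    for i in range(2):
        Q[i][acts[i]] += alpha * (rewards[i] + gamma * Q[i].max() - Q[i][acts[i]])

print("final cooperation probabilities:",
      [round(float(boltzmann(Q[i], tau)[0]), 3) for i in range(2)])
```

Varying gamma and tau in this toy loop is a quick way to observe the kind of non-convergent, oscillatory behaviour the paper analyses formally.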

12 pages, 9302 KB  
Article
Robust Vision-Language-Action Models via Object-Centric Learning and Distance-Based Chunk Alignment
by Sung-Gil Park, Yong-Geon Kim, Seuk-Woo Ryu, Byeong Gil Yoo, Sungeun Chung, Jeong-Seop Park, Woo-Jin Ahn and Myo-Taeg Lim
Appl. Sci. 2026, 16(7), 3376; https://doi.org/10.3390/app16073376 - 31 Mar 2026
Viewed by 544
Abstract
Vision–language–action (VLA) models have shown strong potential for enabling robots to interpret goals and perform complex manipulation tasks by integrating perception, language, and control. However, existing VLAs rely heavily on large-scale, diverse demonstration datasets, which are difficult and expensive to collect. When trained with limited data, they often overfit to irrelevant visual cues such as background, lighting, or viewpoint, resulting in weak generalization. To overcome this limitation, we propose a simple yet effective object-centric learning framework for VLA. For each sub-task, the framework leverages an instance segmentation foundation model to identify and track task-relevant objects, and trains the policy on both the original RGB scene and two object-focused representations: (i) a masked image emphasizing the target object and (ii) an object-only crop. These multiple visual inputs share the same action supervision, encouraging the policy to attend to the manipulated object rather than the surrounding context. Furthermore, a distance-based chunk alignment mechanism ensures smooth control transitions between consecutive predicted action segments. Experiments conducted in both simulation and real hardware demonstrate that the proposed method achieves robust performance and stable trajectories across various manipulation tasks, validating its practicality and efficiency in training object-aware robotic behaviors.
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
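
The abstract does not spell out the chunk-alignment mechanism, so the sketch below is only one plausible reading of "distance-based chunk alignment": the newly predicted action chunk is entered at the action nearest (in Euclidean distance) to the last executed action, avoiding a jump at the segment boundary. The function and variable names are hypothetical.

```python
import numpy as np

def align_chunk(last_action: np.ndarray, new_chunk: np.ndarray) -> np.ndarray:
    """Start executing the new chunk at the action closest to the last
    executed action, so consecutive segments join without a discontinuity.
    A guessed interpretation -- the paper's actual mechanism may differ."""
    dists = np.linalg.norm(new_chunk - last_action, axis=1)
    start = int(np.argmin(dists))        # nearest action in the new chunk
    return new_chunk[start:]

# Toy usage: 7-D actions (e.g., joint targets) in chunks of length 8.
rng = np.random.default_rng(1)
prev_chunk = rng.normal(size=(8, 7))
next_chunk = prev_chunk[-1] + np.cumsum(rng.normal(scale=0.05, size=(8, 7)), axis=0)
print(align_chunk(prev_chunk[-1], next_chunk).shape)
```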

25 pages, 6915 KB  
Article
EXAONE-VLA: A Unified Vision–Language Framework for Mobile Manipulation via Semantic Topology and Hierarchical LLM Reasoning
by Jeong-Seop Park, Yong-Jun Lee, Jong-Chan Park, Sung-Gil Park, Jong-Jin Woo and Myo-Taeg Lim
Appl. Sci. 2026, 16(5), 2600; https://doi.org/10.3390/app16052600 - 9 Mar 2026
Viewed by 828
Abstract
This paper proposes a unified vision–language framework that translates user instructions into navigation for the mobile base and actions for the manipulator in indoor environments. In general, occupancy grid maps constructed via SLAM capture solely the geometric layout of the environment. This renders the robot incapable of leveraging the semantic information required for object distinction. The proposed method encodes semantic information from vision–language models and the robot’s pose in a textual format, referred to as a semantic topological graph. Specifically, models including GroundingDINO, LG EXAONE, and SAM2 extract object-level semantic information, which is subsequently used to identify room characteristics. A large language model then interprets user instructions to identify the final destination for navigation within the semantic topological graph, followed by reasoning to determine the suitable action network. Notably, the proposed text-based representation facilitates a substantial reduction in inference time, and its effectiveness is validated through real-world experiments.
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
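
To make the idea of a text-format semantic topological graph concrete, here is a minimal sketch of one possible encoding; the schema, field names, and prompt wording are assumptions for illustration, not the paper's actual format.

```python
import json

# Hypothetical schema: each node stores a room label, the objects detected
# there, and the robot pose at which they were observed; edges record
# traversability between nodes.
graph = {
    "nodes": [
        {"id": 0, "room": "kitchen", "objects": ["mug", "sink"],
         "pose": {"x": 1.2, "y": 0.4, "theta": 1.57}},
        {"id": 1, "room": "living_room", "objects": ["sofa", "tv"],
         "pose": {"x": 4.8, "y": 2.1, "theta": 0.0}},
    ],
    "edges": [{"from": 0, "to": 1}],
}

# Serialised as plain text, the graph can be placed directly in an LLM
# prompt next to the user instruction, and the model is asked to name
# the destination node.
prompt = ("Semantic topological graph:\n" + json.dumps(graph, indent=2)
          + "\nInstruction: bring the mug to the living room."
          + "\nWhich node is the final navigation destination?")
print(prompt)
```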

19 pages, 4784 KB  
Article
Cooperative Formation Control of a Multi-Agent Khepera IV Mobile Robots System Using Deep Reinforcement Learning
by Gonzalo Garcia, Azim Eskandarian, Ernesto Fabregas, Hector Vargas and Gonzalo Farias
Appl. Sci. 2025, 15(4), 1777; https://doi.org/10.3390/app15041777 - 10 Feb 2025
Cited by 5 | Viewed by 3034
Abstract
The increasing complexity of autonomous vehicles has exposed the limitations of many existing control systems. Reinforcement learning (RL) is emerging as a promising solution to these challenges, enabling agents to learn and enhance their performance through interaction with the environment. Unlike traditional control algorithms, RL facilitates autonomous learning via a recursive process that can be fully simulated, thereby preventing potential damage to the actual robot. This paper presents the design and development of an RL-based algorithm for controlling the collaborative formation of a multi-agent Khepera IV mobile robot system as it navigates toward a target while avoiding obstacles in the environment by using onboard infrared sensors. This study evaluates the proposed RL approach against traditional control laws within a simulated environment using the CoppeliaSim simulator. The results show that the RL algorithm yields a sharper control law than traditional approaches, without requiring manual adjustment of the control parameters.
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
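
As a hedged sketch of how such a task is typically rewarded, the function below combines progress toward the target, formation-keeping error relative to teammates, and a penalty from infrared distance readings. All weights, thresholds, and the sensor model are assumptions, not the paper's design.

```python
import numpy as np

def formation_reward(pos, target, positions, desired_offsets, ir_distances,
                     w_goal=1.0, w_form=0.5, w_obs=2.0, safe_dist=0.2):
    """Illustrative shaped reward for one robot in a formation task.
    Assumes IR readings are converted to obstacle distances in metres."""
    goal_term = -np.linalg.norm(target - pos)                # progress to target
    form_err = sum(np.linalg.norm((p - pos) - off)           # formation error
                   for p, off in zip(positions, desired_offsets))
    too_close = np.maximum(0.0, safe_dist - np.asarray(ir_distances))
    return w_goal * goal_term - w_form * form_err - w_obs * np.sum(too_close)

# Toy call with one teammate and three IR readings.
r = formation_reward(pos=np.array([0.0, 0.0]),
                     target=np.array([2.0, 0.0]),
                     positions=[np.array([0.6, 0.4])],
                     desired_offsets=[np.array([0.5, 0.5])],
                     ir_distances=[0.8, 0.15, 0.6])
print(round(float(r), 3))
```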

20 pages, 2460 KB  
Article
Multi-Agent System for Emulating Personality Traits Using Deep Reinforcement Learning
by Georgios Liapis and Ioannis Vlahavas
Appl. Sci. 2024, 14(24), 12068; https://doi.org/10.3390/app142412068 - 23 Dec 2024
Cited by 2 | Viewed by 3052
Abstract
Conventional personality assessment methods depend on subjective input, while game-based AI predictive methods offer a dynamic and objective framework. However, training these models requires large and labeled datasets, which are challenging to obtain from real players with diverse personality traits. In this paper, we propose a multi-agent system using Deep Reinforcement Learning in a game environment to generate the necessary labeled data. Each agent is trained with custom reward functions based on the HiDAC system that encourage trait-aligned behaviors to emulate specific personality traits based on the OCEAN personality trait model. The Multi-Agent Posthumous Credit Assignment (MA-POCA) algorithm facilitates continuous learning, allowing agents to emulate behaviors through self-play. The resulting gameplay data provide diverse, high-quality samples. This approach allows for robust individual and team assessments, as agent interactions reveal the impact of personality traits on team dynamics and performance. Ultimately, this approach provides a scalable, unbiased methodology for human personality evaluation in various settings, establishing new standards for data-driven assessment methods.
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
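
As a loose sketch of trait-conditioned reward shaping in the spirit described here, the snippet below weights behavioural event scores by an OCEAN trait vector; the event names, weights, and trait values are invented for illustration and are not taken from the paper or the HiDAC system.

```python
# Hypothetical trait vector for one agent (OCEAN, each in [0, 1]).
TRAITS = {"openness": 0.9, "conscientiousness": 0.3, "extraversion": 0.8,
          "agreeableness": 0.5, "neuroticism": 0.1}

# Invented mapping from in-game events to trait-weighted contributions.
EVENT_WEIGHTS = {
    "explored_new_area":        {"openness": 1.0},
    "completed_subtask":        {"conscientiousness": 1.0},
    "interacted_with_teammate": {"extraversion": 0.7, "agreeableness": 0.5},
    "took_damage":              {"neuroticism": -1.0},
}

def shaped_reward(events, traits):
    """Sum the trait-weighted contributions of this step's events."""
    return sum(w * traits[t]
               for e in events
               for t, w in EVENT_WEIGHTS.get(e, {}).items())

print(shaped_reward(["explored_new_area", "interacted_with_teammate"], TRAITS))
```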