Search Results (497)

Search Parameters:
Keywords = multi-agent deep reinforcement learning

26 pages, 3420 KB  
Article
DQN-Based Pre-Optimization for Dual-Scale Collaborative Topology Optimization of Anisotropic Materials
by Shuo Feng, Yuhao Yang, Ke Li, Qidong Han, Jinchen Cao and Junyi Du
Appl. Sci. 2026, 16(9), 4080; https://doi.org/10.3390/app16094080 - 22 Apr 2026
Abstract
Traditional topology optimization methods often face challenges such as slow convergence, high sensitivity to initial structures, and limited exploration of the design space when dealing with multi-physics coupling problems. To address these challenges, this study proposes an efficient design framework integrating reinforcement learning and topology optimization. The framework first employs a Deep Q-Network (DQN) agent to dynamically adjust penalty factors, accelerating the convergence process, and uses its pre-optimization results as the initial conditions for the Bidirectional Evolutionary Structural Optimization (BESO) method, thereby enhancing optimization efficiency and structural performance. By introducing an anisotropic material model, the design space is expanded, further unlocking the potential for structural lightweighting. On this basis, a dual-objective optimization strategy for mechanical compliance and thermal compliance is adopted, enabling the final structure to adapt to various physical working conditions. Finally, the optimal design is extended from two-dimensional to three-dimensional, facilitating subsequent manufacturing and verification. Numerical examples demonstrate that compared with traditional methods, the proposed pre-optimization method achieves a 22.463% reduction in structural compliance and improves thermal management performance. The framework demonstrates robust convergence across different boundary conditions (MBB and cantilever beams) and expands the design space through anisotropic microstructures, offering a practical solution for multi-physics lightweight design. Full article
(This article belongs to the Special Issue Advanced Finite Element Method and Its Applications, Second Edition)
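The abstract above describes a DQN agent that dynamically adjusts the penalty factor to accelerate convergence before handing off to BESO. As a rough illustration of that control loop, here is a tabular Q-learning sketch (not the authors' DQN; the discretized state, the three-action set, and the toy compliance model are all illustrative assumptions):

```python
import random

# Hypothetical sketch of an agent that nudges a SIMP-style penalty factor.
ACTIONS = [-0.25, 0.0, 0.25]           # decrease / keep / increase penalty

class PenaltyAgent:
    def __init__(self, eps=0.1, alpha=0.5, gamma=0.9):
        self.q = {}                     # (state, action_idx) -> value
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q.get((state, a), 0.0))

    def update(self, s, a, reward, s2):
        best_next = max(self.q.get((s2, a2), 0.0) for a2 in range(len(ACTIONS)))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (reward + self.gamma * best_next - old)

def run(agent, steps=200):
    """Toy environment: compliance improves fastest near penalty p = 3.0."""
    p, compliance = 1.0, 100.0
    for _ in range(steps):
        s = round(p * 4) / 4            # discretized penalty as the state
        a = agent.act(s)
        p = min(4.0, max(1.0, p + ACTIONS[a]))
        new_compliance = compliance * (1.0 - 0.05 / (1.0 + (p - 3.0) ** 2))
        agent.update(s, a, compliance - new_compliance, round(p * 4) / 4)
        compliance = new_compliance
    return p, compliance

random.seed(0)
final_p, final_c = run(PenaltyAgent())
```

In the paper a neural Q-function replaces the lookup table, and the resulting layout seeds BESO; the reward here (per-step compliance drop) is only one plausible choice.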
32 pages, 3077 KB  
Article
Market-Aware and Topology-Embedded Safe Reinforcement Learning for Virtual Power Plant Dispatch
by Yueping Xiang, Luoyi Li, Yanqiu Hou, Xiaoyu Dai, Wenfeng Peng, Zhuoyang Liu, Ziming Liu, Zicong Chen, Xingyu Hu and Lv He
World Electr. Veh. J. 2026, 17(4), 222; https://doi.org/10.3390/wevj17040222 - 21 Apr 2026
Abstract
To address the challenges faced by virtual power plants (VPPs) in uncertain market environments and complex distribution networks, including strong market coupling, difficulty in multi-resource coordination, and strict safety constraints, this paper proposes a Hierarchical Hybrid Intelligent Framework (H2IF). The proposed framework integrates a market-aware meta-game mechanism, a topology-embedded graph attention coordination method, and a risk-aware soft/hard constraint safety mechanism to achieve economically optimal dispatch of VPPs in complex dynamic scenarios. By explicitly modeling competitive market interactions, the proposed method enhances strategy robustness; by exploiting grid topology priors, it improves multi-agent coordination capability; and by combining differentiable projection with risk-constrained optimization, it jointly ensures operational safety and revenue stability. Simulation results on a modified IEEE 33-bus system demonstrate that H2IF outperforms mainstream deep reinforcement learning methods and rule-based dispatch strategies in overall performance. In the 24 × 300-step testing scenario, H2IF achieves an average single-episode operating cost of 38.23 k$, which is 28.9%, 40.4%, and 26.5% lower than those of MADDPG, SAC, and the rule-based method, respectively, while also yielding the lowest constraint violation level. Ablation studies further verify the effectiveness of each key module in improving profit, reducing operating costs, enhancing tracking performance, and strengthening safety. The results indicate that the proposed method enables coordinated optimization of economy, safety, and robustness for VPP dispatch under uncertain market and operating conditions. Full article
(This article belongs to the Section Marketing, Promotion and Socio Economics)
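H2IF combines a differentiable projection with risk-constrained optimization to keep dispatch actions safe. A minimal feasibility-projection sketch is below; it is a plain iterative projection onto per-unit limits plus a power-balance constraint, not the paper's differentiable, training-embedded layer, and all bounds are made-up numbers:

```python
def project_dispatch(setpoints, p_min, p_max, total):
    """Clip raw agent setpoints to per-unit limits, then redistribute the
    power-balance error among units that still have headroom.
    Illustrative safety layer only; not the paper's differentiable projection."""
    x = [min(p_max[i], max(p_min[i], s)) for i, s in enumerate(setpoints)]
    for _ in range(100):
        err = total - sum(x)
        if abs(err) < 1e-9:
            break
        free = [i for i in range(len(x))
                if (err > 0 and x[i] < p_max[i]) or (err < 0 and x[i] > p_min[i])]
        if not free:
            break                       # constraints infeasible; stop trying
        step = err / len(free)
        for i in free:
            x[i] = min(p_max[i], max(p_min[i], x[i] + step))
    return x
```

A safe-RL agent would call such a layer between the policy output and the grid, so constraint violations are corrected before execution rather than merely penalized.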
28 pages, 7163 KB  
Article
An Intelligent Arterial Traffic Control Framework for Visible Light-Connected Vehicles
by Gonçalo Galvão, Manuela Vieira, Manuel Augusto Vieira, Mário Véstias and Paula Louro
Smart Cities 2026, 9(4), 72; https://doi.org/10.3390/smartcities9040072 - 20 Apr 2026
Abstract
Inefficient urban traffic management remains a critical challenge, as conventional signal controllers—built on fixed timing plans—cannot cope with the dynamic nature of modern city traffic. This study addresses this limitation by developing a decentralized MARL-based framework capable of coordinating five interconnected intersections as a unified traffic cell. Central to the proposed solution is the Strategic Anti-Blocking Phase Adjustment (SAPA) module, which enables intersections to autonomously modify phase durations in response to real-time traffic conditions. The framework is designed to handle heterogeneous demand patterns, with particular emphasis on arterial corridors connecting urban centers to peripheral zones. Integration of a Visible Light Communication (VLC) network allows continuous monitoring of key variables, including vehicle kinematics and pedestrian activity, feeding the agents with rich environmental feedback. Experimental evaluation confirms the effectiveness of the approach: the SAPA-augmented DQN achieves roughly 33% shorter vehicle queues and a ~70% reduction in pedestrian waiting counts relative to a standard DQN baseline. Remarkably, these gains bring the value-based method to a performance level comparable to MAPPO, a considerably more complex multi-agent policy optimization algorithm, establishing SAPA as an efficient and scalable enhancement for intelligent urban traffic control. Full article
24 pages, 11332 KB  
Article
Intelligent Optimization Methods for Cloud–Edge Collaborative Vehicular Networks via the Integration of Bayesian Decision-Making and Reinforcement Learning
by Youjian Yu, Zhaowei Song, Sifeng Zhu and Qinghua Zhang
Future Internet 2026, 18(4), 215; https://doi.org/10.3390/fi18040215 - 17 Apr 2026
Abstract
To improve vehicle user service quality and address data privacy and security issues in intelligent transportation vehicle networking systems, a three-tier communication architecture with cloud-edge-end collaboration was designed in this paper. A Bayesian decision criterion was utilized to divide user data segments into fine-grained slices based on their privacy levels, and differential privacy techniques were applied to protect the offloaded data. To achieve multi-objective optimization between user service quality and data privacy and security, the problem was formulated as a constrained Markov decision process. A communication model, a caching model, a latency model, an energy consumption model, and a data-fragment privacy protection model were designed. Additionally, a deep reinforcement learning algorithm based on the actor–critic approach was proposed for the collaborative and centralized training of multiple intelligent agents (CTMA-AC), enabling multi-objective optimization decision-making for the protection of offloaded private user data. Simulation experiments demonstrate that the proposed multi-agent collaborative privacy data offloading protection strategy can effectively safeguard private user data while ensuring high service quality. Full article
(This article belongs to the Section Network Virtualization and Edge/Fog Computing)
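The abstract above applies differential privacy to offloaded data slices. The textbook building block is the Laplace mechanism, sketched here with the standard library; the sensitivity and epsilon values are illustrative, and the paper's slice-level pipeline is not reproduced:

```python
import math
import random

def laplace_noise(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def privatize(values, sensitivity, epsilon):
    """Laplace mechanism: add noise with scale = sensitivity / epsilon to each
    offloaded value. Smaller epsilon means stronger privacy and more noise."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale) for v in values]
```

In a slicing scheme like the one described, higher-privacy slices would simply be released with a smaller per-slice epsilon.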
28 pages, 11994 KB  
Article
Multi-UAV Cooperative Path Planning Method Based on an Improved MADDPG Algorithm
by Feiqiao Zhang, Qian Wang and Xin Ma
Electronics 2026, 15(8), 1632; https://doi.org/10.3390/electronics15081632 - 14 Apr 2026
Abstract
To address cooperative path planning for multiple UAVs in complex environments, this paper proposes an improved multi-agent deep deterministic policy gradient algorithm, named Prioritized Experience Multi-Agent Deep Deterministic Policy Gradient (PE-MADDPG). An urban low-altitude inspection environment is first constructed within a reinforcement-learning framework, in which dynamic constraints, safety-separation requirements, and formation-cooperation objectives are incorporated into a partially observable Markov decision process. To improve training effectiveness, prioritized experience replay is introduced to increase the utilization of informative samples, an adaptive exploration-noise strategy is designed to regulate exploration intensity, and a multi-head attention mechanism is embedded in the Critic network to enhance the representation of inter-agent interactions. Simulation results in a three-dimensional urban inspection scenario show that PE-MADDPG outperforms the selected benchmark methods in task completion rate, formation maintenance, flight efficiency, and energy consumption. These results provide an effective solution for urban low-altitude inspection tasks. Full article
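PE-MADDPG's prioritized experience replay raises the sampling probability of informative transitions. A minimal proportional-priority buffer with importance-sampling weights looks roughly like this (the alpha/beta values are common defaults, not the paper's hyperparameters):

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay: sampling probability follows
    |TD error|^alpha; importance weights correct the induced bias."""
    def __init__(self, alpha=0.6, beta=0.4):
        self.data, self.prios = [], []
        self.alpha, self.beta = alpha, beta

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        total = sum(self.prios)
        probs = [p / total for p in self.prios]
        idx = random.choices(range(len(self.data)), weights=probs, k=k)
        n = len(self.data)
        # Importance-sampling weights, normalized by the largest weight.
        w = [(n * probs[i]) ** -self.beta for i in idx]
        w_max = max(w)
        return [self.data[i] for i in idx], [wi / w_max for wi in w]
```

A production buffer would cap capacity and update priorities after each learning step; both are omitted here for brevity.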
32 pages, 12012 KB  
Article
Multi-Agent Reinforcement Learning-Based Intelligent Game Guidance with Complex Constraint
by Fucong Liu, Yang Guo, Shaobo Wang, Jin Wang and Zhengquan Liu
Aerospace 2026, 13(4), 365; https://doi.org/10.3390/aerospace13040365 - 14 Apr 2026
Abstract
For the complex problem of multi-aircraft cooperative game guidance with No-Fly Zone (NFZ) avoidance and cross-task constraint propagation, a deep deterministic policy gradient algorithm with temporal awareness and prioritized cooperative optimization (TP-MADDPG) is proposed. Building on three-body cooperative guidance, a new coupled guidance task is formed by adding the NFZ avoidance constraint. Considering the constraint compatibility problem in dynamic task switching, the cooperative aircraft are modeled as independent agents with differentiated policy networks. First, a nonlinear kinematic model of the Evader–Pursuer–Defender three-body game is established, and four complex constraints, namely homing guidance, NFZ avoidance, collision avoidance, and cooperative guidance, are modeled separately. Second, a Long Short-Term Memory (LSTM)-based Actor–Critic framework is proposed to dynamically capture the evolution patterns of adversarial scenarios by mining hidden correlations in historical state–action sequences. This enables smooth policy transitions between the cooperative guidance phase and the subsequent homing guidance phase, effectively addressing environmental non-stationarity and temporal task dependencies. Then, a priority-driven adaptive sampling mechanism and a heterogeneous-role cooperative reward function are proposed to address credit assignment imbalance and sparse rewards, respectively. The sampling mechanism exploits the efficient retrieval properties of SumTree data structures and integrates bias correction to expedite policy gradient convergence, while the reward function uses reward shaping to formulate cooperative reward components that explicitly capture behavioral correlations among agents. 
Finally, simulations show that the proposed method significantly outperforms multi-agent reinforcement learning baselines, effectively improving the performance of cooperative game guidance under complex constraints. Full article
(This article belongs to the Special Issue Flight Guidance and Control)
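The sampling mechanism in TP-MADDPG relies on the O(log n) retrieval of a SumTree. A minimal version of that data structure (assuming, for simplicity, a power-of-two capacity) is:

```python
class SumTree:
    """Binary sum-tree: leaves hold priorities, internal nodes hold subtree
    sums, so sampling by prefix sum costs O(log n). Capacity is assumed to
    be a power of two; root sits at index 1, leaves at capacity..2*capacity-1."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, leaf, priority):
        i = leaf + self.capacity
        delta = priority - self.tree[i]
        while i >= 1:                   # propagate the change up to the root
            self.tree[i] += delta
            i //= 2

    def total(self):
        return self.tree[1]

    def retrieve(self, s):
        """Return the leaf whose cumulative-priority interval contains s."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if s <= self.tree[left]:
                i = left
            else:
                s -= self.tree[left]
                i = left + 1
        return i - self.capacity
```

Drawing `s` uniformly from `[0, total())` then calling `retrieve(s)` samples leaves proportionally to their priorities, which is exactly the operation prioritized replay needs.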
34 pages, 6346 KB  
Article
Multi-Head Attention Deep Q-Network with Prioritized Experience Replay for UAV Path Planning in Dynamic Environments: A Bio-Inspired Approach
by Yang Li, Xinjie Qian, Jiexin Zhang, Xiao Yang and Chao Deng
Biomimetics 2026, 11(4), 268; https://doi.org/10.3390/biomimetics11040268 - 13 Apr 2026
Abstract
Unmanned Aerial Vehicles (UAVs) have become widely used tools for applications including surveillance, search and rescue, and package delivery. However, autonomous path planning in dynamic environments with moving obstacles, wind disturbances, and energy constraints remains a significant challenge. This paper proposes a novel Multi-Head Attention Deep Q-Network with Prioritized Experience Replay (MA-DQN + PER) that integrates bio-inspired attention mechanisms with deep reinforcement learning for efficient UAV path planning. Our approach features a 46-dimensional state space that captures all environmental information, including static obstacles, wind conditions, and energy status. The proposed Attention-QNetwork architecture uses four specialized attention heads to selectively focus on different aspects of the environment: obstacle avoidance, target tracking, energy management, and wind compensation. To improve sample efficiency and convergence speed, we incorporate Prioritized Experience Replay (PER) with a sum-tree data structure. A curriculum learning strategy that includes 10 difficulty levels is designed to progressively enhance the agent’s capabilities. Extensive simulations demonstrate that our MA-DQN + PER approach reaches a 96% task success rate (defined as the percentage of episodes where the UAV successfully reaches the target without collision or battery depletion), while converging 68% faster than the baseline DQN. Our method demonstrates superior performance in path efficiency (+17%), energy consumption reduction (−26%), and collision avoidance compared to state-of-the-art algorithms. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
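The attention heads described above each weigh the state features relevant to one concern (obstacles, target, energy, wind). The core operation, scaled dot-product attention, can be sketched in plain Python; the learned query/key/value projections that make each head specialize are omitted here, so this toy `multi_head` just concatenates identical heads:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attention_head(query, keys, values):
    """Scaled dot-product attention over a list of feature vectors:
    score each key against the query, then blend values by the scores."""
    d = len(query)
    scores = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                      for key in keys])
    return [sum(w * v[i] for w, v in zip(scores, values))
            for i in range(len(values[0]))]

def multi_head(query, keys, values, heads=4):
    """Concatenate `heads` head outputs. In a real Attention-QNetwork each
    head has its own learned projections, so the outputs would differ."""
    out = []
    for _ in range(heads):
        out.extend(attention_head(query, keys, values))
    return out
```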
Show Figures

Figure 1

21 pages, 2353 KB  
Article
An Adaptive Bidding Strategy for Virtual Power Plants in Day-Ahead Markets Under Multiple Uncertainties
by Wei Yang and Wenjun Wang
Energies 2026, 19(8), 1878; https://doi.org/10.3390/en19081878 - 12 Apr 2026
Abstract
To address the challenges posed by multiple uncertainties in modern power systems to the market bidding of Virtual Power Plants (VPPs), this paper proposes an adaptive bidding strategy based on Deep Reinforcement Learning (DRL). First, a heterogeneous VPP aggregation model integrating dedicated energy storage, Vehicle-to-Grid (V2G), and flexible loads is constructed, incorporating complex physical and operational constraints. Second, to overcome the “myopic” local optimality problem of traditional DRL in temporal arbitrage tasks, a potential-based reward shaping mechanism linked to future price trends is designed to guide the agent toward long-term optimal strategies. Finally, multi-dimensional comparative experiments and mechanism analyses are conducted in a simulated day-ahead electricity market. Simulation results demonstrate the following: (1) The proposed algorithm exhibits robust convergence stability and effectively handles stochastic noise in market prices and renewable generation. (2) Economically, the strategy significantly outperforms the rule-based strategy and remains highly competitive with the deterministic-optimization benchmark under perfect-information assumptions. (3) Mechanism analysis further reveals that the DRL agent breaks through the rigid logic of fixed thresholds, learning a non-linear dynamic game mechanism based on “Price-SOC” states, thereby achieving full-depth utilization of energy storage resources. This work provides an interpretable data-driven paradigm for intelligent VPP decision-making in uncertain environments. Full article
(This article belongs to the Special Issue Transforming Power Systems and Smart Grids with Deep Learning)
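The bidding paper's remedy for "myopic" arbitrage is potential-based reward shaping tied to future price trends. The classic form, F(s, s') = γΦ(s') − Φ(s), which by Ng, Harada, and Russell's result leaves the optimal policy unchanged, can be sketched as follows (the potential function here, valuing stored energy by the future-price premium, is a hypothetical stand-in for the paper's Φ):

```python
GAMMA = 0.99

def potential(soc, future_price, avg_price):
    """Hypothetical potential: stored energy (state of charge, `soc`) is
    worth more when the anticipated price sits above the average price."""
    return soc * (future_price - avg_price)

def shaped_reward(reward, s, s2):
    """Potential-based shaping F = gamma * Phi(s') - Phi(s), added to the
    market reward so the agent is nudged to hold charge before price peaks."""
    return reward + GAMMA * potential(*s2) - potential(*s)
```

During training, `shaped_reward` simply replaces the raw market revenue in the agent's update; at deployment the shaping term is dropped.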
30 pages, 939 KB  
Article
AI-Driven Financial Solutions for Climate Resilience and Geopolitical Risk Mitigation in Low- and Middle-Income Countries
by Abdelrahman Mohamed Mohamed Saeed and Muhammad Ali
Economies 2026, 14(4), 134; https://doi.org/10.3390/economies14040134 - 10 Apr 2026
Abstract
Climate change disproportionately threatens low- and middle-income countries, yet integrated assessments combining socio-economic fragility with physical hazards remain limited. This study quantifies multi-dimensional climate vulnerability and derives optimized adaptation policies for six representative nations (Bangladesh, Colombia, Kenya, Morocco, Pakistan, Vietnam) by fusing socio-economic indicators with climate risk data (2000–2024). A computational framework integrating unsupervised learning, dimensionality reduction, and predictive modeling was employed. Principal Component Analysis synthesized eight indicators into a Compound Vulnerability Score (CVS), while K-Means and DBSCAN identified distinct vulnerability regimes. XGBoost quantified driver importance, and Graph Neural Networks captured systemic interconnections. XGBoost identified projected drought risk (31.2%), precipitation change (18.1%), and poverty headcount (14.3%) as primary drivers. Graph networks demonstrated significant risk amplification in African nations (Morocco SRS: 0.728–0.874; Kenya SRS: 0.504–0.641) versus damping in Asian countries. A Reinforcement Learning (RL) agent was trained using Deep Q-Networks with experience replay to optimize intervention portfolios under budget constraints. The RL policy achieved a 23% reduction in systemic risk compared to uniform allocation baselines, generating context-specific priorities: drought management for Morocco (score 50) and Pakistan (40); poverty alleviation for Kenya (40); coastal protection for Bangladesh (40); agricultural resilience for Vietnam (35); and institutional capacity building for Colombia (50). In conclusion, socio-economic fragility non-linearly amplifies climate hazards, with poverty and drought risk constituting critical vulnerability multipliers. The AI-driven framework demonstrates that targeted interventions in high-sensitivity systems maximize systemic risk reduction. 
This integrated approach provides a replicable, evidence-based foundation for strategic adaptation finance allocation in an increasingly uncertain climate future. Full article
(This article belongs to the Special Issue Energy Consumption, Financial Development and Economic Growth)
39 pages, 6294 KB  
Article
Human-Assisted Deep Reinforcement Learning (HADRL) for Multi-Objective Tram Optimisation Problem
by Moneeb Ashraf, Stuart Hillmansen and Ning Zhao
Appl. Sci. 2026, 16(8), 3683; https://doi.org/10.3390/app16083683 - 9 Apr 2026
Abstract
Reducing traction energy in urban rail systems while preserving safety, punctuality, and passenger comfort remains challenging. Additionally, route-level tram studies that train deep reinforcement learning (DRL) policies using Operational Train Monitoring Recorder (OTMR) logs and benchmark them across multiple objectives remain limited. This study develops and evaluates a Human-Assisted Deep Reinforcement Learning (HADRL) framework for multi-objective tram control in an OTMR-grounded simulation. Two HADRL agents were trained using a human-assistance action mapping: a standard Proximal Policy Optimisation (PPO) baseline and a recurrent, history-augmented PPO. Their performance was compared against that of four human drivers using indices for speed-limit compliance, schedule deviation, traction energy, jerk-based comfort, and stopping accuracy. These performance measures were aggregated using the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) with both equal and entropy-derived weights. Both HADRL agents reproduce the characteristic accelerate–coast–brake driving pattern, reduce traction energy relative to all human baselines, and achieve near-complete speed-limit compliance, all while remaining within the specified schedule-deviation and comfort thresholds. TOPSIS yields identical rankings under both weighting schemes, with Multi-Objective Tram Operation Non-Stationary Proximal Policy Optimisation (MOTO-NSPPO, a recurrent, history-augmented PPO) ranked first and PPO second. Full article
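The HADRL study aggregates its five performance indices with TOPSIS. A compact sketch of the method (benefit criteria only, i.e. higher is better; a cost criterion would flip the ideal/anti-ideal choice, and the entropy-weight derivation is omitted):

```python
import math

def topsis(matrix, weights):
    """Rank alternatives by relative closeness to the ideal solution.
    `matrix` rows are alternatives, columns are benefit criteria."""
    cols = list(zip(*matrix))
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    # Vector-normalize each column, then apply the criterion weights.
    z = [[w * v / n for v, n, w in zip(row, norms, weights)] for row in matrix]
    ideal = [max(c) for c in zip(*z)]
    worst = [min(c) for c in zip(*z)]
    def dist(row, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref)))
    # Closeness coefficient in [0, 1]; 1 means the alternative is the ideal.
    return [dist(r, worst) / (dist(r, worst) + dist(r, ideal)) for r in z]
```

With equal weights versus entropy-derived weights, only the `weights` vector changes, which is why the paper can report that both schemes yield identical rankings.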
36 pages, 7325 KB  
Article
Intelligent Scheduling of Rail-Guided Shuttle Cars via Deep Reinforcement Learning Integrating Dynamic Graph Neural Networks and Transformer Model
by Fang Zhu and Shanshan Peng
Algorithms 2026, 19(4), 289; https://doi.org/10.3390/a19040289 - 8 Apr 2026
Abstract
With the rapid development of e-commerce and smart manufacturing, automated warehouse systems have become critical infrastructure for modern logistics. In China’s vast market, the dynamic scheduling of Rail-Guided Vehicles (RGVs) faces significant challenges due to complex task uncertainties, hierarchical supply chain structures, and real-time collision avoidance requirements. Traditional rule-based methods and static optimization models often fail to adapt to such dynamic environments. To address these issues, this paper proposes a novel hybrid deep reinforcement learning framework integrating a Dynamic Graph Neural Network (DGNN) and a Transformer model. The DGNN captures the spatiotemporal dependencies of the warehouse network topology, while the Transformer mechanism enhances long-range feature extraction for task prioritization. Furthermore, we design a centralized Deep Q-network (DQN) framework with parameterized action spaces to coordinate multiple RGVs collaboratively. While the system manages multiple physical vehicles, the learning architecture employs a single-agent global scheduler to avoid the non-stationarity issues inherent in multi-agent reinforcement learning. Experimental results based on real-world data from a large-scale electronics manufacturing warehouse demonstrate that our method reduces average task completion time by 18.5% and improves system throughput by 22.3% compared to state-of-the-art baselines. The proposed approach demonstrates potential for intelligent warehouse management in dynamic industrial scenarios. Full article
25 pages, 1501 KB  
Article
MA-JTATO: Multi-Agent Joint Task Association and Trajectory Optimization in UAV-Assisted Edge Computing System
by Yunxi Zhang and Zhigang Wen
Drones 2026, 10(4), 267; https://doi.org/10.3390/drones10040267 - 7 Apr 2026
Abstract
With the rapid development of applications such as smart cities and the industrial internet, the computation-intensive tasks generated by massive sensing devices pose significant challenges to traditional cloud computing paradigms. Unmanned aerial vehicle (UAV)-assisted edge computing systems, leveraging their high mobility and wide-area coverage capabilities, offer an innovative architecture for low-latency and highly reliable edge services. However, the practical deployment of such systems faces a highly complex multi-objective optimization problem characterized by the tight coupling of task offloading decisions, UAV trajectory planning, and edge server resource allocation. Conventional optimization methods struggle to adapt to the dynamic, high-dimensional characteristics of this problem, leading to suboptimal system performance. To address this critical challenge, this paper constructs an intelligent collaborative optimization framework for UAV-assisted edge computing systems and formulates the system quality of service (QoS) optimization problem as a mixed-integer non-convex programming problem with the dual objectives of minimizing task processing latency and reducing overall system energy consumption. A multi-agent joint task association and trajectory optimization (MA-JTATO) algorithm based on hybrid reinforcement learning is proposed to solve this intractable problem, which innovatively decouples the original coupled optimization problem into three interrelated subproblems and realizes their collaborative and efficient solution. 
Specifically, the Advantage Actor-Critic (A2C) algorithm is adopted to realize dynamic and optimal task association between UAVs and edge servers for discrete decision-making requirements; the multi-agent deep deterministic policy gradient (MADDPG) method is employed to achieve cooperative and energy-efficient trajectory planning for multiple UAVs to meet the needs of continuous control in dynamic environments; and convex optimization theory is applied to obtain a closed-form optimal solution for the efficient allocation of computational resources on edge servers. Simulation results demonstrate that the proposed MA-JTATO algorithm significantly outperforms traditional baseline algorithms in enhancing overall QoS, effectively validating the framework’s superior performance and robustness in dynamic and complex scenarios. Full article
(This article belongs to the Section Drone Communications)
16 pages, 1553 KB  
Article
Research on the Collaborative Optimization Method of Power Prediction and DRL Control
by Mengjie Li, Yongbao Liu and Xing He
Processes 2026, 14(7), 1150; https://doi.org/10.3390/pr14071150 - 3 Apr 2026
Abstract
This paper proposes a collaborative energy management strategy based on power prediction and deep reinforcement learning (DRL) to address the trade-offs among economic efficiency, durability, and dynamic performance in fuel cell hybrid power systems (FCHPS) under dynamic driving conditions. First, a hybrid prediction model termed LSTM-LSSVM with Cascade Correction (LSTM-LSSVM-CC) is developed. The cascade correction (CC) mechanism adopts a hierarchical structure to capture both low-frequency steady-state trends and high-frequency dynamic fluctuations, which are typically challenging for single models to represent. By integrating an online residual correction mechanism, this model generates accurate future power demand sequences. Second, a Dynamic Spatio-Temporal Fusion (DSTF) method is introduced to construct a high-dimensional DRL state space. This approach integrates predicted data, historical residuals, and real-time system states, enabling the agent to perform anticipatory decision-making. Third, a Dynamic Hierarchical Adaptive Multi-Objective Optimization Framework (DHAMOF) is designed. This framework dynamically adjusts objective weights and constraint boundaries based on real-time operating characteristics, enabling adaptive switching of optimization priorities across diverse scenarios. Furthermore, a closed-loop control architecture comprising “prediction–decision–execution–feedback” is established. By incorporating rolling horizon optimization and a proportional-integral (PI) residual compensation mechanism, the proposed architecture effectively suppresses prediction error accumulation and mitigates communication delays. Simulation results under combined CLTC-P and WLTP driving cycles demonstrate that, compared to conventional fixed-weight strategies, the proposed method achieves an 11.3% reduction in hydrogen consumption, a 30.9% decrease in SOC fluctuation range, and a 55.3% reduction in power tracking error. 
Moreover, under disturbance scenarios involving prediction errors, sensor noise, and a 200 ms communication delay, the system exhibits superior robustness: the increase in hydrogen consumption is limited to within 8.3 g/100 km, and the power tracking error is reduced by 65.6% relative to uncorrected baselines. This collaborative optimization approach overcomes the limitations of traditional open-loop prediction and fixed-weight control, offering a novel technical pathway for the high-efficiency and stable operation of fuel cell hybrid power systems. Full article
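The PI residual compensation step described in this abstract can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the gains `kp` and `ki`, the open-loop correction form, and the absence of anti-windup are all assumptions made here for clarity.

```python
def pi_residual_compensation(p_demand, p_predicted, kp=0.6, ki=0.1):
    """Toy PI residual compensator: each predicted power value is
    corrected by a proportional-plus-integral term driven by the
    observed prediction residual, damping error accumulation.
    kp/ki are illustrative gains, not values from the paper."""
    integral = 0.0
    corrected = []
    for demand, pred in zip(p_demand, p_predicted):
        error = demand - pred   # instantaneous prediction residual
        integral += error       # accumulated residual (no anti-windup here)
        corrected.append(pred + kp * error + ki * integral)
    return corrected
```

On a constant prediction bias, the integral term steadily shrinks the tracking error, which is the mechanism the abstract credits for suppressing prediction error accumulation in the closed loop.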
(This article belongs to the Special Issue Recent Advances in Fuel Cell Technology and Its Application Process)
22 pages, 5390 KB  
Article
Joint Optimization of Time Slot and Power Allocation in Underwater Acoustic Communication Networks
by Xuan Geng and Yongkang Hu
Sensors 2026, 26(7), 2188; https://doi.org/10.3390/s26072188 - 1 Apr 2026
Viewed by 412
Abstract
This paper proposes a joint optimization algorithm based on reinforcement learning to address the time slot and power allocation problem in underwater acoustic communication networks (UACNs). Taking the total capacity of successful transmissions as the optimization objective, two sub-objectives are formulated, corresponding to time-slot scheduling and power allocation. The time-slot scheduling sub-objective is addressed by constructing a Markov Decision Process (MDP) model solved with Deep Q-Network (DQN) learning, in which the agent learns a time-slot allocation policy that increases the number of successfully transmitted links while reducing collisions. For the power allocation sub-objective, another MDP model is developed and solved by the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, in which each underwater transmission node acts as an independent agent. The MADDPG approach enables the system to improve channel capacity under energy limitations, thereby maximizing the total capacity of successfully transmitted links. In terms of execution, the DQN performs centralized training and centralized time-slot allocation, while MADDPG follows centralized training with distributed execution, allowing each node to select its own transmission power. Simulation results show that the proposed joint optimization algorithm outperforms TDMA, Slotted ALOHA, and other algorithms in both the number of successfully transmitted links and channel capacity. Full article
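The optimization objective this abstract describes (total capacity of collision-free transmissions) can be sketched with a standard Shannon-capacity link model. The function names, the per-link parameters, and the all-or-nothing collision rule below are assumptions chosen for illustration, not the paper's exact formulation:

```python
import math

def link_capacity(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Shannon capacity of a single acoustic link: B * log2(1 + SNR)."""
    snr = tx_power_w * channel_gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)

def total_successful_capacity(links, assigned_slots):
    """Objective sketch: sum capacity over links whose time slot is
    collision-free; links sharing a slot fail and contribute nothing.
    `links` holds (bandwidth, power, gain, noise) tuples per link."""
    total = 0.0
    for i, (bw, power, gain, noise) in enumerate(links):
        if assigned_slots.count(assigned_slots[i]) == 1:  # unique slot
            total += link_capacity(bw, power, gain, noise)
    return total
```

In this framing, the DQN's slot assignments determine which links count as successful, while the MADDPG agents' power choices scale each surviving link's SNR, which is why the two sub-problems are coupled through the single objective.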
(This article belongs to the Special Issue Sensor Networks and Communication with AI)
23 pages, 2351 KB  
Article
A Spatio-Temporal Attention-Based Multi-Agent Deep Reinforcement Learning Approach for Collaborative Community Energy Trading
by Sheng Chen, Yong Yan, Jiahua Hu and Changsen Feng
Energies 2026, 19(7), 1730; https://doi.org/10.3390/en19071730 - 1 Apr 2026
Viewed by 329
Abstract
The high penetration of distributed energy resources (DERs) poses numerous challenges to community energy management, including strong source-load stochasticity, synchronized load surges triggered by multi-agent gaming, and potential privacy breaches. To tackle these issues, this paper proposes a coordinated energy trading framework driven by an intermediate market-rate pricing mechanism. Within this framework, a novel Multi-Agent Transformer Proximal Policy Optimization (MATPPO) algorithm is developed, adopting an LSTM–Transformer hybrid architecture and the centralized training with decentralized execution (CTDE) paradigm. During centralized training, an LSTM network extracts temporal evolution features from source-load data to handle environmental uncertainty, while a Transformer-based self-attention mechanism reconstructs the dynamic agent topology to capture spatial correlations. In the decentralized execution phase, prosumers make independent decisions using only local observations, eliminating the need to upload internal device states and significantly enhancing the privacy of sensitive local information during online execution. Additionally, a parameter-sharing mechanism lets agents share policy networks, improving algorithmic scalability. Simulation results demonstrate that MATPPO effectively mitigates power peaks and reduces transformer capacity pressure at the main grid interface, while significantly lowering total community electricity costs and maintaining high computational efficiency in large-scale scenarios. Full article
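The parameter-sharing and decentralized-execution ideas in this abstract can be sketched minimally: all agents act through one shared policy, each seeing only its local observation. The linear policy, dimensions, and greedy action selection below are stand-ins for illustration; the paper's actor is an LSTM–Transformer network trained with PPO, not shown here.

```python
import random

class SharedPolicy:
    """Minimal parameter-sharing sketch: every prosumer agent acts
    through one shared linear policy, each using only its own local
    observation (decentralized execution)."""
    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # one weight matrix shared by all agents
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                  for _ in range(n_actions)]

    def act(self, local_obs):
        # greedy action from the shared weights; the paper's actor would
        # instead sample from an LSTM-Transformer policy head
        scores = [sum(wi * o for wi, o in zip(row, local_obs))
                  for row in self.w]
        return max(range(len(scores)), key=scores.__getitem__)

# all agents hold a reference to the same policy object, so any training
# update (not shown) is immediately visible to every agent
shared = SharedPolicy(obs_dim=3, n_actions=2)
agents = [shared for _ in range(4)]
actions = [agent.act([1.0, 0.5, -0.2]) for agent in agents]
```

Because the agents share one parameter set, adding prosumers does not grow the number of trainable weights, which is the scalability benefit the abstract claims.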