MDPI - Publisher of Open Access Journals

28 pages, 7951 KB

Open Access Article

Task-Heterogeneous Formation Planning and Control for Unmanned Surface Vehicles Based on Hybrid Deep Reinforcement Learning

by Yawen Zhang, Wenkui Li, Chenyang Shan, Haoyu Bu and Bing Han

J. Mar. Sci. Eng. 2026, 14(10), 959; https://doi.org/10.3390/jmse14100959 (registering DOI) - 21 May 2026

Abstract

To address the control coupling challenges arising from task heterogeneity of unmanned surface vehicle (USV) formation, a distributed hybrid deep reinforcement learning (HDRL) framework is proposed. The framework decomposes the formation task into two subtasks: leader path planning using the single-agent deep reinforcement learning (SADRL) algorithm and follower formation tracking using the multi-agent deep reinforcement learning (MADRL) algorithm. By embedding the physical constraints of the real Otter USV into the training loop, the policy network outputs are mapped to propeller revolutions that conform to its dynamic characteristics. To optimize control performance, a dynamic gating mechanism triggered by formation position error is developed to mitigate multi-objective interference through temporal task scheduling. Concurrently, a mirror mapping mechanism leveraging the physical symmetry of the formation is designed to achieve policy sharing and data augmentation. Furthermore, the desired velocity calculated based on rigid-body kinematics is used to achieve kinematic-compensated formation tracking. The simulation results indicate that, compared to the SADRL algorithm, the planning success rate of HDRL is improved by 44.59%. Furthermore, compared to the MADRL algorithm, the integrated tracking performance is enhanced by 21.79–39.64%. Full article

(This article belongs to the Section Ocean Engineering)
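The kinematic compensation mentioned in the abstract above can be illustrated with the planar rigid-body velocity relation v_i = v_L + ω × r_i. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the world-frame x/y convention, the counter-clockwise-positive yaw rate, and the body-frame slot offset are all hypothetical choices made for illustration.

```python
import math


def desired_follower_velocity(leader_vel, yaw_rate, leader_heading, offset_body):
    """Velocity a formation slot must have to stay rigidly attached to the leader.

    leader_vel     -- (vx, vy) leader velocity in the world frame [m/s]
    yaw_rate       -- leader yaw rate [rad/s], counter-clockwise positive (assumed)
    leader_heading -- leader heading [rad] in the world frame
    offset_body    -- (dx, dy) slot position in the leader's body frame [m]
    """
    c, s = math.cos(leader_heading), math.sin(leader_heading)
    # Rotate the body-frame offset into the world frame.
    rx = c * offset_body[0] - s * offset_body[1]
    ry = s * offset_body[0] + c * offset_body[1]
    # Planar cross product: omega x r = (-omega * ry, omega * rx).
    return (leader_vel[0] - yaw_rate * ry, leader_vel[1] + yaw_rate * rx)
```

With this convention, a slot on the outside of a turn receives a higher desired speed than the leader, and the two velocities coincide when the yaw rate is zero.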

25 pages, 4053 KB

Open Access Article

Resource Allocation for D2D Communications in Multi-Slice NOMA-Based Cellular Networks

by Lijun Dong, Jingjing Wu and Yitong Yang

Future Internet 2026, 18(5), 246; https://doi.org/10.3390/fi18050246 - 6 May 2026

Viewed by 181

Abstract

Significant challenges will be encountered in next-generation cellular networks to achieve both high spectral efficiency (SE) and diverse quality of service (QoS) requirements simultaneously, particularly under stringent bandwidth and power budgets within highly dynamic and dense topologies. To address these challenges, we formulate an optimization problem in a multi-slice non-orthogonal multiple access (NOMA) system with underlay device-to-device (D2D) communications. This problem aims to maximize SE and satisfy user QoS demands by jointly optimizing power allocation and resource block (RB) assignment. To solve this non-convex and NP-hard problem, we propose a resource allocation mechanism based on joint optimization and cooperative multi-agent deep reinforcement learning (MADRL). Specifically, we construct an optimization framework based on successive convex approximation (SCA) and the Lagrange duality method to derive an analytical iterative solution for the optimal power allocation under a given RB assignment, thereby avoiding the inherent discretization error of the action space in pure learning methods. Furthermore, we propose a cooperative multi-agent algorithm based on dueling double deep Q-Network (CMAD3QN) to address the discrete RB assignment problem. Simulation results demonstrate that, compared with benchmark schemes, the proposed scheme exhibits faster convergence speed and significantly enhances system spectral efficiency while ensuring slice isolation and resource constraints. Full article

(This article belongs to the Special Issue 6G Wireless Network Technologies)
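The abstract above builds its RB-assignment agent on a dueling double deep Q-network. The sketch below shows only the two generic ingredients that name implies: the dueling aggregation Q = V + A - mean(A), and the double-DQN target in which the online network selects the next action and the target network evaluates it. It is an illustration of the standard technique, not of CMAD3QN itself, whose details are not given in the abstract; function names and plain-list inputs are hypothetical.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean advantage keeps V and A identifiable.
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """One-step double-DQN bootstrap target for a single transition."""
    if done:
        return reward
    # Action selection uses the online network's estimates ...
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while the target network supplies the value of that action.
    return reward + gamma * q_target_next[best]
```

Decoupling selection from evaluation in this way is the usual remedy for the overestimation bias of plain Q-learning targets.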

24 pages, 4822 KB

Open Access Article

Heuristic-Guided Safe Multi-Agent Reinforcement Learning for Resilient Spatio-Temporal Dispatch of Energy-Mobility Nexus Under Grid Faults

by Runtian Tang, Yang Wang, Wenan Li, Zhenghui Zhao and Xiaonan Shen

Electronics 2026, 15(9), 1868; https://doi.org/10.3390/electronics15091868 - 28 Apr 2026

Viewed by 355

Abstract

The increasing electrification of urban transportation has formed a tightly coupled energy-mobility nexus. Under extreme disaster events or grid faults, rapidly restoring power supply capacity and re-dispatching shared electric vehicle (EV) fleets are critical for enhancing system resilience. Existing co-optimization methods face the curse of dimensionality when dealing with high-dimensional discrete grid reconfigurations and continuous spatio-temporal EV queuing dynamics. While multi-agent deep reinforcement learning (MADRL) offers real-time responsiveness, it inherently struggles to satisfy strict physical constraints, frequently generating infeasible and unsafe actions. To bridge this gap, this paper proposes a heuristic-guided safe multi-agent reinforcement learning (Safe-MADRL) framework for the resilient dispatch of the energy-mobility nexus. Instead of relying solely on black-box neural networks, the framework structurally embeds physical models and heuristic solvers into the learning loop. A quantum particle swarm optimization (QPSO) algorithm acts as a heuristic action refiner to ensure that grid topology actions strictly comply with non-linear power flow and voltage constraints. Simultaneously, a mixed-integer linear programming (MILP) model coupled with a single-queue multi-server (SQMS) model serves as a safety projection layer. This layer mathematically guarantees EV battery energy continuity and accurately quantifies spatio-temporal queuing delays at charging stations. Case studies on a coupled IEEE 33-node distribution system and a regional transportation network demonstrate that the proposed Safe-MADRL framework achieves zero physical violations during training and significantly outperforms traditional mathematical optimization and pure learning-based methods in computational efficiency, system power loss reduction, and overall operational economy. Full article

23 pages, 2975 KB

Open Access Article

Large-Scale Metro Train Timetable Rescheduling via Multi-Agent Deep Reinforcement Learning: A High-Dimensional Optimization Approach in Flatland Environment

by Jufen Yang, Haozhe Yang, Weikang Wang and Chengyang Xia

Appl. Sci. 2026, 16(7), 3338; https://doi.org/10.3390/app16073338 - 30 Mar 2026

Viewed by 340

Abstract

Metro train timetable rescheduling (TTR) is a critical task for ensuring the reliability of urban rail transit systems. However, with the increasing density of railway networks and the growing number of operational trains, TTR has evolved into a typical high-dimensional and large-scale optimization problem. Traditional mathematical programming and heuristic approaches often struggle with the “curse of dimensionality” and fail to provide real-time responses under stochastic disturbances. To address these challenges, this paper proposes a novel framework based on Multi-Agent Deep Reinforcement Learning (MADRL). Specifically, we model the TTR problem as a decentralized cooperative process and utilize the Multi-Agent Advantage Actor-Critic (MAA2C) algorithm to optimize train schedules dynamically. The proposed framework is implemented within the Flatland simulation environment, which allows for the representation of complex arbitrary topologies. We design a composite reward function that minimizes total delay deviation while maximizing passenger satisfaction, subject to constraints such as headway, operating time, and train capacity. Furthermore, to enhance the robustness of the model against high-dimensional state uncertainties, random disturbances following a negative exponential distribution are introduced during training. Experimental results across various scenarios—ranging from simple dual-track to complex random networks—demonstrate that the MAA2C-based approach significantly outperforms traditional baselines. It not only achieves faster convergence in small-scale scenarios but also demonstrates superior computational efficiency and scalability in large-scale environments, effectively minimizing passenger waiting times. This study validates the potential of MADRL in solving high-dimensional traffic control problems for intelligent transportation systems. Full article

(This article belongs to the Special Issue Advances in Transportation and Smart City)

21 pages, 1611 KB

Open Access Article

Mobility-Aware Cooperative Optimization for Task Offloading and Resource Allocation in Multi-Edge Computing

by Dong Chen, Ximing Zhang, Kequan Lin, Chunhua Mei and Ru Huo

Algorithms 2026, 19(3), 221; https://doi.org/10.3390/a19030221 - 16 Mar 2026

Viewed by 404

Abstract

The rapid proliferation of mobile Internet of Things (IoT) devices has introduced significant resource scheduling challenges in multi-edge computing networks, where device mobility leads to dynamic network connectivity and load imbalance, complicating task offloading and resource management. To address these issues, this paper presents a mobility-driven, functionally coupled hierarchical optimization framework for task offloading and computation resource allocation in multi-region edge computing environments. The framework integrates mobility-aware heuristic offloading with multi-agent deep deterministic policy gradient (MADDPG)-based resource allocation. Devices are first clustered according to their mobility patterns, and offloading decisions are dynamically made based on trajectory and dwell-time characteristics. Each edge server is modeled as an autonomous agent, and an MADDPG framework is adopted to collaboratively optimize resource allocation, with the joint objective of minimizing task processing delay and system energy consumption. Experimental evaluations under diverse mobility and workload conditions show that the proposed approach achieves a 19.0% reduction in task delay compared to the Multi-Objective Gray Wolf Optimization (MOGWO) method at the largest device scale (60 devices) and maintains comparable energy efficiency. Furthermore, it exhibits stronger adaptability and scheduling performance across varying mobility group distributions. These results confirm the effectiveness of the proposed method in enhancing system performance within dynamic mobile edge computing scenarios. Full article

26 pages, 6088 KB

Open Access Article

An Enhanced MADDPG–A2C Framework for Optimized Resource Allocation in High-Speed Vehicular Networks

by Linna Hu, Weixian Zha, Penghao Xue, Shuhao Xie, Bin Guo and Wei Wang

Electronics 2026, 15(6), 1214; https://doi.org/10.3390/electronics15061214 - 13 Mar 2026

Viewed by 435

Abstract

To address the degradation in communication performance caused by the high mobility and dynamic uncertainty in vehicular network channels, this paper proposes a hybrid resource allocation framework that integrates the advantage actor–critic (A2C) algorithm with the multi-agent deep deterministic policy gradient (MADDPG) algorithm. By modeling the high-speed vehicular network environment, the resource allocation task is formulated as a multi-agent deep reinforcement learning (MADRL) problem within a continuous action space. The proposed framework leverages the advantage function to refine gradient estimation, thereby improving training stability and convergence behavior. Additionally, regularization penalty terms and constraint mechanisms are incorporated into the learning process to balance multiple communication objectives. Specifically, the method aims to maximize the throughput of vehicle-to-infrastructure (V2I) links while ensuring the transmission reliability of vehicle-to-vehicle (V2V) links. In simulation experiments, the proposed method performs better in terms of convergence. Compared with the conventional MADDPG algorithm, the average access success probability is improved by 1.6%, and the average V2I throughput increases by 3.5%, indicating a significant enhancement in overall vehicular communication efficiency and transmission performance. Full article

(This article belongs to the Special Issue AI-Driven Signal Processing and Resource Allocation in Wireless Networks)

22 pages, 2733 KB

Open Access Article

Attention-Enhanced Multi-Agent Deep Reinforcement Learning for Inverter-Based Volt-VAR Control in Active Distribution Networks

by Wenwen Chen, Hao Niu, Linbo Liu, Jianglong Lin and Huan Quan

Mathematics 2026, 14(5), 839; https://doi.org/10.3390/math14050839 - 1 Mar 2026

Viewed by 559

Abstract

The increasing penetration of inverter-interfaced photovoltaic (PV) generation in active distribution networks (ADNs) intensifies fast voltage violations and makes real-time Volt-VAR control (VVC) challenging, especially when each inverter has only partial and noisy measurements and communication is limited. Existing local droop-type strategies lack coordination, while fully centralized optimization/learning is often impractical for online deployment. To address these gaps, an attention-enhanced multi-agent deep reinforcement learning (MADRL) framework is developed for inverter-based VVC under the centralized training and decentralized execution (CTDE) paradigm. First, the voltage regulation problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP) to explicitly account for system stochasticity and temporal variability under partial observability. To solve this complex game, an attention-enhanced MADRL architecture is employed, where an agent-level attention mechanism is integrated into the centralized critic. Unlike traditional methods that treat all neighbor information equally, the proposed mechanism enables each inverter agent to dynamically prioritize and selectively focus on the most influential states from other agents, effectively capturing complex intercorrelations while enhancing training stability and learning efficiency. Operating under the CTDE paradigm, the framework realizes coordinated reactive power support using only local measurements, ensuring high scalability and practical implementability in communication-constrained environments. Simulations on the IEEE 33-bus system with six PV inverters show that the proposed method reduces the average voltage deviation on the test set from 0.0117 p.u. (droop control) and 0.0112 p.u. (MADDPG) to 0.0074 p.u., while maintaining millisecond-level execution time comparable to other MADRL baselines. Scalability tests with up to 12 agents further demonstrate robust performance of the proposed method under higher PV penetration. Full article
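The agent-level attention described in the abstract above lets one agent weight the states of the others unequally. The sketch below uses scaled dot-product attention as a generic stand-in; the abstract does not state which attention form the authors use, and the function names and the plain-list embeddings are hypothetical.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one agent's query and other agents' keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Attention-weighted sum of the other agents' value vectors."""
    w = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w[j] * values[j][i] for j in range(len(values))) for i in range(dim)]
```

In a centralized critic, a vector like the output of `attend` would typically be concatenated with the agent's own embedding before the Q-value head; that wiring is likewise an assumption here.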

23 pages, 795 KB

Open Access Article

Decentralized Computation Offloading Strategy via Multi-Agent Deep Reinforcement Learning for Multi-Access Edge Computing Systems

by Emmanuella Adu, Yeongmuk Lee, Jihwan Moon, Sooyoung Jang, Inkyu Bang and Taehoon Kim

Sensors 2026, 26(3), 914; https://doi.org/10.3390/s26030914 - 30 Jan 2026

Viewed by 812

Abstract

Multi-access edge computing (MEC) has been widely recognized as a promising solution for alleviating the computational burden on edge devices, particularly in supporting fast and real-time processing of resource-intensive applications. In this paper, we propose a decentralized offloading decision strategy based on multi-agent deep reinforcement learning (MADRL), aiming to minimize the overall task completion latency experienced by edge devices. Our proposed scheme adopts a grant-free access mechanism during the initialization of offloading in a fully decentralized manner, which serves as the key feature of our strategy. As a result, determining the optimal offloading factor becomes significantly more challenging due to the simultaneous access attempts from multiple edge devices. To resolve this problem, we consider a discrete action space-based deep reinforcement learning (DRL) approach, termed deep Q network (DQN), to enable each edge device to learn a decentralized computation offloading policy based solely on its local observation without requiring global network information. In our design, each edge device dynamically adjusts its offloading factor according to its observed channel state and the number of active users, thereby balancing local and remote computation loads adaptively. Furthermore, the proposed MADRL-based framework jointly accounts for user association and offloading decision optimization to mitigate access collisions and computation bottlenecks in a multi-user environment. We perform extensive computer simulations using MATLAB R2023b to evaluate the performance of the proposed strategy, focusing on the task completion latency under various system configurations. The numerical results demonstrate that our proposed strategy effectively reduces the overall task completion latency and achieves faster convergence of learning performance compared with conventional schemes, confirming the efficiency and scalability of the proposed decentralized approach. Full article

(This article belongs to the Section Communications)

26 pages, 1012 KB

Open Access Article

AoI-Aware Data Collection in Heterogeneous UAV-Assisted WSNs: Strong-Agent Coordinated Coverage and Vicsek-Driven Weak-Swarm Control

by Lin Huang, Lanhua Li, Songhan Zhao, Daiming Qu and Jing Xu

Sensors 2026, 26(2), 419; https://doi.org/10.3390/s26020419 - 8 Jan 2026

Viewed by 454

Abstract

Unmanned aerial vehicle (UAV) swarms offer an efficient solution for data collection from widely distributed ground users (GUs). However, incomplete environment information and frequent changes make it challenging for standard centralized planning or pure reinforcement learning approaches to simultaneously maintain global solution quality and local flexibility. We propose a hierarchical data collection framework for heterogeneous UAV-assisted wireless sensor networks (WSNs). A small set of high-capability UAVs (H-UAVs), equipped with substantial computational and communication resources, coordinate regional coverage, trajectory planning, and uplink transmission control for numerous resource-constrained low-capability UAVs (L-UAVs) across power-Voronoi-partitioned areas using multi-agent deep reinforcement learning (MADRL). Specifically, we employ Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to enhance H-UAVs’ decision-making capabilities and enable coordinated actions. The partitions are dynamically updated based on GUs’ data generation rates and L-UAV density to balance workload and adapt to environmental dynamics. Concurrently, a large number of L-UAVs with limited onboard resources perform self-organized data collection from GUs and execute opportunistic relaying to a remote access point (RAP) via H-UAVs. Within each Voronoi cell, L-UAV motion follows a weighted Vicsek model that incorporates GUs’ age of information (AoI), link quality, and congestion avoidance. This spatial decomposition combined with decentralized weak-swarm control enables scalability to large-scale L-UAV deployments. Experiments demonstrate that the proposed strong and weak agent MADDPG (SW-MADDPG) scheme reduces AoI by 30% and 21% compared to No-Voronoi and Heuristic-HUAV baselines, respectively. Full article

(This article belongs to the Section Communications)
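The weighted Vicsek model mentioned in the abstract above updates each agent's heading toward a weighted average of reference directions. The sketch below shows only that core update; how the authors combine age of information, link quality, and congestion avoidance into the weights is not given in the abstract, so the `weights` argument is a hypothetical placeholder for that combination.

```python
import math


def weighted_vicsek_heading(headings, weights, noise=0.0):
    """Heading update as the direction of the weighted sum of unit heading vectors.

    headings -- reference directions [rad], e.g. neighbours' headings (assumed input)
    weights  -- non-negative importance of each direction (hypothetical scoring)
    noise    -- angular perturbation [rad] added to the averaged heading
    """
    sx = sum(w * math.cos(th) for th, w in zip(headings, weights))
    sy = sum(w * math.sin(th) for th, w in zip(headings, weights))
    # atan2 averages angles on the circle, so wrap-around at +/- pi is handled.
    return math.atan2(sy, sx) + noise
```

Summing unit vectors rather than raw angles is what makes the average well defined across the ±π boundary; setting one weight much larger than the rest pulls the new heading toward that direction.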

26 pages, 3077 KB

Open Access Article

Coordinated Scheduling of BESS–ASHP Systems in Zero-Energy Houses Using Multi-Agent Reinforcement Learning

by Jing Li, Yang Xu, Yunqin Lu and Weijun Gao

Buildings 2026, 16(2), 274; https://doi.org/10.3390/buildings16020274 - 8 Jan 2026

Cited by 1 | Viewed by 579

Abstract

This paper addresses the critical challenge of multi-objective optimization in residential Home Energy Management Systems (HEMS) by proposing a novel framework based on an Improved Multi-Agent Proximal Policy Optimization (MAPPO) algorithm. The study specifically targets the low convergence efficiency of Multi-Agent Deep Reinforcement Learning (MADRL) for coupled Battery Energy Storage System (BESS) and Air Source Heat Pump (ASHP) operation. The framework synergistically integrates an action constraint projection mechanism with an economic-performance-driven dynamic learning rate modulation strategy, thereby significantly enhancing learning stability. Simulation results demonstrate that the algorithm improves training convergence speed by 35–45% compared to standard MAPPO. Economically, it delivers a cumulative cost reduction of 15.77% against rule-based baselines, outperforming both Independent Proximal Policy Optimization (IPPO) and standard MAPPO benchmarks. Furthermore, the method maximizes renewable energy utilization, achieving nearly 100% photovoltaic self-consumption under favorable conditions while ensuring robustness in extreme scenarios. Temporal analysis reveals the agents’ capacity for anticipatory decision-making, effectively learning correlations among generation, pricing, and demand to achieve seamless seasonal adaptability. These findings validate the superior performance of the proposed centralized training architecture, providing a robust solution for complex residential energy management. Full article

(This article belongs to the Special Issue Smart and Sustainable Buildings: Advancing Towards Net-Zero and Intelligent Control)
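The abstract above names an action constraint projection mechanism without describing it. One common reading, sketched below purely as an assumption, is to project a raw battery power command onto the set allowed by the power rating and the current state of charge. The function name, the sign convention (positive means charging), and the lossless battery model are all hypothetical.

```python
def project_battery_power(p_cmd_kw, soc_kwh, capacity_kwh, p_max_kw, dt_h):
    """Clip a commanded battery power to what the battery can physically do.

    p_cmd_kw     -- raw policy output [kW], positive = charge (assumed convention)
    soc_kwh      -- energy currently stored [kWh]
    capacity_kwh -- usable capacity [kWh]
    p_max_kw     -- symmetric power rating [kW]
    dt_h         -- control step length [h]
    """
    # Power-rating limit.
    p = max(-p_max_kw, min(p_max_kw, p_cmd_kw))
    # Energy limits over one step (charging and discharging losses ignored).
    max_charge = (capacity_kwh - soc_kwh) / dt_h
    max_discharge = soc_kwh / dt_h
    return max(-max_discharge, min(max_charge, p))
```

Because the feasible set here is an interval, clipping is exactly the Euclidean projection; a coupled BESS and heat pump constraint set would need a more general projection step.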

24 pages, 3856 KB

Open Access Article

MA-PF-AD3PG: A Multi-Agent DRL Algorithm for Latency Minimization and Fairness Optimization in 6G IoV-Oriented UAV-Assisted MEC Systems

by Yitian Wang, Hui Wang and Haibin Yu

Drones 2026, 10(1), 9; https://doi.org/10.3390/drones10010009 - 25 Dec 2025

Viewed by 647

Abstract

The rapid proliferation of connected and autonomous vehicles in the 6G era demands ultra-reliable and low-latency computation with intelligent resource coordination. Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC) provides a flexible and scalable solution to extend coverage and enhance offloading efficiency for dynamic Internet of Vehicles (IoV) environments. However, jointly optimizing task latency, user fairness, and service priority under time-varying channel conditions remains a fundamental challenge. To address this issue, this paper proposes a novel Multi-Agent Priority-based Fairness Adaptive Delayed Deep Deterministic Policy Gradient (MA-PF-AD3PG) algorithm for UAV-assisted MEC systems. An occlusion-aware dynamic deadline model is first established to capture real-time link blockage and channel fading. Based on this model, a priority–fairness coupled optimization framework is formulated to jointly minimize overall latency and balance service fairness across heterogeneous vehicular tasks. To efficiently solve this NP-hard problem, the proposed MA-PF-AD3PG integrates fairness-aware service preprocessing and an adaptive delayed update mechanism within a multi-agent deep reinforcement learning structure, enabling decentralized yet coordinated UAV decision-making. Extensive simulations demonstrate that MA-PF-AD3PG achieves superior convergence stability, 13–57% higher total rewards, up to 46% lower delay, and nearly perfect fairness compared with state-of-the-art Deep Reinforcement Learning (DRL) and heuristic methods. Full article

(This article belongs to the Section Drone Communications)

17 pages, 2597 KB

Open Access Article

Optimization of Dynamic Scheduling for Flexible Job Shops Using Multi-Agent Deep Reinforcement Learning

by Jianqi Wang, Renwang Li and Qiang Wang

Processes 2025, 13(12), 4045; https://doi.org/10.3390/pr13124045 - 14 Dec 2025

Cited by 1 | Viewed by 1227

Abstract

This study proposes an optimization framework based on Multi-agent Deep Reinforcement Learning (MADRL), conducting a systematic exploration of the flexible job shop scheduling problem (FJSP) under dynamic scenarios. The research analyzes the impact of two types of dynamic disturbance events, machine failures and order insertions, on the Dynamic Flexible Job Shop Scheduling Problem (DFJSP). Furthermore, it integrates process selection agents and machine selection agents to devise solutions for handling dynamic events. Experimental results demonstrate that, when solving standard benchmark problems, the proposed multi-objective DFJSP scheduling method, based on the 3DQN algorithm and incorporating an event-triggered rescheduling strategy, effectively mitigates disruptions caused by dynamic events. Full article

(This article belongs to the Section Process Control, Modeling and Optimization)

29 pages, 4559 KB

Open Access Article

A Novel Data-Driven Multi-Agent Reinforcement Learning Approach for Voltage Control Under Weak Grid Support

by Jiaxin Wu, Ziqi Wang, Ji Han, Qionglin Li, Ran Sun, Chenhao Li, Yuehan Cheng, Bokai Zhou, Jiaming Guo and Bocheng Long

Sensors 2025, 25(23), 7399; https://doi.org/10.3390/s25237399 - 4 Dec 2025

Cited by 2 | Viewed by 1281

Abstract

To address active voltage control in photovoltaic (PV)-integrated distribution networks characterized by weak voltage support conditions, this paper proposes a multi-agent deep reinforcement learning (MADRL)-based coordinated control method for PV clusters. First, the voltage control problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and a centralized training with decentralized execution (CTDE) framework is adopted, enabling each inverter to make independent decisions based solely on local measurements during the execution phase. To balance voltage compliance with energy efficiency, two barrier functions are designed to reshape the reward function, introducing an adaptive penalization mechanism: a steeper gradient in the violation region to accelerate voltage recovery to the nominal range, and a gentler gradient in the safe region to minimize excessive reactive regulation and power losses. Furthermore, six representative MADRL algorithms (COMA, IDDPG, MADDPG, MAPPO, SQDDPG, and MATD3) are employed to solve the active voltage control problem of the distribution network. Case studies based on a modified IEEE 33-bus system demonstrate that the proposed framework ensures voltage compliance while effectively reducing network losses. The MADDPG algorithm achieves a Controllability Ratio (CR) of 91.9% while maintaining power loss at approximately 0.0695 p.u., demonstrating superior convergence and robustness. Comparisons with optimal power flow (OPF) and droop control methods confirm that the proposed approach significantly improves voltage stability and energy efficiency under model-free and communication-constrained weak grid conditions. Full article

(This article belongs to the Topic Advanced Strategies for Smart Grid Reliability and Energy Optimization)
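The reward reshaping in the abstract above penalizes voltage deviation gently inside the safe region and steeply once a limit is violated. The sketch below reproduces only that shape with a continuous piecewise-linear penalty. The authors' actual barrier functions are not given in the abstract, and the ±0.05 p.u. safe band, both slopes, and the function name are assumptions chosen for illustration.

```python
def voltage_penalty(v_pu, v_ref=1.0, band=0.05, safe_slope=1.0, violation_slope=10.0):
    """Penalty on bus voltage deviation with a steeper slope outside the safe band.

    v_pu            -- measured bus voltage [p.u.]
    v_ref           -- nominal voltage [p.u.]
    band            -- half-width of the safe region [p.u.] (assumed value)
    safe_slope      -- penalty per p.u. of deviation inside the band (assumed)
    violation_slope -- penalty per p.u. of deviation beyond the band (assumed)
    """
    dev = abs(v_pu - v_ref)
    if dev <= band:
        return safe_slope * dev
    # Continuous at the band edge, then grows with the steeper slope.
    return safe_slope * band + violation_slope * (dev - band)
```

A per-step reward could then be the negative sum of this penalty over monitored buses, optionally combined with a loss term; that combination is also an assumption rather than the paper's formulation.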

28 pages, 4458 KB

Open Access Article

Multi-UAV Cooperative Search in Partially Observable Low-Altitude Environments Based on Deep Reinforcement Learning

by Xiu-Xia Yang, Wen-Qiang Yao, Yi Zhang, Hao Yu and Chao Wang

Drones 2025, 9(12), 825; https://doi.org/10.3390/drones9120825 - 27 Nov 2025

Viewed by 1320

Abstract

Multi-Unmanned Aerial Vehicle (Multi-UAV) cooperative search represents a cutting-edge research direction in the field of unmanned aerial vehicle applications. The use of multi-UAV systems for low-altitude target search and area surveillance has become an effective means of enhancing security capabilities. In practical scenarios, UAVs rely on onboard sensors to acquire environmental information; however, due to the limited perceptual range of these sensors, their observation capabilities are inherently local and constrained. This paper investigates the problem of multi-UAV cooperative search in partially observable low-altitude environments, where each UAV possesses a circular sensing range with a finite radius. Target location information is only obtained when a target enters the field of view of any UAV. The objective is to achieve cooperative search and sustain continuous surveillance while ensuring safety among UAVs and with the environment. To address this challenge, we propose a novel multi-agent deep reinforcement learning (MADRL) algorithm named Normalizing Graph Attention Soft Actor-Critic (NGASAC). This algorithm integrates a normalizing flow (NL) layer and a multi-head graph attention network (MHGAT). The normalizing flow technique maps traditional Gaussian sampling to a more complex action distribution, thereby enhancing the expressiveness and flexibility of the policy. Simultaneously, by constructing a multi-head graph attention network that captures “obstacle–target” relationships, the algorithm improves the UAVs’ ability to learn and reason about complex spatial topologies, leading to significantly better performance in cooperative search and stable surveillance of hidden targets. 
Simulation results demonstrate that the NGASAC algorithm markedly outperforms baseline methods such as Multi-Agent Soft Actor-Critic (MASAC), Multi-Agent Proximal Policy Optimization (MAPPO), and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) across multiple evaluation metrics, including success rate, task time, and obstacle avoidance capability. Furthermore, it exhibits strong generalization performance and robustness.

36 pages, 4374 KB

Open Access Review

Spectrum Sensing in Cognitive Radio Internet of Things: State-of-the-Art, Applications, Challenges, and Future Prospects

by Akeem Abimbola Raji and Thomas O. Olwal

J. Sens. Actuator Netw. 2025, 14(6), 109; https://doi.org/10.3390/jsan14060109 - 13 Nov 2025

Cited by 3 | Viewed by 3059

Abstract

The proliferation of Internet of Things (IoT) devices, driven by remarkable developments in mobile connectivity, has caused a tremendous increase in the consumption of broadband spectrum in fifth generation (5G) mobile access. To secure the continued growth of IoT, communication resources in 5G wireless access must be managed efficiently. Cognitive radio (CR) has been advanced as a way to maximize the utilization of spectrum in radio communication networks. Integrating CR into IoT networks is a promising approach to using the spectrum productively, with a view to making more spectral bands available to IoT devices for communication. An important function of CR is spectrum sensing (SS), which enables maximum utilization of the spectrum in radio networks. Existing SS techniques perform poorly in noisy channel states and are not immune to the dynamic effects of wireless channels. This article presents a comprehensive review of various approaches commonly used for SS. Furthermore, multi-agent deep reinforcement learning (MADRL) is proposed for enhancing the accuracy of spectrum detection in erratic wireless channels. Finally, we highlight challenges that currently exist in SS in cognitive radio IoT (CRIoT) networks and outline future research directions in this regard.

Search Results (56)
