Search Results (40)

Search Parameters:
Keywords = MADRL

20 pages, 6437 KiB  
Article
Distributed Multi-Agent Deep Reinforcement Learning-Based Transmit Power Control in Cellular Networks
by Hun Kim and Jaewoo So
Sensors 2025, 25(13), 4017; https://doi.org/10.3390/s25134017 - 27 Jun 2025
Viewed by 407
Abstract
In a multi-cell network, interference management between adjacent cells is a key factor that determines the performance of the entire cellular network. In particular, to control inter-cell interference while providing high data rates to users, the base station (BS) of each cell must appropriately control its downlink transmit power. However, as the number of cells increases, controlling the downlink transmit power at the BS becomes increasingly difficult. In this paper, we propose a multi-agent deep reinforcement learning (MADRL)-based transmit power control scheme to maximize the sum rate in multi-cell networks. In particular, the proposed scheme incorporates a long short-term memory (LSTM) architecture into the MADRL scheme to retain state information across time slots and use it for subsequent action decisions, thereby improving the sum-rate performance. In the proposed scheme, the agent of each BS uses only its local channel state information; consequently, it does not need to receive signaling messages from adjacent agents. The simulation results show that the proposed scheme outperforms the existing MADRL scheme by reducing the signaling exchanged between links while improving the sum rate.
(This article belongs to the Special Issue Future Wireless Communication Networks: 3rd Edition)
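To make the LSTM-augmented agent above concrete, the following is a minimal, hypothetical PyTorch sketch of a per-BS agent that maps a short history of local channel state information to Q-values over discretized power levels; the class name, dimensions, and power discretization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMPowerAgent(nn.Module):
    """Per-BS agent: local CSI history -> Q-value per discrete power level."""
    def __init__(self, csi_dim: int, n_power_levels: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(csi_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_power_levels)

    def forward(self, csi_seq, state=None):
        out, state = self.lstm(csi_seq, state)   # carry state across time slots
        return self.head(out[:, -1]), state      # Q-values from the last step

agent = LSTMPowerAgent(csi_dim=8, n_power_levels=10)
q, h = agent(torch.randn(1, 5, 8))               # 5 slots of local CSI
power_level = q.argmax(dim=-1)                   # greedy power selection
```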

31 pages, 1576 KiB  
Article
Joint Caching and Computation in UAV-Assisted Vehicle Networks via Multi-Agent Deep Reinforcement Learning
by Yuhua Wu, Yuchao Huang, Ziyou Wang and Changming Xu
Drones 2025, 9(7), 456; https://doi.org/10.3390/drones9070456 - 24 Jun 2025
Viewed by 516
Abstract
Intelligent Connected Vehicles (ICVs) impose stringent requirements on real-time computational services. However, limited onboard resources and the high latency of remote cloud servers restrict traditional solutions. Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC), which deploys computing and storage resources at the network edge, offers a promising solution. In UAV-assisted vehicular networks, jointly optimizing content and service caching, computation offloading, and UAV trajectories to maximize system performance is a critical challenge. This requires balancing system energy consumption and resource allocation fairness while maximizing cache hit rate and minimizing task latency. To this end, we introduce system efficiency as a unified metric, aiming to maximize overall system performance through joint optimization. This metric comprehensively considers cache hit rate, task computation latency, system energy consumption, and resource allocation fairness. The problem involves discrete decisions (caching, offloading) and continuous variables (UAV trajectories) and exhibits high dynamism and non-convexity, making it difficult for traditional optimization methods to solve. Concurrently, existing multi-agent deep reinforcement learning (MADRL) methods often encounter training instability and convergence issues in such dynamic and non-stationary environments. To address these challenges, this paper proposes a MADRL-based joint optimization approach. We model the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and adopt the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm, which follows the Centralized Training Decentralized Execution (CTDE) paradigm. Our method maximizes system efficiency by striking a judicious balance among cache hit rate, task delay, energy consumption, and fairness. Simulation results demonstrate that, compared to various representative baseline methods, the proposed MAPPO algorithm achieves significantly higher cumulative rewards and an approximately 82% cache hit rate.
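The unified "system efficiency" metric above can be illustrated with a small sketch. Assuming Jain's index for fairness and hypothetical weights w, a composite reward might look like the following; the weighting scheme is an assumption, and the paper's exact formulation may differ.

```python
import numpy as np

def jain_fairness(x: np.ndarray) -> float:
    # Jain's index: 1.0 for a perfectly even allocation, 1/n for a maximally skewed one
    return float(x.sum() ** 2 / (x.size * (x ** 2).sum()))

def system_efficiency(hit_rate, delay, energy, alloc, w=(1.0, 0.5, 0.3, 0.2)):
    # Reward cache hits and fairness; penalize normalized delay and energy.
    return w[0] * hit_rate + w[3] * jain_fairness(alloc) - w[1] * delay - w[2] * energy

r = system_efficiency(hit_rate=0.82, delay=0.2, energy=0.3, alloc=np.array([1.0, 0.9, 1.1]))
```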

28 pages, 40968 KiB  
Article
Collaborative Search Algorithm for Multi-UAVs Under Interference Conditions: A Multi-Agent Deep Reinforcement Learning Approach
by Wei Wang, Yong Chen, Yu Zhang, Yong Chen and Yihang Du
Drones 2025, 9(6), 445; https://doi.org/10.3390/drones9060445 - 18 Jun 2025
Viewed by 408
Abstract
Unmanned aerial vehicles (UAVs) have emerged as a promising solution for collaborative search missions in complex environments. However, in the presence of interference, communication disruptions between UAVs and ground control stations can severely degrade coordination efficiency, leading to prolonged search times and reduced mission success rates. To address these challenges, this paper proposes a novel multi-agent deep reinforcement learning (MADRL) framework for joint spectrum and search collaboration in multi-UAV systems. The core problem is formulated as a combinatorial optimization task that simultaneously optimizes channel selection and heading angles to minimize the total search time under dynamic interference conditions. Due to the NP-hard nature of this problem, we decompose it into two interconnected Markov decision processes (MDPs): a spectrum collaboration subproblem solved using a received signal strength indicator (RSSI)-aware multi-agent proximal policy optimization (MAPPO) algorithm and a search collaboration subproblem addressed through a target probability map (TPM)-guided MAPPO approach with an innovative action-masking mechanism. Extensive simulations against baseline methods (IPPO, QMIX, and IQL) demonstrate significant performance advantages, including 68.7% and 146.2% higher throughput than QMIX and IQL, respectively, and a 16.7–48.3% reduction in search completion steps, while maintaining robust operation under dynamic interference conditions. The framework exhibits strong resilience to communication disruptions while maintaining stable search performance, validating its practical applicability in real-world interference scenarios.
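The action-masking mechanism mentioned above is straightforward to sketch: invalid actions (for example, headings that leave the search area) get their logits pushed to -inf before the policy distribution is built, so they are never sampled. A minimal PyTorch illustration, with shapes chosen arbitrarily:

```python
import torch

def masked_policy(logits: torch.Tensor, valid: torch.Tensor):
    """Invalid actions get -inf logits, so their sampling probability is exactly zero."""
    masked = logits.masked_fill(~valid, float("-inf"))
    return torch.distributions.Categorical(logits=masked)

logits = torch.randn(4, 8)                   # 4 UAVs, 8 candidate heading angles
valid = torch.ones(4, 8, dtype=torch.bool)
valid[:, 0] = False                          # e.g., a heading leading off the map
action = masked_policy(logits, valid).sample()
```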

34 pages, 5896 KiB  
Article
Networked Multi-Agent Deep Reinforcement Learning Framework for the Provision of Ancillary Services in Hybrid Power Plants
by Muhammad Ikram, Daryoush Habibi and Asma Aziz
Energies 2025, 18(10), 2666; https://doi.org/10.3390/en18102666 - 21 May 2025
Viewed by 437
Abstract
Inverter-based resources (IBRs) are becoming more prominent due to the increasing penetration of renewable energy sources, which reduces power system inertia and compromises power system stability and grid support services. At present, optimal coordination among generation technologies remains a significant challenge for frequency control services. This paper presents a novel networked multi-agent deep reinforcement learning (N-MADRL) scheme for optimal dispatch and frequency control services. First, we develop a model-free environment consisting of a photovoltaic (PV) plant, a wind plant (WP), and an energy storage system (ESS) plant. The proposed framework uses a combination of multi-agent actor-critic (MAAC) and soft actor-critic (SAC) schemes for optimal dispatch of active power, mitigating frequency deviations, aiding reserve capacity management, and improving energy balancing. Second, frequency stability and optimal dispatch are formulated in the N-MADRL framework using the physical constraints under a dynamic simulation environment. Third, a decentralised coordinated control scheme is implemented in the hybrid power plant (HPP) environment using communication-resilient scenarios to address system vulnerabilities. Finally, the practicality of the N-MADRL approach is demonstrated in a Grid2Op dynamic simulation environment for optimal dispatch, energy reserve management, and frequency control. Results on the IEEE 14-bus network show that, compared to PPO and DDPG, N-MADRL achieves 42.10% and 61.40% higher efficiency for optimal dispatch, along with improvements of 68.30% and 74.48% in mitigating frequency deviations, respectively. The proposed approach outperforms existing methods under partially, fully, and randomly connected scenarios by effectively handling uncertainties, system intermittency, and communication resiliency.
(This article belongs to the Collection Artificial Intelligence and Smart Energy)
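As a rough illustration of the soft actor-critic component, a squashed-Gaussian actor that maps a plant's observations to a bounded active-power set-point could be sketched as follows; the network sizes, observation dimension, and p_max bound are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DispatchActor(nn.Module):
    """Squashed-Gaussian actor mapping plant observations to bounded set-points."""
    def __init__(self, obs_dim: int, act_dim: int, p_max: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 2 * act_dim))
        self.p_max = p_max

    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        z = mu + log_std.clamp(-5, 2).exp() * torch.randn_like(mu)  # reparameterized sample
        return self.p_max * torch.tanh(z)    # dispatch bounded to [-p_max, p_max]

pv_actor = DispatchActor(obs_dim=12, act_dim=1, p_max=50.0)  # one actor per plant
setpoint = pv_actor(torch.randn(1, 12))
```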

35 pages, 3671 KiB  
Article
Robust UAV-Oriented Wireless Communications via Multi-Agent Deep Reinforcement Learning to Optimize User Coverage
by Mahfizur Rahman Khan, Gowtham Raj Veeraswamy Premkumar and Bryan Van Scoy
Drones 2025, 9(5), 321; https://doi.org/10.3390/drones9050321 - 22 Apr 2025
Viewed by 1311
Abstract
In this study, we deploy drones as dynamic base stations to address the issue of optimizing user coverage in areas without fixed base station infrastructure. To optimize drone placement, we employ Deep Q-Learning, beginning with a centralized approach due to its simplicity and ease of training. In this centralized approach, all drones are trained simultaneously. We also employ a decentralized technique in which each drone acts autonomously while sharing a common neural network, allowing for individualized learning. In addition, we explore the impacts of jamming on UAVs and provide a reliable approach for mitigating this interference. To boost robustness, we employ stochastic user distributions, which train our policy to respond successfully to a wide range of user scenarios.
(This article belongs to the Special Issue UAV-Assisted Mobile Wireless Networks and Applications)
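The decentralized variant described above, where each drone acts autonomously while sharing a common neural network, amounts to parameter sharing: a single Q-network queried independently by every drone with its own local observation. A minimal sketch, with an assumed 6-dimensional observation and five movement actions:

```python
import torch
import torch.nn as nn

shared_q = nn.Sequential(                            # one network shared by every drone
    nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 5))   # 5 moves: N/S/E/W/hover

def act(local_obs: torch.Tensor, eps: float = 0.1) -> int:
    # Each drone queries the shared network with only its own observation.
    if torch.rand(()) < eps:
        return int(torch.randint(5, ()))             # epsilon-greedy exploration
    return int(shared_q(local_obs).argmax())

moves = [act(torch.randn(6)) for _ in range(3)]      # three drones act independently
```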

21 pages, 21844 KiB  
Article
Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway
by Jiajia Chen, Bingqing Zhu, Mengyu Zhang, Xiang Ling, Xiaobo Ruan, Yifan Deng and Ning Guo
World Electr. Veh. J. 2025, 16(4), 225; https://doi.org/10.3390/wevj16040225 - 10 Apr 2025
Viewed by 1436
Abstract
This study presents the first investigation into the problem of an autonomous vehicle (AV) merging into an existing platoon, proposing a multi-agent deep reinforcement learning (MA-DRL)-based cooperative control framework. The developed MA-DRL architecture enables coordinated learning among multiple autonomous agents to address the multi-objective coordination challenge through synchronized control of platoon longitudinal acceleration and AV steering and acceleration. To enhance training efficiency, we develop a dual-layer multi-agent maximum Q-value proximal policy optimization (MAMQPPO) method, which extends the multi-agent PPO algorithm (a policy gradient method ensuring stable policy updates) by incorporating maximum Q-value action selection for platoon gap control and discrete command generation, simplifying the training process. Furthermore, a partially decoupled reward function (PD-Reward) is designed to properly guide the behavioral actions of both the AV and the platoon while accelerating network convergence. Comprehensive highway simulation experiments show the proposed method reduces merging time by 37.69% (12.4 s vs. 19.9 s) and energy consumption by 58% (3.56 kWh vs. 8.47 kWh) compared to an existing quintic-polynomial planner with PID (Proportional–Integral–Derivative) control.
(This article belongs to the Special Issue Recent Advances in Autonomous Vehicles)
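MAMQPPO builds on the multi-agent PPO algorithm, whose stability comes from the clipped surrogate objective. For reference, the standard clipped loss (not the authors' dual-layer extension) is sketched below:

```python
import torch

def ppo_clip_loss(ratio: torch.Tensor, adv: torch.Tensor, eps: float = 0.2):
    """Clipped surrogate objective: take the pessimistic of the two terms."""
    return -torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

ratio = torch.tensor([0.8, 1.3, 1.05])   # pi_new(a|s) / pi_old(a|s)
adv = torch.tensor([0.5, -0.2, 1.0])     # advantage estimates
loss = ppo_clip_loss(ratio, adv)
```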

14 pages, 635 KiB  
Article
Knowledge-Enhanced Deep Reinforcement Learning for Multi-Agent Game
by Weiping Zeng, Xuefeng Yan, Fei Mo, Zheng Zhang, Shunfeng Li, Peng Wang and Chaoyu Wang
Electronics 2025, 14(7), 1347; https://doi.org/10.3390/electronics14071347 - 28 Mar 2025
Cited by 1 | Viewed by 630
Abstract
In modern naval confrontation systems, adversarial unmanned underwater vehicles (UUVs) pose significant challenges due to their inherent mobility and positional uncertainty, and countermeasures against them are deployed on unmanned aerial vehicles (UAVs). Effectively neutralizing these threats demands sophisticated coordination strategies between distributed agents under partial observability. This paper proposes a novel Knowledge-Enhanced Multi-Agent Deep Reinforcement Learning (MADRL) framework for coordinating UAV swarms against adversarial UUVs in asymmetric confrontation scenarios, specifically addressing three operational modes: area surveillance, summoned interception, and coordinated countermeasures. Our framework introduces two key innovations: (1) a probabilistic adversarial model integrating prior intelligence and real-time UAV sensor data to predict underwater trajectories; and (2) a Multi-Agent Double Soft Actor–Critic (MADSAC) algorithm addressing Red team coordination challenges. Experimental validation demonstrates superior performance over baseline methods in Blue target detection efficiency (38.7% improvement) and successful neutralization rate (52.1% increase), validated across escalating confrontation scenarios.
(This article belongs to the Special Issue Advanced Control Strategies and Applications of Multi-Agent Systems)
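A probabilistic adversarial model of the kind described, fusing prior intelligence with real-time sensor data, is commonly realized as a Bayesian update over a position-probability grid. The following sketch assumes a single sensor look with detection probability p_d and false-alarm probability p_fa; these parameters and the grid form are illustrative assumptions, not the paper's model:

```python
import numpy as np

def tpm_update(prior: np.ndarray, cell: tuple, detected: bool,
               p_d: float = 0.9, p_fa: float = 0.05) -> np.ndarray:
    """One Bayes update of a target-probability grid after a sensor look at `cell`."""
    like = np.full_like(prior, p_fa if detected else 1.0 - p_fa)
    like[cell] = p_d if detected else 1.0 - p_d
    post = prior * like
    return post / post.sum()

tpm = np.full((10, 10), 0.01)                        # uniform prior over a 10x10 grid
tpm = tpm_update(tpm, cell=(3, 4), detected=False)   # no detection -> mass shifts away
```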

31 pages, 875 KiB  
Article
Hierarchical Traffic Engineering in 3D Networks Using QoS-Aware Graph-Based Deep Reinforcement Learning
by Robert Kołakowski, Lechosław Tomaszewski, Rafał Tępiński and Sławomir Kukliński
Electronics 2025, 14(5), 1045; https://doi.org/10.3390/electronics14051045 - 6 Mar 2025
Viewed by 1175
Abstract
Ubiquitous connectivity is envisioned through the integration of terrestrial networks (TNs) and non-terrestrial networks (NTNs). However, NTNs face multiple routing and Quality of Service (QoS) provisioning challenges due to the mobility of network nodes. Distributed Software-Defined Networking (SDN) combined with Multi-Agent Deep Reinforcement Learning (MADRL) is widely used to introduce programmability and intelligent Traffic Engineering (TE) in TNs, yet applying DRL to NTNs is hindered by frequently changing state sizes, model scalability, and coordination issues. This paper introduces 3DQR, a novel TE framework that combines hierarchical multi-controller SDN, hierarchical MADRL based on Graph Neural Networks (GNNs), and network topology predictions for QoS path provisioning, effective load distribution, and flow rejection minimisation in future 3D networks. To enhance SDN scalability, metric and path-operation abstractions are introduced to facilitate the coordination of domain agents by the global agent. To the best of the authors' knowledge, 3DQR is the first routing scheme to integrate MADRL and GNNs to optimise centralised routing and path allocation in SDN-based 3D mobile networks. The evaluations show up to a 14% reduction in flow rejection rate, a 50% improvement in traffic distribution, and effective QoS class prioritisation compared to baseline techniques. 3DQR also exhibits strong transfer capabilities, giving consistent performance gains in previously unseen environments.
(This article belongs to the Special Issue Future Generation Non-Terrestrial Networks)
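The GNN component can be sketched as a single round of message passing over the time-varying topology: each node mixes its own features with the mean of its neighbours'. This is a generic mean-aggregation layer, not 3DQR's specific architecture:

```python
import torch
import torch.nn as nn

class MeanPassLayer(nn.Module):
    """One round of mean-aggregation message passing over the network graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                 # mean of neighbour node features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

x = torch.randn(6, 16)                        # 6 nodes (satellites/HAPS/ground)
adj = (torch.rand(6, 6) > 0.5).float()        # snapshot of the moving topology
h = MeanPassLayer(16)(x, adj)
```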

20 pages, 4759 KiB  
Article
Deep Reinforcement Learning-Based Secrecy Rate Optimization for Simultaneously Transmitting and Reflecting Reconfigurable Intelligent Surface-Assisted Unmanned Aerial Vehicle-Integrated Sensing and Communication Systems
by Jianwei Wang and Shuo Chen
Sensors 2025, 25(5), 1541; https://doi.org/10.3390/s25051541 - 2 Mar 2025
Cited by 1 | Viewed by 1337
Abstract
This study investigates security issues in a scenario involving a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-assisted unmanned aerial vehicle (UAV) with integrated sensing and communication (ISAC) functionality (UAV-ISAC). In this scenario, both legitimate users and eavesdropping users are present, which makes security a crucial concern. Our research goal is to extend the system's coverage and improve its flexibility through the introduction of the STAR-RIS while ensuring secure transmission rates. To achieve this, we propose a secure transmission scheme that jointly optimizes the UAV-ISAC trajectory, transmit beamforming, and the phase and amplitude adjustments of the STAR-RIS reflective elements. The approach seeks to maximize the average secrecy rate while satisfying communication and sensing performance standards and transmission security constraints. As the considered problem involves coupled variables and is non-convex, it is difficult to solve using traditional optimization methods. To address this issue, we adopt a multi-agent deep reinforcement learning (MADRL) approach, which allows agents to interact with the environment to learn optimal strategies, effectively handling complex environments. The simulation results demonstrate that the proposed scheme significantly enhances the system's average secrecy rate while satisfying communication, sensing, and security constraints.
(This article belongs to the Section Communications)
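The objective, the average secrecy rate, is built from a standard per-slot quantity: the legitimate link's rate minus the eavesdropper's, floored at zero. A one-liner makes the definition explicit (the SNR values here are arbitrary):

```python
import numpy as np

def secrecy_rate(snr_user: float, snr_eve: float) -> float:
    # Secrecy rate in bit/s/Hz: legitimate-link rate minus eavesdropper rate, floored at 0.
    return max(0.0, np.log2(1.0 + snr_user) - np.log2(1.0 + snr_eve))

secrecy_rate(20.0, 2.0)   # ~2.81 bit/s/Hz when the legitimate link is much stronger
```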

27 pages, 9470 KiB  
Article
Multi-Objective Dynamic Path Planning with Multi-Agent Deep Reinforcement Learning
by Mengxue Tao, Qiang Li and Junxi Yu
J. Mar. Sci. Eng. 2025, 13(1), 20; https://doi.org/10.3390/jmse13010020 - 27 Dec 2024
Cited by 3 | Viewed by 2266
Abstract
Multi-agent reinforcement learning (MARL) is characterized by its simple structure and strong adaptability, which has led to its widespread application in the field of path planning. To address the challenge of optimal path planning for mobile agent clusters in uncertain environments, a multi-objective dynamic path planning model (MODPP) based on multi-agent deep reinforcement learning (MADRL) is proposed. The model suits complex, unstable task environments prone to dimensionality explosion and offers scalability. The approach consists of two components, an action evaluation module and an action decision module, and uses a centralized training with decentralized execution (CTDE) architecture, as sketched below. During training, agents within the cluster learn cooperative strategies while being able to communicate with one another. Consequently, they can navigate task environments without communication, achieving collision-free paths that globally optimize multiple sub-objectives, minimizing time, distance, and the overall cost of turning. Furthermore, during real task execution, agents acting as mobile entities can perform real-time obstacle avoidance. Finally, both simple and complex multi-objective environments were designed on the OpenAI Gym platform to analyze the rationality and effectiveness of the multi-objective dynamic path planning through minimum-cost and collision-risk assessments. Additionally, the impact of reward function configuration on agent strategies is discussed.
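The CTDE split between the action decision module and the action evaluation module can be sketched generically: decentralized actors consume only local observations, while a centralized critic sees the joint observation during training and is discarded at execution time. The layer sizes below are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decision module: executes on local observations only (decentralized)."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    """Evaluation module: sees all agents' observations, but only during training."""
    def __init__(self, n_agents: int, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_agents * obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs)
```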

16 pages, 4711 KiB  
Article
A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
by Lei Sheng, Honghui Chen and Xiliang Chen
Algorithms 2024, 17(12), 579; https://doi.org/10.3390/a17120579 - 15 Dec 2024
Viewed by 1659
Abstract
The prevalent use of deterministic strategy algorithms in Multi-Agent Deep Reinforcement Learning (MADRL) for collaborative tasks makes it challenging to achieve stable, high-performance cooperative behavior. Addressing the need for balanced exploration and exploitation of multi-agent ant robots within a partially observable continuous action space, this study introduces a multi-agent centralized strategy gradient algorithm grounded in a local state transition mechanism. The algorithm learns local state and local state-action representations from local observations and action values, thereby autonomously establishing a "local state transition" mechanism. As the input of the actor network, the automatically extracted local observation representation reduces the input state dimension, enhances the local state features closely related to the local state transition, and encourages the agent to use the local state features that affect the next observation. To mitigate non-stationarity and credit assignment issues in multi-agent environments, a centralized critic network evaluates the current joint strategy. The proposed algorithm, NST-FACMAC, is evaluated against other multi-agent deterministic strategy algorithms in a continuous control simulation environment using multi-agent ant robots. The experimental results indicate accelerated convergence and higher average reward values in cooperative multi-agent ant simulation environments. Notably, in four simulated environments, Ant-v2 (2 × 4), Ant-v2 (2 × 4d), Ant-v2 (4 × 2), and Manyant (2 × 3), the algorithm improves performance by approximately 1.9%, 4.8%, 11.9%, and 36.1%, respectively, over the best baseline algorithm. These findings underscore the algorithm's effectiveness in enhancing the stability of multi-agent ant robot control within dynamic environments.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
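One plausible reading of the "local state transition" mechanism is an auxiliary model that predicts the next local observation from the current local observation and action, with the learned feature fed to the actor. The sketch below is an interpretation under that assumption, not the NST-FACMAC architecture itself:

```python
import torch
import torch.nn as nn

class LocalTransition(nn.Module):
    """Auxiliary model: predict the next local observation from (obs, action)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
        self.pred = nn.Linear(hidden, obs_dim)

    def forward(self, obs, act):
        h = self.enc(torch.cat([obs, act], dim=-1))   # compact local-state feature
        return self.pred(h), h                        # next-obs prediction + actor feature

model = LocalTransition(obs_dim=27, act_dim=8)
next_obs_hat, feat = model(torch.randn(1, 27), torch.randn(1, 8))
loss = nn.functional.mse_loss(next_obs_hat, torch.randn(1, 27))  # vs. real next obs
```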

17 pages, 7930 KiB  
Article
DTPPO: Dual-Transformer Encoder-Based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments
by Anning Wei, Jintao Liang, Kaiyuan Lin, Ziyue Li and Rui Zhao
Drones 2024, 8(12), 720; https://doi.org/10.3390/drones8120720 - 29 Nov 2024
Cited by 1 | Viewed by 1401
Abstract
Existing multi-agent deep reinforcement learning (MADRL) methods for multi-UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual-Transformer Encoder-Based Proximal Policy Optimization (DTPPO) method. DTPPO enhances multi-UAV collaboration through a Spatial Transformer, which models inter-agent dynamics, and a Temporal Transformer, which captures temporal dependencies to improve generalization across diverse environments. This architecture allows UAVs to navigate new, unseen environments without retraining. Extensive simulations demonstrate that DTPPO outperforms current MADRL methods in terms of transferability, obstacle avoidance, and navigation efficiency across environments with varying obstacle densities. The results confirm DTPPO's effectiveness as a robust solution for multi-UAV navigation in both known and unseen scenarios.
(This article belongs to the Special Issue Advances in Detection, Security, and Communication for UAV)
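The dual-encoder idea can be sketched with two stock transformer encoders: one attending across agents at each timestep (spatial) and one attending across timesteps for each agent (temporal). Dimensions and layer counts below are assumptions; DTPPO's actual encoders may differ:

```python
import torch
import torch.nn as nn

def encoder(dim: int, heads: int = 4) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)

class DualTransformer(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.spatial = encoder(dim)    # attends across UAVs at each timestep
        self.temporal = encoder(dim)   # attends across timesteps for each UAV

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, n, d = x.shape                                   # (batch, time, agents, dim)
        x = self.spatial(x.reshape(b * t, n, d)).reshape(b, t, n, d)
        x = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        return self.temporal(x).reshape(b, n, t, d)

out = DualTransformer()(torch.randn(2, 8, 4, 32))   # 4 UAVs, 8-step history
```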

28 pages, 1238 KiB  
Article
Resource Allocation in UAV-D2D Networks: A Scalable Heterogeneous Multi-Agent Deep Reinforcement Learning Approach
by Huayuan Wang, Hui Li, Xin Wang, Shilin Xia, Tao Liu and Ruonan Wang
Electronics 2024, 13(22), 4401; https://doi.org/10.3390/electronics13224401 - 10 Nov 2024
Cited by 1 | Viewed by 1663
Abstract
In unmanned aerial vehicle (UAV)-assisted device-to-device (D2D) caching networks, the uncertainty arising from unpredictable content demands and variable user positions poses a significant challenge for traditional optimization methods, often making them impractical. Multi-agent deep reinforcement learning (MADRL) offers significant advantages in optimizing multi-agent system decisions and serves as an effective and practical alternative. However, its application in large-scale dynamic environments is severely limited by the curse of dimensionality and communication overhead. To resolve this problem, we develop a scalable heterogeneous multi-agent mean-field actor-critic (SH-MAMFAC) framework. The framework treats ground users (GUs) and UAVs as distinct agents and designs cooperative rewards to convert the resource allocation problem into a fully cooperative game, enhancing global network performance. We also implement a mixed-action mapping strategy to handle discrete and continuous action spaces. A mean-field MADRL framework is introduced to minimize individual agent training loads while enhancing the total cache hit probability (CHP). The simulation results show that our algorithm improves CHP and reduces transmission delay. A comparative analysis with mainstream deep reinforcement learning (DRL) algorithms shows that SH-MAMFAC significantly reduces training time and maintains high CHP as the GU count grows. Additionally, comparison with SH-MAMFAC variants that omit trajectory optimization or power control shows that the proposed joint design significantly reduces transmission delay.
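The mean-field trick that keeps per-agent training loads small is easy to illustrate: instead of conditioning on every neighbour's action, an agent conditions on the neighbours' mean action, so its input size no longer grows with the number of GUs and UAVs. A minimal sketch:

```python
import torch

def mean_field_obs(own_obs: torch.Tensor, neighbour_actions: torch.Tensor):
    # Replace the exponentially large joint action with one mean action vector,
    # so each agent's input size is independent of how many neighbours exist.
    return torch.cat([own_obs, neighbour_actions.mean(dim=0)], dim=-1)

obs = torch.randn(16)
acts = torch.randn(99, 4)                 # 99 neighbours' last actions
critic_input = mean_field_obs(obs, acts)  # size 20 regardless of neighbour count
```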

18 pages, 722 KiB  
Article
Multi-Agent Deep Reinforcement Learning for Blockchain-Based Energy Trading in Decentralized Electric Vehicle Charger-Sharing Networks
by Yinjie Han, Jingyi Meng and Zihang Luo
Electronics 2024, 13(21), 4235; https://doi.org/10.3390/electronics13214235 - 29 Oct 2024
Cited by 3 | Viewed by 2486
Abstract
The integration of renewable energy sources into smart grids and electric vehicle (EV) charger-sharing networks is essential for achieving environmental sustainability. However, the uneven distribution of distributed energy trading among EVs, fixed charging stations (FCSs), and mobile charging stations (MCSs) introduces challenges such as inadequate supply at FCSs and prolonged latencies at MCSs. In this paper, we propose a multi-agent deep reinforcement learning (MADRL)-based auction algorithm for energy trading that effectively balances charger supply with energy demand in distributed EV charging markets while reducing total charging latency. Specifically, a MADRL-based hierarchical auction dynamically adapts to real-time conditions, optimizing the balance of supply and demand. During energy trading, each EV, acting as a learning agent, can refine its bidding strategy to participate in various local energy trading markets, thus enhancing both individual utility and global social welfare. Furthermore, we design a cross-chain scheme to securely record and verify energy trading transaction results in decentralized EV charger-sharing networks, ensuring integrity and transparency. Finally, experimental results show that the proposed algorithm significantly outperforms both second-price and double auctions in increasing global social welfare and reducing total charging latency.
(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)
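For context on the baselines, a single-item second-price auction, one of the two mechanisms the proposed hierarchical auction is compared against, can be written in a few lines (the bidder names and values are made up):

```python
def second_price(bids: dict[str, float]) -> tuple[str, float]:
    """Single-item second-price auction: top bid wins, pays the runner-up's bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

second_price({"ev_1": 4.0, "ev_2": 6.5, "ev_3": 5.0})  # -> ("ev_2", 5.0)
```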

19 pages, 4511 KiB  
Article
Multi-Agent Deep Reinforcement Learning-Based Distributed Voltage Control of Flexible Distribution Networks with Soft Open Points
by Liang Zhang, Fan Yang, Dawei Yan, Guangchao Qian, Juan Li, Xueya Shi, Jing Xu, Mingjiang Wei, Haoran Ji and Hao Yu
Energies 2024, 17(21), 5244; https://doi.org/10.3390/en17215244 - 22 Oct 2024
Cited by 2 | Viewed by 1141
Abstract
The increasing number of distributed generators (DGs) leads to frequent voltage violations in distribution networks. The soft open point (SOP) can adjust the transmission power between feeders, driving the evolution of traditional distribution networks into flexible distribution networks (FDNs). Voltage violations can be effectively tackled with the flexible control of SOPs. However, centralized SOP control may struggle to achieve real-time operation due to communication limitations. In this paper, a distributed voltage control method is proposed for FDNs with SOPs based on multi-agent deep reinforcement learning (MADRL). First, a distributed voltage control framework is proposed, in which the update algorithm of the MADRL agents is described with experience sharing taken into account. Then, a Markov decision process for multi-area SOP coordinated voltage control is formulated, with control areas divided based on electrical distance. Finally, an IEEE 33-node test system and a practical system in Taiwan are used to verify the effectiveness of the proposed method. The results show that the proposed multi-area SOP coordinated control method can achieve real-time control while ensuring a better control effect.
(This article belongs to the Section A1: Smart Grids and Microgrids)
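Dividing control areas by electrical distance can be sketched as a simple greedy grouping over a node-to-node electrical distance matrix; the thresholding rule below is an illustrative assumption, not the paper's exact partitioning method:

```python
import numpy as np

def partition_areas(z: np.ndarray, threshold: float) -> list[list[int]]:
    """Greedily group nodes whose electrical distance stays below `threshold`."""
    areas, assigned = [], set()
    for i in range(z.shape[0]):
        if i in assigned:
            continue
        area = [j for j in range(z.shape[0]) if j not in assigned and z[i, j] <= threshold]
        assigned.update(area)
        areas.append(area)
    return areas

z = np.abs(np.random.rand(33, 33)); z = (z + z.T) / 2; np.fill_diagonal(z, 0)
areas = partition_areas(z, threshold=0.3)   # control areas for the 33-node feeder
```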
