Search Results (64)

Search Parameters:
Keywords = multiagent deep deterministic policy gradient (MADDPG)

21 pages, 4738 KiB  
Article
Research on Computation Offloading and Resource Allocation Strategy Based on MADDPG for Integrated Space–Air–Marine Network
by Haixiang Gao
Entropy 2025, 27(8), 803; https://doi.org/10.3390/e27080803 - 28 Jul 2025
Abstract
This paper investigates computation offloading and resource allocation in an integrated space–air–sea network based on unmanned aerial vehicles (UAVs) and low Earth orbit (LEO) satellites supporting Maritime Internet of Things (M-IoT) devices. In the complex, dynamic environment comprising M-IoT devices, UAVs, and LEO satellites, traditional optimization methods encounter significant limitations due to non-convexity and the combinatorial explosion in possible solutions. A multi-agent deep deterministic policy gradient (MADDPG)-based optimization algorithm is proposed to address these challenges. The algorithm minimizes total system cost, balancing energy consumption and latency through partial task offloading within a cloud–edge-device collaborative mobile edge computing (MEC) system. A comprehensive system model is proposed, with the problem formulated as a partially observable Markov decision process (POMDP) that integrates association control, power control, computing resource allocation, and task distribution. Each M-IoT device and UAV acts as an intelligent agent, collaboratively learning optimal offloading strategies through the centralized training and decentralized execution framework inherent in MADDPG. Numerical simulations validate the effectiveness of the proposed approach, which converges rapidly, significantly outperforms baseline methods, and reduces total system cost by 15–60%.
(This article belongs to the Special Issue Space-Air-Ground-Sea Integrated Communication Networks)
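Editor's note: the centralized training and decentralized execution pattern named in this abstract can be sketched in a few lines of PyTorch. This is an illustrative sketch only; the two-agent setup, layer widths, and observation/action sizes are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 8, 2, 2  # assumed sizes, for illustration only

class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # bounded continuous action
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized Q-function: sees every agent's observation and action."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint, 128), nn.ReLU(),
            nn.Linear(128, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critics = [CentralCritic() for _ in range(N_AGENTS)]

# Training-time critic evaluation uses global information ...
obs = torch.randn(32, N_AGENTS, OBS_DIM)          # batch of joint observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q = critics[0](obs.flatten(1), acts.flatten(1))   # centralized value estimate

# ... while execution uses only each agent's own observation.
local_action = actors[0](obs[:, 0])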

21 pages, 1207 KiB  
Article
Flash-Attention-Enhanced Multi-Agent Deep Deterministic Policy Gradient for Mobile Edge Computing in Digital Twin-Powered Internet of Things
by Yuzhe Gao, Xiaoming Yuan, Songyu Wang, Lixin Chen, Zheng Zhang and Tianran Wang
Mathematics 2025, 13(13), 2164; https://doi.org/10.3390/math13132164 - 2 Jul 2025
Viewed by 298
Abstract
Offloading decisions and resource allocation in mobile edge computing (MEC) are key challenges because they directly impact system performance and user experience in dynamic, resource-constrained Internet of Things (IoT) environments. This paper constructs a comprehensive, layered digital twin (DT) model for MEC, enabling real-time cooperation with the physical world and intelligent decision making. Within this model, a novel Flash-Attention-enhanced Multi-Agent Deep Deterministic Policy Gradient (FA-MADDPG) algorithm is proposed to tackle MEC problems effectively. It equips the critic network with an attention mechanism to improve decision quality and reformulates a matrix operation to speed up training. Experiments performed in the proposed DT environment demonstrate that FA-MADDPG converges well. Compared with other algorithms, it achieves excellent delay and energy-consumption performance under various settings, with high time efficiency.
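Editor's note: PyTorch's scaled_dot_product_attention (available since 2.0) dispatches to a fused FlashAttention-style kernel when hardware and dtypes allow, which is one plausible way to realize the attention-equipped critic this abstract describes. The tensor shapes and the idea of treating per-agent features as the attention sequence are assumptions for illustration, not the paper's design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    """Critic that attends over per-agent feature vectors before scoring.

    A sketch only: feature sizes and the single-head layout are assumed,
    not taken from the FA-MADDPG paper.
    """
    def __init__(self, feat_dim=32, n_agents=4):
        super().__init__()
        self.q = nn.Linear(feat_dim, feat_dim)
        self.k = nn.Linear(feat_dim, feat_dim)
        self.v = nn.Linear(feat_dim, feat_dim)
        self.score = nn.Linear(feat_dim * n_agents, 1)
    def forward(self, agent_feats):                  # (batch, n_agents, feat_dim)
        # Uses a fused (FlashAttention-style) kernel when available.
        attn = F.scaled_dot_product_attention(
            self.q(agent_feats), self.k(agent_feats), self.v(agent_feats))
        return self.score(attn.flatten(1))           # (batch, 1) Q-value

critic = AttentionCritic()
q_value = critic(torch.randn(16, 4, 32))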

32 pages, 5154 KiB  
Article
A Hierarchical Reinforcement Learning Framework for Multi-Agent Cooperative Maneuver Interception in Dynamic Environments
by Qinlong Huang, Yasong Luo, Zhong Liu, Jiawei Xia, Ming Chang and Jiaqi Li
J. Mar. Sci. Eng. 2025, 13(7), 1271; https://doi.org/10.3390/jmse13071271 - 29 Jun 2025
Viewed by 451
Abstract
To address the challenges of real-time decision-making and resource optimization in multi-agent cooperative interception tasks within dynamic environments, this paper proposes a hierarchical framework for reinforcement learning-based interception algorithm (HFRL-IA). By constructing a hierarchical Markov decision process (MDP) model based on dynamic game equilibrium theory, the complex interception task is decomposed into two hierarchically optimized stages: dynamic task allocation and distributed path planning. At the high level, a sequence-to-sequence reinforcement learning approach achieves dynamic bipartite graph matching, leveraging a graph neural network encoder–decoder architecture to handle dynamically expanding threat targets. At the low level, an improved prioritized experience replay multi-agent deep deterministic policy gradient algorithm (PER-MADDPG) is designed, integrating curriculum learning and prioritized experience replay to effectively raise the interception success rate against complex maneuvering targets. Extensive simulations in diverse scenarios and comparisons with conventional task assignment strategies demonstrate the superiority of the proposed algorithm. In a typical 10-agent interception scenario, HFRL-IA achieves a 22.51% increase in training rewards over the traditional end-to-end MADDPG algorithm and improves the interception success rate by 26.37%. This study provides a new methodological framework for distributed cooperative decision-making in dynamic adversarial environments, with significant application potential in areas such as maritime multi-agent security defense and marine environment monitoring.
(This article belongs to the Special Issue Dynamics and Control of Marine Mechatronics)
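Editor's note: the prioritized experience replay (PER) mechanism credited here can be sketched as proportional prioritization, where transitions are sampled with probability proportional to their TD error. The alpha exponent and the simple array store below are assumptions (production code would use a sum-tree), and the paper's variant may differ.

import numpy as np

class PrioritizedReplay:
    """Minimal proportional PER: P(i) is proportional to |td_error_i|**alpha."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []
    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:        # overwrite oldest
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)
    def sample(self, batch_size):
        p = np.array(self.prios); p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return [self.data[i] for i in idx], idx
    def update(self, idx, td_errors):              # refresh after a learning step
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + 1e-6) ** self.alpha

buf = PrioritizedReplay()
for t in range(100):
    buf.add(("obs", "act", 0.0, "next_obs"), td_error=np.random.randn())
batch, idx = buf.sample(8)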

20 pages, 2579 KiB  
Article
ERA-MADDPG: An Elastic Routing Algorithm Based on Multi-Agent Deep Deterministic Policy Gradient in SDN
by Wanwei Huang, Hongchang Liu, Yingying Li and Linlin Ma
Future Internet 2025, 17(7), 291; https://doi.org/10.3390/fi17070291 - 29 Jun 2025
Viewed by 308
Abstract
To address the impact of network topology changes on routing performance, this paper proposes an Elastic Routing Algorithm based on Multi-Agent Deep Deterministic Policy Gradient (ERA-MADDPG), implemented within the MADDPG framework of deep reinforcement learning. The algorithm first builds a three-layer architecture based on Software-Defined Networking (SDN); from top to bottom, the layers are the multi-agent layer, the controller layer, and the data layer. The architecture's processing flow, including real-time data-layer information collection and dynamic policy generation, gives ERA-MADDPG strong elasticity: it quickly adjusts routing decisions in response to topology changes. Combining the actor-critic framework with Convolutional Neural Networks (CNNs) to implement the routing algorithm effectively improves training efficiency, enhances learning stability, facilitates collaboration, and improves generalization and applicability. Finally, simulation experiments demonstrate that ERA-MADDPG converges faster than the Multi-Agent Deep Q-Network (MADQN) algorithm and the Smart Routing based on Deep Reinforcement Learning (SR-DRL) algorithm, with initial-phase training speed improved by approximately 20.9% and 39.1%, respectively. The elasticity of ERA-MADDPG is quantified by re-convergence speed: under 5–15% topology node/link changes, it re-converges over 25% faster than MADQN and SR-DRL, demonstrating superior capability to maintain routing efficiency in dynamic environments.
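Editor's note: one way to read "actor-critic combined with CNNs" is a critic that convolves over node-by-node matrices describing the topology and traffic. The matrix size, channel layout, and action dimension below are illustrative assumptions, not the paper's design.

import torch
import torch.nn as nn

N_NODES = 8  # assumed topology size, for illustration

class TopologyCritic(nn.Module):
    """Scores a routing state given as stacked node-by-node matrices
    (e.g., adjacency and link utilization) plus the joint action vector."""
    def __init__(self, act_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(16 * N_NODES * N_NODES + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))
    def forward(self, matrices, actions):
        feats = self.conv(matrices)                 # (batch, 16*N*N)
        return self.head(torch.cat([feats, actions], dim=-1))

critic = TopologyCritic()
q = critic(torch.randn(4, 2, N_NODES, N_NODES), torch.randn(4, 16))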

28 pages, 1293 KiB  
Article
Research on Multi-Agent Collaborative Scheduling Planning Method for Time-Triggered Networks
by Changsheng Chen, Anrong Zhao, Zhihao Zhang, Tao Zhang and Chao Fan
Electronics 2025, 14(13), 2575; https://doi.org/10.3390/electronics14132575 - 26 Jun 2025
Viewed by 290
Abstract
Time-triggered Ethernet combines time-triggered and event-triggered communication and is suitable for fields with stringent real-time requirements. To address the poor performance of traditional scheduling algorithms on event-triggered messages, this paper proposes a message scheduling algorithm based on multi-agent reinforcement learning (MADDPG, Multi-Agent Deep Deterministic Policy Gradient) and a hybrid algorithm combining an SMT (Satisfiability Modulo Theories) solver with MADDPG. The method optimizes the scheduling of event-triggered messages while keeping time-triggered message scheduling uniform, providing more time slots for event-triggered messages and reducing their waiting time and end-to-end delay. In experiments using the designed scheduling software, the new method shows better load balance and lower message jitter than the SMT-based algorithm and the traditional DQN (Deep Q-Network) algorithm, and OPNET simulations verify that it effectively reduces the delay of event-triggered messages.
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)
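Editor's note: the SMT half of the hybrid can be illustrated with z3. Time-triggered message offsets become integer unknowns, constrained so that no two transmissions overlap within a hyperperiod. The message set, periods, and slot lengths below are invented for the example, not taken from the paper.

from z3 import Ints, Solver, Or, sat

# Two periodic TT messages on one link; offsets measured in time slots.
period = [4, 6]      # slots (made-up example values)
length = [1, 1]      # transmission time in slots
hyper = 12           # lcm of the periods

o0, o1 = Ints("o0 o1")
s = Solver()
s.add(o0 >= 0, o0 < period[0], o1 >= 0, o1 < period[1])

# For every pair of instances in the hyperperiod, one must finish
# before the other starts.
for i in range(hyper // period[0]):
    for j in range(hyper // period[1]):
        a = o0 + i * period[0]
        b = o1 + j * period[1]
        s.add(Or(a + length[0] <= b, b + length[1] <= a))

if s.check() == sat:
    m = s.model()
    print("offsets:", m[o0].as_long(), m[o1].as_long())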

33 pages, 5490 KiB  
Article
Comparative Evaluation of Reinforcement Learning Algorithms for Multi-Agent Unmanned Aerial Vehicle Path Planning in 2D and 3D Environments
by Mirza Aqib Ali, Adnan Maqsood, Usama Athar and Hasan Raza Khanzada
Drones 2025, 9(6), 438; https://doi.org/10.3390/drones9060438 - 16 Jun 2025
Viewed by 979
Abstract
Path planning for multi-agent UAV swarms is a crucial problem: UAVs must avoid collisions in dynamic, obstacle-filled environments while minimizing time and energy consumption. This work comprehensively evaluates reinforcement learning (RL) algorithms for multi-agent UAV path planning in 2D and 3D simulated environments. First, we develop a 2D simulation setup in Python in which UAVs (quadcopters), represented as points in space, navigate toward their respective targets while avoiding static obstacles and inter-agent collisions. In the second phase, we transition this comparison to a physics-based 3D simulation incorporating realistic fixed-wing UAV dynamics and checkpoint-based navigation. We compare five algorithms, namely Proximal Policy Optimization (PPO), Soft Actor–Critic (SAC), Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Multi-Agent DDPG (MADDPG), in various scenarios. Our findings reveal significant performance differences across multiple dimensions: DDPG consistently demonstrated superior reward optimization and collision avoidance, while PPO and MADDPG excelled in the execution time required to reach the goal. The results also show how the algorithms behave when moving from a simplistic 2D setup to a realistic 3D physics-based environment, which is essential for sim-to-real transfer. This work provides valuable insights into the suitability of these RL algorithms for developing autonomous systems and UAV swarm navigation.
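Editor's note: the first-phase setup this abstract describes (point-mass UAVs steering toward targets while avoiding obstacles) reduces to a small kinematic step function. The radii and reward shaping below are assumptions for the sketch, not the paper's environment.

import numpy as np

def step(pos, vel, targets, obstacles, dt=0.1, safe_r=0.5):
    """Advance point-mass UAVs one tick and return a shaped reward per agent.

    pos, vel, targets: (n_agents, 2) arrays; obstacles: (n_obs, 2) centers.
    The reward terms (progress, obstacle and inter-agent penalties) are
    invented for illustration.
    """
    new_pos = pos + vel * dt
    reward = -np.linalg.norm(new_pos - targets, axis=1)      # progress term
    for ob in obstacles:                                     # static obstacles
        too_close = np.linalg.norm(new_pos - ob, axis=1) < safe_r
        reward -= 10.0 * too_close
    for i in range(len(new_pos)):                            # mutual collisions
        d = np.linalg.norm(new_pos - new_pos[i], axis=1)
        d[i] = np.inf
        reward[i] -= 10.0 * (d.min() < safe_r)
    return new_pos, reward

pos = np.zeros((3, 2)); vel = np.ones((3, 2)) * 0.5
targets = np.array([[5., 5.], [4., 6.], [6., 4.]])
pos, r = step(pos, vel, targets, obstacles=np.array([[2., 2.]]))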

20 pages, 1778 KiB  
Article
Energy Management for Distributed Carbon-Neutral Data Centers
by Wenting Chang, Chuyi Liu, Guanyu Ren and Jianxiong Wan
Energies 2025, 18(11), 2861; https://doi.org/10.3390/en18112861 - 30 May 2025
Cited by 1 | Viewed by 331
Abstract
With the continuous expansion of data centers, their carbon emissions have become a serious issue, and many studies aim to reduce them. Carbon trading, carbon capture, and power-to-gas technologies are promising emission reduction techniques that are, however, seldom applied to data centers. To bridge this gap, we propose a carbon-neutral architecture for distributed data centers in which each data center consists of three subsystems: an energy subsystem for energy supply, a thermal subsystem for data center cooling, and a carbon subsystem for carbon trading. We formulate the energy management problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and develop a distributed solution framework using Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Simulations using real-world data show a cost saving of 20.3%.
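Editor's note: in a Dec-POMDP, each agent acts on a partial local view while the reward is shared. A toy interface under that formulation might look like the following; the observation fields, dynamics, and cost terms are invented for illustration and are not the paper's model.

import numpy as np

class DataCenterDecPOMDP:
    """Toy Dec-POMDP: each agent observes only its own site's state, but all
    agents share one cost-based reward."""
    def __init__(self, n_sites=3):
        self.n = n_sites
        self.state = np.random.rand(n_sites, 3)  # [load, temperature, carbon price]
    def observe(self, i):
        return self.state[i]                     # partial: agent i's site only
    def step(self, actions):                     # actions: per-site power draw
        energy_cost = float(np.sum(actions))
        carbon_cost = float(np.dot(self.state[:, 2], actions))
        self.state = np.clip(self.state + 0.01 * np.random.randn(self.n, 3), 0, 1)
        shared_reward = -(energy_cost + carbon_cost)
        return [self.observe(i) for i in range(self.n)], shared_reward

env = DataCenterDecPOMDP()
obs, r = env.step(np.array([0.5, 0.3, 0.7]))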

18 pages, 18892 KiB  
Article
A Bidding Strategy for Power Suppliers Based on Multi-Agent Reinforcement Learning in Carbon–Electricity–Coal Coupling Market
by Zhiwei Liao, Chengjin Li, Xiang Zhang, Qiyun Hu and Bowen Wang
Energies 2025, 18(9), 2388; https://doi.org/10.3390/en18092388 - 7 May 2025
Viewed by 434
Abstract
The deepening operation of the carbon emission trading market has reshaped the cost–benefit structure of the power generation side. When participating in market quotation, power suppliers must not only calculate conventional generation costs but also account for the superimposed impact of carbon quota accounting on operating income, which poses a multi-time-scale collaborative decision-making optimization problem under the interaction of the carbon, power, and coal markets. This paper focuses on the multi-market-coupling decision optimization problem of thermal power suppliers and proposes a collaborative bidding decision framework based on a multi-agent deep deterministic policy gradient (MADDPG). First, to handle the differing time scales of decision making across markets, a decision-cycle coordination scheme for the carbon–electricity–coal coupling market is proposed. Second, upper- and lower-level optimization models for supplier bidding decisions are constructed. Then, based on the MADDPG algorithm, multi-generator bidding scenarios are simulated to solve for the optimal bidding strategy in the coupled market. Finally, multi-scenario simulation on the IEEE 5-node system shows that the model effectively captures the differential influence of the multi-market structure on suppliers' bidding strategies, verifying the algorithm's superiority in convergence speed and revenue optimization.
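Editor's note: the cost structure this abstract refers to (generation cost plus the carbon-quota settlement layered on top) can be made concrete with a single-period profit calculation. The quadratic cost form and every coefficient below are illustrative assumptions, not values from the paper.

def supplier_profit(q_mwh, price, a=0.01, b=20.0, c=100.0,
                    emission_rate=0.9, quota=50.0, carbon_price=8.0,
                    coal_price=15.0, coal_per_mwh=0.35):
    """Single-period profit of a thermal supplier in coupled markets.

    All coefficients are invented for illustration: a quadratic generation
    cost, a coal-cost term, and settlement of emissions beyond the free quota.
    """
    revenue = price * q_mwh
    generation_cost = a * q_mwh**2 + b * q_mwh + c
    coal_cost = coal_price * coal_per_mwh * q_mwh
    excess = max(emission_rate * q_mwh - quota, 0.0)   # tCO2 above free quota
    carbon_cost = carbon_price * excess
    return revenue - generation_cost - coal_cost - carbon_cost

# e.g., bidding 100 MWh at a clearing price of 45:
print(supplier_profit(q_mwh=100.0, price=45.0))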

13 pages, 3705 KiB  
Article
Multi-Agent Reinforcement Learning-Based Control Method for Pedestrian Guidance Using the Mojiko Fireworks Festival Dataset
by Masato Kiyama, Motoki Amagasaki and Toshiaki Okamoto
Electronics 2025, 14(6), 1062; https://doi.org/10.3390/electronics14061062 - 7 Mar 2025
Viewed by 727
Abstract
With the increase in congestion-related incidents at large events, effective pedestrian guidance has become a critical safety concern. Recent research has explored applying reinforcement learning to crowd simulation, where agents learn optimal actions through trial and error to maximize rewards based on environmental states. This study investigates reinforcement learning and simulation techniques to mitigate pedestrian congestion through improved guidance. We employ the Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG), a multi-agent reinforcement learning approach, and propose an enhanced method for learning the Q-function for actors within the MA-DDPG framework. Using the Mojiko Fireworks Festival dataset as a case study, we evaluated the proposed method by comparing congestion levels against existing approaches. The results demonstrate that our method reduces congestion, with agents exhibiting superior cooperation in managing crowd flow. This improvement in agent coordination suggests potential for practical applications in real-world crowd management.
(This article belongs to the Special Issue AI-Based Pervasive Application Services)

16 pages, 2276 KiB  
Article
Adaptive Control of VSG Inertia Damping Based on MADDPG
by Demu Zhang, Jing Zhang, Yu He, Tao Shen and Xingyan Liu
Energies 2024, 17(24), 6421; https://doi.org/10.3390/en17246421 - 20 Dec 2024
Viewed by 949
Abstract
As renewable energy sources become more integrated into the power grid, traditional virtual synchronous generator (VSG) control strategies have become inadequate for today's low-damping, low-inertia power systems. This paper therefore proposes a VSG inertia and damping adaptive control method based on multi-agent deep deterministic policy gradient (MADDPG). The paper first introduces the working principles of virtual synchronous generators and establishes a corresponding VSG model. Based on this model, the influence of variations in the virtual inertia (J) and damping (D) coefficients on active power fluctuations is examined, defining the action space for J and D. The method comprises two phases: centralized training and decentralized execution. In the centralized training phase, each agent's critic network shares global observation and action information to guide the actor network's policy optimization. In the decentralized execution phase, agents observe the frequency deviation and the rate of change of angular frequency, using the learned policies to adjust the virtual inertia J and damping coefficient D in real time. Finally, the effectiveness of the proposed MADDPG control strategy is validated through comparison with adaptive control and DDPG control methods.
(This article belongs to the Special Issue Planning, Operation, and Control of New Power Systems)
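Editor's note: the control loop described here maps an observation of (frequency deviation, rate of change of angular frequency) to new values of J and D, which then enter the standard VSG swing equation J * domega/dt = P_m - P_e - D * delta_omega. The bounds and the placeholder policy below are assumptions; the real method would use the trained MADDPG actors.

import numpy as np

J_MIN, J_MAX = 0.1, 2.0    # assumed virtual-inertia bounds
D_MIN, D_MAX = 1.0, 50.0   # assumed damping bounds

def policy(d_omega, d_omega_dot):
    """Placeholder for a trained MADDPG actor: squashes the observation
    into the admissible (J, D) box."""
    raw = np.tanh([d_omega, d_omega_dot])
    J = J_MIN + (J_MAX - J_MIN) * (raw[0] + 1) / 2
    D = D_MIN + (D_MAX - D_MIN) * (raw[1] + 1) / 2
    return J, D

def swing_step(omega, omega_prev, p_m, p_e, omega_ref=1.0, dt=0.01):
    """One step of the VSG swing equation with adaptive J and D."""
    d_omega = omega - omega_ref
    d_omega_dot = (omega - omega_prev) / dt    # finite-difference rate estimate
    J, D = policy(d_omega, d_omega_dot)
    omega_dot = (p_m - p_e - D * d_omega) / J
    return omega + omega_dot * dt, J, D

omega, J, D = swing_step(omega=1.02, omega_prev=1.015, p_m=1.0, p_e=0.8)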

19 pages, 3567 KiB  
Article
Multi-Agent Reinforcement Learning-Based Computation Offloading for Unmanned Aerial Vehicle Post-Disaster Rescue
by Lixing Wang and Huirong Jiao
Sensors 2024, 24(24), 8014; https://doi.org/10.3390/s24248014 - 15 Dec 2024
Cited by 1 | Viewed by 1539
Abstract
Natural disasters cause significant losses. Unmanned aerial vehicles (UAVs) are valuable in rescue missions but must offload tasks to edge servers because of their limited computing power and battery life. This study proposes a task offloading decision algorithm, the multi-agent deep deterministic policy gradient with cooperation and experience replay (CER-MADDPG), based on multi-agent reinforcement learning. CER-MADDPG emphasizes collaboration between UAVs and classifies historical UAV experiences to obtain optimal strategies. It enables collaboration among edge devices through the design of the critic network. Additionally, by defining good and bad experiences for UAVs and storing them in two separate buffers, UAVs can learn from both, pursuing what works and avoiding what does not, thereby reducing system overhead. The performance of CER-MADDPG was verified through simulations in two respects. First, the influence of key hyperparameters on performance was examined and optimal values determined. Second, CER-MADDPG was compared with baseline algorithms. The results show that, compared with MADDPG and stochastic game-based resource allocation with prioritized experience replay, CER-MADDPG achieves the lowest system overhead and superior stability and scalability.
(This article belongs to the Section Intelligent Sensors)
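Editor's note: the good/bad experience classification can be sketched as two replay buffers split by a reward threshold, with mini-batches drawn from both so agents both reinforce success and learn what to avoid. The threshold and mixing ratio are assumptions, not the paper's values.

import random

class DualBuffer:
    """Two replay stores split by outcome; sampling mixes both."""
    def __init__(self, threshold=0.0, good_frac=0.7, capacity=10000):
        self.threshold, self.good_frac = threshold, good_frac
        self.good, self.bad, self.capacity = [], [], capacity
    def add(self, transition, reward):
        buf = self.good if reward >= self.threshold else self.bad
        if len(buf) >= self.capacity:              # overwrite oldest
            buf.pop(0)
        buf.append(transition)
    def sample(self, batch_size):
        n_good = min(int(batch_size * self.good_frac), len(self.good))
        n_bad = min(batch_size - n_good, len(self.bad))
        return random.sample(self.good, n_good) + random.sample(self.bad, n_bad)

buf = DualBuffer()
for t in range(200):
    r = random.gauss(0, 1)
    buf.add(("obs", "act", r, "next_obs"), reward=r)
batch = buf.sample(32)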

28 pages, 5225 KiB  
Article
MAARS: Multiagent Actor–Critic Approach for Resource Allocation and Network Slicing in Multiaccess Edge Computing
by Ducsun Lim and Inwhee Joe
Sensors 2024, 24(23), 7760; https://doi.org/10.3390/s24237760 - 4 Dec 2024
Viewed by 1252
Abstract
This paper presents a novel algorithm to address resource allocation and network-slicing challenges in multiaccess edge computing (MEC) networks. Network slicing divides a physical network into virtual slices, each tailored to allocate resources efficiently and meet diverse service requirements. To maximize the completion rate of user-computing tasks within these slices, the problem is decomposed into two subproblems: efficient core-to-edge slicing (ECS) and autonomous resource slicing (ARS). ECS facilitates collaborative resource distribution through cooperation among edge servers, while ARS dynamically manages resources based on real-time network conditions. The proposed solution, a multiagent actor–critic resource scheduling (MAARS) algorithm, employs a reinforcement learning framework: a multiagent deep deterministic policy gradient (MADDPG) for efficient resource distribution in ECS and a soft actor–critic (SAC) technique for robust real-time resource management in ARS. Simulation results demonstrate that MAARS outperforms heuristic-based, DQN-based, and A2C-based benchmarks in task completion rate, resource utilization, and convergence speed. The study thus offers a scalable and efficient framework for resource optimization and network slicing in MEC networks, with practical benefits for real-world deployments in dynamic environments.
(This article belongs to the Special Issue Sensing and Mobile Edge Computing)

16 pages, 8397 KiB  
Article
Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)
by Almira Budiyanto, Keisuke Azetsu and Nobutomo Matsunaga
Automation 2024, 5(4), 597-612; https://doi.org/10.3390/automation5040034 - 27 Nov 2024
Viewed by 1589
Abstract
Cooperative transportation methods that require formation changes in a traveling environment are gaining interest, and deep reinforcement learning is used for formation change in multi-robot cases. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is popular for known environments; in unfamiliar circumstances, however, MADDPG may require re-learning. Although extensions of MADDPG using model-based learning and imitation learning have been applied to reduce learning time, it is unclear how the learned results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning and Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training three robots can be transferred to a four-robot network. Meanwhile, Scaled Dot Product Attention (SDPA) has attracted attention for its speed and accuracy in natural language processing, and combining transfer learning with fast computation improves the efficiency of edge-level re-learning. This paper proposes a formation change algorithm that enables easy and fast multi-robot knowledge transfer using SDPA combined with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and accelerates learning by transferring acquired formation-change knowledge to a different number of robots. Verified in robot formation change simulations, the proposed SDPA-MAPPO learned 20.83 times faster than the Deep Dyna-Q method, and transfer learning from a three-robot to a five-robot case reduced learning time by about 56.57 percent. The three-to-five-robot scenario was chosen because these team sizes are common in cooperative robotics.
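Editor's note: scaled dot-product attention itself is a few lines: softmax(Q K^T / sqrt(d)) V. A plain NumPy rendering (the shapes are invented) shows the operation the method builds on.

import numpy as np

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# e.g., 5 robots attending over each other's 16-dim state embeddings
Q = K = V = np.random.randn(5, 16)
out = sdpa(Q, K, V)   # (5, 16): each robot's attention-weighted context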

25 pages, 5516 KiB  
Article
Multi-UAV Path Planning for Air-Ground Relay Communication Based on Mix-Greedy MAPPO Algorithm
by Yiquan Wang, Yan Cui, Yu Yang, Zhaodong Li and Xing Cui
Drones 2024, 8(12), 706; https://doi.org/10.3390/drones8120706 - 26 Nov 2024
Viewed by 1628
Abstract
With the continuous development of modern UAV and communication technology, UAV-to-ground communication relay has become a research hotspot. This paper proposes a Multi-Agent Reinforcement Learning (MARL) method based on the ε-greedy strategy and the multi-agent proximal policy optimization (MAPPO) algorithm to address the local-optimum problem, improving the communication efficiency and task execution capability of UAV cluster control. It explores the path planning problem in multi-UAV ground relay communication, with a special focus on the proposed Mix-Greedy MAPPO algorithm. The state space, action space, communication model, training environment, and reward function are designed with comprehensive consideration of the actual tasks and entity characteristics, including safe distance, no-fly zones, survival in a threatened environment, and energy consumption. The results show that, compared with other algorithms in the multi-UAV ground relay path planning task, Mix-Greedy MAPPO significantly improves communication probability, reduces energy consumption, avoids no-fly zones, and encourages exploration. After training for the same number of steps, Mix-Greedy MAPPO achieves an average reward 45.9% higher than the MAPPO algorithm and several times higher than the multi-agent soft actor-critic (MASAC) and multi-agent deep deterministic policy gradient (MADDPG) algorithms. The experimental results verify the algorithm's superiority and adaptability in complex environments.
(This article belongs to the Special Issue Unmanned Aerial Vehicles for Enhanced Emergency Response)
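Editor's note: the greedy element of the method, occasionally replacing the policy's action with a random one to escape local optima, can be sketched as an ε-greedy wrapper with a decaying ε. The schedule values and the stand-in policy are assumptions for illustration.

import numpy as np

def eps_greedy_action(policy_action, action_low, action_high, eps):
    """With probability eps, explore uniformly; otherwise exploit the policy."""
    if np.random.rand() < eps:
        return np.random.uniform(action_low, action_high)
    return policy_action

eps, eps_min, decay = 1.0, 0.05, 0.995   # assumed exploration schedule
low, high = -1.0, 1.0
for step in range(1000):
    a_policy = np.tanh(np.random.randn())          # stand-in for the MAPPO actor
    a = eps_greedy_action(a_policy, low, high, eps)
    eps = max(eps_min, eps * decay)                # anneal exploration over time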

21 pages, 3095 KiB  
Article
Multi-Agent Reinforcement Learning for Smart Community Energy Management
by Patrick Wilk, Ning Wang and Jie Li
Energies 2024, 17(20), 5211; https://doi.org/10.3390/en17205211 - 20 Oct 2024
Cited by 1 | Viewed by 2647
Abstract
This paper investigates a Local Strategy-Driven Multi-Agent Deep Deterministic Policy Gradient (LSD-MADDPG) method for demand-side energy management systems (EMS) in smart communities. LSD-MADDPG modifies the conventional MADDPG framework by limiting data sharing during centralized training to discretized strategic information only; during execution, it relies solely on local information, eliminating post-training data exchange. This approach addresses critical challenges faced by EMS solutions serving dynamic, growing communities, such as communication delays, single points of failure, scalability, and nonstationary environments. By sharing only strategic information among agents, LSD-MADDPG optimizes decision-making while improving training efficiency and safeguarding data privacy, a critical concern in community EMS. LSD-MADDPG proved capable of reducing energy costs and flattening the community demand curve by coordinating indoor temperature control and electric vehicle charging schedules across multiple buildings. Comparative case studies reveal that LSD-MADDPG excels in both cooperative and competitive settings by ensuring fair alignment between individual buildings' energy management actions and community-wide goals, highlighting its potential for advancing smart community energy management.
(This article belongs to the Special Issue Application of Machine Learning Tools for Energy System)
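Editor's note: the key modification, critics trained on neighbors' discretized strategic signals rather than raw data, can be sketched by quantizing each agent's continuous action into a coarse code before sharing. The bin count and the flat critic-input layout are assumptions, not the paper's design.

import numpy as np

N_BINS = 5  # assumed strategy-code resolution

def strategy_code(action, low=-1.0, high=1.0):
    """Discretize a continuous action into a coarse strategic signal,
    hiding the exact value (and any private state behind it)."""
    frac = (np.clip(action, low, high) - low) / (high - low)
    return int(frac * (N_BINS - 1) + 0.5)

def critic_input(own_obs, own_action, others_actions):
    """During centralized training, the critic sees its own full local data
    but only coarse codes from the other buildings."""
    codes = [strategy_code(a) for a in others_actions]
    return np.concatenate([own_obs, [own_action], codes])

x = critic_input(np.random.rand(4), 0.3, [0.9, -0.2])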
