Search Results (497)

Search Parameters:
Keywords = multi-agent deep reinforcement learning

26 pages, 3420 KB  
Article
DQN-Based Pre-Optimization for Dual-Scale Collaborative Topology Optimization of Anisotropic Materials
by Shuo Feng, Yuhao Yang, Ke Li, Qidong Han, Jinchen Cao and Junyi Du
Appl. Sci. 2026, 16(9), 4080; https://doi.org/10.3390/app16094080 - 22 Apr 2026
Abstract
Traditional topology optimization methods often face challenges such as slow convergence, high sensitivity to initial structures, and limited exploration of the design space when dealing with multi-physics coupling problems. To address these challenges, this study proposes an efficient design framework integrating reinforcement learning and topology optimization. The framework first employs a Deep Q-Network (DQN) agent to dynamically adjust penalty factors, accelerating the convergence process, and uses its pre-optimization results as the initial conditions for the Bidirectional Evolutionary Structural Optimization (BESO) method, thereby enhancing optimization efficiency and structural performance. By introducing an anisotropic material model, the design space is expanded, further unlocking the potential for structural lightweighting. On this basis, a dual-objective optimization strategy for mechanical compliance and thermal compliance is adopted, enabling the final structure to adapt to various physical working conditions. Finally, the optimal design is extended from two-dimensional to three-dimensional, facilitating subsequent manufacturing and verification. Numerical examples demonstrate that compared with traditional methods, the proposed pre-optimization method achieves a 22.463% reduction in structural compliance and improves thermal management performance. The framework demonstrates robust convergence across different boundary conditions (MBB and cantilever beams) and expands the design space through anisotropic microstructures, offering a practical solution for multi-physics lightweight design. Full article
(This article belongs to the Special Issue Advanced Finite Element Method and Its Applications, Second Edition)
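The abstract above describes a DQN agent that dynamically adjusts the penalty factor to accelerate convergence before handing off to BESO. As a rough illustration of that control loop, here is a tabular Q-learning sketch (not the authors' DQN; the discretized state, the three-action set, and the toy compliance model are all illustrative assumptions):

```python
import random

# Hypothetical sketch of an agent that nudges a SIMP-style penalty factor.
ACTIONS = [-0.25, 0.0, 0.25]           # decrease / keep / increase penalty

class PenaltyAgent:
    def __init__(self, eps=0.1, alpha=0.5, gamma=0.9):
        self.q = {}                     # (state, action_idx) -> value
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q.get((state, a), 0.0))

    def update(self, s, a, reward, s2):
        best_next = max(self.q.get((s2, a2), 0.0) for a2 in range(len(ACTIONS)))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (reward + self.gamma * best_next - old)

def run(agent, steps=200):
    """Toy environment: compliance improves fastest near penalty p = 3.0."""
    p, compliance = 1.0, 100.0
    for _ in range(steps):
        s = round(p * 4) / 4            # discretized penalty as the state
        a = agent.act(s)
        p = min(4.0, max(1.0, p + ACTIONS[a]))
        new_compliance = compliance * (1.0 - 0.05 / (1.0 + (p - 3.0) ** 2))
        agent.update(s, a, compliance - new_compliance, round(p * 4) / 4)
        compliance = new_compliance
    return p, compliance

random.seed(0)
final_p, final_c = run(PenaltyAgent())
```

In the paper a neural Q-function replaces the lookup table, and the resulting layout seeds BESO; the reward here (per-step compliance drop) is only one plausible choice.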
32 pages, 3077 KB  
Article
Market-Aware and Topology-Embedded Safe Reinforcement Learning for Virtual Power Plant Dispatch
by Yueping Xiang, Luoyi Li, Yanqiu Hou, Xiaoyu Dai, Wenfeng Peng, Zhuoyang Liu, Ziming Liu, Zicong Chen, Xingyu Hu and Lv He
World Electr. Veh. J. 2026, 17(4), 222; https://doi.org/10.3390/wevj17040222 - 21 Apr 2026
Abstract
To address the challenges faced by virtual power plants (VPPs) in uncertain market environments and complex distribution networks, including strong market coupling, difficulty in multi-resource coordination, and strict safety constraints, this paper proposes a Hierarchical Hybrid Intelligent Framework (H2IF). The proposed framework integrates a market-aware meta-game mechanism, a topology-embedded graph attention coordination method, and a risk-aware soft/hard constraint safety mechanism to achieve economically optimal dispatch of VPPs in complex dynamic scenarios. By explicitly modeling competitive market interactions, the proposed method enhances strategy robustness; by exploiting grid topology priors, it improves multi-agent coordination capability; and by combining differentiable projection with risk-constrained optimization, it jointly ensures operational safety and revenue stability. Simulation results on a modified IEEE 33-bus system demonstrate that H2IF outperforms mainstream deep reinforcement learning methods and rule-based dispatch strategies in overall performance. In the 24 × 300-step testing scenario, H2IF achieves an average single-episode operating cost of 38.23 k$, which is 28.9%, 40.4%, and 26.5% lower than those of MADDPG, SAC, and the rule-based method, respectively, while also yielding the lowest constraint violation level. Ablation studies further verify the effectiveness of each key module in improving profit, reducing operating costs, enhancing tracking performance, and strengthening safety. The results indicate that the proposed method enables coordinated optimization of economy, safety, and robustness for VPP dispatch under uncertain market and operating conditions. Full article
(This article belongs to the Section Marketing, Promotion and Socio Economics)
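H2IF combines a differentiable projection with risk-constrained optimization to keep dispatch actions safe. A minimal feasibility-projection sketch is below; it is a plain iterative projection onto per-unit limits plus a power-balance constraint, not the paper's differentiable, training-embedded layer, and all bounds are made-up numbers:

```python
def project_dispatch(setpoints, p_min, p_max, total):
    """Clip raw agent setpoints to per-unit limits, then redistribute the
    power-balance error among units that still have headroom.
    Illustrative safety layer only; not the paper's differentiable projection."""
    x = [min(p_max[i], max(p_min[i], s)) for i, s in enumerate(setpoints)]
    for _ in range(100):
        err = total - sum(x)
        if abs(err) < 1e-9:
            break
        free = [i for i in range(len(x))
                if (err > 0 and x[i] < p_max[i]) or (err < 0 and x[i] > p_min[i])]
        if not free:
            break                       # constraints infeasible; stop trying
        step = err / len(free)
        for i in free:
            x[i] = min(p_max[i], max(p_min[i], x[i] + step))
    return x
```

A safe-RL agent would call such a layer between the policy output and the grid, so constraint violations are corrected before execution rather than merely penalized.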
28 pages, 7163 KB  
Article
An Intelligent Arterial Traffic Control Framework for Visible Light-Connected Vehicles
by Gonçalo Galvão, Manuela Vieira, Manuel Augusto Vieira, Mário Véstias and Paula Louro
Smart Cities 2026, 9(4), 72; https://doi.org/10.3390/smartcities9040072 - 20 Apr 2026
Abstract
Inefficient urban traffic management remains a critical challenge, as conventional signal controllers—built on fixed timing plans—cannot cope with the dynamic nature of modern city traffic. This study addresses this limitation by developing a decentralized MARL-based framework capable of coordinating five interconnected intersections as a unified traffic cell. Central to the proposed solution is the Strategic Anti-Blocking Phase Adjustment (SAPA) module, which enables intersections to autonomously modify phase durations in response to real-time traffic conditions. The framework is designed to handle heterogeneous demand patterns, with particular emphasis on arterial corridors connecting urban centers to peripheral zones. Integration of a Visible Light Communication (VLC) network allows continuous monitoring of key variables, including vehicle kinematics and pedestrian activity, feeding the agents with rich environmental feedback. Experimental evaluation confirms the effectiveness of the approach: the SAPA-augmented DQN achieves roughly 33% shorter vehicle queues and a ~70% reduction in pedestrian waiting counts relative to a standard DQN baseline. Remarkably, these gains bring the value-based method to a performance level comparable to MAPPO, a considerably more complex multi-agent policy optimization algorithm, establishing SAPA as an efficient and scalable enhancement for intelligent urban traffic control. Full article
24 pages, 11332 KB  
Article
Intelligent Optimization Methods for Cloud–Edge Collaborative Vehicular Networks via the Integration of Bayesian Decision-Making and Reinforcement Learning
by Youjian Yu, Zhaowei Song, Sifeng Zhu and Qinghua Zhang
Future Internet 2026, 18(4), 215; https://doi.org/10.3390/fi18040215 - 17 Apr 2026
Abstract
To improve vehicle user service quality and address data privacy and security issues in intelligent transportation vehicle networking systems, a three-tier communication architecture with cloud-edge-end collaboration was designed in this paper. A Bayesian decision criterion was utilized to divide user data segments into fine-grained slices based on their privacy levels, and differential privacy techniques were applied to protect the offloaded data. To achieve multi-objective optimization between user service quality and data privacy and security, the problem was formulated as a constrained Markov decision process. A communication model, a caching model, a latency model, an energy consumption model, and a data-fragment privacy protection model were designed. Additionally, a deep reinforcement learning algorithm based on the actor–critic approach was proposed for the collaborative and centralized training of multiple intelligent agents (CTMA-AC), enabling multi-objective optimization decision-making for the protection of offloaded private user data. Simulation experiments demonstrate that the proposed multi-agent collaborative privacy data offloading protection strategy can effectively safeguard private user data while ensuring high service quality. Full article
(This article belongs to the Section Network Virtualization and Edge/Fog Computing)
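The abstract above applies differential privacy to offloaded data slices. The textbook building block is the Laplace mechanism, sketched here with the standard library; the sensitivity and epsilon values are illustrative, and the paper's slice-level pipeline is not reproduced:

```python
import math
import random

def laplace_noise(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def privatize(values, sensitivity, epsilon):
    """Laplace mechanism: add noise with scale = sensitivity / epsilon to each
    offloaded value. Smaller epsilon means stronger privacy and more noise."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale) for v in values]
```

In a slicing scheme like the one described, higher-privacy slices would simply be released with a smaller per-slice epsilon.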
28 pages, 11994 KB  
Article
Multi-UAV Cooperative Path Planning Method Based on an Improved MADDPG Algorithm
by Feiqiao Zhang, Qian Wang and Xin Ma
Electronics 2026, 15(8), 1632; https://doi.org/10.3390/electronics15081632 - 14 Apr 2026
Abstract
To address cooperative path planning for multiple UAVs in complex environments, this paper proposes an improved multi-agent deep deterministic policy gradient algorithm, named Prioritized Experience Multi-Agent Deep Deterministic Policy Gradient (PE-MADDPG). An urban low-altitude inspection environment is first constructed within a reinforcement-learning framework, in which dynamic constraints, safety-separation requirements, and formation-cooperation objectives are incorporated into a partially observable Markov decision process. To improve training effectiveness, prioritized experience replay is introduced to increase the utilization of informative samples, an adaptive exploration-noise strategy is designed to regulate exploration intensity, and a multi-head attention mechanism is embedded in the Critic network to enhance the representation of inter-agent interactions. Simulation results in a three-dimensional urban inspection scenario show that PE-MADDPG outperforms the selected benchmark methods in task completion rate, formation maintenance, flight efficiency, and energy consumption. These results provide an effective solution for urban low-altitude inspection tasks. Full article
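PE-MADDPG's prioritized experience replay raises the sampling probability of informative transitions. A minimal proportional-priority buffer with importance-sampling weights looks roughly like this (the alpha/beta values are common defaults, not the paper's hyperparameters):

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay: sampling probability follows
    |TD error|^alpha; importance weights correct the induced bias."""
    def __init__(self, alpha=0.6, beta=0.4):
        self.data, self.prios = [], []
        self.alpha, self.beta = alpha, beta

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        total = sum(self.prios)
        probs = [p / total for p in self.prios]
        idx = random.choices(range(len(self.data)), weights=probs, k=k)
        n = len(self.data)
        # Importance-sampling weights, normalized by the largest weight.
        w = [(n * probs[i]) ** -self.beta for i in idx]
        w_max = max(w)
        return [self.data[i] for i in idx], [wi / w_max for wi in w]
```

A production buffer would cap capacity and update priorities after each learning step; both are omitted here for brevity.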
32 pages, 12012 KB  
Article
Multi-Agent Reinforcement Learning-Based Intelligent Game Guidance with Complex Constraint
by Fucong Liu, Yang Guo, Shaobo Wang, Jin Wang and Zhengquan Liu
Aerospace 2026, 13(4), 365; https://doi.org/10.3390/aerospace13040365 - 14 Apr 2026
Abstract
For the complex problem of multi-aircraft cooperative game guidance with No-Fly Zone (NFZ) avoidance and cross-task constraint propagation, a deep deterministic policy gradient algorithm with temporal awareness and prioritized cooperative optimization (TP-MADDPG) is proposed. Building on three-body cooperative guidance, a new coupled guidance task is formed by adding the NFZ avoidance constraint. Considering the constraint compatibility problem in dynamic task switching, the cooperative aircraft are modeled as independent agents with differentiated policy networks. First, a nonlinear kinematic model of the Evader–Pursuer–Defender three-body game is established, and four complex constraints, namely homing guidance, NFZ avoidance, collision avoidance, and cooperative guidance, are modeled separately. Second, a Long Short-Term Memory (LSTM)-based Actor–Critic framework is proposed to dynamically capture the evolution patterns of adversarial scenarios by mining hidden correlations in historical state–action sequences. This enables smooth policy transitions between the cooperative guidance phase and the subsequent homing guidance phase, effectively addressing environmental non-stationarity and temporal task dependencies. Then, a priority-driven adaptive sampling mechanism and a heterogeneous-role cooperative reward function are proposed to address credit assignment imbalance and sparse rewards, respectively. The sampling mechanism exploits the efficient retrieval properties of SumTree data structures and integrates bias correction to expedite policy gradient convergence, while the reward function uses reward shaping to formulate cooperative reward components that explicitly capture behavioral correlations among agents. 
Finally, simulations show that the proposed method significantly outperforms multi-agent reinforcement learning baselines, effectively improving the performance of cooperative game guidance under complex constraints. Full article
(This article belongs to the Special Issue Flight Guidance and Control)
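The sampling mechanism in TP-MADDPG relies on the O(log n) retrieval of a SumTree. A minimal version of that data structure (assuming, for simplicity, a power-of-two capacity) is:

```python
class SumTree:
    """Binary sum-tree: leaves hold priorities, internal nodes hold subtree
    sums, so sampling by prefix sum costs O(log n). Capacity is assumed to
    be a power of two; root sits at index 1, leaves at capacity..2*capacity-1."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, leaf, priority):
        i = leaf + self.capacity
        delta = priority - self.tree[i]
        while i >= 1:                   # propagate the change up to the root
            self.tree[i] += delta
            i //= 2

    def total(self):
        return self.tree[1]

    def retrieve(self, s):
        """Return the leaf whose cumulative-priority interval contains s."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if s <= self.tree[left]:
                i = left
            else:
                s -= self.tree[left]
                i = left + 1
        return i - self.capacity
```

Drawing `s` uniformly from `[0, total())` then calling `retrieve(s)` samples leaves proportionally to their priorities, which is exactly the operation prioritized replay needs.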
34 pages, 6346 KB  
Article
Multi-Head Attention Deep Q-Network with Prioritized Experience Replay for UAV Path Planning in Dynamic Environments: A Bio-Inspired Approach
by Yang Li, Xinjie Qian, Jiexin Zhang, Xiao Yang and Chao Deng
Biomimetics 2026, 11(4), 268; https://doi.org/10.3390/biomimetics11040268 - 13 Apr 2026
Abstract
Unmanned Aerial Vehicles (UAVs) have become widely used tools for applications including surveillance, search and rescue, and package delivery. However, autonomous path planning in dynamic environments with moving obstacles, wind disturbances, and energy constraints remains a significant challenge. This paper proposes a novel Multi-Head Attention Deep Q-Network with Prioritized Experience Replay (MA-DQN + PER) that integrates bio-inspired attention mechanisms with deep reinforcement learning for efficient UAV path planning. Our approach features a 46-dimensional state space that captures all environmental information, including static obstacles, wind conditions, and energy status. The proposed Attention-QNetwork architecture uses four specialized attention heads to selectively focus on different aspects of the environment: obstacle avoidance, target tracking, energy management, and wind compensation. To improve sample efficiency and convergence speed, we incorporate Prioritized Experience Replay (PER) with a sum-tree data structure. A curriculum learning strategy that includes 10 difficulty levels is designed to progressively enhance the agent’s capabilities. Extensive simulations demonstrate that our MA-DQN + PER approach reaches a 96% task success rate (defined as the percentage of episodes where the UAV successfully reaches the target without collision or battery depletion), while converging 68% faster than the baseline DQN. Our method demonstrates superior performance in path efficiency (+17%), energy consumption reduction (−26%), and collision avoidance compared to state-of-the-art algorithms. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
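The attention heads described above each weigh the state features relevant to one concern (obstacles, target, energy, wind). The core operation, scaled dot-product attention, can be sketched in plain Python; the learned query/key/value projections that make each head specialize are omitted here, so this toy `multi_head` just concatenates identical heads:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attention_head(query, keys, values):
    """Scaled dot-product attention over a list of feature vectors:
    score each key against the query, then blend values by the scores."""
    d = len(query)
    scores = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                      for key in keys])
    return [sum(w * v[i] for w, v in zip(scores, values))
            for i in range(len(values[0]))]

def multi_head(query, keys, values, heads=4):
    """Concatenate `heads` head outputs. In a real Attention-QNetwork each
    head has its own learned projections, so the outputs would differ."""
    out = []
    for _ in range(heads):
        out.extend(attention_head(query, keys, values))
    return out
```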
Show Figures

Figure 1

21 pages, 2353 KB  
Article
An Adaptive Bidding Strategy for Virtual Power Plants in Day-Ahead Markets Under Multiple Uncertainties
by Wei Yang and Wenjun Wang
Energies 2026, 19(8), 1878; https://doi.org/10.3390/en19081878 - 12 Apr 2026
Abstract
To address the challenges posed by multiple uncertainties in modern power systems to the market bidding of Virtual Power Plants (VPPs), this paper proposes an adaptive bidding strategy based on Deep Reinforcement Learning (DRL). First, a heterogeneous VPP aggregation model integrating dedicated energy storage, Vehicle-to-Grid (V2G), and flexible loads is constructed, incorporating complex physical and operational constraints. Second, to overcome the “myopic” local optimality problem of traditional DRL in temporal arbitrage tasks, a potential-based reward shaping mechanism linked to future price trends is designed to guide the agent toward long-term optimal strategies. Finally, multi-dimensional comparative experiments and mechanism analyses are conducted in a simulated day-ahead electricity market. Simulation results demonstrate the following: (1) The proposed algorithm exhibits robust convergence stability and effectively handles stochastic noise in market prices and renewable generation. (2) Economically, the strategy significantly outperforms the rule-based strategy and remains highly competitive with the deterministic-optimization benchmark under perfect-information assumptions. (3) Mechanism analysis further reveals that the DRL agent breaks through the rigid logic of fixed thresholds, learning a non-linear dynamic game mechanism based on “Price-SOC” states, thereby achieving full-depth utilization of energy storage resources. This work provides an interpretable data-driven paradigm for intelligent VPP decision-making in uncertain environments. Full article
(This article belongs to the Special Issue Transforming Power Systems and Smart Grids with Deep Learning)
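The bidding paper's remedy for "myopic" arbitrage is potential-based reward shaping tied to future price trends. The classic form, F(s, s') = γΦ(s') − Φ(s), which by Ng, Harada, and Russell's result leaves the optimal policy unchanged, can be sketched as follows (the potential function here, valuing stored energy by the future-price premium, is a hypothetical stand-in for the paper's Φ):

```python
GAMMA = 0.99

def potential(soc, future_price, avg_price):
    """Hypothetical potential: stored energy (state of charge, `soc`) is
    worth more when the anticipated price sits above the average price."""
    return soc * (future_price - avg_price)

def shaped_reward(reward, s, s2):
    """Potential-based shaping F = gamma * Phi(s') - Phi(s), added to the
    market reward so the agent is nudged to hold charge before price peaks."""
    return reward + GAMMA * potential(*s2) - potential(*s)
```

During training, `shaped_reward` simply replaces the raw market revenue in the agent's update; at deployment the shaping term is dropped.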
30 pages, 939 KB  
Article
AI-Driven Financial Solutions for Climate Resilience and Geopolitical Risk Mitigation in Low- and Middle-Income Countries
by Abdelrahman Mohamed Mohamed Saeed and Muhammad Ali
Economies 2026, 14(4), 134; https://doi.org/10.3390/economies14040134 - 10 Apr 2026
Abstract
Climate change disproportionately threatens low- and middle-income countries, yet integrated assessments combining socio-economic fragility with physical hazards remain limited. This study quantifies multi-dimensional climate vulnerability and derives optimized adaptation policies for six representative nations (Bangladesh, Colombia, Kenya, Morocco, Pakistan, Vietnam) by fusing socio-economic indicators with climate risk data (2000–2024). A computational framework integrating unsupervised learning, dimensionality reduction, and predictive modeling was employed. Principal Component Analysis synthesized eight indicators into a Compound Vulnerability Score (CVS), while K-Means and DBSCAN identified distinct vulnerability regimes. XGBoost quantified driver importance, and Graph Neural Networks captured systemic interconnections. XGBoost identified projected drought risk (31.2%), precipitation change (18.1%), and poverty headcount (14.3%) as primary drivers. Graph networks demonstrated significant risk amplification in African nations (Morocco SRS: 0.728–0.874; Kenya SRS: 0.504–0.641) versus damping in Asian countries. A Reinforcement Learning (RL) agent was trained using Deep Q-Networks with experience replay to optimize intervention portfolios under budget constraints. The RL policy achieved a 23% reduction in systemic risk compared to uniform allocation baselines, generating context-specific priorities: drought management for Morocco (score 50) and Pakistan (40); poverty alleviation for Kenya (40); coastal protection for Bangladesh (40); agricultural resilience for Vietnam (35); and institutional capacity building for Colombia (50). In conclusion, socio-economic fragility non-linearly amplifies climate hazards, with poverty and drought risk constituting critical vulnerability multipliers. The AI-driven framework demonstrates that targeted interventions in high-sensitivity systems maximize systemic risk reduction. 
This integrated approach provides a replicable, evidence-based foundation for strategic adaptation finance allocation in an increasingly uncertain climate future. Full article
(This article belongs to the Special Issue Energy Consumption, Financial Development and Economic Growth)
39 pages, 6294 KB  
Article
Human-Assisted Deep Reinforcement Learning (HADRL) for Multi-Objective Tram Optimisation Problem
by Moneeb Ashraf, Stuart Hillmansen and Ning Zhao
Appl. Sci. 2026, 16(8), 3683; https://doi.org/10.3390/app16083683 - 9 Apr 2026
Abstract
Reducing traction energy in urban rail systems while preserving safety, punctuality, and passenger comfort remains challenging. Additionally, route-level tram studies that train deep reinforcement learning (DRL) policies using Operational Train Monitoring Recorder (OTMR) logs and benchmark them across multiple objectives remain limited. This study develops and evaluates a Human-Assisted Deep Reinforcement Learning (HADRL) framework for multi-objective tram control in an OTMR-grounded simulation. Two HADRL agents were trained using a human-assistance action mapping: a standard Proximal Policy Optimisation (PPO) baseline and a recurrent, history-augmented PPO. Their performance was compared against that of four human drivers using indices for speed-limit compliance, schedule deviation, traction energy, jerk-based comfort, and stopping accuracy. These performance measures were aggregated using the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) with both equal and entropy-derived weights. Both HADRL agents reproduce the characteristic accelerate–coast–brake driving pattern, reduce traction energy relative to all human baselines, and achieve near-complete speed-limit compliance, all while remaining within the specified schedule-deviation and comfort thresholds. TOPSIS yields identical rankings under both weighting schemes, with Multi-Objective Tram Operation Non-Stationary Proximal Policy Optimisation (MOTO-NSPPO, a recurrent, history-augmented PPO) ranked first and PPO second. Full article
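The HADRL study aggregates its five performance indices with TOPSIS. A compact sketch of the method (benefit criteria only, i.e. higher is better; a cost criterion would flip the ideal/anti-ideal choice, and the entropy-weight derivation is omitted):

```python
import math

def topsis(matrix, weights):
    """Rank alternatives by relative closeness to the ideal solution.
    `matrix` rows are alternatives, columns are benefit criteria."""
    cols = list(zip(*matrix))
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    # Vector-normalize each column, then apply the criterion weights.
    z = [[w * v / n for v, n, w in zip(row, norms, weights)] for row in matrix]
    ideal = [max(c) for c in zip(*z)]
    worst = [min(c) for c in zip(*z)]
    def dist(row, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref)))
    # Closeness coefficient in [0, 1]; 1 means the alternative is the ideal.
    return [dist(r, worst) / (dist(r, worst) + dist(r, ideal)) for r in z]
```

With equal weights versus entropy-derived weights, only the `weights` vector changes, which is why the paper can report that both schemes yield identical rankings.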
36 pages, 7325 KB  
Article
Intelligent Scheduling of Rail-Guided Shuttle Cars via Deep Reinforcement Learning Integrating Dynamic Graph Neural Networks and Transformer Model
by Fang Zhu and Shanshan Peng
Algorithms 2026, 19(4), 289; https://doi.org/10.3390/a19040289 - 8 Apr 2026
Abstract
With the rapid development of e-commerce and smart manufacturing, automated warehouse systems have become critical infrastructure for modern logistics. In China’s vast market, the dynamic scheduling of Rail-Guided Vehicles (RGVs) faces significant challenges due to complex task uncertainties, hierarchical supply chain structures, and real-time collision avoidance requirements. Traditional rule-based methods and static optimization models often fail to adapt to such dynamic environments. To address these issues, this paper proposes a novel hybrid deep reinforcement learning framework integrating a Dynamic Graph Neural Network (DGNN) and a Transformer model. The DGNN captures the spatiotemporal dependencies of the warehouse network topology, while the Transformer mechanism enhances long-range feature extraction for task prioritization. Furthermore, we design a centralized Deep Q-network (DQN) framework with parameterized action spaces to coordinate multiple RGVs collaboratively. While the system manages multiple physical vehicles, the learning architecture employs a single-agent global scheduler to avoid the non-stationarity issues inherent in multi-agent reinforcement learning. Experimental results based on real-world data from a large-scale electronics manufacturing warehouse demonstrate that our method reduces average task completion time by 18.5% and improves system throughput by 22.3% compared to state-of-the-art baselines. The proposed approach demonstrates potential for intelligent warehouse management in dynamic industrial scenarios. Full article
25 pages, 1501 KB  
Article
MA-JTATO: Multi-Agent Joint Task Association and Trajectory Optimization in UAV-Assisted Edge Computing System
by Yunxi Zhang and Zhigang Wen
Drones 2026, 10(4), 267; https://doi.org/10.3390/drones10040267 - 7 Apr 2026
Abstract
With the rapid development of applications such as smart cities and the industrial internet, the computation-intensive tasks generated by massive sensing devices pose significant challenges to traditional cloud computing paradigms. Unmanned aerial vehicle (UAV)-assisted edge computing systems, leveraging their high mobility and wide-area coverage capabilities, offer an innovative architecture for low-latency and highly reliable edge services. However, the practical deployment of such systems faces a highly complex multi-objective optimization problem characterized by the tight coupling of task offloading decisions, UAV trajectory planning, and edge server resource allocation. Conventional optimization methods struggle to adapt to the dynamic, high-dimensional characteristics of this problem, leading to suboptimal system performance. To address this critical challenge, this paper constructs an intelligent collaborative optimization framework for UAV-assisted edge computing systems and formulates the system quality of service (QoS) optimization problem as a mixed-integer non-convex programming problem with the dual objectives of minimizing task processing latency and reducing overall system energy consumption. A multi-agent joint task association and trajectory optimization (MA-JTATO) algorithm based on hybrid reinforcement learning is proposed to solve this intractable problem, which innovatively decouples the original coupled optimization problem into three interrelated subproblems and realizes their collaborative and efficient solution. 
Specifically, the Advantage Actor-Critic (A2C) algorithm is adopted to realize dynamic and optimal task association between UAVs and edge servers for discrete decision-making requirements; the multi-agent deep deterministic policy gradient (MADDPG) method is employed to achieve cooperative and energy-efficient trajectory planning for multiple UAVs to meet the needs of continuous control in dynamic environments; and convex optimization theory is applied to obtain a closed-form optimal solution for the efficient allocation of computational resources on edge servers. Simulation results demonstrate that the proposed MA-JTATO algorithm significantly outperforms traditional baseline algorithms in enhancing overall QoS, effectively validating the framework’s superior performance and robustness in dynamic and complex scenarios. Full article
(This article belongs to the Section Drone Communications)
16 pages, 1553 KB  
Article
Research on the Collaborative Optimization Method of Power Prediction and DRL Control
by Mengjie Li, Yongbao Liu and Xing He
Processes 2026, 14(7), 1150; https://doi.org/10.3390/pr14071150 - 3 Apr 2026
Abstract
This paper proposes a collaborative energy management strategy based on power prediction and deep reinforcement learning (DRL) to address the trade-offs among economic efficiency, durability, and dynamic performance in fuel cell hybrid power systems (FCHPS) under dynamic driving conditions. First, a hybrid prediction model termed LSTM-LSSVM with Cascade Correction (LSTM-LSSVM-CC) is developed. The cascade correction (CC) mechanism adopts a hierarchical structure to capture both low-frequency steady-state trends and high-frequency dynamic fluctuations, which are typically challenging for single models to represent. By integrating an online residual correction mechanism, this model generates accurate future power demand sequences. Second, a Dynamic Spatio-Temporal Fusion (DSTF) method is introduced to construct a high-dimensional DRL state space. This approach integrates predicted data, historical residuals, and real-time system states, enabling the agent to perform anticipatory decision-making. Third, a Dynamic Hierarchical Adaptive Multi-Objective Optimization Framework (DHAMOF) is designed. This framework dynamically adjusts objective weights and constraint boundaries based on real-time operating characteristics, enabling adaptive switching of optimization priorities across diverse scenarios. Furthermore, a closed-loop control architecture comprising “prediction–decision–execution–feedback” is established. By incorporating rolling horizon optimization and a proportional-integral (PI) residual compensation mechanism, the proposed architecture effectively suppresses prediction error accumulation and mitigates communication delays. Simulation results under combined CLTC-P and WLTP driving cycles demonstrate that, compared to conventional fixed-weight strategies, the proposed method achieves an 11.3% reduction in hydrogen consumption, a 30.9% decrease in SOC fluctuation range, and a 55.3% reduction in power tracking error. 
Moreover, under disturbance scenarios involving prediction errors, sensor noise, and a 200 ms communication delay, the system exhibits superior robustness: the increase in hydrogen consumption is limited to within 8.3 g/100 km, and the power tracking error is reduced by 65.6% relative to uncorrected baselines. This collaborative optimization approach overcomes the limitations of traditional open-loop prediction and fixed-weight control, offering a novel technical pathway for the high-efficiency and stable operation of fuel cell hybrid power systems. Full article
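The PI residual compensation step described in this abstract can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the gains `kp` and `ki`, the open-loop correction form, and the absence of anti-windup are all assumptions made here for clarity.

```python
def pi_residual_compensation(p_demand, p_predicted, kp=0.6, ki=0.1):
    """Toy PI residual compensator: each predicted power value is
    corrected by a proportional-plus-integral term driven by the
    observed prediction residual, damping error accumulation.
    kp/ki are illustrative gains, not values from the paper."""
    integral = 0.0
    corrected = []
    for demand, pred in zip(p_demand, p_predicted):
        error = demand - pred   # instantaneous prediction residual
        integral += error       # accumulated residual (no anti-windup here)
        corrected.append(pred + kp * error + ki * integral)
    return corrected
```

On a constant prediction bias, the integral term steadily shrinks the tracking error, which is the mechanism the abstract credits for suppressing prediction error accumulation in the closed loop.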
(This article belongs to the Special Issue Recent Advances in Fuel Cell Technology and Its Application Process)
22 pages, 5390 KB  
Article
Joint Optimization of Time Slot and Power Allocation in Underwater Acoustic Communication Networks
by Xuan Geng and Yongkang Hu
Sensors 2026, 26(7), 2188; https://doi.org/10.3390/s26072188 - 1 Apr 2026
Viewed by 412
Abstract
This paper proposes a joint optimization algorithm based on reinforcement learning to address the time slot and power allocation problem in underwater acoustic communication networks (UACNs). Taking the total capacity of successful transmissions as the optimization objective, two sub-objectives are formulated, corresponding to time-slot scheduling and power allocation. The time-slot scheduling sub-objective is addressed by constructing a Markov Decision Process (MDP) model solved with Deep Q-Network (DQN) learning, in which the agent learns a time-slot allocation policy that increases the number of successfully transmitted links while reducing collisions. For the power allocation sub-objective, another MDP model is developed and solved by the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, in which each underwater transmission node acts as an independent agent. The MADDPG approach enables the system to improve channel capacity under energy limitations, thereby maximizing the total capacity of successfully transmitted links. In terms of execution, the DQN performs centralized training and centralized time-slot allocation, while MADDPG follows centralized training with distributed execution, allowing each node to select its own transmission power. Simulation results show that the proposed joint optimization algorithm outperforms TDMA, Slotted ALOHA, and other algorithms in both the number of successfully transmitted links and channel capacity. Full article
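The optimization objective this abstract describes (total capacity of collision-free transmissions) can be sketched with a standard Shannon-capacity link model. The function names, the per-link parameters, and the all-or-nothing collision rule below are assumptions chosen for illustration, not the paper's exact formulation:

```python
import math

def link_capacity(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Shannon capacity of a single acoustic link: B * log2(1 + SNR)."""
    snr = tx_power_w * channel_gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)

def total_successful_capacity(links, assigned_slots):
    """Objective sketch: sum capacity over links whose time slot is
    collision-free; links sharing a slot fail and contribute nothing.
    `links` holds (bandwidth, power, gain, noise) tuples per link."""
    total = 0.0
    for i, (bw, power, gain, noise) in enumerate(links):
        if assigned_slots.count(assigned_slots[i]) == 1:  # unique slot
            total += link_capacity(bw, power, gain, noise)
    return total
```

In this framing, the DQN's slot assignments determine which links count as successful, while the MADDPG agents' power choices scale each surviving link's SNR, which is why the two sub-problems are coupled through the single objective.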
(This article belongs to the Special Issue Sensor Networks and Communication with AI)
23 pages, 2351 KB  
Article
A Spatio-Temporal Attention-Based Multi-Agent Deep Reinforcement Learning Approach for Collaborative Community Energy Trading
by Sheng Chen, Yong Yan, Jiahua Hu and Changsen Feng
Energies 2026, 19(7), 1730; https://doi.org/10.3390/en19071730 - 1 Apr 2026
Viewed by 329
Abstract
The high penetration of distributed energy resources (DERs) poses numerous challenges to community energy management, including strong source-load stochasticity, synchronized load surges triggered by multi-agent gaming, and potential privacy breaches. To tackle these issues, this paper proposes a coordinated energy trading framework driven by an intermediate market-rate pricing mechanism. Within this framework, a novel Multi-Agent Transformer Proximal Policy Optimization (MATPPO) algorithm is developed, adopting an LSTM–Transformer hybrid architecture and the centralized training with decentralized execution (CTDE) paradigm. During centralized training, an LSTM network extracts temporal evolution features from source-load data to handle environmental uncertainty, while a Transformer-based self-attention mechanism reconstructs the dynamic agent topology to capture spatial correlations. In the decentralized execution phase, prosumers make independent decisions using only local observations, eliminating the need to upload internal device states and significantly enhancing the privacy of sensitive local information during online execution. Additionally, a parameter-sharing mechanism lets agents share policy networks, improving algorithmic scalability. Simulation results demonstrate that MATPPO effectively mitigates power peaks and reduces transformer capacity pressure at the main grid interface, while significantly lowering total community electricity costs and maintaining high computational efficiency in large-scale scenarios. Full article
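The parameter-sharing and decentralized-execution ideas in this abstract can be sketched minimally: all agents act through one shared policy, each seeing only its local observation. The linear policy, dimensions, and greedy action selection below are stand-ins for illustration; the paper's actor is an LSTM–Transformer network trained with PPO, not shown here.

```python
import random

class SharedPolicy:
    """Minimal parameter-sharing sketch: every prosumer agent acts
    through one shared linear policy, each using only its own local
    observation (decentralized execution)."""
    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # one weight matrix shared by all agents
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                  for _ in range(n_actions)]

    def act(self, local_obs):
        # greedy action from the shared weights; the paper's actor would
        # instead sample from an LSTM-Transformer policy head
        scores = [sum(wi * o for wi, o in zip(row, local_obs))
                  for row in self.w]
        return max(range(len(scores)), key=scores.__getitem__)

# all agents hold a reference to the same policy object, so any training
# update (not shown) is immediately visible to every agent
shared = SharedPolicy(obs_dim=3, n_actions=2)
agents = [shared for _ in range(4)]
actions = [agent.act([1.0, 0.5, -0.2]) for agent in agents]
```

Because the agents share one parameter set, adding prosumers does not grow the number of trainable weights, which is the scalability benefit the abstract claims.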