Search Results (674)

Search Parameters:
Keywords = deep deterministic policy gradient

18 pages, 4743 KB  
Article
Reinforcement Learning-Based Super-Twisting Sliding Mode Control for Maglev Guidance System
by Junqi Xu, Wenshuo Wang, Chen Chen, Lijun Rong, Wen Ji and Zijian Guo
Actuators 2026, 15(3), 147; https://doi.org/10.3390/act15030147 - 3 Mar 2026
Abstract
The high-speed Electromagnetic Suspension (EMS) maglev guidance system exhibits inherent characteristics of strong nonlinearity, parameter time-variation, and complex external disturbances. To further optimize and improve the control performance of the guidance system for high-speed maglev trains, a novel intelligent control strategy that integrates the Deep Deterministic Policy Gradient (DDPG) algorithm with Super-Twisting Sliding Mode Control (STSMC) is proposed. Focusing on a single-ended guidance unit with differential control of dual electromagnets, an STSMC controller is first designed based on a cascaded control framework. To overcome the limitation of offline parameter tuning in dynamic operational conditions, a reinforcement learning optimization framework employing DDPG is introduced. A multi-objective hybrid reward function is formulated, incorporating error convergence, sliding mode stability, and chattering suppression, thereby realizing the online self-tuning of core STSMC parameters via real-time interaction between the agent and the environment. Numerical simulations under typical disturbance conditions verify that the proposed DDPG-STSMC controller significantly reduces the amplitude of guidance gap variation and accelerates dynamic recovery compared to conventional PID control. Its superior performance in disturbance rejection, control accuracy, and operational adaptability is validated. This study, conducted through high-fidelity numerical simulations based on actual system parameters, provides a robust theoretical foundation for subsequent hardware-in-the-loop (HIL) experimentation. Full article
(This article belongs to the Special Issue Advanced Theory and Application of Magnetic Actuators—3rd Edition)
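Most results on this page build on DDPG, so a compact reference point may help. Below is a minimal numpy sketch of the core DDPG update (deterministic policy gradient, target networks, Polyak averaging) with illustrative linear actor/critic models; it is not the controller from any of the listed papers, and all dimensions and learning rates are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyDDPG:
    """Minimal DDPG sketch: linear critic Q(s, a) = wc . [s; a] and a
    tanh-squashed linear actor a = tanh(Wa @ s). Illustrative only."""

    def __init__(self, s_dim, a_dim, gamma=0.99, tau=0.05, lr=1e-2):
        self.Wa = rng.normal(scale=0.1, size=(a_dim, s_dim))
        self.wc = rng.normal(scale=0.1, size=s_dim + a_dim)
        self.Wa_targ = self.Wa.copy()   # target actor
        self.wc_targ = self.wc.copy()   # target critic
        self.gamma, self.tau, self.lr = gamma, tau, lr
        self.buffer = []                # replay buffer of (s, a, r, s_next)

    def act(self, s, noise_scale=0.0):
        # Deterministic action plus optional exploration noise
        return np.tanh(self.Wa @ s) + noise_scale * rng.normal(size=self.Wa.shape[0])

    def _q(self, w, s, a):
        return w @ np.concatenate([s, a])

    def update(self, batch_size=8):
        for i in rng.integers(len(self.buffer), size=batch_size):
            s, a, r, s_next = self.buffer[i]
            a_next = np.tanh(self.Wa_targ @ s_next)
            y = r + self.gamma * self._q(self.wc_targ, s_next, a_next)  # TD target
            td = y - self._q(self.wc, s, a)
            self.wc += self.lr * td * np.concatenate([s, a])            # critic step
            # Deterministic policy gradient: dQ/da chained through the tanh actor
            a_pi = np.tanh(self.Wa @ s)
            dq_da = self.wc[len(s):]
            self.Wa += self.lr * np.outer(dq_da * (1.0 - a_pi**2), s)
        # Polyak-averaged target networks
        self.Wa_targ += self.tau * (self.Wa - self.Wa_targ)
        self.wc_targ += self.tau * (self.wc - self.wc_targ)
```

In the maglev paper's setting, the state would carry guidance-gap error statistics and the action would be candidate STSMC gains; the multi-objective reward described in the abstract is not reproduced here.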

23 pages, 13416 KB  
Article
An Adaptive Ensemble Model Based on Deep Reinforcement Learning for the Prediction of Step-like Landslide Displacement
by Tengfei Gu, Lei Huang, Shunyao Tian, Zhichao Zhang, Huan Zhang and Yanke Zhang
Remote Sens. 2026, 18(5), 761; https://doi.org/10.3390/rs18050761 - 3 Mar 2026
Abstract
Accurate prediction of landslide displacement is crucial for hazard prevention. However, recurrent neural network (RNN) models have limitations in simultaneously capturing lag time and feature importance, and their black-box nature limits their interpretability. Moreover, the performance of single models varies across different deformation stages, especially during acceleration. To address these challenges, we propose an interpretable deep reinforcement learning-based adaptive ensemble (DRL-AE) framework. The method employs Seasonal and Trend decomposition using Loess to separate cumulative displacement into trend and periodic components. Trend and periodic sequences are predicted using double exponential smoothing and three RNN variants, respectively. An improved Convolutional Block Attention Module (ICBAM) enhances periodic feature extraction and provides temporal–spatial interpretability. The Deep Deterministic Policy Gradient algorithm adaptively integrates multi-model predictions in response to evolving environmental conditions. To validate the DRL-AE, a case study is conducted on the Baijiabao landslide in Zigui County, China. The results indicate that the DRL-AE substantially enhances prediction accuracy. For periodic displacement, it reduces MAE by 10.02% and RMSE by 6.65%, and increases R2 by 4.27% compared with the ICBAM-GRU model. The results also confirm the effectiveness of ICBAM in feature extraction, and the generated heatmaps provide intuitive interpretability of the relevant triggering factors. Full article
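The decomposition step above can be illustrated with a crude additive trend/periodic split. The paper uses STL (Loess-based decomposition); this moving-average stand-in, with a hypothetical window length, only shows the shape of the operation.

```python
import numpy as np

def decompose(series, window=12):
    """Split a displacement series into trend + periodic parts.
    Moving-average stand-in for the STL step described in the abstract;
    the window length is illustrative."""
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")        # extend edges to keep length
    kernel = np.ones(window + 1) / (window + 1)
    trend = np.convolve(padded, kernel, mode="valid")  # centered moving average
    periodic = series - trend                          # additive remainder
    return trend, periodic
```

By construction the two components sum back to the original cumulative displacement, which is the property the downstream per-component predictors rely on.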

21 pages, 1099 KB  
Article
Low-Latency Holographic Video Transmission in Indoor VLC Networks Assisted by Rotatable Photodetectors
by Wenzhe Wang and Long Zhang
Future Internet 2026, 18(3), 129; https://doi.org/10.3390/fi18030129 - 2 Mar 2026
Abstract
As a next-generation immersive service, holographic video enables users to move freely within a virtual world. This imposes stringent requirements on wireless networks. Given the massive bandwidth capacity inherent to visible light, visible light communication (VLC) can effectively meet the transmission requirements of holographic video and is an ideal wireless technology for next-generation indoor immersive services. However, VLC channels are highly dependent on Line-of-Sight (LoS) links. Due to user mobility, traditional VLC systems relying on fixed-orientation Photodetectors (PDs) often suffer from severe channel fading, which significantly degrades the transmission performance. In this paper, we propose an indoor VLC holographic video transmission architecture supporting rotatable PDs, utilizing rotatable PDs mounted on Head-Mounted Displays (HMDs) to assist in holographic video transmission. To minimize the total transmission delay of all users, we address the holographic video transmission problem by jointly optimizing the transmit power allocation of VLC Access Points (APs) and the pitch and roll angles of the users’ PDs. By formulating the problem as a Markov Decision Process (MDP), we address it using a novel Deep Reinforcement Learning (DRL) strategy leveraging the Soft Actor–Critic (SAC) architecture. Simulation results demonstrate that the proposed scheme reduces the overall latency by up to 29.6% compared to the benchmark schemes. Furthermore, the convergence speed of the algorithm is improved by 35% compared to traditional deep reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG). Full article
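The sensitivity of the LoS link to PD orientation, which motivates rotatable PDs here, is captured by the standard Lambertian VLC channel-gain model. A sketch follows; the Lambertian order, detector area, and field of view are illustrative values, not the paper's system parameters.

```python
import numpy as np

def los_gain(d, phi, psi, m=1.0, area=1e-4, fov=np.radians(60)):
    """Standard Lambertian LoS VLC channel gain:
    H = (m + 1) * A / (2 * pi * d^2) * cos^m(phi) * cos(psi),
    zero outside the PD field of view.
    d: link distance, phi: LED irradiance angle, psi: PD incidence angle."""
    if abs(psi) > fov:
        return 0.0
    return (m + 1) * area / (2 * np.pi * d**2) * np.cos(phi)**m * np.cos(psi)
```

Rotating the PD drives the incidence angle psi toward zero, which is exactly the degree of freedom the pitch/roll optimization in the paper exploits.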

25 pages, 5606 KB  
Article
Health-Aware Differentiated Energy Management for Multi-Stack Fuel Cell Hybrid Power Systems on Ships
by Lin Zhu, Yancheng Liu, Haohao Guo and Siyuan Liu
J. Mar. Sci. Eng. 2026, 14(5), 460; https://doi.org/10.3390/jmse14050460 - 28 Feb 2026
Abstract
This study proposes a health-aware energy management strategy based on the twin delayed deep deterministic policy gradient (TD3) algorithm for hybrid fuel cell/battery-powered ships. Unlike traditional approaches that treat multiple fuel cell stacks as homogeneous units, this strategy innovatively implements differentiated power allocation based on the real-time state of health of each stack. The research first validates the superiority of the TD3 framework over the deep Q-learning framework at the algorithmic level. Further comparative experiments conducted across three scenarios with varying degrees of state of health differences show that, compared to the TD3 baseline strategy employing average power allocation, the health-aware differentiated TD3 strategy significantly reduces the total voyage cost of the system, with the cost-saving effect becoming more pronounced as the state of health disparity between stacks increases. Additionally, by incorporating rule-based constraints, the convergence speed of the TD3 algorithm is effectively enhanced, improving its feasibility for real-time control. Tests under dynamic and fluctuating load conditions further confirm the strategy’s effectiveness and applicability. In summary, the health-aware TD3 strategy proposed in this study not only provides an efficient and reliable energy management solution for hybrid-powered ships but also promotes the application of machine learning in the field of ship energy management. Full article
(This article belongs to the Section Ocean Engineering)
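The differentiated-allocation idea, splitting demand by each stack's state of health rather than evenly, can be sketched with a simple proportional rule. This is a stand-in for intuition only; the paper's allocation comes from a learned TD3 policy, not this heuristic.

```python
import numpy as np

def allocate_power(p_demand, soh, p_max):
    """Split total demand across fuel cell stacks in proportion to state of
    health, clipped to per-stack limits. Proportional stand-in for the
    learned policy; healthier stacks carry more load."""
    soh = np.asarray(soh, dtype=float)
    weights = soh / soh.sum()
    p = np.minimum(weights * p_demand, p_max)
    # Redistribute any clipped remainder to stacks with headroom
    shortfall = p_demand - p.sum()
    headroom = p_max - p
    if shortfall > 0 and headroom.sum() > 0:
        p += headroom * (shortfall / headroom.sum())
    return p
```

The redistribution step keeps the split feasible whenever total demand does not exceed total stack capacity.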

31 pages, 2520 KB  
Article
Parameterized Reinforcement Learning with Route Guidance for Controlling Urban Road Traffic Networks
by Edwin M. Kataka, Thomas O. Olwal, Karim Djouani and Prosper Z. Sotenga
Future Transp. 2026, 6(2), 56; https://doi.org/10.3390/futuretransp6020056 - 28 Feb 2026
Abstract
Traditional macroscopic fundamental diagram (MFD)-based traffic perimeter metering control strategies rely on full knowledge of vehicle accumulation and inter-regional flow dynamics, assumptions that seldom hold in heterogeneous and highly variable real-world networks. Classical data-driven reinforcement learning methods face similar constraints, often converging slowly and exhibiting low sample efficiency when confronted with such complexities. Motivated by these limitations, this paper proposes a Parameterized Deep Q-Network perimeter control (P-DQNPC) scheme designed for multi-region urban road networks. The framework jointly optimizes discrete actions (regional routing choices) and continuous actions (signal-timing or flow-duration regulation) within a model-free learning structure. The approach is first trained and validated on synthetic MFD data to establish stable and interpretable policy behavior under controlled conditions. It is then transferred and further evaluated using real-world measurements from the Performance Measurement System—San Francisco Bay Area (PeMS-SF), a dataset collected from 18,954 loop detectors across the California State Highway System. PeMS-SF is selected due to its high spatial and temporal resolution, broad network coverage, and strong ability to capture realistic and diverse congestion patterns, qualities that support both rigorous validation and generalization to other metropolitan regions. Experimental results show that P-DQNPC consistently outperforms state-of-the-art baselines, including deep deterministic policy gradient, deep Q-network, and No-Control schemes. The proposed method achieves superior regulation of regional accumulations and demonstrates enhanced robustness in large, heterogeneous, and uncertain urban traffic environments. Full article
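The hybrid discrete/continuous action space is the distinctive part of a parameterized DQN: each discrete action k carries its own continuous parameter x_k, and the agent picks the (k, x_k) pair with the highest Q-value. A sketch with linear layers standing in for the networks; the shapes and the interpretation of k as a routing choice with a green-time-like parameter are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def pdqn_act(s, W_param, W_q):
    """Parameterized-action selection sketch. W_param maps the state to one
    continuous parameter per discrete action (squashed to (0, 1)); W_q scores
    each (k, x_k) pair from the state features plus that action's parameter."""
    x = 1.0 / (1.0 + np.exp(-(W_param @ s)))               # shape (K,)
    q = np.array([W_q[k] @ np.append(s, x[k]) for k in range(len(x))])
    k = int(np.argmax(q))                                   # greedy discrete choice
    return k, float(x[k])                                   # its continuous parameter
```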

20 pages, 3606 KB  
Article
Autonomous Navigation of an Unmanned Underwater Vehicle via Safe Reinforcement Learning and Active Disturbance Rejection Control
by Qinze Chen, Yun Cheng, Yinlong Yuan and Liang Hua
J. Mar. Sci. Eng. 2026, 14(5), 425; https://doi.org/10.3390/jmse14050425 - 25 Feb 2026
Abstract
A two-layer control framework for unmanned underwater vehicle (UUV) navigation is proposed, combining a lower-layer active disturbance rejection controller (ADRC) with an upper-layer safe reinforcement learning (RL) policy for obstacle-avoidance navigation. The lower layer, utilizing ADRC, ensures high tracking accuracy and effective disturbance rejection, while the upper layer integrates the twin delayed deep deterministic policy gradient (TD3) algorithm, combined with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR). The method is evaluated in two simulation studies: (i) velocity and attitude control to assess tracking and disturbance rejection, and (ii) obstacle-avoidance navigation to assess learning efficiency, trajectory smoothness, and safety-related metrics. Simulation results show that ADRC achieves faster tracking and stronger disturbance rejection than a conventional proportional–integral–derivative (PID) controller. Moreover, the proposed TD3 + QP + SR scheme exhibits faster learning, smoother trajectories, and improved safety performance compared with RL baselines. These results indicate that the proposed framework enables efficient and safe UUV navigation in simulation scenarios with obstacles and disturbances. Full article
(This article belongs to the Section Ocean Engineering)

31 pages, 6983 KB  
Article
Multi-Agent Deep Deterministic Policy Gradient-Based Coordinated Control for Urban Expressway Entrance–Arterial Interfaces
by Shunchao Wang, Zhigang Wu and Wangzi Yu
Systems 2026, 14(3), 231; https://doi.org/10.3390/systems14030231 - 25 Feb 2026
Abstract
Coordinated control of ramp metering, variable speed limits, and intersection signals is critical for mitigating congestion and enhancing efficiency at urban expressway–arterial interfaces. Existing strategies often operate in isolation, leading to fragmented responses and limited adaptability under heterogeneous traffic demands. This study develops a multi-agent reinforcement learning framework based on MADDPG to achieve cooperative decision-making across heterogeneous controllers. An asynchronous control cycle mechanism is designed to accommodate different temporal requirements of ramp meters, speed limits, and signal controllers, ensuring practical feasibility in real-time operations. A conflict-aware reward design further embeds density regulation, speed harmonization, and spillback prevention to stabilize flow dynamics. Simulation experiments on a calibrated urban network demonstrate that the proposed framework delays congestion onset, reduces shockwave propagation, and improves throughput compared with classical benchmarks. In particular, at the mainline merge, average travel time is reduced to 13.56 s (62.4% of VSL-only); at the ramp, occupancy is lowered to 6.4% (40.6% of ALINEA); and at the signalized approach, average delay decreases to 85.71 s (62.7% of actuated control). These results highlight the scalability and deployment potential of the proposed cooperative control approach for system-level traffic management in mixed traffic environments. Full article

20 pages, 1580 KB  
Article
An Intelligent Two-Stage Dispatch Framework for Cost and Carbon Reduction in Multi-Energy Virtual Power Plants
by Haochen Ni, Yonghua Wang, Xinfa Tang and Jingjing Wang
Processes 2026, 14(5), 743; https://doi.org/10.3390/pr14050743 - 25 Feb 2026
Abstract
To address the challenge of coordinating economic and environmental objectives for Multi-energy Virtual Power Plants (MEVPPs), particularly under ambitious decarbonization policies such as China’s “dual carbon” goals, this paper proposes a novel two-stage scheduling framework that integrates Deep Reinforcement Learning (DRL) with Model Predictive Control (MPC). The core innovations include the following: (1) high-fidelity physical models capturing wind turbulence correction, photovoltaic temperature-irradiation coupling, and state-of-charge-dependent energy storage efficiency, improving equipment dynamic characterization accuracy by 12.7% compared to conventional models; (2) an enhanced Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm incorporating priority experience replay and adaptive noise exploration, which accelerates convergence by 15.6%; (3) a pioneering coordination architecture of “Day-Ahead MADDPG—Real-Time MPC” that manages uncertainties through bidirectional feedback, where real-time deviations refine the long-term policy via experience replay. Simulation results using historical data from a North China industrial park demonstrate that the framework reduces operating costs by 13.3% and carbon emissions by 17.7% compared to particle swarm optimization, outperforms standard DDPG with 3.2% lower operating costs, 5.8% lower carbon emissions, and a 3.3% higher renewable utilization rate (88.6%), and achieves 55% renewable penetration with only 4.1% curtailment. These results validate the framework’s scalability for high-renewable penetration grids and its real-time feasibility, as confirmed by edge computing deployment with latency below 50 ms. This study offers a technically viable and scalable solution for the operation of low-carbon virtual power plants (VPPs), supporting the transition towards sustainable power systems. Full article
(This article belongs to the Section AI-Enabled Process Engineering)
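The priority experience replay credited above with accelerating MADDPG convergence usually means the proportional scheme: transition i is sampled with probability proportional to (|td_i| + eps)^alpha. A list-based sketch follows; production implementations use a sum-tree for O(log n) sampling, and the hyperparameters here are the usual defaults, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

class ProportionalReplay:
    """Proportional prioritized replay sketch: P(i) ~ (|td_i| + eps)^alpha."""

    def __init__(self, alpha=0.6, eps=1e-3):
        self.data, self.prio = [], []
        self.alpha, self.eps = alpha, eps

    def add(self, transition, td_error=1.0):
        self.data.append(transition)
        self.prio.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        p = np.array(self.prio)
        p = p / p.sum()                                   # normalize priorities
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priority(self, idx, td_errors):
        # Refresh priorities after each learning step with the new TD errors
        for i, td in zip(idx, td_errors):
            self.prio[i] = (abs(td) + self.eps) ** self.alpha
```

Transitions with large TD error get revisited far more often, which is the mechanism behind the faster convergence the abstract reports.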

26 pages, 7673 KB  
Article
Deep Deterministic Policy Gradient-Based Parameter Adaptation for Synchronous Sliding-Mode Control with Time-Delay Estimation in Dual-Arm Robot Manipulators Under System Uncertainties
by Duc Thien Tran, Thanh Nha Nguyen, Thi Kim Tram Huynh and Kyoung Kwan Ahn
Appl. Sci. 2026, 16(4), 2042; https://doi.org/10.3390/app16042042 - 19 Feb 2026
Abstract
This paper presents a synchronous sliding-mode control with time-delay estimation (SSMC-TDE)-based adaptive control framework for coordinated motion control of dual-arm robotic manipulators operating under system uncertainties. The baseline SSMC-TDE scheme is constructed using synchronization and cross-coupling errors to ensure precise coordinated motion among robot joints, while sliding-mode control effectively handles strong nonlinearities, and the time-delay estimation technique approximates lumped uncertainties arising from external disturbances, modeling errors, and payload variations. The stability of the closed-loop system is rigorously analyzed and guaranteed using the Lyapunov theory. To overcome performance degradation caused by manually tuned control gains, a deep reinforcement learning-assisted parameter adaptation mechanism is integrated into the SSMC-TDE structure. Specifically, a Deep Deterministic Policy Gradient (DDPG) algorithm is employed to adapt selected control gains online through a reward function designed to simultaneously enhance motion synchronization and reduce trajectory-tracking errors, while preserving the stability properties of the underlying controller. Simulation studies are conducted within a co-simulation framework integrating MATLAB/Simulink and ROS/Gazebo for a dual-arm robotic platform. Quantitative evaluations based on the root mean square error (RMSE) of trajectory-tracking and synchronization errors across all six joints demonstrate that, averaged over both scenarios, the proposed DDPG-assisted SSMC-TDE achieves an overall RMSE reduction of 35.52% and 99.3% compared with conventional SSMC and SSMC-TDE controllers, respectively, confirming its superior performance and robustness under system uncertainties. Full article
(This article belongs to the Special Issue Advanced Robotics, Mechatronics, and Automation)

35 pages, 9343 KB  
Article
Collaborative Control of Rear-Wheel Independent Drive Electric Vehicles During Tire Blowouts Using Broad-Extreme Reinforcement Learning: Simulation and Scaled Prototype Verification
by Xiaozheng Wang, Pak Kin Wong, Hengli Qi, Shiron Thalagala, Ziqi Yang, Jingyu Lu and Wei Huang
Vehicles 2026, 8(2), 40; https://doi.org/10.3390/vehicles8020040 - 18 Feb 2026
Abstract
Tire blowouts represent one of the most hazardous fault scenarios for electric vehicles (EVs). While collaborative active steering control (ASC) and direct yaw moment control (DYC) can theoretically maintain stability during these events, the strong coupling effects between them make controller design challenging. To address this, an adaptive control algorithm based on broad-extreme reinforcement learning (RL), named broad critic extreme actor (BCEA), is proposed. Compared to traditional controllers, the proposed BCEA architecture is simpler to design and demonstrates enhanced robustness. Crucially, it achieves significantly faster training speed than traditional RL methods such as deep deterministic policy gradient (DDPG). Both simulation and scaled prototype tests verify the ability of the BCEA-based controller to maintain vehicle stability during different types of tire blowout scenarios. Furthermore, compared to traditional RL methods, the training efficiency is improved by more than 80%. These results indicate that the proposed BCEA controller is a promising advancement for vehicle stability control under critical failure conditions. Full article
(This article belongs to the Topic Vehicle Dynamics and Control, 2nd Edition)

30 pages, 3711 KB  
Article
An RNN-Enhanced Diverse Curriculum-Driven Learning Algorithm Based on Deep Reinforcement Learning for POMDPs with Limited Experience
by Ke Li, Kun Zhang, Ziqi Wei, Haiyin Piao, Binlin Yuan, Boxuan Wang and Jiangbo Cheng
Drones 2026, 10(2), 142; https://doi.org/10.3390/drones10020142 - 17 Feb 2026
Abstract
Autonomous flight is a critical capability for unmanned aerial vehicles (UAVs), enabling applications in wildlife and plant protection, infrastructure inspection, search and rescue, and other complex missions. Although some learning-based methods have achieved considerable progress, traditional algorithms still struggle with real-world challenges, due to the partially observable nature of environments and limited experience regarding the properties of dynamic unknown environments where threats and targets are movable and unpredictable. To address these difficulties, it is necessary to achieve autonomous guidance for UAVs performing long-range missions in dynamic environments (LRGDEs), and to develop a novel end-to-end algorithm that can overcome partial observability under limited state transitions. In this paper, we propose an RNN-enhanced Diverse Curriculum-driven Learning Algorithm (REDCRL) based on deep reinforcement learning. We modify the structure of traditional actor–critic networks and introduce Bi-LSTM into policy networks (referred to as Bi-LSTM-modified Policy Networks (BLPNs)) to alleviate observation incompleteness. Furthermore, to fully exploit the potential value of data and mitigate the problem of insufficient samples, we develop an Adaptive Multi-Feature Evaluation Experience Replay (AMFER) method to reshape the process of experience replay buffer construction and sampling. In addition, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted to optimize UAV-maneuver decision policies. Compared with traditional algorithms, the proposed algorithm can accelerate policy convergence and improve the performance of the trained policy. Full article
(This article belongs to the Special Issue Advances in AI Large Models for Unmanned Aerial Vehicles)

18 pages, 2972 KB  
Article
Control Strategy for LLC Resonant Converter Based on TD3 Algorithm
by Xin Pan, Peng Chen and Jianfeng Zhao
Modelling 2026, 7(1), 39; https://doi.org/10.3390/modelling7010039 - 13 Feb 2026
Abstract
To address the limited dynamic voltage regulation performance of LLC resonant converters under wide input voltage and load variations, a reinforcement learning-based voltage control strategy is proposed in this paper. The twin delayed deep deterministic policy gradient (TD3) algorithm is adopted to learn the nonlinear mapping between system states and control actions, enabling adaptive adjustment of the converter operating parameters. Based on the established LLC resonant converter simulation model, the state space, action space, and reward function of the agent are designed to ensure rapid control response to abrupt changes in input voltage and load. Compared with the conventional PI control strategy, the proposed TD3-based strategy provides faster control actions during operating condition transitions, effectively suppressing output voltage overshoot and undershoot, and shortening the settling time. Simulation results verify that the proposed method achieves improved dynamic response performance under various operating conditions, demonstrating its effectiveness and superiority in LLC resonant converter voltage regulation. Full article
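Several results above adopt TD3 over plain DDPG; its two key additions, clipped double-Q targets and target-policy smoothing, are compact enough to show directly. A sketch with arbitrary callables standing in for the target networks (sigma, noise clip, and action bound are the commonly used defaults, not this paper's values):

```python
import numpy as np

rng = np.random.default_rng(3)

def td3_target(r, s_next, actor_targ, q1_targ, q2_targ,
               gamma=0.99, sigma=0.2, noise_clip=0.5, a_max=1.0):
    """TD3 target value: perturb the target action with clipped Gaussian noise
    (target-policy smoothing), then bootstrap from the minimum of the twin
    target critics (clipped double-Q learning, which curbs overestimation)."""
    a_targ = actor_targ(s_next)
    noise = np.clip(sigma * rng.normal(size=np.shape(a_targ)), -noise_clip, noise_clip)
    a_next = np.clip(a_targ + noise, -a_max, a_max)
    return r + gamma * min(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
```

The third TD3 ingredient, delayed actor updates (updating the policy every few critic steps), lives in the training loop rather than the target computation.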

28 pages, 3958 KB  
Article
Co-Optimization of Cooperative Adaptive Cruise Control and Energy Management for Plug-in Hybrid Electric Truck Platoons
by Xin Liu, Dong Mai, Jun Mao, Gang Zhang, Xiangning Wu and Yanmei Meng
Energies 2026, 19(4), 935; https://doi.org/10.3390/en19040935 - 11 Feb 2026
Abstract
To optimize fuel economy for platooning plug-in hybrid electric trucks, this paper proposes a co-optimization framework that integrates cooperative adaptive cruise control and energy management to enhance driving safety and fuel efficiency in complex traffic environments. The control strategy is divided into two layers: in the upper layer, a cooperative adaptive cruise control model based on distributed model predictive control (DMPC) is used to achieve stable platoon following and vehicle spacing, thus improving the overall platoon efficiency. In the lower layer, a distributed soft actor-critic (DSAC) algorithm is used for the fine-grained power distribution of plug-in hybrid electric trucks, enabling efficient energy utilization. The results demonstrate that this strategy significantly enhances the fuel economy and vehicle-following performance of plug-in hybrid truck platoons. Compared with the classical deep deterministic policy gradient (DDPG) algorithm, the energy management strategy based on the distributed soft actor-critic offers higher computational efficiency. Full article

22 pages, 1345 KB  
Article
Multi-UAVs Searching and Tracking for USV Swarm: A Center-Sub-Critics Reinforcement Learning Approach
by Ye Hou, Bo Li and Xueru Miao
Drones 2026, 10(2), 123; https://doi.org/10.3390/drones10020123 - 11 Feb 2026
Abstract
This work proposes a multiple unmanned aerial vehicles (UAVs) cooperative trajectory planning scheme constructed by multi-agent reinforcement learning with hybrid critics, improving the searching and tracking efficiency and fairness when the dynamic unmanned surface vehicle (USV) swarm exceeds the number of UAVs. A confidence map of targets’ existence probability with spatio-temporal decay is first established through a local information fusion mechanism based on Bayesian update theory. It leads to a reformulation of the problem model into a communication-enhanced partially observable Markov decision process. To suppress policy variance and credibility imbalance of the multi-UAVs, a center-sub-critics deep deterministic policy gradient algorithm is then proposed, combining multiple centralized critics with decentralized critics. Meanwhile, a segmented reward function is designed to incentivize the UAV to revisit detected targets. Finally, the simulation results compared with diverse baseline algorithms demonstrate the efficacy and scalability of the proposed scheme in this paper. Full article
(This article belongs to the Section Artificial Intelligence in Drones (AID))
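The confidence map above, a per-cell target-existence probability with spatio-temporal decay and Bayesian measurement updates, can be sketched one cell at a time. The decay rate, detection probability, and false-alarm rate below are hypothetical, and the paper's fusion across UAVs is not reproduced.

```python
def update_cell(p, detected, p_d=0.9, p_fa=0.05, decay=0.98, p0=0.5):
    """One confidence-map cell. First let the existence probability decay
    toward the uninformed prior p0 (stale information loses weight), then
    apply a Bayesian update from the sensor: likelihood p_d if a target is
    present, false-alarm rate p_fa if it is not."""
    p = p0 + decay * (p - p0)                 # temporal decay toward the prior
    like = p_d if detected else (1.0 - p_d)
    like_not = p_fa if detected else (1.0 - p_fa)
    return like * p / (like * p + like_not * (1.0 - p))
```

Decay is what makes revisiting previously detected targets worthwhile, which is the behavior the segmented reward in the paper is designed to incentivize.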

33 pages, 6485 KB  
Article
Research on Energy Management Optimization for Hybrid-Powered Port Tugboat Systems Based on a Dual-Delay Deep Deterministic Policy Gradient Algorithm
by Zhao Li, Wuqiang Long and Hua Tian
Energies 2026, 19(4), 905; https://doi.org/10.3390/en19040905 - 9 Feb 2026
Abstract
To address the energy management challenge for methanol range-extended series hybrid systems in port tugboats, characterized by highly transient and intermittent operations, this study proposes a real-time energy management strategy based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. A high-fidelity forward simulation model was constructed and validated to train the TD3 agent. In simulations of typical port operation cycles, TD3 reduced methanol consumption by approximately 18.5%, 10.2%, and 7.3% compared to rule-based (RB), equivalent consumption minimization strategy (ECMS), and deep deterministic policy gradient (DDPG) approaches, respectively. Emissions such as NOx and carbon dioxide (CO2) were also significantly reduced, while maintaining superior battery state of charge (SOC). Its overall performance approximates global optimal (DP) performance with a gap of less than 2.5%, while retaining real-time online decision-making capability. Hardware-in-the-loop (HIL) testing further demonstrates that TD3 exhibits less than 1.8% performance degradation under actual communication and execution conditions, validating its engineering feasibility and deployment potential. This study provides methodological and experimental foundations for developing high-performance, low-emission, real-time energy management algorithms for port tugboats. Full article
