Search Results (177)

Search Parameters:
Keywords = twin-delayed deep deterministic policy gradient

21 pages, 6210 KB  
Article
Robust Path Planning via Deep Reinforcement Learning
by Daeyeol Kang, Jongyoon Park and Pileun Kim
Sensors 2026, 26(9), 2658; https://doi.org/10.3390/s26092658 - 24 Apr 2026
Abstract
Deep reinforcement learning (DRL) for autonomous mobile robot navigation faces several inherent limitations. The stochastic nature of actions generated by DRL policies can undermine performance consistency, while inefficient exploration frequently delays the learning process or prevents the discovery of optimal solutions. This research aims to enhance the robustness of path planning by addressing these challenges. To achieve this goal, we propose a hybrid approach that integrates the flexible decision-making capabilities of deep reinforcement learning with the stability of traditional path planning. The proposed model adopts the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as its base. Notably, we pre-process LiDAR point cloud data to extract only essential features for the state representation, thereby preventing performance degradation from high-dimensional inputs and improving computational efficiency. Our model optimizes the learning process through two core strategies. First, it prioritizes experience data generated during training based on negative rewards, guiding the model to learn more frequently from critical failures rather than redundant successes. Second, it dynamically compares the action proposed by the TD3 network with a goal-oriented action from a classical path-planning algorithm in real time. By selecting the action with the higher estimated value, the model guides the policy toward a stable and effective trajectory from the earliest stages of training. To validate the efficacy of our approach, we conducted simulation-based experiments comparing the performance of the proposed model with existing reinforcement learning networks. To ensure statistical significance and mitigate the impact of random initialization, all reported results are averaged over 10 independent runs with different random seeds. The results quantitatively demonstrate that our model achieves significantly higher and more stable reward values, confirming a robust improvement in the path-planning process.
(This article belongs to the Special Issue Advancements in Autonomous Navigation Systems for UAVs)
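The value-based arbitration described in this abstract — executing whichever of the TD3 actor's proposal or the classical planner's goal-oriented action the critic scores higher — can be sketched as below. This is an illustrative assumption, not the authors' code: the quadratic `q_value` stub stands in for TD3's learned critic, and all names are invented.

```python
def q_value(state, action):
    # Stand-in for the critic; TD3 would use the minimum of two learned
    # Q-networks. Here, a toy quadratic preferring actions close to the
    # goal direction (illustrative assumption).
    return -(action - state["goal_dir"]) ** 2

def select_action(state, td3_action, planner_action):
    """Execute whichever candidate action the critic values higher."""
    if q_value(state, td3_action) >= q_value(state, planner_action):
        return td3_action
    return planner_action

state = {"goal_dir": 0.5}
# Early in training the untrained actor drifts; the planner's
# goal-oriented action wins the value comparison.
chosen = select_action(state, td3_action=-0.8, planner_action=0.5)
```

Under this scheme the planner's action tends to win early, steering the policy toward stable trajectories; as the critic and actor improve, the TD3 action is selected more often.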
31 pages, 1487 KB  
Article
Deep Reinforcement Learning-Based Dual-Loop Adaptive Control Method and Simulation for Loitering Munition Fuze
by Lingyun Zhang, Haojie Li, Chuanhao Zhang, Yuan Zhao, Shixiang Qiao and Hang Yu
Technologies 2026, 14(4), 239; https://doi.org/10.3390/technologies14040239 - 20 Apr 2026
Abstract
To address the poor adaptability and rigid initiation modes of the loitering munition fuze in complex environments and the inadequacy of single fuzzy control against strong interference, this paper proposes a dual-loop adaptive reconfiguration control method. The architecture integrates the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with fuzzy logic. The inner loop uses TD3 to dynamically optimize fuzzy scaling factors based on real-time interference and state deviations. Concurrently, the outer loop utilizes a Fuze Readiness Index (FRI) and a finite state machine to manage real-time multi-modal mission switching (e.g., proximity, delay, and airburst) and reverse safety-state conversions. Co-simulations under non-stationary composite interference show that the proposed method reduces burst-height RMSE by 82.4% and 61.6% relative to the fixed-threshold and standard fuzzy baselines, respectively. The false alarm rate (FAR) is reduced to 0.15%, and the reconfiguration response time under sudden interference is shortened to 12 ms. Even under extreme conditions, such as a 400 ms sensor signal loss, the relative error remains within 5%. These simulation results demonstrate the architecture's potential to improve precision, responsiveness, and robustness under dynamic interference, including intermittent observation loss, within the simulated operating envelope.
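The outer loop's mode management can be pictured as a small finite state machine keyed on a readiness index. The sketch below is a loose illustration only: the thresholds, mode names, and `sensors_ok` flag are invented assumptions, not the paper's FRI definition or state set.

```python
def next_mode(mode, fri, sensors_ok):
    # Toy finite state machine over initiation modes. Losing valid
    # sensing triggers the reverse safety-state conversion; otherwise
    # the readiness index (FRI) selects among the multi-modal missions.
    # Thresholds and mode names are illustrative assumptions.
    if not sensors_ok:
        return "safe"
    if fri >= 0.8:
        return "airburst"
    if fri >= 0.5:
        return "proximity"
    return "delay"

mode = next_mode("delay", fri=0.9, sensors_ok=True)
fallback = next_mode("airburst", fri=0.9, sensors_ok=False)
```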
22 pages, 919 KB  
Article
Large Autonomous Driving Overtaking Decision and Control System Based on Hierarchical Reinforcement Learning
by Chen-Ning Wang and Xiuhui Tang
Electronics 2026, 15(8), 1711; https://doi.org/10.3390/electronics15081711 - 17 Apr 2026
Abstract
To address the bottlenecks of low sample efficiency and poor control accuracy in traditional single-layer reinforcement learning during autonomous driving overtaking, this paper proposes an overtaking decision and control system based on hierarchical reinforcement learning to decouple complex tasks in spatial and temporal dimensions. A heterogeneous two-layer architecture is constructed, where the upper layer adopts the Proximal Policy Optimization algorithm to generate macroscopic discrete decisions, while the lower layer employs Twin Delayed Deep Deterministic Policy Gradient combined with Long Short-Term Memory to achieve smooth continuous control of steering and acceleration by perceiving temporal features of dynamic obstacles. A composite reward mechanism, integrating hard safety constraints and soft efficiency incentives, is designed to balance safety, efficiency, and comfort. Experimental results in complex scenarios with multiple interfering vehicles and random lane-changing behaviors demonstrate that the proposed system improves the training convergence speed by approximately 30% within 500,000 steps compared to single-layer algorithms. In tests across varying traffic densities, the system achieves a 98.3% success rate in medium-density scenarios with a collision rate of only 0.6%. In high-density challenges, the success rate remains above 95%, with the collision rate reduced by about 80% compared to baseline models. Furthermore, the lateral control deviation is strictly limited to within 0.2 m, and the longitudinal safety distance remains stable above 5 m. This system provides a robust, high-efficiency paradigm for autonomous overtaking.

26 pages, 6083 KB  
Article
Gait Optimization Control of Spinal Quadruped Robot Based on Deep Reinforcement Learning
by Guozheng Song, Qinglin Ai, Lin Li, Xiaohang Shan, Chao Yang and Jianguo Yang
Sensors 2026, 26(8), 2407; https://doi.org/10.3390/s26082407 - 14 Apr 2026
Abstract
The spine enhances the flexibility of quadrupeds during locomotion. Inspired by this biological mechanism, this study incorporates an actuated spinal joint into a quadruped robot, enabling more natural motion and posture adjustment. To improve the motion stability of spinal robots in complex environments, a deep reinforcement learning framework that integrates a central pattern generator (CPG) with the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed to optimize the gait motion of the spinal quadruped robot. First, the structure and parameters of the quadruped robot with a spinal joint are analyzed and a CPG coupling model incorporating spinal motion parameters is designed. Subsequently, a TD3–CPG algorithm framework based on a joint incremental strategy is proposed to optimize the robot’s gait, exploring optimal control strategies for terrain adaptation through spinal motion integration. Finally, experiments are conducted on various obstacle terrains to validate the proposed algorithm. Simulation and experimental results demonstrate the effectiveness of the algorithm in optimizing the gait of the spinal quadruped robot, showing significant improvements in walking stability, speed, and terrain adaptability across different terrains.
(This article belongs to the Section Sensors and Robotics)
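A CPG is typically built from coupled limit-cycle oscillators. A minimal single Hopf oscillator — a common CPG building block, not the paper's specific spinal coupling model — can be sketched as below; `mu` sets the squared limit-cycle amplitude and `omega` the gait frequency, and the values are illustrative.

```python
import math

def hopf_step(x, y, dt=0.001, mu=1.0, omega=2.0 * math.pi):
    # One explicit-Euler step of a Hopf oscillator. The state converges
    # to a stable limit cycle of radius sqrt(mu), producing a rhythmic
    # joint-drive signal regardless of the initial condition.
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

x, y = 0.1, 0.0
for _ in range(5000):
    x, y = hopf_step(x, y)
# After the transient, (x, y) oscillates on a circle of radius ~sqrt(mu).
```

Coupling several such oscillators with phase offsets yields the inter-leg (and, here, spinal) coordination that the RL policy then modulates.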

27 pages, 729 KB  
Article
RSMA-Assisted Fluid Antenna ISAC via Hierarchical Deep Reinforcement Learning
by Muhammad Sheraz, Teong Chee Chuah and It Ee Lee
Telecom 2026, 7(2), 41; https://doi.org/10.3390/telecom7020041 - 9 Apr 2026
Abstract
Integrated sensing and communications (ISAC) requires tight coordination between spatial signal design and multiple-access strategies to balance communication throughput and sensing accuracy under shared spectral and hardware constraints. However, existing ISAC frameworks with rate-splitting multiple access (RSMA) typically rely on fixed antenna arrays and decoupled optimization, which fundamentally limit their ability to adapt to fast channel variations and dynamic sensing requirements. This paper introduces a fluid antenna-enabled RSMA-assisted ISAC architecture, in which movable antenna ports are exploited as a new spatial degree of freedom to enhance adaptability in both communication and sensing operations. Fluid antenna systems (FAS) are deployed at both the base station and user terminals, allowing dynamic port selection that reshapes the effective channel and sensing beampattern in real time. We formulate a joint sum-rate maximization problem subject to explicit sensing-quality constraints, capturing the coupled impact of antenna port selection, RSMA rate allocation, and multi-beam transmit design. The proposed framework maximizes the communication sum-rate while ensuring that the sensing functionality satisfies a predefined sensing quality constraint. This constraint-based ISAC formulation guarantees that sufficient sensing power is directed toward the target while optimizing communication performance. The resulting optimization involves strongly coupled discrete and continuous decision variables, rendering conventional optimization methods ineffective. To address this challenge, a hierarchical deep reinforcement learning (HDRL) framework is developed, where an upper-layer deep Q-network (DQN) determines discrete antenna port selection and a lower-layer twin delayed deep deterministic policy gradient (TD3) algorithm optimizes continuous beamforming and rate-splitting parameters. Numerical results demonstrate that the proposed approach significantly improves system performance, achieving a higher communication sum-rate while satisfying sensing requirements under dynamic propagation conditions.

20 pages, 6792 KB  
Article
PER-TD3 Integrated with HER Mechanism: Improving Training Efficiency and Control Accuracy for PEMFC Differential Pressure Control
by Yuan Li, Baijun Lai, Jing Wang, Yan Sun, Donghai Hu and Hua Ding
World Electr. Veh. J. 2026, 17(4), 195; https://doi.org/10.3390/wevj17040195 - 8 Apr 2026
Abstract
The cathode and anode differential pressure control of a proton exchange membrane fuel cell (PEMFC) directly affects its service life and operating efficiency. Existing control methods struggle to cope with strong nonlinear perturbations, and fixed differential pressure control is prone to pressure overshoot and threshold exceedance, resulting in unstable pressure regulation. To address these problems, a reinforcement learning method based on hybrid experience replay (HP-TD3) is proposed. A CART-based algorithm is first used to classify the states of the test load, and a load-related segmented reward function is designed. In addition, a hindsight experience replay (HER) mechanism is incorporated into the Priority Experience Replay Twin Delayed Deep Deterministic Policy Gradient (PER-TD3) framework to improve sample utilization efficiency and training stability. Finally, the performance of HP-TD3 and its ability to cope with nonlinear disturbances are verified on a fuel cell control unit hardware-in-the-loop (FCU-HIL) platform. Under test load A (frequent switching with a high proportion of low-load operation), HP-TD3 reduces the mean absolute error (MAE), root mean square error (RMSE), and the fuel cell dynamic performance degradation index (Δfc) by 17.4%, 20.5%, and 13.3%, respectively, compared with P-TD3; under test load B (high-load operation with low switching frequency), these indicators drop by 25.7%, 29.4%, and 15.4%, respectively.
(This article belongs to the Section Storage Systems)
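The HER mechanism incorporated here can be sketched as relabeling a finished episode with the goal it actually achieved, so transitions that failed on the original goal still carry useful reward signal. The tuple layout and `reward_fn` below are illustrative assumptions, not the paper's state or reward design.

```python
def her_relabel(episode, reward_fn):
    # Hindsight experience replay: substitute the goal the episode
    # actually reached (its final achieved state) and recompute rewards,
    # turning a failed rollout into informative training data.
    achieved = episode[-1][2]  # final next_state becomes the new goal
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        r = reward_fn(next_state, achieved)
        relabeled.append((state, action, next_state, achieved, r))
    return relabeled

# Sparse reward: 0 on reaching the goal, -1 otherwise (illustrative).
reward_fn = lambda s, g: 0.0 if s == g else -1.0
episode = [(0, "a", 1, 9), (1, "b", 2, 9)]  # original goal 9 never reached
relabeled = her_relabel(episode, reward_fn)
```

Both the original and relabeled transitions would then be pushed into the (prioritized) replay buffer, which is what improves sample utilization.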

28 pages, 3267 KB  
Article
A Hierarchical Dynamic Path Planning Framework for Autonomous Vehicles Based on Physics-Informed Potential Field and TD3 Reinforcement Learning
by Yan Pan, Yu Wang and Bin Ran
Appl. Sci. 2026, 16(7), 3610; https://doi.org/10.3390/app16073610 - 7 Apr 2026
Abstract
Autonomous driving in dense traffic demands policies that ensure safety, accurate path tracking, and ride comfort, yet reinforcement learning (RL) alone suffers from low sample efficiency and weak safety guarantees, while classical artificial potential field (APF) methods lack adaptability to dynamic scenarios. This paper proposes PIPF-TD3, which integrates APF theory with the Twin Delayed Deep Deterministic Policy Gradient (TD3) by embedding composite potential values and Doppler-weighted gradients as physics-informed features into the state vector. A Hybrid A* planner generates a reference path encoded as an attractive field; repulsive fields model nearby obstacles using real-time perception data; and a multi-objective reward function jointly optimizes path tracking, collision avoidance, and ride comfort. Experiments in CARLA 0.9.14 across two scenarios—a highway segment with mixed obstacles and a signalized intersection with conflicting turning movements—show that PIPF-TD3 achieves 100% task completion with zero collisions, whereas TD3 without potential field guidance suffers a 90% collision rate. PIPF-TD3 reduces mean cross-track error to 0.12 m (a 72.1% reduction over the rule-based FSM baseline), maintains a 67.0% larger safety clearance, and yields RMS longitudinal and lateral accelerations of 1.12 and 0.75 m/s², outperforming the FSM by 37.1% and 42.7%. These results confirm that Doppler-weighted physical priors substantially enhance RL-based driving safety and quality in complex traffic conditions.
(This article belongs to the Section Transportation and Future Mobility)
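A generic composite potential of the kind embedded in the state vector can be sketched as a quadratic attraction to the reference plus inverse-distance repulsion inside an influence radius `d0`. This is the textbook APF form under assumed gains; PIPF-TD3's Doppler weighting and Hybrid A* reference encoding are not reproduced here.

```python
import math

def potential_value(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    # Classic artificial potential field: quadratic attraction toward
    # the goal/reference plus repulsion from obstacles inside the
    # influence radius d0. Gains k_att, k_rep are illustrative.
    u_att = 0.5 * k_att * math.dist(pos, goal) ** 2
    u_rep = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0.0 < d < d0:
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u_att + u_rep

# Far from every obstacle, only the attractive term contributes.
u_free = potential_value((0.0, 0.0), (3.0, 4.0), obstacles=[(10.0, 10.0)])
```

Feeding such potential values (and their gradients) to the policy gives the RL agent a physics-informed summary of goal pull and obstacle pressure at its current state.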

24 pages, 17819 KB  
Article
GT-TD3: A Kinematics-Aware Graph-Transformer Framework for Stable Trajectory Tracking of High-Degree-of-Freedom (DOF) Manipulators
by Hanwen Miao, Haoran Hou, Zhaopeng Zhu, Zheng Chao and Rui Zhang
Machines 2026, 14(4), 397; https://doi.org/10.3390/machines14040397 - 5 Apr 2026
Abstract
Accurate trajectory tracking of redundant manipulators is difficult because the controller must simultaneously model local couplings between adjacent joints and global dependencies across the whole kinematic chain. Existing reinforcement learning methods typically employ multilayer perceptrons, which do not explicitly exploit manipulator structure and therefore show limited stability and representation ability in high-dimensional continuous control tasks. This paper proposes GT-TD3, a Graph-Transformer-enhanced Twin Delayed Deep Deterministic Policy Gradient framework, for redundant manipulator trajectory tracking. The proposed actor first converts the raw system state into joint-level node features and uses a graph neural network to extract local kinematic coupling information. A Transformer is then employed to capture long-range dependencies among joints. To strengthen the use of structural priors, topology- and distance-related bias terms are incorporated into the attention mechanism, enabling the network to encode manipulator structure during global feature learning. Experiments on a 7-DoF KUKA iiwa manipulator in PyBullet demonstrate that GT-TD3 outperforms MLP, pure GNN, and pure Transformer baselines in tracking performance. The proposed method achieves more stable training, faster convergence, and smoother and more accurate end-effector motion. The results show that the integration of local graph modeling and structure-aware global attention provides an effective solution for high-precision trajectory tracking of redundant manipulators.
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
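The structure-aware attention idea — adding topology- and distance-related bias terms to the attention logits before the softmax — can be sketched as follows. The bias values and plain-list implementation are illustrative, not the GT-TD3 parameterization.

```python
import math

def biased_attention(scores, topo_bias, dist_bias):
    # Add structural bias terms to the raw attention logits before the
    # softmax, so joints adjacent in the kinematic chain (or close in
    # graph distance) receive more attention weight.
    out = []
    for row_s, row_t, row_d in zip(scores, topo_bias, dist_bias):
        logits = [s + t + d for s, t, d in zip(row_s, row_t, row_d)]
        m = max(logits)                      # stabilize the softmax
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

attn = biased_attention(
    scores=[[0.0, 0.0]],       # content similarity alone is a tie
    topo_bias=[[1.0, 0.0]],    # first joint is an adjacent neighbour
    dist_bias=[[0.0, 0.0]],
)
```

With equal content scores, the topology bias alone breaks the tie toward the kinematically adjacent joint.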

17 pages, 1288 KB  
Article
An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning
by Hui Zhu, Bingrui Li, Yan Chen, Yinke Dou, Yi Tian, Yahao Li, Huiguang Li and Zepeng Gao
Energies 2026, 19(6), 1487; https://doi.org/10.3390/en19061487 - 17 Mar 2026
Abstract
To address the long-term operational challenges of space environment monitoring buoys under extreme Arctic conditions, this paper proposes an energy management optimization method based on deep reinforcement learning (DRL). By constructing a buoy system model that integrates renewable energy sources, a primary lithium battery power supply, and a battery energy storage unit, combined with an Arctic environmental model incorporating low-temperature efficiency degradation, a reward function was designed to minimize power supply deficits while ensuring system reliability. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was employed to optimize energy scheduling strategies. Simulation results based on real Arctic data (August 2024–January 2025) demonstrate that integrating wind turbines significantly reduces reliance on primary lithium batteries. Specifically, the required lithium battery capacity was reduced by 87.5% (from 61.44 kWh to 7.685 kWh), and procurement costs were lowered by approximately $68,830 compared to non-rechargeable schemes. This method significantly enhances the buoy’s endurance and scheduling intelligence, offering valuable insights into energy management for intelligent polar observation equipment.
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

30 pages, 1414 KB  
Article
Graph-Attention Constrained DRL for Joint Task Offloading and Resource Allocation in UAV-Assisted Internet of Vehicles
by Peiying Zhang, Xiangguo Zheng, Konstantin Igorevich Kostromitin, Wei Zhang, Huiling Shi and Lizhuang Tan
Drones 2026, 10(3), 201; https://doi.org/10.3390/drones10030201 - 13 Mar 2026
Abstract
Unmanned aerial vehicles (UAVs) acting as mobile aerial edge platforms can deliver on-demand communication and computing for the Internet of Vehicles (IoV) via flexible deployment and line-of-sight (LoS) links, improving reliability and reducing latency. However, high vehicle mobility, time-varying channels, and limited onboard energy make task offloading and resource coordination challenging. This paper studies joint task offloading and resource allocation in a UAV-assisted IoV system, where the UAV selects its hovering position from discrete candidate sites each time slot and splits vehicular tasks between the UAV and a roadside unit (RSU) to relieve backhaul congestion and enhance edge resource utilization. Considering vehicle mobility, multi-stage queue dynamics, and UAV energy consumption for communication, computation, and movement, the online optimization of position selection, task splitting, and bandwidth allocation is formulated as a constrained Markov decision process (CMDP). The goal is to maximize the number of tasks completed within the latency deadlines while satisfying the UAV energy budget. To solve this CMDP, we propose a graph-attention-based constrained twin delayed deep deterministic policy gradient (GAT-CTD3) algorithm. A graph attention network captures spatial correlations and resource competition among active vehicles, while a Lagrangian TD3 framework enforces long-term energy constraints and improves learning stability via twin critics, delayed policy updates, and target smoothing. Simulation results demonstrate that GAT-CTD3 outperforms the comparison schemes in task completion rate, delay, and energy consumption per completed task, and remains robust under dense traffic.
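The Lagrangian treatment of the long-term energy budget can be sketched as projected gradient ascent on a dual variable: the multiplier grows while average energy use exceeds the budget and shrinks (never below zero) otherwise, scaling the energy penalty seen by the critic. The learning rate and episode statistics below are illustrative assumptions, not the paper's values.

```python
def update_dual(lmbda, avg_energy, budget, lr=0.05):
    # Projected dual ascent for a CMDP: the multiplier rises while the
    # energy constraint is violated and decays when it is satisfied.
    # The TD3 critic would then train on r - lmbda * energy_cost.
    lmbda += lr * (avg_energy - budget)
    return max(0.0, lmbda)  # dual variable stays non-negative

lmbda = 0.0
for avg_energy in [1.4, 1.3, 1.1, 0.9]:  # per-episode averages, budget 1.0
    lmbda = update_dual(lmbda, avg_energy, budget=1.0)
```

At convergence the multiplier settles where the policy just meets the budget, which is how a soft penalty enforces a long-term constraint.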

30 pages, 8205 KB  
Article
Path Planning for USVs in Complex Marine Environments Based on an Improved Hybrid TD3 Algorithm
by Zhenxing Zhang, Xiaohui Wang, Qiujie Wang, Mingwei Zhu and Mingkun Feng
Sensors 2026, 26(6), 1823; https://doi.org/10.3390/s26061823 - 13 Mar 2026
Abstract
Real-time path planning for Unmanned Surface Vehicles (USVs) in complex marine environments remains challenging due to unstructured environments, ocean current disturbances, and dynamic obstacles. This paper proposes an improved Hybrid Safety and Reward-Sensitive Twin Delayed Deep Deterministic Policy Gradient (H_RS_TD3) algorithm and constructs a high-fidelity simulation environment based on GEBCO bathymetric data and CMEMS ocean current data. The path planning problem is formulated as a Markov Decision Process (MDP), where the state space incorporates multi-beam radar perception, ocean current disturbances, and relative goal information, while the action space outputs continuous thrust and rudder commands subject to vehicle dynamics constraints. The proposed framework integrates a risk-aware hybrid safety decision architecture, a Trajectory Predictor Network (TPN), a Curvature-driven Advantage-based Prioritized Experience Replay (CDA-PER) mechanism, and an uncertainty-aware conservative Q-learning strategy to enhance navigation safety, sample efficiency, and policy stability. Comprehensive simulations demonstrate that, compared with baseline deep reinforcement learning methods, the proposed approach achieves faster convergence, improved stability, and competitive path efficiency while consistently maintaining sufficient obstacle clearance and millisecond-level inference latency, validating its effectiveness and practical feasibility for safe USV navigation in realistic dynamic marine environments.
(This article belongs to the Section Navigation and Positioning)
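Prioritized experience replay of the kind CDA-PER builds on samples transitions with probability proportional to a power of their priority. The sketch below shows only the generic sampling rule, with a toy linear scan (real implementations use a sum-tree); the curvature-driven advantage priorities themselves are not modeled.

```python
import random

def sample_index(priorities, alpha=0.6, rng=random):
    # Generic PER sampling: P(i) proportional to priority_i ** alpha.
    # Linear scan for clarity; production code uses a sum-tree for
    # O(log n) draws. alpha interpolates uniform (0) to greedy (1).
    weights = [p ** alpha for p in priorities]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

random.seed(0)
counts = [0, 0]
for _ in range(1000):
    counts[sample_index([1.0, 100.0])] += 1
# The high-priority transition dominates the draws.
```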

25 pages, 6530 KB  
Article
Reinforcement Learning-Based Energy Storage Management for Microgrid Power Exchanges
by Federico Perquoti, Davide Milillo, Lorenzo Sabino, Michele Quercio, Francesco Riganti Fulginei, George Cristian Lazaroiu and Fabio Crescimbini
Eng 2026, 7(3), 126; https://doi.org/10.3390/eng7030126 - 9 Mar 2026
Abstract
Intelligent energy management systems are increasingly necessary for integrating renewable energy sources within microgrids. This paper investigates the application of a reinforcement learning (RL) neural network to optimize the operation of an electrochemical storage system in an environment composed of residential loads, commercial loads, and a photovoltaic plant, all connected to the grid. A dataset combining market purchase prices, photovoltaic generation, and residential and commercial load profiles was generated and used to train a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent with the primary goal of deriving a reliable and adaptive post-training policy capable of maximizing photovoltaic self-consumption, minimizing operational costs through intelligent price arbitrage, and ensuring strict compliance with battery physical constraints. The system state includes battery state of charge, load demand, PV generation, and normalized market purchase prices, whereas the action represents the battery’s charge/discharge power, which is restricted from exporting energy to the grid. Results show that the agent learns to effectively store surplus PV energy and minimize grid dependency through dynamic charge management. The proposed approach outperforms strategies based solely on storing surplus self-generated energy and maintains the battery within safe operational limits. Tests with previously unseen data demonstrate robust, adaptive, and economically efficient energy management, highlighting the potential of reinforcement learning in intelligent energy systems.

25 pages, 1793 KB  
Article
Computing Efficiency Optimization for UAV-Enabled Integrated Sensing, Computing, and Communication: A Memory-Based Deep Reinforcement Learning Approach
by Honghao Qi and Muqing Wu
Drones 2026, 10(3), 180; https://doi.org/10.3390/drones10030180 - 6 Mar 2026
Abstract
Unmanned aerial vehicles (UAVs) have emerged as a promising platform for supporting integrated sensing, computing, and communication (ISCC) functionality in Internet of Things (IoT) applications. This paper investigates a UAV-enabled ISCC network, where the UAV performs radar sensing and onboard edge computing with the computational assistance of ground access points (APs). Given the limited onboard energy, ensuring energy-efficient operation of UAVs is crucial to support the long-term sustainability of network performance. In this paper, we define computing efficiency as the ratio between the total number of successfully processed computational bits and the overall UAV energy consumption, under the constraint of a required sensing threshold. To maximize this performance metric, this paper jointly optimizes the beamforming vector, the CPU frequency, and the trajectory of the UAV. This optimization problem is modeled as a Markov decision process (MDP) and solved using a deep reinforcement learning (DRL) approach based on a memory mechanism. Specifically, a long short-term memory (LSTM) and twin delayed deep deterministic policy gradient (TD3)-based trajectory design and resource allocation (LTTDRA) algorithm is proposed. LSTM units are integrated into the actor and critic to effectively capture the temporal correlations in dynamic environments, thereby enhancing policy stability and accelerating algorithm convergence. The reward function is meticulously designed to alleviate sparse-penalty effects and learn high-performance strategies in complex environments with multiple constraints. Extensive simulations are conducted under various settings and network scenarios, and the results consistently indicate that the proposed approach substantially outperforms the baseline schemes.
(This article belongs to the Special Issue Advances in UAV Networks Towards 6G)

25 pages, 5606 KB  
Article
Health-Aware Differentiated Energy Management for Multi-Stack Fuel Cell Hybrid Power Systems on Ships
by Lin Zhu, Yancheng Liu, Haohao Guo and Siyuan Liu
J. Mar. Sci. Eng. 2026, 14(5), 460; https://doi.org/10.3390/jmse14050460 - 28 Feb 2026
Abstract
This study proposes a health-aware energy management strategy based on the twin delayed deep deterministic policy gradient (TD3) algorithm for hybrid fuel cell/battery-powered ships. Unlike traditional approaches that treat multiple fuel cell stacks as homogeneous units, this strategy innovatively implements differentiated power allocation based on the real-time state of health of each stack. The research first validates the superiority of the TD3 framework over the deep Q-learning framework at the algorithmic level. Further comparative experiments conducted across three scenarios with varying degrees of state of health differences show that, compared to the TD3 baseline strategy employing average power allocation, the health-aware differentiated TD3 strategy significantly reduces the total voyage cost of the system, with the cost-saving effect becoming more pronounced as the state of health disparity between stacks increases. Additionally, by incorporating rule-based constraints, the convergence speed of the TD3 algorithm is effectively enhanced, improving its feasibility for real-time control. Tests under dynamic and fluctuating load conditions further confirm the strategy’s effectiveness and applicability. In summary, the health-aware TD3 strategy proposed in this study not only provides an efficient and reliable energy management solution for hybrid-powered ships but also promotes the application of machine learning in the field of ship energy management.
(This article belongs to the Section Ocean Engineering)

19 pages, 3606 KB  
Article
Autonomous Navigation of an Unmanned Underwater Vehicle via Safe Reinforcement Learning and Active Disturbance Rejection Control
by Qinze Chen, Yun Cheng, Yinlong Yuan and Liang Hua
J. Mar. Sci. Eng. 2026, 14(5), 425; https://doi.org/10.3390/jmse14050425 - 25 Feb 2026
Abstract
A two-layer control framework for unmanned underwater vehicle (UUV) navigation is proposed, combining a lower-layer active disturbance rejection controller (ADRC) with an upper-layer safe reinforcement learning (RL) policy for obstacle-avoidance navigation. The lower layer, utilizing ADRC, ensures high tracking accuracy and effective disturbance rejection, while the upper layer integrates the twin delayed deep deterministic policy gradient (TD3) algorithm, combined with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR). The method is evaluated in two simulation studies: (i) velocity and attitude control to assess tracking and disturbance rejection, and (ii) obstacle-avoidance navigation to assess learning efficiency, trajectory smoothness, and safety-related metrics. Simulation results show that ADRC achieves faster tracking and stronger disturbance rejection than a conventional proportional–integral–derivative (PID) controller. Moreover, the proposed TD3 + QP + SR scheme exhibits faster learning, smoother trajectories, and improved safety performance compared with RL baselines. These results indicate that the proposed framework enables efficient and safe UUV navigation in simulation scenarios with obstacles and disturbances.
(This article belongs to the Section Ocean Engineering)
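A CBF-based QP safety filter minimally modifies the RL action: it solves min ||u − u_RL||² subject to the barrier constraint. With one scalar control and one linear constraint the QP has a closed form, which the sketch below uses; deriving the coefficients `a` and `b` from the barrier h(x) and the UUV dynamics is omitted, and all names are illustrative assumptions.

```python
def cbf_filter(u_rl, a, b):
    # Minimal CBF-QP safety filter for a scalar control:
    #   minimize (u - u_rl)^2  subject to  a * u <= b.
    # If the RL action already satisfies the constraint it passes
    # through unchanged; otherwise it is projected onto the boundary
    # a * u = b (closed-form solution of the one-constraint QP).
    if a * u_rl <= b:
        return u_rl
    return b / a

safe = cbf_filter(u_rl=0.9, a=1.0, b=0.5)   # unsafe action is clipped
kept = cbf_filter(u_rl=0.2, a=1.0, b=0.5)   # safe action passes through
```

The filter is applied after the policy at every control step, so exploration never drives the vehicle out of the certified safe set; full CBF-QP formulations handle vector controls and multiple constraints with a QP solver.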
