Search Results (105)

Search Parameters:
Keywords = twin delayed deep deterministic policy gradient (TD3)

24 pages, 2070 KiB  
Article
Reinforcement Learning-Based Finite-Time Sliding-Mode Control in a Human-in-the-Loop Framework for Pediatric Gait Exoskeleton
by Matthew Wong Sang and Jyotindra Narayan
Machines 2025, 13(8), 668; https://doi.org/10.3390/machines13080668 - 30 Jul 2025
Viewed by 158
Abstract
Rehabilitation devices such as actuated lower-limb exoskeletons can provide essential mobility assistance for pediatric patients with gait impairments. Enhancing their control systems under conditions of user variability and dynamic disturbances remains a significant challenge, particularly in active-assist modes. This study presents a human-in-the-loop control architecture for a pediatric lower-limb exoskeleton, combining outer-loop admittance control with robust inner-loop trajectory tracking via a non-singular terminal sliding-mode (NSTSM) controller. Designed for active-assist gait rehabilitation in children aged 8–12 years, the exoskeleton dynamically responds to user interaction forces while ensuring finite-time convergence under system uncertainties. To enhance adaptability, we augment the inner-loop control with a twin delayed deep deterministic policy gradient (TD3) reinforcement learning framework. The actor–critic RL agent tunes the NSTSM gains in real time, enabling personalized, model-free adaptation to subject-specific gait dynamics and external disturbances. Numerical simulations show improved trajectory tracking, with RMSE reductions of 27.82% (hip) and 5.43% (knee) and IAE improvements of 40.85% and 10.20%, respectively, over the baseline NSTSM controller. The proposed approach also reduced peak interaction torques across all joints, suggesting more compliant and comfortable assistance for users. While minor degradation is observed at the ankle joint, the TD3-NSTSM controller demonstrates improved responsiveness and stability, particularly in high-load joints. This research advances pediatric gait rehabilitation using RL-enhanced control, offering improved mobility support and adaptive rehabilitation outcomes.
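The abstract describes a TD3 agent adapting sliding-mode gains online. As a rough illustration of that coupling, the sketch below maps a bounded RL action onto per-joint NSTSM gains and evaluates a standard non-singular terminal sliding surface; the gain names, ranges, and reaching law are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

NOMINAL_GAINS = {"beta": 2.0, "K": 5.0}   # hypothetical nominal NSTSM gains
P, Q = 5, 3                                # odd integers with 1 < p/q < 2

def nstsm_torque(e, e_dot, action):
    """Compute one joint's control torque with RL-scaled gains.

    e, e_dot : tracking error and its derivative
    action   : TD3 actor output in [-1, 1]^2, mapped to gain multipliers
    """
    beta = NOMINAL_GAINS["beta"] * (1.0 + 0.5 * action[0])  # scaled to [0.5x, 1.5x]
    K = NOMINAL_GAINS["K"] * (1.0 + 0.5 * action[1])
    # Non-singular terminal sliding surface: s = e + (1/beta) * |e_dot|^(p/q) * sign(e_dot)
    s = e + (1.0 / beta) * np.sign(e_dot) * np.abs(e_dot) ** (P / Q)
    # Smooth reaching law (tanh in place of sign) to limit chattering
    return -K * np.tanh(10.0 * s)

print(nstsm_torque(e=0.05, e_dot=-0.1, action=np.array([0.2, -0.3])))
```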

24 pages, 1147 KiB  
Article
A Channel-Aware AUV-Aided Data Collection Scheme Based on Deep Reinforcement Learning
by Lizheng Wei, Minghui Sun, Zheng Peng, Jingqian Guo, Jiankuo Cui, Bo Qin and Jun-Hong Cui
J. Mar. Sci. Eng. 2025, 13(8), 1460; https://doi.org/10.3390/jmse13081460 - 30 Jul 2025
Viewed by 69
Abstract
Underwater sensor networks (UWSNs) play a crucial role in subsea operations such as marine exploration and environmental monitoring. A major challenge for UWSNs is achieving effective, energy-efficient data collection, particularly in deep-sea mining, where energy limitations and long-term deployment are key concerns. This study introduces a Channel-Aware AUV-Aided Data Collection Scheme (CADC) that utilizes deep reinforcement learning (DRL) to improve data collection efficiency. It features an innovative underwater node traversal algorithm that accounts for the unique propagation characteristics of underwater acoustic signals, along with a DRL-based path planning approach that mitigates propagation losses and enhances the energy efficiency of data collection. CADC achieves a 71.2% increase in energy efficiency compared with existing clustering methods, a 0.08% improvement over the Deep Deterministic Policy Gradient (DDPG), 2.3% faster convergence than the Twin Delayed DDPG (TD3), and an energy cost of only 22.2% of that required by the TSP-based baseline. By combining channel-aware traversal with adaptive DRL navigation, CADC effectively optimizes data collection and energy consumption in underwater environments.
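The channel-aware element can be pictured as a visiting cost that mixes travel distance with acoustic path loss. The sketch below uses the classic Thorp absorption formula plus a spreading term; the weighting, spreading factor, and frequency are illustrative assumptions, not CADC's actual cost function.

```python
import numpy as np

def thorp_absorption_db_per_km(f_khz):
    # Thorp's empirical absorption coefficient (dB/km, frequency in kHz)
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def visit_cost(dist_km, f_khz=20.0, k_spread=1.5, w=0.5):
    # Combine spreading loss and absorption into a per-node cost; lower is better.
    path_loss_db = (k_spread * 10 * np.log10(dist_km * 1000)
                    + thorp_absorption_db_per_km(f_khz) * dist_km)
    return w * dist_km + (1 - w) * path_loss_db

print(visit_cost(dist_km=1.2))   # candidate node 1.2 km away
```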

28 pages, 2959 KiB  
Article
Trajectory Prediction and Decision Optimization for UAV-Assisted VEC Networks: An Integrated LSTM-TD3 Framework
by Jiahao Xie and Hao Hao
Information 2025, 16(8), 646; https://doi.org/10.3390/info16080646 - 29 Jul 2025
Viewed by 108
Abstract
With the rapid development of intelligent transportation systems (ITSs) and the Internet of Things (IoT), vehicle-mounted edge computing (VEC) networks face a growing volume of computation-intensive, latency-sensitive tasks. UAV-assisted VEC networks supplement the coverage of ground infrastructure by introducing mobile edge servers, but decision-making still lags in highly dynamic environments. This paper proposes a deep reinforcement learning framework based on long short-term memory (LSTM) trajectory prediction to optimize resource allocation in UAV-assisted VEC networks. By integrating vehicle trajectory prediction with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, the framework enables proactive computation offloading and UAV trajectory planning. Specifically, we design an LSTM network with an attention mechanism to predict future vehicle trajectories and integrate the predictions into the optimization decision-making process. We propose state smoothing and data augmentation techniques to improve training stability and design a multi-objective optimization model that incorporates the Age of Information (AoI), energy consumption, and resource leasing costs. Simulation results show that, compared with existing methods, the proposed method significantly reduces the total system cost, improves information freshness, and exhibits better environmental adaptability and convergence performance under various network conditions.
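As a concrete picture of the attention-augmented LSTM predictor described above, here is a minimal PyTorch sketch that encodes past (x, y) vehicle positions and regresses the next few positions; the layer sizes, horizon, and additive attention form are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LSTMTrajPredictor(nn.Module):
    def __init__(self, hidden=64, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)           # additive attention score per step
        self.head = nn.Linear(hidden, 2 * horizon) # regress H future (x, y) pairs
        self.horizon = horizon

    def forward(self, past_xy):                    # past_xy: (B, T, 2)
        h_seq, _ = self.lstm(past_xy)              # (B, T, hidden)
        w = torch.softmax(self.attn(h_seq), dim=1) # (B, T, 1) attention weights
        ctx = (w * h_seq).sum(dim=1)               # attention-weighted context
        return self.head(ctx).view(-1, self.horizon, 2)

model = LSTMTrajPredictor()
pred = model(torch.randn(8, 20, 2))                # 8 vehicles, 20 past steps
print(pred.shape)                                  # torch.Size([8, 5, 2])
```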

19 pages, 2893 KiB  
Article
Reactive Power Optimization of a Distribution Network Based on Graph Security Reinforcement Learning
by Xu Zhang, Xiaolin Gui, Pei Sun, Xing Li, Yuan Zhang, Xiaoyu Wang, Chaoliang Dang and Xinghua Liu
Appl. Sci. 2025, 15(15), 8209; https://doi.org/10.3390/app15158209 - 23 Jul 2025
Viewed by 190
Abstract
With the increasing integration of renewable energy, the secure operation of distribution networks faces significant challenges, such as voltage limit violations and increased power losses. To address reactive power and voltage security under renewable generation uncertainty, this paper proposes a graph-based security reinforcement learning method. First, a graph-enhanced neural network is designed to extract both topological and node-level features from the distribution network. Then, a primal-dual approach is introduced to incorporate voltage security constraints into the agent's critic network by constructing a cost critic to guide safe policy learning. Finally, a dual-critic framework is adopted to train the actor network and derive an optimal policy. Experiments conducted on real load profiles demonstrated that the proposed method reduced the voltage violation rate to 0%, compared with 4.92% for the Deep Deterministic Policy Gradient (DDPG) algorithm and 5.14% for the Twin Delayed DDPG (TD3) algorithm. Moreover, the average node voltage deviation was kept within 0.0073 per unit.
(This article belongs to the Special Issue IoT Technology and Information Security)
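The primal-dual construction can be summarized in a few lines: the actor maximizes the reward critic while a Lagrange multiplier weights the cost critic, and the multiplier grows whenever the violation budget is exceeded. The sketch below is a generic primal-dual update; the names (q_reward, q_cost, COST_LIMIT) and step sizes are illustrative assumptions, not this paper's implementation.

```python
import torch

lam = torch.tensor(1.0)          # Lagrange multiplier (dual variable)
LAM_LR, COST_LIMIT = 1e-3, 0.0   # hypothetical budget: zero expected violations

def actor_loss(q_reward, q_cost):
    # Primal step: maximize reward value while penalizing expected violation cost
    return (-q_reward + lam * q_cost).mean()

def dual_update(avg_episode_cost):
    # Dual step: lambda rises while the constraint is violated, decays otherwise
    global lam
    lam = torch.clamp(lam + LAM_LR * (avg_episode_cost - COST_LIMIT), min=0.0)

loss = actor_loss(torch.randn(32), torch.rand(32))  # batch of critic outputs
dual_update(avg_episode_cost=0.2)
print(loss.item(), lam.item())
```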

33 pages, 3525 KiB  
Article
Investigation into the Performance Enhancement and Configuration Paradigm of Partially Integrated RL-MPC System
by Wanqi Guo and Shigeyuki Tateno
Mathematics 2025, 13(15), 2341; https://doi.org/10.3390/math13152341 - 22 Jul 2025
Viewed by 237
Abstract
This paper improves a partially integrated reinforcement learning-model predictive control (RL-MPC) system by introducing the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Unlike traditional frameworks, which completely substitute the MPC prediction model, here an RL agent refines predictions through feedback correction, maintaining interpretability while improving robustness. The study details two configuration paradigms, decoupled (offline policy application) and coupled (online policy update), and tests their effectiveness in trajectory tracking tasks in both simulation and real-world experiments. A decoupled framework based on TD3 showed significant improvements in control performance over the other implemented paradigms, especially in the Integral of Time-weighted Absolute Error (ITAE) and mean absolute error (MAE). This work also illustrates the advantages of partial integration in balancing adaptability and stability, making it suitable for real-time applications in robotics.
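A minimal sketch of the "partial integration" idea, assuming the RL agent adds a bounded residual to the MPC model's one-step prediction rather than replacing the model; the toy linear model and the correction scale are illustrative, not the paper's plant.

```python
import numpy as np

def predict_nominal(x, u):
    # Placeholder linear prediction model standing in for the MPC model
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    return A @ x + (B @ u).ravel()

def corrected_prediction(x, u, rl_agent):
    # RL policy outputs a bounded residual added to the nominal prediction,
    # so the MPC model stays interpretable and the agent only corrects it.
    residual = rl_agent(np.concatenate([x, u]))   # e.g. a TD3 actor, output in [-1, 1]
    return predict_nominal(x, u) + 0.05 * residual

# Dummy agent for demonstration: bounded function of the state
print(corrected_prediction(np.zeros(2), np.array([1.0]), lambda s: np.tanh(s[:2])))
```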

20 pages, 3000 KiB  
Article
NRNH-AR: A Small Robotic Agent Using Tri-Fold Learning for Navigation and Obstacle Avoidance
by Carlos Vasquez-Jalpa, Mariko Nakano, Martin Velasco-Villa and Osvaldo Lopez-Garcia
Appl. Sci. 2025, 15(15), 8149; https://doi.org/10.3390/app15158149 - 22 Jul 2025
Viewed by 244
Abstract
We propose a tri-fold learning algorithm, called Neuroevolution of Hybrid Neural Networks in a Robotic Agent (acronym in Spanish, NRNH-AR), based on deep reinforcement learning (DRL) with self-supervised learning (SSL) and unsupervised learning (USL) steps, specifically designed for a small autonomous navigation robot with limited resources operating in constrained physical environments. The algorithm was evaluated in four critical aspects: computational cost, learning stability, required memory size, and operation speed. The results show that the performance of NRNH-AR is within the ranges of the Deep Q Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Thanks to its sequence of SSL, USL, and DRL stages, the algorithm optimizes the use of resources and demonstrates adaptability in dynamic environments, a crucial aspect of navigation robotics. By integrating computer vision techniques based on a Convolutional Neural Network (CNN), the algorithm rapidly interprets visual observations of the environment and detects a specific object while avoiding obstacles.

25 pages, 14579 KiB  
Article
A Hybrid Path Planning Framework Integrating Deep Reinforcement Learning and Variable-Direction Potential Fields
by Yunfei Bi and Xi Fang
Mathematics 2025, 13(14), 2312; https://doi.org/10.3390/math13142312 - 20 Jul 2025
Viewed by 393
Abstract
To address local optimality in path planning for logistics robots using APF (artificial potential field) methods, and the stagnation problem when encountering trap obstacles, this paper proposes VDPF (variable-direction potential field) combined with RL (reinforcement learning). First, an obstacle classification algorithm based on obstacle distribution is designed, enabling the robot to select an obstacle avoidance strategy appropriate to the obstacle type. Second, the attractive and repulsive forces in APF are separated, and the direction of the repulsive force is modified to break the local optimum, allowing the robot to focus on the current obstacle avoidance task. Finally, the improved APF is integrated with the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm, and a weight factor is introduced to adjust the forces acting on the robot. By sacrificing a certain level of safety for a larger exploration space, the robot is guided to escape local optima and trap regions. Experimental results show that the improved algorithm effectively mitigates trajectory oscillation and efficiently solves the local optimum and trap obstacle problems of the APF method. Compared with the APF-TD3 algorithm in scenarios with five obstacles, the proposed algorithm reduces the GS (Global Safety) metric by 8.6% and shortens the path length by 8.3%; in scenarios with ten obstacles, it reduces the GS by 29.8% and shortens the path length by 9.7%.
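The weight factor mentioned above can be read as a convex mix between the APF force direction and the TD3 action. The sketch below shows one such mixing rule; the normalization and the value of w are assumptions, not the paper's exact scheme.

```python
import numpy as np

def blended_velocity(f_att, f_rep, a_td3, w=0.6):
    """Blend APF forces with the learned policy's action.

    f_att, f_rep : attractive/repulsive APF forces (2-D vectors)
    a_td3        : TD3 policy action (2-D velocity command in [-1, 1]^2)
    w            : weight factor; larger w trusts the learned policy more,
                   trading safety margin for exploration space
    """
    f_apf = f_att + f_rep
    f_apf = f_apf / (np.linalg.norm(f_apf) + 1e-8)   # unit APF direction
    v = (1 - w) * f_apf + w * a_td3
    return v / (np.linalg.norm(v) + 1e-8)            # unit motion direction

print(blended_velocity(np.array([1.0, 0.0]), np.array([0.0, 0.4]),
                       np.array([0.5, -0.2])))
```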

27 pages, 28182 KiB  
Article
Addressing Local Minima in Path Planning for Drones with Reinforcement Learning-Based Vortex Artificial Potential Fields
by Boyi Xiao, Lujun Wan, Xueyan Han, Zhilong Xi, Chenbo Ding and Qiang Li
Machines 2025, 13(7), 600; https://doi.org/10.3390/machines13070600 - 11 Jul 2025
Viewed by 194
Abstract
In complex environments, autonomous navigation for quadrotor drones presents challenges in obstacle avoidance and path planning. Traditional artificial potential field (APF) methods are plagued by issues such as getting stuck in local minima and inadequate handling of dynamic obstacles. This paper introduces a layered obstacle avoidance structure that merges vortex artificial potential fields (VAPF) with reinforcement learning (RL) for motion control. The approach dynamically adjusts the target position through the VAPF, strategically guiding the drone to avoid obstacles indirectly, and employs the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to train the motion controller. Simulation experiments demonstrate that incorporating the VAPF effectively mitigates the local minima issue, significantly enhances the success rate of drone navigation, reduces the average arrival time and the number of sharp turns, and yields smoother paths. This solution combines the flexibility of VAPF methods with the precision of RL-based motion control, offering an effective strategy for autonomous navigation of quadrotor drones in complex environments.
(This article belongs to the Special Issue Intelligent Control Techniques for Unmanned Aerial Vehicles)
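Vortex potential fields are commonly implemented by rotating the repulsive gradient by 90 degrees inside the obstacle's influence radius, turning a push-back (which can trap the drone in a local minimum) into a circulation around the obstacle. A generic 2-D sketch with illustrative gains follows; it is not this paper's exact field.

```python
import numpy as np

def vortex_force(p, p_obs, d0=2.0, k=1.0, clockwise=True):
    """Tangential (vortex) force at position p due to an obstacle at p_obs.

    d0 : influence radius; k : repulsion gain (both illustrative)
    """
    d_vec = p - p_obs
    d = np.linalg.norm(d_vec)
    if d >= d0:                    # outside the influence radius: no force
        return np.zeros(2)
    # Classic repulsive gradient (points away from the obstacle)
    f_rep = k * (1.0 / d - 1.0 / d0) * d_vec / d**3
    # Rotate 90 degrees so the drone circulates instead of stalling
    R = np.array([[0, 1], [-1, 0]]) if clockwise else np.array([[0, -1], [1, 0]])
    return R @ f_rep

print(vortex_force(p=np.array([1.0, 0.5]), p_obs=np.zeros(2)))
```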

19 pages, 5332 KiB  
Article
Adaptive Control Strategy for the PI Parameters of Modular Multilevel Converters Based on Dual-Agent Deep Reinforcement Learning
by Jiale Liu, Weide Guan, Yongshuai Lu and Yang Zhou
Electronics 2025, 14(11), 2270; https://doi.org/10.3390/electronics14112270 - 31 May 2025
Viewed by 467
Abstract
As renewable energy sources are integrated into power grids on a large scale, modular multilevel converter-high voltage direct current (MMC-HVDC) systems face two significant challenges: traditional PI (proportional-integral) controllers have limited dynamic regulation capability due to their fixed parameters, while improved PI controllers are difficult to implement because of the complexity of their control strategies. This article proposes a dual-agent adaptive control framework based on the twin delayed deep deterministic policy gradient (TD3) algorithm. The framework dynamically adjusts the PI parameters for both voltage and current dual-loop control and capacitor voltage balancing, using a collaboratively optimized agent architecture that does not rely on complex control logic or precise mathematical models. Simulation results demonstrate that, compared with fixed-parameter PI controllers, the proposed method significantly reduces DC voltage regulation time while achieving precise dynamic balance control of capacitor voltage and effective suppression of circulating current, thereby notably enhancing system stability and dynamic response. This approach offers a new solution for dynamic optimization control in MMC-HVDC systems.
(This article belongs to the Section Power Electronics)
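A typical way to realize such adaptive tuning is to let each TD3 agent emit normalized actions that are rescaled into plant-safe PI gain ranges at every decision step. The sketch below shows that mapping; the gain ranges are hypothetical, not values from this article.

```python
import numpy as np

KP_RANGE = (0.1, 10.0)   # assumed safe range for the proportional gain
KI_RANGE = (1.0, 500.0)  # assumed safe range for the integral gain

def action_to_pi_gains(action):
    """Map a TD3 actor output in [-1, 1]^2 to (Kp, Ki) within safe ranges."""
    scale = lambda a, lo, hi: lo + (a + 1.0) * 0.5 * (hi - lo)
    return scale(action[0], *KP_RANGE), scale(action[1], *KI_RANGE)

kp, ki = action_to_pi_gains(np.array([0.0, -0.5]))
print(f"Kp={kp:.2f}, Ki={ki:.1f}")
```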

23 pages, 3540 KiB  
Article
A Low-Carbon Economic Scheduling Strategy for Multi-Microgrids with Communication Mechanism-Enabled Multi-Agent Deep Reinforcement Learning
by Lei Nie, Bo Long, Meiying Yu, Dawei Zhang, Xiaolei Yang and Shi Jing
Electronics 2025, 14(11), 2251; https://doi.org/10.3390/electronics14112251 - 31 May 2025
Cited by 1 | Viewed by 477
Abstract
To facilitate power system decarbonization, optimizing clean energy integration has emerged as a critical pathway toward sustainable power infrastructure. This study addresses the multi-timescale operational challenges inherent in power networks with high renewable penetration, proposing a stochastic dynamic programming framework that synergizes intraday microgrid dispatch with a multi-phase carbon cost calculation mechanism. A probabilistic carbon flux quantification model is developed, incorporating source–load carbon flow tracing and nonconvex carbon pricing dynamics to strengthen environmental–economic co-optimization constraints. The spatiotemporally coupled multi-microgrid (MMG) coordination problem is reformulated as a continuous state-action Markov game governed by stochastic differential Stackelberg game principles, and a communication mechanism-enabled multi-agent twin-delayed deep deterministic policy gradient (CMMA-TD3) algorithm is implemented to reach Pareto-optimal solutions through cyber–physical collaboration. Measurements on an MMG comprising three microgrids show that the proposed approach reduces operating costs by 61.59% and carbon emissions by 27.95% compared with the least effective benchmark solution.

28 pages, 6914 KiB  
Article
Guided Reinforcement Learning with Twin Delayed Deep Deterministic Policy Gradient for a Rotary Flexible-Link System
by Carlos Saldaña Enderica, José Ramon Llata and Carlos Torre-Ferrero
Robotics 2025, 14(6), 76; https://doi.org/10.3390/robotics14060076 - 31 May 2025
Viewed by 1274
Abstract
This study proposes a robust methodology for vibration suppression and trajectory tracking in rotary flexible-link systems by leveraging guided reinforcement learning (GRL). The approach integrates the twin delayed deep deterministic policy gradient (TD3) algorithm with a linear quadratic regulator (LQR) acting as a guiding controller during training. Flexible-link mechanisms, common in advanced robotics and aerospace systems, exhibit oscillatory behavior that complicates precise control. To address this, the system is first identified from experimental input-output data from a Quanser® virtual plant, yielding an accurate state-space representation suitable for simulation-based policy learning. The hybrid control strategy enhances sample efficiency and accelerates convergence by incorporating LQR-generated trajectories during TD3 training. Internally, the TD3 agent benefits from twin critics, delayed policy updates, and target action smoothing, which collectively improve learning stability and reduce overestimation bias. Comparative results show that the guided TD3 controller outperforms conventional LQR, fuzzy logic, neural network, and GA-LQR approaches in vibration damping, transient response, and robustness. Although the controller was validated on a high-fidelity digital twin, it has not yet been deployed on the physical plant; future work will focus on real-time implementation and structural robustness testing under parameter uncertainty. Overall, this research demonstrates that guided reinforcement learning can yield stable, interpretable policies that comply with classical control criteria, offering a scalable and generalizable framework for intelligent control of flexible mechanical systems.
(This article belongs to the Section Industrial Robots and Automation)
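The three TD3 ingredients named in this abstract (twin critics, delayed policy updates, target action smoothing) can be stated compactly. Below is a generic sketch of the TD3 target computation in PyTorch; the hyperparameters are standard TD3 defaults, not values from this study.

```python
import torch

GAMMA, SIGMA, CLIP, POLICY_DELAY = 0.99, 0.2, 0.5, 2   # standard TD3 defaults

def td3_targets(q1_tgt, q2_tgt, actor_tgt, next_s, r, done):
    # 1) Target action smoothing: clipped Gaussian noise on the target action
    noise = (torch.randn_like(actor_tgt(next_s)) * SIGMA).clamp(-CLIP, CLIP)
    a_next = (actor_tgt(next_s) + noise).clamp(-1.0, 1.0)
    # 2) Twin critics: take the minimum to curb overestimation bias
    q_next = torch.min(q1_tgt(next_s, a_next), q2_tgt(next_s, a_next))
    return r + GAMMA * (1.0 - done) * q_next

# 3) Delayed policy updates: the actor and target nets update only every
#    POLICY_DELAY critic steps, e.g. `if step % POLICY_DELAY == 0: ...`

# Toy check with stand-in networks (batch of 4, 3-D state, 1-D action)
actor = lambda s: torch.tanh(s[:, :1])
q = lambda s, a: s.sum(dim=1, keepdim=True) + a
print(td3_targets(q, q, actor, torch.randn(4, 3),
                  torch.zeros(4, 1), torch.zeros(4, 1)).shape)
```

In the guided setting the study describes, LQR-generated trajectories would additionally seed the training data so the agent starts from sensible behavior rather than random exploration.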

37 pages, 13864 KiB  
Article
LSTM-Enhanced Deep Reinforcement Learning for Robust Trajectory Tracking Control of Skid-Steer Mobile Robots Under Terra-Mechanical Constraints
by Jose Manuel Alcayaga, Oswaldo Anibal Menéndez, Miguel Attilio Torres-Torriti, Juan Pablo Vásconez, Tito Arévalo-Ramirez and Alvaro Javier Prado Romo
Robotics 2025, 14(6), 74; https://doi.org/10.3390/robotics14060074 - 29 May 2025
Viewed by 2164
Abstract
Autonomous navigation in mining environments is challenged by complex wheel–terrain interaction, traction losses caused by slip dynamics, and sensor limitations. This paper investigates the effectiveness of Deep Reinforcement Learning (DRL) techniques for trajectory tracking control of skid-steer mobile robots operating under terra-mechanical constraints. Four state-of-the-art DRL algorithms, Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor–Critic (SAC), are evaluated for their ability to generate stable, adaptive control policies under varying environmental conditions. To address the inherent partial observability of real-world navigation, this study integrates Long Short-Term Memory (LSTM) networks into the DRL-based controllers, allowing the control agents to retain and exploit temporal dependencies to infer unobservable system states. The agents were trained and tested in simulation and then assessed in field experiments on uneven terrain with dynamic model parameter changes that cause traction losses in mining environments, across trajectory tracking tasks including lemniscate and square reference trajectories. Memory augmentation strengthens the robustness and adaptability of the DRL agents, enabling better generalization than their baseline counterparts while significantly improving tracking performance: LSTM-based controllers reduced tracking errors by 10%, 74%, 21%, and 37% for DDPG-LSTM, PPO-LSTM, TD3-LSTM, and SAC-LSTM, respectively, compared with their non-recurrent counterparts. Furthermore, DDPG-LSTM and TD3-LSTM reduced control effort, measured as the total variation of the control input, by 15% and 20%, respectively, relative to their baseline controllers. These findings provide valuable insights into the role of memory-augmented reinforcement learning for robust motion control in unstructured, high-uncertainty environments.
(This article belongs to the Section Intelligent Robots and Mechatronics)

36 pages, 11692 KiB  
Article
Integrating Model Predictive Control with Deep Reinforcement Learning for Robust Control of Thermal Processes with Long Time Delays
by Kevin Marlon Soza Mamani and Alvaro Javier Prado Romo
Processes 2025, 13(6), 1627; https://doi.org/10.3390/pr13061627 - 22 May 2025
Viewed by 1105
Abstract
Thermal processes with long, variable delays pose considerable difficulties due to unpredictable system dynamics and external disturbances, often resulting in diminished control effectiveness. This work presents a hybrid control strategy that combines deep reinforcement learning (DRL) with nonlinear model predictive control (NMPC) to improve robust control of a thermal process with a long time delay. In this approach, NMPC cost functions are formulated as learning functions for thermal tracking and disturbance rejection, while an actor–critic (AC) reinforcement learning agent dynamically adjusts control actions through an adaptive policy based on exploration and exploitation of real-time process data. Unlike conventional NMPC approaches, the proposed framework removes the need for predefined terminal cost tuning and strict constraint formulations at runtime, which are typically required to ensure robust stability. A comparative study evaluated NMPC against AC-based controllers built on policy gradient algorithms, namely the deep deterministic policy gradient (DDPG) and the twin delayed deep deterministic policy gradient (TD3). The proposed method was experimentally validated on a temperature control laboratory (TCLab) testbed featuring long, varying delays. Results demonstrate that the NMPC–AC hybrid maintains tracking performance comparable to NMPC while adding adaptability during tracking and further strengthening robustness against uncertainties and disturbances under dynamic conditions. These findings highlight the benefits of integrating DRL with NMPC to enhance reliability in thermal process control and optimize resource efficiency in thermal applications.
(This article belongs to the Section Process Control and Monitoring)
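Formulating the NMPC cost as the learning function amounts to rewarding the agent with the negative stage cost, so policy improvement and cost minimization coincide. A one-function sketch, with illustrative weights (not the paper's tuning):

```python
def stage_reward(y, y_ref, u, qw=1.0, rw=0.01):
    # Reward = negative quadratic NMPC stage cost: tracking error plus effort
    return -(qw * (y - y_ref) ** 2 + rw * u ** 2)

print(stage_reward(y=52.0, y_ref=50.0, u=30.0))   # e.g. one temperature step
```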

20 pages, 7183 KiB  
Article
A Two-Stage Strategy Integrating Gaussian Processes and TD3 for Leader–Follower Coordination in Multi-Agent Systems
by Xicheng Zhang, Bingchun Jiang, Fuqin Deng and Min Zhao
J. Sens. Actuator Netw. 2025, 14(3), 51; https://doi.org/10.3390/jsan14030051 - 14 May 2025
Viewed by 1283
Abstract
In mobile multi-agent systems (MASs), achieving effective leader–follower coordination under unknown dynamics poses significant challenges. This study proposes a two-stage cooperative strategy, GPTD3, that integrates Gaussian Processes (GPs) for modeling with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for policy optimization, aiming to enhance adaptability and multi-objective optimization. First, GPs are used to model the uncertain dynamics of the agents from sensor data, providing a stable, noise-free virtual environment for the first phase of TD3 policy network training. Second, a TD3-based compensation learning mechanism is introduced to reduce consensus errors among agents by incorporating the position states of the other agents. The approach also employs an enhanced dual-layer reward mechanism tailored to the different learning stages, ensuring robustness and improved convergence speed. Experimental results on a differential drive robot simulation demonstrate the superiority of this method over traditional controllers, and the TD3 compensation network further improves the cooperative reward among agents.
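Stage one, GP-based modeling, can be pictured with scikit-learn: fit a GP to logged (state, action) → next-state data, then use the posterior mean as the noise-free virtual transition function for TD3 pre-training. The toy 1-D dynamics below are purely illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))      # logged [state, action] samples
y = 0.9 * X[:, 0] + 0.2 * X[:, 1] + 0.01 * rng.standard_normal(200)  # toy dynamics

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# The GP posterior mean acts as the virtual environment's transition function;
# the predictive std could flag regions where the model is unreliable.
mean, std = gp.predict(np.array([[0.5, -0.2]]), return_std=True)
print(mean, std)
```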

24 pages, 1196 KiB  
Article
Integrated Guidance and Control for Strap-Down Flight Vehicle: A Deep Reinforcement Learning Approach
by Qinglong Zhang, Bin Zhao, Yifu Jiang, Jingyan Zhang and Jiale Zhang
Aerospace 2025, 12(5), 400; https://doi.org/10.3390/aerospace12050400 - 1 May 2025
Viewed by 341
Abstract
This paper proposes a three-dimensional (3D) deep reinforcement learning-based integrated guidance and control (DRLIGC) method for flight vehicles subject to the narrow field-of-view (FOV) constraint of a strap-down seeker. By leveraging the data-driven nature of deep reinforcement learning (DRL), the method mitigates the model dependencies that complicate integrated guidance and control (IGC) design, thereby addressing the inherent complexity of the IGC model. First, with suitable states and actions, the pitch and yaw channels of the six-degree-of-freedom (6-DOF) IGC model are modeled as Markov decision processes (MDPs). Second, a channel-by-channel progressive training method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed: the pitch- and yaw-channel agents are trained independently with TD3, which substantially reduces the complexity of training, while the roll channel is stabilized by the back-stepping method. Third, a comprehensive reward function is designed to enforce the narrow FOV constraint while enhancing target engagement capability, which also mitigates the sparse-reward issue to some extent. Monte Carlo (MC) and comparative simulations show that the proposed DRLIGC method effectively approaches the target while maintaining the narrow FOV constraint and exhibits good robustness.
(This article belongs to the Special Issue Integrated Guidance and Control for Aerospace Vehicles)
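A comprehensive reward of the kind described, dense shaping on closing rate plus a penalty when the look angle leaves the seeker's FOV, might look like the sketch below; the weights and FOV half-angle are illustrative assumptions, not the paper's design.

```python
import numpy as np

FOV_HALF_ANGLE = np.deg2rad(15.0)   # hypothetical narrow strap-down seeker FOV

def step_reward(range_prev, range_now, look_angle, hit=False):
    r = 10.0 * (range_prev - range_now)           # dense closing-rate shaping
    if abs(look_angle) > FOV_HALF_ANGLE:          # target leaving the seeker FOV
        r -= 50.0 * (abs(look_angle) - FOV_HALF_ANGLE)
    if hit:
        r += 100.0                                # sparse terminal bonus
    return r

print(step_reward(range_prev=1000.0, range_now=995.0, look_angle=np.deg2rad(18)))
```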
