Search Results (105)

Search Parameters:
Keywords = twin delayed deep deterministic policy gradient (TD3)

24 pages, 2070 KiB  
Article
Reinforcement Learning-Based Finite-Time Sliding-Mode Control in a Human-in-the-Loop Framework for Pediatric Gait Exoskeleton
by Matthew Wong Sang and Jyotindra Narayan
Machines 2025, 13(8), 668; https://doi.org/10.3390/machines13080668 - 30 Jul 2025
Viewed by 158
Abstract
Rehabilitation devices such as actuated lower-limb exoskeletons can provide essential mobility assistance for pediatric patients with gait impairments. Enhancing their control systems under conditions of user variability and dynamic disturbances remains a significant challenge, particularly in active-assist modes. This study presents a human-in-the-loop control architecture for a pediatric lower-limb exoskeleton, combining outer-loop admittance control with robust inner-loop trajectory tracking via a non-singular terminal sliding-mode (NSTSM) controller. Designed for active-assist gait rehabilitation in children aged 8–12 years, the exoskeleton dynamically responds to user interaction forces while ensuring finite-time convergence under system uncertainties. To enhance adaptability, we augment the inner-loop control with a twin delayed deep deterministic policy gradient (TD3) reinforcement learning framework. The actor–critic RL agent tunes the NSTSM gains in real time, enabling personalized, model-free adaptation to subject-specific gait dynamics and external disturbances. Numerical simulations show improved trajectory tracking, with RMSE reductions of 27.82% (hip) and 5.43% (knee) and IAE improvements of 40.85% and 10.20%, respectively, over the baseline NSTSM controller. The proposed approach also reduced peak interaction torques across all joints, suggesting more compliant and comfortable assistance for users. While minor degradation is observed at the ankle joint, the TD3-NSTSM controller demonstrates improved responsiveness and stability, particularly in high-load joints. This research advances pediatric gait rehabilitation using RL-enhanced control, offering improved mobility support and adaptive rehabilitation outcomes.
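The abstract describes a TD3 agent adapting sliding-mode gains online. As a rough illustration of that coupling, the sketch below maps a bounded RL action onto per-joint NSTSM gains and evaluates a standard non-singular terminal sliding surface; the gain names, ranges, and reaching law are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

NOMINAL_GAINS = {"beta": 2.0, "K": 5.0}   # hypothetical nominal NSTSM gains
P, Q = 5, 3                                # odd integers with 1 < p/q < 2

def nstsm_torque(e, e_dot, action):
    """Compute one joint's control torque with RL-scaled gains.

    e, e_dot : tracking error and its derivative
    action   : TD3 actor output in [-1, 1]^2, mapped to gain multipliers
    """
    beta = NOMINAL_GAINS["beta"] * (1.0 + 0.5 * action[0])  # scaled to [0.5x, 1.5x]
    K = NOMINAL_GAINS["K"] * (1.0 + 0.5 * action[1])
    # Non-singular terminal sliding surface: s = e + (1/beta) * |e_dot|^(p/q) * sign(e_dot)
    s = e + (1.0 / beta) * np.sign(e_dot) * np.abs(e_dot) ** (P / Q)
    # Smooth reaching law (tanh in place of sign) to limit chattering
    return -K * np.tanh(10.0 * s)

print(nstsm_torque(e=0.05, e_dot=-0.1, action=np.array([0.2, -0.3])))
```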

24 pages, 1147 KiB  
Article
A Channel-Aware AUV-Aided Data Collection Scheme Based on Deep Reinforcement Learning
by Lizheng Wei, Minghui Sun, Zheng Peng, Jingqian Guo, Jiankuo Cui, Bo Qin and Jun-Hong Cui
J. Mar. Sci. Eng. 2025, 13(8), 1460; https://doi.org/10.3390/jmse13081460 - 30 Jul 2025
Viewed by 69
Abstract
Underwater sensor networks (UWSNs) play a crucial role in subsea operations such as marine exploration and environmental monitoring. A major challenge for UWSNs is achieving effective, energy-efficient data collection, particularly in deep-sea mining, where energy limitations and long-term deployment are key concerns. This study introduces a Channel-Aware AUV-Aided Data Collection Scheme (CADC) that utilizes deep reinforcement learning (DRL) to improve data collection efficiency. It features an innovative underwater node traversal algorithm that accounts for the unique propagation characteristics of underwater acoustic signals, along with a DRL-based path planning approach that mitigates propagation losses and enhances the energy efficiency of data collection. CADC achieves a 71.2% increase in energy efficiency compared with existing clustering methods, a 0.08% improvement over the Deep Deterministic Policy Gradient (DDPG), 2.3% faster convergence than the Twin Delayed DDPG (TD3), and an energy cost of only 22.2% of that required by the TSP-based baseline. By combining channel-aware traversal with adaptive DRL navigation, CADC effectively optimizes data collection and energy consumption in underwater environments.
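The channel-aware element can be pictured as a visiting cost that mixes travel distance with acoustic path loss. The sketch below uses the classic Thorp absorption formula plus a spreading term; the weighting, spreading factor, and frequency are illustrative assumptions, not CADC's actual cost function.

```python
import numpy as np

def thorp_absorption_db_per_km(f_khz):
    # Thorp's empirical absorption coefficient (dB/km, frequency in kHz)
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def visit_cost(dist_km, f_khz=20.0, k_spread=1.5, w=0.5):
    # Combine spreading loss and absorption into a per-node cost; lower is better.
    path_loss_db = (k_spread * 10 * np.log10(dist_km * 1000)
                    + thorp_absorption_db_per_km(f_khz) * dist_km)
    return w * dist_km + (1 - w) * path_loss_db

print(visit_cost(dist_km=1.2))   # candidate node 1.2 km away
```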

28 pages, 2959 KiB  
Article
Trajectory Prediction and Decision Optimization for UAV-Assisted VEC Networks: An Integrated LSTM-TD3 Framework
by Jiahao Xie and Hao Hao
Information 2025, 16(8), 646; https://doi.org/10.3390/info16080646 - 29 Jul 2025
Viewed by 108
Abstract
With the rapid development of intelligent transportation systems (ITSs) and the Internet of Things (IoT), vehicle-mounted edge computing (VEC) networks face a growing volume of computation-intensive, latency-sensitive tasks. UAV-assisted VEC networks supplement the coverage of ground infrastructure by introducing mobile edge servers, but decision-making still lags in highly dynamic environments. This paper proposes a deep reinforcement learning framework based on long short-term memory (LSTM) trajectory prediction to optimize resource allocation in UAV-assisted VEC networks. By integrating vehicle trajectory prediction with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, the framework enables proactive computation offloading and UAV trajectory planning. Specifically, we design an LSTM network with an attention mechanism to predict future vehicle trajectories and integrate the predictions into the optimization decision-making process. We propose state smoothing and data augmentation techniques to improve training stability and design a multi-objective optimization model that incorporates the Age of Information (AoI), energy consumption, and resource leasing costs. Simulation results show that, compared with existing methods, the proposed method significantly reduces the total system cost, improves information freshness, and exhibits better environmental adaptability and convergence performance under various network conditions.
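As a concrete picture of the attention-augmented LSTM predictor described above, here is a minimal PyTorch sketch that encodes past (x, y) vehicle positions and regresses the next few positions; the layer sizes, horizon, and additive attention form are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LSTMTrajPredictor(nn.Module):
    def __init__(self, hidden=64, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)           # additive attention score per step
        self.head = nn.Linear(hidden, 2 * horizon) # regress H future (x, y) pairs
        self.horizon = horizon

    def forward(self, past_xy):                    # past_xy: (B, T, 2)
        h_seq, _ = self.lstm(past_xy)              # (B, T, hidden)
        w = torch.softmax(self.attn(h_seq), dim=1) # (B, T, 1) attention weights
        ctx = (w * h_seq).sum(dim=1)               # attention-weighted context
        return self.head(ctx).view(-1, self.horizon, 2)

model = LSTMTrajPredictor()
pred = model(torch.randn(8, 20, 2))                # 8 vehicles, 20 past steps
print(pred.shape)                                  # torch.Size([8, 5, 2])
```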

19 pages, 2893 KiB  
Article
Reactive Power Optimization of a Distribution Network Based on Graph Security Reinforcement Learning
by Xu Zhang, Xiaolin Gui, Pei Sun, Xing Li, Yuan Zhang, Xiaoyu Wang, Chaoliang Dang and Xinghua Liu
Appl. Sci. 2025, 15(15), 8209; https://doi.org/10.3390/app15158209 - 23 Jul 2025
Viewed by 190
Abstract
With the increasing integration of renewable energy, the secure operation of distribution networks faces significant challenges, such as voltage limit violations and increased power losses. To address reactive power and voltage security under renewable generation uncertainty, this paper proposes a graph-based security reinforcement learning method. First, a graph-enhanced neural network is designed to extract both topological and node-level features from the distribution network. Then, a primal-dual approach is introduced to incorporate voltage security constraints into the agent's critic network by constructing a cost critic to guide safe policy learning. Finally, a dual-critic framework is adopted to train the actor network and derive an optimal policy. Experiments conducted on real load profiles demonstrated that the proposed method reduced the voltage violation rate to 0%, compared with 4.92% for the Deep Deterministic Policy Gradient (DDPG) algorithm and 5.14% for the Twin Delayed DDPG (TD3) algorithm. Moreover, the average node voltage deviation was kept within 0.0073 per unit.
(This article belongs to the Special Issue IoT Technology and Information Security)
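The primal-dual construction can be summarized in a few lines: the actor maximizes the reward critic while a Lagrange multiplier weights the cost critic, and the multiplier grows whenever the violation budget is exceeded. The sketch below is a generic primal-dual update; the names (q_reward, q_cost, COST_LIMIT) and step sizes are illustrative assumptions, not this paper's implementation.

```python
import torch

lam = torch.tensor(1.0)          # Lagrange multiplier (dual variable)
LAM_LR, COST_LIMIT = 1e-3, 0.0   # hypothetical budget: zero expected violations

def actor_loss(q_reward, q_cost):
    # Primal step: maximize reward value while penalizing expected violation cost
    return (-q_reward + lam * q_cost).mean()

def dual_update(avg_episode_cost):
    # Dual step: lambda rises while the constraint is violated, decays otherwise
    global lam
    lam = torch.clamp(lam + LAM_LR * (avg_episode_cost - COST_LIMIT), min=0.0)

loss = actor_loss(torch.randn(32), torch.rand(32))  # batch of critic outputs
dual_update(avg_episode_cost=0.2)
print(loss.item(), lam.item())
```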

33 pages, 3525 KiB  
Article
Investigation into the Performance Enhancement and Configuration Paradigm of Partially Integrated RL-MPC System
by Wanqi Guo and Shigeyuki Tateno
Mathematics 2025, 13(15), 2341; https://doi.org/10.3390/math13152341 - 22 Jul 2025
Viewed by 237
Abstract
This paper improves a partially integrated reinforcement learning-model predictive control (RL-MPC) system by introducing the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Unlike traditional frameworks, which completely substitute the MPC prediction model, here an RL agent refines predictions through feedback correction, maintaining interpretability while improving robustness. The study details two configuration paradigms, decoupled (offline policy application) and coupled (online policy update), and tests their effectiveness in trajectory tracking tasks in both simulation and real-world experiments. A decoupled framework based on TD3 showed significant improvements in control performance over the other implemented paradigms, especially in the Integral of Time-weighted Absolute Error (ITAE) and mean absolute error (MAE). This work also illustrates the advantages of partial integration in balancing adaptability and stability, making it suitable for real-time applications in robotics.
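A minimal sketch of the "partial integration" idea, assuming the RL agent adds a bounded residual to the MPC model's one-step prediction rather than replacing the model; the toy linear model and the correction scale are illustrative, not the paper's plant.

```python
import numpy as np

def predict_nominal(x, u):
    # Placeholder linear prediction model standing in for the MPC model
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    return A @ x + (B @ u).ravel()

def corrected_prediction(x, u, rl_agent):
    # RL policy outputs a bounded residual added to the nominal prediction,
    # so the MPC model stays interpretable and the agent only corrects it.
    residual = rl_agent(np.concatenate([x, u]))   # e.g. a TD3 actor, output in [-1, 1]
    return predict_nominal(x, u) + 0.05 * residual

# Dummy agent for demonstration: bounded function of the state
print(corrected_prediction(np.zeros(2), np.array([1.0]), lambda s: np.tanh(s[:2])))
```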

20 pages, 3000 KiB  
Article
NRNH-AR: A Small Robotic Agent Using Tri-Fold Learning for Navigation and Obstacle Avoidance
by Carlos Vasquez-Jalpa, Mariko Nakano, Martin Velasco-Villa and Osvaldo Lopez-Garcia
Appl. Sci. 2025, 15(15), 8149; https://doi.org/10.3390/app15158149 - 22 Jul 2025
Viewed by 244
Abstract
We propose a tri-fold learning algorithm, called Neuroevolution of Hybrid Neural Networks in a Robotic Agent (acronym in Spanish, NRNH-AR), based on deep reinforcement learning (DRL) with self-supervised learning (SSL) and unsupervised learning (USL) steps, specifically designed for a small autonomous navigation robot with limited resources operating in constrained physical environments. The algorithm was evaluated in four critical aspects: computational cost, learning stability, required memory size, and operation speed. The results show that the performance of NRNH-AR is within the ranges of the Deep Q Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Thanks to its sequence of SSL, USL, and DRL stages, the algorithm optimizes the use of resources and demonstrates adaptability in dynamic environments, a crucial aspect of navigation robotics. By integrating computer vision techniques based on a Convolutional Neural Network (CNN), the algorithm rapidly interprets visual observations of the environment and detects a specific object while avoiding obstacles.

25 pages, 14579 KiB  
Article
A Hybrid Path Planning Framework Integrating Deep Reinforcement Learning and Variable-Direction Potential Fields
by Yunfei Bi and Xi Fang
Mathematics 2025, 13(14), 2312; https://doi.org/10.3390/math13142312 - 20 Jul 2025
Viewed by 393
Abstract
To address local optimality in path planning for logistics robots using APF (artificial potential field) methods, and the stagnation problem when encountering trap obstacles, this paper proposes VDPF (variable-direction potential field) combined with RL (reinforcement learning). First, an obstacle classification algorithm based on obstacle distribution is designed, enabling the robot to select an obstacle avoidance strategy appropriate to the obstacle type. Second, the attractive and repulsive forces in APF are separated, and the direction of the repulsive force is modified to break the local optimum, allowing the robot to focus on the current obstacle avoidance task. Finally, the improved APF is integrated with the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm, and a weight factor is introduced to adjust the forces acting on the robot. By sacrificing a certain level of safety for a larger exploration space, the robot is guided to escape local optima and trap regions. Experimental results show that the improved algorithm effectively mitigates trajectory oscillation and efficiently solves the local optimum and trap obstacle problems of the APF method. Compared with the APF-TD3 algorithm in scenarios with five obstacles, the proposed algorithm reduces the GS (Global Safety) metric by 8.6% and shortens the path length by 8.3%; in scenarios with ten obstacles, it reduces the GS by 29.8% and shortens the path length by 9.7%.
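The weight factor mentioned above can be read as a convex mix between the APF force direction and the TD3 action. The sketch below shows one such mixing rule; the normalization and the value of w are assumptions, not the paper's exact scheme.

```python
import numpy as np

def blended_velocity(f_att, f_rep, a_td3, w=0.6):
    """Blend APF forces with the learned policy's action.

    f_att, f_rep : attractive/repulsive APF forces (2-D vectors)
    a_td3        : TD3 policy action (2-D velocity command in [-1, 1]^2)
    w            : weight factor; larger w trusts the learned policy more,
                   trading safety margin for exploration space
    """
    f_apf = f_att + f_rep
    f_apf = f_apf / (np.linalg.norm(f_apf) + 1e-8)   # unit APF direction
    v = (1 - w) * f_apf + w * a_td3
    return v / (np.linalg.norm(v) + 1e-8)            # unit motion direction

print(blended_velocity(np.array([1.0, 0.0]), np.array([0.0, 0.4]),
                       np.array([0.5, -0.2])))
```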

27 pages, 28182 KiB  
Article
Addressing Local Minima in Path Planning for Drones with Reinforcement Learning-Based Vortex Artificial Potential Fields
by Boyi Xiao, Lujun Wan, Xueyan Han, Zhilong Xi, Chenbo Ding and Qiang Li
Machines 2025, 13(7), 600; https://doi.org/10.3390/machines13070600 - 11 Jul 2025
Viewed by 194
Abstract
In complex environments, autonomous navigation for quadrotor drones presents challenges in obstacle avoidance and path planning. Traditional artificial potential field (APF) methods are plagued by issues such as getting stuck in local minima and inadequate handling of dynamic obstacles. This paper introduces a layered obstacle avoidance structure that merges vortex artificial potential fields (VAPF) with reinforcement learning (RL) for motion control. The approach dynamically adjusts the target position through the VAPF, strategically guiding the drone to avoid obstacles indirectly, and employs the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to train the motion controller. Simulation experiments demonstrate that incorporating the VAPF effectively mitigates the local minima issue, significantly enhances the success rate of drone navigation, reduces the average arrival time and the number of sharp turns, and yields smoother paths. This solution combines the flexibility of VAPF methods with the precision of RL-based motion control, offering an effective strategy for autonomous navigation of quadrotor drones in complex environments.
(This article belongs to the Special Issue Intelligent Control Techniques for Unmanned Aerial Vehicles)
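Vortex potential fields are commonly implemented by rotating the repulsive gradient by 90 degrees inside the obstacle's influence radius, turning a push-back (which can trap the drone in a local minimum) into a circulation around the obstacle. A generic 2-D sketch with illustrative gains follows; it is not this paper's exact field.

```python
import numpy as np

def vortex_force(p, p_obs, d0=2.0, k=1.0, clockwise=True):
    """Tangential (vortex) force at position p due to an obstacle at p_obs.

    d0 : influence radius; k : repulsion gain (both illustrative)
    """
    d_vec = p - p_obs
    d = np.linalg.norm(d_vec)
    if d >= d0:                    # outside the influence radius: no force
        return np.zeros(2)
    # Classic repulsive gradient (points away from the obstacle)
    f_rep = k * (1.0 / d - 1.0 / d0) * d_vec / d**3
    # Rotate 90 degrees so the drone circulates instead of stalling
    R = np.array([[0, 1], [-1, 0]]) if clockwise else np.array([[0, -1], [1, 0]])
    return R @ f_rep

print(vortex_force(p=np.array([1.0, 0.5]), p_obs=np.zeros(2)))
```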

19 pages, 5332 KiB  
Article
Adaptive Control Strategy for the PI Parameters of Modular Multilevel Converters Based on Dual-Agent Deep Reinforcement Learning
by Jiale Liu, Weide Guan, Yongshuai Lu and Yang Zhou
Electronics 2025, 14(11), 2270; https://doi.org/10.3390/electronics14112270 - 31 May 2025
Viewed by 467
Abstract
As renewable energy sources are integrated into power grids on a large scale, modular multilevel converter-high voltage direct current (MMC-HVDC) systems face two significant challenges: traditional PI (proportional-integral) controllers have limited dynamic regulation capability due to their fixed parameters, while improved PI controllers are difficult to implement because of the complexity of their control strategies. This article proposes a dual-agent adaptive control framework based on the twin delayed deep deterministic policy gradient (TD3) algorithm. The framework dynamically adjusts the PI parameters for both voltage and current dual-loop control and capacitor voltage balancing, using a collaboratively optimized agent architecture that does not rely on complex control logic or precise mathematical models. Simulation results demonstrate that, compared with fixed-parameter PI controllers, the proposed method significantly reduces DC voltage regulation time while achieving precise dynamic balance control of capacitor voltage and effective suppression of circulating current, thereby notably enhancing system stability and dynamic response. This approach offers a new solution for dynamic optimization control in MMC-HVDC systems.
(This article belongs to the Section Power Electronics)
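A typical way to realize such adaptive tuning is to let each TD3 agent emit normalized actions that are rescaled into plant-safe PI gain ranges at every decision step. The sketch below shows that mapping; the gain ranges are hypothetical, not values from this article.

```python
import numpy as np

KP_RANGE = (0.1, 10.0)   # assumed safe range for the proportional gain
KI_RANGE = (1.0, 500.0)  # assumed safe range for the integral gain

def action_to_pi_gains(action):
    """Map a TD3 actor output in [-1, 1]^2 to (Kp, Ki) within safe ranges."""
    scale = lambda a, lo, hi: lo + (a + 1.0) * 0.5 * (hi - lo)
    return scale(action[0], *KP_RANGE), scale(action[1], *KI_RANGE)

kp, ki = action_to_pi_gains(np.array([0.0, -0.5]))
print(f"Kp={kp:.2f}, Ki={ki:.1f}")
```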

23 pages, 3540 KiB  
Article
A Low-Carbon Economic Scheduling Strategy for Multi-Microgrids with Communication Mechanism-Enabled Multi-Agent Deep Reinforcement Learning
by Lei Nie, Bo Long, Meiying Yu, Dawei Zhang, Xiaolei Yang and Shi Jing
Electronics 2025, 14(11), 2251; https://doi.org/10.3390/electronics14112251 - 31 May 2025
Cited by 1 | Viewed by 477
Abstract
To facilitate power system decarbonization, optimizing clean energy integration has emerged as a critical pathway toward sustainable power infrastructure. This study addresses the multi-timescale operational challenges inherent in power networks with high renewable penetration, proposing a stochastic dynamic programming framework that synergizes intraday microgrid dispatch with a multi-phase carbon cost calculation mechanism. A probabilistic carbon flux quantification model is developed, incorporating source–load carbon flow tracing and nonconvex carbon pricing dynamics to strengthen environmental–economic co-optimization constraints. The spatiotemporally coupled multi-microgrid (MMG) coordination problem is reformulated as a continuous state-action Markov game governed by stochastic differential Stackelberg game principles, and a communication mechanism-enabled multi-agent twin-delayed deep deterministic policy gradient (CMMA-TD3) algorithm is implemented to reach Pareto-optimal solutions through cyber–physical collaboration. Measurements on an MMG comprising three microgrids show that the proposed approach reduces operating costs by 61.59% and carbon emissions by 27.95% compared with the least effective benchmark solution.

28 pages, 6914 KiB  
Article
Guided Reinforcement Learning with Twin Delayed Deep Deterministic Policy Gradient for a Rotary Flexible-Link System
by Carlos Saldaña Enderica, José Ramon Llata and Carlos Torre-Ferrero
Robotics 2025, 14(6), 76; https://doi.org/10.3390/robotics14060076 - 31 May 2025
Viewed by 1274
Abstract
This study proposes a robust methodology for vibration suppression and trajectory tracking in rotary flexible-link systems by leveraging guided reinforcement learning (GRL). The approach integrates the twin delayed deep deterministic policy gradient (TD3) algorithm with a linear quadratic regulator (LQR) acting as a guiding controller during training. Flexible-link mechanisms, common in advanced robotics and aerospace systems, exhibit oscillatory behavior that complicates precise control. To address this, the system is first identified from experimental input-output data from a Quanser® virtual plant, yielding an accurate state-space representation suitable for simulation-based policy learning. The hybrid control strategy enhances sample efficiency and accelerates convergence by incorporating LQR-generated trajectories during TD3 training. Internally, the TD3 agent benefits from twin critics, delayed policy updates, and target action smoothing, which collectively improve learning stability and reduce overestimation bias. Comparative results show that the guided TD3 controller outperforms conventional LQR, fuzzy logic, neural network, and GA-LQR approaches in vibration damping, transient response, and robustness. Although the controller was validated on a high-fidelity digital twin, it has not yet been deployed on the physical plant; future work will focus on real-time implementation and structural robustness testing under parameter uncertainty. Overall, this research demonstrates that guided reinforcement learning can yield stable, interpretable policies that comply with classical control criteria, offering a scalable and generalizable framework for intelligent control of flexible mechanical systems.
(This article belongs to the Section Industrial Robots and Automation)
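The three TD3 ingredients named in this abstract (twin critics, delayed policy updates, target action smoothing) can be stated compactly. Below is a generic sketch of the TD3 target computation in PyTorch; the hyperparameters are standard TD3 defaults, not values from this study.

```python
import torch

GAMMA, SIGMA, CLIP, POLICY_DELAY = 0.99, 0.2, 0.5, 2   # standard TD3 defaults

def td3_targets(q1_tgt, q2_tgt, actor_tgt, next_s, r, done):
    # 1) Target action smoothing: clipped Gaussian noise on the target action
    noise = (torch.randn_like(actor_tgt(next_s)) * SIGMA).clamp(-CLIP, CLIP)
    a_next = (actor_tgt(next_s) + noise).clamp(-1.0, 1.0)
    # 2) Twin critics: take the minimum to curb overestimation bias
    q_next = torch.min(q1_tgt(next_s, a_next), q2_tgt(next_s, a_next))
    return r + GAMMA * (1.0 - done) * q_next

# 3) Delayed policy updates: the actor and target nets update only every
#    POLICY_DELAY critic steps, e.g. `if step % POLICY_DELAY == 0: ...`

# Toy check with stand-in networks (batch of 4, 3-D state, 1-D action)
actor = lambda s: torch.tanh(s[:, :1])
q = lambda s, a: s.sum(dim=1, keepdim=True) + a
print(td3_targets(q, q, actor, torch.randn(4, 3),
                  torch.zeros(4, 1), torch.zeros(4, 1)).shape)
```

In the guided setting the study describes, LQR-generated trajectories would additionally seed the training data so the agent starts from sensible behavior rather than random exploration.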

37 pages, 13864 KiB  
Article
LSTM-Enhanced Deep Reinforcement Learning for Robust Trajectory Tracking Control of Skid-Steer Mobile Robots Under Terra-Mechanical Constraints
by Jose Manuel Alcayaga, Oswaldo Anibal Menéndez, Miguel Attilio Torres-Torriti, Juan Pablo Vásconez, Tito Arévalo-Ramirez and Alvaro Javier Prado Romo
Robotics 2025, 14(6), 74; https://doi.org/10.3390/robotics14060074 - 29 May 2025
Viewed by 2164
Abstract
Autonomous navigation in mining environments is challenged by complex wheel–terrain interaction, traction losses caused by slip dynamics, and sensor limitations. This paper investigates the effectiveness of Deep Reinforcement Learning (DRL) techniques for trajectory tracking control of skid-steer mobile robots operating under terra-mechanical constraints. Four state-of-the-art DRL algorithms, Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor–Critic (SAC), are evaluated for their ability to generate stable, adaptive control policies under varying environmental conditions. To address the inherent partial observability of real-world navigation, this study integrates Long Short-Term Memory (LSTM) networks into the DRL-based controllers, allowing the control agents to retain and exploit temporal dependencies to infer unobservable system states. The agents were trained and tested in simulation and then assessed in field experiments on uneven terrain with dynamic model parameter changes that cause traction losses in mining environments, across trajectory tracking tasks including lemniscate and square reference trajectories. Memory augmentation strengthens the robustness and adaptability of the DRL agents, enabling better generalization than their baseline counterparts while significantly improving tracking performance: LSTM-based controllers reduced tracking errors by 10%, 74%, 21%, and 37% for DDPG-LSTM, PPO-LSTM, TD3-LSTM, and SAC-LSTM, respectively, compared with their non-recurrent counterparts. Furthermore, DDPG-LSTM and TD3-LSTM reduced control effort, measured as the total variation of the control input, by 15% and 20%, respectively, relative to their baseline controllers. These findings provide valuable insights into the role of memory-augmented reinforcement learning for robust motion control in unstructured, high-uncertainty environments.
(This article belongs to the Section Intelligent Robots and Mechatronics)

36 pages, 11692 KiB  
Article
Integrating Model Predictive Control with Deep Reinforcement Learning for Robust Control of Thermal Processes with Long Time Delays
by Kevin Marlon Soza Mamani and Alvaro Javier Prado Romo
Processes 2025, 13(6), 1627; https://doi.org/10.3390/pr13061627 - 22 May 2025
Viewed by 1105
Abstract
Thermal processes with long, variable delays pose considerable difficulties due to unpredictable system dynamics and external disturbances, often resulting in diminished control effectiveness. This work presents a hybrid control strategy that combines deep reinforcement learning (DRL) with nonlinear model predictive control (NMPC) to improve robust control of a thermal process with a long time delay. In this approach, NMPC cost functions are formulated as learning functions for thermal tracking and disturbance rejection, while an actor–critic (AC) reinforcement learning agent dynamically adjusts control actions through an adaptive policy based on exploration and exploitation of real-time process data. Unlike conventional NMPC approaches, the proposed framework removes the need for predefined terminal cost tuning and strict constraint formulations at runtime, which are typically required to ensure robust stability. A comparative study evaluated NMPC against AC-based controllers built on policy gradient algorithms, namely the deep deterministic policy gradient (DDPG) and the twin delayed deep deterministic policy gradient (TD3). The proposed method was experimentally validated on a temperature control laboratory (TCLab) testbed featuring long, varying delays. Results demonstrate that the NMPC–AC hybrid maintains tracking performance comparable to NMPC while adding adaptability during tracking and further strengthening robustness against uncertainties and disturbances under dynamic conditions. These findings highlight the benefits of integrating DRL with NMPC to enhance reliability in thermal process control and optimize resource efficiency in thermal applications.
(This article belongs to the Section Process Control and Monitoring)
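Formulating the NMPC cost as the learning function amounts to rewarding the agent with the negative stage cost, so policy improvement and cost minimization coincide. A one-function sketch, with illustrative weights (not the paper's tuning):

```python
def stage_reward(y, y_ref, u, qw=1.0, rw=0.01):
    # Reward = negative quadratic NMPC stage cost: tracking error plus effort
    return -(qw * (y - y_ref) ** 2 + rw * u ** 2)

print(stage_reward(y=52.0, y_ref=50.0, u=30.0))   # e.g. one temperature step
```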

20 pages, 7183 KiB  
Article
A Two-Stage Strategy Integrating Gaussian Processes and TD3 for Leader–Follower Coordination in Multi-Agent Systems
by Xicheng Zhang, Bingchun Jiang, Fuqin Deng and Min Zhao
J. Sens. Actuator Netw. 2025, 14(3), 51; https://doi.org/10.3390/jsan14030051 - 14 May 2025
Viewed by 1283
Abstract
In mobile multi-agent systems (MASs), achieving effective leader–follower coordination under unknown dynamics poses significant challenges. This study proposes a two-stage cooperative strategy, GPTD3, that integrates Gaussian Processes (GPs) for modeling with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for policy optimization, aiming to enhance adaptability and multi-objective optimization. First, GPs are used to model the uncertain dynamics of the agents from sensor data, providing a stable, noise-free virtual environment for the first phase of TD3 policy network training. Second, a TD3-based compensation learning mechanism is introduced to reduce consensus errors among agents by incorporating the position states of the other agents. The approach also employs an enhanced dual-layer reward mechanism tailored to the different learning stages, ensuring robustness and improved convergence speed. Experimental results on a differential drive robot simulation demonstrate the superiority of this method over traditional controllers, and the TD3 compensation network further improves the cooperative reward among agents.
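Stage one, GP-based modeling, can be pictured with scikit-learn: fit a GP to logged (state, action) → next-state data, then use the posterior mean as the noise-free virtual transition function for TD3 pre-training. The toy 1-D dynamics below are purely illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))      # logged [state, action] samples
y = 0.9 * X[:, 0] + 0.2 * X[:, 1] + 0.01 * rng.standard_normal(200)  # toy dynamics

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# The GP posterior mean acts as the virtual environment's transition function;
# the predictive std could flag regions where the model is unreliable.
mean, std = gp.predict(np.array([[0.5, -0.2]]), return_std=True)
print(mean, std)
```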

24 pages, 1196 KiB  
Article
Integrated Guidance and Control for Strap-Down Flight Vehicle: A Deep Reinforcement Learning Approach
by Qinglong Zhang, Bin Zhao, Yifu Jiang, Jingyan Zhang and Jiale Zhang
Aerospace 2025, 12(5), 400; https://doi.org/10.3390/aerospace12050400 - 1 May 2025
Viewed by 341
Abstract
This paper proposes a three-dimensional (3D) deep reinforcement learning-based integrated guidance and control (DRLIGC) method for flight vehicles subject to the narrow field-of-view (FOV) constraint of a strap-down seeker. By leveraging the data-driven nature of deep reinforcement learning (DRL), the method mitigates the model dependencies that complicate integrated guidance and control (IGC) design, thereby addressing the inherent complexity of the IGC model. First, with suitable states and actions, the pitch and yaw channels of the six-degree-of-freedom (6-DOF) IGC model are modeled as Markov decision processes (MDPs). Second, a channel-by-channel progressive training method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed: the pitch- and yaw-channel agents are trained independently with TD3, which substantially reduces the complexity of training, while the roll channel is stabilized by the back-stepping method. Third, a comprehensive reward function is designed to enforce the narrow FOV constraint while enhancing target engagement capability, which also mitigates the sparse-reward issue to some extent. Monte Carlo (MC) and comparative simulations show that the proposed DRLIGC method effectively approaches the target while maintaining the narrow FOV constraint and exhibits good robustness.
(This article belongs to the Special Issue Integrated Guidance and Control for Aerospace Vehicles)
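A comprehensive reward of the kind described, dense shaping on closing rate plus a penalty when the look angle leaves the seeker's FOV, might look like the sketch below; the weights and FOV half-angle are illustrative assumptions, not the paper's design.

```python
import numpy as np

FOV_HALF_ANGLE = np.deg2rad(15.0)   # hypothetical narrow strap-down seeker FOV

def step_reward(range_prev, range_now, look_angle, hit=False):
    r = 10.0 * (range_prev - range_now)           # dense closing-rate shaping
    if abs(look_angle) > FOV_HALF_ANGLE:          # target leaving the seeker FOV
        r -= 50.0 * (abs(look_angle) - FOV_HALF_ANGLE)
    if hit:
        r += 100.0                                # sparse terminal bonus
    return r

print(step_reward(range_prev=1000.0, range_now=995.0, look_angle=np.deg2rad(18)))
```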
