This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Guided Reinforcement Learning with Twin Delayed Deep Deterministic Policy Gradient for a Rotary Flexible-Link System

by Carlos Saldaña Enderica 1,2,*,†, José Ramon Llata 1,† and Carlos Torre-Ferrero 1,†

1 Department of Electronic Technology, Systems Engineering and Automation, Universidad de Cantabria, Avda. de los Castros, 39005 Santander, Cantabria, Spain
2 Facultad de Sistemas y Telecomunicaciones, Universidad Estatal Península de Santa Elena, Santa Elena, La Libertad 7047, Ecuador
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Robotics 2025, 14(6), 76; https://doi.org/10.3390/robotics14060076
Submission received: 30 March 2025 / Revised: 17 May 2025 / Accepted: 26 May 2025 / Published: 31 May 2025
(This article belongs to the Section Industrial Robots and Automation)

Abstract

This study proposes a robust methodology for vibration suppression and trajectory tracking in rotary flexible-link systems based on guided reinforcement learning (GRL). The approach integrates the twin delayed deep deterministic policy gradient (TD3) algorithm with a linear quadratic regulator (LQR) that acts as a guiding controller during training. Flexible-link mechanisms, which are common in advanced robotics and aerospace systems, exhibit oscillatory behavior that complicates precise control. To address this, the system is first identified from experimental input-output data obtained from a Quanser® virtual plant, yielding an accurate state-space representation suitable for simulation-based policy learning. The hybrid control strategy improves sample efficiency and accelerates convergence by incorporating LQR-generated trajectories during TD3 training. Internally, the TD3 agent benefits from architectural features such as twin critics, delayed policy updates, and target action smoothing, which together improve learning stability and reduce overestimation bias. Comparative results show that the guided TD3 controller outperforms conventional LQR, fuzzy logic, neural network, and GA-LQR approaches in vibration damping, transient response, and robustness. The controller was validated on a high-fidelity digital twin but has not yet been deployed on the physical plant. Future work will focus on real-time implementation and structural robustness testing under parameter uncertainty. Overall, this research demonstrates that guided reinforcement learning can yield stable and interpretable policies that comply with classical control criteria, offering a scalable and generalizable framework for intelligent control of flexible mechanical systems.
Keywords: guided reinforcement learning; deep reinforcement learning; TD3; linear quadratic regulator; hybrid control; vibration suppression; flexible link systems; robotics
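
The three TD3 features named in the abstract (twin critics, delayed policy updates, and target action smoothing) and the LQR guide can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the function names, the default hyperparameters (gamma, sigma, noise_clip, policy_delay), the single bounded action, and the gain values in the usage note are all assumptions chosen for illustration, and the abstract does not specify how the LQR trajectories are mixed into training. The target follows the standard TD3 form y = r + gamma * (1 - d) * min(Q1', Q2').

```python
import random


def smoothed_target_action(mu_next, sigma=0.2, noise_clip=0.5, a_max=1.0, rng=random):
    """Target action smoothing: add clipped Gaussian noise to the target
    policy output, then clip the result to the action bounds.
    sigma, noise_clip and a_max are assumed values, not taken from the paper."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(-a_max, min(a_max, mu_next + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Twin critics: bootstrap from the smaller of the two target-critic
    estimates, which counteracts overestimation bias."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor (and target networks) are updated
    only once every policy_delay critic updates."""
    return step % policy_delay == 0


def lqr_guide_action(state, gain):
    """State-feedback guide u = -K x for a precomputed LQR gain K.
    The gain passed in is hypothetical; the paper's gain is not given here."""
    return -sum(k * x for k, x in zip(gain, state))
```

For example, `td3_target(1.0, 0.0, 2.0, 3.0, gamma=0.5)` bootstraps from the smaller critic value 2.0 and returns 2.0, while a terminal transition (`done=1.0`) returns the reward alone. A guide action for an assumed two-element state could be computed as `lqr_guide_action([1.0, 2.0], [0.5, 0.25])`, which gives -1.0.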
