GT-TD3: A Kinematics-Aware Graph-Transformer Framework for Stable Trajectory Tracking of High-Degree-of-Freedom (DOF) Manipulators
Abstract
1. Introduction
2. Related Work
2.1. From Traditional Control to Deep Reinforcement Learning
2.2. Structured Neural Architectures in Robotic Control
2.3. Position of Our Work
3. Methods
3.1. Overall Framework of GT-TD3
3.2. TD3-Based Learning Framework
| Algorithm 1. Training Procedure of GT-TD3 |
| Input: actor π_φ, twin critics Q_θ1 and Q_θ2, target networks π_φ′, Q_θ1′, Q_θ2′, replay buffer B, total steps T, warm-up steps, policy delay d, discount factor γ, soft update rate τ, target noise σ, clipping threshold c. Initialize target networks: φ′ ← φ, θ1′ ← θ1, θ2′ ← θ2 |
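The training procedure in Algorithm 1 follows the standard TD3 update pattern implied by its inputs (clipped double-Q targets, target policy smoothing, delayed actor updates, and soft target updates). The sketch below is a minimal, illustrative PyTorch version of one such update step; module names, signatures, and the optimizer setup are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def td3_update(actor, critic1, critic2, actor_t, critic1_t, critic2_t,
               actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.003, policy_delay=2,
               policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    # batch holds tensors sampled from the replay buffer (names are illustrative).
    s, a, r, s2, done = batch

    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise added to the target action.
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) + noise).clamp(-max_action, max_action)
        # Clipped double-Q target: minimum of the twin target critics.
        q_target = torch.min(critic1_t(s2, a2), critic2_t(s2, a2))
        y = r + gamma * (1.0 - done) * q_target

    # Update both critics toward the shared target.
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed actor update and soft (Polyak) target-network updates.
    if step % policy_delay == 0:
        actor_loss = -critic1(s, actor(s)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```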
3.3. Joint-Based State Encoding
3.4. Local Dependency Modeling Through Gated Graph Aggregation
3.5. Kinematic-Aware Transformer with Sinusoidal Positional Encoding
3.6. Cross-Scale Feature Fusion and Action Output Head
3.7. Reward Function
4. Experiments and Results
4.1. Experimental Setup
4.2. Learning Dynamics and Convergence Analysis
4.2.1. Task Performance and Sample Efficiency
4.2.2. Trajectory Fidelity and Kinematic Optimization
4.3. Test Results Analysis
4.4. Stability Analysis
4.5. Trajectory Tracking Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| CNN | Convolutional Neural Network |
| DDPG | Deep Deterministic Policy Gradient |
| DH | Denavit–Hartenberg |
| DQN | Deep Q-Network |
| GNN | Graph Neural Network |
| GT-TD3 | Graph Transformer-Twin Delayed Deep Deterministic Policy Gradient |
| KAPE | Kinematic-Aware Positional Encoding |
| MLP | Multilayer Perceptron |
| PID | Proportional-Integral-Derivative |
| RMSE | Root Mean Square Error |
| SAC | Soft Actor–Critic |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient |
| URDF | Unified Robot Description Format |
References
| Module | Parameter | Value | Configuration Details |
|---|---|---|---|
| Graph Topology | Nodes/Edges | 1-hop chain graph with self-loops | |
| | Configured dim | | |
| GNN Encoder | Layers | 2 | Gated message passing with LayerNorm |
| | Hidden dimension | 64 | Row-normalized adjacency |
| | Readout | Mean + Max | Global pooling |
| Transformer | Encoder layers | 2 | KAPE-based encoder |
| | Attention heads | 4 | Multi-head self-attention |
| | Embedding dim | | Latent feature space |
| | FNN dimension | | Expansion ratio = 4 |
| Activation | Hidden/Output | LeakyReLU/ELU/Tanh | Implementation-consistent |
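As a worked illustration of the Graph Topology and GNN Encoder rows above, the snippet below builds the 1-hop chain adjacency with self-loops over the joint nodes and row-normalizes it, which is the form a gated message-passing layer would aggregate over. The joint count of 7 is an illustrative assumption; the actual count follows the manipulator model.

```python
import torch

def chain_adjacency(num_joints: int = 7) -> torch.Tensor:
    """1-hop chain graph over the kinematic chain with self-loops,
    row-normalized as in the GNN encoder configuration.
    The joint count is an illustrative assumption."""
    A = torch.eye(num_joints)              # self-loops
    idx = torch.arange(num_joints - 1)
    A[idx, idx + 1] = 1.0                  # parent -> child links
    A[idx + 1, idx] = 1.0                  # child -> parent links
    return A / A.sum(dim=1, keepdim=True)  # row normalization

A_hat = chain_adjacency()
# One gated message-passing layer would then aggregate neighbor features as A_hat @ H.
```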
| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Total Timesteps | | 500,000 | Maximum training timesteps |
| Warm-up Steps | | 25,000 | Random exploration phase |
| Mini-batch Size | | 512 | Increased for stability |
| Replay Buffer Size | | 500,000 | Extended memory capacity |
| Learning Rate | | 1 × 10⁻⁵ | Unified for Actor/Critic |
| Discount Factor | | 0.99 | Future reward weight |
| Soft Update Rate | | 0.003 | Target network update |
| Policy Delay | | 2 | Delayed Actor updates |
| Exploration Noise | | 0.1 | Action noise (Gaussian) |
| Policy Smoothing Noise | | 0.2 | Target policy noise |
| Noise Clip | | 0.5 | Noise clipping range |
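For convenience, the training hyperparameters in the table above map directly onto a single configuration object. The values below are taken from the table; the field names are illustrative assumptions rather than the authors' code.

```python
from dataclasses import dataclass

@dataclass
class GTTD3Config:
    # Values from the training-hyperparameter table; field names are illustrative.
    total_timesteps: int = 500_000
    warmup_steps: int = 25_000
    batch_size: int = 512
    replay_buffer_size: int = 500_000
    learning_rate: float = 1e-5       # shared by actor and critics
    gamma: float = 0.99               # discount factor
    tau: float = 0.003                # soft (Polyak) target update rate
    policy_delay: int = 2             # delayed actor updates
    exploration_noise: float = 0.1    # Gaussian action noise during rollouts
    policy_noise: float = 0.2         # target policy smoothing noise
    noise_clip: float = 0.5           # clipping range for smoothing noise
```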
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Miao, H.; Hou, H.; Zhu, Z.; Chao, Z.; Zhang, R. GT-TD3: A Kinematics-Aware Graph-Transformer Framework for Stable Trajectory Tracking of High-Degree-of-Freedom (DOF) Manipulators. Machines 2026, 14, 397. https://doi.org/10.3390/machines14040397

