Improved PPO Optimization for Robotic Arm Grasping Trajectory Planning and Real-Robot Migration
Abstract
1. Introduction
2. Related Work
3. Optimized PPO Algorithm Architecture for Robotic Arm Grasping
3.1. Proximal Policy Optimization (PPO)
3.2. State and Action Space Design
3.3. Loss Function and Reward Design
4. Improved PPO Algorithm Design Based on Simulated Annealing Algorithm
4.1. Simulated Annealing (SA)
4.2. Design of the SA-Based Improved PPO Algorithm for Grasping
5. Experiment of Robotic Arm Trajectory Planning Based on Improved PPO Algorithm
5.1. 3D Simulation Modeling
5.2. Grasping Task Trajectory Planning
5.3. Simulation and Analysis
- An improved ability to escape local optima early in training, combined with fast convergence in the later stages (a schedule sketch follows this list);
- A 6.5% relative increase (6 percentage points) in the best model's grasp success rate (98% vs. 92%);
- A 7.1% reduction in the number of steps per successful episode (143 vs. 154).
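To make the annealing behavior behind the first point concrete, the following is a minimal sketch of a simulated-annealing-style learning-rate schedule consistent with the hyperparameters listed in the training-parameter table at the end of this article (initial_lr = 0.0003, min_lr = 0.000001, annealing_coefficient = 0.98). The staging logic and N_STAGES are illustrative assumptions, not the authors' exact update rule; Stable-Baselines3 accepts such a callable through its learning_rate argument, where progress_remaining decays from 1 to 0 over training.

```python
# Hyperparameters taken from the training-parameter table below; the number
# of annealing stages (N_STAGES) is a hypothetical choice for this sketch.
INITIAL_LR = 3e-4
MIN_LR = 1e-6
ANNEALING_COEFFICIENT = 0.98
N_STAGES = 500

def sa_lr_schedule(progress_remaining: float) -> float:
    """Stable-Baselines3-style schedule: progress_remaining runs 1 -> 0.

    Early in training the learning rate stays high (easier to escape local
    optima); it then decays geometrically, mimicking an SA temperature,
    until it is clamped at MIN_LR.
    """
    stage = int((1.0 - progress_remaining) * N_STAGES)
    return max(MIN_LR, INITIAL_LR * ANNEALING_COEFFICIENT ** stage)
```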
5.4. Sim-to-Real
- Position randomization. We randomly varied target and obstacle positions during training to improve the model's generalization, helping it adapt to the varied target placements that arise in real-world scenes;
- Adaptive noise injection. We injected adaptive noise into the observation space, perturbing joint-angle measurements to simulate real-world sensor uncertainty (a wrapper sketch follows this list);
- Unified observation-space standardization. We standardized simulated and real-world observations consistently, avoiding inference failures caused by mismatched observation spaces;
- Coordinate system transformation. We performed hand-eye calibration to obtain an accurate coordinate transformation, ensuring consistency between the real and simulated worlds.
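The noise-injection and standardization steps can be combined in a single observation wrapper. The sketch below assumes a Gymnasium environment whose observation vector contains joint angles at known indices; joint_idx, noise_std, obs_mean, and obs_std are illustrative placeholders, not values from the paper.

```python
import numpy as np
import gymnasium as gym

class NoisyStandardizedObs(gym.ObservationWrapper):
    """Adds Gaussian noise to joint-angle readings, then standardizes.

    joint_idx, noise_std, obs_mean, and obs_std are assumed inputs; in
    practice the statistics would be estimated from simulation rollouts.
    """

    def __init__(self, env, joint_idx, noise_std, obs_mean, obs_std):
        super().__init__(env)
        self.joint_idx = np.asarray(joint_idx)
        self.noise_std = noise_std
        self.obs_mean = np.asarray(obs_mean, dtype=np.float32)
        self.obs_std = np.asarray(obs_std, dtype=np.float32)

    def observation(self, obs):
        noisy = np.array(obs, dtype=np.float32, copy=True)
        # Simulate sensor uncertainty on the joint-angle entries only.
        noisy[self.joint_idx] += np.random.normal(
            0.0, self.noise_std, size=self.joint_idx.shape
        )
        # Apply the same standardization in simulation and on the real robot.
        return (noisy - self.obs_mean) / (self.obs_std + 1e-8)
```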
5.5. Future Work
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
| Software and Hardware | Detailed Information |
|---|---|
| Processor | Intel Core i7-12800HX |
| Discrete Graphics Card | NVIDIA GeForce RTX 4070 |
| Operating System | Windows 11 |
| Physics Simulation Library | PyBullet |
| Deep Reinforcement Learning Library | Stable-Baselines3 2.0.0 |
| Python Version | 3.10.16 |
| Custom Environment Library | Gymnasium 0.28.1 |
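As a quick sanity check of this stack (an illustrative addition, not part of the original paper), the snippet below prints the installed versions for comparison against the table:

```python
import sys
from importlib.metadata import version

# Compare the local setup against the versions listed in the table above.
print("Python:", sys.version.split()[0])  # expected: 3.10.16
for pkg in ("stable-baselines3", "gymnasium", "pybullet"):
    print(pkg, version(pkg))  # expected: 2.0.0, 0.28.1, any recent PyBullet
```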
| Parameter | Value |
|---|---|
| max_step (maximum number of training timesteps) | 20,000,000 |
| initial_lr (initial learning rate) | 0.0003 |
| min_lr (minimum learning rate) | 0.000001 |
| annealing_coefficient (learning-rate annealing coefficient) | 0.98 |
| batch_size (number of samples per gradient update) | 256 |
| clip_range (PPO clipping range) | 0.2 |
| gamma (discount factor) | 0.99 |
| n_steps (number of environment steps collected per update) | 2048 |
| n_epochs (number of training epochs per update) | 10 |
| single_length_max (maximum length of a single episode) | 200 |
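For reference, here is a minimal sketch of how these hyperparameters map onto a Stable-Baselines3 PPO run. The environment ID "GraspEnv-v0" is hypothetical, standing in for the custom grasping environment, and sa_lr_schedule refers to the schedule sketched in Section 5.3 above.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# "GraspEnv-v0" is a hypothetical ID for the custom grasping environment;
# max_episode_steps mirrors single_length_max from the table above.
env = gym.make("GraspEnv-v0", max_episode_steps=200)

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=sa_lr_schedule,  # SA-style schedule sketched earlier (or 0.0003 fixed)
    n_steps=2048,
    batch_size=256,
    n_epochs=10,
    gamma=0.99,
    clip_range=0.2,
    verbose=1,
)
model.learn(total_timesteps=20_000_000)  # max_step from the table above
```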