DRFW-TQC: Reinforcement Learning for Robotic Strawberry Picking with Dynamic Regularization and Feature Weighting
Abstract
1. Introduction
- A novel posture optimization reward function is developed to guide precise end-effector alignment with target strawberries, incorporating both spatial positioning and orientation constraints to ensure optimal grasping postures in cluttered environments.
- The DRFW-TQC framework implements group-wise feature weighting and Dynamic L2 Regularization techniques within the Truncated Quantile Critics [9] algorithm to accelerate convergence and enhance training stability, effectively addressing the sparse-reward problem inherent in complex picking scenarios (a minimal sketch of these two components follows this list).
- The proposed framework further incorporates a transfer strategy [10] that systematically migrates both the experience replay buffer and the learned networks to more complex operational conditions, significantly improving training efficiency under these challenging conditions.
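To make the two training-side components from the contribution list concrete, the following is a minimal sketch, not the authors' released implementation: a group-wise feature weighting layer that learns one softmax-normalized weight per group of state features, and a dynamically scheduled L2 penalty added to the critic loss. The grouping of the state vector, the linear decay schedule, and all names (GroupWiseFeatureWeighting, dynamic_l2_coeff, critic_loss_with_dl2) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GroupWiseFeatureWeighting(nn.Module):
    """Hypothetical group-wise feature weighting layer: each group of state
    features (e.g., joint angles, end-effector pose, target pose) is scaled
    by a learned, softmax-normalized weight."""
    def __init__(self, group_sizes):
        super().__init__()
        self.group_sizes = group_sizes
        self.logits = nn.Parameter(torch.zeros(len(group_sizes)))

    def forward(self, state):
        weights = torch.softmax(self.logits, dim=0)           # one weight per group
        chunks = torch.split(state, self.group_sizes, dim=-1)
        scaled = [w * c for w, c in zip(weights, chunks)]
        return torch.cat(scaled, dim=-1)

def dynamic_l2_coeff(step, total_steps, lam_max=1e-3, lam_min=1e-5):
    """Assumed schedule: linearly decay the L2 coefficient so regularization
    is strong early in training and weak later on."""
    frac = min(step / total_steps, 1.0)
    return lam_max + frac * (lam_min - lam_max)

def critic_loss_with_dl2(td_loss, critic, step, total_steps):
    """Add the dynamically weighted L2 penalty to an ordinary TD/quantile loss."""
    lam = dynamic_l2_coeff(step, total_steps)
    l2 = sum(p.pow(2).sum() for p in critic.parameters())
    return td_loss + lam * l2
```

Sections 2.4.2 and 2.4.3 define the actual grouping and schedule; the snippet only indicates where each term would enter a TQC-style critic update.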
2. Materials and Methods
2.1. Strawberry Picking Simulation Environment Modeling
2.2. Posture Optimization Reward Function
- The attitude alignment condition is satisfied.
- The end-effector contacts the target strawberry.
- The gripper jaws successfully close.
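A picking attempt is counted as successful only when all three conditions above hold. The snippet below is a minimal sketch of such a check, paired with the kind of distance-plus-orientation shaping term this reward function combines; the thresholds, weights, and names (posture_reward, is_success) are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def posture_reward(ee_pos, ee_axis, berry_pos, berry_axis,
                   w_dist=1.0, w_align=0.5):
    """Illustrative shaping reward: penalize Euclidean distance to the target
    and misalignment between the end-effector approach axis and the
    strawberry's peduncle axis (both given as unit vectors)."""
    dist = np.linalg.norm(ee_pos - berry_pos)
    cos_a = np.clip(np.dot(ee_axis, berry_axis), -1.0, 1.0)
    ang_err = np.arccos(cos_a)          # angular error in radians
    return -(w_dist * dist + w_align * ang_err), ang_err

def is_success(ang_err, contact, jaws_closed, ang_tol=0.1):
    """Success requires the three conditions listed above: attitude alignment
    (angular error under an assumed tolerance), contact with the target
    strawberry, and closed gripper jaws."""
    return ang_err < ang_tol and contact and jaws_closed
```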
2.3. State and Action Space Characterization
2.4. DRFW-TQC Framework for Training Optimization
2.4.1. Truncated Quantile Critics Algorithm
2.4.2. Group-Wise Feature Weighting Network
2.4.3. Dynamic L2 Regularization
2.4.4. Multi-Objective Strawberry Picking Transfer Strategy
3. Experimental Design and Evaluation Metrics
- Average cumulative reward (AR), which measures the average of the cumulative rewards in the final ten epochs;
- Picking success rate (PS), which is the ratio of the number of successful episodes to the total episodes in the last ten epochs;
- Angular error (AE), which evaluates the orientation accuracy of the end-effector’s grasp in the last ten epochs;
- First Success Step (FS), which is the total number of steps required for the first successful pick, indicates exploration efficiency;
- Timeout count (TO), the total number of truncated episodes, reflects the efficiency of policy convergence.
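For reference, the sketch below shows one straightforward way the five metrics above could be computed from per-episode training logs. The log format (per-episode reward, success flag, angular error, step count, and truncation flag) and the function name evaluation_metrics are assumptions made for illustration, not the paper's evaluation code.

```python
import numpy as np

def evaluation_metrics(episodes, last_epochs_mask):
    """episodes: list of dicts with keys 'reward' (cumulative episode reward),
    'success' (bool), 'angular_error' (rad), 'steps' (int), 'truncated' (bool).
    last_epochs_mask: booleans marking episodes that fall in the final ten epochs."""
    final = [ep for ep, m in zip(episodes, last_epochs_mask) if m]

    ar = np.mean([ep["reward"] for ep in final])         # average cumulative reward
    ps = np.mean([ep["success"] for ep in final])        # picking success rate
    ae = np.mean([ep["angular_error"] for ep in final])  # angular error

    # First Success Step: total environment steps elapsed up to the first success
    fs, cum_steps = None, 0
    for ep in episodes:
        cum_steps += ep["steps"]
        if ep["success"]:
            fs = cum_steps
            break

    to = sum(ep["truncated"] for ep in episodes)          # timeout count
    return {"AR": ar, "PS": ps, "AE": ae, "FS": fs, "TO": to}
```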
4. Results and Discussion
4.1. Results of Simulation Experiments
4.2. Experimental Results on Transfer Effects
4.3. Experimental Results for Hyperparameters
4.3.1. Hyperparameter Analysis of Distance and Attitude Alignment Reward Function
4.3.2. Hyperparameter Analysis of GFWN
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
TQC | Truncated Quantile Critics |
GFWN | Group-Wise Feature Weighting Network |
DL2 | Dynamic L2 Regularization |
DDPG | Deep Deterministic Policy Gradient |
SAC | Soft Actor-Critic |
AR | average cumulative reward |
PS | picking success rate |
AE | angular error |
FS | First Success Step |
TO | timeout count |
References
- Gunderman, A.; Collins, J.; Myers, A.; Threlfall, R.; Chen, Y. Tendon-Driven Soft Robotic Gripper for Blackberry Harvesting. IEEE Robot. Autom. Lett. 2022, 7, 2652–2659.
- Li, Y.; Feng, Q.; Zhang, Y.; Peng, C.; Ma, Y.; Liu, C.; Ru, M.; Sun, J.; Zhao, C. Peduncle collision-free grasping based on deep reinforcement learning for tomato harvesting robot. Comput. Electron. Agric. 2024, 216, 108488.
- Miao, Z.; Chen, Y.; Yang, L.; Hu, S.; Xiong, Y. A Fast Path-Planning Method for Continuous Harvesting of Table-Top Grown Strawberries. IEEE Trans. AgriFood Electron. 2025, 3, 233–245.
- Zhang, Y.; Zhang, K.; Yang, L.; Zhang, D.; Cui, T.; Yu, Y.; Liu, H. Design and simulation experiment of ridge planting strawberry picking manipulator. Comput. Electron. Agric. 2023, 208, 107690.
- Magistri, F.; Pan, Y.; Bartels, J.; Behley, J.; Stachniss, C.; Lehnert, C. Improving Robotic Fruit Harvesting Within Cluttered Environments Through 3D Shape Completion. IEEE Robot. Autom. Lett. 2024, 9, 7357–7364.
- Rizwan, A.; Khan, A.N.; Ibrahim, M.; Ahmad, R.; Iqbal, N.; Kim, D.H. Optimal environment control and fruits delivery tracking system using blockchain for greenhouse. Comput. Electron. Agric. 2024, 220, 108889.
- Liu, Y.; Ping, Y.; Zhang, L.; Wang, L.; Xu, X. Scheduling of decentralized robot services in cloud manufacturing with deep reinforcement learning. Robot. Comput.-Integr. Manuf. 2023, 80, 102454.
- Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
- Kuznetsov, A.; Shvechikov, P.; Grishin, A.; Vetrov, D. Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In Proceedings of the International Conference on Machine Learning, Online, 13–18 July 2020; pp. 5556–5566.
- Zhu, Z.; Lin, K.; Jain, A.K.; Zhou, J. Transfer learning in deep reinforcement learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13344–13362.
- Gronauer, S.; Kissel, M.; Sacchetto, L.; Korte, M.; Diepold, K. Using simulation optimization to improve zero-shot policy transfer of quadrotors. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 10170–10176.
- Liu, Y.; Xu, H.; Liu, D.; Wang, L. A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping. Robot. Comput.-Integr. Manuf. 2022, 78, 102365.
- Al Ali, A.; Shi, J.-F.; Zhu, Z.H. Path planning of 6-DOF free-floating space robotic manipulators using reinforcement learning. Acta Astronaut. 2024, 224, 367–378.
- Gao, Y.; Wu, J.; Yang, X.; Ji, Z. Efficient hierarchical reinforcement learning for mapless navigation with predictive neighbouring space scoring. IEEE Trans. Autom. Sci. Eng. 2023, 165, 677–688.
- Gan, Y.; Li, P.; Jiang, H.; Wang, G.; Jin, Y.; Chen, X.; Ji, J. A reinforcement learning method for motion control with constraints on an HPN Arm. IEEE Robot. Autom. Lett. 2022, 7, 12006–12013.
- Goldenits, G.; Mallinger, K.; Raubitzek, S.; Neubauer, T. Current applications and potential future directions of reinforcement learning-based Digital Twins in agriculture. Smart Agric. Technol. 2024, 8, 100512.
- Panerati, J.; Zheng, H.; Zhou, S.; Xu, J.; Prorok, A.; Schoellig, A.P. Learning to Fly—A Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 7512–7519.
- Ishmatuka, C.; Soesanti, I.; Ataka, A. Autonomous Pick-and-Place Using Excavator Based on Deep Reinforcement Learning. In Proceedings of the 2023 15th International Conference on Information Technology and Electrical Engineering (ICITEE), Chiang Mai, Thailand, 26–27 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 19–24.
- Yang, S.; Zhang, W.; Song, R.; Cheng, J.; Li, Y. Learning multi-object dense descriptor for autonomous goal-conditioned grasping. IEEE Robot. Autom. Lett. 2021, 6, 4109–4116.
- Xie, F.; Guo, Z.; Li, T.; Feng, Q.; Zhao, C. Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning. Horticulturae 2025, 11, 88.
- He, Z.; Li, J.; Wu, F.; Shi, H.; Hwang, K.-S. Derl: Coupling decomposition in action space for reinforcement learning task. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 8, 1030–1043.
- Shao, Y.; Zhou, H.; Zhao, S.; Fan, X.; Jiang, J. A Control Method of Robotic Arm Based on Improved Deep Deterministic Policy Gradient. In Proceedings of the 2023 IEEE International Conference on Mechatronics and Automation (ICMA), Harbin, China, 6–9 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 473–478.
- Zhang, Y.; Li, G.; Al-Ani, M. Robust learning-based model predictive control for wave energy converters. IEEE Trans. Sustain. Energy 2024, 15, 1957–1967.
- Xiao, M.; Wang, D.; Wu, M.; Liu, K.; Xiong, H.; Zhou, Y.; Fu, Y. Traceable group-wise self-optimizing feature transformation learning: A dual optimization perspective. ACM Trans. Knowl. Discov. Data 2024, 18, 1–22.
- Han, Z.; Yang, Y.; Zhang, C.; Zhang, L.; Zhou, J.T.; Hu, Q. Selective learning: Towards robust calibration with dynamic regularization. arXiv 2024, arXiv:2402.08384.
- Mysore, S.; Mabsout, B.; Mancuso, R.; Saenko, K. Regularizing action policies for smooth control with reinforcement learning. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1810–1816.
- Gupta, D.; Hazarika, B.B.; Berlin, M. Robust regularized extreme learning machine with asymmetric Huber loss function. Neural Comput. Appl. 2020, 32, 12971–12998.
- Yan, H.; Shao, D. Enhancing Transformer Training Efficiency with Dynamic Dropout. arXiv 2024, arXiv:2411.03236.
- Lyle, C.; Rowland, M.; Dabney, W.; Kwiatkowska, M.; Gal, Y. Learning dynamics and generalization in deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 14560–14581.
- Jawaddi, S.N.A.; Ismail, A. Integrating OpenAI Gym and CloudSim Plus: A simulation environment for DRL Agent training in energy-driven cloud scaling. Simul. Model. Pract. Theory 2024, 130, 102858.
- Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
- Gharakhani, H.; Thomasson, J.A.; Lu, Y. An end-effector for robotic cotton harvesting. Smart Agric. Technol. 2022, 2, 100043.
- Huang, A.; Yu, C.; Feng, J.; Tong, X.; Yorozu, A.; Ohya, A.; Hu, Y. A motion planning method for winter jujube harvesting robotic arm based on optimized Informed-RRT* algorithm. Smart Agric. Technol. 2025, 10, 100732.
- Cao, H.G.; Zeng, W.; Wu, I.C. Reinforcement learning for picking cluttered general objects with dense object descriptors. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 6358–6364.
- Bi, A.; Zhang, C. Robot Arm Grasping based on Multi-threaded PPO Reinforcement Learning Algorithm. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR), Hong Kong, China, 15–17 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 369–373.
Method | FS() | TO | AE | AR () | PS |
---|---|---|---|---|---|
TQC | 26.4 ± 3.7 | 294 ± 19 | 0.192 ± 0.015 | 20.87 ± 4.15 | 0.734 ± 0.060 |
TQC+DL2 | 16.4 ± 2.4 | 256 ± 16 | 0.175 ± 0.018 | 24.25 ± 3.93 | 0.807 ± 0.050 |
TQC+GFWN | 25.9 ± 5.1 | 271 ± 17 | 0.179 ± 0.012 | 22.69 ± 4.01 | 0.851 ± 0.031 |
DRFW-TQC | 15.1 ± 1.7 | 238 ± 17 | 0.153 ± 0.015 | 26.94 ± 3.62 | 0.851 ± 0.044 |
SAC | 35.8 ± 5.5 | 301 ± 9 | 0.171 ± 0.016 | 9.52 ± 1.93 | 0.827 ± 0.039 |
DDPG | 37.1 ± 4.7 | 419 ± 3 | 0.447 ± 0.031 | −16.54 ± 1.25 | 0.074 ± 0.023 |
Comparison | Metric | Relative Improvement | p-Value | Effect Size |
---|---|---|---|---|
DRFW-TQC vs. TQC | PS | +16.0% | | 0.38 |
 | AR | +29.1% | | 0.26 |
 | AE | −20.3% | | −0.46 |
 | TO | −19.0% | | −0.57 |
 | FS | −42.7% | | −0.71 |
DRFW-TQC vs. SAC | PS | +2.9% | | 0.09 |
 | AR | +183.0% | | 0.93 |
 | AE | −10.5% | | −0.20 |
 | TO | −20.9% | | −0.84 |
 | FS | −57.7% | | −0.93 |
DRFW-TQC vs. DDPG | PS | +1050.0% | | 3.56 |
 | AR | +262.9% | | 2.62 |
 | AE | −65.8% | | −2.19 |
 | TO | −43.2% | | −2.71 |
 | FS | −59.2% | | −1.15 |
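The relative improvements in the comparison table follow directly from the group means reported for each method; as a quick check, the snippet below recomputes them for DRFW-TQC versus TQC. Small last-digit differences (e.g., +15.9% versus the reported +16.0%) can arise because the tabulated means are rounded.

```python
# Recompute DRFW-TQC vs. TQC relative improvements from the method-comparison table.
baseline = {"PS": 0.734, "AR": 20.87, "AE": 0.192, "TO": 294, "FS": 26.4}
drfw_tqc = {"PS": 0.851, "AR": 26.94, "AE": 0.153, "TO": 238, "FS": 15.1}

for metric in baseline:
    rel = (drfw_tqc[metric] - baseline[metric]) / baseline[metric]
    print(f"{metric}: {rel:+.1%}")
# PS: +15.9%  AR: +29.1%  AE: -20.3%  TO: -19.0%  FS: -42.8%
```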
Condition | FS | TO | AE | AR () | PS |
---|---|---|---|---|---|
With transfer strategy | 44 | 111 | 0.091 | 62.21 ± 4.22 | 0.891 ± 0.032 |
Without transfer strategy | 21,992 | 279 | 0.188 | 41.04 ± 5.35 | 0.774 ± 0.056 |
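The transfer rows above contrast training in the complex condition with and without migrating the experience replay buffer and learned networks, as described in the contribution list. As a minimal sketch only, assuming a Stable-Baselines3 / sb3_contrib workflow (the paper does not publish this code, and the environment objects and paths are placeholders), such a migration could look like:

```python
# Hypothetical migration of a trained TQC agent from a simple picking condition
# to a harder one, assuming a Stable-Baselines3 / sb3_contrib setup.
# `simple_env` and `complex_env` stand for the two picking environments
# (placeholders here); file paths are illustrative.
from sb3_contrib import TQC

# 1) Train under the simple condition and persist policy, critics, and replay buffer.
model = TQC("MlpPolicy", simple_env, verbose=1)
model.learn(total_timesteps=200_000)
model.save("tqc_simple")
model.save_replay_buffer("tqc_simple_buffer")

# 2) Reload both against the complex condition and continue training, so early
#    exploration there can reuse experience gathered under the simple condition.
model = TQC.load("tqc_simple", env=complex_env)
model.load_replay_buffer("tqc_simple_buffer")
model.learn(total_timesteps=200_000, reset_num_timesteps=False)
```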