Retrieval Augment: Robust Path Planning for Fruit-Picking Robot Based on Real-Time Policy Reconstruction
Abstract
1. Introduction
1. Propose a decision framework for real-time policy reconstruction based on experience retrieval.
2. Propose a new method for collecting and characterizing high-quality robot empirical data.
3. Instantiate a specific policy reconstruction scheme: an action fusion method based on real-time evaluation and rejection sampling.
2. Problem Formulation
3. Retrieval Augment: Real-Time Policy Reconstruction
3.1. Overview
3.1.1. Original Agent Training
3.1.2. Experience Base
3.1.3. Experience Retrieval
3.1.4. Real-Time Policy Reconstruction
3.2. Original Agent Training
3.2.1. Observation Space
3.2.2. Action Space
3.2.3. Reward Function
3.3. Experience Base
| Algorithm 1 Data Collecting Based on Hierarchical Collaborative Path Exploring |
|---|
| Require: Task scene, start position, operating point k |
| Ensure: |
3.4. Policy Reconstruction
| Algorithm 2 Real-Time Policy Reconstruction |
|---|
| Require: Primitive agent’s policy, experience set, observation, threshold |
| Ensure: Action |
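The steps of Algorithm 2 are not reproduced in this section, so the following is only a minimal sketch of how its listed inputs (primitive policy, retrieved experience, observation, threshold) could be combined into one action. It assumes a convex fusion of the policy action and the experience action with a randomly sampled weight, and a rejection-style accept/reject test against the threshold. The names `fuse`, `reconstruct_action`, `evaluate`, and `max_tries`, as well as the fallback to the best candidate seen, are hypothetical and not taken from the paper.

```python
import random


def fuse(a_policy, a_exp, w):
    """Convex combination of two action vectors (assumed fusion rule)."""
    return [(1.0 - w) * p + w * e for p, e in zip(a_policy, a_exp)]


def reconstruct_action(policy, a_exp, evaluate, obs, threshold,
                       rng=None, max_tries=16):
    """Sample fused actions and accept the first one scoring >= threshold.

    policy:   callable obs -> action (the primitive agent)
    a_exp:    action suggested by the retrieved experience
    evaluate: callable (obs, action) -> score, a hypothetical real-time evaluator
    """
    if rng is None:
        rng = random.Random(0)
    a_policy = list(policy(obs))
    best, best_score = a_policy, evaluate(obs, a_policy)
    for _ in range(max_tries):
        candidate = fuse(a_policy, a_exp, rng.random())
        score = evaluate(obs, candidate)
        if score >= threshold:      # accept
            return candidate
        if score > best_score:      # reject, but remember the best so far
            best, best_score = candidate, score
    # No candidate passed the threshold: fall back to the best one seen.
    return best
```

The fallback keeps the loop bounded, which matters for real-time use; a stricter variant could instead return the unmodified policy action whenever every sample is rejected.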
4. Experiments
1. Effectiveness of Augment Agent: Does the proposed policy reconstruction method give the augment agent a performance advantage? We compare augment agents, derived from original agents trained for varying numbers of episodes, on complex tasks, and benchmark them against traditional path planning algorithms commonly used in fruit-picking robots.
2. Efficacy of Experience Retrieving: Because our method relies on retrieval from an experience base, is the proposed similarity-based retrieval method effective, and to what extent do augment agents depend on the experience data? To answer this, we construct experience bases of varying sizes and examine the impact of base size on the performance of the retrieval method described in Section 3.1. We further compare the performance of the augment agent under different experience similarity metrics.
3. Sim-to-Real Transferability: Does the proposed method retain its effectiveness and advantages when transferred from simulation to real-world settings? We transfer the top-performing algorithms from our simulation experiments, along with our proposed method, to real-world tasks and compare their performance.
4.1. Effectiveness of Augment Agent
4.2. Efficacy of Experience Retrieving
4.3. Real World
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| RRT | Rapidly Exploring Random Trees |
| DRL | Deep Reinforcement Learning |
| DOF | Degrees of Freedom |
| SAC | Soft Actor–Critic |
| NLP | Natural Language Processing |
| EM | Expectation Maximization |
| GMM | Gaussian Mixture Model |
| ROS | Robot Operating System |
Appendix A. Notation Summary
Appendix A.1. Basic Problem Formulation Symbols
| Symbol | Description |
|---|---|
| | Robot workspace |
| | Free space and obstacle space |
| R | Fruit-picking robot |
| | Task goal set |
| f | Target fruit |
| k | Operation (cutting) point |
| | Target branch |
| | Main branch |
| | Joint angle vector |
| | End-effector pose |
| p | End-effector position |
| q | End-effector orientation (quaternion) |
| s | Robot state |
| | Initial state |
| | Optimal cutting state |
Appendix A.2. Experience Retrieval Symbols
| Symbol | Description |
|---|---|
| | Simple and complex task sets |
| | Grid-based environment representation |
| H | Hu moment feature |
| | Scene similarity function |
| | Similarity weights |
| | Retrieved experience |
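The symbols above indicate that scenes are described by a grid-based representation and a Hu moment feature, and compared through a weighted similarity function. The exact formula is not given in this section, so the sketch below is only one plausible reading: it computes the first two Hu moment invariants of an occupancy grid, combines a shape distance with a goal-position distance using two weights, and retrieves the most similar stored experience. The goal-distance term, the weight values, and the names `hu_first_two`, `scene_similarity`, `retrieve`, and the `scene`/`trajectory` fields are assumptions made for illustration.

```python
import math


def hu_first_two(grid):
    """First two Hu moment invariants of a 2D occupancy grid (list of rows)."""
    cells = [(x, y, v) for y, row in enumerate(grid)
             for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in cells)
    if m00 == 0:
        return (0.0, 0.0)
    xc = sum(x * v for x, _, v in cells) / m00
    yc = sum(y * v for _, y, v in cells) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / m00^(1 + (p + q) / 2)
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in cells)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2)


def scene_similarity(scene_a, scene_b, weights=(0.5, 0.5)):
    """Weighted similarity in (0, 1]; equals 1 for identical scenes."""
    shape_dist = math.dist(hu_first_two(scene_a["grid"]),
                           hu_first_two(scene_b["grid"]))
    goal_dist = math.dist(scene_a["goal"], scene_b["goal"])
    w_shape, w_goal = weights
    return w_shape / (1.0 + shape_dist) + w_goal / (1.0 + goal_dist)


def retrieve(query, experience_base):
    """Return the stored experience whose scene is most similar to the query."""
    return max(experience_base,
               key=lambda exp: scene_similarity(query, exp["scene"]))
```

Because Hu invariants are unchanged by translation and by 90-degree rotation of the grid, two scenes with the same obstacle shape at different grid offsets receive the same shape term; the goal-distance term is what separates them in this sketch.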
Appendix A.3. Policy Reconstruction Symbols
| Symbol | Description |
|---|---|
| | Original DRL policy |
| | Reconstructed policy |
| a | Action |
| | Action from original policy |
| | Action from experience |
| | Fused action |
| | Rejection sampling threshold |
Appendix A.4. Reward and Observation Symbols
| Symbol | Description |
|---|---|
| | Observation vector |
| | Action vector |
| | Distance reward |
| | Pose reward |
| | Trajectory smoothness reward |
| | Collision penalty |
| | Invalid action penalty |
| | Reward weights |
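The symbols above name five reward terms and a set of reward weights, but the formula that combines them is not given in this section. A common choice, assumed here purely for illustration, is a weighted sum in which penalties enter as negative values. The form of the distance term, the term names, the weight values, and the helpers `distance_reward` and `total_reward` are all hypothetical.

```python
import math


def distance_reward(p, k):
    """Negative Euclidean distance from end-effector position p to
    cutting point k (assumed form of the distance reward)."""
    return -math.dist(p, k)


def total_reward(terms, weights):
    """Weighted sum of named reward terms; penalties are negative values."""
    return sum(weights[name] * value for name, value in terms.items())


# Illustrative numbers only, not taken from the paper.
weights = {"distance": 1.0, "pose": 0.5, "smooth": 0.1,
           "collision": 10.0, "invalid": 1.0}
terms = {
    "distance": distance_reward((0.1, 0.2, 0.3), (0.1, 0.2, 0.5)),
    "pose": -0.1,
    "smooth": -0.05,
    "collision": 0.0,  # no collision on this step
    "invalid": 0.0,    # the action was valid
}
reward = total_reward(terms, weights)
```

Keeping the weights in one mapping makes it easy to see how strongly a single collision outweighs the shaping terms under a given setting.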






| Method | Trials | Successful Trials | Success Rate | Average Path Length | Average Path Time |
|---|---|---|---|---|---|
| BIT* | 10 | 4 | 0.4 | 0.524 | 12.6 s |
| AIT* | 10 | 3 | 0.3 | 0.533 | 11.3 s |
| agent | 10 | 7 | 0.7 | 0.211 | 10.5 s |
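The success rates in the table above follow directly from the trial and success counts, and the path-length columns allow a simple relative comparison. The short check below recomputes them; the dictionary layout and the helper names `success_rate` and `relative_reduction` are illustrative choices, while the numbers are copied from the table.

```python
# Values copied from the table above; path time is in seconds.
results = {
    "BIT*":  {"trials": 10, "successes": 4, "path_length": 0.524, "path_time_s": 12.6},
    "AIT*":  {"trials": 10, "successes": 3, "path_length": 0.533, "path_time_s": 11.3},
    "agent": {"trials": 10, "successes": 7, "path_length": 0.211, "path_time_s": 10.5},
}


def success_rate(row):
    """Fraction of trials that succeeded."""
    return row["successes"] / row["trials"]


def relative_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`."""
    return (baseline - value) / baseline


rates = {name: success_rate(row) for name, row in results.items()}
length_cut_vs_bit = relative_reduction(results["BIT*"]["path_length"],
                                       results["agent"]["path_length"])
length_cut_vs_ait = relative_reduction(results["AIT*"]["path_length"],
                                       results["agent"]["path_length"])
```

With ten trials per method, each success changes the rate by 0.1, so the differences between methods should be read with that granularity in mind.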
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, B.; Zhang, S.; He, Z.; Gong, L. Retrieval Augment: Robust Path Planning for Fruit-Picking Robot Based on Real-Time Policy Reconstruction. Sustainability 2026, 18, 829. https://doi.org/10.3390/su18020829

