Push-or-Avoid: Deep Reinforcement Learning of Obstacle-Aware Harvesting for Orchard Robots
Abstract
1. Introduction
- (1) An “avoid-rigid, push-through-soft” end-to-end autonomous harvesting framework is proposed. The framework explicitly models controllable contact with flexible branches as a risk cost in the reward function and propagates uncertainty from the reconstruction process into the planning phase, enabling the policy to push aside flexible obstacles for harvesting under safety constraints.
- (2) An AE-TD3 algorithm integrating expert priors is proposed. By encoding expert experience into executable constraints and guiding signals, the algorithm significantly narrows the policy search space and improves sample efficiency. In addition, a dynamic Action Mask module is designed to proactively shield dangerous actions that may collide with rigid trunks; working together with risk-sensitive rewards, it reduces ineffective exploration and collisions during training.
- (3) An online 3DGS reconstruction module is constructed. Combining detection results from a multi-task network, it continuously represents fruits and flexible branches with anisotropic Gaussian primitives, forming a continuous and stable scene representation that improves the reliability of collision detection and feasibility judgment for the agent.
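The dynamic Action Mask of contribution (2) can be illustrated with a minimal one-step lookahead sketch. Everything here is an assumption for illustration: the names (`mask_action`, `min_trunk_clearance`), the treatment of the action as a Cartesian end-effector velocity, and the sphere-like trunk clearance test all stand in for the paper's actual collision predictor.

```python
import numpy as np

def min_trunk_clearance(ee_pos, action, trunk_centers, trunk_radius, dt=0.05):
    """Predict the end-effector position one step ahead under `action`
    (treated here as a Cartesian velocity) and return its smallest distance
    to any rigid-trunk center, minus the trunk radius."""
    next_pos = ee_pos + dt * np.asarray(action)
    dists = np.linalg.norm(trunk_centers - next_pos, axis=1)
    return float(dists.min() - trunk_radius)

def mask_action(action, ee_pos, trunk_centers, trunk_radius, safe_margin=0.05):
    """Shield an action whose one-step lookahead would bring the end-effector
    within `safe_margin` of a rigid trunk; otherwise pass it through."""
    if min_trunk_clearance(ee_pos, action, trunk_centers, trunk_radius) < safe_margin:
        return np.zeros_like(action)  # dangerous: replace with a null action
    return np.asarray(action)
```

A shielded action is replaced before execution rather than penalized after the fact, which is what lets a mask of this kind cut ineffective exploration during training.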
2. Materials and Methods
2.1. Overall Framework Design
2.2. Acquisition of Apple Fruit and Obstacle Information
2.3. 3D Scene Modeling and Representation
2.4. Deep Reinforcement Learning-Based Obstacle Avoidance Path Planning
2.5. Dynamic Safety Constraint Mechanism
2.6. Training Hyperparameters and Strategies
- (1) Target distance reward—encouraging the agent to approach the target position of the current phase. The reward is computed from the change in distance to the target, i.e., the agent is rewarded at each step for reducing the remaining distance, which drives it toward the target location and speeds up the approach: $r_{\text{dist}} = \lambda_{\text{dist}}\left(\lVert p_{t-1} - p_{\text{goal}}\rVert - \lVert p_{t} - p_{\text{goal}}\rVert\right)$, where $p_{\text{goal}}$ represents the target position for the current phase, $\lambda_{\text{dist}}$ is the corresponding reward coefficient, and $t$ is the current time step. This reward encourages movement in the target direction by continuously reducing the distance.
- (2) Target grasping reward—guiding the agent to successfully grasp the target fruit. The reward for the grasping action depends on the distance between the end-effector and the target fruit, encouraging the agent to grasp the target: $r_{\text{grasp}} = -\lambda_{\text{grasp}}\lVert p_{t} - p_{\text{fruit}}\rVert$, where $p_{\text{fruit}}$ is the position of the target fruit, $\lambda_{\text{grasp}}$ is the corresponding reward coefficient, and $r_{\text{grasp}}$ represents the grasping reward.
- (3) Placing reward—guiding the robotic arm to successfully grasp the target fruit and complete the placing action at the designated location (fruit bin), constructed as $r_{\text{place}} = -\lambda_{\text{place}}\lVert p_{t} - p_{\text{bin}}\rVert$, where $p_{\text{bin}}$ is the target placement position and $\lambda_{\text{place}}$ is the corresponding reward coefficient.
- (4) Task completion reward—when the target of a specific phase is completed, a stage-based reward is granted to encourage the agent to complete the task stepwise: $r_{\text{task}} = \lambda_{\text{task}}\,\mathbb{1}(\text{task completed})$, where $\lambda_{\text{task}}$ is the task completion reward coefficient and $\mathbb{1}(\cdot)$ is an indicator function that returns 1 when the task is completed and 0 otherwise.
- (5) Time penalty—encouraging the agent to complete the task as soon as possible and reduce unnecessary steps: $r_{\text{time}} = -\lambda_{\text{time}}\,t$, where $\lambda_{\text{time}}$ is the time penalty coefficient and $t$ is the current time step. This term helps avoid ineffective long waits or repetitive actions.
- (6) Gripper collision penalty—preventing excessive contact between the gripper and flexible branches, which leads to harvesting failure or damage to the end-effector. When entrapment of a branch is detected, it is treated as an adverse collision and the system applies a negative reward: $r_{\text{col}} = -\lambda_{\text{col}}\,\mathbb{1}(\text{branch entrapment detected})$, where $\lambda_{\text{col}}$ is the penalty coefficient for dangerous behavior. When the detection system identifies that the gripper is touching branches, the negative reward is applied automatically so that the agent learns not to execute dangerous actions. During simulation training, collision-detector geometries are added inside the gripper to detect whether branches are trapped between the fingers.
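Assembled together, the six terms above form the per-step reward. The sketch below is a hedged illustration only: the coefficient values, the flag-based grasp/place/collision signals, and the function name are placeholders, not the paper's implementation.

```python
import numpy as np

def step_reward(prev_pos, pos, goal, t, grasped, placed, branch_trapped,
                lam=dict(dist=10.0, grasp=5.0, place=5.0, task=20.0,
                         time=0.01, col=10.0)):
    """Minimal sketch of the six-term shaped reward: distance progress,
    grasp bonus, place bonus, task-completion bonus, time penalty, and
    gripper-branch collision penalty. Coefficients are illustrative."""
    # (1) reward the per-step reduction in distance to the phase target
    r = lam['dist'] * (np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal))
    r += lam['grasp'] if grasped else 0.0          # (2) grasp bonus
    r += lam['place'] if placed else 0.0           # (3) place bonus
    r += lam['task'] if (grasped and placed) else 0.0  # (4) stage completion
    r -= lam['time'] * t                           # (5) time penalty
    r -= lam['col'] if branch_trapped else 0.0     # (6) entrapment penalty
    return float(r)
```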
3. Results and Analysis
3.1. Experimental Platform and Test Conditions
3.2. Virtual Environment Simulation Experiment
3.3. Field Experiments
- (1) Grasping Success Rate (%): the ratio of the number of fruits successfully grasped by the end-effector to the number of fruits localized by the vision system.
- (2) Harvesting Success Rate (%): the ratio of the number of fruits successfully released into the basket to the number of fruits localized by the vision system.
- (3) Collision Rate (%): the proportion of successfully grasped fruits for which the robotic arm collided with a rigid branch or a flexible branch became trapped in the gripper.
- (4) Average Time (s): the time consumed to execute the entire harvesting process for one target fruit.
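Assuming each field trial is logged per fruit, the four metrics can be computed as below. The record fields (`localized`, `grasped`, `harvested`, `collided`, `time_s`) are illustrative names; note that the collision rate is conditioned on successful grasps, while the other three metrics are conditioned on localized fruits.

```python
def summarize_trials(trials):
    """Compute the four evaluation metrics from a list of per-fruit trial
    records, each a dict with keys 'localized', 'grasped', 'harvested',
    'collided', and 'time_s' (field names are illustrative)."""
    localized = [t for t in trials if t['localized']]
    n = len(localized)
    grasped = [t for t in localized if t['grasped']]
    return {
        'grasping_success_rate': 100.0 * len(grasped) / n,
        'harvesting_success_rate': 100.0 * sum(t['harvested'] for t in localized) / n,
        # collision rate is defined over successfully grasped fruits only
        'collision_rate': 100.0 * sum(t['collided'] for t in grasped) / len(grasped),
        'average_time_s': sum(t['time_s'] for t in localized) / n,
    }
```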
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Algorithmic Details of Semantic 3DGS Construction
Algorithm A1: Incremental Semantic 3DGS Construction
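The caption indicates an incremental loop that fuses per-frame detections into a persistent set of semantic Gaussian primitives. Below is a hedged sketch of one plausible such loop; the class name, the merge radius, and the simple averaging fusion rule are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

class SemanticGaussianMap:
    """Hypothetical sketch of incremental semantic 3DGS construction: each
    primitive holds a mean, an anisotropic covariance, and a semantic label
    ('fruit' or 'branch'); a new detection either refines the nearest
    primitive of the same label or spawns a new one."""

    def __init__(self, merge_radius=0.05):
        self.means, self.covs, self.labels = [], [], []
        self.merge_radius = merge_radius

    def integrate(self, point, cov, label):
        # find the nearest existing primitive with the same semantic label
        best, best_d = None, self.merge_radius
        for i, (m, l) in enumerate(zip(self.means, self.labels)):
            d = np.linalg.norm(m - point)
            if l == label and d < best_d:
                best, best_d = i, d
        if best is None:
            # no primitive close enough: insert a new Gaussian
            self.means.append(np.asarray(point, float))
            self.covs.append(np.asarray(cov, float))
            self.labels.append(label)
        else:
            # fuse with the matched primitive (running average rule)
            self.means[best] = 0.5 * (self.means[best] + point)
            self.covs[best] = 0.5 * (self.covs[best] + cov)
```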
References
- Fu, H.; Guo, Z.; Feng, Q.; Xie, F.; Zuo, Y.; Li, T. MSOAR-YOLOv10: Multi-scale occluded apple detection for enhanced harvest robotics. Horticulturae 2024, 10, 1246.
- Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
- Jin, Y. Research Progress Analysis of Robotics Selective Harvesting Technologies. Trans. Chin. Soc. Agric. Mach. 2020, 51, 1–17.
- Zhang, K.; Lammers, K.; Chu, P.; Li, Z.; Lu, R. An automated apple harvesting robot—From system design to field evaluation. J. Field Robot. 2024, 41, 2384–2400.
- Au, W.; Zhou, H.; Liu, T.; Kok, E.; Wang, X.; Wang, M.; Chen, C. The Monash Apple Retrieving System: A review on system intelligence and apple harvesting performance. Comput. Electron. Agric. 2023, 213, 108164.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408.
- Warren, C.W. Global path planning using artificial potential fields. In 1989 IEEE International Conference on Robotics and Automation; IEEE Computer Society: Washington, DC, USA, 1989; pp. 316–321.
- Chen, P.C.; Hwang, Y.K. SANDROS: A dynamic graph search algorithm for motion planning. IEEE Trans. Robot. Autom. 2002, 14, 390–403.
- Bohlin, R.; Kavraki, L.E. Path planning using lazy PRM. In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065); IEEE: New York, NY, USA, 2000; Volume 1, pp. 521–528.
- Karaman, S.; Walter, M.R.; Perez, A.; Frazzoli, E.; Teller, S. Anytime motion planning using the RRT. In 2011 IEEE International Conference on Robotics and Automation; IEEE: New York, NY, USA, 2011; pp. 1478–1483.
- Wang, P.; Ghergherehchi, M.; Kim, J.; Zhang, M.; Song, J. Transformer-based path planning for single-arm and dual-arm robots in dynamic environments. Int. J. Adv. Manuf. Technol. 2025, 139, 3801–3819.
- Ye, L.; Duan, J.; Yang, Z.; Zou, X.; Chen, M.; Zhang, S. Collision-free motion planning for the litchi-picking robot. Comput. Electron. Agric. 2021, 185, 106151.
- Luo, L.; Wen, H.; Lu, Q.; Huang, H.; Chen, W.; Zou, X.; Wang, C. Collision-free path-planning for six-dof serial harvesting robot based on energy optimal and artificial potential field. Complexity 2018, 2018, 3563846.
- Zhang, B.; Yin, C.; Fu, Y.; Xia, Y.; Fu, W. Harvest motion planning for mango picking robot based on improved RRT-Connect. Biosyst. Eng. 2024, 248, 177–189.
- Wang, D.; Dong, Y.; Lian, J.; Gu, D. Adaptive end-effector pose control for tomato harvesting robots. J. Field Robot. 2023, 40, 535–551.
- Yang, Y.; Luo, X.; Li, W.; Liu, C.; Ye, Q.; Liang, P. AAPF*: A safer autonomous vehicle path planning algorithm based on the improved A* algorithm and APF algorithm. Clust. Comput. 2024, 27, 11393–11406.
- Akay, B.; Karaboga, D. Artificial bee colony algorithm for large-scale problems and engineering design optimization. J. Intell. Manuf. 2012, 23, 1001–1014.
- Alamoudi, O.; Al-Hashimi, M. On the Energy Behaviors of the Bellman–Ford and Dijkstra Algorithms: A Detailed Empirical Study. J. Sens. Actuator Netw. 2024, 13, 67.
- Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894.
- Zhang, Y.; Li, Y.; Feng, Q.; Sun, J.; Peng, C.; Gao, L.; Chen, L. Compliant Motion Planning Integrating Human Skill for Robotic Arm Collecting Tomato Bunch Based on Improved DDPG. Plants 2025, 14, 634.
- Luo, L.; Liu, B.; Chen, M.; Wang, J.; Wei, H.; Lu, Q.; Luo, S. DRL-enhanced 3D detection of occluded stems for robotic grape harvesting. Comput. Electron. Agric. 2025, 229, 109736.
- Li, Y.; Feng, Q.; Zhang, Y.; Peng, C.; Ma, Y.; Liu, C.; Ru, M.; Sun, J.; Zhao, C. Peduncle collision-free grasping based on deep reinforcement learning for tomato harvesting robot. Comput. Electron. Agric. 2024, 216, 108488.
- Yi, T.; Zhang, D.; Luo, L.; Wang, Y.; Liu, B. View planning for grape harvesting based on self-supervised deep reinforcement learning under occlusion. Comput. Electron. Agric. 2025, 239, 110913.
- Huang, Y. Deep Q-networks. In Deep Reinforcement Learning: Fundamentals, Research and Applications; Springer: Singapore, 2020; pp. 135–160.
- Gao, L.; Zhao, Y.; Han, J.; Liu, H. Research on multi-view 3D reconstruction technology based on SFM. Sensors 2022, 22, 4366.
- Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Escola, A.; Gregorio, E. In-field apple size estimation using photogrammetry-derived 3D point clouds: Comparison of 4 different methods considering fruit occlusions. Comput. Electron. Agric. 2021, 188, 106343.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
- Li, S.; Li, C.; Zhu, W.; Yu, B.; Zhao, Y.; Wan, C.; You, H.; Shi, H.; Lin, Y. Instant-3D: Instant neural radiance field training towards on-device AR/VR 3D reconstruction. In Proceedings of the 50th Annual International Symposium on Computer Architecture; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1–13.
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139.
- Xiao, F.; Wang, H.; Xu, Y.; Zhang, R. Fruit detection and recognition based on deep learning for automatic harvesting: An overview and review. Agronomy 2023, 13, 1625.
- De-An, Z.; Jidong, L.; Wei, J.; Ying, Z.; Yu, C. Design and control of an apple harvesting robot. Biosyst. Eng. 2011, 110, 112–122.
- Yoshida, T.; Onishi, Y.; Kawahara, T.; Fukao, T. Automated harvesting by a dual-arm fruit harvesting robot. ROBOMECH J. 2022, 9, 19.
- Noda, S.; Kogoshi, M.; Iijima, W. Robot Simulation on Agri-Field Point Cloud With Centimeter Resolution. IEEE Access 2025, 13, 14404–14416.
- Lin, G.; Zhu, L.; Li, J.; Zou, X.; Tang, Y. Collision-free path planning for a guava-harvesting robot based on recurrent deep reinforcement learning. Comput. Electron. Agric. 2021, 188, 106350.
- Fu, H.; Li, T.; Feng, Q.; Chen, L. MT-WavYOLO: Bridging multi-task learning and 3D frustum fusion for non-destructive robotic harvesting of occluded orchard fruits. Comput. Electron. Agric. 2026, 242, 111335.
- Wu, C.; Ruan, J.; Cui, H.; Zhang, B.; Li, T.; Zhang, K. The application of machine learning based energy management strategy in multi-mode plug-in hybrid electric vehicle, part I: Twin Delayed Deep Deterministic Policy Gradient algorithm design for hybrid mode. Energy 2023, 262, 125084.
- Egbomwan, O.E.; Liu, S.; Chaoui, H. Twin delayed deep deterministic policy gradient (TD3) based virtual inertia control for inverter-interfacing DGs in microgrids. IEEE Syst. J. 2022, 17, 2122–2132.
| Parameter | Value |
|---|---|
| Action Dimension | 8 |
| State Dimension | 425 |
| Network Size | (512, 512) |
| Batch Size | 1024 |
| Replay Buffer Size | |
| Learning Rate | 0.001 |
| Max Episodes | 1500 |
| Max Steps Per Episode | 600 |
| Policy Noise | 0.2 |
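The Policy Noise entry (0.2) is the standard deviation used in TD3's target-policy smoothing: clipped Gaussian noise is added to the target actor's action before evaluating the target Q-value. A sketch follows, with the conventional 0.5 clip bound and a unit action limit assumed rather than taken from the paper.

```python
import numpy as np

def smoothed_target_action(target_policy_action, policy_noise=0.2,
                           noise_clip=0.5, act_limit=1.0, rng=None):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the action bounds. The 0.5 noise
    clip and unit action limit are conventional defaults, not the paper's."""
    rng = np.random.default_rng() if rng is None else rng
    noise = np.clip(rng.normal(0.0, policy_noise,
                               size=np.shape(target_policy_action)),
                    -noise_clip, noise_clip)
    return np.clip(target_policy_action + noise, -act_limit, act_limit)
```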
| Parameter | Value |
|---|---|
| Equivalent stiffness of branch compliance | |
| Equivalent damping of branch compliance | |
| Sliding friction coefficient between apple and silicone gripper | 0.80 |
| Sliding friction coefficient between apple and branch surface | 0.35 |
| Sliding friction coefficient between silicone gripper and branch | 0.60 |
| Joint damping of revolute joints (R) | 0.8 |
| Joint damping of prismatic joints (P) | 50 |
| Angular damping of branch hinge | 0.6 |
| Angular stiffness of branch hinge | 12 |
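The equivalent stiffness and damping entries suggest that branch deflection is modeled as a linear spring-damper. A minimal sketch of the resulting reaction force, with purely illustrative coefficient values in the comments:

```python
def branch_reaction_force(deflection, deflection_rate, k, c):
    """Linear spring-damper model of a compliant branch:
    F = -k*x - c*x_dot, with signs chosen so the force opposes both the
    deflection and its rate. k and c are the equivalent stiffness and
    damping from the simulation parameter table (values illustrative)."""
    return -k * deflection - c * deflection_rate

# e.g. a 0.1 m deflection with k = 100 N/m yields a -10 N restoring force
```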
| Scenario | Method | Grasping Success Rate (%) | Harvesting Success Rate (%) | Collision Rate (%) | Average Time (s) |
|---|---|---|---|---|---|
| Scenario 1 | RRT | 80.0 | 70.0 | 21.4 | 13.2 |
| | DQN | 80.0 | 75.0 | 26.7 | 12.8 |
| | TD3 | 85.0 | 75.0 | 20.0 | 12.1 |
| | AE-TD3 | 95.0 | 90.0 | 5.6 | 9.5 |
| Scenario 2 | RRT | 70.0 | 60.0 | 25.0 | 14.1 |
| | DQN | 75.0 | 70.0 | 21.4 | 13.7 |
| | TD3 | 80.0 | 70.0 | 21.4 | 13.3 |
| | AE-TD3 | 90.0 | 85.0 | 11.8 | 10.7 |
| Scenario 3 | RRT | 60.0 | 45.0 | 33.3 | 15.8 |
| | DQN | 65.0 | 55.0 | 27.3 | 14.5 |
| | TD3 | 70.0 | 60.0 | 25.0 | 14.2 |
| | AE-TD3 | 85.0 | 75.0 | 20.0 | 11.6 |
| Scenario 4 | RRT | 55.0 | 40.0 | 37.5 | 16.2 |
| | DQN | 60.0 | 50.0 | 30.0 | 15.1 |
| | TD3 | 65.0 | 55.0 | 27.3 | 14.8 |
| | AE-TD3 | 80.0 | 75.0 | 13.3 | 12.1 |
| Method | Grasping Success Rate (%) | Harvesting Success Rate (%) | Collision Rate (%) | Average Time (s) |
|---|---|---|---|---|
| RRT | 62.2 | 53.3 | 29.2 | 16.3 |
| DQN | 67.4 | 60.9 | 32.1 | 15.3 |
| TD3 | 70.2 | 63.8 | 30.0 | 14.5 |
| AE-TD3 | 81.3 | 77.1 | 16.2 | 12.4 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Fu, H.; Li, T.; Feng, Q.; Chen, L. Push-or-Avoid: Deep Reinforcement Learning of Obstacle-Aware Harvesting for Orchard Robots. Agriculture 2026, 16, 670. https://doi.org/10.3390/agriculture16060670

