Intermittent Stop-Move Motion Planning for Dual-Arm Tomato Harvesting Robot in Greenhouse Based on Deep Reinforcement Learning
Abstract
1. Introduction
- (1) A real-time tomato cluster coordinate projection method was proposed to obtain the distribution of tomato clusters in a greenhouse environment;
- (2) The deep reinforcement learning algorithm DDPG was employed to generate the sequence of parking nodes for the vehicle of a dual-arm harvesting robot, replacing the conventional strategy in which tomato-harvesting robots stop and harvest at every target they encounter. This approach ensured that both arms remained actively engaged during each parking phase, with each arm executing multiple harvests;
- (3) The trained policy model was deployed on an actual dual-arm robot and compared with grid-based and area-division parking node planning algorithms, demonstrating the feasibility of real-time mapping and efficient parking planning. While ensuring no tomatoes were missed, the proposed approach addressed the main limitation of existing planning methods, which fail to consider the simultaneous operation of both arms; it significantly reduced the number of vehicle stops, thereby improving overall harvesting efficiency and enabling effective task distribution between the two arms.
2. Materials and Methods
2.1. Greenhouse Environment
2.2. Dual-Arm Harvesting Robot System and Workspace Area
2.3. Tomato Cluster Coordinate Projection and Mapping
- First, we designated a specific area in the image (the red transparent rectangular area in Figure 4f) to reduce the tracking region and minimize changes in the bounding box ratio, and divided this area into left and right sections along its center line;
- Then, the tomato clusters output by ByteTrack within the specific area were divided into two types: clusters whose centroids fell in the left section, and clusters whose centroids fell in the right section;
- Next, we continuously updated the IDs of the tomato clusters in the left and right sections in real time. Whenever a tracked cluster ID moved from the left section to the right section or vice versa (Algorithm 1), we output the centroid coordinates of that cluster together with the current vehicle position, read from an Xsens MTi-30 IMU (inertial measurement unit). A minimal sketch of this crossing check is given after Algorithm 1 below.
Algorithm 1: The tomato cluster tracking method.
Input: …; Arrayleft; Arrayright. Output: Arrayleft; Arrayright.
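The crossing check at the heart of Algorithm 1 can be summarized in a few lines. The following is an illustrative sketch, not the authors' implementation: it assumes ByteTrack yields (ID, centroid) pairs inside the designated area each frame, and `read_vehicle_position` is a hypothetical helper wrapping the Xsens MTi-30 IMU query.

```python
# Minimal sketch of the Algorithm 1 crossing check (illustrative only).
from typing import Callable, Dict, List, Tuple

Centroid = Tuple[float, float]

def update_crossings(
    tracks: List[Tuple[int, Centroid]],         # (id, (cx, cy)) pairs this frame
    center_x: float,                            # x of the area's center line
    side_of: Dict[int, str],                    # last known side for each id
    read_vehicle_position: Callable[[], float]  # vehicle x position from the IMU
) -> List[Tuple[Centroid, float]]:
    """Emit (centroid, vehicle_x) for every cluster whose tracked ID moved
    from the left section to the right section, or vice versa."""
    crossings = []
    for tid, (cx, cy) in tracks:
        side = "left" if cx < center_x else "right"
        prev = side_of.get(tid)
        if prev is not None and prev != side:
            # The ID crossed the center line: output its centroid together
            # with the vehicle position at this moment.
            crossings.append(((cx, cy), read_vehicle_position()))
        side_of[tid] = side
    return crossings
```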
2.4. Infrequent Movement Planning for the Multi-Arm Robot Vehicle
2.4.1. Background of DDPG
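For reference, the core DDPG updates (after Lillicrap et al. [28], written in the standard notation rather than this paper's symbols) are: the critic Q is regressed onto a bootstrapped target computed with the target networks Q′ and μ′, the actor μ follows the deterministic policy gradient, and both target networks track the online networks through soft updates.

```latex
% Standard DDPG updates (Lillicrap et al. [28]); usual notation, not
% necessarily that of this paper.
y_t = r_t + \gamma\, Q'\!\big(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)
% critic loss over a mini-batch of N transitions
L = \frac{1}{N} \sum_t \big(y_t - Q(s_t, a_t \mid \theta^{Q})\big)^2
% deterministic policy gradient for the actor
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_t
  \nabla_a Q(s, a \mid \theta^{Q})\big|_{s = s_t,\, a = \mu(s_t)}\,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_t}
% soft target-network updates
\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\,\theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'}
```

Here the soft update factor τ and discount factor γ correspond to the values 0.001 and 0.99 listed in Table 1.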
2.4.2. Training DDPG for Movement Planning
The Interaction Environment
- At the beginning of each episode, a 10 × 2 m projection map was constructed to reflect real greenhouse plant distribution patterns. Curves representing the main stems of tomato plants (depicted as green lines) were inserted into the map at regular intervals of 0.42 m;
- On each main stem curve, 0 to 5 target points (illustrated as blue dots) were randomly placed to represent the positions of tomato clusters, with each point's height constrained to the range of 0.31 m to 0.79 m;
- The map also contained a vehicle (represented by a gray rectangle) and its harvesting area (outlined by a red dashed line). The vehicle could move from left to right along the X-axis. When a target point fell within the harvesting area, the corresponding blue dot turned red, indicating that the tomato had been harvested. Each time the vehicle stopped, a black dot was generated on the X-axis to mark the parking location. A toy reconstruction of this environment is sketched below.
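To make the setup concrete, here is a minimal sketch under stated assumptions: target counts and heights are sampled uniformly, only the X-coordinate and height of each target are modeled, and the harvesting area is reduced to an interval on the X-axis. All names are illustrative.

```python
# Toy reconstruction of the episode map described above (assumptions noted
# in the lead-in; not the authors' simulator).
import random

MAP_LENGTH = 10.0              # m, vehicle driving range along the X-axis
STEM_SPACING = 0.42            # m between main-stem curves
HEIGHT_RANGE = (0.31, 0.79)    # m, allowed target-point heights
MAX_TARGETS_PER_STEM = 5

def build_episode_map(seed=None):
    """Sample (x, height) target points on evenly spaced stems; the map is
    rebuilt like this at the start of each episode."""
    rng = random.Random(seed)
    targets = []
    for i in range(int(MAP_LENGTH / STEM_SPACING)):
        stem_x = i * STEM_SPACING
        for _ in range(rng.randint(0, MAX_TARGETS_PER_STEM)):
            targets.append((stem_x, rng.uniform(*HEIGHT_RANGE)))
    return targets

def targets_in_harvest_area(targets, vehicle_x, area_width):
    """Indices of targets inside the harvesting area at a stop (these are
    the dots recolored red in the rendered map)."""
    half = area_width / 2.0
    return [i for i, (x, _) in enumerate(targets) if abs(x - vehicle_x) <= half]
```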
State and Action Space
Reward Function
Algorithm 2: The parking location evaluation method.
Input: robot location X; the set of target points within the map; the set of target points within the harvesting area of the B1 arm; the set of target points within the harvesting area of the B2 arm; harvest area width w; the interspace between the B1 and B2 arms. Output: ….
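The core bookkeeping behind Algorithm 2 can be hedged as follows. The window geometry is an assumption: each arm's harvesting area is modeled as a width-w interval on the X-axis, with the B1 and B2 windows separated by the interspace d; the paper's exact window definitions may differ.

```python
# Hedged sketch of the parking-location evaluation (assumed geometry).
def evaluate_parking(x, target_xs, w, d):
    """Count targets in each arm's harvest window for a stop at position x.

    x         : vehicle stop location on the X-axis
    target_xs : x-coordinates of the target points within the map
    w         : harvest area width of one arm (m)
    d         : interspace between the B1 and B2 arms (m)
    """
    b1 = [t for t in target_xs if x - d / 2 - w <= t <= x - d / 2]  # B1 window
    b2 = [t for t in target_xs if x + d / 2 <= t <= x + d / 2 + w]  # B2 window
    return len(b1), len(b2)
```

Scoring stops by both counts lets the reward favor locations where min(n_B1, n_B2) > 0, so that both arms work during every parking phase, which is the behavior highlighted in contribution (2) of the Introduction.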
2.5. Experimental Setup
2.5.1. Simulation Experiments
2.5.2. Field Experiments
- Experiment 1: As illustrated in Figure 6, we used a traditional grid-based path planning algorithm [12,34] to generate vehicle movement paths. Based on the size of the arm's harvesting area, the algorithm progressively constructed a global grid map and eliminated grids without target points. The intersection point of each grid's central axis with the X-axis served as a node of the generated movement path (see the code sketch following this list);
- Experiment 2: The process of the area division algorithm [13] is illustrated in Figure 7. First, the fruit coordinate closest to the origin was chosen as the center of area 1 (p0), and its effective picking area was marked, which included points p1 and p2 (Figure 7a). Next, candidate areas centered on p1 and p2 were compared; p2, whose area contained more fruit points, became the new center (Figure 7b–d). This process was repeated for subsequent areas, each starting from the point closest to the previous center (Figure 7e–h). The intersection point of each area's central axis with the X-axis served as a node of the generated movement path;
- Experiment 3: The final experiment used the DRL-based vehicle movement planning method proposed in this paper to dynamically generate the action sequence of the dual-arm robot vehicle, comprising its incremental displacements and the number of stops.
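For Experiment 1, the node generation reduces to a short routine under simplifying assumptions: one-dimensional cells laid along the X-axis with the arm's harvest-area width as the pitch, empty cells discarded, and each remaining cell contributing its center as a parking node. This is a sketch of the baseline's idea, not the cited implementation.

```python
# Sketch of the grid-based baseline of Experiment 1 (assumed 1-D geometry).
def grid_parking_nodes(target_xs, map_length, cell_width):
    n_cells = int(map_length // cell_width) + 1
    occupied = {int(t // cell_width) for t in target_xs}  # cells holding targets
    # Intersection of each kept cell's central axis with the X-axis.
    return [(c + 0.5) * cell_width for c in range(n_cells) if c in occupied]

# Example: three targets spread over two cells yield two parking nodes
# instead of one stop per target (cell width chosen arbitrarily here).
nodes = grid_parking_nodes([1.3, 1.5, 4.8], map_length=10.0, cell_width=0.6)
# nodes ≈ [1.5, 5.1]
```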
3. Results and Discussion
3.1. Simulation Experiments
3.2. Field Experiments
3.3. Discussion and Future Work
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Maureira, F.; Rajagopalan, K.; Stöckle, C.O. Evaluating tomato production in open-field and high-tech greenhouse systems. J. Clean. Prod. 2022, 337, 130459.
- Li, Y.; Feng, Q.; Zhang, Y.; Peng, C.; Ma, Y.; Liu, C.; Ru, M.; Sun, J.; Zhao, C. Peduncle collision-free grasping based on deep reinforcement learning for tomato harvesting robot. Comput. Electron. Agric. 2024, 216, 108488.
- Li, Y.; Feng, Q.; Li, T.; Xie, F.; Liu, C.; Xiong, Z. Advance of target visual information acquisition technology for fresh fruit robotic harvesting: A review. Agronomy 2022, 12, 1336.
- Taqi, F.; Al-Langawi, F.; Abdulraheem, H.; El-Abd, M. A cherry-tomato harvesting robot. In Proceedings of the 2017 18th International Conference on Advanced Robotics, Hong Kong, China, 10–12 July 2017; pp. 463–468.
- Xiong, Y.; Ge, Y.; Grimstad, L.; From, P.J. An autonomous strawberry-harvesting robot: Design, development, integration, and field evaluation. J. Field Robot. 2020, 37, 202–224.
- Park, Y.; Seol, J.; Pak, J.; Jo, Y.; Kim, C.; Son, H.I. Human-centered approach for an efficient cucumber harvesting robot system: Harvest ordering, visual servoing, and end-effector. Comput. Electron. Agric. 2023, 212, 108116.
- Barnett, J.; Duke, M.; Au, C.K.; Lim, S.H. Work distribution of multiple Cartesian robot arms for kiwifruit harvesting. Comput. Electron. Agric. 2020, 169, 105202.
- Wrobel, S. Israeli Startup Develops First AI Robot for Picking Tomatoes. Available online: https://www.timesofisrael.com/israeli-startup-develops-first-ai-robot-for-picking-tomatoes/ (accessed on 2 February 2023).
- Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
- Wang, N.; Yang, X.; Wang, T.; Xiao, J.; Zhang, M.; Wang, H.; Li, H. Collaborative path planning and task allocation for multiple agricultural machines. Comput. Electron. Agric. 2023, 213, 108218.
- Lee, T.K.; Baek, S.H.; Choi, Y.H.; Oh, S.Y. Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation. Robot. Auton. Syst. 2011, 59, 801–812.
- Gabriely, Y.; Rimon, E. Spiral-STC: An on-line coverage algorithm of grid environments by a mobile robot. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Washington, DC, USA, 11–15 May 2002; pp. 954–960.
- Wang, Y.; He, Z.; Cao, D.; Ma, L.; Li, K.; Jia, L.; Cui, Y. Coverage path planning for kiwifruit picking robots based on deep reinforcement learning. Comput. Electron. Agric. 2023, 205, 107593.
- Liu, Y.; Xu, H.; Liu, D.; Wang, L. A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping. Robot. Comput. Integr. Manuf. 2022, 78, 102365.
- Lin, G.; Zhu, L.; Li, J.; Zou, X.; Tang, Y. Collision-free path planning for a guava-harvesting robot based on recurrent deep reinforcement learning. Comput. Electron. Agric. 2021, 188, 106350.
- Yu, J.J.Q.; Yu, W.; Gu, J. Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3806–3817.
- Ottoni, A.L.C.; Nepomuceno, E.G.; Oliveira, M.S.D.; Oliveira, D.C.R. Reinforcement learning for the traveling salesman problem with refueling. Complex Intell. Syst. 2022, 8, 2001–2015.
- Kyaw, P.T.; Paing, A.; Thu, T.T.; Mohan, R.E.; Le, A.V.; Veerajagadheswar, P. Coverage path planning for decomposition reconfigurable grid-maps using deep reinforcement learning based travelling salesman problem. IEEE Access 2020, 8, 225945–225956.
- Martini, M.; Cerrato, S.; Salvetti, F.; Angarano, S.; Chiaberge, M. Position-agnostic autonomous navigation in vineyards with deep reinforcement learning. In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico, 20–24 August 2022; pp. 477–484.
- Bac, C.W.; Hemming, J.; van Tuijl, B.A.J.; Barth, R.; Wais, E.; van Henten, E.J. Performance evaluation of a harvesting robot for sweet pepper. J. Field Robot. 2017, 34, 1123–1139.
- Li, Y.; Feng, Q.; Liu, C.; Xiong, Z.; Sun, Y.; Xie, F.; Li, T.; Zhao, C. MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting. Eur. J. Agron. 2023, 146, 126812.
- Jun, J.; Kim, J.; Seol, J.; Kim, J.; Son, H.I. Towards an efficient tomato harvesting robot: 3D perception, manipulation, and end-effector. IEEE Access 2021, 9, 17631–17640.
- Wang, D.; Dong, Y.; Lian, J.; Gu, D. Adaptive end-effector pose control for tomato harvesting robots. J. Field Robot. 2023, 40, 535–551.
- Rong, J.; Zhou, H.; Zhang, F.; Yuan, T.; Wang, P. Tomato cluster detection and counting using improved YOLOv5 based on RGB-D fusion. Comput. Electron. Agric. 2023, 207, 107741.
- Shen, L.; Liu, M.; Weng, C.; Zhang, J.; Dong, F.; Zheng, F. ColorByte: A real time MOT method using fast appearance feature based on ByteTrack. In Proceedings of the 2022 Tenth International Conference on Advanced Cloud and Big Data (CBD), Guilin, China, 4–5 November 2022; pp. 1–6.
- Xie, B.; Jiao, W.; Wen, C.; Hou, S.; Zhang, F.; Liu, K.; Li, J. Feature detection method for hind leg segmentation of sheep carcass based on multi-scale dual attention U-Net. Comput. Electron. Agric. 2021, 191, 106482.
- Rong, J.; Wang, P.; Wang, T.; Hu, L.; Yuan, T. Fruit pose recognition and directional orderly grasping strategies for tomato harvesting robots. Comput. Electron. Agric. 2022, 202, 107430.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Zhong, J.; Wang, T.; Cheng, L. Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics. Complex Intell. Syst. 2022, 8, 1899–1912.
- Lindner, T.; Milecki, A.; Wyrwał, D. Positioning of the robotic arm using different reinforcement learning algorithms. Int. J. Control Autom. Syst. 2021, 19, 1661–1676.
- Fujimoto, S.; Van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 2587–2601.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft Actor-Critic algorithms and applications. arXiv 2018, arXiv:1812.05905v2.
- Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. arXiv 2015, arXiv:1412.6980.
- Zhu, K.; Zhang, T. Deep reinforcement learning based mobile robot navigation: A review. Tsinghua Sci. Technol. 2021, 26, 674–691.
| Object | Parameters | Value |
|---|---|---|
| Simulation environment | Robot vehicle size | m |
| | Vehicle driving distance | m |
| | Arm harvest area | m |
| | Distance between the two arms | 0.9 m |
| | Number of target points | 100–120 |
| | Left crop row tomato cluster projection area | |
| | Right crop row tomato cluster projection area | |
| DDPG | Number of episodes | 10⁵ |
| | Max episode step | 50 |
| | Mini-batch size | 128 |
| | Discount factor | 0.99 |
| | Decay coefficient | 0.0001 |
| | Soft update factor | 0.001 |
| | Replay buffer size | 10⁶ |
| | Learning rate of actor network | 0.0001 |
| | Learning rate of critic network | 0.001 |
| | Optimizer | Adam [34] |
| Algorithm | ¹ | ¹ | ¹ | /m ¹ |
|---|---|---|---|---|
| DDPG | 27.4 ± 0.5 | 0.4 ± 0.7 | 3.3 ± 0.8 | 0.096 ± 0.005 |
| SAC | 25.5 ± 0.5 | 8.2 ± 4.1 | 4.9 ± 1.4 | 0.100 ± 0.007 |
| TD3 | 22.7 ± 0.9 | 17.2 ± 3.5 | 4.2 ± 1.9 | 0.103 ± 0.006 |
| Algorithm | ¹ | ¹ | ¹ | /m ¹ | Process Speed/ms ¹ |
|---|---|---|---|---|---|
| Grid map | 43 | 0 | 43 | 0.103 | 3.7 |
| Area division | 36 | 0 | 34 | 0.080 | 14.1 |
| DDPG | 23 | 0 | 4 | 0.093 | 6.9 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).