A Review of Research on Fruit and Vegetable Picking Robots Based on Deep Learning
Abstract
1. Introduction
2. Deep Learning Overview
2.1. Convolutional Neural Networks
2.2. Object Detection
2.3. Deep Reinforcement Learning
2.4. Semantic Segmentation
3. Key Technology and Application of Deep Learning in Picking Robots
3.1. Visual Perception and Target Recognition
3.1.1. Fruit Detection and Classification
3.1.2. Stem and Leaf Segmentation
| Type | Ref. | Crop | Year | Task | Network Framework and Algorithms | F1 Score | Precision | mAP | Other Metric |
|---|---|---|---|---|---|---|---|---|---|
| Fruit detection and classification | [59] | White asparagus | 2024 | Object detection | HGCA-YOLO | 92.4% | - | 95.2% | - |
| | [60] | White asparagus | 2022 | Object detection | YOLO5-spear | 96.8% | - | 1 | - |
| | [61] | Citrus | 2024 | Object detection | YOLOv4-Tiny | - | 93.5% | 93.25% | Recall: 93.73% |
| | [62] | Strawberry | 2023 | Object detection | YOLOv7-Multi-Scale | 92% | - | 89% | - |
| | [39] | Apple | 2024 | Object detection | YOLOv5s-BC | 84.32% | 99.8% | 88.7% | - |
| | [23] | Orange | 2023 | Object detection | CNN | 96.5% | 98% | - | Recall: 94.8% |
| | [21] | Strawberry | 2023 | Object detection | MiniNet CNN | - | - | - | MAE: 4.8% |
| | [38] | Apple | 2023 | Target location | YOLOv5, ORB-SLAM3 | - | - | - | RMSE: 26 mm |
| | [63] | Tomato | 2023 | Object detection | YOLOv3, YOLOv8, EfficientDet | - | 76.42% | 93.73% | - |
| | [64] | Grapevine | 2023 | Image segmentation | DenseNet | - | 98.00 ± 0.13% | - | - |
| | [65] | Tea chrysanthemum | 2021 | Object detection | TC-YOLO | - | - | 92.49% | - |
| | [66] | Apple, pear | 2019 | Object detection | CNN | 81% | 95% | - | - |
| | [67] | Apple | 2020 | Object detection | Mobile-DasNet | 85.1% | - | 86.3% | - |
| Stem and leaf segmentation | [68] | Strawberry | 2024 | Cutting point detection | DeepLabV3+, ResNet-50 | 91% | - | - | MBF: 74.2% |
| | [69] | Strawberry | 2022 | Object detection | YOLOv5m | 89.4% | 89.0% | 91.5% | Recall: 89.8% |
| | [22] | Table grape | 2023 | Object detection | Mask R-CNN | 94.7% | 95.6% | - | Recall: 93.8% |
| | [70] | Strawberry | 2024 | Object detection | YOLOv7 | 97.9% | 99.0% | 99.8% | Recall: 96.9% |
| | [72] | Cherry tomato | 2023 | Image segmentation | MTA-YOLACT, ResNet-101 | 95.4% | 98.9% | 45.3% | Recall: 92.1% |
| | [57] | Grape | 2021 | Image segmentation | DeepLabV3+ | 98.63% | 87.5% | - | Recall: 89.2% |
| | [73] | Kiwifruit | 2024 | Object detection | LaC-Gwc Net, YOLOv8m | - | - | 93.1% | - |
| | [74] | Tomato | 2023 | Object detection | YOLOv5 | - | - | 91.7% | - |
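The metrics tabulated above all derive from the same underlying detection counts: precision = TP/(TP + FP), recall = TP/(TP + FN), F1 is their harmonic mean, and mAP averages per-class AP over the precision-recall curve. As a minimal reference sketch (not code from any cited paper; the counts at the bottom are placeholders), the following Python computes these quantities:

```python
# Minimal sketch: how the tabulated metrics relate to raw detection counts.
# All numeric values below are placeholders, not results from the cited papers.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(scored_hits: list[tuple[float, bool]], n_gt: int) -> float:
    """AP for one class: area under the precision-recall curve.
    scored_hits = (confidence, is_true_positive) per detection; n_gt = # ground truths."""
    scored_hits.sort(key=lambda x: x[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, hit in scored_hits:
        tp, fp = tp + hit, fp + (not hit)
        recall = tp / n_gt
        ap += (recall - prev_recall) * (tp / (tp + fp))  # rectangle rule
        prev_recall = recall
    return ap

p, r, f1 = precision_recall_f1(tp=187, fp=13, fn=12)  # hypothetical counts
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

mAP is then simply the mean of `average_precision` over all fruit classes.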
3.2. Path Planning and Motion Control
3.2.1. Deep Reinforcement Learning-Based Path Planning
3.2.2. Multimodal Environment Perception and Semantic Understanding
3.2.3. Deep Learning-Based Machine Vision Localization
3.2.4. Multitask-Oriented Robot Motion Planning
| Ref. | Crop | Year | Task | Network Framework and Algorithms | F1 Score | mAP | Acc | Other Metric |
|---|---|---|---|---|---|---|---|---|
| [47] | Guava | 2021 | Path planning | RNN, DDPG | - | - | - | Success rate: 90.90% |
| [48] | Cherry tomato | 2024 | Path planning | YOLOv5, DDPG | 95.7% | - | - | Detection speed: 16.5 FPS |
| [76] | Cherry tomato | 2024 | Object detection | YOLOv4, ORB-SLAM3 | - | - | - | Recall: 87.5% |
| [77] | Apple | 2022 | Object detection | YOLACT, Apple 3D Network | 89% | - | - | IoU: 87.3% |
| [78] | Strawberry | 2020 | Object detection | Mask R-CNN | - | - | - | Success rate: 65.1% |
| [79] | Apple | 2024 | Object detection | DaSNet-v2, HigherHRNet | - | 89.11% | - | - |
| [80] | - | 2024 | Path planning | DRL-MPC-GNNs | 92.91% | - | 96.82% | Recall: 91.47% |
| [82] | Apple | 2023 | Path planning | Multi-task DCNN | 71.0% | 67.3% | 92.0% | Recall: 74.4% |
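Several systems in this table (e.g., [47,48]) couple a detector with DDPG for path planning. To make the underlying actor-critic update concrete, here is a minimal PyTorch sketch of one DDPG gradient step; the state/action dimensions, network widths, and hyperparameters are illustrative assumptions, not values from the cited work:

```python
# Minimal DDPG update sketch (PyTorch) for a reach-style planning task.
# Dimensions and hyperparameters are assumptions, not from the cited systems.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, TAU = 12, 6, 0.99, 0.005  # e.g. joint states -> joint velocities

def mlp(inp, out, act=nn.Identity):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out), act())

actor, critic = mlp(STATE_DIM, ACTION_DIM, nn.Tanh), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t = mlp(STATE_DIM, ACTION_DIM, nn.Tanh); actor_t.load_state_dict(actor.state_dict())
critic_t = mlp(STATE_DIM + ACTION_DIM, 1); critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One gradient step on a sampled replay batch (s, a, r, s2, done)."""
    with torch.no_grad():  # bootstrap target: r + gamma * Q'(s', mu'(s'))
        q_target = r + GAMMA * (1 - done) * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()  # ascend Q(s, mu(s))
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for net, tgt in ((actor, actor_t), (critic, critic_t)):  # Polyak averaging
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * p.data)

batch = 64  # smoke test with random tensors in place of a real replay buffer
ddpg_update(torch.randn(batch, STATE_DIM), torch.randn(batch, ACTION_DIM),
            torch.randn(batch, 1), torch.randn(batch, STATE_DIM), torch.zeros(batch, 1))
```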
3.3. Intelligent Control of End Effector
3.3.1. Target Attribute Extraction and Grip Force Prediction for Haptic Perception
3.3.2. Optimal Gripping Posture Detection by Fusing Visual Information
| Ref. | Crop | Year | Task | Network Framework and Algorithms | Precision | Acc | Recall | Other Metric |
|---|---|---|---|---|---|---|---|---|
| [83] | 15 types of fruits | 2023 | Force prediction | Transformer DNN | 97.33% | 97.33% | - | MAE: 0.216 |
| [84] | Peach | 2025 | Force prediction | CNN, CNN-LSTM | - | - | - | R²: 94.2% |
| [85] | Tomato, nectarine | 2023 | Force prediction | CNN-LSTM, ResNet | 95.7% | 97.8% | 98.1% | IoU: 90.2% |
| [87] | Tomato | 2024 | Force prediction | CNN, LSTM | 95.7% | 97.8% | 98.1% | - |
| [89] | Kiwifruit | 2022 | Grasp detection | GG-CNN2, YOLOv4 | 76.0% | - | - | Success rate: 88.7% |
| [91] | Yellow peach | 2023 | Object detection | CM-YOLOv5s-CSPVoVnet | 87.8% | - | 94.2% | - |
| [93] | - | 2018 | Grasp detection | Two-stream CNNs | 93.4% | - | 77.2% | - |
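Refs. [84,85,87] regress grip force from tactile time series with CNN-LSTM pipelines. The following PyTorch sketch shows the generic pattern (a convolutional encoder over tactile frames feeding an LSTM and a regression head); the 16-taxel array, sequence length, and layer sizes are hypothetical, not the cited architectures:

```python
# Minimal CNN-LSTM sketch for grip-force regression from a tactile sequence.
# The sensor layout (16 taxels sampled over 50 steps) and all layer sizes are
# illustrative assumptions, not the architectures of the cited papers.
import torch
import torch.nn as nn

class TactileForceNet(nn.Module):
    def __init__(self, taxels: int = 16, seq_len: int = 50):
        super().__init__()
        self.encoder = nn.Sequential(           # per-frame spatial features
            nn.Conv1d(taxels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(64, 64, batch_first=True)   # temporal dynamics
        self.head = nn.Linear(64, 1)                    # scalar grip force (N)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, taxels) -> treat taxels as conv channels over time
        z = self.encoder(x.transpose(1, 2)).transpose(1, 2)  # (batch, seq_len, 64)
        _, (h, _) = self.lstm(z)
        return self.head(h[-1])                 # predict from the last hidden state

model = TactileForceNet()
pred = model(torch.randn(8, 50, 16))            # dummy batch of 8 sequences
print(pred.shape)                               # torch.Size([8, 1])
```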
3.3.3. Collision-Free Control of Robotic Arm Attitude
3.3.4. Deep Learning-Based Collaborative Energy Efficiency Optimization for Multiple Robots
3.4. Summary
4. Training and Optimization of Deep Learning Models in Picking Robots
4.1. Dataset Construction and Labeling Methods
4.2. Model Training Strategy
4.3. Model Optimization and Deployment
4.4. Analysis of Model Adaptability and Scalability
5. Challenges and Future Trends
5.1. Robust Perception Algorithms for Complex Environments
5.2. Deep Multimodal Information Fusion Framework
5.3. New Paradigm for Small-Sample Learning in Agriculture
5.4. Data-Driven Autonomous Operation Planning
5.5. New Mode of Human–Computer Interaction in Intelligent Agriculture
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xiao, F.; Wang, H.; Li, Y.; Cao, Y.; Lv, X.; Xu, G. Object Detection and Recognition Techniques Based on Digital Image Processing and Traditional Machine Learning for Fruit and Vegetable Harvesting Robots: An Overview and Review. Agronomy 2023, 13, 639.
- Xiong, Y.; Ge, Y.; Grimstad, L.; From, P.J. An autonomous strawberry-harvesting robot: Design, development, integration, and field evaluation. J. Field Robot. 2019, 37, 202–224.
- Khosro Anjom, F.; Vougioukas, S.G.; Slaughter, D.C. Development of a linear mixed model to predict the picking time in strawberry harvesting processes. Biosyst. Eng. 2018, 166, 76–89.
- Zhang, K.; Lammers, K.; Chu, P.; Li, Z.; Lu, R. System design and control of an apple harvesting robot. Mechatronics 2021, 79, 102644.
- Xie, H.; Zhang, D.; Yang, L.; Cui, T.; He, X.; Zhang, K.; Zhang, Z. Development, Integration, and Field Evaluation of a Dual-Arm Ridge Cultivation Strawberry Autonomous Harvesting Robot. J. Field Robot. 2024; early view.
- Bac, C.W.; van Henten, E.J.; Hemming, J.; Edan, Y. Harvesting Robots for High-value Crops: State-of-the-art Review and Challenges Ahead. J. Field Robot. 2014, 31, 888–911.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016.
- Sa, I.; Ge, Z.; Dayoub, F.; Upcroft, B.; Perez, T.; McCool, C. DeepFruits: A Fruit Detection System Using Deep Neural Networks. Sensors 2016, 16, 1222.
- Liu, Y.; Gao, P.; Zheng, C.; Tian, L.; Tian, Y. A Deep Reinforcement Learning Strategy Combining Expert Experience Guidance for a Fruit-Picking Manipulator. Electronics 2022, 11, 311.
- Frid-Adar, M.; Klang, E.; Amitai, M.M.; Goldberger, J.; Greenspan, H. Synthetic data augmentation using GAN for improved liver lesion classification. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 289–293.
- Talaei Khoei, T.; Ould Slimane, H.; Kaabouch, N. Deep learning: Systematic review, models, challenges, and research directions. Neural Comput. Appl. 2023, 35, 23103–23124.
- Espinoza, S.; Aguilera, C.; Rojas, L.; Campos, P.G. Analysis of Fruit Images With Deep Learning: A Systematic Literature Review and Future Directions. IEEE Access 2024, 12, 3837–3859.
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Vargas-Hakim, G.-A.; Mezura-Montes, E.; Acosta-Mesa, H.-G. A Review on Convolutional Neural Network Encodings for Neuroevolution. IEEE Trans. Evol. Comput. 2022, 26, 12–27.
- Yao, G.; Lei, T.; Zhong, J. A review of Convolutional-Neural-Network-based action recognition. Pattern Recognit. Lett. 2019, 118, 14–22.
- Chen, F.; Li, S.; Han, J.; Ren, F.; Yang, Z. Review of Lightweight Deep Convolutional Neural Networks. Arch. Comput. Methods Eng. 2023, 31, 1915–1937.
- Yu, Y.; Zhang, K.; Yang, L.; Zhang, D. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN. Comput. Electron. Agric. 2019, 163, 104846.
- Ge, Y.; From, P.J.; Xiong, Y. Multi-view gripper internal sensing for the regression of strawberry ripeness using a mini-convolutional neural network for robotic harvesting. Comput. Electron. Agric. 2024, 216, 108474.
- Coll-Ribes, G.; Torres-Rodríguez, I.J.; Grau, A.; Guerra, E.; Sanfeliu, A. Accurate detection and depth estimation of table grapes and peduncles for robot harvesting, combining monocular depth estimation and CNN methods. Comput. Electron. Agric. 2023, 215, 108362.
- Zeeshan, S.; Aized, T.; Riaz, F. The Design and Evaluation of an Orange-Fruit Detection Model in a Dynamic Environment Using a Convolutional Neural Network. Sustainability 2023, 15, 4329.
- Attri, I.; Awasthi, L.K.; Sharma, T.P.; Rathee, P. A review of deep learning techniques used in agriculture. Ecol. Inform. 2023, 77, 102217.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Yang, L.; Yu, X.; Zhang, S.; Long, H.; Zhang, H.; Xu, S.; Liao, Y. GoogLeNet based on residual network and attention mechanism identification of rice leaf diseases. Comput. Electron. Agric. 2023, 204, 107543.
- Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
- Xiuling, Z.; Huijuan, W.; Yu, S.; Gang, C.; Suhua, Z.; Quanbo, Y. Starting from the structure: A review of small object detection based on deep learning. Image Vis. Comput. 2024, 146, 105054.
- Sumit; Shrishti, B.; Sunita, J.; Urvi, R. Comprehensive Review of R-CNN and Its Variant Architectures. Int. Res. J. Adv. Eng. Hub 2024, 2, 959–966.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Sirisha, U.; Praveen, S.P.; Srinivasu, P.N.; Barsocchi, P.; Bhoi, A.K. Statistical Analysis of Design Aspects of Various YOLO-Based Deep Learning Models for Object Detection. Int. J. Comput. Intell. Syst. 2023, 16, 126.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Rong, J.; Wang, P.; Yang, Q.; Huang, F. A Field-Tested Harvesting Robot for Oyster Mushroom in Greenhouse. Agronomy 2021, 11, 1210.
- Zhong, M.; Han, R.; Liu, Y.; Huang, B.; Chai, X.; Liu, Y. Development, integration, and field evaluation of an autonomous Agaricus bisporus picking robot. Comput. Electron. Agric. 2024, 220, 108871.
- Liu, T.; Kang, H.; Chen, C. ORB-Livox: A real-time dynamic system for fruit detection and localization. Comput. Electron. Agric. 2023, 209, 107834.
- Liu, J.; Liu, Z. YOLOv5s-BC: An improved YOLOv5s-based method for real-time apple detection. J. Real Time Image Process. 2024, 21, 88.
- Le, N.; Rathour, V.S.; Yamazaki, K.; Luu, K.; Savvides, M. Deep reinforcement learning in computer vision: A comprehensive survey. Artif. Intell. Rev. 2021, 55, 2733–2819.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft Actor-Critic Algorithms and Applications. arXiv 2018, arXiv:1812.05905.
- Li, Y.; Feng, Q.; Zhang, Y.; Peng, C.; Ma, Y.; Liu, C.; Ru, M.; Sun, J.; Zhao, C. Peduncle collision-free grasping based on deep reinforcement learning for tomato harvesting robot. Comput. Electron. Agric. 2024, 216, 108488.
- Wang, Y.; He, Z.; Cao, D.; Ma, L.; Li, K.; Jia, L.; Cui, Y. Coverage path planning for kiwifruit picking robots based on deep reinforcement learning. Comput. Electron. Agric. 2023, 205, 107593.
- Lin, G.; Zhu, L.; Li, J.; Zou, X.; Tang, Y. Collision-free path planning for a guava-harvesting robot based on recurrent deep reinforcement learning. Comput. Electron. Agric. 2021, 188, 106350.
- Li, Y.; Feng, Q.; Zhang, Y.; Peng, C.; Zhao, C. Intermittent Stop-Move Motion Planning for Dual-Arm Tomato Harvesting Robot in Greenhouse Based on Deep Reinforcement Learning. Biomimetics 2024, 9, 105.
- Yang, J.; Ni, J.; Li, Y.; Wen, J.; Chen, D. The Intelligent Path Planning System of Agricultural Robot via Reinforcement Learning. Sensors 2022, 22, 4316.
- Yi, Z.; Chang, T.; Li, S.; Liu, R.; Zhang, J.; Hao, A. Scene-Aware Deep Networks for Semantic Segmentation of Images. IEEE Access 2019, 7, 69184–69193.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Lecture Notes in Computer Science; pp. 234–241.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Li, Q.; Jia, W.; Sun, M.; Hou, S.; Zheng, Y. A novel green apple segmentation algorithm based on ensemble U-Net under complex orchard environment. Comput. Electron. Agric. 2021, 180, 105900.
- Rasheed, M.; Jasim, W.M.; Farhan, R. Enhancing robotic grasping with attention mechanism and advanced UNet architectures in generative grasping convolutional neural networks. Alex. Eng. J. 2024, 102, 149–158.
- Peng, Y.; Zhao, S.; Liu, J. Segmentation of Overlapping Grape Clusters Based on the Depth Region Growing Method. Electronics 2021, 10, 2813.
- Kang, S.; Li, D.; Li, B.; Zhu, J.; Long, S.; Wang, J. Maturity identification and category determination method of broccoli based on semantic segmentation models. Comput. Electron. Agric. 2024, 217, 108633.
- Zhang, P.; Dai, N.; Liu, X.; Yuan, J.; Xin, Z. A novel lightweight model HGCA-YOLO: Application to recognition of invisible spears for white asparagus robotic harvesting. Comput. Electron. Agric. 2024, 220, 108852.
- Zhang, P.; Liu, X.; Yuan, J.; Liu, C. YOLO5-spear: A robust and real-time spear tips locator by improving image augmentation and lightweight network for selective harvesting robot of white asparagus. Biosyst. Eng. 2022, 218, 43–61.
- Xiao, X.; Jiang, Y.; Wang, Y. A Method of Robot Picking Citrus Based on 3D Detection. IEEE Instrum. Meas. Mag. 2024, 27, 50–58.
- Chai, J.J.K.; Xu, J.-L.; O’Sullivan, C. Real-Time Detection of Strawberry Ripeness Using Augmented Reality and Deep Learning. Sensors 2023, 23, 7639.
- Lee, G.; Yonrith, P.; Yeo, D.; Hong, A. Enhancing detection performance for robotic harvesting systems through RandAugment. Eng. Appl. Artif. Intell. 2023, 123, 106445.
- Škrabánek, P.; Doležel, P.; Matoušek, R. RGB images-driven recognition of grapevine varieties using a densely connected convolutional network. Log. J. IGPL 2023, 31, 618–633.
- Qi, C.; Gao, J.; Pearson, S.; Harman, H.; Chen, K.; Shu, L. Tea chrysanthemum detection under unstructured environments using the TC-YOLO model. Expert Syst. Appl. 2022, 193, 116473.
- Bresilla, K.; Perulli, G.D.; Boini, A.; Morandi, B.; Corelli Grappadelli, L.; Manfrini, L. Single-Shot Convolution Neural Networks for Real-Time Fruit Detection Within the Tree. Front. Plant Sci. 2019, 10, 611.
- Kang, H.; Zhou, H.; Wang, X.; Chen, C. Real-Time Fruit Recognition and Grasping Estimation for Robotic Apple Harvesting. Sensors 2020, 20, 5670.
- Fujinaga, T. Strawberries recognition and cutting point detection for fruit harvesting and truss pruning. Precis. Agric. 2024, 25, 1262–1283.
- Lemsalu, M.; Bloch, V.; Backman, J.; Pastell, M. Real-Time CNN-based Computer Vision System for Open-Field Strawberry Harvesting Robot. IFAC Pap. 2022, 55, 24–29.
- Li, Y.; Wang, W.; Guo, X.; Wang, X.; Liu, Y.; Wang, D. Recognition and Positioning of Strawberries Based on Improved YOLOv7 and RGB-D Sensing. Agriculture 2024, 14, 624.
- Wang, J.; Zhang, Z.; Luo, L.; Wei, H.; Wang, W.; Chen, M.; Luo, S. DualSeg: Fusing transformer and CNN structure for image segmentation in complex vineyard environment. Comput. Electron. Agric. 2023, 206, 107682.
- Li, Y.; Feng, Q.; Liu, C.; Xiong, Z.; Sun, Y.; Xie, F.; Li, T.; Zhao, C. MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting. Eur. J. Agron. 2023, 146, 126821.
- Jing, X.; Jiang, H.; Niu, S.; Zhang, H.; Gilbert Murengami, B.; Wu, Z.; Li, R.; Zhou, C.; Ye, H.; Chen, J.; et al. End-to-end stereo matching network with two-stage partition filtering for full-resolution depth estimation and precise localization of kiwifruit for robotic harvesting. Comput. Electron. Agric. 2024, 225, 109333.
- Kim, T.; Lee, D.-H.; Kim, K.-C.; Kim, Y.-J. 2D pose estimation of multiple tomato fruit-bearing systems for robotic harvesting. Comput. Electron. Agric. 2023, 211, 108004.
- Miao, Z.; Yu, X.; Li, N.; Zhang, Z.; He, C.; Li, Z.; Deng, C.; Sun, T. Efficient tomato harvesting robot based on image processing and deep learning. Precis. Agric. 2022, 24, 254–287.
- Chen, M.; Chen, Z.; Luo, L.; Tang, Y.; Cheng, J.; Wei, H.; Wang, J. Dynamic visual servo control methods for continuous operation of a fruit harvesting robot working throughout an orchard. Comput. Electron. Agric. 2024, 219, 108774.
- Wang, X.; Kang, H.; Zhou, H.; Au, W.; Chen, C. Geometry-aware fruit grasping estimation for robotic harvesting in apple orchards. Comput. Electron. Agric. 2022, 193, 106716.
- Xiong, Y.; Ge, Y.; From, P.J. An obstacle separation method for robotic picking of fruits in clusters. Comput. Electron. Agric. 2020, 175, 105397.
- Kok, E.; Chen, C. Occluded apples orientation estimator based on deep learning model for robotic harvesting. Comput. Electron. Agric. 2024, 219, 108781.
- Li, Z.; Shi, N.; Zhao, L.; Zhang, M. Deep reinforcement learning path planning and task allocation for multi-robot collaboration. Alex. Eng. J. 2024, 109, 408–423.
- Martini, M.; Eirale, A.; Cerrato, S.; Chiaberge, M. PIC4rl-gym: A ROS2 Modular Framework for Robots Autonomous Navigation with Deep Reinforcement Learning. In Proceedings of the 2023 3rd International Conference on Computer, Control and Robotics, Zhangjiajie, China, 24–26 March 2023; pp. 198–202.
- Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
- Huang, R.; Zheng, W.; Zhang, B.; Zhou, J.; Cui, Z.; Zhang, Z. Deep learning with tactile sequences enables fruit recognition and force prediction for damage-free grasping. Comput. Electron. Agric. 2023, 211, 107985.
- Ma, C.; Ying, Y.; Xie, L. Development of a visuo-tactile sensor for non-destructive peach firmness and contact force measurement suitable for robotic arm applications. Food Chem. 2025, 467, 142282.
- Lin, J.; Hu, Q.; Xia, J.; Zhao, L.; Du, X.; Li, S.; Chen, Y.; Wang, X. Non-destructive fruit firmness evaluation using a soft gripper and vision-based tactile sensing. Comput. Electron. Agric. 2023, 214, 108256.
- Li, S.; Sun, W.; Liang, Q.; Liu, C.; Liu, J. Assessing fruit hardness in robot hands using electric gripper actuators with tactile sensors. Sens. Actuators A Phys. 2024, 365, 114843.
- Han, Y.; Yu, K.; Batra, R.; Boyd, N.; Mehta, C.; Zhao, T.; She, Y.; Hutchinson, S.; Zhao, Y. Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer. IEEE/ASME Trans. Mechatron. 2024, 34, 554–566.
- Zhang, T.; Huang, Z.; You, W.; Lin, J.; Tang, X.; Huang, H. An Autonomous Fruit and Vegetable Harvester with a Low-Cost Gripper Using a 3D Sensor. Sensors 2020, 20, 93.
- Ma, L.; He, Z.; Zhu, Y.; Jia, L.; Wang, Y.; Ding, X.; Cui, Y. A Method of Grasping Detection for Kiwifruit Harvesting Robot Based on Deep Learning. Agronomy 2022, 12, 3096.
- Sun, Q.; Zhong, M.; Chai, X.; Zeng, Z.; Yin, H.; Zhou, G.; Sun, T. Citrus pose estimation from an RGB image for automated harvesting. Comput. Electron. Agric. 2023, 211, 108022.
- Wang, Y.; Wu, H.; Zhu, Z.; Ye, Y.; Qian, M. Continuous picking of yellow peaches with recognition and collision-free path. Comput. Electron. Agric. 2023, 214, 108273.
- Zhang, H.; Li, X.; Wang, L.; Liu, D.; Wang, S. Construction and Optimization of a Collaborative Harvesting System for Multiple Robotic Arms and an End-Picker in a Trellised Pear Orchard Environment. Agronomy 2024, 14, 80.
- Ni, P.; Zhang, W.; Bai, W.; Lin, M.; Cao, Q. A New Approach Based on Two-stream CNNs for Novel Objects Grasping in Clutter. J. Intell. Robot. Syst. 2018, 94, 161–177.
- Tabakis, I.-M.; Dasygenis, M. Deep Reinforcement Learning-Based Path Planning for Dynamic and Heterogeneous Environments. In Proceedings of the 2024 Panhellenic Conference on Electronics & Telecommunications (PACET), Thessaloniki, Greece, 28–29 March 2024; pp. 1–4.
- Lin, J. Path planning based on reinforcement learning. Appl. Comput. Eng. 2023, 5, 853–858.
- Choi, J.; Lee, G.; Lee, C. Reinforcement learning-based dynamic obstacle avoidance and integration of path planning. Intell. Serv. Robot. 2021, 14, 663–677.
- Zhao, W.; Queralta, J.P.; Li, Q.; Westerlund, T. Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning. In Proceedings of the 2020 5th International Conference on Robotics and Automation Engineering (ICRAE), Singapore, 20–22 November 2020; pp. 7–12.
- Ao, T.; Zhang, K.; Shi, H.; Jin, Z.; Zhou, Y.; Liu, F. Energy-Efficient Multi-UAVs Cooperative Trajectory Optimization for Communication Coverage: An MADRL Approach. Remote Sens. 2023, 15, 429.
- Wu, B.; Suh, S. Deep Reinforcement Learning for Decentralized Multi-Robot Control: A DQN Approach to Robustness and Information Integration. arXiv 2024, arXiv:2408.11339.
- Xiao, Y.; Xiao, L.; Wan, K.; Yang, H.; Zhang, Y.; Wu, Y.; Zhang, Y. Reinforcement Learning Based Energy-Efficient Collaborative Inference for Mobile Edge Computing. IEEE Trans. Commun. 2023, 71, 864–876.
- Hwang, H.-J.; Cho, J.-H.; Kim, Y.-T. Deep Learning-Based Real-Time 6D Pose Estimation and Multi-Mode Tracking Algorithms for Citrus-Harvesting Robots. Machines 2024, 12, 642.
- Zhang, L.; Jia, J.; Gui, G.; Hao, X.; Gao, W.; Wang, M. Deep Learning Based Improved Classification System for Designing Tomato Harvesting Robot. IEEE Access 2018, 6, 67940–67950.
- Qi, C.; Gao, J.; Chen, K.; Shu, L.; Pearson, S. Tea Chrysanthemum Detection by Leveraging Generative Adversarial Networks and Edge Computing. Front. Plant Sci. 2022, 13, 850606.
- Sa, I.; Lim, J.Y.; Ahn, H.S.; MacDonald, B. deepNIR: Datasets for Generating Synthetic NIR Images and Improved Fruit Detection System Using Deep Learning Techniques. Sensors 2022, 22, 4721.
- Dai, Y.; Zhao, P.; Wang, Y. Maturity discrimination of tobacco leaves for tobacco harvesting robots based on a Multi-Scale branch attention neural network. Comput. Electron. Agric. 2024, 224, 109133.
- Kim, J.; Pyo, H.; Jang, I.; Kang, J.; Ju, B.; Ko, K. Tomato harvesting robotic system based on Deep-ToMaToS: Deep learning network using transformation loss for 6D pose estimation of maturity classified tomatoes with side-stem. Comput. Electron. Agric. 2022, 201, 107300.
- Wang, W.; Shi, Y.; Liu, W.; Che, Z. An Unstructured Orchard Grape Detection Method Utilizing YOLOv5s. Agriculture 2024, 14, 262.
- Bargoti, S.; Underwood, J. Deep Fruit Detection in Orchards. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3626–3633.
- Károly, A.I.; Tirczka, S.; Gao, H.; Rudas, I.J.; Galambos, P. Increasing the Robustness of Deep Learning Models for Object Segmentation: A Framework for Blending Automatically Annotated Real and Synthetic Data. IEEE Trans. Cybern. 2024, 54, 25–38.
- Nguyen, H.-T.; Cheah, C.C.; Toh, K.-A. An analytic layer-wise deep learning framework with applications to robotics. Automatica 2022, 135, 110007.
- Song, C.; Wang, K.; Wang, C.; Tian, Y.; Wei, X.; Li, C.; An, Q.; Song, J. TDPPL-Net: A Lightweight Real-Time Tomato Detection and Picking Point Localization Model for Harvesting Robots. IEEE Access 2023, 11, 37650–37664.
- Ayranci, A.A.; Erkmen, B. Edge Computing and Robotic Applications in Modern Agriculture. In Proceedings of the 2024 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Istanbul, Turkey, 23–25 May 2024; pp. 1–6.
- Zhang, X.; Cao, Z.; Dong, W. Overview of Edge Computing in the Agricultural Internet of Things: Key Technologies, Applications, Challenges. IEEE Access 2020, 8, 141748–141761.
- Wang, C.; Zou, X.; Tang, Y.; Luo, L.; Feng, W. Localisation of litchi in an unstructured environment using binocular stereo vision. Biosyst. Eng. 2016, 145, 39–51.
- Tituaña, L.; Gholami, A.; He, Z.; Xu, Y.; Karkee, M.; Ehsani, R. A small autonomous field robot for strawberry harvesting. Smart Agric. Technol. 2024, 8, 100454.
- Chu, P.; Li, Z.; Lammers, K.; Lu, R.; Liu, X. DeepApple: Deep Learning-based Apple Detection using a Suppression Mask R-CNN. arXiv 2020, arXiv:2010.09870.
- Li, Z.; Wang, J.; Gao, G.; Lei, Y.; Zhao, C.; Wang, Y.; Bai, H.; Liu, Y.; Guo, X.; Li, Q. SGSNet: A lightweight deep learning model for strawberry growth stage detection. Front. Plant Sci. 2024, 15, 91706.
- Su, Z.; Zhang, C.; Yan, T.; Zhu, J.; Zeng, Y.; Lu, X.; Gao, P.; Feng, L.; He, L.; Fan, L. Application of Hyperspectral Imaging for Maturity and Soluble Solids Content Determination of Strawberry with Deep Learning Approaches. Front. Plant Sci. 2021, 12, 736334.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
- Woehrle, T.; Sivakumar, A.N.V.; Uppalapati, N.K.; Chowdhary, G. MetaCropFollow: Few-Shot Adaptation with Meta-Learning for Under-Canopy Navigation. arXiv 2024, arXiv:2411.14092.
- Ghadirzadeh, A.; Chen, X.; Poklukar, P.; Finn, C.; Björkman, M.; Kragic, D. Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1274–1280.
| Method Category | Typical Networks | Mechanism | Advantage | Limitation |
|---|---|---|---|---|
| Convolutional neural network | LeNet, AlexNet, VGGNet, ResNet | Hierarchical feature extraction and representation learning from images through local connectivity, weight sharing, and spatial downsampling. | Automatically learns rich features from large volumes of image data; performs well in tasks such as fruit detection and ripeness determination. | Depends on large amounts of labeled data; training and inference are computationally heavy, demanding substantial compute and storage; feature learning is susceptible to noise, occlusion, and adversarial attacks. |
| Object detection | YOLO series, SSD, R-CNN, Fast R-CNN | End-to-end feature learning, candidate region generation, classification, and regression map images to target bounding boxes and categories. | Quickly and accurately identifies and locates ripe targets in complex agricultural environments, supporting picking decisions and path planning. | Generalization and robustness still need improvement under varying scene lighting, shooting angles, fruit shapes, etc. |
| Deep reinforcement learning | DQN, DDPG, SAC | Deep neural networks approximate the value or policy function of reinforcement learning, enabling end-to-end autonomous learning and decision making. | Adaptively extracts useful features in complex, high-dimensional environments; strong generalization; handles continuous action spaces. | Requires large amounts of environment-interaction data; training is slow and sensitive to hyperparameters and initialization; poor interpretability limits practical deployment. |
| Semantic segmentation | FCN, SegNet, U-Net, DeepLab | Every pixel in the image is classified to determine the region or category to which it belongs. | Achieves very high recognition and classification accuracy at the pixel level, accurately distinguishing target classes. | Relies on extensive pixel-level annotations; high computational cost hurts real-time performance; small or infrequent targets suffer from inaccurate segmentation and poorly handled fuzzy boundaries. |
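To ground the CNN row of this table, a LeNet-style classifier shows the three listed mechanisms in a few lines: local connectivity (small kernels), weight sharing (the same kernel slides across the whole image), and spatial dimensionality reduction (pooling). The three-class ripeness task and all layer sizes are hypothetical, chosen only for illustration:

```python
# LeNet-style sketch of the CNN mechanisms named in the table above.
# The 3-class ripeness task and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

ripeness_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # local connectivity
    nn.MaxPool2d(2),                      # 64x64 -> 32x32 (spatial reduction)
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 3),           # e.g. unripe / ripe / overripe
)

logits = ripeness_cnn(torch.randn(4, 3, 64, 64))  # dummy RGB batch
print(logits.argmax(dim=1))                       # predicted class per image
```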
| Technical Category | Technology Type | Main Advantages | Main Disadvantages | Representative Technologies (Frameworks/Methodologies) |
|---|---|---|---|---|
| Visual perception and target recognition | CNN-based Classification/Detection | Strong feature representation, adaptable to various fruits | Large model size, slow inference, difficult to deploy | ResNet, EfficientNet |
| | YOLO Series Object Detection | Fast detection, suitable for real-time systems | Poor accuracy for small/occluded targets | YOLOv5, YOLOv7, YOLO-NAS |
| | Attention Mechanisms (Spatial/Channel) | Focuses on key areas, improves recognition in cluttered scenes | Increased complexity, harder to train | CBAM, SE, Transformer Attention |
| | Multimodal Fusion (RGB + D) | Strong spatial understanding, better in occluded scenarios | Complex calibration, high hardware cost | DenseFusion, PointNet, RGB-DNet |
| | Self-Supervised/Few-Shot Learning | Less dependence on annotations, fast adaptation | Weak generalization, unstable training | SimCLR, Meta-RCNN, FSL-YOLO |
| Path planning and motion control | Deep Reinforcement Learning (DRL) | Learns adaptive paths and obstacle avoidance | Slow convergence, needs large-scale simulation | PPO, DDPG, SAC, TD3 |
| | GNN + Reinforcement Learning | Learns orchard topology, enables intelligent decision making | Complex architecture, hard to train | GNN-RL, GraphNav, GAT-RL |
| | Imitation Learning | Efficient training from expert demonstrations | Lacks exploration, cannot self-correct errors | Behavior Cloning, DAgger |
| | Deep Visual Navigation | No map needed, end-to-end instruction prediction | Low fault tolerance, image quality dependent | Deep Visual Planner, VNNet |
| | Learning-Based SLAM | Supports dynamic environment mapping and navigation | Complex fusion, hard to deploy | DL-SLAM, Neural SLAM, DROID-SLAM |
| Intelligent control of end effector | Deep Learning-Based Force Control | Adjusts gripping force automatically, reduces damage | Requires accurate sensors and data | DeepForce, GripNet, TactileCNN |
| | Multimodal Fusion (Vision + Tactile + Force) | High adaptability, robust grasping | Complex system integration, real-time challenges | Visuo-Tactile Fusion Net, GelSightNet |
| | Reinforcement Learning for Grasping | Learns grasping strategies for different shapes | High training cost, low sample efficiency | QT-Opt, VPG, RL-GraspNet |
| | Generative Models for Grasping (VAE/GAN) | Predicts grasp strategies under incomplete observations | Unstable inference, hard to train | GraspGAN, VAE-Grasp |
| | Graph-Based Grasping Strategy (GNN) | Suitable for fruit clusters, captures structural features | Complex modeling, limited real-world application | GG-CNN, GAT-Grasp |
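As a concrete reading of the "Multimodal Fusion (RGB + D)" row above, the sketch below performs late fusion of an RGB stream and a depth stream by feature concatenation. It is a toy simplification of the fusion idea, not the architecture of DenseFusion or any other framework listed in the table:

```python
# Minimal two-stream RGB-D late-fusion sketch (feature concatenation).
# Layer sizes and the binary task are illustrative assumptions.
import torch
import torch.nn as nn

def stream(in_ch: int) -> nn.Sequential:  # same backbone shape per modality
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2), nn.Conv2d(16, 32, 3, padding=1),
                         nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())

class RGBDFusion(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.rgb, self.depth = stream(3), stream(1)
        self.head = nn.Linear(64, n_classes)   # 32 + 32 fused features

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([self.rgb(rgb), self.depth(depth)], dim=1))

net = RGBDFusion()
out = net(torch.randn(2, 3, 96, 96), torch.randn(2, 1, 96, 96))
print(out.shape)  # torch.Size([2, 2])
```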
| Dataset | Data Volume | Data Acquisition Methods | Data Processing Methods |
|---|---|---|---|
| White asparagus dataset | 6498 images (4248 non-rain-enhanced + 2250 rain-enhanced) | Industrial camera (Basler acA2500-14gc, Basler AG, Ahrensburg, Germany) at a resolution of 2540 × 1920 pixels | Multi-scale combined images, resampling-based image transformation |
| Apple orchard dataset | 768 RGB-D images, 1132 color images | Intel RealSense D435 RGB-D camera (Intel Corporation, Santa Clara, CA, USA) | Spatial conversion, color distortion, point cloud processing |
| Citrus orchard dataset | 10,000 images | Intel RealSense D435 RGB-D camera (Intel Corporation, Santa Clara, CA, USA) | Virtual datasets generated with BlenderProc in conjunction with Blender |
| Tomato dataset | 200 images | Manual shooting | Geometric transformations, random noise |
| Tea chrysanthemum dataset | 26,432 images | iPhone X (Apple Inc., Cupertino, CA, USA) at a resolution of 1080 × 1920 | TC-GAN |
| Orange dataset | 2000 images | Azure Kinect RGB-D camera (Microsoft Corporation, Redmond, WA, USA) at a resolution of 1920 × 1080 | Image flips, rotations, and lighting changes |
| Bell pepper dataset | 1615 pairs of RGB + NIR images | Multispectral camera | NIR images synthesized from RGB images using a GAN |
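The processing column above mostly names standard augmentations. The short torchvision sketch below illustrates the geometric and photometric operations listed for the tomato and orange datasets (flips, rotations, lighting changes, additive noise); the parameter ranges and the example filename are illustrative choices, not those of the cited datasets:

```python
# Sketch of the augmentations named in the table (flip, rotate, lighting
# change, random noise) using torchvision. All ranges are assumptions.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # lighting variation
    transforms.ToTensor(),
    # additive Gaussian noise, clamped back to valid pixel range
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),
])

# Usage: pass a PIL image, e.g. augment(Image.open("orange_0001.jpg"))
# ("orange_0001.jpg" is a hypothetical filename.)
```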