Research on a Method for Identifying and Localizing Goji Berries Based on Binocular Stereo Vision Technology
Abstract
1. Introduction
2. Data Acquisition
2.1. Data Set Capture
2.2. Data Set Augmentation
3. Method
3.1. Testing Procedure
3.2. YOLO-VitBiS Object Detection Network
3.3. Binocular Stereoscopic Vision Technology
3.3.1. Stereo Camera Calibration
3.3.2. CRE Stereo Matching Algorithm
3.4. Experimental Setup and Evaluation Metrics
4. Experimental Results and Analysis
4.1. Training Results for the YOLO-VitBiS Object Detection Network
4.1.1. Ablation Experiment
4.1.2. Comparative Experiments
4.2. Analysis of Spatial Positioning Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Parameter Name | Parameter Value |
|---|---|
| Translation Matrix | |
| Rotation Matrix | |
| Left Camera Intrinsic Matrix | |
| Left Camera Radial Distortion | |
| Left Camera Tangential Distortion | |
| Right Camera Intrinsic Matrix | |
| Right Camera Radial Distortion | |
| Right Camera Tangential Distortion | |
| Fundamental Matrix | |
| Essential Matrix | |

| Parameter Name | Parameter Value |
|---|---|
| Model Input Size | 640 × 640 pixels |
| Training Batch Size | 16 |
| Initial Learning Rate | 0.01 |
| Momentum Setting | 0.937 |
| Weight Decay Coefficient | 0.0005 |
| Optimizer | SGD |
| Training Epochs | 180 |
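The training settings listed above can be gathered into a single configuration object. The sketch below is illustrative only: the class name `TrainConfig` and the method `to_train_kwargs` are hypothetical, and the keyword names (`imgsz`, `batch`, `lr0`, and so on) follow the Ultralytics training interface on the assumption, not confirmed by the source, that the YOLO11-based network was trained with that toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; the field names are hypothetical."""

    input_size: int = 640        # model input is 640 x 640 pixels
    batch_size: int = 16
    initial_lr: float = 0.01
    momentum: float = 0.937
    weight_decay: float = 0.0005
    optimizer: str = "SGD"
    epochs: int = 180

    def to_train_kwargs(self) -> dict:
        """Map the fields to Ultralytics-style keyword names (an assumption)."""
        return {
            "imgsz": self.input_size,
            "batch": self.batch_size,
            "lr0": self.initial_lr,
            "momentum": self.momentum,
            "weight_decay": self.weight_decay,
            "optimizer": self.optimizer,
            "epochs": self.epochs,
        }


config = TrainConfig()
train_kwargs = config.to_train_kwargs()
print(train_kwargs)
```

Keeping the values in one frozen object makes it harder for a later experiment to drift silently from the reported settings. The dataset definition is not given in this section, so it is left out of the sketch.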

| Base Model | AdditiveBlock | BiFPN | SDDH | Precision | Recall | mAP50 | Model Size (MB) | Parameters (M) |
|---|---|---|---|---|---|---|---|---|
| YOLO11n | ✕ | ✕ | ✕ | 0.965 | 0.898 | 0.950 | 5.5 | 2.582 |
| YOLO11n | ✓ | ✕ | ✕ | 0.978 | 0.877 | 0.966 | 5.8 | 2.659 |
| YOLO11n | ✓ | ✓ | ✕ | 0.882 | 0.921 | 0.972 | 4.6 | 2.028 |
| YOLO11n | ✓ | ✓ | ✓ | 0.958 | 0.915 | 0.966 | 4.3 | 1.856 |

| Model | Precision | Recall | mAP50 | Model Size (MB) | Parameters (M) | FPS |
|---|---|---|---|---|---|---|
| Faster R-CNN | 0.846 | 0.836 | 0.914 | 90.8 | 40.651 | 25 |
| YOLOv7n | 0.947 | 0.919 | 0.92 | 74.8 | 37.212 | 61 |
| YOLOv8n | 0.941 | 0.883 | 0.957 | 6.2 | 3.128 | 124 |
| YOLOv9n | 0.8412 | 0.882 | 0.921 | 390 | 51.145 | 21 |
| YOLOv10n | 0.928 | 0.88 | 0.951 | 5.8 | 2.713 | 115 |
| YOLO11n | 0.965 | 0.898 | 0.950 | 5.5 | 2.622 | 133 |
| YOLO-VitBiS (ours) | 0.958 | 0.915 | 0.966 | 4.3 | 1.856 | 155 |
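To make the comparison with the YOLO11n baseline easier to read, the relative change in each column can be computed directly from the last two rows of the table. The sketch below uses only values from those rows; the function name `relative_change` and the dictionary layout are illustrative choices, and the last row is assumed to be the proposed YOLO-VitBiS model.

```python
def relative_change(baseline: float, proposed: float) -> float:
    """Return the percentage change going from the baseline to the proposed value."""
    return (proposed - baseline) / baseline * 100.0


# Values copied from the YOLO11n row and the last row of the comparison table.
baseline = {"precision": 0.965, "recall": 0.898, "map50": 0.950,
            "size_mb": 5.5, "params_m": 2.622, "fps": 133}
proposed = {"precision": 0.958, "recall": 0.915, "map50": 0.966,
            "size_mb": 4.3, "params_m": 1.856, "fps": 155}

changes = {key: relative_change(baseline[key], proposed[key]) for key in baseline}

for key, value in changes.items():
    print(f"{key:>10}: {value:+.1f}%")
```

On these figures the proposed model is roughly 22% smaller on disk, has about 29% fewer parameters (using the parameter count listed in this table), and runs about 17% faster than YOLO11n, while precision drops by less than one percent.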

| No. | Actual Distance (mm) | Measured Distance (mm) | Absolute Error (mm) | Relative Error (%) |
|---|---|---|---|---|
| 1 | 400 | 409 | 9 | 2.2 |
| 2 | 452 | 463 | 11 | 2.4 |
| 3 | 498 | 509 | 11 | 2.2 |
| 4 | 545 | 558 | 13 | 2.3 |
| 5 | 598 | 612 | 14 | 2.3 |
| 6 | 654 | 667 | 13 | 2.0 |
| 7 | 694 | 677 | 17 | 2.4 |
| 8 | 832 | 810 | 22 | 2.6 |
| 9 | 917 | 943 | 26 | 2.8 |
| 10 | 1024 | 999 | 25 | 2.4 |
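The error columns in the table above follow from the actual and measured distances: the absolute error is the magnitude of their difference, and the relative error is that magnitude divided by the actual distance. The sketch below recomputes both and adds summary statistics; the variable and function names are illustrative, and the summary values are derived here rather than reported by the source.

```python
# (actual distance, measured distance) pairs in millimetres, copied from the table.
measurements = [
    (400, 409), (452, 463), (498, 509), (545, 558), (598, 612),
    (654, 667), (694, 677), (832, 810), (917, 943), (1024, 999),
]


def absolute_error(actual: float, measured: float) -> float:
    """Magnitude of the difference between measured and actual distance (mm)."""
    return abs(measured - actual)


def relative_error(actual: float, measured: float) -> float:
    """Absolute error as a percentage of the actual distance."""
    return absolute_error(actual, measured) / actual * 100.0


abs_errors = [absolute_error(a, m) for a, m in measurements]
rel_errors = [relative_error(a, m) for a, m in measurements]

mean_abs_error = sum(abs_errors) / len(abs_errors)
mean_rel_error = sum(rel_errors) / len(rel_errors)
max_rel_error = max(rel_errors)

print(f"mean absolute error: {mean_abs_error:.1f} mm")
print(f"mean relative error: {mean_rel_error:.2f} %")
print(f"max relative error:  {max_rel_error:.2f} %")
```

The recomputed absolute errors match the table's fourth column, the mean absolute error works out to 16.1 mm, and every relative error stays below 3%. Recomputed relative errors can differ from the tabulated ones in the last digit because of rounding.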

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Shi, J.; Li, C.; Zhao, Z.; Zhang, S. Research on a Method for Identifying and Localizing Goji Berries Based on Binocular Stereo Vision Technology. AgriEngineering 2026, 8, 6. https://doi.org/10.3390/agriengineering8010006

