YOLOv11-IMP: Anchor-Free Multiscale Detection Model for Accurate Grape Yield Estimation in Precision Viticulture
Abstract
1. Introduction
2. Materials and Methods
2.1. Improved YOLOv11 Model
2.1.1. Input Processing
2.1.2. Enhanced Backbone Architecture
2.1.3. Neck and Head Architecture
2.2. Model Training and Validation
2.2.1. Yield Estimation Framework
- (1) This module analyzes the geometric properties of grape clusters via ellipsoidal 3D volumetric approximation of their two-dimensional (2D) projections. Perspective-aware scaling functions compensate for distance-related distortions, and multiview consistency verification is applied when sequential images are available. Grape-packing density models are provided for 12 major wine grape varieties, and compensation algorithms handle partial occlusion of up to 67% of a cluster.
- (2) A statistical correction mechanism addresses occlusion and perspective limitations. Bayesian inference models incorporate prior knowledge of typical cluster properties, and an ensemble analysis of multiple viewpoints is conducted under geometric consistency constraints. Count predictions obtained at multiple detection thresholds are aggregated using confidence weighting. Variety-specific correction factors are derived from extensive field validation (n = 2371 samples), and adaptive regression models account for phenological stage variations during the growing season.
- (3) A novel nonlinear mapping function transforms the combined size and count metrics into weight estimates via a multilayer perceptron with variety-specific parameter sets. The function incorporates environmental factors, including growing degree days, soil moisture, and canopy management practices. Temporal models account for grape growth patterns throughout the ripening period, and density correction factors are applied based on refractometer-measured sugar content. Transfer learning from historical yield data improves prediction accuracy across seasons.
- (4) Kimi-VL is a highly efficient open-source mixture-of-experts vision-language model with advanced cross-modal reasoning, enabling inference and prediction across modalities by learning their associations. This paper proposes a simple image–text mapping strategy: RGB images are fed to the YOLO detection model to produce bounding boxes for target objects, and a text mapping library comprising external and internal lexicons is constructed from these boxes. The external lexicon is derived from historical data, including the average grape cluster weight by growth stage and lighting condition. The internal lexicon is computed by applying relevant algorithms to individual bounding boxes and encompasses features such as the 3D volume of ellipsoids fitted to 2D projections, the corresponding cultivar, compactness indices, and growth stage. A structured text prompt is dynamically constructed for each ROI from the external lexicon (historical averages and lighting conditions) and the internal lexicon (geometric volume and variety). The cropped ROI image and its corresponding text prompt are paired and transmitted to the Kimi-VL model via the API, which performs vision-language reasoning to output an estimated weight for each cluster. This design decouples object localization from attribute estimation, ensuring that the heavy computational load of the LLM does not impede the real-time performance of the initial detection phase. Figure 5 presents an example of the weight estimation of a single grape bunch.
- (5) The final component aggregates individual cluster predictions via spatial calibration against vineyard block reference measurements. Hierarchical aggregation proceeds from the cluster to the vine, row, and block level. Confidence-weighted summation is performed with uncertainty propagation, and environmental correction factors account for local microclimatic variations. Finally, vineyard management system data are integrated to enable comprehensive yield forecasting.
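The ellipsoidal volumetric approximation in component (1) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera model for the perspective-aware scaling and takes the unseen depth semi-axis to equal the smaller projected semi-axis, a common single-view simplification. All function and parameter names are hypothetical.

```python
import math

def ellipsoid_volume_from_bbox(w_px: float, h_px: float,
                               distance_m: float,
                               focal_px: float) -> float:
    """Approximate a grape cluster's 3D volume (m^3) from its 2D bounding box.

    Models the cluster as an ellipsoid whose third (depth) semi-axis equals
    the smaller of the two projected semi-axes -- a simplification used here
    because only a single view is assumed.
    """
    # Pinhole model: metric size = pixel size * distance / focal length.
    w_m = w_px * distance_m / focal_px
    h_m = h_px * distance_m / focal_px
    a, b = w_m / 2.0, h_m / 2.0   # projected semi-axes (m)
    c = min(a, b)                 # assumed depth semi-axis (m)
    return (4.0 / 3.0) * math.pi * a * b * c
```

Multiview consistency verification would then compare volumes recovered from sequential frames and reject outliers, but that step needs tracked detections and is omitted here.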
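The variety-specific multilayer perceptron of component (3) can be sketched as a single hidden layer mapping size, count, and environmental features to a weight estimate. The layer sizes, feature ordering, and parameter values below are illustrative assumptions, not the trained network from the paper.

```python
import numpy as np

def mlp_weight_estimate(features: np.ndarray, params: tuple) -> float:
    """One-hidden-layer perceptron mapping a feature vector, e.g.
    [ellipsoid volume, cluster count, growing degree days, soil moisture],
    to a weight estimate (kg).

    `params` holds one variety's parameter set (W1, b1, W2, b2); in the
    paper each variety has its own set, selected before this call.
    """
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ features + b1)   # ReLU hidden layer
    return float(W2 @ h + b2)                  # linear output: weight in kg
```

Swapping in a different `params` tuple per cultivar reproduces the "variety-specific parameter sets" idea without duplicating the network code.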
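The per-ROI prompt construction of component (4) might look like the sketch below. The paper does not publish its exact template, so the field names and wording are hypothetical; the point is that external-lexicon entries (historical averages, lighting) and internal-lexicon entries (volume, compactness, variety) are serialized into one structured prompt that accompanies the cropped ROI image.

```python
def build_prompt(variety: str, growth_stage: str, lighting: str,
                 volume_cm3: float, compactness: float,
                 historical_avg_g: float) -> str:
    """Assemble a structured text prompt for one detected grape cluster ROI.

    Combines internal-lexicon features computed from the bounding box with
    external-lexicon historical context, ready to pair with the ROI crop
    for a vision-language model query.
    """
    return (
        "Grape cluster attributes:\n"
        f"- variety: {variety}\n"
        f"- growth stage: {growth_stage}\n"
        f"- lighting: {lighting}\n"
        f"- estimated ellipsoid volume: {volume_cm3:.0f} cm^3\n"
        f"- compactness index: {compactness:.2f}\n"
        f"- historical average weight at this stage: {historical_avg_g:.0f} g\n"
        "Estimate the weight of this cluster in grams."
    )
```

Because the prompt is plain text built after detection, the LLM call can run asynchronously per ROI, which is what keeps the detection phase real-time.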
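The confidence-weighted summation with uncertainty propagation in component (5) can be sketched as below. The error model is an assumption for illustration: the residual (1 − confidence) fraction of each estimate is treated as an independent error term and summed in quadrature, both within a vine and when rolling vines up to the block level.

```python
import math

def aggregate(weights: list, confidences: list) -> tuple:
    """Confidence-weighted sum of per-cluster weight estimates (kg).

    Returns (total, sigma), where sigma propagates a simple per-cluster
    uncertainty of weight * (1 - confidence), assumed independent.
    """
    total = sum(w * c for w, c in zip(weights, confidences))
    var = sum((w * (1.0 - c)) ** 2 for w, c in zip(weights, confidences))
    return total, math.sqrt(var)

def rollup(estimates: list) -> tuple:
    """Combine lower-level (total, sigma) pairs into one higher-level pair,
    e.g. vines to a row, or rows to a block; sigmas add in quadrature."""
    total = sum(t for t, _ in estimates)
    sigma = math.sqrt(sum(s * s for _, s in estimates))
    return total, sigma
```

Applying `aggregate` per vine and then `rollup` per row and per block reproduces the cluster-to-vine-to-row-to-block hierarchy; the microclimatic correction factors would scale each level's total before roll-up.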
2.2.2. Grape Yield Estimation
2.2.3. Ground Truth Comparison
3. Results
3.1. Grape Cluster Recognition Performance
3.2. Grape Yield Estimation Accuracy
3.3. Benchmarking Against SOTA Detection Methods
3.3.1. Performance Metric Evaluation
3.3.2. Environmental Adaptability Assessment
3.4. Ablation Studies and Architectural Insights
3.4.1. Component-Wise Contribution Analysis
3.4.2. Computational Efficiency Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References










| Grape Variety | Sample Size | Manual Measurement (kg/vine) | YOLOv11-IMP Estimate (kg/vine) | RMSE (kg/vine) | MAE (kg/vine) | Relative Error (%) | Pearson's r |
|---|---|---|---|---|---|---|---|
| Cabernet Sauvignon | 157 | 8.72 | 8.47 | 0.52 | 0.41 | 5.8 | 0.941 |
| Pinot Noir | 142 | 7.63 | 7.42 | 0.48 | 0.37 | 5.3 | 0.957 |
| Chardonnay | 168 | 9.34 | 8.97 | 0.61 | 0.49 | 6.7 | 0.923 |
| Merlot | 145 | 8.21 | 7.95 | 0.55 | 0.43 | 6.1 | 0.932 |
| Sauvignon Blanc | 151 | 8.89 | 8.57 | 0.58 | 0.45 | 6.4 | 0.928 |
| Average | 152.6 | 8.56 | 8.28 | 0.55 | 0.43 | 6.06 | 0.936 |
| Method | MAE (kg/vine) | RMSE (kg/vine) | Accuracy (%) | Processing Time (ms) | GPU Memory (GB) |
|---|---|---|---|---|---|
| Faster R-CNN + ResNet101 | 0.72 | 0.93 | 83.2 | 44.3 | 5.8 |
| RetinaNet + ResNeXt101 | 0.68 | 0.88 | 85.6 | 38.2 | 5.2 |
| EfficientDet-D2 | 0.66 | 0.85 | 86.7 | 36.5 | 4.9 |
| YOLOv11 (baseline) | 0.61 | 0.79 | 87.3 | 32.4 | 4.5 |
| YOLOv11-IMP (proposed) | 0.46 | 0.62 | 91.2 | 28.9 | 3.8 |
| Environmental Factor | Condition | MAE (kg/vine) | Accuracy (%) |
|---|---|---|---|
| Illumination | Direct sunlight | 0.48 | 90.3 |
| | Partial shade | 0.47 | 90.8 |
| | Overcast | 0.46 | 91.5 |
| | Dawn/dusk | 0.52 | 88.7 |
| Canopy Density | Sparse | 0.44 | 92.1 |
| | Medium | 0.45 | 91.2 |
| | Dense | 0.50 | 89.5 |
| Growth Stage | Pre-veraison | 0.48 | 90.1 |
| | Veraison | 0.47 | 91.0 |
| | Post-veraison | 0.44 | 91.8 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zheng, S.; Yang, X.; Gao, P.; Guo, Q.; Zhang, J.; Chen, S.; Tang, Y. YOLOv11-IMP: Anchor-Free Multiscale Detection Model for Accurate Grape Yield Estimation in Precision Viticulture. Agronomy 2026, 16, 370. https://doi.org/10.3390/agronomy16030370

