Convolutional Neural Networks for Detecting White Grape Bunches in High-Density Vineyards
Abstract
1. Introduction
2. Materials and Methods
2.1. Description of the Vineyard
2.2. Image Acquisition
2.3. Image Pre-Processing
- Global RGB channel histogram equalization: Global equalization was applied independently to the three color channels (red, green, and blue). This approach improves contrast across the color spectrum, making grape bunches more distinguishable in complex environments where background colors may be similar.
- Global greyscale histogram equalization: This technique was applied to the greyscale version of the images to improve overall contrast, particularly in images captured under unfavorable lighting conditions. Global equalization helps distribute intensity levels more evenly, facilitating better discrimination between grape bunches and background elements.
- CLAHE (contrast-limited adaptive histogram equalization) applied to the green channel: CLAHE was applied specifically to the green channel, which contains a significant proportion of relevant information in vegetation images. This method enhances local contrast while limiting excessive contrast amplification in homogeneous regions. An illustrative sketch of the three variants follows this list.
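As an illustration of the three pre-processing variants above, the following Python/OpenCV sketch applies them to a single image. It is a minimal example written for this description, not the authors' processing pipeline; the file names and the CLAHE clipLimit and tileGridSize values are assumptions.

```python
# Illustrative sketch of the three histogram-based pre-processing variants
# (not the authors' exact pipeline). "vineyard.jpg" is a placeholder file name.
import cv2

img = cv2.imread("vineyard.jpg")  # BGR image as loaded by OpenCV

# 1) Global histogram equalization applied independently to the three color channels
b, g, r = cv2.split(img)
rgb_eq = cv2.merge([cv2.equalizeHist(b), cv2.equalizeHist(g), cv2.equalizeHist(r)])

# 2) Global histogram equalization of the greyscale version of the image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_eq = cv2.equalizeHist(gray)

# 3) CLAHE on the green channel only; clipLimit and tileGridSize are typical
#    defaults, not values reported in the paper
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img_clahe = img.copy()
img_clahe[:, :, 1] = clahe.apply(g)  # channel index 1 is green in BGR order

cv2.imwrite("vineyard_rgb_eq.jpg", rgb_eq)
cv2.imwrite("vineyard_gray_eq.jpg", gray_eq)
cv2.imwrite("vineyard_clahe_green.jpg", img_clahe)
```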
2.4. Methodology Using CNN
2.5. Metrics
- IoU (Intersection over Union): measures the overlap between the predicted bounding box or mask and the ground truth and is defined in Equation (1).
- Accuracy (ACC): the fraction of correct detections over all data; since true negatives are not defined in object detection, it is computed as TP/(TP + FP + FN).
- Precision: the fraction of all predictions that are correct, TP/(TP + FP), using an IoU threshold of 0.5
- Recall: the fraction of all real objects that are correctly detected, TP/(TP + FN). Here, false negatives (FN) are real objects that were not predicted or whose prediction fell below the 0.5 threshold
- F1-score: the harmonic mean of precision and recall
- AP (Average Precision): obtained by integrating the precision-recall (P-R) curve for a class. It is a value between 0 and 1 or between 0% and 100%. The P-R curve is obtained by changing the IoU threshold. When the threshold is very high, the number of FN increases dramatically and FP is reduced, which lowers recall and increases precision. Conversely, if the threshold is low, FN is reduced and FP increases, which increases recall and lowers precision (Figure 5). The point highlighted in blue in Figure 5 is an example of the computation of the confusion matrix.
- mAP (mean Average Precision): average of the AP across all classes
- mAP@0.5: average of the APs with an IoU threshold of 0.5. If the IoU ≥ 0.5, the detection is considered correct (TP). If the IoU < 0.5, it is considered incorrect (FP).
- mAP@0.5:0.95: The AP is calculated several times, with different IoU thresholds: from 0.5 to 0.95, in steps of 0.05: 0.50, 0.55, 0.60, …, 0.95. Finally, the AP obtained at each of these thresholds is averaged. This is a stricter and more complete metric, whereas mAP@0.5 is more permissive as it only requires a 50% overlap.
- Fitness: a weighted average of the metrics precision, recall, mAP@0.5 and mAP@0.5:0.95. By default, YOLOv8 uses the weights [0, 0, 0.1, 0.9] for these four metrics, respectively. A minimal computational sketch of these metrics is given after this list.
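The following minimal Python sketch makes the definitions above concrete: it computes the IoU of two axis-aligned boxes, derives precision, recall, F1-score and ACC from TP/FP/FN counts, and evaluates the YOLOv8-style fitness. It is written for illustration under the assumption of [x1, y1, x2, y2] box coordinates and is not the evaluation code used in the study; the example numbers are taken from the results tables below.

```python
# Minimal sketch of the metrics defined above (illustrative, not the study's evaluation code).
# Bounding boxes are assumed to be in [x1, y1, x2, y2] format.

def iou(box_a, box_b):
    """Intersection over Union between two axis-aligned bounding boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_metrics(tp, fp, fn):
    """Precision, recall, F1-score and ACC = TP / (TP + FP + FN) (no TN in object detection)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, acc

def fitness(precision, recall, map50, map50_95, weights=(0.0, 0.0, 0.1, 0.9)):
    """YOLOv8-style fitness: weighted average of precision, recall, mAP@0.5 and mAP@0.5:0.95."""
    return sum(w * m for w, m in zip(weights, (precision, recall, map50, map50_95)))

# IoU thresholds used for mAP@0.5:0.95: 0.50, 0.55, ..., 0.95
iou_thresholds = [0.5 + 0.05 * i for i in range(10)]

# A detection counts as a TP when its IoU with a ground-truth box is >= 0.5
print(round(iou([0, 0, 10, 10], [5, 5, 15, 15]), 3))   # 0.143
print(detection_metrics(tp=694, fp=197, fn=165))       # ACC ~0.66, as in the RGB eq. row below
print(round(fitness(0.849, 0.726, 0.839, 0.465), 3))   # ~0.502, close to the 0.503 reported
```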
3. Results
4. Discussion
5. Conclusions
- The proposed YOLOv8-based approach demonstrates strong robustness for grape bunch detection in real vineyard conditions. The model successfully detects white grape bunches using natural daylight and passive illumination, without any background manipulation, confirming its suitability for in-field deployment where environmental conditions are inherently variable.
- Leaf occlusion remains a major challenge in grape bunch detection, but its impact is effectively mitigated by the proposed method. Although increased occlusion negatively influences detection performance, the achieved precision and recall indicate that YOLOv8 can handle complex canopy structures more effectively than earlier approaches, particularly in highly occluded scenarios.
- High detection precision is achieved despite challenging visual conditions. The method reaches strong precision while maintaining comparable recall, demonstrating its capability to accurately localize grape bunches, even when dealing with white grape varieties, dense foliage, and the absence of controlled backgrounds.
- The lower overall accuracy reflects the increased complexity of real-world vineyard imaging rather than model inadequacy. The reduced performance metrics, compared with studies carried out under controlled conditions, highlight the trade-off between accuracy and real-field applicability and emphasize the importance of evaluating models under realistic operational constraints.
- Advanced architectural modifications and longer training regimes can improve performance but may reduce deployment practicality. While more complex network designs and extended training can yield higher accuracy, the competitive results obtained with a standard YOLOv8 configuration demonstrate a favorable balance between detection performance, computational efficiency, and ease of implementation.
- Deep-learning-based methods show strong potential for precision viticulture applications. The findings confirm that, even under uncontrolled and highly variable field conditions, deep learning approaches can provide reliable grape bunch detection, supporting their use in yield estimation and decision-support systems in commercial vineyard management.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| CLAHE | Contrast-Limited Adaptive Histogram Equalization |
| CNN | Convolutional Neural Network |
| CSP | Cross Stage Partial |
| CSPD | CSPDarknet-53 is a convolutional neural network (CNN) backbone that integrates Cross Stage Partial (CSP) connections into the traditional Darknet-53 architecture |
| R-CNN | Region-Based Convolutional Neural Network |
| RoI | Region of Interest |
| SegNet | A Deep Convolutional Encoder–Decoder Architecture for Image Segmentation |
| SPPF | Spatial Pyramid Pooling Fast |
| SSD | Single Shot MultiBox Detector |
| SVM | Support Vector Machine |
| VGGNet | A convolutional neural network developed by the Visual Geometry Group of the University of Oxford and the Google DeepMind Laboratory |
| YOLO | You Only Look Once |
References
| Score Threshold | True Positives (TP) | False Positives (FP) | False Negatives (FN) | Precision TP/(TP + FP) | Recall TP/(TP + FN) |
|---|---|---|---|---|---|
| 0.35 | 67 | 653 | 101 | 0.093 | 0.399 |
| 0.40 | 57 | 540 | 113 | 0.095 | 0.335 |
| 0.45 | 50 | 515 | 116 | 0.088 | 0.301 |
| 0.50 | 42 | 424 | 119 | 0.090 | 0.261 |
| 0.55 | 65 | 430 | 129 | 0.130 | 0.336 |
| 0.60 | 38 | 285 | 120 | 0.117 | 0.241 |
| 0.65 | 40 | 197 | 111 | 0.170 | 0.265 |
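As a quick check of the header formulas, the first row of the table above (score threshold 0.35) works out as:

```latex
\mathrm{Precision} = \frac{TP}{TP+FP} = \frac{67}{67+653} \approx 0.093,
\qquad
\mathrm{Recall} = \frac{TP}{TP+FN} = \frac{67}{67+101} \approx 0.399
```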
| Score Threshold | Maximum Score in Predicted Boxes | Total Predicted Boxes (PB) | Ground-Truth Boxes, GTB (Validation) | PB/GTB Validation (%) | Ground-Truth Boxes (Train) |
|---|---|---|---|---|---|
| 0.35 | 0.998 | 720 | 161 | 447% | 506 |
| 0.40 | 0.995 | 597 | 167 | 357% | 500 |
| 0.45 | 0.996 | 565 | 161 | 351% | 506 |
| 0.50 | 0.998 | 466 | 159 | 293% | 508 |
| 0.55 | 0.991 | 495 | 187 | 265% | 480 |
| 0.60 | 0.991 | 323 | 155 | 208% | 512 |
| 0.65 | 0.998 | 237 | 147 | 161% | 520 |
| Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Fitness |
|---|---|---|---|---|---|
| CLAHE | 0.746 | 0.654 | 0.719 | 0.331 | 0.370 |
| | 0.799 | 0.694 | 0.776 | 0.405 | 0.442 |
| RGB eq. | 0.819 | 0.684 | 0.765 | 0.374 | 0.413 |
| | 0.850 | 0.747 | 0.830 | 0.451 | 0.489 |
| GRAY eq. | 0.753 | 0.612 | 0.696 | 0.327 | 0.364 |
| | 0.822 | 0.677 | 0.766 | 0.397 | 0.434 |
| RGB Aug. | 0.774 | 0.717 | 0.776 | 0.379 | 0.419 |
| | 0.849 | 0.726 | 0.839 | 0.465 | 0.503 |
| Model | TP | FP | FN | ACC |
|---|---|---|---|---|
| CLAHE | 651 | 201 | 222 | 61% |
| RGB eq. | 694 | 197 | 165 | 66% |
| GRAY eq. | 650 | 198 | 223 | 61% |
| RGB Aug. | 720 | 208 | 153 | 67% |
| Mask R-CNN | 320 | 8910 | 615 | 3% |
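The ACC column above follows the definition ACC = TP/(TP + FP + FN) given in Section 2.5; for the CLAHE row, for example:

```latex
\mathrm{ACC} = \frac{TP}{TP+FP+FN} = \frac{651}{651+201+222} = \frac{651}{1074} \approx 0.61
```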