Article

Automated Detection and Counting of Gossypium barbadense Fruits in Peruvian Crops Using Convolutional Neural Networks

by Juan Ballena-Ruiz, Juan Arcila-Diaz * and Victor Tuesta-Monteza

School of Systems Engineering, Universidad Señor de Sipán, Chiclayo 14000, Peru

* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(5), 152; https://doi.org/10.3390/agriengineering7050152
Submission received: 29 March 2025 / Revised: 24 April 2025 / Accepted: 8 May 2025 / Published: 12 May 2025

Abstract

This study presents the development of a system based on convolutional neural networks for the automated detection and counting of Gossypium barbadense fruits, specifically the IPA cotton variety, during its maturation stage, known as “mota”, in crops located in the Lambayeque region of northern Peru. To achieve this, a dataset was created using images captured with a mobile device. After applying data augmentation techniques, the dataset consisted of 2186 images with 70,348 labeled fruits. Five deep learning models were trained: two variants of YOLO version 8 (nano and extra-large), two of YOLO version 11, and one based on the Faster R-CNN architecture. The dataset was split into 70% for training, 15% for validation, and 15% for testing, and all models were trained over 100 epochs with a batch size of 8. The extra-large YOLO models achieved the highest performance, with precision scores of 99.81% (YOLOv8) and 99.78% (YOLOv11), along with strong recall and F1-score values. In contrast, the nano models and Faster R-CNN showed slightly lower effectiveness. Additionally, the best-performing model was integrated into a web application developed in Python, enabling automated fruit counting from field images. The YOLO architecture emerged as an efficient and robust alternative for the automated detection of cotton fruits and stood out for its capability to process images in real time with high precision. Furthermore, its implementation in crop monitoring facilitates production estimation and decision-making in precision agriculture.

1. Introduction

Modern agriculture faces the challenge of increasing productive efficiency while minimizing operational costs and reducing environmental impact. In this context, the integration of advanced technologies has proven to be a key strategy for optimizing agricultural management. Among these innovations, computer vision and deep learning have enabled the development of automated fruit detection and counting systems, significantly improving the accuracy and efficiency of data collection [1].
The cultivation of Gossypium barbadense, commonly known as Pima cotton, is of great significance in the agricultural sector due to its high-quality fiber, which is widely used in the textile industry [2]. Accurate cotton yield estimation is essential for improving resource planning and distribution, reducing losses, and optimizing the international trade of this raw material [3]. However, traditional manual counting methods are highly time- and labor-intensive, making them impractical for large-scale production. Additionally, variability in environmental conditions and the limitations of current monitoring technologies hinder precise determination of the cultivated area, directly impacting strategic decision-making in the agricultural sector.
Advancements in deep learning techniques, particularly convolutional neural networks (CNNs), have opened new opportunities for the automated detection and quantification of fruits in various agricultural species. The application of computer vision techniques in cotton crops extends beyond fruit detection to include weed detection [4], disease identification in leaves [5], and pest monitoring [6], among others.
The detection and counting of cotton fruits have been addressed through various computer vision and deep learning-based techniques. Models such as You Only Look Once (YOLO) and Space-to-Depth Single Shot Detector (SSPD) have demonstrated high accuracy in detecting cotton bolls in images captured by Unmanned Aerial Vehicles (UAVs), integrating space-to-depth convolutions and attention mechanisms to enhance the detection of small objects [7]. Additionally, density-guided optimal transport strategies have been proposed to mitigate issues related to overlap and occlusion in natural field environments, achieving lower counting and localization errors [8].
The use of mobile cameras has enabled real-time detection and counting, reaching an accuracy of 93% and a processing speed of 21 fps using GPU-accelerated neural networks [9]. Furthermore, deep convolutional neural networks have been employed for crop yield estimation, as exemplified by the CD-SegNet model, which achieved an average error of 6.2% in yield estimations for high-density fields [10]. Spectral segmentation and region-growing methods have demonstrated accuracies exceeding 88% in the automatic detection of open cotton bolls [11], while 3D point cloud-based approaches have been utilized for cotton flower counting with reduced error margins [12].
The primary challenges in cotton fruit detection include identifying small objects, addressing occlusion by foliage, ensuring real-time processing, and adapting models to diverse cultivation environments. Lightweight models such as YOLOv5_E have been optimized for mobile devices and UAVs, achieving speeds of up to 178 fps with an accuracy of 71.7% [13]. Finally, the integration of these models with robotic harvesting systems and the use of 3D data for plant part segmentation represent key research directions for enhancing efficiency in precision agriculture [13].
Several studies have explored automated fruit detection. Table 1 presents relevant research on fruit detection, detailing the techniques employed, datasets used, obtained results, and applications of these studies.
This study explores the application of convolutional neural networks for the identification and quantification of Gossypium barbadense fruits, aiming to enhance the efficiency of agricultural monitoring in the Peruvian context. The impact of these models on optimizing crop management, their contribution to more sustainable production, and their potential integration into precision agriculture systems are analyzed.

2. Materials and Methods

In this research, a computer vision-based approach was employed for the detection and counting of cotton fruits. The process involved image collection and preprocessing, model training, and performance evaluation using relevant metrics. Furthermore, the trained model was implemented in a web application to validate its practical applicability. Figure 1 illustrates the workflow of the process developed in this study.

2.1. Study Area and Data Collection

This study was conducted in the district of Lambayeque, province of Lambayeque, region of Lambayeque, Peru, located at the coordinates 6°40′20″ S latitude and 79°53′17″ W longitude.
Figure 2a presents the geospatial delineation and dimensions of one of the study plots, while Figure 2b details the metric characteristics of the second plot. The cotton variety cultivated in both plots corresponds to IPA, which was in the production stage, with plants aged between 5 and 6 months.

2.2. Data Collection

The image capture process in this study followed a standardized protocol. A walking path was established with a distance of 1 m between the cotton rows. The camera was positioned at 2.5 m above ground level to optimize image acquisition, considering that the cotton plants’ height ranged between 1.00 and 1.20 m. Figure 3 illustrates the trajectory followed for image collection.
For visual data acquisition, two mobile devices were utilized. One served as a remote controller for the other, enabling remote image capture. The primary device was mounted on a custom-designed structure comprising a tripod and an extendable “selfie stick” support, reaching a total height of 2.5 m. Additionally, an adjustable holder with a 180° rotation angle was incorporated, securing the capturing device to ensure stability and precision during image acquisition.
A total of 370 images were collected at a resolution of 640 × 640 pixels using a Samsung A12 device (model SM-A127M), equipped with a quad-camera system: a 48 MP (f/2.0) primary sensor, a 5 MP (f/2.2) ultra-wide sensor, a 2 MP (f/2.4) macro sensor, and a 2 MP (f/2.4) depth sensor. To enhance image quality, specific parameters were configured, including an ISO sensitivity of 20, a focal distance of 5 mm, and a 35 mm-equivalent focal length of 25 mm. Furthermore, a brightness level of 5.56 was recorded, contributing to improved exposure and contrast in the captured images. The files were stored in JPG format, with an average file size of 5.84 MB per image, ensuring high resolution and fidelity in the visual representation of the collected data.
The factors influencing image acquisition, such as wind speed, humidity, meteorological conditions, latitude, and longitude, are presented in Table 2.
In Figure 4, some sample images that comprise the dataset can be observed.

2.3. Data Annotation

A total of 370 images in JPG format were collected, from which the first 100 were selected for the manual annotation of cotton fruits using the LabelImg software, version 1.8.6 [19]. Each cotton fruit was labeled under the class “cotton”. This process generated 3062 annotations in YOLO format, an average of approximately 30 annotations per image. Figure 5 illustrates the manual annotation process performed on one of the images.
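For reference, a LabelImg export in YOLO format stores one line per labeled fruit: a class index followed by the normalized box center and size. The following is a minimal parsing sketch; the file name and helper function are illustrative, not part of the study’s tooling.

```python
# Parse a YOLO-format label file as produced by LabelImg.
# Each line: <class_id> <x_center> <y_center> <width> <height>,
# with all coordinates normalized to [0, 1] relative to image size.

def load_yolo_labels(path, img_w=640, img_h=640):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
            # Convert normalized center format to pixel corner format.
            x1 = (xc - w / 2) * img_w
            y1 = (yc - h / 2) * img_h
            x2 = (xc + w / 2) * img_w
            y2 = (yc + h / 2) * img_h
            boxes.append((int(cls), x1, y1, x2, y2))
    return boxes

boxes = load_yolo_labels("IMG_0001.txt")  # file name is illustrative
print(f"{len(boxes)} cotton fruits labeled in this image")
```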

2.4. Data Augmentation

To expand the training dataset, data augmentation techniques were implemented, an effective method to mitigate overfitting [20]. In this study, data augmentation was performed through progressive rotations within a range of −5° to +5°, aiming to simulate different viewing angles of the cotton fruits on the plant (Equation (1)). This approach was selected to introduce variability in the dataset, enabling the model to recognize fruits from different perspectives, similar to real-world conditions where images are often taken from varying angles. As a result, each original image was augmented to generate an average of 21 to 22 new images, leading to a final dataset of 2186 images and 70,348 labeled fruits. Table 3 presents the number of images generated through this process. All images were manually annotated with the technical assistance of a local farmer. Although this process is time-consuming, manual labeling provides a reliable ground truth foundation, which is essential for the accurate training and evaluation of deep learning models.
$I_{\mathrm{rot}}(x, y) = I\left(x\cos(\theta) - y\sin(\theta),\; x\sin(\theta) + y\cos(\theta)\right)$ (1)

where $\theta$ is the rotation angle.
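As a concrete illustration of Equation (1), the sketch below rotates an image about its center and re-fits axis-aligned bounding boxes around the rotated corners, a safe approximation for the small ±5° angles used here. The OpenCV usage, file names, and 0.5° step are assumptions, not the authors’ exact pipeline.

```python
# A minimal sketch of the rotation-based augmentation (Equation (1)).
import cv2
import numpy as np

def rotate_image_and_boxes(img, boxes, angle_deg):
    """Rotate an image about its center and re-fit axis-aligned boxes.

    boxes: list of (x1, y1, x2, y2) in pixels.
    """
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
                           dtype=np.float32)
        ones = np.ones((4, 1), dtype=np.float32)
        pts = (M @ np.hstack([corners, ones]).T).T  # apply the affine rotation
        # Re-fit an axis-aligned box around the rotated corners.
        new_boxes.append((pts[:, 0].min(), pts[:, 1].min(),
                          pts[:, 0].max(), pts[:, 1].max()))
    return rotated, new_boxes

# Generate augmented copies at progressive angles within [-5°, +5°]
# (21 angles at a 0.5° step, close to the ~21-22 images reported per original).
img = cv2.imread("IMG_0001.jpg")  # file name is illustrative
for angle in np.arange(-5.0, 5.5, 0.5):
    aug, _ = rotate_image_and_boxes(img, [(100, 120, 160, 180)], angle)
    cv2.imwrite(f"IMG_0001_rot{angle:+.1f}.jpg", aug)
```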

2.5. Fruit Detection

In this study, two architectures, YOLO [21] and Faster R-CNN [22], were trained using the dataset. The dataset was divided into three subsets: 70% for training, 15% for validation, and 15% for testing. Both architectures were trained for 100 epochs with a batch size of 8.
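A minimal sketch of how such a 70/15/15 split can be produced from the image list; the directory layout and the fixed random seed are illustrative assumptions.

```python
# Shuffle the dataset once, then carve out 70/15/15 partitions.
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))  # path is an assumption
random.seed(42)  # fixed seed for a reproducible split (assumed)
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.70 * n), int(0.15 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
for name, files in splits.items():
    print(name, len(files))  # for n = 2186: 1530 / 327 / 329 images
```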

2.5.1. YOLO

The first model was trained using YOLO in versions 8 and 11, employing the nano (“N”) scale for resource-constrained environments and the extra-large (“X”) scale to maximize accuracy and performance.
YOLO is a family of object detection models that partitions the input image into a grid and simultaneously predicts multiple bounding boxes along with their respective class probabilities. In this study, images were resized to a resolution of 640 × 640 pixels. The model then generates feature maps at different scales through downsampling operations. Unlike previous versions, YOLOv8 and YOLOv11 employ an anchor-free scheme, meaning that each grid cell directly predicts the bounding box coordinates and confidence scores without relying on predefined anchors. Additionally, in this case, the model estimates the probability of the Gossypium barbadense class.
Finally, the loss function is computed by considering the error in the bounding box coordinates (x, y, w, h), confidence scores, and class probabilities, thereby optimizing detection accuracy.
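The training configuration described above maps directly onto the Ultralytics API. The sketch below mirrors the reported settings (640 px input, 100 epochs, batch size 8) for the four YOLO variants; the data.yaml path is an assumption, and the checkpoint names follow Ultralytics’ public releases rather than the authors’ files.

```python
# A minimal training sketch for the four YOLO variants evaluated here.
from ultralytics import YOLO

for weights in ("yolov8n.pt", "yolov8x.pt", "yolo11n.pt", "yolo11x.pt"):
    model = YOLO(weights)  # pretrained checkpoint as the starting point
    model.train(
        data="cotton/data.yaml",  # split paths and the single "cotton" class
        epochs=100,
        batch=8,
        imgsz=640,
    )
    metrics = model.val()  # precision/recall on the validation split
```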

2.5.2. Faster R-CNN

In addition to the YOLO-based training, another model was trained using Faster R-CNN, an architecture that integrates region proposal and object detection into a unified process. The general workflow of Faster R-CNN consists of three main phases.
First, in the feature extraction phase, the deep convolutional neural network ResNet50 is utilized to obtain rich representations of the input image. Next, in the region proposal phase, the Region Proposal Network (RPN), a fully convolutional network, generates potential locations where objects might be present. Finally, in the refinement and classification phase, the proposed regions are used to extract Regions of Interest (RoIs) from the feature map. These RoIs are normalized to a fixed dimension and processed through a pooling layer (RoI Pooling or RoI Align), ensuring that the information remains consistent before being passed to the classification network and bounding box regression.
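A minimal sketch of this three-phase pipeline using torchvision’s ResNet-50 FPN implementation; the two-class head (background plus “cotton”) follows the setup above, while the single illustrative training step and tensor values are assumptions.

```python
# Faster R-CNN with a ResNet-50 backbone: RPN proposals + RoI refinement.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor head: 2 classes = background + "cotton".
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# One illustrative training step; a real loop would iterate a DataLoader.
model.train()
images = [torch.rand(3, 640, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 160.0, 180.0]]),
    "labels": torch.tensor([1]),  # class 1 = cotton
}]
loss_dict = model(images, targets)  # RPN + RoI classification/regression losses
loss = sum(loss_dict.values())
loss.backward()
```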

2.6. Performance Evaluation

To evaluate the model’s performance in cotton fruit detection, predictions are classified based on their correctness. A True Positive (TP) occurs when the model correctly identifies a fruit in its actual location. A True Negative (TN) is recorded when the model accurately determines the absence of a fruit in a given position. Conversely, a False Positive (FP) is assigned when the model incorrectly detects a fruit where none exists, while a False Negative (FN) occurs when the model fails to detect an existing fruit.
The following metrics were used to assess the model’s performance:
Precision (P): quantifies the proportion of true positives relative to the total number of elements classified as positive, evaluating the accuracy of the model’s positive predictions.
$P = \frac{TP}{TP + FP}$
Recall (R): evaluates the model’s ability to correctly identify all actual positive cases.
$R = \frac{TP}{TP + FN}$
F1-score (F): provides a balanced measure of precision and recall.
$F = \frac{2 \times P \times R}{P + R}$
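These three metrics reduce to simple ratios over the confusion-matrix counts. A small helper is sketched below, shown with the TP and FP values reported for YOLO v8X in Section 3.1; the FN value is back-solved from the reported recall and is illustrative only.

```python
# Compute precision, recall, and F1-score from raw detection counts.

def detection_metrics(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# TP/FP from the YOLO v8X confusion matrix; FN is an illustrative estimate.
p, r, f1 = detection_metrics(tp=10_794, fp=26, fn=50)
print(f"P={p:.4f}  R={r:.4f}  F1={f1:.4f}")
```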

3. Results

3.1. Performance

Using the available dataset, training experiments were conducted with five distinct models, utilizing the YOLO architecture in versions 8 and 11 with their respective “X” and “N” variants, as well as the Faster R-CNN model. To ensure a fair evaluation of model performance, the dataset was partitioned into 70% for training, 15% for validation, and 15% for testing. Each model was trained for 100 epochs with a batch size of 8, aiming to optimize convergence and learning stability.
Figure 6 presents the confusion matrices obtained for the evaluated models: (a) YOLO v8N, (b) YOLO v8X, (c) YOLO v11N, (d) YOLO v11X, and (e) Faster R-CNN. The results indicate that YOLO v8X (b) achieved 10,794 true positives with only 26 false positives, demonstrating high detection accuracy. In contrast, YOLO v8N (a) exhibited a higher number of false positives (265) and false negatives (239), suggesting a lower overall performance.
Similarly, YOLO v11X (d) demonstrated a performance comparable to YOLO v8X, with 10,621 true positives and only 35 false positives, indicating robust detection capabilities. However, YOLO v11N (c) produced a significantly higher number of false positives (426) and false negatives (168), suggesting suboptimal performance compared to its more advanced counterparts.
Finally, Faster R-CNN (e) recorded the highest number of false negatives (551) and false positives (575), indicating lower precision and sensitivity in comparison to the YOLO models evaluated.
Figure 7 illustrates the evolution of precision during the training process for the five fruit detection models analyzed in this study: Faster R-CNN and YOLO in versions 8 and 11, employing both the nano (“N”) and extra-large (“X”) scales.
The YOLOv8X and YOLOv11X variants achieved the highest performance, with precision values of 0.9981 and 0.9978, respectively. In contrast, YOLOv11N exhibited a slower convergence rate, reaching a final precision of 0.9791. Although Faster R-CNN demonstrated competitive performance (0.9910 precision), it remained slightly below the more advanced YOLO versions.
Figure 8 illustrates the evolution of recall, while Figure 9 depicts the F1-score throughout the training process for the five fruit detection models evaluated in this study.
The YOLO v8X and YOLO v11X variants exhibited the best performance in both metrics, achieving recall values of 0.9954 and 0.9944, respectively, and F1-scores of 0.9968 and 0.9961. In contrast, YOLO v11N demonstrated lower performance, with a recall of 0.9733 and an F1-score of 0.9762, indicating a reduced ability to correctly detect all fruits. Faster R-CNN attained a recall of 0.9722 and an F1-score of 0.9815, showing competitive performance, albeit slightly lower than that of the more advanced YOLO versions.
The results obtained from the evaluation of the fruit detection models indicate that the extra-large (“X”) variants of YOLO v8 and YOLO v11 achieved the best performance in terms of precision, recall, and F1-score, while the nano (“N”) version of YOLO v11 exhibited slower convergence. Faster R-CNN demonstrated competitive performance, albeit slightly lower than the more advanced YOLO versions.
The superior performance of the extra-large variants of YOLOv8 and YOLOv11 can be attributed to their enhanced network depth and increased number of parameters, which allow for a more refined extraction of hierarchical features. These models possess a greater capacity to capture complex patterns and contextual information within the image, which is particularly beneficial for detecting small and partially occluded objects in cluttered agricultural environments. Furthermore, the extra-large architectures facilitate a more robust learning process by reducing the risk of underfitting when exposed to high-variability datasets, thereby contributing to improved generalization capabilities during inference.
The detailed values presented in Table 4, where the results from the final epochs have been averaged, reflect these differences in model performance.

3.2. Application

Finally, the best-performing trained model for cotton fruit detection has been integrated into a web application that enables automated fruit counting for a given image. For its implementation, a Model-View-Controller (MVC) architecture was employed, utilizing the Django framework. The platform allows users to upload images of cotton plants in the production stage, which are then processed using the trained model to identify and count the cotton fruits. Figure 10 illustrates the developed application: Figure 10a displays the user interface for uploading an image, while Figure 10b presents the result after processing the image, indicating the number of detected fruits along with labels assigned to each detected fruit.
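As an illustration of how such an MVC endpoint can wire the detector into Django, here is a minimal view sketch; all names (form field, template files, weight path) are hypothetical and not taken from the authors’ application.

```python
# A minimal Django view: accept an uploaded field image, run the trained
# detector, and return the number of cotton fruits found.
from django.shortcuts import render
from ultralytics import YOLO

# Load the best-performing trained model once at startup (path assumed).
model = YOLO("runs/detect/train/weights/best.pt")

def count_fruits(request):
    if request.method == "POST" and request.FILES.get("image"):
        upload = request.FILES["image"]
        with open("/tmp/upload.jpg", "wb") as f:
            for chunk in upload.chunks():  # stream the upload to disk
                f.write(chunk)
        results = model.predict("/tmp/upload.jpg", imgsz=640)
        count = len(results[0].boxes)  # one box per detected cotton fruit
        return render(request, "result.html", {"count": count})
    return render(request, "upload.html")
```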
The practical implications of integrating this model into a web application are significant, as it provides an accessible tool for field technicians and farmers to estimate crop yields accurately and efficiently. By automating the counting process, the application significantly reduces the labor burden of manual counting, enabling faster decision-making and more precise crop management.
During preliminary testing and feedback collection from end-users, including agricultural specialists and farmers, the platform was praised for its ease of use and rapid results. However, with the goal of enhancing the tool’s usability, we plan to implement the same model in a mobile application capable of processing real-time video. This version will allow users to perform dynamic sampling while traversing the field, increasing efficiency and ease of use for real-time decision-making.

4. Discussion

This study is framed within the growing need to automate crop monitoring through computer vision techniques, and its findings compare favorably with previous research. For instance, the YOLO SSPD model described in [7] demonstrated high accuracy in detecting cotton bolls using UAV imagery by leveraging space-to-depth convolutions and attention mechanisms. In contrast, our approach is based on YOLOv8 and YOLOv11 architectures (in both nano and extra-large variants), which, by employing an anchor-free scheme, achieved superior accuracy rates (99.81% and 99.78% for the extra-large versions). This highlights an enhanced capability for detecting small objects under field conditions. Additionally, studies utilizing density-guided optimal transport strategies, as reported in [8], have helped mitigate issues related to occlusion and overlap, thereby reducing counting errors. However, our methodology, complemented by data augmentation techniques and a standardized image capture protocol, not only optimizes detection performance but also achieves more stable and rapid convergence during training, significantly reducing both false positives and false negatives.
Moreover, previous research employing YOLOv2 for fruit detection and counting, as outlined in [9], achieved approximately 93% accuracy in controlled environments with a processing speed of 21 fps. In comparison, the extra-large variants of YOLOv8 and YOLOv11 presented in this study surpass these results, demonstrating competitive performance even in more complex scenarios. The extra-large variants of YOLOv8 and YOLOv11 (v8X and v11X) demonstrated superior performance compared to their lighter counterparts due to their deeper and more complex architectures, which include a higher number of layers and parameters. This configuration allows the models to capture richer and more abstract feature representations, which is particularly beneficial for detecting small objects in uncontrolled field environments. Additionally, the increased computational capacity of these models facilitates better generalization during training, leading to higher precision and recall rates. The integration of an anchor-free scheme with a more robust network also contributes to faster and more stable convergence while reducing the occurrence of false positives and false negatives. However, while the results obtained in this study are promising and reflected in satisfactory training metrics, it is important to note that the model has not yet been evaluated with field images different from those in the training dataset. This limitation underscores the need for future testing in real-world environments to validate its performance under varying lighting conditions, capture angles, and crop variability beyond those present in the training set.
Unlike previous studies, which have primarily focused on optimizing algorithmic performance without immediate practical applications, the present work adopts an applied approach by integrating the trained model into an accessible platform for field use.
The results obtained not only enable fruit counting but also facilitate yield estimation, as demonstrated in [23], where fruit counting serves as a fundamental preliminary step for crop yield estimation.

5. Conclusions

This study presents several key contributions to the field of automated crop monitoring through computer vision techniques. Firstly, it introduces a specialized dataset for the detection of Gossypium barbadense fruits during their maturation stage, representing a significant advancement for precision agriculture in northern Peru. The dataset, composed of RGB images and manually annotated labels, serves as a fundamental tool for the development of automated agricultural monitoring systems. Furthermore, this research demonstrates the effectiveness of the extra-large YOLO variants (v8X and v11X) in detecting small objects, achieving high levels of precision, recall, and F1-score, with performance metrics reaching up to 99.8%.
In addition to the dataset, this work integrates the best-performing YOLO model into a practical and real-time web application, developed under a Model-View-Controller (MVC) architecture using the Django framework. The application enables automated fruit counting, offering a tool that can be implemented in precision agriculture platforms. This integration significantly enhances the practical applicability of the research, allowing farmers and plant breeders to optimize management strategies, conduct real-time crop monitoring, and estimate yields based on reliable quantitative data. The proposed approach is noteworthy not only for achieving high detection accuracy but also for being designed for immediate field deployment, effectively bridging the gap between theoretical research and real-world agricultural practice in northern Peru, where agricultural technologization is still emerging. This demonstrates the potential of artificial intelligence-based applications to support the advancement of precision agriculture. Future work could focus on enhancing the system’s robustness to variable agricultural field conditions and extending the applicability of convolutional neural network architectures to diverse contexts and crop types.

Author Contributions

Conceptualization, J.B.-R. and J.A.-D.; methodology, J.B.-R., J.A.-D. and V.T.-M.; software, J.B.-R. and J.A.-D.; validation, J.B.-R. and J.A.-D.; formal analysis, V.T.-M. and J.A.-D.; investigation, J.B.-R. and J.A.-D.; data curation, J.B.-R.; writing—original draft, J.B.-R., J.A.-D. and V.T.-M.; writing—review and editing, V.T.-M. and J.A.-D.; supervision, J.A.-D. and V.T.-M. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Universidad Señor de Sipán (Perú).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Farjon, G.; Huijun, L.; Edan, Y. Deep-learning-based counting methods, datasets, and applications in agriculture: A review. Precis. Agric. 2023, 24, 1683–1711. [Google Scholar] [CrossRef]
  2. Manavalan, R. Towards an intelligent approaches for cotton diseases detection: A review. Comput. Electron. Agric. 2022, 200, 107255. [Google Scholar] [CrossRef]
  3. Tedesco-Oliveira, D.; da Silva, R.P.; Maldonado, W.; Zerbato, C. Convolutional neural networks in predicting cotton yield from images of commercial fields. Comput. Electron. Agric. 2020, 171, 105307. [Google Scholar] [CrossRef]
  4. Dang, F.; Chen, D.; Lu, Y.; Li, Z.; Zheng, Y. DeepCottonWeeds (DCW): A Novel Benchmark of YOLO Object Detectors for Weed Detection in Cotton Production Systems. In Proceedings of the 2022 ASABE Annual International Meeting, Houston, TX, USA, 17–20 July 2022; American Society of Agricultural and Biological Engineers: St. Joseph, MI, USA, 2022. [Google Scholar] [CrossRef]
  5. Bappi, M.B.R.; Swapno, S.M.M.R.; Rabbi, M.M.F. Deploying DenseNet for Cotton Leaf Disease Detection on Deep Learning; Springer: Berlin/Heidelberg, Germany, 2025; pp. 485–498. [Google Scholar] [CrossRef]
  6. Meng, K.; Xu, K.; Cattani, P.; Mei, S. Camouflaged cotton bollworm instance segmentation based on PVT and Mask R-CNN. Comput. Electron. Agric. 2024, 226, 109450. [Google Scholar] [CrossRef]
  7. Zhang, M.; Chen, W.; Gao, P.; Li, Y.; Tan, F.; Zhang, Y.; Ruan, S.; Xing, P.; Guo, L. YOLO SSPD: A small target cotton boll detection model during the boll-spitting period based on space-to-depth convolution. Front. Plant Sci. 2024, 15, 1409194. [Google Scholar] [CrossRef] [PubMed]
  8. Huang, Y.; Li, Y.; Liu, Y.; Zheng, D. In-field cotton counting and localization jointly based on density-guided optimal transport. Comput. Electron. Agric. 2023, 212, 108058. [Google Scholar] [CrossRef]
  9. Fue, K.G.; Porter, W.M.; Rains, G.C. Deep Learning based Real-time GPU-accelerated Tracking and Counting of Cotton Bolls under Field Conditions using a Moving Camera. In Proceedings of the 2018 ASABE Annual International Meeting, Detroit, MI, USA, 29 July–1 August 2018; American Society of Agricultural and Biological Engineers: St. Joseph, MI, USA, 2018. [Google Scholar] [CrossRef]
  10. Li, F.; Bai, J.; Zhang, M.; Zhang, R. Yield estimation of high-density cotton fields using low-altitude UAV imaging and deep learning. Plant Methods 2022, 18, 55. [Google Scholar] [CrossRef] [PubMed]
  11. Yeom, J.; Jung, J.; Chang, A.; Maeda, M.; Landivar, J. Automated Open Cotton Boll Detection for Yield Estimation Using Unmanned Aircraft Vehicle (UAV) Data. Remote Sens. 2018, 10, 1895. [Google Scholar] [CrossRef]
  12. Xu, R.; Li, C.; Paterson, A.H.; Jiang, Y.; Sun, S.; Robertson, J.S. Aerial Images and Convolutional Neural Network for Cotton Bloom Detection. Front. Plant Sci. 2018, 8, 2235. [Google Scholar] [CrossRef] [PubMed]
  13. Yu, G.; Cai, R.; Luo, Y.; Hou, M.; Deng, R. A-pruning: A lightweight pineapple flower counting network based on filter pruning. Complex Intell. Syst. 2024, 10, 2047–2066. [Google Scholar] [CrossRef]
  14. Bolouri, F.; Kocoglu, Y.; Pabuayon, I.L.B.; Ritchie, G.L.; Sari-Sarraf, H. CottonSense: A high-throughput field phenotyping system for cotton fruit segmentation and enumeration on edge devices. Comput. Electron. Agric. 2024, 216, 108531. [Google Scholar] [CrossRef]
  15. Tan, C.; Sun, J.; Paterson, A.H.; Song, H.; Li, C. Three-view cotton flower counting through multi-object tracking and RGB-D imagery. Biosyst. Eng. 2024, 246, 233–247. [Google Scholar] [CrossRef]
  16. Bairi, A.; Dulhare, U.N. Advanced Cotton Boll Segmentation, Detection, and Counting Using Multi-Level Thresholding Optimized with an Anchor-Free Compact Central Attention Network Model. Eng 2024, 5, 2839–2861. [Google Scholar] [CrossRef]
  17. Xu, R.; Paterson, A.; Li, C. Cotton flower detection using aerial color images. In Proceedings of the 2017 ASABE Annual International Meeting, Spokane, WA, USA, 16–19 July 2017; American Society of Agricultural and Biological Engineers: St. Joseph, MI, USA, 2017. [Google Scholar] [CrossRef]
  18. Lu, Z.; Han, B.; Dong, L.; Zhang, J. COTTON-YOLO: Enhancing Cotton Boll Detection and Counting in Complex Environmental Conditions Using an Advanced YOLO Model. Appl. Sci. 2024, 14, 6650. [Google Scholar] [CrossRef]
  19. Tzutalin. LabelImg: Image Annotation Tool. Available online: https://github.com/HumanSignal/labelImg (accessed on 21 December 2024).
  20. van Dyk, D.A.; Meng, X.-L. The Art of Data Augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50. [Google Scholar] [CrossRef]
  21. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  22. Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015. [Google Scholar] [CrossRef] [PubMed]
  23. Reddy, J.; Niu, H.; Scott, J.L.L.; Bhandari, M.; Landivar, J.A.; Bednarz, C.W.; Duffield, N. Cotton Yield Prediction via UAV-Based Cotton Boll Image Segmentation Using YOLO Model and Segment Anything Model (SAM). Remote Sens. 2024, 16, 4346. [Google Scholar] [CrossRef]
Figure 1. Process flow for the detection and counting of cotton fruits. Note: The arrows represent the sequential flow of the process, indicating the order in which each step is performed—from data collection to the application of the detection model.
Figure 2. Delimitation of the cotton cultivation area. (a) First plot used for image capture. (b) Second plot evaluated. Note: The blue dot indicates the starting point of the terrain survey, while the red landmarks serve as reference points to delineate the terrain and obtain the measurements.
Figure 3. Path followed for image acquisition. Note: The arrows in the figure indicate the path followed for image acquisition.
Figure 4. Sample images from the dataset.
Figure 5. Manual annotation process of cotton fruits. Note: The green points mark a cotton fruit selected using the LabelImg software.
Figure 6. Confusion matrix. (a) YOLO v8N. (b) YOLO v8X. (c) YOLO v11N. (d) YOLO v11X. (e) Faster R-CNN.
Figure 7. Comparison of precision between models trained for 100 epochs.
Figure 8. Comparison of recall between models trained for 100 epochs.
Figure 9. Comparison of F1-score between models trained for 100 epochs.
Figure 10. Application interface: (a) initial data entry; (b) quantity of fruits detected in uploaded images.
Table 1. Algorithms used in fruit detection.

[7] Insights: YOLO small-scale pyramid depth-aware detection (SSPD) model, enhancing automated detection and counting of cotton bolls (Gossypium barbadense) using UAV imagery. Dataset: ‘Xinlu Early No. 53’ and ‘Xinlu Early No. 74’ varieties, collected during three stages of the cotton fluffing period. Results: boll detection accuracy of 0.874 on UAV-scale imagery; coefficient of determination (R²) of 0.86, RMSE of 12.38, RRMSE of 11.19%. Applications: cotton yield estimation during the flocculation period; high-precision cotton monitoring using UAV imagery.

[8] Insights: joint cotton counting and localization algorithm using VGG19 and a density-guided optimal transport approach, effectively addressing challenges in detecting and counting Gossypium barbadense cotton in unevenly distributed and occluded field environments. Dataset: constructed in-field cotton dataset with 400 images, used for validating the proposed algorithm. Results: lowered counting errors (MAE and RMSE) by 10.54 and 11.57; increased precision and recall by 1.7% and 3.8%. Applications: in-field counting of cotton status; localization for intelligent agricultural management.

[9] Insights: automated detection and counting of cotton bolls using YOLO v2. Dataset: twelve defoliated cotton plants in pots; 486 images with 7498 bolls for training. Results: 93% accuracy and 21 fps processing speed; counting accuracy around 93% with 6% standard deviation. Applications: robotic harvesting of cotton bolls in real time; navigation and environmental perception for harvesting operations.

[14] Insights: optimized Mask R-CNN with TensorRT for segmenting and counting cotton fruits in four growth stages. Dataset: 344 images from RGB-D cameras. Results: AP score of 79% (average segmentation model accuracy); correlation between total fruit count per image and expert evaluations of R² = 0.94. Applications: CottonSense, an HTP system that monitors cotton development using computer vision, segmentation, and real-time fruit counting.

[15] Insights: YOLO v8x trained to detect flowers in RGB images. Dataset: videos of cotton flowers captured with three RGB-D cameras in an experimental field. Results: mean Average Precision (mAP) of 96.4%. Applications: facilitates the study of flowering time and the productivity of different cotton genotypes without relying on manual methods.

[16] Insights: an anchor-free compact central attention network model, significantly enhancing the efficiency and precision in identifying and quantifying cotton fruits in agricultural studies. Dataset: annotated dataset extracted from weakly supervised detection; data gathered from various sources for analysis. Results: accuracy of 94%; precision, recall, F1-score, and specificity of 93.8%, 92.99%, 93.48%, and 92.99%. Applications: utilizes image preprocessing, noise removal, segmentation, and detection.

[17] Insights: CNN for detecting and counting cotton flowers in images captured by a drone. Dataset: RGB images taken by a UAV. Results: 4.5% false negatives and 5.1% false positives; a correlation between flower count and cotton yield was observed. Applications: production estimation and agricultural management.

[18] Insights: implementation of the COTTON-YOLO model, based on YOLOv8n. Dataset: images of cotton bolls captured in natural environments under varying lighting and weather conditions. Results: COTTON-YOLO improves detection accuracy compared to YOLOv8. Applications: automated monitoring of cotton bolls in agricultural fields.
Table 2. Environmental and operational factors considered during image acquisition.

Criteria        Value
Distance        1 m between rows
Area            8458.37 m² (0.85 ha)
Height          2.5 m from the ground
Camera angle    180°
Weather         Sunny
Wind speed      7.6 km/h SSW
Temperature     22.5 °C
Humidity        71%
Time            9:00 AM
Day             5 May 2024
Latitude        6°40′20″ S
Longitude       79°53′17″ W
Table 3. Dataset details.

                          Images    Annotations
Original                  100       3062
After data augmentation   2186      70,348
Table 4. Summary of precision, recall, and F1-score in cotton fruit detection.

Metric          YOLO v8N    YOLO v8X    YOLO v11N    YOLO v11X    Faster R-CNN
Precision (%)   98.80       99.81       97.91        99.78        99.10
Recall (%)      97.40       99.54       97.33        99.44        97.22
F1-Score (%)    98.10       99.68       97.62        99.61        98.15