Article

Smart Postharvest Management of Strawberries: YOLOv8-Driven Detection of Defects, Diseases, and Maturity

by Luana dos Santos Cordeiro, Irenilza de Alencar Nääs and Marcelo Tsuguio Okano *
Graduate Program in Production Engineering, Paulista University, São Paulo 04026-002, SP, Brazil
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(8), 246; https://doi.org/10.3390/agriengineering7080246
Submission received: 9 July 2025 / Revised: 20 July 2025 / Accepted: 25 July 2025 / Published: 1 August 2025

Abstract

Strawberries are highly perishable fruits prone to postharvest losses due to defects, diseases, and uneven ripening. This study proposes a deep learning-based approach for automated quality assessment using the YOLOv8n object detection model. A custom dataset of 5663 annotated strawberry images was compiled, covering eight quality categories, including anthracnose, gray mold, powdery mildew, uneven ripening, and physical defects. Data augmentation techniques, such as rotation and Gaussian blur, were applied to enhance model generalization and robustness. The model was trained over 100 and 200 epochs, and its performance was evaluated using standard metrics: Precision, Recall, and mean Average Precision (mAP). The 200-epoch model achieved the best results, with a mAP50 of 0.79 and an inference time of 1 ms per image, demonstrating suitability for real-time applications. Classes with distinct visual features, such as anthracnose and gray mold, were accurately classified. In contrast, visually similar categories, such as ‘Good Quality’ and ‘Unripe’ strawberries, presented classification challenges.

1. Introduction

According to the Food and Agriculture Organization of the United Nations (FAO) [1], global strawberry production in 2023 was approximately 10,485,454.06 tons. In the same year, Brazil produced approximately 187,795.78 tons, ranking ninth in the world, with the states in the southeast and south regions being its most prominent producers. This production has been growing thanks to new cultivation techniques, which enable year-round production in various parts of the country [2]. Despite these advances, waste throughout the production chain remains significant; it is estimated that the supply chain wastes approximately 26 million tons of food annually, of which 5.3 million tons are fruits and 5.6 million tons are vegetables [3,4].
Strawberry is a fruit with high commercial and nutritional value and great relevance in the fruit production chain in Brazil and worldwide. All strawberries belong to the genus Fragaria L. in the rose family (Rosaceae). The most widely grown commercial hybrid, Fragaria × ananassa, a cross between Fragaria chiloensis (from Chile) and Fragaria virginiana (from North America), dominates today’s food supply chains [5]. Strawberries undergo a rapid growth cycle, spanning approximately four weeks between flowering and ripening, which makes them especially susceptible to damage, disease, and postharvest losses [6,7]. As a highly perishable fruit with significant commercial value, the strawberry therefore requires careful handling to ensure the quality of the product delivered to the consumer.
Visual defects, mechanical damage, and common diseases such as anthracnose and gray mold compromise not only the appearance of the fruit but also its durability and safety for consumption. Identifying these factors quickly and accurately reduces losses, increases process efficiency, and ensures a high quality standard [8,9,10]. Diseases significantly affect the quality and productivity of strawberries, making it urgent to develop accurate and timely methods for their identification. However, detecting diseases in strawberries poses a challenge due to the complexity of the image background and the subtle variations between different disease classes [11,12]. Due to their perishability, it is crucial to develop efficient classification and selection methods to ensure the delivery of high-quality fruit to the final consumer.
The growing demand for high-quality food and the pursuit of more efficient agricultural processes have driven the adoption of technologies in the fruit production chain. Among these technologies, solutions based on computer vision and artificial intelligence stand out, enabling the automation of traditionally manual tasks, such as fruit classification and selection, to increase sustainability and optimize agricultural practices [13,14,15].
Artificial intelligence (AI) techniques, particularly those based on deep learning and machine learning, have been increasingly utilized in agriculture over the past few years. Computer vision, combined with convolutional neural network (CNN) algorithms, has proven efficient in detecting and classifying fruits, including strawberries, using images and videos [13,16,17,18]. In this context, the You Only Look Once (YOLO) model stands out, recognized for its efficiency and speed in object detection. In its most recent version, YOLO presents significant advances in accuracy, generalization capacity, and computational performance, making it a promising choice for real-time agricultural applications [17,19].
The present study aims to apply computer vision techniques and convolutional neural networks to classify the quality of strawberries using the YOLO model. The proposal involves classifying the quality of strawberries in images, focusing on identifying defects, diseases, and ripeness; promoting greater efficiency in identifying these problems; and directly contributing to reducing waste and improving the quality of the final product delivered to the consumer. Moreover, the study examines the advantages, limitations, and prospects of implementing these technologies in the agricultural and commercial sectors.

2. Background

A high-quality strawberry exhibits consistent size, shape, and a vibrant red color characteristic of complete ripeness and cultivar specificity, with even pigment distribution, signifying optimal anthocyanin development [20]. Surface attributes such as evenly spaced achenes and a fresh green calyx contribute to the visual appeal and marketability [21]. Firmness, a key texture parameter, is best measured instrumentally and reflects cellular integrity, directly influencing shelf life and mechanical resilience [22]. The postharvest fruit must be free of defects, such as bruising or microbial contamination, including Botrytis cinerea [23]. The flavor depends on the balance of soluble solids (7–12°Brix) and titratable acidity (0.4–0.9%), with the Soluble Solids Content (SSC) to Titratable Acidity (TA) (SSC/TA) ratio serving as a reliable indicator of sweetness perception [20]. Aroma is shaped by volatile organic compounds (VOCs), particularly esters and furanones, which vary by cultivar and ripening stage and are typically assessed via gas chromatography-mass spectrometry [21]. Juiciness reflects cellular water retention, enhancing mouthfeel. Nutritionally, strawberries are rich in vitamin C, anthocyanins, ellagic acid, and folate, which contribute to their coloration and health benefits [24]. A commercially viable fruit must retain firmness, flavor, and nutritional quality during storage and transport, factors influenced by cultivar, maturity at harvest, and preservation strategies [23].
In recent years, computer vision technologies have accelerated progress in image detection, and machine learning or deep learning (ML/DL) is gradually becoming a common approach in crop disease detection systems [16,19]. In the broader context of technologies applied to fruit detection, image processing in agriculture has undergone a considerable evolution. According to Wang et al. [25], the process has undergone three main phases: (I) conventional digital image processing, (II) image processing with machine learning, and (III) image processing with deep learning. Initially, threshold analysis and regional segmentation were used to extract basic features such as color and shape [26].
However, with the emergence of deep learning, algorithms based on convolutional neural networks (CNNs) began to offer better performance, providing greater accuracy, speed, and adaptability to image variations. This evolution is essential for accurately identifying defects, diseases, and ripening stages, which are necessary elements in agricultural production.
The architecture of a convolutional neural network (CNN) mainly consists of an input layer, convolutional layers, activation functions, pooling layers, fully connected layers, and an output layer [27]. CNNs excel in image processing and visual feature extraction, efficiently identifying attributes such as color, shape, and texture across a wide range of applications [28,29,30]. This capability enables their use in tasks such as visual inspection of food, detection of surface defects, and identification of adulteration, allowing deteriorated areas in fruits to be detected through texture analysis or defective products to be identified automatically on packaging lines [31,32]. Deep CNNs have revolutionized computer vision by enhancing image classification, object detection, and video prediction while addressing key challenges and paving the way for future research directions [6].
Diseases and postharvest damages significantly impact strawberry quality and yield, and deep learning has become an essential approach for disease detection in crops [16]. CNNs are particularly good at identifying patterns in images, such as detecting defects or diseases in fruits, due to their ability to capture complex spatial relationships between pixels. Their application in strawberry image processing enables the identification of fruit presence and the classification of defects, such as spots, cracks, or fungal diseases. Furthermore, CNNs can identify different stages of ripeness, providing valuable information for optimizing harvests and postharvest fruit transportation [26].
Integrating artificial intelligence (AI) and deep learning further enhances the speed and effectiveness of fruit detection. Models such as YOLO perform real-time classification of multiple objects in a single image, ensuring robust and accurate detection, especially suitable for agricultural contexts. Recent studies confirm this efficiency by highlighting the successful use of these algorithms in fruit production estimates and classification, with minimal errors [33].

3. Materials and Methods

The study was developed in three stages. First, we identified the most impactful conditions affecting postharvest fruit, including a good-quality reference class. Second, we applied the YOLO (You Only Look Once) model to detect these conditions, and third, data augmentation was used to improve model quality. All images were collected from Creative Commons-licensed online sources. Figure 1 shows the study stages.

3.1. Image Acquisition and the Classes of Strawberry Images Studied

Table 1 presents the eight distinct strawberry conditions used to train and evaluate the YOLO model for quality assessment at the final stage of the supply chain. Each class includes a representative image and a detailed description of the condition. These categories encompass healthy and defective strawberries, ranging from good-quality fruit, characterized by firmness, vibrant color, and postharvest nutritional integrity, to various defects and diseases. The conditions include unripe strawberries, uneven ripening, missing calyx, fasciation (abnormal growth), and fungal infections such as powdery mildew, gray mold, and anthracnose. The descriptions are supported by physiological and biochemical traits, including pigment development, texture, and disease-specific symptoms, which provide the context for robust image-based classification. This classification framework forms the basis for automated detection using deep learning.

3.2. Model Development and Training

In the second stage, the dataset was first divided into three distinct, non-overlapping subsets to ensure an unbiased evaluation:
  • Training Set: 1756 images (80% of the total).
  • Validation Set: 220 images (10% of the total).
  • Test Set: 220 images (10% of the total).
This resulted in a total set of 2196 strawberry images.
This split was performed prior to any augmentation to avoid data leakage between training and evaluation subsets. Data augmentation techniques were then applied exclusively to the training set to increase its variability and improve the model’s generalization ability. At the same time, the validation and test sets remained unchanged for an accurate performance assessment. Ultimately, this process yielded approximately 900 annotated instances per strawberry condition.
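The split-before-augmentation procedure can be illustrated with a short sketch. This is a minimal example assuming the images sit in a single flat directory; the paths, file extension, and random seed are illustrative, not the study’s actual pipeline, which was managed within its annotation workflow.

```python
import random
import shutil
from pathlib import Path

# Illustrative layout; the study's actual directory structure is not reported.
SOURCE = Path("dataset/all_images")

random.seed(42)  # fixed seed so the split is reproducible
images = sorted(SOURCE.glob("*.jpg"))
random.shuffle(images)

# Split BEFORE any augmentation so that augmented copies of a training
# image can never leak into the validation or test subsets.
n = len(images)
subsets = {
    "train": images[: int(0.80 * n)],               # 80% -> 1756 images
    "valid": images[int(0.80 * n): int(0.90 * n)],  # 10% -> 220 images
    "test":  images[int(0.90 * n):],                # 10% -> 220 images
}

for name, files in subsets.items():
    out = Path("dataset") / name / "images"
    out.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(f, out / f.name)  # label files would be copied analogously
```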

YOLO Use

Initially, the YOLOv8m (Medium) model was benchmarked due to its improved detection accuracy compared to earlier YOLO versions. However, subsequent optimization experiments revealed that the lightweight YOLOv8n (Nano) variant provided comparable accuracy while achieving significantly faster inference time, which is crucial for real-time agricultural applications with limited computational resources. Therefore, YOLOv8n was ultimately selected as the final model for deployment in this study, ensuring an optimal balance between detection performance and processing speed. Terven and Cordova-Esparza [38] reported that YOLOv8m outperformed earlier iterations, such as YOLOv5, as well as alternative frameworks like Faster R-CNN, particularly in metrics like mean Average Precision (mAP). This study further evaluated model performance across datasets with and without augmentation, thereby elucidating the influence of data augmentation on detection accuracy.
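As a rough illustration of how this variant comparison could be reproduced with the Ultralytics API, the sketch below trains and validates the Nano, Small, and Medium models under identical settings. The dataset YAML name and the hyperparameters are assumptions; only the evaluation fields (`metrics.box.map50`, `metrics.speed`) follow the documented Ultralytics interface.

```python
from ultralytics import YOLO

DATA = "strawberry.yaml"  # hypothetical dataset config listing the 8 classes

results = {}
for weights in ("yolov8n.pt", "yolov8s.pt", "yolov8m.pt"):
    model = YOLO(weights)                      # COCO-pretrained checkpoint
    model.train(data=DATA, epochs=100, imgsz=640)
    metrics = model.val(data=DATA, split="test")
    # map50 is mAP@0.5; speed["inference"] is milliseconds per image
    results[weights] = (metrics.box.map50, metrics.speed["inference"])

for name, (map50, ms) in results.items():
    print(f"{name}: mAP@0.5 = {map50:.3f}, inference = {ms:.1f} ms/image")
```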

3.3. Data Augmentation

Data augmentation was applied as part of the third stage; it is a process that enlarges a training dataset through techniques such as rotation, flipping, cropping, and contrast adjustment to enhance a model’s generalization and performance [39]. The strategy simulates natural variations in real-world image capture, making the model more robust to changes in lighting, angle, and sharpness [40]. Rotation and blurring were selected based on prior studies indicating their positive impact on YOLOv8 accuracy, whereas cropping was excluded due to its tendency to eliminate critical features in small objects [41]. Two techniques were therefore applied to the training set. First, random rotations between −30° and +30° simulated natural variability in camera angle and fruit orientation during harvesting and inspection, improving the model’s robustness to positional changes. Second, a Gaussian blur filter with a 3 × 3 kernel replicated common imaging artifacts, such as slight motion blur or focus inconsistencies, which often occur in real-world postharvest environments. Prior work indicates that such geometric transformations and controlled blurring enhance the generalization of YOLO models on agricultural datasets.
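The two transformations can be sketched with OpenCV as follows. This is a simplified, image-only illustration under stated assumptions (reflective border filling, blur applied after rotation); in practice, bounding-box labels must be rotated together with the image, which annotation tools such as Roboflow handle automatically.

```python
import random
import cv2

def augment(image):
    """Apply the augmentations described above: a random rotation in
    [-30 deg, +30 deg] followed by a 3x3 Gaussian blur. Border handling is
    an assumption; the study does not state how rotated corners were filled."""
    h, w = image.shape[:2]
    angle = random.uniform(-30.0, 30.0)
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, matrix, (w, h),
                             borderMode=cv2.BORDER_REFLECT)
    return cv2.GaussianBlur(rotated, (3, 3), 0)  # sigma derived from kernel size

img = cv2.imread("strawberry.jpg")               # illustrative file name
cv2.imwrite("strawberry_aug.jpg", augment(img))
```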
Following data augmentation, the dataset increased to a total of 5663 images, with the final data split structured as follows:
  • Training Set: 5223 images (92% of the total).
  • Validation Set: 220 images (4% of the total).
  • Test Set: 220 images (4% of the total).
Model training was conducted on the Google Colab Pro platform, which provides hardware-accelerated resources suitable for deep learning workflows [42] and was selected for its high-performance GPUs, which accelerate model training and improve computational efficiency. The computational environment employed the Ultralytics YOLOv8 framework, version 8.3.144 [44], implemented in Python 3.11.12 and PyTorch 2.6.0 with CUDA 12.4 support [43], together with OpenCV [45] and Roboflow [46]. Training was performed on an NVIDIA A100-SXM4 GPU with 40 GB (40,507 MiB) of VRAM. The implemented YOLOv8 model consisted of 72 layers and approximately 3.01 million parameters, with no gradient storage and an estimated computational load of 8.1 GFLOPs after layer fusion. The model was implemented in Python using several specialized libraries: OpenCV for image pre-processing and PyTorch as the deep learning framework for model construction and training [47].
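Within this environment, the final training run reduces to a short Ultralytics call. The sketch below reflects the reported settings (YOLOv8n, 200 epochs, 640 × 640 input); the dataset YAML name and all unspecified hyperparameters are assumptions, with Ultralytics defaults applying elsewhere.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # Nano variant: ~3.01 M parameters, 8.1 GFLOPs
model.train(
    data="strawberry.yaml",       # hypothetical config with the 8 quality classes
    epochs=200,                   # the longer of the two reported runs
    imgsz=640,                    # matches the 640 x 640 pre-processing
    device=0,                     # first CUDA device (the A100 in Colab Pro)
)
metrics = model.val()             # Precision, Recall, mAP@0.5, mAP@0.5:0.95
```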
Following the training phase, the performance of the YOLOv8 convolutional neural network (CNN) was assessed using key evaluation metrics. These included Box Precision (BoxP) (Equation (1)), which quantifies the accuracy of predicted bounding boxes relative to ground truth annotations; Recall (R) (Equation (2)), which measures the proportion of correctly identified positive instances; Mean Average Precision at 50% Intersection-over-Union (mAP50) (Equation (3)), which evaluates detection accuracy at a 50% overlap threshold; and Mean Average Precision over IoU thresholds from 50% to 95% (mAP50-95) (Equation (4)), which averages precision across progressively stricter overlap thresholds.
Precision = TP/(TP + FP)
Here, TP = true positives (correctly predicted bounding boxes) and FP = false positives (incorrectly predicted bounding boxes).
Recall = TP/(TP + FN)
Here, TP = true positives (correctly predicted bounding boxes) and FN = false negatives (missed ground truth objects).
mAP50 = (1/N) × Σ(AP_i)
for IoU threshold = 0.50, where AP_i is the Average Precision for class i and N is the number of object classes. An IoU ≥ 0.50 is required for a prediction to be considered a true positive.
mAP50-95 = (1/N) × Σ_i [(1/10) × Σ_j AP_i,j]
where j ranges over the ten IoU thresholds 0.50, 0.55, …, 0.95 and AP_i,j is the Average Precision for class i at IoU threshold j. This reflects a more stringent and comprehensive evaluation across multiple IoU thresholds.
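For reference, Equations (1)–(4) translate directly into a few lines of Python. The helpers below are a didactic sketch, and the numbers in the usage example are illustrative only, not values from the results tables.

```python
def precision(tp, fp):
    """Equation (1): fraction of predicted boxes that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Equation (2): fraction of ground-truth objects that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_ap(ap_per_class):
    """Equations (3)-(4): mAP is the mean of per-class AP values; for
    mAP50-95 each entry is itself averaged over the ten IoU thresholds
    0.50, 0.55, ..., 0.95."""
    return sum(ap_per_class) / len(ap_per_class)

print(precision(80, 20))                      # 0.8
print(recall(80, 10))                         # ~0.889
print(mean_ap([0.995, 0.967, 0.882, 0.398]))  # illustrative per-class APs
```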

4. Results

This section presents the performance of the YOLOv8n model in two distinct training runs, comparing the results for classifying strawberries under varying quality conditions. Initially, a set of 2196 strawberry images was gathered from public sources (Google and Kaggle), supplemented by images selected by the authors. The images were pre-processed through automatic orientation and resizing to 640 × 640 pixels and then annotated in Roboflow [46]. To increase the variability and robustness of the dataset, data augmentation techniques were applied, including horizontal mirroring, 90° rotations (both clockwise and counterclockwise), and blurring with an intensity of up to twice the original value. This process expanded the dataset to a total of 5663 images; the original images had been split 80% for training, 10% for validation, and 10% for testing prior to augmentation.
Two training runs were conducted using the YOLOv8n model: one with 100 epochs and one with 200 epochs. The objective was to evaluate the impact of the number of iterations on the classification and detection performance across the different strawberry classes. Table 2 summarizes the general performance metrics obtained by the trained models. The results in Table 2 indicate that Training 2 (200 epochs) consistently outperformed Training 1 (100 epochs) in most performance metrics. Notably, Training 2 improved mAP@0.5 from 0.772 to 0.79, indicating higher accuracy in object detection at an IoU of 0.5. There were also improvements in Precision (from 0.683 to 0.744) and Recall (from 0.769 to 0.777), as well as an increase in mAP@0.5:0.95.
A substantial reduction in inference time was also observed for Training 2: 1.0 ms per image compared with 2.0 ms per image for Training 1. The 200-epoch model therefore processes images twice as fast, making it suitable for real-time classification systems. The analysis of the F1-confidence curves (Figure 2) and loss curves (Figure 3) for both training sessions provides insight into the relationship between prediction confidence and the balance between Precision and Recall for each class.
Figure 2 illustrates the F1-score as a function of confidence threshold for the YOLOv8n models trained with 100 epochs (Figure 2a) and 200 epochs (Figure 2b), respectively. These curves provide a detailed assessment of the trade-off between Precision and Recall across varying confidence levels for each strawberry quality class. A higher F1-score at a specific confidence value indicates a more optimal balance between false positives and false negatives. In the model trained for 200 epochs, higher peak F1-scores are achieved, particularly for classes with well-defined visual characteristics such as anthracnose, gray mold, and powdery mildew, which reach F1 values above 0.90. Conversely, lower F1-scores are observed for Good Quality and Unripe Strawberry, suggesting difficulties in distinguishing these categories due to visual overlap or intra-class variability. Notably, the model with 200 epochs shows not only improved F1-score magnitudes but also better separation and consistency across classes, highlighting the benefits of extended training in refining decision boundaries and enhancing classification confidence.
The model with 100 epochs achieved a maximum F1-score of 0.71 at a confidence of 0.429, while the model with 200 epochs achieved a higher value, reaching 0.75 at a confidence of 0.452. This improvement indicates that the second training enabled the model to define decision boundaries with greater confidence for the different classes, particularly the most visually distinct ones, such as Gray_Mold, Anthracnose, and Powdery_Mildew, whose F1 curves peak at 0.90 or higher. On the other hand, the Good_Quality and Unripe_Strawberry classes presented the lowest F1-scores in both curves. This outcome reflects either greater difficulty in visually differentiating these categories or their lower representation in the dataset; their performance curves remain low even at moderate confidence levels.
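The F1-confidence curves in Figure 2 can be reproduced for a single class with a straightforward threshold sweep, as sketched below; the detection scores, true-positive flags, and ground-truth count are placeholders for the outputs of an IoU-based matching step.

```python
import numpy as np

def f1_curve(scores, is_tp, n_gt, thresholds=np.linspace(0.0, 1.0, 101)):
    """Sweep a confidence threshold and compute F1 = 2PR / (P + R) at each
    step, returning the threshold with the highest F1 for one class."""
    scores = np.asarray(scores)
    is_tp = np.asarray(is_tp, dtype=bool)
    f1_values = []
    for t in thresholds:
        keep = scores >= t
        tp = int(is_tp[keep].sum())
        fp = int(keep.sum()) - tp
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / n_gt if n_gt else 0.0
        f1_values.append(2 * p * r / (p + r) if (p + r) else 0.0)
    best = int(np.argmax(f1_values))
    return thresholds[best], f1_values[best]  # analogous to (0.452, 0.75)
```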
Figure 3 shows the evolution curves of losses and metrics (such as Precision, Recall, and mAP) during training with 100 and 200 epochs. The analysis of these curves (blue lines for the obtained values and orange dashed lines for the smoothed values) suggests progressive learning and no apparent signs of overfitting, especially for the 200-epoch model, which maintained a trend of continuous improvement, reflecting greater stability and generalization capacity.
The training dynamics of the YOLOv8n models across 100 (panel a) and 200 (panel b) epochs (Figure 3) illustrate the evolution of key performance metrics, Precision, Recall, mAP@0.5, and mAP@0.5:0.95, as well as the loss functions (box loss, classification loss, and distance-from-object loss, or DFL). The blue solid lines represent the raw metric values across training iterations, while the orange dashed lines indicate smoothed trends to emphasize the learning progression. In both training sessions, an apparent reduction in all loss components is observed during the early epochs, indicative of effective weight optimization and convergence. Particularly in the 200-epoch model, the loss curves exhibit sustained stability and lower variance in later epochs, suggesting improved generalization capacity without marked overfitting. Concurrently, the upward trends in mAP and recall, especially visible in the extended training, indicate incremental improvements in object detection accuracy. These patterns confirm that the longer training duration contributed to enhanced feature extraction and classification consistency, thereby validating the model’s robustness and suitability for deployment in real-time postharvest quality assessment systems.
The training loss curves (box, cls, and dfl) show a decreasing and stable trend over the epochs in both models, indicating that the weight adjustment process was effective. However, it is possible to observe that the validation loss (val/box_loss, val/cls_loss, val/dfl_loss) shows a slight increase after half of the training, especially in the 200-epoch model. This outcome may be a subtle indication of the beginning of overfitting despite the evaluation metrics continuing to improve. Precision and recall increase significantly until approximately the 50th epoch in both models, stabilizing at levels above 0.74 in the second training.
The mAP@0.5 of the 200-epoch model stabilizes at around 0.79, with learning plateauing near the 159th epoch, while the mAP@0.5:0.95 is approximately 0.16 higher than the values obtained with 100 epochs. The variations observed in the curves are expected when training on imbalanced classes, as in this study, and were smoothed out as the number of epochs increased. The graphs reinforce that the 200-epoch model achieved better learning and stability in the classification parameters, albeit at the cost of a slight increase in validation losses, which is acceptable within a reasonable limit given the dataset’s complexity and size.

4.1. Analysis by Class

Table 3 details the per-class metrics for the models trained with 100 and 200 epochs. In the second training, the anthracnose class achieved the best results, with a mAP@0.5 of 0.995, followed by Gray_Mold (0.967) and Powdery_Mildew (0.882).
The classes most difficult to classify were ‘Good Quality’ and ‘Unripe Strawberry’. The former presented low Precision and Recall, resulting in a mAP@0.5 of only 0.398. This limitation may be related to the internal variability of the images considered ‘good’, which can share visual characteristics with other classes, such as Uneven_Ripening. Despite its distinctive coloration, the Unripe_Strawberry class had a low Recall (0.440), suggesting that the model frequently fails to recognize all instances of this class in the test set.
These patterns are also evident in the F1-confidence curves (Figure 2), where the Good_Quality (green) and Unripe_Strawberry (gray) classes maintain F1 values significantly lower than the others throughout the entire confidence interval. This result highlights both the reduced distinction between these classes and the model’s difficulty in establishing well-defined decision boundaries for these categories. The Recall and Precision values indicate that the most frequent confusions are concentrated between Good_Quality and classes related to ripeness or deformity, such as Uneven_Ripening and Fasciated_Strawberry.
Figure 4 presents the confusion matrices of the models trained with 100 and 200 epochs, respectively. They provide a detailed view of the correct classifications and the main errors made by the model in each category.
In both matrices, we observed solid performance on the main diagonal for the classes anthracnose, gray mold, powdery mildew, and uneven ripening, confirming the metrics reported earlier. Higher counts in these quadrants indicate consistent, correct predictions. However, some confusion is evident, such as in the following cases:
  • Good quality is frequently confused with uneven ripening, missing calyx, and, to a lesser extent, with the background itself. Even after 200 epochs, although there is a slight improvement (21 to 24 correct), errors persist for these classes, suggesting significant visual overlap.
  • Unripe strawberry is frequently confused with uneven ripening and missing calyx. The number of incorrect predictions for Unripe_Strawberry decreased from 36 to 30, representing progress, but this remains a critical area.
  • Fasciated strawberry showed an increase in correct detections (from 10 to 12) and a slight reduction in cross-errors, demonstrating a benefit from the additional epochs, while retaining some confusion with Good_Quality and anthracnose.
A noteworthy positive aspect is that the model with 200 epochs exhibits a stronger diagonal and less dispersion outside it, indicating an improvement in the learning process and a reduction in ambiguities. These results reinforce the need for targeted interventions in classes with greater overlap, such as increasing the number of good quality and unripe strawberry samples, applying refined labeling to reduce subjectivity between uneven ripening and good quality, and introducing complementary attributes (for instance, texture, shine, and ripeness index via color).
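The confusion matrices in Figure 4 can be rebuilt from matched predictions with scikit-learn, as in the sketch below; the class list follows Table 1, while `y_true` and `y_pred` are placeholders for labels obtained by matching detections to ground truth at IoU ≥ 0.5.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

CLASSES = ["Anthracnose", "Fasciated_Strawberry", "Good_Quality", "Gray_Mold",
           "Missing_Calyx", "Powdery_Mildew", "Uneven_Ripening",
           "Unripe_Strawberry"]

# Placeholder labels; in practice these come from IoU-matched detections.
y_true = ["Good_Quality", "Unripe_Strawberry", "Gray_Mold"]
y_pred = ["Uneven_Ripening", "Unripe_Strawberry", "Gray_Mold"]

cm = confusion_matrix(y_true, y_pred, labels=CLASSES)
ConfusionMatrixDisplay(cm, display_labels=CLASSES).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```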
Figure 5 shows images generated during training that exemplify segmentation, detection, and classification of strawberries based on attributes such as quality (good quality), deformity (fasciated strawberry), disease (anthracnose, gray mold), immaturity, uneven ripening, and absence of calyx.
Figure 5 presents representative output images generated by the YOLOv8n model during inference, illustrating its ability to detect and classify various strawberry conditions in real time. The bounding boxes and class labels, superimposed on each fruit image, confirm the model’s successful identification of distinct quality categories, including good quality, anthracnose, gray mold, fasciated strawberry, uneven ripening, and unripe strawberry. These detections reflect the model’s capacity to extract and interpret salient visual features, such as color heterogeneity, lesion morphology, and calyx presence. The classification performance is more accurate in categories with distinct phenotypic markers, such as fungal infections. In contrast, greater ambiguity persists in visually similar classes, particularly between ‘Good Quality’ and ‘Uneven Ripening’. The output demonstrates the feasibility of integrating YOLOv8n into automated sorting systems for postharvest quality control, supporting earlier metrics that indicate high model precision for visually distinctive defects.

4.2. Extended Model Evaluation and Optimization

To address initial concerns regarding convergence, we further optimized the training process by benchmarking different YOLOv8 variants (Nano, Small, and Medium) under the same dataset conditions. Through these experiments, YOLOv8n emerged as the optimal choice, achieving a superior balance between detection accuracy and inference speed for real-time applications. Additionally, we adopted an early stopping strategy, limiting training to approximately 70 epochs based on the stabilization of validation loss. This approach not only reduced the risk of overfitting observed in extended 200-epoch training but also improved mAP@0.5 for previously challenging classes, such as ‘Good Quality’ (from 0.398 to 0.746) and ‘Unripe Strawberry’ (from 0.667 to 0.783). These refinements indicate that careful model selection and training optimization can yield better performance with fewer epochs while maintaining high efficiency. Data augmentation techniques, including horizontal flipping, rotation, and Gaussian blur, were refined to address inter-class confusion. Table 4 summarizes the comparative performance across the three models.
These results indicate that YOLOv8n achieves a comparable mAP@0.5 score to the larger models, while offering significantly faster inference time, a crucial factor for real-time agricultural applications. YOLOv8n outperformed its previous 200-epoch configuration in both accuracy and efficiency, achieving a higher mAP@0.5 and lower latency with less training time. Furthermore, improvements were observed in specific challenging classes. For example, the mAP@0.5 for the ‘Good Quality’ class increased from 0.398 (in the earlier version) to 0.746, and for ‘Unripe Strawberry’ from 0.667 to 0.783. These improvements are attributed to both dataset refinement and better tuning of training and augmentation parameters. This extended evaluation highlights that even without modifying the YOLOv8 architecture itself, careful selection of model variant, training duration, and augmentation strategy can substantially enhance detection performance. The findings highlight the practical potential of lightweight models, such as YOLOv8n, when optimized for deployment in constrained environments.

5. Discussion

YOLOv8n was ultimately selected as the optimal model due to its superior trade-off between detection accuracy and real-time inference speed. Furthermore, an early stopping strategy was applied, reducing training to approximately 70 epochs without compromising performance. This adjustment improved the mAP@0.5 for previously challenging classes and minimized the overfitting tendency observed in the 200-epoch training.
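In the Ultralytics framework, this early stopping strategy amounts to a single `patience` argument: training halts once the validation fitness metric stops improving for the given number of epochs, which is how a nominal 200-epoch run can terminate near epoch 70. The patience value below is an assumption, as the paper does not report it.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="strawberry.yaml",  # hypothetical dataset config
    epochs=200,              # upper bound; early stopping usually ends sooner
    imgsz=640,
    patience=20,             # assumed: stop after 20 epochs without improvement
)
```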
The findings from this study underscore the practical applicability of YOLOv8n in the postharvest classification of strawberries, demonstrating both operational speed and acceptable accuracy metrics for real-time deployment. The model’s enhanced performance in the second training iteration, particularly in terms of mAP50 (0.79), Precision (0.744), and Recall (0.777), aligns with earlier studies on object detection in agriculture using YOLO frameworks [17,40]. These results suggest that deeper training (200 epochs) enhances the model’s capacity to learn discriminative visual features, particularly for classes with clear phenotypic markers, such as anthracnose and gray mold.
The use of data augmentation was decisive for the model’s robustness, enabling it to mitigate the effects of data scarcity in certain classes [48]. Models trained with fewer variation techniques tend to exhibit overfitting or poor generalization, which was not observed here. By simulating variations encountered in real-world image acquisition, such as lighting fluctuations and orientation inconsistencies, augmentation strengthened feature learning [26]. This outcome is consistent with Xiao et al. [40], who found that geometric and photometric transformations, such as rotation, blur, and brightness adjustments, significantly improved YOLOv8 performance in fruit classification tasks by increasing training-data variability and reducing sensitivity to environmental noise. It also aligns with Aboelenin et al. [49], who emphasized that balanced and diversified training datasets are essential for reducing overfitting in plant disease classification models.
Compared to conventional approaches based on convolutional neural networks or previous versions of YOLO, the results of YOLOv8n stand out both in accuracy and inference time, enabling practical applications. Among the limitations identified, the imbalance between classes stands out, with a smaller number of fasciated strawberry and good-quality samples, as previously observed [49].
Although no structural modifications were made to the core architecture of YOLOv8, several targeted optimizations were performed to enhance model performance. In addition to the YOLOv8n model initially used in the study, comparative experiments were conducted with the YOLOv8s and YOLOv8m variants. This model-level benchmarking enabled a comprehensive assessment of the trade-offs between accuracy and computational cost. Furthermore, training parameters such as the number of epochs were optimized, with 69–70 epochs yielding comparable or superior results to the original 200-epoch training. Early stopping was also applied to avoid overfitting and improve training efficiency. The YOLOv8n model, in particular, demonstrated a Precision of 0.752, a Recall of 0.829, and a mAP@0.5 of 0.827 in just 69 epochs, surpassing earlier configurations in both accuracy and inference speed. These results indicate that model selection, training duration, and convergence criteria were systematically adjusted to achieve optimal performance for real-time classification tasks, particularly in constrained computational environments. Therefore, while the backbone remained unchanged, the methodological refinements applied in this study reflect meaningful model optimization aligned with deployment-oriented research.
Per-class analysis revealed important insights into class-specific challenges. For example, the classes ‘Good Quality’ and ‘Unripe Strawberry’ had significantly lower performance metrics, a likely consequence of visual overlap and class imbalance. The relatively low mAP50-95 values for these categories (0.279 and 0.441, respectively) suggest a limitation in model granularity when handling subtler variations in color and shape, which are not as morphologically distinct as those in diseased classes, such as anthracnose or gray mold. Similar observations were made in previous studies by Hu et al. [11] and Zhou et al. [12], who reported model confusion in early fruit maturation stages due to low intra-class variance and high inter-class similarity.
It was observed that the classes uneven ripening, powdery mildew, gray mold, and missing calyx presented reasonable accuracy rates and correct predictions. This behavior indicates that the model was able to capture distinct visual patterns in these categories, even in different contexts [9]. On the other hand, some classes presented recurring confusion among themselves. The unripe strawberry class, for example, was frequently confused with good quality. Similarly, uneven ripening was also confused with good quality. These confusions may be related to the visual similarity between the classes [11], particularly in cases of partial ripeness or strawberries that appear healthy but exhibit irregular patterns.
It is also worth noting that the dataset was expanded in the most recent experiments, increasing the number of annotated validation images from 220 to 273 and the total number of class instances from approximately 900 to 1100. This increase improved class representation and diversity, particularly benefiting underperforming classes such as ‘Good Quality’ and ‘Unripe Strawberry’. The expanded dataset contributed to better generalization and learning, as evidenced by improved mAP scores for these categories. These enhancements reflect the importance of dataset refinement as a complementary strategy to model selection in object detection tasks.
The evolution of the training and validation loss curves further supports the notion that 200 epochs yielded a more stable and better-generalizing model, albeit with slight signs of overfitting toward the later epochs. This trade-off is expected in small-to-moderate datasets [9], particularly when dealing with eight distinct visual classes characterized by imbalanced sample distributions.
Confusion matrices substantiate the quantitative metrics, showing frequent misclassifications between visually similar classes. The recurrent misidentification of good quality as uneven ripening or missing calyx suggests that the inclusion of auxiliary imaging modalities, such as hyperspectral or RGB-D sensors, could help resolve ambiguities [12]. Moreover, improved labeling precision and the integration of multisensorial descriptors (e.g., gloss, firmness proxies) may yield better intra-class consistency, as advocated by Ahmed et al. [20].
In addition to these improvements, the analysis of dataset structure and augmentation effects could be further enhanced using visual analytics tools. Although the augmentation strategies applied in this study improved performance, future work will benefit from embedding-space visualization methods such as t-SNE or PCA. These tools can help analyze feature distribution before and after augmentation, providing clearer insights into how augmented data affects class separability, particularly for categories with subtle visual distinctions. Moreover, heatmaps and class-wise visual dashboards may provide a better understanding of sample density, distribution imbalance, and potential dataset biases that impact model learning.
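A minimal sketch of such an embedding-space analysis is given below, assuming per-image feature vectors have already been extracted from an intermediate YOLOv8 layer; random data stands in for those embeddings here, since none were exported in this study.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 256))   # placeholder for real embeddings
labels = rng.integers(0, 8, size=400)    # the eight strawberry classes

reduced = PCA(n_components=50).fit_transform(features)  # denoise before t-SNE
embedded = TSNE(n_components=2, perplexity=30,
                random_state=0).fit_transform(reduced)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of image embeddings by class (illustrative)")
plt.show()
```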
To address the observed misclassification between the ‘Good Quality’ and ‘Unripe Strawberry’ classes, additional training experiments were conducted using different YOLOv8 model variants (Nano, Small, and Medium) under consistent dataset and augmentation conditions. These new tests aimed to reduce confusion by evaluating whether model scaling could better capture subtle visual distinctions between these visually similar categories. The YOLOv8n model trained for 69 epochs achieved improved performance in both classes compared to the original YOLOv8n-200 model. Specifically, the ‘Good Quality’ class improved from a mAP@0.5 of 0.398 to 0.746, and the ‘Unripe Strawberry’ class increased from 0.667 to 0.783. These gains indicate enhanced class separability and reinforce that careful model selection, even without architectural changes, can contribute meaningfully to classification accuracy. While challenges persist, particularly under conditions of overlapping visual traits and dataset imbalance, these results validate the effectiveness of model version tuning in improving performance for the most challenging classes.
However, despite these improvements, certain limitations remain. The model still exhibits misclassification in instances where strawberries share highly similar visual traits, such as between early-stage ripening and fully ripe fruits. This finding suggests that even optimized deep learning models may require supplementary support from techniques such as color space transformations (for example, HSV), texture-based descriptors, or multimodal inputs (for instance, hyperspectral data) to resolve ambiguities in fine-grained classification. Investigating these approaches will be a primary focus of future research.
A challenge encountered in this study was the persistent misclassification between the ‘Good Quality’ and ‘Unripe Strawberry’ categories. This confusion likely stems from the visual similarity in surface color and shape that these classes may share, particularly in cases of borderline ripeness or partial pigmentation. The F1-confidence curves (Figure 2) and confusion matrices (Figure 4) highlight this issue, with both classes exhibiting the lowest F1-scores and frequent cross-predictions. Such misclassification can be attributed to two primary factors: (i) overlapping visual features that reduce class separability and (ii) an imbalanced dataset in which the good quality class was underrepresented, limiting the model’s ability to generalize subtle differences in texture and hue.
To mitigate this, future work should prioritize curating a more balanced dataset, primarily by increasing the representation of good-quality and unripe samples under varying lighting, maturation, and background conditions. The incorporation of additional image features, such as glossiness, surface texture, or color histograms extracted through HSV or LAB color space transformations, could enhance the model’s discriminative power [20,26]. Furthermore, implementing multispectral imaging or RGB-D sensors would allow the model to learn from biochemical or structural properties, such as firmness or chlorophyll content, which are not captured by RGB data alone [12]. On the architectural level, fine-tuning the classification head of the YOLOv8n model or integrating a two-stage hybrid system that first segments the fruit region and then applies a secondary classifier may further reduce ambiguity between closely related classes [11].
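As an example of such a color-aware descriptor, the sketch below computes a normalized hue histogram in HSV space with OpenCV; hue separates green and yellow (unripe) tones from red (ripe) tones more directly than raw RGB. The file name and bin count are illustrative.

```python
import cv2

def hue_histogram(path, bins=32):
    """Normalized hue histogram of an image in HSV space; hue spans
    0-179 in OpenCV's 8-bit representation."""
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
    return (hist / hist.sum()).flatten()

# Such a vector could be concatenated with CNN features or fed to the
# secondary classifier of the two-stage scheme mentioned above.
features = hue_histogram("strawberry.jpg")
```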
While the application of data augmentation techniques—such as rotation, Gaussian blur, and mirroring—clearly contributed to improving the model’s performance metrics, it is important to interpret this improvement in the context of the high-dimensional feature space in which the YOLOv8n model operates. In deep convolutional networks, such as YOLO, image transformations alter the distribution of learned features by introducing variations in spatial orientation, texture, and illumination, thereby increasing the diversity of activation patterns in the feature maps. From a representational learning perspective, these augmentations facilitate the generation of local perturbations around each class manifold, which can enhance the robustness of the decision boundary and reduce class overlap, particularly in low-sample regimes [40,50].
Although no explicit dimensionality reduction (e.g., t-SNE or UMAP) was performed in this study, the consistent performance gains observed in Precision, Recall, and class-wise mAP (particularly in visually distinct categories) suggest a positive shift in feature separability in the latent space. Future work should include visual embeddings of pre- and post-augmentation samples to formally assess cluster dispersion and inter-class margin shifts, as recommended by Lespinats et al. [50] and Colange et al. [51]. Moreover, measuring intrinsic dimensionality or analyzing learned embeddings via cosine similarity and intra-class variance could further reveal how augmentation affects the geometric structure of the dataset and its representation in YOLOv8’s intermediate layers.
The YOLOv8n model, when sufficiently trained and supported by augmentation strategies, performs robustly in strawberry defect classification. Nonetheless, the study highlights the need for improved class definition, enhanced image diversity, and potentially multimodal inputs to resolve persistent classification errors in visually similar fruit conditions. Additionally, the subjectivity involved in visual quality labeling can lead to inconsistencies, so attention should be given to the process. For future studies, it is essential to balance the dataset through the class-directed collection, include more images under natural lighting conditions, evaluate the system with multispectral or depth sensors (RGB-D) [12], and integrate it with devices for real-time testing.

6. Conclusions

This study demonstrated the utility of the YOLOv8n model for automated postharvest strawberry quality classification, showing high accuracy in distinguishing between defects, diseases, and ripeness stages. The implementation of data augmentation strategies substantially bolstered model robustness by alleviating the adverse effects of class imbalance and inconsistencies in image quality. Quantitative performance assessment showed that increased training epochs were correlated with enhanced mean Average Precision (mAP), Precision, and Recall, supporting the model’s suitability for real-time deployment in agricultural settings. Despite the strong performance in visually distinguishable categories, such as anthracnose and gray mold, limitations persisted in differentiating visually proximate classes, such as ‘Good Quality’ versus ‘Unripe’ strawberries. These results highlight a requirement for future research to focus on curating more equitably distributed datasets and integrating complementary imaging technologies. Overall, the presented approach constitutes a possible solution for optimizing postharvest handling and quality assurance throughout the strawberry supply chain.
We suggest exploring the integration of YOLOv8 with other emerging technologies to further enhance model performance and generalization capabilities in the future. This aspect involves incorporating Transformer-based architectures, such as DETR or Swin Transformer, which have demonstrated strong performance in object detection tasks involving complex spatial relationships. Additionally, the use of multimodal data fusion, combining RGB imagery with hyperspectral or depth information, may help address class ambiguity issues and improve detection robustness under variable lighting and occlusion conditions. These directions hold potential for advancing the technological contributions of this research beyond its current, single-modality, real-time scope.
As part of future work, we also intend to benchmark YOLOv8 against other state-of-the-art object detection architectures, such as Faster R-CNN, EfficientDet, and Transformer-based models, to further validate its relative performance in strawberry quality assessment tasks, particularly in scenarios where real-time inference is not a strict requirement, and to integrate color-aware and multimodal techniques to address remaining challenges in visually similar fruit categories.

Author Contributions

Conceptualization, M.T.O. and L.d.S.C.; methodology, M.T.O. and L.d.S.C.; software, M.T.O. and L.d.S.C.; validation, M.T.O. and L.d.S.C.; formal analysis, M.T.O. and L.d.S.C.; investigation, M.T.O. and L.d.S.C.; resources, M.T.O. and L.d.S.C.; data curation, M.T.O.; writing—original draft preparation, L.d.S.C.; writing—review and editing, I.d.A.N. and M.T.O.; visualization, I.d.A.N.; supervision, M.T.O.; project administration, M.T.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data will be available upon request.

Acknowledgments

The authors thank the Coordination for the Improvement of Higher Education Personnel (CAPES) for the scholarship and the cooperation between Universidade Paulista and CEAGESP, which made the images available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. FAO. Global Agricultural Statistics: Strawberry; Food and Agriculture Organization of the United Nations: Rome, Italy, 2025; Available online: https://www.fao.org (accessed on 20 April 2025).
  2. Zeist, A.R.; Resende, J.T.V. Strawberry breeding in Brazil: Current momentum and perspectives. Hortic. Bras. 2019, 37, 7–16. [Google Scholar] [CrossRef]
  3. CEDES. Perdas e Desperdícios de Alimentos no Brasil. Centro de Estudos e Debates Estratégicos da Câmara dos Deputados. 2018. Available online: https://www2.camara.leg.br/a-camara/estruturaadm/altosestudos/pdf/perdas-e-desperdicio-de-alimentos-no-brasil-estrategias-para-reducao (accessed on 20 March 2025).
  4. Embrapa. Perdas Pós-Colheita de Frutas e Hortaliças no Brasil. Empresa Brasileira de Pesquisa Agropecuária. 2023. Available online: https://www.embrapa.br (accessed on 5 March 2025).
  5. NCBI-National Center for Biotechnology Information. Fragaria × ananassa (Taxonomy ID: 3747). In NCBI Taxonomy Database. Available online: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=3747 (accessed on 18 May 2025).
  6. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  7. Albertini, M.R.; Ferrari, L.F.; Bonow, S.; Antunes, L.E.C. Produção de Mudas de Morangueiro Cultivar BRS DC25 (Fênix) em Leito de Areia. 2023. Available online: https://www.infoteca.cnptia.embrapa.br/infoteca/bitstream/doc/1156880/1/Cpact-Circular-240.pdf (accessed on 20 March 2025).
  8. Xu, L.; Wang, J.; Yang, G. An improved deep learning model for the classification of strawberry diseases. Sensors 2021, 21, 4845. [Google Scholar] [CrossRef]
  9. Yang, G.-F.; Yang, Y.; He, Z.-K.; Zhang, X.-Y.; He, Y. A rapid, low-cost deep learning system to classify strawberry disease based on cloud service. J. Integr. Agric. 2022, 21, 460–473. [Google Scholar] [CrossRef]
  10. Souza, B.; Zhang, H.; Oliveira, J. Automatic strawberry quality evaluation using deep learning and computer vision. Fermentation 2022, 9, 249. [Google Scholar] [CrossRef]
  11. Hu, X.; Wang, R.; Du, J.; Hu, Y.; Jiao, L.; Xu, T. Class-attention-based lesion proposal convolutional neural network for strawberry diseases identification. Front. Plant Sci. 2023, 14, 1091600. [Google Scholar] [CrossRef]
  12. Zhou, X.; Lee, W.S.; Ampatzidis, Y.; Chen, Y.; Peres, N.; Fraisse, C. Strawberry Maturity Classification from UAV and Near-Ground Imaging Using Deep Learning. Smart Agric. Technol. 2021, 1, 100001. [Google Scholar] [CrossRef]
  13. Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Fruit detection and quality evaluation using deep learning techniques: A review. Agriculture 2023, 13, 241. [Google Scholar] [CrossRef]
  14. Adewusi, A.O.; Asuzu, O.F.; Olorunsogo, T.; Iwuanyanwu, C.; Adaga, E.; Daraojimba, D.O. AI in precision agriculture: A review of technologies for sustainable farming practices. World J. Adv. Res. Rev. 2024, 21, 2276–2285. [Google Scholar] [CrossRef]
  15. Espinel, R.; Herrera-Franco, G.; Rivadeneira García, J.L.; Escandón-Panchana, P. Artificial Intelligence in Agricultural Mapping: A Review. Agriculture 2024, 14, 1071. [Google Scholar] [CrossRef]
  16. Zhao, S.; Liu, J.; Wu, S. Multiple disease detection method for greenhouse-cultivated strawberry based on multiscale feature fusion Faster R_CNN. Comput. Electron. Agric. 2022, 199, 107176. [Google Scholar] [CrossRef]
  17. Lin, Z.; Wang, Y.; Chen, H.; Huang, X. YOLOv5-based strawberry maturity and defect detection using computer vision. Comput. Electron. Agric. 2023, 210, 107599. [Google Scholar]
  18. Shahbazi, M.; Rahnemoonfar, M. Real-time fruit detection using YOLOv4 and YOLOv5 models. IEEE Access 2021, 9, 115526–115535. [Google Scholar]
  19. Krichen, M. Convolutional Neural Networks: A Survey. Computers 2023, 12, 151. [Google Scholar] [CrossRef]
  20. Ahmed, M.M.; Asim, M.; Kaleri, A.A.; Manzoor, D.; Rajput, A.A.; Laghari, R.; Khaki, S.A.; Musawwir, A.; Ullah, Z.; Ahmad, W. Biochemical Dynamics and Quality Attributes of Strawberry Fruits across Maturity Stages with Respect to Different Preservation Methods: Biochemical Dynamics and Quality Attributes. Futur. Biotechnol. 2024, 4, 28–35. [Google Scholar] [CrossRef]
  21. Azam, M.; Ejaz, S.; Rehman, R.N.U.; Khan, M.; Qadri, R. Postharvest Quality Management of Strawberries; IntechOpen: London, UK, 2019; Chapter 4; pp. 59–79. Available online: https://www.intechopen.com/chapters/66681 (accessed on 19 March 2025).
  22. He, Y.; Peng, Y.; Wei, C.; Zheng, Y.; Yang, C.; Zou, T. Automatic Disease Detection from Strawberry Leaf Based on Improved YOLOv8. Plants 2024, 13, 2556. [Google Scholar] [CrossRef]
  23. Manda-Hakki, K.; Hassanpour, H. Changes in postharvest quality and physiological attributes of strawberry fruits influenced by L-Phenylalanine. Food Sci. Nutr. 2024, 12, 10262–10274. [Google Scholar] [CrossRef]
  24. Barbieri, G.; Colonna, E.; Rouphael, Y.; De Pascale, S. Effect of the farming system and postharvest frozen storage on quality attributes of two strawberry cultivars. Fruits 2015, 70, 361–368. [Google Scholar] [CrossRef]
  25. Wang, X.; Li, Y.; Zhang, M. Intelligent systems in postharvest agriculture: Quality assessment and waste reduction. Trends Food Sci. Technol. 2022, 122, 96–108. [Google Scholar] [CrossRef]
  26. Wang, C.; Li, J.; Guo, Y.; Zhang, M.; Liu, Y. Strawberry detection and ripeness classification using YOLOv8+ model and image processing method. Agriculture 2024, 14, 751. [Google Scholar] [CrossRef]
  27. Alzubaidi, L.; Zhang, J.; Humaidi, A.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  28. Seeliger, K.; Fritsche, M.; Güçlü, U.; Schoenmakers, S.; Schoffelen, J.; Bosch, S.; Gerven, M. Convolutional neural network-based encoding and decoding of visual object recognition in space and time. NeuroImage 2017, 180, 253–266. [Google Scholar] [CrossRef]
  29. Jiang, B.; Zhang, Y.; Zhang, L.; De Bock, G.; Vliegenthart, R.; Xie, X. Human-recognizable CT image features of subsolid lung nodules associated with diagnosis and classification by convolutional neural networks. Eur. Radiol. 2021, 31, 7303–7315. [Google Scholar] [CrossRef]
  30. Purwono, P.; Maarif, A.; Rahmaniar, W.; Fathurrahman, H.; Frisky, A.; Haq, Q. Understanding of Convolutional Neural Network (CNN): A Review. Int. J. Robot. Control Syst. 2023, 2, 4. [Google Scholar] [CrossRef]
  31. Ding, H.; Hou, H.; Wang, L.; Cui, X.; Yu, W.; Wilson, D.I. Application of Convolutional Neural Networks and Recurrent Neural Networks in Food Safety. Foods 2025, 14, 247. [Google Scholar] [CrossRef]
  32. Khan, R.; Kumar, S.; Dhingra, N.; Bhati, N. The Use of Different Image Recognition Techniques in Food Safety: A Study. J. Food Qual. 2021, 2021, 7223164. [Google Scholar] [CrossRef]
  33. Hernández-Martínez, N.; Blanchard, C.; Wells, D.; Salazar-Gutiérrez, M. Current state and future perspectives of commercial strawberry production: A review. Sci. Hortic. 2023, 312, 111893. [Google Scholar] [CrossRef]
  34. Huang, Z.; Omwange, K.; Saito, Y.; Kuramoto, M.; Kondo, N. Monitoring strawberry (Fragaria × ananassa) quality changes during storage using UV-excited fluorescence imaging. J. Food Eng. 2023, 353, 111553. [Google Scholar] [CrossRef]
  35. Wu, E.; Ma, R.; Dong, D.; Zhao, X. D-YOLO: A Lightweight Model for Strawberry Health Detection. Agriculture 2025, 15, 570. [Google Scholar] [CrossRef]
  36. Petrasch, S.; Knapp, S.; Van Kan, J.; Blanco-Ulate, B. Grey mould of strawberry, a devastating disease caused by the ubiquitous necrotrophic fungal pathogen Botrytis cinerea. Mol. Plant Pathol. 2019, 20, 877–892. [Google Scholar] [CrossRef]
  37. Aljawasim, B.D.; Samtani, J.B.; Rahman, M. New insights in the detection and management of anthracnose diseases in strawberries. Plants 2023, 12, 3704. [Google Scholar] [CrossRef]
  38. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  39. Frizzi, S.; Bouchouicha, M.; Ginoux, J.M.; Moreau, E.; Sayadi, M. Convolutional neural network for smoke and fire semantic segmentation. IET Image Process. 2021, 15, 634–647. [Google Scholar] [CrossRef]
  40. Xiao, B.; Nguyen, M.; Yan, W. Fruit ripeness identification using YOLOv8 model. Multimed. Tools Appl. 2023, 83, 28039–28056. [Google Scholar] [CrossRef]
  41. Yilmaz, B.; Kutbay, U. YOLOv8 based drone detection: Performance analysis and optimization. Computers 2024, 13, 234. [Google Scholar] [CrossRef]
42. Google. Google Colab Pro. Available online: https://colab.google.com (accessed on 12 November 2024).
  43. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035. [Google Scholar] [CrossRef]
44. Ultralytics. Ultralytics YOLOv8 Docs. Available online: https://docs.ultralytics.com/ (accessed on 12 November 2024).
  45. Open Source Computer Vision Library. OpenCV. Available online: https://opencv.org/ (accessed on 12 November 2024).
  46. Roboflow. Available online: https://roboflow.com/ (accessed on 12 January 2024).
  47. Antunes, S.N.; Okano, M.T.; Nääs, I.A.; Lopes, W.A.C.; Aguiar, F.P.L.; Vendrametto, O.; Fernandes, J.C.L.; Fernandes, M.E. Model Development for Identifying Aromatic Herbs Using Object Detection Algorithm. AgriEngineering 2024, 6, 1924–1936. [Google Scholar] [CrossRef]
  48. Lopes, W.A.C.; Fernandes, J.C.L.; Antunes, S.N.; Fernandes, M.E.; Nääs, I.D.A.; Vendrametto, O.; Okano, M.T. Augmented Reality Applied to Identify Aromatic Herbs Using Mobile Devices. AgriEngineering 2024, 6, 2824–2844. [Google Scholar] [CrossRef]
  49. Aboelenin, S.; Elbasheer, F.A.; Eltoukhy, M.M.; Elhady, W.M.; Hosny, K.M. A hybrid Framework for plant leaf disease detection and classification using convolutional neural networks and vision transformer. Complex Intell. Syst. 2025, 11, 2. [Google Scholar] [CrossRef]
  50. Lespinats, S.; Colange, B.; Dutykh, D. Nonlinear Dimensionality Reduction Techniques: A Data Structure Preservation Approach; Springer International Publishing: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
  51. Colange, B.; Vuillon, L.; Lespinats, S.; Dutykh, D. MING: An interpretative support method for visual exploration of multidimensional data. Inf. Vis. 2022, 21, 246–269. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the stages in the study. Source: The authors.
Figure 2. F1 curves as a function of confidence for models trained with 100 (a) and 200 (b) epochs. Source: The authors.
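For reference, the F1 value plotted in these curves is the harmonic mean of Precision and Recall computed at each detection confidence threshold. A minimal sketch in Python (the function name is illustrative, not from the original pipeline):

def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of Precision and Recall, as plotted against the
    # confidence threshold in Figure 2.
    denom = precision + recall
    return 2.0 * precision * recall / denom if denom > 0 else 0.0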
Figure 3. Loss curves (box, cls, dfl) and metrics (Precision, Recall, mAP0.5, mAP0.5:0.95) of the models trained with 100 (a) and 200 (b) epochs. Source: The authors.
Figure 4. Confusion matrices of models trained with 100 (a) and 200 (b) epochs. Source: The authors.
Figure 5. Images generated during training. Source: The authors.
Table 1. Classification of strawberry conditions used for YOLO model training and evaluation, including visual examples and descriptive criteria for assessing postharvest quality and defects.
Class Image | Condition | Description
[image i001] | A good-quality strawberry. | Maintaining fruit firmness, color, flavor, and nutritional quality postharvest is essential for commercial viability [34].
[image i002] | An unripe strawberry. | An unripe strawberry is firm, green, or white due to the presence of chlorophyll and a lack of anthocyanins, and is typically smaller. Biochemically, it is low in sugars but high in organic acids, giving a tart, astringent taste and a "green" aroma profile [34].
[image i003] | Uneven ripening. | The area around the calyx (cap) remains white or pale green, while the rest of the fruit reddens [34].
[image i004] | Strawberry with a missing calyx. | The calyx is expected to be present and fresh-looking in strawberries sold fresh; a missing, significantly damaged, or improperly developed calyx is considered a defect [20].
[image i005] | Fasciated strawberry. | Fasciation is a condition of abnormal plant growth in which the growing tip (apical meristem) becomes elongated or multiplied, leading to flattened, crested, or contorted plant parts [33].
[image i006] | Strawberry with powdery mildew. | Powdery mildew is a common fungal disease typically caused by Podosphaera aphanis; it affects various parts of the strawberry plant, including the fruit [35].
[image i007] | Strawberry with gray mold. | Gray mold is characterized by a rapidly developing soft rot covered in a distinctive, fuzzy, gray fungal growth, primarily caused by Botrytis cinerea, which typically develops under cool, moist, and humid conditions [36].
[image i008] | Strawberry affected by anthracnose. | Anthracnose, caused by Colletotrichum species (commonly C. acutatum), produces distinct, firm, sunken lesions on the fruit: typically circular, light brown or tan spots that progressively darken to brown or black [37].
Source: Adapted from [33,34,35,36,37].
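To connect these eight conditions to the detector, the annotations must be exported in a YOLO-style dataset configuration. A minimal sketch, assuming a Roboflow-style folder layout (paths and the file name strawberries.yaml are illustrative; class names are written exactly as they appear in Table 3):

import yaml  # PyYAML, installed alongside the ultralytics package

# The eight class labels used during annotation (cf. Table 3).
CLASS_NAMES = [
    "Anthracnose", "Fasciated_Strawberry", "Good_Quality", "Gray_Mold",
    "Missing_calyx", "Powdery_Mildew", "Uneven_Ripening", "Unripe_Strawberry",
]

config = {
    "path": "datasets/strawberries",   # hypothetical dataset root
    "train": "train/images",
    "val": "valid/images",
    "names": dict(enumerate(CLASS_NAMES)),
}

with open("strawberries.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)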
Table 2. Comparison of performance metrics between YOLOv8n models (100 and 200 epochs).
General Metrics | Training 1 | Training 2
Epochs | 100 | 200
Duration | 1.468 h | 2.273 h
mAP50-95 | 0.53 | 0.534
mAP50 | 0.772 | 0.79
Precision | 0.683 | 0.744
Recall | 0.769 | 0.777
Inference time | 2.0 ms/image | 1.0 ms/image
Source: The authors.
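The two training runs compared above follow the standard Ultralytics workflow. A minimal sketch, assuming the dataset YAML from the previous example (strawberries.yaml is a hypothetical name); only the epochs argument changes between the two runs:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # pretrained YOLOv8 nano weights
model.train(
    data="strawberries.yaml",    # hypothetical 8-class dataset configuration
    epochs=200,                  # 100 for Training 1, 200 for Training 2
    imgsz=640,                   # default input resolution
)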
Table 3. Performance by class (Precision/Recall/mAP50/mAP50-95).
Class | Training 1 (YOLOv8n_inspection3) | Training 2 (YOLOv8n_inspection4)
Anthracnose | P: 0.890, R: 0.949, mAP50: 0.972, mAP50-95: 0.652 | P: 0.969, R: 1.000, mAP50: 0.995, mAP50-95: 0.662
Fasciated_Strawberry | P: 0.639, R: 0.667, mAP50: 0.734, mAP50-95: 0.498 | P: 0.684, R: 0.721, mAP50: 0.724, mAP50-95: 0.484
Good_Quality | P: 0.396, R: 0.486, mAP50: 0.443, mAP50-95: 0.321 | P: 0.501, R: 0.459, mAP50: 0.398, mAP50-95: 0.279
Gray_Mold | P: 0.799, R: 0.861, mAP50: 0.891, mAP50-95: 0.585 | P: 0.970, R: 0.906, mAP50: 0.967, mAP50-95: 0.590
Missing_calyx | P: 0.614, R: 0.875, mAP50: 0.829, mAP50-95: 0.582 | P: 0.623, R: 0.850, mAP50: 0.842, mAP50-95: 0.585
Powdery_Mildew | P: 0.741, R: 0.902, mAP50: 0.822, mAP50-95: 0.604 | P: 0.751, R: 0.951, mAP50: 0.882, mAP50-95: 0.651
Uneven_Ripening | P: 0.708, R: 0.909, mAP50: 0.850, mAP50-95: 0.598 | P: 0.734, R: 0.891, mAP50: 0.845, mAP50-95: 0.579
Unripe_Strawberry | P: 0.675, R: 0.499, mAP50: 0.635, mAP50-95: 0.398 | P: 0.722, R: 0.440, mAP50: 0.667, mAP50-95: 0.441
Source: The authors.
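Per-class figures like those above can be read from the validation metrics object returned by the Ultralytics API. A minimal sketch (the weights path is illustrative):

from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical path to trained weights
metrics = model.val()        # evaluates on the validation split of the dataset YAML
print(metrics.box.map50)     # overall mAP@0.5
print(metrics.box.map)       # overall mAP@0.5:0.95
for idx, name in model.names.items():
    # metrics.box.maps holds per-class mAP@0.5:0.95 values, indexed by class id
    print(name, metrics.box.maps[idx])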
Table 4. YOLOv8 model comparison with optimization parameters.
Model | Epochs | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Inference Time (ms/img)
YOLOv8n | 69 | 0.752 | 0.829 | 0.827 | 0.595 | 9.3
YOLOv8s | 70 | 0.793 | 0.761 | 0.827 | 0.593 | 19.3
YOLOv8m | 69 | 0.736 | 0.810 | 0.828 | 0.593 | 44.4
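The per-image inference times above can be checked by timing a prediction call; absolute values depend on the hardware used. A minimal sketch (the image file name is illustrative; swap "yolov8n.pt" for "yolov8s.pt" or "yolov8m.pt" to compare the three variants):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")         # nano variant from Table 4
results = model("strawberry.jpg")  # hypothetical test image
# Each Results object carries per-stage timings in milliseconds.
print(results[0].speed)            # e.g. {'preprocess': ..., 'inference': ..., 'postprocess': ...}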