Article

Detecting the Maturity of Red Strawberries Using Improved YOLOv8s Model

1 School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
2 National Digital Agricultural Equipment (Artificial Intelligence and Agricultural Robotics) Innovation Sub-Centre, Jiangsu University, Zhenjiang 212013, China
3 Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, Jiangsu University, Zhenjiang 212013, China
4 Key Laboratory for Theory and Technology of Intelligent Agricultural Machinery and Equipment, Jiangsu University, Zhenjiang 212013, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(21), 2263; https://doi.org/10.3390/agriculture15212263
Submission received: 13 October 2025 / Revised: 28 October 2025 / Accepted: 29 October 2025 / Published: 30 October 2025
(This article belongs to the Special Issue Advanced Cultivation Technologies for Horticultural Crops Production)

Abstract

Strawberry picking relies primarily on manual labor, making it the most labor-intensive stage in strawberry cultivation. Harvesting robots have therefore become essential for strawberry production, and fruit ripeness detection models are critical for picking operations. This study collected strawberry ripeness photographs under various natural environments and enriched feature expression through diverse image enhancement techniques. Considering practical deployment on harvesting robots, the low-parameter, high-accuracy YOLOv8s was selected as the base model. Leveraging the ease with which the Global Attention Mechanism (GAM) integrates into the YOLO architecture, we incorporated GAM before the SPPF module to enhance the extraction of both global and local features. Experimental results demonstrate that the improved YOLOv8s achieves excellent performance, with a mAP of 91.5% across three maturity classes at a frame rate of 53 FPS. Compared with other mainstream models, the improved YOLOv8s presented in this paper demonstrates superior detection performance, achieving mAP improvements of 12.1%, 8.0%, 6.1%, 4.6%, and 3.1% over YOLOv3, YOLOv5s, YOLOv7s, YOLOv8s, and CBAM-YOLOv8s, respectively. It also exhibits robust detection under varying lighting conditions and occlusions, meeting the demands for high precision and rapid performance during harvesting operations.

1. Introduction

Originating from Europe and East Asia, strawberries are widely loved by consumers for their enticing aroma, tender and smooth texture, and vibrant red color [1,2]. In addition, strawberries adapt to diverse environments, making them widely cultivated and one of the most popular fruits globally [3]. As cultivation areas continue to expand, economic benefits have increased significantly. Strawberries hold substantial economic importance across the globe and possess immense potential as an export commodity [4].
According to the latest data from the Food and Agriculture Organization of the United Nations (FAO), China is the world’s largest producer of strawberries [5,6]. Its cultivation area and output have consistently led the world, accounting for approximately one-third of total global production. After strawberries set fruit, they typically require 20 to 30 days to ripen. The degree of ripeness is a key factor affecting strawberry quality and flavor; harvesting too early or too late significantly reduces quality [7]. Currently, strawberry harvesting relies primarily on manual picking, making it the most labor-intensive task in strawberry cultivation. During peak seasons, rising labor costs indirectly increase the overall production expenses of strawberry farming [5]. Additionally, manual picking can easily cause surface damage and lead to misjudgments of fruit ripeness, so strawberries are not always harvested at their optimal picking time.
With the rapid advancement of modern computer technology, the integration of traditional agriculture with modern computing has emerged as a significant research focus [8]. Introducing robots into strawberry harvesting not only liberates labor and reduces the workload for farmers but also enhances picking efficiency during peak ripening periods [9]. Therefore, conducting research on automated strawberry ripeness detection in natural environments holds substantial scientific significance and practical value. It not only provides crucial technical support for strawberry production processes but also lays the foundation for developing intelligent harvesting equipment [10].
In recent years, computer vision technology has been widely applied in the agricultural sector, enabling the detection of ripeness in fruits such as apples, bananas, grapes, and tomatoes [11,12,13]. Numerous scholars have employed traditional image segmentation techniques to extract features such as the size, color, and contours of target objects. Cho et al. [14] selected spectral data correlated with tomato ripeness and established a ripeness classification model using support vector classification (SVC) and snapshot hyperspectral imaging. Arefi et al. [15] employed threshold analysis for background removal in the RGB color space, followed by mature tomato extraction using HSI, YIQ, and RGB color spaces, achieving an accuracy rate of 96.36%. Fadhel et al. [16] employed color thresholding and k-means clustering to identify unripe strawberries. The results demonstrate that color thresholding outperformed k-means in terms of accuracy, effectiveness, and speed in code implementation. Mo et al. [17] developed a banana fruit ripeness discrimination model based on genetic algorithms and SVM, utilizing manually measured banana rhombic features as indicators of ripeness. The final model achieved an average prediction accuracy of 86.20%. Khojastehnazhand et al. [18] employed RGB values converted to HSV for classification, focusing on the HSV color space and Naive Bayes method to classify dragon fruit ripeness, achieving an accuracy rate of 86.6%. Malik et al. [19] proposed a novel detection method based on an improved HSV color space and watershed segmentation algorithm, utilizing low-cost and readily available RGB cameras to assist robots in automatically picking ripe red tomatoes.
Traditional image processing-based maturity detection has achieved certain results but exhibits several significant drawbacks: (1) Algorithms are developed for specific crops and environments, requiring predefined parameters such as thresholds and filter kernel sizes, and lack dynamic adjustment capabilities. (2) Feature extraction parameters rely on subjective human experience, making it difficult to adapt to complex and variable real-world scenarios and resulting in poor universality. (3) Because crop feature extraction is highly complex, results are easily affected by environmental noise, illumination changes, overlap, and other surrounding disturbances, leading to low detection efficiency and poor robustness and generalization, which makes field detection requirements difficult to meet.
In recent years, deep learning technology has garnered significant attention. By leveraging neural networks to learn more abstract and advanced features from raw images, it enables automatic feature learning and extraction without human intervention [20]. This technology effectively analyzes large datasets and identifies replicable features, substantially enhancing performance in image classification, segmentation, and detection [21]. It currently represents a major research focus in the field of classification and recognition.
Zhou et al. [22] employed the YOLOv3 model to detect the maturity levels of strawberry flowers and fruits at different stages using both aerial and ground-level imaging, achieving excellent results with average accuracies of 88.0% and 89.0%. Miao et al. [23] modified the original model backbone to MobileNetV3 and incorporated a Global Attention Mechanism into the feature fusion network to address the challenges of subtle maturity differences among cherry tomatoes and frequent mutual occlusion between fruits. The improved YOLOv7 model achieved a mAP of 98.2% on the test set while reducing the model size by 55.7 MB. Qiu et al. [24] proposed an algorithm based on an improved YOLOv4 model, utilizing MobileNetV3, separable convolutions, and the SENet attention mechanism for precise grape ripeness detection and spatial localization within orchards, achieving an accuracy of 93.52%. Fan et al. [25] proposed an algorithm combining YOLOv5 with dark channel enhancement based on the three standard criteria of strawberry maturity: ripe, near-ripe, and unripe. Huang et al. [26] proposed a production line-based mango ripeness detection method using an enhanced YOLOv8s model. By modifying the C2F module within the YOLOv8s model’s neck structure and incorporating a channel attention mechanism, the approach significantly improves detection efficiency. Wang et al. [27] proposed an artificial intelligence algorithm framework for locating lychee picking points and detecting obstacles, achieving 88.1% accuracy in segmenting lychee fruits from branches and an 88% success rate in identifying picking points.
In summary, although deep learning-based fruit ripeness detection technology has made certain progress, related methods still require improvement in terms of robustness and practical application [28]. Strawberry fruits exhibit the following characteristics: (1) Strawberry canopies are dense, and the fruit bodies are relatively small, often overlapping and shading each other. (2) Due to varying light conditions during growth, strawberry fruits do not ripen simultaneously. Even on the same plant, the degree of ripeness among individual fruits is not uniform.
This study collected photographs of strawberry ripeness across diverse natural environments and enhanced the expressive power of the dataset’s feature information through various image enhancement techniques. Subsequently, this study selected the YOLOv8s model—known for its minimal parameters and high accuracy—as the base model. By incorporating a Global Attention Mechanism (GAM) into the backbone network, the model effectively enhances its extraction of key information regarding strawberry ripeness. Finally, through multiple comparative validations, this paper demonstrates the effectiveness of the improved model, ensuring its suitability for subsequent application in harvesting robots.

2. Materials and Methods

2.1. Building Strawberry Maturity Dataset

Deep learning methods have been successfully used to grade and identify strawberry ripeness. Beyond requiring high-quality detection models, this approach relies on ample datasets for model training. Commonly used public datasets such as ImageNet, Open Images Dataset, COCO, and PASCAL VOC feature diverse object categories but lack images depicting strawberries at different stages of ripeness. A strawberry ripeness dataset therefore had to be constructed for model training.

2.1.1. Image Collection

All training data were manually collected at the No. 1 Farm Strawberry Garden in Changzhou City, Jiangsu Province. The strawberry variety was Hongyan, with samples collected between November 2024 and March 2025. To ensure consistency between the training data and the images encountered in the harvesting robot’s subsequent target recognition tasks, data were acquired using an Intel RealSense D415 (Intel, Santa Clara, CA, USA). A Python (v3.6) program automatically captured images every 500 milliseconds at a resolution of 640 × 640 pixels. During collection, the camera was kept 5–15 cm from the target, and lighting conditions and shooting angles were varied as much as possible. Finally, the collected images were manually screened to remove blurry images and those lacking strawberry fruit. The collected strawberry image samples are shown in Figure 1.
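The original capture script is not published, so the following is a minimal sketch of such a timed acquisition loop using the pyrealsense2 and OpenCV libraries. The 640 × 480 stream mode, the resize to the 640 × 640 network input, and the file naming are illustrative assumptions.

```python
import time

import cv2
import numpy as np
import pyrealsense2 as rs

# Stream colour frames from the D415; 640 x 480 at 30 FPS is a natively
# supported mode, and frames are resized to the 640 x 640 network input.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

try:
    frame_id, last_saved = 0, 0.0
    while True:
        frames = pipeline.wait_for_frames()
        color = frames.get_color_frame()
        if not color:
            continue
        now = time.time()
        if now - last_saved >= 0.5:  # one image every 500 ms
            img = cv2.resize(np.asanyarray(color.get_data()), (640, 640))
            cv2.imwrite(f"strawberry_{frame_id:05d}.jpg", img)
            frame_id += 1
            last_saved = now
finally:
    pipeline.stop()
```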
This dataset comprises 1500 images capturing diverse lighting conditions, including front lighting, backlighting, overcast skies, and sunny weather. The strawberry data further encompasses various growth states such as single fruits, clusters of fruits, overlapping fruits, and foliage obstruction. This comprehensive approach ensures the trained model can effectively handle complex scenarios encountered in real-world strawberry cultivation environments.

2.1.2. Image Enhancement

To ensure the efficiency of the strawberry detection model, a rich dataset is required to guarantee the model’s strong generalization ability and stability. Sufficient data enhance the model’s adaptability to diverse scenarios. Furthermore, due to the complex growing environment of strawberries, factors such as lighting and shadows during photography are difficult to control. By simulating variations in natural lighting conditions, angular deviations, and imaging blur, we expanded the diversity of the strawberry ripeness dataset to enhance the model’s generalization capability in real-world scenarios. To achieve this, three data augmentation techniques were employed to enrich the features of strawberry samples within the dataset:
(1) To ensure the model can effectively recognize targets under varying lighting conditions, this study adjusted the exposure levels of captured raw images to accommodate detection requirements in both intense sunlight and dim environments.
(2) To enable the model to adapt to blurry photos caused by the camera’s inability to focus, this study employed a Gaussian blur method to simulate the out-of-focus effect.
(3) To simulate the randomness of imaging angles in natural environments, this study enhanced the dataset using image flipping techniques.
We applied the three enhancement techniques described above to each original image, expanding the dataset from 1500 to 6000 images (Figure 2); a sketch of these augmentations follows below.
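The three augmentations can be sketched with OpenCV as follows; the exposure gains and blur kernel size are illustrative assumptions rather than the study’s exact values, and flipped images require their bounding-box labels to be mirrored accordingly.

```python
import cv2
import numpy as np

def adjust_exposure(img: np.ndarray, gain: float) -> np.ndarray:
    """Simulate intense sunlight (gain > 1) or a dim scene (gain < 1)."""
    return cv2.convertScaleAbs(img, alpha=gain, beta=0)

def defocus_blur(img: np.ndarray, ksize: int = 7) -> np.ndarray:
    """Approximate an out-of-focus camera with Gaussian blur."""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror the image to vary the apparent shooting angle."""
    return cv2.flip(img, 1)

img = cv2.imread("strawberry_00001.jpg")
variants = {
    "bright": adjust_exposure(img, 1.5),
    "dim": adjust_exposure(img, 0.6),
    "blur": defocus_blur(img),
    "flip": flip_horizontal(img),
}
for name, aug in variants.items():
    cv2.imwrite(f"strawberry_00001_{name}.jpg", aug)
```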
During our research, we surveyed over a dozen strawberry farms in Jiangsu Province and found that farmers determine strawberry ripeness at harvest based on the extent of red coloration covering the fruit. Through extensive sample analysis, strawberry ripeness was categorized into three stages: unripe, half-ripe, and ripe. Unripe strawberries exhibit red coverage below 30% of the fruit surface (often none at all). Half-ripe strawberries display red coverage between 30% and 70% of the fruit’s surface area. Strawberries with over 70% red coverage are classified as ripe. The LabelImg annotation tool was used to manually label each sample according to these three stages, and text labels were generated in the YOLO format (Table 1).
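Annotation in this study was performed manually in LabelImg, but the coverage thresholds can be illustrated with a simple HSV-based sketch; the hue and saturation bounds below are illustrative assumptions, and in practice the ratio would be computed over segmented fruit pixels rather than the whole crop.

```python
import cv2
import numpy as np

def red_coverage(bgr_crop: np.ndarray) -> float:
    """Fraction of crop pixels that fall in the red hue range (HSV space)."""
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    # Red wraps around hue 0 on OpenCV's 0-179 hue scale, so two ranges are combined.
    mask = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 70, 50), (180, 255, 255))
    return float(mask.sum() / 255) / mask.size

def maturity_class(ratio: float) -> str:
    """Apply the paper's thresholds: <30% unripe, 30-70% half-ripe, >70% ripe."""
    if ratio < 0.30:
        return "unripe"
    return "half-ripe" if ratio <= 0.70 else "ripe"
```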

2.2. YOLOv8s Model

As a newer iteration within the mature YOLO framework, YOLOv8 focuses on achieving an optimal balance between accuracy and speed [29]. Building upon the success of YOLOv5, YOLOv8 introduces new features and enhancements, resulting in significantly improved accuracy. Based on CNNs, YOLOv8 employs an end-to-end object detection approach, directly feeding images into the model to output bounding boxes for target locations and category predictions [30]. YOLOv8 stands as one of the most advanced models in the YOLO series, offering both superior performance and cutting-edge architecture. Its optimized training mechanism, mixed-precision support, and multi-format export capabilities enable flexible and efficient deployment, making it the current benchmark within the YOLO family for balancing performance and ease of use.
For different detection tasks, five YOLOv8 models of varying sizes have been designed. YOLOv8s strikes a good balance between detection speed and accuracy. Compared to YOLOv8n, it achieves higher detection accuracy, while outperforming YOLOv8m and YOLOv8l in detection speed [31]. This makes it highly suitable for robotic harvesting tasks. This study selected YOLOv8s as the base model and optimized its architecture based on the characteristics of strawberry fruit images.
As shown in Figure 3, the YOLOv8s network architecture consists of four components: the input module, backbone network, neck network, and detection head. The backbone network is responsible for extracting features from the input image, primarily composed of CBS (Convolution-BN-SiLU), C2f (Cross-Stage Local Fusion), and SPPF (Spatial Pyramid Pooling-Fast) modules. The C2f module enhances the model’s feature representation capabilities by optimizing feature propagation paths. This significantly improves overall object detection accuracy and inference speed while maintaining efficient computational resource utilization.
The neck section employs a Rep-PAN (Reparameterized Path Aggregation Network) architecture. During training, it uses a multi-branch topology to extract features, effectively integrating characteristics from different levels. At inference time, structural reparameterization merges these branches into a lightweight, computationally efficient graph, balancing representational power with inference efficiency. The detection head receives multi-scale feature maps from the neck, introduces dynamic label assignment to adjust the distribution of positive and negative samples, and adopts an anchor-free strategy in the prediction head to reduce computational complexity.

2.3. Global Attention Mechanism

In recent years, numerous researchers have incorporated attention mechanisms into their models to improve detection accuracy. The core idea of the attention mechanism is to mimic the focusing behavior of the human visual system, enabling the model to selectively attend to different parts of the input when processing complex data, which enhances both performance and interpretability. Attention mechanisms are primarily categorized into channel-based and spatial-based approaches. Channel attention dynamically reweights feature channels according to their importance, so the network focuses on information from critical channels. Spatial attention enhances the model’s focus on key regions by computing the distribution of importance across spatial dimensions.
CBAM combines channel and spatial attention mechanisms, first adjusting the channel dimension and then optimizing the spatial dimension [32]. This enables the network to focus more intently on task-relevant regions when processing images. However, the drawback of CBAM lies in its reliance on local information processing, which makes it difficult to capture global contextual information and results in poor performance in complex agricultural scenarios.
In this paper, the Global Attention Mechanism (GAM) [33] replaces CBAM. GAM amplifies global-level interaction features while minimizing information dispersion. By computing global context, it effectively captures global information within the image features, enhancing the model’s focus on critical features. Figure 4 shows the channel and spatial attention modules. The calculation is as follows:
$$F_2 = M_C(F_1) \otimes F_1, \qquad F_3 = M_S(F_2) \otimes F_2$$
where $F_1$ is the input feature map, $F_2$ the intermediate state, $F_3$ the output feature, $M_C$ and $M_S$ the channel and spatial attention maps, and $\otimes$ denotes element-wise multiplication.
The channel attention submodule employs multilayer perceptrons to preserve information across channels, spatial width, and height, thereby enhancing global interaction features. The spatial attention submodule fuses spatial information through convolutional layers. GAM addresses CBAM’s limitation of insufficient global feature representation when processing images by simultaneously focusing on interactions across both channel and spatial dimensions, thereby strengthening the expression capability of global features.
The channel attention submodule preserves information across three dimensions through a three-dimensional permutation. A two-layer multilayer perceptron (MLP) is employed to amplify cross-dimensional channel-spatial dependencies. This MLP adopts an encoder–decoder structure with a reduction ratio r = 4.
In the spatial attention submodule, two convolutional layers are employed to fuse spatial features and focus on spatial information. First, a convolution with a 7 × 7 kernel reduces the number of channels to decrease computational load. Subsequently, another 7 × 7 convolution increases the channel count to maintain consistency, and the padding value is set to 3. Finally, the output passes through a Sigmoid function.
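A minimal PyTorch sketch of the two submodules described above follows (reduction ratio r = 4, two 7 × 7 convolutions with padding 3). The layer widths follow the GAM formulation in [33]; the class and variable names are our own, and this is a sketch rather than the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global attention: channel submodule (MLP) followed by spatial submodule (7x7 convs)."""
    def __init__(self, channels: int, rate: int = 4):
        super().__init__()
        # Channel submodule: two-layer MLP with reduction ratio r over the channel dim.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // rate),
            nn.ReLU(inplace=True),
            nn.Linear(channels // rate, channels),
        )
        # Spatial submodule: 7x7 conv reduces channels, a second 7x7 conv restores them.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // rate, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels // rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // rate, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F2 = M_C(F1) * F1: permute to (B, H, W, C) so the MLP mixes channels.
        att_c = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(att_c)
        # F3 = M_S(F2) * F2: the two 7x7 convolutions produce the spatial map.
        return x * torch.sigmoid(self.spatial(x))

# Shape check: the module preserves feature-map dimensions.
feats = torch.randn(1, 512, 20, 20)
assert GAM(512)(feats).shape == feats.shape
```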

2.4. Fusion of GAM and YOLOv8s

As shown in Figure 5, the GAM module is integrated into the backbone network of the YOLOv8 algorithm to enhance the extraction of both global and local features. This approach preserves finer-grained details while suppressing irrelevant noise, thereby improving the detection of occluded objects. A 4 × 4-pixel detection head was added to the head section, along with a fourth output layer P2, to enhance the detection of small targets like strawberry fruits. This design not only preserves YOLOv8’s efficiency but also significantly improves the model’s ability to detect small objects, achieving higher detection accuracy and robustness.
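As a rough illustration, the modified backbone tail can be composed from Ultralytics’ building blocks as shown below; in the actual code base the new layer would be declared in the model YAML and the custom class registered with the model parser. The GAM class is the sketch from Section 2.3, and the 512-channel width follows YOLOv8s.

```python
import torch
from ultralytics.nn.modules import C2f, SPPF

# Backbone tail after the modification: GAM (sketched above) sits between
# the last C2f stage and SPPF. Channel width 512 follows YOLOv8s.
tail = torch.nn.Sequential(
    C2f(512, 512, n=1, shortcut=True),
    GAM(512),
    SPPF(512, 512, k=5),
)
print(tail(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 512, 20, 20])
```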

2.5. Settings and Data Description

Through stratified random sampling, the 6000 images were divided into training, validation, and test sets in an 8:1:1 ratio. The division strictly maintained consistent proportions across the three categories: unripe, half-ripe, and ripe. The training set contained 4800 images covering all scenarios without extreme interference. It was used for iterative model parameter training and gradient-descent optimization, with model weights saved every 10 epochs. The validation set comprised 600 images with a scene distribution identical to the training set. It was used for hyperparameter optimization, overfitting monitoring, and optimal model selection; early stopping was triggered if the validation loss increased for 15 consecutive epochs. The test set contained 600 images representing extreme scenarios not covered in the training or validation sets, enabling unbiased evaluation of the model’s generalization capability.
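A minimal sketch of such a stratified 8:1:1 split with scikit-learn is shown below; the directory layout is assumed, and dominant_class is a hypothetical helper that keys each image by the most frequent class in its YOLO label file (multi-class images are thereby simplified).

```python
from pathlib import Path

from sklearn.model_selection import train_test_split

def dominant_class(image_path: Path) -> int:
    """Hypothetical helper: most frequent class id in the image's YOLO label
    file (assumes the .txt label sits next to the image)."""
    lines = image_path.with_suffix(".txt").read_text().splitlines()
    ids = [int(line.split()[0]) for line in lines if line.strip()]
    return max(set(ids), key=ids.count)

images = sorted(Path("dataset/images").glob("*.jpg"))  # assumed layout
labels = [dominant_class(p) for p in images]

# Carve off 20%, then halve it: 80% train / 10% val / 10% test, each split
# keeping the same unripe/half-ripe/ripe proportions.
train, rest, y_train, y_rest = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=0)
val, test, _, _ = train_test_split(
    rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
print(len(train), len(val), len(test))  # 4800 600 600
```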
The experiments were run on a Dell T7920 workstation (Dell, Round Rock, TX, USA) equipped with an Intel Xeon Gold 6248 CPU, an NVIDIA Tesla V100 GPU, 64 GB of memory, and 2 TB of NVMe SSD storage, providing sufficient computational power for efficient deep learning model training. Training parameters were set based on preliminary experimental results; the key parameters are listed in Table 2.
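Assuming the Ultralytics training interface, the Table 2 settings would translate into a call like the following; 'yolov8s-gam.yaml' and 'strawberry.yaml' are assumed file names for the modified model definition and the dataset configuration, and CIoU is already YOLOv8’s default bounding-box loss.

```python
from ultralytics import YOLO

model = YOLO("yolov8s-gam.yaml")  # assumed name of the modified model definition
model.train(
    data="strawberry.yaml",  # assumed dataset config listing paths and the 3 classes
    epochs=150,
    batch=8,
    optimizer="SGD",
    lr0=0.01,            # initial learning rate
    weight_decay=0.0005,
    imgsz=640,           # 640 x 640 input, matching Table 2
)
```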

3. Results and Discussion

The experimental results and discussion focus on the effectiveness of the YOLOv8s model enhanced with a Global Attention Mechanism (GAM) for strawberry ripeness detection. The model’s performance advantages are systematically quantified across five dimensions: standardization of experimental baseline conditions, validation of core improvement modules, cross-comparison with multiple algorithms, robustness testing in complex scenarios, and analysis of the underlying mechanisms behind the results.

3.1. Performance Enhancement of YOLOv8s Through GAM Module

To validate the effectiveness of GAM in enhancing the YOLOv8s model, this study employed a controlled variable method: the sole variable tested was whether the GAM module is introduced, while all other experimental conditions remained identical. We compared the convergence characteristics, detection accuracy, and feature extraction capabilities of the original and improved YOLOv8s, with a particular focus on verifying GAM’s advantages in complex scenarios.
Model convergence characteristics serve as key indicators for evaluating model stability and training efficiency. By monitoring the trends in training loss (Train-Loss), validation loss (Val-Loss), and validation set mAP, we compared the convergence processes of the two model groups. The results are shown in Figure 6.
By monitoring the trends in training loss, validation loss, and validation set mAP, we observed that both model groups entered the convergence phase after 40 epochs. However, the improved YOLOv8s converged faster, with loss values below 0.3 after 30 epochs, whereas the original model required 45 epochs. The final training loss of the improved model stabilized at 0.23 ± 0.05, and the validation loss stabilized at 0.38 ± 0.02, representing reductions of 0.13 and 0.14, respectively, compared to the original model. Regarding validation set mAP, the improved model reached 88.5% at 50 epochs and stabilized at 91.5% by epoch 120, whereas the original model achieved only 86.9% mAP at epoch 120.
The core reason for this discrepancy lies in the original YOLOv8s backbone network’s reliance solely on local feature aggregation, making it prone to learning irrelevant features like leaf patterns and soil backgrounds. This leads to slow loss reduction and overfitting. In contrast, the GAM module employs a dual-branch interaction of “channel attention–spatial attention” to focus on effective features such as the red areas and contour textures of strawberry fruits. It suppresses the weights of irrelevant features, thereby accelerating convergence and enhancing generalization stability.

3.2. Performance Comparison of Different Models in Strawberry Maturity Assessment

To validate the competitiveness of the improved YOLOv8s among comparable algorithms, we selected traditional YOLO variants, attention-enhanced models, and other common detection models for comparison. A comprehensive evaluation was conducted across three dimensions: accuracy, speed, and lightweight performance. All models were retrained using the training dataset from this study to ensure consistent conditions.
The comparison models include YOLOv3, YOLOv5s, and YOLOv7s; the attention-enhanced model CBAM-YOLOv8s; and the proposed GAM-YOLOv8s. YOLOv3, built on the Darknet53 backbone, serves as an early-generation baseline for strawberry ripeness detection; YOLOv5s, built on CSPDarknet53, represents lightweight models; YOLOv7s, built on the ELAN backbone, represents medium-to-high-accuracy models; and CBAM-YOLOv8s, which replaces GAM with the traditional CBAM attention mechanism, serves as the attention-mechanism comparison group.
After training under these consistent conditions, performance metrics were calculated on the test set; the results are shown in Table 3.
The test results indicate that, among the traditional YOLO series, YOLOv3 exhibits relatively low overall performance, achieving only 79.4% mAP@0.5 at an inference speed of 28 FPS; both accuracy and real-time capability fall short. YOLOv5s improves accuracy to 83.5% and reaches 42 FPS, demonstrating greater lightweight performance, though recognition accuracy remains insufficient. YOLOv7s further improves accuracy to 85.4%, but its speed drops to 38 FPS, giving a slightly poorer balance. As a new-generation base model, YOLOv8s raises mAP@0.5 to 86.9% while boosting inference speed to 55 FPS, delivering overall performance superior to its predecessors. CBAM-YOLOv8s, which adds an attention mechanism to YOLOv8s, achieves an even higher accuracy of 88.4%, though its inference speed decreases slightly to 51 FPS. This demonstrates that the attention mechanism contributes positively to accuracy.
The proposed GAM-YOLOv8s model achieves the best performance across all metrics, with a mAP@0.5 of 91.5%. This represents improvements of 12.1%, 8.0%, 6.1%, 4.6%, and 3.1% over YOLOv3, YOLOv5s, YOLOv7s, YOLOv8s, and CBAM-YOLOv8s, respectively. Meanwhile, the inference speed remains at 53 FPS, only slightly lower than that of the baseline YOLOv8s. This demonstrates that the model maintains high real-time performance and lightweight advantages while significantly improving detection accuracy, making it highly valuable for practical applications.
We further analyzed the feature heatmaps, which are shown in Figure 7. We compared the activation focus of the base YOLOv8s (Figure 7a) and the proposed GAM-YOLOv8s (Figure 7c). By measuring the pixel intensities in the feature maps, we observed that the GAM-YOLOv8s model concentrated significantly more activation energy (approximately 85–90%) within the ground-truth bounding box of the strawberry, compared to the more dispersed activations of the base model. This difference in feature concentration demonstrates that GAM effectively filters background noise and directs the model’s attention. This focused learning mechanism ensures more efficient and relevant gradient updates, resulting in faster convergence and lower, more stable loss values.
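The concentration measure can be made concrete with a small helper of the kind sketched below; this is illustrative rather than the exact evaluation script, and assumes a 2-D activation heatmap aligned with image coordinates.

```python
import numpy as np

def energy_in_box(heatmap: np.ndarray, box: tuple) -> float:
    """Fraction of total activation energy inside a ground-truth box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return float(heatmap[y1:y2, x1:x2].sum() / (heatmap.sum() + 1e-8))

# Toy check: all energy inside the fruit region gives a ratio of 1.0; the text
# reports roughly 0.85-0.90 for GAM-YOLOv8s on real heatmaps.
toy = np.zeros((64, 64))
toy[20:40, 20:40] = 1.0
print(energy_in_box(toy, (16, 16, 44, 44)))  # 1.0
```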

3.3. Verification of Model Robustness Improvement Across Different Scenarios

In the natural environment of strawberry cultivation, variations in light intensity and fruit occlusion are the two primary interference factors affecting detection accuracy. To validate the robustness of the enhanced YOLOv8s model, this study designed light robustness tests and occlusion robustness tests comparing the improved model against the baseline YOLOv8s. The test dataset was categorized into four lighting conditions: front-lit, back-lit, cloudy, and dusk low-light, with 150 images per category. The mAP comparisons between the two models are presented in Table 4.
Comparisons of mAP@0.5 under varying lighting conditions reveal that both the baseline YOLOv8s and the improved GAM-YOLOv8s exhibit detection performance that fluctuates with light intensity; however, the enhanced model demonstrates superior overall performance and greater robustness. In front-lit scenes (8000–12,000 lux), the baseline model achieved a mAP of 90.2%, while GAM-YOLOv8s improved to 93.5%, a 3.3 percentage point increase. In back-lit conditions (1000–3000 lux), the baseline drops to 78.5%, while the improved model maintains 87.2%, an increase of 8.7 percentage points. Under overcast conditions (500–1500 lux), the baseline achieves 85.6% and the improved model 90.1%, an increase of 4.5 percentage points. In low-light conditions at dusk (200–500 lux), the baseline achieved a mAP of only 72.3%, close to our defined practicality threshold of 70% mAP@0.5; given the operational requirements of harvesting robots, accuracy below 70% would cause missed detections of ripe fruit or a high false-detection rate on unripe fruit. In contrast, GAM-YOLOv8s maintained a mAP of 82.8%, well above this practicality threshold, for the largest improvement of 10.5 percentage points. Overall, as illumination decreased from bright to dim, the baseline model’s mAP declined by 17.9 percentage points while the improved model’s declined by only 10.7, and the improvement grew as lighting conditions deteriorated.
Occlusion Robustness Testing categorizes the dataset into three scenarios based on fruit occlusion levels, with 200 images per category and 67–68 samples of each maturity stage across all three categories. Strawberry leaves serve as occluding objects to simulate canopy shading in natural cultivation. The unobstructed scenario features fruit surfaces without leaf coverage, preserving complete features. The partially obstructed scenario shows 30–50% leaf coverage on fruit surfaces, with partial features visible. The severely obstructed scenario exhibits 50–70% leaf coverage, revealing only edge or localized features. Test results using the AP and average miss rate as core metrics for each category are presented in Table 5.
Table 5 shows that, as the occluded area increases from 0% to 50–70%, both YOLOv8s and GAM-YOLOv8s exhibit declining detection accuracy (AP); however, the improved model’s advantage grows more pronounced with increasing occlusion. Without occlusion, YOLOv8s achieved AP values of 89.2%, 88.5%, and 93.8% for the three maturity categories, with an average miss rate of 3.2%; GAM-YOLOv8s improved these to 92.5%, 91.8%, and 95.6% while reducing the miss rate to 1.8%. Under partial occlusion (30–50%), YOLOv8s’ AP dropped to 79.8–86.2% and its miss rate rose to 8.5%, while GAM-YOLOv8s achieved 87.2–92.3% AP and a 4.2% miss rate, an AP improvement of 6.1–7.4 percentage points. Under severe occlusion (50–70%), YOLOv8s’ AP fell to 69.5–78.6% with a 15.6% miss rate, whereas GAM-YOLOv8s maintained 83.5–90.2% AP and a 6.8% miss rate, an AP improvement of 11.6–14.0 percentage points. The results demonstrate that GAM-YOLOv8s significantly mitigates performance degradation under occlusion, with the most pronounced effect under moderate to severe occlusion.

3.4. Compared with Existing Research

In recent years, many scholars have used various deep learning models to study strawberry maturity (Table 6). Tao et al. [34] proposed a strawberry maturity recognition algorithm named YOLOv5s-BiCE, which replaces the upsampling algorithm with the CARAFE module structure to achieve multi-scale feature fusion. Compared to the original YOLOv5s model, YOLOv5s-BiCE achieves a 2.8% improvement in mean average precision and a 7.4% increase in accuracy. Chen et al. [35] proposed a CES-YOLOv8 network architecture that replaces some C2f modules in the YOLOv8 backbone with ConvNeXt V2 modules to enhance feature capture capabilities for strawberries at different ripeness levels. The introduction of the ECA attention mechanism further improves feature representation. Experimental results demonstrate that the CES-YOLOv8 model achieves an accuracy of 88.20%, a recall rate of 89.80%, a mAP50 of 92.10%, and an F1 score of 88.99% in complex environments.
Cai et al. [36] introduced an attention mechanism into the main network and the hollow space pyramid pooling module of the DeepLabV3+ network to enhance the feature information of strawberry images. Experimental results demonstrate that this method can accurately segment strawberry images at different stages of ripeness, achieving a model mean pixel accuracy of 90.9% and a mean intersection-over-union ratio of 83.05%, with a frame rate of 7.67 frames per second. Tamrakar et al. [37] proposed a lightweight improved version of the YOLOv5s-CGhostnet model to enhance strawberry detection performance. Compared to the original YOLOv5 model, the model size was significantly reduced by 85.09%, and the GFLOPS computational load decreased by 88.5%.
In terms of detection results, our model outperforms those of Chen et al. [35] and Cai et al. [36], with mAP values 2.51% and 0.6% higher, respectively. Compared with Tao et al. [34] and Tamrakar et al. [37], our model’s mAP is slightly lower, mainly because the image samples used in this paper are closer to the natural environment, with more complex backgrounds and lighting. Regarding model size, Tao et al. [34] and Tamrakar et al. [37] selected YOLOv5s as their base model for its low parameter count, whereas this paper opts for YOLOv8s, which combines few parameters with higher accuracy. Its computational speed exceeds those of Chen et al. [35] and Cai et al. [36], enabling precise strawberry maturity detection with minimal computational effort. These comparisons with previous research show that the proposed model is effective and detects well in natural environments.

4. Conclusions and Future Work

To achieve precise detection of red strawberry ripeness, this study collected photographs of strawberry maturity under various natural environments and enhanced the expressiveness of feature information in the dataset through different image enhancement methods. Considering the application requirements for picking robots, this study selected the YOLOv8s model as the base model due to its low parameter count and high accuracy. Leveraging the advantages of attention mechanisms, we integrated a Global Attention Mechanism (GAM) into the backbone network. This effectively enhances the model’s extraction of key information related to strawberry ripeness while suppressing the learning of irrelevant information. Experimental results demonstrate that the improved YOLOv8s achieves excellent performance, with a mAP of 91.5% for three maturity levels and a frame rate of 53 FPS. Compared with other mainstream YOLO models, our model delivers optimal performance. It also exhibits robust detection capabilities under varying lighting conditions and occlusions, meeting the requirements for subsequent deployment on harvesting robots.
In subsequent tasks, we will further expand the strawberry ripeness dataset to include variations in weather conditions and growth stages, enhancing the model’s generalization capabilities under natural environments. We will also deploy this model onto strawberry-harvesting robots for field validation and application. Based on actual test results, we will optimize the model’s performance. Concurrently, leveraging the harvesting robot’s hardware architecture, we will develop autonomous learning capabilities to enable simultaneous model training and sample collection. The strawberry ripeness detection model developed in this research holds significant application potential and will contribute to the advancement of agricultural modernization.

Author Contributions

Conceptualization, S.Z., C.F., T.H. and Y.J.; methodology, S.Z., C.F., T.H. and Y.J.; software, S.Z.; validation, S.Z., C.F. and Y.J.; formal analysis, S.Z., C.F., T.H. and Y.J.; investigation, S.Z. and Y.J.; resources, C.F.; data curation, Y.J. and T.H.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z. and Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 32401693), Jiangsu Province Modern Agricultural Machinery Equipment and Technology Demonstration and Promotion Project (No. NJ2023-23), and Priority Academic Program Development of Jiangsu Higher Education Institutions (No. PAPD2023-87), and Changzhou Agricultural Science and Technology Innovation and Demonstration Extension Project (No. KCSF(2023)11).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Skrovankova, S.; Sumczynski, D.; Mlcek, J.; Jurikova, T.; Sochor, J. Bioactive compounds and antioxidant activity in different types of berries. Int. J. Mol. Sci. 2015, 16, 24673–24706.
2. Wang, Z.; Cang, T.; Qi, P.; Zhao, X.; Xu, H.; Wang, X.; Zhang, H.; Wang, X. Dissipation of four fungicides on greenhouse strawberries and an assessment of their risks. Food Control 2015, 55, 215–220.
3. Elhariri, E.; El-Bendary, N.; Saleh, S.M. Strawberry-DS: Dataset of annotated strawberry fruits images with various developmental stages. Data Brief 2023, 48, 109165.
4. Zhao, S.; Liu, J.; Hua, T.; Jiang, Y. Improved UNet recognition model for multiple strawberry pests based on small samples. Agronomy 2025, 15, 2252.
5. Liu, J.; Wu, S. Research progress and prospect of strawberry whole-process farming mechanization technology and equipment. Trans. Chin. Soc. Agric. Mach. 2021, 52, 1–16.
6. Zhao, S.; Liu, J.; Wu, S. Multiple disease detection method for greenhouse-cultivated strawberry based on multiscale feature fusion Faster R-CNN. Comput. Electron. Agric. 2022, 199, 107176.
7. Tang, C.; Chen, D.; Wang, X.; Ni, X.; Liu, Y.; Liu, Y.; Mao, X.; Wang, S. A fine recognition method of strawberry ripeness combining Mask R-CNN and region segmentation. Front. Plant Sci. 2023, 14, 1211830.
8. Liu, J.; Jiang, Y. Industrialization trends and multi-arm technology direction of harvesting robots. Trans. Chin. Soc. Agric. Mach. 2024, 55, 1–17.
9. Xiong, Y.; Peng, C.; Grimstad, L.; From, P.J.; Isler, V. Development and field evaluation of a strawberry harvesting robot with a cable-driven gripper. Comput. Electron. Agric. 2019, 157, 392–402.
10. Yu, Y.; Zhang, K.; Liu, H.; Yang, L.; Zhang, D. Real-time visual localization of the picking points for a ridge-planting strawberry harvesting robot. IEEE Access 2020, 8, 116556–116568.
11. Liu, Y.; Wei, C.; Yoon, S.-C.; Ni, X.; Wang, W.; Liu, Y.; Wang, D.; Wang, X.; Guo, X. Development of multimodal fusion technology for tomato maturity assessment. Sensors 2024, 24, 2467.
12. Wang, F.; Zhao, C.; Yang, H.; Jiang, H.; Li, L.; Yang, G. Non-destructive and in-site estimation of apple quality and maturity by hyperspectral imaging. Comput. Electron. Agric. 2022, 195, 106843.
13. Gupta, A.K.; Pathak, U.; Tongbram, T.; Medhi, M.; Terdwongworakul, A.; Magwaza, L.S.; Mditshwa, A.; Chen, T.; Mishra, P. Emerging approaches to determine maturity of citrus fruit. Crit. Rev. Food Sci. Nutr. 2022, 62, 5245–5266.
14. Cho, B.-H.; Kim, Y.-H.; Lee, K.-B.; Hong, Y.-K.; Kim, K.-C. Potential of snapshot-type hyperspectral imagery using support vector classifier for the classification of tomatoes maturity. Sensors 2022, 22, 4378.
15. Arefi, A.; Motlagh, A.M.; Mollazade, K.; Teimourlou, R.F. Recognition and localization of ripen tomato based on machine vision. Aust. J. Crop Sci. 2011, 5, 1144–1149.
16. Fadhel, M.A.; Hatem, A.S.; Alkhalisy, M.A.E.; Awad, F.H.; Alzubaidi, L. Recognition of the unripe strawberry by using color segmentation techniques. Int. J. Eng. Technol. 2018, 7, 3383–3387.
17. Mo, S.-T.; Dong, T.; Zhao, X.-X.; Kan, J.-M. Discriminant model of banana fruit maturity based on genetic algorithm and SVM. J. Fruit Sci. 2022, 39, 2418–2427.
18. Khojastehnazhand, M.; Mohammadi, V.; Minaei, S. Maturity detection and volume estimation of apricot using image processing technique. Sci. Hortic. 2019, 251, 247–251.
19. Malik, M.H.; Zhang, T.; Li, H.; Zhang, M.; Shabbir, S.; Saeed, A. Mature tomato fruit detection algorithm based on improved HSV and watershed algorithm. IFAC-PapersOnLine 2018, 51, 431–436.
20. Chen, M.; Shi, X.; Zhang, Y.; Wu, D.; Guizani, M. Deep feature learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. Big Data 2017, 7, 750–758.
21. Manakitsa, N.; Maraslidis, G.S.; Moysis, L.; Fragulis, G.F. A review of machine learning and deep learning for object detection, semantic segmentation, and human action recognition in machine and robotic vision. Technologies 2024, 12, 15.
22. Zhou, X.; Lee, W.S.; Ampatzidis, Y.; Chen, Y.; Peres, N.; Fraisse, C. Strawberry maturity classification from UAV and near-ground imaging using deep learning. Smart Agric. Technol. 2021, 1, 100001.
23. Miao, R.; Li, Z.; Wu, J. Lightweight maturity detection of cherry tomato based on improved YOLOv7. Trans. Chin. Soc. Agric. Mach. 2023, 54, 225–233.
24. Qiu, C.; Tian, G.; Zhao, J.; Liu, Q.; Xie, S.; Zheng, K. Grape maturity detection and visual pre-positioning based on improved YOLOv4. Electronics 2022, 11, 2677.
25. Fan, Y.; Zhang, S.; Feng, K.; Qian, K.; Wang, Y.; Qin, S. Strawberry maturity recognition algorithm combining dark channel enhancement and YOLOv5. Sensors 2022, 22, 419.
26. Huang, Y.; Jiang, X.; Zhou, C.; Zhuo, X.; Xiong, J.; Zhang, M. Study on mango ripeness detection on production line based on improved YOLOv8s. J. Food Meas. Charact. 2025, 19, 768–780.
27. Wang, C.; Li, C.; Han, Q.; Wu, F.; Zou, X. A performance analysis of a litchi picking robot system for actively removing obstructions, using an artificial intelligence algorithm. Agronomy 2023, 13, 2795.
28. Tang, Y.; Qiu, J.; Zhang, Y.; Wu, D.; Cao, Y.; Zhao, K.; Zhu, L. Optimization strategies of fruit detection to overcome the challenge of unstructured background in field orchard environment: A review. Precis. Agric. 2023, 24, 1183–1219.
29. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on YOLOv8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545.
30. Yang, G.; Wang, J.; Nie, Z.; Yang, H.; Yu, S. A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 2023, 13, 1824.
31. Wang, A.; Qian, W.; Li, A.; Xu, Y.; Hu, J.; Xie, Y.; Zhang, L. NVW-YOLOv8s: An improved YOLOv8s network for real-time detection and segmentation of tomato fruits at different ripeness stages. Comput. Electron. Agric. 2024, 219, 108833.
32. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
33. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
34. Tao, Z.; Li, K.; Rao, Y.; Li, W.; Zhu, J. Strawberry maturity recognition based on improved YOLOv5. Agronomy 2024, 14, 460.
35. Chen, Y.; Xu, H.; Chang, P.; Huang, Y.; Zhong, F.; Jia, Q.; Chen, L.; Zhong, H.; Liu, S. CES-YOLOv8: Strawberry maturity detection based on the improved YOLOv8. Agronomy 2024, 14, 1353.
36. Cai, C.; Tan, J.; Zhang, P.; Ye, Y.; Zhang, J. Determining strawberries’ varying maturity levels by utilizing image segmentation methods of improved DeepLabV3+. Agronomy 2022, 12, 1875.
37. Tamrakar, N.; Karki, S.; Kang, M.Y.; Deb, N.C.; Arulmozhi, E.; Kang, D.Y.; Kook, J.; Kim, H.T. Lightweight improved YOLOv5s-CGhostnet for detection of strawberry maturity levels and counting. AgriEngineering 2024, 6, 962–978.
Figure 1. Strawberry Maturity Sample.
Figure 2. Image Enhancement Sample.
Figure 3. YOLOv8s Overall Network Framework.
Figure 4. GAM Network Framework.
Figure 5. GAM + YOLOv8s Network Framework.
Figure 6. Comparison of Convergence Curves Between Original and Improved YOLOv8s.
Figure 7. Feature Heatmap Comparison Between Original YOLOv8s and Enhanced YOLOv8s.
Table 1. Strawberry Maturity Classification.

| Classification | Features                                     |
|----------------|----------------------------------------------|
| unripe         | The red-colored area is less than 30%.       |
| half-ripe      | The red-colored area ranges from 30% to 70%. |
| ripe           | The red-colored area exceeds 70%.            |
Table 2. Model Training Parameter Settings.

| Parameter Name        | Value     |
|-----------------------|-----------|
| Epochs                | 150       |
| Batch size            | 8         |
| Optimizer             | SGD       |
| Initial learning rate | 0.01      |
| Weight decay          | 0.0005    |
| Bounding box loss     | CIoU loss |
| Image dimensions      | 640 × 640 |
Table 3. Performance Comparison of Different Models in Strawberry Maturity Assessment.

| Model                   | IoU (%) | Precision (%) | Recall (%) | AP Unripe (%) | AP Half-Ripe (%) | AP Ripe (%) | mAP@0.5 (%) | FPS |
|-------------------------|---------|---------------|------------|---------------|------------------|-------------|-------------|-----|
| YOLOv3                  | 72.3    | 81.5          | 79.8       | 78.6          | 76.2             | 83.5        | 79.4        | 28  |
| YOLOv5s                 | 76.5    | 85.2          | 84.1       | 82.3          | 80.5             | 87.8        | 83.5        | 42  |
| YOLOv7s                 | 78.1    | 86.7          | 85.9       | 84.1          | 82.8             | 89.2        | 85.4        | 38  |
| YOLOv8s                 | 80.2    | 88.3          | 87.5       | 85.7          | 84.3             | 90.6        | 86.9        | 55  |
| CBAM-YOLOv8s            | 82.5    | 89.6          | 88.8       | 87.2          | 86.1             | 91.8        | 88.4        | 51  |
| GAM-YOLOv8s (our model) | 85.7    | 92.1          | 91.3       | 90.5          | 89.8             | 94.2        | 91.5        | 53  |
Table 4. mAP@0.5 (%) Comparison Under Different Lighting Conditions.

| Lighting Scenario | Illuminance (lux) | YOLOv8s | GAM-YOLOv8s | Improvement (Percentage Points) |
|-------------------|-------------------|---------|-------------|---------------------------------|
| Front lighting    | 8000–12,000       | 90.2    | 93.5        | 3.3                             |
| Backlighting      | 1000–3000         | 78.5    | 87.2        | 8.7                             |
| Overcast          | 500–1500          | 85.6    | 90.1        | 4.5                             |
| Low light at dusk | 200–500           | 72.3    | 82.8        | 10.5                            |
Table 5. Model Performance Comparison at Different Occlusion Levels.

| Occlusion Level      | Occlusion Area Ratio | Model       | AP Unripe (%) | AP Half-Ripe (%) | AP Ripe (%) | Average Miss Rate (%) |
|----------------------|----------------------|-------------|---------------|------------------|-------------|-----------------------|
| Unobstructed         | 0%                   | YOLOv8s     | 89.2          | 88.5             | 93.8        | 3.2                   |
|                      |                      | GAM-YOLOv8s | 92.5          | 91.8             | 95.6        | 1.8                   |
| Partially obstructed | 30–50%               | YOLOv8s     | 81.5          | 79.8             | 86.2        | 8.5                   |
|                      |                      | GAM-YOLOv8s | 88.6          | 87.2             | 92.3        | 4.2                   |
| Severely obstructed  | 50–70%               | YOLOv8s     | 71.8          | 69.5             | 78.6        | 15.6                  |
|                      |                      | GAM-YOLOv8s | 85.6          | 83.5             | 90.2        | 6.8                   |
Table 6. Comparison with Existing Research.

| Author               | Year | Method              | mAP    |
|----------------------|------|---------------------|--------|
| Tao et al. [34]      | 2024 | YOLOv5s-BiCE        | 96.1%  |
| Chen et al. [35]     | 2024 | CES-YOLOv8          | 88.99% |
| Cai et al. [36]      | 2022 | Improved DeepLabV3+ | 90.90% |
| Tamrakar et al. [37] | 2024 | YOLOv5s-CGhostnet   | 91.7%  |
| Our model            | 2025 | YOLOv8s-GAM         | 91.5%  |