Article

A Blueberry Maturity Detection Method Integrating Attention-Driven Multi-Scale Feature Interaction and Dynamic Upsampling

Haohai You, Zhiyi Li, Zhanchen Wei, Lijuan Zhang, Xinhua Bi, Chunguang Bi, Xuefang Li and Yunpeng Duan

1 College of Information Technology, Jilin Agricultural University, Changchun 130118, China
2 College of Internet of Things Engineering, Wuxi University, Wuxi 214063, China
3 College of Instrument Science & Electrical Engineering, Jilin University, Changchun 130012, China
* Author to whom correspondence should be addressed.
Horticulturae 2025, 11(6), 600; https://doi.org/10.3390/horticulturae11060600
Submission received: 26 March 2025 / Revised: 3 May 2025 / Accepted: 23 May 2025 / Published: 27 May 2025

Abstract

In the context of blueberry orchard management and automated harvesting, this study introduces an improved YOLOv8 model, ADE-YOLO, designed for precise blueberry ripeness detection, enhancing automated picking efficiency. Built on the YOLOv8n architecture, ADE-YOLO features a dimensionality-reducing convolution at the backbone’s end, reducing computational complexity while optimizing input features. This improvement enhances the effectiveness of the AIFI module, particularly in multi-scale feature fusion, boosting detection accuracy and robustness. Additionally, the neck integrates a dynamic sampling technique, replacing traditional upsampling methods, allowing for more precise feature integration during feature transfer from P5 to P4 and P4 to P3. To further enhance computational efficiency, CIOU is replaced with EIOU, simplifying the aspect ratio penalty term while maintaining high accuracy in bounding box overlap and centroid distance calculations. Experimental results demonstrate ADE-YOLO’s strong performance in blueberry ripeness detection, achieving a precision of 96.49%, recall of 95.38%, and mAP scores of 97.56% (mAP50) and 79.25% (mAP50-95). The model is lightweight, with just 2.95 M parameters and a 6.2 MB weight file, outpacing YOLOv8n in these areas. ADE-YOLO’s design and performance underscore its significant application potential in blueberry orchard management, providing valuable support for precision agriculture.

1. Introduction

Blueberries are widely cultivated and consumed worldwide for their distinctive flavor, vibrant color, and exceptional health benefits [1,2]. These benefits are primarily attributed to their rich content of bioactive compounds, such as anthocyanins and phenolics, which possess strong antioxidant, anti-inflammatory, anti-cancer, anti-obesity, and neuroprotective properties. As a low-calorie and nutrient-dense fruit, blueberries are an excellent source of dietary fiber, vitamins, and minerals. Notably, they play a critical role in supporting cognitive health and maintaining memory function with aging [3]. In recent years, China has emerged as a rapidly growing market for blueberry cultivation and consumption, becoming a key player in the global industry [4]. The country’s primary blueberry-growing regions are in the northeast, southwest, and eastern areas, where favorable climatic conditions support robust cultivation. Among these, Northeast China—encompassing Liaoning, Jilin, and Heilongjiang provinces—serves as the core region for production, with its cool climate providing optimal conditions for blueberry growth.
The traditional method of identifying blueberry ripeness relies on fruit color, the condition of the fruit tip, and morphological characteristics [5]. Unripe blueberries are typically green or red, transitioning to dark blue or purple-black as they mature. Fully ripe blueberries are identifiable by a characteristic white frost on their skin, indicating freshness and readiness for harvest. Additionally, ripe blueberries are easily detachable from their clusters due to their loose attachment, whereas unripe fruits remain firmly attached, making them more challenging to pick. Ripe blueberries are also full and soft to the touch, contrasting with the harder texture of unripe fruit, which is less appealing. Currently, many blueberries in the Chinese market are harvested manually. However, the delicate nature of blueberries necessitates careful handling, making manual picking not only labor-intensive but also inefficient. To address these challenges, farmers and agricultural technicians are actively exploring advanced harvesting methods to enhance efficiency and reduce labor costs. Automated picking techniques and mechanized harvesting equipment show promise in improving efficiency [6]. However, they face significant challenges, such as high equipment costs and operational constraints during harvesting. Improving harvesting efficiency while ensuring fruit quality will be a critical focus for the sustainable development of the blueberry industry. Continued innovation in automated and mechanized picking technologies will be essential to overcoming current limitations and meeting the growing demand for blueberries in global markets.
In recent years, significant advancements have been made in blueberry robotic picking technology. Tian et al. [7] proposed a lightweight blueberry ripeness detection model (MSC-YOLOv8) based on YOLOv8, which significantly enhances detection accuracy and speed by incorporating MobileNetV3 and the Convolutional Block Attention Module (CBAM) along with the SCYLLA-IoU bounding box loss function. This model improves the mAP by 3.9 percentage points and reduces the detection time per image by 3.97 ms compared to the original YOLOv8, providing robust technical support for blueberry picking in complex field environments. Wang et al. [8] improved the YOLOv4-Tiny network for blueberry detection in complex conditions, achieving an average precision of 96.24% for immature, underripe, and ripe blueberries, with a detection time of 5.723 ms, meeting both accuracy and speed requirements. Wang et al. [9] developed the YOLO-BLBE (Blueberry) model, with a compact size of 12.75 MB, integrating the innovative Improved Multi-scale Retinex with Color Recovery (I-MSRCR) method. This approach effectively enhances the recognition accuracy of blueberries at different ripeness stages. Similarly, Liu et al. [10] focused on improving the recognition accuracy of blueberry-picking robots in complex environments. By modifying the YOLOv5x structure and introducing lightweight CBAM and MSSENet modules, they enhanced small-target detection and anti-interference capabilities. The resulting model achieved a mAP of 78.3% with a real-time detection speed of 47 FPS, significantly improving recognition efficiency. Feng et al. [11] proposed an improved YOLOv9c model, which incorporates the SCConv convolution module and the SE attention mechanism, coupled with the MDPIoU loss function, to improve detection performance in complex environments. This model achieved mAP@0.5 and mAP@0.5:0.95 improvements of 0.7% and 0.8%, respectively, while reducing model size by 3.42 MB and the number of parameters by 1.847 M. These enhancements significantly increased the detection efficiency and accuracy of blueberry-picking robots. However, despite these architectural breakthroughs, many of these studies have yet to verify the feasibility of deploying their models on edge devices. While reduced model weights offer potential for edge deployment, practical application challenges remain unresolved. To address this gap, Xiao et al. [12] proposed a lightweight detection method based on an improved YOLOv5 algorithm, capable of effectively identifying blueberry fruits and assessing ripeness in orchard environments. Their model achieved an average recall of 92.0%, a mAP of 91.5% (at a threshold of 0.5), and a real-time detection speed of 67.1 frames per second, making it particularly suitable for migration and deployment on edge devices. Zhao et al. [13] introduced the PAC3 module for blueberry detection using drone-based remote sensing and deep learning, achieving significant performance improvements over existing models, and incorporated the Cluster-NMF algorithm to enhance detection efficiency. To further tackle the challenges of edge deployment, this study proposes a YOLOv8-based algorithm for field blueberry ripeness detection, designed to improve accuracy in real outdoor environments. Compared to other models, the proposed algorithm demonstrates significant advantages in size and computational efficiency, making it highly suitable for edge device deployment.
Section 2 of this paper outlines the dataset and preprocessing methodology and describes the improvements made to the YOLOv8 network framework, Section 3 presents experimental validation and result analysis, Section 4 discusses application prospects and limitations, and Section 5 summarizes the findings and future research directions.

2. Materials and Methods

2.1. Dataset Collection

The dataset utilized in this study consists of two components. The first part includes images collected from a blueberry picking base located in Wanliang Town, Baishan City, Jilin Province (as shown in Figure 1). The cultivar grown there is northern highbush blueberry, which is well suited to the region’s climatic conditions. Image acquisition was conducted on a sunny day at three different times—9:00 a.m., 12:30 p.m., and 3:00 p.m.—using an iPhone 15 Pro (Apple Inc., Cupertino, CA, USA) with telephoto (f/2.8) and wide-angle (f/1.5) lenses. A total of 500 high-resolution images (4032 × 3024 pixels) were captured in JPG format. The second part of the dataset comprises 1483 images collected via web crawling from publicly available sources. These images also predominantly feature northern highbush blueberry, ensuring a consistent variety across the entire dataset, which is important for the classification task. All images were rigorously screened for quality and diversity, maintaining the same resolution (4032 × 3024 pixels) and JPG format as the first part of the dataset. To guarantee the validity and representativeness of the dataset, the collection and processing procedures adhered strictly to relevant ethical standards and data usage regulations. This comprehensive dataset, combining images from the research site and publicly available sources, provides robust and reliable support for subsequent analysis and model training.
This study aims to establish a standardized method for classifying blueberry fruit maturity stages based on observable image features. While the BBCH scale provides a comprehensive framework for plant growth stages, it relies on field-based agronomic measurements (e.g., fruit hardness and soluble solids content), which cannot be obtained through visual observation alone [14,15]. Therefore, inspired by the BBCH scale, we developed a classification scheme based on visible traits, mapping blueberry maturity levels to consistently observable features [16], as shown in Figure 2. Specifically, we defined three main stages:
- Fully mature: The fruit is uniformly deep blue, plump, and round, with soft skin and an expanded calyx opening. This stage corresponds to the optimal harvest time, when the fruit’s sugar content is highest and the fruit is most suitable for consumption or marketing.
- Semi-mature: The fruit shows partial pigmentation (pink to red), slightly softened skin, visible surface wrinkles, and decreased calyx tension. During this stage, sugar content begins to increase, but sourness still dominates the flavor.
- Immature: The fruit is green, with no anthocyanin deposition, smooth and firm skin, and an intact calyx structure. At this stage, the fruit is still developing, with minimal sugar accumulation and a predominantly sour taste.
This classification method aligns with the principles of the BBCH scale, ensuring the objectivity and repeatability of the classification. By visually observing these features, blueberries can be effectively categorized into maturity stages, providing a standardized labeling method for training and evaluating visual recognition models.

2.2. Data Processing

Image preprocessing is essential for enhancing data diversity and improving the effectiveness of model training [17,18]. The collected images were first divided into training, validation, and test sets in a ratio of 7:2:1. To augment the dataset, techniques including flipping, translation, and rotation were applied to the blueberry images (illustrated in Figure 3). Additionally, to maintain annotation consistency, the corresponding annotation files for each image were synchronized during the augmentation process. As a result, the dataset was expanded to include 3048 images, providing a more robust foundation for model training and evaluation.
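The augmentation pipeline itself is not published with the paper; the following minimal Python sketch illustrates one of the listed transforms (horizontal flipping) with the annotation files kept synchronized, assuming YOLO-format labels (class, normalized x-center, y-center, width, height). The file paths and the flip_ filename prefix are illustrative.

```python
from pathlib import Path
from PIL import Image

def hflip_with_labels(img_path: str, label_path: str, out_dir: str) -> None:
    """Horizontally flip one image and mirror its YOLO-format boxes.

    YOLO labels store one `cls x_center y_center w h` row per box with
    coordinates normalized to [0, 1], so a horizontal flip only changes
    x_center to 1 - x_center.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Flip the image and save it under a new (illustrative) name.
    img = Image.open(img_path).transpose(Image.FLIP_LEFT_RIGHT)
    img.save(out / f"flip_{Path(img_path).name}")

    # Mirror each bounding box so annotations stay synchronized.
    rows = []
    for row in Path(label_path).read_text().splitlines():
        cls, xc, yc, w, h = row.split()
        rows.append(f"{cls} {1.0 - float(xc):.6f} {yc} {w} {h}")
    (out / f"flip_{Path(label_path).name}").write_text("\n".join(rows))
```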

2.3. Model Selection and Enhancement

YOLOv8, introduced in 2023 by Ultralytics, is one of the most advanced models in the YOLO series, offering a strong trade-off between detection accuracy and computational efficiency [19]. It is available in multiple variants (YOLOv8n to YOLOv8x) to accommodate diverse application needs. The model incorporates a CSPNet backbone and a Focus module for efficient spatial downsampling [20], while combining PANet and FPN to enable robust multi-scale feature fusion. These architectural choices significantly enhance detection performance across objects of various sizes. Additionally, YOLOv8 employs advanced techniques such as mixed-precision training, label smoothing, and Mosaic data augmentation to improve the accuracy of small object detection [21].
Building on this foundation, we propose an enhanced model, ADE-YOLO, which introduces several architectural improvements to further boost detection accuracy and computational efficiency. Specifically, a dimensionality-reducing convolution module is added at the end of the backbone to reduce the computational load and enhance the quality of feature inputs. The original SPPF module is replaced with an Attention-based Intra-scale Feature Interaction (AIFI) module, which improves multi-scale feature integration. A dynamic sampling strategy is adopted in the neck to replace conventional upsampling, yielding more accurate feature alignment across resolutions. Moreover, the CIOU loss function is substituted with the more efficient EIOU loss, reducing inference complexity while maintaining localization precision. These enhancements allow ADE-YOLO to achieve higher detection accuracy and efficiency in multi-scale and small-object detection tasks, making it particularly suitable for deployment on resource-constrained edge devices. Experimental results demonstrate that the proposed model outperforms the original YOLOv8 model in terms of both mean average precision (mAP) and inference speed. The network structure of ADE-YOLO is shown in Figure 4.

2.3.1. AIFI

In the YOLOv8n model, the traditional Spatial Pyramid Pooling—Fast (SPPF) module enlarges the receptive field through repeated pooling, but this can introduce feature redundancy in the blueberry ripeness detection task, especially for smaller immature blueberries, weakening detailed features and degrading detection. To address this issue, this study adopts the Attention-based Intra-scale Feature Interaction (AIFI) module in place of the SPPF module. The AIFI module focuses on feature interactions within the high-level feature layer (the S5 layer) [22], avoids mixing with low-level features, improves detection efficiency, and reduces missed and false detections.
The AIFI module performs efficient feature interactions through a multi-head attention mechanism, which is the key component for improving model performance. In AIFI, the high-level feature map (S5) is first given a position embedding, providing positional information that enables the self-attention mechanism to capture the spatial dependencies of each part of the feature map. The self-attention mechanism at the core of AIFI models interactions between features through three matrices: the query (Q), the key (K), and the value (V), as given in Equation (1).
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (1)$$
Here, the query (Q), key (K), and value (V) matrices are obtained by linear transformations of the input feature map, and $d_k$ is the dimension of the key vectors. The softmax weighting dynamically adjusts the relationships between features by computing the similarity between parts of the feature map and assigning different weights accordingly. AIFI uses the multi-head attention mechanism given in Equations (2) and (3).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)W^{O} \quad (2)$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \quad (3)$$
The multi-head attention mechanism processes information from different feature subspaces in parallel, allowing the model to attend to different semantic regions and enhancing the expressiveness of the features. This mechanism enables AIFI to capture long-distance dependencies between complex objects in an image, reinforcing important semantic features while suppressing redundant or irrelevant information. After the self-attention stage, the AIFI module further processes the features through a multilayer perceptron (MLP), as given in Equation (4).
$$\mathrm{MLP}(X) = \mathrm{ReLU}(XW_1 + b_1)W_2 + b_2 \quad (4)$$
This step further enhances the expressiveness of the features and helps the model exploit these higher-order features in subsequent detection tasks. Meanwhile, the normalization layer (Norm) keeps the features stable during processing, effectively preventing vanishing or exploding gradients. Ultimately, the feature maps processed by AIFI provide more accurate semantic information for the subsequent modules, significantly improving detection accuracy in complex scenes, especially for images with complex semantic relationships. By introducing AIFI, the model captures the associations between objects in the image more efficiently and gains stronger semantic comprehension, reducing missed-detection and false-detection rates and providing more accurate and reliable feature support for the overall detection task. The AIFI structure is shown in Figure 5.
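The authors do not release an AIFI implementation; the following PyTorch sketch is a minimal rendering of the mechanism described above (position embedding, multi-head self-attention per Equations (1)-(3), and the MLP of Equation (4)) applied to a flattened S5 feature map. The head count, feed-forward width, and post-norm layout are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AIFI(nn.Module):
    """Minimal AIFI-style block: one transformer encoder layer applied to
    the flattened top-level (S5) feature map. Hyperparameters are illustrative."""

    def __init__(self, c: int, num_heads: int = 8, ffn_dim: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(c, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, c))
        self.norm1, self.norm2 = nn.LayerNorm(c), nn.LayerNorm(c)

    @staticmethod
    def pos_embed(h: int, w: int, c: int, temperature: float = 10000.0) -> torch.Tensor:
        """2D sinusoidal position embedding over the h*w token grid (assumes c % 4 == 0)."""
        gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        omega = 1.0 / temperature ** (torch.arange(c // 4) / (c // 4))
        gx = gx.flatten()[:, None] * omega  # (h*w, c/4)
        gy = gy.flatten()[:, None] * omega
        return torch.cat([gx.sin(), gx.cos(), gy.sin(), gy.cos()], dim=1)  # (h*w, c)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).permute(0, 2, 1)  # (B, H*W, C) token sequence
        tokens = tokens + self.pos_embed(h, w, c).to(x.device, x.dtype)
        attn_out, _ = self.attn(tokens, tokens, tokens)  # Equations (1)-(3)
        tokens = self.norm1(tokens + attn_out)
        tokens = self.norm2(tokens + self.mlp(tokens))   # Equation (4)
        return tokens.permute(0, 2, 1).reshape(b, c, h, w)
```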

2.3.2. DySample

In the neck, the DySample module is introduced to improve the upsampling of features at different scales. We combine the backbone features with the output of the DySample dynamic upsampling module, improving the model’s ability to detect multi-scale targets through subsequent concatenation and feature extraction [23]. The DySample module performs dynamic upsampling, handling the input feature maps in a more flexible way: the input feature X first passes through a sampling point generator to produce a set of sampling points, and this set is then sampled over a grid to obtain the upsampled feature map X′, completing the spatial resolution increase. DySample further processes the input features through a combination of dynamic and static range factors, as illustrated in the structure shown in Figure 6.
- The static scope factor performs feature map upsampling by combining a linear transformation and a pixel shuffle operation with a fixed scope factor (0.25).
- The dynamic scope factor introduces a dynamic factor (0.5σ) that further adjusts the upsampled features by dynamically combining them with the input features, so that the upsampling process adapts better to spatial and content changes in the input.
After DySample processing, the upsampled feature maps are concatenated with different feature layers of the backbone network (e.g., P4 and P3). This operation is realized by the Concat module, which allows the network to exploit both the upsampled features and the backbone feature maps at different layers, improving the detection of small-scale targets. The detailed structure of point sampling based on dynamic scaling factors is shown in Figure 7.
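To make the point-sampling idea concrete, the following PyTorch sketch predicts per-pixel offsets with a 1 × 1 convolution, lifts them to the output resolution via pixel shuffle, and reads the input with grid_sample. The 0.25 static scope factor and the 0.5·sigmoid dynamic modulation follow the description above, but the layer shapes and the treatment of offsets in normalized grid coordinates are simplifying assumptions, not the official DySample code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySampleLite(nn.Module):
    """Minimal DySample-style dynamic upsampler (scale factor 2).

    A 1x1 conv predicts per-pixel (dx, dy) offsets for each output sub-pixel;
    pixel shuffle lifts them to the output resolution; grid_sample then reads
    the input feature map at the shifted positions.
    """

    def __init__(self, c: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.offset = nn.Conv2d(c, 2 * scale * scale, 1)
        self.gate = nn.Conv2d(c, 2 * scale * scale, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, _, h, w = x.shape
        # dynamic scope: static 0.25 factor modulated by 0.5 * sigmoid(gate)
        offset = self.offset(x) * (0.5 * torch.sigmoid(self.gate(x))) * 0.25
        offset = F.pixel_shuffle(offset, self.scale)  # (B, 2, sH, sW)
        sh, sw = h * self.scale, w * self.scale
        ys = torch.linspace(-1.0, 1.0, sh, device=x.device)
        xs = torch.linspace(-1.0, 1.0, sw, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = grid + offset.permute(0, 2, 3, 1)      # shift the base sampling grid
        return F.grid_sample(x, grid, mode="bilinear",
                             align_corners=True, padding_mode="border")
```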

2.3.3. EIOU

In the field of target detection, Efficient Intersection over Union (EIOU) is an enhanced loss function designed to improve the localization accuracy between the predicted and target bounding boxes [24]. While the traditional IoU loss function primarily measures the overlap between the predicted and target boxes, it does not fully account for critical factors such as the centroid position and aspect ratio of the target box. This limitation is particularly evident when the overlap between bounding boxes is low, often resulting in slower convergence and reduced localization accuracy. To address these issues, EIOU introduces additional penalty terms to optimize bounding box positioning. The optimization objectives of the EIOU loss function can be summarized into the following three key aspects:
- Faster convergence: By directly optimizing the centroid distance and aspect ratio differences between the predicted and target boxes, EIOU converges faster, especially when box overlap is low. This reduces unnecessary iterations during training, improving efficiency.
- Higher positioning accuracy: EIOU’s focus on optimizing the aspect ratio improves the model’s ability to accurately predict the scale of the target box. This reduces errors in both width and height predictions, making it particularly effective for targets with significant scale variations.
- Better robustness: Even when the predicted box does not perfectly overlap with the target box, EIOU provides effective gradients, preventing the training process from stalling and ensuring stable learning.
The EIOU loss adds two penalty terms to the IoU loss: the first minimizes the Euclidean distance between the centroids of the ground-truth box and the predicted box; the second reduces the differences in width and height between the two boxes. Mathematically, the EIOU loss is defined as given in Equation (5).
$$L_{EIoU} = 1 - EIoU = 1 - IoU + \frac{d^{2}(b^{gt}, b^{p})}{w_c^{2} + h_c^{2}} + \frac{d^{2}(w^{p}, w^{gt})}{w_c^{2}} + \frac{d^{2}(h^{p}, h^{gt})}{h_c^{2}} \quad (5)$$
where $d^{2}(b^{gt}, b^{p})$ denotes the squared Euclidean distance between the centroids of the ground-truth box $b^{gt}$ and the predicted box $b^{p}$; $w_c$ and $h_c$ are the width and height of the smallest box C enclosing both; and $d^{2}(w^{p}, w^{gt})$ and $d^{2}(h^{p}, h^{gt})$ denote the squared differences in width and height between the predicted and ground-truth boxes, as shown in Figure 8. This improvement makes EIOU more accurate and adaptable in target detection tasks, especially for small targets like blueberries.
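Equation (5) maps directly onto code. The following PyTorch function is a straightforward, unofficial implementation of the EIOU loss for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIOU loss per Equation (5) for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # intersection and union for the IoU term
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # smallest enclosing box C (gives w_c, h_c)
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw = c_rb[:, 0] - c_lt[:, 0]
    ch = c_rb[:, 1] - c_lt[:, 1]

    # squared centroid distance over the enclosing box diagonal terms
    pc = (pred[:, :2] + pred[:, 2:]) / 2
    tc = (target[:, :2] + target[:, 2:]) / 2
    center = ((pc - tc) ** 2).sum(dim=1) / (cw ** 2 + ch ** 2 + eps)

    # squared width/height differences
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    w_term = (wp - wt) ** 2 / (cw ** 2 + eps)
    h_term = (hp - ht) ** 2 / (ch ** 2 + eps)

    return (1 - iou + center + w_term + h_term).mean()
```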

2.4. Experimental Environment

The experiments were conducted on Ubuntu 18.04 with PyTorch as the deep learning framework, using Python 3.9.13 and PyTorch 1.13 with CUDA 11.6. The CPU was an Intel(R) Xeon(R) Silver 4214R @ 2.40 GHz, and the GPU was an NVIDIA GeForce RTX 3090 with 24 GB (24,260 MiB) of memory. The detailed hyperparameters of the experiment are shown in Table 1.
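For context, the Table 1 settings correspond roughly to the following call in the Ultralytics Python API; the dataset YAML and model configuration names are placeholders, since the modified ADE-YOLO architecture file is not published:

```python
from ultralytics import YOLO

# Table 1 settings expressed as Ultralytics training arguments.
# "ade-yolo.yaml" and "blueberry.yaml" are hypothetical placeholders for the
# modified architecture and the dataset configuration, respectively.
model = YOLO("ade-yolo.yaml")
model.train(
    data="blueberry.yaml",
    epochs=200,
    batch=16,
    imgsz=640,            # image size
    optimizer="SGD",
    lr0=0.01,             # initial learning rate
    lrf=0.01,             # final learning rate factor
    momentum=0.937,
    weight_decay=5e-4,
    close_mosaic=10,      # disable mosaic for the last ten epochs
    workers=8,
    mosaic=1.0,
)
```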

2.5. Evaluation Criteria

In this study, blueberries were categorized as immature, semi-mature, and mature, and the detection performance of YOLOv8 and the improved model was evaluated using recall, precision, AP, and mAP. Specifically, TP (true positive) refers to correctly detected samples of the target category (e.g., mature blueberries); FP (false positive) denotes samples of non-target categories incorrectly assigned to the target category; and FN (false negative) refers to undetected samples of the target category. Recall (R) is the proportion of correctly identified samples among all actual samples of the target category, while precision (P) is the proportion of correct detections among all detections of a category. AP (average precision) integrates precision over varying recall rates, and mAP (mean average precision) is the average AP across categories, reflecting the model’s overall detection accuracy. These metrics were used to assess the detection performance of YOLOv8 and the improved model on blueberries at the three maturity stages. The formulas are given in Equations (6)–(9):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (6)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (7)$$
$$AP = \int_{0}^{1} P(R)\,dR \quad (8)$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \quad (9)$$
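As a concrete reference for Equations (6)–(9), the following NumPy sketch computes precision, recall, and all-point-interpolated AP from accumulated detection counts and a precision–recall curve; the interpolation convention is a common choice and may differ in detail from the evaluator used in the paper:

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Equations (6) and (7)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Equation (8): area under the precision-recall curve, using the
    all-point interpolation (monotonic precision envelope)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # precision envelope
    idx = np.where(r[1:] != r[:-1])[0]        # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Equation (9): mAP is the mean AP over the N maturity classes, e.g.
# m_ap = np.mean([ap_immature, ap_semi_mature, ap_mature])
```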

3. Experiments and Analysis of Results

3.1. Experimental Comparison Before and After Model Improvement

Figure 9 presents a performance comparison between the ADE-YOLO and YOLOv8n models across four critical metrics: precision, recall, mAP50, and mAP50-95. Overall, ADE-YOLO demonstrates superior detection capabilities, particularly in the later stages of training. The precision curve reveals that while both models exhibit similar performance during the initial training phases, ADE-YOLO gradually surpasses YOLOv8n as training progresses and maintains greater stability after convergence. This indicates that ADE-YOLO is better at minimizing false positives. In terms of recall, both models perform comparably in the early stages of training, but ADE-YOLO begins to show a notable improvement as training advances, ultimately achieving higher recall. This suggests that ADE-YOLO excels at capturing a larger number of targets. For mAP50 and mAP50-95, the performance difference becomes even more evident. Although the initial gap between the models is small, ADE-YOLO consistently outperforms YOLOv8n as training progresses, with a particularly significant improvement in mAP50-95. This metric highlights ADE-YOLO’s superior detection accuracy under stricter Intersection over Union (IoU) thresholds. This advantage can be attributed to ADE-YOLO’s enhancements in feature extraction and multi-scale detection, which bolster its robustness in handling complex target scenarios. In conclusion, ADE-YOLO offers significant advantages across all key metrics, especially in mAP50 and mAP50-95, underscoring its superior performance and adaptability for detection tasks that demand high precision.
Table 2 compares the performance of ADE-YOLO and YOLOv8 across the ripeness detection tasks (mature, semi-mature, and immature), evaluated using four key metrics: precision, recall, mAP50, and mAP50-95. Overall, ADE-YOLO consistently outperforms YOLOv8 in all categories. In mature fruit detection, ADE-YOLO achieves higher precision (98.33%) and recall (97.76%) than YOLOv8 (97.62% and 94.44%, respectively). Moreover, ADE-YOLO excels in mAP50 (99.18%) and mAP50-95 (79.95%), demonstrating enhanced detection precision, particularly in complex scenarios. For semi-mature and immature fruit detection, ADE-YOLO again outperforms YOLOv8 in both precision and recall. The improvement is particularly evident in the more challenging mAP50-95 metric, where ADE-YOLO scores 79.95% and 77.36%, surpassing YOLOv8 by 2–3 percentage points. Across all categories together, ADE-YOLO achieves a precision of 96.49% and recall of 95.38%, significantly outperforming YOLOv8’s 95.25% and 92.58%, respectively. Notably, in the mAP50-95 metric, ADE-YOLO reaches 79.25%, a substantial improvement over YOLOv8’s 76.57%. These results highlight ADE-YOLO’s superior performance and robustness in detecting targets across ripeness levels.

3.2. Ablation Experiments

Table 3 presents the results of ablation experiments on the YOLOv8n model, incorporating the proposed modules for blueberry ripeness detection: AIFI (Attention-based Intra-scale Feature Interaction), DySample (dynamic upsampling module), and EIOU (Efficient Intersection over Union loss). The baseline YOLOv8n model achieves a precision of 95.25%, recall of 92.58%, mAP50 of 96.33%, and mAP50-95 of 76.57%, with a parameter count of 3.01 M and a weight size of 6.3 MB. Introducing the AIFI module leads to an improvement in precision (95.98%), a significant increase in recall (95.20%), and higher mAP50 (97.27%) and mAP50-95 (78.23%). These results indicate that the AIFI module enhances the model’s feature representation, improving its capacity to capture blueberry ripeness features and boosting overall prediction accuracy. The DySample module, designed for dynamic upsampling, improves the model’s adaptability in detecting blueberries of varying sizes. Its recall gain (93.36%) is more modest, but precision remains high at 96.01%, and mAP50 and mAP50-95 reach 97.07% and 77.17%, respectively, demonstrating that DySample enhances adaptability without sacrificing accuracy. The EIOU module, aimed at optimizing localization accuracy, results in a slight drop in precision to 94.97%, while recall improves marginally. Additionally, mAP50 increases to 96.50%, and mAP50-95 improves to 77.20%, showing that EIOU strengthens the model’s robustness, especially in distinguishing blueberries at different ripeness stages in multi-target scenarios. When all three modules are combined, the model achieves its best performance, with precision and recall reaching 96.49% and 95.38%, respectively, and mAP50 and mAP50-95 improving to 97.56% and 79.25%. This significant improvement highlights the synergistic effects of the modules, enhancing the model’s ability to detect blueberries at various ripeness levels. Furthermore, the combined model maintains a lightweight architecture, reducing the parameter count to 2.95 M and the weight size to 6.2 MB, making it well suited for deployment in resource-constrained environments.

3.3. Optimization of Space Pyramid Pools

Table 4 compares the performance of different feature pyramid modules for blueberry ripeness detection: SPPF (Spatial Pyramid Pooling—Fast) [25], SPPF-LSKA (SPPF with Large Separable Kernel Attention), Focal Modulation, SPPELAN (Spatial Pyramid Pooling with Efficient Layer Aggregation Network) [26,27], and AIFI (Attention-based Intra-scale Feature Interaction). Among them, AIFI achieves the highest overall performance, with a precision of 96.49%, recall of 95.38%, mAP50 of 97.56%, and mAP50-95 of 79.25%, indicating its superior capacity for multi-scale feature interaction and semantic enhancement. The SPPF-LSKA module, which builds on the baseline SPPF by incorporating large separable kernel attention, shows a slight improvement in recall, suggesting better sensitivity to detailed features. Focal Modulation improves recall further but with a minor drop in precision, implying a trade-off between recognizing challenging targets and maintaining classification accuracy. In contrast, SPPELAN exhibits the weakest performance across all metrics, likely due to its limited ability to extract effective features in complex field environments. These results confirm that the AIFI module provides the most robust and balanced feature representation among the evaluated designs, significantly contributing to the improved accuracy and reliability of blueberry ripeness detection.
Figure 10 visualizes the detection performance of the baseline YOLOv8 model (with the SPPF module) and the proposed ADE-YOLO model (with the AIFI module) using heat maps [28]. The YOLOv8 model shows sparse and uneven activation, particularly with suppressed responses in darker regions of ripe fruit, indicating limitations in feature extraction and difficulty in focusing on immature targets. In contrast, ADE-YOLO exhibits broader and more uniform activation across blueberries of different ripeness levels, demonstrating enhanced feature representation and spatial awareness. The AIFI module significantly contributes to this improvement by enabling richer multi-scale feature interactions and reducing over-reliance on isolated features. As a result, ADE-YOLO achieves greater robustness in detecting small and overlapping fruits, reinforcing its effectiveness in complex orchard environments.

3.4. Comparison of Loss Functions

In this study, we systematically compare the train/dfl_loss performance of six loss functions—GIOU [29], DIOU [30], ShapeIOU [31], SIOU [32], PIOU [33], and EIOU—in the blueberry ripeness detection task (Figure 11). The train/dfl_loss metric reflects each function’s error optimization capacity during training, especially under conditions involving complex backgrounds and indistinct fruit boundaries. All loss functions exhibit a rapid decline in train/dfl_loss during the initial training phase (0–50 epochs), indicating effective early-stage optimization. As training progresses, the loss curves stabilize and converge. Notably, during the convergence phase (150–200 epochs), EIOU consistently maintains the lowest train/dfl_loss values among all candidates, demonstrating its superior optimization ability. EIOU’s advantage lies in its comprehensive consideration of not only the bounding box overlap but also the distance between centroids and differences in aspect ratios. This leads to more precise parameter updates, improving detection accuracy in challenging scenarios. In contrast, functions like GIOU and DIOU underemphasize geometric constraints, resulting in higher residual errors during training.

3.5. Comparison Test

Table 5 presents a comprehensive comparison of ADE-YOLO with mainstream detection models, focusing not only on accuracy but also on computational efficiency and inference speed, which are critical for real-time deployment on edge devices. ADE-YOLO achieves superior performance, with 96.49% precision, 95.38% recall, 97.56% mAP50, and 79.25% mAP50-95. Despite its lightweight structure of just 2.95 million parameters and a 6.2 MB weight size, it delivers an outstanding 136.99 FPS at a moderate 8.1 GFLOPs, making it highly efficient for real-time processing on resource-constrained devices. In comparison, models like SSD and YOLOv3, while achieving lower accuracy (SSD: 82.45%, YOLOv3: 80.71% precision), impose much higher computational demands (15.1–154.6 GFLOPs) and offer lower FPS (under 80), significantly limiting their real-time inference capability on edge devices. Lightweight models such as YOLOv3-Tiny (8.67 M parameters, 85.11 FPS) and YOLOv5n (1.78 M parameters, 150.6 FPS) improve speed or size but still struggle with mAP50-95 (60.32–70.10%), highlighting limitations in detecting small or overlapping targets, a common challenge in real-world edge applications. While models like YOLOv7-Tiny, YOLOv9t, and YOLOv10n offer a better balance of accuracy and speed, their mAP50-95 (70.19–76.42%) remains lower than that of ADE-YOLO, underscoring its advantage in handling complex detection tasks on edge devices. These results show how ADE-YOLO’s design optimizations—particularly in reducing FLOPs and increasing FPS—enable superior performance in resource-constrained environments, making it well suited to intelligent blueberry harvesting systems operating in real-time edge settings.

3.6. Model Detection Test

Figure 12 showcases the ADE-YOLO model’s detection performance across the three blueberry ripeness stages: mature, semi-mature, and immature. In the upper left and right panels, mature blueberries are accurately identified, with high confidence scores ranging from 0.95 to 0.96, reflecting the model’s strong precision and reliability. In the lower left panel, semi-mature blueberries are detected with moderately lower confidence scores (0.41 to 0.64), likely because their color blends with the background, increasing detection difficulty. Nonetheless, the model successfully localizes all semi-mature fruits, demonstrating robustness under visually complex conditions. The lower right panel presents the detection of immature green blueberries, with confidence scores between 0.91 and 0.97, indicating the model’s excellent capability to distinguish early-stage fruits even under low-contrast conditions. Overall, the visualization results confirm that ADE-YOLO maintains high accuracy and stability across ripeness levels, reinforcing its applicability in real-world blueberry harvesting scenarios.

4. Discussion

Application of Blueberry Ripeness Detection in Robotic Picking: Blueberry ripeness detection plays a crucial role in the development of robotic harvesting systems, facilitating the integration of computer vision with agricultural automation. By accurately detecting ripeness, this technology significantly enhances the efficiency of robotic picking systems, reduces labor costs, and helps address labor shortages. It minimizes errors such as mis-picking or missed picking, ensuring the quality of the harvested fruit and ultimately providing economic benefits to farmers. Moreover, real-time ripeness data can improve decision making regarding harvest timing, resource allocation, and market positioning, giving farmers a competitive advantage. The ADE-YOLO model shows strong potential for robotic harvesting, with its high accuracy and computational efficiency. Based on local testing, the model demonstrated its capability to process images rapidly in real time, making it suitable for deployment in robotic agricultural systems. However, further testing in real-world orchard environments is required to validate its performance under operational conditions.
Limitations and Future Research Directions: Despite its promising performance in controlled environments, ADE-YOLO has not yet been deployed in real-world orchard environments. Testing its performance and stability under practical conditions, particularly on edge devices with limited computational resources, is crucial. While real-time deployment on edge devices presents technical challenges due to computational and energy limitations, the 136.99 FPS achieved in local testing indicates the model’s potential for real-time processing on such platforms. However, performance validation in real-world conditions is required. Recent studies have begun to explore this direction. For instance, Chen et al. [34] designed the ESP-YOLO network for the real-time detection of table grapes on embedded platforms, achieving high detection accuracy with low computational overhead. Similarly, Li et al. [35] proposed a lightweight deep learning model optimized for edge systems, successfully identifying green passion fruits in complex orchard scenes. These studies underscore the growing feasibility of edge deployment for agricultural applications. In comparison, ADE-YOLO offers higher precision (96.49%) and stronger multi-scale detection capabilities, while maintaining a compact size (2.95 M parameters, 6.2 MB), showing great promise for future deployment in edge devices used in agricultural robotics. Future research will focus on optimizing ADE-YOLO for low-power platforms and exploring model compression techniques such as pruning, quantization, and hardware-aware optimization.

5. Conclusions

In this study, we proposed ADE-YOLO, a lightweight modification of YOLOv8n optimized for blueberry ripeness detection. The primary contributions of this research are as follows: (1) Dataset Construction: We developed a comprehensive dataset by collecting 500 blueberry images from a real orchard environment and expanding it with 1483 web-sourced images. These images were meticulously annotated, pre-processed, and augmented using various data enhancement techniques, improving the model’s ability to generalize across complex agricultural scenarios. (2) Model Improvements: Key structural enhancements include the introduction of the AIFI module, which improves feature extraction; the DySample module, which dynamically adjusts sampling density for better detail capture; and the EIOU loss function, which boosts detection accuracy, particularly for small or overlapping targets. (3) Experimental Validation: ADE-YOLO achieved 96.49% precision, 95.38% recall, and 97.56% mAP50. The model also demonstrated a fast processing speed of 136.99 FPS during local testing, making it suitable for real-time applications in robotic harvesting systems. However, real-world testing is required to validate its performance under operational conditions.
In conclusion, ADE-YOLO demonstrates high accuracy and efficiency in blueberry ripeness detection, with a fast processing speed that makes it ideal for real-time deployment in agricultural robotics. Future work will focus on adapting the model for edge deployment, optimizing it for real-world environments, and expanding the training dataset to enhance its robustness.

Author Contributions

Data curation, H.Y., X.B., and Y.D.; Formal analysis, H.Y. and L.Z.; Funding acquisition, H.Y. and Y.D.; Investigation, H.Y. and C.B.; Methodology, H.Y.; Project administration, H.Y.; Resources, H.Y., Z.L., and C.B.; Software, H.Y., Z.W., and L.Z.; Supervision, H.Y.; Validation, H.Y., Z.L., and Z.W.; Visualization, H.Y. and X.L.; Writing—original draft, H.Y.; Writing—review and editing, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Key R&D Project of Jilin Provincial Department of Science and Technology (20210204050YY); Jilin Provincial Department of Education Research Project (JJKH20210747KJ); Jilin Provincial Department of Environmental Protection Project (202107); funding from the Jilin Province Youth Leading Team and Innovative Talent Support Program (No. 2020200301037RQ); and Research Project of the Education Department of Jilin Province: Application Research of Electronic Nose Technology Based on Optimizing the Eigenvalues of the Sensor Array in the Detection of Grain Quality (JJKH20250568KJ).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank the anonymous reviewers for their helpful and constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shi, J.; Xiao, Y.; Jia, C.; Zhang, H.; Gan, Z.; Li, X.; Yang, M.; Yin, Y.; Zhang, G.; Hao, J. Physiological and biochemical changes during fruit maturation and ripening in highbush blueberry (Vaccinium corymbosum L.). Food Chem. 2023, 410, 135299. [Google Scholar] [CrossRef]
  2. Wu, Y.; Han, T.; Yang, H.; Lyu, L.; Li, W.; Wu, W. Known and potential health benefits and mechanisms of blueberry anthocyanins: A review. Food Biosci. 2023, 55, 103050. [Google Scholar] [CrossRef]
  3. Krikorian, R.; Shidler, M.D.; Nash, T.A.; Kalt, W.; Vinqvist-Tymchuk, M.R.; Shukitt-Hale, B.; Joseph, J.A. Blueberry supplementation improves memory in older adults. J. Agric. Food Chem. 2010, 58, 3996–4000. [Google Scholar] [CrossRef]
  4. Huang, W.; Wang, X.; Zhang, J.; Xia, J.; Zhang, X. Improvement of blueberry freshness prediction based on machine learning and multi-source sensing in the cold chain logistics. Food Control 2023, 145, 109496. [Google Scholar] [CrossRef]
  5. Kang, S.; Li, D.; Li, B.; Zhu, J.; Long, S.; Wang, J. Maturity identification and category determination method of broccoli based on semantic segmentation models. Comput. Electron. Agric. 2024, 217, 108633. [Google Scholar] [CrossRef]
  6. Moggia, C.; Lobos, G.A. Why measuring blueberry firmness at harvest is not enough to estimate postharvest softening after long term storage? A review. Postharvest Biol. Technol. 2023, 198, 112230. [Google Scholar] [CrossRef]
  7. Tian, Y.; Qin, S.; Yan, Y.; Wang, J.; Jiang, F. Blueberry Ripeness Detection in Complex Field Environments Based on Improved YOLOv8. Trans. Chin. Soc. Agric. Eng. 2024, 40, 153–162. [Google Scholar]
  8. Wang, L.; Qin, M.; Lei, J.; Wang, X.; Tan, K. Blueberry Ripeness Recognition Method Based on Improved YOLOv4-Tiny. Trans. Chin. Soc. Agric. Eng. 2021, 37, 170–178. [Google Scholar]
  9. Wang, C.; Han, Q.; Li, J.; Li, C.; Zou, X. YOLO-BLBE: A Novel Model for Identifying Blueberry Fruits with Different Maturities Using the I-MSRCR Method. Agronomy 2024, 14, 658. [Google Scholar] [CrossRef]
  10. Liu, Y.; Zheng, H.; Zhang, Y.; Zhang, Q.; Chen, H.; Xu, X.; Wang, G. “Is this blueberry ripe?”: A blueberry ripeness detection algorithm for use on picking robots. Front. Plant Sci. 2023, 14, 1198650. [Google Scholar] [CrossRef]
  11. Feng, W.; Liu, M.; Sun, Y.; Wang, S.; Wang, J. The Use of a Blueberry Ripeness Detection Model in Dense Occlusion Scenarios Based on the Improved YOLOv9. Agronomy 2024, 14, 1860. [Google Scholar] [CrossRef]
  12. Xiao, F.; Wang, H.; Xu, Y.; Shi, Z. A Lightweight Detection Method for Blueberry Fruit Maturity Based on an Improved YOLOv5 Algorithm. Agriculture 2024, 14, 36. [Google Scholar] [CrossRef]
  13. Zhao, Y.; Li, Y.; Xu, X. Object Detection in High-Resolution UAV Aerial Remote Sensing Images of Blueberry Canopy Fruits. Agriculture 2024, 14, 1842. [Google Scholar] [CrossRef]
  14. Wichura, M.A.; Koschnick, F.; Jung, J.; Bauer, S.; Wichura, A. Phenological growth stages of highbush blueberries (Vaccinium spp.): Codification and description according to the BBCH scale. Botany 2024, 102, 428–437. [Google Scholar] [CrossRef]
  15. Giongo, L.; Poncetta, P.; Loretti, P.; Costa, F. Texture profiling of blueberries (Vaccinium spp.) during fruit development, ripening and storage. Postharvest Biol. Technol. 2013, 76, 34–39. [Google Scholar] [CrossRef]
  16. Min, D.; Zhao, J.; Bodner, G.; Ali, M.; Li, F.; Zhang, X.; Rewald, B. Early decay detection in fruit by hyperspectral imaging–Principles and application potential. Food Control 2023, 152, 109830. [Google Scholar] [CrossRef]
  17. Bartunek, J.S.; Nilsson, M.; Sallberg, B.; Claesson, I. Adaptive fingerprint image enhancement with emphasis on preprocessing of data. IEEE Trans. Image Process. 2012, 22, 644–656. [Google Scholar] [CrossRef]
  18. Faris, P.D.; Ghali, W.A.; Brant, R.; Norris, C.M.; Galbraith, P.D.; Knudtson, M.L.; Investigators, A. Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. J. Clin. Epidemiol. 2002, 55, 184–191. [Google Scholar] [CrossRef]
  19. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Fang, J.; Michael, K.; Montes, D.; Nadar, J.; Skalski, P. ultralytics/yolov5: v6.1—TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference; Zenodo, 2022. [Google Scholar]
  20. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  21. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  22. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974. [Google Scholar]
  23. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6027–6037. [Google Scholar]
  24. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  26. Lau, K.W.; Po, L.-M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Syst. Appl. 2024, 236, 121352. [Google Scholar] [CrossRef]
  27. Wang, C.; Yeh, I.; Liao, H. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  29. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  30. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000. [Google Scholar]
  31. Zhang, H.; Zhang, S. Shape-IoU: More accurate metric considering bounding box shape and scale. arXiv 2023, arXiv:2312.17663. [Google Scholar]
  32. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
  33. Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. PIoU loss: Towards accurate oriented object detection in complex environments. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part V, pp. 195–211. [Google Scholar]
  34. Chen, J.; Chen, H.; Xu, F.; Lin, M.; Zhang, D.; Zhang, L. Real-time detection of mature table grapes using ESP-YOLO network on embedded platforms. Biosyst. Eng. 2024, 246, 122–134. [Google Scholar] [CrossRef]
  35. Li, H.; Chen, J.; Gu, Z.; Dong, T.; Chen, J.; Huang, J.; Gai, J.; Gong, H.; Lu, Z.; He, D. Optimizing edge-enabled system for detecting green passion fruits in complex natural orchards using lightweight deep learning model. Comput. Electron. Agric. 2025, 234, 110269. [Google Scholar] [CrossRef]
Figure 1. Geographic location and harvesting environment.
Figure 2. Photographs of three blueberry ripening stages: (a) fully mature, (b) semi-mature, and (c) immature.
Figure 3. Changes in data enhancement: (a,c) original picture; (b,d) data-enhanced picture.
Figure 4. ADE-YOLO network structure.
Figure 5. AIFI structure diagram.
Figure 6. DySample upsampling module.
Figure 7. Point sampling based on dynamic scaling factors.
Figure 8. Truth bounding box and prediction bounding box.
Figure 9. Comparison of metrics before and after model improvement. (a) Precision, (b) recall, (c) mAP50, and (d) mAP50-95.
Figure 10. Visualization results of thermal characteristics before and after the introduction of AIFI. (a) Original photo; (b) YOLOv8; (c) ADE-YOLO.
Figure 11. Comparison of different dfl_loss loss functions in the training set.
Figure 12. Improved ADE-YOLO detection results. (a) Blueberries at different stages of ripeness: (b) fully mature; (c) semi-mature; (d) immature.
Table 1. Detailed hyperparameters of the experiment.

| Parameter | Setup |
|---|---|
| Epochs | 200 |
| Batch size | 16 |
| Optimizer | SGD |
| Initial learning rate | 0.01 |
| Final learning rate | 0.01 |
| Momentum | 0.937 |
| Weight decay | 5 × 10⁻⁴ |
| Close mosaic | Last ten epochs |
| Image size | 640 |
| Workers | 8 |
| Mosaic | 1.0 |
Table 2. Improved blueberry ripeness detection results.

| Level | Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|
| Mature | YOLOv8 | 97.62 | 94.44 | 98.62 | 78.15 |
| Mature | ADE-YOLO | 98.33 | 97.76 | 99.18 | 79.95 |
| Semi-mature | YOLOv8 | 94.51 | 93.81 | 96.30 | 77.01 |
| Semi-mature | ADE-YOLO | 95.72 | 94.42 | 97.37 | 79.95 |
| Immature | YOLOv8 | 93.61 | 89.49 | 94.06 | 74.53 |
| Immature | ADE-YOLO | 95.42 | 93.95 | 96.14 | 77.36 |
| All | YOLOv8 | 95.25 | 92.58 | 96.33 | 76.57 |
| All | ADE-YOLO | 96.49 | 95.38 | 97.56 | 79.25 |
Table 3. Ablation experiments.

| YOLOv8n | AIFI | DySample | EIOU | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Parameters (M) | Weight (MB) |
|---|---|---|---|---|---|---|---|---|---|
| ✓ |  |  |  | 95.25 | 92.58 | 96.33 | 76.57 | 3.01 | 6.3 |
| ✓ | ✓ |  |  | 95.98 | 95.20 | 97.27 | 78.23 | 2.93 | 6.2 |
| ✓ |  | ✓ |  | 96.01 | 93.36 | 97.07 | 77.17 | 3.02 | 6.3 |
| ✓ |  |  | ✓ | 94.97 | 92.69 | 96.50 | 77.20 | 3.01 | 6.3 |
| ✓ | ✓ | ✓ |  | 95.16 | 94.22 | 97.04 | 78.13 | 2.94 | 6.2 |
| ✓ | ✓ |  | ✓ | 94.28 | 94.63 | 97.02 | 77.16 | 2.94 | 6.2 |
| ✓ |  | ✓ | ✓ | 95.67 | 94.25 | 97.34 | 77.28 | 2.80 | 6.3 |
| ✓ | ✓ | ✓ | ✓ | 96.49 | 95.38 | 97.56 | 79.25 | 2.95 | 6.2 |
Table 4. Comparison of the effects of different feature pyramid modules.

| Module | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|
| SPPF | 95.67 | 94.25 | 97.34 | 77.28 |
| SPPF-LSKA | 95.66 | 94.81 | 97.32 | 77.80 |
| Focal Modulation | 94.64 | 95.20 | 96.56 | 77.69 |
| SPPELAN | 94.19 | 89.96 | 94.99 | 76.31 |
| AIFI | 96.49 | 95.38 | 97.56 | 79.25 |
Table 5. Comparison experiments with other models.

| Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Parameters (M) | Weight (MB) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|---|
| SSD | 82.45 | 78.67 | 87.30 | 66.74 | 14.34 | 48.10 | 15.10 | 77.06 |
| YOLOv3 | 80.71 | 77.19 | 85.03 | 70.42 | 61.51 | 123.60 | 154.60 | 45.84 |
| YOLOv3-Tiny | 88.20 | 80.51 | 87.80 | 60.32 | 8.67 | 17.50 | 12.90 | 85.11 |
| YOLOv5n | 94.49 | 89.53 | 94.44 | 70.10 | 1.78 | 3.90 | 4.10 | 150.60 |
| YOLOv5s | 92.00 | 85.72 | 91.07 | 72.62 | 7.03 | 14.50 | 15.80 | 90.48 |
| YOLOv7 | 93.40 | 91.39 | 95.31 | 73.63 | 37.21 | 74.80 | 105.10 | 60.00 |
| YOLOv7-Tiny | 94.43 | 92.30 | 96.22 | 70.19 | 6.02 | 12.30 | 13.20 | 97.16 |
| YOLOv8n | 95.25 | 92.58 | 96.33 | 76.57 | 3.01 | 6.30 | 8.20 | 130.05 |
| YOLOv9t | 92.02 | 85.71 | 91.04 | 72.62 | 2.62 | 5.70 | 11.00 | 107.56 |
| YOLOv10n | 94.11 | 91.08 | 96.24 | 76.42 | 2.70 | 5.70 | 8.40 | 121.43 |
| ADE-YOLO | 96.49 | 95.38 | 97.56 | 79.25 | 2.95 | 6.20 | 8.10 | 136.99 |