Article

GIA-YOLO: A Target Detection Method for Nectarine Picking Robots in Facility Orchards

1 College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2 Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture, Taian 271018, China
3 Institute of Urban Agriculture, Chinese Academy of Agricultural Sciences, Chengdu 610213, China
4 Shandong Academy of Agricultural Machinery Sciences, Jinan 250100, China
5 Huang Huai Hai Laboratory of Modern Agricultural Equipment, Ministry of Agriculture and Rural Affairs, Jinan 250100, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(8), 1934; https://doi.org/10.3390/agronomy15081934
Submission received: 13 July 2025 / Revised: 8 August 2025 / Accepted: 9 August 2025 / Published: 11 August 2025
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

The complex and variable environment of facility orchards poses significant challenges for intelligent robotic operations. To address issues such as occlusion of nectarine fruit by branches and leaves, complex backgrounds, and the demand for high real-time detection performance, this study proposes a nectarine target detection model based on the YOLOv11 architecture, Ghost–iEMA–ADown You Only Look Once (GIA-YOLO). We introduce the GhostModule to reduce the model size and the floating-point operations, adopt the fusion attention mechanism iEMA to enhance the feature extraction capability, and further optimize the network structure through the ADown lightweight downsampling module. The test results show that GIA-YOLO achieves 93.9% precision, 88.9% recall, and 96.2% mAP, which are 2.2, 1.1, and 0.7 percentage points higher than YOLOv11, respectively; the model size is reduced to 5.0 MB and the floating-point operations are reduced to 5.2 G, reductions of 9.1% and 17.5%, respectively, compared to the original model. The model was deployed in a picking robot system and field tested in a nectarine facility orchard. The results show that GIA-YOLO maintains high detection precision and stability at different picking distances, with a combined missed detection rate of 6.65% and a false detection rate of 8.7%, and supports real-time detection at 41.6 FPS. These results provide an important reference for the optimization, design, and application of nectarine detection models in facility agriculture environments.

1. Introduction

With the continuous development of intelligent agricultural technology, automated orchard management has gradually become a key direction for transforming and upgrading the modern fruit tree industry [1]. As a high value-added economic fruit tree, the nectarine has a regular fruit shape and strong marketability under facility cultivation conditions, making it well suited to large-scale production and intelligent orchard management. However, complex illumination, shading by branches and leaves, dense fruit occlusion, and other factors in the orchard environment mean that nectarine target detection often suffers from missed detections, high false detection rates, and low efficiency. These problems make it difficult to meet the dual demands of high precision and real-time performance for intelligent fruit picking, so improving the target detection performance for nectarines is an important part of enhancing automated orchard management [2].
In recent years, with the rapid development of deep learning object detection technology, the YOLO series of algorithms has been widely applied to agricultural fruit detection, offering the significant advantages of high speed and high precision. To address complex backgrounds and large changes in target scale, Jianhua Liu et al. improved a strawberry detection model by adding a hybrid channel-spatial attention mechanism, which effectively enhanced strawberry ripeness detection precision [3]. Zhanglei Yan et al. improved apple target localization and detection in complex environments by combining the multiscale channel attention MSCA and the EnMPDIoU loss function [4]. Qian Wu et al. proposed the YOLO-PGC tomato ripeness detection framework, which achieved 81.6% mAP and 80.4% precision by integrating three modules, PSSD for adaptive weight assignment, GHVC for occlusion handling, and CIF2M for complex background differentiation, using only 3.8 M parameters and 8.7 G FLOPs [5]. In greenhouse tomato detection, Zeyang Bi et al. proposed an improved lightweight EDH-YOLO algorithm, which maintained high precision while reducing computational complexity and model size for tomato detection in facility orchard environments, with precision, recall, and mAP of 95.9%, 93.1%, and 96.8%, respectively; the model size was only 7.3 MB, with a detection speed of 53.2 fps [6]. Bin He et al. proposed an improved method for nighttime greenhouse tomato detection, which significantly improves tomato fruit detection in complex situations such as overlapping fruit and occlusion [7]. Xueyan Zhu et al. proposed an improved lightweight Camellia oleifera fruit target detection model, YOLO-LM, which significantly reduces model complexity while improving detection precision and robustness, with particularly good performance in natural orchard occlusion scenarios [8]. Na Ma et al. proposed the AHG-YOLO pear fruit detection model, which was structurally optimized based on the YOLOv11 framework; by introducing ADown downsampling, a lightweight detection head, and the GIoU loss function, it not only raised the detection precision and model convergence speed, but also enhanced the practicality of the model for embedded device deployment [9]. In peach fruit detection, Jianping Jing et al. proposed the YOLO-PEM model based on the YOLOv8s framework for detecting young peach fruit; through structural lightweighting, multi-scale attention mechanism embedding, and loss function optimization, it is well suited to intelligent fruit thinning in complex orchard environments [10]. Guohai Zhang et al. proposed an open-air nectarine detection method, YOLOv8n-CSD, with a model size of 4.7 MB; the model achieved 95.1% precision, 84.9% recall, and 93.2% mAP, and could recognize nectarines in different environments faster and more accurately [11].
In recent years, improved models based on the YOLO series architecture have shown good performance in various fruit detection tasks; lightweight models in particular have significant advantages for edge device deployment. However, according to the existing literature, there is still relatively little research on nectarine fruit detection in facility orchards, and little evaluation on robot platforms. Therefore, in this paper we design a facility-orchard nectarine detection model that balances detection performance and model size, and validate its practicality in picking scenarios, as shown in Figure 1. The overall method is divided into four parts: image acquisition, model improvement, result analysis, and field validation. First, we construct a high-quality complex nectarine dataset; then we improve the model by introducing the GhostModule to reduce the model size and computational complexity, combine the iEMA fusion attention mechanism to enhance the feature extraction capability, and add the ADown downsampling module to further optimize the network, verifying the evaluation indexes of the model through ablation and comparison tests. Finally, we integrate GIA-YOLO into the picking robot system to examine its field performance. The robot system is equipped with a crawler chassis, a six-axis robotic arm, a depth camera, a Nuvo-8003 industrial computer, various high-precision sensors, and a self-developed nectarine picking end-effector mounted at the end of the robotic arm. Through module fusion and optimization, the model achieves high-precision detection and low resource consumption in facility-orchard picking scenarios involving occlusion, dim lighting, and dense fruit, providing stable visual perception for picking robot systems. This study thus provides a feasible approach for the optimization, design, and application of fruit detection models in facility agriculture and supports the development of sensing systems for intelligent agricultural equipment.

2. Materials and Methods

2.1. Image Acquisition

The nectarine variety used in the experiment is ‘Zhongyou 4’, bred by the Peach Breeding Group of the Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences (CAAS). This variety is characterized by rounded, full, and symmetrical fruit tops. The fruit is initially light yellow and turns bright red upon ripening, with vivid color, high productivity, and strong resistance to storage and transportation. Data collection was carried out in a nectarine facility orchard located in Wangfang Village, Taian, Shandong Province, China (36.20° N, 116.80° E). The facility cultivation mode protects against natural disasters, enables off-season production, increases economic benefits, and keeps the orchard neat and open, making it suitable for deploying intelligent robot systems. Images were collected on 5 May 2025, using a Xiaomi 12S Pro smartphone (Xiaomi Co., Ltd., Beijing, China) equipped with a 50-megapixel rear camera. The image frame ratio was set to 1:1, and the shooting distance ranged from 20 to 50 cm. The nectarine trees were planted with a row spacing of approximately 1.3 m and a plant spacing of about 0.8 m. Because of the complex background in the facility orchard, detection precision is often affected by factors such as light intensity, which makes object identification more challenging. Therefore, images were collected under different lighting conditions, shooting angles, and distances. The dataset included scenarios with varying fruit densities, backlighting, fruit occlusion, and leaf occlusion to enhance data diversity. Samples of the nectarine dataset are shown in Figure 2.
A total of 3234 nectarine images were collected to form the dataset, as summarized in Table 1. Based on the constructed dataset, this study used the LabelImg tool to manually label the nectarine samples [12]. A rectangular box was manually drawn to fit the edges of each fruit, and the label was named “Nectarine”. To ensure the reliability and validity of model training and validation, the dataset was divided into training, validation, and testing sets in a ratio of 7:2:1.
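As a concrete illustration of the 7:2:1 split, the following minimal Python sketch shuffles the annotated images and copies them, together with their labels, into training, validation, and test folders. The directory names, file extensions, and random seed are assumptions for illustration only; the paper does not specify the dataset layout.

```python
import random
import shutil
from pathlib import Path

# Hypothetical paths; the actual dataset layout is not specified in the paper.
IMAGE_DIR = Path("nectarine_dataset/images")
LABEL_DIR = Path("nectarine_dataset/labels")   # LabelImg annotations
OUT_DIR = Path("nectarine_dataset/splits")

def split_dataset(train=0.7, val=0.2, test=0.1, seed=42):
    """Shuffle the image list and copy image/label pairs into train/val/test folders (7:2:1)."""
    images = sorted(IMAGE_DIR.glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    bounds = {"train": (0, int(n * train)),
              "val": (int(n * train), int(n * (train + val))),
              "test": (int(n * (train + val)), n)}
    for subset, (lo, hi) in bounds.items():
        for img in images[lo:hi]:
            label = LABEL_DIR / (img.stem + ".txt")
            for src, kind in ((img, "images"), (label, "labels")):
                dst = OUT_DIR / subset / kind
                dst.mkdir(parents=True, exist_ok=True)
                shutil.copy(src, dst / src.name)

if __name__ == "__main__":
    split_dataset()
```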

2.2. GIA-YOLO Target Detection Network

In this study, a high-precision and lightweight nectarine detection model for facility orchards—GIA-YOLO—was proposed based on the lightweight version of the YOLOv11 series, namely YOLOv11n [13]. The structure of the proposed model is shown in Figure 3. To meet the requirements of edge computing devices for low-power consumption while maintaining high detection precision, three key improvements were made to the YOLOv11n framework in this work.
First, the C3k2 module in the neck was replaced with the GhostModule, which reduces the model size and the floating-point operations of the network while improving processing efficiency without sacrificing precision. Second, an iEMA fused-attention module was embedded at the twentieth layer of the neck. This module combines local perception with dynamic multi-scale modeling and further enhances nectarine feature extraction through a dual-path synergistic structure. In addition, the fourth and sixth convolution layers in the backbone, as well as the convolution module in the neck, were replaced with the ADown module. These modules have multi-scale feature extraction capability, effectively retain key feature details during downsampling, and reduce the number of model parameters, thereby lowering computational and storage requirements. Together, these improvements constitute the GIA-YOLO target detection network, which is optimized in terms of detection precision, computational efficiency, and model size, making it suitable for real-time detection and automated nectarine picking in low-power environments.

2.2.1. Convolutional Neural Network GhostModule

The GhostModule [14] is a module designed based on the GhostNet [15] architecture. It combines the CSP concept from the YOLO family with the efficient feature extraction mechanism of GhostNet, merging the functionality of C3k and the Ghost Bottleneck [16]. Its core idea is to generate additional feature maps from a few intrinsic ones using low-cost linear operations, rather than discarding redundant information. This reduces model parameters and strikes a balance between performance and computational cost [17]. In the captured nectarine images, there are numerous repetitive textures and color features from branches and leaves, both on the fruit and in the background. The GhostModule first generates a small number of intrinsic feature maps using 1 × 1 standard convolutions, and then efficiently produces the remaining feature maps through a set of cheap linear transformations Φ applied per channel. This avoids redundant computation and generates auxiliary representations from various perspectives, enhancing the expression of fine details and boundaries. Finally, the intrinsic and newly generated feature maps are concatenated, which reduces the computational burden of the network. As shown in Equations (1)–(3), compared to the original module, the GhostModule reduces the computational cost to approximately 1/s of that of a standard convolution while maintaining high performance [18]. Its structure is illustrated in Figure 4.
The traditional convolutional arithmetic is as follows:
$k \times k \times n \times h \times w \times c$    (1)
The total computation of the GhostModule consists of two parts, the convolutional parameter and the linear transformation parameter, as follows:
$k \times k \times \dfrac{n}{s} \times h \times w \times c + \dfrac{n(s-1)}{s} \times h \times w \times p \times p$    (2)
Compared to standard convolution, the GhostModule’s operations are reduced as follows:
$\dfrac{k \times k \times \dfrac{n}{s} \times h \times w \times c + \dfrac{n(s-1)}{s} \times h \times w \times p \times p}{k \times k \times n \times h \times w \times c} \approx \dfrac{1}{s} + \dfrac{s-1}{s \times c} \approx \dfrac{1}{s}$    (3)
where c is the number of input feature map channels, h and w are the height and width of the output feature map, n is the number of output feature map channels, s is the number of feature maps generated from each intrinsic feature map by the cheap linear operations, and m is the number of intrinsic feature map channels, so that m = n/s; k × k is the size of the standard convolution kernel, p × p is the kernel size of the linear operation, k × k is close in size to p × p, and s ≪ c.
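The following minimal PyTorch sketch illustrates the Ghost convolution idea described by Equations (1)–(3): a standard convolution produces n/s intrinsic feature maps, and cheap depthwise (linear) operations derive the remaining maps before concatenation. The ratio s = 2, the 3 × 3 depthwise kernel as the cheap operation, and the BatchNorm + SiLU wrapping are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal Ghost convolution sketch: n/s intrinsic maps from a standard
    convolution, the rest from cheap depthwise transforms, then concatenation."""
    def __init__(self, c_in, c_out, k=1, s_ratio=2, cheap_k=3):
        super().__init__()
        c_intrinsic = c_out // s_ratio                      # m = n / s intrinsic channels
        c_cheap = c_out - c_intrinsic                       # n(s-1)/s "ghost" channels
        self.primary = nn.Sequential(                       # k x k standard convolution
            nn.Conv2d(c_in, c_intrinsic, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_intrinsic), nn.SiLU())
        self.cheap = nn.Sequential(                         # p x p depthwise linear transform
            nn.Conv2d(c_intrinsic, c_cheap, cheap_k, padding=cheap_k // 2,
                      groups=c_intrinsic, bias=False),
            nn.BatchNorm2d(c_cheap), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# Quick shape check on a dummy feature map.
if __name__ == "__main__":
    print(GhostConv(64, 128)(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 80, 80])
```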

2.2.2. The iEMA Fusion Attention Mechanism

When nectarine fruit detection is performed in facility orchards, diverse target morphology, strong lighting, and branch and leaf shading often make it difficult for traditional feature extraction methods to capture sufficiently rich and accurate features, thereby reducing detection precision. To address this problem, this paper proposes an efficient feature enhancement module, iEMA [19], which was embedded into the neck module.
The EMA [20] module ensures well-distributed spatial semantic features within each feature group by reshaping part of the channels into the batch dimension and grouping the channels into multiple sub-features. This design avoids potential feature loss caused by traditional channel dimensionality reduction strategies. By combining grouped normalization and multi-scale convolution to construct the spatial attention map, and applying softmax normalization for attention weighting and fusion, the module effectively captures the spatial features of nectarines. It enhances the detection of nectarines in cases of overlapping fruit occlusion or partial leaf coverage. The parallel sub-network design allows the module to integrate structural information at different spatial scales, reducing excessive sequential operations without increasing network depth. This minimizes redundant steps, stabilizes training, and improves overall model performance. The structure is illustrated in Figure 5a. The iRMB [21] residual mobile network module combines the lightweight characteristics of convolutional neural networks with the dynamic processing capabilities of transformer architecture. Its inverted residual design improves information flow and enables the capture of long-range dependencies, all while keeping the model lightweight. This makes iRMB particularly suitable for high-intensity prediction tasks on mobile devices. Its structure is shown in Figure 5b.
The iEMA attention mechanism combines the EMA attention mechanism with the iRMB structure, effectively integrating local spatial details and global semantic context to improve the robustness and expressive ability of the model in the target detection task. First, the iEMA module applies an initial transformation to the input features along the channel dimension to stabilize the feature distribution. The input features are then divided into two groups of sub-features that are processed along different paths, one of which performs global dependency modeling through the EMA attention mechanism. In the EMA path, one-dimensional global pooling encodes the contextual information of the features in the horizontal and vertical directions, and lightweight convolution operations then realize cross-spatial attention mapping, strengthening the response of key regions. The outputs of the two sub-paths are integrated by weighted fusion to form the final output features, combining the channel-expansion convolution, depthwise separable convolution, and channel-compression convolution of the iRMB structure to enhance feature extraction while maintaining computational efficiency. By integrating a lightweight structural design with the attention mechanism, the iEMA module gains stronger semantic expression capability at modest computational cost and better captures the local texture of nectarine fruit. When detecting nectarine fruit partially occluded by branches and leaves, the iEMA module performs global pooling on the input feature map in both horizontal and vertical directions to obtain directional context tensors, enhancing local perception of the fruit contour and allowing the network to infer that an occluded area may still belong to the fruit, thus maintaining the target response. Multi-scale convolution kernels and the lightweight residual structure extract detailed texture features, preserving edge details and deep semantics and improving detection precision and robustness against complex backgrounds. The structure is shown in Figure 5c.
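For reference, the sketch below follows the widely used public implementation of the EMA cross-spatial attention core described above: channels are reshaped into groups, pooled along the height and width directions to form directional context, and a 1 × 1 branch and a 3 × 3 branch are fused through softmax-weighted cross-attention. In GIA-YOLO this core would additionally be wrapped in an iRMB-style inverted residual to form iEMA, which is not reproduced here; the group count (factor = 8) is an assumption.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Sketch of the EMA cross-spatial attention core used inside iEMA.
    Requires the channel count to be divisible by `factor`."""
    def __init__(self, channels, factor=8):            # factor = number of channel groups (assumed)
        super().__init__()
        self.g = factor
        c = channels // factor
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d(1)              # global pooling for the fusion weights
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # 1D pooling along width  -> (h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # 1D pooling along height -> (1, w)
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, 1)
        self.conv3x3 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        gx = x.reshape(b * self.g, -1, h, w)                        # move channel groups into batch dim
        x_h = self.pool_h(gx)                                       # (bg, c/g, h, 1)
        x_w = self.pool_w(gx).permute(0, 1, 3, 2)                   # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))             # shared 1x1 over both directions
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        b1 = self.gn(gx * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())  # directional branch
        b2 = self.conv3x3(gx)                                       # local 3x3 branch
        w1 = self.softmax(self.agp(b1).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        w2 = self.softmax(self.agp(b2).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        attn = (w1 @ b2.reshape(b * self.g, c // self.g, -1) +
                w2 @ b1.reshape(b * self.g, c // self.g, -1)).reshape(b * self.g, 1, h, w)
        return (gx * attn.sigmoid()).reshape(b, c, h, w)

if __name__ == "__main__":
    print(EMA(64)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```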

2.2.3. Downsampling Module ADown

When performing detection in a nectarine facility orchard environment, direct deployment to edge devices is challenging due to the large size and computational cost of the original model. In addition, traditional convolution-based downsampling with large step sizes can result in feature loss, especially when dealing with subtle features in long-range targets.
Therefore, in this study the ADown [22] module was used for downsampling; its structure is shown in Figure 6. By combining average pooling, max pooling, and convolutional feature concatenation, the ADown module introduces a dual-branch collaborative feature compression architecture. This design not only reduces computational cost but also retains important feature details. In this module, the input feature maps are first processed with 2D average pooling to reduce positional sensitivity and mitigate edge effects. The feature map is then divided into two branches, X1 and X2. The X1 branch captures salient regions through max pooling and enhances cross-channel correlations using a 1 × 1 convolution. The X2 branch performs spatial downsampling via a 3 × 3 strided convolution, compressing the spatial dimensions of the feature map. Finally, the outputs of the two branches are concatenated along the channel dimension to form the output tensor. The dual-branch structure adopts a heterogeneous sampling strategy [23], which enables efficient dimension reduction while preserving key discriminative and contextual information, enhancing the model’s robustness and precision on complex, fine-grained image data. The efficiency and multi-scale feature extraction capability of the ADown module improve the model’s adaptability to challenging orchard conditions, including strong illumination, occlusion, and diverse background textures. By reducing the floating-point operations while retaining critical feature information, ADown supports both precision and real-time performance in edge-computing scenarios.
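A hedged PyTorch sketch of the dual-branch downsampling described above is given below, following the ADown design popularized by YOLOv9: average pooling, a channel split, a strided 3 × 3 convolution branch, and a max-pool plus 1 × 1 convolution branch, concatenated along the channel dimension. The even channel split and the BatchNorm + SiLU wrapping are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in, c_out, k, s=1, p=0):
    """Convolution + BatchNorm + SiLU helper (activation choice is an assumption)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, p, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class ADown(nn.Module):
    """Dual-branch downsampling sketch: average pooling, channel split, then a
    strided 3x3 convolution branch (X2) and a max-pool + 1x1 convolution branch (X1),
    concatenated along the channel dimension (spatial size roughly halved)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.cv1 = conv_bn_act(c_in // 2, c_out // 2, k=3, s=2, p=1)  # X2 branch: strided conv
        self.cv2 = conv_bn_act(c_in // 2, c_out // 2, k=1)            # X1 branch: 1x1 after max pool

    def forward(self, x):
        x = F.avg_pool2d(x, 2, stride=1, padding=0)                   # smooth features, reduce edge effects
        x1, x2 = x.chunk(2, dim=1)                                    # split channels into two branches
        y2 = self.cv1(x2)                                             # spatial downsampling via 3x3 stride-2 conv
        y1 = self.cv2(F.max_pool2d(x1, 3, stride=2, padding=1))       # keep salient responses, mix channels
        return torch.cat([y1, y2], dim=1)

if __name__ == "__main__":
    print(ADown(128, 256)(torch.randn(1, 128, 80, 80)).shape)  # torch.Size([1, 256, 40, 40])
```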

2.3. Edge Device Deployment

In order to verify the model’s performance in practical applications, it was deployed on a nectarine picking robot; the hardware configuration is listed in Table 2. This configuration enables the robot to perform nectarine picking through precise control of the end-effector. The industrial controller, running Ubuntu 20.04.6, was configured with an Intel i7-8700 CPU, 16 GB RAM, and a GeForce RTX 2060 GPU. It provided sufficient computational power to support real-time image processing and target detection with the GIA-YOLO model, ensuring efficient operation of the vision system in dynamic environments. An interactive interface was also designed to visualize detection and picking results, allowing orchard managers to monitor system performance and view real-time data conveniently.
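A minimal sketch of a real-time detection loop of the kind run on the industrial controller is shown below, assuming the Ultralytics YOLO inference API and OpenCV. The weight file name, camera index, and confidence threshold are illustrative assumptions; the actual deployment would read the RGB stream of the depth camera through the robot's driver stack rather than a generic webcam.

```python
import time
import cv2
from ultralytics import YOLO

# Hypothetical weight file and camera index.
model = YOLO("gia_yolo_nectarine.pt")
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    results = model(frame, imgsz=640, conf=0.5, verbose=False)   # run detection on one frame
    fps = 1.0 / max(time.time() - t0, 1e-6)
    annotated = results[0].plot()                                # draw boxes for the operator UI
    cv2.putText(annotated, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Nectarine detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```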

2.4. Evaluation Indicators and Parameterization

In this study, model training and testing were conducted under Linux (Ubuntu 22.04) with an Intel Xeon Platinum 8352V processor, an NVIDIA GeForce RTX 4090 GPU, and 24 GB of memory. Experiments used the Python 3.10 programming language and the PyTorch 2.1.0 deep learning framework, with CUDA 12.1 invoked as the GPU acceleration library to improve training efficiency. To accurately evaluate model performance, precision (P) [24], recall (R) [25], mean average precision (mAP) [26], model size, and floating-point operations (FLOPs) [27] were used as the evaluation indexes. The training parameters were set as follows: the model input size is 640 × 640, the number of parallel data-loading workers is 8, stochastic gradient descent is used to optimize the network parameters, the initial learning rate is 0.01, and a total of 200 epochs are trained [28].
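For orientation, a hedged sketch of a training call with the stated hyperparameters (640 × 640 input, SGD, initial learning rate 0.01, 200 epochs, 8 workers) is given below, assuming the Ultralytics training interface; the model YAML and dataset YAML names are hypothetical placeholders, not files released with the paper.

```python
from ultralytics import YOLO

# Hypothetical config names; the modified GIA-YOLO architecture would be defined
# in its own model YAML and the dataset in a standard Ultralytics data YAML.
model = YOLO("gia-yolo.yaml")

model.train(
    data="nectarine.yaml",   # train/val/test paths and the single "Nectarine" class
    imgsz=640,               # model input size 640 x 640
    epochs=200,              # 200 training rounds
    optimizer="SGD",         # stochastic gradient descent
    lr0=0.01,                # initial learning rate
    workers=8,               # parallel data-loading workers
)

metrics = model.val()        # reports precision, recall, and mAP on the validation set
print(metrics.box.map50)     # mAP at IoU 0.5
```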

3. Results and Analysis

3.1. Analysis of Ablation Test Results

In order to systematically verify the practical contribution of each improved module in GIA-YOLO for nectarine target detection in facility orchards, this paper uses YOLOv11n as the benchmark network, adding the GhostModule, the fused-attention module iEMA, and the lightweight downsampling module ADown, in turn, and designs ablation experiments for comparison. The experimental results are shown in Table 3, and the metrics include Precision, Recall, mAP, Model Size, and FLOPs.
The analysis showed that, after introducing the GhostModule on the basis of YOLOv11n, precision increased from 91.7% to 92.3%, recall increased from 87.8% to 88.4%, and mAP increased from 95.5% to 95.7%. This module first captures features such as the fruit shape, surface pattern, and branch and leaf texture of nectarines to generate a small number of intrinsic feature maps, and then generates the remaining feature maps through low-cost linear transformations. The feature maps are then fused, reducing unnecessary convolution operations, effectively cutting network redundancy, improving overall processing efficiency, and significantly reducing the model size and floating-point operations. On this basis, further introducing the integrated attention module iEMA improved performance, with precision and recall rising to 93.6% and 88.5%, respectively, and mAP reaching 96.0%. The iEMA module integrates multi-scale spatial attention, depthwise separable convolution, and a feedforward network, enhancing the robustness and discriminative ability of the model for recognizing incomplete nectarine fruit features under complex orchard conditions such as lighting changes and branch and leaf occlusion. It uses global horizontal and vertical pooling to generate cross-spatial attention maps that highlight the response of the target area, and introduces a lightweight residual structure combined with depthwise separable convolution to improve the efficiency of local and global feature fusion, compensating for the features of occluded nectarine fruit while depicting details and capturing global semantics. Despite a small increase in computing resources, detection precision improved. In addition, the model was further optimized with the lightweight downsampling module ADown. This module adopts a dual-branch structure that combines max pooling and convolutional downsampling, preserving the salient regions and fine-grained structural information of the nectarine fruit, ensuring that key detection features are retained even after spatial compression, avoiding the loss of key features in the downsampling stage, and ensuring stable information flow. In the nectarine detection task, the different modules work together to form a complete recognition chain from lightweight feature extraction, to feature enhancement in complex scenes, to high-fidelity transmission, improving recognition performance in complex facility orchard environments under single-target, multi-target, backlighting, fruit-occlusion, and branch-and-leaf-occlusion conditions. As a result, the final model achieved 93.9% precision, 88.9% recall, and 96.2% mAP, while the model size was reduced to 5.0 MB and the floating-point operations dropped from 6.3 G to 5.2 G, significantly reducing resource consumption. Each module contributed effectively to improving detection precision, enhancing feature representation, and reducing computational cost. Compared to the original YOLOv11n, the combined improvements in GIA-YOLO increased precision by 2.2, recall by 1.1, and mAP by 0.7 percentage points, while reducing the model size by 9.1% and the floating-point operations by 17.5%.
These results demonstrated that the proposed improvement strategy enabled the lightweight and efficient optimization of the nectarine detection task in complex orchard environments, while maintaining detection precision, and showed strong potential for deployment on edge devices.

3.2. Analysis of the GhostModule Test Results

In order to evaluate the impact of different module structures on the performance of nectarine fruit target detection and model resource consumption in facility orchards, this study constructed four different network structures by replacing the C3K2 module in the neck module with PConv, LDConv, SAConv, and the GhostModule used in this work, respectively, while keeping the rest of the network architecture unchanged. The experimental results are presented in Table 4.
The four structures performed differently in terms of detection precision and resource efficiency. The PConv model was lightweight and had a high recall rate. This indicated that it had a strong ability to comprehensively capture the target object, which helped to reduce missed detections in the orchard scenario. However, its relatively low precision led to a higher risk of false alarms and slightly higher resource consumption. The LDConv model achieved the best mAP performance, demonstrating strong recognition capability for fruit targets with large scale or pose variations in orchards. Nevertheless, its recall was slightly lower than that of the GhostModule, and its significantly increased resource consumption limits its applicability on edge devices. The SAConv model outperformed the GhostModule in terms of precision, effectively reducing false detections caused by cluttered backgrounds in orchards. However, its resource consumption was notably high, with a model size of 7.2 MB, making it less suitable for lightweight deployment. As shown in Figure 7, the GhostModule maintained a strong detection performance while achieving the smallest model size and the lowest floating-point operations, demonstrating a clear advantage in efficiency.

3.3. Analysis of the iEMA Attention Mechanism Test Results

Based on the above module improvements, and in order to further enhance the model’s feature extraction capability and target perception performance, four attention mechanisms (SEAM, ACmix, ECA, and iRMB) and the iEMA mechanism proposed in this study were introduced into the network for comparative testing. The results are presented in Table 5.
The experimental results showed that different attention mechanisms had their own characteristics in terms of precision improvement and resource consumption. The SEAM attention mechanism performed optimally in terms of precision rate, effectively focused on key regions, such as fruit, and demonstrated strong discriminative ability in orchard scenes with complex backgrounds. However, its recall was the lowest, leading to target misdetections that affected overall detection integrity. The ACmix attention mechanism achieved the highest recall under varying lighting but had lower precision and mAP, with a high model size. Although ECA achieved the highest mAP and strong precision, its lower recall rate limited its overall robustness compared to iEMA, which offered a better trade-off across all metrics. The iRMB attention mechanism struck a balance between precision and recall rates, but its floating-point operations reached 8.0 G, resulting in significant resource consumption. The iEMA module achieved high precision, recall, and mAP values while maintaining a low model size and floating-point operations, thereby offering an optimal balance between performance and resource efficiency.

3.4. Analysis of the ADown Module Test Results

On the basis of improving the optimization of the convolutional structure and the attention mechanism, this paper further introduces three downsampling mechanisms, HWDown, CGDown, and ADown, for performance comparison and evaluates their effectiveness in the target detection task. The experimental results are shown in Table 6.
The results showed that adding the ADown module gave the best precision and a high mAP. Its two-branch synergistic feature compression architecture achieved low model size and floating-point operations, demonstrating superior resource efficiency. Although adding HWDown produced the highest recall, ADown also achieved relatively high recall, striking a better balance between precision and efficiency. In addition, although adding CGDown resulted in a slightly higher mAP and reasonably good precision, it exhibited a low recall and relatively high resource consumption, which is unfavorable for deployment in resource-constrained environments. As shown in Figure 8, the ADown module exhibited clear advantages in resource control together with slightly improved detection precision, making it suitable for nectarine fruit detection in facility orchards.

3.5. Mainstream Model Comparison Results

In order to comprehensively validate the effectiveness of the proposed GIA-YOLO model in the nectarine fruit detection task, seven mainstream target detection models (Faster RCNN, YOLOv5 [29], YOLOv8 [30], YOLOv9 [31], YOLOv10 [32], YOLOv11, and YOLOv12 [33]) were selected as comparison objects, and their performance was evaluated under the same experimental conditions and dataset. The performance of each model in terms of precision, recall, mAP, model size, FLOPs, and speed is shown in Table 7.
According to the results in Table 7, GIA-YOLO performed well on several key indicators, with a precision of 93.9% and a recall of 88.9%, which were 2.2 and 1.1 percentage points higher than YOLOv11, respectively. This indicated stronger recognition ability and better adaptability to complex orchard environments. The mAP reached 96.2%, 0.7 percentage points higher than YOLOv11, indicating an improvement in detection accuracy. The model size was only 5.0 MB, 0.5 MB smaller than YOLOv11, and the FLOPs were reduced from 6.3 G to 5.2 G, significantly improving model efficiency. The inference speed was 73 fps, 9 fps higher than YOLOv11. Compared to Faster RCNN, GIA-YOLO had higher precision and a much smaller model size. As shown in Figure 9, compared with other mainstream models, GIA-YOLO also exhibits advantages in overall detection performance, achieving higher precision and recall while maintaining strong resource efficiency.

3.6. Visualization Analysis

In this study, four different YOLO target detection models, YOLOv10, YOLOv11, YOLOv12, and GIA-YOLO, were visualized and compared in nectarine orchard environments, covering five typical detection conditions: single-target, multi-target, backlighting, occlusion by fruit, and occlusion by branches and leaves.
The results visualized in Figure 10 showed that GIA-YOLO exhibited better detection precision and robustness across all test scenarios. In single-target detection, the detection precision of GIA-YOLO for individual fruit was significantly improved compared to the other models. Under backlighting conditions, GIA-YOLO was able to stably detect fruit contours and to avoid missed detections or bounding box shifts, whereas YOLOv10 and YOLOv11 showed low recognition precision and noticeable bounding box deviations. In cases of severe fruit or leaf occlusion, YOLOv11 often misjudged overlapping fruit as a single target, which reduced detection effectiveness, while GIA-YOLO achieved better recognition of overlapping fruit. In densely distributed multi-target scenarios, GIA-YOLO demonstrated stronger target separation ability and more accurate bounding box discrimination, which effectively reduced redundant detections and mis-framing, enhancing the overall stability and reliability of the detection.

3.7. Experimental Validation of Nectarine Orchards

For nectarine detection in this study, we collected nectarine data in the facility orchard and then developed the proposed GIA-YOLO detection model based on the YOLOv11 model. After thorough validation and analysis of the different improvement modules, the effectiveness of the proposed GIA-YOLO model in real field environments was further verified by deploying it in a picking robot system. Field tests were conducted in a nectarine facility orchard in Wangfang Village, Feicheng City, Taian City, Shandong Province. The overall workflow is illustrated in Figure 11.
To verify the effectiveness of the proposed GIA-YOLO nectarine detection model in real field conditions, it was deployed on a picking robot system and tested in a facility-based nectarine orchard. During the test, four on-site experiments were conducted under normal weather conditions in the facility orchard, using different picking paths and detection angles; the robot autonomously followed a preset path, while the depth camera mounted at the end of its arm moved horizontally and vertically to perform the real-time detection of nectarine fruit. As shown in Figure 12, the model successfully identified the nectarines in the video stream, demonstrating its detection performance in a dynamic orchard environment.
Two sets of detection tests were carried out based on target distance: one for distant targets and the other for close-range targets. The miss detection rate and false detection rate under each condition were recorded, and the statistical results are presented in Table 8. In long-distance detection, the model maintained good recognition performance; even when fruits were partially obscured by leaves or branches, the miss detection rate was 8.7% and the false detection rate was 10.5%, indicating that the model could still effectively detect nectarines in real time in the orchard. In close-range detection, the GIA-YOLO model performed better, with clearer fruit boundary localization; compared to long-distance detection, the miss detection rate was reduced by 4.1 percentage points and the false detection rate by 3.6 percentage points. In addition, the system operated stably during robot operation, and the model performed low-latency real-time detection of nectarines in the video stream at 41.6 FPS, meeting the real-time requirements of field operations.
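As a small worked example, the snippet below shows how the per-condition rates and the combined figures quoted in the abstract follow from the raw counts, assuming the counts reconstructed in Table 8; the combined values are the means of the rounded per-condition rates.

```python
def rates(total, missed, false_pos):
    """Miss and false detection rates as percentages of the total fruit count."""
    return 100 * missed / total, 100 * false_pos / total

long_miss, long_false = rates(104, 9, 11)    # ~8.7 %, ~10.5 % (long distance, Table 8)
short_miss, short_false = rates(87, 4, 6)    # ~4.6 %, ~6.9 %  (short distance, Table 8)

# Combined figures reported in the abstract: means of the rounded per-condition rates.
print((8.7 + 4.6) / 2, (10.5 + 6.9) / 2)     # 6.65, 8.7
```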

4. Discussion

This study constructs an efficient and lightweight nectarine fruit target detection model suitable for facility orchard environments. In terms of model improvement, the GhostModule, the iEMA attention mechanism, and the ADown module are integrated to reduce redundant computation, to enhance fruit discrimination and contour perception, and to retain spatial information and multi-scale features without sacrificing precision; the model achieves 93.9% detection precision while reducing the model size and the floating-point operations to 5.0 MB and 5.2 G, respectively. In terms of experimental results, the GIA-YOLO model outperforms YOLOv11 and other target detection models on several metrics and shows clear advantages in model volume and computational overhead, which are reduced by 9.1% and 17.5% compared with YOLOv11, respectively. This not only verifies the rationality of the design at a theoretical level but also demonstrates the potential for deployment in picking robot systems. According to the test data, the missed and false detection rates were 8.7% and 10.5% under long-distance conditions and fell to 4.6% and 6.9% under close-range conditions, while the real-time detection frame rate met the picking detection requirements of this study. However, GIA-YOLO still has limitations in the boundary perception of small-scale targets and in feature compensation for occluded regions. In particular, feature extraction precision suffers when fruits overlap or are occluded by branches and leaves, in which case overlapping fruit are often mistaken for a single target, lowering real-time detection precision. Although GIA-YOLO improves nectarine detection precision, with higher mAP and recall than the YOLOv8n-CSD open-air nectarine detection model, better overall discrimination of different targets and backgrounds, and advantages in reducing missed detections, there is still room for improvement in accurate fruit classification and model volume.
Deep learning detection models have wide applications in industries such as agriculture, healthcare, and finance; for example, they are used for target recognition in flower and fruit thinning in orchards [34], to enhance heart function evaluation [35], and to strengthen financial investment decisions [36]. The nectarine picking robot target recognition model established in this paper is specialized for a single nectarine variety, and its generalization across different varieties and planting modes needs further verification. For memory-constrained devices, techniques such as knowledge distillation and model pruning should be introduced to further reduce the model size and improve detection speed while preserving detection performance, and additional evaluation indexes should be adopted to improve the representativeness and quality of the research. Furthermore, an economic assessment of the nectarine losses caused by recognition failures or mis-grasps, together with the long-term operation and maintenance cost of the robotic system, will affect the overall economics of the system. In future research, we should improve the generalization of the model across varieties and growth stages, integrate fruit recognition with maturity assessment, and deeply integrate the model with picking robots and environment sensing systems to realize a closed loop of fruit detection, localization, picking control, and data feedback, together with comprehensive feasibility analyses including economic cost and revenue models. This would provide more reliable technical support for the intelligent management of facility orchards and promote their management toward a high degree of intelligence and systematization.

5. Conclusions

In this study, an improved lightweight target detection model, GIA-YOLO, was proposed based on the YOLOv11 architecture to address the challenges of nectarine fruit detection in facility orchards, which involve complex scenes and high real-time requirements for intelligent operations. A training dataset was built by collecting images of nectarines under various conditions in the orchard environment. To reduce the model size and improve detection precision, the GhostModule was introduced to replace the C3k2 module in the original neck of YOLOv11, thereby reducing both the model size and the floating-point operations. The iEMA fusion attention mechanism was embedded into the neck to enhance the extraction of nectarine features and to improve the model’s ability to distinguish targets against complex backgrounds. Additionally, the ADown lightweight downsampling module was used to further optimize the network structure, achieving both model compactness and efficiency. These improvements effectively enhanced the model’s precision and robustness under conditions such as background clutter, varying illumination, and fruit occlusion. The experimental results showed that GIA-YOLO achieved the best overall performance among several mainstream models, with a precision of 93.9%, a recall of 88.9%, and an mAP of 96.2%. The model size was reduced to 5.0 MB, and the floating-point operations were lowered to 5.2 G. Compared to the original YOLOv11 model, GIA-YOLO improved precision by 2.2, recall by 1.1, and mAP by 0.7 percentage points, while reducing the model size by 9.1% and the floating-point operations by 17.5%. These results demonstrated that the model reduced resource consumption while maintaining high detection performance, showing strong suitability for deployment on edge devices. In field deployment tests, the GIA-YOLO model was successfully integrated into a picking robot system; across the different distance conditions, it achieved a combined miss detection rate of 6.65%, a false detection rate of 8.7%, and a detection speed of 41.6 FPS. The model exhibited stable performance and real-time capability, supporting continuous nectarine detection and meeting the requirements of intelligent operations in facility orchards.

Author Contributions

Conceptualization, L.R., Y.L. and Y.S.; investigation, A.G.; resources, L.R.; data curation, Y.L. and X.H.; writing—original draft preparation, Y.L. and Y.D.; writing—review and editing, L.R., Y.D., W.M., Y.S. and X.H.; visualization, W.M.; supervision, L.R., Y.S. and X.H.; project administration, Y.S. and X.H.; funding acquisition, Y.S. and L.R. All authors have read and agreed to the published version of the manuscript.

Funding

We would like to acknowledge the financial support from the Innovation Team Fund for the Fruit Industry of the Modern Agricultural Technology System in Shandong Province (SDAlT-06-12) and the Key R&D Program of Shandong Province, China (2024TZXD045, 2024TZXD038).

Data Availability Statement

The datasets used in this experiment were independently collected and are part of an ongoing study. If you need to access information on this dataset, please contact the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Izmailov, A.Y. Intelligent Technologies and Robotic Means in Agricultural Production. Her. Russ. Acad. Sci. 2019, 89, 209–210. [Google Scholar] [CrossRef]
  2. Xiao, X.; Wang, Y.; Zhou, B.; Jiang, Y. Flexible Hand Claw Picking Method for Citrus-Picking Robot Based on Target Fruit Recognition. Agriculture 2024, 14, 1227. [Google Scholar] [CrossRef]
  3. Liu, J.; Guo, J.; Zhang, S. YOLOv11-HRS: An Improved Model for Strawberry Ripeness Detection. Agronomy 2025, 15, 1026. [Google Scholar] [CrossRef]
  4. Yan, Z.; Wu, Y.; Zhao, W.; Zhang, S.; Li, X. Research on an Apple Recognition and Yield Estimation Model Based on the Fusion of Improved YOLOv11 and DeepSORT. Agriculture 2025, 15, 765. [Google Scholar] [CrossRef]
  5. Wu, Q.; Huang, H.; Song, D.; Zhou, J. YOLO-PGC: A Tomato Maturity Detection Algorithm Based on Improved YOLOv11. Appl. Sci. 2025, 15, 5000. [Google Scholar] [CrossRef]
  6. Bi, Z.; Yang, L.; Lü, S.; Gong, Y.; Zhang, J.; Zhao, L. Lightweight Greenhouse Tomato Detection Method Based on EDH−YOLO. Trans. Chin. Soc. Agric. Mach. 2024, 55, 246–254. [Google Scholar]
  7. He, B.; Zhang, Y.; Gong, J.; Fu, G.; Zhao, Y.; Wu, R. Fast Recognition of Tomato Fruit in Greenhouse at Night Based on Improved YOLO v5. Trans. Chin. Soc. Agric. Mach. 2022, 53, 201–208. [Google Scholar]
  8. Zhu, X.; Chen, F.; Zheng, Y.; Chen, C.; Peng, X. Detection of Camellia oleifera fruit maturity in orchards based on modified lightweight YOLO. Comput. Electron. Agric. 2024, 226, 109471. [Google Scholar] [CrossRef]
  9. Ma, N.; Sun, Y.; Li, C.; Liu, Z.; Song, H. AHG-YOLO: Multi-category detection for occluded pear fruits in complex orchard scenes. Front. Plant Sci. 2025, 16, 1580325. [Google Scholar] [CrossRef]
  10. Jing, J.; Zhang, S.; Sun, H.; Ren, R.; Cui, T. YOLO-PEM: A Lightweight Detection Method for Young “Okubo” Peaches in Complex Orchard Environments. Agronomy 2024, 14, 1757. [Google Scholar] [CrossRef]
  11. Zhang, G.; Yang, X.; Lv, D.; Zhao, Y.; Liu, P. YOLOv8n-CSD: A Lightweight Detection Method for Nectarines in Complex Environments. Agronomy 2024, 14, 2427. [Google Scholar] [CrossRef]
  12. Xie, Y.; Zhong, X.; Zhan, J.; Wang, C.; Liu, N.; Li, L.; Zhou, G. ECLPOD: An extremely compressed lightweight model for pear object detection in smart agriculture. Agronomy 2023, 13, 1891. [Google Scholar] [CrossRef]
  13. Zhu, F.; Wang, S.; Liu, M.; Wang, W.; Feng, W. A Lightweight Algorithm for Detection and Grading of Olive Ripeness Based on Improved YOLOv11n. Agronomy 2025, 15, 1030. [Google Scholar] [CrossRef]
  14. Xu, W.; Yang, R.; Karthikeyan, R.; Shi, Y.; Su, Q. GBiDC-PEST: A novel lightweight model for real-time multiclass tiny pest detection and mobile platform deployment. J. Integr. Agric. 2024, 24, 2749–2769. [Google Scholar] [CrossRef]
  15. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589. [Google Scholar]
  16. Yang, Y.; Wang, L.; Huang, M.; Zhu, Q.; Wang, R. Polarization imaging based bruise detection of nectarine by using ResNet-18 and ghost bottleneck. Postharvest Biol. Technol. 2022, 189, 111916. [Google Scholar] [CrossRef]
  17. Chen, C.; Guo, Z.; Zeng, H.; Gong, P.; Dong, J. Repghost: A hardware-efficient ghost module via reparameterization. arXiv 2022, arXiv:2211.06088. [Google Scholar]
  18. Wang, T.; Zhang, S. DSC-Ghost-Conv: A compact convolution module for building efficient neural network architectures. Multimed. Tools Appl. 2024, 83, 36767–36795. [Google Scholar] [CrossRef]
  19. Liu, T.; Gu, M.; Sun, S. RIEC-YOLO: An improved road defect detection model based on YOLOv8. Signal, Image Video Process. 2025, 19, 285. [Google Scholar] [CrossRef]
  20. Zhao, F.; Zhang, J.; Liu, Q.; Liang, C.; Zhang, S.; Li, M. Fast quality detection of Astragalus slices using FA-SD-YOLO. Agriculture 2024, 14, 2194. [Google Scholar] [CrossRef]
  21. Xie, X.; Xu, B.; Chen, Z. Real-time fall attitude detection algorithm based on iRMB. Signal Image Video Process. 2025, 19, 156. [Google Scholar]
  22. Wang, Y.; Rong, Q.; Hu, C. Ripe Tomato Detection Algorithm Based on Improved YOLOv9. Plants 2024, 13, 3253. [Google Scholar] [CrossRef]
  23. Mathew, M.P.; Mahesh, T.Y. Leaf-based disease detection in bell pepper plant using YOLO v5. Signal Image Video Process. 2022, 16, 841–847. [Google Scholar] [CrossRef]
  24. Liu, X.; Wang, T.; Yang, J.; Tang, C.; Lv, J. MPQ-YOLO: Ultra low mixed-precision quantization of YOLO for edge devices deployment. Neurocomputing 2023, 574, 127210. [Google Scholar] [CrossRef]
  25. Xu, K.; Xu, Y.; Xing, Y.; Liu, Z. YOLO-F: YOLO for flame detection. Int. J. Pattern Recognit. Artif. Intell. 2023, 37, 2250043. [Google Scholar] [CrossRef]
  26. Kang, L.; Lu, Z.; Meng, L.; Gao, Z. YOLO-FA: Type-1 fuzzy attention based YOLO detector for vehicle detection. Expert Syst. Appl. 2024, 237, 121209. [Google Scholar] [CrossRef]
  27. Zhong, Z.; Yun, L.; Cheng, F.; Chen, Z.; Zhang, C. Light-YOLO: A Lightweight and Efficient YOLO-Based Deep Learning Model for Mango Detection. Agriculture 2024, 14, 140. [Google Scholar] [CrossRef]
  28. Liu, Y.; Han, X.; Ren, L.; Ma, W.; Liu, B.; Sheng, C.; Li, Q. Surface Defect and Malformation Characteristics Detection for Fresh Sweet Cherries Based on YOLOv8-DCPF Method. Agronomy 2025, 15, 1234. [Google Scholar] [CrossRef]
  29. Zhang, X.; Zhang, R.; Wang, X. Visual SLAM Mapping Based on YOLOv5 in Dynamic Scenes. Appl. Sci. 2022, 12, 11548. [Google Scholar] [CrossRef]
  30. Zhang, L.J.; Fang, J.J.; Liu, Y.X.; Le, H.F.; Rao, Z.Q.; Zhao, J.X. CR-YOLOv8: Multiscale Object Detection in Traffic Sign Images. IEEE Access 2023, 12, 219–228. [Google Scholar] [CrossRef]
  31. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. Yolov9: Learning what you want to learn using programmable gradient information. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2024; pp. 1–21. [Google Scholar]
  32. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  33. Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
  34. Gao, A.; Du, Y.; Li, Y.; Ren, L.; Song, Y. Apple flower phenotype detection method based on YOLO-FL and application of intelligent flower thinning robot. Int. J. Agric. Biol. Eng. 2025, 18, 236–246. [Google Scholar]
  35. Balasubramani, M.; Sung, C.W.; Hsieh, M.Y. Automated Left Ventricle Segmentation in Echocardiography Using YOLO: A Deep Learning Approach for Enhanced Cardiac Function Assessment. Electronics 2024, 13, 2587. [Google Scholar] [CrossRef]
  36. Birogul, S.; Temür, G.; Kose, U. YOLO object recognition algorithm and “buy-sell decision” model over 2D candlestick charts. IEEE Access 2020, 8, 91894–91915. [Google Scholar] [CrossRef]
Figure 1. Overall workflow diagram.
Figure 2. Images of selected datasets.
Figure 3. GIA-YOLO detection model structure.
Figure 4. GhostModule structure.
Figure 5. (a) Structure of the EMA Attention Mechanism; (b) Structure of the iRMB Attention Mechanism; (c) Structure of the iEMA Attention Mechanism.
Figure 6. Structure of the ADown Module.
Figure 7. Line Chart of Comparative Analysis.
Figure 8. Comparison of the different downsampling mechanisms.
Figure 9. Columnar analysis of the mainstream models.
Figure 10. Visualization and analysis chart.
Figure 11. Field Test Operation Diagram.
Figure 12. Real-time detection graph.
Table 1. Dataset sample classification information.

| Dataset Sample Types | Number of Samples in the Dataset |
|---|---|
| Single-target | 419 |
| Multi-target | 771 |
| Backlighting | 435 |
| Occlusion by fruit | 754 |
| Occlusion by branches and leaves | 855 |
Table 2. Configuration information of the picking robot.

| Component | Enterprise/Model |
|---|---|
| Depth camera | Obi Zhongguang |
| Robotic arm | Nova Series Robotic Arm |
| Industrial controller | Nuvo-8003 |
| CPU | Intel i7-8700 |
| GPU | GeForce RTX 2060 |
Table 3. Analysis of ablation test data.

| GhostModule | iEMA | ADown | Precision (%) | Recall (%) | mAP (%) | Model Size (MB) | FLOPs (G) |
|---|---|---|---|---|---|---|---|
| × | × | × | 91.7 | 87.8 | 95.5 | 5.5 | 6.3 |
| ✓ | × | × | 92.3 | 88.4 | 95.7 | 4.9 | 5.5 |
| ✓ | ✓ | × | 93.6 | 88.5 | 96.0 | 5.5 | 6.1 |
| ✓ | ✓ | ✓ | 93.9 | 88.9 | 96.2 | 5.0 | 5.2 |
Table 4. Comparative analysis of the different convolution modules.

| Module | Precision (%) | Recall (%) | mAP (%) | Model Size (MB) | FLOPs (G) |
|---|---|---|---|---|---|
| Additive Free | 91.7 | 87.8 | 95.5 | 5.5 | 6.3 |
| PConv | 89.9 | 89.1 | 95.3 | 5.2 | 5.9 |
| LDConv | 91.4 | 88.1 | 95.9 | 6.3 | 6.8 |
| SAConv | 92.8 | 87.3 | 95.4 | 7.2 | 8.2 |
| GhostModule | 92.3 | 88.4 | 95.7 | 4.9 | 5.5 |
Table 5. Comparative analysis of the different attention mechanisms.

| Module | Precision (%) | Recall (%) | mAP (%) | Model Size (MB) | FLOPs (G) |
|---|---|---|---|---|---|
| Additive Free | 92.3 | 88.4 | 95.7 | 4.9 | 5.5 |
| SEAM | 93.7 | 86.6 | 95.9 | 5.6 | 6.0 |
| ACmix | 91.6 | 89.1 | 94.9 | 5.6 | 6.2 |
| ECA | 93.3 | 86.9 | 96.2 | 5.5 | 6.0 |
| iRMB | 92.1 | 88.4 | 95.3 | 5.6 | 8.0 |
| iEMA | 93.6 | 88.5 | 96.0 | 5.5 | 6.1 |
Table 6. Comparative analysis of the different downsampling mechanisms.

| Module | Precision (%) | Recall (%) | mAP (%) | Model Size (MB) | FLOPs (G) |
|---|---|---|---|---|---|
| Additive Free | 93.6 | 88.5 | 96.0 | 5.5 | 6.1 |
| HWDown | 92.8 | 90.2 | 95.4 | 5.2 | 5.4 |
| CGDown | 93.6 | 87.8 | 96.5 | 5.8 | 6.4 |
| ADown | 93.9 | 88.9 | 96.2 | 5.0 | 5.2 |
Table 7. Comparative analysis of mainstream models.

| Model | Precision (%) | Recall (%) | mAP (%) | Model Size (MB) | FLOPs (G) | Speed (fps) |
|---|---|---|---|---|---|---|
| Faster RCNN | 92.3 | 87.5 | 96.0 | 112.4 | 236.7 | 19 |
| YOLOv5 | 90.8 | 87.0 | 95.1 | 3.9 | 4.1 | 96 |
| YOLOv8 | 89.3 | 86.8 | 94.9 | 5.6 | 6.8 | 85 |
| YOLOv9 | 90.7 | 88.3 | 95.5 | 6.4 | 9.4 | 47 |
| YOLOv10 | 91.3 | 87.4 | 94.2 | 5.8 | 6.8 | 58 |
| YOLOv11 | 91.7 | 87.8 | 95.5 | 5.5 | 6.3 | 64 |
| YOLOv12 | 91.3 | 87.3 | 95.4 | 5.4 | 6.3 | 65 |
| GIA-YOLO | 93.9 | 88.9 | 96.2 | 5.0 | 5.2 | 73 |
Table 8. Analysis of the test results.

| Category | Total Number | Detected (Count) | Missed (Count) | False Positives (Count) | Miss Detection Rate (%) | False Detection Rate (%) |
|---|---|---|---|---|---|---|
| Long Distance | 104 | 84 | 9 | 11 | 8.7 | 10.5 |
| Short Distance | 87 | 77 | 4 | 6 | 4.6 | 6.9 |

