Article

Lightweight Improvements to the Pomelo Image Segmentation Method for Yolov8n-seg

College of Artificial Intelligence and Low-Altitude Technology, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2026, 16(2), 186; https://doi.org/10.3390/agriculture16020186
Submission received: 15 December 2025 / Revised: 7 January 2026 / Accepted: 8 January 2026 / Published: 12 January 2026
(This article belongs to the Special Issue Advances in Precision Agriculture in Orchard)

Abstract

Instance segmentation in agricultural robotics requires a balance between real-time performance and accuracy. This study proposes a lightweight pomelo image segmentation method based on the YOLOv8n-seg model integrated with the RepGhost module. A pomelo dataset consisting of 5076 samples was constructed through systematic image acquisition, annotation, and data augmentation. The RepGhost architecture was incorporated into the C2f module of the YOLOv8n-seg backbone network to enhance feature reuse while reducing computational complexity. Experimental results demonstrate that the YOLOv8n-seg-RepGhost model enhances efficiency without compromising accuracy: the parameter count is reduced by 16.5% (from 3.41 M to 2.84 M), the computational load decreases by 14.8% (from 12.8 GFLOPs to 10.9 GFLOPs), and inference time is shortened by 6.3% (to 15 ms). The model maintains excellent detection performance with a bounding box mAP50 of 97.75% and a mask mAP50 of 97.51%. The research achieves both high segmentation efficiency and detection accuracy, offering core support for developing visual systems in harvesting robots and providing an effective solution for deep learning-based fruit target recognition and automated harvesting applications.

1. Introduction

Pomelo cultivation plays a vital role in agricultural economies across tropical and subtropical regions worldwide [1]. With over 3000 years of cultivation history in China, pomelos represent an important economic crop for farmers in southern provinces. Currently, pomelo harvesting remains heavily reliant on manual labor, which is labor-intensive and time-consuming [2,3,4]. Enabling autonomous pomelo harvesting in natural orchard environments through robotic systems therefore requires a lightweight segmentation model as a critical step. Such lightweight improvement requires not only reducing model size and accelerating inference speed but also optimizing the model architecture to balance the use of computational resources, thereby ensuring high recognition accuracy and robustness in complex field scenarios such as mountainous orchards.
Automated fruit recognition and segmentation are critical technologies for intelligent harvesting robots. The evolution of fruit recognition techniques can be categorized into three main phases: traditional methods based on handcrafted features, machine learning approaches, and deep learning-based techniques. Traditional methods relying on color [5], shape [6], texture [7], or multi-feature fusion [8,9] often fail to generalize in complex natural environments. Machine learning approaches, such as support vector machines [10,11] and K-means clustering [12,13], have shown improved performance but still face limitations in handling variations in lighting and occlusion. Deep learning has revolutionized agricultural computer vision by enabling end-to-end feature learning from raw images. Recent advances include enhanced YOLOv8 architectures for strawberry detection [14], grape cluster recognition [15], lychee fruit detection [16], camellia fruit identification [17], and strawberry segmentation [18]. These studies demonstrate the superior performance of deep learning in agricultural applications.
In recent years, the architectural evolution and lightweight deployment of deep learning models have become significant advancements in the field of computer vision. The core principle of model lightweighting involves reducing parameter counts and computational complexity to accommodate constrained computational power of hardware. Typical improvement strategies include employing lightweight backbone networks, designing efficient convolutional operations, adjusting the neck and detection head structures, and introducing attention mechanisms [19,20,21,22,23]. Yao et al. [24] adopted the ADown downsampling module to reduce computational load and parameter count, resulting in approximately 10% fewer parameters in the improved model. The mAP50 and F1 scores increased by 1.9% and 2.0%, respectively. The proposed model outperforms common YOLO series models in detection performance while maintaining relatively low computational and memory requirements. Liu et al. [25] optimized the YOLOv8n neck network by introducing the VoVGSCSP module and constructing the C2f_DShC2D module, reducing the model parameters and computational load. For rice grain recognition tasks, it achieved a 92.7% mAP with 2.62 million parameters and a computational complexity of 7.0 GFLOPs, representing a 1.1% improvement in accuracy, with reductions of 12.96% and 13.58% in parameters and computational load, respectively. Shen et al. [26] proposed an enhanced algorithm, ESLC-YOLOv8, for real-time pineapple detection in complex environments. By incorporating modules such as EIEStem, v7DS downsampling, and lightweight detection heads, the improved model achieved a 94.5% average precision while reducing parameters by 8.87 × 10^5 and floating-point operations by 1.6 GFLOPs, effectively balancing accuracy and efficiency. Liu et al. [27] replaced the C2F feature extraction module in YOLOv8pico with the PDWFasterNet module and employed Depthwise Separable Convolution (DWSConv) for downsampling. This reduced the model’s parameters and FLOPs to 0.66 M and 2.29 G, respectively, providing a viable solution for real-time edge detection in apple-picking robots.
Despite demonstrating exceptional performance in object segmentation, deep learning models typically have substantial parameter counts and computational complexity, making them challenging to deploy on edge devices. This contradiction is particularly pronounced in the task of pomelo image segmentation: On one hand, publicly available research and specialized models in this domain remain relatively scarce, often necessitating reliance on or adaptation of general-purpose models; on the other hand, excessive parameter reduction in pursuit of lightweight solutions compromises the model’s expressive capability, thereby degrading segmentation accuracy. Therefore, developing a model for pomelo segmentation that can achieve effective lightweighting while maintaining high precision is of great significance.
To address this gap, this study integrates the RepGhost module into YOLOv8n-seg for pomelo instance segmentation. Experimental results show that the improved model maintains precision, recall, and mean average precision while substantially reducing parameters and computation, providing effective technical support for the visual recognition and real-time operation of a pomelo harvesting robot [28]. The contributions of this research include:
  • A pomelo dataset with 5076 samples capturing diverse environmental conditions.
  • Integration of RepGhost module into YOLOv8n-seg for enhanced feature reuse and reduced complexity.
  • Systematic evaluation demonstrating improved efficiency without performance degradation.
  • Validation of real-world applicability through deployment on embedded devices.

2. Materials and Methods

2.1. Dataset Acquisition and Preparation

Given that the maturity period of pomelos falls in autumn and winter, the data collection was conducted from September to December 2023 in the Liyuzaixiang Pomelo Orchard, Aotou Town, Conghua District, Guangzhou. The fruit variety under study was Shatian pomelo. Pomelos on 30 trees were photographed from various angles, with the camera settings in automatic mode. The data acquisition system employed an Intel RealSense D455 depth camera (Intel Corporation, Santa Clara, CA, USA). The RGB sensor of this camera features a resolution of 1280 × 800 pixels and a wide field of view of 90° × 65°. This configuration ensures the precise capture of the detailed texture of the fruits within an operating range of 0.3–3 m.
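For reference, the following is a minimal sketch of grabbing RGB frames from a RealSense D455 with the official pyrealsense2 bindings. The 1280 × 800 color stream matches the sensor configuration described above, while the frame count and file names are illustrative.

```python
# Minimal capture sketch for the RealSense D455 RGB stream (pyrealsense2).
# The 1280x800 resolution follows the setup in the text; other values are
# illustrative placeholders.
import pyrealsense2 as rs
import numpy as np
import cv2

pipeline = rs.pipeline()
config = rs.config()
# 1280x800 BGR color stream at 30 fps, supported by the D455 RGB sensor.
config.enable_stream(rs.stream.color, 1280, 800, rs.format.bgr8, 30)
pipeline.start(config)

try:
    for i in range(10):  # grab a handful of frames
        frames = pipeline.wait_for_frames()
        color = frames.get_color_frame()
        if not color:
            continue
        image = np.asanyarray(color.get_data())
        cv2.imwrite(f"pomelo_{i:04d}.jpg", image)
finally:
    pipeline.stop()
```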
A total of 700 JPG-format images (96 dpi) were collected. This included 500 close-range (30–50 cm) single-fruit images and 200 medium-to-long-range (50–200 cm) multi-fruit images, representing 1169 pomelo samples. For close-range scenarios, single pomelo images were captured under four conditions: fully exposed fruit (300 images), foliage-obscured fruit (100 images), overlapping fruit (100 images), and fruit with light spots or shadows (100 images) (Figure 1). These close-range scenarios included images with normal exposure, underexposure, and overexposure, as illustrated in Figure 2.
The Segment Anything (SAM) model was used for initial automatic annotation, with labels pre-generated as “pomelo”. As shown in Figure 3, automatic annotation may result in missed or incorrect annotations. As depicted in Figure 4, the labels were then manually fine-tuned using Labelme according to the following annotation criteria [29]: firstly, samples with occlusion exceeding 80% or edge exposure area less than 20% were not annotated due to incomplete feature information; secondly, based on actual measurement data, fruits with a diameter less than 20 mm were classified as small-sized samples. These samples were excluded from the annotation scope, as annotation accuracy cannot be guaranteed due to resolution constraints.
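The size-based exclusion rule can be approximated in code. The sketch below filters Labelme-style polygon annotations by pixel area as a stand-in for the 20 mm diameter criterion, which cannot be checked from pixels alone; the MIN_AREA_PX threshold is hypothetical and not taken from the paper.

```python
# Illustrative filter over Labelme JSON annotations. The 20 mm physical
# diameter rule cannot be verified from pixels alone, so MIN_AREA_PX is a
# hypothetical pixel-area proxy chosen for this sketch.
import json

MIN_AREA_PX = 400.0  # hypothetical threshold, not from the paper

def polygon_area(points):
    """Shoelace formula for a polygon given as [[x, y], ...]."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def filter_labelme(path_in, path_out):
    with open(path_in) as f:
        ann = json.load(f)
    ann["shapes"] = [
        s for s in ann["shapes"]
        if s["label"] == "pomelo" and polygon_area(s["points"]) >= MIN_AREA_PX
    ]
    with open(path_out, "w") as f:
        json.dump(ann, f)
```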
To ensure the fairness of model evaluation and prevent data leakage, we adopted the following pipeline to construct the final dataset. First, the 700 collected raw images (covering 8 different scenarios) were divided by scenario category into training, validation, and test sets, with each category split in an approximate ratio of 0.75:0.15:0.1. Subsequently, the data augmentation strategy was applied only to the original images in the training set, expanding the training samples and enhancing model robustness. The augmentation operations, an example of which is shown in Figure 5, were as follows:
Random rotation: $\theta \sim U(-\pi/5,\ \pi/5)$ to simulate natural fruit orientations.
Color space transformations: the saturation channel adjusted as $S' = S \times \gamma_s$, $\gamma_s \sim U(0.7,\ 1.0)$, and the luminance channel adjusted as $V' = V \times \gamma_v$, $\gamma_v \sim U(0.3,\ 1.3)$.
Random occlusion: Algorithmically generated to simulate foliage obstruction in natural orchard environments.
The validation and test sets, on the other hand, remained in their original state without any augmentation transformations, serving as an objective evaluation of the model’s generalization performance. This splitting strategy ensured the independence between the test environment and the training environment.
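A minimal sketch of these training-only augmentations, assuming OpenCV and BGR input images, is given below. The rotation and HSV ranges follow the text; the occlusion-patch size is illustrative, and in practice the segmentation polygons must be transformed consistently with the image.

```python
# Sketch of the training-only augmentations described above: rotation in
# [-pi/5, pi/5], saturation/brightness scaling, and a random gray occlusion
# patch. Ranges follow the text; the patch size is illustrative.
import numpy as np
import cv2

rng = np.random.default_rng(0)

def augment(img):
    h, w = img.shape[:2]
    # Random rotation: theta ~ U(-pi/5, pi/5); pi/5 rad = 36 degrees.
    theta = rng.uniform(-36.0, 36.0)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), theta, 1.0)
    img = cv2.warpAffine(img, M, (w, h))
    # HSV scaling: S' = S * U(0.7, 1.0), V' = V * U(0.3, 1.3).
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= rng.uniform(0.7, 1.0)
    hsv[..., 2] *= rng.uniform(0.3, 1.3)
    img = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Random occlusion: one gray square to mimic foliage.
    size = int(rng.uniform(0.05, 0.15) * min(h, w))
    x, y = rng.integers(0, w - size), rng.integers(0, h - size)
    img[y:y + size, x:x + size] = 128
    return img
```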

2.2. Improved Method for Pomelo Image Segmentation

2.2.1. Network Architecture and Lightweight Optimization

In this study, the RepGhost structure was incorporated into the backbone feature extraction network of YOLOv8n-seg, with the aim of enhancing feature representation and inference efficiency while maintaining accuracy, thereby achieving model lightweighting and further improving segmentation performance. The structure of the improved model is illustrated in Figure 6.

2.2.2. YOLOv8-seg Segmentation Network

Pursuing a lightweight design, this study adopted YOLOv8n, the variant with the smallest parameter count, as the foundational framework. In the context of pomelo harvesting, object detection alone, which yields only a bounding box for the fruit, is inadequate: the task requires locating the fruit stalk to compute the harvest point, a calculation that relies on the precise spatial relationship between the fruit and stem regions. Consequently, the YOLOv8n detection model by itself cannot provide the requisite stem location data. To address this, the YOLOv8n-seg algorithm was utilized for instance segmentation. YOLOv8n-seg [30,31] is an efficient instance segmentation model that combines object detection with pixel-level segmentation. Its network architecture comprises an input layer, a backbone network, a neck network, and a segmentation head, enabling unified object localization and mask prediction.
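As a point of reference, the Ultralytics API exposes YOLOv8n-seg directly; the following minimal sketch (with a placeholder image path) shows how the model returns both bounding boxes and instance masks.

```python
# Minimal sketch of running YOLOv8n-seg with the Ultralytics API to obtain
# both boxes and instance masks; the image path is a placeholder.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")           # pretrained nano segmentation model
results = model.predict("pomelo.jpg", conf=0.25)

r = results[0]
print(r.boxes.xyxy)                      # (N, 4) bounding boxes
if r.masks is not None:
    print(r.masks.data.shape)            # (N, H, W) binary instance masks
```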

2.2.3. RepGhost Lightweight Module

RepGhost [32] is a novel, lightweight convolutional neural network (CNN) module design methodology. Its core concept is to utilize a multi-branch structure during the training phase to enrich the feature extraction process. During inference, it combines the parameters and merges them into a more compact topological form. This enables a significant reduction in the number of parameters and an improvement in computational efficiency, while maintaining or even enhancing the model’s feature reusability. This approach not only compresses the model size but also improves inference speed, making it suitable for embedded vision tasks requiring high real-time performance.
Unlike traditional lightweighting techniques that rely mainly on simplifying the network architecture to compress parameters, RepGhost employs a two-stage optimization strategy: during training, a multi-branch network enhances feature representation through diversified feature transformation paths; during inference, the branches are merged into a single computational path via re-parameterization, significantly improving inference efficiency at only a marginal cost in accuracy. The RepGhost module combines a dual strategy of "feature reuse" and "weight-space fusion". Unlike the traditional Ghost method, which maintains high channel counts through concatenation operations, RepGhost integrates features from different layers by addition during training and then performs the fusion in the weight space for inference, achieving more hardware-efficient feature reuse.
RepGhost combines multiple Ghost modules into a single RepGhost module. During training, it employs the Add operation instead of Concat for feature fusion. At inference time, the fused features are mapped to the weight space for computation, thereby avoiding the memory overhead and computational complexity associated with the Concat operation. This design enables the RepGhost module to utilize feature information more efficiently while reducing the number of network parameters and computational load, thereby enhancing the model’s operational speed and efficiency. The RepGhost module architecture is illustrated in Figure 7. Specifically, the original input features first undergo 1 × 1 convolutions to compress the channel dimension. Subsequently, multiple sets of intermediate features are generated through multiple groups of structurally identical yet weight-independent 3 × 3 separable convolutions. Features from each convolutional group are progressively fused with preceding features through element-wise addition. This progressive feature enhancement can be mathematically expressed as shown in Equation (1):
$F_{out} = \sum_{i=1}^{n} DW_i(F_{in}) + F_{in}$  (1)
Here, $DW_i$ denotes the depthwise separable convolution operation of the i-th group, and $n$ is the reuse order. This design permits features generated once to be reused multiple times in subsequent convolution stages, forming an implicit feature pyramid across layers.
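The following schematic PyTorch sketch illustrates the add-based progressive reuse of Equation (1) together with its re-parameterized inference form. It follows the description above rather than the official RepGhost implementation; the channel width and reuse order are illustrative, and batch normalization is omitted for brevity.

```python
# Schematic sketch of Eq. (1): n weight-independent 3x3 depthwise convs
# fused by element-wise addition, plus an identity shortcut. fuse() shows
# the re-parameterization into a single depthwise conv for inference.
import torch
import torch.nn as nn

class ProgressiveReuse(nn.Module):
    def __init__(self, channels: int, n: int = 2):
        super().__init__()
        # n structurally identical, weight-independent 3x3 depthwise convs.
        self.dws = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels, bias=False)
            for _ in range(n)
        ])

    def forward(self, x):
        # F_out = sum_i DW_i(F_in) + F_in: branches fused by Add, not Concat.
        out = x
        for dw in self.dws:
            out = out + dw(x)
        return out

    @torch.no_grad()
    def fuse(self) -> nn.Conv2d:
        # Re-parameterization: fold the n parallel branches and the identity
        # shortcut into one 3x3 depthwise conv for inference.
        c = self.dws[0].weight.shape[0]
        fused = nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False)
        w = torch.stack([dw.weight for dw in self.dws]).sum(dim=0)
        w[:, 0, 1, 1] += 1.0  # identity shortcut as a centered delta kernel
        fused.weight.copy_(w)
        return fused

m = ProgressiveReuse(8, n=3).eval()
x = torch.randn(1, 8, 16, 16)
print(torch.allclose(m(x), m.fuse()(x), atol=1e-5))  # True
```

Because convolution is linear, the sum of the branch outputs plus the identity shortcut equals a single depthwise convolution with the summed kernels, which is exactly what fuse() constructs; this is why the merged inference path is numerically equivalent to the training-time multi-branch form.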

3. Results

3.1. Experimental Environment and Evaluation Metrics

3.1.1. Experimental Environment Configuration and Parameter Settings

The experiments were conducted on a heterogeneous computing platform running Ubuntu 20.04.1 LTS, equipped with an Intel i7-10700F processor (8 cores, 16 threads; base frequency 2.9 GHz, Turbo Boost up to 4.8 GHz), an NVIDIA GeForce RTX 2060 graphics card (6 GB GDDR6 VRAM, 1920 CUDA cores), and 32 GB of DDR4 dual-channel memory (3200 MHz). The software stack used Python 3.8.10 and OpenCV 4.11.0 for image processing, with PyTorch 1.11.0 as the deep learning framework, accelerated by the CUDA 11.3 and cuDNN 8.2 libraries and supporting FP16 mixed-precision training. The network parameter settings used during model training are detailed in Table 1.
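Assuming training used the standard Ultralytics interface, the settings in Table 1 correspond to a call along the following lines; the dataset YAML path is a placeholder for the pomelo dataset configuration.

```python
# Hedged sketch of a training run matching Table 1 via the Ultralytics API;
# "pomelo.yaml" is a hypothetical dataset config file.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.yaml")  # build the nano segmentation architecture
model.train(
    data="pomelo.yaml",      # hypothetical dataset config
    epochs=250,
    batch=16,
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    optimizer="AdamW",
    imgsz=1280,              # square training size; captures are 1280 x 800
    amp=True,                # FP16 mixed-precision training
)
```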

3.1.2. Evaluation Metrics

This study implements a dual-faceted evaluation framework, assessing performance in both object detection and mask segmentation, to comprehensively evaluate the model’s capability in instance segmentation. All reported metrics are rigorously derived from the comparison of predictions with test set ground truths.
(1) Fundamental Metrics
A suite of fundamental metrics, defined based on the core elements of the confusion matrix (True Positive, True Negative, False Positive, False Negative), provides a multi-dimensional performance assessment.
The precision calculation formula is shown in Equation (2):
$P = TP / (TP + FP)$  (2)
The recall calculation formula is shown in Equation (3):
$R = TP / (TP + FN)$  (3)
The F1 score is calculated as shown in Equation (4):
$F1 = 2PR / (P + R)$  (4)
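These three definitions translate directly into code; the counts in the sanity check below are illustrative.

```python
# Direct transcription of Equations (2)-(4), with guards against empty
# denominators; the TP/FP/FN counts are illustrative.
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

p, r = precision(90, 5), recall(90, 10)
print(round(p, 4), round(r, 4), round(f1_score(p, r), 4))  # 0.9474 0.9 0.9231
```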
(2) Object Detection Evaluation Metrics
For object detection tasks, model performance is assessed primarily through the accuracy of bounding box predictions, with Intersection over Union (IoU) serving as the core metric. Its mathematical definition is given in Equation (5):
$IoU = \dfrac{Area(B_{pred} \cap B_{gt})}{Area(B_{pred} \cup B_{gt})}$  (5)
The mathematical expression for mAP50 (mean Average Precision at IoU = 0.5) is shown in Equation (6):
$mAP_{50} = \dfrac{1}{C} \sum_{c=1}^{C} \int_{0}^{1} P_c(R)\, dR$  (6)
mAP50–95 (mean Average Precision over IoU = 0.5–0.95) is mathematically expressed as shown in Equation (7):
$mAP_{50\text{–}95} = \dfrac{1}{10} \sum_{k=0}^{9} mAP_{0.5 + 0.05k}$  (7)
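A compact sketch of Equation (5) and the per-class AP integral behind Equation (6) is shown below, using the all-point interpolation common in YOLO-style evaluation; the example boxes are illustrative.

```python
# Box IoU (Eq. 5) and the AP integral of P(R) over [0, 1] (Eq. 6) for one
# class, with a monotone precision envelope (all-point interpolation).
import numpy as np

def box_iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(recall, prec):
    # recall/prec: cumulative curves ordered by descending confidence.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], prec, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # monotone envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```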
(3) Mask Segmentation Evaluation Metrics
For the instance segmentation task, the quality of the pixel-level masks predicted by the model is further evaluated.
Mask IoU: the pixel-level intersection over union between the predicted mask and the ground truth mask, calculated in the same way as in Equation (5), with bounding boxes replaced by mask pixel regions. This metric quantifies the spatial overlap accuracy of a single pair of masks.
Mask mAP50: Mean Average Precision for masks at a fixed IoU threshold of 0.5.
Mask mAP50–95: Mean Average Precision for masks averaged across IoU thresholds from 0.5 to 0.95 with a step size of 0.05.
Mask Matching and TP/FP/FN Determination Rules: A “Maximum IoU Priority” one-to-one matching strategy is adopted to ensure each ground truth mask corresponds to only one optimal predicted mask. When multiple predicted masks match the same ground truth mask, only the predicted mask with the highest Mask IoU (≥the corresponding threshold) is retained as a true positive (TP), and the other associated predicted masks are classified as false positives (FP). When a single predicted mask matches multiple ground truth masks, it is only matched with the ground truth mask having the maximum IoU (classified as TP), and the remaining unmatched ground truth masks are classified as false negatives (FN). A predicted mask is classified as FP if its Mask IoU with all ground truth masks is below the corresponding threshold. Any ground truth mask not matched by any predicted mask is classified as FN.
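The matching rules above amount to a greedy one-to-one assignment by descending Mask IoU; a sketch with boolean mask arrays, assuming NumPy, follows.

```python
# "Maximum IoU priority" one-to-one matching of predicted and ground truth
# masks (boolean arrays); returns TP/FP/FN counts at a given threshold.
import numpy as np

def mask_iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def match_masks(preds, gts, thr=0.5):
    """Greedy one-to-one matching by descending IoU. Returns (tp, fp, fn)."""
    if not preds or not gts:
        return 0, len(preds), len(gts)
    ious = np.array([[mask_iou(p, g) for g in gts] for p in preds])
    matched_p, matched_g, tp = set(), set(), 0
    # Visit candidate pairs from highest IoU downward.
    for idx in np.argsort(ious, axis=None)[::-1]:
        i, j = np.unravel_index(idx, ious.shape)
        if ious[i, j] < thr:
            break  # everything after this is below threshold
        if i not in matched_p and j not in matched_g:
            matched_p.add(i); matched_g.add(j); tp += 1
    return tp, len(preds) - tp, len(gts) - tp
```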
(4) Significance of the Metrics in the Task
Within the scenario of automated pomelo harvesting, these metrics serve distinct purposes: mAP50 efficiently verifies the model’s fundamental ability to detect pomelos, whereas mAP50–95 rigorously assesses its capability for precise localization. The multi-threshold computation of mAP50–95 mitigates the variability inherent in single-point IoU evaluations, thereby providing a more robust and holistic reflection of model performance across a spectrum of localization accuracy demands.

3.2. Ablation Studies

The baseline YOLOv8n-seg comprises a 261-layer architecture with 3.4 million parameters and a computational complexity of 12.8 GFLOPs. Incorporating the RepGhost structure expands the model to 388 layers while reducing the parameters to 2.84 million (a 16.5% decrease) and the computational load to 10.9 GFLOPs (a 14.8% reduction). This improvement substitutes the channel stacking (Concat) of traditional convolutional layers with weight-space fusion (Add) via a feature reuse mechanism, thereby reducing the redundant overhead of memory access operations at the hardware level. The model parameter comparison is shown in Table 2.
To assess the effectiveness of the proposed lightweight model, we carried out comparative experiments on the instance segmentation task between the baseline YOLOv8n-seg and the lightweight variant incorporating the RepGhost module. The training curves of several key metrics are compared in Figure 8. The results demonstrate that the lightweight model maintains performance comparable to the baseline while significantly reducing parameters and computational complexity.
As evidenced by the training curves in Figure 8, YOLOv8n-seg-RepGhost closely tracks the baseline YOLOv8n-seg across all key metrics. Regarding recall (R) and precision (P), the two curves remain closely aligned throughout training, stabilizing above 0.95 and 0.99, respectively. This indicates that the lightweight design does not compromise the model’s discriminative ability or its control over false negatives. For the comprehensive detection metric mAP50, the curves of both models are nearly indistinguishable, each converging close to 0.98, showing that the improved model maintains equivalent recognition and localization precision at the standard IoU threshold. Under the more stringent multi-threshold average metric mAP50–95, the improved model’s curve shows a marginal lag behind the baseline, with a negligible final gap of approximately 0.01, while following an identical upward trend and convergence pattern. This demonstrates robust, comprehensive performance across varying precision requirements. Overall, the experimental results confirm that integrating the RepGhost module significantly reduces the model’s parameters and computational complexity while effectively preserving high accuracy and robustness in instance segmentation, achieving an excellent balance between lightweight design and performance.

3.3. Performance Comparison of Mainstream Models

To investigate the performance differences in pomelo image segmentation between the improved model and other models, this section conducts comparative experiments with several mainstream instance segmentation models with similar parameter counts: Mask R-CNN, YOLOv5n-seg, YOLOv8n-seg, and YOLOv8n-seg-RepGhost. The comprehensive comparison results in Table 3 demonstrate that YOLO-based models outperform traditional methods overall in both object detection (Box) and instance segmentation (Mask) tasks. YOLOv8n-seg and its improved variant YOLOv8n-seg-RepGhost significantly outperform Mask R-CNN across all metrics, particularly achieving approximately 15–30 percentage point leads in mAP50 and mAP50–95. Specifically, YOLOv8n-seg achieved 98.06% Precision, 93.29% Recall, 96.42% mAP50, and 91.80% mAP50–95 in the Box detection task; for the mask segmentation task, it achieved 98.02% Precision, 93.44% Recall, 96.54% mAP50, and 89.58% mAP50–95. Notably, despite significant reductions in both parameter count (16.5%) and computational complexity (14.8%), the YOLOv8n-seg-RepGhost model exhibits only marginal performance degradation compared to the original YOLOv8n-seg. Differences across metrics remain within 1 percentage point, with some variations lacking statistical significance (p > 0.05). This demonstrates that the RepGhost module successfully maintains detection and segmentation accuracy while achieving model lightweighting. Compared to YOLOv5n-seg, the YOLOv8 series models exhibit slight advantages in Precision and Recall, while showing comparable mAP performance, highlighting the effectiveness of the YOLOv8 architecture for pomelo detection tasks.
Overall, the YOLOv8n-seg-RepGhost model achieves lightweighting while maintaining high accuracy, providing an ideal solution for edge device applications such as pomelo harvesting robots.

3.4. Algorithm Validation

As shown in Table 3, Mask R-CNN exhibits low mean average precision, while YOLOv5n-seg incurs excessive computational demands and fails to meet the requirement for rapid segmentation of pomelo images; these models were therefore excluded from further validation. This study employs YOLOv8n-seg and YOLOv8n-seg-RepGhost for visual comparison on pomelo images. Randomly selected pomelo images from natural scenes under both strong and weak lighting conditions are presented, with detection results shown in Figure 9.
As demonstrated by the experimental comparison in Figure 9, both the original YOLOv8n-seg and the RepGhost-enhanced variant achieve accurate instance segmentation in single-fruit scenarios with minimal interference. However, in challenging multi-fruit scenarios, the original YOLOv8n-seg exhibits significant missed detections or false positives, particularly in low-light conditions where fruits overlap. In contrast, the enhanced model incorporating the RepGhost module demonstrates more complete detection in multi-fruit scenarios, effectively reducing missed detections while maintaining stable edge segmentation quality under both strong and weak lighting. The results indicate that the proposed enhancement improves the model’s robustness and detection completeness in complex environments, reducing the impact of multi-object overlap and lighting variations on segmentation performance. YOLOv8n-seg-RepGhost achieves an inference time of 15 milliseconds on 1280 × 720 JPG images using an RTX 2060 GPU, representing a 6.3% reduction compared to the original version’s 16 milliseconds.

3.5. Analysis of Edge Computing Device Deployment Trials

Current embedded devices typically possess limited computational resources, necessitating models with minimal parameter counts and computational demands [33]. The improved model in this study reduces the parameter count by 16.5% and the computational complexity by 14.8% compared to the original YOLOv8n-seg baseline model. This reduced resource requirement enables better adaptation to embedded devices with low computational capacity.
To validate the model’s performance on the NVIDIA Jetson Orin Nano embedded development board, 100 randomly selected pomelo images from the test dataset were subjected to inference testing. TensorRT inference acceleration was employed to achieve maximum throughput and low latency. First, the PyTorch-trained model was exported as a static ONNX model, converting the weight files from PyTorch format (.pt) to ONNX format (.onnx). The onnxsim tool was used to optimize the network architecture, while the trtexec tool built the inference engine, converting the model into TensorRT inference format (.engine). The TensorRT engine was then used to achieve high-performance inference. The accuracy metric for evaluating model performance is the percentage of correctly detected pomelos, while detection time serves as the speed metric. Experimental results are presented in Table 4.
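Assuming the standard Ultralytics export interface and the trtexec binary shipped with TensorRT, the pipeline described above can be scripted roughly as follows; all file names are placeholders.

```python
# Sketch of the export chain described above:
# .pt -> static .onnx -> simplified .onnx -> TensorRT .engine (FP16).
# File names are placeholders; trtexec ships with TensorRT on the Jetson.
import subprocess
import onnx
from onnxsim import simplify
from ultralytics import YOLO

# 1. Export the trained weights to a static ONNX graph.
YOLO("best.pt").export(format="onnx", imgsz=640, dynamic=False)

# 2. Simplify the graph with onnxsim.
model, ok = simplify(onnx.load("best.onnx"))
assert ok, "onnxsim failed to validate the simplified graph"
onnx.save(model, "best_sim.onnx")

# 3. Build an FP16 TensorRT engine with trtexec.
subprocess.run(
    ["trtexec", "--onnx=best_sim.onnx", "--saveEngine=best.engine", "--fp16"],
    check=True,
)
```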
According to the edge device deployment test results shown in Table 4, on the Jetson Orin Nano embedded platform (with 4 GB memory) under the configuration of FP16 half-precision mode and single-batch processing, the improved YOLOv8n-seg-RepGhost model achieved an inference speed of 22.1 FPS (frames per second), which represents a 17.6% increase over the 18.8 FPS of the original YOLOv8n-seg model. Meanwhile, the detection success rate increased from 95% to 96.6%, achieving a simultaneous improvement in both accuracy and inference speed.
To validate the model’s detection performance in actual pomelo cultivation environments, image segmentation and recognition were conducted using the edge device in a pomelo orchard. Both YOLOv8n-seg and YOLOv8n-seg-RepGhost were deployed and used to detect unoccluded, occluded, and overlapping pomelos in the same area. The robotic arm picking operation in the actual orchard environment is shown in Figure 10.
As shown by the actual detection results of the model on edge computing devices in Figure 11, the improved YOLOv8n-seg-RepGhost model maintains stable segmentation performance in natural orchard environments. The model demonstrates good adaptability across different periods and lighting conditions, effectively overcoming challenges in recognition and segmentation caused by foliage interference and complex backgrounds. In terms of segmentation accuracy, the performance of the improved model is generally comparable to that of the original model. However, in segmentation details, the YOLOv8n-seg-RepGhost model achieves a more accurate fit to the contours of pomelos, with its edge segmentation results aligning more closely to the actual shapes of the fruits, demonstrating superior shape preservation capability. The actual deployment results indicate that while improving inference speed, the enhanced model not only maintains segmentation accuracy and robustness but also achieves further optimization in handling occlusions and edge details, validating its feasibility and practicality for deployment on edge computing platforms.

4. Discussion

In this study, a lightweight model for pomelo instance segmentation is proposed by embedding the RepGhost module into the YOLOv8n-seg backbone. The effectiveness of the model is primarily attributed to the “training–inference decoupling” mechanism of RepGhost. The training-phase multi-branch architecture enriches feature representation, improving the discrimination of multi-scale fruits against complex backgrounds. For deployment, structural reparameterization consolidates the architecture into a single, streamlined pathway, yielding substantial reductions in parameters and computations. The experimental validation confirms that this structural refinement successfully achieves lightweighting without sacrificing representational power, instead promoting efficient feature reuse. Consequently, the proposed method offers a practical resolution to the conflict between stringent performance requirements and the constrained computational resources typical of agricultural edge computing scenarios.
This study has several limitations. First, the model was refined and validated solely on pomelos; the generalizability of its lightweight design to other citrus fruits remains unverified. Second, under intense sunlight conditions, when the camera’s light intake is excessive, purple fringing may occur in the lens image, which could affect the stability of image segmentation.
The primary contribution of this work lies in delivering an efficient instance segmentation solution tailored to resource-limited agricultural environments. The proposed model satisfies the critical requirements of real-time performance and accuracy for automated pomelo harvesting and further proposes a reference architecture adaptable to broader fruit and vegetable recognition tasks. Its edge device deployment validates the practical feasibility of deploying advanced deep learning models in dynamic field conditions. Looking ahead, we identify three key research directions: first, enhancing model discernment in complex occluded scenarios through attention mechanisms; second, developing a comprehensive, cross-domain dataset that includes multiple varieties, growth phases, and environmental conditions to boost generalization; and third, pursuing tight hardware–software co-design between the vision system and harvesting actuators to enable fully integrated perception-to-action cycles, with subsequent field trials to transition the technology from research to deployment.

5. Conclusions

We propose YOLOv8n-seg-RepGhost, a lightweight pomelo segmentation model, by embedding RepGhost into YOLOv8n-seg. Through structural reparameterization, the model trains a multi-branch network for richer features and then infers via a single merged path for efficiency. This substantially cuts computation while retaining accuracy. Tests show that the model reduces complexity yet remains robust to field variations, enabling real-time vision in resource-limited settings.
Nevertheless, this study has certain limitations. The segmentation robustness of the model in scenarios with extremely dense occlusions and severe target overlaps needs to be enhanced. Future work may incorporate attention mechanisms and contextual inference strategies to improve adaptability in complex environments. Furthermore, the current model training dataset is concentrated on a single crop variety and cultivation environment, with cross-domain generalization capabilities yet to be validated. Subsequent research will validate the model on multi-crop and multi-environment datasets while exploring transfer learning approaches such as domain adaptation to enhance its universality and practicality in real-world applications.

Author Contributions

Z.L.: funding acquisition, writing—review and editing. Z.Y.: conceptualization, methodology, writing—original draft. B.C.: methodology, software, investigation, validation, writing—original draft. Q.J.: supervision, writing—review and editing. S.L.: funding acquisition, writing—review and editing. X.C.: visualization, data curation. D.M.: investigation, validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the open competition program of top ten critical priorities of Agricultural Science and Technology Innovation for the 14th Five-Year Plan of Guangdong Province (grant number 2024KJ27), National Natural Science Foundation of China (grant numbers 32271997 and 31971797), Guangzhou Key Research and Development Program (grant number 2024B03J1309), China Agriculture Research System of MOF and MARA (grant number CARS-26).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Pan, T.; Ali, M.M.; Gong, J.; She, W.; Pan, D.; Guo, Z.; Yu, Y.; Chen, F. Fruit physiology and sugar-acid profile of 24 pomelo (Citrus grandis (L.) Osbeck) cultivars grown in subtropical region of China. Agronomy 2021, 11, 2393.
2. Saini, P.; Nagesh, D.S. A review of deep learning applications in weed detection: UAV and robotic approaches for precision agriculture. Eur. J. Agron. 2025, 168, 127652.
3. Liu, L.; Yang, F.; Liu, X.; Du, Y.; Li, X.; Li, G.; Chen, D.; Zhu, Z.; Song, Z. A review of the current status and common key technologies for agricultural field robots. Comput. Electron. Agric. 2024, 227, 109630.
4. Yang, Q.; Du, X.; Wang, Z.; Meng, Z.; Ma, Z.; Zhang, Q. A review of core agricultural robot technologies for crop productions. Comput. Electron. Agric. 2023, 206, 107701.
5. Liu, X.; Zhang, Z.; Li, Y.; Igathinathane, C.; Yu, J.; Rui, Z.; Azizi, A.; Wang, X.; Pourreza, A.; Zhang, M. Early-stage detection of maize seed germination based on RGB image and machine vision. Smart Agric. Technol. 2025, 11, 100927.
6. Liu, X.; Zhao, D.; Jia, W.; Ji, W.; Sun, Y. A detection method for apple fruits based on color and shape features. IEEE Access 2019, 7, 67923–67933.
7. Lin, G.; Zou, X. Citrus segmentation for automatic harvester combined with adaboost classifier and Leung-Malik filter bank. IFAC-PapersOnLine 2018, 51, 379–383.
8. Wu, Y.; Yu, X.; Zhang, D.; Yang, Y.; Qiu, Y.; Pang, L.; Wang, H. TinySeg: A deep learning model for small target segmentation of grape pedicels with multi-attention and multi-scale feature fusion. Comput. Electron. Agric. 2025, 237, 110726.
9. Dai, G.; Fan, J.; Dewi, C. ITF-WPI: Image and text based cross-modal feature fusion model for wolfberry pest recognition. Comput. Electron. Agric. 2023, 212, 108129.
10. Dong, C.; Yang, T.; Liu, L.; Wei, Z.; Shi, C.; Gao, D. Early identification of apple bitter pit using hyperspectral imaging technology. Appl. Food Res. 2025, 5, 101166.
11. Zhu, H.; Yang, L.; Fei, J.; Zhao, L.; Han, Z. Recognition of carrot appearance quality based on deep feature and support vector machine. Comput. Electron. Agric. 2021, 186, 106185.
12. Luo, L.; Tang, Y.; Lu, Q.; Chen, X.; Zhang, P.; Zou, X. A vision methodology for harvesting robot to detect cutting points on peduncles of double overlapping grape clusters in a vineyard. Comput. Ind. 2018, 99, 130–139.
13. Wang, C.; Tang, Y.; Zou, X.; SiTu, W.; Feng, W. A robust fruit image segmentation algorithm against varying illumination for vision system of fruit harvesting robot. Optik 2017, 131, 626–631.
14. Yang, Z.; Wang, X.; Qi, Z.; Wang, D. Recognizing strawberry to detect the key points for peduncle picking using improved YOLOv8 model. Trans. Chin. Soc. Agric. Eng. 2024, 40, 167–175.
15. Chen, J.; Ma, A.; Huang, L.; Li, H.; Zhang, H.; Huang, Y.; Zhu, T. Efficient and lightweight grape and picking point synchronous detection model based on key point detection. Comput. Electron. Agric. 2024, 217, 108612.
16. Li, C.; Lin, J.; Li, Z.; Mai, C.; Jiang, R.; Li, J. An efficient detection method for litchi fruits in a natural environment based on improved YOLOv7-Litchi. Comput. Electron. Agric. 2024, 217, 108605.
17. Zhu, A.; Zhang, R.; Zhang, L.; Yi, T.; Wang, L.; Zhang, D.; Chen, L. YOLOv5s-CEDB: A robust and efficiency Camellia oleifera fruit detection algorithm in complex natural scenes. Comput. Electron. Agric. 2024, 221, 108984.
18. Yang, Z.; Gong, W.; Li, K.; Hao, W.; He, Z.; Ding, X.T.; Cui, Y.J. Fruit recognition and stem segmentation of the elevated planting of strawberries. Trans. Chin. Soc. Agric. Eng. 2023, 39, 172–181.
19. Yang, H.; Yang, L.; Wu, T.; Yuan, Y.; Li, J.; Li, P. MFD-YOLO: A fast and lightweight model for strawberry growth state detection. Comput. Electron. Agric. 2025, 234, 110177.
20. Jin, S.; Zhou, L.; Zhou, H. CO-YOLO: A lightweight and efficient model for Camellia oleifera fruit object detection and posture determination. Comput. Electron. Agric. 2025, 235, 110394.
21. Fan, X.; Sun, T.; Chai, X.; Zhou, J. YOLO-WDNet: A lightweight and accurate model for weeds detection in cotton field. Comput. Electron. Agric. 2024, 225, 109317.
22. Li, J.; Li, J.; Zhao, X.; Su, X.; Wu, W. Lightweight detection networks for tea bud on complex agricultural environment via improved YOLO v4. Comput. Electron. Agric. 2023, 211, 107955.
23. Cao, L.; Wang, Q.; Luo, Y.; Hou, Y.; Zheng, W.; Qu, H. A yolov8-based lightweight detection model for different perspectives infrared images. Opt. Commun. 2025, 582, 131612.
24. Yao, J.; Li, Y.; Xia, Z.; Nie, P.; Li, X.; Li, Z. WTAD-YOLO: A lightweight tomato leaf disease detection model based on YOLO11. Smart Agric. Technol. 2025, 12, 101349.
25. Liu, C.; Zhong, L.; Wang, J.; Huang, J.; Wang, Y.; Guan, M.; Li, X.; Zheng, H.; Hu, X.; Ma, X.; et al. Grain-YOLO: An improved lightweight YOLO v8 and its Android deployment for rice grains detection. Comput. Electron. Agric. 2025, 237, 110757.
26. Shen, W.; Dong, M.; Zhang, Z.; Hao, X.; Su, Y.; Xue, Z. ESLC-YOLOv8: Advancing real-time pineapple recognition with lightweight deep learning. Smart Agric. Technol. 2025, 12, 101139.
27. Liu, Z.; Abeyrathna, R.M.R.D.; Sampurno, R.M.; Nakaguchi, V.M.; Ahamed, T. Faster-YOLO-AP: A lightweight apple detection algorithm based on improved YOLOv8 with a new efficient PDWConv in orchard. Comput. Electron. Agric. 2024, 223, 109118.
28. Li, J.; Karkee, M.; Zhang, Q.; Xiao, K.; Feng, T. Characterizing apple picking patterns for robotic harvesting. Comput. Electron. Agric. 2016, 127, 633–640.
29. Noroozi, M.; Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2016; pp. 69–84.
30. Yue, X.; Qi, K.; Na, X.; Zhang, Y.; Liu, Y.; Liu, C. Improved YOLOv8-Seg network for instance segmentation of healthy and diseased tomato plants in the growth stage. Agriculture 2023, 13, 1643.
31. Li, H.; Huang, J.; Gu, Z.; He, D.; Huang, J.; Wang, C. Positioning of mango picking point using an improved YOLOv8 architecture with object detection and instance segmentation. Biosyst. Eng. 2024, 247, 202–220.
32. Chen, C.; Guo, Z.; Zeng, H.; Xiong, P.; Dong, J. RepGhost: A hardware-efficient ghost module via re-parameterization. arXiv 2022, arXiv:2211.06088.
33. Wang, J.; Ma, S.; Wang, Z.; Ma, X.; Yang, C.; Chen, G.; Wang, Y. Improved lightweight YOLOv8 model for rice disease detection in multi-scale scenarios. Agronomy 2025, 15, 445.
Figure 1. Pomelo images in diverse settings. (a) Whole fruit; (b) Obstructed by foliage; (c) Overlapping fruit; (d) Featuring highlights or shadows.
Figure 2. Pomelo images under varying lighting conditions. (a) Overexposure; (b) Underexposure; (c) Correct exposure.
Figure 3. Results of automatic annotation using the SAM model. Note: The yellow box indicates an error. The red box indicates the correct label.
Figure 4. Results of manual fine-tuning using LabelMe. Note: The red box indicates the correct label.
Figure 5. Data augmentation. (a) Original image; (b) Translation, rotation, and scaling; (c) Brightness and saturation variations; (d) Random occlusion. Note: Gray squares indicate random occlusion.
Figure 6. YOLOv8n-seg-RepGhost architecture diagram.
Figure 7. Bottleneck structure of the RepGhost module. (a) Training bottlenecks; (b) Inference bottlenecks.
Figure 8. Training curves of P, R, mAP50, and mAP50–95 for different models. (a) Recall training curve comparison; (b) Precision training curve comparison; (c) mAP50–95 training curve comparison; (d) mAP50 training curve comparison.
Figure 9. Segmentation results of pomelo fruit using different models. (a) Original image; (b) YOLOv8n-seg; (c) YOLOv8n-seg-RepGhost.
Figure 10. Schematic Diagram of Harvesting Manipulator in Orchard Operation.
Figure 11. Detection performance of the model deployed on edge computing devices. (a) YOLOv8n-seg-RepGhost, (b) YOLOv8n-seg.
Table 1. Training parameter settings.

Parameter                 Value
Batch size                16
Epochs                    250
Learning rate             0.01
Momentum                  0.937
Weight decay              0.0005
Input image resolution    1280 × 800
Optimizer                 AdamW
Table 2. Model parameter and computational complexity comparison.

Model                   Layers   Parameters     Computational Complexity
YOLOv8n-seg             261      3.4 million    12.8 GFLOPs
YOLOv8n-seg-RepGhost    388      2.84 million   10.9 GFLOPs
Table 3. Comparison of mainstream models (mean ± std).

Task   Model                  Precision        Recall           mAP50            mAP50–95
Box    Mask R-CNN             80.5 ± 0.06%     75.6 ± 0.2%      80.5 ± 0.01%     63.6 ± 0.03%
Box    YOLOv5n-seg            97.9 ± 0.8%      95.58 ± 0.06%    97.4 ± 0.01%     91.98 ± 0.01%
Box    YOLOv8n-seg            98.06 ± 0.36%    93.29 ± 0.79%    96.42 ± 0.60%    91.80 ± 1.63%
Box    YOLOv8n-seg-RepGhost   97.41 ± 1.15%    92.91 ± 0.88%    96.26 ± 0.80%    90.56 ± 1.79%
Mask   Mask R-CNN             80.4 ± 0.06%     71.9 ± 0.2%      80.4 ± 0.02%     68.2 ± 0.01%
Mask   YOLOv5n-seg            97.84 ± 0.02%    95.56 ± 0.06%    97.4 ± 0.01%     91.66 ± 0.02%
Mask   YOLOv8n-seg            98.02 ± 0.38%    93.44 ± 0.79%    96.54 ± 0.46%    89.58 ± 1.27%
Mask   YOLOv8n-seg-RepGhost   97.38 ± 1.10%    93.08 ± 0.83%    96.47 ± 0.67%    88.63 ± 1.41%
Table 4. Comparison of detection rates across models following device deployment (Jetson Orin Nano, 4 GB memory, FP16 precision mode, batch size 1).

Model                  FPS    Detection Success Rate
YOLOv8n-seg            18.8   95%
YOLOv8n-seg-RepGhost   22.1   96.6%