RGHD: A Risk-Gated Harvestability Decision Framework for Occlusion-Aware Greenhouse Melon Harvesting

Song, Shijun; Qu, Huixing; Wang, Shaowei; Yang, Huawei; Hao, Yongbing; Zhang, Guohai

doi:10.3390/agriculture16050589


by Shijun Song ¹, Huixing Qu ²,³, Shaowei Wang ²,³, Huawei Yang ²,³, Yongbing Hao ⁴ and Guohai Zhang ¹,⁵,*

¹ College of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255000, China
² Shandong Academy of Agricultural Machinery Science, Jinan 250100, China
³ Shandong Key Laboratory of Intelligent Agricultural Equipment in Hilly and Mountainous Areas, Jinan 250100, China
⁴ Zibo Quality Supervision and Inspection Institute, Zibo 255000, China
⁵ Institute of Modern Agricultural Equipment, Shandong University of Technology, Zibo 255000, China
* Author to whom correspondence should be addressed.

Agriculture 2026, 16(5), 589; https://doi.org/10.3390/agriculture16050589

Submission received: 26 January 2026 / Revised: 22 February 2026 / Accepted: 3 March 2026 / Published: 4 March 2026

(This article belongs to the Section Agricultural Technology)


Abstract

Greenhouse melon harvesting is challenged by leaf occlusion and dense clustering, which often lead to unsafe harvesting attempts in conventional detection-only systems. To prioritize operational safety, this study proposes the Risk-Gated Harvestability Decision (RGHD) framework. The approach decouples candidate fruit detection from risk-aware reasoning by integrating a lightweight YOLO11n detector with an EGSA-enhanced ShuffleNetV2 occlusion classifier. A logic-gated module then fuses multi-source cues—occlusion, overlap, and scale—to enforce a Safety-First harvesting policy. Experimental results show the detector achieves an mAP@0.5:0.95 of 75.8% while running at 113.3 FPS. Under the Safety-First policy, the proxy unsafe-acceptance rate (FPR under our operational proxy) decreased from 8.7% to 4.4%, corresponding to a 49.4% relative reduction in the risk of unsafe attempts, while maintaining 88.0% decision precision. Although an Efficiency-First mode is available for high throughput (91.0% recall), the Safety-First strategy provides the robust crop protection necessary for autonomous systems. Overall, RGHD provides a lightweight, risk-aware decision layer that improves operational safety while preserving real-time performance in cluttered greenhouse scenes.

Keywords: greenhouse melon harvesting; RGHD (risk-gated harvestability decision); harvestability assessment; occlusion-aware perception; YOLO11n

1. Introduction

Muskmelon is a high-value horticultural crop valued for sweetness and aroma, with strong market demand. In China, the thick-skinned cultivar ‘Boyang No. 9’ is widely grown in protected systems due to its stable yield and superior eating quality [1]. Shandong Province, supported by advanced facility agriculture, has become a core region for standardized and industrialized production of ‘Boyang No. 9’, achieving both high output and consistent quality [2]. Aroma formation further underpins its commercial value and consumer preference [3]. Greenhouse cultivation dominates muskmelon production; however, at the ripening stage, fruits are often low-hanging and embedded in dense vines and leaves, while harvesting occurs under warm and humid conditions. These factors lead to poor working conditions, high labor intensity, and limited picking efficiency [4]. With rising labor shortages and production costs, harvesting robots are increasingly considered a practical approach to improving efficiency and sustainability in high-value horticulture [5]. Recent surveys and reviews have highlighted rapid progress in fruit and vegetable harvesting robots, while consistently emphasizing occlusion, target clustering/overlap, and constrained onboard computation as persistent barriers to reliable field deployment [6,7,8]. In practice, unsafe or failed picking attempts can damage fruits and vines and increase subsequent labor demand, making greenhouse harvesting decisions inherently risk-sensitive, especially under occlusion and clustering.

The performance of harvesting robots depends on converting perception outputs into reliable harvesting decisions (detection, localization, and harvestability). Recent surveys note that this perception-to-action conversion remains challenging in greenhouse scenes under visual clutter and limited onboard computation [9]. Machine vision has, therefore, become central to robotic harvesting [10,11]. Handcrafted-feature methods are often brittle under heavy canopy clutter and varying illumination [12], RGB–D fusion improves localization and picking-point estimation [13,14], and lightweight deep-learning models enable real-time perception on edge devices in complex horticultural environments [15,16,17,18,19,20].

Nevertheless, a detected fruit is not necessarily a safe target for manipulation. Graspable regions may be partially hidden, and collision risk and accessibility depend strongly on leaf occlusion and fruit crowding/overlap. Recent harvesting systems, therefore, increasingly incorporate active sensing and viewpoint planning [21,22,23] or 3D pose-aware perception [24,25,26] to improve downstream end-effector success rather than detection accuracy alone. This indicates a gap between detection and action: without explicit harvestability reasoning, robots may attempt to harvest fruits that are visually detected but operationally risky, resulting in collisions, low success rates, and potential fruit/vine damage.

To bridge this gap, we propose a Risk-Gated Harvestability Decision (RGHD) framework designed for agricultural edge devices. Given the limited compute budget in greenhouse platforms, RGHD adopts a multi-stage design that separates fast candidate generation from lightweight risk assessment because detector confidence alone does not explicitly reflect occlusion severity or collision-prone crowding. Specifically, a YOLO11n detector generates fruit candidates, and a ShuffleNetV2 classifier with an edge-guided spatial attention module estimates occlusion severity. The classifier takes the detected ROI as input to estimate occlusion severity from local boundary cues, which cannot be reliably inferred from box-level scores or overlap-based postprocessing alone. To avoid costly 3D segmentation, clustering risk is approximated using an efficient 2D IoU-based metric. These cues are combined in a lightweight decision gate with two configurable policies—Safety-First and Efficiency-First—to trade off throughput and operational safety. Experiments on a greenhouse dataset of ‘Boyang No. 9’ melons show that RGHD converts detections into actionable harvesting decisions, reducing unsafe attempts while maintaining real-time performance.

The main contributions are as follows:

Decoupling Perception and Decision: We introduce a modular framework that separates target detection from harvestability reasoning, enabling transparent failure analysis and trusted human–robot interaction under complex occlusion and interference.
Edge-Enhanced Occlusion Grading: We design an ROI-based classification module incorporating edge-guided spatial attention (EGSA), which improves the recognition of occlusion levels by emphasizing boundary cues (accuracy rises from 85.4% to 87.5% under cross-entropy training) while adding little inference latency (6.70 to 6.85 ms per ROI).
Tunable Risk Policies: We formulate a multi-source risk fusion mechanism that lets the system switch between “Safety-First” and “Efficiency-First” operating modes according to the crowding conditions in the greenhouse.

2. Materials and Methods

2.1. Framework Overview

The RGHD framework is designed to decouple visual perception from decision logic, processing single-frame RGB images to output specific harvestability commands. As illustrated in Figure 1, the pipeline operates in three sequential stages:

Candidate Generation: A lightweight detector (YOLO11n) scans the global view to localize all potential melon targets and generate initial Regions of Interest (ROIs).
Risk Perception: Each candidate ROI is analyzed individually to quantify environmental constraints. This involves a fine-grained classification network for Occlusion Severity and a geometric analyzer for Spatial overlap risks.
Gated Decision: A logical gate integrates the multi-source risk cues and selects between the “Efficiency-First” and “Safety-First” operating policies. Candidates that do not meet the selected policy are skipped and the system moves on to the next target. Importantly, a rejection is not treated as a permanent exclusion: unharvested fruits remain on the vine and can be re-assessed in later harvesting cycles. As occlusion and crowding conditions change over time—due to vine growth, leaf movement, or routine canopy management—previously rejected fruits may become harvestable.

2.2. Data Collection and Dataset Construction

Data were collected at a commercial greenhouse melon production base in Zhoucun District, Zibo City, Shandong Province, China. The target crop was the thick-skinned muskmelon cultivar ‘Boyang No. 9’. Image acquisition was conducted from late March to early April 2025 using a vivo smartphone (vivo, Dongguan, China; Model: iQOO Neo7, equipped with a 50-megapixel main CMOS sensor, f/1.88 aperture, and 23 mm equivalent focal length). Images were captured in auto-exposure and auto-white-balance modes without HDR processing, at an original resolution of 3060 × 3060 pixels, and at a shooting distance of 30–50 cm from the canopy. While commercial harvesting robots typically employ industrial vision sensors (e.g., RGB-D cameras), a high-resolution smartphone camera was selected for this initial dataset construction to cost-effectively and rapidly capture high-fidelity RGB data representing complex real-world canopy features. To ensure the dataset captured the complexity of real cultivation conditions, sampling covered varying illumination conditions (front-lighting and backlighting with dynamic shadowing) across different times of day, and included scenes with fruit overlap and dense foliage occlusion (Figure 2). A total of 945 raw RGB images were obtained and subsequently standardized to 640 × 640 pixels.

To increase data diversity, geometric augmentations (random horizontal flipping and scaling) and pixel-level augmentations (Gaussian noise and motion blur) were applied to generate 5670 images from 945 original RGB images. The dataset was then split in a group-wise manner into training, validation, and test sets containing 3972, 1128, and 570 images, respectively. Specifically, all augmented variants derived from the same original image were kept in the same subset to prevent information leakage.
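The group-wise split described above can be sketched as follows. This is a minimal illustration, not the authors' script: the file names and the function `group_split` are hypothetical, and the per-subset source counts (662/188/95) are inferred here by dividing the reported subset sizes by six (5670 / 945 = 6 images per original), since the paper does not state them directly.

```python
import random
from collections import defaultdict


def group_split(samples, n_train, n_val, seed=0):
    """Split (image_name, source_id) pairs so each source_id lands in one subset."""
    groups = defaultdict(list)
    for name, source_id in samples:
        groups[source_id].append(name)
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)  # shuffle source images, not augmented variants
    parts = {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
    return {k: [n for sid in v for n in groups[sid]] for k, v in parts.items()}


# Hypothetical file names: 945 originals, six images each (assumption, see above).
samples = [(f"img{s:04d}_v{v}.jpg", s) for s in range(945) for v in range(6)]
split = group_split(samples, n_train=662, n_val=188)
print({k: len(v) for k, v in split.items()})
```

Because the shuffle operates on source identifiers, every variant of an original image follows it into the same subset, which is the property that prevents leakage.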

2.2.1. Labeling Scheme

To support the full “localization–status–decision” pipeline, the dataset includes two types of annotations:

Detection annotations: YOLO-format bounding boxes for each fruit instance.

Occlusion levels:

Occ = 0: no occlusion; fruit boundary is complete and clearly visible;
Occ = 1: minor occlusion; fruit remains identifiable but the visible boundary is partially missing;
Occ = 2: severe occlusion; a large portion of the fruit body is occluded and the boundary is largely incomplete; harvesting is not recommended.

All annotations were produced by a single annotator and were rechecked in a second pass to ensure label consistency.

2.2.2. Definition of Fruit Overlap State and IoU Inference

Fruit overlap describes the degree of interleaving and boundary blending between fruits in image space and reflects separability and potential interference risk during harvesting. Three overlap levels (ovl) are defined as:

ovl = 0: no significant overlap;
ovl = 1: partial overlap, which may require adjusting the grasping approach or obstacle avoidance;
ovl = 2: extensive overlap with strong boundary blending; direct picking is not recommended.

The overlap status is not manually annotated; it is derived by post-processing the detection outputs. For each target bounding box $b_i$, the IoU is computed with all other boxes $b_j$ ($j \neq i$) in the same frame. The maximum IoU value is used as the overlap risk score:

$$\mathrm{IoU}_i^{\max} = \max_{j \neq i} \mathrm{IoU}(b_i, b_j) \tag{1}$$

This score is mapped to ovl using two thresholds $\tau_1$ and $\tau_2$:

$$\mathrm{ovl}_i = \begin{cases} 0, & \mathrm{IoU}_i^{\max} < \tau_1 \\ 1, & \tau_1 \leq \mathrm{IoU}_i^{\max} < \tau_2 \\ 2, & \mathrm{IoU}_i^{\max} \geq \tau_2 \end{cases} \tag{2}$$

In this study, $\tau_1 = 0.05$ and $\tau_2 = 0.20$. Both values were set empirically from preliminary observations so that the three resulting levels represent low, moderate, and high overlap interference, respectively.

2.2.3. Dataset Statistics

Table 1 summarizes the dataset distribution across occlusion levels and overlap levels after data augmentation.

2.3. Stage I: Fruit Detection and Stable Candidate Selection

Stage I aims to generate bounding boxes for all visible melons in a single-frame RGB image and to construct per-fruit ROI candidates for subsequent analysis. Considering frequent occlusion and illumination variation in greenhouse scenes, the detector should balance detection capability with real-time performance and edge-deployment constraints. Therefore, we adopt the lightweight YOLO11n model [27] as the front-end detector (Figure 3).

For each input image, YOLO11n outputs a set of melon bounding boxes. Each bounding box is then used to extract a corresponding ROI by cropping the original image (with boundary clipping when a box extends beyond the image border). The resulting ROI images serve as candidate inputs to Stage II for occlusion-state perception and subsequent risk modeling. In this study, YOLO11n is used primarily as a box generator to provide complete melon instances for ROI construction, without modifying the detector architecture.

2.4. Stage II: Fruit State Perception

2.4.1. ROI Generation

For each detected fruit, a region of interest (ROI) is cropped from the original image. To preserve contextual cues that are informative for occlusion assessment, the detection box is expanded by 10% in both width and height before cropping, and the expanded box is clipped to the image boundaries.

The ROI is then normalized to 224 × 224 while preserving the aspect ratio: the longer side is resized to 224 pixels, and the remaining area is padded to form a square input. In our implementation, constant-value padding is used to avoid geometric distortion and ensure a consistent input size for the classifier. Representative examples of cropped ROI under different occlusion and overlap conditions are shown in Figure 4.
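The ROI geometry above (10% expansion, clipping, aspect-preserving resize to 224 with padding) can be sketched with coordinates alone. The sketch assumes the expansion is symmetric about the box centre and that the padding is split evenly on both sides; the paper specifies neither, and the function names are hypothetical.

```python
def expand_and_clip(box, img_w, img_h, ratio=0.10):
    """Grow an (x1, y1, x2, y2) box by `ratio` in width and height, then clip."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio / 2  # assumption: half of the growth on each side
    dy = (y2 - y1) * ratio / 2
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(img_w), x2 + dx), min(float(img_h), y2 + dy))


def letterbox_geometry(roi_w, roi_h, target=224):
    """Return resized size and (left, top, right, bottom) padding for a square input."""
    scale = target / max(roi_w, roi_h)  # the longer side becomes `target`
    new_w = max(1, round(roi_w * scale))
    new_h = max(1, round(roi_h * scale))
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2  # assumption: centred padding
    return new_w, new_h, (left, top, pad_x - left, pad_y - top)


box = expand_and_clip((100, 100, 300, 200), img_w=640, img_h=640)
print(box)
print(letterbox_geometry(box[2] - box[0], box[3] - box[1]))
```

An image library would then resize the crop to `(new_w, new_h)` and fill the padded border with a constant value, as the text describes.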

2.4.2. Occlusion Classification Network

Occlusion perception is formulated as a three-class classification task ($\mathrm{occ} \in \{0, 1, 2\}$). To meet edge and mobile deployment constraints, we adopt ShuffleNetV2 as the backbone and introduce an Edge-Guided Spatial Attention (EGSA) module to enhance occlusion-relevant boundary cues while suppressing background texture interference [28]. During training, weighted cross-entropy (WCE) is used to address class imbalance and to reduce high-cost errors, particularly misclassifying severely occluded targets as low-risk cases.

To ensure a fair evaluation and avoid information leakage, the augmented dataset is split such that all augmented variants derived from the same original image are assigned to the same subset (train/valid/test). ROI samples inherit the split of their source images, preventing correlated samples from appearing across subsets.

2.4.3. Edge-Guided Spatial Attention (EGSA)

Occlusion-level perception in greenhouse images is challenging because discriminative cues are often concentrated around local boundaries where the fruit is partially covered by leaves. To enhance boundary-sensitive representations with minimal overhead, we introduce a lightweight Edge-Guided Spatial Attention (EGSA) module into the feature extraction stage of the baseline network.

As illustrated in Figure 5, EGSA constructs a spatial attention map by combining two complementary cues derived from an intermediate feature map: (i) a structural response map obtained by channel-mean aggregation, and (ii) an edge hint map produced by Sobel-based edge extraction followed by normalization. These two maps are concatenated and passed through a 1 × 1 convolution and a sigmoid activation to generate the spatial attention map, which is then broadcast to all channels and used to reweight the original feature map:

$$F' = M_s \odot F \tag{3}$$

Here, $M_s$ is the spatial attention map, $F$ is the input feature map, and $\odot$ denotes element-wise multiplication with broadcasting along the channel dimension. Because EGSA relies on a fixed edge operator and only one 1 × 1 convolution, it introduces minimal parameters and computation. This design suppresses background leaf-texture interference and emphasizes occlusion-relevant fruit boundaries and partially visible fruit regions, thereby improving separability among occlusion levels.
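To make the data flow of Equation (3) concrete, the following is a dependency-free forward-pass sketch in plain Python rather than PyTorch. Several details are assumptions and not taken from the paper: the Sobel operator is applied to the channel-mean map, borders are zero-padded, the edge map is normalized by its maximum, and the 1 × 1 convolution is represented by two fixed scalars (`w_struct`, `w_edge`) plus a bias instead of learned weights.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def channel_mean(feat):
    """Structural response map: mean over channels of a C x H x W feature map."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[sum(feat[k][y][x] for k in range(c)) / c for x in range(w)]
            for y in range(h)]


def sobel_edge_map(m):
    """Sobel gradient magnitude with zero padding, scaled to [0, 1] by its maximum."""
    h, w = len(m), len(m[0])

    def at(y, x):
        return m[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = sum(SOBEL_X[i][j] * at(y + i - 1, x + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * at(y + i - 1, x + j - 1)
                     for i in range(3) for j in range(3))
            mag[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mag)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in mag]


def egsa(feat, w_struct=1.0, w_edge=1.0, bias=0.0):
    """Return (F', M_s): reweighted features and the spatial attention map."""
    s = channel_mean(feat)
    e = sobel_edge_map(s)
    # A 1 x 1 convolution over the two stacked maps is a per-pixel weighted sum.
    attn = [[1.0 / (1.0 + math.exp(-(w_struct * sv + w_edge * ev + bias)))
             for sv, ev in zip(s_row, e_row)]
            for s_row, e_row in zip(s, e)]
    # Broadcast the H x W attention map over every channel.
    out = [[[a * v for a, v in zip(a_row, f_row)]
            for a_row, f_row in zip(attn, ch)]
           for ch in feat]
    return out, attn


# Toy single-channel map with a vertical step edge between columns 2 and 3.
feat = [[[1.0 if x >= 3 else 0.0 for x in range(6)] for _ in range(6)]]
out, attn = egsa(feat)
print(attn[2][0] < attn[2][2])
```

In the toy input, the flat zero region receives the neutral weight 0.5, while pixels next to the step receive a larger weight, which is the boundary emphasis the module is meant to provide.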

2.4.4. Occlusion Classification with Weighted Cross-Entropy

Due to the uneven distribution of samples across occlusion levels and the fact that misclassifying severe occlusion (occ = 2) as a low-risk class may induce overly aggressive picking decisions, we adopt a weighted cross-entropy (WCE) loss to reduce safety-critical errors. Let $P_k$ denote the predicted probability of class $k$ and $y_k$ denote the one-hot ground-truth label. The WCE loss is defined as:

$$L_{\mathrm{WCE}} = -\sum_{k=0}^{K-1} w_k \, y_k \log(P_k) \tag{4}$$

where $K = 3$ in this study and $w_k$ is the class weight.

To avoid information leakage, class weights are computed using only the training set. Let $f_k$ be the number of training samples in class $k$. We set $w_k \propto 1 / f_k$ and normalize the weights such that $\sum_{k=0}^{K-1} w_k = K$:

$$w_k = \frac{1 / f_k}{\sum_{j=0}^{K-1} 1 / f_j} \times K \tag{5}$$

This weighting strategy increases the penalty for under-represented and higher-risk categories, improving recognition of severely occluded samples while maintaining overall performance.
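Equations (4) and (5) can be checked numerically with a few lines of Python. The class counts below are hypothetical placeholders (the actual training-set counts are those of Table 1), and the function names are illustrative.

```python
import math


def class_weights(counts):
    """Inverse-frequency weights normalized to sum to K (Equation (5))."""
    inv = [1.0 / f for f in counts]
    total = sum(inv)
    return [len(counts) * v / total for v in inv]


def weighted_ce(probs, label, weights, eps=1e-12):
    """Equation (4) for a one-hot label: only the true-class term is non-zero."""
    return -weights[label] * math.log(max(probs[label], eps))


counts = [600, 300, 100]  # hypothetical counts for occ = 0, 1, 2
w = class_weights(counts)
print(w, sum(w))
print(weighted_ce([0.2, 0.3, 0.5], label=2, weights=w))
```

With these placeholder counts the rarest class receives a weight of 2 and the most common one a weight of 1/3, so an error on a rare sample costs six times as much as the same error on a common one.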

2.5. Fruit Overlap Risk Estimation

In a single frame, the detector outputs a set of melon bounding box candidates $\{b_i\}_{i=1}^{N}$. In greenhouse scenes with frequent fruit clustering, substantial overlap between candidates in image space often indicates potential spatial interference for subsequent manipulation. To enable online assessment without introducing additional learning modules, we adopt a geometry-based overlap estimator based on the Intersection over Union (IoU), as illustrated in Figure 6.

The IoU between two candidate boxes $b_i$ and $b_j$ is computed as:

$$\mathrm{IoU}(b_i, b_j) = \frac{|b_i \cap b_j|}{|b_i \cup b_j|} \tag{6}$$

For each candidate $i$, the overlap risk score is defined as the maximum IoU with any other candidate in the same frame:

$$r_i^{\mathrm{ovl}} = \max_{j \neq i} \mathrm{IoU}(b_i, b_j) \tag{7}$$

which characterizes the most unfavorable local crowding condition around the target.

The continuous score is then mapped to a discrete overlap level $\mathrm{ovl}_i \in \{0, 1, 2\}$ using two thresholds $\tau_1$ and $\tau_2$:

$$\mathrm{ovl}_i = \begin{cases} 0, & r_i^{\mathrm{ovl}} < \tau_1 \\ 1, & \tau_1 \leq r_i^{\mathrm{ovl}} < \tau_2 \\ 2, & r_i^{\mathrm{ovl}} \geq \tau_2 \end{cases} \tag{8}$$

where the thresholds are determined according to operational safety requirements and validation-set statistics.

In this study, $\tau_1 = 0.05$ and $\tau_2 = 0.20$, consistent with the overlap-level definition in Section 2.2.2. The resulting overlap level is used as an input to the subsequent multi-source risk fusion and gated decision.
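The overlap estimator of Equations (6) to (8) is straightforward to express in code. The sketch below assumes boxes in (x1, y1, x2, y2) pixel format and assigns a score of zero when a frame contains a single candidate; the paper does not discuss that edge case, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes (Equation (6))."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap_levels(boxes, tau1=0.05, tau2=0.20):
    """Map each box's maximum IoU with any other box to ovl in {0, 1, 2}."""
    levels = []
    for i, bi in enumerate(boxes):
        # Equation (7); default=0.0 covers a frame with one candidate (assumption).
        r = max((iou(bi, bj) for j, bj in enumerate(boxes) if j != i), default=0.0)
        levels.append(0 if r < tau1 else 1 if r < tau2 else 2)  # Equation (8)
    return levels


boxes = [
    (0, 0, 10, 10), (8, 0, 18, 10),    # IoU = 20/180, about 0.11 -> level 1
    (100, 100, 110, 110),              # isolated -> level 0
    (30, 0, 40, 10), (33, 0, 43, 10),  # IoU = 70/130, about 0.54 -> level 2
]
print(overlap_levels(boxes))  # [1, 1, 0, 2, 2]
```

The pairwise loop is quadratic in the number of candidates, which is negligible for the handful of fruits visible in one frame.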

2.6. Risk-Gated Harvestability Decision (RGHD)

Operational definition and scope. In this work, “harvestability” is operationally defined by observable visual cues in RGB images (occlusion, 2D overlap, and relative scale) that characterize candidate ambiguity in cluttered greenhouse canopies. RGHD focuses on risk-aware candidate gating for conservative decision-making under crowding, providing adjustable policies rather than estimating execution outcomes. Accordingly, the reported unsafe acceptance/rejection rates quantify decision risk with respect to the adopted cues and gating rules. Although harvestability is defined operationally based on visual cues in this study, these criteria are consistent with practical greenhouse harvesting principles, where severe occlusion, strong fruit–fruit interference, and insufficient visible fruit area are commonly regarded as indicators of non-harvestable targets by experienced workers.

After occlusion-level prediction and overlap-risk quantification, RGHD fuses multi-source cues and outputs a harvestability decision through a rule-based gate (Figure 7). For each candidate fruit $i$, three inputs are used: (1) the occlusion level $\mathrm{occ}_i \in \{0, 1, 2\}$ predicted by the ShuffleNetV2-based occlusion classifier; (2) the overlap level $\mathrm{ovl}_i \in \{0, 1, 2\}$ derived from IoU-based mapping [29]; (3) a scale-related cue $s_i$ computed from the normalized bounding box area:

$$s_i = \frac{A_i}{A_{\mathrm{img}}} \tag{9}$$

where $A_i$ denotes the area of the candidate bounding box and $A_{\mathrm{img}}$ is the image area. A candidate is considered harvestable only when occlusion risk, spatial-interference risk, and scale constraints jointly satisfy the selected policy. Two operating modes are provided to support different safety–efficiency preferences:

Safety-First:

$$y_i^{\mathrm{pick}} = \mathbb{I}\left[\mathrm{occ}_i \leq 0 \land \mathrm{ovl}_i \leq 1 \land s_i \geq 0.05\right] \tag{10}$$

Efficiency-First:

$$y_i^{\mathrm{pick}} = \mathbb{I}\left[\mathrm{occ}_i \leq 1 \land \mathrm{ovl}_i \leq 1 \land s_i \geq 0.05\right] \tag{11}$$

Here, $\mathbb{I}[\cdot]$ is the indicator function, which equals 1 when the condition holds and 0 otherwise. Because $\mathrm{occ}_i \geq 0$, the Safety-First condition $\mathrm{occ}_i \leq 0$ admits only unoccluded fruits ($\mathrm{occ}_i = 0$), whereas Efficiency-First also admits minor occlusion. In this study, overlap levels are mapped from the maximum IoU score using two thresholds $\tau_1 = 0.05$ and $\tau_2 = 0.20$, and $\mathrm{ovl}_i \leq 1$ is used in the gate to exclude candidates with severe overlap/interference. The scale threshold $s_i \geq 0.05$ is used to filter extremely small candidates, which are more likely to be distant, partially visible, or unreliable for decision-making under a single-view setting.
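The two gating rules reduce to a single function with a policy-dependent occlusion limit. The sketch below is an illustration of Equations (9) to (11) under the thresholds stated in the text; the function name and the policy strings are hypothetical.

```python
def harvestable(occ, ovl, box_area, img_area, policy="safety", s_min=0.05):
    """Rule-based gate: True means 'pick', False means 'skip for now'."""
    occ_max = {"safety": 0, "efficiency": 1}[policy]  # Eq. (10) vs. Eq. (11)
    s = box_area / img_area                           # Eq. (9)
    return occ <= occ_max and ovl <= 1 and s >= s_min


img_area = 640 * 640
big_box = 200 * 200    # s is about 0.098
small_box = 100 * 100  # s is about 0.024, below the scale threshold

print(harvestable(0, 0, big_box, img_area, "safety"))      # True
print(harvestable(1, 0, big_box, img_area, "safety"))      # False
print(harvestable(1, 0, big_box, img_area, "efficiency"))  # True
print(harvestable(0, 2, big_box, img_area, "efficiency"))  # False
print(harvestable(0, 0, small_box, img_area, "safety"))    # False
```

The two policies differ only for minor occlusion (occ = 1): every candidate accepted under Safety-First is also accepted under Efficiency-First.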

3. Results

3.1. Experimental Setup and Training Configuration

Experiments were conducted on a workstation equipped with an AMD Ryzen 5 5600 CPU (AMD, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 GPU (8 GB) (NVIDIA, Santa Clara, CA, USA). The software environment consisted of Python 3.8 and PyTorch 1.13.1 with CUDA 11.3 support.

The Stage I YOLO11n detector was trained for 200 epochs (batch size 16), and the Stage II occlusion classifier was trained for 100 epochs (batch size 16). For YOLO11n, the input resolution was set to 640 × 640 and the initial learning rate was set to 0.01. Stochastic gradient descent (SGD) with a momentum of 0.937 and a weight decay of 0.0005 was used as the optimizer. For the occlusion classifier, ROI inputs were resized to 224 × 224, and the network was optimized using the AdamW optimizer implemented in PyTorch (v1.13.1) with an initial learning rate of 5 × 10⁻⁴. The main training hyperparameters for Stage I and Stage II are summarized in Table 2.

3.2. Evaluation Indicators

3.2.1. Object Detection Evaluation Metrics

For object detection models, we use the widely adopted mAP@0.5 and mAP@0.5:0.95 to assess detection accuracy and Recall@0.5 to quantify the proportion of annotated fruits that are detected. We also report inference speed (FPS) and model weight size (MB) to evaluate real-time performance and resource consumption in deployment.

3.2.2. Harvestability Decision Evaluation Metrics

Harvestability gating in RGHD is formulated as a binary decision task. In this study, we evaluate gating outputs under an operational proxy definition derived from occlusion annotations and geometry-based cues (overlap and relative scale). Specifically, a proxy low-risk instance is defined as (occ ≤ 1) ∧ (ovl ≤ 1) ∧ (s ≥ 0.05); all other cases are treated as proxy high-risk.

To decouple detection failures from policy behavior, predicted candidates are first matched to annotated fruit objects using IoU ≥ 0.5, and policy statistics are computed only on the matched pairs. The system performance is evaluated using Accuracy (Acc), Precision (P), Recall (R), and Macro-F1 score, which are calculated based on True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN):

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{13}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{14}$$

For multi-class occlusion evaluation, the Macro-F1 score is used to mitigate class imbalance by averaging the F1-scores across all $K$ classes:

$$\mathrm{Macro\text{-}F1} = \frac{1}{K} \sum_{i=1}^{K} \frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \tag{15}$$
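As a worked instance of Equation (15), the sketch below computes Macro-F1 from a confusion matrix. The matrix values are made up for illustration, and the code uses the equivalent per-class form F1 = 2TP / (2TP + FP + FN), which follows from substituting Equations (13) and (14) into the harmonic mean.

```python
def macro_f1(conf):
    """Macro-F1 from a K x K confusion matrix (rows: true class, cols: predicted)."""
    k = len(conf)
    f1_scores = []
    for i in range(k):
        tp = conf[i][i]
        fp = sum(conf[r][i] for r in range(k)) - tp  # predicted i, true class differs
        fn = sum(conf[i]) - tp                       # true i, predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / k


# Hypothetical counts for occ = 0, 1, 2 (not results from the paper).
conf = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
print(round(macro_f1(conf), 4))
```

Each class contributes equally to the average regardless of its sample count, which is why the metric is less dominated by the majority class than plain accuracy.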

To evaluate the safety-efficiency trade-off, we report acceptance-related statistics alongside two customized risk-oriented rates:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{16}$$

$$\mathrm{FNR} = \frac{FN}{FN + TP} \tag{17}$$

Here, FPR denotes the proxy unsafe-acceptance rate (accepting proxy high-risk cases), and FNR denotes the proxy missed-opportunity rate (rejecting proxy low-risk cases). Notably, these metrics represent decision-level error rates based on the operational proxies, rather than standard detector-level false positives or false negatives. Jointly reporting FPR/FNR with Precision/Recall/F1 provides a compact description of the operating characteristics under Safety-First and Efficiency-First policies.
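The decision-level rates of Equations (12) to (14), (16), and (17) all derive from the same four counts, as the short sketch below shows. The counts are hypothetical and serve only to illustrate the arithmetic; "positive" here means the gate accepted the candidate for picking.

```python
def decision_metrics(tp, fp, tn, fn):
    """Binary gating metrics; positive = candidate accepted as harvestable."""
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),  # proxy unsafe-acceptance rate
        "fnr": fn / (fn + tp),  # proxy missed-opportunity rate
    }


# Hypothetical counts, not taken from Table 8.
m = decision_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Equations (14) and (17) share the denominator TP + FN, so Recall and FNR always sum to one; FPR, by contrast, depends only on the proxy high-risk cases.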

3.3. Performance Comparison of Different Fruit Detection Models

To evaluate the suitability of different object detectors for greenhouse melon scenes, we compared several mainstream lightweight detectors under the same dataset split, input resolution (640 × 640), and training protocol. Detection accuracy was evaluated using mAP@0.5 and mAP@0.5:0.95, together with Precision and Recall@0.5. Inference speed (FPS) and model weight size (MB) were also reported to reflect real-time performance and deployment cost. The results are summarized in Table 3.

Among all compared detectors, YOLO11n achieved the highest mAP@0.5 (94.8%) and a high Recall@0.5 (89.9%), indicating improved detection reliability in the presence of illumination variation, occlusion, and fruit clustering. From a greenhouse harvesting perspective, higher recall is particularly important because missed detections directly translate into missed harvesting opportunities, whereas moderate over-detection can be further filtered by subsequent harvestability decision modules. From a deployment perspective, YOLO11n delivered 113.3 FPS with a compact model size of 5.24 MB, providing a favorable balance between accuracy, recall, speed, and model footprint. Therefore, YOLO11n was selected as the front-end detector for subsequent ROI construction, fruit-state perception, and risk-gated harvestability decision-making.

To verify the training stability of the selected YOLO11n model, its complete training curves are provided in Figure 8.

3.4. Occlusion Classification Network Design and Performance Comparison

Occlusion level is a key state variable affecting harvesting safety, and failing to identify severe occlusion (occ = 2) can readily lead to proxy unsafe acceptance of high-risk targets. Therefore, we formulate fruit occlusion recognition as an ROI-based three-class classification task and conduct a systematic evaluation from three aspects: backbone selection, attention-module design, and loss-function configuration.

3.4.1. Baseline Backbone Comparison: Architecture–Accuracy–Speed Trade-Off

We first compare several lightweight backbones under the same training protocol and input size (224 × 224) and report overall accuracy (Acc), Macro-F1, class-wise recall for occ = 0/1/2, and inference latency per ROI. The results are summarized in Table 4.

Overall, ShuffleNetV2 achieves the best balance between classification performance (Acc = 85.4%, Macro-F1 = 82.8%) and latency (6.70 ms per ROI) among the evaluated lightweight backbones, and is therefore selected as the baseline for subsequent attention-module and loss-function ablation studies.

3.4.2. Effect of Attention Mechanisms on Occlusion Recognition Performance

To evaluate the effectiveness of attention mechanisms for occlusion recognition, we compared four variants under the same backbone (ShuffleNetV2) and identical training settings, with the loss function fixed to standard cross-entropy (CE): no attention (Baseline), SE, CBAM, and the proposed EGSA. The results are summarized in Table 5.

Overall, introducing attention improves Macro-F1 and increases recall for the safety-critical class (occ = 2). Among the tested modules, EGSA achieves the best overall performance (Acc = 87.5%, Macro-F1 = 85.2%, Recall (occ = 2) = 97.0%) while maintaining low inference latency (6.85 ms per ROI). Compared with SE and CBAM, EGSA provides a stronger accuracy/F1 gain with comparable or lower latency, suggesting that edge-guided spatial weighting can better emphasize occlusion-relevant boundary regions and suppress background texture interference under greenhouse conditions. These results support EGSA as an effective and lightweight attention design for edge-oriented occlusion recognition. This mechanism is illustrated using Grad-CAM heatmaps (Figure 9), generated using an in-house implementation based on PyTorch autograd (PyTorch v1.13.1) and saved with Matplotlib v3.7.5. For occluded targets, the EGSA classifier shows high activation concentrated near the fruit–leaf boundary. This suggests that the harvestability gate is influenced by boundary-related cues near fruit–leaf contact regions, rather than only global texture patterns.

3.4.3. Effect of Different Loss Functions on Occlusion Classification Performance

To address class imbalance among occlusion levels and the asymmetric cost of misclassifying safety-critical cases, we compared three loss functions for occlusion classification: standard cross-entropy (CE), weighted cross-entropy (WCE), and focal loss. All settings were kept identical except for the loss function. The results are reported in Table 6, and the convergence behavior during training is illustrated in Figure 10.

Overall, WCE achieves the best performance across the evaluated metrics. Compared with CE, WCE improves Acc from 85.4% to 86.9% and Macro-F1 from 82.8% to 84.5%, while also increasing Recall (occ = 2) from 96.2% to 97.0%. In contrast, focal loss provides a smaller gain for the severe-occlusion class and remains inferior to WCE in Acc and Macro-F1. Therefore, WCE is selected as the training loss for the occlusion classification module in subsequent experiments.

3.4.4. Ablation Study of the Occlusion Classification Model

Based on the comparative results of backbone selection, attention mechanisms, and loss functions, the final occlusion classifier is configured as ShuffleNetV2 + EGSA + WCE. EGSA incurs only a small inference overhead, with latency increasing from 6.70 to 6.85 ms per ROI, while WCE affects training only and does not increase inference latency. As shown in Table 7, the combination of EGSA and WCE yields the best overall performance, achieving an accuracy of 89.2%, a Macro-F1 of 87.8%, and the highest Recall (occ = 2) of 97.5%. This provides a more reliable occlusion-state input for downstream risk modeling and harvestability gating.

Relative to the baseline configuration (ShuffleNetV2 trained with CE and without attention), the final model improves accuracy by 3.8 percentage points (85.4% to 89.2%), Macro-F1 by 5.0 points (82.8% to 87.8%), and Recall (occ = 2) by 1.3 points (96.2% to 97.5%). These results suggest that EGSA and WCE provide complementary benefits for robust occlusion recognition under greenhouse conditions. In greenhouse harvesting, severely occluded fruits typically imply limited graspable area and a higher likelihood of failed or damaging attempts; improving recognition of occ = 2 therefore helps the RGHD gate reject high-risk candidates more reliably, enhancing decision safety rather than only boosting Macro-F1.

3.5. Performance Comparison of Harvestability Gating Under Different Policies

To characterize the safety–efficiency trade-off, we compared two operating policies (Safety-First and Efficiency-First) on candidates matched with IoU ≥ 0.5. All decision statistics are computed under the proxy definition in Section 3.2.2. Table 8 reports the policy operating characteristics using Precision/Recall/F1 and the two risk-oriented rates (FPR and FNR).

The two policies represent distinct operating points. Safety-First yields the lower proxy unsafe-acceptance rate (FPR = 4.4% versus 8.7%), reflecting more conservative acceptance, whereas Efficiency-First yields the lower proxy missed-opportunity rate (FNR = 4.7% versus 20.0%) and the higher acceptance recall (91.0% versus 61.3%), reflecting more aggressive acceptance. The trade-off is therefore explicit and tunable under the adopted occlusion/overlap/scale-based proxy: Safety-First prioritizes avoiding proxy high-risk acceptances, while Efficiency-First prioritizes throughput at the cost of accepting more proxy high-risk candidates. This mirrors the choice faced in greenhouse harvesting practice, where conservative strategies protect the crop and aggressive strategies favor throughput at increased harvesting risk; RGHD exposes this choice so that the policy can be selected according to production priorities. In subsequent experiments, Safety-First is used as the default setting and Efficiency-First is reported as a reference operating mode.
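As a quick check on the size of the difference, the change in proxy FPR between the two policies can be expressed in both absolute and relative terms. The sketch below takes the Efficiency-First FPR in Table 8 as the reference; the function name is our own.

```python
def relative_reduction(reference, value):
    """Fractional reduction of `value` with respect to `reference`."""
    return (reference - value) / reference


fpr_efficiency_first = 8.7  # %, Table 8
fpr_safety_first = 4.4      # %, Table 8

absolute_drop = fpr_efficiency_first - fpr_safety_first
relative_drop = relative_reduction(fpr_efficiency_first, fpr_safety_first)

print(f"absolute: {absolute_drop:.1f} percentage points")
print(f"relative: {relative_drop:.1%}")
```

The relative figure evaluates to about 49.4%, in line with the reduction quoted for the Safety-First policy.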

3.6. Consistency with Manual Judgment

To assess practical reliability, we compared RGHD decisions with manual expert judgments on 2026 test targets. Under the Safety-First policy, the system produced 953 harvest and 1073 skip decisions, with an overall agreement of 92.7%. Table 9 shows an asymmetric error pattern: 110 cases were classified as skip when experts judged them harvestable, whereas 38 cases were classified as harvest when experts judged them unsafe. This imbalance indicates that, under Safety-First, RGHD tends to err on the side of skipping uncertain targets, thereby reducing the likelihood of collision-prone attempts in unstructured greenhouse scenes.
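The agreement figure and the asymmetry of the disagreements follow directly from the counts in Table 9. A minimal sketch of that arithmetic is shown here; the variable names are ours.

```python
# Counts from Table 9, keyed as (manual judgment, RGHD decision).
table9 = {
    ("safe", "harvest"): 915,
    ("safe", "skip"): 110,
    ("unsafe", "harvest"): 38,
    ("unsafe", "skip"): 963,
}

total = sum(table9.values())
agree = table9[("safe", "harvest")] + table9[("unsafe", "skip")]
agreement = agree / total

# Disagreements split by direction: conservative skips vs. risky accepts.
conservative_errors = table9[("safe", "skip")]
risky_errors = table9[("unsafe", "harvest")]

print(f"targets: {total}, agreement: {agreement:.1%}")
print(f"conservative disagreements: {conservative_errors}, risky disagreements: {risky_errors}")
```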

3.7. Crowding-Stratified Decision Behavior

We further examine the Safety-First gate under different crowding levels using an overlap-density index computed from detection boxes. For each frame, the crowding index is defined as the mean of the per-candidate maximum IoU values, and frames are grouped into Low/Medium/High strata by tertiles of this index (i.e., the bottom, middle, and top third of frames ranked by overlap density). Table 10 summarizes the gate’s operating behavior across the three strata.

With increasing crowding, overlap becomes more frequent and the Safety-First gate exhibits more conservative acceptance, as reflected by a decreasing accept rate from Low to High crowding. This provides an interpretable, scene-consistent characterization of the gate’s behavior under clustered canopies. Figure 11 provides representative qualitative examples under different crowding levels, illustrating accepted and rejected candidates produced by the gate.
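The crowding index and the tertile grouping described above can be sketched as follows. This is an illustrative reading of the definition rather than the authors' code: boxes are assumed to be `(x1, y1, x2, y2)` tuples, a frame with fewer than two candidates is assigned an index of 0, and ties are broken by rank order, all of which are our assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def crowding_index(boxes):
    """Mean over candidates of each candidate's maximum IoU with any other box."""
    if len(boxes) < 2:
        return 0.0  # assumption: a lone candidate has no overlap
    per_candidate = [
        max(iou(box, other) for j, other in enumerate(boxes) if j != i)
        for i, box in enumerate(boxes)
    ]
    return sum(per_candidate) / len(per_candidate)


def tertile_strata(indices):
    """Label each frame Low/Medium/High by the rank tertile of its index."""
    n = len(indices)
    order = sorted(range(n), key=lambda i: indices[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = ("Low", "Medium", "High")[min(3 * rank // n, 2)]
    return labels


frames = [
    [(0, 0, 10, 10)],                                          # isolated fruit
    [(0, 0, 10, 10), (5, 0, 15, 10)],                          # two overlapping fruits
    [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)],    # cluster plus a distant fruit
]
indices = [crowding_index(f) for f in frames]
print(list(zip(indices, tertile_strata(indices))))
```

Because the grouping is rank-based, strata sizes are equal only when the index has no ties at the cut points; a threshold-based split would behave differently in that case.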

3.8. System Integration and Prototype Verification

3.8.1. Validation Environment and Workflow

To verify the engineering feasibility of the proposed RGHD framework, we implemented a prototype and deployed the complete perception-to-decision pipeline on a laptop computer. The prototype takes a single RGB frame as input and processes greenhouse melon images sequentially. The end-to-end workflow includes fruit detection, ROI construction, occlusion-level recognition, overlap-risk estimation, multi-source risk fusion, and harvestability output. For verification and analysis, the prototype visualizes intermediate and final results as on-image overlays and exports the corresponding outputs (e.g., predicted labels and decision results) for subsequent error-case inspection. The system interface is shown in Figure 12.

3.8.2. Prototype Implementation and Integration

The prototype was developed in Python and integrates the perception and decision modules using PyTorch, YOLO11n based on the Ultralytics repository (v8.4.3), and OpenCV (opencv-python v4.10.0.84). The functional modules are summarized as follows:

Image input: sequential image loading and queue management;
Fruit detection: fruit target localization using YOLO11n;
ROI construction: ROI cropping from detected boxes with a fixed outward margin;
Occlusion perception: occlusion-level prediction using the lightweight CNN classifier;
Overlap risk: overlap level inference from pairwise box relationships;
Gated decision: rule-based fusion of occlusion, overlap, and scale-related cues to output the harvestability decision;
Visualization and export: overlay rendering and result export for qualitative inspection and error analysis.
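The module sequence listed above can be illustrated with a small orchestration sketch. It is not the prototype's code: the detector, occlusion classifier, overlap estimator, and gate are passed in as plain callables so the example stays self-contained, and the 8-pixel margin, the `Candidate` record, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Candidate:
    box: Box
    roi: Box
    occ_level: int
    ovl_level: int
    harvest: bool


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a box outward by a fixed margin, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def process_frame(
    boxes: List[Box],
    width: int,
    height: int,
    classify_occlusion: Callable[[Box], int],
    overlap_level: Callable[[Box, List[Box]], int],
    gate: Callable[[int, int, Box], bool],
    margin: int = 8,  # hypothetical margin; the value used is not stated here
) -> List[Candidate]:
    """Run ROI construction, occlusion/overlap estimation, and gating per box."""
    results = []
    for i, box in enumerate(boxes):
        roi = expand_box(box, margin, width, height)
        occ = classify_occlusion(roi)
        ovl = overlap_level(box, boxes[:i] + boxes[i + 1:])
        results.append(Candidate(box, roi, occ, ovl, gate(occ, ovl, box)))
    return results


# Usage with stand-in callables in place of the trained models.
detections = [(5, 5, 50, 50), (600, 400, 640, 480)]
out = process_frame(
    detections, 640, 480,
    classify_occlusion=lambda roi: 0,
    overlap_level=lambda box, others: 0,
    gate=lambda occ, ovl, box: occ == 0 and ovl == 0,
)
print(out)
```

Injecting the stages as callables keeps the decision layer independent of any particular detector or classifier, which matches the decoupled design described for RGHD.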

3.8.3. Prototype Verification Results

Prototype verification shows that the system can stably complete the full workflow of “detection–occlusion perception–risk fusion–harvestability decision” on the target platform and generate consistent visualization overlays and exportable results. This suggests functional completeness and integration feasibility at the prototype-validation stage, providing a practical implementation basis for subsequent deployment and policy tuning in greenhouse melon harvesting scenarios.

4. Discussion

4.1. The Necessity of Risk-Gated Reasoning

Detection alone does not indicate that a fruit is safe or feasible to pick in greenhouse harvesting. Muskmelons are often partially occluded by vines/leaves or located in crowded clusters, which increases collision risk if actions are triggered from bounding boxes alone. RGHD, therefore, treats occlusion, crowding, and target scale as explicit risk cues and approves a picking action only when the estimated risk satisfies the Safety-First criterion [30]. This reframes perception outputs as harvestability screening aligned with manipulation constraints.

4.2. Decision Support Under Deployment Constraints

RGHD targets edge deployment and avoids computationally intensive 3D point-cloud segmentation. Occlusion risk is estimated from EGSA-enhanced boundary cues, and clustering risk is approximated using a lightweight 2D IoU proxy. The same pipeline supports “Safety-First” and “Efficiency-First” modes without retraining.
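One way to see why switching modes needs no retraining is to treat each policy as a set of gate thresholds applied to the same model outputs. The sketch below is a hypothetical illustration: the threshold values, field names, and the exact form of the rule are assumptions and do not reproduce the gate defined in Section 2.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePolicy:
    max_occ_level: int    # highest occlusion level (0-2) still accepted
    max_ovl_level: int    # highest overlap level (0-2) still accepted
    min_box_side_px: int  # reject targets that appear too small
    max_box_side_px: int  # reject oversized boxes


# Hypothetical presets; the paper's actual thresholds are not given in this section.
SAFETY_FIRST = GatePolicy(max_occ_level=0, max_ovl_level=0,
                          min_box_side_px=64, max_box_side_px=400)
EFFICIENCY_FIRST = GatePolicy(max_occ_level=1, max_ovl_level=1,
                              min_box_side_px=32, max_box_side_px=480)


def is_harvestable(occ_level, ovl_level, box_side_px, policy):
    """Accept a candidate only if every risk cue is within the policy limits."""
    return (
        occ_level <= policy.max_occ_level
        and ovl_level <= policy.max_ovl_level
        and policy.min_box_side_px <= box_side_px <= policy.max_box_side_px
    )


# The same perception outputs lead to different decisions under the two presets.
candidate = {"occ_level": 1, "ovl_level": 0, "box_side_px": 80}
print("Safety-First:", is_harvestable(policy=SAFETY_FIRST, **candidate))
print("Efficiency-First:", is_harvestable(policy=EFFICIENCY_FIRST, **candidate))
```

Since only the preset changes, the detector and the occlusion classifier are untouched when the operating mode is switched.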

We also tested a single-stage alternative that labels boxes as “pickable” or “unpickable”. In RGHD, occlusion is graded on cropped ROIs by an auxiliary CNN, leveraging leaf–fruit boundary patterns that box-level scores and post-processing do not explicitly encode. The single-stage YOLO baseline showed poorer safety-related behavior, with more errors on unpickable targets and reduced precision on pickable ones. A plausible reason is a learning trade-off: optimizing global box regression can weaken sensitivity to fine occlusion cues. Separating candidate generation from risk assessment helps keep false acceptances low (e.g., 4.4% under Safety-First), supporting collision avoidance.

4.3. Practical Deployment and System Integration

This work is presented as a risk-aware decision layer on top of detection, rather than a complete on-robot deployment study. In a practical harvesting system, RGHD can be integrated into an ROS-based pipeline as an intermediate filter within a standard “Perception–Decision–Action” loop. Specifically, YOLO11n first detects candidate fruits and outputs their bounding boxes. Each candidate is then cropped to an ROI and evaluated by the risk module, which combines ShuffleNetV2 occlusion grading with overlap- and scale-based risk cues. Only candidates classified as “Harvestable” under the selected policy are forwarded to the motion planner as target coordinates. Candidates rated “Unsafe” are skipped, and the system proceeds to the next target; they are not permanently discarded and can be re-evaluated in subsequent harvesting cycles.

In terms of runtime, the sequential detection–crop–classification pipeline introduces latency that increases with the number of detected candidates. The measured processing time is ~6.85 ms per ROI; for a typical scene containing 4–5 melons, the added delay is ~27–34 ms. Although the current evaluation was conducted on a workstation, this sub-100 ms overhead is small relative to the seconds-scale execution time of manipulator motions during greenhouse harvesting.
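The latency estimate above is a linear scaling of the per-ROI time with the number of candidates. The sketch below reproduces that arithmetic and also reports how many ROIs fit within a given budget; it covers only the classification stage and ignores detector and I/O time, and the function names are ours.

```python
PER_ROI_MS = 6.85  # measured classification time per ROI, from the text


def added_latency_ms(num_rois, per_roi_ms=PER_ROI_MS):
    """Added delay when ROIs are classified one after another."""
    return num_rois * per_roi_ms


def max_rois_within(budget_ms, per_roi_ms=PER_ROI_MS):
    """Largest number of ROIs whose sequential classification fits the budget."""
    return int(budget_ms // per_roi_ms)


for n in (4, 5):
    print(f"{n} ROIs -> {added_latency_ms(n):.2f} ms")
print("ROIs within a 100 ms budget:", max_rois_within(100.0))
```

Under these assumptions, the sub-100 ms figure holds for scenes with up to 14 candidates; batching ROIs through the classifier would likely change the scaling.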

4.4. Limitations and Future Work

First, validation used smartphone imagery; deployment should be evaluated with robotic-grade industrial sensors to assess domain shift. Second, the 2D overlap proxy simplifies the 3D workspace; depth cues or multi-view geometry [31,32] may better capture approach/grasp constraints. Finally, the current study validates perception and decision-making offline; the next step is on-robot testing to quantify end-to-end picking success, collisions, and crop safety in greenhouse trials.

5. Conclusions

This study presents a Risk-Gated Harvestability Decision (RGHD) framework for greenhouse muskmelon harvesting under foliage occlusion and fruit crowding. RGHD separates fruit discovery from harvestability judgment. A lightweight YOLO11n detector is used to locate fruits, and an EGSA-enhanced ShuffleNetV2 classifier estimates occlusion. Harvestability is then decided with a configurable gate that combines occlusion level, overlap-based interference (IoU), and a scale cue, enabling Safety-First and Efficiency-First operating modes.

YOLO11n achieved 75.8% mAP@0.5:0.95 at 113.3 FPS. Under Safety-First, the proxy unsafe-acceptance rate (FPR) decreased from 8.7% to 4.4% (a 49.4% relative reduction), while decision precision remained at 88.0%; the Efficiency-First mode accepted more candidates, reaching 91.0% recall. These results suggest that RGHD can reduce risky picking decisions and help limit fruit and vine damage in greenhouse production, and the same idea can be applied to other crops with similar occlusion and crowding. Future work will integrate depth or multi-view cues and verify decisions with real picking outcomes.

Author Contributions

Conceptualization, S.S. and G.Z.; methodology, S.S. and H.Q.; software, S.S., H.Q. and S.W.; validation, H.Q., S.W. and H.Y.; formal analysis, S.S. and H.Q.; investigation, S.S., H.Q. and Y.H.; resources, H.Q., S.W., H.Y. and Y.H.; data curation, S.S. and Y.H.; writing—original draft preparation, S.S.; writing—review and editing, G.Z., H.Q. and S.W.; visualization, S.S.; supervision, G.Z. and H.Q.; project administration, H.Y.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Department of Science and Technology of Shandong Province, grant numbers 2022CXGC020701 and YDZX2024020.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, C.; Ni, Y.; Han, T.; Ji, W. Research on Target Detection of Muskmelon Mechanized Picking. Agric. Eng. 2023, 13, 56–60. (In Chinese)
2. Meng, F.-X.; Shi, L.-J. Key Technologies for Quality and Efficiency Improvement of Protected Muskmelon. Agric. Eng. Technol. 2024, 44, 80–81. (In Chinese)
3. Xiao, Z.; Xie, Y.; Niu, Y.; Zhu, J. Identification of Key Aroma Compounds in Chinese Muskmelon and Their Formation Mechanisms. Eur. Food Res. Technol. 2021, 247, 777–795.
4. Marinoudi, V.; Sørensen, C.G.; Pearson, S.; Bochtis, D. Robotics and Labour in Agriculture: A Context Consideration. Biosyst. Eng. 2019, 184, 111–121.
5. Bac, C.W.; van Henten, E.J.; Hemming, J.; Edan, Y. Harvesting Robots for High-Value Crops: State-of-the-Art Review and Challenges Ahead. J. Field Robot. 2014, 31, 888–911.
6. Chen, Z.; Lei, X.; Yuan, Q.; Qi, Y.; Ma, Z.; Qian, S.; Lyu, X. Key Technologies for Autonomous Fruit- and Vegetable-Picking Robots: A Review. Agronomy 2024, 14, 2233.
7. Huang, Y.; Xu, S.; Chen, H.; Li, G.; Dong, H.; Yu, J.; Zhang, X.; Chen, R. A review of visual perception technology for intelligent fruit harvesting robots. Front. Plant Sci. 2025, 16, 1646871.
8. Zhang, J.; Kang, N.; Qu, Q.; Zhou, L.; Zhang, H. Automatic fruit picking technology: A comprehensive review of research advances. Artif. Intell. Rev. 2024, 57, 54.
9. Droukas, L.; Doulgeri, Z.; Tsakiridis, N.L.; Triantafyllou, D.; Kleitsiotis, I.; Mariolis, I.; Giakoumis, D.; Tzovaras, D.; Kateris, D.; Bochtis, D. A Survey of Robotic Harvesting Systems and Enabling Technologies. J. Intell. Robot. Syst. 2023, 107, 21.
10. Fountas, S.; Mylonas, N.; Malounas, I.; Rodias, E.; Santos, C.H.; Pekkeriet, E. Agricultural Robotics for Field Operations. Sensors 2020, 20, 2672.
11. Oliveira, L.F.P.; Moreira, A.P.; Silva, M.F. Advances in Agriculture Robotics: A State-of-the-Art Review. Robotics 2021, 10, 52.
12. Zhao, Y.; Gong, L.; Huang, Y.; Liu, C. Robust Tomato Recognition for Robotic Harvesting Using Feature Images. Sensors 2016, 16, 173.
13. Zhang, Q.; Chen, J.; Li, B.; Xu, C. RGB–D Fusion-Based Recognition and Localization of Tomato Cluster Picking Points. Trans. Chin. Soc. Agric. Eng. 2021, 37, 143–152. (In Chinese)
14. Fan, X.; Zhang, Y.; Zhou, S.; Ren, M.; Wang, Y.; Chai, X. Recognition and Position Method of Tomato Picking Robot Based on Improved YOLOv8s and RGB–D Information Fusion. Trans. Chin. Soc. Agric. Eng. 2025, 41, 106–116. (In Chinese)
15. Wang, Z.; Wang, J.; Wang, X.; Shi, J.; Bai, X.; Zhao, Y. Lightweight Real-time Apple Detection Method Based on Improved YOLO v4. Trans. Chin. Soc. Agric. Mach. 2022, 53, 294–302. (In Chinese)
16. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426.
17. Wang, A.; Qian, W.; Li, A.; Xu, Y.; Hu, J.; Xie, Y.; Zhang, L. NVW-YOLOv8s: An improved YOLOv8s network for real-time detection and segmentation of tomato fruits at different ripeness stages. Comput. Electron. Agric. 2024, 219, 108833.
18. Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic segmentation of agricultural images: A survey. Inf. Process. Agric. 2024, 11, 172–186.
19. Ma, B.; Xu, J.; Liu, R.; Mu, J.; Li, B.; Xie, R.; Liu, S.; Hu, X.; Zheng, Y.; Zhang, H.; et al. MDAS-YOLO: A Lightweight Adaptive Framework for Multi-Scale and Dense Pest Detection in Apple Orchards. Horticulturae 2025, 11, 1273.
20. Ma, B.; Wu, Z.; Ge, Y.; Chen, B.; Lin, J.; Zhang, H.; Xia, H. A Lightweight Segmentation Model Method for Marigold Picking Point Localization. Horticulturae 2026, 12, 97.
21. Lang, Y.; Zhang, Y.; Sun, T.; Chai, X.; Zhang, N. Digital twin-driven system for efficient tomato harvesting in greenhouses. Comput. Electron. Agric. 2025, 236, 110451.
22. Sun, T.; Zhang, W.; Gao, X.; Zhang, W.; Li, N.; Miao, Z. Efficient occlusion avoidance based on active deep sensing for harvesting robots. Comput. Electron. Agric. 2024, 225, 109360.
23. Yi, T.; Zhang, D.; Luo, L.; Wang, Y.; Liu, B. View Planning for Grape Harvesting Based on Self-Supervised Deep Reinforcement Learning under Occlusion. Comput. Electron. Agric. 2025, 239, 110913.
24. Dong, L.; Zhu, L.; Zhao, B.; Wang, R.; Ni, J.; Liu, S.; Chen, K.; Cui, X.; Zhou, L. Semantic segmentation-based observation pose estimation method for tomato harvesting robots. Comput. Electron. Agric. 2025, 230, 109895.
25. He, Z.; Liu, Z.; Zhou, Z.; Karkee, M.; Zhang, Q. Improving picking efficiency under occlusion: Design, development, and field evaluation of an innovative robotic strawberry harvester. Comput. Electron. Agric. 2025, 237, 110684.
26. Zhang, J.; Xie, J.; Zhang, F.; Gao, J.; Yang, C.; Song, C.; Rao, W.; Zhang, Y. Greenhouse tomato detection and pose classification algorithm based on improved YOLOv5. Comput. Electron. Agric. 2024, 216, 108519.
27. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
28. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 122–138.
29. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 658–666.
30. Sun, T.; Zhang, W.; Miao, Z.; Zhang, Z.; Li, N. Object Localization Methodology in Occluded Agricultural Environments through Deep Learning and Active Sensing. Comput. Electron. Agric. 2023, 212, 108141.
31. Ci, J.; Wang, X.; Rapado-Rincón, D.; Burusa, A.K.; Kootstra, G. 3D pose estimation of tomato peduncle nodes using deep keypoint detection and point cloud. Biosyst. Eng. 2024, 243, 57–69.
32. Liu, Z.; Abeyrathna, R.M.R.D.; Sampurno, R.M.; Nakaguchi, V.M.; Ahamed, T. A New Occlusion-Avoidance and Dual-View Fruit Localization Method with a 6-DoF Manipulator for Orchard Harvesting. Comput. Electron. Agric. 2025, 237, 110634.

Figure 1. Overview of the proposed RGHD framework.

Figure 2. Typical imaging scenarios in the constructed dataset: (a) front-lighting conditions; (b) back-lighting conditions; (c) fruit overlap; (d) leaf occlusion.

Figure 3. Network architecture of YOLO11n.

Figure 4. ROI examples illustrating occlusion levels (occ: 0–2, top) and fruit overlap levels (ovl: 0–2, bottom), with severity increasing from left to right.

Figure 5. EGSA (Edge-Guided Spatial Attention) module.

Figure 6. IoU-based fruit overlap risk estimation.

Figure 7. Risk-gated harvestability decision (RGHD).

Figure 8. Training and validation curves of the YOLO11n model.

Figure 9. Grad-CAM visualizations of the EGSA classifier.

Figure 10. Training loss curves for different loss functions.

Figure 11. Qualitative examples of harvestability gating under different crowding levels. From left to right: input RGB image, YOLO11n candidates, Safety-First decisions, and Efficiency-First decisions. Blue boxes denote YOLO11n candidate detections (before gating). Green/red boxes denote accepted/rejected candidates. The shown examples present representative cases in greenhouse muskmelon harvesting scenes: (a) oversized boxes, (b) leaf occlusion, (c) fruit overlap, and (d) small targets.

Figure 12. Interface of the mobile prototype system for RGHD-based harvestability decision.

Table 1. Occlusion and overlap level distribution.

Dataset	Image Number	Number of Objects	Occ = 0	Occ = 1	Occ = 2	Ovl = 0	Ovl = 1	Ovl = 2
Training set	3972	12,006	36.0%	23.1%	40.9%	61.0%	24.3%	14.7%
Validation set	1128	3156	38.4%	19.0%	42.6%	60.0%	27.6%	12.4%
Test set	570	2064	34.2%	21.0%	44.8%	49.1%	32.3%	18.0%

Note: Percentages are computed over the total number of fruit instances.

Table 2. Stage I and Stage II Model Training Hyperparameters.

Parameter Category	YOLO11n (Stage I)	Occlusion CNN (Stage II)
Initial learning rate	0.01	5 × 10⁻⁴
Optimizer weight decay	0.0005	0.01
Optimizer	SGD	AdamW
Training epochs	200	100
Image size	640 × 640	224 × 224
Batch size	16	16
Data-loading workers	8	8

Table 3. Comparison results of different detection models.

Detector	mAP@0.5	mAP@0.5:0.95	Precision	Recall@0.5	FPS	Model Size (MB)
RT-DETR(R18)	92.3%	76.3%	89.3%	88.6%	44.4	19.9
YOLOv5n	93.6%	75.8%	92.2%	86.6%	86.4	3.73
YOLOv8n	93.2%	74.6%	90.5%	86.8%	97.0	5.98
YOLOv10n	91.0%	73.2%	88.0%	82.8%	92.8	5.51
YOLO11n	94.8%	75.8%	92.5%	89.9%	113.3	5.24
YOLO12n	92.0%	72.2%	94.8%	79.7%	69.6	5.29
YOLO26n	89.3%	69.1%	91.5%	80.2%	75.1	5.14

Table 4. Comparison of different lightweight backbones for occlusion classification.

Model	Acc	Macro-F1	Recall (Occ = 0)	Recall (Occ = 1)	Recall (Occ = 2)	Latency per ROI (ms)
MobileNetV3-Small	84.8%	80.8%	93.8%	69.7%	95.9%	6.19
ShuffleNetV2	85.4%	82.8%	93.1%	67.4%	96.2%	6.70
EfficientNet-B0	84.8%	81.5%	91.7%	66.7%	95.6%	8.97
ResNet18	84.5%	81.2%	92.4%	67.0%	95.2%	4.14

Table 5. Effect of different attention mechanisms on occlusion recognition performance.

Model	Attention	Acc	Macro-F1	Recall (Occ = 2)	Latency per ROI (ms)
ShuffleNetV2	None	85.4%	82.8%	96.2%	6.70
ShuffleNetV2	SE	86.1%	83.5%	96.5%	6.91
ShuffleNetV2	CBAM	86.5%	84.0%	96.8%	6.98
ShuffleNetV2	EGSA	87.5%	85.2%	97.0%	6.85

Table 6. Effect of different loss functions on occlusion classification performance.

Loss Function	Acc	Macro-F1	Recall (Occ = 0)	Recall (Occ = 1)	Recall (Occ = 2)
CE	85.4%	82.8%	93.1%	67.4%	96.2%
Weighted CE	86.9%	84.5%	93.7%	68.6%	97.0%
Focal Loss	86.2%	83.8%	96.5%	66.3%	96.6%

Table 7. Ablation results of EGSA and WCE.

Backbone	EGSA	WCE	Acc	Macro-F1	Recall (Occ = 0)	Recall (Occ = 1)	Recall (Occ = 2)
ShuffleNetV2	–	–	85.4%	82.8%	93.1%	67.4%	96.2%
ShuffleNetV2	–	√	86.9%	84.5%	93.7%	68.6%	97.0%
ShuffleNetV2	√	–	87.5%	85.2%	93.7%	68.6%	97.0%
ShuffleNetV2	√	√	89.2%	87.8%	94.1%	69.5%	97.5%

Table 8. Policy comparison of harvestability gating.

Policy	FPR	FNR	Precision	Recall	F1
Safety-First	4.4%	20.0%	88.0%	61.3%	72.2%
Efficiency-First	8.7%	4.7%	84.4%	91.0%	87.6%

Table 9. Decision consistency between manual judgment and RGHD.

Manual Judgment	RGHD Harvest	RGHD Skip	Total
Safe to harvest	915	110	1025
Unsafe	38	963	1001
Total	953	1073	2026

Table 10. Policy behavior under crowding strata.

Crowding Stratum	Frames	Candidates/Frame	Accept Rate
Low	180	1.583	73.7%
Medium	135	4.148	45.5%
High	155	5.323	37.6%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
