Abstract
Accurate detection and non-contact weight estimation of zucchini fruits are crucial for automated harvesting systems. This study presents a novel weakly supervised oriented object detection method for zucchini fruit recognition and weight prediction in complex greenhouse environments. Our approach, termed H2RBox-v2-SF, introduces three key enhancements to the original H2RBox-v2 model. First, Swin Transformer V2 (SwinV2) replaces the 50-layer Residual Network (ResNet-50) as the backbone network, strengthening feature extraction. Second, the Bi-directional Feature Pyramid Network (BiFPN) replaces the original Feature Pyramid Network (FPN) to achieve more efficient multi-scale feature fusion. Third, the FPDIoU loss replaces the CircumIoU loss, improving the accuracy and efficiency of bounding box regression. Furthermore, we propose a Morphology-based Fruit Weight Estimation (MFWE) algorithm that leverages depth information for non-contact weight estimation. Experimental results demonstrate that the improved model achieves an AP@0.75 of 87.8%, a precision of 69.8%, and a recall of 91.5%, representing improvements of 9.6%, 5.0%, and 4.7%, respectively, over the original model. Additionally, the weight estimation achieves a mean absolute error (MAE) of 55.05 g, a coefficient of determination (R²) of 0.899, and a root mean square error (RMSE) of 63.59 g. The proposed method achieves high accuracy for ‘Jinghu No. 43’ zucchini fruit detection and weight estimation under greenhouse conditions, offering an effective technical solution for automated zucchini harvesting.
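For readers unfamiliar with the three weight-estimation metrics reported above, the following minimal sketch shows how MAE, RMSE, and the coefficient of determination are conventionally computed from paired measured and estimated weights. The function name `regression_metrics` and the sample weights are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired measured and estimated values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Signed estimation errors: estimated minus measured.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical weights in grams, for illustration only (not data from the study).
measured = [300.0, 400.0, 500.0, 600.0]
estimated = [310.0, 390.0, 520.0, 580.0]
mae, rmse, r2 = regression_metrics(measured, estimated)
print(f"MAE = {mae:.2f} g, RMSE = {rmse:.2f} g, R^2 = {r2:.3f}")
```

MAE and RMSE are expressed in the same unit as the weights (grams), while the coefficient of determination is dimensionless. RMSE is never smaller than MAE because squaring weights large errors more heavily.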