Abstract
Accurate fig detection in complex environments remains a significant challenge: small targets, occlusion, and visually similar backgrounds are the main obstacles to intelligent harvesting. To address these challenges, this study proposes Fig-YOLO, an improved detection algorithm based on YOLOv11n with three targeted architectural modifications. First, a Spatial–Frequency Selective Convolution (SFSConv) module replaces conventional convolution in the backbone, jointly modeling spatial structure and frequency-domain texture features so that figs are more effectively discriminated from visually similar backgrounds. Second, an enhanced bi-branch attention mechanism (EBAM) is incorporated at the network’s terminal stage to strengthen the representation of key regions and improve robustness under severe occlusion. Third, a multi-branch dynamic sampling convolution (MFCV) module replaces the original C3k2 structure in the feature fusion stage, capturing figs of varying sizes through dynamic sampling and residual deep-feature fusion. Experimental results show that Fig-YOLO achieves a precision of 89.2%, a recall of 78.4%, and an mAP@0.5 of 87.3%, substantially outperforming the baseline YOLOv11n. Further evaluation confirms that the model maintains stable performance across varying fruit sizes, occlusion levels, lighting conditions, and data sources. Fig-YOLO thus provides a practical basis for intelligent orchard monitoring and harvesting.
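For readers less familiar with the reported metrics, the following is a minimal sketch of how precision and recall are commonly computed for a detector at an IoU threshold of 0.5. It assumes the standard convention (greedy matching of predictions to ground-truth boxes in descending confidence order) and is not taken from the Fig-YOLO evaluation code; the names `iou` and `precision_recall` are hypothetical. mAP@0.5 additionally averages precision over the full precision-recall curve and over classes, which this single-operating-point sketch does not do.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],  # (box, confidence score)
    truths: List[Box],
    iou_thr: float = 0.5,            # the "0.5" in mAP@0.5
) -> Tuple[float, float]:
    """Precision and recall for one image and one class.

    Predictions are visited in descending confidence order; each one is
    matched to the unmatched ground-truth box with the highest IoU,
    provided that IoU is at least iou_thr.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1  # true positive: prediction matched a fig
    fp = len(preds) - tp   # unmatched predictions
    fn = len(truths) - tp  # missed ground-truth figs
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

Under this convention, a precision above recall, as in the figures reported here, means that missed figs (false negatives) outnumber spurious detections (false positives).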