Abstract
Traditional segmentation methods are slow and rely on labor-intensive manual annotation. To address these limitations, we propose YOLO-SAM AgriScan, a unified framework that combines the fast object detection of YOLOv11 with the zero-shot segmentation capability of the Segment Anything Model 2 (SAM2). Our approach adopts a hybrid paradigm for on-plant ripe strawberry segmentation, in which YOLOv11 is fine-tuned with a few-shot learning strategy on minimal annotated samples and SAM2 generates segmentation masks without additional supervision. This architecture eliminates the bottleneck of pixel-wise manual annotation and enables scalable, efficient segmentation of strawberries in both controlled and natural farm environments. Experimental evaluations on two datasets, a custom-collected dataset and a publicly available benchmark, demonstrate strong detection and segmentation performance in both full-data and data-constrained scenarios. The proposed framework achieved a mean Dice score of 0.95 and an IoU of 0.93 on our collected dataset and maintained competitive performance on the public benchmark (Dice: 0.95, IoU: 0.92), underscoring its robustness, generalizability, and practical relevance in real-world agricultural settings. Our results highlight the potential of combining few-shot detection with zero-shot segmentation to accelerate the development of annotation-light, intelligent phenotyping systems.
Keywords:
precision agriculture; strawberries; few-shot; detection; YOLO; SAM; zero-shot; segmentation