Figure 1.
Cotton leaf images under different conditions. (a) Densely occluded leaf, (b) overlapping leaves within the canopy, (c) isolated and unobstructed leaf from a single plant, and (d) leaf with dust deposition.
Figure 2.
Annotation interface of the X-Anylabeling platform integrating the Segment Anything 2.1 (Large) model. The graphical user interface shows the vertical toolbar, which includes (from top to bottom) icons for: file operations (open, save); image navigation; drawing tools (polygon, rectangle, circle, point, etc.); annotation editing (select, delete, undo); view control; and automated annotation.
Figure 3.
SAM-annotated cotton leaf image. (a) Raw field cotton leaf image; (b) leaf-annotated image with background removed.
Figure 4.
Annotation results from the purely manual workflow. (a) Densely occluded leaf, (b) overlapping leaves within the canopy, (c) isolated and unobstructed leaf, (d) leaf with dust deposition. The green dashed line is part of the annotation software’s interface captured in the screenshot.
Figure 5.
Annotation results from the SAM-assisted workflow. (a) Densely occluded leaf, (b) overlapping leaves within the canopy, (c) isolated and unobstructed leaf, (d) leaf with dust deposition. The green dashed line is part of the annotation software’s interface captured in the screenshot.
Figure 6.
Examples of data augmentation. Top row: Densely occluded leaves—(a) color and growth state adjustment, (b) geometric transformation, (c) mirroring combined with noise addition, (d) color variation. Second row: Overlapping canopy leaves—(e) color and growth state adjustment, (f) geometric transformation, (g) mirroring combined with noise addition, and (h) color variation. Third row: Unobstructed single-plant leaves—(i) color and growth state adjustment, (j) geometric transformation, (k) mirroring combined with noise addition, (l) color variation. Bottom row: Dust-adhered leaves—(m) color and growth state adjustment, (n) geometric transformation, (o) mirroring combined with noise addition, (p) color variation.
Figure 7.
Examples from the cross-crop evaluation dataset. (a) Pure cotton leaf images; (b) pure soybean leaf images; (c) mixed cotton-soybean leaf images.
Figure 8.
Architecture of the YOLOv11-seg model.
Figure 9.
Architecture of the GS-BiFPN-YOLO model.
Figure 10.
Structural diagram of the GSConv module.
Figure 11.
Architectural overview of the Bidirectional Feature Pyramid Network (BiFPN).
Figure 12.
Schematic of the Convolutional Block Attention Module (CBAM).
Figure 13.
Training results of the GS-BiFPN-YOLO model. Data points on the blue line show the metric value after each training epoch; the orange markers show the corresponding curve-smoothed (fitted) values.
Figure 14.
Comparison of segmentation performance between YOLOv11n-seg and GS-BiFPN-YOLO. (a) Densely occluded leaf images segmented by YOLOv11n-seg. (b) Densely occluded leaf images segmented by GS-BiFPN-YOLO. (c) Overlapping canopy leaf images segmented by YOLOv11n-seg. (d) Overlapping canopy leaf images segmented by GS-BiFPN-YOLO. (e) Single, unobstructed leaf images segmented by YOLOv11n-seg. (f) Single, unobstructed leaf images segmented by GS-BiFPN-YOLO. (g) Dust-adhered leaf images segmented by YOLOv11n-seg. (h) Dust-adhered leaf images segmented by GS-BiFPN-YOLO.
Figure 15.
Segmentation results of the GS-BiFPN-YOLO model in various scenarios: (a) dense occlusion, (b) group overlapping, (c) a single plant without occlusion, and (d) dust deposition.
Figure 16.
Examples of failure cases in four challenging scenarios: (a) dense occlusion, (b) overlapping canopy, (c) isolated leaf, (d) dust deposition.
Figure 17.
Cross-crop segmentation results. (a) Pure cotton leaf segmentation; (b) pure soybean leaf segmentation; (c) mixed cotton-soybean segmentation. The model maintains high segmentation quality across different crop types with minimal confusion between species.
Figure 18.
Performance metrics and dataset composition across crop types. (A) Average confidence score for cotton, soybean, and mixed image datasets. (B) Average number of object detections per image for each dataset type. (C) Dataset composition pie chart, showing an equal distribution of images across the three types (one-third each). Percentages are rounded to one decimal place for presentation. (D) Zero-shot performance comparison, plotting average detection rate against average confidence for each dataset type.
Table 1.
Quantitative comparison of annotation efficiency under different conditions.
| ID | Condition | Manual Time (s) | SAM-Assisted Time (s) | Time Saved (s) | Time Saved (%) |
|---|---|---|---|---|---|
| 001 | Densely occluded leaf | 855.35 | 241.37 | 613.98 | 71.78 |
| 003 | Densely occluded leaf | 854.36 | 355.09 | 499.27 | 58.44 |
| 016 | Densely occluded leaf | 732.92 | 245.67 | 487.25 | 66.48 |
| – | Densely occluded (avg.) | 814.21 | 280.71 | 533.50 | 65.55 |
| 041 | Overlapping leaves within the canopy | 660.83 | 175.52 | 485.31 | 73.44 |
| 148 | Overlapping leaves within the canopy | 1080.44 | 365.96 | 714.48 | 66.13 |
| 243 | Overlapping leaves within the canopy | 772.49 | 243.84 | 528.65 | 68.44 |
| – | Canopy overlap (avg.) | 837.92 | 261.77 | 576.15 | 69.33 |
| 481 | Isolated and unobstructed leaf from a single plant | 209.11 | 30.54 | 178.57 | 85.40 |
| 569 | Isolated and unobstructed leaf from a single plant | 295.92 | 45.57 | 250.35 | 84.60 |
| 570 | Isolated and unobstructed leaf from a single plant | 307.31 | 41.95 | 265.36 | 86.36 |
| – | Isolated leaf (avg.) | 270.78 | 39.35 | 231.43 | 85.45 |
| 678 | Leaf with dust deposition | 681.07 | 131.93 | 549.14 | 80.63 |
| 999 | Leaf with dust deposition | 686.63 | 150.38 | 536.25 | 78.11 |
| 1000 | Leaf with dust deposition | 784.34 | 90.59 | 693.75 | 88.45 |
| – | Dust deposition (avg.) | 717.35 | 124.30 | 593.05 | 82.40 |
| – | Overall average | 660.07 | 176.53 | 483.53 | 73.26 |
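The savings columns in Table 1 are simple derived quantities (time saved = manual − assisted; percent = saved / manual). A minimal sketch, using the densely occluded rows above, reproduces the per-row values and the group-average times:

```python
# Reproduce the derived columns of Table 1 for the densely occluded group.
manual = [855.35, 854.36, 732.92]     # manual annotation time (s), IDs 001/003/016
assisted = [241.37, 355.09, 245.67]   # SAM-assisted annotation time (s)

saved = [m - a for m, a in zip(manual, assisted)]
saved_pct = [100 * s / m for s, m in zip(saved, manual)]

avg_manual = sum(manual) / len(manual)        # 814.21
avg_assisted = sum(assisted) / len(assisted)  # 280.71
avg_saved = avg_manual - avg_assisted         # 533.50

print(round(saved[0], 2), round(saved_pct[0], 2))  # 613.98 71.78
```

Note that a group-average percentage can be computed either as the mean of the per-row percentages or from the ratio of the averaged times; the two differ slightly under rounding.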
Table 2.
Statistical Summary of Cross-Crop Evaluation Dataset.
| Dataset Type | Number of Images | Average Leaves per Image | Average Leaf Area (Pixels) | Total Leaves |
|---|---|---|---|---|
| Pure Cotton | 10 | 7.9 ± 7.19 | 41,956 ± 23,795 | 79 |
| Pure Soybean | 10 | 3.5 ± 3.95 | 38,856 ± 29,919 | 35 |
| Mixed Crops | 10 | 3.9 ± 3.57 | 39,581 ± 25,951 | 39 |
| Total | 30 | 5.1 ± 5.41 | 40,131 ± 26,343 | 153 |
Table 3.
Model training parameter configuration.
| Parameter | Value |
|---|---|
| Image size | 640 |
| Epochs | 300 |
| Batch size | 16 |
| Workers | 8 |
| Optimizer | SGD |
| Initial learning rate | 1 × 10⁻² |
| Weight decay | 5 × 10⁻⁴ |
| Momentum | 0.90 |
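Under the Ultralytics training interface, the settings in Table 3 map onto standard argument names. The sketch below is illustrative only: the weights file and dataset YAML path are placeholders, and the argument names assume the usual Ultralytics `train()` API rather than the authors' exact scripts.

```python
# Hyperparameters from Table 3, expressed as Ultralytics-style training arguments.
hparams = {
    "imgsz": 640,          # input image size (pixels)
    "epochs": 300,
    "batch": 16,
    "workers": 8,          # dataloader workers
    "optimizer": "SGD",
    "lr0": 1e-2,           # initial learning rate
    "weight_decay": 5e-4,
    "momentum": 0.90,
}

# Hypothetical usage (requires the ultralytics package and a dataset YAML):
# from ultralytics import YOLO
# YOLO("yolo11n-seg.pt").train(data="cotton_leaf.yaml", **hparams)
print(hparams["lr0"], hparams["weight_decay"])  # 0.01 0.0005
```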
Table 4.
Model performance comparison.
| Model | P (Box) | R (Box) | mAP50 (Box) | mAP50–95 (Box) | P (Mask) | R (Mask) | mAP50 (Mask) | mAP50–95 (Mask) | F1 | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | 0.770 | 0.801 | 0.800 | 0.801 | 0.774 | 0.805 | 0.803 | 0.688 | 0.789 | 2.44 | 180 |
| YOLOv8n-seg | 0.929 | 0.929 | 0.974 | 0.898 | 0.927 | 0.930 | 0.972 | 0.865 | 0.930 | 370 | 10.1 |
| YOLOv9t-seg | 0.915 | 0.857 | 0.944 | 0.758 | 0.913 | 0.864 | 0.949 | 0.711 | 0.890 | 243 | 8.2 |
| YOLOv10n-seg | 0.970 | 0.925 | 0.973 | 0.836 | 0.974 | 0.929 | 0.978 | 0.804 | 0.950 | 294 | 9.8 |
| YOLOv11n-seg | 0.935 | 0.933 | 0.977 | 0.906 | 0.935 | 0.932 | 0.975 | 0.868 | 0.930 | 333 | 9.6 |
| YOLOv12n-seg | 0.940 | 0.933 | 0.970 | 0.824 | 0.942 | 0.935 | 0.977 | 0.800 | 0.940 | 270 | 9.6 |
| GS-BiFPN-YOLO | 0.951 | 0.972 | 0.988 | 0.940 | 0.951 | 0.972 | 0.988 | 0.904 | 0.962 | 322 | 9.0 |
Table 5.
Fine-grained segmentation performance comparison (↑ indicates higher is better, ↓ indicates lower is better).
| Model | Dice ↑ | IoU ↑ | HD (px) ↓ |
|---|---|---|---|
| Mask R-CNN | 0.762 | 0.718 | 36.5 |
| YOLOv8n-seg | 0.926 | 0.865 | 45.5 |
| YOLOv9t-seg | 0.902 | 0.874 | 42.1 |
| YOLOv10n-seg | 0.931 | 0.824 | 38.6 |
| YOLOv11n-seg | 0.925 | 0.863 | 42.0 |
| YOLOv12n-seg | 0.933 | 0.877 | 41.2 |
| GS-BiFPN-YOLO | 0.935 | 0.881 | 39.7 |
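Dice and IoU in Table 5 are both overlap ratios between predicted and ground-truth masks, related deterministically by Dice = 2·IoU / (1 + IoU); HD (Hausdorff distance) instead measures the worst-case boundary error in pixels. A minimal sketch on toy masks, represented as sets of foreground pixel coordinates:

```python
# Toy predicted / ground-truth masks as sets of foreground pixel coordinates.
pred = {(0, 0), (0, 1), (1, 0)}
gt   = {(0, 1), (1, 0), (1, 1)}

inter = len(pred & gt)                    # 2 shared pixels
union = len(pred | gt)                    # 4 pixels total
iou = inter / union                       # 0.5
dice = 2 * inter / (len(pred) + len(gt))  # 2*2 / (3+3) = 0.666...

# Dice and IoU are monotonically related, so they rank models similarly;
# HD can disagree because it is driven by the single worst boundary pixel.
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
print(round(iou, 3), round(dice, 3))  # 0.5 0.667
```

This monotone relation explains why the Dice and IoU columns order the models almost identically, while the HD column does not.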
Table 6.
Effects of different modules on model performance.
| Group | GSConv | BiFPN | CBAM | P (Box) | R (Box) | mAP50 (Box) | mAP50–95 (Box) | P (Mask) | R (Mask) | mAP50 (Mask) | mAP50–95 (Mask) | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | 0.935 | 0.933 | 0.977 | 0.906 | 0.935 | 0.932 | 0.975 | 0.868 | 333 | 9.6 |
| 2 | ✓ | | | 0.934 | 0.932 | 0.975 | 0.897 | 0.934 | 0.932 | 0.974 | 0.861 | 333 | 8.8 |
| 3 | | ✓ | | 0.929 | 0.938 | 0.976 | 0.902 | 0.928 | 0.938 | 0.975 | 0.868 | 277 | 9.6 |
| 4 | ✓ | ✓ | | 0.938 | 0.937 | 0.977 | 0.905 | 0.939 | 0.937 | 0.977 | 0.871 | 323 | 8.9 |
| 5 | ✓ | ✓ | ✓ | 0.951 | 0.972 | 0.988 | 0.940 | 0.951 | 0.972 | 0.988 | 0.904 | 322 | 9.0 |
Table 7.
Stepwise analysis of incremental module contributions. All Δ values are calculated relative to the baseline YOLOv11n-seg (Group 1).
| Model Variant | ΔParams (M) | Params (M) | ΔGFLOPs | GFLOPs | ΔFPS | FPS | ΔmAP50–95 (Mask) | mAP50–95 (Mask) |
|---|---|---|---|---|---|---|---|---|
| YOLOv11n-seg | - | 2.84 | - | 9.6 | - | 333 | - | 0.868 |
| +GSConv | −0.33 | 2.51 | −0.8 | 8.8 | ±0 | 333 | −0.007 | 0.861 |
| +BiFPN | ±0.00 | 2.84 | ±0.0 | 9.6 | −56 | 277 | ±0.000 | 0.868 |
| GSConv + BiFPN | −0.32 | 2.52 | −0.7 | 8.9 | −10 | 323 | +0.003 | 0.871 |
| GS-BiFPN-YOLO | −0.32 | 2.52 | −0.6 | 9.0 | −11 | 322 | +0.036 | 0.904 |
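The Δ columns in Table 7 are plain differences against the Group 1 baseline; a small sketch using the baseline and final rows reproduces them:

```python
# Compute Table 7's deltas for the final model against the YOLOv11n-seg baseline.
baseline = {"params_m": 2.84, "gflops": 9.6, "fps": 333, "map5095_mask": 0.868}
final    = {"params_m": 2.52, "gflops": 9.0, "fps": 322, "map5095_mask": 0.904}

deltas = {k: final[k] - baseline[k] for k in baseline}
print({k: round(v, 3) for k, v in deltas.items()})
# {'params_m': -0.32, 'gflops': -0.6, 'fps': -11, 'map5095_mask': 0.036}
```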
Table 8.
Measured inference speed under different operational settings.
| Resolution | Batch Size | Avg. FPS | Latency per Image (ms) | Note |
|---|---|---|---|---|
| 640 × 640 | 1 | 86.3 | 11.6 | Measured real-time streaming performance |
| 640 × 640 | 8 | 136.4 | 7.3 | Measured small-batch processing performance |
| 960 × 960 | 1 | 72.4 | 13.8 | Measured high-resolution real-time performance |
| 640 × 640 | 16 | 322.0 | 3.1 | Theoretical peak throughput (from Section 3.1) |
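Per-image latency in Table 8 is just the reciprocal of throughput (latency ≈ 1000 / FPS, in ms); a quick check against the four rows:

```python
# Latency (ms per image) as the reciprocal of frames per second.
fps = [86.3, 136.4, 72.4, 322.0]
latency_ms = [round(1000 / f, 1) for f in fps]
print(latency_ms)  # [11.6, 7.3, 13.8, 3.1]
```

For the batched settings this is amortized per-image latency, not end-to-end latency for a single frame, which is why larger batches show lower per-image numbers.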
Table 9.
Cross-crop performance summary.
| Crop Type | Images | Avg. Confidence | Detections per Image | Avg. Mask Area (pixels) |
|---|---|---|---|---|
| Cotton | 10 | 0.669 ± 0.250 | 7.9 ± 7.19 | 41,956 ± 23,795 |
| Soybean | 10 | 0.535 ± 0.219 | 3.5 ± 3.95 | 38,856 ± 29,919 |
| Mixed | 10 | 0.645 ± 0.280 | 3.9 ± 3.57 | 39,581 ± 25,951 |