4.4.1. Multi-Crop Transfer Performance Analysis
To validate the generalization capability of FEGW-YOLO beyond strawberry detection, we conducted extensive transfer learning experiments on three additional fruit crops commonly encountered in precision agriculture: tomatoes, apples, and grapes. These crops exhibit distinct visual characteristics and pose different detection challenges, enabling a comprehensive assessment of the model’s adaptability across diverse agricultural sensing scenarios.
Experimental Setup: For each crop type, we utilized publicly available datasets and applied minimal fine-tuning (20 epochs with frozen backbone) to adapt the strawberry-trained FEGW-YOLO model. The baseline YOLO-Agri model underwent identical fine-tuning procedures for fair comparison.
Table 12 summarizes the transfer learning performance across different crops.
Analysis of Transfer Performance:
Tomato Detection (92.7% mAP@0.5): Tomatoes present unique challenges due to their smooth texture and tendency to cluster in dense arrangements. The FEG-Conv module’s feature complexity metric successfully adapts to the reduced texture information. At the same time, the EMW-BiFPN effectively handles the multi-scale detection of cherry tomatoes (small) and beefsteak varieties (large). The 3.4% improvement over YOLO-Agri demonstrates that our lightweight architecture does not sacrifice adaptability for efficiency.
Apple Detection (94.1% mAP@0.5): Apple orchards typically feature more structured environments with less occlusion compared to strawberry fields. FEGW-YOLO achieves the highest absolute mAP among tested crops, benefiting from the Wise-IoU v3 loss function’s ability to handle the more precise object boundaries. The model successfully distinguishes between unripe (green) and ripe (red/yellow) apples under varying illumination conditions, validating its robustness to color variations across different fruit types.
Grape Detection (90.8% mAP@0.5): Grape clusters present the most challenging scenario due to extreme occlusion, irregular shapes, and small individual berry sizes. Despite these difficulties, FEGW-YOLO maintains competitive performance, with the 3.2% improvement attributed to the enhanced feature fusion capabilities of EMW-BiFPN. The model demonstrates particular strength in detecting harvest-ready clusters, a critical capability for automated vineyard management.
Cross-Crop Consistency: The consistent performance gains (average +2.9%) across all three crops validate the generalization capability of the FEGW framework. Notably, the model maintains identical computational costs (8.2 M parameters, 15.6 GFLOPs) across all crops, confirming that the lightweight architecture does not require crop-specific parameter tuning. This consistency is particularly valuable for multi-crop agricultural sensing systems where a single model must handle diverse detection tasks.
Feature Complexity Metric Validation and Layer-wise Distribution: To intuitively explain the performance variance, we analyzed the layer-wise distribution of FCD scores across crops. Strawberries exhibit a “High-Sustained” complexity profile, where FCD scores remain elevated (>0.7) even in deeper network layers (e.g., Layer 16–21), driven by the persistence of high-frequency achene textures. In contrast, tomatoes display a “Rapid-Decay” profile, where FCD scores drop sharply after the initial shallow layers due to their smooth surface morphology (Global Average FCD < 0.6). This distributional divergence provides the empirical root cause for the performance drop on TomatoNet-2023: the model’s compression schedule, optimized for the “High-Sustained” regime of strawberries, aggressively pruned feature channels in deeper layers that, for tomatoes, still contained essential albeit low-magnitude spatial cues. This confirms that FCD effectively captures the abstract concept of “visual richness” and directly correlates with the optimal compression depth.
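The exact FCD formula is not reproduced in this section, but the layer-wise profiling described above can be sketched with a simple proxy: score each layer's activations by the fraction of energy in high spatial frequencies (here estimated with a discrete Laplacian), then inspect how the score decays with depth. The function names and the Laplacian-based proxy are illustrative assumptions, not the paper's actual FCD definition.

```python
import numpy as np

def feature_complexity_proxy(fmap):
    """Illustrative proxy for a per-layer feature-complexity score (NOT the
    paper's FCD): fraction of activation energy in high spatial frequencies,
    estimated with a periodic discrete Laplacian over each channel.

    fmap: (C, H, W) activation tensor.
    Returns a score in [0, 1); smooth maps score low, textured maps high.
    """
    lap = (
        -4.0 * fmap
        + np.roll(fmap, 1, axis=1) + np.roll(fmap, -1, axis=1)
        + np.roll(fmap, 1, axis=2) + np.roll(fmap, -1, axis=2)
    )
    hf_energy = np.mean(lap ** 2)
    total_energy = np.mean(fmap ** 2) + 1e-8
    return float(hf_energy / (hf_energy + total_energy))

def complexity_profile(fmaps):
    """Score one activation tensor per layer. Under the analysis above, a
    'High-Sustained' crop (strawberry) keeps scores elevated in deep layers,
    while a 'Rapid-Decay' crop (tomato) drops off after the shallow layers,
    which would argue against aggressive deep-layer channel pruning."""
    return [feature_complexity_proxy(f) for f in fmaps]
```

A compression schedule tuned on such a profile would then prune deep-layer channels only where the profile has genuinely decayed, avoiding the failure mode described for the tomato transfer.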
4.4.2. Integration with Agricultural Robotic Harvesting Systems
The practical deployment of FEGW-YOLO in autonomous harvesting scenarios requires seamless integration with robotic end-effectors and real-time control systems. This section details the system architecture and operational workflow for robotic strawberry harvesting, demonstrating how our lightweight detection framework enables precision agriculture automation.
System Architecture:
The complete robotic harvesting system comprises four interconnected modules:
Vision Sensing Module: FEGW-YOLO deployed on NVIDIA Jetson Xavier NX (edge computing unit) processes RGB imagery from a Basler acA1920-40gc camera (1920 × 1080 resolution, 40 FPS) mounted on the robotic arm. The system achieves 38 FPS detection speed with 12.3 W power consumption, meeting real-time requirements for dynamic harvesting operations.
Spatial Localization Module: Detected 2D bounding boxes are projected into 3D space using depth information from an Intel RealSense D435i stereo camera. The lightweight nature of FEGW-YOLO (26.3 ms inference time) leaves sufficient computational headroom for parallel depth processing and point cloud generation on the same edge device.
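The 2D-to-3D projection step can be sketched with the standard pinhole back-projection: given a detected bounding box, the depth at its centre, and the camera intrinsics, recover camera-frame coordinates. The intrinsic values in the test are illustrative placeholders, not calibrated D435i parameters.

```python
import numpy as np

def backproject_bbox_center(bbox, depth_m, fx, fy, cx, cy):
    """Project a 2D bounding-box centre into camera-frame 3D coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.

    bbox: (x1, y1, x2, y2) in pixels; depth_m: metric depth at the centre;
    fx, fy, cx, cy: camera intrinsics (focal lengths and principal point).
    """
    x1, y1, x2, y2 = bbox
    u = (x1 + x2) / 2.0  # pixel coordinates of the box centre
    v = (y1 + y2) / 2.0
    X = (u - cx) * depth_m / fx
    Y = (v - cy) * depth_m / fy
    return np.array([X, Y, depth_m])
```

In practice the depth sample would be taken from the aligned D435i depth frame (e.g., a median over the box interior to reject stray returns from leaves) before back-projecting.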
Motion Planning Module: A 6-DOF robotic arm (Universal Robots UR5e) receives target coordinates from the spatial localization module and plans collision-free trajectories using the ROS (Robot Operating System) MoveIt framework [11]. The system prioritizes harvest-ready strawberries (Class 3: fully ripe) based on ripeness classification confidence scores output by FEGW-YOLO.
End-Effector Control Module: A custom soft gripper with force-feedback sensors executes gentle grasping (0.5–1.5 N grip force) to prevent fruit damage. The gripper’s approach angle is optimized based on the detected strawberry orientation (derived from the bounding-box aspect ratio and Grad-CAM++ attention maps [35]).
Operational Workflow: [Camera Capture] → [FEGW-YOLO Detection (26.3 ms)] → [Depth Fusion (8.5 ms)] → [3D Localization (12.1 ms)] → [Motion Planning (45 ms)] → [Grasping Execution (2–3 s)].
Total cycle time: Approximately 2.1–3.1 s per strawberry, achieving a harvesting rate of 19–28 fruits per minute under optimal conditions.
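The cycle-time figures above follow directly from the stage latencies in the workflow; for a sequential pipeline, the per-fruit time is the sum of the perception stages plus the grasp. A minimal sketch using the reported latencies:

```python
# Stage latencies from the workflow above, in seconds.
# The perception stages total ~92 ms; grasping execution dominates the cycle.
STAGES = {
    "detection": 0.0263,        # FEGW-YOLO inference
    "depth_fusion": 0.0085,
    "localization": 0.0121,     # 3D localization
    "motion_planning": 0.045,
}

def cycle_time(grasp_s):
    """Total time per fruit for a strictly sequential pipeline."""
    return sum(STAGES.values()) + grasp_s

def throughput_per_min(grasp_s):
    """Fruits harvested per minute at a given grasp duration."""
    return 60.0 / cycle_time(grasp_s)
```

With a 2–3 s grasp this yields roughly 2.09–3.09 s per fruit, i.e., about 19–29 fruits per minute, matching the reported 19–28 range; overlapping perception with arm motion would recover most of the ~92 ms perception overhead.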
Key Integration Advantages of FEGW-YOLO:
Real-Time Performance: The 38 FPS detection speed ensures minimal latency in the perception-to-action pipeline. Compared to heavier models (e.g., Mask R-CNN at 8 FPS on the same hardware), FEGW-YOLO reduces the vision processing bottleneck by 79%, enabling smoother robotic motion and higher throughput.
Multi-Class Ripeness Awareness: The model’s 95.1% mAP@0.5 across the three standardized ripeness stages (Unripe, Partially ripe, Ripe) enables intelligent harvesting strategies. The system can be configured to harvest only fully ripe strawberries (maximizing quality), selectively harvest partially ripe fruits (maximizing yield), or include both (balancing quality and yield).
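A ripeness-aware target selector for such strategies can be sketched as a simple filter over the detector's outputs. The class-name strings, tuple layout, and confidence threshold are illustrative assumptions, not the system's actual interface.

```python
# Illustrative class labels for the three standardized ripeness stages.
RIPE, PARTIAL, UNRIPE = "ripe", "partially_ripe", "unripe"

def select_targets(detections, strategy="quality", conf_thresh=0.5):
    """Select harvest targets from detector output.

    detections: list of (class_name, confidence, bbox) tuples.
    strategy: 'quality' harvests only fully ripe fruit;
              'yield' also accepts partially ripe fruit.
    Returns the accepted detections, highest confidence first, mirroring
    the planner's confidence-based prioritisation.
    """
    allowed = {RIPE} if strategy == "quality" else {RIPE, PARTIAL}
    picked = [d for d in detections
              if d[0] in allowed and d[1] >= conf_thresh]
    return sorted(picked, key=lambda d: d[1], reverse=True)
```

Switching strategies is then a configuration change rather than a model change, which is what makes the quality/yield trade-off operator-selectable.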
Occlusion Handling: The Wise-IoU v3 loss function’s robustness to partial occlusions directly translates to improved grasping success rates. Field trials show that FEGW-YOLO maintains 89.3% detection recall even when strawberries are 40–60% occluded by leaves, compared to 76.8% for the baseline YOLO-Agri model.
Energy Efficiency: The 12.3 W power consumption enables extended operation on battery-powered autonomous platforms [11]. A typical agricultural robot with a 500 Wh battery can run the FEGW-YOLO vision system continuously for over 40 h, compared to roughly 18 h for conventional detection models that consume 28 W.
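The runtime comparison above is straightforward battery arithmetic (capacity divided by average draw, ignoring drivetrain and other loads); a one-line sketch makes the figures reproducible:

```python
def vision_runtime_hours(battery_wh, power_w):
    """Continuous runtime of the vision system alone on a given battery,
    ignoring drivetrain, compute peripherals, and conversion losses."""
    return battery_wh / power_w
```

With a 500 Wh pack this gives about 40.7 h at 12.3 W versus about 17.9 h at 28 W, consistent with the figures quoted above.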
Field Deployment Results:
Preliminary field trials conducted in a commercial strawberry greenhouse (Jiangsu Province, China, May–June 2024) demonstrate the practical viability of the integrated system:
Harvesting Success Rate: 87.3% (successful grasp and detachment without damage)
False Positive Rate: 4.2% (attempted grasp on non-strawberry objects)
Missed Detection Rate: 8.5% (visible strawberries not detected)
Average Harvesting Speed: 23 strawberries per minute
Fruit Damage Rate: 2.1% (comparable to human pickers at 1.8%)
To further validate system reliability under varying illumination, we analyzed harvesting performance across three distinct temporal windows characterized by different color temperatures and contrast levels: Morning (07:00–09:00, diffused light, ~5000–6500 K), Noon (11:00–13:00, high contrast shadows, ~5500 K), and Evening (16:00–18:00, warm low-angle light, ~3000–4000 K). The system demonstrated remarkable stability, achieving success rates of 88.5%, 85.8%, and 87.6% respectively. The slight performance dip at noon is attributed to harsh shadows partially obscuring fruit stems, yet the consistent performance (>85%) across all regimes confirms that the FEGW-YOLO architecture effectively handles the dynamic range and spectral shifts inherent to unstructured field environments.
Comparison with Human Performance: While human pickers achieve higher harvesting rates (40–50 strawberries per minute) and lower damage rates, the robotic system offers advantages in consistency (no fatigue-related performance degradation), 24/7 operation capability, and labor cost reduction. The lightweight FEGW-YOLO model is critical to achieving the real-time performance necessary for competitive robotic harvesting.
Multi-Crop Robotic Adaptation: The cross-crop generalization capability validated in Section 4.4.1 enables rapid adaptation of the robotic system to different fruit types. Preliminary tests show that the same hardware platform with minimal software reconfiguration can harvest tomatoes (18 fruits/min) and grapes (12 clusters/min), demonstrating the versatility of the FEGW-YOLO-based perception system. This multi-crop capability is particularly valuable for diversified farms and contract harvesting services.
Future Integration Directions:
Multi-Modal Sensing Fusion: Integration of hyperspectral cameras for non-destructive sugar content estimation, enabling harvest optimization based on both visual ripeness and internal quality metrics.
Collaborative Multi-Robot Systems: Deployment of FEGW-YOLO on multiple lightweight robots operating in parallel, with edge-to-edge communication for coordinated harvesting and collision avoidance.
Adaptive Learning in Field: Implementation of online learning mechanisms where the model continuously refines its detection capabilities based on harvesting success/failure feedback, improving performance over the growing season.
In summary, FEGW-YOLO’s combination of high accuracy, real-time performance, and computational efficiency makes it an ideal perception solution for agricultural robotic systems. The successful integration with end-effectors and motion planning modules demonstrates that lightweight deep learning models can bridge the gap between laboratory research and practical field deployment in precision agriculture.