3.3.1. YOLOv8-Dual Model Performance
This experiment includes the YOLOv8-Dual, YOLOv8-MD, and YOLOv8-MMD models. The loss curves of the YOLOv8-Dual model are shown in Figure 8.
Figure 8a displays the training loss curves of YOLOv8-Dual, where train/box_loss-species represents the bounding box training loss of the species branch, train/cls_loss-species is the classification training loss of the species branch, train/dfl_loss-species is the distribution training loss of the species branch, train/box_loss-quality denotes the bounding box training loss of the quality branch, train/cls_loss-quality is the classification training loss of the quality branch, and train/dfl_loss-quality is the distribution training loss of the quality branch. In
Figure 8a, all losses decrease and converge smoothly. The bounding box losses of both branches are the lowest, indicating that the predicted bounding boxes closely match the ground truth with high accuracy. The classification loss of the quality branch is the highest, suggesting some error in quality category prediction. Notably, the distribution losses of the species and quality branches are nearly identical, with only a slight initial difference in species distribution loss, demonstrating comparable bounding box prediction performance between the two branches.
Figure 8b shows the validation loss curves of YOLOv8-Dual, where val/box_loss-species represents the bounding box validation loss of the species branch, val/cls_loss-species is the classification validation loss of the species branch, val/dfl_loss-species is the distribution validation loss of the species branch, val/box_loss-quality denotes the bounding box validation loss of the quality branch, val/cls_loss-quality is the classification validation loss of the quality branch, and val/dfl_loss-quality is the distribution validation loss of the quality branch. In
Figure 8b, the classification loss curve for the quality branch fluctuates significantly, with scattered points, indicating errors during validation that affect prediction accuracy. The distribution losses of both branches show minor fluctuations but eventually converge. The validation bounding box losses of both branches are lower than the corresponding training losses, suggesting that the bounding boxes predicted during validation also align closely with the ground truth.
However, YOLOv8-Dual converges relatively slowly and fluctuates moderately during training, most visibly in the validation curves, indicating potential stability issues. In terms of performance metrics, the species classification accuracy is slightly lower. A significant gap exists between the training and validation losses of YOLOv8-Dual, especially within the species classification branch, suggesting possible overfitting and slightly weaker generalization capability.
The performance of the dual-branch detection model is shown in
Figure 9. All white radish tassels are successfully detected with high confidence scores, and all white radishes are labeled with dual tags (species label and quality label). Targets under the conveyor belt are accurately detected, demonstrating ideal detection outcomes. Experimental validation confirms that YOLOv8-Dual exhibits foundational multi-task processing capabilities. The model simultaneously achieves detection of white radishes and their tassels, along with quality grading, with species detection confidence maintained between 0.65 and 0.75. In quality assessment, the model shows relatively stable recognition of medium-grade (middle) radishes (confidence: 0.70–0.80), but exhibits fluctuations in identifying high-quality (good) and defective (bad) radishes. Straight radishes with wider diameters are classified as high-quality, moderately curved ones as medium-grade, and highly curved or forked malformed radishes as low-quality. The results indicate that YOLOv8-Dual provides a feasible framework for multi-task detection in intelligent white radish harvesting systems, though further improvements are needed in detection stability and quality assessment accuracy.
The evaluation metrics of the YOLOv8-Dual model are shown in Table 4. In the target species detection task, the model demonstrates high precision (0.931) and recall (0.918), with a corresponding AP50 of 0.947 and AP50-95 of 0.709, indicating robust detection performance even under high IoU thresholds. Further analysis of specific categories reveals optimal performance for “white radish” detection, where precision (0.976), recall (0.978), and AP50 (0.99) all approach 1 and AP50-95 reaches 0.789, confirming exceptional accuracy and robustness for “white radish” targets. For “white radish tassels,” performance is slightly lower, with precision and recall at 0.887 and 0.859, respectively, AP50 at 0.904, and AP50-95 at 0.629.
In the target quality assessment task, the overall detection performance is moderate, with precision at 0.772, recall at 0.828, AP50 at 0.849, and AP50-95 at 0.636. This indicates that the model underperforms in quality recognition compared to species detection, particularly with a more significant decline in performance under higher IoU thresholds (AP50-95), likely due to the subjective nature of quality assessment and imbalanced data distribution. For specific categories, the “good” class achieves the best performance, with precision and recall reaching 0.847 and 0.899, AP50 at 0.908, and AP50-95 at 0.681, demonstrating precise identification of high-quality targets. In contrast, the “middle” and “bad” classes show weaker performance, especially the “bad” class, with precision at 0.684, recall at 0.74, AP50 at 0.758, and AP50-95 at 0.561, reflecting the need for improved accuracy in detecting low-quality targets.
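The overall figures for each branch are consistent with a plain macro-average of the per-class AP values. The sketch below checks this using only numbers quoted in the text; the “middle” baseline values (AP50 0.882, AP50-95 0.665) are taken from the comparison in Section 3.3.2, and the helper name `macro_average` is illustrative rather than part of the authors' code.

```python
# Per-class AP values for YOLOv8-Dual as quoted in the text.
per_class_ap = {
    "species AP50": {"white radish": 0.990, "white radish tassels": 0.904},
    "species AP50-95": {"white radish": 0.789, "white radish tassels": 0.629},
    "quality AP50": {"good": 0.908, "middle": 0.882, "bad": 0.758},
    "quality AP50-95": {"good": 0.681, "middle": 0.665, "bad": 0.561},
}


def macro_average(values_by_class):
    """Unweighted mean over classes (assumed to be how the overall row is formed)."""
    return sum(values_by_class.values()) / len(values_by_class)


overall = {name: macro_average(values) for name, values in per_class_ap.items()}

for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the four averages round to 0.947, 0.709, 0.849, and 0.636, matching the overall values reported for the two branches.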
In terms of operational efficiency, the model maintains a high real-time performance of 125 frames per second (FPS) in detection tasks, demonstrating its potential for efficient processing of large-scale data in practical applications. Additionally, the floating-point operations (FLOPs) remain at 8.1 G, reflecting relatively low computational costs, making the model suitable for resource-constrained edge computing scenarios. YOLOv8-Dual achieves excellent overall performance in object detection tasks, particularly excelling in object category detection, with extremely high detection accuracy and robustness for the primary class “white radish.” However, the model slightly underperforms in object quality assessment, especially requiring further optimization in detecting low-quality targets.
3.3.2. YOLOv8-MD Model Performance
Figure 10 shows the detection loss curves of the YOLOv8-MD model, indicating significant improvements compared to the baseline YOLOv8-Dual model. In
Figure 10a, the training losses of the dual-branch model decrease markedly, with faster convergence speed, rapidly declining within the first 50 epochs. The species branch loss drops to around 0.4, a 50% reduction from the baseline model, while the quality branch loss stabilizes near 0.8. All curves exhibit smoother convergence and enhanced performance. In
Figure 10b, the validation curves of the quality branch show particularly notable improvement, with minimal scattered points and better convergence. Both branches achieve the lowest classification losses, confirming significantly improved prediction accuracy. The concentrated loss values indicate stronger generalization capability of the model.
Compared to the baseline model YOLOv8-Dual, the YOLOv8-MD model improves feature extraction capability through effective aggregation of multi-scale features, enhances perception of critical regions by incorporating attention mechanisms, and reduces the gap between training and validation losses, indicating mitigated overfitting. However, scattered points observed after training suggest that the model’s stability has not yet reached optimal performance.
The detection results of YOLOv8-MD are shown in
Figure 11. Comparative analysis of two detection groups demonstrates that the improved YOLOv8-MD model achieves significant enhancements over the baseline YOLOv8-Dual model in multiple aspects. In the species detection task, the confidence score for white radish detection increases from 0.65–0.75 to 0.80–0.85, and the confidence for tassel detection improves from 0.70–0.80 to 0.82–0.89, reflecting stronger feature extraction and object recognition capabilities. For quality assessment, the enhanced model delivers more accurate and stable evaluations across quality grades. Notably, confidence in identifying high-quality (good) radishes rises from 0.45–0.65 to 0.70–0.85, while medium-quality (middle) evaluations stabilize within 0.75–0.82. Additionally, YOLOv8-MD exhibits improved robustness in complex agricultural environments, maintaining high detection accuracy and reliable quality assessment even under occlusion and lighting variations. These advancements validate the effectiveness of the MSAA module, enabling superior performance in real-world scenarios. Compared to YOLOv8-Dual, YOLOv8-MD significantly enhances detection stability, quality assessment accuracy, and environmental adaptability, providing more dependable technical support for intelligent white radish harvesting systems.
The evaluation metrics of the YOLOv8-MD model are shown in Table 5. The detection performance of the species branch improves on most metrics: precision increases from 0.931 to 0.939, AP50 from 0.947 to 0.951, and AP50-95 from 0.709 to 0.723, while recall decreases slightly from 0.918 to 0.91. The performance for “white radish” is further enhanced, with precision rising from 0.976 to 0.981, AP50 from 0.99 to 0.991, and AP50-95 from 0.789 to 0.802, indicating improved accuracy and robustness for this category. However, the detection performance for “white radish tassels” shows relatively modest gains (precision improves from 0.887 to 0.897, AP50 from 0.904 to 0.911, and AP50-95 from 0.629 to 0.645) and remains a weaker aspect of species detection.
Quality detection performance also shows corresponding improvements. Overall quality detection precision increases from 0.772 to 0.804, recall from 0.828 to 0.831, AP50 from 0.849 to 0.862, and AP50-95 from 0.636 to 0.644, indicating progress in object quality detection. The “good” class achieves the most notable gains, with precision rising from 0.847 to 0.879, AP50 from 0.908 to 0.919, and AP50-95 from 0.681 to 0.691. Both “middle” and “bad” classes exhibit improved performance, particularly “middle,” where AP50 increases from 0.882 to 0.889 and AP50-95 from 0.665 to 0.678, demonstrating enhanced stability in detecting medium-quality targets. For the “bad” class, AP50 and AP50-95 rise from 0.758 and 0.561 to 0.777 and 0.564, respectively, reflecting incremental improvements despite smaller gains.
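To make the size and direction of each change easier to compare, the overall metrics quoted above for YOLOv8-Dual and YOLOv8-MD can be tabulated as signed differences. This is a minimal sketch over the reported numbers only; the dictionary layout and the function name `metric_deltas` are assumptions made for illustration.

```python
# Overall metrics quoted in the text (Tables 4 and 5).
dual = {
    "species": {"precision": 0.931, "recall": 0.918, "AP50": 0.947, "AP50-95": 0.709},
    "quality": {"precision": 0.772, "recall": 0.828, "AP50": 0.849, "AP50-95": 0.636},
}
md = {
    "species": {"precision": 0.939, "recall": 0.910, "AP50": 0.951, "AP50-95": 0.723},
    "quality": {"precision": 0.804, "recall": 0.831, "AP50": 0.862, "AP50-95": 0.644},
}


def metric_deltas(before, after):
    """Signed change (after - before) for every branch and metric."""
    return {
        branch: {
            metric: round(after[branch][metric] - before[branch][metric], 3)
            for metric in before[branch]
        }
        for branch in before
    }


deltas = metric_deltas(dual, md)
for branch, changes in deltas.items():
    for metric, change in changes.items():
        print(f"{branch:8s} {metric:10s} {change:+.3f}")
```

Every difference is positive except species recall, which drops by 0.008; the largest gain is quality precision at +0.032.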
The detection speed decreases from the 125 frames per second (FPS) reported for YOLOv8-Dual to 100–101 FPS. This reduction is attributed to the introduction of the MSAA module, which, while enhancing feature aggregation and accuracy, adds computational layers and operations to the network architecture. The floating-point operations (FLOPs) remain unchanged at 8.1 G, indicating that the modifications did not significantly increase computational resource consumption and that the model remains suitable for resource-constrained practical application scenarios.
3.3.3. YOLOv8-MMD Model Performance
Figure 12 shows the detection loss curves of YOLOv8-MMD, which improve markedly on those of the YOLOv8-Dual and YOLOv8-MD models. In Figure 12a, all loss curves are more concentrated. The classification losses of both the species and quality branches show notable reductions compared to YOLOv8-Dual, with tightly clustered loss values and significantly reduced prediction errors, indicating substantial performance improvements. The distribution loss, bounding box loss, and classification loss in training are relatively similar, with the species branch achieving the lowest classification loss, reflecting higher prediction accuracy and optimal training effectiveness. Compared to YOLOv8-MD, the YOLOv8-MMD model converges faster, rapidly descending to stable levels in early training stages, with smoother loss curves. Notably, the scattered points observed in the YOLOv8-MD model disappear after incorporating the MAFE module, indicating that YOLOv8-MMD achieves superior stability and performance. This suggests that the MAFE module, by providing enhanced and stabilized multi-scale feature representations, works synergistically with the MSAA module to regularize the training process and reduce optimization oscillations, leading to smoother convergence.
The validation loss curves in Figure 12b converge better than those of YOLOv8-Dual, with no excessive scattered points; this is particularly evident in the validation classification loss of the quality branch. The YOLOv8-Dual model exhibits noticeable oscillations in validation classification loss, whereas the YOLOv8-MMD model shows a more stable and smoother curve. In Figure 12b, the classification loss stabilizes around 0.4, indicating the highest prediction accuracy for species. Compared to the validation loss of the YOLOv8-MD model, the loss curves in Figure 12b achieve faster convergence and better generalization capability, with reduced fluctuations. Additionally, the classification loss in Figure 12b aligns with that in Figure 12a, demonstrating a significant improvement in the prediction accuracy of the enhanced YOLOv8-MMD model.
Compared to the YOLOv8-Dual and YOLOv8-MD models, YOLOv8-MMD exhibits significantly faster convergence speed. The loss values of the species branch rapidly drop below 0.5 within the first 50 epochs, whereas the YOLOv8-Dual model requires nearly 200 epochs to reach similar levels. The initial curve rapidly declines, particularly evident in the loss of species classification, quickly falling below 1.0. In particular, for the quality assessment task, the loss curves of the quality branch in YOLOv8-MMD are smoother and ultimately converge to lower loss values (approximately 0.7), while the YOLOv8-Dual model exhibits pronounced fluctuations and higher loss values.
The training and validation curves of the YOLOv8-MMD model indicate significantly improved stability after the initial convergence phase. The loss values remain at consistent levels with minimal oscillations, demonstrating enhanced model robustness. Compared to the YOLOv8-Dual model (approximately 0.3–0.8), the final loss values of the YOLOv8-MMD model’s classification branch (around 0.3–0.4) are lower, reflecting a notable improvement in species classification accuracy. Similarly, the quality assessment branch also shows enhancements, with the YOLOv8-MMD model maintaining greater stability and generally lower loss values throughout training. The reduced gap between training and validation losses in YOLOv8-MMD indicates better generalization capability, addressing issues like overfitting observed in the YOLOv8-Dual model. Through the synergistic integration of the MSAA and MAFE modules, YOLOv8-MMD exhibits superior stability in later training stages, with significantly reduced fluctuations in loss curves across branches, suggesting the model has identified optimal feature representations and task balance. These results conclusively demonstrate that the proposed YOLOv8-MMD model not only accelerates training but also substantially enhances performance and stability.
The YOLOv8-MMD model demonstrates exceptional detection performance in intelligent white radish harvesting scenarios. The detection results of the YOLOv8-MMD model are shown in Figure 13. In species recognition, the model maintains stable confidence scores of 0.80–0.85 for white radish detection and achieves a confidence of 0.82–0.91 in tassel recognition, showcasing robust feature extraction capabilities. For quality assessment, the model accurately classifies white radishes across quality grades: confidence for the “good” category reaches 0.77–0.81, the “middle” category stabilizes between 0.78 and 0.80, and the “bad” category improves to 0.83–0.88, highlighting superior quality evaluation accuracy. Notably, the model retains stable detection performance in complex agricultural environments, delivering high-confidence and precise results even under partial occlusion and varying lighting conditions.
Compared to the baseline YOLOv8-Dual model and the YOLOv8-MD model with only the MSAA module added, YOLOv8-MMD achieves significant improvements across multiple aspects. In species detection, the average detection confidence increases by 15–20%, far exceeding the baseline model’s 0.65–0.75 and YOLOv8-MD’s 0.75–0.80. Enhancements in quality assessment are even more pronounced, particularly for the “good” category, where confidence improves from the baseline model’s 0.45–0.65 to 0.77–0.81, marking an over 30% gain. Through the synergistic integration of the MSAA and MAFE modules, the model not only elevates detection accuracy but also strengthens feature extraction and fusion capabilities, yielding more stable and reliable results. Overall, by combining a multi-task learning framework with feature enhancement modules, the YOLOv8-MMD model comprehensively advances detection precision, quality assessment, and environmental adaptability for intelligent white radish harvesting systems, providing robust technical support for practical applications in agricultural robotics.
From the data in Table 6, YOLOv8-MMD achieves a species detection precision of 0.945, an improvement of 0.6 percentage points over YOLOv8-MD’s 0.939. The recall rate increases from 0.91 to 0.924, indicating enhanced coverage in target detection. While AP50 slightly decreases from 0.951 to 0.949 (a minor fluctuation), AP50-95 remains stable at 0.723. For white radish detection, both models share a precision of 0.981, but YOLOv8-MMD improves recall from 0.976 to 0.978, with AP50 and AP50-95 stable at 0.991 and 0.802, demonstrating stable and slightly optimized performance for this category. For white radish tassel detection, YOLOv8-MMD’s precision decreases from 0.897 to 0.885, but recall improves from 0.844 to 0.869, suggesting expanded coverage for complex targets despite the precision drop. AP50 declines marginally from 0.911 to 0.908, and AP50-95 decreases from 0.645 to 0.643, highlighting room for further improvement in tassel detection.
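Because the metrics are fractions between 0 and 1, a change such as 0.939 to 0.945 can be read either as an absolute gain in percentage points or as a relative gain in percent. The sketch below computes both for the two overall precision changes reported in this section (species: 0.939 to 0.945; quality: 0.804 to 0.812); the function name `absolute_and_relative_gain` is hypothetical.

```python
def absolute_and_relative_gain(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    points = (after - before) * 100.0
    relative = (after - before) / before * 100.0
    return points, relative


# Overall precision, YOLOv8-MD -> YOLOv8-MMD, as quoted in the text.
species_points, species_relative = absolute_and_relative_gain(0.939, 0.945)
quality_points, quality_relative = absolute_and_relative_gain(0.804, 0.812)

print(f"species precision: +{species_points:.1f} points, +{species_relative:.2f}% relative")
print(f"quality precision: +{quality_points:.1f} points, +{quality_relative:.2f}% relative")
```

The quoted 0.6 and 0.8 figures correspond to the absolute reading; the relative gains are slightly larger (about 0.64% and 1.00%).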
The observed lower precision for the “bad” quality class (0.691) and “white radish tassels” (0.885) compared to the main “white radish” class (0.981) can be attributed to several factors. First, data imbalance existed within the dataset: the number of “bad” quality samples was naturally lower than the “good” and “middle” grades, and tassels often occupied smaller, less distinct regions in the images compared to the prominent radish bodies. This imbalance can hinder the model’s ability to learn robust features for these minority classes. Second, the inherent difficulty of the tasks plays a role: distinguishing low-quality radishes (“bad”) often relies on subtle, irregular morphological defects that are highly variable, while detecting thin, elongated, and often occluded tassels against a complex soil and leaf background is inherently challenging. Subjectivity in quality labeling, especially at the boundary between “middle” and “bad” grades, may also introduce label noise, further impacting the precision of the “bad” class.
To further elucidate the classification performance of the YOLOv8-MMD model, we present the confusion matrices for both tasks.
Figure 14a shows the 2 × 2 confusion matrix for species detection. The model correctly identifies 798 white radish instances, with 66 misclassified as tassels (7.6% error rate). Similarly, 780 tassel instances are correctly detected, with 46 misclassified as radishes (5.4% error rate). This symmetrical error pattern indicates that the primary confusion occurs between the radish body and its foliage, which is visually plausible given their adjacency and partial occlusion in the images.
Figure 14b presents the 3 × 3 confusion matrix for quality assessment. The model performs well on the “good” class (305 correct, 55 misclassified as “middle”, 5 as “bad”), demonstrating reliable identification of high-quality radishes. For the “middle” class, 247 are correctly classified, with 43 misclassified as “good” and 26 as “bad”, showing reasonable confusion with adjacent quality grades. The “bad” class shows the expected challenges: while 148 are correctly identified, 25 are misclassified as “middle” and 5 as “good”. This confusion matrix quantitatively supports our earlier analysis regarding the difficulty of the “bad” class, showing that most misclassifications occur with the adjacent “middle” class rather than the distant “good” class, which aligns with the subjective nature of quality grading boundaries.
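The counts quoted for Figure 14b can be turned into per-class rates directly. The following sketch assumes that each row of the matrix holds the true grade and each column the predicted grade, which matches how the counts are described in the text. It covers only matched detections, so the resulting rates are not expected to equal the precision and recall in Table 6, which also account for missed and spurious detections.

```python
labels = ["good", "middle", "bad"]

# Rows: true grade; columns: predicted grade (assumed orientation).
matrix = [
    [305, 55, 5],    # true "good"
    [43, 247, 26],   # true "middle"
    [5, 25, 148],    # true "bad"
]

# Share of each true grade that is classified correctly (row-normalized).
correct_rate = {
    label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)
}

# Share of each predicted grade that is actually correct (column-normalized).
purity = {
    label: matrix[i][i] / sum(row[i] for row in matrix)
    for i, label in enumerate(labels)
}

total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total

for label in labels:
    print(f"{label:6s} correct rate {correct_rate[label]:.3f}  purity {purity[label]:.3f}")
print(f"overall accuracy {accuracy:.3f} over {total} instances")
```

For the “bad” row, 25 of the 30 errors fall in the adjacent “middle” grade, which is the pattern the paragraph above describes.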
In terms of quality detection performance, YOLOv8-MMD achieves an overall quality detection precision of 0.812, surpassing YOLOv8-MD’s 0.804 (an increase of 0.8 percentage points). The recall rate improves from 0.831 to 0.836, indicating expanded coverage in quality assessment. AP50 slightly decreases from 0.862 to 0.859, while AP50-95 rises from 0.644 to 0.655, reflecting improved performance under higher IoU thresholds. For the “good” category, precision drops from 0.879 to 0.86 and AP50 declines from 0.919 to 0.91, but AP50-95 remains stable at 0.691, maintaining robust detection under high IoU despite the precision dip. The “middle” category shows minor declines: precision decreases from 0.822 to 0.815, AP50 from 0.889 to 0.887, and AP50-95 from 0.678 to 0.676. For the “bad” category, precision drops from 0.709 to 0.691, while AP50 increases marginally from 0.777 to 0.78 and AP50-95 from 0.564 to 0.565, suggesting slight detection gains for low-quality targets, though overall performance remains suboptimal.
Despite the sequential addition of modules, YOLOv8-MMD achieves a faster detection speed (107–112 FPS) than YOLOv8-MD (100–101 FPS). This recovery in efficiency, even with the extra MAFE module, occurs because its efficient design (e.g., depthwise separable convolutions) minimizes overhead, and its synergy with MSAA yields a more stable model that requires less computational redundancy during inference. The floating-point operations (FLOPs) remain unchanged at 8.1 G, indicating that the speed optimization does not introduce additional computational complexity.
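The FPS figures reported for the three models can also be read as a per-frame time budget, which is often the more useful quantity when sizing a harvesting pipeline. The sketch below converts the quoted FPS ranges to milliseconds per frame; it assumes single-image inference with no batching, and `latency_ms` is an illustrative helper.

```python
# FPS ranges quoted in the text, as (lowest, highest).
fps_ranges = {
    "YOLOv8-Dual": (125, 125),
    "YOLOv8-MD": (100, 101),
    "YOLOv8-MMD": (107, 112),
}


def latency_ms(frames_per_second):
    """Time per frame in milliseconds, assuming one image per inference call."""
    return 1000.0 / frames_per_second


latency_ranges = {
    model: (latency_ms(high), latency_ms(low))  # (fastest, slowest) per frame
    for model, (low, high) in fps_ranges.items()
}

for model, (fastest, slowest) in latency_ranges.items():
    print(f"{model:12s} {fastest:.2f} to {slowest:.2f} ms per frame")
```

On these numbers YOLOv8-Dual takes 8.00 ms per frame, YOLOv8-MD about 9.90 to 10.00 ms, and YOLOv8-MMD about 8.93 to 9.35 ms.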
Compared to YOLOv8-MD, YOLOv8-MMD demonstrates superior recall rates and detection performance under high IoU thresholds (AP50-95), along with significantly improved detection speed (FPS), showcasing enhanced real-time capabilities. However, its detection precision and certain metrics (e.g., AP50) show slight declines, particularly for complex categories like “white radish tassels” and low-quality targets such as “bad,” indicating remaining room for improvement.