1. Introduction
According to standard plant physiology definitions, seed germination is the physiological process that commences with the uptake of water (imbibition) by the quiescent dry seed and terminates with the protrusion of the embryonic primary root from its enclosing structures. This event, known as completion of germination, marks the transition to a seedling stage [
1].
Before each batch of seeds is sold or used, a standard practice is to conduct germination tests in laboratories to evaluate whether the batch meets sales standards [
2]. A high germination percentage is a crucial measure of seed quality, often indicating faster germination progress and stronger environmental adaptability [
3]. In contrast, seeds with lower germination percentages often face issues such as delayed germination and weakened growth vigour, making them susceptible to environmental constraints and resulting in reduced agricultural yields [
4]. Based on previous findings, when the length of the primary root exceeds 2 mm, the seeds are generally considered to have begun germination [
5]. However, this determination has traditionally relied on experienced experimenters visually checking whether a shoot or primary root has emerged from the seed [6].
Traditional methods for assessing seed germination status and calculating germination percentages rely heavily upon manual observation and operations. This approach requires substantial manpower and time, and is limited by subjectivity and inefficiency issues [
7,
8]. Due to human factors, the accuracy and consistency of detection results are often difficult to guarantee, which substantially limits the efficiency and quality of modern agricultural production [
9]. Beyond the immediate impact on cultivation, such inaccuracies carry significant economic repercussions throughout the agricultural value chain. For seed companies, the misrepresentation or inaccurate labeling of germination rates in the market can severely damage brand reputation, erode consumer trust, and lead to substantial financial losses. Furthermore, it complicates crucial seed traceability processes, hindering quality assurance and accountability. This broad economic impact underscores the critical need for precise and reliable germination assessment. Therefore, an urgent need exists for a more objective, repeatable, and efficient detection method to replace traditional manual methods [
10]. Such a method would not only enhance the accuracy and efficiency of detection but would also reduce the influence of human factors on the results, thereby providing more reliable technical support for agricultural production.
With the rapid advancement of deep-learning and computer-vision technologies, image-based methods for detecting seed germination completion and evaluating germination percentage have attracted growing attention and achieved notable progress [
11]. To swiftly and accurately assess the germination percentage and condition of maize seeds, Liu et al. [
12] proposed an innovative approach that integrates an improved local linear-embedding algorithm with near-infrared spectroscopy (NIRS). Initially, the seeds were graded based on artificial aging, after which NIRS data were collected for each sample. Subsequently, the germination percentage was tested. Through a meticulous comparison and combination of different models, a germination percentage prediction model based on partial least-squares analysis and a support vector machine was established, achieving an R-squared value of 0.8384, indicating high predictive accuracy.
In contrast, Xiao et al. [13] adopted a different strategy by selecting various kernel functions and combining them with Gaussian process regression (GPR) methods. To enhance the robustness of the model, Monte Carlo cross-validation was employed to eliminate 12 outlier values, and baseline-correction methods and multiplicative scatter correction were applied to optimise the NIRS data. Following a series of optimisations and validations, a GPR model based on the Matern32 kernel function was developed. This model enabled non-destructive detection of the germination percentages of maize seeds and achieved an R-squared value of 0.9599 on the test set, demonstrating good stability and detection performance. However, the approach exploits the NIRS data only to a limited extent and makes limited use of artificial intelligence techniques.
Unlike NIRS-based approaches that infer germination status from spectral properties, image-based deep learning methods, particularly YOLO variants, directly leverage visual features (e.g., primary root protrusion and morphology) for highly accurate and spatially precise detection. This direct visual analysis is often more intuitive and robust for identifying the physical manifestation of germination, especially when dealing with complex morphological changes and potential tangles, which can be challenging for indirect spectral methods.
Yamazaki et al. [14] utilised the YOLOv5 model and transfer-learning techniques to construct a deep-learning model that distinguishes germinating from non-germinating pollen, and further integrated the results of whole-genome association studies to reveal gene patterns closely related to pollen germination vigour. However, the model was not specifically improved to target the distinctive features of germinating pollen, which may have limited its recognition accuracy.
Zhao et al. [
15] developed the YOLO-r model by integrating image partitioning, a transformer encoder, a small target detection layer, and a complete intersection over union loss function to improve the You-Only-Look-Once (YOLO) algorithm. After extensive testing on a dataset containing 21,429 rice seeds, the YOLO-r model achieved an average detection accuracy of germinated seeds of 0.9539, with a sufficiently fast detection speed suitable for real-time applications. Nevertheless, despite its strong performance in terms of multiple aspects, the relatively low image input resolution of the model limits its detection accuracy, potentially leading to instances of missed detection in specific scenarios.
Yao et al. [16] considered the characteristics of wild rice germination and proposed the SGR-YOLO algorithm based on the YOLOv7 model, integrating the ECA attention mechanism, BiFPN, and the GIOU loss function. In subsequent experiments, the data were divided into two experimental environments, hydroponic boxes and Petri dishes, to comprehensively evaluate the performance of the model. The SGR-YOLO model achieved 94% accuracy in the hydroponic-box environment and a higher accuracy of 98.2% in the Petri dish environment. However, the lower seed density in the laboratory Petri dishes differed significantly from the denser arrangements that seeds might exhibit in actual experimental environments, so the higher accuracy may not carry over to those settings.
In this study, we developed a method for detecting seed germination completion based on deep-learning technology, thereby enabling accurate germination percentage assessment. The method leveraged the advantages of deep neural networks in image recognition and feature extraction, achieving automated identification and evaluation of the seed-germination status.
First, to enhance the accuracy of the model in identifying germinated seeds, we focused on improving the model’s feature-extraction capability so that it could capture the key characteristics of germinated seeds more precisely. To this end, the second layer of the stage 1 (S1) block in the backbone feature-extraction network employed involution (INV) instead of regular convolution to extract the initial features, thereby increasing their spatial specificity. Second, as germination testing progressed, primary root growth became increasingly complex, with the primary roots of multiple germinated seeds/seedlings becoming intertwined and entangled (Figure 1). This phenomenon markedly affected the accurate identification of individual seeds [17]. Therefore, after the output of the last layer of the backbone feature-extraction network, an Explicit Visual Centre (EVC) module was introduced to achieve feature adjustment within the centralised layer [18], enabling a more comprehensive and accurate representation of the seed characteristics and improving independent seed identification.
Next, we addressed the detection accuracy for seeds of different scales, particularly the incomplete recognition of small seeds. To this end, the Spatial Context Pyramid (SCP) module was introduced into the enhanced extraction network [19]. This module enhanced the detailed features of small seeds by learning the global spatial context at each level. Finally, running the model should not impose a heavy burden on the equipment used. Based on the above challenges and requirements, we selected cabbage seeds as experimental samples and made targeted improvements to the YOLOv7 model [20], resulting in the YOLO-Seed Germination Detection (YOLO-SGD) model. This model not only alleviated the issues mentioned above but also achieved fast, intelligent, and accurate detection of germinated seeds, even though cabbage seeds are small and prone to rolling.
The main contributions of this study can be summarised as follows:
(1) Proposed the YOLO-SGD model: targeting the characteristics of germinated seed detection, we built an efficient and high-precision germinated seed detection algorithm based on YOLOv7-l by introducing an involution (INV) structure, an Explicit Visual Centre (EVC) module, and a Spatial Context Pyramid (SCP) module.
(2) Effectively addressed core challenges during germination: significant improvements were achieved, particularly in addressing the difficulty of individual identification caused by root entanglement and the accurate detection of seeds of different scales (especially small targets).
(3) Achieved a balance between high accuracy and low overhead: experiments verified that the YOLO-SGD model achieved excellent detection accuracy while maintaining a small model size (45.22 M), with the average absolute error of germination percentage prediction controlled within 1.0%, which is sufficient to replace manual detection in laboratory environments.
(4) Provided a practical solution for the agricultural field: developed a fast and intelligent detection method for cabbage seed germination assessment, which is expected to be applied in areas such as germplasm resource evaluation, breeding selection, and agricultural production, possessing significant practical application value.
3. Results
3.1. Comparative Experiments
3.1.1. Comparison of Different Object Detection Algorithms
To objectively evaluate the performance of the YOLO-SGD detection algorithm, we conducted comparative experiments against mainstream object detection algorithms on the same hardware, using the same dataset and parameter configurations. Based on the results, we selected the YOLOv7-l (large) model (underlined) as the baseline model for this study (Table 1). The labels shown in parentheses denote the model versions or sizes. For example, DC5 and 4scale denote the versions used, ‘s’ denotes a small model size, ‘m’ a medium model size, ‘l’ a large model size, and ‘x’ an extra-large model size.
Table 1 shows a performance comparison of various object detection models across different difficulty levels, including key metrics such as the easy AP, hard AP, mAP, F1 score, and parameter count. The proposed YOLO–SGD model demonstrated outstanding performance across all metrics, achieving an easy AP of 99.6%, a hard AP of 96.4%, an mAP of 98.0%, and a high F1 score of 93.3%. These results indicate that the YOLO–SGD model achieved high detection accuracy and stable performance under various conditions.
Among the other models, YOLOv8 (l) achieved an easy AP of 93.2%, a hard AP of 91.8%, an mAP of 92.8%, and an F1 score of 88.5%. YOLOv8 (x) exhibited similar performance, with an easy AP of 93.8%, a hard AP of 91.6%, an mAP of 92.7%, and an F1 score of 88.4%. YOLOv7 (l) achieved an easy AP of 95.4%, a hard AP of 90.2%, an mAP of 92.8%, and an F1 score of 88.9%. The YOLOX series maintained good performance with relatively low parameter counts. YOLOX (s) had a parameter count of only 8.97 M, achieving an mAP of 89.4% and an F1 score of 80.6%. YOLOX (m) and YOLOX (l) had parameter counts of 25.32 M and 54.29 M, respectively, with mAPs of 90.7% and 90.4% and F1 scores of 82.4% and 83.0%. The RT-DETR (l) and RT-DETR (x) models had parameter counts of 52.70 M and 87.20 M, achieving mAPs of 88.7% and 87.6% and F1 scores of 80.3% and 85.1%, respectively.
Figure 7 compares the models across five dimensions: easy AP (%), hard AP (%), mAP (%), F1 (%), and parameter size (Params (M)). Based on the data presented in Table 1, YOLOv7 (l) exhibited the strongest performance among the existing models on the easy AP, mAP, and F1 metrics, at 95.4%, 92.8%, and 88.9%, respectively. For hard class detection, it fell short of the highest value (91.8%) by only 1.6%. Furthermore, with a parameter size of 37.62 M, the computational load of the YOLOv7 (l) model was modest compared to most other models. We therefore considered YOLOv7 (l) the most suitable baseline model for detecting seed germination. Building on this foundation, we applied targeted refinements, leveraging the phenotypic features of the germinated seeds and the camera-imaging characteristics to better align the model with seed-germination detection tasks. The refined model showed the best performance across all accuracy metrics: with a parameter size of 45.22 M, it improved easy and hard class detection by 4.2% and 6.2%, respectively, the mAP by 5.2%, and the F1 score by 4.4% relative to the baseline. Of particular note, the model improved the detection of small germinating seeds under challenging conditions, and it showed the highest overall performance in the comparative experiments.
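The gains quoted above can be checked directly from the reported metrics. The following is a minimal sketch using only the figures given in the text; the dictionary keys and the helper name `gains` are illustrative choices, not part of the original work.

```python
# Metrics as reported in the text, in percent.
baseline = {"easy_ap": 95.4, "hard_ap": 90.2, "map": 92.8, "f1": 88.9}  # YOLOv7 (l)
yolo_sgd = {"easy_ap": 99.6, "hard_ap": 96.4, "map": 98.0, "f1": 93.3}  # YOLO-SGD


def gains(improved, reference):
    """Absolute gain in percentage points for every metric in `reference`."""
    return {name: round(improved[name] - reference[name], 1) for name in reference}


improvement = gains(yolo_sgd, baseline)

# Gap between the baseline hard AP and the best hard AP among the
# existing models (91.8%, reported for YOLOv8 (l)).
hard_gap = round(91.8 - baseline["hard_ap"], 1)

print(improvement)
print(hard_gap)
```

Note that these are absolute differences in percentage points, which is how the text appears to use the "%" sign when reporting improvements.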
3.1.2. Comparative Experiments of Involution (INV)
During model improvement, we replaced convolution with INV to accomplish preliminary feature extraction, thereby enhancing the specificity of the feature space. In this section, we discuss the impact of embedding an INV at different positions on the model’s prediction performance. As shown on the right side of
Figure 5b, S1 of the backbone network contained three feature-extraction layers, named S1-1, S1-2, and S1-3 (from bottom to top), representing scenarios in which INV replaced a specific regular convolution layer.
Figure 8 shows the model’s attention heatmaps for hard class and easy class detection, respectively, after replacing convolution at different positions with the INV module. It should be noted that these comparative experiments only varied the embedding position of INV based on the baseline, without considering other influencing factors.
As INV was embedded at higher positions in S1, the number of parameters and computations increased; however, this trend did not directly correlate with the detection performance (Table 2). Replacing the regular convolution with INV in S1-1 did not improve performance; instead, all metrics declined. This observation may reflect the fact that enhancing the spatial specificity of the features in fewer feature layers inhibited the capture of detailed seed features. Introducing INV in S1-2 improved the model performance across all evaluation metrics, achieving the best results among the three positions, with respective increases of 0.9%, 1.9%, 1.4%, and 0.8% in the easy AP, hard AP, mAP, and F1 score compared with the baseline. Using INV to replace regular convolution in S1-3 resulted in increases of 1.0% and 0.2% in the hard AP and mAP, respectively, but slightly decreased the easy AP and the F1 score. In summary, replacing regular convolution with INV in S1-2 enhanced the model performance, particularly in terms of detecting small seeds.
3.2. Ablation Experiments
Building on the comparative analysis presented in the previous section, we adopted YOLOv7-l as the baseline algorithm. To validate the effectiveness of the improvements in seed germination detection tasks, we conducted further ablation experiments for each module and their combinations, including the INV, EVC, and SCP modules. The results demonstrated varying degrees of impact on the detection metrics for germinated seeds when different enhancement modules were adopted (
Table 3).
The baseline model achieved an easy AP of 95.4%, a hard AP of 90.2%, an mAP of 92.8%, and an F1 score of 88.9%, without any components. Significant performance improvements were observed with the addition of each module. For example, adding INV alone increased the mAP to 94.2%, adding EVC increased the mAP to 94.6%, and adding SCP increased the mAP to 95.2%. When INV and SCP were added simultaneously, the mAP significantly increased to 97.1%, with F1 reaching 92.2%. Ultimately, with all the components added, the model achieved peak performance: easy AP, 99.6%; hard AP, 96.4%; mAP, 98.0%; and F1 score, 93.3%, demonstrating the critical role of each optimisation method in enhancing the model performance.
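The reported mAP values are consistent with the usual definition of mAP as the mean of the per-class AP values. The sketch below assumes that the easy and hard classes are the only two classes being averaged; this is an assumption that matches the baseline and full-model figures quoted above, not something the text states explicitly.

```python
def mean_ap(class_aps):
    """Mean average precision as the unweighted mean of per-class AP values."""
    return sum(class_aps) / len(class_aps)


# Easy AP and hard AP (percent) as reported for the two models.
baseline_map = round(mean_ap([95.4, 90.2]), 1)  # YOLOv7-l baseline
full_map = round(mean_ap([99.6, 96.4]), 1)      # INV + EVC + SCP

print(baseline_map, full_map)
```

Under this assumption the baseline evaluates to 92.8% and the full model to 98.0%, matching the reported mAP values.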
Figure 9 and
Figure 10 intuitively illustrate the impact of different module combinations on the model’s detection capability.
Figure 9 presents the heatmaps of the hard class under various configurations.
Figure 9a–g show the attention regions generated by the model when integrating individual modules (INV, EVC, SCP) and their combinations (INV + EVC, INV + SCP, INV + EVC + SCP). As more modules are integrated, the activation regions become increasingly focused and accurate, especially in
Figure 9g, where the full combination (INV + EVC + SCP) yields the most precise and concentrated attention on the hard-class seeds, which are typically more difficult to detect.
Figure 10 shows the corresponding heatmaps for the easy class. Similar to the hard-class results,
Figure 10a–g illustrate how the attention distribution evolves with different module combinations. For these more easily detectable, germinated seeds, the model exhibits improved focus and spatial coverage with each added module. In particular, the full integration shown in
Figure 10g leads to the most comprehensive and accurate localisation. These visual results are consistent with the quantitative outcomes in
Table 3, further validating the effectiveness of the proposed INV, EVC, and SCP modules and their synergistic combinations in enhancing seed germination detection performance.
3.3. Experimental Results
Figure 11 illustrates the performance differences between the model developed in this study and the baseline model for detecting germinated cabbage seeds. The first column displays the original images, capturing both distant and close-up perspectives to depict the data for the target seeds at different scales. The second column shows the predictions of the original YOLOv7-l algorithm, and the third column displays the predictions of the YOLO-SGD model developed in this study.
Our comparison provided evidence that in both the distant and close-up detection scenarios, the YOLO–SGD model successfully detected all germinated seeds, whereas the original model exhibited instances of false positives and misses, which were particularly notable with distant detection. Furthermore, we found differences in the confidence levels for seed detection at the same positions in the same image between both models, with the YOLO–SGD model generally exhibiting higher confidence scores, further highlighting its superior detection performance.
3.4. Detection of the Cabbage Seed-Germination Rates
To comprehensively evaluate the effectiveness and practicality of the proposed YOLO-SGD algorithm, we employed a four-way, random-sampling method to detect the germination percentages of 500 cabbage seeds. The experimental process was as follows: 500 cabbage seeds were evenly divided into 10 groups of 50 seeds each and placed in 10 Petri dishes for germination. Images of each Petri dish were then captured at two critical time points, 48 h and 72 h after the start of the germination test, at both distant and close-up distances; these images served as the dataset for model detection. After each image was captured, the germination percentage was calculated manually, and the time required for each assessment was recorded.
The germination assessment data, derived from the counts of germinated seeds, reflected the overall germination status of 500 seeds, and the time shown represented the average time taken for the germination percentage calculations per Petri dish (
Table 4). After 48 h of germination, the detection results of the YOLO–SGD model for close-up images were consistent with those of manual inspection. However, for distant images, the model’s detection results were slightly lower (0.6%) than those of manual inspection, and three germinated seeds were not detected with the model. After 72 h of germination, the model missed only one germinated seed (when compared with manual inspection) for close-up detection, resulting in a discrepancy of 0.2% in the germination percentage calculation. With distant detection, the YOLO–SGD model missed five seeds (when compared with manual inspection), resulting in a discrepancy of 1.0% in the germination percentage calculation.
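The relationship between missed seeds and the discrepancy in germination percentage can be made explicit with a short calculation. This is a minimal sketch that assumes the discrepancy arises only from missed germinated seeds (no false positives offsetting them); the function and variable names are illustrative.

```python
TOTAL_SEEDS = 500  # 10 Petri dishes x 50 seeds


def percentage_discrepancy(missed_seeds, total_seeds=TOTAL_SEEDS):
    """Difference in germination percentage caused by `missed_seeds` misses."""
    return 100.0 * missed_seeds / total_seeds


# Germinated seeds missed by the model relative to manual inspection,
# as reported in the text for each time point and imaging distance.
missed = {
    ("48 h", "close-up"): 0,
    ("48 h", "distant"): 3,
    ("72 h", "close-up"): 1,
    ("72 h", "distant"): 5,
}

discrepancy = {key: round(percentage_discrepancy(n), 1) for key, n in missed.items()}
print(discrepancy)
```

With 500 seeds, each missed seed corresponds to 0.2 percentage points, which reproduces the 0.6%, 0.2%, and 1.0% discrepancies reported above.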
In addition to the quantitative analysis of germination percentages at 48 and 72 h based on Table 4, we evaluated the YOLO-SGD model visually and by trend during the initial germination stage (24 h) and across the overall germination process, to examine its accuracy and stability at different time points more comprehensively. Figure 12 compares the YOLO-SGD algorithm with manual measurement at three key time points (24 h, 48 h, and 72 h) and shows the distribution of the germination percentage at each time point as box plots. Specifically, Figure 12a–c compare the YOLO-SGD algorithm with manual measurements at 24 h, 48 h, and 72 h, respectively, and Figure 12d shows box plots of the germination percentage distribution at 24 h, 48 h, and 72 h. Presenting the data in these complementary ways gives a fuller picture of the model’s performance and robustness during the dynamic process of seed germination.
3.5. Generalisation Performance on Diverse Seed Types
To rigorously evaluate the generalisation capability of the proposed YOLO-SGD model beyond cabbage seeds, we conducted additional germination detection experiments on three other distinct seed types: pepper, tomato, and eggplant. These seeds were chosen to represent variations in size, shape, and germination morphology, providing a comprehensive assessment of the model’s adaptability. Crucially, during laboratory germination experiments, these vegetable seeds, like cabbage seeds, present similar challenges such as small size, proneness to rolling, or primary root entanglement. The improved YOLO-SGD model is specifically designed to address these common difficulties across diverse seed types. To ensure the fairness and comparability of these tests, the entire experimental setup—including data collection, data processing, dataset partitioning, and the specific hyperparameters for model training and prediction—was maintained strictly identical to that used for the cabbage seed experiments.
Table 5 presents the detailed performance metrics of the YOLO-SGD model across all tested seed types. The proposed model not only maintained excellent performance on cabbage seeds (mAP of 98.0%, F1 score of 93.3%) but also showed robust, high-precision detection for pepper (mAP of 95.2%, F1 score of 90.1%), tomato (mAP of 97.8%, F1 score of 91.5%), and eggplant seeds (mAP of 95.4%, F1 score of 90.1%). These consistently high scores across the four seed species, including in easy AP and hard AP, support the strong generalisation ability of the YOLO-SGD model.
Further qualitative validation of the model’s generalisation is illustrated in
Figure 13, which showcases representative detection results on various seed types. The clear and accurate bounding box detections on different seed morphologies demonstrate the model’s adaptability in visually identifying germinated instances regardless of the specific seed characteristics. This visual evidence, combined with the quantitative metrics, underscores the potential for the YOLO-SGD model to be broadly applied for intelligent seed germination assessment across a wide range of agricultural crops.
4. Discussion
4.1. Performance of Each Model in Comparative Experiments for Germinated Seed Detection Using the Same Dataset
The models used in the comparative experiments fell into two main categories: those based on the transformer framework and those based on the convolution framework. Both types are compared and analysed in detail below. The transformer-based models (DETR, RT-DETR, DINO, and ViDT-swin) generally exhibited stronger easy class AP than the convolution framework models did. The highest value observed was 94.9% with RT-DETR (l), which performed well in detecting large sprouting seeds, possibly due to the transformer’s powerful capability of capturing long-distance semantics. However, this advantage appeared to come at the cost of localising and recognising small objects: these models generally performed worse at hard class detection than the convolution framework models, with ViDT-swin showing the highest value of 83.2%. In terms of the F1 score, these models showed moderate performance, with the highest score being 85.1% for RT-DETR (x), indicating that for datasets with inconsistent object sizes, their ability to identify whether a seed has completed germination was limited. Furthermore, these models tended to be large, with the smallest parameter count being 52.7 M for RT-DETR (l). Owing to the characteristics of the transformer framework, the models involved a substantial number of matrix operations, leading to slow training that often required lengthy training times to converge.
Excluding the aforementioned transformer-based models, the remaining models belonged to the convolutional framework. Among these models, excluding the YOLO-SGD model, YOLOv7 (l) achieved the highest AP of 95.4% for easy class germinated seed detection, whereas YOLOv8 (l) reached 91.8% for hard class germinated seed detection, which was the highest in its category. In terms of mAP and F1 scores, the YOLOv7 (l) model achieved the highest values of 92.8% and 88.9%, respectively. The Faster R-CNN model, a classic two-stage detection model, showed moderate performance with the dataset used in this study. It showed longer training times than other convolution framework models and did not stand out in terms of easy and hard class detection, with values 4.2% and 18.8% lower than the highest observed values, respectively. In terms of model parameters, the Faster R-CNN model also fell into the category of a large model. YOLOX, which was introduced in 2021, demonstrated moderate detection capabilities. Its advantage lies in its diverse model versions, with YOLOX (s) having the smallest full-model parameter count at only 8.97 M. However, YOLOX showed some deficiency in hard class detection, with a 4.2% gap compared to the YOLOv8 (l) value of 91.8%. YOLOv7 and YOLOv8 were the latest additions to the YOLO series, each with its own strengths. YOLOv7 utilises numerous residual structures and multiscale information, demonstrating excellent performance on targets of different sizes. YOLOv7 (l) and YOLOv7 (x) showed a small difference of only 0.7% in the mAP but a larger difference of 4.0% in the F1 score, with YOLOv7 (l) having half the number of parameters, indicating better overall performance for YOLOv7 (l). YOLOv8 combines various advantages of the YOLO series models, including improvements to the P5 and P6 feature layers, which enhance the ability of the model to handle high-resolution images effectively. These features resulted in better hard class detection, surpassing YOLOv7 (l) by 1.6% at its peak. The mAP differences among the YOLOv8 models of sizes s, m, l, and x were relatively small, with a maximum difference of 2.4% between the highest and lowest values. These results suggest that YOLOv8 extracts limited additional information from the current dataset and that changes in the model scale did not significantly enhance the feature-extraction efficiency, which may have impacted subsequent optimisations. Although the YOLOv8 (l) model achieved the highest F1 score among the YOLOv8 models, it fell short of the YOLOv7 (l) model by 0.4%.
4.2. Discussion of the Effects of Various Optimisation Approaches on the Model Performance in Ablation Experiments
To gain deeper insights into the specific contributions of each improved module to model performance, we designed and conducted ablation experiments to systematically evaluate the effects of the INV, EVC, and SCP modules individually and in combination. This section analyses how, and to what extent, each optimisation strategy enhanced the detection of germinating seeds, drawing on the ablation results presented in Table 3, the hard class heatmaps generated by different module combinations in Figure 9, and the corresponding easy class heatmaps in Figure 10.
INV module: Incorporating involution contributed to an overall enhancement in model performance (Table 3). With an increase in the parameters of only 0.79 M, the mAP and F1 scores improved by 1.4% and 0.8%, respectively. These findings suggest that involution enhanced the spatial specificity of the features early in the feature-extraction process, endowing the learned seed characteristics with stronger semantic integrity. This enhancement provided a more detailed feature foundation for subsequent extraction tasks.
EVC module: The EVC module was introduced into the main backbone, where it performed a lightweight approximation of the weight redistribution on the output of the last effective feature layer. This operation deepened the focus on targets with fewer features, such as side faces, and mitigated interference from low-quality images to some extent. Compared with the baseline model, introducing the EVC module resulted in improvements across all detection metrics, with increases of 1.0%, 2.6%, 1.8%, and 0.9% in the easy AP, hard AP, mAP, and F1 score, respectively. The improvement in the mAP was larger than that obtained with the INV module. This improvement may indicate that introducing the EVC module alleviated cases in which germinated seeds (which should be easily detectable) were lost due to entangled roots, possibly improving the detection accuracy. Regarding the model parameters, the improvement resulted in an increase of 6.1 M, likely owing to computations from the MLP and the dual-stream structure formed by the encoded calculations. However, given the enhanced recognition accuracy for targets such as the edges of germinated seeds, this increase in computational load was acceptable.
SCP module: The SCP module was positioned after the feature-enhancement network, and detailed features were enhanced by learning global spatial contexts at each level, effectively exploiting residual valid information. Introducing the SCP module improved both easy and hard class detection, with a 3.4% improvement in hard class detection. These results indicate that enhancing the multiscale information was beneficial for seed targets with different latencies and proportions, especially for small-sized seed targets that are difficult to detect. Additionally, the model parameters increased by only 0.69 M, which did not introduce an excessive computational burden.
INV and EVC modules: When used together, these two modules contributed to a 2.6% improvement in the mAP and a 1.4% improvement in the F1 score. However, hard class detection was 0.3% lower than with the SCP module alone. The data suggest that relying solely on the INV and EVC modules is insufficient because they cannot effectively learn and localise small seed targets.
INV and SCP modules: The INV module enhanced the spatial specificity of the features, whereas the SCP module addressed the localisation capabilities of germinated seeds at different scales. Combining both modules significantly improved the overall performance of the model. Compared with the baseline model, the model parameters increased by only 1.5 M, yet achieved an mAP of 97.1% and an F1 score of 92.2%. This improvement indicates that both the prediction and classification capabilities were substantially enhanced.
SCP and EVC modules: When these two improvements were implemented simultaneously, their impact on model performance was evident. The results showed increases of 3.5% and 4.7% in terms of easy and hard class detections, respectively. These enhancements suggest that the detection of image edges and small seed targets was substantially improved. Despite an increase of 6.8 M in the model parameters, the size of the improved model remained relatively small.
INV, EVC, and SCP modules: The combined use of these three modules significantly enhanced the model performance. Compared with the baseline model, we observed improvements of 4.2% and 6.2% in terms of easy and hard class detection, respectively, achieving an mAP of 98.0% and an F1 score of 93.3%. At this stage, the model demonstrated strong generalisation capabilities across germinated seeds of different sizes and images distorted by the imaging conditions. At 45.22 M parameters, the model also did not impose a significant burden on the experimental equipment.
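The parameter figures quoted in this ablation can be cross-checked with simple arithmetic. The following minimal sketch uses only the per-module increments and the final model size reported above; the names `MODULE_PARAMS_M`, `added_params`, and `implied_baseline` are illustrative and not taken from the study's code, and the sketch assumes that the increments of separate modules simply add up.

```python
from itertools import combinations

# Parameter increments (in millions) reported for each module in the ablation text.
MODULE_PARAMS_M = {"INV": 0.79, "EVC": 6.1, "SCP": 0.69}
FULL_MODEL_PARAMS_M = 45.22  # reported size of the full YOLO-SGD model


def added_params(modules):
    """Sum the reported increments for a combination of modules (assumes additivity)."""
    return sum(MODULE_PARAMS_M[m] for m in modules)


def implied_baseline():
    """Baseline size implied by the full model minus all three reported increments."""
    return FULL_MODEL_PARAMS_M - added_params(MODULE_PARAMS_M)


if __name__ == "__main__":
    # Print the summed increment for every pair and for the full combination.
    for r in (2, 3):
        for combo in combinations(MODULE_PARAMS_M, r):
            print("+".join(combo), f"{added_params(combo):.2f} M")
    print(f"implied baseline: {implied_baseline():.2f} M")
```

Under the additivity assumption, the sums agree with the rounded values quoted for the module combinations (1.5 M, 6.8 M, and 7.6 M); any residual difference reflects rounding in the reported figures.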
4.3. Discussion of Manual Calculations and the YOLO–SGD Methods in Terms of Detecting Seed-Germination Rates
In both close- and long-range detection scenarios, the YOLO–SGD model exhibited significant advantages in terms of the average computation time, being over 31 times faster than manual detection. This advantage became more pronounced as the number of germinated seeds increased, greatly enhancing the efficiency of calculating seed germination percentages during experiments and effectively saving labour. Furthermore, for seed targets representing different scales in close- and long-range settings, the model’s results remained within an error of 1.0% of those obtained by manual detection, indicating that its detection precision is sufficient to replace manual methods.
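The comparison described above can be illustrated with a short calculation. This is a minimal sketch, assuming that the error is measured in percentage points of germination percentage and averaged over dishes; the function names and the example counts in the usage note are hypothetical, and only the 50 seeds per culture dish comes from this study.

```python
from statistics import mean

SEEDS_PER_DISH = 50  # seeding density used in this study


def germination_percentage(germinated, total=SEEDS_PER_DISH):
    """Germination percentage for one dish."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= germinated <= total:
        raise ValueError("germinated must lie between 0 and total")
    return 100.0 * germinated / total


def mean_absolute_error(model_counts, manual_counts, total=SEEDS_PER_DISH):
    """Mean absolute difference (percentage points) between model and manual results."""
    if not model_counts or len(model_counts) != len(manual_counts):
        raise ValueError("count lists must be non-empty and of equal length")
    return mean(
        abs(germination_percentage(m, total) - germination_percentage(h, total))
        for m, h in zip(model_counts, manual_counts)
    )


def speedup(manual_seconds, model_seconds):
    """How many times faster the model is than manual counting."""
    if model_seconds <= 0:
        raise ValueError("model_seconds must be positive")
    return manual_seconds / model_seconds
```

For example, with hypothetical counts `mean_absolute_error([40, 45, 39, 47], [40, 45, 38, 47])` gives 0.5 percentage points. A single miscounted seed in a 50-seed dish shifts that dish's percentage by 2 points, so an error bound of 1.0% is most naturally read as an average over many dishes.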
Figure 12 further details the prediction performance of the YOLO-SGD algorithm at different germination time points. As can be seen from the comparison curves shown in
Figure 12a–c, the germination percentage curves predicted by the YOLO-SGD algorithm at the three key time points of 24 h, 48 h, and 72 h show a high degree of consistency with the manual measurement curves. This indicates that throughout the dynamic process of seed germination, the YOLO-SGD algorithm can continuously and stably capture the germination status and provide results extremely close to manual judgment. This high consistency aligns with the average absolute error mentioned in our abstract, which is primarily controlled within 1.0%, strongly confirming that the model has the accuracy to replace manual detection. It is particularly noteworthy that even as germination progresses and complex situations such as increased seed quantity and intertwining primary roots from multiple germinated seeds/seedlings occur, the model can still maintain good recognition accuracy, further verifying its effectiveness in addressing the aforementioned challenges.
Figure 12d presents the distribution characteristics of the germination percentage predicted by the YOLO-SGD algorithm at different time points in the form of box plots, offering a more intuitive view. As germination time advances, the median of the germination percentage (represented by the horizontal line within the box) shows a clear upward trend, which is consistent with the biological process of natural seed germination. At the same time, the length of the box (representing the interquartile range of the data) and the range of the whiskers are generally narrow, with few outliers. This indicates that the prediction results of the YOLO-SGD algorithm at each time point have a high degree of concentration and stability, with a small range of data fluctuation, verifying the robustness of the model’s predictions. This stable distribution characteristic is an important manifestation of the reliability of the automated detection system, meaning that the system can provide consistent and trustworthy germination percentage data in practical applications, laying a solid foundation for large-scale, high-throughput seed detection.
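The box-plot quantities discussed above (median, interquartile range, and outliers) can be computed directly from per-dish germination percentages. The sketch below is illustrative only: it assumes the common 1.5 × IQR whisker convention, which may differ from the one used for Figure 12d, and the function names are hypothetical.

```python
import statistics


def box_summary(values):
    """Return the median, interquartile range, and 1.5*IQR outliers of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"median": median, "iqr": iqr, "outliers": outliers}


def medians_increase(samples_by_hour):
    """Check that the median rises from each time point to the next."""
    medians = [
        box_summary(values)["median"]
        for _, values in sorted(samples_by_hour.items())
    ]
    return all(a < b for a, b in zip(medians, medians[1:]))
```

A narrow `iqr` and an empty `outliers` list at each time point correspond to the concentrated, stable distributions described in the text, and `medians_increase` captures the upward trend of the median over 24 h, 48 h, and 72 h.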
4.4. Discussion of Detecting High-Density Seed Germination
In this study, cabbage seeds were used as the experimental samples. Their small size, together with the considerable primary root entanglement that arises among germinated seeds/seedlings at a high seeding density in cultivation dishes, poses technical challenges for detection tasks. Wang and Song [
39] combined hyperspectral-imaging technology with optimised deep-learning algorithms to evaluate sweet corn seeds. Their optimal model achieved 97.23% seed detection accuracy, with each image containing up to nine corn seeds. Jin et al. [
40] used PCA technology to select specific spectral data and combined deep-learning networks and traditional machine-learning methods to predict the germination vigour of rice seeds under natural aging conditions. With images containing eight rice seeds, most models achieved accuracies greater than 85.0%. Jiang et al. [
41] used the YOLOv8 framework to detect pea seed-germination percentages using training data containing 36 pea seeds per image and achieved a detection accuracy of 98.7%. However, these studies were conducted using low-density seed arrangements for training and testing, which may not fully meet the requirements of practical germination experiments. In contrast, in this study we used densely arranged cabbage seeds for the experimental data (50 seeds/culture dish), which is closer to real production environments. As revealed by the heatmap in the middle section of
Figure 5c, including the EVC module significantly enhanced the attention to different seed-detail targets, resulting in a 1.8% increase in the mAP. Even with partial seed–root entanglement, the system could accurately identify each seed, demonstrating the robustness and effectiveness of the model in handling high-density seed arrangements.
4.5. Impact of Multi-Scale Information Processing on the Performance of Detecting Seed Germination
Enhancing our model’s attention to multiscale information is beneficial both in terms of accuracy and transferability to detecting other seed types. Fu et al. [
42] utilised YOLOv4 for the automated detection of wheat seed-germination vigour by incorporating an FPN structure for multiscale information processing and achieved an average accuracy of 97.59%. Zhao et al. [
15] enhanced the detection of rice seeds by incorporating a small-object detection layer, achieving an average precision of 95.39% with average errors maintained within 0.1. In this study, we enhanced the model’s sensitivity to the multiscale characteristics of cabbage seeds by training with data from two scales and incorporating the SCP module. Following the addition of the SCP module, both the easy and hard class detection accuracies improved, particularly when detecting small targets, showing a 3.4% improvement. Introducing multiscale-training data enhanced the ability of the model to detect seeds at different distances and angles and significantly improved its adaptability to variations in seed morphology. These features suggest that our model has considerable potential and wide application prospects for use in various seed detection tasks.
4.6. Limitations and Future Work
In this study, targeted optimisation was implemented to address various issues in detecting seed germination completion without significantly increasing the number of model parameters or the associated computational burden. The YOLO–SGD model, with 45.22 M parameters, required modest computational resources, enabling efficient operation on standard hardware. After optimisation, the number of parameters increased by only 7.6 M, yet the model achieved mAP and F1 scores of 98.0% and 93.3%, respectively, thereby meeting practical usage standards. The optimised YOLO–SGD model demonstrated robust performance and generalisation capabilities for cabbage seed detection.
Despite these promising results, several limitations of the current study warrant discussion and provide directions for future research. Firstly, the experiments were conducted under highly controlled laboratory conditions (constant temperature, humidity, simulated light). The system’s performance may vary significantly in more complex and uncontrolled real-world agricultural environments, where factors such as fluctuating natural illumination, the presence of dust or debris, and diverse background substrates could affect detection accuracy.
Secondly, the current system primarily focuses on detecting germination completion and calculating germination percentage. A more comprehensive assessment of seed vigour, which often includes critical parameters like germination speed, mean germination time, or seedling growth metrics (e.g., shoot/root length), is not yet integrated.
Building upon these insights, future research should extend the YOLO–SGD model to seed detection across various seed types. However, the initial data preparation for this process is intricate, potentially requiring significant human resources and affecting the training efficiency. To address this challenge, techniques such as few-shot learning or semi-supervised learning could be introduced for targeted model adjustment and optimisation. Furthermore, building on the successful application of the YOLO–SGD model in rapidly detecting germinated seeds, we aim to gradually broaden its scope to include functionalities such as measuring shoot lengths and predicting seed-growth potentials. The goal of this expansion will be to provide comprehensive and precise data for agricultural research and production.
4.7. Practical Applications for Growers
The high accuracy and efficiency demonstrated by the YOLO–SGD model for germinated seed detection pave the way for real-time applications in modern agricultural practice, particularly for growers and commercial seed producers. The system offers a practical way to overcome the limitations of traditional manual germination assessment, enabling data-driven decision-making and optimised resource management. In a typical real-time application scenario, the system can be integrated into existing seed quality control pipelines or deployed in specialised germination testing units within nurseries. Growers would place prepared Petri dishes of seeds onto the system’s designated sample stage. The integrated hardware automatically captures high-resolution images at predefined critical time points. These images are then immediately processed by the pre-trained YOLO–SGD model, which rapidly identifies and counts the germinated seeds, providing precise bounding box detections and classifications. The detection results, including the number of germinated seeds and the automatically calculated germination percentage, are presented on a user interface for instant access.
This real-time capability offers clear advantages over conventional manual methods: growers receive immediate feedback on the germination status of a seed lot, enabling prompt, data-driven decisions regarding sowing density, re-sowing, or adjustments to environmental conditions. Such timely interventions are crucial for optimising seed usage, minimising economic losses attributable to poor germination, and ensuring optimal and uniform stand establishment in the field. Furthermore, the objective and consistent nature of automated detection reduces human error and subjectivity, leading to more reliable and repeatable germination assessments.
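The workflow described above, from detections at each capture time to a germination percentage shown to the grower, can be sketched as follows. This is an illustrative outline rather than the system's actual software: the `Detection` fields, the `"germinated"` label, and the 0.5 confidence threshold are assumptions, and only the 50 seeds per dish is taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted box; field names are illustrative, not the model's output schema."""
    label: str
    confidence: float


def count_germinated(detections, min_confidence=0.5, label="germinated"):
    """Count boxes of the germinated class at or above the confidence threshold."""
    return sum(
        1 for d in detections
        if d.label == label and d.confidence >= min_confidence
    )


def dish_report(detections_by_hour, seeds_per_dish=50):
    """Map each capture time (hours) to (germinated count, germination percentage)."""
    report = {}
    for hour, detections in sorted(detections_by_hour.items()):
        n = count_germinated(detections)
        report[hour] = (n, 100.0 * n / seeds_per_dish)
    return report
```

In such a setup, `dish_report` would be called once per dish with the detections collected at each predefined time point, and the resulting table is what a user interface would display.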