YOLO-SGD: Precision-Oriented Intelligent Detection of Seed Germination Completion

Yang, Tianyu; Peng, Bo; You, Li; Zhang, Jun; Zhang, Dongfang; Shang, Yulei; Fan, Xiaofei

doi:10.3390/agronomy15092146


by Tianyu Yang ^1,2,†, Bo Peng ^1,†, Li You ^3, Jun Zhang ^1,2, Dongfang Zhang ^2,4, Yulei Shang ^1 and Xiaofei Fan ^1,2,*

^1 College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071000, China

^2 State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071000, China

^3 Faculty of Information Science and Engineering, Baoding University of Technology, Baoding 071000, China

^4 College of Horticulture, Hebei Agricultural University, Baoding 071000, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Agronomy 2025, 15(9), 2146; https://doi.org/10.3390/agronomy15092146

Submission received: 3 July 2025 / Revised: 27 July 2025 / Accepted: 2 September 2025 / Published: 8 September 2025

(This article belongs to the Section Precision and Digital Agriculture)


Abstract

The seed-germination percentage is an important indicator of the seed viability and growth potential and has important implications for plant breeding and agricultural production. Thus, to increase the speed and accuracy in measuring the completion of germination in experimental seed batches for precise germination percentage calculation, we evaluated a You-Only-Look-Once (YOLO)–Seed Germination Detection (SGD) algorithm that integrates deep-learning technology and texture feature-extraction mechanisms specific to germinating seeds. The algorithm was built upon YOLOv7-l, and its applicability was optimised based on the results of our germination experiments. In the backbone network, an internal convolution structure was substituted to enhance the spatial specificity of the initial features. Following the output of the main feature-extraction network, an Explicit Visual Centre (EVC) module was introduced to mitigate the interference caused by intertwined primary roots from germinated seeds, which can affect recognition accuracy. Furthermore, a Spatial Context Pyramid (SCP) module was embedded after enhancing the feature-extraction network to improve the model’s accuracy in identifying seeds of different scales, particularly in recognising small target seeds. Our results with cabbage seeds showed that the YOLO–SGD model, with a model size of 45.22 M, achieved an average detection accuracy of 99.6% for large-scale seeds and 96.4% for small-scale seeds. The model also achieved a mean average precision and F1 score of 98.0% and 93.3%, respectively. Compared with manual germination-rate detection, the model maintained an average absolute error of prediction within 1.0%, demonstrating sufficient precision to replace manual methods in laboratory environments and efficiently detect germinated seeds for precise germination percentage assessment.

Keywords: germinated seed detection; germination rate; YOLOv7 algorithm; involution; explicit visual centre; spatial context pyramid module; cabbage seed

1. Introduction

According to standard plant physiology definitions, seed germination is the physiological process that commences with the uptake of water (imbibition) by the quiescent dry seed and terminates with the protrusion of the embryonic primary root from its enclosing structures. This event, known as completion of germination, marks the transition to a seedling stage [1].

Before each batch of seeds is sold or used, a standard practice is to conduct germination tests in laboratories to evaluate whether the batch meets sales standards [2]. A high germination percentage is a crucial measure of seed quality, often indicating faster germination progress and stronger environmental adaptability [3]. In contrast, seeds with lower germination percentages often face issues such as delayed germination and weakened growth vigour, making them susceptible to environmental constraints and resulting in reduced agricultural yields [4]. Based on previous findings, when the length of the primary root exceeds 2 mm, the seeds are generally considered to have begun germination [5]. However, this determination has traditionally relied upon experienced experimenters visually checking whether a shoot or primary root has emerged from the seed in order to assess germination status [6].

Traditional methods for assessing seed germination status and calculating germination percentages rely heavily upon manual observation and operations. This approach requires substantial manpower and time, and is limited by subjectivity and inefficiency issues [7,8]. Due to human factors, the accuracy and consistency of detection results are often difficult to guarantee, which substantially limits the efficiency and quality of modern agricultural production [9]. Beyond the immediate impact on cultivation, such inaccuracies carry significant economic repercussions throughout the agricultural value chain. For seed companies, the misrepresentation or inaccurate labeling of germination rates in the market can severely damage brand reputation, erode consumer trust, and lead to substantial financial losses. Furthermore, it complicates crucial seed traceability processes, hindering quality assurance and accountability. This broad economic impact underscores the critical need for precise and reliable germination assessment. Therefore, an urgent need exists for a more objective, repeatable, and efficient detection method to replace traditional manual methods [10]. Such a method would not only enhance the accuracy and efficiency of detection but would also reduce the influence of human factors on the results, thereby providing more reliable technical support for agricultural production.

With the rapid advancement of deep-learning and computer-vision technologies, image-based methods for detecting seed germination completion and evaluating germination percentage have attracted growing attention and achieved notable progress [11]. To swiftly and accurately assess the germination percentage and condition of maize seeds, Liu et al. [12] proposed an innovative approach that integrates an improved local linear-embedding algorithm with near-infrared spectroscopy (NIRS). Initially, the seeds were graded based on artificial aging, after which NIRS data were collected for each sample. Subsequently, the germination percentage was tested. Through a meticulous comparison and combination of different models, a germination percentage prediction model based on partial least-squares analysis and a support vector machine was established, achieving an R-squared value of 0.8384, indicating high predictive accuracy.

In contrast, Xiao et al. [13] adopted a different strategy by selecting various kernel functions and combining them with Gaussian process regression (GPR) methods. To enhance the robustness of the model, Monte Carlo cross-validation was employed to eliminate 12 outlier values, and baseline-correction methods and multiplicative scatter correction were applied to optimise the NIRS data. Following a series of optimisations and validations, a GPR model based on the Matern32 kernel function was successfully developed. This model not only enabled non-destructive detection of the germination percentages of maize seeds but also achieved an impressive R-squared value of 0.9599 on the test set, demonstrating outstanding stability and detection performance. However, this model involves limited utilisation of NIRS and insufficient artificial intelligence.

Unlike NIRS-based approaches that infer germination status from spectral properties, image-based deep learning methods, particularly YOLO variants, directly leverage visual features (e.g., primary root protrusion and morphology) for highly accurate and spatially precise detection. This direct visual analysis is often more intuitive and robust for identifying the physical manifestation of germination, especially when dealing with complex morphological changes and potential tangles, which can be challenging for indirect spectral methods.

Yamazaki et al. [14] utilised the YOLOv5 model and transfer-learning techniques to construct a deep-learning model to distinguish germinating from non-germinating pollen and further integrated the results of whole-genome association studies to reveal gene patterns closely related to pollen germination vigour. However, the model was not specifically improved to target the distinctive features of germinating pollen during detection, which may have limited its recognition accuracy.

Zhao et al. [15] developed the YOLO-r model by integrating image partitioning, a transformer encoder, a small target detection layer, and a complete intersection over union loss function to improve the You-Only-Look-Once (YOLO) algorithm. After extensive testing on a dataset containing 21,429 rice seeds, the YOLO-r model achieved an average detection accuracy of germinated seeds of 0.9539, with a sufficiently fast detection speed suitable for real-time applications. Nevertheless, despite its strong performance in terms of multiple aspects, the relatively low image input resolution of the model limits its detection accuracy, potentially leading to instances of missed detection in specific scenarios.

Yao et al. [16] considered the characteristics of wild rice germination and proposed the SGR–YOLO algorithm within the YOLOv7 model, integrating the ECA attention mechanism, BiFPN, and GIOU loss function. In subsequent experiments, the data were segmented into two different experimental environments—hydroponic boxes and Petri dishes—to comprehensively evaluate the performance of the model. In the hydroponic-box environment, the SGR-YOLO model demonstrated 94% accuracy, whereas it achieved a higher accuracy of 98.2% in the Petri dish environment, demonstrating the efficiency and accuracy of the algorithm. However, despite achieving a higher accuracy in the Petri dish environment, the lower seed-density arrangement observed in laboratory Petri dishes differed significantly from the denser arrangements that seeds might exhibit in actual experimental environments.

In this study, we developed a method for detecting seed germination completion based on deep-learning technology, thereby enabling accurate germination percentage assessment. The method leveraged the advantages of deep neural networks in image recognition and feature extraction, achieving automated identification and evaluation of the seed-germination status.

First, to enhance the accuracy of the model in identifying germinated seeds, we focused on improving the model’s feature-extraction capability to ensure that it could capture the key characteristics of germinated seeds more precisely. To this end, the second layer of the stage (S) 1 block in the backbone feature-extraction network employed internal convolution to extract the initial features, thereby increasing their spatial specificity. Second, as germination testing progressed, the complexity of emerging primary root growth gradually became apparent, specifically manifesting as intertwined and entangled primary roots from multiple germinated seeds/seedlings (Figure 1). This phenomenon markedly affected the accurate identification of individual seeds [17]. Therefore, after the output of the last layer of the backbone feature-extraction network, an Explicit Visual Centre (EVC) module was introduced to achieve feature adjustment within the centralised layer [18], enabling a more comprehensive and accurate representation of the seed characteristics and improving independent seed identification.

Next, we explored the detection accuracy of seeds of different scales, particularly the incomplete recognition of small-sized seeds. Therefore, the Spatial Context Pyramid (SCP) module was introduced into the enhanced extraction network [19]. This module enhanced the detailed features of the small seeds by learning the global spatial context at each level. Finally, the model’s operation should not impose a burden on the equipment used. Based on the above challenges and requirements, we selected cabbage seeds as experimental samples and made targeted improvements to the YOLOv7 model [20], resulting in a YOLO–Seed Germination Detection (SGD) model. This model not only effectively alleviated the issues mentioned above but also achieved fast, intelligent, and accurate detection of germinated seeds even when facing situations where cabbage seeds are small and prone to rolling.

The main contributions of this study can be summarised as follows:

(1) Proposed an innovative YOLO-SGD model: targeting the characteristics of germinated seed detection, we built an efficient and high-precision germinated seed detection algorithm based on YOLOv7-l by introducing an internal convolution structure, an Explicit Visual Centre (EVC) module, and a Spatial Context Pyramid (SCP) module.

(2) Effectively addressed core challenges during germination: significant improvements were achieved, particularly in addressing the difficulty of individual identification caused by root entanglement and the accurate detection of seeds of different scales (especially small targets).

(3) Achieved a balance between high accuracy and low overhead: experiments verified that the YOLO-SGD model achieved excellent detection accuracy while maintaining a small model size (45.22 M), with the average absolute error of germination percentage prediction controlled within 1.0%, which is sufficient to replace manual detection in laboratory environments.

(4) Provided a practical solution for the agricultural field: developed a fast and intelligent detection method for cabbage seed germination assessment, which is expected to be applied in areas such as germplasm resource evaluation, breeding selection, and agricultural production, possessing significant practical application value.

2. Materials and Methods

2.1. Overall Research Framework

This study aims to develop an efficient and accurate automated method for detecting seed germination completion to overcome the limitations of traditional manual detection in terms of efficiency and objectivity and to meet the needs of large-scale germination experiments. To this end, we constructed a complete collaborative system integrating image acquisition, data processing, deep learning model design, and validation, aimed at achieving intelligent and high-precision identification of germinated seeds for accurate germination percentage calculation. The step-by-step operational workflow of this system is comprehensively illustrated in Figure 2, guiding the reader through the entire process from initial image capture to final germination assessment. Its overall architecture and key components are shown in Figure 3, which intuitively presents the interrelationships between modules and data flow.

Figure 3a shows the custom-built hardware platform (developed at the Agricultural Artificial Intelligence Laboratory, Hebei Agricultural University, Baoding, Hebei, China) for seed germination image acquisition in this study. The setup consists of five key components: a 20-megapixel industrial area-scan camera (MV-CS200-10UC, Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou, China) with an image resolution of 5472 × 3648; an adjustable-height camera stand (10–50 cm); dual side lighting sources for uniform illumination; a sample stage for positioning seed trays; and a bottom light source for enhanced contrast. This configuration ensures stable and repeatable image acquisition, providing high-quality inputs for subsequent analysis.

During the algorithm development phase, Figure 3b depicts the dataset construction and partitioning process. High-quality annotated datasets are key to training an efficient YOLO-SGD model. This process strictly controls image preprocessing, precise annotation of germinated seeds, and scientific partitioning of the dataset to support the model’s effective learning and generalisation capabilities. The core YOLO-SGD detection algorithm model structure is shown in Figure 3c. This model is based on targeted optimisation of YOLOv7-l, introducing an internal convolution structure to enhance the spatial specificity of initial features, while embedding the Explicit Visual Centre (EVC) module to alleviate the impact of root entanglement on individual seed identification, and utilizing the Spatial Context Pyramid (SCP) module to improve the model’s recognition accuracy for seeds of different scales (especially small targets).

Finally, to verify the actual effectiveness of the algorithm and provide an intuitive comparison, Figure 3d illustrates the comparison between the germination percentage predicted by the YOLO-SGD algorithm and the results of manual detection. This comparison aims to directly evaluate the consistency between the model’s detection accuracy and traditional methods and highlight its potential to replace manual detection in a laboratory environment.
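The comparison between predicted and manually counted germination percentages can be sketched as follows. This is a minimal illustration, assuming that each dish yields one count of germinated seeds from the model and one from a human observer; the function names and the three example counts are hypothetical and are not data from this study.

```python
def germination_percentage(germinated: int, total: int) -> float:
    """Percentage of seeds in one dish counted as germinated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * germinated / total


def mean_absolute_error(predicted, manual):
    """Average absolute difference between paired percentage values."""
    if len(predicted) != len(manual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, manual)) / len(predicted)


# Hypothetical counts for three dishes of 50 seeds each (not data from the study).
model_counts = [46, 48, 44]
manual_counts = [46, 47, 44]

model_pct = [germination_percentage(c, 50) for c in model_counts]
manual_pct = [germination_percentage(c, 50) for c in manual_counts]
mae = mean_absolute_error(model_pct, manual_pct)
print(f"Mean absolute error: {mae:.2f} percentage points")
```

With 50 seeds per dish, one miscounted seed shifts the germination percentage of that dish by 2 percentage points, so an average error within 1.0% implies that most dishes are counted exactly.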

Subsequent sections will elaborate on the above aspects in detail, including specific materials, experimental procedures, algorithm improvement details, and performance evaluation methods.

2.2. Cabbage Seed Materials

In this study, we sampled 10,000 cabbage seeds from the same batch to ensure the accuracy and objectivity of the experimental results. The cabbage seeds used typically exhibit an ellipsoidal shape, with an average length of 2.5 ± 0.2 mm, a width of 1.8 ± 0.15 mm, and a thousand-grain weight of approximately 3.5 g. Their colour is predominantly dark brown. Representative images of these seeds throughout the data processing flow are presented in Figure 3b. A double-blind method was used to divide the seeds into two groups for independent experiments, thereby minimising potential errors. During the preparation phase, we strictly adhered to the experimental procedures described below [21]. First, the seeds were randomly selected using the quartering method to ensure that each 9 × 9 cm culture dish contained 50 evenly distributed seeds for a total of 100 culture dishes [22]. To ensure the purity and sterility of the seeds, they were soaked in 70% ethanol for 10 min. The culture dishes, filter paper (paper beds), and germination boxes were thoroughly exposed to ultraviolet light and wiped with disinfectants to eliminate fungi, bacteria, and other microorganisms that could impede seed germination [23].

During cultivation, we maintained a constant temperature (24 °C) and humidity (65%) and provided simulated natural light to mimic the natural growing environment of cabbage seeds. Continuous observations of seed germination and growth were made, and the germination status was recorded for 3 consecutive days starting from the second day of germination.

2.3. Collecting and Preprocessing of Cabbage Seed Germination Data

2.3.1. Collecting Germination Data

During the image-acquisition phase, we utilised the hardware platform shown in Figure 3a. This platform integrated a stable light source system, a 20-megapixel industrial area scan camera MV-CS200-10UC with a resolution of 5472 × 3648, and a sample stage for fixing the germination dishes. Its operation interface included a touchscreen display, external shortcut keys, and a control area. It was also equipped with a light intensity display to ensure that clear and uniformly illuminated images of germinating seeds were acquired under standard, repeatable conditions, providing high-quality raw data for subsequent accurate analysis. During image-data collection, the camera height was fixed, and two specific lens-to-sample stage distances were set at 37 cm and 15 cm, with corresponding lens focal lengths of 4.5 cm and 2.1 cm, respectively, in order to obtain image data for germinated seeds at two different size ratios. This setup ensured the diversity and richness of the image data. Image acquisition began based on the environment of the equipment. Images were captured every 24 h to ensure precise identification of the phenotypic characteristics of the germinated cabbage seeds at various stages for robust single-pass detection. To ensure the timeliness and accuracy of the data, the image-acquisition process commenced when approximately one-third of the seeds in each culture dish began to germinate (usually after 24 h).

2.3.2. Dataset Construction

Following the strict image-acquisition strategy described in Section 2.3.1, 600 images reflecting the various germination stages of the cabbage seeds were obtained using the image-acquisition device. Of these, 300 images were captured at close range (Figure 4a) and 300 at long range (Figure 4b). To enhance the adaptability of the model when dealing with targets of different scales, we categorised the images based on their capture distances. Specifically, close-range images were defined as the “easy” category, whereas long-range images were categorised as the “hard” category, reflecting the increased difficulty of detecting seed germination.

After obtaining the data regarding the cabbage seed-germination states, each cabbage seed in every image was annotated using LabelImg (1.8.6) software. During the annotation process, each target seed was accurately delineated based on its largest bounding box (Figure 3c), resulting in the complete annotation of each image. Following the completion of annotation, we partitioned the 600 original images into training, testing, and validation sets at a 4:1:1 ratio. Specifically, the training set comprised 400 images used for model training and optimisation. The testing and validation sets each contained 100 images employed to evaluate the performance and generalisability of the model, respectively. Importantly, to ensure the objectivity and fairness of the evaluation results, none of the images in the validation and testing sets overlapped with those in the training set.
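The 4:1:1 partition can be illustrated with a short sketch. It assumes a plain random shuffle with a fixed seed; whether the split in this study was stratified by the easy and hard categories is not stated, and the file names and the `split_dataset` helper are hypothetical.

```python
import random


def split_dataset(items, ratios=(4, 1, 1), seed=0):
    """Shuffle items and split them into train/test/validation lists by ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_test = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, val


# Hypothetical file names standing in for the 600 annotated images.
image_names = [f"dish_{i:04d}.jpg" for i in range(600)]
train, test, val = split_dataset(image_names)
print(len(train), len(test), len(val))
```

Because the three lists are slices of one shuffled sequence, no image can appear in more than one subset.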

2.3.3. Data Augmentation

Data augmentation techniques were applied to the training set to further enhance the training effectiveness of the model. Specifically, the methods used included randomly rotating the images by ±15 degrees, adjusting the image saturation by ±35%, adjusting the brightness by ±25%, and adding random noise ranging from 0 to 3.5 pixels. Through data augmentation, the number of images in the training set was increased from 400 to 1200. These measures not only increased the diversity of the training samples, enabling the model to learn features under various conditions, but also effectively mitigated overfitting issues.
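The augmentation ranges above can be expressed as a parameter-sampling sketch. It assumes that each original training image is kept and paired with two randomly augmented variants, which is one way to arrive at 1200 images from 400; the actual procedure may differ. The function and key names are hypothetical, and applying the parameters to pixel data is left to an image library.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the ranges given in the text."""
    return {
        "rotation_deg": rng.uniform(-15.0, 15.0),
        "saturation_scale": 1.0 + rng.uniform(-0.35, 0.35),
        "brightness_scale": 1.0 + rng.uniform(-0.25, 0.25),
        "noise_level": rng.uniform(0.0, 3.5),
    }


def plan_augmentations(image_names, copies_per_image=2, seed=0):
    """Keep every original and add `copies_per_image` augmented variants of it."""
    rng = random.Random(seed)
    plan = [(name, None) for name in image_names]  # originals, no augmentation
    for name in image_names:
        for _ in range(copies_per_image):
            plan.append((name, sample_augmentation(rng)))
    return plan


train_names = [f"train_{i:03d}.jpg" for i in range(400)]  # hypothetical names
plan = plan_augmentations(train_names)
print(len(plan))
```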

2.4. YOLO–SGD Cabbage Seed-Germination Detection Algorithm

Figure 5a illustrates the overall network architecture of the YOLO–SGD algorithm. This algorithm leverages YOLOv7-l as the baseline model and introduces targeted improvements tailored to seed-germination detection. The enhanced algorithm consists of three main parts: a backbone feature-extraction network, an enhanced feature-extraction network, and a detection head. Initially, the seed-germination images, enhanced through data augmentation, entered the backbone feature-extraction network to generate rich initial seed features after undergoing internal convolution [24] during the first channel-compression stage; its detailed structure is shown in Figure 5b. Subsequently, three effective feature layers of different sizes were obtained using multiple feature-sampling processes. The fifth effective feature layer passed through the EVC module, thereby enhancing the encoding capability as well as the feature sensitivity to large receptive fields within the feature stream using a lightweight MLP, serving as a new effective feature layer for subsequent feature-enhancement tasks; the detailed structure of this module is shown in Figure 5c. Subsequently, with the feature-enhancement network, PaNet [25], we subjected two layers of features extracted from the backbone and one layer of features processed by the EVC module to multiple rounds of feature fusion to obtain three layers of processed-feature information. These three layers of information then were processed with the SCP module to extract further useful values from each effective pixel and focus algorithmically on the salience of the seed-target characteristics of different scales; its specific structure is shown in Figure 5d. Finally, in the detection head section, the multiple predicted boxes generated in the previous step are matched with the ground truth boxes to obtain the regression and classification of seed targets.

2.4.1. Internal Convolution Structure

In this study, the backbone feature-extraction network of the algorithm was divided into five stages (S1–S5), as shown structurally in Figure 5b. The S1 layer consisted of three feature-sampling modules. The first and third modules were CBS blocks, each combining a convolution, BatchNorm2d, and a SiLU activation function [26], whereas the second module used an internal convolution (involution) instead of the original standard convolution. After feature-map sampling in S1, the initial feature extraction and size adjustment of the input data were completed, providing a detailed foundation for extracting features from additional layers.

2.4.2. EVC Module

The EVC module was embedded in the backbone network structure, thereby altering the algorithmic structure of the backbone network. The EVC module reconstructed the feature information of the last effective feature layer by aggregating the global and local features of the image. Finally, the enhanced feature information was concatenated along the channel dimensions and passed to downstream recognition algorithms as the EVC output, thereby enhancing the effective information intensity of the seed targets. This process is represented by Equation (1):

X = \mathrm{cat}(\mathrm{MLP}(X_{in}); \mathrm{LVC}(X_{in}))

(1)

where X is the output of the EVC, cat(·) denotes the concatenation of feature maps along the channel dimension; MLP(X_in) and LVC(X_in) represent the output features of the lightweight MLP and the learnable visual centre mechanism, respectively; and X_in is the output of the last effective feature layer in the backbone feature-extraction network.

Figure 5c illustrates the structure of the EVC module, which comprised two parallel blocks: a lightweight MLP that captured global feature dependencies and a learnable visual centre (LVC) mechanism that aggregated local features. The lightweight MLP was composed of a depthwise convolution (DConv) [27] module and a channel-based MLP module, with residual connections employed in the structure to effectively prevent uncontrolled gradient changes. Both modules executed channel scaling and dropPath [28] operations to enhance generalisation and robustness. The DConv-based module can be expressed as Equation (2):

X_{in}^{*} = \mathrm{DConv}(\mathrm{GN}(X_{in})) + X_{in}

(2)

where X_{in}^{*} represents the output of the depthwise convolution module, GN(·) denotes group normalisation, and DConv(·) represents depthwise separable convolution with a kernel size of 1 × 1.

Within the EVC module, the LVC served as an encoder with an inherent dictionary consisting of two components: (1) an inherent codebook B = {b₁, b₂, …, b_K} of K codewords, and (2) a set of scaling factors S = {s₁, s₂, …, s_K} for the learnable visual centres. Here, N = H × W is the total number of spatial positions in the input features, where H and W represent the height and width of the feature map, respectively. Specifically, the input feature X_in was first encoded through a combination of convolutions and then processed by the CBR block. The encoded feature was then passed to the codebook, where each pixel feature x_i was sequentially mapped to each codeword b_k using the corresponding scaling factor s_k, resulting in centralised encoded features. The information of the entire image relative to the kth codeword was calculated using Equation (3):

e_{k} = \sum_{i=1}^{N} \frac{e^{-s_{k} \|x_{i}^{*} - b_{k}\|^{2}}}{\sum_{j=1}^{K} e^{-s_{j} \|x_{i}^{*} - b_{j}\|^{2}}} (x_{i}^{*} - b_{k})

(3)

where x_{i}^{*} is the ith pixel point, b_k is the kth learnable visual codeword, s_k is the kth scaling factor, x_{i}^{*} - b_{k} represents the information of each pixel relative to the codeword position, and K is the total number of visual centres. The denominator sums over all K codewords, indexed by j, so that the weights assigned to each pixel across the codewords sum to 1.
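The encoding step of Equation (3) can be sketched in plain Python. This is an illustrative reading rather than the implementation used in this study: it assumes that the denominator normalises over all K codewords (a softmax over codewords for each pixel), and the function name `lvc_encode` and the toy inputs are hypothetical.

```python
import math


def lvc_encode(features, codewords, scales):
    """Return e_k for each codeword as a softmax-weighted sum of residuals.

    features:  list of N pixel feature vectors x_i
    codewords: list of K codeword vectors b_k
    scales:    list of K positive scaling factors s_k
    """
    dim = len(codewords[0])
    encoded = [[0.0] * dim for _ in codewords]
    for x in features:
        # Squared distance of this pixel to every codeword.
        sq_dist = [sum((xi - bi) ** 2 for xi, bi in zip(x, b)) for b in codewords]
        logits = [-s * d for s, d in zip(scales, sq_dist)]
        # Softmax over codewords, shifted by the maximum for numerical stability.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        norm = sum(exps)
        for k, b in enumerate(codewords):
            weight = exps[k] / norm
            for d in range(dim):
                encoded[k][d] += weight * (x[d] - b[d])
    return encoded


# Toy example: two pixels that coincide with the two codewords.
enc = lvc_encode([[0.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]], [1.0, 1.0])
print(enc)
```

Each pixel contributes most strongly to the codeword it lies closest to, and that contribution is a residual, so a pixel that coincides with a codeword adds nothing to that codeword's encoding.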

2.4.3. SCP Module

In this study, we employed the SCP module within an enhancement-extraction network to learn global spatial contexts and strengthen the fine-grained features of the module. The SCP module comprised layers composed of context aggregation blocks (CABlocks) and residual connections, as shown in the left side of Figure 5d. The module implemented a weighted strategy to fuse global features and reduce information ambiguity. The right side of Figure 5d illustrates the internal structure of a CABlock, which included three 1 × 1 convolutions for feature-mapping adjustments. The latter two convolutions undergo matrix multiplication and sigmoid weighting before being multiplied by the former, culminating in a combined residual-structure output. The pixel-level, spatial-context calculation within each CABlock is represented by Equation (4):

Q_{i}^{j} = P_{i}^{j} \sum_{j=1}^{N_{i}} \left[ \frac{\exp(w_{k} P_{i}^{j})}{\sum_{m=1}^{N_{i}} \exp(w_{k} P_{i}^{m})} w_{v} P_{i}^{j} \right]

(4)

where P_i and Q_i represent the input and output feature maps, respectively, of the ith layer in the feature pyramid, and each feature map contains N_i pixels. j, m∈{1,N_i} denotes the indices of each pixel, and w_k and w_v are the linear-projection matrices used to project the feature maps. Equation (4) simplifies the widely used self-attention mechanism [29] by replacing the matrix multiplication between the query and key with linear projections, thereby reducing the parameters and computational costs while maintaining attentional weighting. The embedding of the SCP modules maximised the informational value of every critical pixel in an image, thereby ensuring strong correlations between the pixels. The effective performance of the module positively affected the acquisition of small-scale seed features.
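The attention-weighted sum inside Equation (4) can be sketched as follows. The sketch covers only the bracketed summation, that is, the global context aggregated over all pixels of one pyramid level; how this context is combined with P_i^j is as given in Equation (4). It assumes that w_k maps each pixel's feature vector to a scalar score and that w_v is a matrix applied to each pixel's feature vector; both shapes, the function name, and the toy inputs are assumptions made for illustration.

```python
import math


def context_vector(pixels, w_k, w_v):
    """Softmax-weighted sum of projected pixel features over one feature map.

    pixels: list of N_i feature vectors, one per pixel
    w_k:    vector projecting a pixel feature to a scalar attention score
    w_v:    matrix (list of rows) projecting a pixel feature to the output space
    """
    scores = [sum(w * p for w, p in zip(w_k, pixel)) for pixel in pixels]
    # Softmax over pixels, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    context = [0.0] * len(w_v)
    for pixel, e in zip(pixels, exps):
        projected = [sum(w * p for w, p in zip(row, pixel)) for row in w_v]
        for d, value in enumerate(projected):
            context[d] += (e / norm) * value
    return context


# Toy example: zero scores give uniform attention; w_v is the identity.
ctx = context_vector([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(ctx)
```

Because the attention weights depend on a single projection of each pixel rather than on pairwise query-key products, the cost grows linearly with the number of pixels.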

2.5. Experimental Setup

We conducted our experiments with an Ubuntu 20.04.3 LTS operating system utilising an AMD EPYC 7551P 8-core 2.3 GHz processor (Advanced Micro Devices (AMD) Inc., Santa Clara, CA, USA), 64 GB of random-access memory (RAM), and an NVIDIA A5000 graphics-processing unit (GPU) with 24 GB of video RAM (NVIDIA Corporation, Santa Clara, CA, USA). CUDA 11.6 and cuDNN 8.5.0 software were used for GPU acceleration, and the experiments were implemented and deployed using the PyTorch 2.0.0 and torchvision 0.15.1 deep-learning frameworks.

2.6. Training Strategy

The YOLO–SGD model was deployed and trained based on the environment specified in the previous section. The training process comprised 200 epochs. To accelerate the algorithm training, we adopted a backbone-freezing strategy: the first 80 epochs were designated as the freezing phase with a batch size of 48, followed by a non-freezing phase for the next 80 epochs with a batch size of 8. The initial maximum learning rate was set to 1 × 10⁻³, and the minimum learning rate was set to 1 × 10⁻⁵. Additionally, based on empirical training experience, the learning rate was reduced by a factor of 10 every 60 iterations to accelerate convergence. The stochastic gradient descent optimiser was used to enhance model convergence, with a momentum parameter of 0.9 employed to control inertia during updates, accelerate convergence, and reduce oscillations. Figure 6 illustrates the variations in loss and mean average precision (mAP) observed over 200 training iterations under the aforementioned settings.
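The learning-rate and batch-size schedule can be sketched as two small functions. This is a minimal illustration under two assumptions: the "iterations" in the decay rule are treated as epochs, and the minimum learning rate acts as a floor below which the rate is not reduced. The function names and default arguments are hypothetical.

```python
def learning_rate(epoch, max_lr=1e-3, min_lr=1e-5, step=60, factor=0.1):
    """Step decay: multiply by `factor` every `step` epochs, never below `min_lr`."""
    return max(max_lr * factor ** (epoch // step), min_lr)


def batch_size(epoch, freeze_epochs=80, frozen_bs=48, unfrozen_bs=8):
    """Larger batches while the backbone is frozen, smaller ones afterwards."""
    return frozen_bs if epoch < freeze_epochs else unfrozen_bs


for epoch in (0, 59, 60, 79, 80, 120, 199):
    print(epoch, learning_rate(epoch), batch_size(epoch))
```

Freezing the backbone reduces the number of parameters that need gradients, which is what makes the larger batch size of 48 feasible in the first phase.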

2.7. Evaluation Metrics

To validate the performance of the YOLO–SGD algorithm and demonstrate its effectiveness more intuitively, we employed the following evaluation metrics. Specifically, we used metrics common in the field of object detection: the average precision (AP), mAP, and F1 score [30,31]. These metrics are derived from the precision and recall. Precision measures the proportion of predicted positives that are correct, whereas recall measures the proportion of actual positives that are identified during the prediction process.

AP = \frac{1}{n} \sum_{r \in \mathrm{recalls}} \max_{r^{*} \geq r} P(r^{*})

(5)

Equation (5) describes the AP calculation, where recalls are the recall values calculated at different confidence thresholds, the inner term is the maximum precision P(r^{*}) over all recall values r^{*} not lower than r, and n is the number of recall values. Thus, the AP was determined as the area enclosed by the precision–recall curve and the axes.
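As a worked illustration of Equation (5), the following Python sketch averages the interpolated precision over a set of recall values. The function name and the sample precision–recall points are hypothetical and serve only to show the arithmetic; they are not taken from the experiments.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Average, over all recall values r, of the best precision at recall >= r."""
    if len(recalls) != len(precisions) or len(recalls) == 0:
        raise ValueError("recalls and precisions must be non-empty and equally long")
    total = 0.0
    for r in recalls:
        # Interpolated precision: maximum precision among points with recall >= r.
        total += max(p for r_star, p in zip(recalls, precisions) if r_star >= r)
    return total / len(recalls)


if __name__ == "__main__":
    # Hypothetical precision-recall points at four confidence thresholds.
    sample_recalls = [0.2, 0.4, 0.6, 0.8]
    sample_precisions = [1.0, 0.8, 0.9, 0.5]
    print(average_precision(sample_recalls, sample_precisions))
```

For the sample points, the interpolated precisions are 1.0, 0.9, 0.9, and 0.5, so the dip to 0.8 at recall 0.4 is replaced by the higher precision reached later on the curve.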

mAP = \frac{1}{m} \sum_{i=1}^{m} AP_{i}

(6)

Equation (6) represents the method used to calculate the mAP, where m represents the number of classes and AP_{i} is the AP of class i. Specifically, the AP was computed for each class, and the mAP was the average of these AP values across all classes. When calculating the AP, the model evaluated its performance based on the degree of matching between the predicted results and actual annotations.
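Equation (6) can be checked against the figures reported later in the text, where the two classes are the easy and hard categories. The short Python sketch below (the function name is illustrative) averages the per-class AP values quoted for the baseline and for YOLO–SGD.

```python
from typing import Mapping


def mean_average_precision(ap_per_class: Mapping[str, float]) -> float:
    """Unweighted mean of the per-class AP values."""
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Per-class AP values (%) as quoted in the text.
    baseline = {"easy": 95.4, "hard": 90.2}
    yolo_sgd = {"easy": 99.6, "hard": 96.4}
    print(round(mean_average_precision(baseline), 1))
    print(round(mean_average_precision(yolo_sgd), 1))
```

Both results agree with the mAP values of 92.8% and 98.0% reported for the two models.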

F1 = 2 \times \frac{precision \times recall}{precision + recall}

(7)

Equation (7) describes the calculation of the F1 score. F1 is a statistical metric used to measure the accuracy of binary-classification (or multitask binary-classification) models. It combines the precision and recall of the classification model. The F1 score is the harmonic mean of the precision and recall of the model, with a maximum value of 1 and minimum value of 0. A higher F1 score indicates a better predictive capability of the model. Therefore, F1 was a suitable evaluation metric for detecting germinated seeds in both the easy and hard categories.
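To make the relationship between precision, recall, and Equation (7) concrete, the sketch below computes all three from detection counts. The function name and the counts are hypothetical; the source does not report true-positive, false-positive, or false-negative counts in this section.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct detections, 10 false alarms, 5 missed seeds.
    print(precision_recall_f1(tp=90, fp=10, fn=5))
```

Because F1 is a harmonic mean, it stays close to the lower of the two inputs, which is why a detector that misses many seeds cannot offset this with high precision alone.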

The number of model parameters can be used to measure the size and complexity of an algorithm. In general, a larger parameter count indicates a more complex algorithm, which may increase the risk of overfitting. To demonstrate that the algorithm proposed in this paper imposed a low burden on devices, the parameter count was considered as one of the criteria for the evaluation.

3. Results

3.1. Comparative Experiments

3.1.1. Comparison of Different Object Detection Algorithms

To objectively evaluate the performance of the YOLO–SGD detection algorithm, we conducted comparative experiments with mainstream object detection algorithms with the same hardware setup, using the same dataset and parameter configurations. Based on the results, we selected the YOLOv7-l (large) model (underlined) as the baseline model for this study (Table 1). The letters shown in parentheses denote the model versions or sizes. For example, DC5 and 4scale denote the versions used, ‘s’ denotes a small model size, ‘m’ denotes a medium model size, ‘l’ denotes large model size, and ‘x’ denotes extra-large model size.

Table 1 shows a performance comparison of various object detection models across different difficulty levels, including key metrics such as the easy AP, hard AP, mAP, F1 score, and parameter count. The proposed YOLO–SGD model demonstrated outstanding performance across all metrics, achieving an easy AP of 99.6%, a hard AP of 96.4%, an mAP of 98.0%, and a high F1 score of 93.3%. These results indicate that the YOLO–SGD model achieved high detection accuracy and stable performance under various conditions.

Among the other models, YOLOv8 (l) achieved an easy AP of 93.2%, a hard AP of 91.8%, an mAP of 92.8%, and an F1 score of 88.5%. YOLOv8 (x) exhibited similar performance with an easy AP of 93.8%, a hard AP of 91.6%, an mAP of 92.7%, and an F1 score of 88.4%. YOLOv7 (l) achieved an easy AP of 95.4%, a hard AP of 90.2%, an mAP of 92.8%, and an F1 score of 88.9%. The YOLOX series maintained good performance with relatively lower parameter counts. YOLOX (s) had a parameter count of only 8.97 million (M), achieving an mAP of 89.4% and an F1 score of 80.6%. YOLOX (m) and YOLOX (l) had parameter counts of 25.32 M and 54.29 M, respectively, with mAPs of 90.7% and 90.4%, and F1 scores of 82.4% and 83.0%. The Rt-DETR (l) and Rt-DETR (x) models had parameter counts of 52.70 M and 87.20 M, achieving mAPs of 88.7% and 87.6% and F1 scores of 80.3% and 85.1%, respectively.
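The figures quoted above can be ranked programmatically. The sketch below is one possible ranking rule (highest mAP, with the F1 score as a tie-breaker) and is an assumption for illustration; the authors' selection also weighed parameter size, and only models whose values are quoted in the text are included.

```python
from typing import Dict, List

# mAP and F1 values (%) as quoted in the text for a subset of Table 1.
QUOTED_RESULTS: List[Dict[str, object]] = [
    {"model": "YOLOv8 (l)", "mAP": 92.8, "F1": 88.5},
    {"model": "YOLOv8 (x)", "mAP": 92.7, "F1": 88.4},
    {"model": "YOLOv7 (l)", "mAP": 92.8, "F1": 88.9},
    {"model": "YOLOX (s)", "mAP": 89.4, "F1": 80.6},
    {"model": "YOLOX (m)", "mAP": 90.7, "F1": 82.4},
    {"model": "YOLOX (l)", "mAP": 90.4, "F1": 83.0},
    {"model": "Rt-DETR (l)", "mAP": 88.7, "F1": 80.3},
    {"model": "Rt-DETR (x)", "mAP": 87.6, "F1": 85.1},
]


def pick_baseline(rows: List[Dict[str, object]]) -> Dict[str, object]:
    """Hypothetical rule: best mAP first, then best F1 to break ties."""
    return max(rows, key=lambda row: (row["mAP"], row["F1"]))


if __name__ == "__main__":
    print(pick_baseline(QUOTED_RESULTS)["model"])
```

With these values, two models share the top mAP, so the F1 score decides the ranking.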

Figure 7 visually delineates the comparative performance of each model across five dimensions: easy AP(%), hard AP(%), mAP(%), F1(%), and parameter size (Params(M)). Based on the data presented in Table 1, YOLOv7 (l) exhibited superior performance across the easy AP, mAP, and F1 metrics for germinated seeds, achieving 95.4%, 92.8%, and 88.9%, respectively. In hard class detection, the model trailed the highest value by only 1.6%. Furthermore, with a model parameter size of 37.62 M, the computational strain of the YOLOv7 (l) model was relatively modest compared to most other models. In conclusion, the YOLOv7 (l) model was the most suitable baseline model for detecting seed germination. Building on this foundation, we applied targeted refinements, leveraging the phenotypic features of the germinated seeds and the camera-imaging characteristics to enhance the model’s alignment with seed-germination detection tasks. The refined model demonstrated optimal performance across all evaluated metrics: with a model parameter size of 45.22 M, it achieved respective improvements of 4.2% and 6.2% in terms of easy and hard class detections, along with a 5.2% enhancement in the mAP and a 4.4% increase in the F1 score. Of particular note, the model showed advanced detection of small germinating seeds under challenging conditions; thus, it showed the highest overall performance in comparative experiments.
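The improvements quoted above are differences in percentage points between the refined model and the YOLOv7 (l) baseline. The following sketch reproduces that arithmetic from the values given in the text; the dictionary keys and function name are illustrative.

```python
from typing import Dict

# Metrics as quoted in the text (percentages, and parameters in millions).
BASELINE: Dict[str, float] = {
    "easy AP": 95.4, "hard AP": 90.2, "mAP": 92.8, "F1": 88.9, "params (M)": 37.62,
}
YOLO_SGD: Dict[str, float] = {
    "easy AP": 99.6, "hard AP": 96.4, "mAP": 98.0, "F1": 93.3, "params (M)": 45.22,
}


def gains(new: Dict[str, float], old: Dict[str, float], ndigits: int = 1) -> Dict[str, float]:
    """Absolute difference (new - old) for every metric, rounded for display."""
    return {name: round(new[name] - old[name], ndigits) for name in old}


if __name__ == "__main__":
    for name, delta in gains(YOLO_SGD, BASELINE).items():
        print(f"{name}: +{delta}")
```

The same subtraction applied to the parameter counts gives the added model size in millions of parameters.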

3.1.2. Comparative Experiments of Involution (INV)

During model improvement, we replaced convolution with INV to accomplish preliminary feature extraction, thereby enhancing the specificity of the feature space. In this section, we discuss the impact of embedding an INV at different positions on the model’s prediction performance. As shown on the right side of Figure 5b, S1 of the backbone network contained three feature-extraction layers, named S1-1, S1-2, and S1-3 (from bottom to top), representing scenarios in which INV replaced a specific regular convolution layer.

Figure 8 shows the model’s attention heatmaps for hard class and easy class detection, respectively, after replacing convolution at different positions with the INV module. It should be noted that these comparative experiments only varied the embedding position of INV based on the baseline, without considering other influencing factors.

With increases in the hierarchy, we observed corresponding increases in the number of parameters and computations; however, this trend did not directly correlate with the detection performance (Table 2). Replacing the regular convolution in S1-1 with involution (INV) did not lead to a performance improvement; instead, all metrics declined. This observation may reflect the fact that enhancing the spatial specificity of the features in fewer feature layers inhibited the capture of detailed seed features. Introducing INV in S1-2 improved the model performance across all evaluation metrics, achieving optimal results among the three positions, with respective increases of 0.9%, 1.9%, 1.4%, and 0.8% in the easy AP, hard AP, mAP, and F1 score compared with the baseline. Using INV to replace regular convolution in S1-3 resulted in a 1.0% and 0.2% increase in the hard detection and mAP metrics, respectively, but slightly decreased the metrics for easy class detection and the F1 score. In summary, implementing INV in S1-2 to replace regular convolution enhanced the model performance, particularly in terms of detecting small seeds.

3.2. Ablation Experiments

Building on the comparative analysis presented in the previous section, we adopted YOLOv7-l as the baseline algorithm. To validate the effectiveness of the improvements in seed germination detection tasks, we conducted further ablation experiments for each module and their combinations, including the INV, EVC, and SCP modules. The results demonstrated varying degrees of impact on the detection metrics for germinated seeds when different enhancement modules were adopted (Table 3).

The baseline model achieved an easy AP of 95.4%, a hard AP of 90.2%, an mAP of 92.8%, and an F1 score of 88.9%, without any components. Significant performance improvements were observed with the addition of each module. For example, adding INV alone increased the mAP to 94.2%, adding EVC increased the mAP to 94.6%, and adding SCP increased the mAP to 95.2%. When INV and SCP were added simultaneously, the mAP significantly increased to 97.1%, with F1 reaching 92.2%. Ultimately, with all the components added, the model achieved peak performance: easy AP, 99.6%; hard AP, 96.4%; mAP, 98.0%; and F1 score, 93.3%, demonstrating the critical role of each optimisation method in enhancing the model performance.

Figure 9 and Figure 10 intuitively illustrate the impact of different module combinations on the model’s detection capability. Figure 9 presents the heatmaps of the hard class under various configurations. Figure 9a–g show the attention regions generated by the model when integrating individual modules (INV, EVC, SCP) and their combinations (INV + EVC, INV + SCP, INV + EVC + SCP). As more modules are integrated, the activation regions become increasingly focused and accurate, especially in Figure 9g, where the full combination (INV + EVC + SCP) yields the most precise and concentrated attention on the hard-class seeds, which are typically more difficult to detect.

Figure 10 shows the corresponding heatmaps for the easy class. Similar to the hard-class results, Figure 10a–g illustrate how the attention distribution evolves with different module combinations. For these more easily detectable, germinated seeds, the model exhibits improved focus and spatial coverage with each added module. In particular, the full integration shown in Figure 10g leads to the most comprehensive and accurate localisation. These visual results are consistent with the quantitative outcomes in Table 3, further validating the effectiveness of the proposed INV, EVC, and SCP modules and their synergistic combinations in enhancing seed germination detection performance.

3.3. Experimental Results

Figure 11 illustrates the performance differences between the model developed in this study and the baseline model for detecting germinated cabbage seeds. The first column displays the original images, capturing both distant and close-up perspectives to depict the data for the target seeds at different scales. The second column shows the predictions of the original YOLOv7-l algorithm, and the third column displays the predictions of the YOLO-SGD model developed in this study.

Our comparison provided evidence that in both the distant and close-up detection scenarios, the YOLO–SGD model successfully detected all germinated seeds, whereas the original model exhibited instances of false positives and misses, which were particularly notable with distant detection. Furthermore, the two models assigned different confidence levels to seeds at the same positions in the same image, with the YOLO–SGD model generally exhibiting higher confidence scores, further highlighting its superior detection performance.

3.4. Detection of the Cabbage Seed-Germination Rates

To comprehensively evaluate the effectiveness and practicality of the proposed YOLO–SGD algorithm, we employed a four-way, random-sampling method to detect the germination percentages of 500 cabbage seeds. The experimental process was as follows: 500 cabbage seeds were evenly divided into 10 groups, with each group containing 50 seeds, and placed in 10 Petri dishes for germination. Images of each Petri dish were then captured at two critical time points, 48 h and 72 h into the germination test, at both distant and close-up distances, and these images served as the dataset for model detection. The germination percentages were manually calculated after each image was captured, and the time required for each assessment was recorded.

The germination assessment data, derived from the counts of germinated seeds, reflected the overall germination status of 500 seeds, and the time shown represented the average time taken for the germination percentage calculations per Petri dish (Table 4). After 48 h of germination, the detection results of the YOLO–SGD model for close-up images were consistent with those of manual inspection. However, for distant images, the model’s detection results were slightly lower (0.6%) than those of manual inspection, and three germinated seeds were not detected with the model. After 72 h of germination, the model missed only one germinated seed (when compared with manual inspection) for close-up detection, resulting in a discrepancy of 0.2% in the germination percentage calculation. With distant detection, the YOLO–SGD model missed five seeds (when compared with manual inspection), resulting in a discrepancy of 1.0% in the germination percentage calculation.
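The discrepancies reported above follow directly from the number of missed seeds out of 500. The sketch below shows the calculation; the function names are illustrative, and the manual count of 400 germinated seeds used in the example is a hypothetical placeholder because Table 4 is not reproduced here.

```python
def germination_percentage(germinated: int, total: int) -> float:
    """Germinated seeds as a percentage of all seeds tested."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * germinated / total


def discrepancy(manual_count: int, model_count: int, total: int) -> float:
    """Absolute difference between manual and model germination percentages."""
    return abs(
        germination_percentage(manual_count, total)
        - germination_percentage(model_count, total)
    )


if __name__ == "__main__":
    total_seeds = 500
    manual = 400  # hypothetical manual count, for illustration only
    for missed in (0, 1, 3, 5):
        print(missed, round(discrepancy(manual, manual - missed, total_seeds), 1))
```

Because the total is fixed at 500 seeds, each missed seed changes the germination percentage by 0.2 percentage points regardless of the manual count.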

In addition to the quantitative analysis of germination percentages at 48 and 72 h based on Table 4, this study also conducted visual and trend-based evaluations of the YOLO-SGD model’s performance during the initial germination stage (24 h) and throughout the overall dynamic germination process, to examine its accuracy and stability at different time points more comprehensively. Figure 12 compares the YOLO-SGD algorithm with manual measurement results at three key time points (24 h, 48 h, and 72 h) and shows the distribution of the germination percentage at each time point through box plots. Specifically, Figure 12a–c compare the YOLO-SGD algorithm with manual measurements at 24 h, 48 h, and 72 h, respectively, and Figure 12d is the box plot of the germination percentage distribution at these three time points. This multi-dimensional presentation helps to clarify the model’s performance and robustness during the dynamic process of seed germination.

3.5. Generalisation Performance on Diverse Seed Types

To rigorously evaluate the generalisation capability of the proposed YOLO-SGD model beyond cabbage seeds, we conducted additional germination detection experiments on three other distinct seed types: pepper, tomato, and eggplant. These seeds were chosen to represent variations in size, shape, and germination morphology, providing a comprehensive assessment of the model’s adaptability. Crucially, during laboratory germination experiments, these vegetable seeds, like cabbage seeds, present similar challenges such as small size, proneness to rolling, or primary root entanglement. The improved YOLO-SGD model is specifically designed to address these common difficulties across diverse seed types. To ensure the fairness and comparability of these tests, the entire experimental setup—including data collection, data processing, dataset partitioning, and the specific hyperparameters for model training and prediction—was maintained strictly identical to that used for the cabbage seed experiments.

Table 5 presents the detailed performance metrics of the YOLO-SGD model across all tested seed types. As evidenced by the results, the proposed model not only maintained excellent performance on cabbage seeds (mAP of 98.0%, F1 score of 93.3%) but also demonstrated robust and high-precision detection capabilities for pepper (mAP of 95.2%, F1 score of 90.1%), tomato (mAP of 97.8%, F1 score of 91.5%), and eggplant seeds (mAP of 95.4%, F1 score of 90.1%). These consistent high scores across diverse seed species, particularly in easy AP and hard AP, unequivocally confirm the strong generalisation ability of the YOLO-SGD model.

Further qualitative validation of the model’s generalisation is illustrated in Figure 13, which showcases representative detection results on various seed types. The clear and accurate bounding box detections on different seed morphologies demonstrate the model’s adaptability in visually identifying germinated instances regardless of the specific seed characteristics. This visual evidence, combined with the quantitative metrics, underscores the potential for the YOLO-SGD model to be broadly applied for intelligent seed germination assessment across a wide range of agricultural crops.

4. Discussion

4.1. Performance of Each Model in Comparative Experiments for Germinated Seed Detection Using the Same Dataset

The models used in the comparative experiments were divided into two main categories: those based on the transformer framework or the convolution framework. Next, a detailed comparison and analysis was conducted on both types of models. With the experimental models based on the transformer framework (including the DETR, Rt-DETR, DINO, and ViDT-swin models), starting from the data distribution, these models generally exhibited stronger performance at easy class AP levels than the convolution framework models did. The highest performance observed was 94.9% with Rt-DETR (l), which demonstrated excellent performance in detecting large sprouting seeds, possibly due to the transformer’s powerful capability of capturing long-distance semantics. However, this advantage somewhat limited the models’ ability to localise and recognise small objects, as they generally performed weaker at hard class detection levels than with convolution framework models, with the ViDT-swin showing the highest performance of 83.2%. Additionally, in terms of the F1 score, these models showed moderate performance, with the highest score being 85.1% for Rt-DETR (x), indicating that for datasets with inconsistent object sizes, the models’ performance in identifying whether a seed has completed germination was not high. Furthermore, in terms of model parameters, these models tended to have large scales, with the smallest parameter value for Rt-DETR (l) at 52.7 M. Owing to the characteristics of the transformer framework, the models involved a substantial number of matrix operations, leading to slow training processes that often required lengthy training times to gradually converge.

Excluding the aforementioned transformer-based models, the remaining models belonged to the convolutional framework. Among these models, excluding the YOLO-SGD model, YOLOv7 (l) achieved the highest AP of 95.4% in terms of easy class germinated seed detection, whereas YOLOv8 (l) reached 91.8% during hard class germinated seed detection, which was the highest in its category. In terms of mAP and F1 scores, the YOLOv7 (l) model achieved the highest values of 92.8% and 88.9%, respectively. The Faster R-CNN model, a classic two-stage detection model, showed moderate performance with the dataset used in this study. It showed longer training times than other convolution framework models and did not stand out significantly in terms of easy and hard class detection, with values 4.2% and 18.8% lower than the highest observed values, respectively. In terms of model parameters, the Faster R-CNN model also fell into the category of a large model. YOLOX, which was introduced 2 years ago, demonstrated moderate detection capabilities. Its advantage lies in its diverse model versions, with YOLOX (s) having the smallest full-model parameters at only 8.97 M. However, YOLOX showed some deficiency in hard class detection, with a 4.2% gap compared to the YOLOv8 (l) value of 91.8%. YOLOv7 and YOLOv8 were the latest additions to the YOLO series, each with its own strengths. YOLOv7 utilises numerous residual structures and multiscale information, demonstrating excellent performance on targets of different sizes. YOLOv7 (l) and YOLOv7 (x) showed a small difference of only 0.7% in the mAP but a larger difference of 4.0% in the F1 score, with the model parameters being half the size, indicating better overall performance for YOLOv7 (l). YOLOv8 combines various advantages of the YOLO series models, including improvements to the P5 and P6 feature layers, which enhance the ability of the model to handle high-resolution images effectively. These features resulted in better performance in terms of hard class detection, surpassing YOLOv7 (l) by 1.6% at its peak. The mAP differences among the models of sizes s, m, l, and x were relatively small, with a maximum difference of 2.4% between the highest and lowest values. These results suggest that YOLOv8 offers limited additional utilisation of the current dataset and that changes in the model scale did not significantly enhance the feature-extraction efficiency, which may have impacted subsequent optimisations. Although the YOLOv8 (l) model achieved the highest F1 score among the YOLOv8 models, it fell short of the YOLOv7 (l) model by 0.4%.

4.2. Discussion of the Effects of Various Optimisation Approaches on the Model Performance in Ablation Experiments

To gain deeper insights into the specific contributions of each improved module to model performance, this study designed and conducted ablation experiments to systematically evaluate the effects of the INV, EVC, and SCP modules individually and in combination. This section analyses how, and to what extent, each optimisation strategy enhanced the detection of germinating seeds, drawing on the ablation results presented in Table 3, the hard class heatmaps generated by different module combinations in Figure 9, and the corresponding easy class heatmaps in Figure 10.

INV module: Incorporating involution contributed to an overall enhancement in model performance (Table 3). With an increase in the parameters of only 0.79 M, the mAP and F1 scores improved by 1.4% and 0.8%, respectively. These findings demonstrate that involution enhanced the spatial specificity of the features early in the feature-extraction process, endowing the learned seed characteristics with stronger semantic integrity. This enhancement provided a more detailed feature foundation for subsequent extraction tasks.

EVC module: The EVC module was introduced into the main backbone, where it performed a lightweight approximation of the weight redistribution on the output of the last effective feature layer. This operation deepened the focus on targets with fewer features, such as side faces, and enhanced the sensitivity to low-quality images, mitigating interference from low-quality images to some extent. Compared with the baseline model, introducing the EVC module resulted in improvements across all detection metrics, with increases of 1.0%, 2.6%, 1.8%, and 0.9%. In contrast to the INV module, a notable improvement in the mAP metric was observed with the EVC module. This improvement may indicate that introducing the EVC module effectively alleviated issues wherein germinated seeds (which should be easily detectable) were lost due to entangled roots, possibly improving the detection accuracy. Regarding the model parameters, the improvement resulted in an increase of 6.1 M, likely owing to computations from the MLP and the dual-stream structure formed by the encoded calculations. However, given the enhanced recognition accuracy for targets such as the edges of germinated seeds, this increase in computational load was acceptable.

SCP module: The SCP module was positioned after the feature-enhancement network, and detailed features were enhanced by learning global spatial contexts at each level, effectively exploiting residual valid information. Introducing the SCP module improved both easy and hard class detection, with a 3.4% improvement in hard class detection. These results indicate that enhancing the multiscale information was beneficial for seed targets with different latencies and proportions, especially for small-sized seed targets that are difficult to detect. Additionally, the model parameters increased by only 0.69 M, which did not introduce an excessive computational burden.

INV and EVC modules: When used together, these two modules contributed to a 2.6% improvement in the mAP and a 1.4% improvement in the F1 score. However, hard class detection was 0.3% lower than when only the SCP module was used. The data suggest that relying solely on the INV and EVC modules is insufficient because they cannot effectively learn and localise small seed targets.

INV and SCP modules: The INV module enhanced the spatial specificity of the features, whereas the SCP module addressed the localisation capabilities of germinated seeds at different scales. Combining both modules significantly improved the overall performance of the model. Compared with the baseline model, the model parameters increased by only 1.5 M, yet achieved an mAP of 97.1% and an F1 score of 92.2%. This improvement indicates that both the prediction and classification capabilities were substantially enhanced.

SCP and EVC modules: When these two improvements were implemented simultaneously, their impact on model performance was evident. The results showed increases of 3.5% and 4.7% in terms of easy and hard class detections, respectively. These enhancements suggest that the detection of image edges and small seed targets was substantially improved. Despite an increase of 6.8 M in the model parameters, the size of the improved model remained relatively small.

INV, EVC, and SCP modules: The combined use of these three modules significantly enhanced the model performance. Compared with the baseline model, we observed improvements of 4.2% and 6.2% in terms of easy and hard class detection, respectively, achieving an mAP of 98.0% and an F1 score of 93.3%. At this stage, the model demonstrated strong generalisation capabilities with different sizes of germinated seeds and with images distorted by the imaging conditions. The model’s parameter size (45.22 M) also ensured that it did not impose a significant burden on the experimental equipment.
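The parameter increases quoted for the individual modules can be cross-checked against the combined figures. The sketch below assumes the increments are simply additive, which the text does not state explicitly; because the per-module values are rounded, the totals agree with the reported figures only to within rounding.

```python
BASELINE_PARAMS_M = 37.62  # YOLOv7 (l) parameters in millions (from the text)

# Parameter increase per module in millions, as quoted in the text.
MODULE_PARAMS_M = {"INV": 0.79, "EVC": 6.1, "SCP": 0.69}


def added_params(*modules: str) -> float:
    """Sum of the quoted per-module increases (assumes they are additive)."""
    return sum(MODULE_PARAMS_M[name] for name in modules)


def total_params(*modules: str) -> float:
    """Baseline size plus the selected modules, in millions of parameters."""
    return BASELINE_PARAMS_M + added_params(*modules)


if __name__ == "__main__":
    print(round(added_params("INV", "SCP"), 2))         # reported as about 1.5 M
    print(round(added_params("SCP", "EVC"), 2))         # reported as about 6.8 M
    print(round(total_params("INV", "EVC", "SCP"), 2))  # reported as 45.22 M
```

Under this additive assumption, the EVC module accounts for most of the growth in model size, which matches the discussion of its MLP and dual-stream structure.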

4.3. Discussion of Manual Calculations and the YOLO–SGD Methods in Terms of Detecting Seed-Germination Rates

In both close- and long-range detection scenarios, the YOLO–SGD model exhibited significant advantages in terms of the average computation time, being over 31 times faster than manual detection. This advantage became more pronounced as the number of germinated seeds increased, greatly enhancing the efficiency of calculating seed germination percentages during experiments and effectively saving labour. Furthermore, for seed targets representing different scales in close- and long-range settings, the model kept its error within 1.0% relative to manual detection, demonstrating that its high-precision detection capability is sufficient to replace manual methods.

Figure 12 further details the prediction performance of the YOLO-SGD algorithm at different germination time points. As can be seen from the comparison curves shown in Figure 12a–c, the germination percentage curves predicted by the YOLO-SGD algorithm at the three key time points of 24 h, 48 h, and 72 h show a high degree of consistency with the manual measurement curves. This indicates that throughout the dynamic process of seed germination, the YOLO-SGD algorithm can continuously and stably capture the germination status and provide results extremely close to manual judgment. This high consistency aligns with the average absolute error mentioned in our abstract, which is primarily controlled within 1.0%, strongly confirming that the model has the accuracy to replace manual detection. It is particularly noteworthy that even as germination progresses and complex situations such as increased seed quantity and intertwining primary roots from multiple germinated seeds/seedlings occur, the model can still maintain good recognition accuracy, further verifying its effectiveness in addressing the aforementioned challenges.

Figure 12d presents the distribution characteristics of the germination percentage predicted by the YOLO-SGD algorithm at different time points in the form of box plots, offering a more intuitive view. As germination time advances, the median of the germination percentage (represented by the horizontal line within the box) shows a clear upward trend, which is consistent with the biological process of natural seed germination. At the same time, the length of the box (representing the interquartile range of the data) and the range of the whiskers are generally narrow, with few outliers. This indicates that the prediction results of the YOLO-SGD algorithm at each time point have a high degree of concentration and stability, with a small range of data fluctuation, verifying the robustness of the model’s predictions. This stable distribution characteristic is an important manifestation of the reliability of the automated detection system, meaning that the system can provide consistent and trustworthy germination percentage data in practical applications, laying a solid foundation for large-scale, high-throughput seed detection.
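The box-plot quantities discussed above (median, interquartile range) can be computed with the Python standard library. The per-dish germination percentages below are hypothetical placeholders chosen only to illustrate the calculation; they are not the measurements behind Figure 12d, and the quartile method used by the authors' plotting software is not stated.

```python
import statistics
from typing import Dict, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, float]:
    """Quartiles and interquartile range using statistics.quantiles defaults."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical germination percentages for 10 Petri dishes at each time point.
HYPOTHETICAL_DISHES = {
    "24 h": [10, 12, 14, 12, 16, 10, 14, 12, 14, 12],
    "48 h": [80, 84, 82, 86, 84, 82, 80, 84, 86, 82],
    "72 h": [92, 94, 96, 94, 92, 96, 94, 98, 94, 96],
}

if __name__ == "__main__":
    for time_point, values in HYPOTHETICAL_DISHES.items():
        print(time_point, box_stats(values))
```

A rising median across time points with a small interquartile range at each point is the pattern the text describes as stable and biologically consistent.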

4.4. Discussion of Detecting High-Density Seed Germination

In this study, cabbage seeds were used as the experimental samples. Their small size and the considerable primary root entanglement that arises among germinated seeds/seedlings at a high seeding density in cultivation dishes pose technical challenges for detection tasks. Wang and Song [39] combined hyperspectral-imaging technology with optimised deep-learning algorithms to evaluate sweet corn seeds. Their optimal model achieved 97.23% seed detection accuracy, with each image containing up to nine corn seeds. Jin et al. [40] used PCA technology to select specific spectral data and combined deep-learning networks and traditional machine-learning methods to predict the germination vigour of rice seeds under natural aging conditions. With images containing eight rice seeds, most models achieved accuracies greater than 85.0%. Jiang et al. [41] used the YOLOv8 framework to detect pea seed-germination percentages using training data containing 36 pea seeds per image and achieved a detection accuracy of 98.7%. However, these studies were conducted using low-density seed arrangements for training and testing, which may not fully meet the requirements of practical germination experiments. In contrast, in this study, we used high-density-arranged cabbage seeds for the experimental data (50 seeds/culture dish), which is closer to real production environments. As revealed by the heatmap in the middle section of Figure 5c, including the EVC module significantly enhanced the attention to different seed-detail targets, resulting in a 1.8% increase in the mAP. Even with partial seed–root entanglement, the system could accurately identify each seed, demonstrating the robustness and effectiveness of the model in handling high-density seed arrangements.

4.5. Impact of Multi-Scale Information Processing on the Performance of Detecting Seed Germination

Enhancing our model’s attention to multiscale information is beneficial both in terms of accuracy and transferability to detecting other seed types. Fu et al. [42] utilised YOLOv4 for the automated detection of wheat seed-germination vigour by incorporating an FPN structure for multiscale information processing and achieved an average accuracy of 97.59%. Zhao et al. [15] enhanced the detection of rice seeds by incorporating a small-object detection layer, achieving an average precision of 95.39% with average errors maintained within 0.1. In this study, we enhanced the model’s sensitivity to the multiscale characteristics of cabbage seeds by training with data from two scales and incorporating the SCP module. Following addition of the SCP module, both the easy and hard class detection accuracies improved, with the hard class (small targets) showing a 3.4% improvement. Introducing multiscale-training data enhanced the ability of the model to detect seeds at different distances and angles and significantly improved its adaptability to variations in seed morphology. These features suggest that our model has considerable potential and wide application prospects for use in various seed detection tasks.

4.6. Limitations and Future Work

In this study, targeted optimisations were implemented to address several issues in detecting seed germination completion without substantially increasing the computational burden of the model. The YOLO–SGD model, with 45.22 M parameters, required modest computational resources, enabling efficient operation on standard hardware. The optimisations added only 7.6 M parameters to the YOLOv7-l baseline (37.62 M), yet the model achieved mAP and F1 scores of 98.0% and 93.3%, respectively, thereby meeting practical usage standards. The optimised YOLO–SGD model demonstrated robust performance and generalisation capabilities for cabbage seed detection.

Despite these promising results, several limitations of the current study warrant discussion and provide directions for future research. Firstly, the experiments were conducted under highly controlled laboratory conditions (constant temperature, constant humidity, and simulated light). The system’s performance may vary significantly in more complex and uncontrolled real-world agricultural environments, where factors such as fluctuating natural illumination, the presence of dust or debris, and diverse background substrates could affect detection accuracy.

Secondly, the current system primarily focuses on detecting germination completion and calculating germination percentage. A more comprehensive assessment of seed vigour, which often includes critical parameters like germination speed, mean germination time, or seedling growth metrics (e.g., shoot/root length), is not yet integrated.

Building upon these insights, future research should extend the YOLO–SGD model to a wider range of seed types. However, the initial data preparation for this extension is laborious, potentially requiring significant human resources and reducing training efficiency. To address this challenge, techniques such as few-shot learning or semi-supervised learning can be introduced for targeted model adjustment and optimisation. Furthermore, building on the successful application of the YOLO–SGD model in rapidly detecting germinated seeds, we aim to gradually broaden its scope to include functions such as measuring shoot lengths and predicting seed-growth potential. The goal of this expansion will be to provide comprehensive and precise data for agricultural research and production.

4.7. Practical Applications for Growers

The high accuracy and efficiency demonstrated by the YOLO-SGD model for germinated seed detection pave the way for real-time applications in modern agricultural practice, particularly for growers and commercial seed producers. The system offers a tangible way to overcome the limitations of traditional manual germination assessment, enabling data-driven decision-making and optimised resource management.

In a typical real-time application scenario, the system can be integrated into existing seed quality control pipelines or deployed in specialised germination testing units within nurseries. Growers would place prepared seed Petri dishes onto the system’s designated sample stage. The integrated hardware automatically captures high-resolution images at predefined critical time points. These images are then immediately processed by the pre-trained YOLO-SGD model, which rapidly identifies and counts the germinated seeds, providing precise bounding box detections and classification. The detection results, including the exact number of germinated seeds and the automatically calculated germination percentage, are presented on a user interface for instant access.

This real-time capability offers clear advantages over conventional manual methods: growers receive immediate feedback on the germination status of a seed lot, enabling prompt, data-driven decisions regarding sowing density, whether to re-sow, or whether to adjust environmental conditions. Such timely interventions are crucial for optimising seed usage, minimising economic losses attributable to poor germination, and ensuring an optimal and uniform stand establishment in the field. Furthermore, the objective and consistent nature of automated detection reduces human error and subjectivity, leading to more reliable and repeatable germination assessments.
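The counting step in this workflow reduces to simple arithmetic once the detections are available. The following Python sketch illustrates it under stated assumptions and is not the authors' implementation: the `Detection` record, the class label `"germinated"`, and the 0.5 confidence threshold are hypothetical, and the detector itself is not shown. The 500-seed batch size is inferred from Table 4, where 386 germinated seeds correspond to 77.2%.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Detection:
    """One bounding-box prediction (hypothetical record, not the authors' format)."""
    label: str         # assumed class name, e.g. "germinated"
    confidence: float  # detector score in [0, 1]


def germination_percentage(
    detections: Iterable[Detection],
    seeds_sown: int,
    germinated_label: str = "germinated",  # assumed label name
    min_confidence: float = 0.5,           # assumed score threshold
) -> Tuple[int, float]:
    """Return (germinated count, germination percentage) for one seed batch."""
    if seeds_sown <= 0:
        raise ValueError("seeds_sown must be positive")
    count = sum(
        1
        for d in detections
        if d.label == germinated_label and d.confidence >= min_confidence
    )
    return count, 100.0 * count / seeds_sown


# Example: 386 confident detections in a 500-seed batch (the 48 h close-up
# count in Table 4), plus a few low-confidence boxes that are ignored.
example = [Detection("germinated", 0.9)] * 386 + [Detection("germinated", 0.2)] * 5
count, percentage = germination_percentage(example, seeds_sown=500)
print(f"{count} germinated, {percentage:.1f}%")
```

Passing the number of seeds sown explicitly, rather than counting all detected seeds, keeps the denominator independent of missed detections; whether the original system does this is not stated in the text.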

5. Conclusions

We developed an efficient germinated seed detection algorithm, YOLO–SGD, which can rapidly identify completed germination events at different stages and calculate the germination percentages of seed batches. Our findings support the following main conclusions: (1) Replacing conventional convolution with INV, alongside the parallel use of convolution kernels of varying sizes, enhanced the spatial specificity of features and improved detail extraction, allowing a more accurate assessment of the seed germination status. (2) The strengthened interaction of multiscale information captured seed image characteristics more comprehensively and adapted to different types and sizes of seeds, thereby enhancing the model’s robustness and generalisation capability. (3) With only 45.22 M parameters, the proposed method outperformed the other models compared. It achieved a detection accuracy of 99.6% in the easy category and 96.4% in the hard category, with an overall mAP of 98.0% and an F1 score of 93.3%. The method was also considerably faster than manual detection, demonstrating the potential to replace manual inspection and meet high application standards.

Author Contributions

Conceptualisation, T.Y.; Data curation, T.Y., B.P., and Y.S.; Formal analysis, B.P. and L.Y.; Funding acquisition, X.F.; Investigation, J.Z., D.Z., and Y.S.; Methodology, T.Y., B.P., and L.Y.; Project administration, B.P. and X.F.; Software, T.Y., L.Y., and D.Z.; Supervision, Y.S. and X.F.; Validation, J.Z. and D.Z.; Visualisation, T.Y. and J.Z.; Writing—original draft, T.Y. and L.Y.; Writing—review and editing, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32072572; the Earmarked Fund for China Agriculture Research System (CARS), grant number CARS-23; the Innovative Research Group Project of Hebei Natural Science Foundation, grant number C2020204111; and the 2025 Provincial Postgraduate Innovation Capability Training Funding Program of Hebei Provincial Department of Education, grant number CXZZBS2025083.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AP	average precision
CABlock	context aggregation block
SPPCSPC	Spatial Pyramid Pooling Concurrent Spatial Pyramid Convolution
EVC	Explicit Visual Centre
GPR	Gaussian process regression
GPU	graphics-processing unit
H	height
INV	involution
l	large
LVC	learnable visual centre
m	medium
M	million
mAP	mean average precision
NIRS	near-infrared spectroscopy
RAM	random-access memory
s	small
S	stage
SCP	Spatial Context Pyramid
SGD	stochastic gradient descent
W	width
x	extra large
YOLO	You-Only-Look-Once
MLP	Multilayer Perceptron
SiLU	Sigmoid Linear Unit
FPNs	Feature Pyramid Networks

References

1. Bewley, J.D.; Bradford, K.; Hilhorst, H. Seeds: Physiology of Development, Germination and Dormancy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
2. Ranal, M.A.; Santana, D.G. How and why to measure the germination process? Braz. J. Bot. 2006, 29, 1–11.
3. Sarmah, B.; Rajkhowa, R.; Chakraborty, I.; Govindaraju, I.; Dwivedi, S.K.; Mazumder, N.; Baruah, V.J. Precision opto-imaging techniques for seed quality assessment: Prospects and scope of recent advances. In Remote Sensing in Precision Agriculture; Academic Press: Cambridge, MA, USA, 2024; pp. 455–486.
4. Reed, R.C.; Bradford, K.J.; Khanday, I. Seed germination and vigor: Ensuring crop sustainability in a changing climate. Heredity 2022, 128, 450–459.
5. Tobe, K.; Li, X.; Omasa, K. Seed germination and primary root growth of a halophyte, Kalidium caspicum (Chenopodiaceae). Ann. Bot. 2000, 85, 391–396.
6. Wang, Y.; Chen, Y. Research on multilateral collaboration strategies in agricultural seed quality assurance. Sci. Rep. 2024, 14, 11310.
7. Genze, N.; Bharti, R.; Grieb, M.; Schultheiss, S.J.; Grimm, D.G. Accurate machine learning-based germination detection, prediction and quality assessment of three grain crops. Plant Methods 2020, 16, 157.
8. Ligterink, W.; Hilhorst, H.W.M. High-throughput scoring of seed germination. In Plant Hormones: Methods and Protocols; Springer: New York, NY, USA, 2017; pp. 57–72.
9. Malik, A.; Ram, B.; Arumugam, D.; Jin, Z.; Sun, X.; Xu, M. Predicting gypsum tofu quality from soybean seeds using hyperspectral imaging and machine learning. Food Control 2024, 160, 110357.
10. Colmer, J.; O’Neill, C.M.; Wells, R.; Bostrom, A.; Reynolds, D.; Websdale, D.; Shiralagi, G.; Lu, W.; Lou, Q.; Le Cornu, T.; et al. SeedGerm: A cost-effective phenotyping platform for automated seed imaging and machine-learning based phenotypic analysis of crop seed germination. New Phytol. 2020, 228, 778–793.
11. Sandhiya, M.; Visvesh, B.; Ugendrababu, M.; Tinisha, A. Varietal seed classification and seed germination prediction system. In Proceedings of the 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), Vellore, India, 23–24 February 2024; IEEE: New York, NY, USA, 2024; pp. 1–5.
12. Liu, S.; Chen, Z.; Jiao, F. Detection of maize seed germination percentage based on improved locally linear embedding. Comput. Electron. Agric. 2023, 204, 107514.
13. Xiao, H.; Chen, Z.; Yi, S.; Liu, J. Rapid detection of maize seed germination percentage based on Gaussian process regression with selection kernel function. Vib. Spectrosc. 2023, 129, 103595.
14. Yamazaki, A.; Takezawa, A.; Nagasaka, K.; Motoki, K.; Nishimura, K.; Nakano, R.; Nakazaki, T. A simple method for measuring pollen germination percentage using machine learning. Plant Reprod. 2023, 36, 355–364.
15. Zhao, J.; Ma, Y.; Yong, K.; Zhu, M.; Wang, Y.; Luo, Z.; Wei, X.; Huang, X. Deep-learning-based automatic evaluation of rice seed germination rate. J. Sci. Food Agric. 2023, 103, 1912–1924.
16. Yao, Q.; Zheng, X.; Zhou, G.; Zhang, J. SGR-YOLO: A method for detecting seed germination percentage in wild rice. Front. Plant Sci. 2024, 14, 1305081.
17. de Paiva Gonçalves, J.; Gasparini, K.; de Toledo Picoli, E.A.; Costa, M.D.B.L.; Araujo, W.L.; Zsögön, A.; Ribeiro, D.M. Metabolic control of seed germination in legumes. J. Plant Physiol. 2024, 286, 154206.
18. Quan, Y.; Zhang, D.; Zhang, L.; Tang, J. Centralized feature pyramid for object detection. IEEE Trans. Image Process. 2023, 32, 4341–4354.
19. Liu, Y.; Li, H.; Hu, C.; Luo, S.; Luo, Y.; Chen, C.W. Learning to aggregate multi-scale context for instance segmentation in remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 595–609.
20. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–24 June 2023; pp. 7464–7475.
21. Althunian, T.A.; de Boer, A.; Klungel, O.H.; Insani, W.N.; Groenwold, R.H. Methods of defining the non-inferiority margin in randomized, double-blind controlled trials: A systematic review. Trials 2017, 18, 107.
22. Mitchell, K. Quantitative analysis by the point-centered quarter method. arXiv 2010, arXiv:1010.3303.
23. Lindsey, B.E., III; Rivero, L.; Calhoun, C.S.; Grotewold, E.; Brkljacic, J. Standardized method for high-throughput sterilization of Arabidopsis seeds. J. Vis. Exp. 2017, 128, e56587.
24. Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the inherence of convolution for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12321–12330.
25. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
26. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
27. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
28. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T. Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
30. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
31. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; Volume 2.
32. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; Volume 28.
33. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229.
34. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. arXiv 2023, arXiv:2304.08069.
35. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.-Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
36. Song, H.; Sun, D.; Chun, S.; Jampani, V.; Han, D.; Heo, B.; Kim, W.; Yang, M.H. ViDT: An efficient and effective fully transformer-based object detector. arXiv 2021, arXiv:2110.03921.
37. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
38. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
39. Wang, Y.; Song, S. Detection of sweet corn seed viability based on hyperspectral imaging combined with firefly algorithm optimized deep learning. Front. Plant Sci. 2024, 15, 1361309.
40. Jin, B.; Qi, H.; Jia, L.; Tang, Q.; Gao, L.; Li, Z.; Zhao, G. Determination of viability and vigor of naturally-aged rice seeds using hyperspectral imaging with machine learning. Infrared Phys. Technol. 2022, 122, 104097.
41. Jiang, H.; Hu, F.; Fu, X.; Chen, C.; Wang, C.; Tian, L.; Shi, Y. YOLOv8-Peas: A lightweight drought tolerance method for peas based on seed germination vigor. Front. Plant Sci. 2023, 14, 1257947.
42. Fu, X.; Han, B.; Liu, S.; Zhou, J.; Zhang, H.; Wang, H.; Zhang, H.; Ouyang, Z. WSVAS: A YOLOv4-based phenotyping platform for automatically detecting the salt tolerance of wheat based on seed germination vigour. Front. Plant Sci. 2022, 13, 1074360.

Figure 1. Representative images illustrating seed–root entanglement. The red and blue bounding boxes indicate typical cases of roots wrapping around adjacent seeds.

Figure 2. Complete operational flowchart of the germinated seed detection and assessment system. This diagram delineates the sequential stages from initial image acquisition and data preprocessing, through intelligent detection by the YOLO-SGD model, to the ultimate calculation of germination percentage and its practical application.

Figure 3. Schematic diagram of the overall seed germination percentage detection system framework. (a) Hardware platform for seed germination image acquisition. This illustrates the physical equipment used for capturing seed germination images, including key hardware components for image acquisition, illumination control, and user interaction. (b) Data processing flow of seed germination images. This outlines the process from raw image acquisition through preprocessing and precise annotation to dataset preparation for model training, validation, and testing. (c) Architecture of the YOLO-SGD model. This presents the core structure of the proposed YOLO-SGD detection algorithm, which is based on YOLOv7-l and integrates backbone improvements, feature fusion, and detection heads, along with the EVC and SCP modules. (d) Comparison between the YOLO-SGD algorithm and manual germination percentage estimation. This provides a visual comparison between the model’s detection results and manually counted outcomes to evaluate the accuracy and reliability of the algorithm in real-world applications.

Figure 4. Illustration of data acquisition samples. (a) Close-up image of cabbage seeds captured at a short distance. (b) Image of cabbage seeds captured from a longer distance. (c) Annotated image prepared for model training.

Figure 5. Diagram of the key network structures and modules in the YOLO-SGD algorithm. (a) Overall architecture of the YOLO-SGD network. This illustrates the core structure of the YOLO-SGD algorithm, improved based on YOLOv7-l, including the backbone feature-extraction network, enhanced feature extractor, and detection heads. (b) Detailed diagram of the involution structure. This shows the involutional operation used in the second layer of the S1 block within the backbone feature extractor. (c) Structure of the Explicit Visual Centre (EVC) module. This presents the internal structure of the EVC module and its role in feature modulation. (d) Structure of the Spatial Context Pyramid (SCP) module. This depicts the SCP module design for capturing multi-scale spatial context information.

Figure 6. Variation in loss and mAP during model training. The curves illustrate the changes in training loss, validation loss, and mAP over 200 epochs. A sharp increase in mAP is observed around epoch 25. At epoch 80, the model enters the fine-tuning phase with all layers unfrozen, resulting in a temporary fluctuation in loss. After epoch 125, both loss and mAP trends gradually stabilise, indicating convergence and improved detection performance.

Figure 7. Radar chart comparison of model performance across different metrics. This figure compares the YOLO-SGD algorithm with various state-of-the-art object detection models using radar charts. The five key metrics assessed are F1 score, Easy Average Precision (AP), hard AP, mean AP (mAP), and model parameters. Higher values indicate better performance for F1 and AP metrics, while lower values are preferred for model parameters. (a) Faster R-CNN vs. YOLO-SGD; (b) DETR(DC5) vs. YOLO-SGD; (c) Rt-detr(l/x) vs. YOLO-SGD; (d) DINO(4scale) vs. YOLO-SGD; (e) ViDT-swin vs. YOLO-SGD; (f) YOLOX(s/m/l/x) vs. YOLO-SGD; (g) YOLOv7(m/l/x) vs. YOLO-SGD; (h) YOLOv8(s/m/l/x) vs. YOLO-SGD. These comparisons visually represent YOLO-SGD’s competitive performance and efficiency across different object detection benchmarks.

Figure 8. Detection results after integrating the INV module at different positions within Stage 1. The left group shows detection on a hard sample, and the right group on an easy sample. (a–c) and (d–f) represent results when the INV module is inserted at different layers of Stage 1 (S1-1, S1-2, and S1-3). The heatmaps indicate the model’s attention and detection response. The hard sample exhibits more pronounced feature enhancement, while the easy sample maintains stable detection with less interference.

Figure 9. Heatmaps of hard class detection with different module combinations in ablation study. This figure displays heatmaps illustrating the model’s activation for detecting ‘hard’ class instances in an ablation study. The top-left panel shows the original image for reference. Each subsequent heatmap (a–g) visualises the model’s focus (warmer colours) when different modules or their combinations are integrated. (a) INV; (b) EVC; (c) SCP; (d) INV+EVC; (e) INV+SCP; (f) EVC+SCP; (g) INV+EVC+SCP. These heatmaps visually demonstrate the contribution of individual modules and their combinations to the model’s ability to accurately localise challenging instances.

Figure 10. Heatmaps of easy class detection with different module combinations in ablation study. This figure displays heatmaps illustrating the model’s activation for detecting ‘easy’ class instances in an ablation study. The top-left panel shows the original image for reference. Each subsequent heatmap (a–g) visualises the model’s focus (warmer colours) when different modules or their combinations are integrated. (a) INV; (b) EVC; (c) SCP; (d) INV+EVC; (e) INV+SCP; (f) EVC+SCP; (g) INV+EVC+SCP. These heatmaps visually demonstrate the contribution of individual modules and their combinations to the model’s ability to accurately localise easily detectable instances.

Figure 11. Detection comparison before and after model improvement. The figure compares detection results of the original YOLOv7-l and the proposed YOLO-SGD model on both close-up (top row) and long-shot (bottom row) seed germination images. The yellow arrows highlight missed or incorrect detections in YOLOv7-l that are successfully detected by YOLO-SGD. Enlarged red-box regions emphasise areas with dense seed distribution, where the proposed model demonstrates superior localisation accuracy and robustness.

Figure 12. Comparison of YOLO-SGD algorithm with manual germination percentage measurement at different germination time points and distribution trend of germination percentage. (a–c) Comparison of YOLO-SGD algorithm with manually measured germination percentage at 24, 48, and 72 h. These scatter plots illustrate the strong correlation between germination percentages predicted by the YOLO-SGD algorithm and those obtained through manual measurement at 24 h (a), 48 h (b), and 72 h (c) of germination. Blue dots represent individual data points. Linear fittings (solid red lines) show high coefficients of determination (R² values of 0.95, 0.99, and 0.98, respectively), indicating excellent agreement across all time points. The shaded pink and light pink areas represent the 95% confidence and prediction bands, respectively, demonstrating the reliability and precision of the YOLO-SGD predictions. Inset images display representative samples of seedlings at each corresponding time point. (d) Box plot of YOLO-SGD predicted germination percentage distribution at 24, 48, and 72 h. This box plot summarises the distribution of absolute prediction errors for the YOLO-SGD algorithm across the three measurement times (24 h, 48 h, and 72 h). Each box displays the interquartile range (25% to 75% of data), with the median error indicated by the horizontal line. Whiskers extend to 1.5 times the interquartile range, and blue dots represent outliers. The black star within each box signifies the mean error value, providing insights into the algorithm’s overall accuracy, consistency, and the spread of prediction errors over time.

Figure 13. Detection performance of the YOLO-SGD model on diverse seed types. Subfigures (a–c) showcase accurate detection results on pepper, tomato, and eggplant seeds, demonstrating the model’s consistent ability to identify germinated instances across varying characteristics, dimensions, morphologies, and growth patterns. These visual examples, presented with bounding boxes indicating detected germinated seeds, collectively affirm the YOLO-SGD model’s strong generalisation capability and reliable performance across a range of agricultural crops.

Table 1. Comparison of the results obtained using different germinated seed detection models.

Model	Easy AP (%)	Hard AP (%)	mAP (%)	F1 (%)	Params (M)
Faster R-CNN [32]	91.2	73.0	82.1	75.6	69.00 M
DETR (DC5) [33]	93.1	76.1	84.6	80.3	81.20 M
Rt-detr (l) [34]	94.9	82.5	88.7	80.3	52.70 M
Rt-detr (x) [34]	93.7	81.5	87.6	85.1	87.20 M
DINO (4scale) [35]	93.8	75.8	84.8	78.2	97.60 M
ViDT-swin [36]	94.8	83.2	89.0	77.1	101.80 M
YOLOX (s) [37]	91.2	87.6	89.4	80.6	8.97 M
YOLOX (m) [37]	92.5	88.9	90.7	82.4	25.32 M
YOLOX (l) [37]	94.1	86.7	90.4	83.0	54.29 M
YOLOX (x) [37]	92.7	88.7	90.7	83.7	99.07 M
YOLOv7 (m) [20]	91.9	89.7	91.3	83.4	20.40 M
YOLOv7 (l) [20]	95.4	90.2	92.8	88.9	37.62 M
YOLOv7 (x) [20]	94.5	89.7	92.1	85.9	71.34 M
YOLOv8 (s) [38]	91.4	89.4	90.4	81.9	25.90 M
YOLOv8 (m) [38]	91.8	91.6	91.7	83.5	49.70 M
YOLOv8 (l) [38]	93.2	91.8	92.8	88.5	83.70 M
YOLOv8 (x) [38]	93.8	91.6	92.7	88.4	131.00 M
YOLO-SGD	99.6	96.4	98.0	93.3	45.22 M

Table 2. Comparison of the results after embedding INV at different positions.

INV Position	Easy AP (%)	Hard AP (%)	mAP (%)	F1 (%)	Params (M)
None (YOLOv7-l baseline)	95.4	90.2	92.8	88.9	37.62 M
S1-1	92.9	86.3	89.6	81.6	37.70 M
S1-2	96.3	92.1	94.2	89.7	38.41 M
S1-3	94.8	91.2	93.0	88.7	39.66 M

Table 3. Ablation experiment results of the model.

INV	EVC	SCP	Easy AP (%)	Hard AP (%)	mAP (%)	F1 (%)	Params (M)
			95.4	90.2	92.8	88.9	37.62 M
√			96.3	92.1	94.2	89.7	38.41 M
	√		96.4	92.8	94.6	89.8	43.72 M
		√	96.8	93.6	95.2	89.1	38.31 M
√	√		97.5	93.3	95.4	90.3	44.22 M
√		√	98.8	95.4	97.1	92.2	39.12 M
	√	√	98.9	94.9	96.9	91.6	44.42 M
√	√	√	99.6	96.4	98.0	93.3	45.22 M
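The gains quoted in the discussion can be read directly off Table 3. The short Python sketch below, offered as a convenience and not as part of the original study, transcribes the table and reports each configuration's change relative to the row with no modules enabled, which matches the YOLOv7 (l) row of Table 1. The dictionary layout and function name are illustrative choices.

```python
# Values transcribed from Table 3; each key lists the modules enabled in a run.
METRICS = ("easy_ap", "hard_ap", "map", "f1", "params_m")

ABLATION = {
    (): (95.4, 90.2, 92.8, 88.9, 37.62),  # no modules: YOLOv7-l baseline
    ("INV",): (96.3, 92.1, 94.2, 89.7, 38.41),
    ("EVC",): (96.4, 92.8, 94.6, 89.8, 43.72),
    ("SCP",): (96.8, 93.6, 95.2, 89.1, 38.31),
    ("INV", "EVC"): (97.5, 93.3, 95.4, 90.3, 44.22),
    ("INV", "SCP"): (98.8, 95.4, 97.1, 92.2, 39.12),
    ("EVC", "SCP"): (98.9, 94.9, 96.9, 91.6, 44.42),
    ("INV", "EVC", "SCP"): (99.6, 96.4, 98.0, 93.3, 45.22),
}


def delta_vs_baseline(modules):
    """Per-metric difference between one ablation run and the baseline row."""
    baseline = ABLATION[()]
    run = ABLATION[tuple(modules)]
    return {name: round(r - b, 2) for name, r, b in zip(METRICS, run, baseline)}


for modules in ABLATION:
    if modules:  # skip the baseline itself
        print("+".join(modules), delta_vs_baseline(modules))
```

Read this way, the SCP module alone raises the hard-class AP by 3.4 points, the EVC module alone raises the mAP by 1.8 points, and the full model gains 5.2 mAP points for 7.6 M additional parameters.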

Table 4. Germination percentages detected with the manual inspection and YOLO–SGD models.

Image Type	Method	Germination Percentage (%) (Count)	Germination Time (h)	Time (s)
Close-up image	Manual detection	77.2 (386)	48	46.7
Close-up image	YOLO–SGD	77.2 (386)	48	1.47
Distant image	Manual detection	80.2 (401)	48	46.7
Distant image	YOLO–SGD	79.6 (398)	48	1.62
Close-up image	Manual detection	96.4 (482)	72	61.7
Close-up image	YOLO–SGD	96.2 (481)	72	1.48
Distant image	Manual detection	95.2 (476)	72	61.7
Distant image	YOLO–SGD	94.2 (471)	72	1.96
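As a worked check of the agreement between the two methods, the following Python sketch recomputes the absolute error and the time ratio from the rows of Table 4. It rests on two assumptions that the table does not state explicitly: that each percentage is taken over 500 seeds (inferred from 386 seeds corresponding to 77.2%), and that the manual and model times are measured on a comparable basis.

```python
from statistics import mean

TOTAL_SEEDS = 500  # assumption inferred from Table 4: 386 seeds <-> 77.2%

# (image type, germination time in h, manual count, model count,
#  manual time in s, model time in s), transcribed from Table 4
ROWS = [
    ("close-up", 48, 386, 386, 46.7, 1.47),
    ("distant", 48, 401, 398, 46.7, 1.62),
    ("close-up", 72, 482, 481, 61.7, 1.48),
    ("distant", 72, 476, 471, 61.7, 1.96),
]


def percentage(count: int) -> float:
    """Germination percentage for a germinated-seed count."""
    return 100.0 * count / TOTAL_SEEDS


# Absolute difference between manual and model percentages, per row.
abs_errors = [abs(percentage(m) - percentage(p)) for _, _, m, p, _, _ in ROWS]
# Ratio of manual time to model time, per row.
speedups = [t_manual / t_model for *_, t_manual, t_model in ROWS]

print(f"mean absolute error: {mean(abs_errors):.2f} percentage points")
print(f"max absolute error:  {max(abs_errors):.2f} percentage points")
print(f"speed-up range:      {min(speedups):.1f}x to {max(speedups):.1f}x")
```

On these four rows the mean absolute error is 0.45 percentage points and the largest is 1.0, which is consistent with the reported error bound of 1.0%.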

Table 5. Performance metrics of the YOLO-SGD model on diverse seed types.

Seed Types	Easy AP (%)	Hard AP (%)	mAP (%)	F1 (%)
Cabbage seeds	99.6	96.4	98.0	93.3
Pepper seeds	97.4	93.0	95.2	90.1
Tomato seeds	98.9	96.7	97.8	91.5
Eggplant seeds	97.5	93.3	95.4	90.1
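The mAP column of Table 5 can be cross-checked against the two class APs. The sketch below assumes the usual definition of mAP as the unweighted mean of the per-class APs (here the easy and hard classes); the paper's own definition appears in its evaluation-metrics section, which is not reproduced here, so treat this as a sanity check only.

```python
# (Easy AP, Hard AP, reported mAP) in percent, transcribed from Table 5
TABLE_5 = {
    "Cabbage": (99.6, 96.4, 98.0),
    "Pepper": (97.4, 93.0, 95.2),
    "Tomato": (98.9, 96.7, 97.8),
    "Eggplant": (97.5, 93.3, 95.4),
}


def mean_ap(class_aps):
    """Unweighted mean of per-class average precisions."""
    return sum(class_aps) / len(class_aps)


# Compare the recomputed mean with the reported value, allowing for
# rounding to one decimal place in the table.
consistency = {}
for seed, (easy, hard, reported) in TABLE_5.items():
    computed = mean_ap((easy, hard))
    consistency[seed] = abs(computed - reported) < 0.05
    print(f"{seed}: computed {computed:.1f}, reported {reported:.1f}")
```

All four rows of Table 5 agree with this definition to the precision shown.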

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

