1. Introduction
Licorice (Glycyrrhiza uralensis Fisch.) is a leguminous plant [1] and an important medicinal crop. Its component glycyrrhizin is widely used as a natural sweetener and in pharmaceutical preparations because of its anti-inflammatory and liver-protective properties. Licorice extract is also used in cosmetics, food additives, tobacco flavorings, and confectionery [2]. Global climate change and soil salinization challenge licorice cultivation, so there is an urgent need to screen and breed new licorice varieties that tolerate low temperatures and salinity. Evaluating seed vigor from the dynamic changes in radicle morphology during germination has become a key technique in licorice germplasm screening and genetic improvement. Traditionally, radicle characteristics are assessed manually, which is costly, time-consuming, and error-prone [3]. To address these needs, this paper proposes a rapid, accurate, and automated method for segmenting licorice seed radicles and for extracting and calculating their characteristics.
The highly hierarchical structure and powerful learning ability of deep learning play a huge role in the agricultural field [
4]. The developed object detection models can perform classification and prediction particularly well, being flexible and adaptable, and able to deal with various highly complex challenges [
5]. Especially the YOLO (You Only Look Once) algorithm [
6] stands out among numerous visual models due to its efficient computational speed, accurate recognition ability, as well as excellent object detection, tracking and image segmentation performance. In the agricultural field, global scholars have specifically optimized the YOLO model to address the complex issues existing in the agricultural production process. For example, Lippi et al. [
7] applied the YOLOv4 model to images of sticky traps in a hazelnut orchard to detect pests and assess the threat they pose, achieving an average precision of 94.5% for pest detection, which supports pest and disease control in hazelnut cultivation. Cui et al. [
8] constructed the YOLO-FT network structure based on the YOLOv7 model framework. By using this network to detect UAVs, agricultural drones were made into pollination media, thus providing a method to achieve precise crop pollination and increasing the fruit setting rate of crops. Gai et al. [
9] proposed an improved YOLO-V4 deep learning algorithm to detect the maturity of cherry fruits and carried out artificial intervention on fruits with large maturity differences, ultimately increasing the yield of cherry fruits. Mirhaji et al. [
10] detected oranges in RGB images captured under different lighting conditions using YOLOv4 and achieved orchard yield estimation with a highest precision of 91.23%. Wang et al. [
11] proposed the Mushroom-YOLO model based on the improved YOLOv5 for the detection of small objects such as mushrooms, with a mean average precision as high as 99.24%, and its performance is far better than that of the original YOLOv5. Zhang et al. [
12] took five kinds of lettuce under greenhouse cultivation conditions as the research objects, combined the advantages of the object detection mechanism and the object classification mechanism, and established the YOLO-VOLO-LS multi-growth-stage lettuce variety classification model, achieving a lettuce variety classification accuracy of 96.01%. Although the above YOLO-based object detection algorithms achieve high precision, they are limited to detection and classification and cannot provide finer crop germination phenotypes such as root length and leaf area.
Image segmentation technology can provide detailed pixel-level object instances. In agriculture, image segmentation is widely used for crop and soil monitoring, predicting the optimal time for sowing, fertilizing, and harvesting, estimating crop yields, and detecting plant diseases [
13]. It has also been applied to the field of crop radicle phenotyping research. Wang et al. [
14] proposed a fully convolutional neural network called SegRoot based on the SegNet architecture for automatic root segmentation and length estimation. Experimental results showed that, compared with traditional image processing methods, the proposed SegRoot network significantly improved root segmentation performance in soil. Smith et al. [
15] investigated a U-Net-based convolutional neural network system for segmenting root images in soil to replace the manual line-intersect measurement method, achieving a linear correlation coefficient of 0.9748 and a goodness-of-fit of 0.9217. Shen et al. [
16] improved the upsampling part of the DeepLabv3+ network and validated it using in situ images of cotton roots obtained via a micro-root window monitoring system. After training, the model achieved a recall of 0.9847 and a precision of 0.9702. Jonas et al. [
17,
18] employed spatial pyramid pooling and local adaptive receptive field inference for 3D segmentation of cassava root systems. They compared the segmentation results with a root segmentation analysis reference algorithm for a set of cassava plant roots and qualitatively demonstrated the ability to segment increased root voxels and root branches. The results showed that combining the proposed DCNN method with dynamic inference detected more, especially finer, root structures compared to classical analytical reference methods. Unlike other instance segmentation frameworks such as Mask R-CNN, DeepLab, and U-Net, the YOLO model uses the largest feature map by size as the input to the Mask branch and obtains mask features through convolutional layer processing. The YOLO framework integrates object detection and instance segmentation into a unified framework, simplifying the processes of object recognition and pixel segmentation. Meanwhile, due to the simpler and more lightweight design of YOLO-series algorithms, they are easier to deploy on edge devices. Additionally, YOLO’s open network architecture and rich open-source resources facilitate researchers in improving and iteratively upgrading the model based on the YOLO structure.
The YOLOv11 model, a current state-of-the-art (SOTA) model, is the latest model developed by Ultralytics based on previous YOLO versions. It introduces new features and improvements to enhance performance and applicability. Although YOLOv11 has excellent segmentation performance, the radicle characteristics of licorice seeds are complex. In the early stage of germination, the radicles are tiny and their color features are indistinct, which makes radicle features difficult to extract. In the hydroponic germination experiments conducted by agronomists, seeds are cultivated on pure-color cotton cloth, whose tiny fibers interfere with the image and make licorice radicles hard to distinguish. The YOLO model has poor recognition accuracy for objects with tiny features, so it needs to be improved. Meanwhile, to obtain the root length and its change over time after radicle segmentation, a module for post-processing the output images must be added to the model. To address these problems, this paper proposes a method named YOLOv11-LicoSeg specifically for measuring radicle length during the germination of licorice seeds. The following work will be carried out:
(1) Based on the YOLOv11-seg model, a Spatial Strip Attention (SSA) module will be added. This helps the model capture image feature information more effectively while remaining computationally efficient, improving its ability to extract radicle characteristics and its image-processing performance.
(2) The ordinary convolution in the C3k2 module unique to YOLOv11-seg will be replaced with the Multi-Scale Edge Detail Enhancement Module (MEEM). This provides multi-scale edge detail features, helping the model distinguish the tiny white fibers in the blue cotton cloth background from the radicle targets and thus recognize the licorice radicle contour more accurately.
(3) The Normalized Wasserstein Distance (NWD) loss function will be introduced into the YOLOv11-seg model to enhance the model’s segmentation ability for tiny radicles.
(4) A subsequent image-processing method will be presented, along with a module for calculating the length of licorice radicles, and the corresponding data analysis results will be provided.
(5) Using the model and the continuous time-series crop growth vitality monitoring system, the morphological evolution of the licorice radicle contour characteristics over the germination time will be obtained.
The remaining parts of this paper are organized as follows. In the second part, the data collection equipment, the treatment methods of seed stress, the image data collection method, the structure of YOLOv11-LicoSeg and its improved modules, the image processing module, the model evaluation indicators, and the seed germination vigor evaluation indicators are discussed. The third part explains the results and discussions, including the training environment and parameter settings, the results and discussions of the model ablation experiments and comparative experiments, and the analysis of the growth patterns of licorice radicles under temperature and salt stress. The fourth part presents the summary, limitations, and future work of this paper.
2. Materials and Methods
2.1. Equipment Construction
We used the continuous time-series crop growth vitality monitoring system (located at the Institute of Mechanical Equipment, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China) to conduct the seed germination experiment of licorice. This system includes a seed germination incubator and an image acquisition device, as shown in
Figure 1.
The incubator has a length of 1055 mm, a width of 740 mm, and a height of 1740 mm (Henan Lvke Electric Technology Co., Ltd., Zhengzhou, China). It is equipped with a PTC hot air circulation system and a Tp-100 thermocouple, which can maintain a constant temperature environment. The temperature range can be adjusted between 5 °C and 50 °C to meet the germination temperature requirements of different seeds.
In addition, the incubator is equipped with an internal LED lighting system that provides light during image capture and is triggered automatically when shooting. Its illuminance is 183.9 klx, and its luminance is 46.341 kcd/m².
The incubator can accommodate six cultivation trays, each 250 mm × 250 mm. Two rails are installed at the top inside the incubator. A HIK Vision industrial camera (model MV-CS200-10GC) with a CMOS sensor and a 30 mm focal length telecentric lens (model LD-23-0.18X145; Suzhou Youxin Zeda Co., Suzhou, China) is mounted on the rails. A stepping motor (model 57HS56-3004A-123; Suzhou Science and Trade Co., Ltd., Suzhou, China) installed on the rails moves the camera along the rails inside the incubator so that it can capture high-resolution images of specific cultivation trays.
The camera communicates with the host computer using an RJ-45 Gigabit interface, and the resolution of the captured images is 2900 × 3000 pixels. Users can adjust the camera’s focal length and shooting interval, and set the pre-cropping of the images, etc., through the software interface of the host computer. The final images can be viewed in the data folder of the host computer for the next step of dataset annotation. Parameter settings for the system and its accompanying software are shown in
Table 1.
2.2. Licorice Image Acquisition and Pre-Processing
A hydroponic experiment was conducted using Glycyrrhiza uralensis Fisch. seeds collected locally in Xinjiang, China. Three temperature groups were set: 18–19 °C, 24–25 °C, and 30–31 °C. Each group was subjected to a salt stress test with six salt levels, expressed as electrical conductivity: 0 µS/cm, 600 µS/cm, 1200 µS/cm, 1800 µS/cm, 2400 µS/cm, and 3000 µS/cm. The salts for all levels were extracted from local soil leachate. The experimental process is shown in
Figure 2. There was a total of 18 groups in the experiment, which lasted for 4 days in total. A total of 23,310 images were collected, with 1295 images collected for each group. All the images were saved in jpg format with a resolution of 2900 × 3000 pixels. By excluding the images in which the changes in the licorice radicles were not obvious and the images during the period when the seeds did not germinate, we retained 1835 images to construct the dataset and train the model. The relevant parameters of the experiment are shown in
Table 2.
2.3. YOLOv11-LicoSeg: Design of Licorice Radicle Segmentation
The YOLOv11-seg model consists of three modules: Backbone, Neck, and Head. It belongs to the same family as YOLOv8. Compared with YOLOv8, the C2f module in its backbone is replaced by the C3k2 module. This module optimizes the residual connection method, which can not only expand the receptive field to capture richer context information but also enhance feature diversity while keeping the computational complexity controllable.
A C2PSA module is added after the last layer (the SPPF layer) in the backbone. Through the cross-scale attention mechanism, it can dynamically integrate features at different levels and model the spatial position information of the target more finely, enhancing the model’s detection ability for small and dense targets.
In the Neck, the idea of a path aggregation network is adopted. By introducing a bidirectional feature pyramid network, top-down and bottom-up feature fusion is achieved. This structure establishes a more efficient information flow path between feature maps of different scales, enabling full interaction between low-level detail features and high-level semantic features, and effectively improving the model’s adaptability to multi-scale targets.
In the decoupled Head, the two Conv layers in the classification and detection head are replaced by DWConv. By decomposing the standard convolution into depthwise convolution and pointwise convolution, it significantly reduces the number of parameters and the amount of computation while retaining the effectiveness of feature extraction. This improvement not only improves the model’s inference speed but also reduces the memory usage, making the model more suitable for deployment on resource-constrained devices while maintaining high detection accuracy.
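The parameter saving from decomposing a standard convolution into depthwise and pointwise convolutions can be illustrated with a small count. This is a minimal sketch that ignores bias terms; the layer sizes (128 input channels, 128 output channels, 3 × 3 kernel) are hypothetical values chosen for illustration and are not taken from the YOLOv11 head.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 128, 128, 3  # hypothetical layer sizes
    standard = conv_params(c_in, c_out, k)
    separable = dw_separable_params(c_in, c_out, k)
    print(f"standard: {standard}, separable: {separable}, "
          f"ratio: {separable / standard:.3f}")
```

For these sizes the separable form needs roughly 1/C_out + 1/K² of the weights of the standard convolution, which is where the reduction in parameters and computation comes from.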
To improve the accuracy of licorice seed radicle detection, better recognize the licorice radicle outline, and enhance the model’s segmentation ability for tiny licorice radicles, we made the following improvements:
(1) Add SSA to improve the model’s segmentation ability and image processing ability.
(2) Replace the ordinary convolution in the C3k2 module unique to YOLOv11-seg with the MEEM to provide multi-scale edge detail features.
(3) Introduce the NWD loss function into the YOLOv11-seg model to improve the model’s segmentation accuracy for tiny radicle structures.
(4) Introduce a module for calculating the length of licorice radicles and a module for calculating the radicle growth rate based on the continuous-time image sequence.
Figure 3 is a schematic diagram of the structure of the YOLOv11-LicoSeg model. The following subsections will introduce the design and implementation of the SSA module, the MEEM, the NWD loss function, and the method for calculating the radicle length in sequence.
2.3.1. Spatial Strip Attention
The SSA is based on the concept of the attention mechanism. By assigning different weights to different spatial positions of the input feature map, it enables the model to selectively focus on important spatial regions (Cui et al. [
8]). It uses a lightweight computational module to generate attention weights, avoiding the high computational cost of the self-attention mechanism. Meanwhile, it can expand the receptive field in both the horizontal and vertical directions, effectively aggregating information from adjacent positions to better capture spatial context. Its structure is shown in
Figure 4.
As can be seen from
Figure 4, for the input feature map (with a size of H × W), the following steps are carried out.
First, the horizontal attention weights are generated. A strip-shaped region of size 1 × K is selected in the horizontal direction, and a global average pooling (GAP) operation is performed on it to compress the features. Then, through a convolutional layer (Conv) and a Sigmoid function, the horizontal attention weights are generated.
Subsequently, the horizontal information is aggregated. The generated horizontal attention weights are convolved with the features of the 1 × K strip-shaped region (the “∗” in the figure represents convolution) to achieve the aggregation of information from adjacent positions in the horizontal direction, obtaining a new feature representation.
Next, the vertical attention weights are generated. For the new feature representation, a strip-shaped region of size K × 1 is selected in the vertical direction, and the aforementioned operations, namely GAP, Conv, and Sigmoid, are repeated to generate the vertical attention weights. The vertical attention weights are convolved with the features of the K × 1 strip-shaped region to complete the vertical information aggregation.
Finally, the features processed in the horizontal and vertical directions are added to the original input features through a skip connection to obtain the final output of the SSA. Although adding the SSA to the model increases the computational overhead of the system, the image segmentation accuracy of the entire model for licorice radicles will be improved.
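The four steps above can be sketched on a single-channel feature map in plain Python. This is a simplified illustration, not the module used in the model: it assumes one channel, takes the global average pooling over the whole map, models the convolutional layer as a hypothetical per-position affine map (`conv_w`, `conv_b`) that produces K logits, and clamps indices at the borders. All function and parameter names are introduced here for illustration.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def strip_weights(feat, conv_w, conv_b):
    """GAP -> 1x1 'conv' (one input channel, K outputs) -> Sigmoid."""
    h, w = len(feat), len(feat[0])
    gap = sum(sum(row) for row in feat) / (h * w)
    return [sigmoid(cw * gap + cb) for cw, cb in zip(conv_w, conv_b)]


def aggregate(feat, weights, horizontal):
    """Weighted sum over a 1xK (horizontal) or Kx1 (vertical) strip.

    Borders are handled by clamping indices (replicate padding).
    """
    h, w = len(feat), len(feat[0])
    half = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, a in enumerate(weights):
                off = t - half
                if horizontal:
                    ii, jj = i, min(max(j + off, 0), w - 1)
                else:
                    ii, jj = min(max(i + off, 0), h - 1), j
                acc += a * feat[ii][jj]
            out[i][j] = acc
    return out


def spatial_strip_attention(feat, w_h, b_h, w_v, b_v):
    """Horizontal strip attention, then vertical, then a skip connection."""
    a_h = strip_weights(feat, w_h, b_h)              # horizontal weights
    horiz = aggregate(feat, a_h, horizontal=True)    # 1xK aggregation
    a_v = strip_weights(horiz, w_v, b_v)             # vertical weights
    vert = aggregate(horiz, a_v, horizontal=False)   # Kx1 aggregation
    return [[v + x for v, x in zip(rv, rx)] for rv, rx in zip(vert, feat)]


if __name__ == "__main__":
    fmap = [[1.0] * 3 for _ in range(3)]
    zeros = [0.0, 0.0, 0.0]  # K = 3, every attention weight becomes 0.5
    print(spatial_strip_attention(fmap, zeros, zeros, zeros, zeros))
```

Because each pass touches only K neighbors per position, the cost grows linearly with the number of positions, which is the reason the strip form is cheaper than full self-attention.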
2.3.2. Multi-Scale Edge Detail Enhancement Module
MEEM compensates for the deficiency of the YOLOv11 model in detail capture through multi-scale processing and edge enhancement. It uses average pooling to expand the receptive field, extracts the edge information of the image at different scales, and then highlights the object edges through an edge enhancer, enabling the model to better perceive the boundaries and details of objects. By fusing the multi-scale edge information, MEEM can provide richer and more accurate detail features for subsequent salient object detection, thereby improving the model’s accuracy in locating and segmenting salient objects in complex scenes [
19]. Its structure is shown in
Figure 5.
MEEM is mainly used in the task of salient object detection to extract multi-scale edge information from the input image and enhance the details. Its specific structure is as follows:
Local Feature Extraction:
First, a 3 × 3 convolutional layer is applied to the input image to obtain local features containing preliminary detail information. This step serves as the foundation for subsequent processing and provides data sources for the extraction of edge information.
Multi-scale Edge Information Extraction:
The local features are processed through a 1 × 1 convolutional layer to obtain the initial edge features. Then, a combination of average pooling and 1 × 1 convolutional layers is used to gradually extract multi-scale edge information in three steps. In each processing step, the features from the previous step are first convolved with a 1 × 1 convolutional layer, and then the receptive field is expanded through 3 × 3 average pooling, so as to comprehensively obtain edge information at different scales.
Edge Enhancement Processing:
For the edge features obtained at each scale, they are processed by an edge enhancer. The edge enhancer first calculates the difference between the features at each scale and the features after average pooling to highlight the edge parts. Then, after the difference is processed by a 1 × 1 convolutional layer, it is added to the original features, making the edge information more prominent.
Feature Fusion Output:
The features at different scales that have been processed by the edge enhancement are concatenated channel-wise. Then, they are fused through a 1 × 1 convolutional layer, and finally, features containing multi-scale edge information are obtained. These fused features will be combined with the features of the main branch in the Detail Enhancement Module (DEM) to supplement the lacking detail information, thereby improving the model’s detection accuracy for salient objects.
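The MEEM pipeline described above can be sketched on a single-channel map in plain Python. This is a simplified illustration under stated assumptions: the input is taken to be the local features after the initial 3 × 3 and 1 × 1 convolutions, each 1 × 1 convolution reduces to a scalar weight because there is only one channel, the 3 × 3 average pooling uses stride 1 with clamped borders, and channel-wise concatenation followed by a 1 × 1 convolution becomes a weighted sum. All names and weights are hypothetical.

```python
def avg_pool3(x):
    """3x3 average pooling, stride 1, borders clamped (replicate padding)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    total += x[ii][jj]
            out[i][j] = total / 9.0
    return out


def conv1x1(x, weight):
    """Single-channel 1x1 convolution, i.e. a scalar multiplication."""
    return [[weight * v for v in row] for row in x]


def edge_enhance(x, weight):
    """Edge enhancer: x + conv1x1(x - avg_pool(x))."""
    pooled = avg_pool3(x)
    edge = [[v - p for v, p in zip(rv, rp)] for rv, rp in zip(x, pooled)]
    refined = conv1x1(edge, weight)
    return [[v + e for v, e in zip(rv, rr)] for rv, rr in zip(x, refined)]


def meem(local_feat, step_weights, enhance_weights, fuse_weights):
    """Multi-scale edge extraction, enhancement, and fusion."""
    mid = local_feat
    enhanced = []
    for sw, ew in zip(step_weights, enhance_weights):
        mid = avg_pool3(conv1x1(mid, sw))        # next, coarser scale
        enhanced.append(edge_enhance(mid, ew))   # highlight edges at this scale
    h, w = len(local_feat), len(local_feat[0])
    fused = [[0.0] * w for _ in range(h)]
    for fw, feat in zip(fuse_weights, enhanced):  # concat + 1x1 conv
        for i in range(h):
            for j in range(w):
                fused[i][j] += fw * feat[i][j]
    return fused


if __name__ == "__main__":
    step_edge = [[0.0, 0.0, 1.0, 1.0]]
    print(edge_enhance(step_edge, 1.0))  # contrast across the step increases
```

On a step edge the enhancer pushes the dark side below its original value and the bright side above it, while flat regions are left unchanged; this is the behavior that helps separate thin radicle contours from a uniform background.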
2.3.3. Normalized Wasserstein Distance
A customized loss function can enhance the model’s accuracy while optimizing computational efficiency. In this study, while retaining the original Generalized Intersection over Union (GIoU) loss function, the NWD loss function [
20], which is specifically designed for the recognition of tiny objects, is introduced.
The core design concept of NWD is to model each bounding box as a two-dimensional Gaussian distribution and to measure the similarity between the predicted and ground-truth boxes by the Wasserstein distance between their Gaussian distributions, which is then normalized exponentially into the range (0, 1]. Unlike IoU, this measure changes smoothly with the positional offset between two boxes and remains informative even when the boxes do not overlap, so a deviation of a few pixels on a tiny target does not cause an abrupt drop in the similarity score.
It is worth noting that the architectural design of NWD inherently reduces the sensitivity to the change in the target size, making it particularly outstanding in extracting the semantic similarity between micro-scale targets. When using the AI-TOD dataset with an average detected target size of 12.8 pixels, the detection accuracy improved by approximately 0.8%. Under the premise of maintaining computational efficiency, this method significantly enhances the model’s fine-grained segmentation ability for tiny objects, providing a powerful solution for tasks that require high-precision recognition of tiny structures.
In the licorice germination experiment conducted in this paper, when the radicle breaks through the seed coat at the initial stage of licorice germination, the radicle is extremely fine. Within the first 8 h of germination, the size of the segmented radicle does not exceed 20 pixels. The addition of the NWD loss function helps the model focus on the characteristics of radicles in the early stage of licorice germination and improves the segmentation accuracy of these radicles. Its calculation formulas can be represented by Formula (1).
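$$NWD(N_a, N_b) = \exp\left(-\frac{\sqrt{W_2^2(N_a, N_b)}}{C}\right) \tag{1}$$
Among them, $W_2^2(N_a, N_b)$ represents the squared second-order Wasserstein distance between the Gaussian distributions $N_a$ and $N_b$ that model the two bounding boxes, and $C$ is a constant related to the characteristics of the dataset.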
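The NWD computation can be sketched as follows, assuming the formulation of the NWD paper [20]: a box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), for which the squared second-order Wasserstein distance has a closed form. The default constant c = 12.8 and the loss form 1 − NWD are assumptions made for illustration; the value used in this study is not stated here.

```python
import math


def wasserstein2_sq(box_a, box_b):
    """Squared 2nd-order Wasserstein distance between two box Gaussians.

    Each box is (cx, cy, w, h), modeled as N((cx, cy), diag(w^2/4, h^2/4)).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance in (0, 1]; c is dataset dependent."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)


def nwd_loss(pred_box, gt_box, c=12.8):
    """Assumed loss form: 1 - NWD, zero for a perfect prediction."""
    return 1.0 - nwd(pred_box, gt_box, c)


if __name__ == "__main__":
    gt = (10.0, 10.0, 6.0, 6.0)     # a tiny 6 x 6 pixel target
    pred = (13.0, 14.0, 6.0, 6.0)   # same size, shifted by (3, 4) pixels
    print(nwd(pred, gt), nwd_loss(pred, gt))
```

Because the score decays smoothly with the center offset, a shifted prediction for a radicle only a few pixels wide still receives a useful gradient even when the overlap with the ground-truth box is small or zero.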
2.3.4. Calculation Method for the Radicle Length of Licorice
The model outputs a binary image from the segmented mask, which gives the radicle feature map of germinated licorice seeds. We used the Canny edge detection algorithm to count the pixels on the radicle’s edge contour. The cropped image is 1500 × 1500 pixels, corresponding to an actual area of 250 mm × 250 mm. Formulas (2) and (3) are used to convert this pixel count into the radicle length of licorice. Because the segmented radicle forms a slender irregular polygon, its length is taken as half of the polygon’s perimeter [21].
$$R = \frac{\text{actual\_length}}{\text{image\_unilateral\_pixel}} \tag{2}$$
$$L = \frac{\text{sum\_pixels} \times R}{2} \tag{3}$$
In the formulas, actual_length is a constant of 250 mm and image_unilateral_pixel is the number of pixels along one side of the image. The calculated R is the scale (mm per pixel), sum_pixels is the number of border pixels obtained after calculation by the Cal module, and L is the radicle length. By substituting these parameters into Formula (3), the licorice root length identified by the model can be obtained.
Figure 6 shows the specific process of the calculation method for the radicle length of licorice.
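The conversion from contour pixels to a physical length can be sketched in plain Python. This is a minimal illustration, not the Cal module itself: a simple 4-neighbor boundary count stands in for the Canny edge detector, the scale is assumed to be actual_length divided by image_unilateral_pixel, and the function names are hypothetical. The defaults use the 250 mm and 1500 pixel values given above.

```python
def scale_mm_per_pixel(actual_length_mm=250.0, image_unilateral_pixel=1500):
    """Scale R: physical length represented by one pixel."""
    return actual_length_mm / image_unilateral_pixel


def count_border_pixels(mask):
    """Count foreground pixels that touch background or the image edge.

    Stand-in for the Canny contour pixel count (4-neighborhood).
    """
    h, w = len(mask), len(mask[0])
    count = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if ii < 0 or ii >= h or jj < 0 or jj >= w or not mask[ii][jj]:
                    count += 1
                    break
    return count


def radicle_length_mm(sum_pixels, scale):
    """Half of the contour length, converted to millimeters."""
    return sum_pixels * scale / 2.0


if __name__ == "__main__":
    # Synthetic radicle: a 3 x 30 pixel strip inside a 7 x 40 image.
    mask = [[0] * 40 for _ in range(7)]
    for i in range(2, 5):
        for j in range(5, 35):
            mask[i][j] = 1
    n = count_border_pixels(mask)
    print(n, radicle_length_mm(n, scale_mm_per_pixel()))
```

For the synthetic strip, half of the contour pixel count is close to the strip's 30-pixel length, which shows why the half-perimeter works as a length estimate for slender shapes; the estimate becomes less accurate as the shape gets thicker relative to its length.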
2.4. Evaluation Method
The mean average precision (mAP0.5 and mAP0.5–0.95) is used to evaluate the box precision and mask precision of the model in detecting and segmenting the radicles of licorice. They can be expressed by Formulas (4) and (5):
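As a reference sketch, the standard definitions of average precision and mean average precision are written out below. The symbols $p(r)$ (precision as a function of recall $r$), $N$ (number of classes), and $AP_i$ (average precision of class $i$) are assumptions introduced here for illustration.

```latex
AP = \int_{0}^{1} p(r)\,\mathrm{d}r,
\qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```

Under the usual convention, mAP0.5 is computed at an IoU threshold of 0.5, and mAP0.5–0.95 is the mean over IoU thresholds from 0.5 to 0.95 in steps of 0.05.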
The higher the mean average precision, the better the model performs in detecting and segmenting licorice radicles. In addition, Params and FLOPs are used to evaluate how lightweight the model is and the amount of computation it requires, respectively. They are given by Formulas (6) and (7), where K² is the size of the convolutional kernel, H × W is the product of the height and width of the input feature map, C_out is the number of output channels, and C_in is the number of input channels.
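For a single convolutional layer, these two quantities can be sketched with the commonly used bias-free counts, assumed here: Params = K² × C_in × C_out and FLOPs = K² × C_in × C_out × H × W, with one multiply-accumulate counted as one operation. Some tools count two operations per multiply-accumulate or add bias terms, so reported values can differ by a constant factor. The layer sizes in the usage example are hypothetical.

```python
def conv_layer_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of one k x k convolutional layer (bias omitted)."""
    return k * k * c_in * c_out


def conv_layer_flops(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-accumulates of one k x k convolution over an H x W map."""
    return conv_layer_params(k, c_in, c_out) * h * w


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 80 x 80 feature map.
    print(conv_layer_params(3, 64, 128))
    print(conv_layer_flops(3, 64, 128, 80, 80))
```

Summing these counts over all layers gives the model-level figures: Params tracks storage size, while FLOPs also scales with the feature-map resolution.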
4. Discussion
The YOLO model has wide applications. Xu et al. [
22] proposed the CSW-YOLO model for bitter gourd detection; Luo et al. [
23] developed the YOLOTree model for crown volume estimation and Zhong et al. [
24] also introduced the YOLOv11-seg-DEDB model for classifying the severity of aloe anthracnose. This paper presents a method for measuring licorice radicle length based on YOLOv11-LicoSeg. Building upon the original model architecture, this approach incorporates a Spatial Strip Attention (SSA) mechanism to enhance the model’s ability to extract radicle features; employs the MEEM instead of ordinary convolutional layers to generate multi-scale edge detail features, enabling more accurate identification of licorice radicle contours; and introduces the NWD loss function to improve segmentation accuracy for fine radicles.
Experimental results show that YOLOv11-LicoSeg achieves 78.2% mAP50–95 for segmentation and 81.7% for detection, reaching the highest precision in licorice radicle segmentation. Compared with the unimproved YOLOv11-seg, detection mAP50 increases by 0.7%, detection mAP50–95 by 1.3%, segmentation mAP50 by 0.7%, and segmentation mAP50–95 by 0.8%. The linearity of 0.9421 and the goodness of fit of 0.94408 between manual and machine-vision measurements support the model’s feasibility for calculating licorice root length in practice. Radicle length extraction helps study radicle morphological evolution, determine seed viability, and explore stress effects. This is the first application of the YOLOv11-seg model to licorice seed radicle characteristic analysis, and it provides a fast and accurate method for measuring licorice radicle length.
Although the YOLOv11-LicoSeg model has achieved its design objectives, this method still has some limitations. For instance, compared with the initial version of the YOLOv11-seg model, the size of the YOLOv11-LicoSeg model has increased, which raises the difficulty of deployment. Meanwhile, there are misidentifications during the radicle segmentation process. In the early stage of licorice germination, when the radicles are short and overlap with the background of similar color patches, it may lead to false detections of the radicles.
The YOLOv11-LicoSeg model proposed in this study demonstrates high accuracy and robustness in measuring the length of licorice radicles, validating the significant potential of deep learning techniques in plant phenotypic analysis, particularly in the segmentation of small organs. Compared with traditional manual measurement methods, this approach achieves automated, non-destructive, and efficient radicle length extraction, substantially reducing human error and experimental costs. As one of the most critical organs during the early stage of seed germination, the growth pattern of the radicle is closely related to seed viability, stress responses, and seedling establishment. Therefore, this method is not only applicable to licorice but can also be extended to the study of radicle or hypocotyl characteristics in other small-seeded species (such as Arabidopsis, tobacco, and alfalfa), providing technical support for rapid seed quality testing, germplasm resource evaluation, and early crop growth monitoring.
However, as noted in the results, the current model still has certain limitations. The increased model size poses challenges for deployment on edge computing devices (e.g., embedded systems or field-portable devices). Future work could address this through model pruning, knowledge distillation, or lightweight architectural designs (e.g., incorporating lightweight backbones such as MobileNet or ShuffleNet). Furthermore, to address the issue of false detections caused by radicles blending into similarly colored backgrounds or early-stage radicles being too short or overlapping, future research could consider attention-guided data augmentation strategies, contextual information fusion modules, or weakly supervised learning mechanisms to further enhance the model’s segmentation capability in complex backgrounds. Moreover, the current study is primarily based on static images. Future efforts can integrate video streams or time-series imaging techniques to achieve dynamic tracking of individual seed germination processes and continuous measurement of radicle growth, thereby providing richer data support for the study of seed germination kinetics. In summary, YOLOv11-LicoSeg provides an effective tool for phenotypic analysis of small plant organs and holds promising application prospects, though continuous improvements in model lightweighting and adaptability to complex scenarios are still needed.