1. Introduction
Panax ginseng C.A. Mey., a perennial medicinal herb of the Araliaceae family, is a relict species of the Tertiary period found in temperate regions of the Northern Hemisphere. Rich in active compounds such as ginsenosides, polysaccharides, and volatile oils, it offers a range of pharmacological benefits, including immune regulation, antioxidant effects, fatigue resistance, and anti-tumor properties, and is revered as the “King of All Herbs” [1,2,3]. Understory ginseng refers to ginseng grown from seeds manually sown in forests to simulate the growing environment of wild ginseng, with only limited artificial care and cultivation. It contains a wide range of ginsenosides with stable medicinal effects [4]. Owing to climatic and environmental constraints, understory ginseng is cultivated in only a few countries. Jilin Province in China is the primary producer, accounting for 60% of China’s production and 40% of global output, with significant economic value. However, disease remains the main obstacle to stable, high-yield cultivation in Jilin. Although traditional manual inspection can identify diseases and their severity, it is time-consuming, labor-intensive, prone to damaging plants, and difficult to scale for real-time monitoring [5,6]. Therefore, an automated, efficient, and accurate disease detection method is urgently needed to ensure both the yield and quality of understory ginseng.
In the field of crop disease diagnosis, computer image segmentation methods are widely used for leaf and lesion analysis. Early studies relied primarily on traditional image processing techniques such as thresholding, clustering, and region growing [7,8,9,10,11]. For example, Sengar et al. [7] proposed an adaptive intensity thresholding method that achieved high accuracy in segmenting cherry leaf powdery mildew. Among clustering-based approaches, Febrinanto et al. [8] used k-means for citrus leaf segmentation, providing initial clusters tailored to citrus, and Chodey et al. [9] combined fuzzy c-means with integrated color features to extract leaves and lesions in complex backgrounds. Chen et al. [10] further integrated region growing with non-local filtering for precise segmentation of vegetable and maize leaf lesions in greenhouse and field environments. Additionally, Septiarini et al. [11] employed multi-edge detection for tomato lesion segmentation. While traditional image segmentation methods perform well under specific conditions, they are sensitive to image quality, lighting variations, and complex backgrounds. They also struggle with lesions that vary in shape or size, or that overlap, limiting their applicability and robustness in complex agricultural environments.
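As a concrete illustration of the clustering route, the sketch below applies a bare-bones k-means to pixel colors of a synthetic leaf/lesion sample. It is a minimal numpy example of the general technique, not the pipeline of Febrinanto et al.; the color values, cluster count, deterministic initialization, and the redness heuristic for picking the lesion cluster are all illustrative assumptions:

```python
import numpy as np

def kmeans_segment(pixels, k=2, iters=20):
    """Plain k-means over pixel colors; returns a cluster label per pixel.

    pixels: (N, 3) array of RGB values. Centroids are initialized from
    pixels evenly spaced along the sorted channel-sum (a simple,
    deterministic heuristic).
    """
    order = np.argsort(pixels.sum(axis=1))
    init_idx = order[np.linspace(0, len(pixels) - 1, k).astype(int)]
    centroids = pixels[init_idx].astype(float)
    for _ in range(iters):
        # Assign each pixel to the nearest centroid in RGB space.
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean(axis=0)
    return labels, centroids

# Toy data: 100 greenish "leaf" pixels and 20 brownish "lesion" pixels.
rng = np.random.default_rng(0)
leaf = np.array([40.0, 160.0, 40.0]) + rng.normal(0, 3, (100, 3))
lesion = np.array([170.0, 80.0, 20.0]) + rng.normal(0, 3, (20, 3))
pixels = np.vstack([leaf, lesion])

labels, centroids = kmeans_segment(pixels, k=2)
lesion_cluster = int(centroids[:, 0].argmax())  # lesion centroid is redder
lesion_mask = labels == lesion_cluster
```

On well-separated toy colors such as these, the two clusters recover the leaf/lesion split exactly; the sensitivity of such methods to lighting and background, noted above, appears as soon as the color distributions overlap.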
Unlike traditional image processing methods, deep learning automatically extracts image features and has driven the development of crop disease segmentation, enabling stable recognition of lesions in complex backgrounds. Image-based detection methods, known for their speed, non-destructiveness, and high efficiency, make real-time evaluation of leaf diseases possible. Several advanced models have been proposed for leaf disease segmentation, achieving excellent results across different crops [12]. For instance, Cheng et al. [13] introduced ALDNet, a two-stage method combining deep aggregation and multi-scale fusion strategies: PBGNet precisely segments apple leaves, while PDFNet, with residual paths and multi-scale feature fusion, enhances segmentation of leaf disease margins and minute disease spots. In complex backgrounds, the model achieved a 77.41% mIoU for apple leaf disease segmentation. Yang et al. [14] developed FATDNet, which incorporated dual-path fusion adversarial algorithms and multi-dimensional attention mechanisms to better distinguish minute disease spots; its Gaussian-weighted edge segmentation module alleviated edge information loss, significantly improving robustness in complex backgrounds. Ding et al. [15] proposed AS-DeepLabv3+, which enhanced DeepLabv3+ with a multi-fusion attention module and dynamic dilation rates, achieving a 98.00% mIoU in apple leaf disease segmentation. Wang et al. [16] developed WE-DeepLabv3+ with a Window Attention-ASPP module and Efficient Channel Attention (ECA), using MobileNetV2 as the backbone to reduce model parameters while enhancing small-target and edge feature extraction, achieving an 82.0% mIoU in Panax notoginseng leaf disease segmentation. Both methods highlight the trend of improving lesion extraction with enhanced Atrous Spatial Pyramid Pooling (ASPP) and multi-attention mechanisms, although the resulting models still carry relatively high parameter and computational demands.
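ASPP, mentioned above, builds on atrous (dilated) convolution, which enlarges the receptive field without adding parameters. The 1-D numpy sketch below illustrates only the sampling pattern on toy data; a real ASPP applies several 2-D dilated convolutions at different rates in parallel and fuses their outputs:

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """Valid 1-D convolution whose kernel taps are spaced `rate` apart.

    A kernel of size k with dilation rate r covers a receptive field of
    (k - 1) * r + 1 input positions using the same k parameters.
    """
    k = len(kernel)
    span = (k - 1) * rate + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        taps = x[i : i + span : rate]  # sample every `rate`-th input
        out[i] = float(np.dot(taps, kernel))
    return out

x = np.arange(10, dtype=float)      # toy 1-D "feature map"
kernel = np.array([1.0, 1.0, 1.0])  # one 3-tap kernel, reused at all rates

# Same 3 parameters, three receptive fields (3, 5, 9): the idea behind
# the parallel multi-rate branches of ASPP.
branch_r1 = dilated_conv1d(x, kernel, rate=1)
branch_r2 = dilated_conv1d(x, kernel, rate=2)
branch_r4 = dilated_conv1d(x, kernel, rate=4)
```

The widest branch sees context that helps large lesions, while the rate-1 branch preserves fine detail for minute spots, which is why ASPP variants recur in the works cited above.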
Although deep learning has shown promising potential for leaf disease segmentation in understory ginseng, three key challenges remain: (1) Minute disease spots and lesions with blurred margins make precise segmentation difficult. Understory ginseng lesions are typically small in scale, irregular in shape, and indistinct at the edges; they can be erased or merged with the background during downsampling, losing boundary information and reducing segmentation accuracy. (2) Strong background interference and subtle disease-spot features carry a high risk of false positives and missed detections. Ginseng leaves often blend with background elements such as soil, weeds, and light and shadow, and lesions resemble healthy tissue in color and texture, making it difficult for the model to distinguish lesions from the background and reducing detection accuracy. (3) Limited computational resources require a balance between model lightness and high precision. In practical agricultural applications, such as deployment on mobile or field devices, the model must be lightweight with low computational complexity.
To address these issues, this paper proposes a lightweight, high-precision semantic segmentation network, the LD-SAGE model, for leaf disease segmentation in understory ginseng. The model improves upon the original DeepLabv3+ architecture with optimizations to the backbone network, edge-guidance module, and multi-scale perception mechanism. First, the lightweight StarNet replaces the Xception backbone, significantly reducing parameters and computational load while maintaining strong feature extraction, making the model suitable for resource-constrained environments. Second, the Gaussian-Edge Channel Fusion (GECF) module enhances boundary perception and channel-wise semantic modulation, strengthening the feature representation of small lesions and blurred boundaries and thereby improving segmentation accuracy and boundary consistency. Finally, the efficient Multi-scale Attention-guided Context Modulation (MACM) module replaces the traditional ASPP, achieving multi-scale perception while reducing model size. Through these optimizations, the model improves recognition accuracy while achieving a significantly lighter design with substantially lower computational resource consumption.
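The exact design of the GECF module is given in the Methods section. Purely as a hedged illustration of the general idea of Gaussian-softened edge guidance, the numpy sketch below derives a smooth edge prior from a lesion mask; the kernel size, sigma, and the toy mask are illustrative assumptions, not the module's actual computation:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def smooth2d(img, sigma=1.0, radius=2):
    """Separable Gaussian blur with edge padding; output shape == input shape."""
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode="edge")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def edge_prior(mask):
    """Gaussian-softened gradient magnitude of a (soft) lesion mask."""
    gy, gx = np.gradient(mask.astype(float))
    edges = np.hypot(gx, gy)
    return smooth2d(edges)

# Toy binary lesion mask: an 8x8 square inside a 16x16 frame.
mask = np.zeros((16, 16))
mask[4:12, 4:12] = 1.0
prior = edge_prior(mask)  # high near the square's boundary, ~0 elsewhere
```

Such a soft prior, multiplied into feature maps, would up-weight responses near lesion boundaries, which is the kind of effect edge-guidance modules aim for.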
3. Results and Discussion
3.1. Analysis of Semantic Segmentation Performance of the LD-SAGE Network
Figure 10 presents the training and validation performance of the LD-SAGE network. The green line with solid circles represents the mIoU on the training set. The blue line with crosses and the red line with triangles represent the loss functions on the training and validation sets, respectively.
As shown in Figure 10, the mIoU on the validation set steadily increased, while both training and validation losses decreased. Early in training, the loss was high and segmentation accuracy was low; as training progressed, the model's feature extraction and target region recognition improved, optimizing segmentation performance. Eventually, both the loss and performance curves stabilized, indicating good convergence. This stable trend shows that the LD-SAGE network learns disease spot features effectively without overfitting, and the close agreement between the training and validation curves suggests good generalization to unseen samples. Furthermore, the balanced dataset and the Gaussian-smoothed, attention-based modules help the model maintain stable performance under varying lighting and background conditions. Overall, the LD-SAGE network demonstrates strong learning ability and reliable prediction performance in semantic segmentation tasks.
3.2. Performance Analysis Based on Different Backbone Networks
In the semantic segmentation of ginseng leaf disease, the model's ability to extract fine-grained features directly affects the final segmentation accuracy, especially in complex scenes where disease spots are small, edges are blurry, and colors resemble the background. The choice of backbone network is therefore critical to overall performance, while lightweight characteristics must also be considered to meet practical deployment needs. To validate the selected backbone, this paper conducts substitution experiments comparing different networks in terms of segmentation accuracy and computational cost. Based on the DeepLabv3+ structure, five representative networks, Xception, StarNet, MobileNetV2 [25], MobileNetV4 [26], and Vgg [27], are selected as candidate backbones, and five experimental groups are constructed to compare their performance in ginseng leaf segmentation. Comparison metrics include mIoU, Precision, Recall, Parameters, and GFLOPs. The experimental results are shown in Table 2.
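For reference, the mIoU reported throughout Tables 2–4 can be computed from a class confusion matrix. The sketch below is a minimal numpy version on toy labels; its handling of empty classes (treating them as IoU 0) is a simplifying choice:

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean Intersection-over-Union from flat integer label arrays."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    ious = inter / np.maximum(union, 1)  # guard against empty classes
    return ious.mean(), ious

# Toy 3-class example (e.g. background / leaf / lesion).
pred   = np.array([0, 0, 1, 1, 2, 2, 2, 0])
target = np.array([0, 0, 1, 2, 2, 2, 2, 0])
mean_iou, per_class = miou(pred, target, num_classes=3)
```

Per-class IoU, as quoted for the disease-spot class in Section 3.3, is simply one entry of `per_class` before averaging.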
Table 2 shows that the model with Xception as the backbone delivers the best segmentation performance, achieving an mIoU of 90.98%, Recall of 94.16%, and Precision of 94.76%. However, it has a large model size of 54.709 M parameters and incurs a computational cost of 166.8 GFLOPs. This complexity makes it challenging to deploy in practical agricultural scenarios, limiting its potential for lightweight applications. In comparison, the model with StarNet as the backbone maintains high segmentation accuracy while being significantly lighter: it achieves an mIoU of 90.66%, just 0.32 percentage points lower than Xception, with Precision of 94.51% and Recall of 93.60%. More importantly, its model size is just 3.585 M parameters and it requires 57.639 GFLOPs, a roughly 65% reduction in computation. This efficiency is primarily attributed to the star-shaped residual structure of StarNet, which captures multi-scale disease spot features with fewer parameters, reducing the computational burden while maintaining sensitivity to detail and enabling efficient deployment on edge devices in field settings. StarNet thus excels in feature extraction, has a compact structure, and consumes minimal computational resources, making it an ideal backbone for both accuracy and deployment efficiency. Compared with the MobileNet series, the MobileNetV2-based model achieves an mIoU of 89.60% with 5.814 M parameters and 52.875 GFLOPs; while lighter, its segmentation accuracy is slightly lower than StarNet's. The MobileNetV4 backbone has 5.070 M parameters and only 45.699 GFLOPs, the lowest among the compared models, but its mIoU drops to 88.47%, making it harder to balance accuracy and efficiency.
The Vgg-based model, despite a strong Recall of 95.41% and Precision of 94.56%, has a large model size of 20.144 M parameters and a heavy computational load of 332.318 GFLOPs, which makes it unsuitable for deployment on resource-constrained devices compared to StarNet.
The comparison shows that StarNet strikes the best balance between performance and efficiency in understory ginseng disease segmentation. Its high accuracy, low parameter count, and computational cost ensure precise segmentation while remaining practical for deployment. Thus, StarNet is a feasible choice as the backbone network, demonstrating the advantages of the proposed model’s backbone.
3.3. Performance Comparison of Different Semantic Segmentation Models
To validate the effectiveness of the proposed model in understory ginseng leaf spot segmentation, five representative semantic segmentation models were compared: U-Net [28], PSPNet [29], SegFormer [30], DeepLabv3+, and the enhanced LD-SAGE. A comprehensive comparison was conducted across segmentation accuracy, model parameters, and GFLOPs, as shown in Table 3.
As shown in Table 3, U-Net achieves an mIoU of 91.66%, with Precision and Recall of 95.33% and 95.07%, respectively, demonstrating strong performance. However, its 24.891 M parameters and 451.706 GFLOPs entail substantial computational overhead, limiting deployment in resource-constrained agricultural environments. Although PSPNet is a representative classical model, it underperforms in disease spot segmentation, with a spot-class IoU of only 67% and an overall mIoU of 85.56%; its high parameter count (46.707 M) and GFLOPs (118.428) also make it inefficient. SegFormer, a Transformer-based model, excels in lightweight design, with just 3.715 M parameters and 13.546 GFLOPs, but shows the lowest segmentation accuracy for disease spots, with an IoU of 66% and an overall mIoU of 84.32%, indicating room for improvement in small-target recognition. DeepLabv3+ strikes a more balanced performance with an mIoU of 90.98% and a disease-spot IoU of 77%, though its high parameter count (54.709 M) and GFLOPs (166.849) also pose deployment challenges.
In contrast, the proposed LD-SAGE achieves an IoU of 81% for the disease-spot class, with Precision and Recall reaching 96.34% and 95.21%, respectively, and an overall mIoU of 92.48%. This improvement stems primarily from the GECF and MACM modules, which enhance edge refinement and contextual feature aggregation. While maintaining high accuracy, the parameter count is reduced to 2.524 M and GFLOPs to 36.857, cutting computational overhead by approximately 78% compared to the original DeepLabv3+. Its stable performance under varying lighting and background conditions further demonstrates robustness in real agricultural scenarios. These results show that LD-SAGE effectively enhances multi-scale feature fusion and disease-spot perception while significantly reducing model complexity, offering an optimal balance of precision, efficiency, and deployment feasibility for practical disease segmentation tasks.
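The reduction figures quoted above follow directly from the reported numbers; a quick check (the parameter-count percentage is derived here, not stated in the tables):

```python
def pct_reduction(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before

# Reported figures for the original DeepLabv3+ versus LD-SAGE.
gflops_cut = pct_reduction(166.849, 36.857)  # GFLOPs: about 78%
params_cut = pct_reduction(54.709, 2.524)    # parameters: about 95%
```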
3.4. Ablation Study
The ablation study evaluates the impact of three components (backbone replacement, introduction of the GECF module, and integration of the MACM module) on model performance and complexity. The results validate the effectiveness and rationale of the proposed improvements.
Table 4 presents a comparison of the ablation experiment results. To minimize the impact of randomness in the experimental results and ensure the statistical significance and reproducibility of the research conclusions, we maintained consistent datasets, training parameters, and base network configurations across all experiments to ensure fairness in model comparisons. Additionally, each set of experimental results is based on the average of three independent repetitions, reported in the form of “mean ± standard deviation,” covering key evaluation metrics such as mIoU, Recall, and Precision, to comprehensively reflect the stability and reliability of the model’s performance.
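The "mean ± standard deviation" reporting over three independent runs can be reproduced with the standard library; a small sketch (the three mIoU values below are hypothetical, chosen only to illustrate the format):

```python
import statistics

def report(values, unit="%"):
    """Format repeated-run results as 'mean ± standard deviation'."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return f"{mean:.2f} ± {std:.2f}{unit}"

# Hypothetical mIoU values from three independent training runs.
runs = [92.21, 92.48, 92.75]
print(report(runs))  # → "92.48 ± 0.27%"
```

Using the sample (n − 1) standard deviation is the conventional choice for a small number of repeated runs.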
As shown in Table 4, replacing the original DeepLabv3+ backbone with the lightweight StarNet reduces the model's parameters from 54.709 M to 3.585 M and GFLOPs from 166.849 to 57.639, reductions of approximately 93% and 65%, respectively. Despite this significant reduction in computational cost, the mIoU remains at 90.66 ± 0.3% and Precision reaches 94.51 ± 0.1%, with virtually no loss in performance, demonstrating that StarNet is an ideal backbone balancing accuracy and efficiency. Introducing the GECF module further enhances edge perception, improving mIoU to 91.57 ± 0.1%, Precision to 95.98 ± 0.2%, and Recall to 94.22 ± 0.3%. Parameters increase only slightly to 3.649 M and GFLOPs to 57.762, indicating that the GECF module improves segmentation of disease spot edges without significantly increasing complexity. Finally, integrating the MACM module yields the complete LD-SAGE model, with an mIoU of 92.48 ± 0.3% and Recall and Precision of 95.21 ± 0.1% and 96.34 ± 0.1%, respectively, the best values among all configurations. The parameter count decreases further to 2.524 M and GFLOPs drop to 36.857, a reduction of approximately 36% compared to the configuration with only the GECF module. This highlights that the MACM module not only enhances multi-scale disease spot feature capture but also streamlines the structure to lower computational costs, making it a crucial component for practical deployment. The ablation results show that each module contributes independently to overall performance: StarNet improves computational efficiency, GECF enhances boundary detection, and MACM strengthens multi-scale contextual reasoning, and removing any one of them noticeably decreases mIoU, highlighting their complementary roles in the network structure. Across the three independent runs, the standard deviation of all metrics remains below 0.3%, demonstrating the robustness and reproducibility of the results.
The ablation study results highlight that backbone network replacement, edge attention enhancement, and feature fusion optimization progressively improve the model’s overall performance in accuracy, efficiency, and practicality. The final LD-SAGE model demonstrates a significant advantage in the understory ginseng disease segmentation task.
3.5. Visualization and Analysis of Segmentation Results
To visually compare the performance of different semantic segmentation models in real-world applications, this study selects typical understory ginseng leaf disease images and analyzes the segmentation results of five models: U-Net, PSPNet, SegFormer, DeepLabV3+, and the proposed LD-SAGE. The image samples cover various challenging scenarios, including minute disease spot areas, blurred edges, overlapping lesions with leaf margins, and strong background interference, to thoroughly evaluate the models' ability to represent disease structures and segmentation accuracy. The visualization results are shown in Figure 11.
Figure 11 visualizes the segmentation results, highlighting significant differences between models. DeepLabv3+ shows stable performance but misclassifies background shadows as lesions in Image1 and Image4 and misses small lesions in Image5. U-Net accurately segments the leaf area but misidentifies large background shadow areas as lesions in Image1 and Image2. PSPNet, while capable of multi-scale feature extraction, lacks sufficient detail recovery at the decoder stage, causing blurry edges and loss of details at the leaf and lesion boundaries, leading to incomplete segmentation. SegFormer struggles with rough boundary delineation, failing to accurately outline leaf contours in Image1, and misclassifying large areas of background as lesions or leaves in Image3 and Image4. In contrast, LD-SAGE demonstrates superior segmentation performance across all samples. The model accurately captures small lesion details and maintains high consistency in images with overlapping leaf edges, blurred textures, or significant lighting changes. For instance, in Image4, it clearly distinguishes between background, leaf, and lesion areas, providing a complete lesion boundary, while other models fail to do so. In Image3 and Image5, the model effectively suppresses background interference and captures fine lesion details.
Overall, the visual comparison results show that the combination of the GECF and MACM modules significantly enhances the model’s ability to identify disease spot edges and maintain structural integrity under complex lighting and background conditions. The GECF module improves edge perception and clarity, while the MACM mechanism strengthens the fusion of multi-scale contextual features. As a result, LD-SAGE performs more consistently in segmenting small and blurred disease spots, aligning with previous results. The consistency between the visual and numerical results confirms the model’s interpretability, robustness, and its potential application in real ginseng disease monitoring.
To more clearly present the effective information extracted by the LD-SAGE model, we compared the proposed LD-SAGE network with the original DeepLabV3+ and U-Net networks, generating the visual comparison results shown in Figure 12. Heatmaps were used because they provide a clear visual representation of the model's attention to different regions, especially in distinguishing between leaf and disease areas. The color gradient (from blue to red) effectively showcases the differences in how each model handles these regions.
As shown in Figure 12, the LD-SAGE model generates a stable and continuous response region along the leaf edges, closely matching the true contour without any breaks or false extensions. This demonstrates the model's strong ability to differentiate the leaf from the background, maintaining consistent contour detection even in complex lighting and texture conditions. Additionally, there are minimal false activations in the background, indicating that LD-SAGE's feature extraction and attention mechanisms effectively suppress irrelevant information, resulting in a more focused and clean response.
In disease detection, LD-SAGE not only responds strongly to large, high-contrast lesions but also detects small, subtle spots that are harder to distinguish, demonstrating its effectiveness in multi-scale feature fusion. Whether for large central lesions or smaller edge spots, the hotspot distribution aligns closely with the actual disease locations. In contrast, U-Net is less stable, with frequent breaks in the leaf edge, incomplete contours, and poor sensitivity to small lesions, leading to missed detections. U-Net’s heatmap often shows irregular background activations, indicating weak background suppression.
DeepLabV3+ exhibits a different issue: its response range is too broad, blurring the leaf boundary and spilling into the background. Although it increases recall, it also results in many false positives, lowering accuracy. Compared to these methods, LD-SAGE excels in detail perception, edge transitions, and restoring target structures, demonstrating its potential for diagnosing leaf diseases in shaded environments.
3.6. Model Deployment Performance Evaluation
To evaluate the feasibility and deployment efficiency of the LD-SAGE model on edge devices, this study deployed the LD-SAGE algorithm on an NVIDIA Jetson Orin Nano device (NVIDIA Corporation, Santa Clara, CA, USA), as shown in Figure 13a,b. The software environment consists of Python 3.8 and PyTorch 1.8, and the hardware comprises an Arm Cortex-A78AE CPU (Arm Limited, Cambridge, UK) and an NVIDIA GPU with 32 Tensor Cores. These configurations represent the actual experimental environment used for model deployment and real-time testing. The experiment assessed the real-time segmentation speed of the LD-SAGE model on the edge device via the real-time frame rate (FPS) to determine its suitability for real-world understory ginseng disease segmentation. As shown in Figure 13c, the LD-SAGE model deployed on the Jetson Orin Nano maintains a stable frame rate between 12 and 15 FPS, demonstrating good real-time performance.
This frame rate range is not a hardware requirement but is based on the measured results of Jetson Orin Nano’s computational capabilities. The LD-SAGE model has good hardware flexibility and can be deployed on other edge devices or GPUs, with the frame rate varying according to the device’s computational power. Higher FPS can be achieved on more powerful GPUs, while on devices with lower power consumption, the frame rate may be slightly lower but still maintain stable real-time inference performance. Although the model can run on various embedded development boards, this study selected the Jetson Orin Nano for testing after considering performance, energy efficiency, and device availability.
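FPS measurements of this kind reduce to a timed inference loop. The snippet below is an illustrative harness, not the benchmarking code used in this study; `dummy_infer` is a hypothetical stand-in for one forward pass of the deployed model:

```python
import time

def measure_fps(infer, frames=50, warmup=5):
    """Average frames-per-second of a callable over `frames` runs.

    Warm-up iterations are excluded so one-off startup costs (allocator,
    kernel compilation, caches) do not skew the measurement.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(frames):
        infer()
    elapsed = time.perf_counter() - start
    return frames / elapsed

# Dummy workload standing in for the model's forward pass.
def dummy_infer():
    sum(i * i for i in range(1000))

fps = measure_fps(dummy_infer)
```

On accelerator hardware, the `infer` callable must also synchronize the device before timestamps are taken, otherwise asynchronous kernel launches make the loop appear faster than it is.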
Figure 13d–f show the segmentation results of the DeepLabv3+, U-Net, and SegFormer models, respectively, while Figure 13g shows the result of the LD-SAGE model. The DeepLabv3+ model has poor segmentation accuracy, misclassifying background as leaf, and its disease segmentation is inadequate. The U-Net model performs poorly in leaf disease segmentation and runs at only 2.46 FPS, indicating low real-time efficiency on edge devices. The SegFormer model achieves a higher 8.14 FPS than the other baselines, but its segmentation remains suboptimal, especially at leaf edges and in diseased areas, mis-segmenting parts of the background. In contrast, the LD-SAGE model proposed in this study delivers the best combination of speed and segmentation accuracy on the edge device, highlighting its ability to operate stably on resource-constrained hardware platforms.
Furthermore, the measurement results not only validate the real-time performance of LD-SAGE but also highlight the efficiency of its lightweight design for practical deployment. The stable frame rate of 12–15 FPS achieved on the Jetson Orin Nano demonstrates that the model’s architecture can reduce computational overhead without compromising segmentation accuracy. This balance between speed and accuracy indicates that LD-SAGE can operate reliably on low-power hardware, which is crucial for continuous monitoring in real-world agricultural scenarios.
These results further demonstrate that the LD-SAGE model achieves an optimal balance between computational efficiency and segmentation accuracy, making it ideal for real-time agricultural applications. Its successful deployment on the Jetson Orin Nano suggests that the model can be integrated into portable plant disease monitoring systems for field-based automation and management of ginseng diseases. Compared to traditional large networks, LD-SAGE significantly reduces inference latency and energy consumption while maintaining high accuracy, confirming its potential for large-scale, low-cost applications in smart agriculture. Additionally, the alignment between the measured inference performance and the model’s theoretical design objectives further validates the robustness and scalability of the proposed approach.
4. Conclusions
This paper addresses the challenges in intelligent recognition of understory ginseng disease, including minute disease spots, blurry boundaries, and complex backgrounds, and proposes a lightweight and efficient semantic segmentation model, LD-SAGE. The model integrates the StarNet structure into the backbone network for efficient feature extraction at reduced computational cost; the GECF module is incorporated to improve recognition of fuzzy boundaries and minute disease spots; and the MACM module replaces ASPP, enabling precise multi-scale contextual fusion and structural detail recovery. Experimental results demonstrate that LD-SAGE achieves excellent performance in segmenting understory ginseng diseases. First, comparisons across different backbone networks show StarNet's advantages in feature extraction efficiency and lightweight design. Second, against other mainstream segmentation models, LD-SAGE achieved an mIoU of 92.48%, a recall of 95.21%, and a precision of 96.34%, outperforming existing methods on key metrics while significantly reducing parameters and GFLOPs, demonstrating strong lightweight characteristics. Finally, the analysis of representative understory ginseng leaf disease images further confirms the superior performance of LD-SAGE in real-world scenarios. Compared with DeepLabv3+, U-Net, PSPNet, and SegFormer, LD-SAGE restores disease contours, fine-grained structures, and boundary details more accurately; particularly under overlapping lesions, dense small targets, and drastic lighting changes, its segmentation results are more stable, with significantly fewer false positives and missed detections.
Based on the combined experimental results and segmentation image analysis, LD-SAGE achieves lightweight deployment while maintaining high accuracy, fully demonstrating its overall superiority in understory ginseng disease segmentation tasks.
In future research, the dataset will be further enriched by incorporating data on different growth stages and various medicinal herb diseases, enhancing the model’s generalization ability and robustness. Additionally, an end-to-end framework integrating disease segmentation and disease identification will be explored, combining segmentation and classification tasks to realize an intelligent process from detection to diagnosis. This will provide more reliable technical support for the precise prevention and control of medicinal herb diseases, particularly those of Jilin’s distinctive herbs, while also supporting the precise disease control of other medicinal herbs.