Article

Efficient and Lightweight LD-SAGE Model for High-Accuracy Leaf Disease Segmentation in Understory Ginseng

1 College of Information and Technology, Jilin Agricultural University, Changchun 130118, China
2 College of Plant Protection, Jilin Agricultural University, Changchun 130118, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(11), 2450; https://doi.org/10.3390/agronomy15112450
Submission received: 23 September 2025 / Revised: 15 October 2025 / Accepted: 20 October 2025 / Published: 22 October 2025
(This article belongs to the Section Pest and Disease Management)

Abstract

Understory ginseng, with superior quality compared to field-cultivated varieties, is highly susceptible to diseases, which negatively impact both its yield and quality. Therefore, this paper proposes a lightweight, high-precision leaf spot segmentation model, Lightweight DeepLabv3+ with a StarNet Backbone and Attention-guided Gaussian Edge Enhancement (LD-SAGE). This study first introduces StarNet into the DeepLabv3+ framework to replace the Xception backbone, reducing the parameter count and computational complexity. Secondly, the Gaussian-Edge Channel Fusion module uses multi-scale Gaussian convolutions to smooth blurry areas, combining Scharr edge-enhanced features with a lightweight channel attention mechanism for efficient edge and semantic feature integration. Finally, the proposed Multi-scale Attention-guided Context Modulation module replaces the traditional Atrous Spatial Pyramid Pooling. It integrates Multi-scale Grouped Dilated Convolution, Convolutional Multi-Head Self-Attention, and dynamic modulation fusion. This reduces computational costs and improves the model’s ability to capture contextual information and texture details in disease areas. Experimental results show that the LD-SAGE model achieves an mIoU of 92.48%, outperforming other models in terms of precision and recall. The model’s parameter count is only 4.6% of the original, with GFLOPs reduced to 22.1% of the baseline model. Practical deployment experiments on the Jetson Orin Nano device further confirm the advantage of the proposed method in the real-time frame rate, providing support for the diagnosis of leaf diseases in understory ginseng.

1. Introduction

Panax ginseng C.A. Mey., a perennial medicinal herb of the Araliaceae family, is a Tertiary relict species found in temperate regions of the Northern Hemisphere. Rich in active compounds such as ginsenosides, polysaccharides, and volatile oils, it offers various pharmacological benefits, including immune regulation, antioxidant effects, fatigue resistance, and anti-tumor properties, and is revered as the “King of All Herbs” [1,2,3]. Understory ginseng refers to ginseng grown from seeds manually sown in forests to simulate the growing environment of wild ginseng, followed by artificial care and cultivation. It contains a wide range of ginsenosides with stable medicinal effects [4]. Owing to climatic and environmental limitations, understory ginseng is cultivated in only a few countries. Jilin Province in China is the primary producer, accounting for 60% of China’s production and 40% of global output, with significant economic value. However, diseases remain the main obstacle to stable, high-yield cultivation in Jilin. Although traditional manual inspections can identify diseases and their severity, they are time-consuming, labor-intensive, prone to damaging plants, and difficult to scale for real-time monitoring [5,6]. Therefore, an automated, efficient, and accurate disease detection method is urgently needed to ensure both the yield and quality of understory ginseng.
In the field of crop disease diagnosis, computer image segmentation methods are widely used to analyze leaves and leaf diseases. Early studies relied primarily on traditional image processing techniques such as thresholding, clustering, and region growing [7,8,9,10,11]. For example, Sengar et al. [7] proposed an adaptive intensity thresholding method that achieved high accuracy in segmenting cherry leaf powdery mildew. Among clustering-based techniques, Febrinanto et al. [8] used k-means for citrus leaf segmentation, providing initial clusters specific to citrus, and Chodey et al. [9] combined fuzzy c-means with integrated color features to extract leaves and lesions in complex backgrounds. Chen et al. [10] further integrated region growing with non-local filtering for precise segmentation of vegetable and maize leaf lesions in greenhouse and field environments. Additionally, Septiarini et al. [11] employed multi-edge detection for tomato lesion segmentation. While traditional image segmentation methods perform well under specific conditions, they are sensitive to image quality, lighting variations, and complex backgrounds. They also struggle with lesions that vary in shape or size or that overlap, limiting their applicability and robustness in complex agricultural environments.
Unlike traditional image processing methods, deep learning has advanced crop disease segmentation by automatically extracting image features, enabling stable recognition of lesions in complex backgrounds. Being fast, non-destructive, and efficient, such image-based detection methods enable real-time evaluation of leaf diseases. Several advanced models have been proposed for leaf disease segmentation tasks, achieving excellent results across different crops [12]. For instance, Cheng et al. [13] introduced ALDNet, a two-stage method combining deep aggregation and multi-scale fusion strategies: its PBGNet achieved precise segmentation of apple leaves, while its PDFNet, with residual paths and multi-scale feature fusion, enhanced segmentation of leaf disease margins and minute disease spots. In complex backgrounds, the model achieved a 77.41% mIoU for apple leaf disease segmentation. Yang et al. [14] developed FATDNet, which incorporated dual-path fusion adversarial algorithms and multi-dimensional attention mechanisms to better distinguish minute disease spots; its Gaussian-weighted edge segmentation module alleviated edge information loss, significantly improving robustness in complex backgrounds. Ding et al. [15] proposed AS-DeepLabv3+, which enhanced DeepLabv3+ with a multi-fusion attention module and dynamic dilation rates, achieving a 98.00% mIoU in apple leaf disease segmentation. Wang et al. [16] developed WE-DeepLabv3+ with a Window Attention-ASPP module and Efficient Channel Attention (ECA), using MobileNetV2 as the backbone to reduce model parameters while enhancing small-target and edge feature extraction, achieving an 82.0% mIoU in Panax notoginseng leaf disease segmentation. These methods highlight the trend of improving lesion extraction with enhanced Atrous Spatial Pyramid Pooling (ASPP) and multi-attention mechanisms, although the resulting models still carry relatively high parameter and computational demands.
Although deep learning has shown promising potential in leaf disease segmentation for understory ginseng, it still faces three key challenges: (1) Minute disease spots and lesions with blurred margins make precise segmentation difficult. Understory ginseng lesions are typically small in scale, irregular in shape, and indistinct at the edges, and they can be erased or merged with the background during downsampling, leading to the loss of boundary information and reduced segmentation accuracy. (2) Strong background interference and the subtle features of minute disease spots create a high risk of false positives and missed detections. Ginseng leaves often blend with background elements such as soil, weeds, and light and shadow, and lesions resemble healthy tissue in color and texture, making it difficult for the model to distinguish lesions from the background and reducing detection accuracy. (3) Limited computational resources require a balance between model lightness and high precision. In practical agricultural applications, such as deployment on mobile or field devices, the model must be lightweight with low computational complexity.
To address the aforementioned issues, this paper proposes a lightweight and high-precision semantic segmentation network, the LD-SAGE model, for leaf disease segmentation in understory ginseng. This model improves upon the original DeepLabv3+ architecture with optimizations to the backbone network, edge-guidance module, and multi-scale perception mechanism. First, the lightweight StarNet replaces the Xception backbone, significantly reducing parameters and computational load while maintaining strong feature extraction capabilities, making it suitable for resource-constrained environments. Secondly, the integration of the Gaussian-Edge Channel Fusion (GECF) module enhances boundary perception and channel semantic modulation capabilities, effectively strengthening the feature representation of small lesions and blurred boundaries, thereby significantly improving segmentation accuracy and boundary consistency. Finally, the efficient Multi-scale Attention-guided Context Modulation (MACM) Module replaces the traditional ASPP, achieving multi-scale perception while reducing model size. Through these optimizations, the model not only improves recognition accuracy but also achieves a significant lightweight design, substantially reducing computational resource consumption.

2. Materials and Methods

2.1. Dataset

The dataset used in this study was collected from ginseng planting bases in Xiangqian Town, Tonghua County, and Diaoyu Village, Ji’an City, Jilin Province. Data were collected from June to August 2024, covering growth stages of understory ginseng from initial leaf expansion to disease maturation. During collection, environmental variations such as geographical location and light intensity were carefully considered to ensure the diversity and representativeness of the data. Images were captured with a Xiaomi 13 Pro mobile phone (Xiaomi Corporation, Beijing, China) at a shooting distance of 15–20 cm, each with a resolution of 3072 × 3072 pixels. All images were captured perpendicular to the leaf surface to minimize perspective distortion and ensure a consistent spatial scale across samples. The dataset includes samples taken at different times of day (9 a.m. to 6 p.m.) and under various lighting conditions (sunny, cloudy, and rainy) to enhance the model’s robustness to lighting and shadow variations. After collection, blurry or obstructed images were removed, leaving 1100 representative images. The dataset contains 150 healthy leaf images and approximately 950 images of diseased leaves, covering four major diseases: anthracnose, black spot, leaf rust, and gray mold. Each disease includes samples from mild, moderate, and severe stages to ensure diversity and representativeness of symptoms. Under real field conditions, ginseng leaves are often infected by multiple pathogens simultaneously, making it difficult to ensure that each image contains only a single disease. We therefore labeled images according to the most prominent visible symptoms on the leaves. This labeling strategy more accurately reflects natural growth and infection conditions, allowing the model to learn more discriminative and robust features in complex mixed-infection scenarios. To ensure consistent model input and stable training, all images were uniformly cropped to 512 × 512 pixels. The dataset covers the complex scenarios commonly encountered in real-world understory ginseng cultivation, authentically reflecting the complexity and diversity of the disease. Figure 1 shows examples of the collected understory ginseng disease images.
The ginseng disease image dataset collected in this study was divided into training, validation, and test sets at a 7:2:1 ratio, with 770 images in the training set, 220 in the validation set, and 110 in the test set, ensuring the independence and balanced distribution of samples. Additionally, each image in the dataset corresponds to a unique ginseng leaf, with no repeated captures or division of the same plant or leaf. This ensures independence between dataset subsets, allowing the model’s generalization performance on unseen samples to be evaluated fairly and reliably. To enrich the training data, enhance robustness, and improve generalization ability, while ensuring the validation and test sets can effectively evaluate model performance, data augmentation was applied solely to the training set. The augmentation techniques included random flipping, brightness adjustment, random occlusion, and addition of Gaussian noise. After augmentation, the training set was expanded to 3850 samples. The effect of data augmentation is illustrated in Figure 2.
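As an illustration of the four augmentations listed above, the following is a minimal torchvision sketch; the specific probabilities and magnitudes are assumptions, since the paper does not report them. Note that for segmentation data, geometric transforms such as flips must be applied identically to the image and its mask (e.g., via paired functional transforms); the pipeline below shows the image side only.

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add zero-mean Gaussian noise to a tensor image (sigma is an assumed value)."""
    def __init__(self, sigma=0.02):
        self.sigma = sigma
    def __call__(self, img):
        return (img + torch.randn_like(img) * self.sigma).clamp(0.0, 1.0)

# Training-set-only pipeline: random flipping, brightness adjustment,
# random occlusion (RandomErasing), and Gaussian noise.
train_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),          # apply the same flip to the mask
    transforms.RandomVerticalFlip(p=0.5),            # apply the same flip to the mask
    transforms.ColorJitter(brightness=0.3),          # brightness adjustment (image only)
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.1)),  # random occlusion (image only)
    AddGaussianNoise(sigma=0.02),                    # Gaussian noise (image only)
])
```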
To enhance model training effectiveness, this study manually annotates each image in the dataset using the open-source tool X-AnyLabeling-v2.3.0. The annotations include the contours of the ginseng leaves and the diseased areas, with leaves marked in red, leaf disease areas in green, and the background in black. This annotation scheme helps to distinguish the different regions, improving the model’s recognition accuracy and ensuring more precise disease detection and classification. Figure 3 displays the original sample data and the corresponding annotated labels.

2.2. LD-SAGE

This study addresses the challenges in segmenting lesions of understory ginseng: small lesion sizes, blurred edges that lead to information loss, and strong background interference that complicates the distinction between diseased and healthy tissue. The overall design of LD-SAGE therefore aligns each identified challenge with a specific structural module: a lightweight backbone network for efficient inference on edge devices, an edge-aware enhancement module for accurately detecting blurred or small disease spots, and an adaptive multi-scale context modeling mechanism for handling complex backgrounds and semantic ambiguity. To meet the demands for high precision and a lightweight model under limited computational resources, the DeepLabv3+ architecture [17] is improved, resulting in the LD-SAGE model, designed to enhance both segmentation accuracy and computational efficiency. The original Xception [18] backbone is replaced by the lightweight StarNet [19], which significantly reduces model parameters and computational load while maintaining robust feature extraction. Adopting StarNet satisfies deployment constraints by cutting parameters and GFLOPs while retaining multi-scale feature quality, enabling real-time inference on edge devices without sacrificing accuracy. In addition, the GECF module is constructed, integrating multi-scale Gaussian blur feature extraction [20], Scharr edge enhancement [21], ECA [22], and a feature fusion structure, thereby significantly improving the detection of small lesions and blurry boundaries. GECF targets unclear boundaries and small disease spots: Gaussian smoothing stabilizes low-signal textures, Scharr enhances high-frequency edges, and ECA selectively amplifies channels related to disease spots, jointly improving boundary localization and sensitivity to small targets. The traditional ASPP structure is replaced by the MACM module, which uses Multi-Scale Grouped Dilated Convolution (MSGDC) [23], Convolutional Multi-head Self-Attention (Conv-MHSA) [24], and a dynamic modulation mechanism to enhance contextual modeling and region contrast. ASPP is replaced because its fixed dilation rates incur high computational cost and handle multi-scale features poorly; MSGDC efficiently provides receptive-field diversity, Conv-MHSA introduces global dependencies to resolve ambiguity in cluttered backgrounds, and dynamic modulation adapts branch contributions to content, generating segmentation results with clearer boundaries and higher contextual consistency at low computational overhead. These improvements collectively enhance segmentation accuracy, robustness in complex backgrounds, and deployment efficiency on edge devices, yielding a lightweight and efficient model tailored to understory ginseng disease segmentation. Figure 4 presents a schematic diagram of the overall structure of the proposed LD-SAGE model.

2.2.1. DeepLabv3+

This study utilizes the DeepLabv3+ model as the foundational framework for semantic segmentation. Developed by Google, this model represents a significant optimization within the DeepLab series, leveraging its multi-scale feature extraction advantages. The architecture introduces an encoder–decoder mechanism to enhance boundary localization and detail restoration. The encoder consists of two modules: the feature extraction network and ASPP. The network first extracts deep semantic features from the image through backbone networks such as Xception, followed by the ASPP module, which employs various dilated convolutions and global pooling layers to capture contextual information from different scales, enabling precise detection of targets of varying sizes. Unlike previous structures that relied solely on deep features, DeepLabv3+ integrates shallow features from earlier layers with high-level features from ASPP during the decoding stage, employing multiple convolution and upsampling operations to effectively restore spatial structure and local details, addressing boundary blurring issues. To balance performance and computational efficiency, depthwise separable convolutions are introduced within the convolutional layers, reducing parameter count and computational burden while maintaining excellent feature extraction capabilities. As a result, this model offers high segmentation accuracy while ensuring operational efficiency, making it suitable for scenarios requiring high target contour clarity.
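For clarity, the following is a minimal PyTorch sketch of the depthwise separable convolution mentioned above, which factorizes a standard convolution into a per-channel spatial convolution and a 1 × 1 pointwise convolution; the BN/ReLU placement is a common convention, not a detail specified by the paper.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (one filter per channel) followed by a 1x1 pointwise conv.
    Parameters drop from roughly C_in*C_out*9 to C_in*9 + C_in*C_out."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                                   groups=c_in, bias=False)  # spatial filtering per channel
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)  # channel mixing
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```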
Despite the advantages of DeepLabv3+ for semantic segmentation, its application to understory ginseng leaf disease segmentation faces certain limitations. Specifically, the model struggles with small-scale, blurry-edged leaf diseases, often losing important details, resulting in imprecise segmentation. While the ASPP module offers multi-scale modeling capabilities, it incurs significant computational cost, reducing the efficiency of training and inference. The high parameter count and computational complexity of backbone networks like Xception further hinder deployment on edge devices. Moreover, in field environments with variable lighting and complex backgrounds, the model’s ability to distinguish between diseases and the background is limited, leading to false positives and missed detections. To address these challenges, this study introduces the LD-SAGE model, a lightweight yet high-accuracy solution built upon DeepLabv3+.

2.2.2. StarNet

To address the complex feature extraction and large parameter count of the Xception backbone in the DeepLabv3+ model, this study introduces StarNet as an alternative feature extraction backbone. The StarNet framework, shown in Figure 5, follows a traditional hierarchical architecture consisting of four stages, each performing convolutional downsampling while doubling the number of channels. The core of the network comprises multiple repeated star-shaped blocks, each integrating depthwise separable convolutions, multiple 1 × 1 convolutions, and residual connections for efficient feature extraction and refinement. The star-shaped operation combines activation functions with element-wise multiplication to capture complex feature interactions while ensuring smooth gradient flow and computational efficiency. As a result, StarNet achieves excellent feature extraction performance without relying on complex designs or a large number of parameters, offering significant advantages in reducing computational cost and accelerating inference. Replacing Xception with StarNet as the backbone of DeepLabv3+ reduces computational cost and parameter size while maintaining strong multi-scale feature extraction and boundary detail restoration, leading to a more efficient and lightweight segmentation solution.
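As an illustration of the star-shaped block described above, the following is a minimal PyTorch sketch; the kernel size, expansion ratio, and activation are assumptions rather than the exact StarNet [19] configuration.

```python
import torch.nn as nn

class StarBlock(nn.Module):
    """Sketch of a star-shaped block: a depthwise conv, two parallel 1x1 branches
    combined by element-wise multiplication (the 'star' operation), a 1x1
    projection, and a residual connection."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # depthwise spatial mixing
        self.bn = nn.BatchNorm2d(dim)
        self.f1 = nn.Conv2d(dim, dim * expansion, 1)
        self.f2 = nn.Conv2d(dim, dim * expansion, 1)
        self.act = nn.ReLU6()
        self.g = nn.Conv2d(dim * expansion, dim, 1)

    def forward(self, x):
        y = self.bn(self.dw(x))
        y = self.act(self.f1(y)) * self.f2(y)  # star op: implicit high-order feature interaction
        return x + self.g(y)                   # residual keeps gradient flow smooth
```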

2.2.3. Gaussian-Edge Channel Fusion Module

In the task of leaf disease segmentation for understory ginseng, the target lesions are typically small, with indistinct edges, colors that blend with the background, and minimal internal texture variation. These characteristics make it difficult to separate the lesions in feature maps, and they are easily erased during network downsampling, leading to boundary fusion, detail loss, false positives, and missed detections. Traditional convolutional segmentation networks are limited by small receptive fields and struggle to balance detail preservation with global modeling, resulting in insufficient representation of small lesions and blurred boundaries. To address these challenges, this study introduces the Gaussian-Edge Channel Fusion (GECF) module, shown in Figure 6: a lightweight multi-branch attention mechanism that integrates Gaussian smoothing, edge enhancement, and channel attention to improve the feature representation and spatial localization of lesion areas.
Compared with the traditional Edge-Gaussian Aggregation (EGA) module or single-scale blur modeling methods, this module takes the shallow features of the backbone network, denoted as $X \in \mathbb{R}^{B \times C \times H \times W}$, as input, where $B$ is the batch size, $C$ is the number of channels, and $H$ and $W$ are the height and width of the feature map, respectively. First, the module integrates multiple sets of fixed-scale Gaussian convolution kernels to adapt to different blurred areas and soft boundary scales, effectively enhancing boundary contours and improving structural stability, as shown in Equation (1):
$$X_g = \frac{1}{N}\sum_{i=1}^{N} X \ast K_i \quad (1)$$
Here, $K_i$ represents the $i$-th Gaussian convolution kernel, and $N$ is the total number of kernels. The operator $\ast$ denotes a channel-wise independent (depthwise) convolution, yielding the Gaussian-smoothed feature $X_g$. To enhance the model’s ability to perceive the boundaries of small lesions, the Scharr kernels $S_x$ and $S_y$ are used to extract high-frequency edge features in the horizontal and vertical directions. The gradient magnitude is then computed to generate the edge-enhanced feature map $X_e$, which is subsequently fused with the original features to emphasize boundary details and improve the preservation of fine structures. The calculation is shown in Equation (2):
$$X_e = \sqrt{(X \ast S_x)^2 + (X \ast S_y)^2 + \epsilon} \quad (2)$$
where $\epsilon$ is a small constant that keeps the square root well defined. After the Gaussian-smoothed features and edge features are fused with the original features, the fused feature $F$ is obtained through batch normalization ($\mathrm{BN}$) and the rectified linear unit ($\mathrm{ReLU}$) activation, as shown in Equation (3):
$$F = \mathrm{ReLU}(\mathrm{BN}(X + X_g + X_e)) \quad (3)$$
To further enhance cross-channel semantic representation, the module introduces ECA to amplify the feature responses of crucial channels while suppressing irrelevant noise, achieving fine-grained fusion of cross-channel semantic information, as shown in Equation (4). Channel weights are generated by applying $\omega(\cdot)$ to the output of global average pooling ($\mathrm{GAP}$) and a 1D convolution, and are then applied element-wise to the channels to obtain $F'$:
$$F' = F \otimes \omega(\mathrm{Conv1D}(\mathrm{GAP}(F))) \quad (4)$$
where $\otimes$ denotes element-wise multiplication across channels, and $\omega(\cdot)$ is the Sigmoid activation function. Finally, the channel-weighted feature $F'$ is integrated through a $1 \times 1$ convolution, followed by normalization and activation, to obtain the output feature $F_{\mathrm{Output}}$, as shown in Equation (5):
$$F_{\mathrm{Output}} = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{1 \times 1}(F'))) \quad (5)$$
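To make the pipeline of Equations (1)–(5) concrete, the following is a minimal PyTorch sketch of the GECF module. The Gaussian kernel sizes and sigmas, the ECA kernel size, and the value of $\epsilon$ are illustrative assumptions; the text does not fix them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(ksize, sigma):
    """2D Gaussian kernel, normalized to sum to 1."""
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

class GECF(nn.Module):
    """Sketch of GECF per Eqs. (1)-(5); scales/sigmas and eca_k are assumed values."""
    def __init__(self, channels, scales=((3, 1.0), (5, 2.0), (7, 3.0)), eca_k=3, eps=1e-6):
        super().__init__()
        self.channels, self.eps, self.n_scales = channels, eps, len(scales)
        # Fixed multi-scale Gaussian kernels, one depthwise filter bank per scale (Eq. 1).
        for i, (ks, sg) in enumerate(scales):
            k = gaussian_kernel(ks, sg).expand(channels, 1, ks, ks).clone()
            self.register_buffer(f"gauss{i}", k)
        # Fixed Scharr kernels for horizontal/vertical gradients (Eq. 2).
        sx = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]]) / 16
        self.register_buffer("sx", sx.expand(channels, 1, 3, 3).clone())
        self.register_buffer("sy", sx.transpose(-1, -2).expand(channels, 1, 3, 3).clone())
        self.bn1 = nn.BatchNorm2d(channels)
        self.eca = nn.Conv1d(1, 1, eca_k, padding=eca_k // 2, bias=False)  # ECA 1D conv (Eq. 4)
        self.proj = nn.Conv2d(channels, channels, 1, bias=False)           # Eq. (5)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        c = self.channels
        # Eq. (1): average of channel-wise (depthwise) Gaussian convolutions.
        xg = sum(F.conv2d(x, getattr(self, f"gauss{i}"),
                          padding=getattr(self, f"gauss{i}").shape[-1] // 2, groups=c)
                 for i in range(self.n_scales)) / self.n_scales
        # Eq. (2): Scharr gradient magnitude.
        gx = F.conv2d(x, self.sx, padding=1, groups=c)
        gy = F.conv2d(x, self.sy, padding=1, groups=c)
        xe = torch.sqrt(gx ** 2 + gy ** 2 + self.eps)
        # Eq. (3): fuse original, smoothed, and edge features.
        f = F.relu(self.bn1(x + xg + xe))
        # Eq. (4): ECA channel weights via GAP + 1D conv + sigmoid.
        w = F.adaptive_avg_pool2d(f, 1).squeeze(-1).transpose(1, 2)   # B x 1 x C
        w = torch.sigmoid(self.eca(w)).transpose(1, 2).unsqueeze(-1)  # B x C x 1 x 1
        # Eq. (5): 1x1 projection + BN + ReLU.
        return F.relu(self.bn2(self.proj(f * w)))
```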

2.2.4. Multi-Scale Attention-Guided Context Modulation Module

The ASPP module typically consists of a 1 × 1 convolution, multiple 3 × 3 dilated convolutions with varying dilation rates, and a global pooling branch. By adjusting the dilation rates, ASPP captures contextual information at different scales. However, this structure requires parallel execution of multiple convolution operations, leading to a significant increase in model parameters and computational overhead, as well as prolonged training times. Moreover, in the case of understory ginseng disease segmentation, where there is a substantial difference in the scale of lesions, convolutions with larger dilation rates expand the receptive field but weaken the response to small-scale lesions, thus impairing fine-grained segmentation performance. To reduce computational cost while improving segmentation performance for small targets and complex boundaries, this paper redesigns contextual relationships and introduces a novel MACM module as a replacement for the traditional ASPP. The structure of the MACM module, as shown in Figure 7, integrates the improved MSGDC structure with the proposed Conv-MHSA mechanism. This design achieves a balance between multi-scale texture modeling, spatial-semantic fusion, and channel-level control, significantly enhancing the perception of disease spots and the contextual adaptability, while maintaining a lightweight structure.
To extract contextual features at different scales, the MSGDC module is designed with three parallel branches, each using grouped convolutions with a different dilation rate $d_n \in \{1, 3, 5\}$, $n = 1, 2, 3$, to capture local, neighborhood, and long-range dependencies. The MSGDC module is illustrated in Figure 8. Let $X_l$ be the output feature map of the MACM module; the input feature map from layer $l-1$ is $X_{l-1} \in \mathbb{R}^{B \times C \times H \times W}$, where $B$, $C$, $H$, and $W$ denote the batch size, number of channels, height, and width of the feature map, respectively. First, the input features are binarized, as shown in Equation (6):
$$X_{l-1}^{b} = \mathrm{Ba}(X_{l-1}) \quad (6)$$
where $\mathrm{Ba}(\cdot)$ is the binarization function, which maps real-valued feature maps to binary feature maps to reduce computational overhead. The binarized features are then passed through three parallel branches of grouped dilated convolutions with dilation rates $d_n$ of 1, 3, and 5, designed to capture local details, neighborhood context, and long-range semantic information, respectively. The number of groups is $g$, which effectively reduces both the parameter count and the computational load. The convolution results are batch-normalized to yield three feature maps $F_n$, where $\mathrm{Conv}_{3 \times 3, g}^{d_n}$ denotes $3 \times 3$ convolutions performed in parallel over $g$ groups, as shown in Equation (7):
$$F_n = \mathrm{BN}\left(\mathrm{Conv}_{3 \times 3, g}^{d_n}(X_{l-1}^{b})\right), \quad d_n \in \{1, 3, 5\}, \; n = 1, 2, 3 \quad (7)$$
To address the inconsistent contributions of the different dilation branches across regions, the MACM introduces a dynamic modulation mechanism. After concatenating the features $F_n$ from the three branches along the channel dimension, they are passed through a $1 \times 1$ convolution $\mathrm{Conv}_1$, $\mathrm{BN}$, $\mathrm{ReLU}$, and adaptive average pooling ($\mathrm{AAP}$) to generate channel descriptors. A second $1 \times 1$ convolution $\mathrm{Conv}_2$ then maps them into three scores, which are normalized by Softmax to obtain the weights $\Delta w = [\Delta w_1, \Delta w_2, \Delta w_3]$. These weights adjust the relative contribution of each branch’s features, enabling the network to adaptively select the optimal receptive field based on feature content and thereby suppress redundant information, as shown in Equation (8):
$$\Delta w = \mathrm{Softmax}\left(\mathrm{Conv}_2\left(\mathrm{AAP}\left(\mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}_1\left(\mathrm{Concat}(F_1, F_2, F_3)\right)\right)\right)\right)\right)\right) \quad (8)$$
After weight modulation, the features of the three branches are passed into the Conv-MHSA module to simultaneously model global dependencies and local edge information. The structure of the Conv-MHSA module is illustrated in Figure 9. Conv-MHSA first applies $1 \times 1$ and $3 \times 3$ convolutions to each branch to generate queries $Q_n$, keys $K_n$, and values $V_n$, and computes spatial correlations through multi-head attention. The result is then projected through a $1 \times 1$ convolution to obtain the output $Z_n$. Finally, $Z_n$ is added to the residual of the input $X_{l-1}$ to enhance gradient flow and feature reuse, as shown in Equation (9):
$$Z_n = \mathrm{Conv}_{1 \times 1}\left(\mathrm{Softmax}\left(\frac{Q_n K_n^{\top}}{\sqrt{d}}\right) V_n\right) + X_{l-1} \quad (9)$$
where $Q_n = \mathrm{Conv}_{3 \times 3}(\mathrm{Conv}_{1 \times 1}(\Delta w_n \cdot F_n))$, $K_n = \mathrm{Conv}_{3 \times 3}(\mathrm{Conv}_{1 \times 1}(\Delta w_n \cdot F_n))$, and $V_n = \mathrm{Conv}_{3 \times 3}(\mathrm{Conv}_{1 \times 1}(\Delta w_n \cdot F_n))$, with $d$ representing the single-head feature dimension.
After activation by $\mathrm{RPReLU}$, the outputs of the three branches are summed and fused to form the main feature $X_{l-1}'$, as shown in Equation (10):
$$X_{l-1}' = \mathrm{RPReLU}(Z_1) + \mathrm{RPReLU}(Z_2) + \mathrm{RPReLU}(Z_3) \quad (10)$$
The result is then projected through a $1 \times 1$ convolution and $\mathrm{BN}$ to obtain the output of the MACM module, $X_l$, as shown in Equation (11):
$$X_l = \mathrm{BN}(\mathrm{Conv}_{1 \times 1}(X_{l-1}')) \quad (11)$$
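The interplay of Equations (7) and (8) can be illustrated with a short PyTorch sketch of the MSGDC branches and the dynamic modulation weights. The binarization $\mathrm{Ba}(\cdot)$ of Equation (6) and the Conv-MHSA stage of Equation (9) are omitted for brevity, and the group count and the use of ordinary (non-binary) convolutions are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSGDCModulation(nn.Module):
    """Sketch of the MSGDC branches (Eq. 7) and dynamic modulation (Eq. 8).
    `channels` must be divisible by `groups`."""
    def __init__(self, channels, groups=4, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                          groups=groups, bias=False),  # grouped dilated conv, d in {1,3,5}
                nn.BatchNorm2d(channels),
            )
            for d in dilations
        ])
        n = len(dilations)
        self.conv1 = nn.Conv2d(channels * n, channels, 1, bias=False)  # Conv_1 in Eq. (8)
        self.bn = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, n, 1)                         # Conv_2: three scores

    def forward(self, x):
        feats = [b(x) for b in self.branches]                 # F1, F2, F3
        desc = F.relu(self.bn(self.conv1(torch.cat(feats, dim=1))))
        desc = F.adaptive_avg_pool2d(desc, 1)                 # AAP: channel descriptor
        w = torch.softmax(self.conv2(desc), dim=1)            # Δw over the three branches
        # Modulated branch features Δw_n * F_n, which would feed Conv-MHSA (Eq. 9).
        return [w[:, n:n + 1] * f for n, f in enumerate(feats)]
```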

2.3. Evaluation Metrics

To quantitatively evaluate the proposed method against the comparative approaches, mean Intersection over Union (mIoU), Precision, and Recall are employed to assess segmentation performance. Additionally, the number of parameters and giga floating-point operations (GFLOPs) are key metrics for evaluating the model’s lightweight nature, indicating the degree of model compactness. mIoU is a commonly used evaluation metric in semantic segmentation and object detection, measuring the overlap between the model’s predictions and the ground-truth annotations. The calculation is given in Equations (12) and (13):
$$\mathrm{IoU}_i = \frac{TP_i}{TP_i + FP_i + FN_i} \quad (12)$$
$$\mathrm{mIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i} \quad (13)$$
where $N$ is the number of categories, $TP_i$ denotes true positives (pixels correctly predicted as positive), $FP_i$ denotes false positives (pixels incorrectly predicted as positive), and $FN_i$ denotes false negatives (positive pixels incorrectly predicted as negative). Precision is the proportion of actual positive samples among those identified as positive by the model; high precision implies that the model can accurately identify the desired features, such as leaf spots. It is calculated as in Equation (14):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (14)$$
Recall is the ratio of correctly identified positive samples to the total number of positive samples. This metric is crucial for capturing all potential occurrences of a disease, ensuring comprehensive monitoring of features such as leaf disease spots. It is calculated as in Equation (15):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (15)$$
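For reference, the following is a small NumPy sketch computing the per-class IoU, mIoU, Precision, and Recall of Equations (12)–(15) from predicted and ground-truth label maps; the three-class setting (background, leaf, lesion) follows the annotation scheme in Section 2.1.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=3):
    """Per-class IoU plus mean IoU/Precision/Recall (Eqs. 12-15).
    pred, gt: integer arrays of shape (H, W) with values in [0, num_classes)."""
    ious, precisions, recalls = [], [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives for class c
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        ious.append(tp / (tp + fp + fn + 1e-10))
        precisions.append(tp / (tp + fp + 1e-10))
        recalls.append(tp / (tp + fn + 1e-10))
    return {"IoU": ious, "mIoU": float(np.mean(ious)),
            "Precision": float(np.mean(precisions)),
            "Recall": float(np.mean(recalls))}
```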

2.4. Experimental Setup

The computational hardware used in this experiment includes an Intel(R) Core(TM) i7-14700KF processor (Intel Corporation, Santa Clara, CA, USA) running at 3.40 GHz, 64 GB of system memory, and an NVIDIA GeForce RTX 4090D GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 24 GB of memory. The software environment is configured with Python 3.8 and PyTorch 1.9.0, and the CUDA version utilized is 11.0. The specific training parameters for the model are detailed in Table 1.

3. Results and Discussion

3.1. Analysis of Semantic Segmentation Performance of the LD-SAGE Network

As shown in Figure 10, the training and validation performance of the LD-SAGE network is presented. The green line with solid circles represents the mIoU on the training set. The blue line with crosses and the red line with triangles represent the loss functions on the training and validation sets, respectively.
As shown in Figure 10, the mIoU on the validation set steadily increased, while both training and validation losses decreased. Early on, the loss was high, and segmentation accuracy was low. However, as training progressed, the model’s feature extraction and target region recognition improved, optimizing segmentation performance. Eventually, both the loss and performance curves stabilized, indicating good convergence and generalization. This stable trend indicates that the LD-SAGE network can effectively learn disease spot features without overfitting. The high consistency between the training and validation curves suggests good generalization ability of the model on unseen samples. Furthermore, the balanced dataset and Gaussian-smoothed attention-based modules help the model maintain stable performance under varying lighting and background conditions. Overall, the LD-SAGE network demonstrates strong learning ability and reliable prediction performance in semantic segmentation tasks.

3.2. Performance Analysis Based on Different Backbone Networks

In the semantic segmentation of ginseng leaf disease, the model’s ability to extract fine-grained features directly affects the final segmentation accuracy, especially in complex scenarios where disease spots are small, edges are blurry, and colors resemble the background. The choice of backbone network is therefore critical to overall performance, while lightweight characteristics must also be considered to meet the practical deployment needs of disease detection. To validate the effectiveness of the selected backbone, this paper conducts substitution experiments comparing different networks in terms of segmentation accuracy and computational cost. Based on the DeepLabv3+ structure, five representative networks (Xception, StarNet, MobileNetV2 [25], MobileNetV4 [26], and VGG [27]) are selected as candidate backbones, and five experimental groups are constructed to compare their performance in ginseng leaf segmentation. Comparison metrics include mIoU, Precision, Recall, Parameters, and GFLOPs. The experimental results are shown in Table 2.
Table 2 shows that the model with Xception as the backbone delivers the best segmentation performance, achieving an mIoU of 90.98%, Recall of 94.16%, and Precision of 94.76%. However, with 54.709 M parameters and a computational cost of 166.8 GFLOPs, its complexity makes it challenging to deploy in practical agricultural scenarios and limits its potential for lightweight applications. In comparison, the model with StarNet as the backbone maintains high segmentation accuracy while being significantly lighter. This is primarily attributed to the star-shaped residual structure of StarNet, which captures multi-scale disease spot features with fewer parameters, reducing the computational burden while maintaining sensitivity to detail and enabling efficient deployment on edge devices in the field. It achieves an mIoU of 90.66%, just 0.32 percentage points lower than Xception, with Precision of 94.51% and Recall of 93.60%. More importantly, its model size is only 3.585 M parameters and it requires 57.639 GFLOPs, a roughly 65% reduction. This shows that StarNet excels in feature extraction, has a compact structure, and consumes minimal computational resources, making it an ideal backbone for both accuracy and deployment efficiency. Among the MobileNet series, the MobileNetV2-based model achieves an mIoU of 89.60% with 5.814 M parameters and 52.875 GFLOPs; while lighter, its segmentation accuracy is slightly lower than StarNet’s. The MobileNetV4 backbone has 5.070 M parameters and only 45.699 GFLOPs, the lowest among the compared models, but its mIoU drops to 88.47%, making it harder to balance accuracy and efficiency. The VGG-based model, despite a strong Recall of 95.41% and Precision of 94.56%, has a large model size of 20.144 M parameters and a heavy computational load of 332.318 GFLOPs, making it unsuitable for deployment on resource-constrained devices compared with StarNet.
The comparison shows that StarNet strikes the best balance between performance and efficiency in understory ginseng disease segmentation. Its high accuracy, low parameter count, and computational cost ensure precise segmentation while remaining practical for deployment. Thus, StarNet is a feasible choice as the backbone network, demonstrating the advantages of the proposed model’s backbone.

3.3. Performance Comparison of Different Semantic Segmentation Models

To validate the effectiveness of the proposed model in understory ginseng leaf spot segmentation, five representative semantic segmentation models were selected: U-Net [28], PSPNet [29], SegFormer [30], DeepLabv3+, and the enhanced LD-SAGE. A comprehensive comparison was conducted across segmentation accuracy, model parameters, and GFLOPs, as shown in Table 3.
As shown in Table 3, U-Net achieves an mIoU of 91.66%, with Precision and Recall of 95.33% and 95.07%, respectively, demonstrating strong performance. However, its 24.891 M parameters and 451.706 GFLOPs entail substantial computational overhead, limiting its deployment in resource-constrained agricultural environments. Although PSPNet is a representative classical model, it underperforms in disease spot segmentation, with a spot-class IoU of only 67% and an overall mIoU of 85.56%; its high parameter count (46.707 M) and GFLOPs (118.428) make it inefficient. SegFormer, a Transformer-based model, excels in lightweight design, with just 3.715 M parameters and 13.546 GFLOPs, but it shows the lowest segmentation accuracy for disease spots, with an IoU of 66% and an overall mIoU of 84.32%, indicating room for improvement in small-target recognition. DeepLabv3+ strikes a more balanced performance with an mIoU of 90.98% and a disease-class IoU of 77%, though its high parameter count (54.709 M) and GFLOPs (166.849) also pose deployment challenges.
In contrast, the proposed LD-SAGE achieves an IoU of 81% for the disease class, with Precision and Recall reaching 96.34% and 95.21%, respectively, and an overall mIoU of 92.48%. This improvement stems primarily from the integration of the GECF and MACM modules, which strengthen edge refinement and contextual feature aggregation. Compared with the other models, LD-SAGE reduces the number of parameters while maintaining or even improving accuracy, validating the effectiveness of the lightweight design. Its stable performance under varying lighting and background conditions further demonstrates robustness in real agricultural scenarios. While maintaining high accuracy, its parameter count is reduced to 2.524 M and its GFLOPs to 36.857, cutting computational overhead by approximately 78% relative to the original DeepLabv3+. This demonstrates that LD-SAGE effectively enhances multi-scale feature fusion and disease spot perception, significantly reducing model complexity while improving key-class recognition accuracy, and offers an optimal balance of precision, efficiency, and deployment feasibility for practical disease segmentation tasks.

3.4. Ablation Study

The ablation study evaluates the impact of three components on model performance and complexity: replacement of the backbone network, introduction of the GECF module, and integration of the MACM module. The results validate the effectiveness and rationale of the proposed improvements; Table 4 presents the ablation results. To minimize the impact of randomness and ensure statistical significance and reproducibility, we kept the datasets, training parameters, and base network configurations consistent across all experiments to ensure fair model comparisons. Each result is the average of three independent repetitions, reported as mean ± standard deviation for the key evaluation metrics (mIoU, Recall, and Precision), to comprehensively reflect the stability and reliability of the model’s performance.
As shown in Table 4, replacing the original DeepLabv3+ backbone with the lightweight StarNet reduces the model’s parameters from 54.709 M to 3.585 M and GFLOPs from 166.849 to 57.639, reductions of approximately 93% and 65%, respectively. Despite this significant reduction in computational cost, the mIoU remains at 90.66 ± 0.3% and Precision reaches 94.51 ± 0.1%, with virtually no loss in performance, demonstrating that StarNet is an ideal backbone balancing accuracy and efficiency. Introducing the GECF module further enhances edge perception, improving mIoU to 91.57 ± 0.1%, Precision to 95.98 ± 0.2%, and Recall to 94.22 ± 0.3%; parameters increase slightly to 3.649 M and GFLOPs remain at 57.762, indicating that GECF effectively improves the segmentation of disease spot edges without significantly increasing complexity. Finally, integrating the MACM module yields the complete LD-SAGE model, with an mIoU of 92.48 ± 0.3% and Recall and Precision of 95.21 ± 0.1% and 96.34 ± 0.1%, respectively, the best values among all configurations. The parameter count decreases further to 2.524 M and GFLOPs drop to 36.857, reducing computational overhead by approximately 31% compared with the GECF-only configuration. This highlights that the MACM module not only enhances multi-scale disease spot feature capture but also significantly lowers computational cost, making it a crucial component for practical deployment. The ablation results show that each module contributes independently to overall performance: StarNet improves computational efficiency, GECF enhances boundary detection, and MACM strengthens multi-scale contextual reasoning; removing any module significantly decreases mIoU, highlighting their complementary roles in the network structure. The standard deviations across the three independent runs remain below 0.3% for all metrics, demonstrating the robustness and reproducibility of the results.
The ablation study results highlight that backbone network replacement, edge attention enhancement, and feature fusion optimization progressively improve the model’s overall performance in accuracy, efficiency, and practicality. The final LD-SAGE model demonstrates a significant advantage in the understory ginseng disease segmentation task.

3.5. Visualization and Analysis of Segmentation Results

To visually compare the performance of different semantic segmentation models in real-world applications, this study selects typical understory ginseng leaf disease images and analyzes the segmentation results of five models: U-Net, PSPNet, SegFormer, DeepLabv3+, and the proposed LD-SAGE. The image samples cover various challenging scenarios, including minute disease spot areas, blurred edges, lesions overlapping leaf margins, and strong background interference, to thoroughly evaluate the models’ ability to represent disease structures and their segmentation accuracy. The visualization results are shown in Figure 11.
Figure 11 visualizes the segmentation results, highlighting significant differences between models. DeepLabv3+ shows stable performance but misclassifies background shadows as lesions in Image1 and Image4 and misses small lesions in Image5. U-Net accurately segments the leaf area but misidentifies large background shadow areas as lesions in Image1 and Image2. PSPNet, while capable of multi-scale feature extraction, lacks sufficient detail recovery at the decoder stage, causing blurry edges and loss of details at the leaf and lesion boundaries, leading to incomplete segmentation. SegFormer struggles with rough boundary delineation, failing to accurately outline leaf contours in Image1, and misclassifying large areas of background as lesions or leaves in Image3 and Image4. In contrast, LD-SAGE demonstrates superior segmentation performance across all samples. The model accurately captures small lesion details and maintains high consistency in images with overlapping leaf edges, blurred textures, or significant lighting changes. For instance, in Image4, it clearly distinguishes between background, leaf, and lesion areas, providing a complete lesion boundary, while other models fail to do so. In Image3 and Image5, the model effectively suppresses background interference and captures fine lesion details.
Overall, the visual comparison results show that the combination of the GECF and MACM modules significantly enhances the model’s ability to identify disease spot edges and maintain structural integrity under complex lighting and background conditions. The GECF module improves edge perception and clarity, while the MACM mechanism strengthens the fusion of multi-scale contextual features. As a result, LD-SAGE performs more consistently in segmenting small and blurred disease spots, aligning with previous results. The consistency between the visual and numerical results confirms the model’s interpretability, robustness, and its potential application in real ginseng disease monitoring.
To more clearly present the effective information extracted by the LD-SAGE model, we compared the proposed LD-SAGE network with the original DeepLabv3+ and U-Net networks, generating the visual comparisons shown in Figure 12. Heatmaps were used because they clearly visualize the model’s attention to different regions, especially in distinguishing leaf from disease areas; the color gradient (from blue to red) effectively showcases the differences in how each model handles these regions.
As shown in Figure 12, the LD-SAGE model generates a stable and continuous response region along the leaf edges, closely matching the true contour without any breaks or false extensions. This demonstrates the model’s strong ability to differentiate the leaf from the background, maintaining consistent contour detection even in complex lighting and texture conditions. Additionally, there are minimal false activations in the background, indicating that LD-SAGE’s feature extraction and attention mechanisms effectively suppress irrelevant information, resulting in a more focused and clean response.
In disease detection, LD-SAGE not only responds strongly to large, high-contrast lesions but also detects small, subtle spots that are harder to distinguish, demonstrating its effectiveness in multi-scale feature fusion. Whether for large central lesions or smaller edge spots, the hotspot distribution aligns closely with the actual disease locations. In contrast, U-Net is less stable, with frequent breaks in the leaf edge, incomplete contours, and poor sensitivity to small lesions, leading to missed detections. U-Net’s heatmap often shows irregular background activations, indicating weak background suppression.
DeepLabv3+ exhibits a different issue: its response range is too broad, blurring the leaf boundary and spilling into the background. Although this increases recall, it also produces many false positives, lowering accuracy. Compared with these methods, LD-SAGE excels in detail perception, edge transitions, and restoring target structures, demonstrating its potential for diagnosing leaf diseases in shaded environments.

3.6. Model Deployment Performance Evaluation

To evaluate the feasibility and deployment efficiency of the LD-SAGE model on edge devices, this study deployed the LD-SAGE algorithm on the NVIDIA Jetson Orin Nano device (NVIDIA Corporation, Santa Clara, CA, USA), as shown in Figure 13a,b. The device is configured with Python 3.8 and PyTorch 1.8, and its hardware consists of an Arm Cortex-A78AE CPU (Arm Limited, Cambridge, UK) and an NVIDIA GPU with 32 Tensor Cores. These configurations represent the actual experimental environment used for model deployment and real-time testing. The experiment assessed the real-time segmentation speed of the LD-SAGE model on the edge device via the frame rate (FPS) to determine its suitability for real-world understory ginseng disease segmentation. As shown in Figure 13c, the LD-SAGE model deployed on the Jetson Orin Nano maintains a stable frame rate between 12 and 15 FPS, demonstrating good real-time performance.
This frame rate range is not a hardware requirement but reflects the measured computational capability of the Jetson Orin Nano. The LD-SAGE model has good hardware flexibility and can be deployed on other edge devices or GPUs, with the frame rate varying according to the device’s computational power: higher FPS can be achieved on more powerful GPUs, while on lower-power devices the frame rate may be slightly lower yet still maintain stable real-time inference. Although the model can run on various embedded development boards, this study selected the Jetson Orin Nano for testing after weighing performance, energy efficiency, and device availability.
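As an illustration of how such a frame rate can be measured, the following is a minimal PyTorch timing sketch; the warmup and iteration counts are assumptions, not the authors’ exact protocol.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, device="cuda", shape=(1, 3, 512, 512), warmup=20, iters=100):
    """Average inference FPS over `iters` forward passes on a dummy input."""
    model = model.eval().to(device)
    x = torch.randn(*shape, device=device)
    sync = torch.cuda.synchronize if device.startswith("cuda") else (lambda: None)
    for _ in range(warmup):   # warm up kernels and caches before timing
        model(x)
    sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    sync()                    # ensure all queued GPU work has finished
    return iters / (time.perf_counter() - t0)
```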
Figure 13d–f show the segmentation results of the DeepLabv3+, U-Net, and SegFormer models, respectively, while Figure 13g shows the segmentation result of the LD-SAGE model. The DeepLabv3+ model has poor segmentation accuracy, misclassifying background as leaves, and its disease segmentation is inadequate. The U-Net model performs poorly in leaf disease segmentation and runs at only 2.46 FPS, indicating low real-time efficiency on edge devices. The SegFormer model achieves a higher FPS of 8.14 than the other comparison models, but its segmentation remains suboptimal, especially at leaf edges and in diseased areas, leading to mis-segmentation of parts of the background. In contrast, the LD-SAGE model proposed in this study delivers the best performance in both speed and segmentation accuracy on edge devices, highlighting its ability to operate stably on resource-constrained hardware platforms.
Furthermore, the measurement results not only validate the real-time performance of LD-SAGE but also highlight the efficiency of its lightweight design for practical deployment. The stable frame rate of 12–15 FPS achieved on the Jetson Orin Nano demonstrates that the model’s architecture can reduce computational overhead without compromising segmentation accuracy. This balance between speed and accuracy indicates that LD-SAGE can operate reliably on low-power hardware, which is crucial for continuous monitoring in real-world agricultural scenarios.
These results further demonstrate that the LD-SAGE model achieves an optimal balance between computational efficiency and segmentation accuracy, making it ideal for real-time agricultural applications. Its successful deployment on the Jetson Orin Nano suggests that the model can be integrated into portable plant disease monitoring systems for field-based automation and management of ginseng diseases. Compared to traditional large networks, LD-SAGE significantly reduces inference latency and energy consumption while maintaining high accuracy, confirming its potential for large-scale, low-cost applications in smart agriculture. Additionally, the alignment between the measured inference performance and the model’s theoretical design objectives further validates the robustness and scalability of the proposed approach.

4. Conclusions

This paper addresses the challenges in the intelligent recognition of understory ginseng disease, including minute disease spots, blurry boundaries, and complex backgrounds, by proposing a lightweight and efficient semantic segmentation model, LD-SAGE. The model integrates the StarNet structure into the backbone for efficient feature extraction at reduced computational cost; incorporates GECF to enhance the recognition of fuzzy boundaries and minute disease spots; and replaces ASPP with the MACM module, enabling precise multi-scale contextual fusion and structural detail recovery. Experimental results demonstrate that the LD-SAGE model achieves excellent performance in segmenting understory ginseng diseases. First, comparisons across different backbone networks show StarNet’s advantages in feature extraction efficiency and lightweight design. Second, compared against other mainstream segmentation models, LD-SAGE achieved an mIoU of 92.48%, a recall of 95.21%, and a precision of 96.34%, outperforming existing methods across key metrics while significantly reducing the number of parameters and GFLOPs, demonstrating strong lightweight characteristics. Finally, in the analysis of segmentation results, comparisons on representative understory ginseng leaf disease images further confirm the superior performance of LD-SAGE in real-world scenarios. Compared with models such as DeepLabv3+, U-Net, PSPNet, and SegFormer, LD-SAGE more accurately restores disease contours, fine-grained structures, and boundary details; particularly in cases of overlapping lesions, dense small targets, and drastic lighting changes, its segmentation results are more stable, with significantly fewer false positives and missed detections. Taken together, the experimental results and segmentation analyses show that LD-SAGE achieves lightweight deployment while maintaining high accuracy, fully demonstrating its overall superiority in understory ginseng disease segmentation tasks.
In future research, the dataset will be further enriched by incorporating data on different growth stages and various medicinal herb diseases, enhancing the model’s generalization ability and robustness. Additionally, an end-to-end framework integrating disease segmentation and disease identification will be explored, combining segmentation and classification tasks to realize an intelligent process from detection to diagnosis. This will provide more reliable technical support for the precise prevention and control of medicinal herb diseases, particularly those of Jilin’s distinctive herbs, while also supporting the precise disease control of other medicinal herbs.

Author Contributions

Conceptualization, Y.X. and Z.Y.; methodology, Y.X., Y.Z. and Z.Y.; software, C.L. and Z.L.; validation, C.L., Z.L. and C.Z.; formal analysis, Z.Y., D.W. and C.L.; investigation, Z.L. and C.Z.; resources, Y.X. and Y.Z.; data curation, Y.Z.; writing—original draft preparation, Z.Y.; writing—review and editing, Y.X., Y.Z. and Z.Y.; visualization, D.W. and C.L.; supervision, Y.X. and Y.Z.; project administration, Y.X.; funding acquisition, Y.X. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Jilin Provincial Department of Science and Technology-Key Research and Development, grant number 20230202035NC.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Examples from the understory ginseng disease image dataset. (a) Healthy leaf; (b) Leaf disease overlapping with leaf edge; (c) Leaf under low light conditions; (d) Leaf in a cluttered and complex background; (e) Rainwater interference; (f) Blurred leaf boundaries.
Figure 2. Examples of data augmentation effects. (a) Original data sample; (b) Random flipping; (c) Brightness adjustment; (d) Random occlusion; (e) Gaussian noise.
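As a hedged illustration of the augmentations shown in Figure 2, a torchvision pipeline along the following lines could reproduce them; the probabilities and magnitudes below are assumptions, not the values used to build the dataset.

```python
# Hypothetical augmentation pipeline matching Figure 2 (parameters are illustrative).
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),              # random flipping
    transforms.ColorJitter(brightness=0.3),              # brightness adjustment
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.1)),  # random occlusion
    transforms.Lambda(                                   # additive Gaussian noise
        lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0.0, 1.0)),
])
```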
Figure 3. Images from the understory ginseng leaf disease dataset with their corresponding annotation labels. Panels (a–f) display different understory ginseng leaf disease samples and their ground-truth labels. In each column, the first row shows data samples of understory ginseng leaves and their diseases under different environments, while the second row presents the corresponding labels. Here, red represents the leaf, green represents leaf disease, and black represents the background.
Figure 4. Schematic diagram of the overall structure of the LD-SAGE model. Arrows indicate the direction of data/feature flow. Color-coded modules are defined as follows: light orange blocks represent 1 × 1 convolution (1 × 1 Conv), yellow blocks represent 3 × 3 convolution (3 × 3 Conv), and the “C” symbol denotes feature concatenation (Concat). The Encoder (purple dashed box) extracts hierarchical features via StarBlocks and enhancement modules (EGA++, MACM), while the Decoder (red dashed box) performs multi-scale feature fusion and upsampling to generate the final segmentation result.
Figure 5. Schematic of the StarNet network architecture.
Figure 6. Schematic diagram of the GECF module structure.
Figure 7. Schematic diagram of the MACM module structure.
Figure 8. Schematic diagram of the MSGDC module structure.
Figure 9. Schematic diagram of the Conv-MHSA module structure. Arrows indicate the direction of data/feature flow, and the “×” symbol represents matrix multiplication. Color-coded components are defined as follows: light orange blocks denote 1 × 1 convolution layers, orange blocks denote 3 × 3 convolution layers, light blue blocks denote Layer Normalization (Layer Norm) layers, purple blocks denote Softmax layers, and light green blocks denote input feature maps.
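To illustrate the mechanism Figure 9 depicts, here is a minimal, hypothetical Conv-MHSA sketch in the spirit of CvT-style convolutional projections; the depthwise 3 × 3 Q/K/V projections, the GroupNorm stand-in for Layer Norm, and the head count are assumptions rather than the paper's exact design.

```python
# Hypothetical Conv-MHSA sketch (CvT-style convolutional projections; illustrative only).
import torch
import torch.nn as nn

class ConvMHSASketch(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.norm = nn.GroupNorm(1, dim)  # channel-wise stand-in for Layer Norm
        # 3 x 3 depthwise convs project Q/K/V while keeping local context.
        self.q = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.k = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.v = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)  # 1 x 1 output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = self.norm(x)
        # Flatten each projection to (batch, heads, tokens, head_dim).
        def to_tokens(t):
            return t.reshape(b, self.heads, c // self.heads, h * w).transpose(-1, -2)
        q, k, v = to_tokens(self.q(n)), to_tokens(self.k(n)), to_tokens(self.v(n))
        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)
        out = (attn @ v).transpose(-1, -2).reshape(b, c, h, w)
        return x + self.proj(out)  # residual connection

x = torch.randn(1, 64, 32, 32)
print(ConvMHSASketch(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```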
Figure 10. Training and validation performance of the LD-SAGE network.
Figure 11. Visualization comparison of segmentation results from different semantic segmentation models. Here, red represents the ginseng leaf, and green represents the leaf disease.
Figure 12. Heatmap comparison of different networks on leaf and disease regions. The left shows two original leaf images (Image 1 and Image 2). Rows (a,c) illustrate each network’s attention on leaf regions, while rows (b,d) highlight differences in attention to disease areas. The three result sets on the right correspond to LD-SAGE, U-Net, and DeepLabV3+ predictions. Each set includes two images: the left column with blue background shows model-generated heatmaps (blue-to-red indicating low-to-high disease probability), and the right column with green background overlays the heatmaps on the original leaf images.
Figure 13. Experimental results obtained from the actual deployment of different models. (a) Jetson Orin Nano edge device; (b) Physical image of the overall edge device experimental setup; (c) Original images and segmentation results on embedded devices; (d) Segmentation results of the DeepLabv3+ model on embedded devices; (e) Segmentation results of the U-Net model on embedded devices; (f) Segmentation results of the SegFormer model on embedded devices; (g) Segmentation results of the LD-SAGE model on embedded devices. The red circle in (b) indicates the NVIDIA Jetson Orin Nano device shown in (a), and the green box indicates the monitor. In (c–g), green represents ginseng leaves, red represents leaf diseases, and black represents the background.
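For context, on-device frame rates such as those measured on the Jetson Orin Nano are commonly estimated with a warm-up-then-time loop like the sketch below; the warm-up count, run count, and input size are illustrative assumptions, not the paper's benchmarking protocol.

```python
# Hypothetical FPS benchmark sketch for an edge device (model/input are placeholders).
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, input_size=(1, 3, 512, 512),
                warmup: int = 20, runs: int = 100) -> float:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):           # warm-up stabilizes clocks and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()      # ensure queued kernels have finished
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return runs / (time.perf_counter() - start)
```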
Table 1. Model training parameters.

Parameter            Value
Epochs               300
Batch size           8
Image size           512 × 512
Optimizer algorithm  Adam
Learning rate        0.01
Weight decay         0.0001
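A minimal PyTorch sketch of how the Table 1 settings would typically be wired up is shown below; the one-layer stand-in model and the mapping of “weight decay” to Adam's weight_decay argument are assumptions, since the training code itself is not reproduced here.

```python
# Sketch: wiring Table 1's settings into a PyTorch training setup.
import torch

EPOCHS, BATCH_SIZE, IMAGE_SIZE = 300, 8, (512, 512)  # values from Table 1

model = torch.nn.Conv2d(3, 3, 1)  # stand-in for the real segmentation network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.01,            # learning rate (Table 1)
    weight_decay=1e-4,  # weight decay (Table 1)
)
```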
Table 2. Performance evaluation results of different backbone networks in the understory ginseng disease segmentation task.

EX  Baseline Model  Backbone Network  mIoU (%)  Recall (%)  Precision (%)  Parameters (M)  GFLOPs (G)
1   DeepLabV3+      Xception          90.98     94.16       94.76          54.709          166.849
2   DeepLabV3+      StarNet           90.66     93.60       94.51           3.585           57.639
3   DeepLabV3+      MobileNetV2       89.60     93.85       94.32           5.814           52.875
4   DeepLabV3+      MobileNetV4       88.47     93.66       95.35           5.070           45.699
5   DeepLabV3+      VGG               90.07     95.41       94.56          20.144          332.318
Table 3. Performance comparison of semantic segmentation models in understory ginseng leaf lesion segmentation.

Model       Background IoU (%)  Leaf IoU (%)  Disease IoU (%)  mIoU (%)  Recall (%)  Precision (%)  Parameters (M)  GFLOPs (G)
U-Net       98                  97            80               91.66     95.07       95.33          24.891          451.706
PSPNet      98                  95             6               75.56     90.84       93.61          46.707          118.428
SegFormer   95                  92            66               84.32     90.79       92.23           3.715           13.546
DeepLabv3+  97                  99            77               90.98     94.16       94.76          54.709          166.849
LD-SAGE     98                  98            81               92.48     95.21       96.34           2.524           36.857
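Assuming the standard per-class definition of IoU (not restated in this excerpt), the reported mIoU values are consistent with the mean of the per-class IoUs:

```latex
\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c
```

For example, averaging U-Net's per-class IoUs gives (98 + 97 + 80)/3 ≈ 91.67%, matching its reported mIoU of 91.66% up to per-class rounding.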
Table 4. Comparison of ablation experiment results. A check mark (✓) indicates that the corresponding module is enabled; a blank cell means it is not.

Model       StarNet  GECF  MACM  mIoU (%)     Recall (%)   Precision (%)  Parameters (M)  GFLOPs (G)
DeepLabv3+                       90.98 ± 0.2  94.16 ± 0.1  94.76 ± 0.2    54.709          166.849
            ✓                    90.66 ± 0.3  93.60 ± 0.2  94.51 ± 0.1     3.585           57.639
            ✓        ✓           91.57 ± 0.1  94.22 ± 0.3  95.98 ± 0.2     3.649           57.762
LD-SAGE     ✓        ✓     ✓     92.48 ± 0.3  95.21 ± 0.1  96.34 ± 0.1     2.524           36.857
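Parameter and GFLOPs figures like those in Tables 2–4 are commonly obtained with a profiler such as thop; the sketch below assumes a 512 × 512 input as in Table 1, uses a stand-in model, and the MACs-as-GFLOPs convention is an assumption rather than a detail confirmed by the paper.

```python
# Sketch: reproducing parameter/GFLOPs-style counts with the thop profiler.
import torch
from thop import profile

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real network
x = torch.randn(1, 3, 512, 512)              # input resolution from Table 1
macs, params = profile(model, inputs=(x,))
print(f"Params: {params / 1e6:.3f} M")
print(f"MACs:   {macs / 1e9:.3f} G")  # some papers report this figure as GFLOPs
```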