Ginseng-YOLO: Integrating Local Attention, Efficient Downsampling, and Slide Loss for Robust Ginseng Grading

Yu, Yue; Li, Dongming; Song, Shaozhong; You, Haohai; Zhang, Lijuan; Li, Jian

doi:10.3390/horticulturae11091010


by Yue Yu ^1,2, Dongming Li ^1,3, Shaozhong Song ^4,5, Haohai You ¹, Lijuan Zhang ³ and Jian Li ^1,2,*

¹ College of Information Technology, Jilin Agricultural University, Changchun 130118, China
² Jilin Provincial Bioinformatics Research Center, Changchun 130118, China
³ College of Internet of Things Engineering, Wuxi University, Wuxi 214063, China
⁴ School of Data Science and Artificial Intelligence, Jilin Engineering Normal University, Changchun 130062, China
⁵ School of Artificial Intelligence, Changchun University of Science and Technology, Changchun 130022, China
^* Author to whom correspondence should be addressed.

Horticulturae 2025, 11(9), 1010; https://doi.org/10.3390/horticulturae11091010

Submission received: 28 July 2025 / Revised: 21 August 2025 / Accepted: 24 August 2025 / Published: 25 August 2025

(This article belongs to the Section Medicinals, Herbs, and Specialty Crops)


Abstract

Understory-cultivated Panax ginseng possesses high pharmacological and economic value; however, its visual quality grading predominantly relies on subjective manual assessment, constraining industrial scalability. To address challenges including fine-grained morphological variations, boundary ambiguity, and complex natural backgrounds, this study proposes Ginseng-YOLO, a lightweight and deployment-friendly object detection model for automated ginseng grade classification. The model is built on the YOLOv11n (You Only Look Once11n) framework and integrates three complementary components: (1) C2-LWA, a cross-stage local window attention module that enhances discrimination of key visual features, such as primary root contours and fibrous textures; (2) ADown, a non-parametric downsampling mechanism that substitutes convolution operations with parallel pooling, markedly reducing computational complexity; and (3) Slide Loss, a piecewise IoU-weighted loss function designed to emphasize learning from samples with ambiguous or irregular boundaries. Experimental results on a curated multi-grade ginseng dataset indicate that Ginseng-YOLO achieves a Precision of 84.9%, a Recall of 83.9%, and an mAP@50 of 88.7%, outperforming YOLOv11n and other state-of-the-art variants. The model maintains a compact footprint, with 2.0 M parameters, 5.3 GFLOPs, and 4.6 MB model size, supporting real-time deployment on edge devices. Ablation studies further confirm the synergistic contributions of the proposed modules in enhancing feature representation, architectural efficiency, and training robustness. Successful deployment on the NVIDIA Jetson Nano demonstrates practical real-time inference capability under limited computational resources. This work provides a scalable approach for intelligent grading of forest-grown ginseng and offers methodological insights for the design of lightweight models in medicinal plants and agricultural applications.

Keywords:

understory ginseng; YOLOv11; lightweight detection; local window attention; Slide Loss; Jetson Nano; quality grading

1. Introduction

Panax ginseng C.A. Meyer, a perennial herbaceous plant of the Araliaceae family, is widely recognized as the “King of Herbs” [1,2] for its long history of medicinal use and diverse pharmacological activities, including anti-fatigue, immune modulation, and cardiovascular protection [3,4,5]. It is mainly cultivated in Northeast China, the Korean Peninsula, Japan, the Russian Far East, the United States, and Canada. Among its types, understory-cultivated ginseng—grown under forest canopy conditions over extended cycles with minimal chemical input—closely resembles wild ginseng in morphology and bioactive composition, with higher levels of lipophilic compounds such as sterols, polyacetylenes, and fatty acids [6]. These features contribute to its antioxidant, anti-inflammatory, and neuroprotective effects, positioning it as a high-value and sustainable product for the modern ginseng industry [7]. As the ginseng industry chain rapidly evolves, the demand for precise traceability and quality grading from the source has significantly increased [8,9]. In particular, for understory-cultivated ginseng—where quality sensitivity is higher—traditional manual grading based on visual inspection is subjective, inefficient, and no longer sufficient for large-scale production or digital supervision [10]. Thus, realizing efficient, intelligent, and interpretable ginseng quality recognition and grading has become a crucial challenge in both research and industrial practice [11,12,13,14,15,16].

In recent years, artificial intelligence, particularly deep learning, has been increasingly applied to ginseng quality evaluation. Many studies have combined machine learning models with specialized imaging or spectroscopic equipment to capture detailed chemical or structural features. In particular, computer vision and deep learning models have shown remarkable performance in tasks such as image classification, object detection, and compound prediction. For example, Li et al. [17] proposed a hybrid multi-task deep learning network (MMTDL) integrating near-infrared spectroscopy to achieve non-destructive origin identification and ginsenoside content prediction in American ginseng. Wang et al. [18] developed a time convolution attention network (TCNA) combining hyperspectral imaging for simultaneous prediction of multiple rare ginsenosides, improving assessment efficiency. Yang et al. [19] designed the AGOTNet model, which fuses local-global information and multi-scale attention mechanisms, achieving 98.95% accuracy and 99.60% AUC in ginseng traceability tasks. Zhang et al. [20] introduced a GA-DT model incorporating genetic algorithms and preprocessing techniques to predict ginsenoside content (R² = 0.9701), enhancing both interpretability and industrial applicability. Fu et al. [21] conducted a spectroscopy-based study that classified ginseng from five cultivation years using PCA, SVM, PLS, and RF, with PLS and RF achieving the highest accuracies, demonstrating the method’s potential for fine-grained quality assessment. Ping et al. [22] proposed an effective data fusion approach combining LIBS and NIR, achieving high-accuracy traceability of Panax ginseng and providing a theoretical and technical foundation for quality control and traceability in food and agricultural products. Zhao et al. [23] likewise combined LIBS and NIR through data fusion to achieve high-accuracy traceability of Panax ginseng.

Although these approaches achieve high accuracy, their reliance on expensive and specialized hardware limits scalability and real-time deployment in general production environments. To overcome these limitations, recent research has increasingly adopted computer vision-based deep learning methods [24], which directly process RGB images for tasks such as quality grading and defect detection without requiring additional imaging hardware. These methods leverage convolutional neural networks (CNNs) and object detection frameworks to automatically extract and classify visual features, enabling lower-cost, scalable, and real-time ginseng quality assessment in practical production settings. Xie et al. [25] proposed the YOLO-Ginseng model by introducing a C3f-RN module and channel pruning strategy, achieving high-precision and low-latency detection of ginseng fruits in complex natural backgrounds. Li et al. [26,27,28,29] improved models for appearance-based quality and grade recognition by incorporating coordinate attention, group convolution, and ELU activation into architectures like DenseNet121, ResNeXt50, and ConvNeXt. A particularly notable variant, a ResNeXt50 model improved with a Ghost module and SE attention, achieved 93.14% accuracy in a four-class task, outperforming mainstream baselines such as ResNet50, iResNet, and EfficientNet_v2_s. Despite these accuracy improvements, conventional CNN-based models remain constrained by large parameter counts, high computational cost, and strong hardware dependencies, limiting their practical deployment. To address this, Zhang et al. [30] developed the DGS-YOLOv8 model by optimizing the YOLOv8 backbone and integrating the C2f-DCNv2 structure, the SimAM attention mechanism, and a Slim-Neck module. This design achieved a good balance between precision and efficiency in ginseng appearance quality detection, showing strong potential for real-world deployment.

In summary, while deep learning-based methods have made considerable progress in ginseng quality detection, challenges such as model redundancy and poor generalization remain—especially for understory-cultivated ginseng, where complex visual features and sample variability hinder robustness and accuracy [31]. There is, thus, an urgent need to construct more targeted visual recognition frameworks and extract more distinctive semantic features to achieve precise classification and efficient grading of understory-cultivated ginseng images. Although existing deep learning-based ginseng grading methods have achieved high accuracy, most models are still too large and computationally intensive for real-time deployment on lightweight embedded platforms [32]. Therefore, the objective of this study is to develop and validate a lightweight, inference-efficient deep learning framework—Ginseng-YOLO—that can accurately and robustly classify understory-cultivated ginseng while enabling real-time deployment on resource-constrained edge devices, thereby bridging the gap between algorithmic research and field-level application. To achieve this, a structurally lightweight and inference-efficient model is designed and successfully deployed on an embedded device, striking a balance between detection accuracy and computational efficiency, and promoting the practical application of ginseng quality detection technology.

The main contributions of this study are summarized as follows:

(1): Establishment of an understory ginseng dataset: A specialized dataset was constructed containing both fresh and sun-dried understory ginseng. Given the high economic value, long growth cycle, and rarity of wild ginseng, data acquisition is extremely difficult. This dataset fills a gap in existing resources and supports the development of intelligent grading algorithms for high-value medicinal crops.
(2): Development of an efficient detection model, Ginseng-YOLO: This study introduces Ginseng-YOLO, a lightweight yet accurate model tailored for fine-grained ginseng classification. The model incorporates a localized attention block (C2-LWA) for improved feature extraction, an efficient downsampling module (ADown) to reduce redundancy, and a training-phase loss design (Slide Loss) that enhances robustness to irregular target morphology. These components work together to improve both detection precision and inference efficiency.
(3): First edge deployment for understory ginseng classification: Unlike previous works that focused solely on algorithm development, this study successfully deploys the model on the NVIDIA Jetson Orin Nano for real-time inference. To the best of our knowledge, this is the first attempt to implement understory ginseng classification on a resource-constrained edge AI device, demonstrating strong potential for field-level intelligent applications.

The remainder of this paper is organized as follows: Section 2 describes the methodology, including dataset acquisition and model development. Section 3 presents experimental results, evaluating the performance of the proposed grading system and comparing it with existing methods. Section 4 and Section 5 provide conclusions and summarize the key contributions of this study, along with potential directions for future research in automated agricultural grading systems. To facilitate readers’ understanding of the entire process of our paper, Figure 1 shows the design flow of the entire experiment.

2. Materials and Methods

2.1. Dataset Collection and Annotation

To facilitate the development of an automated classification model for ginseng (Panax ginseng C.A. Meyer), a high-quality image dataset was constructed between 2023 and 2024 by the Changbai Mountain Innovation Institute of Jilin Agricultural University. In this study, the samples are understory-cultivated ginseng, a type of ginseng grown under forest canopy conditions; for simplicity, the term “ginseng” is used throughout the paper to refer to this type unless otherwise specified. The dataset comprises both fresh and sun-dried understory ginseng samples, totaling 924 roots across four quality grades: 103 roots of premium grade, 138 roots of first grade, 309 roots of second grade, and 374 roots of third grade. Corresponding multi-angle high-resolution images were captured for each root. All images were taken in a professional photographic setup using a standardized studio (Sutefoto, Guangzhou, China) and a high-resolution camera (Apple, Cupertino, CA, USA). As shown in Figure 2, each ginseng sample was placed horizontally on a studio platform, with the camera positioned vertically at a fixed height of 40 cm. For each root, multiple images were captured from different orientations and against four distinct background colors (white, blue, red, and black). These backgrounds were selected to improve contrast and enhance the visual diversity of the dataset, while maintaining consistency in acquisition conditions.

Grading and labeling of ginseng samples were performed by experienced local experts following the official standard “T/THRS—Ginseng Grading Criteria for Jilin Authentic Medicinal Materials.” As outlined in Table 1 and Table 2, all samples were categorized into four quality grades (Premium, First-Class, Second-Class, and Ordinary) based on comprehensive morphological assessment. Key visual indicators included the number and tightness of rhizome segments, rootlet distribution, body symmetry, epidermal gloss, and the presence of scars or blemishes. The annotations were confirmed by at least two experts to ensure labeling reliability. Notably, due to the scarcity of high-quality ginseng in natural environments, the distribution of samples across the four classes was inherently imbalanced. Figure 3 shows representative samples of the different grades of understory ginseng (roots younger than 15 years, which are not assigned a grade, are collectively referred to as third-class ginseng in this article).

2.2. Data Augmentation and Dataset Split

The dataset used in this study was self-constructed from natural forest environments where understory ginseng naturally grows. Based on morphological characteristics such as overall shape, completeness, and root size, each sample was categorized into one of four quality grades: Premium, First-Class, Second-Class, and Ordinary, corresponding to categories 0, 1, 2, and 3, respectively. During image acquisition, variations in camera angle, lighting conditions, background complexity, and partial occlusion were deliberately incorporated to reflect real-world field scenarios and enhance the dataset’s natural diversity and robustness. In total, the dataset includes 1201 images and 1403 annotated ginseng instances, which were randomly divided into training, validation, and test sets in a 7:2:1 ratio, with proportional representation of all four categories. The class-wise distribution is shown in Table 3.
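The 7:2:1 split with proportional class representation described above can be sketched as a stratified split. This is a minimal illustration, not the authors' script: the function name `stratified_split`, the seed, and the toy label list are hypothetical, and each image is assumed to carry a single grade label.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split sample indices into train/val/test while keeping class proportions.

    labels: list of class ids (0-3 for the four grades), one per image.
    Returns three lists of indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical labels: 100 images spread unevenly over the four grades.
labels = [0] * 10 + [1] * 20 + [2] * 30 + [3] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class first, rather than over the pooled images, is what keeps the minority grades represented in all three subsets.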

Due to the high commercial value and limited natural occurrence of understory ginseng, the dataset exhibited a degree of class imbalance, particularly in the Premium and First-Class categories. This imbalance posed a risk of biased model training and poor generalization to minority classes. To mitigate this, a moderate data augmentation strategy was adopted to expand the training data and improve class balance. The augmentation operations included horizontal flipping, slight rotations (≤10°), and brightness adjustments. These techniques increased intra-class variability while preserving the key morphological features critical for quality classification. It is important to emphasize that during the data collection phase, images were already captured from multiple viewpoints and under diverse environmental conditions, which naturally introduced variation in pose, lighting, and background. Therefore, this study deliberately avoided aggressive augmentation techniques such as large-angle rotations, geometric warping, or random cropping. Such distortions could introduce unnatural visual artifacts that are not representative of real-world ginseng morphology. They may also destroy spatial symmetry or obscure fine-grained structural cues—such as rhizome shape, rootlet distribution, and epidermal texture—which are essential for reliable quality-grade classification. Over-augmentation might lead the model to learn artificial patterns unrelated to authentic botanical features, reducing its ability to generalize in real applications. Thus, the adopted augmentation strategy was carefully constrained to maintain visual fidelity while enhancing the diversity and balance of the training dataset.

2.3. Model Selection and Enhancement

YOLOv11 (Ultralytics, 2024) [33] represents a significant architectural upgrade within the YOLO series, aiming to balance detection accuracy, inference speed, and model complexity. Compared to its predecessor YOLOv8 [34], YOLOv11 introduces substantial enhancements across the backbone, neck, and detection head. One of the core improvements lies in replacing the original C2f module with the newly proposed C3k2 module [35], which incorporates cross-stage connections and configurable convolutional blocks to strengthen feature extraction capabilities. Additionally, the model integrates the C2PSA (Cross-Stage Partial Spatial Attention) module into the backbone, embedding spatial attention mechanisms to enhance the response to critical regions and improve detection precision. For feature fusion, YOLOv11 retains the efficient SPPF (Spatial Pyramid Pooling-Fast) module to support robust multi-scale semantic aggregation. In the detection head, a decoupled structure is adopted along with two depthwise separable convolutions (DWConv), effectively reducing the number of parameters and computational cost while maintaining strong detection performance.

To enhance the detection performance and deployment efficiency of YOLOv11n in ginseng grading tasks, this study proposes an improved model named Ginseng-YOLO. Built upon the original YOLOv11n architecture, three key components are introduced: the C2-LWA feature enhancement block, the ADown downsampling module, and the Slide Loss function for training optimization. Specifically, the original C2PSA modules in the backbone are replaced with C2-LWA blocks, which incorporate localized window-based interactions to enhance the model’s ability to capture subtle visual cues; this is particularly beneficial for distinguishing root contours and fibrous density variations. The ADown module is partially embedded within both the backbone and neck, selectively replacing CBS blocks to improve downsampling efficiency while reducing model complexity and maintaining semantic richness. Meanwhile, Slide Loss is integrated as a training-phase loss design rather than as part of the detection head. By applying dynamic IoU-based weighting to low-quality samples, it guides the model to better learn from irregular or ambiguous ginseng targets, thereby improving robustness in fine-grained grading tasks. Operating at the distinct levels of feature extraction, architectural compression, and training dynamics, these modules together form the Ginseng-YOLO framework, a balanced solution for high-precision, resource-efficient ginseng appearance grading in real-world applications. Figure 4 shows a schematic diagram of the Ginseng-YOLO model structure.

2.3.1. ADown

In YOLOv11, feature map downsampling is primarily implemented using the CBS (Conv-BN-SiLU) module, which combines convolutional layers with normalization and activation functions. While CBS offers strong feature extraction capabilities, its repeated convolutions result in high computational cost, making it a bottleneck in lightweight deployment scenarios. To improve inference efficiency and reduce model complexity, we propose replacing part of the CBS-based downsampling with the ADown module. ADown performs spatial compression using a non-parametric approach by combining average pooling and max pooling in parallel, effectively reducing computation without introducing additional parameters. Moreover, this design leverages the fact that the feature maps entering ADown have already undergone deep semantic encoding via the C3k2 module, a lightweight residual structure that enhances feature representation through deep convolutional layers [36]. Given this prior encoding, pooling-based downsampling becomes more suitable at this stage, as it preserves essential semantic information while avoiding redundant computation and reducing overfitting risk. In the ablation study, this design reduces the parameter count from 2.6 M to 2.1 M and the computational load from 6.5 to 5.4 GFLOPs while maintaining an mAP50 of 88.4%, confirming the practicality and efficiency of ADown in resource-constrained environments. Figure 5 shows a schematic diagram of the ADown structure.
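The parameter-free, parallel pooling described above can be illustrated with a small pure-Python sketch. It follows only the description in this section; the channel split (first half average-pooled, second half max-pooled), the 2 × 2 stride-2 windows, and the names `pool2x2` and `adown_sketch` are assumptions made for illustration, and any convolutional layers that other ADown implementations may include are omitted.

```python
def pool2x2(channel, reduce_fn):
    """Apply a 2x2, stride-2 pooling to one channel (a list of rows)."""
    height, width = len(channel), len(channel[0])
    pooled = []
    for r in range(0, height - 1, 2):
        row = []
        for c in range(0, width - 1, 2):
            window = [channel[r][c], channel[r][c + 1],
                      channel[r + 1][c], channel[r + 1][c + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def adown_sketch(feature_map):
    """Halve spatial size: average-pool half the channels, max-pool the rest."""
    half = len(feature_map) // 2
    avg_branch = [pool2x2(ch, mean) for ch in feature_map[:half]]
    max_branch = [pool2x2(ch, max) for ch in feature_map[half:]]
    return avg_branch + max_branch  # concatenate along the channel axis


# Two identical 4x4 channels become two 2x2 channels, with no learnable weights.
x = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
]
y = adown_sketch(x)
print(y)
```

The average branch keeps a smoothed summary of each neighborhood while the max branch keeps its strongest response, so the two outputs carry complementary information at a quarter of the spatial cost.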

2.3.2. Slide Loss

In the task of understory ginseng grade recognition, the irregular morphology and blurred boundaries of ginseng samples often lead to low Intersection over Union (IoU) values between predicted and ground-truth boxes. This challenges traditional loss functions, which typically fail to emphasize such difficult samples during training, resulting in suboptimal recognition of subtle inter-class differences. To address this, we introduce the Slide Loss function to enhance model robustness and fine-grained discrimination capability [37]. As illustrated in Figure 6, Slide Loss uses the global average IoU as a dynamic threshold μ to divide the training samples into positive and negative groups. Samples with an IoU of at most μ − 0.1 are treated as hard negatives and assigned a constant weight of 1. For IoU values of μ or above, the weight decreases exponentially to downplay the influence of easy positives. Between μ − 0.1 and μ, a constant weight of e^{1 − μ} is applied, which matches the exponential branch at x = μ. The weight function is defined as:

f(x) = \begin{cases} 1, & x \leq \mu - 0.1 \\ e^{1 - \mu}, & \mu - 0.1 < x < \mu \\ e^{1 - x}, & x \geq \mu \end{cases}

(1)

This IoU-based piecewise weighting strategy allows the model to focus more on difficult samples during training, thereby enhancing its ability to distinguish fine-grained differences in ginseng grades. Compared to the conventional YOLOv11 loss function, Slide Loss demonstrates superior performance in scenarios with blurred object boundaries and low IoU samples, significantly improving grade classification accuracy. Figure 6 shows the schematic of Slide Loss.
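The weighting in Equation (1) can be written as a short function. This is a minimal sketch under stated assumptions: the name `slide_weight` is hypothetical, the threshold μ is fixed at 0.5 for illustration (in training it would be the running average IoU), and how the weight enters the overall loss is not shown.

```python
import math


def slide_weight(iou, mu):
    """Slide Loss weight from Equation (1) for one sample.

    iou: IoU between a predicted box and its ground-truth box.
    mu:  mean IoU over all samples, used as the dynamic threshold.
    """
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)


ious = [0.30, 0.45, 0.50, 0.80]  # hypothetical per-sample IoU values
mu = 0.5  # hypothetical threshold, fixed here for illustration
weights = [slide_weight(v, mu) for v in ious]
print([round(w, 3) for w in weights])
```

Samples just below and at the threshold receive the largest weight, and the weight then decays toward 1 as the IoU approaches 1, so well-localized boxes contribute progressively less.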

2.3.3. C2-LWA

The differences among various grades of wild ginseng are often reflected in fine-grained appearance features such as morphological fullness, the number of fibrous roots, and epidermal color. Although the C2PSA (Cross-Scale Partial Self-Attention) module possesses the ability to exchange information across scales, it suffers from high structural complexity and computational overhead, while exhibiting limited capability in modeling small-scale local features. This limitation hampers the model’s ability to accurately distinguish grade-related differences. Furthermore, PSA (Partial Self-Attention) introduces considerable matrix operations, which significantly increase inference costs, especially on high-resolution feature maps [38]. To address these issues, this study proposes an improved cross-stage attention fusion module—C2-LWA (Cross-stage Local Window Attention). Built upon the C2PSA framework, C2-LWA replaces the conventional attention mechanism with a Local Window Attention (LWA) strategy, thereby enhancing the model’s ability to capture local structural information. LWA partitions the feature map into fixed-size windows and performs self-attention computation within each window independently. This design effectively suppresses interference from distant redundant information and strengthens the modeling of critical regions such as primary roots, fibrous roots, and epidermal texture. Figure 7 is a schematic diagram of C2-LWA.

To facilitate information exchange across windows, C2-LWA introduces inter-window interaction strategies (e.g., sliding windows or neighboring window fusion), enabling a balance between local perception and global structure modeling. The module is highly flexible and allows for a trade-off between accuracy and efficiency by adjusting the number of attention heads and window sizes. Compared to the original PSA module, C2-LWA achieves better boundary segmentation and small object recognition performance with a more compact and lightweight structure, while also accelerating model convergence during training. In the wild ginseng grading task, incorporating Local Window Attention significantly enhances the model’s ability to focus on key detailed regions, including root contours, fibrous distributions, and surface textures. Compared to global attention mechanisms, LWA reduces computational complexity and minimizes redundant interference from non-relevant regions. Moreover, its adaptable structure allows for window size and head number adjustments to accommodate ginseng samples of various shapes and sizes, thereby improving generalization and recognition accuracy in classification tasks [39,40]. Figure 8 is a schematic diagram of the attention mechanism of LWA.

The local window attention mechanism is expressed as follows: an input X \in \mathbb{R}^{H \times W \times C} is divided into windows of size M \times M, resulting in a total of N = \frac{H \cdot W}{M^{2}} windows. Self-attention is then applied within each window x_{i} \in \mathbb{R}^{M^{2} \times C}.
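The window partition and per-window attention can be sketched in pure Python as follows. This is an illustrative sketch only: the function names are hypothetical, H and W are assumed divisible by M, and the attention uses the tokens themselves as queries, keys, and values (no learned projections, a single head), which is a simplification of any trained attention layer.

```python
import math


def partition_windows(feature_map, m):
    """Split an H x W x C map (nested lists) into non-overlapping M x M windows.

    Returns N = H*W / M^2 windows, each a list of M^2 tokens of length C.
    H and W are assumed to be divisible by M.
    """
    height, width = len(feature_map), len(feature_map[0])
    windows = []
    for top in range(0, height, m):
        for left in range(0, width, m):
            tokens = [feature_map[top + dr][left + dc]
                      for dr in range(m) for dc in range(m)]
            windows.append(tokens)
    return windows


def window_self_attention(tokens):
    """Single-head scaled dot-product attention within one window.

    Assumption: queries, keys and values are the tokens themselves
    (no learned projections), to keep the sketch parameter-free.
    """
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        attn = [e / total for e in exps]
        output.append([sum(a * value[c] for a, value in zip(attn, tokens))
                       for c in range(dim)])
    return output


# Hypothetical 4 x 4 map with C = 2 channels and window size M = 2.
H, W, C, M = 4, 4, 2, 2
x = [[[float(r), float(c)] for c in range(W)] for r in range(H)]
windows = partition_windows(x, M)
attended = [window_self_attention(w) for w in windows]
print(len(windows), len(windows[0]), len(attended[0][0]))
```

Because each token attends only to the M² tokens in its own window, the cost of the score computation grows with N · M⁴ rather than with (H · W)², which is the source of the savings over global attention.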

2.4. Experimental Environment

The experiments were run on Windows 11 with Python 3.9.13, using PyTorch 2.1.1 (CUDA 11.8) as the deep learning framework. The CPU was an Intel(R) Core(TM) i5-10500 @ 3.10 GHz, and the graphics card was an NVIDIA GeForce RTX 2080 Ti (11 GB). The detailed hyperparameters of the experiment are shown in Table 4.

2.5. Evaluation Criteria

In this study, ginseng samples were categorized into four quality grades: Premium, First-Class, Second-Class, and Ordinary, and the detection performance of YOLOv11 and its improved model was evaluated using the composite metrics of Recall, Precision, AP, and mAP. Specifically, TP (True Positive) denotes the number of samples from the target grade (e.g., Premium) that were accurately detected by the network model; FP (False Positive) denotes the number of samples from other grades that were incorrectly identified as the target grade; and FN (False Negative) denotes the number of samples from the target grade that were not detected by the model. Recall (R) is the proportion of correctly detected samples in the target grade out of all actual samples in that grade. Precision (P) refers to the proportion of correctly identified samples among all detections for a given grade. AP (Average Precision) summarizes precision across all recall levels and is calculated as the area under the precision–recall curve for a given grade. The mAP (mean Average Precision) is the average of the AP values over the N grades, reflecting the average detection accuracy of the model across all categories. By evaluating these metrics, this study analyzed the detection performance of YOLOv11 and the improved model under different ginseng grading conditions to ensure robust classification accuracy across all grades.

\mathrm{Precision} = \frac{TP}{TP + FP},

(2)

\mathrm{Recall} = \frac{TP}{TP + FN},

(3)

\mathrm{AP} = \int_{0}^{1} P(R) \, dR,

(4)

\mathrm{mAP} = \frac{\sum_{i = 1}^{N} \mathrm{AP}_{i}}{N}

(5)

3. Experimental Part

3.1. Before and After the Experiment

Figure 9 illustrates the precision–recall curves and normalized confusion matrices for both models, providing a side-by-side comparison of their classification performance. The precision–recall curves in subplots (a) and (b) demonstrate the trade-off between precision and recall for the tested models. Across the full recall range, Ginseng-YOLO consistently maintains higher precision compared with YOLOv11n, indicating better performance in scenarios where reducing false positives is critical. In particular, the curves show that Ginseng-YOLO achieves notably higher precision at medium to high recall levels, suggesting that it can identify more true positives without a substantial increase in false positives. This is especially advantageous for applications in which misclassification of background or irrelevant objects must be minimized. The normalized confusion matrices in subplots (c) and (d) further reveal how the models handle class-specific predictions. For the main object classes (0–3), Ginseng-YOLO achieves higher diagonal values (true positive rates) and lower off-diagonal entries (false positives and false negatives), indicating stronger discriminative ability between similar classes. For example, in class 1, Ginseng-YOLO attains a normalized accuracy of 0.95 compared to 0.86 for YOLOv11n. Furthermore, the background class shows a lower misclassification rate in Ginseng-YOLO, meaning fewer foreground instances are incorrectly predicted as background and vice versa. Collectively, these results confirm that Ginseng-YOLO offers both improved class separation and better overall robustness in imbalanced-class scenarios.

Figure 10 presents the learning curves for Precision, Recall, and mAP50/mAP95 across 300 training epochs for both models. The curves reveal that Ginseng-YOLO not only converges faster than YOLOv11n but also consistently achieves higher performance metrics throughout training. In the Precision curve (left), Ginseng-YOLO rapidly surpasses YOLOv11n in the early epochs and maintains a stable upward trend, indicating more accurate positive predictions as training progresses. The Recall curve (middle) shows that Ginseng-YOLO sustains higher recall values over most of the training process, with fewer oscillations, suggesting better robustness in detecting true positives across classes. The mAP50/mAP95 curves (right) further highlight this advantage: Ginseng-YOLO attains higher mean average precision at both IoU thresholds and reaches near-plateau performance earlier, indicating efficient learning of discriminative features. In contrast, YOLOv11n’s curves—particularly for Recall—exhibit more fluctuation and slower convergence, implying less stable learning dynamics. Overall, these results reinforce that the architectural optimizations in Ginseng-YOLO lead to faster convergence, improved stability, and superior detection accuracy across training stages.

In this study, we present a comparative evaluation of two object detection models, YOLOv11n and Ginseng-YOLO, based on their performance across several key metrics: Precision (P%), Recall (R%), mean Average Precision at an IoU threshold of 0.5 (mAP50%), mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 (mAP50–95%), and computational complexity, quantified by model parameters, weight size, and floating-point operations (FLOPs). The Ginseng-YOLO model outperforms YOLOv11n in all evaluated performance metrics. Specifically, Ginseng-YOLO achieves a higher Precision (84.9% vs. 81.5%) and Recall (83.9% vs. 79.7%), indicating better classification accuracy and more robust detection. Moreover, Ginseng-YOLO attains an mAP50 of 88.7%, compared with 87.9% for YOLOv11n. Its higher mAP50–95 (71.0% vs. 68.5%) further emphasizes its superior performance in object localization tasks.

In terms of computational efficiency, Ginseng-YOLO also demonstrates a significant advantage. The model is more parameter-efficient, with 2.0 million parameters compared to YOLOv11n’s 2.64 million. Furthermore, Ginseng-YOLO is lightweight, with a weight of 4.6 MB versus 5.5 MB, and achieves a lower FLOP count (5.3 GFLOPs vs. 6.5 GFLOPs), suggesting that Ginseng-YOLO offers a favorable trade-off between performance and computational cost, making it more suitable for deployment in resource-constrained environments. Details can be found in Table 5.
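The relative savings implied by these figures can be checked with a short calculation. The sketch below uses only the values quoted above; the dictionary keys and the helper name are illustrative.

```python
# Complexity figures quoted in the text (YOLOv11n vs. Ginseng-YOLO)
baseline = {"params_M": 2.64, "weight_MB": 5.5, "gflops": 6.5}
proposed = {"params_M": 2.0, "weight_MB": 4.6, "gflops": 5.3}


def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


reductions = {k: relative_reduction(baseline[k], proposed[k]) for k in baseline}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower than YOLOv11n")
```

This corresponds to roughly 24% fewer parameters, 16% smaller weights, and 18% fewer FLOPs than the baseline.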

3.2. Ablation Experiments

To systematically evaluate the individual and joint contributions of the proposed components, ablation studies were conducted with three modules: C2-LWA, ADown, and Slide Loss. The original YOLOv11n model, without any enhancements, achieved a precision of 81.5%, recall of 79.7%, mAP50 of 87.9%, and mAP50–95 of 68.5%, serving as the performance baseline. Introducing the C2-LWA module alone increased recall to 80.6%, indicating that the cross-stage local window attention mechanism effectively enhanced the model’s ability to extract fine-grained local features. This improvement was particularly beneficial for capturing critical visual cues related to ginseng quality grading, such as root contour, fine root density, and surface texture. While the improvement in mAP50 was limited, C2-LWA provided a strong foundation for subsequent feature extraction. Replacing part of the CBS structure with the ADown module reduced model parameters from 2.6 M to 2.1 M and FLOPs from 6.5 G to 5.4 G, while maintaining a mAP50 of 88.4% and improving mAP50–95 to 71.5%. By improving downsampling efficiency, this module significantly reduced model complexity while preserving detection accuracy and enhanced the model’s sensitivity to low-contrast targets commonly observed in understory shooting conditions. The integration of Slide Loss increased recall to 81.3%, with mAP50 at 87.6% and mAP50–95 at 68.0%. By assigning greater weight to low-IoU samples, this loss function improved the model’s robustness in identifying ambiguous or irregularly shaped ginseng roots, such as broken or twisted specimens, which are often critical in fine-grained grading tasks. In combination experiments, the joint use of C2-LWA and ADown further compressed the model to 2.0 M parameters and 4.6 MB weights. Although precision reached 85.0%, recall dropped to 73.7%, suggesting that structure-only enhancements may overlook difficult cases without appropriate loss design.
When Slide Loss was introduced, with either C2-LWA or ADown, both recall and mAP50–95 improved notably, demonstrating the benefit of training-level optimization. With all three modules integrated, the final model achieved a precision of 84.9%, a recall of 83.9%, an mAP50 of 88.7%, and an mAP50–95 of 71.0%, while maintaining the smallest parameter size (2.0 M) and lowest FLOPs (5.3 G). These results confirm the complementary and synergistic roles of structural improvements and loss design in the task of understory ginseng appearance grading, providing an effective solution for accurate detection under resource-constrained conditions. Details can be found in Table 6.
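The piecewise weighting behind Slide Loss can be sketched as follows. This is a minimal illustration that assumes the formulation published with YOLO-FaceV2 (listed in the references), in which the threshold mu is typically taken as the mean IoU over all boxes in a batch; the function name, the fixed margin of 0.1, and the example value mu = 0.5 are assumptions and may differ from the settings used in this study.

```python
import math


def slide_weight(iou, mu):
    """Per-sample weight as a piecewise function of IoU and threshold mu.

    Assumed form (after YOLO-FaceV2):
      iou <= mu - 0.1       -> 1
      mu - 0.1 < iou < mu   -> exp(1 - mu)
      iou >= mu             -> exp(1 - iou)
    """
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)


# Illustrative sweep with an assumed threshold of mu = 0.5
for iou in (0.2, 0.45, 0.5, 0.7, 0.9):
    print(f"IoU={iou:.2f} -> weight={slide_weight(iou, mu=0.5):.3f}")
```

Under this form the weight peaks for samples whose IoU lies close to mu and decays toward 1 as IoU approaches 1, so boxes near the boundary between easy and hard examples contribute most to the loss.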

3.3. Comparison Experiments

To evaluate the robustness and statistical significance of the performance improvements, five independent training and evaluation runs were conducted using the same data splits. The proposed Ginseng-YOLO model achieved an average Precision of 84.9 ± 0.12%, Recall of 83.9 ± 0.12%, mAP50 of 88.7 ± 0.12%, and mAP50–95 of 71.0 ± 0.12%, where the ±values represent the standard deviation (SD) across the five runs. Paired t-tests comparing mAP50 and mAP50–95 with the baseline model yielded t-values of 5.1 and 4.9, with corresponding p-values of 0.004 and 0.006, indicating statistically significant differences. For clarity, the best-performing results (mAP50 = 88.7%, mAP50–95 = 71.0%) are reported as the primary reference.
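For readers who wish to reproduce this kind of comparison, the paired t statistic can be computed from per-run scores as sketched below. The per-run values shown are HYPOTHETICAL placeholders, not the authors' measurements, and the function name is illustrative; converting t to a p-value additionally requires the Student t distribution with n − 1 degrees of freedom, which is omitted here.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# HYPOTHETICAL mAP50 values (%) for five runs, for illustration only
proposed_runs = [88.6, 88.8, 88.7, 88.5, 88.9]
baseline_runs = [87.9, 88.0, 87.8, 88.0, 87.8]

t_value, dof = paired_t_statistic(proposed_runs, baseline_runs)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

Pairing is appropriate here because both models are trained and evaluated on the same data splits, so run-to-run variation that affects both models cancels in the differences.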

In this study, we evaluate the performance of the proposed Ginseng-YOLO model against several well-known YOLO versions (YOLOv12, YOLOv10, YOLOv9, YOLOv8, YOLOv6, YOLOv5, YOLOv3-tiny, and YOLOv11) based on key detection metrics: Precision (P%), Recall (R%), mean Average Precision at IoU = 0.50 (mAP50%), and computational efficiency (parameters, FLOPs, and weight). Table 7 presents a summary of the experimental results. The Ginseng-YOLO model demonstrates strong performance, achieving 84.9% Precision, 83.9% Recall, and 88.7% mAP50, surpassing most of the compared models. Notably, it outperforms YOLOv12 and YOLOv10, which report 61.8% and 78.5% Precision, respectively. This performance gain reflects the model’s ability to effectively capture fine-grained features of Panax ginseng, making Ginseng-YOLO particularly well-suited for tasks requiring high localization accuracy. Despite its superior performance, Ginseng-YOLO strikes a favorable balance between accuracy and efficiency. With just 2.0 million parameters and 5.3 GFLOPs, it is computationally more efficient than models like YOLOv3-tiny, which, although larger in terms of parameters (17.5 million), underperforms in detection accuracy. In addition, Ginseng-YOLO achieves lower FLOPs and comparable model weight relative to higher-performing models such as YOLOv5 and YOLOv6, which have 8.3 GFLOPs and 6.3 million parameters, respectively.

These findings demonstrate the effectiveness of the Ginseng-YOLO model in achieving both high detection accuracy and computational efficiency for Panax ginseng. Ablation studies further reveal the underlying reasons for performance variations under different configurations. For instance, when only the structural modules (C2-LWA and ADown) were integrated without the Slide Loss, precision improved to 85.0% but recall dropped to 73.7%, indicating that structure-only enhancements may favor easy-to-detect samples while overlooking difficult or ambiguous targets. Similarly, introducing individual modules such as C2-LWA or Slide Loss alone yielded limited improvements in mAP50 or mAP50–95, demonstrating that each module affects different aspects of the model’s performance. These observations highlight the trade-off between precision and recall in specific configurations and underscore the complementary and synergistic effects of combining structural improvements with loss function optimization. Together, these results confirm that Ginseng-YOLO delivers balanced detection performance while remaining lightweight and deployable, making it an ideal choice for real-time agricultural applications. Specific details can be found in Table 7.
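The precision–recall trade-off described above can be summarized with a single number. The sketch below computes the F1 score (the harmonic mean of precision and recall) for two configurations using values reported in the text; F1 is not a metric reported by the authors and is introduced here only as an illustrative aid, and the configuration labels are informal.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision %, Recall %) as reported in the text
configs = {
    "C2-LWA + ADown (no Slide Loss)": (85.0, 73.7),
    "Ginseng-YOLO (all three modules)": (84.9, 83.9),
}

f1_by_config = {name: f1_score(p, r) for name, (p, r) in configs.items()}

for name, f1 in f1_by_config.items():
    print(f"{name}: F1 = {f1:.1f}%")
```

Although the two configurations have nearly identical precision, the structure-only variant scores about 79% against about 84% for the full model, which reflects the roughly ten-point gap in recall.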

3.4. Model Detection

Figure 11 presents a visual comparison of detection results for Ginseng-YOLO and several representative YOLO variants (YOLOv11n, YOLOv10n, YOLOv9T, YOLOv8n, YOLOv5n) across four ginseng quality categories: Premium, First-Class, Second-Class, and Ordinary. These models were selected based on competitive quantitative results in precision, recall, and mAP, while significantly weaker performers (e.g., YOLOv12, YOLOv3-tiny) were excluded to avoid redundancy. The visual results highlight Ginseng-YOLO’s superior bounding box precision and target completeness across all grades. It consistently produces tightly aligned detection boxes that accurately capture root boundaries, even in challenging scenarios involving root entanglement, blurred edges, and partial occlusion. Notably, it maintains high detection accuracy without introducing excessive false positives or overlapping boxes. In contrast, YOLOv8n and YOLOv10n perform adequately on simple, clear-background samples but exhibit boundary inaccuracies and incomplete box coverage in complex scenes. YOLOv9T and YOLOv5n—as lighter models—show even more pronounced issues, including incorrect root localization and missed detections, especially for Premium and First-Class samples where fine-grained detail is essential. Overall, these visual comparisons confirm Ginseng-YOLO’s robustness and fine-detail perception, reinforcing its suitability for precise ginseng appearance quality grading under complex forest environments.

3.5. Loss Function

Figure 12 compares the training loss curves (dfl_loss) for different bounding box regression loss functions, namely BCEWithLogits, EMA-SL, Focal Loss (FL), Varifocal Loss (VFL), and Slide Loss (SL), in the context of Linxia ginseng (Panax ginseng under forest canopy) quality grading [41,42]. Accurate detection and localization of root structures are essential for distinguishing ginseng quality levels, especially under conditions of root entanglement, partial occlusion, and background noise from soil and vegetation. All loss functions exhibit a rapid decrease in the early training stages, but SL consistently achieves the lowest training loss and maintains a smooth, steady downward trend across all epochs. The zoomed-in view of Epochs 200–300 highlights SL’s stability, with loss values remaining closer to the minimum compared to other methods. This indicates superior convergence behavior and reduced training fluctuations, which are critical for precise bounding box regression in complex visual environments. By modeling localization uncertainty effectively, SL enhances the robustness of root detection, ensuring more accurate bounding box alignment. This improved localization directly supports fine-grained quality grading, making SL the preferred choice among the evaluated loss functions for Linxia ginseng detection tasks.

3.6. Deploy Experiments

To promote practical application in real-world forestry scenarios, the proposed model was deployed on the NVIDIA Jetson Orin Nano, an entry-level edge AI platform designed for efficient on-device inference, as shown in Figure 13. This deployment allows for mobile, low-power classification of understory ginseng directly in the field, without dependence on remote servers or cloud computing. Compared with previous studies that focused on designing classification or detection models for understory ginseng, this work takes a further step toward implementation by realizing real-time inference on an embedded device. To the best of our knowledge, no existing studies have demonstrated such deployment on edge hardware, making this a significant advancement toward practical and scalable intelligent solutions in ginseng cultivation and harvesting.

4. Discussion

Experimental results demonstrate that Ginseng-YOLO delivers superior detection accuracy and deployment efficiency in the task of under-forest ginseng grade recognition. Compared with mainstream lightweight object detection algorithms such as YOLOv5n, YOLOv8n, YOLOv9T, YOLOv10n, and YOLOv11n, it consistently achieves higher precision, recall, and mAP scores. These improvements stem from the synergistic effect of three core strategies: (1) feature representation enhancement, which enables fine-grained morphological discrimination; (2) optimized downsampling structure, which preserves crucial spatial details during resolution reduction; and (3) refined training mechanisms, which stabilize convergence and enhance generalization in complex forest environments. These design choices allow Ginseng-YOLO to effectively address common visual challenges in under-forest ginseng detection, including blurred boundaries, morphological similarities, and texture interference from soil and vegetation. Additionally, its lightweight architecture enables stable real-time inference on the Jetson Nano platform, confirming its suitability for deployment in resource-constrained production scenarios.

Compared with traditional convolutional neural networks (CNNs) [26,27,28,29,43], Ginseng-YOLO delivers markedly better fine-grained feature extraction and adaptability to complex under-forest conditions, enabling accurate grading even when root morphologies are highly similar or boundaries are blurred. Compared with the latest YOLO-based ginseng detection variant (DGS-YOLOv8) [30], it achieves comparable or higher detection accuracy while substantially reducing parameters and computation, resulting in faster inference and lower resource usage. The lightweight architecture, designed for embedded platforms, enables stable real-time inference on the Jetson Nano, an edge device with limited computational resources. This makes it particularly suitable for ginseng grading in resource-constrained agricultural settings, where traditional cloud computing and high-end GPUs are often unavailable. Ginseng-YOLO’s ability to function efficiently on such low-power platforms opens new possibilities for real-time ginseng grading in remote regions, where infrastructure may be lacking. As emphasized by Gao et al. [44], Li et al. [45], and Gu et al. [46] for edge-deployed agricultural detectors, low inference time and minimal computational requirements make such models well suited to on-site automated grading systems, allowing for immediate analysis and decision-making without the need to transmit data to central servers. This real-time capability can improve operational efficiency, reduce costs, and foster sustainable practices in ginseng cultivation by ensuring quicker, more accurate grading in environments where resources are limited.

In addition to ginseng grading, deep learning has been effectively applied to the classification and grading of medicinal plants in traditional Chinese medicine (TCM) [47,48]. For instance, studies such as "Deep learning-based classification of traditional Chinese medicine" have demonstrated how deep learning models can accurately classify TCM herbs based on their morphological features [49,50]. Although the application of YOLO models to medicinal herb detection is still limited, techniques discussed in "Chinese herbal medicine recognition network based on knowledge distillation and cross-attention" and "Enhanced Knowledge Distillation for Advanced Recognition of Chinese Herbal Medicine" have proven effective in addressing challenges such as morphological variations, complex textures, and background interference in TCM plant classification tasks. These advancements highlight the broader applicability of deep learning models, like Ginseng-YOLO, for medicinal plant detection. Recent works, such as "Intelligent Recognition Analysis of Chinese Herbal Medicine Images Using Deep Learning Algorithms", emphasize how deep learning can enable real-time [51], resource-efficient grading systems for Chinese medicinal herbs [52,53]. Moreover, research such as the study on the grading standard of Panax notoginseng seedlings shows how quantitative grading standards can enhance the precision of Panax notoginseng grading, reinforcing the potential of deep learning in improving the accuracy and efficiency of medicinal plant grading [54].

Despite these advantages, two limitations remain. First, the model can misclassify samples with ambiguous grade boundaries and transitional morphological features (e.g., between Grade II and Grade III), indicating room for improvement in semantic discrimination for continuous grade labels. Second, detection accuracy degrades slightly in low-contrast or feature-weak samples, suggesting the need for enhanced robustness against complex morphological variations and visually ambiguous patterns. Future research directions include: (1) integrating morphological priors, hierarchical knowledge graphs, or multi-task learning frameworks to strengthen grade boundary discrimination; (2) incorporating multimodal data such as infrared imagery, hyperspectral features, and growth cycle information to enrich semantic understanding and improve generalization; and (3) developing interpretable grading evaluation frameworks to visualize decision-making and improve transparency and trust in practical deployment. Continued work in these areas is expected to advance under-forest ginseng grading systems toward higher accuracy, stronger robustness, and greater interpretability, ultimately providing more reliable technical support for intelligent agricultural production.

5. Conclusions

This study proposes Ginseng-YOLO, a lightweight and high-accuracy detection model tailored for understory ginseng grading, addressing challenges such as fine-grained feature discrimination, boundary ambiguity, and limited deployment resources. Built upon the YOLOv11n architecture, the model integrates the C2-LWA local window attention module, the ADown efficient downsampling structure, and the Slide Loss function to improve feature representation, structural efficiency, and training robustness. Experimental evaluations demonstrate that Ginseng-YOLO achieves competitive detection performance while maintaining a lightweight design, and deployment tests confirm stable, real-time inference on the NVIDIA Jetson Nano platform. These results validate the effectiveness, deployability, and efficiency of Ginseng-YOLO in complex agricultural vision tasks, offering a practical solution for intelligent quality grading of forest-grown medicinal herbs.

Author Contributions

Data curation, S.S., L.Z. and J.L.; Formal analysis, D.L. and J.L.; Investigation, D.L., S.S. and H.Y.; Methodology, Y.Y.; Resources, H.Y.; Software, Y.Y.; Validation, Y.Y., D.L., H.Y., L.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

Ginseng Soil Improvement Technology Development Project in Jingyu County, Baishan City, Jilin Province (Grant No. 20250017).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shin, M.-S.; Lee, Y.; Cho, I.-H.; Yang, H.-J. Brain plasticity and ginseng. J. Ginseng Res. 2024, 48, 286–297.
2. Fan, W.; Fan, L.; Wang, Z.; Mei, Y.; Liu, L.; Li, L.; Yang, L.; Wang, Z. Rare ginsenosides: A unique perspective of ginseng research. J. Adv. Res. 2024, 66, 303–328.
3. Tao, L.; Wu, Q.; Liu, H.; Bi, Y.; Song, S.; Wang, H.; Lan, W.; Zhang, J.; Yu, L.; Xiong, B. Improved the physicochemical properties and bioactivities of oligosaccharides by degrading self-extracting/commercial ginseng polysaccharides. Int. J. Biol. Macromol. 2024, 279, 135522.
4. Shi, Y.; Zheng, S.; Xie, K.; Xu, S.; Zhong, L.; Hu, Y. Regulatory effects of ginseng polysaccharide on growth restriction and intestinal dysfunction caused by a high cottonseed meal diet in Ctenopharyngodon idella. Int. J. Biol. Macromol. 2025, 322, 146727.
5. Ho, P.T.; Rini, I.A.; Hoang, P.T.; Lee, T.K.; Lee, S. Exploring the Potential of Ginseng-Derived Compounds in Treating Cancer Cachexia. J. Ginseng Res. 2025.
6. Niu, J.; Zhu, G.; Zhang, J. Ginseng in delaying brain aging: Progress and Perspectives. Phytomedicine 2025, 140, 156587.
7. Tao, L.; Zhang, J.; Lan, W.; Liu, H.; Wu, Q.; Yang, S.; Song, S.; Yu, L.; Bi, Y. Neutral oligosaccharides from ginseng (Panax ginseng) residues vs. neutral ginseng polysaccharides: A comparative study of structure elucidation and biological activity. Food Chem. 2025, 464, 141674.
8. Wang, C.-Z.; Zhang, C.-F.; Zhang, Q.-H.; Yuan, C.-S. Phytochemistry of red ginseng, a steam-processed Panax ginseng. Am. J. Chin. Med. 2024, 52, 35–55.
9. Ding, M.; Cheng, H.; Li, X.; Li, X.; Zhang, M.; Cui, D.; Yang, Y.; Tian, X.; Wang, H.; Yang, W. Phytochemistry, quality control and biosynthesis in ginseng research from 2021 to 2023: A state-of-the-art review concerning advances and challenges. Chin. Herb. Med. 2024, 16, 505–520.
10. Kim, Y.W.; Bak, S.B.; Song, Y.R.; Kim, C.-E.; Lee, W.-Y. Systematic exploration of therapeutic effects and key mechanisms of Panax ginseng using network-based approaches. J. Ginseng Res. 2024, 48, 373–383.
11. Niu, Z.; Liu, Y.; Shen, R.; Jiang, X.; Wang, Y.; He, Z.; Li, J.; Hu, Y.; Zhang, J.; Jiang, Y. Ginsenosides from Panax ginseng as potential therapeutic candidates for the treatment of inflammatory bowel disease. Phytomedicine 2024, 127, 155474.
12. Fang, J.; Xu, Z.-F.; Zhang, T.; Chen, C.-B.; Liu, C.-S.; Liu, R.; Chen, Y.-Q. Effects of soil microbial ecology on ginsenoside accumulation in Panax ginseng across different cultivation years. Ind. Crops Prod. 2024, 215, 118637.
13. Liang, C.; Zhou, F.; Ding, G.; Mu, P.; Zhang, Y.; Liu, N. Aluminum stress alters leaf physiology and endophytic bacterial communities in ginseng (Panax ginseng Meyer). Sci. Hortic. 2025, 350, 114276.
14. Zhan, Z.; Zhang, J.; Huang, W.; Huang, J. Transcriptomic strategy provides molecular insights into the growth and ginsenosides accumulation of Panax ginseng. Phytomedicine 2025, 143, 156834.
15. Han, J.-E.; Lee, H.-S.; Son, E.-J.; Murthy, H.N.; Park, S.-Y. Changes in the dynamic profile of beneficial metabolites during Panax ginseng somatic embryogenesis. Ind. Crops Prod. 2025, 234, 121535.
16. Luo, M.; Chen, A.; Zhao, Z.; Ma, Y.; Zhan, Z.; You, J. Bioimaging analysis reveals the constrained transport of mineral elements from the epidermis in ginseng root with red skin syndrome. Ind. Crops Prod. 2025, 229, 120955.
17. Li, P.; Wang, S.; Yu, L.; Liu, A.; Zhai, D.; Yang, Z.; Qin, Y.; Yang, Y. Non-destructive origin and ginsenoside analysis of American ginseng via NIR and deep learning. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 334, 125913.
18. Wang, Y.; Wang, S.; Yuan, Y.; Li, X.; Bai, R.; Wan, X.; Nan, T.; Yang, J.; Huang, L. Fast prediction of diverse rare ginsenoside contents in Panax ginseng through hyperspectral imaging assisted with the temporal convolutional network-attention mechanism (TCNA) deep learning. Food Control 2024, 162, 110455.
19. Yang, Y.; Wang, S.; Zhu, Q.; Qin, Y.; Zhai, D.; Lian, F.; Li, P. Non-destructive geographical traceability of American ginseng using near-infrared spectroscopy combined with a novel deep learning model. J. Food Compos. Anal. 2024, 136, 106736.
20. Zhang, W.; Bai, X.; Zhao, D. A study on the predictive model for ginsenoside content in wild ginseng based on decision tree and ensemble learning algorithms. Microchem. J. 2025, 212, 113318.
21. Fu, Z.-Y.; Cui, J.-S. Classification and prediction model of ginseng year based on visible-near infrared spectroscopy and machine learning. Opt. Eng. 2025, 64, 064101.
22. Ping, J.; Hao, N.; Guo, X.; Miao, P.; Guan, Z.; Chen, H.; Liu, C.; Bai, G.; Li, W. Rapid and accurate identification of Panax ginseng origins based on data fusion of near-infrared and laser-induced breakdown spectroscopy. Food Res. Int. 2025, 204, 115925.
23. Zhao, L.; Liu, S.; Chen, X.; Wu, Z.; Yang, R.; Shi, T.; Zhang, Y.; Zhou, K.; Li, J. Hyperspectral Identification of Ginseng Growth Years and Spectral Importance Analysis Based on Random Forest. Appl. Sci. 2022, 12, 5852.
24. You, H.; Wang, H.; Wei, Z.; Bi, C.; Zhang, L.; Li, X.; Yin, Y. VBP-YOLO-prune: Robust apple detection under variable weather via feature-adaptive fusion and efficient YOLO pruning. Alex. Eng. J. 2025, 128, 992–1014.
25. Xie, Z.; Yang, Z.; Li, C.; Zhang, Z.; Jiang, J.; Guo, H. YOLO-Ginseng: A detection method for ginseng fruit in natural agricultural environment. Front. Plant Sci. 2024, 15, 1422460.
26. Li, D.; Zhao, Z.; Yin, Y.; Zhao, C. Research on the Classification of Sun-Dried Wild Ginseng Based on an Improved ResNeXt50 Model. Appl. Sci. 2024, 14, 10613.
27. Gu, J.; Li, Z.; Zhang, L.; Yin, Y.; Lv, Y.; Yu, Y.; Li, D. Research on the Quality Grading Method of Ginseng with Improved DenseNet121 Model. Electronics 2024, 13, 4504.
28. Li, D.; Zhai, M.; Piao, X.; Li, W.; Zhang, L. A Ginseng Appearance Quality Grading Method Based on an Improved ConvNeXt Model. Agronomy 2023, 13, 1770.
29. Li, D.; Piao, X.; Lei, Y.; Li, W.; Zhang, L.; Ma, L. A Grading Method of Ginseng (Panax ginseng C. A. Meyer) Appearance Quality Based on an Improved ResNet50 Model. Agronomy 2022, 12, 2925.
30. Zhang, L.; You, H.; Wei, Z.; Li, Z.; Jia, H.; Yu, S.; Zhao, C.; Lv, Y.; Li, D. DGS-YOLOv8: A Method for Ginseng Appearance Quality Detection. Agriculture 2024, 14, 1353.
31. Xue, Q.; Miao, P.; Miao, K.; Yu, Y.; Li, Z. An online automatic sorting system for defective Ginseng Radix et Rhizoma Rubra using deep learning. Chin. Herb. Med. 2023, 15, 447–456.
32. Lee, A.; Baek, I.; Kim, J.; Hong, S.-J.; Kim, M.S. Deep learning approaches for bruised mandarin orange classification by fluorescence hyperspectral imaging. Postharvest Biol. Technol. 2025, 230, 113724.
33. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
34. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545.
35. Xiao, R.; Wang, H.; Wang, L.; Yuan, H. C3Ghost and C3k2: Performance study of feature extraction module for small target detection in YOLOv11 remote sensing images. In Proceedings of the Second International Conference on Big Data, Computational Intelligence, and Applications (BDCIA 2024), Huanggang, China, 15–17 November 2024; pp. 464–470.
36. Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision 2024, Milan, Italy, 29 September–4 October 2024; pp. 1–21.
37. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714.
38. Vo, X.-T.; Nguyen, D.-L.; Priadana, A.; Jo, K.-H. Efficient vision transformers with partial attention. In Proceedings of the European Conference on Computer Vision 2024, Milan, Italy, 29 September–4 October 2024; pp. 298–317.
39. Alkhatib, M.Q.; Jamali, A. HSIFormer: An Efficient Vision Transformer Framework for Enhanced Hyperspectral Image Classification Using Local Window Attention. In Proceedings of the 2024 14th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Barcelona, Spain, 12–14 November 2024; pp. 1–5.
40. Jamali, A.; Roy, S.K.; Bhattacharya, A.; Ghamisi, P. Local Window Attention Transformer for Polarimetric SAR Image Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4004205.
41. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
42. Zhang, H.; Wang, Y.; Dayoub, F.; Sunderhauf, N. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Virtual, 19–25 June 2021; pp. 8514–8523.
43. Kim, M.; Kim, J.; Kim, J.S.; Lim, J.-H.; Moon, K.-D. Automated Grading of Red Ginseng Using DenseNet121 and Image Preprocessing Techniques. Agronomy 2023, 13, 2943.
44. Gao, C.; He, B.; Guo, W.; Qu, Y.; Wang, Q.; Dong, W. SCS-YOLO: A real-time detection model for agricultural diseases—A case study of wheat fusarium head blight. Comput. Electron. Agric. 2025, 238, 110794.
45. Li, H.; Chen, J.; Gu, Z.; Dong, T.; Chen, J.; Huang, J.; Gai, J.; Gong, H.; Lu, Z.; He, D. Optimizing edge-enabled system for detecting green passion fruits in complex natural orchards using lightweight deep learning model. Comput. Electron. Agric. 2025, 234, 110269.
46. Gu, Z.; He, D.; Huang, J.; Chen, J.; Wu, X.; Huang, B.; Dong, T.; Yang, Q.; Li, H. Simultaneous detection of fruits and fruiting stems in mango using improved YOLOv8 model deployed by edge device. Comput. Electron. Agric. 2024, 227, 109512.
47. Shang, X.; Wang, Y. Intelligent Recognition Analysis of Chinese Herbal Medicine Images Using Deep Learning Algorithms. In Proceedings of the 2024 3rd International Conference on Health Big Data and Intelligent Healthcare (ICHIH), Zhuhai, China, 13–15 December 2024; pp. 238–241.
48. Pan, D.; Guo, Y.; Fan, Y.; Wan, H. Development and Application of Traditional Chinese Medicine Using AI Machine Learning and Deep Learning Strategies. Am. J. Chin. Med. 2024, 52, 605–623.
49. Hou, Q.; Yang, W.; Liu, G. Chinese herbal medicine recognition network based on knowledge distillation and cross-attention. Sci. Rep. 2025, 15, 1687.
50. Zheng, L.; Long, W.; Yi, J.; Liu, L.; Xu, K. Enhanced Knowledge Distillation for Advanced Recognition of Chinese Herbal Medicine. Sensors 2024, 24, 1559.
51. Kang, Y.; Wu, M.; Gao, Y.; Fu, L.; Li, C.; Song, Z.; Jin, Y.; Huang, Z.; Hu, Z.; Yu, Y. Deep learning-based classification of traditional Chinese medicine: A novel approach. Quant. Imaging Med. Surg. 2025, 15, 7483–7496.
52. Dai, W.; Ma, Y.; Fan, Y.; Ma, J. A Multi-Scale Feature Extraction Algorithm for Chinese Herbal Medicine Image Classification. Appl. Sci. 2025, 15, 4271.
53. Hao, W.; Han, M.; Yang, H.; Hao, F.; Li, F. A novel Chinese herbal medicine classification approach based on EfficientNet. Syst. Sci. Control. Eng. 2021, 9, 304–313.
54. Chen, L.; Yang, Y.; Ge, J.; Cui, X.; Xiong, Y. Study on the grading standard of Panax notoginseng seedlings. J. Ginseng Res. 2018, 42, 208–217.

Figure 1. Flowchart of experimental design.

Figure 2. Structure diagram of the image acquisition studio.

Figure 3. Demonstration of different grades of understory ginseng. (a) Premium, (b) First-Class, (c) Second-Class, (d) Ordinary.

Figure 4. Schematic diagram of the structure of the Ginseng-YOLO model.

Figure 5. Schematic diagram of the ADown structure.

Figure 6. Schematic diagram of Slide Loss.

Figure 7. Schematic diagram of C2-LWA.

Figure 8. Schematic diagram of the attention mechanism of LWA.

Figure 9. Precision–recall curves and normalized confusion matrices. (a,b) Precision–recall curves; (c,d) normalized confusion matrices.

Figure 10. Training curves of Precision, Recall, and mAP50 for YOLOv11n and Ginseng-YOLO over epochs.

Figure 11. Detection results of different models for four ginseng grades, showing bounding box accuracy and completeness.

Figure 12. Comparison of the dfl_loss curves of BCEWithLogits, EMA-SL, Focal Loss, Varifocal Loss, and Slide Loss over training epochs.

Figure 13. Deployment of Ginseng-YOLO on the NVIDIA Jetson Orin Nano edge device for real-time ginseng grading. (a) Multiple ginseng samples combined into a single image for testing; (b) a single ginseng sample tested alone.

Table 1. Grade standards of fresh ginseng.

Category	Premium	First-Class	Second-Class
Growth Years	Over 30 years	Over 25 years	Over 15 years
Rhizome Neck (Lu)	Three-node neck, tight rhizome bowl, relatively long neck, occasional double or triple necks, complete bud scale	Two-node or three-node neck, tight rhizome bowl, occasional double or triple necks, complete bud scale	One-node or two-node neck, large or distorted rhizome bowl, or with defects such as scars, rust stains, or deformities
Rootlets (Ting)	Jujube-pit-shaped rootlets, total rootlet weight ≤ 30% of main root, no pulp leakage	Jujube-pit-shaped, garlic-clove-shaped or fine-hair-shaped rootlets, total weight ≤ 50% of main root, no pulp leakage	Fine-hair-shaped, elongated or deformed rootlets, oversized rootlets, or with scars or rust stains
Main Body	Ling-shaped or nodule-shaped, off-white or light yellowish-white color, tight and delicate skin with luster, natural groin separation between legs, no pulp leakage, no scars	Smooth or beam-shaped body, off-white or light yellowish-white color, tight and delicate skin with luster, natural groin separation between legs, no pulp leakage, no scars	Smooth, bulky, or horizontal body shape, off-white or yellowish-white color, looser skin, smaller body, rootlet deformities, or with scars and rust stains
Skin Texture (Wrinkles)	Fine and deep annular wrinkles at the upper part of the main body, tight skin (silky texture), fine lines	Distinct annular wrinkles at the upper part of the main body	Incomplete or broken annular wrinkles at the upper part, few or sparse wrinkles
Fibrous Roots (Hairs)	Thin and long, flexible not brittle, sparse but orderly, visible pearl dots, complete primary roots, rootlets extending downward	Thin and long, flexible not brittle, complete primary roots, rootlets extending downward	Numerous fibrous roots of varying lengths, flexible not brittle, possibly broken or incomplete

Table 2. Grade standards for raw sun-dried ginseng.

Category	Premium	First-Class	Second-Class
Growth Years	More than 30 years	More than 25 years	More than 15 years
Rhizome Neck	Three-node neck with a tight rhizome bowl and relatively long neck; occasional double or triple necks	Two- or three-node neck with a relatively large but tight rhizome bowl; occasional double or triple necks	One- or two-node neck or shortened-neck type; rhizome bowl is coarse, twisted, or defective, with scars or rust stains
Rootlets	Jujube-pit-shaped rootlets; rootlet weight not exceeding 30% of the main root; no grooves; proper color with luster	Jujube-pit-, garlic-clove-, fine-hair-, or elongated-shaped rootlets; rootlet weight not exceeding 50% of the main root; no grooves; proper color with luster	Large rootlets or absence of rootlets; or with defects, scars, or rust stains
Main Body	Ling-shaped or knobby body; proper color with luster; off-white or pale yellowish-white; natural crotch between legs; no grooves; no scars; not soaked	Smooth or beam-shaped body; proper color with luster; off-white or pale yellowish-white; natural crotch between legs; no grooves; not soaked	Smooth, bulky, or horizontal body; off-white or pale yellowish-white; loose skin; with grooves; small body; rootlet deformation or presence of scars and rust stains
Wrinkles	Fine and deep annular wrinkles on the upper part of the body; tight skin with fine lines (silky texture)	Distinct annular wrinkles on the upper part of the body	Incomplete or broken annular wrinkles on the upper part of the body; few or sparse wrinkles
Fibrous Roots	Thin and long; sparse but well-arranged; flexible, not brittle; visible pearl-like dots; intact primary roots; rootlets extending downward	Thin and long; sparse but well-arranged; flexible, not brittle; intact primary roots; rootlets extending downward	Numerous fibrous roots of varying lengths; flexible, not brittle; possibly damaged or with rust stains

Table 3. Dataset split.

Quality Grade (Category)	Training (Images/Instances)	Validation (Images/Instances)	Test (Images/Instances)
Premium (0)	167/184	62/73	33/40
First-Class (1)	248/292	65/81	34/40
Second-Class (2)	196/229	50/54	20/26
Third-Class (3)	231/261	62/74	33/39
Total	842/976	239/282	120/145
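The split proportions implied by the image totals in Table 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and the helper name `split_fractions` are hypothetical, and the counts are copied from the "Total" row.

```python
# Image counts per split, copied from the "Total" row of Table 3.
split_images = {"train": 842, "val": 239, "test": 120}


def split_fractions(counts):
    """Return each split's share of the total image count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = split_fractions(split_images)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

With these totals (1201 images overall), the shares come out at roughly 70%, 20%, and 10% for training, validation, and test.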

Table 4. Detailed hyperparameters of the experiment.

Parameters	Setup
Epochs	300
Batch Size	32
Optimizer	SGD
Initial Learning Rate	0.01
Final Learning Rate	0.01
Momentum	0.937
Weight-Decay	5 × 10⁻⁴
Close Mosaic	Last ten epochs
Image Size	640
Workers	8
Mosaic	1.0
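For readers reproducing the setup, the settings in Table 4 could be collected into a keyword-argument mapping. The sketch below assumes the Ultralytics training interface and its argument names (`lr0`, `lrf`, `imgsz`, `close_mosaic`, and so on); the table itself does not name the toolkit, so this mapping is an assumption. In particular, Ultralytics treats `lrf` as a fraction of `lr0` rather than an absolute rate, so mapping the "Final Learning Rate" row to `lrf` is a guess. The dataset file `ginseng.yaml` in the comment is hypothetical.

```python
# Table 4 expressed as training keyword arguments.
# Key names follow Ultralytics conventions (an assumption, not stated in the table).
train_args = {
    "epochs": 300,
    "batch": 32,
    "optimizer": "SGD",
    "lr0": 0.01,           # Initial Learning Rate
    "lrf": 0.01,           # "Final Learning Rate" row; a fraction of lr0 in Ultralytics
    "momentum": 0.937,
    "weight_decay": 5e-4,
    "close_mosaic": 10,    # disable mosaic for the last ten epochs
    "imgsz": 640,
    "workers": 8,
    "mosaic": 1.0,
}

# Hypothetical usage, requires the ultralytics package and a dataset config:
#   from ultralytics import YOLO
#   YOLO("yolo11n.yaml").train(data="ginseng.yaml", **train_args)

for key, value in train_args.items():
    print(f"{key} = {value}")
```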

Table 5. Comparison of model training.

Model	Precision (%)	Recall (%)	mAP50 (%)	mAP50-95 (%)	Parameters (M)	Weight (MB)	FLOPs (G)
YOLOv11n	81.5	79.7	87.9	68.5	2.64	5.5	6.5
Ginseng-YOLO	84.9	83.9	88.7	71.0	2.0	4.6	5.3
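The complexity savings in Table 5 are easier to compare as relative reductions. The following is a small worked calculation, assuming the last three columns are parameters in millions, weight file size in MB, and FLOPs in G (units taken from the abstract); the variable and function names are illustrative.

```python
# Complexity figures for the two rows of Table 5.
baseline = {"params_m": 2.64, "weight_mb": 5.5, "gflops": 6.5}   # YOLOv11n
proposed = {"params_m": 2.0, "weight_mb": 4.6, "gflops": 5.3}    # Ginseng-YOLO


def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100.0


reductions = {
    key: relative_reduction(baseline[key], proposed[key]) for key in baseline
}
for key, pct in reductions.items():
    print(f"{key}: {pct:.1f}% lower")
```

Under these assumptions the parameter count drops by about 24%, the weight file by about 16%, and FLOPs by about 18%, while mAP50 rises from 87.9% to 88.7%.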

Table 6. Ablation experiments.

YOLOv11n	C2-LWA	ADown	Slide Loss	P (%)	R (%)	mAP50 (%)	mAP50-95 (%)	Parameters (M)	Weight (MB)	FLOPs (G)
√				81.5	79.7	87.9	68.5	2.6	5.5	6.5
√	√			78.7	80.6	84.9	65.4	2.5	5.4	6.4
√		√		81.2	80.6	88.4	71.5	2.1	5.4	5.4
√			√	79.4	81.3	87.6	68.0	2.6	5.5	6.5
√	√	√		85.0	73.7	86.4	68.7	2.0	4.6	5.3
√	√		√	75.8	80.5	84.1	64.9	2.5	5.4	6.4
√		√	√	85.3	75.8	85.3	68.5	2.1	5.4	5.4
√	√	√	√	84.9	83.9	88.7	71.0	2.0	4.6	5.3

Table 7. Comparative experiments.

Models	P (%)	R (%)	mAP50 (%)	mAP50-95 (%)	Parameters (M)	FLOPs (G)	Weight (MB)
YOLOv12n	61.8	63.5	67.1	45.8	2.52	6.0	5.4
YOLOv10n	78.5	72.9	82.5	65.3	2.7	8.4	5.3
YOLOv9t	85.7	73.1	86.1	68.7	2.00	7.9	4.5
YOLOv8n	76.8	70.3	83.4	65.9	3.01	8.2	6.3
YOLOv6	76.9	74.9	81.0	59.6	4.24	11.9	8.3
YOLOv5n	79.7	78.0	85.1	67.1	1.76	4.2	3.9
YOLOv3-tiny	68.9	71.1	73.1	40.8	8.67	13.0	17.5
YOLOv11n	81.5	79.7	87.9	68.5	2.64	6.5	5.5
Ginseng-YOLO	84.9	83.9	88.7	71.0	2.0	5.3	4.6

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yu, Y.; Li, D.; Song, S.; You, H.; Zhang, L.; Li, J. Ginseng-YOLO: Integrating Local Attention, Efficient Downsampling, and Slide Loss for Robust Ginseng Grading. Horticulturae 2025, 11, 1010. https://doi.org/10.3390/horticulturae11091010
