Article

Small-Sample Authenticity Identification and Variety Classification of Anoectochilus roxburghii (Wall.) Lindl. Using Hyperspectral Imaging and Machine Learning

1 College of Optical, Mechanical and Electrical Engineering, Zhejiang A&F University, Hangzhou 311300, China
2 State Key Laboratory of Subtropical Silviculture, Department of Chinese Herbal Medicine, Zhejiang A&F University, Hangzhou 311300, China
* Author to whom correspondence should be addressed.
Plants 2025, 14(8), 1177; https://doi.org/10.3390/plants14081177
Submission received: 6 March 2025 / Revised: 7 April 2025 / Accepted: 9 April 2025 / Published: 10 April 2025

Abstract

This study aims to use hyperspectral imaging combined with machine learning for the authenticity identification and classification of Anoectochilus roxburghii and its counterfeit species. Hyperspectral data were collected from the front and back surfaces of leaves of nine Goldthread (A. roxburghii) varieties and two counterfeit species (Bloodleaf and Spotted-leaf). These data were then classified with several machine learning models: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Linear Discriminant Analysis (LDA), and Convolutional Neural Networks (CNN). The SVM model achieved 100% classification accuracy in distinguishing Goldthread from its counterfeit species using spectral data from both the front and back leaf surfaces. For variety classification, traditional machine learning models performed less consistently. SVM was the strongest of them, owing to its ability to handle high-dimensional feature spaces. A multi-view spectral fusion CNN model, which integrates spectral data from both leaf surfaces, further improved classification accuracy and achieved a perfect classification rate of 100%. This approach highlights the potential of hyperspectral imaging and machine learning in plant authenticity identification and provides a new perspective for the detection of counterfeit species.

1. Introduction

Anoectochilus roxburghii (Wall.) Lindl., commonly known as Goldthread, is a highly prized medicinal herb in traditional Asian medicine, renowned for its anti-inflammatory, antioxidant, and anticancer properties [1,2,3]. However, because of its high market demand, A. roxburghii is vulnerable to adulteration and misidentification [4,5,6]. Such substitution not only undermines market integrity but also poses significant health risks through loss of therapeutic efficacy and potential toxicity. Ensuring the authenticity and precise classification of A. roxburghii is therefore critical for consumer safety and quality control in herbal medicine production.
Traditional methods for the authenticity identification and variety classification of medicinal plants rely primarily on visual inspection, microscopic examination, and chemical analysis [7]. More recently, techniques such as genomic analysis, mass spectrometry, and DNA testing have been developed to improve identification accuracy [8,9,10,11]. However, these approaches are labor-intensive and destructive to samples, and they often fail to distinguish subtle interspecies variations. Advances in spectral technology have shown promise in addressing these challenges [12].

For instance, Li et al. [13] combined near-infrared spectroscopy (NIRS) with chemometrics to build a partial least squares discriminant analysis (PLS-DA) model. This model accurately classified authentic A. roxburghii powder and two counterfeit counterparts. Similarly, Chai et al. [14] acquired NIRS data from A. roxburghii and its adulterants and designed an improved one-dimensional convolutional neural network (1D-CNN) to distinguish genuine from counterfeit samples. Additionally, Li [15] proposed a fast and accurate classification model for A. roxburghii varieties based on a handheld near-infrared spectrometer and AdaBoost ensemble learning, achieving an accuracy of up to 95.6%. These studies show that NIRS can identify A. roxburghii with high accuracy. However, it is typically limited to powdered samples because it requires grinding, compression, or other sample preparation.
Although previous studies [14,15] successfully distinguished A. roxburghii from its adulterants, they share a key limitation: destructive sampling, which restricts their applicability to non-invasive plant authentication. Moreover, these methods inherently lose spatial and structural information, potentially overlooking biochemical heterogeneity between leaf surfaces. In contrast, hyperspectral imaging (HSI) provides a non-destructive alternative, capturing spatially resolved spectral features from both the adaxial and abaxial leaf surfaces of intact plants [16]. HSI preserves sample integrity and covers a broad range of spectral bands. It also provides detailed spectral information for precise material identification and quality assessment, and it can improve classification accuracy by extracting complementary biochemical information [17].

HSI and machine learning (ML) are increasingly used in botanical research, as reviewed in [18]. Applications include:
- early detection of wheat Fusarium head blight and grapevine powdery mildew;
- monitoring of nitrogen deficiency in rice and drought stress in maize;
- discrimination of citrus cultivars and coffee bean varieties;
- mapping of soil organic matter for precision agriculture.

These examples highlight the versatility of HSI-ML in plant science and sustainable agriculture. ML and deep learning (DL) have transformed pattern recognition in high-dimensional datasets [19,20], making them well suited to analyzing hyperspectral data [21,22,23,24]. Conventional ML models are widely used, but their performance depends heavily on manual feature engineering and preprocessing. DL architectures such as convolutional neural networks, by contrast, extract discriminative features automatically, enabling end-to-end classification.

Despite these advances, the potential of HSI for the authenticity identification and multi-variety classification of A. roxburghii remains largely unexplored. Moreover, most studies rely on single-view spectral data (e.g., the leaf front), overlooking the complementary information provided by multiple views (e.g., front and back leaf surfaces). To date, no research has integrated multi-view HSI with hybrid ML/DL frameworks to address authenticity verification and intra-species classification in A. roxburghii simultaneously.
This study bridges these gaps by proposing a novel framework that combines hyperspectral imaging with machine learning for non-destructive, high-precision identification of A. roxburghii and its counterfeit species. We introduce three key innovations: (1) Multi-view spectral fusion: Leveraging front and back leaf spectra to capture complementary biochemical and morphological features. (2) Hybrid ML/DL modeling: Comparing traditional ML algorithms (SVM, KNN, RF) with a custom 1D-CNN architecture optimized for spectral data. (3) Comprehensive validation: Evaluating model robustness across nine A. roxburghii varieties and two counterfeit species under diverse preprocessing techniques.
Our work not only advances the application of HSI in medicinal plant authentication but also provides a scalable, non-destructive solution for quality assurance in herbal markets. By addressing the limitations of existing methods, this research contributes to safeguarding consumer health and promoting sustainable practices in traditional medicine industries.

2. Materials and Methods

2.1. Sample Preparation

The A. roxburghii samples and counterfeit samples used in this experiment were provided and identified by Professor Wang Hongzhen from the Department of Traditional Chinese Herbal Medicine at Zhejiang A&F University. A total of 9 different varieties of A. roxburghii were included: Small Round Leaf (1), Pointed Leaf (2), Red Sunset (3), J6 Male (4), Colorful Sunset (5), Large Round Leaf (6), Red Sunset Large Leaf (7), Golden Vein No. 1 (8), and Taiwan Red (9). Two counterfeit species, Ludisia discolor (Ker-Gawl.) A. Rich. (L) and Goodyera schlechtendaliana Rchb. f. (G), were also included. Each variety included 10 leaves of similar size and maturity to minimize biological variability. Both the adaxial (front) and abaxial (back) sides of each leaf were scanned to capture multi-view spectral information. Samples were cleaned with distilled water to remove surface contaminants and air-dried under controlled laboratory conditions (25 °C, 60% humidity) prior to imaging.

2.2. Hyperspectral Image System

The hyperspectral imaging system used in this study (GaiaField-N17E, Dualix Spectral Imaging, Chengdu, China) is shown in Figure 1. It covers a spectral range from 900 nm to 1700 nm with a spectral resolution of 5 nm, 640 spatial pixels per scan line, and 512 spectral bands. The system includes an indoor testing chamber equipped with four 50 W halogen lamps that provide stable illumination.

The spectrometer operates in push-broom (line-scan) mode. Its array detector is oriented perpendicular to the direction of motion, so a two-dimensional image is built up line by line as the platform advances. The conveyor belt speed is set to 0.8 cm/s so that samples pass through the scanning area at a stable, uniform speed. The exposure time is set to automatic, the gain factor is set to 1, and the vertical distance between the sample and the lens is fixed at 42 cm. Samples are placed on a black base plate during scanning to enhance image contrast and to minimize interference from diffuse background reflection. Each sample is scanned twice to improve data reliability and repeatability.

After scanning, the raw hyperspectral images undergo black-and-white (reflectance) correction [25]. Spectra are then extracted from the region of interest (ROI) of each sample using ENVI 5.3. A rectangular frame covering a randomly chosen quarter of each sample is selected as the ROI, and every pixel within it carries its own spectrum. The final spectrum of the sample is the average reflectance spectrum of all pixels in that region.
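The black-and-white correction and ROI averaging described above can be sketched in NumPy as follows. This is a minimal illustration, not the ENVI workflow, and it rests on three assumptions:
- it uses the widely used relative-reflectance formula R = (raw - dark) / (white - dark), whereas the exact procedure of [25] is not restated in the text;
- the data are synthetic;
- the ROI is a hypothetical rectangle covering half the rows and half the columns, i.e., one quarter of the image area.

```python
import numpy as np

def black_white_correction(raw, white_ref, dark_ref, eps=1e-12):
    """Relative reflectance R = (raw - dark) / (white - dark).

    All inputs must broadcast to (rows, cols, bands).
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white_ref, dtype=float)
    dark = np.asarray(dark_ref, dtype=float)
    # Guard against zero (or negative) denominators in dead or saturated bands
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom

def roi_mean_spectrum(cube, rows, cols):
    """Average the spectra of all pixels inside a rectangular ROI."""
    roi = cube[rows, cols, :]                             # (h, w, bands)
    return roi.reshape(-1, roi.shape[-1]).mean(axis=0)    # (bands,)

# Synthetic example: a 40 x 60 pixel image with 512 bands at ~25% reflectance
rng = np.random.default_rng(0)
bands = 512
dark = np.full((1, 1, bands), 100.0)
white = np.full((1, 1, bands), 4100.0)
raw = dark + 0.25 * (white - dark) + rng.normal(0.0, 5.0, size=(40, 60, bands))
reflectance = black_white_correction(raw, white, dark)

# Hypothetical ROI: half the rows x half the columns = a quarter of the image area
spectrum = roi_mean_spectrum(reflectance, slice(10, 30), slice(15, 45))
```

Averaging several hundred pixels suppresses per-pixel noise. This is one reason ROI means, rather than single pixels, serve as the sample spectra.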

2.3. Data Preprocessing

In this study, several filtering techniques were applied to the hyperspectral data to reduce noise and improve data quality, supporting accurate and robust subsequent analyses. Because the study focuses on small-sample learning, spectral preprocessing techniques such as denoising and augmentation were applied to improve model robustness under limited data. In addition, cross-validation was used to obtain a reliable model evaluation and to guard against overfitting. The following preprocessing methods were used:
Median filtering (MF). Median filtering is a non-linear filtering technique that replaces each pixel value with the median value of its neighborhood. This method is particularly effective in removing salt-and-pepper noise. The filtering process involves specifying the window size to determine the neighborhood for each pixel, then computing the median value within that neighborhood and replacing the original pixel value. Median filtering preserves edge details and is particularly useful for removing impulsive noise without blurring the image too much.
Average filtering (AF). Average filtering works by replacing each pixel value with the average value of its neighboring pixels. It is effective for reducing random noise but may blur edges, making it less effective in preserving fine details. This method is computationally simple and efficient, making it a popular choice for initial noise reduction in hyperspectral data.
Gaussian filtering (GF). Gaussian filtering is a common smoothing technique that weights neighboring values with a Gaussian function, giving higher weights to values closer to the center. Because nearby values dominate the weighted average, it retains more detail than average filtering. Gaussian filtering is widely used in hyperspectral data preprocessing because it reduces noise effectively while limiting blurring.
Savitzky–Golay (SG) Filtering. The Savitzky–Golay filter is a smoothing method commonly used in signal processing. It applies polynomial fitting within a sliding window to smooth the data, thereby reducing high-frequency noise. In hyperspectral data processing, SG filtering helps smooth out small random fluctuations while preserving the main signal features.
Principal Component Analysis (PCA). PCA is an unsupervised statistical method for dimensionality reduction. It projects the data onto a set of orthogonal directions, the principal components, each of which captures the maximum variance remaining after the previous ones. In spectral analysis, PCA can help reveal differences between samples and potential chemical changes.
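For averaged spectra, the filters above act along the wavelength axis rather than over image pixels. The sketch below shows one way to implement each filter, and PCA, with NumPy alone. Its settings are assumptions, since the study does not report them:
- window sizes, sigma, and polynomial order are illustrative;
- windows are odd in length;
- the spectrum ends are handled by replicating the edge values.

In practice, SciPy's `scipy.signal.savgol_filter` and its `ndimage` filters could be used instead.

```python
import numpy as np

def _pad(x, half):
    # Replicate the end values so every filter returns a spectrum of the same length
    return np.pad(np.asarray(x, dtype=float), half, mode="edge")

def median_filter_1d(x, window=5):
    """Replace each value with the median of its (odd-sized) neighbourhood."""
    half = window // 2
    xp = _pad(x, half)
    return np.array([np.median(xp[i:i + window]) for i in range(len(x))])

def average_filter_1d(x, window=5):
    """Replace each value with the mean of its neighbourhood."""
    half = window // 2
    return np.convolve(_pad(x, half), np.ones(window) / window, mode="valid")

def gaussian_filter_1d(x, sigma=1.0, truncate=3.0):
    """Weight neighbours with a normalised Gaussian centred on each value."""
    half = int(truncate * sigma + 0.5)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                      # weights sum to 1
    return np.convolve(_pad(x, half), kernel, mode="valid")

def savgol_filter_1d(x, window=7, polyorder=2):
    """Least-squares polynomial fit in a sliding window, evaluated at its centre."""
    half = window // 2
    t = np.arange(-half, half + 1)
    A = np.vander(t, polyorder + 1, increasing=True)  # columns 1, t, t^2, ...
    coeffs = np.linalg.pinv(A)[0]                     # row 0 -> fitted value at t = 0
    # Reverse so that np.convolve computes a correlation with the coefficients
    return np.convolve(_pad(x, half), coeffs[::-1], mode="valid")

def pca(X, n_components=3):
    """Project mean-centred spectra (samples x bands) onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)             # explained-variance ratio per component
    return Xc @ vt[:n_components].T, explained[:n_components]
```

The Savitzky–Golay coefficients come from a least-squares fit. As a result, a quadratic signal passes through a second-order SG filter unchanged away from the edges, which is a convenient sanity check.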

2.4. Spectral Data Processing

Various classification models are used in this paper. The dataset was split using a stratified 80/20 ratio, ensuring that all classes were proportionally represented in both the training and testing sets. Additionally, a 5-fold cross-validation strategy was employed to assess the robustness of the models.
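A stratified 80/20 split followed by 5-fold cross-validation on the training portion can be sketched in plain Python as below. The shuffling seed and the round-robin assignment of samples to folds are illustrative assumptions. With nine classes of 10 leaves each, the split leaves 2 test and 8 training samples per class, which matches the per-class test count discussed in Section 3.1.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(labels, indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) folds over `indices`, dealing each class round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val

# Nine varieties x 10 leaves (class ids 0-8 are placeholders)
labels = [c for c in range(9) for _ in range(10)]
train, test = stratified_split(labels)
folds = list(stratified_kfold(labels, train, k=5))
```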
Support Vector Machine (SVM). SVM was employed to classify the nine A. roxburghii varieties from their hyperspectral data. The SVM algorithm finds the optimal hyperplane that maximizes the margin between classes, which gives it strong generalization capability. The Radial Basis Function (RBF) kernel was selected for its effectiveness in handling non-linear data patterns. Before training, the spectral data were standardized with StandardScaler so that all features shared a common scale. Hyperparameters were optimized by grid search with 5-fold cross-validation. They included the penalty parameter (C), kernel type, gamma, and polynomial degree (which applies only to the polynomial kernel). Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices, providing a comprehensive assessment of classification accuracy and errors.
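Two ingredients named above can be illustrated with NumPy. The first is z-score scaling fitted on the training set only, which is the behaviour of StandardScaler. The second is the RBF kernel matrix that the SVM works with. The data and the candidate values of C and gamma are hypothetical, because the study does not list its grid. A complete pipeline would normally combine these steps with an SVM solver, for example scikit-learn's `SVC` inside `GridSearchCV`.

```python
import itertools
import numpy as np

class Standardizer:
    """Z-score scaling whose statistics come from the training set only."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)                # population std (ddof=0)
        self.scale_[self.scale_ == 0.0] = 1.0      # leave constant bands unscaled
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

# Hypothetical spectra: 20 training and 5 test samples with 8 bands
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.4, scale=0.05, size=(20, 8))
X_test = rng.normal(loc=0.4, scale=0.05, size=(5, 8))
scaler = Standardizer().fit(X_train)              # never fitted on test data
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)
K_test = rbf_kernel(Z_test, Z_train, gamma=0.1)   # (5, 20) kernel against training set

# Hypothetical grid; each combination is scored by 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
combos = list(itertools.product(*param_grid.values()))
n_fits = len(combos) * 5                          # 12 combinations x 5 folds
```

Fitting the scaler on the training folds only keeps test-set statistics from leaking into the model. This matters especially when each class has just a few test spectra.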
K-Nearest Neighbors (KNN). The KNN algorithm was used for spectral classification, where the classification of a sample is based on the majority vote of its nearest neighbors in the feature space. The KNN algorithm first calculates the distance between the unknown sample and all samples in the training set using distance metrics such as Euclidean, Manhattan, or Minkowski distance. The k closest neighbors are selected, and the voting method (counting the frequency of each category in the k neighbors) is used to determine the sample’s category.
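The distance-then-vote procedure above is short enough to show in full. The sketch below uses the Minkowski distance, which reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The value of k, the tie-breaking rule, and the toy two-class data are illustrative assumptions.

```python
from collections import Counter
import numpy as np

def minkowski_distances(X_train, x, p=2):
    """Distances from x to every training row: p=1 Manhattan, p=2 Euclidean."""
    return np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)

def knn_predict(X_train, y_train, X_query, k=3, p=2):
    preds = []
    for x in X_query:
        dist = minkowski_distances(X_train, x, p)
        nearest = np.argsort(dist, kind="stable")[:k]
        # Majority vote; Counter.most_common keeps first-seen order on ties,
        # so a tie goes to the class of the closer neighbour
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = ["A", "A", "A", "B", "B", "B"]
X_query = np.array([[0.2, 0.3], [5.4, 5.1]])
preds_euclid = knn_predict(X_train, y_train, X_query, k=3, p=2)
preds_manhattan = knn_predict(X_train, y_train, X_query, k=3, p=1)
```

Because KNN has no training phase, all the work happens at prediction time. This also makes it sensitive to the scaling of the spectra.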
Linear Discriminant Analysis (LDA). LDA is used to extract features and classify spectral data by maximizing the separability between classes. Based on Bayesian theory, LDA calculates the intra-class scatter matrix, which measures the dispersion of samples within the same class, and the inter-class scatter matrix, which reflects the differences between classes. By solving a generalized eigenvalue problem, LDA finds the optimal projection direction, which is then used to project the data into a lower-dimensional space for classification.
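A compact Fisher-LDA sketch follows. It builds the within-class and between-class scatter matrices and solves the generalized eigenvalue problem through the pseudo-inverse of the within-class scatter; this is a simple choice, not necessarily the one the authors used. Projected samples are then classified by the nearest class mean, one common decision rule. The two-class data are synthetic.

```python
import numpy as np

def lda_fit(X, y, n_components=1):
    """Fisher LDA: directions maximising between-class vs within-class scatter."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))   # within-class (intra-class) scatter
    S_b = np.zeros((d, d))   # between-class (inter-class) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Generalised eigenproblem S_b w = lambda S_w w, solved as eig(pinv(S_w) S_b)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

def nearest_centroid_predict(Z_train, y_train, Z_query):
    """Assign each projected sample to the class with the nearest projected mean."""
    classes = np.unique(y_train)
    cents = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(Z_query[:, None, :] - cents[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 0.5, size=(20, 3)),
               rng.normal([4.0, 1.0, 0.0], 0.5, size=(20, 3))])
y = np.array([0] * 20 + [1] * 20)
W = lda_fit(X, y, n_components=1)   # at most (n_classes - 1) informative directions
Z = X @ W
pred = nearest_centroid_predict(Z, y, Z)
```

With C classes, S_b has rank at most C - 1. For the nine-variety task, LDA can therefore yield at most eight discriminant directions.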
Convolutional Neural Network (CNN). CNN was employed to classify the hyperspectral data of A. roxburghii and its varieties. The model consists of three 1D convolutional layers, each followed by max-pooling, to extract and downsample features from the spectral data. The convolutional layers use ReLU activation and L1 regularization to prevent overfitting. After feature extraction, the data is flattened and passed through a fully connected layer with 128 neurons, followed by a softmax output layer with 9 units corresponding to the 9 plant categories, as shown in Figure 2. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss and trained for 500 epochs. This architecture allows the model to learn complex patterns in the hyperspectral data, achieving accurate classification of different varieties of A. roxburghii.
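The length of the flattened feature vector depends on kernel sizes, padding, and pool sizes, which the text does not specify. The helper below traces the shapes through three convolution and max-pooling stages under these hypothetical settings:
- 512 input bands (the camera's band count);
- kernel size 3 with "valid" padding;
- pool size 2;
- 32, 64, and 128 filters in the three convolutional layers.

It also counts the parameters of the 128-unit dense layer and the 9-unit softmax output.

```python
def conv1d_out(n, kernel, stride=1):
    """Output length of a 'valid' (unpadded) 1D convolution."""
    return (n - kernel) // stride + 1

def maxpool1d_out(n, pool=2):
    """Output length of non-overlapping max-pooling (stride = pool size)."""
    return (n - pool) // pool + 1

def trace_cnn(n_bands=512, kernels=(3, 3, 3), filters=(32, 64, 128),
              pool=2, dense_units=128, n_classes=9):
    length, shapes = n_bands, []
    for k, f in zip(kernels, filters):
        length = maxpool1d_out(conv1d_out(length, k), pool)
        shapes.append((length, f))                 # (steps, channels) after pooling
    flat = length * filters[-1]                    # size after Flatten
    dense_params = flat * dense_units + dense_units          # weights + biases
    softmax_params = dense_units * n_classes + n_classes     # weights + biases
    return shapes, flat, dense_params, softmax_params

shapes, flat, dense_params, softmax_params = trace_cnn()
```

Under these assumptions, the dense layer alone holds about one million weights, far more than the number of training spectra. This is one reason a regularizer such as the L1 penalty matters in a small-sample setting.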

3. Results and Discussion

3.1. Authenticity Identification

The raw spectral curves of the nine A. roxburghii varieties and the counterfeit species (Figure 3a,b) were highly similar across most wavelengths, which made visual differentiation difficult. Machine learning algorithms were therefore used to extract discriminative spectral features. Among the tested models, the Support Vector Machine (SVM) performed exceptionally well in distinguishing A. roxburghii from its counterfeits. Applied to spectra from both the adaxial (front) and abaxial (back) leaf surfaces, the SVM model achieved 100% classification accuracy in all authenticity identification tasks. The confusion matrices (Figure 3c,d) show no misclassifications, confirming that the model accurately separates A. roxburghii from the counterfeit species.

The confusion matrices report results on the test set, which comprises 20% of the samples of each class. Because each variety contains 10 leaves, this yields two test samples per class, leading to values such as 8/8 instead of 10/10. The matrices for both front and back leaf classifications are diagonal (e.g., [8 0], [0 8]), indicating zero misclassifications in all cases.

To assess the risk of overfitting [26], a learning curve analysis was conducted (Figure 3e). The figure shows model accuracy as a function of training sample size, with training accuracy in blue and validation accuracy in red. As the number of training samples increases, both accuracies remain stable with minimal discrepancy. With 35 training samples, the difference between training and validation accuracy is less than 0.04. Once the training set reaches 50 samples, both accuracies converge to 1, indicating consistent performance on training and validation sets without signs of overfitting. Because the training and validation accuracies converge smoothly, these results support the model's strong generalization capability.
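The confusion matrices and accuracy values discussed above follow from simple counting. The sketch below builds a confusion matrix, with rows as true classes and columns as predictions, and derives accuracy from its diagonal. The labels are hypothetical and chosen to reproduce the diagonal [8 0], [0 8] pattern; a single misclassification is added for contrast.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))

labels = ["authentic", "counterfeit"]
y_true = ["authentic"] * 8 + ["counterfeit"] * 8    # hypothetical test labels
y_pred = list(y_true)                                # a perfect classifier
cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

# One authentic sample predicted as counterfeit, for contrast
y_pred_err = ["counterfeit"] + y_true[1:]
cm_err = confusion_matrix(y_true, y_pred_err, labels)
```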
From a mechanistic perspective, the spectral differences between A. roxburghii and its counterfeit species are driven mainly by variations in chemical composition, surface texture, and light reflectance characteristics. These differences appear in the distinct spectral responses of the front and back leaf surfaces, which depend on factors such as chlorophyll content, cell arrangement, and surface morphology. The SVM exploited these characteristics within the near-infrared range covered by the imaging system (900 nm to 1700 nm) to form clear decision boundaries between authentic and counterfeit samples. Two properties of the SVM account for its superior performance:

(1) High-dimensional feature space [27]. With a Radial Basis Function (RBF) kernel, the SVM implicitly maps the input spectra into a higher-dimensional space. Classes that are not linearly separable in the original space can become linearly separable there, which accentuates the spectral differences between authentic and counterfeit species.

(2) Margin maximization [28]. The SVM maximizes the margin between classes, which promotes robust generalization. This helps limit overfitting, a particular concern for hyperspectral data, where high dimensionality combined with small sample sizes favors overfitting.

In addition, the SVM's robustness to noise and outliers contributed to stable performance despite minor spectral fluctuations [29]. These results underscore the suitability of SVM for hyperspectral authenticity identification, particularly with multi-view spectral data. Its ability to distinguish A. roxburghii from its counterfeit species highlights its potential as a powerful, non-destructive tool for authenticating herbal medicines.

3.2. Traditional Machine Learning Models for Goldthread Classification

Traditional machine learning methods such as SVM, KNN, and LDA have been widely applied to hyperspectral plant species identification. However, their performance often depends heavily on feature selection and preprocessing. Preprocessing is particularly important for hyperspectral data, which are often affected by noise and incomplete measurements. This section evaluates these traditional models on the classification of nine A. roxburghii varieties using different preprocessing methods: MF, AF, GF, SG filtering, and PCA. Model performance was assessed with four metrics:
- accuracy, the proportion of correct predictions;
- precision, the reliability of positive predictions;
- recall, the model's ability to identify relevant instances;
- F1-score, the harmonic mean of precision and recall.

The results are shown in Figure 4.
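Per-class precision, recall, and F1-score can be read off a multi-class confusion matrix, as sketched below. The text does not state how the per-class values were averaged, so macro averaging (an unweighted mean over classes) is assumed here. The 3-class matrix is made up for illustration.

```python
def per_class_metrics(cm):
    """(precision, recall, f1) for each class; rows = true, columns = predicted."""
    n = len(cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually another class
        fn = sum(cm[c]) - tp                        # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out.append((precision, recall, f1))
    return out

def macro_average(metrics):
    """Unweighted mean of each metric over classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))

cm = [[2, 0, 0],
      [1, 1, 0],
      [0, 0, 2]]
per_class = per_class_metrics(cm)
macro_p, macro_r, macro_f1 = macro_average(per_class)
```

Here accuracy (5/6) and macro recall coincide because every class has the same number of test samples. With unequal class sizes the two would differ.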
Despite the range of preprocessing methods and model combinations tested, classification accuracy was generally low, and preprocessing brought little improvement. Among the tested models, SVM performed best on the back leaf surface of A. roxburghii. It reached values of around 0.8 for all four metrics with either AF or SG preprocessing. KNN performed poorly, with most metrics below 0.5, and showed no significant improvement across preprocessing methods. This is likely related to KNN's sensitivity to local fluctuations in the data: when the data distribution is relatively simple and lacks major variations, preprocessing has little effect on the results. LDA depended strongly on preprocessing, but its overall performance was modest. Its accuracy was around 0.5 for the front leaf surface and around 0.7 for the back, with a best value of 0.8 obtained using GF.

These results indicate that SVM, KNN, and LDA, combined with various preprocessing methods, did not achieve the desired classification accuracy. Traditional machine learning models appear to struggle with the complexity of hyperspectral data, especially when classifying the fine spectral differences among A. roxburghii varieties. Their relatively low performance may reflect the difficulty of handling high-dimensional, noisy data and the inherent limitations of hand-crafted feature extraction. More effective classification models are therefore needed to handle the complexity of hyperspectral data and improve classification accuracy.

3.3. Multi-View Spectral Fusion Model

We further employed a Convolutional Neural Network (CNN) to classify the nine varieties of A. roxburghii, and Figure 5 presents the training results. The model was trained with the sparse categorical cross-entropy loss, which suits multi-class classification when labels are encoded as integers. The loss is defined as follows [30]:
$$L = -\sum_{i=1}^{C} y_i \log(p_i)$$
where L is the loss value, C is the total number of classes, y_i is the true label for class i (1 if the sample belongs to class i, otherwise 0), and p_i is the model's predicted probability for class i.

Figure 5a shows the loss during training: as the number of epochs increases, the loss decreases markedly and then stabilizes, indicating effective learning. Figure 5b shows the training accuracy, which steadily approaches 1 with diminishing fluctuation, further confirming the stability and reliability of the model. Figure 5c displays the confusion matrix of the trained model on the test set, in which every sample in every category was classified correctly. Table 1 compares all models (SVM, KNN, LDA, and CNN) in terms of accuracy, precision, recall, and F1-score. It highlights the CNN model's advantage over the traditional machine learning approaches.

To further evaluate reliability, we computed standard deviations and 95% confidence intervals (CI). A single test-set evaluation does not yield a standard deviation of test accuracy. We therefore report the standard deviation and 95% CI of training accuracy over multiple runs. Table 1 lists the test accuracy alongside the mean and standard deviation of training accuracy. The CNN model consistently achieved 100% accuracy, whereas the traditional machine learning models showed greater variability. The 95% CI values indicate that the training process is stable, easing concerns about overfitting.

This result demonstrates the strong performance of the multi-view spectral fusion model, which combined spectral data from the adaxial and abaxial leaf surfaces to improve classification accuracy. Used alone, the two surfaces gave test accuracies of 0.8889 (adaxial) and 0.9722 (abaxial).
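The sketch below first checks, for one hypothetical prediction, that the one-hot form of the loss equation above reduces to the negative log-probability of the true class. That reduced form is what sparse categorical cross-entropy computes from an integer label. The sketch then shows one way to obtain a mean, standard deviation, and 95% CI across repeated runs. The normal-approximation multiplier 1.96 and the accuracy values are assumptions, since the text gives neither the CI formula nor the per-run numbers.

```python
import math
import statistics

def cross_entropy_one_hot(probs, one_hot, eps=1e-12):
    """L = -sum_i y_i * log(p_i) for a one-hot target vector y."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

def sparse_cross_entropy(probs, label, eps=1e-12):
    """Same loss from an integer label: only the true-class term survives."""
    return -math.log(max(probs[label], eps))

def mean_sd_ci95(values, z=1.96):
    """Mean, sample SD, and a normal-approximation 95% CI of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # n - 1 in the denominator
    half = z * sd / math.sqrt(len(values))
    return mean, sd, (mean - half, mean + half)

probs = [0.1, 0.7, 0.2]                           # hypothetical softmax output, 3 classes
loss_sparse = sparse_cross_entropy(probs, label=1)
loss_onehot = cross_entropy_one_hot(probs, [0, 1, 0])

run_accuracies = [0.96, 0.98, 0.97, 0.99, 0.95]   # made-up per-run training accuracies
mean, sd, (lo, hi) = mean_sd_ci95(run_accuracies)
```

With only a handful of runs, a Student t multiplier gives a wider, more conservative interval than 1.96; for five runs it is about 2.78.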
To assess whether the performance differences among models are statistically significant, we conducted a one-way ANOVA on the accuracy results of all nine models. The analysis yielded an F-statistic of 298.59 and a p-value of approximately 9.03 × 10⁻⁵⁷, indicating highly significant differences among model performances (p < 0.001). A post hoc Tukey HSD test was then applied for pairwise comparisons between models. The results, visualized in Figure 5d, show that the CNN-based models (CNN-Front, CNN-Back, CNN-Fusion) significantly outperformed the traditional machine learning models (SVM, KNN, LDA). This provides robust statistical evidence for the effectiveness of deep learning in this classification task.

The 100% accuracy shown in Figure 5c can be attributed to several factors: the complementarity of the two views, data augmentation, and the feature diversity afforded by the model's structure and optimization. By using spectral data from both the front and back leaf surfaces, the model draws on multi-view information and obtains richer feature representations. The spectral responses of the two surfaces may differ in physical properties such as light reflectance and scattering. These differences reflect the distinct structures and compositions of the surfaces and ultimately improve classification precision. Because the front and back spectra are complementary, combining them enhances the robustness and accuracy of the model. Moreover, different varieties exhibit significant spectral differences, particularly in specific wavelength regions, and the two surfaces offer different perspectives that help capture these subtle differences. Incorporating this additional feature dimension (i.e., both front and back spectra) gives the model more information, thereby improving its generalization ability. 
The introduction of feature diversity allows the model to identify more potential patterns during training, thereby enhancing classification accuracy. Unlike traditional hand-crafted feature extraction methods, deep learning models such as CNNs can learn optimal feature combinations automatically through end-to-end training, which yields more precise classifications. Networks with more layers and neurons can, in principle, process more complex spectral data and extract deeper features. Overall, the classification model developed in this study showed excellent training performance and generalization ability. These results highlight the potential of CNN-based models for hyperspectral data classification, particularly with multi-view spectral fusion. The model's ability to capture fine spectral differences between A. roxburghii varieties underscores the effectiveness of deep learning for complex plant identification tasks. Our findings suggest that even with a limited number of samples, HSI combined with machine learning can effectively classify A. roxburghii varieties and detect adulterants. This highlights the potential of small-sample learning approaches in spectral analysis, particularly for rare medicinal plants for which large-scale data collection is challenging.
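The F-statistic reported for the one-way ANOVA compares between-model and within-model variation in accuracy. The pure-Python sketch below computes it for toy groups and does not reproduce the paper's data. If SciPy is available, `scipy.stats.f_oneway` returns the same statistic together with its p-value, and `scipy.stats.tukey_hsd` performs the pairwise post hoc comparisons.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # e.g. per-run accuracies of 3 models
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

A large F means that the spread between model means dwarfs the run-to-run spread within each model. The significant F is what justifies running the Tukey comparisons afterwards.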

4. Conclusions

This study successfully demonstrated the application of hyperspectral imaging and machine learning techniques for the high-precision classification and authenticity identification of Anoectochilus roxburghii and its counterfeit species. Among the machine learning models tested, the SVM model showed exceptional performance, achieving 100% accuracy in distinguishing Goldthread from its counterfeit species by leveraging spectral data from both the front and back leaves. Traditional machine learning models, such as KNN and LDA, exhibited relatively lower classification accuracy, especially when applied to complex hyperspectral data, highlighting the limitations of these models in handling high-dimensional features. The introduction of the multi-view spectral fusion model, which combines the spectral data from both the front and back sides of the leaves, significantly improved classification performance and robustness, demonstrating the benefits of utilizing complementary information for classification tasks. The findings of this study offer valuable insights into the potential applications of hyperspectral imaging and machine learning for authenticity identification and counterfeit detection in medicinal plants, with implications for improving quality control and preventing fraud in the herbal medicine industry. Furthermore, the proposed multi-view fusion model provides a promising approach for improving the accuracy of plant species identification, which could be extended to a wide range of botanical applications. In addition, these results contribute to the broader ecological and economic context by promoting sustainable use of genuine herbal resources, protecting endangered medicinal plant species from overexploitation, and supporting fair market practices through reliable authentication technologies. Such advancements are crucial for maintaining biodiversity, ensuring consumer safety, and enhancing the integrity of the herbal medicine supply chain.

Author Contributions

Visualization, investigation, and writing (original draft), Y.X.; methodology, H.D.; software, T.Z.; validation, Z.W.; resources, H.W.; supervision, L.Z.; project administration, Y.D.; conceptualization, funding acquisition, and writing (review and editing), Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Zhejiang Province Welfare Technology Applied Research Project (LGC22C130001), the Zhejiang Provincial Natural Science Foundation of China (ZCLQN25A0408), and the Zhejiang Xinmiao Talents Program (2024R412B050).

Data Availability Statement

A subset of the dataset, including representative hyperspectral images and classification labels, is publicly available on figshare at https://doi.org/10.6084/m9.figshare.28711589. The full dataset is available upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, C.Q.; Chen, X.Y.; Liu, Y.H.; Dai, D.J. First report of Fusarium oxysporum associated with stem rot on seedlings of Jinxianlian (Anoectochilus roxburghii) in China. Plant Dis. 2022, 106, 1991.
  2. Chen, X.Y.; Zhang, C.Q.; Zhou, X.J.; Zhu, L.Y.; He, X.C. First report of gray mold on Jinxianlian (Anoectochilus roxburghii) caused by Botrytis cinerea in China. Plant Dis. 2020, 104, 1861.
  3. Xing, B.; Wan, S.; Su, L.; Riaz, M.W.; Li, L.; Ju, Y.; Shao, Q. Two polyamines-responsive WRKY transcription factors from Anoectochilus roxburghii play opposite functions on flower development. Plant Sci. 2023, 327, 111566.
  4. Wang, H.; Chen, X.; Yan, X.; Xu, Z.; Shao, Q.; Wu, X.; Wang, H. Induction, proliferation, regeneration and kinsenoside and flavonoid content analysis of the Anoectochilus roxburghii (Wall.) Lindl. protocorm-like body. Plants 2022, 11, 2465.
  5. Jin, Q.R.; Mao, J.W.; Zhu, F. The effects of Anoectochilus roxburghii polysaccharides on the innate immunity and disease resistance of Procambarus clarkii. Aquaculture 2022, 555, 738210.
  6. Han, T.; Xu, E.; Yao, L.; Zheng, B.; Younis, A.; Shao, Q. Regulation of flowering time using temperature, photoperiod and spermidine treatments in Anoectochilus roxburghii. Physiol. Mol. Biol. Plants 2020, 26, 247–260.
  7. Klein-Junior, L.C.; de Souza, M.R.; Viaene, J.; Bresolin, T.M.B.; de Gasper, A.L.; Henriques, A.T.; Heyden, Y.V. Quality Control of Herbal Medicines: From Traditional Techniques to State-of-the-art Approaches. Planta Med. 2021, 87, 964–988.
  8. Song, C.; Liu, Y.; Song, A.; Dong, G.; Zhao, H.; Sun, W.; Chen, S. The Chrysanthemum nankingense genome provides insights into the evolution and diversification of chrysanthemum flowers and medicinal traits. Mol. Plant 2018, 11, 1482–1491.
  9. He, S.; Wang, D.; Zhang, Y.; Yang, S.; Li, X.; Wei, D.; Qin, J. Chemical components and biological activities of the essential oil from traditional medicinal food, Euryale ferox Salisb., seeds. J. Essent. Oil Bear. Plants 2019, 22, 73–81.
  10. Li, Q.; Zhu, T.; Zhang, R.; Bu, Q.; Yin, J.; Zhang, L.; Chen, W. Molecular cloning and functional analysis of hyoscyamine 6β-hydroxylase (H6H) in the poisonous and medicinal plant Datura innoxia Mill. Plant Physiol. Biochem. 2020, 153, 11–19.
  11. Tong, Y.; Xue, J.; Li, Q.; Zhang, L. A generalist regulator: MYB transcription factors regulate the biosynthesis of active compounds in medicinal plants. J. Exp. Bot. 2024, 75, 4729–4744.
  12. Kiani, S.; van Ruth, S.M.; Minaei, S. Hyperspectral imaging, a non-destructive technique in medicinal and aromatic plant products industry: Current status and potential future applications. Comput. Electron. Agric. 2018, 152, 9–18.
  13. Li, S.; Wang, Z.; Shao, Q.; Fang, H.; Zhu, J.; Wu, X.; Zheng, B. Rapid detection of adulteration in Anoectochilus roxburghii by near-infrared spectroscopy coupled with chemometric methods. J. Food Sci. Technol. 2018, 55, 3518–3525.
  14. Chai, Q.; Zeng, J.; Lin, D.; Li, X.; Huang, J.; Wang, W. Improved 1D convolutional neural network adapted to near-infrared spectroscopy for rapid discrimination of Anoectochilus roxburghii and its counterfeits. J. Pharm. Biomed. Anal. 2021, 199, 114035.
  15. Li, Y.; Cai, B. Study on Classification of Anoectochilus roxburghii Strains by Hand-Held Near Infrared Spectrometer. Lect. Notes Electr. Eng. 2022, 805, 319–326.
  16. Naeem, S.; Ali, A.; Chesneau, C.; Tahir, M.H.; Jamal, F.; Sherwani, R.A.K.; Ul Hassan, M. The Classification of Medicinal Plant Leaves Based on Multispectral and Texture Feature Using Machine Learning Approach. Agronomy 2021, 11, 263.
  17. Zhou, Y.; Li, X.; Chen, C.; Zhou, L.; Zhao, Y.; Chen, J.; Du, H. Coupling the PROSAIL Model and Machine Learning Approach for Canopy Parameter Estimation of Moso Bamboo Forests from UAV Hyperspectral Data. Forests 2024, 15, 946.
  18. García-Vera, Y.E.; Polochè-Arango, A.; Mendivelso-Fajardo, C.A.; Gutiérrez-Bernal, F.J. Hyperspectral image analysis and machine learning techniques for crop disease detection and identification: A review. Sustainability 2024, 16, 6064.
  19. Dai, Y.; Gao, X.; Liu, Z. Accuracy Improvement of Mn Element in Aluminum Alloy by the Combination of LASSO-LSSVM and Laser-Induced Breakdown Spectroscopy. Spectrosc. Spectral Anal. 2024, 44, 977–982.
  20. Hu, G.; Liu, Y.; Chu, X.; Liu, Z. Fourier ptychographic layer-based imaging of hazy environments. Results Phys. 2024, 56, 107216.
  21. Ma, Q.; Liu, Z.; Sun, T.; Gao, X.; Dai, Y. Small-sample stacking model for qualitative analysis of aluminum alloys based on femtosecond laser-induced breakdown spectroscopy. Opt. Express 2023, 31, 27633–27653.
  22. Ma, Q.; Liu, Z.; Zhang, T.; Zhao, S.; Gao, X.; Sun, T.; Dai, Y. Multielement simultaneous quantitative analysis of trace elements in stainless steel via full spectrum laser-induced breakdown spectroscopy. Talanta 2024, 272, 125745.
  23. Liu, Z.; Ma, Q.; Zhang, T.; Zhao, S.; Gao, X.; Sun, T.; Dai, Y. Quantitative modeling and uncertainty estimation for small-sample LIBS using Gaussian negative log-likelihood and Monte Carlo dropout methods. Opt. Laser Technol. 2025, 181, 111720.
  24. Dai, Y.; Ma, Q.; Zhang, T.; Zhao, S.; Zhou, L.; Gao, X.; Liu, Z. Classification of aluminum alloy using laser-induced breakdown spectroscopy combined with discriminative restricted Boltzmann machine. Chemom. Intell. Lab. Syst. 2025, 258, 105342.
  25. Song, Y.; Cao, S.; Chu, X.; Zhou, Y.; Xu, Y.; Sun, T.; Zhou, G.; Liu, X. Non-destructive detection of moisture and fatty acid content in rice using hyperspectral imaging and chemometrics. J. Food Compos. Anal. 2023, 121, 105397.
  26. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Lindauer, M. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484.
  27. Gong, Q.; Yu, J.; Guo, Z.; Fu, K.; Xu, Y.; Zou, H.; Han, Z. Accumulation mechanism of metabolite markers identified by machine learning between Qingyuan and Xiushui counties in Polygonatum cyrtonema Hua. BMC Plant Biol. 2024, 24, 173.
  28. Nie, F.; Hao, Z.; Wang, R. Multi-Class Support Vector Machine with Maximizing Minimum Margin. In Proceedings of the AAAI’24/IAAI’24/EAAI’24, Vancouver, BC, Canada, 20–27 February 2024; pp. 14466–14473.
  29. Salcedo-Sanz, S.; Rojo-Álvarez, L.; Martínez-Ramón, M.; Camps-Valls, G. Support vector machines in engineering: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2014, 4, 234–267.
  30. Zhu, Q.; Zhang, P.; Wang, Z.; Ye, X. A New Loss Function for CNN Classifier Based on Predefined Evenly-Distributed Class Centroids. IEEE Access 2020, 8, 10888–10895.
Figure 1. (a) Schematic diagram of the hyperspectral experimental setup; (b) photographs of the 9 varieties of A. roxburghii samples; (c) hyperspectral images of the samples.
Figure 2. Architecture of the CNN model used for classifying hyperspectral data of A. roxburghii varieties.
Figure 3. (a,b) Original hyperspectral spectra of the adaxial (front) and abaxial (back) surfaces of A. roxburghii; (c,d) confusion matrices for the classification of A. roxburghii (adaxial and abaxial) versus counterfeit species (adaxial and abaxial). (e) Learning curve. The blue line represents training accuracy, and the red line represents validation accuracy across different training sample sizes.
Figure 4. Classification results of A. roxburghii using SVM, KNN, and LDA. (a,c,e) represent the classification results for the adaxial (front) leaf surface; (b,d,f) show the classification results for the abaxial (back) leaf surface.
Figure 5. Training results of the CNN model. (a) Loss function during training. (b) Accuracy progression during training. (c) Confusion matrix of the model’s classification performance on the test set. (d) Results of the Tukey HSD test comparing model classification accuracy.
Table 1. Comparison of classification performance across different models.
| Model | View | Accuracy | Precision | Recall | F1-Score | Train Accuracy | 95% CI (Train Accuracy) |
|---|---|---|---|---|---|---|---|
| SVM | Front | 0.5694 | 0.5844 | 0.5694 | 0.5704 | 0.5243 ± 0.0474 | [0.4584, 0.5901] |
| SVM | Back | 0.8056 | 0.8113 | 0.8056 | 0.8055 | 0.6665 ± 0.0276 | [0.6281, 0.7048] |
| MF-SVM | Front | 0.5694 | 0.5844 | 0.5694 | 0.5704 | 0.5383 ± 0.0404 | [0.4881, 0.5885] |
| MF-SVM | Back | 0.7500 | 0.7738 | 0.7500 | 0.7545 | 0.6629 ± 0.0350 | [0.6195, 0.7063] |
| AF-SVM | Front | 0.5694 | 0.5844 | 0.5694 | 0.5704 | 0.5382 ± 0.0388 | [0.4843, 0.5920] |
| AF-SVM | Back | 0.7917 | 0.8304 | 0.7917 | 0.7936 | 0.6699 ± 0.0380 | [0.6171, 0.7227] |
| GF-SVM | Front | 0.6250 | 0.6713 | 0.6250 | 0.6147 | 0.5416 ± 0.0263 | [0.5089, 0.5743] |
| GF-SVM | Back | 0.7361 | 0.7882 | 0.7361 | 0.7428 | 0.6700 ± 0.0213 | [0.6436, 0.6964] |
| SG-SVM | Front | 0.5833 | 0.6001 | 0.5833 | 0.5782 | 0.5415 ± 0.0343 | [0.5115, 0.5715] |
| SG-SVM | Back | 0.7917 | 0.8304 | 0.7917 | 0.7936 | 0.6699 ± 0.0380 | [0.6366, 0.7033] |
| PCA-SVM | Front | 0.5694 | 0.5953 | 0.5694 | 0.5731 | 0.5070 ± 0.0344 | [0.4643, 0.5497] |
| PCA-SVM | Back | 0.6389 | 0.6624 | 0.6389 | 0.6371 | 0.6495 ± 0.0275 | [0.6153, 0.6837] |
| KNN | Front | 0.3750 | 0.3199 | 0.3750 | 0.3302 | 0.4056 ± 0.0893 | [0.2946, 0.5165] |
| KNN | Back | 0.5139 | 0.5528 | 0.5139 | 0.5144 | 0.4750 ± 0.0753 | [0.3816, 0.5684] |
| MF-KNN | Front | 0.3750 | 0.3215 | 0.3750 | 0.3310 | 0.4056 ± 0.0972 | [0.2849, 0.5262] |
| MF-KNN | Back | 0.5139 | 0.5528 | 0.5139 | 0.5144 | 0.4806 ± 0.0759 | [0.3864, 0.5748] |
| AF-KNN | Front | 0.3750 | 0.3235 | 0.3750 | 0.3319 | 0.4586 ± 0.0359 | [0.4140, 0.5032] |
| AF-KNN | Back | 0.5139 | 0.5528 | 0.5139 | 0.5144 | 0.4615 ± 0.0653 | [0.3805, 0.5426] |
| GF-KNN | Front | 0.3750 | 0.3215 | 0.3750 | 0.3310 | 0.4056 ± 0.0972 | [0.2849, 0.5262] |
| GF-KNN | Back | 0.5139 | 0.5528 | 0.5139 | 0.5144 | 0.4806 ± 0.0759 | [0.3864, 0.5748] |
| SG-KNN | Front | 0.3750 | 0.3215 | 0.3750 | 0.3310 | 0.4517 ± 0.0380 | [0.4184, 0.4850] |
| SG-KNN | Back | 0.5139 | 0.5528 | 0.5139 | 0.5144 | 0.4615 ± 0.0653 | [0.4043, 0.5187] |
| PCA-KNN | Front | 0.4583 | 0.5152 | 0.4583 | 0.4585 | 0.4064 ± 0.0655 | [0.3489, 0.4638] |
| PCA-KNN | Back | 0.6250 | 0.6515 | 0.6250 | 0.6224 | 0.5694 ± 0.0359 | [0.5379, 0.6008] |
| LDA | Front | 0.5694 | 0.5665 | 0.5694 | 0.5609 | 0.5139 ± 0.1185 | [0.3667, 0.6610] |
| LDA | Back | 0.7500 | 0.7625 | 0.7500 | 0.7517 | 0.6833 ± 0.0671 | [0.6000, 0.7667] |
| MF-LDA | Front | 0.4028 | 0.4059 | 0.4028 | 0.3923 | 0.4410 ± 0.0817 | [0.3694, 0.5125] |
| MF-LDA | Back | 0.6944 | 0.7205 | 0.6944 | 0.6893 | 0.7044 ± 0.0734 | [0.6401, 0.7688] |
| AF-LDA | Front | 0.3472 | 0.3847 | 0.3472 | 0.3464 | 0.4417 ± 0.0397 | [0.3924, 0.4909] |
| AF-LDA | Back | 0.6528 | 0.6943 | 0.6528 | 0.6500 | 0.6194 ± 0.0509 | [0.5562, 0.6827] |
| GF-LDA | Front | 0.5278 | 0.5284 | 0.5278 | 0.5125 | 0.4621 ± 0.1120 | [0.3231, 0.6012] |
| GF-LDA | Back | 0.7917 | 0.8037 | 0.7917 | 0.7878 | 0.6909 ± 0.0589 | [0.6178, 0.7640] |
| SG-LDA | Front | 0.4306 | 0.4858 | 0.4306 | 0.4470 | 0.4204 ± 0.0864 | [0.3005, 0.5404] |
| SG-LDA | Back | 0.5417 | 0.5983 | 0.5417 | 0.5424 | 0.6245 ± 0.0620 | [0.5385, 0.7105] |
| PCA-LDA | Front | 0.5556 | 0.5663 | 0.5556 | 0.5532 | 0.5106 ± 0.0654 | [0.4532, 0.5680] |
| PCA-LDA | Back | 0.6389 | 0.6629 | 0.6389 | 0.6372 | 0.6075 ± 0.0435 | [0.5694, 0.6456] |
| CNN | Front | 0.9028 | 0.9387 | 0.9028 | 0.9039 | 0.9531 ± 0.0064 | [0.9452, 0.9610] |
| CNN | Back | 0.9722 | 0.9773 | 0.9722 | 0.9690 | 0.9939 ± 0.0052 | [0.9874, 1.0000] |
| CNN | Fusion | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 ± 0.0000 | [1.0000, 1.0000] |
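In every row of Table 1, Accuracy equals Recall. That pattern is what support-weighted averaging over classes produces (for example, scikit-learn's `average='weighted'` option), because weighted recall collapses algebraically to overall accuracy: Σ_c (n_c/N)·(TP_c/n_c) = Σ_c TP_c/N. The averaging scheme, the number of cross-validation folds, and the CI formula are not stated in this section, so the sketch below is an assumption-labeled illustration rather than the authors' evaluation code. It computes the test-set columns from a confusion matrix and summarizes per-fold training accuracies as mean ± sample SD with a Student-t 95% interval. The confusion matrix, the fold scores, and the five-fold setting are all hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def weighted_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy and support-weighted precision, recall and F1 from a square
    confusion matrix (rows = true class, columns = predicted class)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    out = {"accuracy": sum(cm[i][i] for i in range(k)) / total,
           "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(k):
        tp = cm[c][c]
        support = sum(cm[c])                          # true members of class c
        predicted = sum(cm[row][c] for row in range(k))  # samples predicted as c
        prec = tp / predicted if predicted else 0.0
        rec = tp / support if support else 0.0
        f1_c = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support / total                           # class weight = its share of samples
        out["precision"] += w * prec
        out["recall"] += w * rec
        out["f1"] += w * f1_c
    return out


def mean_sd_ci(scores: Sequence[float],
               t_crit: float) -> Tuple[float, float, Tuple[float, float]]:
    """Mean, sample standard deviation and a two-sided Student-t interval
    mean +/- t_crit * sd / sqrt(n), clipped to [0, 1] because scores are accuracies."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)                     # n - 1 in the denominator
    half = t_crit * sd / math.sqrt(n)
    return mean, sd, (max(0.0, mean - half), min(1.0, mean + half))


# Hypothetical 3-class confusion matrix: 8 of 12 samples correct.
cm = [[3, 1, 0],
      [0, 2, 2],
      [1, 0, 3]]
m = weighted_metrics(cm)
print(round(m["accuracy"], 4), round(m["recall"], 4))  # 0.6667 0.6667

# Hypothetical training accuracies from 5 folds; t(0.975, df=4) is about 2.776.
mean, sd, ci = mean_sd_ci([0.90, 0.92, 0.94, 0.96, 0.98], t_crit=2.776)
print(f"{mean:.4f} ± {sd:.4f} [{ci[0]:.4f}, {ci[1]:.4f}]")  # 0.9400 ± 0.0316 [0.9007, 0.9793]
```

Clipping the interval to [0, 1] is a design choice for bounded scores; with very few folds, a t critical value is preferable to the normal value 1.96 because the standard deviation is itself estimated from only a handful of scores.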
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, Y.; Ding, H.; Zhang, T.; Wang, Z.; Wang, H.; Zhou, L.; Dai, Y.; Liu, Z. Small-Sample Authenticity Identification and Variety Classification of Anoectochilus roxburghii (Wall.) Lindl. Using Hyperspectral Imaging and Machine Learning. Plants 2025, 14, 1177. https://doi.org/10.3390/plants14081177

Chicago/Turabian Style

Xu, Yiqing, Haoyuan Ding, Tingsong Zhang, Zhangting Wang, Hongzhen Wang, Lu Zhou, Yujia Dai, and Ziyuan Liu. 2025. "Small-Sample Authenticity Identification and Variety Classification of Anoectochilus roxburghii (Wall.) Lindl. Using Hyperspectral Imaging and Machine Learning" Plants 14, no. 8: 1177. https://doi.org/10.3390/plants14081177

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
