Foods
  • Article
  • Open Access

1 January 2026

Authenticity Identification and Quantitative Analysis of Dendrobium officinale Based on Near-Infrared Spectroscopy Combined with Chemometrics

1 School of Basic Chinese Medicine, Guizhou University of Traditional Chinese Medicine, Guizhou 550000, China
2 Jiangsu Province Engineering Research Center of Classical Prescription, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
3 School of Pharmacy, Guizhou University of Traditional Chinese Medicine, Guizhou 550000, China
* Author to whom correspondence should be addressed.
Foods 2026, 15(1), 121; https://doi.org/10.3390/foods15010121 (registering DOI)
This article belongs to the Special Issue Advances in Food Analysis: The Role of Chemometrics and Smart Analytical Systems

Abstract

Dendrobium officinale is a valuable medicinal and edible homologous health food. It has immunomodulatory, antioxidant, and metabolism-regulating properties. However, its adulteration is widespread, seriously compromising product quality and safety. Traditional adulteration detection methods are complex, costly, and time-consuming, making it urgent to establish a rapid and non-destructive detection approach. This study developed a rapid identification and quantification method for adulterated D. officinale. The method combined near-infrared (NIR) spectroscopy with data-driven soft independent modeling of class analogy (DD-SIMCA) and partial least squares regression (PLSR) models. PCA, PLS-DA, and OPLS-DA were first used to visualize sample clustering and group differences. DT, SVM, ANN, and NB were used for classification. DD-SIMCA and PLSR were used for one-class modeling and quantitative analysis. Raw spectral data were preprocessed using multiplicative scatter correction (MSC), the standard normal variate (SNV), the first derivative, and Savitzky–Golay smoothing. In the identification analysis, the DD-SIMCA model achieved 100% sensitivity and 100% specificity in the validation set. Its overall accuracy in the independent test set was 99.2%, demonstrating excellent discrimination performance. In addition, SVM combined with NIR also achieved good accuracy. In the quantitative analysis of adulteration, the PLSR model predicted different adulteration levels. Most calibration and validation sets showed R2 values above 0.99 and RMSE values below 0.05, indicating excellent predictive performance. The results indicate that NIR combined with DD-SIMCA and PLSR can achieve rapid identification and accurate quantification of adulterated D. officinale samples. This approach provides strong support for quality control and regulatory supervision of high-value health foods.

1. Introduction

In China and other Southeast Asian countries, plants of the genus Dendrobium hold dual value as both food ingredients and traditional herbal medicines. Among them, Dendrobium officinale Kimura et Migo (DKM) is recognized as one of the most precious Chinese medicinal materials, and it is widely distributed across southern China and Southeast Asia [1,2]. According to the Pharmacopoeia of the People’s Republic of China, there are two independent entries for Dendrobium—“Shi Hu” and “Tie Pi Shi Hu”. The “Tie Pi Shi Hu” entry exclusively lists DKM as its source plant, whereas “Shi Hu” encompasses Dendrobium nobile Lindl. (DL), Dendrobium fimbriatum Hook. (DH), and several other congeneric species. Modern pharmacological studies have demonstrated that DKM possesses several biological activities, including immunomodulation, antioxidant effects, and mucosal protection [3,4]. These properties support its wide application in anti-aging, adjuvant diabetes therapy, and gastrointestinal regulation.
In recent years, DKM has been increasingly developed into various products, including health foods and skincare formulations, across Asia, reflecting its substantial economic value. However, the widespread adulteration of DKM—primarily involving the mixing of other Dendrobium species—has emerged as a significant challenge. Currently, there is no conclusive evidence confirming whether these adulterants provide the same therapeutic efficacy as DKM. Consequently, such adulteration not only disrupts market order but also undermines the rights of consumers and patients. Therefore, the development of a rapid and accurate authentication strategy for DKM is urgently needed. Nevertheless, because these species are closely related to DKM, their morphological characteristics are highly similar, making visual discrimination extremely difficult. Furthermore, DKM used in health products is commonly processed into powder, which further complicates authenticity identification.
In current research on the authenticity and adulteration identification of DKM, multiple analytical platforms such as near-infrared (NIR) spectroscopy, laser-induced breakdown spectroscopy (LIBS), and UHPLC-MS/MS are widely applied. These methods, combined with chemometrics and machine learning approaches like PLS-DA and SVM, have significantly improved the accuracy of classification and identification. Relevant studies show that the prediction accuracy for Dendrobium species discrimination is close to 90% when combining NIR hyperspectral imaging, feature selection, and SVM [5]. Furthermore, LIBS can achieve rapid differentiation of Dendrobium and its closely related species by analyzing differences in elemental composition or characteristic peaks [6]. Meanwhile, mass spectrometry-based methods and deep learning models for origin and quality traceability have further demonstrated their strong applicability for complex pattern recognition tasks, providing more reliable technical support for DKM adulteration monitoring [7]. In addition, NMR metabolomics and qNMR fingerprinting have been widely applied in Dendrobium species differentiation and metabolite profiling due to their clear structural resolution and high reproducibility [8,9,10]. However, their discriminative ability becomes limited when dealing with complex matrices, low-abundance constituents, or multi-source commercial materials. Thus, despite the advantages of multi-spectral platforms, current studies still lack targeted investigation specifically focusing on DKM itself [11,12].
NIR is an analytical technique that involves the interaction between light in the wavenumber range of approximately 13,000–4000 cm−1 (corresponding to wavelengths of 780–2500 nm) and matter. It primarily reflects the overtone and combination vibration absorptions of hydrogen-containing functional groups, such as C–H, O–H, and N–H [13]. Compared with conventional chromatographic or mass spectrometric techniques, NIR analysis offers advantages such as rapid detection, non-destructive testing, simple operation, and environmental friendliness. Consequently, NIR has been widely applied in compositional analysis and quality control of foods, Chinese herbal medicines, and agricultural products. For authenticity identification, NIR combined with multivariate analysis and machine learning algorithms can effectively distinguish adulterated samples from genuine ones, making it particularly suitable for non-destructive detection of powdered substances. Representative applications include identifying adulteration in Arnebia Radix, detecting non-target crop powders in Tartary buckwheat flour, and recognizing starchy adulterants in Panax notoginseng powder [14,15,16].
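The correspondence between the wavenumber and wavelength ranges quoted above follows from the relation wavelength (nm) = 10^7 / wavenumber (cm−1). A minimal Python sketch of this conversion is given below; the function names are illustrative and are not taken from the original study.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm1


def wavelength_nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1.0e7 / wavelength_nm
```

Under this relation, 4000 cm−1 corresponds to 2500 nm, and 780 nm corresponds to roughly 12,820 cm−1, which is consistent with the approximate upper limit of 13,000 cm−1 given in the text.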
However, these conventional adulteration detection methods still present clear limitations. They depend heavily on prior knowledge of adulterants and often exhibit low sensitivity to minor adulteration. In contrast, data-driven soft independent modeling of class analogy (DD-SIMCA) offers distinct advantages for such tasks. This class modeling approach explores the intrinsic structure of the data using principal component analysis. It constructs a feature space based solely on authentic samples, without requiring prior knowledge of adulterants. Consequently, it can effectively identify unknown adulteration. By focusing on intrinsic intra-class features, DD-SIMCA minimizes model fluctuations caused by variations in adulteration type or proportion. This method thus provides a novel and efficient technical pathway for authenticity assessment of D. officinale.

2. Materials and Methods

2.1. Sample Collection and Preparation

Ten batches of Dendrobium officinale Kimura et Migo (DKM) were collected, along with one batch each of D. loddigesii (DL), D. hancockii (DH), DKM leaves (DKMLs), bamboo, and corn. Bamboo powder and corn powder were purchased from local markets (Guiyang, Guizhou Province, China), while the remaining samples were sourced from the Bozhou market in Anhui Province, China. All plant materials were authenticated as genuine by Professor Qingwen Sun from Guizhou University of Traditional Chinese Medicine.
Each sample was finely ground and sieved through a No. 5 sieve (80 mesh) to ensure a consistent particle size. Representative adulterants are shown in Supplementary Figure S1. Based on the experimental design, the materials were divided into three categories: pure DKM samples, adulterated DKM samples (with 20%, 40%, 60%, and 80% adulteration levels), and pure adulterant samples (bamboo, corn, DL, DH, and DKML). A total of 275 samples were prepared, and detailed sample information is provided in Supplementary Table S1.

2.2. NIR Spectral Acquisition

Diffuse reflectance near-infrared spectra were acquired using a Fourier-transform NIR spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) equipped with an integrating sphere. Spectra were collected over the range of 10,000–4000 cm−1 with a resolution of 8 cm−1, resulting in 1557 spectral data points per sample. The data were expressed as log(1/R), where R is the relative reflectance.
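As an illustration of the acquisition parameters, the sketch below rebuilds a wavenumber axis with 1557 points between 10,000 and 4000 cm−1 and converts reflectance to log(1/R). It assumes evenly spaced points with both end points included and a base-10 logarithm; neither detail is stated in the text, and the function names are hypothetical.

```python
import math


def wavenumber_axis(start=10000.0, stop=4000.0, n_points=1557):
    """Evenly spaced wavenumber axis in cm^-1 (assumes inclusive end points)."""
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance log10(1/R) for relative reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]
```

With these assumptions the data spacing comes out at about 3.86 cm−1, that is, roughly half of the nominal 8 cm−1 resolution.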
For each measurement, approximately 3 g of powder was placed in a sample cup, gently shaken, and leveled to ensure uniform packing. Sixty-four scans were averaged to enhance the signal-to-noise ratio and minimize random error. All spectral measurements were conducted under consistent laboratory conditions to reduce environmental variability.

2.3. Exploratory Data Analysis

To explore sample distribution patterns and potential clustering, multivariate visualization techniques were applied. These included Principal Component Analysis (PCA-X), Partial Least Squares Discriminant Analysis (PLS-DA), and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) [17]. PCA-X, an unsupervised method, was used to examine the overall variance structure of the spectral data. In contrast, PLS-DA and OPLS-DA, both supervised methods, incorporated group information to enhance class discrimination.
PLS-DA was used to construct an optimal regression model between spectral variables and categorical sample groups. OPLS-DA further removed orthogonal components unrelated to class differences, thereby improving interpretability and predictive performance [18]. Model quality was assessed using R2 (explained variance) and Q2 (predictive ability) obtained through cross-validation.

2.4. Chemometric Classification Modeling

Supervised machine learning methods were used to construct classification models capable of distinguishing genuine DKM from adulterated samples. Four commonly used algorithms, namely Decision Tree (DT) [19], Support Vector Machine (SVM) [20], Artificial Neural Network (ANN) [21], and Naive Bayes (NB) [22], were implemented in MATLAB R2022b (MathWorks, Natick, MA, USA) using the Classification Learner app. The NIR spectral data served as input variables, while class labels (genuine/adulterated) were used as outputs.
A 10-fold cross-validation strategy was adopted to enhance model robustness and reduce overfitting. Classification accuracy and confusion matrices were used as evaluation metrics to compare the discriminative performance of different algorithms.
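The fold construction and evaluation metrics described above can be sketched in a few lines of Python. This is only an illustration of the general idea: the study itself used MATLAB, and the random shuffling, seed, and function names below are assumptions.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs, i.e. confusion-matrix cells."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

Each fold serves once as the held-out set while the remaining nine are used for training, and the predictions from all folds are pooled before the confusion matrix and accuracy are computed.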

2.5. DD-SIMCA Single-Class Modeling

DD-SIMCA is a single-class classification algorithm that focuses on modeling the intrinsic structure of genuine samples to enable sensitive detection of deviations, such as adulteration. In this study, DD-SIMCA was implemented following a standard three-step procedure [23]. First, PCA was performed on the calibration set to decompose the spectra and extract principal components representing the major variance structure. Next, the score distance and orthogonal distance of each calibration sample were calculated from the PCA scores and residuals, respectively. These distances were combined with data-driven estimates of scaling factors and degrees of freedom to obtain the total distance. Finally, the acceptance region and classification thresholds for the positive class were established, such that a new sample falling within this region would be identified as genuine.
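The three steps can be sketched in Python with NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: scaling factors and degrees of freedom are estimated by the classical method of moments, the chi-square threshold uses the Wilson-Hilferty approximation, and the separate outlier cut-off is omitted. All function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty); not an exact inverse CDF."""
    z = NormalDist().inv_cdf(p)
    return dof * (1.0 - 2.0 / (9.0 * dof) + z * np.sqrt(2.0 / (9.0 * dof))) ** 3


def _distances(Xc, loadings, score_ss):
    """Score distance h and orthogonal distance q for mean-centred spectra."""
    scores = Xc @ loadings
    h = np.sum(scores ** 2 / score_ss, axis=1)
    residuals = Xc - scores @ loadings.T
    q = np.sum(residuals ** 2, axis=1)
    return h, q


def _moments(d):
    """Method-of-moments estimates of the scaling factor and degrees of freedom."""
    d0 = d.mean()
    dof = max(1, int(round(2.0 * d0 ** 2 / d.var(ddof=1))))
    return d0, dof


def fit_dd_simca(X, n_components, alpha=0.01):
    """Step 1: PCA on genuine samples; steps 2-3: distances and acceptance limit."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T
    score_ss = s[:n_components] ** 2  # sum of squared scores per component
    h, q = _distances(Xc, loadings, score_ss)
    h0, n_h = _moments(h)
    q0, n_q = _moments(q)
    return {
        "mean": mean, "loadings": loadings, "score_ss": score_ss,
        "h0": h0, "n_h": n_h, "q0": q0, "n_q": n_q,
        "c_crit": chi2_quantile(1.0 - alpha, n_h + n_q),
    }


def predict_dd_simca(model, X_new):
    """Return True for samples whose total distance lies in the acceptance region."""
    Xc = np.asarray(X_new, dtype=float) - model["mean"]
    h, q = _distances(Xc, model["loadings"], model["score_ss"])
    total = model["n_h"] * h / model["h0"] + model["n_q"] * q / model["q0"]
    return total <= model["c_crit"]
```

In this sketch `alpha` plays the role of the Type I error rate, so a smaller value widens the acceptance region for genuine samples.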
To ensure robust evaluation and model generalization, the Kennard-Stone algorithm was applied to partition the dataset into three subsets—the training, validation, and independent external test sets—using a 6:2:2 ratio [24]. The Validation Set (20%) was strictly dedicated to hyperparameter optimization, thus preventing data leakage. Crucially, the Independent External Test Set (20%) was reserved and completely untouched throughout the entire modeling and parameter tuning process. This set was used only once for the final, unbiased evaluation of the model’s predictive performance, thereby ensuring the reliability and generalizability of the reported results [25]. To strictly control the misclassification risk of genuine samples as adulterated, the Type I error rate and outlier significance level were both set to 0.01. The optimal number of principal components (PCs) was determined by adjusting hyperparameters on the validation set to achieve the best balance between sensitivity and specificity.
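A compact Python sketch of Kennard-Stone selection and a 6:2:2 partition is shown below. How the three subsets were derived from the algorithm is not detailed in the text; here it is assumed that the training set is selected first and the validation set is then selected from the remaining samples, with the leftovers forming the test set. Function names are illustrative.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Squared Euclidean distances via the Gram matrix (keeps memory modest).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Start with the two most distant samples.
    first, second = np.unravel_index(np.argmax(d2), d2.shape)
    selected = [int(first), int(second)]
    nearest = np.minimum(d2[first], d2[second])
    nearest[selected] = -np.inf
    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        candidate = int(np.argmax(nearest))
        selected.append(candidate)
        nearest = np.minimum(nearest, d2[candidate])
        nearest[selected] = -np.inf
    return selected


def split_train_val_test(X, train_frac=0.6, val_frac=0.2):
    """Assumed scheme: Kennard-Stone for training, then again for validation."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    train = kennard_stone(X, int(round(train_frac * n)))
    train_set = set(train)
    rest = [i for i in range(n) if i not in train_set]
    val = [rest[i] for i in kennard_stone(X[rest], int(round(val_frac * n)))]
    val_set = set(val)
    test = [i for i in rest if i not in val_set]
    return train, val, test
```

Because the selection is deterministic, repeating the split on the same spectra yields the same subsets, which helps keep the external test set fixed across tuning runs.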
The performance of the DD-SIMCA model was evaluated using accuracy, specificity, and sensitivity. Accuracy refers to the proportion of correctly classified samples among all samples; specificity indicates the proportion of correctly identified negative-class samples (adulterated) among actual negative-class samples; and sensitivity represents the proportion of correctly classified positive-class samples (genuine) among actual positive-class samples. These metrics were calculated based on the counts of correctly and incorrectly classified genuine and adulterated samples, providing a comprehensive assessment of the model’s discriminative ability.
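These definitions translate directly into code. The sketch below treats genuine samples as the positive class, as in the text; the label strings and function name are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive="genuine"):
    """Accuracy, sensitivity and specificity with `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # genuine correctly accepted
            else:
                fn += 1  # genuine wrongly rejected
        else:
            if pred == positive:
                fp += 1  # adulterated wrongly accepted
            else:
                tn += 1  # adulterated correctly rejected
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

For example, if three of four genuine samples are accepted and all six adulterated samples are rejected, the sensitivity is 0.75, the specificity is 1.0, and the accuracy is 0.9.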

2.6. PLSR Quantitative Modeling

Partial Least Squares Regression (PLSR) is a multivariate regression method that reduces the dimensionality of spectral and dependent variables while maximizing their covariance, suitable for high-dimensional and collinear data [26]. In this study, PLSR was applied to quantitatively predict adulteration ratios in DKM samples based on NIR spectra.
First, raw spectral data were preprocessed to reduce noise, correct baseline drift, and minimize scattering effects. Specifically, fourteen preprocessing combinations were tested, including 1st and 2nd derivatives (1st Der & 2nd Der), Savitzky–Golay (SG) smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), and their combinations, in order to enhance spectral features and improve model performance [27].
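For readers unfamiliar with these transforms, the following NumPy sketch shows one common formulation of SNV, MSC, and Savitzky-Golay smoothing or differentiation, each applied row-wise to a samples-by-wavenumbers matrix. The window width, polynomial order, and edge padding are illustrative assumptions; the settings used in the study are not reported here, and the study itself used SIMCA 14.1 for preprocessing.

```python
from math import factorial

import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ~ slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected


def savgol(spectra, window=11, polyorder=2, deriv=0):
    """Savitzky-Golay filter per spectrum; deriv=1 or 2 gives derivatives per index step."""
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    positions = np.arange(-half, half + 1)
    vander = np.vander(positions, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the local polynomial coefficient.
    coeffs = np.linalg.pinv(vander)[deriv] * factorial(deriv)
    # Edge padding is a simplification; values near the ends are approximate.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])
```

Combinations such as SNV followed by a first derivative can then be written as `savgol(snv(spectra), deriv=1)`.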
Then, the data were divided into calibration and prediction sets using the Kennard–Stone algorithm at a 7:3 ratio. Subsequently, the number of latent variables (LVs) was determined via 10-fold cross-validation, by selecting those corresponding to the minimum root mean square error of cross-validation (RMSECV).
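The selection of latent variables by minimum RMSECV can be illustrated with a small single-response PLS (PLS1, NIPALS-style) written in NumPy. This is a sketch under assumptions: the study used MATLAB, the fold assignment here is a seeded random permutation, and all function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Fit PLS1 with n_lv latent variables; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    weights, loadings, inner = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w                      # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # regression of y on the scores
        Xr = Xr - np.outer(t, p)        # deflate X and y
        yr = yr - c * t
        weights.append(w)
        loadings.append(p)
        inner.append(c)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(inner)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta


def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean


def rmsecv(X, y, n_lv, k=10, seed=0):
    """Root mean square error of k-fold cross-validation for a given n_lv."""
    order = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(order, k):
        train = np.setdiff1d(order, fold)
        model = pls1_fit(X[train], y[train], n_lv)
        sq_err += np.sum((pls1_predict(model, X[fold]) - y[fold]) ** 2)
    return float(np.sqrt(sq_err / len(y)))


def select_n_lv(X, y, max_lv=10, k=10):
    """Return the number of latent variables with the lowest RMSECV and all scores."""
    scores = [rmsecv(X, y, a, k=k) for a in range(1, max_lv + 1)]
    return int(np.argmin(scores)) + 1, scores
```

Using the same fold assignment for every candidate number of latent variables keeps the RMSECV values directly comparable.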
To comprehensively evaluate the model’s performance, several metrics were calculated, including the coefficient of determination of the calibration set (R2c), coefficient of determination of the prediction set (R2p), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), mean absolute error (MAE), and relative percent deviation (RPD), which collectively reflect fitting accuracy, predictive ability, and model generalization.
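The listed metrics can be computed as in the sketch below. The text does not give formulas, so the usual chemometric definitions are assumed; in particular, RPD is taken here as the sample standard deviation of the reference values divided by the RMSE. The function name is illustrative.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """R2, RMSE, MAE and RPD for paired reference and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(sse / n)
    return {
        "r2": 1.0 - sse / sst,
        "rmse": rmse,
        "mae": sum(abs(e) for e in errors) / n,
        # Assumed definition: SD of reference values over RMSE.
        "rpd": statistics.stdev(y_true) / rmse if rmse > 0 else float("inf"),
    }
```

Applied to the calibration set this gives R2c and RMSEC, and applied to the prediction set it gives R2p, RMSEP, MAE, and RPD.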
Finally, for each DKM–adulterant pair, the PLSR model with the optimal preprocessing strategy was selected to quantify adulteration content, ensuring reliable prediction and supporting authenticity assessment and quality control.

2.7. Data Analysis

DD-SIMCA and PLSR modeling procedures were performed in MATLAB R2022b. Spectral preprocessing, PCA, PLS-DA, and OPLS-DA analyses were conducted using SIMCA 14.1 (Umetrics, Umeå, Sweden).

3. Results

3.1. Near-Infrared Spectral Analysis

NIR spectroscopy provides characteristic absorption bands that arise from the first overtones and combination vibrations of molecular functional groups. In this study, the representative absorption peaks were mainly located in the ranges of 7075–6600 cm−1, 5200–5100 cm−1, and 4800–4600 cm−1. Specifically, the band at 7075–6600 cm−1 is generally attributed to the first overtone of O–H stretching vibrations. Its pronounced intensity suggests a high abundance of hydroxyl-containing compounds, such as polyphenols and alcohols [28]. The absorption band observed at 5200–5100 cm−1 may originate from the combination of O–H stretching and C–H bending vibrations, reflecting the synergistic vibrational features of hydroxyl and alkyl groups in sugars or polysaccharides [29]. Meanwhile, the band at 4800–4600 cm−1 is commonly associated with the first overtone of C–H stretching vibrations and O–H combination frequencies in water, and it is often used to characterize trace moisture or bound water in plant samples [30].
A comparison of the average spectra between DKM and adulterated samples revealed subtle but discernible differences in absorption intensity, peak shape, and fine spectral structure at specific wavenumbers, particularly near 5600 and 5100 cm−1. These variations likely reflect compositional differences, including the relative contents of polysaccharides, alcohols, and proteins, as well as structural differences such as hydroxyl positioning, chain length, or molecular conformation. However, the complexity of the spectral signals makes it difficult to achieve accurate classification based solely on direct spectral inspection. Therefore, multivariate statistical analysis and machine learning methods were applied to further explore and extract latent discriminative information from the high-dimensional data.
It should be noted that raw NIR spectra of powdered plant materials are often affected by baseline drift, scattering effects, and random noise. These interferences can obscure subtle chemical information and reduce model robustness [31]. To address these issues, spectral preprocessing techniques were applied, including 1st Der, MSC, and SG filtering. These methods enhance spectral resolution and remove baseline offsets, reduce scattering effects, and suppress random noise, respectively [32,33].
After evaluating model performance under different preprocessing strategies, only minor differences were observed in R2 and Q2 values across methods, indicating comparable feature representation and robust support for subsequent class discrimination. Nevertheless, SG filtering yielded the best overall performance (Figure 1), which may be attributed to its ability to smooth random noise while preserving key spectral features such as peak shapes and local variations. Such properties promote more reliable feature extraction and ultimately improve both classification and quantitative modeling accuracy [34].
Figure 1. (A) Average raw NIR spectra of all powder samples. (B) Average raw NIR spectra of individual pure powders. (C) Optimized NIR spectra of all powder samples after SG pretreatment. (D) Optimized NIR spectra of individual pure powders after SG pretreatment.

3.2. Discriminant Analyses Using PCA, PLS-DA, and OPLS-DA

To further investigate the spectral differences between genuine DKM and adulterated samples, PCA, PLS-DA, and OPLS-DA models were constructed, and their discriminative performances were systematically evaluated.
In the unsupervised analysis, PCA was employed to reveal the overall distribution structure of the samples. Only four PCs were extracted, explaining 99.9% of the total variance, indicating that the dataset was highly compact and the selected PCs sufficiently captured its structure. To assess intra-class consistency, separate PCA models were constructed for six categorical subsets. Each subset exhibited high goodness of fit and predictive ability (R2X > 0.95 and Q2 > 0.994), reflecting the strong spectral homogeneity within each group rather than indicating superior model robustness. However, as an unsupervised method, PCA mainly reveals overall trends and clustering patterns. Its capacity to achieve complete separation among categories is limited, which makes precise classification impossible.
For supervised modeling, PLS-DA was first applied. The model achieved excellent goodness of fit (R2X = 1.000) when eight PCs (A = 8) were extracted, yet its predictive ability remained moderate (Q2 = 0.793). This result indicates that PLS-DA alone cannot achieve complete discrimination between adulterated and genuine samples. By further incorporating orthogonal signal correction, the OPLS-DA model was constructed, which substantially improved predictive performance. With eleven PCs (A = 11), OPLS-DA achieved the same R2X value (1.000) while increasing Q2 to 0.882, reflecting enhanced generalization ability and model stability.
Although OPLS-DA more effectively removes variations irrelevant to class discrimination compared with PLS-DA, traditional spectral analysis and multivariate modeling methods still struggle to achieve full separation between adulterated and genuine samples. These results underscore the necessity of integrating advanced optimization algorithms and efficient machine learning approaches to improve classification accuracy (Supplementary Table S2, Figure 2).
Figure 2. NIR multivariate analysis results: (A,D) PCA-X; (B,E) PLS-DA; (C,F) OPLS-DA.

3.3. Discriminant Analysis Using Machine Learning Algorithms

Given that traditional chemometric approaches (PCA, PLS-DA, and OPLS-DA) were insufficient to achieve complete separation between genuine and adulterated DKM samples, we further constructed four supervised machine learning models—DT, SVM, ANN, and NB. These algorithms offer distinct advantages for handling complex, high-dimensional datasets: SVM maximizes class separation by constructing an optimal hyperplane; ANN mimics neuronal information processing, effectively capturing nonlinear feature patterns; DT offers an interpretable, hierarchical decision structure; and NB applies probabilistic reasoning under the assumption of conditional feature independence, enabling fast and computationally efficient classification [35].
The classification results on the test set demonstrated that SVM and ANN each achieved 100% accuracy, followed by DT at 97.4% and NB at 81.3% (Supplementary Table S3, Figure 3A–D). SVM correctly classified all samples, indicating excellent generalization ability and strong adaptability to spectral discrimination tasks. ANN also performed robustly, effectively identifying subtle differences between genuine and adulterated samples. In contrast, DT and NB exhibited lower accuracy, likely due to their sensitivity to noise and limited capacity to model complex relationships within high-dimensional spectral features.
Figure 3. Classification analysis based on NIR spectra: (A) DT test confusion matrix; (B) NB test confusion matrix; (C) ANN test confusion matrix; (D) SVM test confusion matrix; (E,F) DD-SIMCA acceptance plots. In (A–D), blue cells represent correctly classified samples, while pink cells indicate misclassified samples. In (E,F), red markers represent samples identified as adulterated, and green markers represent samples identified as genuine by the model.
These findings highlight that machine learning-based approaches, particularly SVM and ANN, outperform traditional multivariate statistical methods in authenticity identification of DKM. Nevertheless, potential limitations remain: misclassification or overfitting may arise when models encounter previously unseen adulterant types, highly imbalanced datasets, or extreme adulteration ratios. Consequently, integrating single-class or one-class modeling frameworks may further strengthen model robustness and enhance reliability in practical authentication scenarios.

3.4. Classification Performance of the DD-SIMCA Model

Although multi-class models such as SVM show high classification accuracy, they depend on comprehensive training data covering all classes. In practical scenarios, where genuine samples far outnumber known adulterants or adulterant types are unknown or variable, such models may show reduced robustness [36], and their adaptability is limited when class imbalance or unknown classes arise [37]. To address this limitation, a single-class discriminant approach, DD-SIMCA, was employed to construct a robust model based exclusively on genuine samples.
The DD-SIMCA model establishes an exclusive spectral feature space for the target class, enabling sensitive and efficient detection of samples that deviate from the genuine class without requiring prior knowledge of adulterant types [38]. This approach is particularly suitable for rapid authenticity identification in complex matrices. In the present study, when the number of PCs was set to seven, the DD-SIMCA model achieved 100% classification accuracy on the test set, indicating that all genuine samples were correctly identified, with no false negatives. Accuracy on the validation set reached 98.2% (Supplementary Table S3, Figure 3E,F), showing only a slight decrease compared with the test set and indicating that the model generalizes well without evident overfitting.
The discriminant boundary plot further confirmed that all training set samples fell within the model-defined acceptance region, reflecting stable positive-class modeling and strong internal consistency. Only one genuine sample in the test set was misclassified, likely due to minor variations in its spectral features caused by factors such as origin, storage, or processing differences. Nevertheless, this did not affect the overall discriminative performance of the model [39].
Overall, the DD-SIMCA model effectively captures spectral differences between DKM and adulterated samples across key wavebands. It maps high-dimensional spectral data into a low-dimensional feature space via PCA. Even in scenarios with low adulteration ratios or highly deceptive adulterants, the model demonstrates excellent sensitivity and accuracy, highlighting its potential as a robust tool for food authenticity verification and adulteration detection.

3.5. Quantitative Analysis Using PLSR

Following the successful identification of DKM adulteration via PCA and machine learning algorithms, a PLSR model was developed to predict adulterant content. To further improve performance, multiple models were constructed, and the effects of different spectral preprocessing methods were systematically evaluated [40]. The optimal preprocessing strategy for each model was primarily determined based on the coefficient of determination (R2).
For specific comparisons, the DKM vs. DKML model exhibited optimal performance with SNV preprocessing, achieving an R2 of 0.9974. In the DKM vs. corn group, MSC yielded an R2 of 0.9992. The DKM vs. bamboo model performed best with the combination of SNV, 1st Der, and SG smoothing, resulting in an R2 of 0.9933. For DKM vs. DH, SNV combined with 2nd Der achieved an R2 of 0.9687. Finally, in the DKM vs. DL comparison, SNV + 1st Der demonstrated superior performance with an R2 of 0.9962.
It should be emphasized that the selection of the optimal preprocessing method was not solely based on R2. Model evaluation also considered RMSEC, RMSEP, relative percent deviation (RPD), and mean absolute error (MAE), ensuring that the chosen preprocessing provided high fitting accuracy, minimized prediction errors, and maintained strong generalization. These results highlight that tailoring preprocessing strategies to specific adulterant types significantly enhances the discriminative capability and stability of PLSR models. Overall, SNV and its derivative-based combinations consistently outperformed other preprocessing methods, effectively reducing spectral noise and amplifying differences between genuine and adulterated samples.
The regression plots further illustrate model performance, with data points closely aligned along the diagonal, indicating accurate predictions [41] (Figure 4). Specifically, 87.2% of samples exhibited prediction errors ≤0.07%, demonstrating minimal deviation between predicted and actual adulterant concentrations (Table 1). Moreover, the close correspondence between R2 and RMSE values in the calibration and prediction sets confirms that the models were not overfit [42].
Figure 4. PLSR plots for the optimal models predicting adulterant content in DKM. (A) DKM vs. DKML; (B) DKM vs. Corn; (C) DKM vs. Bamboo; (D) DKM vs. DH; (E) DKM vs. DL. Data points along the diagonal line indicate agreement between predicted and actual adulteration levels.
Table 1. Performance metrics of PLSR models for predicting adulteration content in DKM with different preprocessing methods.
Collectively, these findings demonstrate that applying targeted preprocessing strategies in PLSR modeling substantially improves quantitative prediction of adulterants in DKM. This approach provides robust support for reliable adulteration detection and accurate concentration analysis.

4. Conclusions

In this study, a rapid, non-destructive, and efficient approach for the identification and quantitative analysis of DKM adulteration was established based on NIR spectroscopy combined with chemometric and machine learning methods. PCA and PLS-DA models initially revealed the overall distribution and clustering of samples, while OPLS-DA with orthogonal signal correction improved predictive performance and enhanced discrimination between genuine and adulterated samples.
The DD-SIMCA single-class model demonstrated excellent discriminative performance under the current study conditions. It achieved high-specificity identification of genuine DKM, with accuracies of 100% and 98.2% in the prediction and validation sets, respectively. The model also maintained good sensitivity even at the 20% adulteration ratio. For quantitative analysis, multiple PLSR models targeting different adulterants were constructed. Preprocessing strategies, including SNV, MSC, and derivative transformations, were applied to effectively enhance model fitting and predictive performance.
In conclusion, the integrated NIR–DD-SIMCA–PLSR approach provides a robust and rapid proof-of-concept for the authenticity identification and adulteration quantification of DKM. The non-destructive, high-throughput characteristics of this methodology make it highly promising for its eventual application in the quality control and supervision of DKM products. However, the limited number of genuine batches and laboratory-prepared adulterants may not fully reflect real-world variability, so the high model performance should be interpreted with caution. Future work should focus on expanding sample diversity to include varying origins, processing methods, and commercial contaminants to fully validate its generalizability and transition this powerful method from a preliminary study to broad practical adoption.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/foods15010121/s1, Supplementary Tables S1–S3 and Supplementary Figure S1.

Author Contributions

Conceptualization, Z.-L.F. and L.B.; methodology, Z.-L.F., Z.-T.Z. and L.B.; spectral data acquisition, Z.-L.F.; supervision, Y.-H.C. and X.P.; data collation, Q.L.; sample preparation, Q.L. and T.-W.S.; software, Z.-T.Z.; data preprocessing, L.B.; project administration, Y.-H.C. and X.P.; writing—original draft preparation, Z.-L.F., Q.L. and T.-W.S.; writing—review and editing, Y.-H.C. and X.P.; results visualization, T.-W.S.; methodology optimization, X.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guizhou Key Laboratory of Miao Medicine (No. Qiankehe Platform ZSYS [2025] 018) and the Scientific Research Project of Guizhou University of Traditional Chinese Medicine (No. Gui Zhong Yi Ke Yu[2025]48).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge Huaguo Chen and Qingwen Sun for their valuable technical support and guidance throughout this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NIR        Near-Infrared
LIBS       Laser-Induced Breakdown Spectroscopy
DD-SIMCA   Data-Driven Soft Independent Modeling of Class Analogy
PLSR       Partial Least Squares Regression
MSC        Multiplicative Scatter Correction
SG         Savitzky–Golay
SNV        Standard Normal Variate Transformation
DKM        Dendrobium officinale Kimura et Migo
DL         Dendrobium nobile Lindl
DH         Dendrobium fimbriatum Hook
DKML       DKM leaves
PCA-X      Principal Component Analysis
PLS-DA     Partial Least Squares Discriminant Analysis
OPLS-DA    Orthogonal Partial Least Squares Discriminant Analysis
DT         Decision Tree
SVM        Support Vector Machine
ANN        Artificial Neural Network
NB         Naive Bayes
LVs        Latent Variables
RMSECV     Root Mean Square Error of Cross Validation
RMSEC      Root Mean Square Error of Calibration
RMSEP      Root Mean Square Error of Prediction
MAE        Mean Absolute Error
RPD        Relative Percent Deviation
1st Der    First Derivative
PCs        Principal Components

References

  1. Wang, M.; Shao, G.; Song, M.; Ye, Y.; Zhu, J.; Yang, X.; Song, X. Dynamic Changes in Functional Components of Dendrobium officinale and Their Applications in Food Science: A Review. Plant Foods Hum. Nutr. 2025, 80, 59. [Google Scholar] [CrossRef] [PubMed]
  2. Li, P.Y.; Li, L.; Wang, Y.Z. Traditional uses, chemical compositions and pharmacological activities of Dendrobium: A review. J. Ethnopharmacol. 2023, 310, 116382. [Google Scholar] [CrossRef]
  3. Qu, J.; Tan, S.; Xie, X.; Wu, W.; Zhu, H.; Li, H.; Liao, X.; Wang, J.; Zhou, Z.-A.; Huang, S.; et al. Dendrobium officinale Polysaccharide Attenuates Insulin Resistance and Abnormal Lipid Metabolism in Obese Mice. Front. Pharmacol. 2021, 12, 659626. [Google Scholar] [CrossRef] [PubMed]
  4. Duan, H.; Yu, Q.; Ni, Y.; Li, J.; Yu, L.; Yan, X.; Fan, L. Synergistic anti-aging effect of Dendrobium officinale polysaccharide and spermidine: A metabolomics analysis focusing on the regulation of lipid, nucleotide and energy metabolism. Int. J. Biol. Macromol. 2024, 278, 135098. [Google Scholar] [CrossRef]
  5. Li, K.; Guo, Y.; Zhong, H.; Jin, Y.; Li, B.; Fang, H.; Yao, L.; Zhao, C. Rapid identification of dendrobium species using near-infrared hyperspectral imaging technology. Sensors 2025, 25, 5625. [Google Scholar] [CrossRef]
  6. Zhang, T.; Liu, Z.; Ma, Q.; Hu, D.; Dai, Y.; Zhang, X.; Zhou, Z. Identification of dendrobium using laser-induced breakdown spectroscopy in combination with a multivariate algorithm model. Foods 2024, 13, 1676. [Google Scholar] [CrossRef]
  7. Lin, T.; Ye, Y.; Zhang, J.; Wang, J.; Hu, Z.; Linn, K.Z.; Chen, X.; Liu, H.; Liu, Z.; Yao, Q. Machine learning and uhplc-ms/ms-based discrimination of the geographical origin of Dendrobium officinale from Yunnan, China. Foods 2025, 14, 3442. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, X.; Zhang, S.; Gao, B.; Qian, Z.; Liu, J.; Wu, S.; Si, J. Identification and quantitative analysis of phenolic glycosides with antioxidant activity in methanolic extract of Dendrobium catenatum flowers and selection of quality control herb-markers. Food Res. Int. 2019, 123, 732–745. [Google Scholar] [CrossRef]
  9. Deng, Y.; Chen, L.-X.; Han, B.-X.; Wu, D.-T.; Cheong, K.-L.; Chen, N.-F.; Zhao, J.; Li, S.-P. Qualitative and quantitative analysis of specific polysaccharides in Dendrobium huoshanense by using saccharide mapping and chromatographic methods. J. Pharm. Biomed. Anal. 2016, 129, 163–171. [Google Scholar] [CrossRef]
  10. Qin, H.L.; Zhang, J.X.; Wang, Z.T.; Yang, X.S.; Xu, L.S.; Hao, X.J. Analysis of 1H-NMR fingerprint in stem of Dendrobium loddigesii. Zhongguo Zhong Yao Za Zhi 2002, 27, 919–923. [Google Scholar]
  11. Hao, L.; Shi, X.; Qin, S.; Dong, J.; Shi, H.; Wang, Y.; Zhang, Y. Genome-wide identification, characterization and transcriptional profile of the SWEET gene family in Dendrobium officinale. BMC Genom. 2023, 24, 378. [Google Scholar] [CrossRef]
  12. Meng, Y.; Wang, Y.; Zhang, L.; Li, J.; Hu, L.; Wu, Z.; Yang, L.; Wei, G. Identification of bibenzyls and evaluation of imitative wild planting techniques in Dendrobium officinale by HPLC-ESI-MS(n). J. Mass Spectrom. JMS 2023, 58, e4903. [Google Scholar] [CrossRef]
  13. Bec, K.B.; Grabska, J.; Huck, C.W. Near-Infrared Spectroscopy in Bio-Applications. Molecules 2020, 25, 2948. [Google Scholar] [CrossRef] [PubMed]
  14. Yu, Y.; Chai, Y.; Yan, Y.; Li, Z.; Huang, Y.; Chen, L.; Dong, H. Near-infrared spectroscopy combined with support vector machine for the identification of Tartary buckwheat (Fagopyrum tataricum (L.) Gaertn) adulteration using wavelength selection algorithms. Food Chem. 2025, 463, 141548. [Google Scholar] [CrossRef] [PubMed]
  15. Guan, H.; Zhang, Z.T.; Bai, L.; Chen, L.; Yuan, D.; Liu, W.; Chen, P.; Shi, Z.; Hu, C.; Xue, M.; et al. Multi-spectra combined with Bayesian optimized machine learning algorithms for rapid and non-destructive detection of adulterated functional food Panax notoginseng powder. J. Food Compos. Anal. 2024, 133, 106412. [Google Scholar] [CrossRef]
  16. Li, X.; Zhong, Y.; Li, J.; Lin, Z.; Pei, Y.; Dai, S.; Sun, F. Rapid identification and determination of adulteration in medicinal Arnebiae Radix by combining near infrared spectroscopy with chemometrics. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 318, 124437. [Google Scholar] [CrossRef] [PubMed]
  17. Ben Salem, K.; Ben Abdelaziz, A. Principal Component Analysis (PCA). Tunis. Med. 2021, 99, 383–389. [Google Scholar]
  18. Baddini, A.L.Q.; Santos, J.; Tavares, R.R.; Paula, L.S.; Filho, H.; Freitas, R.P. PLS-DA and data fusion of visible Reflectance, XRF and FTIR spectroscopy in the classification of mixed historical pigments. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 265, 120384. [Google Scholar]
  19. Becker, T.; Rousseau, A.J.; Geubbelmans, M.; Burzykowski, T.; Valkenborg, D. Decision trees and random forests. Am. J. Orthod. Dentofac. Orthop. 2023, 164, 894–897. [Google Scholar] [CrossRef]
  20. Rodriguez-Perez, R.; Bajorath, J. Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery. J. Comput.-Aided Mol. Des. 2022, 36, 355–362. [Google Scholar] [CrossRef]
  21. Jirik, M.; Moulisova, V.; Hlavac, M.; Zelezny, M.; Liska, V. Artificial neural networks and computer vision in medicine and surgery. Rozhl. Chir. Mesic. Ceskoslovenske Chir. Spol. 2022, 101, 564–570. [Google Scholar]
  22. Fautt, C.; Couradeau, E.; Hockett, K.L. Naive Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization. Sci. Data 2024, 11, 178. [Google Scholar] [CrossRef] [PubMed]
  23. Pagani, A.P.; Camargo, G.; Ibanez, G.A.; Olivieri, A.C.; Pomerantsev, A.L.; Rodionova, O.Y. Data-Driven Version of Multiway Soft Independent Modeling of Class Analogy (N-Way DD-SIMCA): Theory and Application. Anal. Chem. 2024, 96, 4845–4853. [Google Scholar] [CrossRef] [PubMed]
  24. Jin, C.; Zhou, X.; He, M.; Li, C.; Cai, Z.; Zhou, L.; Qi, H.; Zhang, C. A novel method combining deep learning with the Kennard-Stone algorithm for training dataset selection for image-based rice seed variety identification. J. Sci. Food Agric. 2024, 104, 8332–8342. [Google Scholar] [CrossRef]
  25. Walston, S.L.; Seki, H.; Takita, H.; Mitsuyama, Y.; Sato, S.; Hagiwara, A.; Ito, R.; Hanaoka, S.; Miki, Y.; Ueda, D. Data set terminology of deep learning in medicine: A historical review and recommendation. Jpn. J. Radiol. 2024, 42, 1100–1109. [Google Scholar] [CrossRef]
  26. Sun, W.; Liu, S.; Zhang, X.; Zhu, H. Performance of hyperspectral data in predicting and mapping zinc concentration in soil. Sci. Total Environ. 2022, 824, 153766. [Google Scholar] [CrossRef] [PubMed]
  27. Liu, Z.; Zhang, R.; Yang, C.; Hu, B.; Luo, X.; Li, Y.; Dong, C. Research on moisture content detection method during green tea processing based on machine vision and near-infrared spectroscopy technology. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 271, 120921. [Google Scholar] [CrossRef]
  28. Wu, L.; Su, Y.; Yu, H.; Qian, X.; Zhang, X.; Wang, Q.; Kuang, H.; Cheng, G. Rapid Determination of Saponins in the Honey-Fried Processing of Rhizoma Cimicifugae by Near Infrared Diffuse Reflectance Spectroscopy. Molecules 2018, 23, 1617. [Google Scholar] [CrossRef]
  29. Kang, Y.; Long, T.; Qiao, Y.; Yi, H.; Wang, F.; Chen, C. Rapid quality evaluation of fried Radix Paeoniae Alba (Paeonia lactiflora Pall.) using electronic eye and near-infrared spectroscopy combined with chemometric methods. J. Food Compos. Anal. 2025, 143, 107621. [Google Scholar] [CrossRef]
  30. Fang, H.; Wang, Y.; Deng, J.; Zhang, H.; Wu, Q.; He, L.; Xu, J.; Shao, X.; Ouyang, X.; He, Z.; et al. Sepsis-Induced Gut Dysbiosis Mediates the Susceptibility to Sepsis-Associated Encephalopathy in Mice. mSystems 2022, 7, e0139921. [Google Scholar] [CrossRef]
  31. Fu, S.; Liu, F.; Zhi, X.; Wang, Y.; Liu, Y.; Chen, H.; Wang, Y.; Luo, M. Applications of functional near-infrared spectroscopy in non-drug therapy of traditional Chinese medicine: A review. Front. Neurosci. 2023, 17, 1329738. [Google Scholar] [CrossRef] [PubMed]
  32. Yan, C. A review on spectral data preprocessing techniques for machine learning and quantitative analysis. iScience 2025, 28, 112759. [Google Scholar] [CrossRef]
  33. Zhang, B.; Chen, X.; He, C.; Su, T.; Cao, K.; Li, X.; Duan, J.; Chen, M.; Zhu, Z.; Yu, W. Acute gastrointestinal injury and altered gut microbiota are related to sepsis-induced cholestasis in patients with intra-abdominal infection: A retrospective and prospective observational study. Front. Med. 2023, 10, 1144786. [Google Scholar]
  34. Gao, Z.; Zhang, M.; Liu, N.; Liang, W.; Sun, T. Distinguishing Low-Grade Chondrosarcoma and Osteochondroma Using Visible-Near Infrared Hyperspectral Spectral Characteristics. J. Biophotonics 2025, e202500440. [Google Scholar] [CrossRef]
  35. Alsariera, Y.A.; Baashar, Y.; Alkawsi, G.; Mustafa, A.; Alkahtani, A.A.; Ali, N. Assessment and Evaluation of Different Machine Learning Algorithms for Predicting Student Performance. Comput. Intell. Neurosci. 2022, 2022, 4151487. [Google Scholar] [CrossRef]
  36. Bao, H.; Bao, H.; Wang, Y.; Wang, F.; Jiang, Q.; He, X.; Li, H.; Ding, Y.; Zhu, C. Challenges and Strategies in the Industrial Application of Dendrobium officinale. Plants 2024, 13, 2961. [Google Scholar] [CrossRef] [PubMed]
  37. Rodionova, O.; Pomerantsev, A. Multi-block DD-SIMCA as a high-level data fusion tool. Anal. Chim. Acta 2023, 1265, 341328. [Google Scholar] [CrossRef] [PubMed]
  38. de Sousa, J.F.; Batista Braga, J.W.; Dias, A.C.B. Authenticity assessment of commercial natural sweeteners using near- and mid-infrared spectroscopy with DD-SIMCA modeling. Food Chem. 2025, 481, 143983. [Google Scholar] [CrossRef]
  39. Candeias, D.N.C.; Silva, K.M.; Pereira, H.S.; Bezerra, L.P.; da Silva, J.D.S.; Fernandes, D.D.S.; Diniz, P.H.G.D. Geographical origin authentication of instant coffee from southern Bahia using MIR and NIR spectroscopy coupled with DD-SIMCA. Food Chem. 2025, 479, 143698. [Google Scholar] [CrossRef]
  40. Lan, Z.; Zhang, Y.; Sun, Y.; Ji, D.; Wang, S.; Lu, T.; Cao, H.; Meng, J. Rapid quantitative detection of the discrepant compounds in differently processed Curcumae Rhizoma products by FT-NIR combined with VCPA-GA technology. J. Pharm. Biomed. Anal. 2021, 195, 113837. [Google Scholar] [CrossRef]
  41. Bai, L.; Zhang, Z.-T.; Guan, H.; Liu, W.; Chen, L.; Yuan, D.; Chen, P.; Xue, M.; Yan, G. Rapid and accurate quality evaluation of Angelicae Sinensis Radix based on near-infrared spectroscopy and Bayesian optimized LSTM network. Talanta 2024, 275, 126098. [Google Scholar] [CrossRef] [PubMed]
  42. Daba, S.D.; Honigs, D.; McGee, R.J.; Kiszonas, A.M. Prediction of Protein Concentration in Pea (Pisum sativum L.) Using Near-Infrared Spectroscopy (NIRS) Systems. Foods 2022, 11, 3701. [Google Scholar] [CrossRef] [PubMed]
