Article

Rapid Discrimination and Prediction of Ginsengs from Three Origins Based on UHPLC-Q-TOF-MS Combined with SVM

1 CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
2 Jiangxi Provincial Key Laboratory for Pharmacodynamic Material Basis of Traditional Chinese Medicine, Ganjiang Chinese Medicine Innovation Center, Nanchang 330000, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Molecules 2022, 27(13), 4225; https://doi.org/10.3390/molecules27134225
Submission received: 20 May 2022 / Revised: 28 June 2022 / Accepted: 28 June 2022 / Published: 30 June 2022

Abstract

Ginseng, which contains abundant ginsenosides, grows mainly in Jilin, Liaoning, and Heilongjiang provinces in China. The quality and traits of ginseng from different origins have been reported to differ greatly. To date, accurately predicting the origin of ginseng samples remains a challenge. Here, we integrated ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) with a support vector machine (SVM) for rapid discrimination and prediction of ginseng from the three main regions where it is cultivated in China. First, we developed and optimized a stable and reliable UHPLC-Q-TOF-MS method to obtain robust data for 31 batches of ginseng samples. Subsequently, a rapid pre-processing method was established to screen and identify 69 characteristic ginsenosides in the 31 batches of ginseng samples from three different origins. The SVM model successfully distinguished ginseng origin, and its accuracy improved from 83% to 100% after optimization of the normalization method. Six crucial quality markers for different origins of ginseng were screened using a permutation importance algorithm in the SVM model. In addition, to validate the method, eight batches of test samples were used to predict the regions of cultivation of ginseng using the SVM model based on the six selected quality markers. The results show that the proposed strategy is suitable for the discrimination and prediction of the origin of ginseng samples.

1. Introduction

Ginseng is the dried root of Panax ginseng C. A. Mey., first recorded in Shennong’s Classic of Materia Medica. It has been widely used to treat many diseases for more than two thousand years because of its wide range of pharmacological effects [1]. Modern pharmacological studies have shown that ginseng has various pharmacological activities, such as anti-tumor [2], anti-oxidative [3], immunity-improving [4], and memory-enhancing [5] effects. In China, ginseng is mainly planted in the northeastern regions, including Jilin (JL), Liaoning (LN), and Heilongjiang (HLJ). According to reports, the quality and traits of ginseng from different origins show great diversity due to different cultivation techniques and ecological environments [6,7]. Therefore, it is imperative to establish a quality evaluation method to differentiate and characterize ginseng samples from different regions.
Phytochemical studies have revealed the major components of ginseng, including ginsenosides, polysaccharides, amino acids, polypeptides, proteins, and volatile oils [8]. Among them, ginsenosides are considered the main active components [9,10,11,12,13]. In the 2020 edition of the Pharmacopoeia of the People’s Republic of China (ChP), only three ginsenosides and their contents are used as standards for the quality evaluation of ginseng, which makes it impossible to distinguish ginsengs from different origins [14]. In recent years, methods based on liquid chromatography mass spectrometry (LC-MS) fingerprinting, LC-MS quantification, and chemical pattern recognition have been widely used to address this issue [15,16,17]. Xiu et al. quantified fourteen ginsenosides using UHPLC coupled with a triple quadrupole mass spectrometer (QQQ-MS). Two commonly used traditional multivariate statistical analysis methods, principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), were then employed to evaluate differences in the contents of these ginsenosides between origins [18]. However, these methods still lacked objectivity and accuracy in their identification results. Additionally, the QQQ quantitative method required fourteen reference standards for content determination, resulting in high detection cost and poor practicality. Thus, it is essential to develop a convenient, effective strategy for accurately differentiating and characterizing ginsengs from different regions of cultivation.
Recently, the combination of UHPLC-MS and support vector machines (SVM) has been considered a valid method for species authentication and origin identification of Traditional Chinese Medicines (TCMs), with satisfactory accuracy [19,20]. For instance, Zhao et al. [19] used UHPLC-MS integrated with SVM to distinguish different varieties of ginseng and, after sufficient training, accurately distinguished red ginseng from other ginseng samples (white ginseng, Panax quinquefolium, and Panax notoginseng). However, ginsengs from different origins are highly similar in chemical composition, which makes identification more difficult and places higher demands on SVM model construction and data processing. In addition, to the best of our knowledge, the discovery of quality markers based on SVM models remains challenging.
In this work, a rapid, convenient, and effective differentiation method based on UHPLC-Q-TOF-MS coupled with SVM was developed to evaluate ginseng samples collected from JL, LN, and HLJ. First, stable and reliable data were generated by UHPLC-Q-TOF-MS, and the common ginsenoside components of 31 batches of ginsengs were screened. Next, an SVM model was established to accurately classify ginseng from different origins using the normalized data. Furthermore, a feature contribution algorithm was introduced into the SVM model to obtain quality markers of the ginsengs from the three origins. Finally, the SVM model based on these quality markers was shown to discriminate and predict the geographical origins of ginseng. The strategy was verified by successfully distinguishing test samples from JL, LN, and HLJ, indicating high reliability and effectiveness. Our strategy has the potential to serve as a reference for the regional differentiation and traceability of other TCMs.

2. Results and Discussion

2.1. Development of Analysis Method of Ginsengs from Different Origins

2.1.1. Optimization of UHPLC-Q-TOF-MS Analysis Conditions

To achieve good separation and obtain high-quality UHPLC-Q-TOF-MS data, we optimized the extraction method, extraction solvent, mobile phase composition, elution gradient, and injection concentration in detail. The results showed that ultrasonic extraction with 40% ethanol was suitable for ginseng, and that water (containing 0.01% formic acid, v:v) and acetonitrile (containing 0.01% formic acid, v:v) were preferred as the mobile phase system because they gave more peaks and better resolution. An injection concentration of 20,000 ppm gave an excellent response without overloading the instrument. These results, shown in Figure S1, indicate that optimizing the UHPLC-Q-TOF-MS conditions for ginseng is necessary to ensure that the samples enter the subsequent analysis in the best state.

2.1.2. Validation of the UHPLC-Q-TOF-MS Analysis Method

After the UHPLC-MS analysis conditions were developed, the method was validated using QC samples. The stability and repeatability of the system were evaluated by extracted ion chromatograms (EICs) of the QC samples. QC samples were run before and after the sample sequence every day, and one QC sample was inserted after every ten samples during the run. As shown in Table S1, seven EICs were extracted from the QC samples. The mass accuracy RSDs of these seven EICs ranged from 1.10 × 10−4% to 1.46 × 10−4%, the RSDs of retention time ranged from 0.06% to 0.43%, and the RSDs of peak area ranged from 1.94% to 2.43%. These results showed good stability and repeatability of the UHPLC-Q-TOF-MS system, which can therefore meet the needs of sample analysis and provide reliable and robust data.
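The RSD values above follow the usual definition, RSD = standard deviation / mean × 100%. A minimal Python sketch, assuming the sample standard deviation (ddof = 1), which the text does not specify. The peak areas below are hypothetical values for three QC injections, not data from this study:

```python
import numpy as np

def rsd_percent(values):
    """Relative standard deviation (%) of repeated QC measurements.

    Assumption: sample standard deviation (ddof=1); the article does not
    state which estimator was used.
    """
    values = np.asarray(values, dtype=float)
    return float(np.std(values, ddof=1) / np.mean(values) * 100.0)

# Hypothetical peak areas of one EIC across three QC injections
qc_peak_areas = [1.00e6, 1.02e6, 0.98e6]
print(round(rsd_percent(qc_peak_areas), 2))  # 2.0
```

The same function applies unchanged to retention times or measured m/z values of each EIC.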

2.2. Rapid Screening and Identification of Characteristic Ginsenosides in Ginsengs from Different Regions

The developed UHPLC-Q-TOF-MS method was then applied to analyze 31 batches of ginseng samples from JL, LN, and HLJ, and MS data were collected. A representative total ion chromatogram (TIC), taken from a QC sample, is shown in Figure S2. We established a pre-processing method to rapidly filter high-quality information from redundant mass data for data analysis.
Processing in the workstation yielded more than 6000 features from the 31 batches of samples. After peak matching, alignment, and filtering, 122 common peaks were found in the data, and these common peaks were present in all 31 batches of ginsengs. Furthermore, comparison with our in-house database (including MS and MS/MS information on over 400 ginsenosides collected from published references) quickly screened 69 ginsenosides (Figure 1), whose chemical structures were preliminarily identified (Table 1).
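Database comparison of this kind usually begins by matching each feature's accurate m/z against theoretical values within a parts-per-million (ppm) tolerance. The sketch below is a minimal illustration of that step only. The 10 ppm tolerance and the three-entry `database` are hypothetical, and the names are not taken from the authors' in-house database. It shows why accurate mass alone can only yield "X or isomer" assignments: Rg1 and Rf share the formula C42H72O14.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_database(observed_mz, database, tol_ppm=10.0):
    """Return (sorted) names of all entries within tol_ppm of the observed m/z.

    Isomers share a theoretical m/z, so one feature can match several entries.
    """
    return sorted(
        name for name, mz in database.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    )

# Illustrative theoretical [M-H]- values (monoisotopic); NOT the in-house database
database = {
    "ginsenoside Rg1": 799.4849,  # C42H72O14
    "ginsenoside Rf": 799.4849,   # C42H72O14, isomer of Rg1
    "ginsenoside Re": 945.5428,   # C48H82O18
}

print(match_database(799.4861, database))  # ['ginsenoside Rf', 'ginsenoside Rg1']
```

MS/MS fragments and retention behaviour are then needed to narrow such multiple hits, as the fragmentation rules in the next paragraph illustrate.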
Based on the in-house database, the fragmentation patterns of three typical types of ginsenosides were summarized. For PPT-type ginsenosides such as Rg1, the parent ion [M-H]- (m/z 799) in negative-ion mode lost two glucose residues to give the aglycone protopanaxatriol (m/z 475), as shown in Figure S3A. For PPD-type ginsenosides such as Rb1, the parent ion [M-H]- (m/z) in negative-ion mode lost four glucose residues to give protopanaxadiol (m/z 459), as shown in Figure S3B. For OA-type ginsenosides such as Ro, the parent ion [M-H]- (m/z 955) in negative-ion mode lost two glucose residues and one glucuronic acid residue to produce oleanolic acid (m/z 455), as shown in Figure S3C. In negative-ion mode, the parent ion information (the MS1 ions) was obtained by full scanning on the UHPLC-Q-TOF-MS and mainly took the form of [M-H]- and [M+HCOO]-. These are the common adduct ion forms of ginsenosides, consistent with the literature [16]. In MS/MS mode, the sugars on the side chains were cleaved sequentially, and finally a relatively stable aglycone ion with m/z 475, 459, or 455 was detected for the three typical ginsenoside types [21]. These aglycone ions did not fragment readily, and this fragment information is an important basis for identifying and classifying unknown ginsenosides. The full scan and MS/MS spectra of six compounds, Compound 13 (ginsenoside Rg1), Compound 14 (ginsenoside Re), Compound 27 (ginsenoside Rf), Compound 46 (ginsenoside Rb1), Compound 50 (ginsenoside Rc), and Compound 56 (ginsenoside Rb2), are shown as examples in Figure S4. These experimental results are consistent with the rules obtained in our summary [22].
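The neutral-loss arithmetic behind these rules can be checked directly. Each glycosidic bond adds one sugar residue (the sugar minus H2O): 162.0528 Da for glucose (C6H10O5) and 176.0321 Da for glucuronic acid (C6H8O6). A singly charged [M-H]- ion lies one proton mass below the neutral mass. The sketch below uses standard monoisotopic atomic masses. The aglycone formulas are standard (protopanaxatriol C30H52O4, protopanaxadiol C30H52O3, oleanolic acid C30H48O3), and the helper names are hypothetical:

```python
# Monoisotopic atomic masses (Da) and the electron mass
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
ELECTRON = 0.00054857990946

def mono_mass(formula):
    """Monoisotopic neutral mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())

def deprotonated_mz(neutral_mass):
    """m/z of [M-H]-: remove a proton (hydrogen atom minus one electron)."""
    return neutral_mass - MONO["H"] + ELECTRON

# Neutral aglycones
PPT = mono_mass({"C": 30, "H": 52, "O": 4})  # protopanaxatriol
PPD = mono_mass({"C": 30, "H": 52, "O": 3})  # protopanaxadiol
OA = mono_mass({"C": 30, "H": 48, "O": 3})   # oleanolic acid
# Sugar residues (sugar minus H2O): mass added per glycosidic bond
GLC = mono_mass({"C": 6, "H": 10, "O": 5})   # glucose residue
GLCA = mono_mass({"C": 6, "H": 8, "O": 6})   # glucuronic acid residue

rg1 = deprotonated_mz(PPT + 2 * GLC)         # Rg1 = PPT + 2 Glc
ro = deprotonated_mz(OA + 2 * GLC + GLCA)    # Ro = OA + 2 Glc + GlcA

for label, mz in [("PPT aglycone", deprotonated_mz(PPT)),
                  ("PPD aglycone", deprotonated_mz(PPD)),
                  ("OA aglycone", deprotonated_mz(OA)),
                  ("Rg1", rg1), ("Ro", ro)]:
    print(f"{label}: m/z {mz:.4f}")
```

Rounded to integers, the computed values reproduce the nominal ions quoted above (m/z 475, 459, 455, 799, and 955).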
Although the exact structures of some components could not be determined, this affects neither the classification of these unknown components by ginsenoside type nor the subsequent model analysis.
In brief, the pre-processing method enabled rapid screening and identification of 69 characteristic ginsenosides in 31 batches of ginseng samples from three different origins. Data pre-processing was completed within an hour and provided high-quality data for the subsequent multivariate statistical analysis.

2.3. Classification and Prediction of Ginsengs from Different Origins by Multivariate Statistical Analysis

2.3.1. Traditional Multivariate Statistical Analysis

Traditional multivariate statistical analyses, such as PCA and PLS-DA, were conducted using the peak areas of the 69 characteristic ginsenosides to elucidate the similarities and differences between ginsengs from three different geographical origins.
PCA, a commonly used unsupervised method, was used to explore trends among the ginseng samples from different growing origins. The first two principal components accounted for only 33.0% of the total variance. As shown in Figure 2, the 31 batches of ginseng samples did not separate according to origin.
Subsequently, a supervised PLS-DA model was established to further discriminate the samples by origin. The R2Y and Q2 of the PLS-DA model were 0.87 and 0.56, respectively. The PLS-DA model was further evaluated using a random permutation test (Figure S5), in which the intercepts of R2 and Q2 were 0.371 and 0.277, respectively. As shown in the PLS-DA score plot (Figure 3), the ginseng samples from the three geographical areas were divided into only two clusters (from LN or not from LN), indicating that identification failed. This was possibly because PLS-DA, a linear least-squares method, cannot effectively model nonlinear relationships in MS data.

2.3.2. SVM Analysis

As a widely used method, SVM has been successfully applied in the quality control of TCMs with satisfactory classification and prediction accuracy [20]. In this work, an SVM model was developed to discriminate and predict the cultivation regions of ginseng, using either the peak areas or the normalized data of the 69 characteristic ginsenosides as input vectors and the regions as outputs.
The optimal values of the parameters C and γ of the SVM model were determined by a grid search combined with ten-fold cross-validation. The penalty parameter C controls the trade-off between maximizing the margin (the distance between the support vectors and the decision hyperplane) and minimizing training errors. The kernel parameter γ governs how the low-dimensional samples are mapped into a high-dimensional feature space. The classification accuracy under different combinations of γ and C is shown in Figure 4. A large plateau of high accuracy was observed, indicating that the SVM model was well established, and γ = 0.03 and C = 1 were chosen by ten-fold cross-validation on all data.
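A minimal scikit-learn sketch of such a grid search. It assumes an RBF kernel (the text names γ but not the kernel) and uses a synthetic matrix in place of the real peak areas. The grid values, the `origins` labels, and the data generator are hypothetical. Z-score scaling sits inside the pipeline, so in every fold its means and standard deviations come only from the training part:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for the peak-area matrix: 3 origins x 12 samples, 69 peaks
origins = ["JL", "LN", "HLJ"]
n_per_origin, n_peaks = 12, 69
X = rng.normal(loc=1000.0, scale=100.0, size=(3 * n_per_origin, n_peaks))
y = np.repeat(origins, n_per_origin)
for c, origin in enumerate(origins):
    X[y == origin, 10 * c:10 * (c + 1)] += 500.0  # ten origin-specific peaks

# Scaling is refitted on the training folds only, avoiding information leakage
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.003, 0.01, 0.03, 0.1],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Reshaping `search.cv_results_["mean_test_score"]` onto the C × γ grid gives an accuracy surface comparable in spirit to Figure 4. A broad plateau means the choice of C and γ is not critical.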
As shown in Table 2, when raw peak areas were used, the 31 batches of ginseng samples were assigned to their origins with a prediction accuracy of 83%. However, the classification accuracy reached 100% when normalized data were used. Data normalization therefore significantly improved SVM performance, because Z-Score normalization rescales each feature to zero mean and unit variance. This prevents features with large mean peak areas and variances from dominating the distance calculations of the model.
Thus, the results strongly indicated that the developed SVM model with normalized data was a powerful tool for the geographical classification and prediction of ginsengs from JL, LN, and HLJ.

2.4. Discovery of Quality Markers of Ginsengs from Three Different Origins

To the best of our knowledge, key feature extraction for SVM remains a challenge that cannot be handled by traditional statistical methods such as the t-test. To address this problem, a permutation importance algorithm was employed in this study. The contribution of every peak to the SVM was calculated according to formula (A9), and the potential quality markers of ginsengs from JL, LN, and HLJ were then selected based on these calculations. Based on the importance value (IV > 0), six quality markers were discovered: peak 65 (AcO-ginsenoside Rd or isomer), peak 18 (AcO-ginsenoside Re or isomer), peak 26 (ginsenoside Re2 or isomer), peak 25 (notoginsenoside M or isomer), peak 3 (ginsenoside Re2 or isomer), and peak 33 (yesanchinoside J or isomer). Their contributions are ranked from highest to lowest in Figure 5. Box plots of the six quality markers (Figure S6) show that the distribution of each marker differs among JL, LN, and HLJ.
To test the capability of the six quality markers, the SVM model was rebuilt using only these six markers with ten-fold cross-validation. The origin identification accuracy was 100% (Table S2), indicating that the six quality markers were sufficient to identify the origin of ginseng samples. Selecting six peaks out of 69 simplifies data acquisition for ginseng samples.

2.5. Verification of This Strategy for Ginseng Identification from Different Origins Using Test Samples

To verify the practical applicability of this strategy, eight batches (T1-T8) of test ginseng samples from different growing regions, purchased on the market, were used for prediction experiments. Following the sample preparation, analysis, and pre-processing methods described above, the normalized data of the six differential markers in the eight batches were extracted and input into the SVM model as feature vectors for classification. As shown in Table 3, the samples from all three provinces were correctly identified (accuracy 100%), indicating that this approach can effectively and accurately predict the geographical origin of ginseng samples sold on the market.
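The prediction step can be sketched as follows. The model is trained on the marker columns of the training batches and then applied to new batches. This is a hedged illustration, not the authors' code. `MARKERS` assumes that peaks 65, 18, 26, 25, 3, and 33 map to 0-based columns of a 69-column matrix. `simulate` is a hypothetical data generator, and the kernel and C value are assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

ORIGINS = np.array(["JL", "LN", "HLJ"])
# Hypothetical 0-based columns of the six markers (peaks numbered from 1)
MARKERS = [64, 17, 25, 24, 2, 32]

def simulate(n_per_origin, rng):
    """Hypothetical peak areas in which only the marker columns depend on origin."""
    X = rng.normal(1000.0, 100.0, size=(3 * n_per_origin, 69))
    y = np.repeat(ORIGINS, n_per_origin)
    for c, origin in enumerate(ORIGINS):
        X[np.ix_(y == origin, MARKERS[2 * c:2 * c + 2])] += 600.0
    return X, y

rng = np.random.default_rng(1)
X_train, y_train = simulate(12, rng)  # training batches
X_test, y_test = simulate(3, rng)     # unseen "market" batches

# The scaler learns means/SDs from the training batches and reuses them on the test set
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train[:, MARKERS], y_train)
predicted = model.predict(X_test[:, MARKERS])
accuracy = float(np.mean(predicted == y_test))
print(predicted, accuracy)
```

Wrapping the scaler and classifier in one pipeline means that test samples are normalized with the training statistics rather than their own, which mirrors how an independent market sample would be handled.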

3. Materials and Methods

3.1. Ginseng Samples

Ginseng samples were collected in three provinces in Northeastern China: JL, HLJ, and LN. All samples were identified as dry roots of Panax ginseng C. A. Mey. by Xiaoping Yang from Dalian Institute of Chemical Physics, Chinese Academy of Sciences. Sample information is shown in Table 4. S1~S31 are the training samples and T1-T8 are the test samples.

3.2. Chemicals and Reagents

LC-MS-grade acetonitrile was purchased from Fisher Scientific (Pittsburgh, PA, USA), LC-grade formic acid was purchased from Sigma, ultrapure water was obtained from a Milli-Q IQ 7000 system (Bedford, MA, USA), and analytical-grade ethanol was purchased from Energy Chemical (Shanghai, China).

3.3. Preparation of Samples

One gram of dry ginseng powder was extracted with 50 mL of 40% ethanol by ultrasonication (Kunshan ultrasonic instruments Co., Ltd., Suzhou, China) for 45 min, and the extract was centrifuged at 10,000 rpm for 10 min to obtain the sample stock solution for UHPLC-Q-TOF-MS. Quality control (QC) samples were prepared by pooling 1 mL of each stock solution from all 39 batches of ginseng. All stock solutions were filtered through a 0.22 μm membrane filter prior to UHPLC-Q-TOF-MS analysis.

3.4. UHPLC-Q-TOF-MS Analysis of Ginseng Samples

Chromatographic separation of ginseng samples was performed on an Agilent 1290 Infinity II UHPLC system (Agilent Technologies Inc., Santa Clara, CA, USA) using an Acquity UPLC BEH C18 column (2.1 × 100 mm, 1.7 µm; Waters Corporation, Milford, MA, USA). The mobile phases were water containing 0.1% formic acid (v:v, phase A) and acetonitrile (phase B). The flow rate was 0.4 mL/min, the injection volume was 1 µL, the column temperature was 30 ℃, and the detection wavelength was 203 nm. The linear gradient program was as follows: 0~10 min, 19% B; 10~16 min, 19~28% B; 16~30 min, 28~34% B; 30~31 min, 34~90% B; and 31~35 min, 90% B.
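Because every segment of the gradient is linear, the acetonitrile fraction at any time can be obtained by interpolating between the breakpoints. A small sketch that encodes the program above as a table (`GRADIENT_TIME`, `GRADIENT_B`, and `percent_b` are hypothetical names). No re-equilibration step is given in the text, so none is modelled:

```python
import numpy as np

# Gradient program from Section 3.4: breakpoints (min) and %B, linear in between
GRADIENT_TIME = [0.0, 10.0, 16.0, 30.0, 31.0, 35.0]
GRADIENT_B = [19.0, 19.0, 28.0, 34.0, 90.0, 90.0]

def percent_b(t_min):
    """%B (acetonitrile) at time t_min by linear interpolation of the program."""
    return float(np.interp(t_min, GRADIENT_TIME, GRADIENT_B))

for t in (5, 13, 23, 30.5, 33):
    print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

A table-driven form like this makes it easy to check the solvent composition at the retention time of any peak of interest.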
The MS analysis of ginseng samples was performed on an Agilent 6545 Q-TOF-MS system (Agilent Technologies Inc., Santa Clara, CA, USA) equipped with a Dual AJS ESI ion source. The optimized parameters for the negative-ion mode were as follows: curtain gas temperature, 320 ℃; sheath gas temperature, 320 ℃; dry gas flow rate, 8 L/min; capillary voltage, −3500 V; fragmentor, 75 V; and collision energy, 40 and 60 V. The scan mode was full scan for MS and auto scan for MS/MS. The scan range was m/z 400–1700 for MS and m/z 100–1700 for MS/MS.

3.5. Data Processing and Analysis

The UHPLC-Q-TOF-MS raw data from the 31 batches of samples and the QC samples were analyzed using the target/suspect compound screening algorithm in the MassHunter workstation (version 10.0, Agilent Technologies Inc., Santa Clara, CA, USA). The algorithm considered all ions above 1000 counts with a charge state of one, and compounds with a qualitative score greater than 60 were retained. Isotope grouping was based on the common organic molecules model. The features screened by the workstation for each sample were exported for peak matching, alignment, and filtering. Peaks missing in more than 80% of the samples were then removed to obtain the common peaks. The common peaks were characterized by comparing their molecular formulas, exact molecular weights, and fragment ions with our in-house database, and the common peaks identified as ginsenosides are referred to as characteristic ginsenosides. The peak areas of the characteristic ginsenosides in all samples formed the data matrix for subsequent analysis, including normalization, PCA, PLS-DA, and SVM.
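The missing-value filter can be expressed as a column-wise rule on the samples × peaks matrix. A minimal NumPy sketch, assuming that a non-detected peak is stored as NaN or 0. The function name `common_peak_mask` is hypothetical, and the 80% threshold is the one stated above:

```python
import numpy as np

def common_peak_mask(peak_areas, max_missing_fraction=0.8):
    """Boolean mask of peaks (columns) to keep.

    peak_areas: samples x peaks matrix; NaN or 0 marks a peak not detected.
    A peak is removed when it is missing in MORE than max_missing_fraction
    of the samples.
    """
    areas = np.asarray(peak_areas, dtype=float)
    filled = np.where(np.isnan(areas), 0.0, areas)  # treat NaN as not detected
    missing = filled <= 0
    return missing.mean(axis=0) <= max_missing_fraction

# Hypothetical 5-sample x 3-peak example
areas = np.array([
    [1.0, np.nan, 5.0],
    [2.0, 0.0, np.nan],
    [3.0, 0.0, np.nan],
    [4.0, np.nan, np.nan],
    [5.0, 0.0, np.nan],
])
keep = common_peak_mask(areas)
print(keep)
print(areas[:, keep].shape)
```

In the example, peak 2 is missing in all five samples and is dropped. Peak 3 is missing in exactly 80% of the samples, so it is kept under a "more than 80%" rule.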

3.5.1. Normalization Methods

The raw data were normalized by the mean normalization and Z-Score normalization methods, whose formulas are shown below:
Mean Normalization:
$$P^{\mathrm{std}}_{k,m} = \frac{P_{k,m}}{\bar{P}_m}$$
where $P_{k,m}$ is the peak area of peak $m$ in sample $k$, $P^{\mathrm{std}}_{k,m}$ is its normalized value, and $\bar{P}_m$ is the average value of peak $m$ over all samples.
Z-Score Normalization:
$$P^{\mathrm{std}}_{k,m} = \frac{P_{k,m} - \bar{P}_m}{\sigma_m}$$
where $\sigma_m$ is the standard deviation of peak $m$ over all samples, and the other symbols are as defined above.
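Both formulas act column by column (per peak $m$) on the samples × peaks matrix. A minimal NumPy sketch, assuming the population standard deviation (ddof = 0, the convention of scikit-learn's `StandardScaler`) for $\sigma_m$, since the estimator is not stated. A peak with zero variance would need special handling before Z-scoring:

```python
import numpy as np

def mean_normalize(P):
    """P_std[k, m] = P[k, m] / mean_k(P[k, m]): divide each peak by its mean."""
    P = np.asarray(P, dtype=float)
    return P / P.mean(axis=0)

def zscore_normalize(P):
    """P_std[k, m] = (P[k, m] - mean_m) / sigma_m, with population SD (ddof=0)."""
    P = np.asarray(P, dtype=float)
    return (P - P.mean(axis=0)) / P.std(axis=0)

# Hypothetical 3-sample x 2-peak matrix whose peaks differ 10-fold in scale
P = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(mean_normalize(P))
print(zscore_normalize(P))
```

After either transformation the two peaks, which differ tenfold in raw scale, become identical columns. This is the scale effect behind the accuracy gain reported in Table 2.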

3.5.2. PCA Algorithm

PCA computes principal components from the covariance matrix of the data and uses them to linearly transform the data; generally, only the first few principal components are retained and the others are ignored [25]. The PCA model satisfies:
$$\operatorname{cov}(PX) = E\left[PX(PX)^{*}\right] = E\left[PXX^{*}P^{*}\right] = P\,E\left[XX^{*}\right]P^{*} = P\operatorname{cov}(X)P^{-1} \qquad (1)$$
where $X$ is the mean-centered matrix of independent variables, $P$ is the orthogonal transformation matrix (so that $P^{*} = P^{-1}$), and $\operatorname{cov}(PX)$ is a diagonal covariance matrix.
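Equation (1) amounts to choosing the rows of $P$ as the eigenvectors of $\operatorname{cov}(X)$, which makes the covariance of the transformed data diagonal. A minimal NumPy sketch of that eigendecomposition route, with samples as rows, so that the scores are $X_c W$ with $W = P^{\mathsf{T}}$. The random matrix is a hypothetical stand-in for the 31 × 69 peak table, and the `pca` helper is an illustration, not the authors' code:

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix.

    X: samples x variables. Returns scores, loadings (columns = components)
    and the fraction of total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre so that cov = E[X X^T]
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]              # W = P^T restricted to kept rows
    scores = Xc @ W
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, W, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(31, 69))  # hypothetical stand-in for the peak-area matrix
scores, loadings, explained = pca(X, n_components=2)
print(f"PC1 + PC2 explain {explained.sum():.1%} of the total variance")
```

The explained-variance fraction computed this way is the quantity behind statements such as "the first two principal components accounted for 33.0% of the variance".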

3.5.3. PLS-DA Algorithm

PLS-DA is a statistical method related to principal component regression. It finds a regression model by projecting the independent variables X and the dependent variables Y into a new space. PLS-DA is the variant used when Y is categorical [26].
The equations of the PLS model are [27]:
$$X = TP^{\mathsf{T}} + E$$
$$Y = UQ^{\mathsf{T}} + F$$
where X is the matrix of independent variables and Y is the matrix of dependent variables; T and U are the score matrices (projections) of X and Y, respectively; P and Q are the orthogonal loading matrices; and E and F are the error terms, which are assumed to be independent and identically distributed random normal variables. The decomposition of X and Y is performed to maximize the covariance between T and U.

3.5.4. SVM Algorithm

The support vector machine (SVM) model learns from the training samples through its support vectors and classifies unknown samples according to the following expressions:
$$w^{*} = \sum_{i=1}^{N} \alpha_i^{*} y_i x_i$$
$$b^{*} = y_j - \sum_{i=1}^{N} \alpha_i^{*} y_i \, x_i^{\mathsf{T}} x_j$$
where $N$ is the number of training samples, $\alpha_i^{*}$ is the optimal Lagrange multiplier of sample $i$ (nonzero only for support vectors), $x_i$ is the vector of peak-area data of sample $i$, $y_i \in \{-1, +1\}$ is the label of sample $i$, $w^{*}$ is the weight vector of the decision hyperplane, $x_j$ and $y_j$ are the peak-area vector and label of support vector sample $j$, and $b^{*}$ is the bias (constant term). For a kernel SVM, the inner product $x_i^{\mathsf{T}} x_j$ is replaced by a kernel function $K(x_i, x_j)$.
At the optimum, every support vector sample $j$ lying on the margin satisfies:
$$y_j\left((w^{*})^{\mathsf{T}} x_j + b^{*}\right) = 1$$
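The following sketch makes these expressions concrete using scikit-learn's `SVC` with a linear kernel on two hypothetical, well-separated classes. It rebuilds $w^{*}$ and $b^{*}$ from the fitted dual coefficients and checks the margin condition. It relies on two assumptions: the absolute values of `dual_coef_` are the multipliers $\alpha_i^{*}$, and `decision_function` is positive for `classes_[1]`, which is mapped to $y = +1$ here. Agreement with scikit-learn holds only up to the solver tolerance:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated hypothetical classes in two dimensions
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(2.0, 0.5, size=(20, 2))])
labels = np.array([0] * 20 + [1] * 20)

C = 1000.0
clf = SVC(kernel="linear", C=C).fit(X, labels)

sv = clf.support_vectors_
# decision_function is positive for clf.classes_[1]; map that class to y = +1
y_sv = np.where(labels[clf.support_] == clf.classes_[1], 1.0, -1.0)
alpha = np.abs(clf.dual_coef_.ravel())  # |y_i * alpha_i| = alpha_i

w = (alpha * y_sv) @ sv                 # w* = sum_i alpha_i* y_i x_i
free = alpha < C * (1 - 1e-6)           # margin support vectors: 0 < alpha_j < C
j = np.flatnonzero(free)[0]
b = y_sv[j] - np.sum(alpha * y_sv * (sv @ sv[j]))  # b* = y_j - sum_i alpha_i* y_i x_i^T x_j

margins = y_sv[free] * (sv[free] @ w + b)  # each should be close to 1
print(np.round(margins, 3))
print(np.allclose(X @ w + b, clf.decision_function(X), atol=1e-2))
```

With a nonlinear kernel, $w^{*}$ lives in the feature space and is not formed explicitly. The same sums are instead evaluated with $K(x_i, x)$.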

3.5.5. Permutation Importance Algorithm

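Traditional statistical learning models are poorly interpretable, and calculating feature contributions is a common way to explain which features account for differences between samples. The permutation importance of feature k is calculated as follows:
$$\mathrm{imp}_k = s - \frac{1}{N}\sum_{n=1}^{N} s_{k,n}$$
where $s$ is the reference score (e.g., classification accuracy) of the trained model, $s_{k,n}$ is the score obtained after the n-th random permutation of feature k, and N is the number of permutations.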
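A direct implementation of this formula for any fitted scikit-learn classifier is sketched below. It is a minimal illustration under stated assumptions, not the authors' code: accuracy is the score, importances are computed on the data passed in, and the data set is hypothetical with only feature 0 informative. Held-out data would give a less optimistic estimate. Newer scikit-learn releases (0.22 and later, so not the 0.21.2 used here) also provide `sklearn.inspection.permutation_importance`:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def permutation_importance_values(model, X, y, n_repeats=20, seed=0):
    """imp_k = s - (1/N) * sum_n s_{k,n} for every feature (column) k of X.

    s is the model's accuracy on (X, y); s_{k,n} is the accuracy after the
    n-th random shuffle of column k, which breaks the link between feature k
    and the labels while keeping the feature's distribution.
    """
    rng = np.random.default_rng(seed)
    s = model.score(X, y)
    imp = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        shuffled_scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, k] = rng.permutation(Xp[:, k])
            shuffled_scores.append(model.score(Xp, y))
        imp[k] = s - np.mean(shuffled_scores)
    return imp

# Hypothetical data: only feature 0 carries origin information
rng = np.random.default_rng(1)
y = np.repeat(["JL", "LN"], 30)
X = rng.normal(size=(60, 5))
X[y == "LN", 0] += 4.0

model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
imp = permutation_importance_values(model, X, y)
markers = np.flatnonzero(imp > 0)  # the IV > 0 selection rule of Section 2.4
print(np.round(imp, 3), markers)
```

Noise features can still receive small positive values, especially when scored on training data. This is why ranking the importances, as in Figure 5, is more informative than the sign alone.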
All calculations and pre-processing for the multivariate statistical analysis were performed using Python® (version 3.7.3). The SVM model and feature selection method were built with Scikit-learn® (version 0.21.2), and all raw data files were imported into Python with Pandas® (version 0.25.0).

4. Conclusions

In this paper, a rapid and efficient strategy is provided to intelligently distinguish ginseng from JL, LN, and HLJ. First, a robust UHPLC-Q-TOF-MS analysis method was developed, and a total of 69 characteristic ginsenosides were successfully extracted from 31 batches of samples for subsequent analysis. PCA and PLS-DA could not differentiate the ginseng origins, but our optimized SVM achieved accurate differentiation, with an accuracy of 100%. More importantly, the permutation importance algorithm was used to extract quality markers in SVM for the first time, which greatly improves the interpretability of SVM. Finally, the test samples were accurately predicted by SVM based on the six ginsenoside markers. The proposed approach enables more specific discrimination and prediction of ginseng origin and provides a simple and reliable method for the discovery of quality markers for other TCMs.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/molecules27134225/s1: Figure S1: Optimization of UHPLC-MS analysis conditions; Figure S2: TIC of ginseng samples by UHPLC-Q-TOF-MS (taking a QC sample as an example); Figure S3: The MS/MS spectra of ginsenosides; Figure S4: The full scan mode and MS/MS mode of six compounds; Figure S5: A presentation of 200 times the permutation test for PLS-DA analysis; Figure S6: Distribution of six peaks in ginseng from three different origins; Table S1: Stability and repeatability of UHPLC-Q-TOF-MS; Table S2: The classified and predicted results of ginsengs from three geographical origins using the SVM model’s six quality markers.

Author Contributions

Conceptualization, Formal Analysis, and Writing—Original Draft Preparation, C.Z., Z.L. and Q.X.; Methodology and Validation, S.L. and L.X.; Data Curation, J.G.; Project Administration and Funding Acquisition, H.J. and X.L. (Xinmiao Liang); Writing—Review and Editing, H.J., X.L. (Xiaonong Li), Y.L. and X.L. (Xinmiao Liang). All authors have read and agreed to the published version of the manuscript.

Funding

Jiangxi “Double Thousand Plan” (2019 and 2020); Ganjiang New Area: Program for Innovative Research Teams; Key R&D Program of Ganjiang New Area, Jiangxi (2020005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article and from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Sample Availability

Ginseng samples are available from the authors.

References

  1. Xu, W.; Choi, H.-K.; Huang, L. State of Panax ginseng Research: A Global Analysis. Molecules 2017, 22, 1518. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Tang, M.; Huang, L.; Du, Q.; Yan, C.; Gu, A.; Yang, J.; Li, Y. Ginsenoside 3β-0-Glc-DM(C3DM)Enhances the Antitumor Activity of Taxol on Lewis Lung Cancer by Targeting the lnterleukin-6/Jak2/STAT3 and Interleukin-6/AKT Signaling Pathways. World J. Tradit. Chin. Med. 2021, 6, 432–440. [Google Scholar]
  3. Chen, J.; Fang, W.; Li, S.; Xiao, S.; Li, H.; Situ, Y. Protective effect of ginsenoside Rd on lipopolysaccharide-induced acute lung injury through its anti-inflammatory and anti-oxidative activity. World J. Tradit. Chin. Med. 2021, 7, 383–390. [Google Scholar] [CrossRef]
  4. Zhou, W.; Li, J.; Zhou, Q.; Cai, F.; Chen, X.; Lu, Y.; Zhao, M.; Su, S. Ginsenoside Rb1 pretreatment attenuates myocardial ischemia by reducing calcium/calmodulin-dependent protein kinase II-medicated calcium release. World J. Tradit. Chin. Med. 2020, 6, 284–294. [Google Scholar] [CrossRef]
  5. Geng, J.; Dong, J.; Ni, H.; Lee, M.S.; Wu, T.; Jiang, K.; Wang, G.; Zhou, A.; Malouf, R. Ginseng for cognition. Cochrane Database Syst. Rev. 2010, 12, CD007769. [Google Scholar] [CrossRef]
  6. Dai, Y.; Qiao, M.; Yu, P.; Zheng, F.; Yue, H.; Liu, S. Comparing eight types of ginsenosides in ginseng of different plant ages and regions using RRLC-Q-TOF MS/MS. J. Ginseng Res. 2020, 44, 205–214. [Google Scholar] [CrossRef]
  7. Wang, H.; Sun, H.; Chen, W.; Ma, X.; Chen, D. Different Origin of Gingseng Splenasthenic Syndrome t-cell Subsets and IFN-γ Comparative Study of Impact. Chin. Arch. Tradit. Chin. Med. 2011, 29, 377–379. [Google Scholar] [CrossRef]
  8. Razgonova, M.; Veselov, V.; Zakharenko, A.; Golokhvast, K.; Nosyrev, A.; Cravotto, G.; Tsatsakis, A.; Spandidos, D. Panax ginseng components and the pathogenesis of Alzheimer’s disease (Review). Mol. Med. Rep. 2019, 19, 2975–2998. [Google Scholar] [CrossRef] [Green Version]
  9. Lee, D.-K.; Park, S.; Phuoc Long, N.; Min, J.; Kim, H.M.; Yang, E.; Lee, S.; Lim, J.; Kwon, S. Research Quality-Based Multivariate Modeling for Comparison of the Pharmacological Effects of Black and Red Ginseng. Nutrients 2020, 12, 2590. [Google Scholar] [CrossRef]
  10. Liu, Z. Chemical Insights into Ginseng as a Resource for Natural Antioxidants. Chem. Rev. 2012, 112, 3329–3355. [Google Scholar] [CrossRef]
  11. Chen, Y.; Zhang, Y.; Song, W.; Zhang, Y.; Dong, X.; Tan, M. Ginsenoside Rh2 Improves the Cisplatin Anti-tumor Effect in Lung Adenocarcinoma A549 Cells via Superoxide and PD-L1. Anti-Cancer Agents Med. Chem. 2020, 20, 495–503. [Google Scholar] [CrossRef] [PubMed]
  12. Kang, S.; Min, H. Ginseng, the ‘Immunity Boost’: The Effects of Panax ginseng on Immune System. J. Ginseng Res. 2012, 36, 354–368. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Saba, E.; Irfan, M.; Jeong, D.; Ameer, K.; Yuan, Y.L.; Park, C.K.; Hong, S.B.; Man, H.R. Mediation of antiinflammatory effects of Rg3-enriched red ginseng extract from Korean Red Ginseng via retinoid X receptor α–peroxisome-proliferating receptor γ nuclear receptors. J. Ginseng Res. 2019, 43, 442–451. [Google Scholar] [CrossRef]
  14. Wang, Z.; Ma, W.; Li, H.; Liu, Z.; Gu, Y.; Ma, M. Contents Differentiation of Saponins in Ginsen from Three Different Producing Areas. Inf. Tradit. Chin. Med. 2019, 36, 83–86. [Google Scholar] [CrossRef]
  15. Zhao, H.; Wang, Q.; Sun, X.; Li, X.; Miao, R.; Wu, D.; Liu, S.; Xiu, Y. Discrimination of Ginseng Origins and Identification of Ginsenoside Markers Based on HPLC-MS Combined with Multivariate Statistical nalysis. Chem. J. Chin. Univ. 2019, 40, 246–253. [Google Scholar]
  16. Wang, H.; Zhang, Y.; Yang, X.; Yang, X.; Xu, W.; Xu, F.; Cai, S.; Wang, Y.; Xu, Y.; Zhang, L. High-Performance Liquid Chromatography with Diode Array Detector and Electrospray Ionization Ion Trap Time-of-Flight Tandem Mass Spectrometry to Evaluate Ginseng Roots and Rhizomes from Different Regions. Molecules 2016, 21, 603.
  17. Zhang, B.; Sun, X.; Guo, Y.; Wang, Y.; Liu, S. Chemical Constituents of Ginseng Radix et Rhizoma with Different Growth Years and Different Origins Based on LC-MS. Chin. J. Exp. Tradit. Med. Formulae 2020, 26, 206–212.
  18. Xiu, Y.; Li, X.; Sun, X.; Xiao, D.; Miao, R.; Zhao, H.; Liu, S. Simultaneous determination and difference evaluation of 14 ginsenosides in Panax ginseng roots cultivated in different areas and ages by high-performance liquid chromatography coupled with triple quadrupole mass spectrometer in the multiple reaction–monitoring mode combined with multivariate statistical analysis. J. Ginseng Res. 2019, 43, 508–516.
  19. Zhao, Q.; Zhao, N.; Ye, X.; He, M.; Yang, Y.; Gao, H.; Zhang, X. Rapid discrimination between red and white ginseng based on unique mass-spectrometric features. J. Pharm. Biomed. Anal. 2019, 164, 202–210.
  20. Chang, X.; Zhang, Z.; Yan, H.; Su, S.; Wei, D.; Guo, S.; Shang, E.; Sun, X.; Gui, S.; Duan, J. Discovery of Quality Markers of Nucleobases, Nucleosides, Nucleotides and Amino Acids for Chrysanthemi Flos From Different Geographical Origins Using UPLC–MS/MS Combined With Multivariate Statistical Analysis. Front. Chem. 2021, 9, 689254.
  21. Wang, H.; Zhang, Y.; Yang, X.; Zhao, D.; Wang, Y. Rapid characterization of ginsenosides in the roots and rhizomes of Panax ginseng by UPLC-DAD-QTOF-MS/MS and simultaneous determination of 19 ginsenosides by HPLC-ESI-MS. J. Ginseng Res. 2016, 40, 382–394.
  22. Wang, H.; Zhang, C.; Zuo, T.; Li, W.; Jia, L.; Wang, X.; Qian, Y.; Guo, D.; Yang, W. In-depth profiling, characterization, and comparison of the ginsenosides among three different parts (the root, stem leaf, and flower bud) of Panax quinquefolius L. by ultra-high performance liquid chromatography/quadrupole-Orbitrap mass spectrometry. Anal. Bioanal. Chem. 2019, 411, 7817–7829.
  23. Yang, X.; Yang, X.; Liu, J. Study on Ginsenosides in the Roots and Rhizomes of Panax ginseng. Mod. Chin. Med. 2013, 15, 349–358.
  24. Ma, L.; Zhang, Y.; Zhou, Q.; Yang, Y.; Yang, X. Simultaneous Determination of Eight Ginsenosides in Rat Plasma by Liquid Chromatography-Electrospray Ionization Tandem Mass Spectrometry: Application to Their Pharmacokinetics. Molecules 2015, 20, 21597–21608.
  25. Jolliffe, I.T. Mathematical and Statistical Properties of Sample Principal Components. In Principal Component Analysis; Jolliffe, I.T., Ed.; Springer: New York, NY, USA, 1986; pp. 23–49.
  26. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
  27. Zheng, Z.; Hu, H.; Zeng, L.; Yang, H.; Yang, T.; Wang, D.; Zhang, C.; Deng, Y.; Zhang, M.; Guo, D.; et al. Analysis of the characteristic compounds of Citri Sarcodactylis Fructus from different geographical origins. Phytochem. Anal. 2021, 33, 72–82.
Figure 1. EICs of 69 characteristic ginsenosides of ginseng.
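Each trace in Figure 1 is an extracted ion chromatogram (EIC): for every scan, the intensity of the peaks that fall inside a narrow m/z window around one ginsenoside ion is summed and plotted against retention time. A minimal sketch of that operation, assuming centroided scans held in a simple in-memory list and a hypothetical ±10 ppm window (the tolerance actually used for Figure 1 is not stated in this section); the scans and intensities below are toy values, and the target 931.5272 is approximately the theoretical [M-H]- m/z of notoginsenoside R1 (C47H80O18):

```python
from typing import List, Tuple

# One centroided MS1 scan: (retention time in min, [(m/z, intensity), ...]).
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(scans: List[Scan], target_mz: float,
                             tol_ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Return (rt, summed intensity) for peaks within +/- tol_ppm of target_mz."""
    half_width = target_mz * tol_ppm * 1e-6  # absolute m/z half-window
    lo, hi = target_mz - half_width, target_mz + half_width
    trace = []
    for rt, peaks in scans:
        # Sum every peak in the window; scans with no matching peak contribute 0.0.
        intensity = sum((i for mz, i in peaks if lo <= mz <= hi), 0.0)
        trace.append((rt, intensity))
    return trace


# Toy scans (hypothetical values) around the elution of one ginsenoside.
scans = [
    (6.95, [(931.5255, 1.0e4), (945.5430, 2.0e3)]),
    (6.97, [(931.5262, 5.0e4), (799.4840, 8.0e3)]),
    (6.99, [(931.5500, 3.0e4)]),  # about 24 ppm away, so excluded
]
trace = extract_ion_chromatogram(scans, 931.5272)
```

Narrowing the window improves selectivity against co-eluting ions of similar mass, at the cost of losing signal if the instrument's mass accuracy drifts.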
Figure 2. PCA results of ginseng from three geographical origins.
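Figure 2 summarizes the samples by principal component analysis (PCA). As a dependency-free sketch of what the first component is, assuming X is a samples × features matrix (for example, batches × ginsenoside peak areas), the leading loading vector can be found by power iteration on the sample covariance matrix, and the PC1 scores are the projections of the mean-centred samples onto it. This is a textbook method shown for illustration, not necessarily how the authors' software computes PCA, and any scaling they applied before PCA is not reproduced here:

```python
import math
from typing import List, Tuple


def first_principal_component(X: List[List[float]],
                              iters: int = 100) -> Tuple[List[float], List[float]]:
    """Return (PC1 loading vector, PC1 scores) for a samples x features matrix X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre each feature
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    # Power iteration: multiply by the covariance matrix and renormalise; assuming a
    # unique largest eigenvalue and a start vector not orthogonal to its eigenvector,
    # v converges to the PC1 loading.
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centred sample onto the loading vector.
    t = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, t


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # toy data lying on a line
loading, scores = first_principal_component(X)
```

For the toy data, all variance lies along the direction (1, 2), so the loading converges to (1, 2)/√5 and the scores are centred on zero.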
Figure 3. The PLS-DA results of ginsengs from three geographical origins.
Figure 4. Accuracy of the SVM model for different combinations of the parameters C and γ.
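Figure 4 scans the penalty parameter C and the kernel parameter γ. Assuming a soft-margin SVM with the radial basis function (RBF) kernel, which is the usual setting in which a γ parameter appears (the kernel is not named in this caption), the two parameters enter the standard formulation as follows, with labels y_i ∈ {−1, +1}:

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\\
&K(\mathbf{x},\mathbf{x}') = \phi(\mathbf{x})^{\top}\phi(\mathbf{x}') = \exp\bigl(-\gamma\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}\bigr),\\
&f(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}) + b\Bigr),\qquad 0\le\alpha_i\le C.
\end{aligned}
```

A larger C penalizes training misclassifications more heavily, giving a harder margin; a larger γ narrows the kernel, so each support vector influences only nearby samples. Because there are three origins, binary SVMs of this form are typically combined in a one-vs-one or one-vs-rest scheme; which scheme was used is not stated here.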
Figure 5. Six quality markers selected by the permutation importance algorithm.
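Permutation importance, used for Figure 5, scores a feature by how much the model's accuracy drops when that feature's values are randomly shuffled across samples, which breaks its link to the labels while leaving the other features intact. A plain-Python sketch, assuming `predict` is any already-fitted classifier (such as the tuned SVM) wrapped as a function from rows to labels; the function names, toy data, and `toy_predict` stand-in are hypothetical, and the authors' scoring metric, number of repeats, and data split are not given in this section:

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[List[float]]], List],
                           X: List[List[float]], y: Sequence,
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


X = [[i / 19, float((i * 7) % 5)] for i in range(20)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(rows):
    # Hypothetical stand-in for a fitted classifier: it uses feature 0 only.
    return [int(r[0] > 0.5) for r in rows]


imp = permutation_importance(toy_predict, X, y)
```

Here shuffling feature 1 leaves predictions unchanged (importance 0.0), while shuffling feature 0 lowers accuracy. Ranking features by this score and keeping the top few is one way a short marker list, such as the six in Figure 5, can be selected; it is usually computed on held-out data so that the scores reflect generalization rather than fit.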
Table 1. Identification of characteristic ginsenosides in ginseng by UHPLC-Q-TOF-MS.
No. | RT (min) | Compound | Formula | Mass Ion (m/z) | Type of Ion | Error (ppm) | Fragment Ions | References
1 | 1.07 | Quinquenoside L9 or its isomer | C42H74O15 | 863.4940 | [M+HCOO]- | −6.80 | / | [22]
2 | 1.07 | Ginsenoside Re2 or its isomer | C48H82O19 | 1007.5422 | [M+HCOO]- | 0.05 | / | [16]
3 | 4.23 | Ginsenoside Re2 or its isomer | C48H82O19 | 1007.5422 | [M+HCOO]- | 0.05 | 961.5401; 799.4824; 781.4713 | [16]
4 | 4.74 | (B4-b)-glc-xyl | C41H70O14 | 831.4737 | [M+HCOO]- | −0.01 | 785.4677; 653.4273; 491.3746 | [22]
5 | 5.18 | Notoginsenoside R8 or its isomer | C36H62O10 | 699.4315 | [M+HCOO]- | 0.13 | / | [22]
6 | 5.73 | Ginsenoside Re4 or its isomer | C47H80O18 | 977.5353 | [M+HCOO]- | 3.79 | 977.5353; 931.5271; 637.4358; 457.3784 | [23]
7 | 6.13 | Ginsenoside Re2 or its isomer | C48H82O19 | 1007.5438 | [M+HCOO]- | 1.71 | 961.5419; 799.4876; 637.4357; 475.3806 | [16]
8 | 6.97 | Notoginsenoside R1 | C47H80O18 | 931.5260 | [M-H]- | −1.28 | 931.5220; 799.4836; 638.4292; 475.3696 | [23]
9 | 7.37 | Ginsenoside Re4 | C47H80O18 | 977.5330 | [M+HCOO]- | 1.43 | 931.5302; 637.4335; 475.3784 | [23]
10 | 8.11 | Ginsenoside Rc or its isomer | C53H90O22 | 1077.5829 | [M-H]- | −2.06 | 945.5438; 719.3460; 433.5658 | [21]
11 | 8.40 | Ginsenoside Re3 | C48H82O19 | 961.5365 | [M-H]- | −1.33 | 799.4859; 637.4300 | [23]
12 | 9.37 | Ginsenoside Re4 or its isomer | C47H80O18 | 977.5308 | [M+HCOO]- | −0.81 | 931.5203; 637.4289; 475.3736 | [23]
13 | 10.83 | Ginsenoside Rg1 | C42H72O14 | 845.4912 | [M+HCOO]- | 2.27 | 799.4852; 637.4337; 619.4215; 475.3802 | [23]
14 | 11.70 | Ginsenoside Re | C48H82O18 | 945.5426 | [M-H]- | −0.21 | 799.4880; 783.4926; 637.4346; 475.3818 | [23]
15 | 11.70 | Ginsenoside Re2 or its isomer | C48H82O19 | 961.5377 | [M-H]- | −0.11 | / | [16]
16 | 12.95 | Vinaginsenoside R13 or its isomer | C48H84O20 | 979.5454 | [M-H]- | −2.94 | / | [22]
17 | 14.51 | Vinaginsenoside R13 or its isomer | C48H84O20 | 979.5470 | [M-H]- | −1.38 | / | [22]
18 | 14.59 | AcO-ginsenoside Re or its isomer | C50H84O19 | 987.5527 | [M-H]- | −0.74 | 945.5519; 927.5335; 783.4923; 765.5022; 637.4373 | [23]
19 | 14.71 | AcO-ginsenoside Rf or its isomer | C44H74O15 | 841.4946 | [M-H]- | −1.12 | 637.4308; 619.4205; 475.3759 | [22]
20 | 15.43 | Notoginsenoside G or its isomer | C48H80O19 | 1005.5209 | [M+HCOO]- | −5.57 | / | [22]
21 | 15.71 | Notoginsenoside R2 | C41H70O13 | 815.4791 | [M+HCOO]- | 0.31 | / | [23]
22 | 16.29 | Ginsenoside F5 | C41H70O13 | 815.4789 | [M+HCOO]- | 0.16 | / | [22]
23 | 16.19 | Notoginsenoside C or its isomer | C54H92O25 | 1139.5831 | [M-H]- | −2.08 | 961.5606; 785.8238; 584.0663 | [22]
24 | 17.06 | Notoginsenoside M or its isomer | C42H70O14 | 843.4734 | [M+HCOO]- | −0.40 | / | [22]
25 | 16.92 | Ginsenoside Re2 or its isomer | C48H82O19 | 1007.5416 | [M+HCOO]- | −0.51 | 961.5419; 799.4876; 637.4357; 475.3806 | [16]
26 | 17.39 | Ginsenoside Re2 or its isomer | C48H82O19 | 1007.5419 | [M+HCOO]- | −0.20 | 961.5314; 799.4734 | [16]
27 | 18.65 | Ginsenoside Rf | C42H72O14 | 799.4858 | [M-H]- | 1.14 | 637.4327; 475.3796 | [21]
28 | 18.90 | Ginsenoside Re6 or its isomer | C46H76O15 | 913.5158 | [M+HCOO]- | 0.27 | 830.6457; 765.8931; 620.4240; 475.3751 | [22]
29 | 19.24 | Notoginsenoside D or its isomer | C64H108O31 | 1371.6754 | [M-H]- | −3.49 | 1273.1482; 1031.7337; 875.6615; 597.4910; 415.6329 | [22]
30 | 19.62 | Notoginsenoside D or its isomer | C64H108O31 | 1371.6777 | [M-H]- | −1.84 | / | [22]
31 | 19.88 | AcO-ginsenoside Rg1 | C44H74O15 | 841.4953 | [M-H]- | −0.26 | 799.4865; 679.4467; 637.4326; 619.4224; 571.3972; 475.3799 | [23]
32 | 20.00 | Notoginsenoside R4 or its isomer | C59H100O27 | 1239.6365 | [M-H]- | −1.15 | 1107.5904; 1077.5822; 946.5432; 945.5391; 783.4854; 621.4298; 459.3820 | [23]
33 | 20.53 | Yesanchinoside J or its isomer | C61H102O28 | 1281.6480 | [M-H]- | −0.37 | / | [22]
34 | 20.98 | 20(R)-Ginsenoside Rh1 | C36H62O9 | 683.4372 | [M+HCOO]- | 1.08 | 475.3815 | [24]
35 | 20.90 | Quinquenoside V | C60H102O28 | 1269.6463 | [M-H]- | −1.76 | 1107.6007 | [22]
36 | 21.20 | 20(R)-Ginsenoside Rg2 | C42H72O13 | 829.4967 | [M+HCOO]- | 2.74 | 783.4923; 637.4372; 619.4248; 475.3808 | [24]
37 | 21.33 | Ginsenoside Rg5 or its isomer | C42H70O12 | 811.4842 | [M+HCOO]- | 0.54 | / | [22]
38 | 21.36 | Notoginsenoside D or its isomer | C64H108O31 | 1371.6762 | [M-H]- | −2.95 | 1145.2550; 838.4987; 652.4940; 438.2765 | [22]
39 | 21.41 | Quinquenoside L1 or its isomer | C48H80O18 | 989.5313 | [M+HCOO]- | −0.29 | / | [22]
40 | 21.80 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6252 | [M-H]- | −1.83 | 1077.5829; 945.5368; 783.4866; 621.7380 | [23]
41 | 21.96 | Notoginsenoside R4 or its isomer | C59H100O27 | 1239.6356 | [M-H]- | −1.85 | 1077.5843; 916.9001; 621.4288 | [23]
42 | 22.08 | Quinquenoside I or its isomer | C52H86O19 | 1059.5727 | [M+HCOO]- | −0.63 | / | [22]
43 | 22.11 | Ginsenoside Ro or its isomer | C48H76O19 | 955.4896 | [M-H]- | −1.29 | / | [23]
44 | 22.35 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6274 | [M-H]- | 0.03 | 1077.5874; 945.5440; 783.4925; 621.4390 | [23]
45 | 22.55 | Ginsenoside F1 or its isomer | C36H62O9 | 683.4380 | [M+HCOO]- | 2.25 | / | [22]
46 | 22.70 | Ginsenoside Rb1 | C54H92O23 | 1153.6013 | [M+HCOO]- | 1.00 | 1107.5980; 945.5438; 783.4904; 621.4370; 323.0986 | [21]
47 | 22.78 | Notoginsenoside R4 or its isomer | C59H100O27 | 1239.6366 | [M-H]- | −1.09 | 1209.6298; 1077.5874; 945.5440; 783.4928; 621.4390 | [23]
48 | 22.98 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6188 | [M-H]- | −7.10 | 1077.5826; 945.5420 | [23]
49 | 23.58 | Ginsenoside Ro | C48H76O19 | 955.4924 | [M-H]- | 1.63 | 793.4392; 569.3860; 455.3534 | [23]
50 | 24.01 | Ginsenoside Rc | C53H90O22 | 1123.5915 | [M+HCOO]- | 1.77 | 1077.5879; 945.5412; 915.5334 | [21]
51 | 24.46 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6282 | [M-H]- | 0.68 | 1077.5846; 945.5437; 915.5327; 783.4863 | [23]
52 | 25.04 | Ginsenoside F1 or its isomer | C36H62O9 | 683.4371 | [M+HCOO]- | 0.94 | / | [22]
53 | 25.04 | AcO-ginsenoside Ro | C50H78O20 | 997.5001 | [M-H]- | −1.30 | / | [22]
54 | 25.19 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6254 | [M-H]- | −1.69 | 1077.5842; 783.4910; 621.4377 | [23]
55 | 25.62 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6226 | [M-H]- | −3.94 | 1077.5856; 621.3146 | [23]
56 | 25.67 | Ginsenoside Rb2 | C53H90O22 | 1123.5918 | [M+HCOO]- | 2.04 | 1077.5824; 945.5402; 915.5279; 783.4881; 765.4772; 621.4359 | [21]
57 | 26.24 | Ginsenoside Rb3 | C53H90O22 | 1123.5902 | [M+HCOO]- | 0.66 | 1077.5892; 945.5474; 915.5364; 783.4912; 621.4374; 459.3830 | [23]
58 | 26.77 | Quinquenoside L1 or its isomer | C48H80O18 | 943.5262 | [M-H]- | −1.04 | / | [22]
59 | 26.90 | m-Ginsenoside Rc/Rb2 or m-Ginsenoside Rb3 | C56H92O25 | 1163.5858 | [M-H]- | 0.23 | 1119.6012; 1077.5910; 1059.5793; 915.5332; 765.4795 | [23]
60 | 27.00 | Ginsenoside Ra1/Ra2 or its isomer | C58H98O26 | 1209.6257 | [M-H]- | −1.38 | / | [23]
61 | 27.37 | Notoginsenoside O or its isomer | C52H88O21 | 1093.5787 | [M+HCOO]- | −0.15 | / | [22]
62 | 27.57 | Yesanchinoside J or its isomer | C61H102O28 | 1281.6451 | [M-H]- | −2.62 | / | [22]
63 | 27.87 | Vinaginsenoside R3 or its isomer | C48H82O17 | 975.5511 | [M+HCOO]- | −1.22 | 739.7635; 576.8463; 481.3275; 324.4059 | [22]
64 | 28.76 | Ginsenoside Rd | C48H82O18 | 991.5497 | [M+HCOO]- | 2.55 | 945.5477; 783.4920; 765.480; 621.4385; 459.3882 | [21]
65 | 30.28 | AcO-ginsenoside Rd or its isomer | C50H84O19 | 987.5526 | [M-H]- | −0.84 | 987.5518; 945.5420; 927.5342; 783.4925; 765.4773; 621.4397; 459.3808 | [23]
66 | 31.41 | Quinquenoside L14 or its isomer | C47H80O17 | 961.5370 | [M+HCOO]- | 0.27 | 915.5347; 783.4907; 709.1200; 621.4368; 434.0248 | [22]
67 | 31.43 | Ginsenoside Re2 or its isomer | C48H82O19 | 961.5403 | [M-H]- | 2.60 | / | [16]
68 | 31.55 | Quinquenoside I or its isomer | C52H86O19 | 1059.5731 | [M+HCOO]- | −0.27 | 915.5271; 783.4907; 621.4369; 459.3846 | [22]
69 | 31.68 | Quinquenoside I or its isomer | C52H86O19 | 1059.5852 | [M+HCOO]- | 11.10 | / | [22]
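The Error (ppm) column compares each measured m/z with the theoretical m/z of the assigned ion: error = (measured − theoretical) / theoretical × 10^6. A minimal sketch of that calculation, assuming monoisotopic masses for C, H, and O, a [M-H]- ion computed as M minus a proton, and a [M+HCOO]- ion computed as M + HCOOH minus a proton; the helper names are hypothetical, and recomputed errors may not match Table 1 exactly if the authors used different mass constants or adduct definitions:

```python
import re

# Monoisotopic masses (u) of the elements in ginsenoside formulas.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466812
FORMIC_ACID = MONO["C"] + 2 * MONO["H"] + 2 * MONO["O"]  # neutral HCOOH


def neutral_mass(formula: str) -> float:
    """Monoisotopic neutral mass of a simple formula such as 'C47H80O18'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def adduct_mz(formula: str, ion: str) -> float:
    """Theoretical m/z of the singly charged negative ions listed in Table 1."""
    m = neutral_mass(formula)
    if ion == "[M-H]-":
        return m - PROTON
    if ion == "[M+HCOO]-":
        return m + FORMIC_ACID - PROTON  # M + HCOOH - H+
    raise ValueError(f"unsupported ion type: {ion}")


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


theo = adduct_mz("C47H80O18", "[M-H]-")  # notoginsenoside R1, Table 1 No. 8
err = ppm_error(931.5260, theo)
```

For notoginsenoside R1 this gives a theoretical [M-H]- of about 931.5272 and an error of about −1.28 ppm for the measured 931.5260, in line with the table.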
Table 2. The classified and predicted results of ginsengs from three geographical origins using SVM model with raw data and normalized data.
Sample | Actual | Recognized (Raw Data, Accuracy = 83%) | Recognized (Normalized Data, Accuracy = 100%)
S1 | LN | JL | LN
S2 | LN | LN | LN
S3 | LN | JL | LN
S4 | LN | JL | LN
S5 | HLJ | HLJ | HLJ
S6 | HLJ | HLJ | HLJ
S7 | HLJ | HLJ | HLJ
S8 | HLJ | JL | HLJ
S9 | JL | JL | JL
S10 | JL | JL | JL
S11 | JL | JL | JL
S12 | JL | JL | JL
S13 | HLJ | HLJ | HLJ
S14 | HLJ | HLJ | HLJ
S15 | HLJ | JL | HLJ
S16 | HLJ | HLJ | HLJ
S17 | JL | JL | JL
S18 | JL | JL | JL
S19 | JL | JL | JL
S20 | JL | JL | JL
S21 | JL | JL | JL
S22 | JL | JL | JL
S23 | JL | JL | JL
S24 | JL | JL | JL
S25 | JL | JL | JL
S26 | JL | JL | JL
S27 | JL | JL | JL
S28 | JL | JL | JL
S29 | JL | JL | JL
S30 | JL | JL | JL
S31 | JL | JL | JL
LN, Liaoning; HLJ, Heilongjiang; JL, Jilin.
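The accuracies in the Table 2 header are the fraction of the 31 training batches whose recognized origin matches the actual one. Transcribing the table into Python makes the count explicit: with raw data, five batches (S1, S3, S4, S8, and S15) are recognized as JL, giving 26/31 ≈ 0.839 (reported as 83%), while with normalized data all 31 match. The helper names are illustrative only:

```python
from collections import Counter

# Actual origins of S1-S31, transcribed from Table 2.
actual = ["LN"] * 4 + ["HLJ"] * 4 + ["JL"] * 4 + ["HLJ"] * 4 + ["JL"] * 15

# Raw-data predictions: identical except S1, S3, S4, S8, S15 (0-based indices).
raw_pred = list(actual)
for idx in (0, 2, 3, 7, 14):
    raw_pred[idx] = "JL"

norm_pred = list(actual)  # normalized data: every batch recognized correctly


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion(y_true, y_pred):
    """Count (actual, recognized) pairs; off-diagonal keys are errors."""
    return Counter(zip(y_true, y_pred))


raw_conf = confusion(actual, raw_pred)
```

The confusion counts show that every raw-data error assigns a Liaoning or Heilongjiang batch to Jilin, the largest class (19 of 31 batches).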
Table 3. The classified and predicted results of ginseng test samples from three different geographical origins using SVM model with raw data and normalized data.
Sample | Actual | Recognized
S32 | LN | LN
S33 | HLJ | HLJ
S34 | JL | JL
S35 | JL | JL
S36 | JL | JL
S37 | JL | JL
S38 | JL | JL
S39 | JL | JL
Table 4. Sample information of 39 batches of ginseng *.
No. | Origin | Age (years) | Batch Code
S1 | Dandong City, Liaoning Province | 4 | 20200901
S2 | Dandong City, Liaoning Province | 4 | 20200902
S3 | Dandong City, Liaoning Province | 4 | 20200903
S4 | Dandong City, Liaoning Province | 4 | 20200904
S5 | Mudanjiang City, Heilongjiang Province | 5 | RS180321-2
S6 | Mudanjiang City, Heilongjiang Province | 5 | RS180322-2
S7 | Mudanjiang City, Heilongjiang Province | 5 | RS180323-2
S8 | Mudanjiang City, Heilongjiang Province | 5 | RS180324-2
S9 | Tonghua City, Jilin Province | 5 | RS180311
S10 | Tonghua City, Jilin Province | 5 | RS180312
S11 | Tonghua City, Jilin Province | 5 | RS180313
S12 | Tonghua City, Jilin Province | 5 | RS180314
S13 | Heilongjiang Province | 5 | RS180321
S14 | Heilongjiang Province | 5 | RS180322
S15 | Heilongjiang Province | 5 | RS180323
S16 | Heilongjiang Province | 5 | RS180324
S17 | Jingyu County, Jilin Province | 5 | 20190901
S18 | Jingyu County, Jilin Province | 5 | 20190902
S19 | Jingyu County, Jilin Province | 5 | 20190903
S20 | Jingyu County, Jilin Province | 5 | 20190904
S21 | Changbai County, Jilin Province | 5 | 20190901
S22 | Changbai County, Jilin Province | 5 | 20190902
S23 | Changbai County, Jilin Province | 5 | 20190903
S24 | Changbai County, Jilin Province | 5 | 20190904
S25 | Ji’an City, Jilin Province | 5 | 20180421-1
S26 | Ji’an City, Jilin Province | 5 | 20180421-2
S27 | Ji’an City, Jilin Province | 5 | 20180421-3
S28 | Ji’an City, Jilin Province | 5 | 20180421-4
S29 | Fusong County, Jilin Province | 5 | 20180911-1
S30 | Fusong County, Jilin Province | 5 | 20180911-3
S31 | Fusong County, Jilin Province | 5 | 20180911-4
T1 | Liaoning Province | / | /
T2 | Heilongjiang Province | / | /
T3 | Jilin Province | / | /
T4 | Heilongjiang Province | / | /
T5 | Jilin Province | / | /
T6 | Jilin Province | / | /
T7 | Jilin Province | / | /
T8 | Jilin Province | / | /
* All samples were cultivated, and roots were used in the experiment.

Share and Cite

MDPI and ACS Style

Zhang, C.; Liu, Z.; Lu, S.; Xiao, L.; Xue, Q.; Jin, H.; Gan, J.; Li, X.; Liu, Y.; Liang, X. Rapid Discrimination and Prediction of Ginsengs from Three Origins Based on UHPLC-Q-TOF-MS Combined with SVM. Molecules 2022, 27, 4225. https://doi.org/10.3390/molecules27134225

AMA Style

Zhang C, Liu Z, Lu S, Xiao L, Xue Q, Jin H, Gan J, Li X, Liu Y, Liang X. Rapid Discrimination and Prediction of Ginsengs from Three Origins Based on UHPLC-Q-TOF-MS Combined with SVM. Molecules. 2022; 27(13):4225. https://doi.org/10.3390/molecules27134225

Chicago/Turabian Style

Zhang, Chi, Zhe Liu, Shaoming Lu, Liujun Xiao, Qianqian Xue, Hongli Jin, Jiapan Gan, Xiaonong Li, Yanfang Liu, and Xinmiao Liang. 2022. "Rapid Discrimination and Prediction of Ginsengs from Three Origins Based on UHPLC-Q-TOF-MS Combined with SVM" Molecules 27, no. 13: 4225. https://doi.org/10.3390/molecules27134225
