Untargeted Metabolomic Analysis and Chemometrics to Identify Potential Marker Compounds for the Chemical Differentiation of Panax ginseng, P. quinquefolius, P. notoginseng, P. japonicus, and P. japonicus var. major

The genus Panax L. is well known for its many positive physiological effects on humans. Its major species include P. ginseng, P. quinquefolius, P. notoginseng, P. japonicus, and P. japonicus var. major, the first three of which are globally popular. A combination of UPLC-QTOF-MS and chemometrics was developed to profile “identification markers” that enable their differentiation. Establishing reliable biomarkers that capture the intrinsic metabolites differentiating species within the same genus is key to the modernization of traditional Chinese medicine. In this work, the metabolomic differences among these five species are shown, which is critical to ensuring their appropriate use. In total, 49 compounds were characterized, including 38 identified as robust biomarkers, which were mainly saponins together with small amounts of amino acids and fatty acids. Variable importance in projection (VIP) values were used to select the markers that distinguish the five species. In conclusion, by illustrating the similarities and differences among the five species with an integrated strategy combining UPLC-QTOF-MS and multivariate analysis, we provide a more efficient way of explaining how the species differ and how their secondary metabolites contribute to this difference. The most important biomarkers that distinguished the five species included Notoginsenoside-R1, Majonoside R1, Vinaginsenoside R14, Ginsenoside-Rf, and Ginsenoside-Rd.


Introduction
Many plants of the ginseng genus (Panax) are regarded as valuable traditional medicine resources. They are well known for many positive physiological effects on humans and are widely used in traditional medicine systems, as well as in the health food and supplement industry worldwide [1]. In China, the genus consists of six native species and one introduced species (P. quinquefolius). Of these seven species, five are recognized as medicinal and officially recorded in the Pharmacopoeia of the People's Republic of China (2020 Edition): P. ginseng C.A. Meyer (ginseng, Chinese ginseng, renshen, RS), P. quinquefolius L. (American ginseng, xiyangshen, XYS), P. notoginseng F. H. Chen (notoginseng, sanqi, SQ), P. japonicus C.A. Meyer (Japanese ginseng, zhujieshen, ZJS), and P. japonicus var. major (Burkill) C.Y. Wu and K.M. Feng (zhuzishen, ZZS). In the global ginseng market, RS has the largest share, followed by XYS, SQ, ZJS, and ZZS in descending order [2].
As medicinal materials from closely related species, they contain broadly the same classes of constituents: ginsenosides, polysaccharides, volatile oils, proteins, amino acids, organic acids, flavonoids, vitamins, trace elements, and other active ingredients. Their traditional efficacy and pharmacological activity are also relatively similar, although there are some differences. For example, both RS and XYS replenish "qi" (a concept of traditional Chinese medicine, which is the energy of movement that reaches the skin externally and the organs internally, maintaining vitality) and promote the production of body fluid, whereas SQ, ZJS, and ZZS all remove blood stasis, stop bleeding, and relieve pain. RS has antioxidant, anti-fatigue, and anti-tumor effects, can improve immunity, and can regulate the nervous and immune systems. XYS can improve blood circulation, improve the nervous system, and regulate immune function. SQ protects the cardio-cerebrovascular and nervous systems and has anti-tumor, anti-bacterial, and anti-inflammatory effects. ZJS can protect the liver and heart, reduce blood fat, resist fatigue, and enhance immunity, and it also has anti-tumor effects. ZZS has anti-inflammatory, analgesic, and anti-tumor effects and can regulate the immune function of the cardiovascular and cerebrovascular systems. According to traditional Chinese medicine (TCM) theory, RS and XYS are herbs for replenishing qi; RS is considered slightly warm, while XYS is considered cool [3]. SQ is sweet and slightly bitter, warm in nature, and has a special tropism to the liver and stomach. ZJS and ZZS have a sweet and bitter taste similar to that of SQ and show a similar tropism to the liver and stomach; however, they have an additional tropism to the lung and thus can be used to treat phlegm and stop coughing. In addition, ZZS is slightly cold in nature and thus has additional efficacy in nourishing the lung-yin [4].
By integrating the information about the properties (nature, taste, and organ-specific channel tropism) and the indications of RS, XYS, SQ, ZJS, and ZZS into a Venn diagram (Figure 1), we created a visual means of identifying the commonalities and differences among these herbs according to their traditional properties [5].
The phytochemistry and biological activities of Panax have been widely investigated. Although different species of Panax have diverse properties and indications in the TCM system, most of the literature presumes that saponins (also known as ginsenosides) are the major active ingredients [6]. The available studies of metabolic differences among the same parts of different Panax species [7][8][9], among different parts [10] or ages [11,12] of the same species, and among combinations of these demonstrate that ginsenosides have great potential for differentiating species within the ginseng genus. The content and composition of ginsenosides of the protopanaxadiol (PPD), protopanaxatriol (PPT), ocotillol (OCT), and oleanane (OA) types, as well as of the C-17 side-chain-varied, malonylated, and other saponins, vary widely among ginseng species and among different parts of the plants [13]. Discriminating the differences in chemical composition among the five most commonly used species of Panax could inspire new research into potentially novel clinical applications of these species [14,15]. In contrast to the three most important Panax species (P. ginseng, P. quinquefolius, and P. notoginseng), which have been studied very extensively, P. japonicus and P. japonicus var. major have far less literature support, particularly regarding the differences in their metabolomes [16,17]. Characterizing the chemical components can help ensure their correct use [18].
Molecules 2023, 28, 2745
A key segment of the modernization research for TCM is the clarification of the chemical composition of each species used [19]. The development of powerful and feasible analytical methods capable of comprehensively deconvoluting plant metabolites from similar species is currently an important topic in analytical chemistry [20,21]. Ultra-high performance liquid chromatography (UPLC) combined with quadrupole time-of-flight tandem mass spectrometry (QTOF-MS) is the most popular approach for the systematic multi-component characterization of an herb or a bio-sample, and UPLC-QTOF-MS profiling combined with chemometrics makes untargeted metabolomics analysis suitable for discovering potential chemical markers for the authentication and quality evaluation of easily confused herbs [11,22,23]. Yoon et al. (2022) characterized a total of 62 saponins while discriminating the geographical origin of P. ginseng by UPLC-QTOF-MS with multivariate analysis [24]. The establishment of a more powerful untargeted metabolomics platform can offer useful insights into the evaluation of the properties of medicinal plants, including easily confused species with similar pharmacological activity.
Untargeted metabolomics can detect, without bias, the dynamic changes of all small-molecule metabolites before and after stimulation or disturbance in cells, tissues, organs, or organisms; screen differential metabolites through bioinformatics analysis; and conduct pathway analysis of the differential metabolites to reveal the physiological mechanism of their changes. Targeted metabolomics is the study of a specific class of metabolites. It is used for the discovery and quantification of differential metabolites and for the in-depth analysis of the resulting metabolic molecular markers, which play an important role in disease research, animal model validation, biomarker discovery, disease diagnosis, drug development, plant metabolism, and other studies. Targeted metabolomics usually has several advantages over untargeted metabolomics in terms of sensitivity, specificity, and quantitative data processing.
In this study, we present an integrated strategy that combines untargeted metabolomics with multivariate analysis and apply it to differentiate five Panax species simultaneously. PCA, PLS-DA, and OPLS-DA are all dimensionality reduction methods: they look for a small number of components that explain the major variation in the data. The contribution of each variable to the model was measured by its VIP value across the selected components. Our data showed that the newly established method was precise and rapid in distinguishing the five chosen species of ginseng.

Optimization of a UPLC-QTOF-MS Approach for the Enhanced Profiling and Characterization of Metabolites Simultaneously from RS, XYS, SQ, ZJS, and ZZS
The effects of the mobile phase system on chromatographic peak shape and of the gradient elution program on peak separation were investigated. Four aqueous phases were compared: formic acid at 0 mol/L, 0.1 mol/L, 0.1 mol/L (containing 10 mmol/L ammonium formate), and 0.2 mol/L (containing 10 mmol/L ammonium formate). The results showed that when gradient elution was applied with 0.2 mol/L formic acid solution (containing 10 mmol/L ammonium formate) and acetonitrile, both the peak shape and the separation were acceptable. Figure 2 shows the base peak chromatograms (BPCs) of representative samples of the five Panax species under the optimized analysis conditions, which illustrate the metabolite differences among the five species. The chromatograms of RS and XYS are clearly similar, and ZJS and ZZS likewise have a similar composition. These essential characteristics are consistent with the literature [25,26]. Both the positive and negative ion data were interpreted in order to characterize the five Panax species comprehensively and simultaneously. After processing the chromatographic data, those listed as "Identified Compounds" were putatively identified using databases such as PubChem and MassBank.

Multivariate Analysis for UPLC-QTOF-MS Results and Selection of Target Ion
To obtain more information on the components of the five species, the UPLC-QTOF-MS data were used for untargeted component analysis. Multivariate statistical methods, namely unsupervised principal component analysis (PCA), supervised partial least squares discriminant analysis (PLS-DA), and orthogonal partial least squares discriminant analysis (OPLS-DA), were performed to identify the differences in metabolic profiles among the species. The quality of a model is judged by its R²Y and Q² values, which represent the variation in the class variable explained by the model and the model's cross-validated predictive ability, respectively; the higher the values (>0.9 and close to 1), the more reliable the model [27]. The score plots of each model revealed the factors that accounted for the largest variations and the grouping tendencies. The tight clustering of the samples in the PCA model indicated that the instrument was stable during this experiment in both the positive and negative ion modes (Figure 3A,B). We also found signs of partial separation among the five groups. When PLS-DA was used to analyze the metabolite profiles of the Panax samples, the samples from RS, XYS, SQ, ZJS, and ZZS were clearly clustered and segregated into different groups, scattered across different quadrants within the 95% Hotelling T² ellipse. There were significant differences among the five groups (ZZS is a variety of ZJS, so partial overlap between these two groups is considered acceptable): [R²X(cum) = 0.592, R²Y(cum) = 0.992, Q²(cum) = 0.985] in positive ion mode and [R²X(cum) = 0.671, R²Y(cum) = 0.992, Q²(cum) = 0.986] in negative ion mode (Figure 3C,D). Both R²Y and Q² are above 0.5 and close to 1, indicating that the PLS-DA model is reliable and predictive and shows no sign of over-fitting. These findings indicate that the PLS-DA model can be used to distinguish the five species.
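To make the two diagnostics concrete, the following minimal sketch computes R²Y and Q² in their usual forms, 1 − RSS/TSS and 1 − PRESS/TSS, for a single response column. The function names and the toy numbers are illustrative assumptions and are not taken from this study; the chemometrics software reports cumulative values over all components and generates the cross-validated predictions with its own cross-validation scheme.

```python
def total_sum_of_squares(y_obs):
    """Sum of squared deviations of the observed response from its mean."""
    mean_y = sum(y_obs) / len(y_obs)
    return sum((y - mean_y) ** 2 for y in y_obs)


def r2y(y_obs, y_fit):
    """Goodness of fit: 1 - RSS/TSS, using predictions from the fitted model."""
    rss = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    return 1.0 - rss / total_sum_of_squares(y_obs)


def q2(y_obs, y_cv_pred):
    """Predictive ability: 1 - PRESS/TSS, using cross-validated predictions
    (each sample predicted by a model that did not see it)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_cv_pred))
    return 1.0 - press / total_sum_of_squares(y_obs)


# Hypothetical dummy-coded class membership (1 = in class, 0 = not in class).
y_obs = [1.0, 1.0, 0.0, 0.0]
y_fit = [0.95, 1.02, 0.04, -0.01]      # assumed fitted values
y_cv_pred = [0.90, 1.05, 0.10, -0.05]  # assumed cross-validated predictions

print(round(r2y(y_obs, y_fit), 4), round(q2(y_obs, y_cv_pred), 4))
```

A Q² that is much lower than R²Y is the usual warning sign of over-fitting, which is why the two values are read together.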
To obtain the greatest separation of metabolites, OPLS-DA was performed (Figure 3E). Variable importance in projection (VIP) is a weighted sum of the squares of the PLS loadings. To distinguish the most important metabolites among the species, p-values and VIP scores were used to screen for differential components (Figure 4). Given VIP > 4.0 and p < 0.05, a total of 49 potential biomarkers were preliminarily screened. The heatmaps of the components that differed significantly among the species, as detected in the positive and negative modes, are illustrated in Figure 5.
Figure 5 legend: Red represents a relatively large peak value; blue represents a relatively small peak value. The more similar the color, the more similar the peak value. The units on the abscissa represent the sample names and their groups; the panel on the right represents the different metabolites. The upper dendritic structure is clustered according to the degree of metabolite similarity across samples.
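The VIP calculation can be sketched as follows. This is a hedged illustration of the commonly used formulation, in which the squared, normalized PLS weights of each variable are weighted by the amount of Y variance that each component explains; it is an assumption that the software used here follows this form. The function name and the example numbers are hypothetical.

```python
import math


def vip_scores(weights, ssy):
    """VIP for every variable.

    weights: list of A weight vectors (one per component), each of length p.
    ssy:     list of A values, the Y sum of squares explained per component.
    VIP_j = sqrt( p * sum_a( ssy_a * w_ja^2 / ||w_a||^2 ) / sum_a( ssy_a ) )
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_vars):
        weighted = 0.0
        for w_a, ss_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # squared length of the weight vector
            weighted += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * weighted / total_ssy))
    return scores


# Hypothetical two-component model with three variables.
example_weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
example_ssy = [3.0, 1.0]
example_vip = vip_scores(example_weights, example_ssy)
print([round(v, 3) for v in example_vip])
```

With this definition the squared VIP values average to 1 across all variables, so a cutoff of VIP > 4 selects only variables whose contribution is far above average.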
According to the screening results for potential markers, there were significant differences in chemical composition among the five Panax species, involving saponins, amino acids, and fatty acids. The primary differences were in saponins (Table 1), such as the ion at tR 6.80 and m/z 817.4781, identified as ginsenoside Re5 or ginsenjilinol, and the ion at tR 8.45 and m/z 945.5336, identified as ginsenoside Rd or ginsenoside Re [28,29]. The levels of 12 ginsenosides differed significantly among the five species. The spectral intensities of these 12 ginsenosides are presented as bar plots (Figure 6). These bar charts show the peak intensities of the target ion compounds, which vary significantly between species.
Table 1 note: +++: the content of the compound is very high, peak area ≥ 260; ++: the content of the compound is high, peak area ≥ 130 but <260; +: the content of the compound is low, peak area ≥

Reagents and Material
Acetonitrile


Sample Preparation
The Panax ginseng, P. quinquefolius, P. japonicus, P. japonicus var. major, and P. notoginseng roots were powdered to a homogeneous size and sieved through a No. 60 mesh. The study involved 10 batches of RS, 8 of XYS, 10 of SQ, 8 of ZJS, and 10 of ZZS. Ultrasonic extraction with 60% methanol at a solid-to-liquid ratio of 1:5 g·mL⁻¹ for 1 h was used to extract the medicinal materials. Dried crude powder (0.3 g) was accurately weighed in a centrifuge tube, and then 1.5 mL of 70% methanol/water (v/v) solution was added. Then, the sample was processed using ultrasonic extraction for 45 min and cooled to room temperature. The sample solution was centrifuged at 12,000 rpm for 5 min at 20 °C, and the supernatant was stored at 4 °C and filtered through a 0.22 µm filter membrane before injection for UPLC analysis.

UPLC/QTOF-MS Analysis
Ultrahigh-performance liquid chromatography on an Acquity UPLC system (Waters, Milford, MA, USA) was coupled with high-resolution MS analysis on a Micromass QTOF mass spectrometer (Waters, Manchester, U.K.). An Acquity UPLC BEH C18 column (2.1 mm × 50 mm, 1.7 µm) was used to perform the metabolite profiling, and 10 µL of each sample was injected into a gradient system at a flow rate of 0.3 mL/min. The mobile phase consisted of 0.2 mol/L formic acid and 10 mmol/L ammonium formate in water (A) and acetonitrile (B). The starting eluent was 2% B, and its proportion was held constant for 1 min, increased linearly to 10% from 1.0 to 2.0 min, to 25% from 2.0 to 5.1 min, to 32% from 5.1 to 6.1 min, to 38% from 6.1 to 8.1 min, held constant at 38% for 0.5 min, increased to 50% from 8.6 to 9.1 min, to 55% from 9.1 to 9.6 min, held constant at 55% until 12.5 min, increased to 70% from 12.5 to 14.0 min, to 100% from 14.0 to 16.1 min, held constant at 100% until 18.5 min, returned to 2% B at 18.5 min, and held constant until 20 min to equilibrate the column. This UPLC elution condition was optimized to detect the maximal number of metabolites in P. ginseng, especially to separate ginsenosides for identifying markers. The column was maintained at 40 °C. The mass spectrometer was equipped with an ESI source and operated in positive and negative ion modes. The MS conditions were as follows: the capillary and cone voltages were set to 2800 V and 20 V in positive ion mode and to 3000 V and 20 V in negative ion mode, respectively. The source and desolvation temperatures were maintained at 100 °C and 350 °C, respectively. The desolvation gas was N2. The flow rates of the desolvation gas and cone gas were 500 L/h and 50 L/h, respectively. The scanning m/z range was 50 to 1500. The mass resolution was > 10,000 FWHM (standard mode).
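The gradient program above is easier to check when written as a table of breakpoints. The sketch below encodes the stated time points and returns the percentage of solvent B at any time by linear interpolation. The names are illustrative, and the return to 2% B at 18.5 min is modeled as an instantaneous step, which is an assumption about how the pump executes that change.

```python
# (time in min, % B) breakpoints taken from the gradient description.
GRADIENT = [
    (0.0, 2.0), (1.0, 2.0), (2.0, 10.0), (5.1, 25.0), (6.1, 32.0),
    (8.1, 38.0), (8.6, 38.0), (9.1, 50.0), (9.6, 55.0), (12.5, 55.0),
    (14.0, 70.0), (16.1, 100.0), (18.5, 100.0),
]
RE_EQUILIBRATION_START = 18.5  # min; eluent returns to the initial composition
RUN_END = 20.0                 # min
INITIAL_PERCENT_B = 2.0


def percent_b(t_min):
    """Percentage of acetonitrile (B) at time t_min, by linear interpolation."""
    if not 0.0 <= t_min <= RUN_END:
        raise ValueError("time is outside the 0-20 min run")
    if t_min >= RE_EQUILIBRATION_START:
        return INITIAL_PERCENT_B  # assumed instantaneous step back to 2% B
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover 0 to 18.5 min")


for t in (0.5, 1.5, 8.3, 13.25, 19.0):
    print(t, percent_b(t))
```

The percentage of the aqueous phase (A) at any time is simply 100 minus the returned value.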
To ensure that mass was measured accurately, leucine-enkephalin (10 µg/mL) was infused as the reference lock-mass compound for real-time correction, and the [M+H]+ ion at 556.2771 Da and the [M-H]− ion at 554.2615 Da were detected in the analysis.
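As a rough illustration of what lock-mass correction does, the sketch below computes the mass error of a measured leucine-enkephalin ion in ppm and rescales other m/z values by the same factor. The single-point proportional correction is a simplifying assumption and is not the instrument vendor's algorithm; the function names and the measured value 556.2799 are hypothetical, and only the two reference masses come from the text.

```python
# Reference m/z of leucine-enkephalin as stated in the method.
LOCK_MASS = {"positive": 556.2771, "negative": 554.2615}


def ppm_error(measured_mz, reference_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def lock_mass_correct(mz_values, measured_lock_mz, mode):
    """Rescale m/z values so that the measured lock mass maps onto its reference.

    Assumption: a single multiplicative factor is adequate over the scan range.
    """
    factor = LOCK_MASS[mode] / measured_lock_mz
    return [mz * factor for mz in mz_values]


measured_lock = 556.2799  # hypothetical drifted reading in positive ion mode
print(round(ppm_error(measured_lock, LOCK_MASS["positive"]), 2))
print(lock_mass_correct([817.4781, measured_lock], measured_lock, "positive"))
```

In this toy case a drift of about 5 ppm at the lock mass is removed from every other ion in the scan.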

Multivariate Analysis
To evaluate the potential characteristic components of the five species of Panax in our study, the raw data of all samples were analyzed with the MassLynx application manager version 4.1 (Waters MS Technologies) [30]. The method parameters were as follows: retention time range, 0.3-17 min; mass range, 100-1500 Da; mass tolerance, 50 mDa; and noise elimination level, 6.00. For further analysis, each detected peak was assigned a temporary ID (tR-m/z) that combined its retention time (tR) and mass data (m/z), and the peaks were ordered for data alignment according to their chromatographic elution order in UPLC [31]. The identities of the variables chosen as biomarkers were based on m/z. The resulting list was then subjected to PCA, PLS-DA, and OPLS-DA using the EZinfo software (Waters). Markers that differentiated the five groups were selected according to their variable importance in projection (VIP) values. The data matrix of tR, m/z, and normalized peak area was exported into the SIMCA-P 14.0 software (Umetrics, Umeå, Sweden) for chemometric analysis [32]. Compounds with VIP > 4 and an ANOVA p-value below 0.05 were evaluated as potential characteristic components in the positive and negative ion chromatograms [33]. Peak areas were normalized to an internal standard.
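The peak labeling and screening steps can be sketched in a few lines. The example below builds a tR-m/z identifier for each feature, keeps only features inside the stated retention time and mass windows, and applies the VIP > 4 and p < 0.05 criteria. The class and function names, the underscore separator in the ID, and the VIP and p-values are hypothetical; only the two tR and m/z pairs and the cutoffs come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    rt_min: float   # retention time (tR) in minutes
    mz: float       # mass-to-charge ratio
    vip: float      # VIP value from the discriminant model
    p_value: float  # ANOVA p-value across the five species

    @property
    def temp_id(self):
        """Temporary identifier built from tR and m/z."""
        return f"{self.rt_min:.2f}_{self.mz:.4f}"

    def in_window(self):
        """Retention time 0.3-17 min and mass 100-1500 Da, as in the method."""
        return 0.3 <= self.rt_min <= 17.0 and 100.0 <= self.mz <= 1500.0


def screen_markers(features, vip_cutoff=4.0, alpha=0.05):
    """Keep in-window features with VIP above the cutoff and p below alpha,
    sorted by elution order."""
    kept = [
        f for f in features
        if f.in_window() and f.vip > vip_cutoff and f.p_value < alpha
    ]
    return sorted(kept, key=lambda f: f.rt_min)


# VIP and p-values below are invented for illustration only.
features = [
    Feature(8.45, 945.5336, vip=5.2, p_value=0.001),
    Feature(6.80, 817.4781, vip=6.1, p_value=0.0004),
    Feature(3.10, 455.3520, vip=1.3, p_value=0.20),    # fails both criteria
    Feature(17.50, 600.1234, vip=7.0, p_value=0.001),  # outside the tR window
]
marker_ids = [f.temp_id for f in screen_markers(features)]
print(marker_ids)
```

Keeping the identifier as a derived property means that the ID always stays consistent with the underlying tR and m/z values.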

Conclusions
Our study used a metabolomic fingerprinting approach that combines UPLC-QTOF-MS and multivariate analysis, with the aim of discovering robust biomarkers for the authentication of the five most commonly used Panax species. The target compounds were selected from the UPLC-MS screening results using multivariate analysis. Samples of the five Panax species were analyzed to create PCA, PLS-DA, and OPLS-DA models, which were compared with the UPLC-MS multivariate models. The resulting score plots showed good separation among the five species. In addition, the intensities of the target markers were compared with one another using bar plots to illustrate the differences between the species. This is the first report to systematically compare the metabolome differences among these five important Panax species. The method provides highly efficient discrimination, especially considering the similarity of the samples, as required for authenticity testing. Moreover, with further optimization, this method could be used to analyze other plant materials to facilitate "chemical fingerprinting" and could serve as a new and important tool for the quality control of natural products derived from the Panax genus.