Chemotaxonomic Classification Applied to the Identification of Two Closely-Related Citrus TCMs Using UPLC-Q-TOF-MS-Based Metabolomics

This manuscript elaborates on the establishment of a chemotaxonomic classification strategy for closely-related Citrus fruits in Traditional Chinese Medicines (TCMs). UPLC-Q-TOF-MS-based metabolomics was applied to depict the variable chemotaxonomic markers and elucidate the metabolic mechanism of Citrus TCMs from different species and at different ripening stages. Metabolomics can capture a comprehensive analysis of small molecule metabolites and can provide a powerful approach to establish metabolic profiling, creating a bridge between genotype and phenotype. To further investigate the different metabolites in four closely-related Citrus TCMs, non-targeted metabolite profiling analysis was employed as an efficient technique to profile the primary and secondary metabolites. The results presented in this manuscript indicate that primary metabolites enable the discrimination of species, whereas secondary metabolites are associated with species and the ripening process. In addition, analysis of the biosynthetic pathway highlighted that the syntheses of flavone and flavone glycosides are deeply affected in Citrus ripening stages. Ultimately, this work might provide a feasible strategy for the authentication of Citrus fruits from different species and ripening stages and facilitate a better understanding of their different medicinal uses.


Introduction
Herbal medicines are widely used around the world and are gaining more attention in many fields due to their low toxicity and therapeutic performance [1,2]. Traditional Chinese Medicines (TCMs) have a long and storied history of use in China and were first documented in approximately 200 AD in the text 'Shen nong ben cao jing'. Most herbal medicines are homologous in medicines and foods such as Citrus, Dioscorea, etc. It is well acknowledged that herbal medicines exert their curative effect via a holistic mode through primary metabolites and especially secondary metabolites [3]. The quality and content of active metabolites in herbs are highly variable, depending on factors such as the species, growth stage when harvested and geographical origins, climate and cultivation [4]. The resources and cultures of TCMs are profound and magnificent and have a long history. TCMs derived from different species, different development stages or different processing procedures exert different efficacies, which has been verified by modern pharmacology to correspond to variations in active metabolites. Therefore, the efficacy, based on variable metabolites, is an important measure to ensure the safety and effectiveness of TCMs in clinical use [5]. In the present paper, we focused on two main factors, species and development stage, to elucidate how their variation is related to the presence of primary and secondary metabolites.
DNA sequence analysis and traditional taxonomic procedures based on phenotypic traits were applied for species authentication [3]. However, only using one taxonomic method is not sufficient for quality control of TCMs. For instance, TCMs from the same species, but different development stages can hardly be discriminated by DNA sequencing [6]. Alternatively, chemotaxonomic classification provides an efficient methodology based on the diversity of plant metabolites to achieve species identification, and the method could contribute to the identification of relationships among development stages. It has been estimated that the total number of metabolites in plants is over 200,000, and each plant may contain its own chemotypic expression pattern [7]. Metabolites are the end products of the response to genetic and enzyme regulated biosynthesis. The pattern of metabolites should be more closely related to the genotype of an organism than to classic morphological traits, which are mainly based on what we can observe about the characteristics of an organism [3]. The chemotaxonomic strategy has been successfully used to reveal the functions of genes involved both in primary plant metabolism and secondary metabolism [8][9][10][11]. In addition, studies of the interactions between species and their development stage have revealed comprehensive and coordinated reprogramming of metabolic pathways via different enzyme catalysis.
Metabolomics reveals global metabolic networks, which reflect the pathophysiological states through the fluctuation of primary and secondary metabolites and can serve as a valuable tool in understanding the biosynthesis mechanisms and variable markers of TCMs [12]. Metabolomics platforms using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS) or nuclear magnetic resonance (NMR) methods have been reported to be effective tools for quality control of medicinal plants and their products [12]. Ultra-performance liquid chromatography-quadrupole-time-of-flight-MS (UPLC-Q-TOF-MS) is one of the most powerful analytical tools for the identification and discrimination of metabolic profiles in plant extracts, allowing metabolite screening to be performed over a wide range of concentrations [1,13]. The classification of global metabolites based on metabolomics is emerging as a powerful chemotaxonomic tool that provides a detailed view of the differences and similarities between herbal medicines, and it has been successfully applied for quality control of herbs, foods and their products [9]. In this paper, metabolomics using an UPLC-Q-TOF-MS method coupled with multivariate statistical analysis was applied for chemotaxonomic classification of Citrus fruits from different species and different development stages.
Citrus fruits, which are the most popular TCMs of homology of medicine and food, are of great interest because they contain large amounts of dietary phenoloids possessing important therapeutic properties for human health, and Citrus fruits and their juices are consumed in large quantities around the world [14]. Four different Citrus TCMs recorded in Chinese Pharmacopeia, 2015 edition, Zhishi (ZS, young fruit of C. aurantium L., harvested in May and June), Zhiqiao (ZQ, immature fruit of C. aurantium L., harvested in July), Qingpi (QP, pericarps of the young or immature fruit of C. reticulate Blanco, harvested in June) and Chenpi (CP, pericarps of the ripe fruit of C. reticulate Blanco, harvested in August-December), were used as research subjects [15]. Moreover, as a famous medicinal herb, Citrus aurantium L. is a special genus, which is composed of thin pulps and thick peels, especially in the young and immature stage. The peels contained a huge proportion of active phenolic compounds versus the full fruit. The comparison between the peels, pulps and fruits of Citrus aurantium L. have been put in the Supplementary Material. In the following research, the composition of peels and full fruits from Citrus aurantium L. was used in the same. Although they belong to the same genus, their efficacy and clinical indications are different. As we know, Citrus fruits contain a large number of bioactive phenolic metabolites, which have been associated with lower risks of different types of cancers and cardiovascular diseases and have been shown to exhibit antioxidant, anti-inflammatory and anti-ageing activities [14,[16][17][18]. According to our previous research, the total phenolic products could be used as characteristic markers for species discrimination of different Zhishi samples, while phenolic metabolites varied over the ripening process [19]. Therefore, Citrus TCMs from different species (C. aurantium L. and C. reticulate Blanco) and different development stages (ZS/ZQ and QP/CP) were divided into three groups to explore their potential chemotaxonomic bioactive markers and query their corresponding biosynthesis pathways.
As a result, a metabolomics study aimed at primary and secondary metabolites from three groups of Citrus TCMs was performed using UPLC-Q-TOF-MS analysis. The two basic species of C. aurantium L. and C. reticulate Blanco were clearly differentiated based on their primary and secondary metabolite profiles using hierarchical cluster analysis (HCA) and orthogonal signal correction partial least squares-discriminate analysis (OPLS-DA). In addition, the variation of secondary metabolites was further used to determine the development stage, as depicted in the HCA-dendrogram and OPLS-DA score plot. Using two-stage receiver operating characteristic (ROC) analysis and the Plackett-Burman (PB) test, 79 secondary (among them, 35 metabolites were unambiguously identified compared with the reference standards) metabolites were recognized as potential chemotaxonomic markers related to biosynthetic pathways. Multivariate statistical analysis coupled with MetaboAnalyst (www.metaboanalyst.ca) permitted the comprehensive metabolomic data to be visualized and interpreted and supported integrative pathway analysis for both primary and secondary metabolites. Therefore, we propose that the chemotaxonomic classification approach focus on metabolomics-guided metabolic pathway interrogation to reveal different pharmacodynamical mechanisms of metabolites, which will be an effective strategy for quality control of closely-related TCMs and facilitate a better understanding of their different medicinal uses.

Optimization of Chromatographic Conditions
Different mobile phase compositions were screened to obtain LC chromatograms that had good peak shapes and separation. It was found that methanol and 0.1% acetic acid were the most suitable eluting solvent system. In order to acquire better sensitivity for base ions of most compounds in the TOF-MS spectra, the ionization parameters, including the desolvation gas flow rate, desolvation gas temperature, fragmentor voltage, nebulizer pressure and capillary voltage, were optimized. The optimum conditions of TOF/MS were decided as follows: capillary voltage positive ion mode 3.0 kV/negative ion mode 3.5 kV, desolvation gas flow 800.0 L/h, desolvation gas temperature 400 • C and nebulizer pressure 40 psi.

Validation of the Analytical Method
To ensure the stability of the sequence analysis, a quality control (QC) sample was prepared by pooling a same volume (100 µL) from each Citrus sample and then preparing the pooled QC sample in the same way as the samples. To provide assurance that the system was suitable for use, five pooled QC samples were run prior to analysis. These pooled QC samples were also interspersed between every five samples during the analytical run.
The precision of the method was validated by the determination of the intra-and inter-day variances. For the intra-day test, the samples were analyzed six times within the same day, and for the inter-day test, the samples were examined three times per day for three consecutive days. The relative standard deviations (RSD) from UPLC-Q-TOF-MS are listed in Tables S1 and S2. To confirm the repeatability, six replicates of the QC sample were analyzed. Reproducibility was analyzed by repeating the whole method for the same sample within one day using UPLC-Q-TOF/MS (n = 6). For most of the metabolites investigated, the peak area RSD values of reproducibility were 0.69-2.79%.
The present method was found to have acceptable precision for the intra-day RSD values (0.36-2.92%) and inter-day RSD values (1.45-3.36%). The samples and analytical method exhibited good stability, with an RSD range of 0.04-6.31%. The method validation terms are summarized in Tables S1 and S2.

Identification of Primary Metabolites
As the central metabolism in plants, primary metabolism has been referenced in the assimilation, respiration, transport and differentiation processes that take place in cells [20]. It is well known that primary metabolites determine relevant crop quality traits related to nutritional content and composition, for they include a wide range of intermediate compounds and end products of metabolic pathways that accumulate in sink organs (e.g., seeds, fruits and tubers), such as lipids, amino acids and organic acids [21]. Most recently, Jing et al. [22] found that primary metabolites are fundamental to genetic improvement and metabolic engineering of either metabolic composition or plant primary production.
Plant cells have numerous membranes that are generally the same as those in animal cells, including the plasma, mitochondrial, nuclear and peroxisomal membranes. In addition, plant cells contain unique membrane-bound compartments, the chloroplast, vacuole and symbiosome, which have other cellular structures composed of lipids, e.g., the cuticle of epidermal cells [24]. As the key photosynthetic organelle in plants, the focus of plant lipid research has been on the chloroplast being predominant; its known membrane was unique in its lipid composition, with two galactolipids, monogalactosyldiacylglycerol (MGDG) and digalactosyldiacylglycerol (DGDG) [24,25]. They also contain a significate unique sulfolipid, sulfoquinovosyl diacylglycerol (SQDG), whose head group was a modified galactose. Nevertheless, the phospholipid components of plastids are less abundant. On the other hand, the two most abundant classes of phospholipids in plant mitochondria are similar to yeast and mammals, which are phosphatidylcholine (PC) and phosphatidylethanolamine (PE), the "star products" in lipid research [26]. PC species are not only an essential component of biomembranes, but also protect cells and their organelles from oxidative stress, lipotoxicity and endoplasmic reticulum (ER) stress [26]. PE species have been reported as the second most abundant membrane phospholipid in plant cells, and they have been identified as modulators in cold temperature stress. The other productions of biologically-active lipids, glycerophosphatidic acid (GPA) and glycerophosphoethanolamine (GPEtn), are derivatives of glycerophospholipids. GPA has been reported to be the simplest phospholipid and is also one of the building units for phospholipid biosynthesis [27]. Table 1. Differential primary metabolites between the Citrus reticulate Blanco and Citrus aurantium L. group profile in positive and negative ion modes. GPA (glycerophosphatidic acid); MG (monogalactosyl); PE (phosphatidylethanolamine); DG (diacylglycerol); GPEtn (glycerophosphoethanolamine); PC (phosphatidylcholine); GPIns (Glycerophosphatidylinositol), GPSer (Glycerophosphatidylserine acid), GPGro (Glycerophosphatidylglycerol); UMP (Uridine monophosphate). Some other primary metabolites that are detected, including amino acids, phenolic acids, nucleosides and organic acids, have been reported to be essential elements in plant cells. Amino acids are unarguably the most important metabolites from a biological point of view. Not only are they the basic structural units of proteins, but also a source of energy. Furthermore, some amino acids play important roles as neurotransmitters [28]. Now, much more attention has been paid to nucleosides, nucleotides and nucleobases, as they are the group of highly active metabolites, exhibit anti-platelet aggregation, antiarrhythmic, anti-oxidant, anti-seizure and anti-tumor effects [29]. Furthermore, phenolic acids have been found to be stronger antioxidants against free radicals and other reactive oxygen species, which are essential in preventing chronic human diseases such as cancer and cardiovascular diseases [18].

PCA Analysis of Primary Metabolites
PCA was carried out to provide additional insight into the chemical differences between the Citrus samples. PCA is an unsupervised pattern recognition method that is used for analyzing, classifying and reducing the dimensionality of numerical datasets in multivariate problems. This approach has been widely used for quality control of herbal medicines. The contents of the 88 analytes were set as the variables, and 37 sample batches were set as the observations. The score scatter plot is shown in Figure 1, and the first two principal components accounted for 75.2% of the variance. According to the PCA score plot, all of the QP and CP samples were clustered in a small region and can be separated from the ZS and ZQ samples; the QP and CP, as well as ZS and ZQ samples from the same species might be more difficult to divide, as the collection stage may be closer. Although some inevitable differences arise from cultivation regions, harvesting time and the storage process may substantially influence the contents, and the species source was one of the most important factors. From this result, primary metabolites can be easily clustered within the different species of Citrus. Some other primary metabolites that are detected, including amino acids, phenolic acids, nucleosides and organic acids, have been reported to be essential elements in plant cells. Amino acids are unarguably the most important metabolites from a biological point of view. Not only are they the basic structural units of proteins, but also a source of energy. Furthermore, some amino acids play important roles as neurotransmitters [28]. Now, much more attention has been paid to nucleosides, nucleotides and nucleobases, as they are the group of highly active metabolites, exhibit anti-platelet aggregation, antiarrhythmic, anti-oxidant, anti-seizure and anti-tumor effects [29]. Furthermore, phenolic acids have been found to be stronger antioxidants against free radicals and other reactive oxygen species, which are essential in preventing chronic human diseases such as cancer and cardiovascular diseases [18].

PCA Analysis of Primary Metabolites
PCA was carried out to provide additional insight into the chemical differences between the Citrus samples. PCA is an unsupervised pattern recognition method that is used for analyzing, classifying and reducing the dimensionality of numerical datasets in multivariate problems. This approach has been widely used for quality control of herbal medicines. The contents of the 88 analytes were set as the variables, and 37 sample batches were set as the observations. The score scatter plot is shown in Figure 1, and the first two principal components accounted for 75.2% of the variance. According to the PCA score plot, all of the QP and CP samples were clustered in a small region and can be separated from the ZS and ZQ samples; the QP and CP, as well as ZS and ZQ samples from the same species might be more difficult to divide, as the collection stage may be closer. Although some inevitable differences arise from cultivation regions, harvesting time and the storage process may substantially influence the contents, and the species source was one of the most important factors. From this result, primary metabolites can be easily clustered within the different species of Citrus.

Identification of Secondary Metabolites
Compared with the advent of modern DNA sequencing techniques, chemotype-based classification methods, which are mainly based on Citrus secondary metabolites, are undeniably the most useful characteristic accompanying morphology in the discrimination of Citrus species. In current studies, it seems that a certain group of Citrus TCMs, which are defined by their morphology, tends to have a stable pool of secondary metabolites, which is a good indication of their classification. For instance, some articles have indicated that chemotype-based classification methods are reliable [3].
Phenolic metabolites are one of the most important groups of plant secondary metabolites; they are widespread in plants and encompass more than 8000 molecules [30]. Numerous studies have focused on polyphenols as they are a group that has multiple biological effects. Polyphenols are involved in many plant processes, such as development (participating in plant hormone signaling or pollen germination), reproduction (pigments attracting pollinators) and plant defense (protecting from UV, pathogens and predators) [31,32]. Polyphenols also play a significant part of food quality and taste, as well as potentially participating in the prevention of chronic diseases in relation to their antioxidant properties [33].

Non-Targeted Analysis of Secondary Metabolite Features
The plot of PCA scores clearly indicated that the first two PCs were able to separate the different developmental and ripening stages in Citrus ( Figure 2A); 72.12% of the variance was explained using the first two PCs (scores) alone. The clear separation of the four samples of QP, CP, ZQ and ZS in the score plot indicated a more significant effect of the harvest season than variety on the Citrus metabolome. According to the 2015 edition of the Chinese Pharmacopoeia, ZS is harvested in May and June ZQ in July, CP in August-December and QP in June. Due to the passage of time, the quality and quantity of phenolic metabolites in ZS samples tend to be similar to those in ZQ samples. Therefore, the PCA score plot indicated that the first two components were able to separate the different species of the Citrus samples, as well as the different developmental and ripening stages of those Citrus samples, which indicated that multiple sources were important factors that can affect the use of TCMs. Of course, differences in the developmental and ripening stages should not be neglected.
A supervised PLS-DA approach was used to investigate the metabolites that showed the greatest differences. As shown in Figure 2B, the Citrus samples were separated into the QP, CP and ZS, ZQ samples by PC 1, and the QP samples were easily separated from the CP samples by PC 2, which indicated different secondary metabolite phenotypes for Citrus picked at different ripening stages. Cross-validation analysis showed that the R 2 and Q 2 intercepts were 0.487 and 0.894, respectively, thus demonstrating that the PLS-DA model was reliable ( Figure 2D). An S-plot was used to discover key metabolites that contributed to the differentiation of the QP, CP, ZQ and ZS samples ( Figure 2C). Seventy-nine metabolites, including 14 flavanones, 24 flavone and flavone glycosides, 4 flavonol and flavonol glycosides, 29 coumarins, 1 anthocyanin, 4 limonoids and glycosides and 1 abscisic acid, were tentatively identified ( Table 2). The secondary metabolites were putatively identified according to the retention time, accurate mass, MS 2 and searches of the PubChem and ChemSpider databases. ChemSpider (http://www.chemspider.com/) is a free chemical structure database providing fast access to over 34 million structures, properties and associated information. Among them, 35 metabolites were confirmed by authentic standards. Others were putatively identified according to retention time, accurate mass, MS 2 and metabolomics databases.   A heat-map was applied to visualize the developmental variations of differential metabolites in the Citrus samples ( Figure 3). The data were Pareto scaled. A red box indicates that a metabolite occurred at greater than the mean level in a sample, and a blue box means that the metabolite was at a lower level. Flavonol and flavonol glycosides, some coumarins (6′-7′-dihydroxybergamottin and psoralen) and anthocyanin clearly occurred at higher levels in the QP and CP than in the ZS and ZQ samples; while others, flavone and flavone glycosides, flavanones, majority coumarins and limonoids and glycosides, were found to be significantly higher in ZS and, especially, in ZQ samples. Otherwise, some coumarins (bergapten, xanthotoxin, isopimpinellin, aurapten, auraptene and bergaptol) and some flavone and flavone glycosides (amentoflavone and acacetin) were at their lowest levels in ZQ samples.

Chemotaxonomy of Citrus by Primary and Secondary Metabolites
The databases we obtained were introduced to the HCA technique, taking into account the concentration of each sample. HCA examines the distances between the samples in a dataset, and the information is represented in a two-dimensional plot (dendrogram). The most similar points were grouped and then formed clusters [34]. The process is repeated until all of the points are inserted into a unique group. The style of the data is auto-scaled, and the Euclidean distance with the complete linkage method is used to calculate the sample similarities. A hierarchical agglomerative procedure is employed to establish clusters. A heat-map was applied to visualize the developmental variations of differential metabolites in the Citrus samples ( Figure 3). The data were Pareto scaled. A red box indicates that a metabolite occurred at greater than the mean level in a sample, and a blue box means that the metabolite was at a lower level. Flavonol and flavonol glycosides, some coumarins (6 -7 -dihydroxybergamottin and psoralen) and anthocyanin clearly occurred at higher levels in the QP and CP than in the ZS and ZQ samples; while others, flavone and flavone glycosides, flavanones, majority coumarins and limonoids and glycosides, were found to be significantly higher in ZS and, especially, in ZQ samples. Otherwise, some coumarins (bergapten, xanthotoxin, isopimpinellin, aurapten, auraptene and bergaptol) and some flavone and flavone glycosides (amentoflavone and acacetin) were at their lowest levels in ZQ samples.

Chemotaxonomy of Citrus by Primary and Secondary Metabolites
The databases we obtained were introduced to the HCA technique, taking into account the concentration of each sample. HCA examines the distances between the samples in a dataset, and the information is represented in a two-dimensional plot (dendrogram). The most similar points were grouped and then formed clusters [34]. The process is repeated until all of the points are inserted into a unique group. The style of the data is auto-scaled, and the Euclidean distance with the complete linkage method is used to calculate the sample similarities. A hierarchical agglomerative procedure is employed to establish clusters. The results showed groupings of primary metabolites in tight clusters according to the Citrus species. In addition, the relationship between clusters was in agreement with the expected phylogenetic relationships among varieties, showing a perfect separation of the represented groups, C. aurantium L. and C. reticulate Blanco ( Figure 4A). Nevertheless, by representing other combinations of components, the model is chaotic in differentiating harvesting period-related varieties within a group. In general, varieties were grouped according to genotype, but not harvesting period.
It seemed clear that the development stage had an influence on the fruit secondary metabolite composition as shown in previous research. Accordingly, the 37 Citrus samples were analyzed by chemotaxonomic classification using the 79 secondary metabolites as reference markers. HCA showed that those Citrus samples were divided into four branches (QP, CP, ZS and ZQ) according to their secondary metabolite chemotaxonomy ( Figure 4B). Compared with primary metabolites, the HCA results of secondary metabolites showed different results for the chemotype-based classification methods, which reflected how the secondary metabolites play an important role in the chemotaxonomic classification of the Citrus harvesting period. The results showed groupings of primary metabolites in tight clusters according to the Citrus species. In addition, the relationship between clusters was in agreement with the expected phylogenetic relationships among varieties, showing a perfect separation of the represented groups, C. aurantium L. and C. reticulate Blanco ( Figure 4A). Nevertheless, by representing other combinations of components, the model is chaotic in differentiating harvesting period-related varieties within a group. In general, varieties were grouped according to genotype, but not harvesting period.
It seemed clear that the development stage had an influence on the fruit secondary metabolite composition as shown in previous research. Accordingly, the 37 Citrus samples were analyzed by chemotaxonomic classification using the 79 secondary metabolites as reference markers. HCA showed that those Citrus samples were divided into four branches (QP, CP, ZS and ZQ) according to their secondary metabolite chemotaxonomy ( Figure 4B). Compared with primary metabolites, the HCA results of secondary metabolites showed different results for the chemotype-based classification methods, which reflected how the secondary metabolites play an important role in the chemotaxonomic classification of the Citrus harvesting period.

Potential Chemotaxonomic Markers Associated with Primary Metabolites
A two-stage ROC curve analysis was carried out to identify potential markers associated with species. ROC analysis is a useful tool for evaluating the accuracy of a statistical model (e.g., logistic regression, linear discriminate analysis). The area under the ROC curve is a summary measure that essentially averages diagnostic accuracy [35].
As a result, 48 altered primary metabolites, with areas under the ROC curves ranging from 0.85-1 ( Figure S1, Table 1 and Table S3), were considered to show the greatest diagnostic accuracy. Subsequently, an ROC curve-based model was established to assess the integrated predictive power of the combined 48 altered metabolites to distinguish C. aurantium L. from C. reticulate Blanco. The AUC value of the established model was 1.000, which showed a good ability for discriminating the two different Citrus species. Thus, those 48 altered metabolites (Table 1) can be defined as potential primary chemotaxonomic markers that are associated with different species.

Potential Chemotaxonomic Markers Associated with Secondary Metabolites
In this study, a detailed analysis of potential markers was conducted to determine the chemotype-features of these closely-related TCMs and the associations between the species and ripening stages of Citrus species [36]. To identify the metabolites that are significantly affected by ripening stages, OPLS-DA modeling was performed on the profiling datasets. The model separated young samples from ripe samples of Citrus by discriminating, according to their difference in light intensity. The OPLS-DA model of group QP/CP and ZS/ZQ explained (R 2 = 0.812 and 0.886, respectively) and predicted (Q 2 = 0.991 and 0.935, respectively) the total variance [37]. S-plots ( Figure 5A,B) were constructed by presenting covariance (p) against correlation (pcorr), and the potential chemotaxonomic markers for the separation of shading effects were obtained by filtering with the variables that had an influence on the projection (VIP) > 1 and p < 0.05 in the statistical analysis. VIP was a weighted sum of squares of the PLS weight, and a value >1 was generally used as a criterion to identify the variables that were important to the model. Although many of the significantly differing components remained unknown, a total of 50 potential biomarkers was identified from the PLS-DA and S-plot as chemotaxonomic markers for ripening stages. The p-values of potential markers between groups (expressed as QP/CP and ZS/ZQ in Table 2) were calculated

Potential Chemotaxonomic Markers Associated with Primary Metabolites
A two-stage ROC curve analysis was carried out to identify potential markers associated with species. ROC analysis is a useful tool for evaluating the accuracy of a statistical model (e.g., logistic regression, linear discriminate analysis). The area under the ROC curve is a summary measure that essentially averages diagnostic accuracy [35].
As a result, 48 altered primary metabolites, with areas under the ROC curves ranging from 0.85-1 ( Figure S1, Table 1 and Table S3), were considered to show the greatest diagnostic accuracy. Subsequently, an ROC curve-based model was established to assess the integrated predictive power of the combined 48 altered metabolites to distinguish C. aurantium L. from C. reticulate Blanco. The AUC value of the established model was 1.000, which showed a good ability for discriminating the two different Citrus species. Thus, those 48 altered metabolites (Table 1) can be defined as potential primary chemotaxonomic markers that are associated with different species.

Potential Chemotaxonomic Markers Associated with Secondary Metabolites
In this study, a detailed analysis of potential markers was conducted to determine the chemotype-features of these closely-related TCMs and the associations between the species and ripening stages of Citrus species [36]. To identify the metabolites that are significantly affected by ripening stages, OPLS-DA modeling was performed on the profiling datasets. The model separated young samples from ripe samples of Citrus by discriminating, according to their difference in light intensity. The OPLS-DA model of group QP/CP and ZS/ZQ explained (R 2 = 0.812 and 0.886, respectively) and predicted (Q 2 = 0.991 and 0.935, respectively) the total variance [37]. S-plots ( Figure 5A,B) were constructed by presenting covariance (p) against correlation (pcorr), and the potential chemotaxonomic markers for the separation of shading effects were obtained by filtering with the variables that had an influence on the projection (VIP) > 1 and p < 0.05 in the statistical analysis. VIP was a weighted sum of squares of the PLS weight, and a value >1 was generally used as a criterion to identify the variables that were important to the model. Although many of the significantly differing components remained unknown, a total of 50 potential biomarkers was identified from the PLS-DA and S-plot as chemotaxonomic markers for ripening stages. The p-values of potential markers between groups (expressed as QP/CP and ZS/ZQ in Table 2) were calculated from their peak intensity to show the effects of different developmental stages of Citrus. As a result, flavones played a significantly important role in discriminating the ripening stage of Citrus; for example, sinensetin decreased and rutin increased in ripe fruits.  To explore the ripening effect by development stages, another multivariate statistical analysis was performed on the datasets from the two different species. Again, the two groups of samples were well separated by discriminating according to their differences. The OPLS-DA model explained more than 80% (R 2 ) and predicted more than 95% (Q 2 ) of the total variances. Analysis of the S-plot ( Figure 5C) showed that most coumarins increased in the C. aurantium L. species, while flavonol, To explore the ripening effect by development stages, another multivariate statistical analysis was performed on the datasets from the two different species. Again, the two groups of samples were well separated by discriminating according to their differences. The OPLS-DA model explained more than 80% (R 2 ) and predicted more than 95% (Q 2 ) of the total variances. Analysis of the S-plot ( Figure 5C) showed that most coumarins increased in the C. aurantium L. species, while flavonol, flavonol glycosides and anthocyanin increased in C. reticulate Blanco. A total of 57 potential biomarkers were identified as species chemotaxonomic markers, with a p-value between the group (C. aurantium L. and C. reticulate Blanco.), which is shown in Table 2.
Next, the PLS-DA-based receiver operating characteristic (ROC) curve was performed on each group to further find specific chemotaxonomic markers. This contribution map allowed comparisons between contrasting species groups and ripening stage groups ( Figure S2), and it highlighted the potential regulatory nodes involved in different group variability by ROC curve analysis. In general, metabolites with areas under the ROC curves ranging from 0.85-1 are marked in the figure. As is shown, 25, 34 and 47 metabolites have been marked for QP/CP, ZS/ZQ and species group again, which is shown in Table 2. The AUC value of the established model is 0.934, 1.000 and 0.984 ( Figure S2), which showed a good ability for discriminating.

The Pathway Analysis Associated with Chemotaxonomic Markers
Pathway analysis has been used to determine vegetative metabolic regulation that chiefly affected the following pathways: flavone and flavonol biosynthesis; flavonoid biosynthesis; anthocyanin biosynthesis; glyoxylate and dicarboxylate metabolism; alanine, aspartate and glutamate metabolism; starch and sucrose metabolism; amino sugar and nucleotide sugar metabolism; citrate cycle (TCA cycle); pyrimidine metabolism; pentose and glucuronate interconversions; lysine biosynthesis; cysteine and methionine metabolism; butanoate metabolism; beta-alanine metabolism; phenylalanine, tyrosine and tryptophan biosynthesis; nicotinate and nicotinamide metabolism; glycerophospholipid metabolism; cyanoamino acid metabolism; pyrimidine metabolism; carbon fixation pathways in photosynthetic organisms metabolism; and phenylpropanoid biosynthesis.
Based on the primary metabolites and secondary chemotaxonomic markers, a comprehensive metabolic network for the discrimination of the species and ripening stages of Citrus fruits was mapped by KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaboAnalyst 3.0 [38] (shown in Figure 6A-C). Twenty-four metabolic pathways were disturbed in the QP/CP group, ZS/ZQ group and CA/CS group. The top three metabolic pathways with an impact value >0.2 influencing the species were glycerophospholipid metabolism, cyanoamino acid metabolism and flavone and flavonol biosynthesis; at the same time, three of the metabolic pathways were considered to be the most pertinent to the ripening stages in both the QP/CP and ZS/ZQ groups, including flavone and flavonol biosynthesis, flavonoid biosynthesis and phenylpropanoid biosynthesis. flavonol glycosides and anthocyanin increased in C. reticulate Blanco. A total of 57 potential biomarkers were identified as species chemotaxonomic markers, with a p-value between the group (C. aurantium L. and C. reticulate Blanco.), which is shown in Table 2.
Next, the PLS-DA-based receiver operating characteristic (ROC) curve was performed on each group to further find specific chemotaxonomic markers. This contribution map allowed comparisons between contrasting species groups and ripening stage groups ( Figure S2), and it highlighted the potential regulatory nodes involved in different group variability by ROC curve analysis. In general, metabolites with areas under the ROC curves ranging from 0.85-1 are marked in the figure. As is shown, 25, 34 and 47 metabolites have been marked for QP/CP, ZS/ZQ and species group again, which is shown in Table 2. The AUC value of the established model is 0.934, 1.000 and 0.984 ( Figure S2), which showed a good ability for discriminating.

The Pathway Analysis Associated with Chemotaxonomic Markers
Pathway analysis has been used to determine vegetative metabolic regulation that chiefly affected the following pathways: flavone and flavonol biosynthesis; flavonoid biosynthesis; anthocyanin biosynthesis; glyoxylate and dicarboxylate metabolism; alanine, aspartate and glutamate metabolism; starch and sucrose metabolism; amino sugar and nucleotide sugar metabolism; citrate cycle (TCA cycle); pyrimidine metabolism; pentose and glucuronate interconversions; lysine biosynthesis; cysteine and methionine metabolism; butanoate metabolism; beta-alanine metabolism; phenylalanine, tyrosine and tryptophan biosynthesis; nicotinate and nicotinamide metabolism; glycerophospholipid metabolism; cyanoamino acid metabolism; pyrimidine metabolism; carbon fixation pathways in photosynthetic organisms metabolism; and phenylpropanoid biosynthesis.
Based on the primary metabolites and secondary chemotaxonomic markers, a comprehensive metabolic network for the discrimination of the species and ripening stages of Citrus fruits was mapped by KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaboAnalyst 3.0 [38] (shown in Figure 6A-C). Twenty-four metabolic pathways were disturbed in the QP/CP group, ZS/ZQ group and CA/CS group. The top three metabolic pathways with an impact value >0.2 influencing the species were glycerophospholipid metabolism, cyanoamino acid metabolism and flavone and flavonol biosynthesis; at the same time, three of the metabolic pathways were considered to be the most pertinent to the ripening stages in both the QP/CP and ZS/ZQ groups, including flavone and flavonol biosynthesis, flavonoid biosynthesis and phenylpropanoid biosynthesis.

The PB Test of the Analysis Factor
A PB design is utilized to find experimental designs for investigating the dependence of some measured quantity on a number of independent variables to minimize the variance of the estimates of these dependencies using a limited number of experiments [39]. It was performed to investigate which factors significantly affected the different markers, as well. The two factors that we discussed below were assessed, including species (A) and ripening stages (B) (shown in Table S4). The contents of analyses were determined to be the influencing parameters of the markers. Main effects plots were used to examine differences between mean levels for those factors. Different levels of a factor affected the response differently. In main effect plots, if the line were parallel to the x-axis, there was no effect in reality. Conversely, this means a main effect. The steeper the slope of the line, the greater the magnitude of the main effect was observed. As shown in Figure S3, two factors seem to affect the content of the analyses because the lines are not horizontal. The slopes of the species and ripening stages curves in these graphs are all proportional to the absolute value of the estimated effects and the factors that have a significant effect on the response. Different chemotaxonomic markers have different effects on species and ripening stage.

Analyzing a Wide Range of Secondary Metabolites to Understand the Metabolic Network
Flavonoids: This class of metabolites has great bioactivities, such as antioxidant activity, antifungal activity, antiparasitic activity and the beneficial health properties of Citrus. Indeed, the high radical scavenging activity in Citrus has been almost rarely associated with flavonoids and other phenolic constituents. In this study, from the same flavanone core, several following derivatives were identified by substitution with methyl groups or hexose moieties: naringin, hesperidin, narirutin, neohesperidin and eriodictyol (Figure 7 and Table 2). Among this group, the most widespread compounds were hesperidin, narirutin, sakuranetin and eriodictyol 7-O-neohesperidoside. These flavanone metabolites appeared to be found at higher concentrations in C. aurantium L. species. Regarding the ripening stage, hesperetin, hesperidin, neohesperidin and neoeriocitrin were marked as markers. The synthesis of flavonoid has been reported to start from the flavanone naringenin by continuous transfer of glycosyl groups (a first step by which glucose is transferred to oxygen in position 7, generating a 7-O-glucoside). In turn, a 1,6 rhamnosyl transferase renders the hesperidosides (or rutinosides) hesperidin and narirutin. Conversely, the action of 1,2 rhamnosyl transferase on flavanone 7-O-hexosides generates the neohesperidosides neohesperidin and naringin [40].
Flavone and flavonols: Flavones and flavonols synthesized from the same flavanone naringenin by hydroxylation include isorhamnetin, kaempferol and quercetin. One of the most important flavonols, rutin, has shown the highest accumulation in immature samples, which was the most different from other flavonols. This may be representative of a better flavonoid 3-monooxygenase activity in these imam ture samples. Moreover, isorhamnetin 3-O-rutinoside also showed a higher accumulation in immature samples and was derived from the addition of a hexose moiety to oxygen at position 3 catalyzed [40]. In our investigation, the variation trends of flavonol/flavone glycosides were found to depend on both aglycone species and ripening stages (Figure 7). The flavonoid content, as observed for apigenin, luteolin, chrysoeriol, vitexin-2-Orhamnoside, diosmetin, tetramethyl-isoscutellarein, sinensetin, tangeretin, nobiletin, diosmin, neodiosmin, luteolin-3 -7-di-O-glucoside, diosmetin-7-O-Glucoside, 5-demethylnobiletin and vitexin, was significantly elevated in QP samples and especially in ZS samples. By contrast, orientin and myricetin had higher contents in mature samples, both in CP and ZQ samples. Almost all of the flavonoids were markers in both the ripened stage and in spices; however, polymethoxylated flavones, such as sinensetin, tangeretin and 5-demethylnobiletin, did not perform well in the identified species. Regarding flavonoids, temperature has an important effect on the biosynthesis and accumulation of phenolics in many plant species. Sub-groups of flavonoids were shown to be differently affected, which might be plant species dependent. For instance, flavonol in tomato increased when the temperature was low (18-12 °C), whereas grape plants grew under high temperatures (30-35 °C) [41]. Zhang et al. reported that lower temperature decreased flavonols and their glycosides in tea species [42].
Coumarins: The pathway of coumarin biosynthesis was largely discovered during the 1960s and 1970s, with the help of tracer feeding experiments. Some research indicated that umbelliferon is derived from cis-coumaric acid, whereas coumarin originates from cis-cinnamic acid [43]. As the beginning of coumarin biosynthesis, umbelliferon plays an important role in both the discrimination of species and ripening stage. For other coumarins, aurapten, epoxyaurapten and limettin have been Regarding flavonoids, temperature has an important effect on the biosynthesis and accumulation of phenolics in many plant species. Sub-groups of flavonoids were shown to be differently affected, which might be plant species dependent. For instance, flavonol in tomato increased when the temperature was low (18-12 • C), whereas grape plants grew under high temperatures (30-35 • C) [41]. Zhang et al. reported that lower temperature decreased flavonols and their glycosides in tea species [42].
Coumarins: The pathway of coumarin biosynthesis was largely discovered during the 1960s and 1970s, with the help of tracer feeding experiments. Some research indicated that umbelliferon is derived from cis-coumaric acid, whereas coumarin originates from cis-cinnamic acid [43]. As the beginning of coumarin biosynthesis, umbelliferon plays an important role in both the discrimination of species and ripening stage. For other coumarins, aurapten, epoxyaurapten and limettin have been identified as reflecting the ripening stage and species, whereas herniarin, 5,7-dihydroxycoumarin and 5-methoxy-7-hydroxycoumarin were only ripening stage factors [44].
While coumarin biosynthesis has remained a black box, several enzymes of the furanocoumarin pathway were isolated and characterized. Umbelliferone, rather than coumarin, is also the parent compound of furanocoumarins, as was reported a long time ago [44]. It is first prenylated at the 8-position for angular furanocoumarins to yield psoralen as the furanocoumarin beginning. Furanocoumarin performed well in identifying Citrus. Xanthotoxol, bergaptol, psoralen, xanthotoxin and bergapten 6 -7 -dihydroxybergamottin were both markers of species and ripening stages, and almost all furanocoumarin had a higher content in the C. aurantium L. species.
Abscisic acid (ABA) and derivatives: The pathway starting from ABA has two major branches: the catabolic and conjugating branches. The catabolic branch starts with the transfer of ABA into 8 -hydroxy ABA, which is catalyzed by ABA 8 -hydroxylase and then isomerizes to phaseic acid (PA). This metabolite will further catabolize to dehydrophaseic acid (DPA). The conjugating branch involves the temporary storage of ABA into a glycosylated form, catalyzed by an UDP-ABA glycosyl transferase ( Figure 7). Actually, in this research, ABA had a higher content in mature samples [45].
Limonoids and glycosides: Limonoids are highly oxygenated triterpenes that are present in Rutaceae and Meliaceae. These metabolites are derived from squalene, although the first true limonoid precursor is nomilin, which can be directly glucosylated by a limonoid UDP-glucosyl transferase or also deacetylated (Figure 7), rendering obacunone [45]. All of the limonoids had higher contents in the C. aurantium L. species, especially in ZQ samples. Limonin could be used as a marker of the ripening stages in our study, and three other limonoids were both markers of ripening stage and species.
The phenylpropanoid biosynthesis, flavone and flavonol biosynthesis and flavonoid biosynthesis pathways are the most critical pathways for the synthesis of characteristic phenolic metabolites, including flavones, flavonols, flavanones and their glycosides, abscisic acid and procyanidins. Flavonoids have been reported to be lower in Citrus that is cultured in low temperature and low sun exposure conditions. Herein, we mapped the changes of metabolites involved in these pathways in Figure 7. The level of shikimate, which is a substrate of the phenylpropanoid biosynthesis pathway, was lower in CP and ZQ than in QP and ZS. At the same time, downstream metabolite products, such as cis-coumaric acid, abscisic acid, flavone glycosides, flavan-3-ols, procyanidins, coumarin and furanocoumarin, were significantly higher, which indicated more vigorous biosynthesis of phenolic secondary metabolites in QP and ZS than in CP and ZS. These results agreed with the study showing that the expression of structural genes encoding biosynthesis of flavonoids and the activity of some important enzymes increased under high light intensity and therefore led to a subsequent increase in the contents of flavonoids. This work paves the way for further analyses aiming to describe the control of the metabolism of phenolic metabolites in Citrus in response to genetic and ripening factors.  . (Beijing, China). The purity of the standards was relatively high (i.e., higher than 98%).

Reference Standard Solution
Reference standards were dissolved in 50% methanol (1.0 mg/mL) and then stored at 4 • C until analysis.

Extraction of Plant Material
A total of 37 batches of four different Citrus medicinal plants were collected from the Jiangxi, Sichuan, Hunan, Hubei and Zhejiang provinces and Chongqing municipality in China. (Table 3) Dried voucher specimens were deposited at the Institute of Basic Theory, China Academy of Chinese Medical Sciences, Beijing, China. The samples used for ACQUITY UPLC HSS T3 column extract detection were pulverized and sifted through a 60-mesh sieve. The method of sample preparation established and validated in our previous study was used. Accurately weighed powder (1.0 g) was placed in a conical flask. The material was sonicated three times with 10 mL of 50% MeOH for 30 min. The extraction solutions were filtered. The supernatant solution was combined and filtered through a 0.22-µm filter membrane. Each sample was extracted in parallel three times and then analyzed by UPLC-Q-TOF-MS.   As the samples for the ACQUITY UPLC BEH HILIC column method, the dried powder (1.0 g) was weighed accurately into a 10-mL conical flask with a stopper, and 10 mL water were added accurately. After weighing the filled flask accurately, ultrasonication (40 kHz) was carried out at room temperature for 30 min, then weighing again, and the same solvent (water) was added to compensate for the lost weight during the extraction as needed. After centrifugation (15,000× g, 10 min), the supernatant was stored at 4 • C and filtered through a 0.22-µm polytetrafluoroethylene filter before injection into the HILIC-UPLC-Q-TOF/MS system for analysis.
The extraction protocols and the repeatability data of samples have been reported in our pervious article [19,46].

Chromatographic Conditions
Chromatographic separation was performed on the Waters ACQUITY UPLC System (Waters Corp. Milford, MA, USA), equipped with a binary solvent delivery system and an autosampler. The extracts were performed on an ACQUITY UPLC HSS T3 column (100 mm × 2.1 mm, 1.8 µm). Water with 0.1% (v/v) formic acid and acetonitrile were used as Mobile Phases A and B, respectively, for chromatographic elution: from 0-2 min, Phase B was linearly increased from 0-20%, then linearly increased to 50% at 7.0 min and after that, linearly increased to 100% at 11 min and maintained for 3.0 min; Phase B was adjusted to 0% at 14.1 min for re-equilibration and maintained for 0.9 min. The total elapsed time required for a given chromatographic analysis was thus 15.0 min. The columns were maintained at 45 • C and eluted at a flow rate of 0.30 mL/min. The injection volume was 2 µL. The chromatograms of ACQUITY UPLC HSS T3 column have been put in the Supplemental Materials ( Figure S8).
The extracts of samples were performed on the ACQUITY UPLC BEH HILIC column (100 mm × 2.1 mm, 1.7 µm). With the same Mobile Phases A and B with the ACQUITY UPLC HSS T3 column method, the gradient program for these samples was: from 0-1 min, 99% Phase B, then decreased at 70% B from 1-10 min, and remained at 99% Phase B from 10-15 min. The chromatograms of the ACQUITY UPLC BEH HILIC column have been put in the Supplemental Materials ( Figure S9).
On the other side, we have demonstrated the chromatographic conditions in our pervious research [19].

Mass Spectrometry Condition
Mass spectrometry data were collected by using a Q-TOF analyzer in a Xevo G2 QTOF Mass system (Waters Corporation, Milford, MA, USA) in both positive and negative ion modes. The source temperature was set at 100 • C with a cone gas flow of 25.0 L/h, a desolvation gas flow of 800.0 L/h and a desolvation gas temperature of 400 • C. For the positive and negative ion modes, the capillary voltage was set to 3.0 kV, and the cone voltage was set to 40 V. The mass spectrometric data were collected in centroid mode from m/z 50-1200 with a scan time of 0.1 s and an interscan delay of 0.014 s over a 15.0-min analysis time. Leucine-enkephalin was used as the lock mass (m/z 556.2771 in positive mode and m/z 554.2615 in negative mode) at a concentration of 0.5 µg/mL with a flow rate of 30 µL/min The lock spray frequency was set at 20 s, and the lock mass data were averaged over 10 scans for correction. Compared with the negative ion mode, positive ion mode performed well in this study.

Data Processing and Statistical Analysis
Raw data files acquired from the UPLC-Q-TOF/MS analysis were imported into the Progenesis QI (Waters, Manchester, U.K.), which is used for UPLC-Q-TOF/MS data processing and database researching to pathway interrogation, to generate a peak table that included information on retention time, mass-to-charge ratio (m/z) and MS intensity of the metabolites in the sample. Additionally, the retention time tolerance and mass tolerance for the peak alignment were set to 0.20 and 0.05, respectively.
In order to perform the multivariate statistical analysis, the data we obtained from the UPLC-Q-TOF/MS including sample names (observations), the arbitrary compound index (m/z) and peak areas (variables) were introduced into the SIMCA-P 13.5 software (Umetric, Umeå, Sweden). The raw data were subjected to principal component analysis (PCA) to visualize general clustering, trends or outliers among the observations. Partial least-squares discriminant analysis (PLS-DA) was utilized to validate the PCA model and identify the differential metabolites. R 2 represented the explanation capacity of the model (R 2 X and R 2 Y represent the fraction of the variance of X matrix and Y matrix), whereas Q 2 suggested the predictive accuracy of the model. The cumulative values of R 2 X, R 2 Y and Q 2 close to 1 meant an excellent model. The differential metabolites were selected by the statistically-significant threshold of the variable influence on projection (VIP) values. Heat-map analysis, pathway analysis and ROC curves were performed using MetaboAnalyst 3.0, which is an open bioinformatics website for metabolite data interpretation. The significance level of the metabolite differences between groups was calculated by Student's t-test using the SPSS software (Version 22.0, International Business Machines Corp., Armonk, NY, USA). Results were considered significant when the p-value was less than 0.05.
Putative metabolites were first derived by searching the exact molecular mass data from redundant m/z peaks against the online HMDB (http://www.hmdb.ca/), PubMed (http://www. ncbi.nlm.nih.gov/pubmed/), LMSD (http://www.lipidmaps.org/), KEGG (http://www.kegg.jp/) and ChemSpider (http://www.chemspider.com) databases. A specific metabolite would be sieved out when a match with a difference between observed and theoretical mass was less than 5 ppm. Then, the metabolite molecular formula of matched metabolites was further identified by the isotopic distribution measurement.

Conclusions
In this study, a metabolomics-guided chemotaxonomic classification strategy was applied to investigate four closely-related Citrus TCMs. A UPLC-Q-TOF-MS method was employed for the analysis of primary and secondary metabolites from Citrus TCMs with different species and ripening stages. It indicated that different Citrus TCMs have different metabolic profiles depending on genotype-regulated primary/secondary metabolites and that the development stage influenced secondary metabolites, which could have an impact on Citrus nutritional properties for quality control and efficient clinical usage. Thus, the genotype is expected to be the major contributing factor in determining fruit compositional properties related to species; however, it is worth noting that different growth stages may give rise to new varieties that can also affect the fruit chemical composition. Moreover, multivariate statistical analysis coupled with MetaboAnalyst permits a comprehensive depiction of the variable chemotaxonomic markers and the elucidation of the biosynthesis mechanism of Citrus TCMs from different species and different ripening stages. The top three metabolic pathways with an impact value >0.2 influencing the species were glycerophospholipid metabolism, cyanoamino acid metabolism and flavone and flavonol biosynthesis. Meanwhile, three of the metabolic pathways were considered to be the most pertinent to the ripening stages in both the QP/CP and ZS/ZQ groups, including flavone and flavonol biosynthesis, flavonoid biosynthesis and phenylpropanoid biosynthesis.