Discriminant Analysis of Pu-Erh Tea of Different Raw Materials Based on Phytochemicals Using Chemometrics

Pu-erh tea processed from sun-dried green tea leaves can be divided into ancient tea (AT) and terrace tea (TT) according to the source of raw material. However, their similar appearance makes AT difficult to identify in the market, which disrupts fair trade in the tea market. Therefore, this study examined the classification by principal component analysis/hierarchical clustering analysis and built discriminant models through stepwise Fisher discriminant analysis and decision tree analysis based on the contents of water extract, phenolic components, alkaloids, and amino acids, aiming to investigate whether phytochemicals coupled with chemometric analyses can distinguish AT from TT. Results showed good separation between AT and TT, which was caused by 16 components with significant (p < 0.05) differences. The discriminant model of AT and TT was established based on six discriminant variables: water extract, (+)-catechin, (−)-epicatechin, (−)-epigallocatechin, theacrine, and theanine. Among them, water extract comprises multiple soluble solids and represents the thickness of the tea infusion. The model showed good generalization capability, with all performance indexes reaching 100% for the training set and model set. In conclusion, phytochemicals coupled with chemometric analyses are a good approach for identifying Pu-erh tea from different raw materials.


Introduction
Pu-erh tea is defined as a geographical indication product by the General Administration of Quality Supervision, Inspection, and Quarantine of the People's Republic of China (GB/T 22111-2008), and is one of the most popular tea beverages in Asian countries, particularly in southwestern China and South Asian countries, owing to its unique flavors and beneficial effects on human health [1]. It is processed from sun-dried green tea leaves and can be classified into ancient tea (Gu-shu cha, AT) and terrace tea (Tai-di cha, TT) according to the source of raw material [2,3]. AT is collected from ancient tea gardens, which offer good economic and ecological benefits such as climate regulation and water and soil conservation, whereas TT is gathered from terrace tea plantations, which rely on good management of tea fields including fertilization, pruning, and pesticide spraying. The differences in growth environment and management method result in different flavors: compared with TT, AT has a richer, longer-lasting taste and a more distinctive aroma, and is widely considered to have more preservation value [4]. However, their similar appearance has led to low market identification of AT, which seriously damages the interests of consumers and the reputation of tea producers. Meanwhile, lack of yield and

Tea Samples
As shown in Table 1, a total of 80 samples including 30 ATs and 50 TTs were collected from five ancient tea gardens (Banpen, Laobanzhang, Hekai, Xinbanzhang, and Laoman'e) and nine terrace tea plantations (Nannuo, Bulang, Mensong, Lincang, Lancang, Xiding, Gelanghe, Menla, and Dali) in Yunnan Province. The raw Pu-erh samples were processed through five steps (picking, withering, green removing, rolling and twisting, and sun-drying) and marked with the corresponding numbers according to the sample names before grinding. All samples were stored at −4 °C until further analysis. Water extract (WE) in AT and TT was determined by the constant temperature drying method [24] with some modifications. Briefly, 1 g of the ground sample (m0) was soaked in 150 mL of boiled distilled water for 45 min. After washing and filtration using 75 mL of boiled distilled water, the tea grounds were baked in an oven (120 ± 2 °C) for 4 h before weighing (m1). The WE content was expressed as (m0 − m1) × 1000/m0 mg/g.
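The WE expression above can be checked with a few lines of code. The following is a minimal sketch of that arithmetic only; the function name and the example masses are hypothetical and are not taken from the study.

```python
def water_extract_mg_per_g(m0_g: float, m1_g: float) -> float:
    """Water extract content in mg/g, computed as (m0 - m1) * 1000 / m0.

    m0_g: mass of the ground sample before extraction (g)
    m1_g: mass of the dried tea grounds after extraction (g)
    """
    if m0_g <= 0:
        raise ValueError("m0 must be positive")
    if not 0 <= m1_g <= m0_g:
        raise ValueError("m1 must lie between 0 and m0")
    return (m0_g - m1_g) * 1000.0 / m0_g


# Hypothetical example: 1.000 g of sample leaves 0.580 g of dried grounds,
# which corresponds to roughly 420 mg/g of water extract.
example_we = water_extract_mg_per_g(1.000, 0.580)
```

The input checks are a design choice of this sketch: a residue heavier than the starting sample would indicate a weighing or drying error rather than a valid measurement.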

Total Phenolics (TPC)
The TPC content was determined using Folin-Ciocalteu reagent [25]. Briefly, 0.2 g of the ground sample and 5 mL of 70% methanol (preheated to 70 °C) were placed in a 70 °C water bath for 10 min and, after cooling, centrifuged at 3500 rpm for 10 min. This operation was repeated, and all supernatants were combined and made up to a constant volume of 10 mL. Then, 1 mL of the appropriately diluted sample or water was taken, followed by 5 mL of 10% Folin-Ciocalteu reagent. After 5-8 min at room temperature, 4 mL of 7.5% sodium carbonate solution was added and the mixture was kept at 25 ± 2 °C in the dark for 1 h. The absorbance was measured at 765 nm using a microplate reader (Synergy H1MG; BioTek Instruments Inc., Winooski, VT, USA). Gallic acid (0-0.0055 mg/g) was used as the reference standard, and the results were expressed as gallic acid equivalents per gram of sample (mg/g).

Total Free Amino Acids (TFAAs)
The TFAAs contents of the samples were determined using the ninhydrin colorimetric assay [26]. Briefly, 1 g of the ground sample and 150 mL of water were placed in a boiling water bath for 45 min; after reduced-pressure filtration, the filtrate was made up to 150 mL with water. Then, 1 mL of the sample was taken, followed by 0.5 mL of phosphate buffer (pH 8) and 0.5 mL of 2% ninhydrin. After shaking, the solution was placed in a boiling water bath for 15 min. The samples were then cooled to 25 ± 2 °C and made up to 25 mL with water before measurement at 570 nm using a microplate reader (Synergy H1MG; BioTek Instruments Inc., Winooski, VT, USA). Glu (0-0.6 mg/g) was used as the reference standard, and the results were expressed as Glu equivalents per gram of sample (mg/g).

Free Amino Acids
The free amino acids were determined according to Lu et al. [29] using an Amino Acid Analyzer (L-8900, Hitachi, Tokyo, Japan). Briefly, 4 mL of sample and 4 mL of 10% sulfosalicylic acid were added to a 10-mL tube. After standing overnight, the sample was filtered through a 0.45-µm organic membrane (Jinteng Experimental Equipment Co. Ltd., Tianjin, China) before analysis. The Amino Acid Analyzer system used a lithium citrate mobile phase and UV-Vis detection at 440 nm and 570 nm. The flow rates were 0.35 mL/min for the mobile phase and 0.3 mL/min for the derivatization reagent. The column temperature was set to 50 °C, and the post-column reaction equipment was maintained at 135 °C. The autosampler was kept at 4 °C, and the injection volume was 20 µL for the standard and the samples. The free amino acid levels were expressed as mg/g.

Chemometric Analyses
All data were subjected separately to PCA and HCA to examine whether phytochemicals could be used to classify AT and TT. HCA and PCA were both performed using IBM SPSS Statistics software (Version 22, SPSS Inc., Chicago, IL, USA). An analysis of variance (ANOVA) was also carried out in the same software to determine significant differences (p < 0.05). For the establishment of discriminant models, stepwise Fisher discriminant analysis (SFDA) was performed on the model set. The model set consisted of 20 ATs and 40 TTs chosen randomly from the 80 samples, and the remaining samples (10 ATs and 10 TTs) formed the training set used to check the accuracy of the model by the leave-one-out method (LOO). Based on the discriminant variables from SFDA, decision tree analysis (DTA) was further used to classify AT and TT, thereby establishing a second model. The optimal model of AT and TT was selected by performance indexes and distance measurement. Both SFDA and DTA were run in IBM SPSS Statistics (Version 22), and the performance indexes, including accuracy, precision, recall, and F-score, were calculated from the confusion matrix, as shown in Table 2. The formulas were as follows [30]:
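As an illustration of how the four performance indexes follow from a two-class confusion matrix, the sketch below uses the standard textbook definitions. It assumes AT is treated as the positive class, which is an assumption of this example rather than something stated in the text, and the counts shown are hypothetical.

```python
def performance_indexes(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F-score from confusion-matrix counts.

    tp: AT samples predicted as AT    fn: AT samples predicted as TT
    fp: TT samples predicted as AT    tn: TT samples predicted as TT
    (AT as the positive class is an assumption of this sketch.)
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts: 10 ATs and 10 TTs with no misjudgments.
perfect = performance_indexes(tp=10, fp=0, fn=0, tn=10)
# Hypothetical counts: one AT and one TT misjudged.
imperfect = performance_indexes(tp=9, fp=1, fn=1, tn=9)
```

With no misjudgments all four indexes equal 1 (that is, 100%), which is the situation reported later for both models.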

Feasibility of Phytochemicals to Classification of AT and TT
Based on WE, TPC, TFAAs, six catechin components, two purine alkaloids, and fifteen free amino acids, PCA was used to explore whether AT and TT could be distinguished through phytochemicals. As shown in Table 3, seven principal components (PCs), extracted according to the Kaiser criterion (eigenvalue greater than 1), cumulatively explained 80.352% of the variance, among which PC1, PC2, and PC3 represented 24.542%, 19.348%, and 11.261%, respectively, of the variability of the raw materials. The first three PCs were therefore the main components [31]. Figure 1a shows the score plot of PC1 versus PC3, in which all samples were distributed in three areas: 11 TTs (T6, T7, T14, T15, T16, T17, T22, T23, T32, T46, and T48) lay in an area with positive PC1 scores, while the other samples, cross-distributed in the remaining two areas, showed negative PC1 scores. In the 3D score plot of PC1 versus PC2 versus PC3 (Figure 1b), the 30 ATs were distributed around the TTs, which lay at the center: 12 ATs (A29, A30, A13, A24, A27, A22, A28, A26, A21, A23, A20, and A25) in the upper-left corner showed negative scores of PC2 and positive scores of PC3; five ATs (A17, A18, A15, A16, and A19) in the upper-right corner showed positive scores of PC2 and PC3; and 13 ATs (A14, A11, A9, A3, A10, A12, A1, A6, A4, A7, A2, A8, and A5) in the lower-right corner showed positive scores of PC2 and negative scores of PC3. Meanwhile, HCA showed that when the squared Euclidean distance was set at 14, all samples could be divided into eight clusters (clusters-A, B, C, D, E, F, G, and H) (Figure 2). The 30 ATs were distributed in clusters-A, D, and H, corresponding to the samples in the blue, orange, and red circles in Figure 1b, respectively.
However, A19 was located in the red circle (cluster-H) in Figure 1b in the PCA, whereas it was classified into cluster-D in the HCA. The reason might be that the new variables generated during the dimensionality reduction of PCA produced different results. Clusters-B, C, E, F, and G included seven, twenty-one, two, nine, and eleven TTs, respectively, among which clusters-F and G corresponded to the samples in the gray and green circles in Figure 1a,b, respectively, while the other samples in Figure 1b corresponded to clusters-B, C, and E. In short, PCA and HCA can be used to explain whether teas can be distinguished according to season, year, production place, processing technology, and category [32][33][34][35]. In this study, the results revealed that phytochemicals were the dominant factors in the classification of AT and TT and could be used as identification indicators of both.
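The component selection described in this section (retain PCs with an eigenvalue greater than 1, then report each PC's share of the variance) can be sketched as follows. The eigenvalues below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values in Table 3, and the function name is illustrative.

```python
def kaiser_selection(eigenvalues):
    """Apply the Kaiser criterion to PCA eigenvalues.

    Returns the retained eigenvalues (those > 1, in descending order),
    the percentage of variance each one explains, and the cumulative
    percentage explained by the retained components.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    retained = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]
    explained = [100.0 * ev / total for ev in retained]
    return retained, explained, sum(explained)


# Hypothetical eigenvalues for five standardized variables (they sum to 5).
retained, explained, cumulative = kaiser_selection([2.5, 1.5, 0.5, 0.25, 0.25])
```

For these made-up inputs, two components are retained and together explain 80% of the variance. For a PCA on standardized variables the eigenvalues sum to the number of variables, so an eigenvalue above 1 means a component carries more variance than any single original variable.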

Comparison of Phytochemicals Differences Related with a Classification of AT and TT
The phytochemical contents related to the classification of AT and TT are presented in Table S1 (Supplementary Materials), and the differences were analyzed based on the average levels of both. As shown in Figure 3a, the average levels of WE and TFAAs in AT were significantly (p < 0.05) higher than those in TT, whereas there was no significant difference in the average level of TPC between AT and TT. In terms of catechin components (Figure 3b), the average level of EC in AT was up to 34.3 mg/g, which was significantly higher than that in TT, while the contents of EGC, C, and GCG in AT were significantly lower than those in TT. Theacrine, a purine alkaloid, has a chemical structure similar to that of caffeine. It is the key component of Yunnan Kucha (also called bitter Pu-erh tea or Pu-erh Kucha tea), and it not only has beneficial effects on the human body but also tastes more bitter than caffeine [36,37]. As presented in Figure 3b, the level of theacrine in AT was significantly higher than that in TT, which indicated that AT was more bitter than TT. The caffeine content was 22.6-fold and 61.6-fold that of theacrine in AT and TT, respectively, and the caffeine content in AT was significantly higher than that in TT. Additionally, the free amino acids of AT and TT were dominated by theanine and Glu, according to Figure 3c. Theanine in AT and TT was up to 12.0 mg/g and 7.96 mg/g, respectively, a significant difference, whereas Glu showed no significant difference between AT and TT. Asp and Ser were detected only in TT. These findings were similar to those of Zhang [38] but opposite to the results on the levels of WE and TFAAs in AT and TT reported by Liang et al. [6]. The disagreement might be due to the influence of processing technology.
Overall, growth environments and management methods had definite impacts on phytochemicals, resulting in significant differences in WE, TFAAs, four catechin components, two purine alkaloids, and eight free amino acids between AT and TT. Previous studies have also indicated that chemical fertilizers and the environment can affect the contents of polyphenols, WE, caffeine, and free amino acids [7,39,40,41]. This indicates that the above differential components of AT and TT were representative and could be considered the major components for classifying AT and TT in this study.

Establishing a Discriminant Model through SFDA
SFDA is a popular pattern-recognition method that reduces dimensionality on the basis of variance analysis and is one of the most effective methods for feature extraction [42]. To classify AT and TT on a scientific basis, this study extracted key identification indicators by establishing a discriminant model based on 16 phytochemicals. Twenty ATs and forty TTs selected from all samples formed the model set, from which the feature components were extracted to establish a discriminant model. The results indicated that eight phytochemicals significantly (p < 0.05) affected the discriminant effect, and the order of influence, according to F value, was as follows: EC > C > theanine > WE > EGC > theacrine > Ala > Arg (Table 4). However, because Ala and Arg were not detected in some samples, which could affect the accuracy of identification between AT and TT, six phytochemicals were finally used to establish the discriminant model in this study. The model is shown in Equation (5): an unknown sample is determined as AT if Y > 0; otherwise, it is judged to be TT.
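To make the decision rule "AT if Y > 0, otherwise TT" concrete, the following is a minimal sketch of a two-class Fisher linear discriminant on two variables. It is not the authors' model: the stepwise variable selection is omitted, the EC and C values are invented for illustration, equal priors are assumed (the cut-off is placed at the midpoint of the class means), and the resulting coefficients have no relation to Equation (5).

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 within-class scatter matrix (sum of outer products of deviations)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_fisher(at_rows, tt_rows):
    """Return (w, c) so that Y = w . x - c is positive on the AT side."""
    mu_at, mu_tt = mean_vector(at_rows), mean_vector(tt_rows)
    s_at, s_tt = scatter_matrix(at_rows, mu_at), scatter_matrix(tt_rows, mu_tt)
    sw = [[s_at[i][j] + s_tt[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("pooled scatter matrix is singular")
    diff = [mu_at[0] - mu_tt[0], mu_at[1] - mu_tt[1]]
    # w = Sw^-1 (mu_AT - mu_TT), using the closed-form 2x2 inverse.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    # Equal-prior cut-off: projection of the midpoint between class means.
    mid = [(mu_at[j] + mu_tt[j]) / 2 for j in range(2)]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c


def discriminant_score(x, w, c):
    return w[0] * x[0] + w[1] * x[1] - c


def classify(x, w, c):
    return "AT" if discriminant_score(x, w, c) > 0 else "TT"


# Hypothetical (EC, C) contents in mg/g; not data from the study.
at_samples = [(30.0, 2.0), (34.0, 3.0), (32.0, 2.5), (28.0, 3.5)]
tt_samples = [(18.0, 6.0), (20.0, 7.0), (16.0, 5.0), (22.0, 6.5)]
w, c = fit_fisher(at_samples, tt_samples)
```

In this toy data the weight on EC comes out positive and the weight on C negative, mirroring the direction of the differences described in the text (higher EC and lower C in AT). A real analysis would use all six variables and a statistics package rather than a hand-written 2x2 inverse.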

Optimizing Classification Model through DTA
To reflect the classification of samples more intuitively, DTA was used to classify the 80 samples based on the six variables of the discriminant model. As shown in Figure 4, the tree had a three-level structure with a total of eight decision nodes and four classification rules, created using only three components. The decision rules were based on two concentration ranges of EC, four concentration ranges of C, and two concentration ranges of WE, which correctly identified 30 ATs and 50 TTs. According to the DTA results, the discriminant model of AT and TT was optimized. As presented in Equation (6), an unknown sample is determined as AT if Y > −1; otherwise, it is judged to be TT.
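A decision tree is built from rules of the form "component ≤ threshold". As a rough illustration of how one such concentration range can be found, the sketch below searches for the single threshold that minimizes the weighted Gini impurity. This is a generic split search and may differ from the tree-growing method used in SPSS for Figure 4; the EC values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 'AT'/'TT' labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_at = labels.count("AT") / n
    return 1.0 - p_at ** 2 - (1.0 - p_at) ** 2


def best_split(values, labels):
    """Find the threshold on one variable with the lowest weighted Gini.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_impurity), or None if no split is possible.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical EC contents (mg/g) and class labels; not data from the study.
ec_values = [30.0, 34.0, 32.0, 28.0, 18.0, 20.0, 16.0, 22.0]
labels = ["AT", "AT", "AT", "AT", "TT", "TT", "TT", "TT"]
split = best_split(ec_values, labels)
```

For this toy data the search returns a threshold of 25.0 mg/g with zero impurity, i.e. the rule "EC > 25.0 implies AT" separates the two groups perfectly. A full tree repeats the same search within each resulting branch, which is how multiple concentration ranges per component arise.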

Evaluating Classification Models
Based on the above two models, this study evaluated them externally and internally using the training set (10 ATs and 10 TTs, n = 20) and the model set (20 ATs and 40 TTs, n = 60) to explore the generalization capability of the two discriminant models and choose the optimal one. As shown in Table 5, there were no misjudgments in the training set or the model set by model calculation and the leave-one-out method (LOO), regardless of Equation (1) or Equation (2); that is, the accuracy, precision, recall, and F-score all reached 100%. This suggested that the two discriminant models had good generalization capability [43]. However, these performance indexes could not help us choose the most suitable discriminant model of AT and TT. The distance measurement can be used to evaluate the separation effect of a model by the minimum distance between two categories: the larger the distance, the easier the classification and the lower the error rate [44]. According to Figure 5a,b, the separation effect of AT and TT based on Equation (1) was better than that based on Equation (2) (1.60 > 1.11). This indicates that Equation (1), established by six components, was optimal and could be used as the discriminant model of AT and TT in this study.
Regarding the six discriminant variables, previous studies have shown that WE is made up of soluble substances such as phenolics and alkaloids, reflecting the thickness of tea infusion [45]. Theanine and nonester catechins (EC, C, and EGC) are related to umami/sweetness and bitterness/astringency, respectively, and are significantly affected by factors such as geographical environment, light, cultivar, and fertilizer [46,47]. For instance, climate affects chlorophyll contents, thereby regulating the contents of nonester catechins; the contents of catechins were higher in northern areas while the contents of free amino acids were higher in the southeast. Theacrine significantly affects the bitter taste of tea infusion and has many excellent pharmacological effects, such as sedative and hypnotic effects [48][49][50]. Our laboratory has already published relevant studies on the recognition threshold and leaching rule of theacrine [51]. The results showed that AT tasted more bitter than TT due to its high level of theacrine at the same recognition threshold. Meanwhile, the slow leaching rate of theacrine contributed to the endurance of AT during brewing. Overall, the six components reflect the differences between AT and TT well, and an in-depth study on their flavor contribution will be conducted in the future, thereby providing a reference for regulating the sale of AT.
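The distance comparison used above to choose between the two models can be illustrated with a short sketch. The text does not spell out how the minimum distance between the two categories was computed, so this example assumes it is the smallest absolute gap between any AT model score and any TT model score; the scores and names below are hypothetical.

```python
def min_class_distance(scores_at, scores_tt):
    """Smallest absolute gap between any AT score and any TT score.

    This definition is an assumption of the sketch; the source only states
    that a larger minimum distance between categories means easier
    classification and a lower error rate.
    """
    if not scores_at or not scores_tt:
        raise ValueError("both score lists must be non-empty")
    return min(abs(a - t) for a in scores_at for t in scores_tt)


# Hypothetical discriminant scores from two candidate models.
model_a = {"AT": [2.0, 3.0, 2.5], "TT": [-1.5, -1.0, -2.25]}
model_b = {"AT": [1.25, 2.0], "TT": [0.5, -0.25]}

distance_a = min_class_distance(model_a["AT"], model_a["TT"])
distance_b = min_class_distance(model_b["AT"], model_b["TT"])
preferred = "model_a" if distance_a > distance_b else "model_b"
```

Both hypothetical models separate the classes without error, yet the first leaves a wider margin between the closest AT and TT scores, which is the kind of tie-breaking the distance measurement provides when accuracy alone cannot distinguish two models.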

Conclusions
In conclusion, based on the 26 phytochemicals determined by spectrophotometry and HPLC, PCA and HCA could divide the 80 samples into AT and TT, and ANOVA showed that growth environment and management method, as the principal factors, caused significant (p < 0.05) differences in 16 phytochemicals, including WE, TFAAs, four catechin components, two purine alkaloids, and eight free amino acids. Based on the ANOVA results, the discriminant model of AT and TT was eventually established from six components (WE, EC, C, EGC, theacrine, and theanine) by comparing the separation effect of SFDA and DTA. The accuracy, precision, recall, and F-score of the model were all up to 100%, which illustrates the good generalization capability of the discriminant model. This study offers data support for distinguishing Pu-erh tea from different raw materials from the perspective of phytochemical components, together with an analytical approach to classification that performed well. In future production practice, the classification method could be applied to classify and distinguish unknown samples. In addition, chemometric analyses will also be a powerful tool for detecting food fraud related to, for example, tea origin, storage time, and organic tea.
Supplementary Materials: The following supporting information is available online: https://www.mdpi.com/article/10.3390/foods11050680/s1. Table S1: The main phytochemicals of AT and TT (Unit: mg/g); Table S2: The model scores of AT and TT based on Equations (1)

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.