An Improved Weighted Partial Least Squares Method Coupled with Near Infrared Spectroscopy for Rapid Determination of Multiple Components and Anti-Oxidant Activity of Pu-Erh Tea

Background: Pu-erh tea is a unique microbially fermented tea whose distinctive chemical constituents and activities are worthy of systematic study. Near infrared (NIR) spectroscopy coupled with suitable chemometric approaches can rapidly and accurately quantify multiple compounds in samples. Methods: In this study, an improved weighted partial least squares (PLS) algorithm combined with NIR spectroscopy was used to construct a fast calibration model for determining four main components, i.e., tea polyphenols, tea polysaccharide, total flavonoids, and theanine, and further to determine the total antioxidant capacity of pu-erh tea. Results: The final coefficients of determination (R²) for tea polyphenols, tea polysaccharide, total flavonoids, theanine, and total antioxidant capacity were 0.8288, 0.8403, 0.8415, 0.8537 and 0.8682, respectively. Conclusions: The current study provided a comprehensive analysis of four main ingredients and the antioxidant activity of pu-erh tea, and demonstrated that NIR spectroscopy coupled with multivariate calibration could be successfully applied to pu-erh tea quality assessment.


Introduction
China has a long history of using various kinds of tea, such as green, oolong, and black tea, which are produced by different processes [1]. Pu-erh tea is a unique microbially fermented tea: during pile-fermentation, ripened pu-erh tea undergoes the enzymatic action of microorganisms in a humid environment [2]. Ripened pu-erh tea is mainly produced from Camellia sinensis var. assamica growing in Yunnan Province, located in the southwestern region of China [3]. As a health-promoting beverage used for centuries, ripened pu-erh tea is of great value in the prevention of tumors, hyperlipidemia, hyperglycaemia, cardiovascular diseases, constipation, and other disorders [4][5][6][7][8].
Bioactive compounds in ripened pu-erh tea, such as tea polyphenols, total flavonoids, and tea polysaccharide, have been proven to have various pharmacological effects [8,9]. Some studies have demonstrated that tea polyphenols (TP) in ripened pu-erh tea have strong potential chemopreventive effects, especially in the prevention of cancer and cardiovascular diseases [10][11][12][13]. Total flavonoids (TF) in ripened pu-erh tea play a significant role in lowering hyperlipidemia [14]. Tea polysaccharides [...]
[...] spectrophotometer. Based on these experimental results, a quantitative model was constructed to rapidly and accurately determine these main components and their antioxidant activities.
Forty pu-erh tea samples were first divided into a calibration dataset (32 samples) and a prediction dataset (eight samples) by using the Kennard-Stone (K-S) algorithm with a ratio of 0.8. Detailed information about the samples in both groups is listed in Table 1. In this section, we aimed to establish a reliable and accurate calibration model for these main components and for the quantitative estimation of their antioxidant activities by using the PLS method and the proposed weighted PLS, respectively. While establishing the calibration models, several spectral pre-processing methods were also optimized. All results are listed in Tables 2 and 3, respectively.
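The Kennard-Stone split mentioned above can be sketched as follows. This is a minimal illustration of the textbook algorithm using Euclidean distances, not the authors' own code; the random matrix stands in for the real spectra, and the function name `kennard_stone` is an assumption made for this example.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample whose nearest selected neighbour is farthest away.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Placeholder data: 40 random "spectra" with 1557 points each (not real measurements).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, 1557))
cal_idx, pred_idx = kennard_stone(spectra, n_select=32)  # 0.8 * 40 = 32
```

The samples left over after 32 have been selected form the prediction set, which matches the 32/8 division described in the text.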
As can be seen from these results, each component has its own most suitable pre-processing method, i.e., the one giving the highest R² and the lowest RMSEP. In the raw PLS model, the most suitable pre-processing methods for tea polyphenol, tea polysaccharide, and total flavonoid content were "smoothing combined with SNV", SNV, and MCS, respectively. The calibration R² (R²cal) values for tea polyphenol, tea polysaccharide, and total flavonoid content were 0.9362, 0.9051 and 0.8955, respectively, while the corresponding RMSEC values of the calibration models were 0.2203, 0.0154, and 0.0987, respectively. The best pre-processing method for total antioxidant capacity was SNV, for which the R²cal and RMSEC values were 0.9179 and 0.0346, respectively (Table 2). The results in Table 3 were obtained with the weighted PLS model. The most suitable pre-processing method for both tea polyphenol and total flavonoid content was "smoothing combined with SNV". The suitable pre-processing methods for tea polysaccharide and total antioxidant capacity were "smoothing combined with SNV" and MCS, respectively. Furthermore, compared with the raw PLS model, the weighted PLS gave better prediction results for all the components (Table 3). Based on these optimized conditions, the calibration models were constructed, and the prediction dataset was then used to validate them. The validation results are also listed in Tables 2 and 3. The RMSEP values for the tea polyphenols, tea polysaccharide, total flavonoids, and total antioxidant capacity models were 0.3251, 0.0192, 0.1225 and 0.0587, respectively, while the corresponding R² values were 0.8288, 0.8403, 0.8415 and 0.8682, respectively.
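SNV, one of the pre-processing methods compared above, can be illustrated with a short sketch. It uses the usual definition of standard normal variate (each spectrum is centred on its own mean and divided by its own standard deviation); the function name and the toy data are assumptions for this example, and the smoothing step is not shown.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: scale every spectrum (row) to zero mean, unit std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Toy example: the second "spectrum" is the first one multiplied by 10.
# SNV removes such per-spectrum offset and scale effects.
demo = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, 40.0]])
demo_snv = snv(demo)
```

After the transform both toy rows coincide, which is why SNV is commonly used to reduce scatter-related differences between diffuse reflectance spectra.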
The scatter plots of the calibration models for the three main components and total antioxidant capacity, showing the correlations between NIR-predicted values and reference measurements, are plotted in Figure 2. These results suggest that the established models were accurate and can achieve the aim of fast quantitative analysis of the main components and activities.
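The figures of merit reported in this section (RMSEC, RMSEP and R²) can be computed from reference and predicted values as sketched below. The definitions are the common ones and are assumed here, since the text does not state them explicitly; in particular, R² is taken as 1 − SS_res/SS_tot, whereas some software reports the squared correlation coefficient instead. The numbers are made-up toy values.

```python
import numpy as np

def rmse(y_ref, y_pred):
    """Root mean squared error; called RMSEC on calibration data, RMSEP on prediction data."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def r_squared(y_ref, y_pred):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical reference values and NIR predictions (illustration only).
y_ref = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_ref, y_pred), r_squared(y_ref, y_pred))
```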

Discussion
In the current study, a weighted PLS method was proposed based on the similarities between variables and responses, and a distance measure was adopted to estimate these similarities. The functional groups of substances generally have relatively wide absorption bands, so the effects of adjacent wavelengths should be taken into account. In the proposed method, a variable that is similar to the response is given a larger weight; this principle can be regarded as a form of variable selection, which is likely why the proposed weighted PLS performed better than the raw PLS method. The authors think this idea can be further developed into a variable selection approach.
Furthermore, this weighted PLS method was combined with NIR spectral data to determine the tea polyphenols, tea polysaccharide, total flavonoids, and theanine content, as well as the total antioxidant capacity, of pu-erh tea samples. The established calibration models showed good performance in rapidly and accurately predicting the main components and activity. The proposed method can be used to comprehensively evaluate the quality of pu-erh tea by considering multiple active constituents and pharmacological activities simultaneously.

Sample Preparations and Reagents
Forty ripened pu-erh tea samples were collected from Pu'er city (Yunnan Province, China). After drying, these samples were powdered in a grinder and passed through a 60-mesh sieve. Powdered samples were labeled and stored in plastic bags in a dry and shady place (25 °C) until chemical analysis. Sodium carbonate (Na2CO3), sodium nitrite (NaNO2), aluminium chloride (AlCl3), sodium hydroxide (NaOH), phenol and concentrated sulfuric acid were purchased from Sinopharm (Guoyao Co. Ltd., Shanghai, China). 1,1-Diphenyl-2-picrylhydrazyl (DPPH), glucose, and Trolox were purchased from Sigma Company (St. Louis, MO, USA). Rutin and gallic acid, used as standard materials, were purchased from the China Food and Drug Inspection Institute (Beijing, China) and Hunan Institute for Food and Drug Control (Changsha, China), respectively. L-Theanine (99% purity) was purchased from Beijing Century Aroke Biotechnology Co. Ltd. (Beijing, China). HPLC-grade methanol (Hanbang Chemicals Co. Ltd., Suzhou, China) and phosphoric acid (Kermel, Tianjin, China) were also used. Deionized water was purified with a Milli-Q system (Millipore, Bedford, MA, USA).

Analysis of Main Components and Their Anti-Oxidant Activity
The main compounds, i.e., tea polyphenols, tea polysaccharide, total flavonoids, and theanine, are considered the most important antioxidants in tea. The DPPH assay is one of the most widely used methods in natural antioxidant studies. In this assay, the absorbance of the DPPH solution is determined before and after reaction with plant antioxidants, and the decrease in absorbance is used to estimate the capability to reduce the DPPH radical [33]. Thus, the main active components in pu-erh tea and the total antioxidant activity were determined (Supporting Information, Table S1).
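The absorbance decrease used in the DPPH assay is often turned into a percentage of radical scavenged. The sketch below shows that common form; the exact expression used in this study is not given in the text, so both the formula and the absorbance values are assumptions for illustration.

```python
def dpph_scavenging_percent(a_before, a_after):
    """Percent decrease in DPPH absorbance after reaction with the sample.

    Assumed form: (A_before - A_after) / A_before * 100.
    """
    if a_before <= 0:
        raise ValueError("a_before must be positive")
    return (a_before - a_after) / a_before * 100.0

# Hypothetical absorbance readings (not measured data).
scavenged = dpph_scavenging_percent(a_before=0.8, a_after=0.2)
```

A larger drop in absorbance corresponds to a stronger reducing capability of the sample towards the DPPH radical.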

Extraction and Determination of Total Flavonoids
For the determination of total flavonoids, ultrasonic extraction was adopted using the method described in previous studies [34,35]. The solid-liquid ratio was set to 1:25 by adding 70% ethanol, and sonication was then carried out at 70 °C for 40 min. Subsequently, after vacuum filtration, the supernatant was diluted to volume in a 50 mL volumetric flask. A 1 mg/mL standard solution of rutin was prepared as the reference material. This rutin solution was then diluted to different concentrations (100, 200, 300, 400, 500 µg/mL) to construct the calibration curve and verify its linearity. The content of total flavonoids was determined using the aluminum chloride method [36]. One mL of pu-erh tea extract solution and 1 mL of blank solution (70% ethanol in water) were added into two 10 mL test tubes, respectively. An equal volume of 5% sodium nitrite solution was added to each tube; after 6 min, 0.5 mL of 1% aluminum chloride was added, and the samples were shaken and left to stand for 6 min. The absorbance at 510 nm was measured after adding 4 mL of sodium hydroxide solution (1 mol/L). Finally, results were expressed as grams of rutin equivalent per 100 g dry weight (g rutin/100 g tea extract). All samples were analyzed in triplicate.
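The calibration-curve step described above amounts to a straight-line fit of absorbance against standard concentration, followed by inversion of that line for the samples. A minimal sketch is given below; the rutin concentrations are those listed in the text, but the absorbance values are hypothetical and serve only to make the example run.

```python
import numpy as np

# Standard concentrations from the text (µg/mL rutin).
conc = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
# HYPOTHETICAL absorbances at 510 nm, invented for illustration only.
absorbance = np.array([0.11, 0.21, 0.31, 0.41, 0.51])

# Least-squares straight line: absorbance = slope * conc + intercept.
slope, intercept = np.polyfit(conc, absorbance, deg=1)
# Squared correlation coefficient as a simple linearity check.
r2 = np.corrcoef(conc, absorbance)[0, 1] ** 2

def concentration_from_absorbance(a):
    """Invert the calibration line to get concentration in µg/mL."""
    return (a - intercept) / slope

sample_conc = concentration_from_absorbance(0.26)
```

The same pattern applies to the glucose and gallic acid standard curves used for the other components; converting the result to g per 100 g would additionally require the dilution volume and sample mass.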

Extraction and Determination of Tea Polysaccharide
Tea polysaccharide was extracted ultrasonically, following previous research [17]. After setting the solid-liquid ratio to 1:15 by adding distilled water, sonication was carried out at 50 °C for 40 min. Next, after vacuum filtration, the supernatant was diluted to volume in a 25 mL volumetric flask. Ten mL of absolute ethanol were added to 2 mL of this mixture, which was then shaken. After standing overnight, the solution was centrifuged (4000 rpm) for 30 min. The supernatant was discarded, and the precipitate was dissolved in water and transferred into a 10 mL volumetric flask. A 1 mg/mL standard solution of glucose was prepared as the reference solution. It was diluted to different concentrations (10, 20, 30, 40, 60, 80, 100 µg/mL) to prepare a linear calibration curve. The content of tea polysaccharide was determined by the phenol-sulfuric acid colorimetric method [37]. One mL of pu-erh tea extract solution and 1 mL of blank solution were added into two 10 mL test tubes, respectively. Then, 1.0 mL of 5% phenol solution was added to the tubes, which were then shaken for 30 s. After adding 5.0 mL of concentrated sulfuric acid, the mixture was shaken again for 30 s to ensure full mixing. The absorbance was measured at 485 nm after heating the samples in a boiling bath for 20 min and then cooling them in a cold bath for 5 min. The content was expressed as grams of glucose equivalent per 100 g dry weight (g glucose/100 g tea extract). All samples were run in triplicate.

Extraction and Determination of Tea Polyphenols
Following the national standard method [12], the extraction conditions for tea polyphenols were optimized, and the optimal procedure was as follows [38]. An accurately weighed, powdered sample (1.00 g) was transferred to a 100 mL conical flask, 70% ethanol (60 mL) was added, and the sample was extracted at 70 °C for 35 min. Subsequently, the sample was cooled to room temperature and vacuum filtered. The filtrate was diluted to volume in a 100 mL volumetric flask using ethanol-water solution, then stored in a refrigerator at 4 °C until tea polyphenol determination. Tea polyphenols were determined with the Folin-Ciocalteu reagent, which is widely acknowledged as a national standard method for the determination of polyphenolic substances. Gallic acid was used as the reference standard, and the detailed procedure was as follows. The diluted (10 times) Folin-Ciocalteu reagent (5 mL) was added, and the samples were then shaken. The tubes were allowed to equilibrate at ambient temperature (25 ± 0.2 °C) for 4 min, and then 4 mL of 7.5% (w/v) sodium carbonate was added, followed by vortexing for 20 s. Finally, the absorbance was measured at the maximum absorption wavelength of 747 nm after storage in a cool and dark place for 60 min. The total polyphenol content (TPC) was expressed as grams of gallic acid equivalent (GAE) per 100 g material using a gallic acid calibration plot, where the standard curve of gallic acid ranged from 10 to 80 µg/mL (R² = 0.9998). All samples were tested in triplicate.

Preparation and Determination of Theanine Content
Powdered sample (1.0 g) was extracted with water (50 mL) in a 100 mL conical flask. The mixture was sonicated for 30 min at an extraction temperature of 60 °C, and the supernatant was filtered under vacuum after cooling to room temperature. Subsequently, the solution was filtered through a 0.45 µm membrane before HPLC analysis. In order to obtain the linear range of quantification, a stock standard solution of L-theanine (6 mg/mL) was prepared with water and stored at 4 °C, and the standard curve of L-theanine was then determined over the range from 0.05 to 2 mg/mL.
To determine the content of L-theanine, an RP-HPLC method was performed using an Agilent 1260 series instrument (Palo Alto, CA, USA). A reverse-phase Agilent TP-C18 column (4.6 mm × 250 mm, 5 µm) was used. The column temperature was maintained at 30 °C. The injection volume was 10 µL and the flow rate was 1.0 mL/min. The mobile phases used for gradient elution were (A) 0.1% aqueous phosphoric acid and (B) methanol. The gradient was as follows: 2% B at 0-8 min, 2-95% B at 8-10 min, 95% B at 10-20 min, and 95-2% B at 20-25 min. DAD detection was performed at 210.4 nm. The content of L-theanine in the tea samples was calculated from the chromatographic peak area relative to that of the L-theanine standard.
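The gradient programme above can be written as a small lookup that returns the percentage of mobile phase B at any time point. The sketch assumes linear ramps between the stated breakpoints, which is the usual behaviour of a gradient pump but is not stated in the text; the names used are illustrative.

```python
import numpy as np

# Breakpoints taken from the gradient description (time in min, % methanol).
GRADIENT_TIME_MIN = [0.0, 8.0, 10.0, 20.0, 25.0]
GRADIENT_PERCENT_B = [2.0, 2.0, 95.0, 95.0, 2.0]

def percent_b(t_min):
    """Percent of mobile phase B at time t_min, assuming linear ramps."""
    return float(np.interp(t_min, GRADIENT_TIME_MIN, GRADIENT_PERCENT_B))

def percent_a(t_min):
    """Percent of mobile phase A (0.1% aqueous phosphoric acid)."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(9.0)` falls halfway through the 2-95% ramp, and `percent_b(15.0)` lies on the 95% plateau.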

TAA-DPPH Assay
Total antioxidant activity in pu-erh tea was assessed using a 1,

NIR Spectroscopy Measurements
The NIR spectra were obtained in diffuse reflectance mode using an Antaris II FT-NIR spectrophotometer (Thermo Electron Co., Waltham, MA, USA) equipped with an integrating sphere. The spectra (32 scans, 8 cm⁻¹ resolution, 4000 to 10,000 cm⁻¹, 1557 points/spectrum) were collected in log(1/R) mode (R = the reflectance value). A dry pu-erh sample (0.5 g) was added into the sample cup; each sample was measured three times, and the mean of the three spectra was used for further statistical analysis. The temperature was kept at 25 °C.
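The handling of the raw spectra described above (conversion to log(1/R) and averaging of three replicate scans) can be sketched as follows. The wavenumber axis is assumed to be evenly spaced with both end points included, and a base-10 logarithm is assumed; the reflectance values are random placeholders, not measurements.

```python
import numpy as np

N_POINTS = 1557
# Assumed evenly spaced axis from 4000 to 10,000 cm^-1, end points included.
wavenumbers = np.linspace(4000.0, 10000.0, N_POINTS)
spacing = wavenumbers[1] - wavenumbers[0]  # about 3.86 cm^-1 per point

def log_inverse_reflectance(reflectance):
    """Convert reflectance R to log10(1/R)."""
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))

# Placeholder reflectance for three replicate scans of one sample.
rng = np.random.default_rng(0)
replicates_r = rng.uniform(0.2, 0.9, size=(3, N_POINTS))
replicates = log_inverse_reflectance(replicates_r)
mean_spectrum = replicates.mean(axis=0)  # one averaged spectrum per sample
```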

Weighted Partial Least Squares Model
The partial least squares regression (PLSR) method is widely used to establish calibration models. PLS decomposes the independent variables (X) and the dependent variables (Y) simultaneously, and aims to find the latent variables in X that best predict the latent variables in Y.
Let X be the m × p data matrix, whose rows and columns correspond to samples and variables (spectral intensities at the corresponding wavelengths), respectively, and let Y be the m × k response matrix (a vector y when k = 1). PLS aims to find a weight vector w ∈ R^p that maximizes the covariance between the independent variables X and the corresponding dependent variable Y. X and Y can be further decomposed by the PLS method as follows:
$$X = TP^{T} + E, \qquad Y = UQ^{T} + F \tag{1}$$
where the latent variables (score matrices) T and U are extracted from X and Y, P and Q are the corresponding loading matrices, and E and F are the residual matrices. The desired weight vector can be represented as:
$$\hat{w} = \arg\max_{\|w\| = 1} \operatorname{cov}(Xw, y) = \frac{X^{T}y}{\|X^{T}y\|} \tag{2}$$
From Equation (2), we can see that the weight vector ŵ reflects the relationships between the response y and each X variable X_1, ..., X_p. Therefore, the computation of the weights directly affects the final PLS model. Furthermore, if a variable X_i is very similar to the response Y, it should be assigned a large weight value w_i. There are many approaches to estimating the similarities between variables and response; among them, distance is a simple and effective measure.
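The covariance-maximizing weight vector discussed above can be checked numerically. For a single response and column-centred data, the unit vector that maximizes the covariance between Xw and y is proportional to Xᵀy (a standard result that follows from the Cauchy-Schwarz inequality). The sketch below computes it on random placeholder data; the function name is illustrative, and only the first PLS component is shown, without deflation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 50))  # placeholder: 32 samples, 50 variables
y = X[:, 10] + 0.5 * X[:, 20] + 0.1 * rng.normal(size=32)

# Centre the columns of X and the response, as is usual before PLS.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def first_pls_weight(Xc, yc):
    """Unit-norm weight vector maximizing cov(Xc @ w, yc)."""
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

w_hat = first_pls_weight(Xc, yc)
t1 = Xc @ w_hat  # first latent variable (score vector) extracted from X
```

Each element of `w_hat` is proportional to the covariance between one variable and the response, which is the sense in which the weight vector reflects the relationship between y and every column of X.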
Based on this idea, we propose a new weighted partial least squares method to establish the quantitative model, in which the weights of the variables are adjusted according to the distance between each variable and the response. Instead of using a single variable to calculate the distance, the effects of adjacent variables are also taken into account (as functional groups always absorb within relatively short wavelength bands). By using the adjusted distance, the effects of adjacent wavelengths can be balanced. The distances are defined as:
$$D_1 = \mathrm{dist}(y, X_i) = (y - x_i)^{T}$$
$$D_2 = \mathrm{dist}(y, X_{(i-l,\,i+l)}) = (y - x_1, \ldots, y - x_p)^{T}$$
$$D = D_1 + D_2 \tag{3}$$
The weight vector is then obtained from the following regularization framework:
$$\hat{w} = \arg\min_{w} \|y - Xw\|^{2} + \delta \|Dw\|^{2} \tag{4}$$
where δ is the regularization parameter and D is a diagonal regularization matrix. The first term in Equation (4) ensures that the solved weight vector preserves the relationship between Y and X, while the second term penalizes variables that lie at a large distance from the response; moreover, if the adjacent wavelength bands are also far from the response, the penalty is increased further, i.e., lower weights are given to these variables. In the new algorithm, the regularization parameter δ was pre-set to 1, and the window width l on each side was pre-set to 5 empirically. For the first and last l variables, the i + 10 and i − 10 wavelengths were used to calculate the distances, respectively.
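A minimal sketch of the regularized weight computation in Equation (4) is given below. Several details are assumptions made for illustration and are not taken from the text: Euclidean distance between y and each column, the neighbourhood term computed as the mean distance over the window, autoscaled data, and the closed-form solution w = (XᵀX + δD²)⁻¹Xᵀy of the stated objective. It covers only the weight step, not a complete PLS model.

```python
import numpy as np

def distance_penalties(X, y, l=5):
    """Diagonal entries of D: own distance plus mean distance over the window."""
    m, p = X.shape
    if p < 3 * l:
        raise ValueError("need at least 3*l variables for the edge windows")
    d1 = np.linalg.norm(X - y[:, None], axis=0)  # distance of each column to y
    d = np.empty(p)
    for i in range(p):
        lo, hi = i - l, i + l
        if lo < 0:            # first l variables: window i .. i + 2l
            lo, hi = i, i + 2 * l
        elif hi > p - 1:      # last l variables: window i - 2l .. i
            lo, hi = i - 2 * l, i
        d[i] = d1[i] + d1[lo:hi + 1].mean()  # D = D1 + D2 (D2 assumed to be a mean)
    return d

def regularized_weights(X, y, delta=1.0, l=5):
    """Minimize ||y - Xw||^2 + delta * ||Dw||^2 with diagonal D."""
    d = distance_penalties(X, y, l)
    A = X.T @ X + delta * np.diag(d ** 2)
    return np.linalg.solve(A, X.T @ y), d

# Placeholder data: variable 12 is made similar to the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
y = X[:, 12] + 0.05 * rng.normal(size=20)
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale columns
y = (y - y.mean()) / y.std(ddof=1)
w, d = regularized_weights(X, y, delta=1.0, l=5)
```

Because the penalty on each variable grows with its distance from the response, variables (and neighbourhoods) that resemble y are shrunk less than the others, which is the behaviour described for the second term of Equation (4).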
Supplementary Materials: The following are available online. Figure S1: Chromatograms of the theanine standard and one tea sample; Table S1: The main components of 40 pu-erh tea samples and their antioxidant activity.