NIR Spectrometric Approach for Geographical Origin Identification and Taste Related Compounds Content Prediction of Lushan Yunwu Tea

Lushan Yunwu Tea is one of a unique Chinese tea series, and total polyphenols (TP), free amino acids (FAA), and polyphenols-to-amino acids ratio models (TP/FAA) represent its most important taste-related indicators. In this work, a feasibility study was proposed to simultaneously predict the authenticity identification and taste-related indicators of Lushan Yunwu tea, using near-infrared spectroscopy combined with multivariate analysis. Different waveband selections and spectral pre-processing methods were compared during the discriminant analysis (DA) and partial least squares (PLS) model-building process. The DA model achieved optimal performance in distinguishing Lushan Yunwu tea from other non-Lushan Yunwu teas, with a correct classification rate of up to 100%. The synergy interval partial least squares (siPLS) and backward interval partial least squares (biPLS) algorithms showed considerable advantages in improving the prediction performance of TP, FAA, and TP/FAA. The siPLS algorithms achieved the best prediction results for TP (RP = 0.9407, RPD = 3.00), FAA (RP = 0.9110, RPD = 2.21) and TP/FAA (RP = 0.9377, RPD = 2.90). These results indicated that NIR spectroscopy was a useful and low-cost tool by which to offer definitive quantitative and qualitative analysis for Lushan Yunwu tea.


Introduction
Tea is one of the three greatest non-alcoholic beverages in the world, one that is preferred by many consumers and is particularly popular in China and Japan [1]. Tea contains many secondary metabolites, such as polyphenols, amino acids, polysaccharides, alkaloids, and terpenes, which are closely related to its quality and contribute to its rich taste, flavor, and health benefits [2]. It is generally known that the quality of tea is influenced by diverse factors, such as variety, original environment, plucking time, processing technology, and storage method [3,4]. Among them, the origin of tea is considered to be one of the key factors that directly determine its quality. There was an old Chinese saying, "Where there's cloud and mist, there's bound to be good tea". It was beneath the natural wonder of tremendous clouds over Lushan Mountain that Lushan Yunwu tea was created. The mild and rainy climate is conducive to the accumulation of organic matter, such as amino acids and caffeine, so the tea has a delicate aroma and a mellow, fresh, and sweet flavor [5]. Lushan Yunwu tea was first produced in the Han Dynasty (202 BCE-220 CE), with a history of over 1800 years; since the Song Dynasty (960-1279), Lushan Yunwu was listed as the emperor's and the imperial family's tribute tea. In recent years, Lushan Yunwu tea was increasingly sought by tea tasters all over the world because of its unique geographical location and taste. Currently, Lushan Yunwu tea is included in the register of Products of Geographical Indication in China (GB/T 21003-2007).
Tea polyphenols (TP), total free amino acids (FAA), and their ratio of TP/FAA may be reliable taste quality indicators of tea flavor [5,6]. Generally, a high-quality green tea requires: (1) a proper concentration of TP to give the essential astringency; (2) a high concentration of FAA, which mainly contributes to the umami taste; and (3) an optimal balance between TP and FAA for a particular taste or pleasant flavor [1,6]. Research showed that the total phenolic contents of Lushan Yunwu tea were higher than those of other teas [7]. However, in recent years, many products of poor quality but similar in appearance to Lushan Yunwu tea have appeared in the market, pretending to be Lushan Yunwu tea, which has had a substantial impact on the sales price and market reputation of Lushan Yunwu tea. Therefore, the quality evaluation of Lushan Yunwu tea is an important topic, especially in preventing frauds that may affect the health of customers, or that may cause economic losses to producers and customers.
Traditionally, the authenticity identification and quality evaluation of tea mainly rely on the sensory evaluation of professional tea tasters. Although this method is relatively classic, the results are easily affected by factors such as the reviewer's sensory organs and are highly subjective. The possible biases and changes in perception of the method make the results inaccurate and non-repeatable [8,9]. In addition, sensory evaluation cannot help us to know the contents of the main ingredients in tea. Ultraviolet-visible spectroscopy (UV-vis) [10,11], liquid chromatography mass spectrometry (LC-MS) [3,12], gas chromatograph-mass spectrometry (GC-MS) [13,14], and other technologies [8] can be used for the authenticity detection and quality determination of tea. These methods can accurately detect the chemical components that affect the quality of tea [9]. Nevertheless, most of these methods are prohibitively complicated, time-consuming, costly, and use a wide range of toxic solvents, so they are not suitable for the rapid testing of tea [9]. As an alternative, a variety of non-invasive detection technologies have been applied to rapidly evaluate the quality of tea, such as near-infrared spectroscopy technology (NIR) [15,16], hyperspectral imaging technology (HIS) [17], the E-nose [18], the E-tongue [19] and computer vision [20]. NIR technology is a mature testing technique with numerous advantages, such as being rapid, low-cost, nondestructive, and environmentally friendly [15,21,22]. Rapid identification methods for Anji white tea were developed using NIR spectroscopy and chemometric-class modeling techniques [21]. The same method has also been demonstrated to be an effective method for the identification of Darjeeling PGI tea [22]. Furthermore, NIR, when combined with multivariate calibration methods, has been shown to exhibit excellent analytical performance in predicting the main contents of tea leaves, including polyphenols, catechins, caffeine, and theanine [23,24]. In a recent study, Wang et al. used a micro NIR spectrometer to distinguish tea types (black, green, yellow, and oolong teas) and to predict catechins, caffeine, and theanine in four types of teas, and obtained satisfactory results [15]. However, there are few research reports on the simultaneous authenticity identification and taste quality assessments of tea, especially of Lushan Yunwu tea, a traditional and famous green tea. Therefore, the main purpose of this work was to tackle the adulteration identification and quality prediction of Lushan Yunwu tea samples by the application of diffuse reflectance NIR spectra. The specific objectives of this study were: (1) to identify the authenticity of the origin of Lushan Yunwu tea; (2) to rapidly predict the taste quality indicators of Yunwu green tea (including TP, FAA, and TP/FAA) simultaneously by NIR; (3) to compare the performance of different spectral preprocessing methods and characteristic wavelength (or wavenumber) selection methods, in improving the accuracy of the prediction models.

Sample Preparation
Fifty-six Lushan Yunwu tea samples (LY) were obtained from various origins in Jiujiang, Jiangxi Province, China. Another 30 non-Lushan Yunwu tea samples (NLY) were collected in Fujian, Guangxi, and Sichuan provinces; these were processed using the same stir fixation as Lushan Yunwu tea. The details of the process and the representative images of tea samples are shown in Table S1 and Figure S1a of the Supplementary Materials. After being ground down for 30 s using a laboratory pulverizer (JYL-CO12, Joyoung Company Limited, Jinan, China) and passed through a 40-mesh sieve, all samples were individually packaged in light-proof sealed bags and stored at 4 • C before analysis.

Spectra Acquisition
Referring to the previous study [25] with slight modifications, the NIR spectra of tea samples were collected using a Nicolet 5700 FTIR spectrometer (Thermo Electron Corp., Madison, WI, USA). Briefly, about 1.0 g of tea powder was transferred into a standard quartz bottle (inner diameter, ∼10 mm) for testing. The NIR spectra were measured in the wavenumber range of 12,000-4000 cm −1 by using an average spectrum of 64 scans with a resolution of 8 cm −1 .

Chemical Analysis
The TP content was determined using the Chinese standards method (GB/T8313-2018) with slight modifications. A 0.2 g sample of powdered tea was extracted with 5 mL of 70% (v/v) methanol solution at 70 • C for 10 min (stirred at 5 min intervals). The extract was left to cool to room temperature, then it was centrifuged at 3500 rpm for 10 min. The extraction process was repeated, and the supernatants were combined. The volume of the extract was adjusted to 10 mL with 70% (v/v) methanol and was filtered through a 0.45-µm microporous filter. The TP concentration was determined using the Folin-Ciocalteu reagent reaction, with a detection wavelength of 765 nm. The calibration standard was gallic acid and the results were expressed as gallic acid equivalents (GAE), in mg GAE/g dry matter.
The FAA contents were reference-measured using ninhydrin detection, as described in the Chinese standards method (GB/T8314-2013) with slight modifications. A mixture of tea powder (0.6 g) and 85 mL boiling water was placed in a boiling water bath for 45 min (stirred every 10 min). After that, the extract was immediately filtered under reduced pressure and washed with a little boiled water. Then, the filtered supernatant was adjusted to 100mL. First, 1 mL tea extract was mixed with 0.5 mL 1/15 mol/L phosphate buffer solution (pH 8.0) and 0.5 ml 2% ninhydrin (containing 0.8 mg/mL SnCl 2 ·2H 2 O). The solution mixture was heated at 90 • C for 15 min, then the volume of the flask was adjusted to 25 mL after cooling down. The absorbance of the reaction solution was detected at 570 nm. The calibration standard was theanine and the results were expressed as theanine equivalents (TE), in mg TE/g of dry matter. TP/FAA was estimated as the TP content, divided by the FAA content.

Spectral Preprocessing
The tea samples were affected by external conditions during the NIR spectrum collection process, including sample state, particle size, baseline change, sample compactness difference, measurement environment difference, etc., which could cause various noises and information errors [26]. Therefore, before the multivariate statistical analysis took place, the spectral data needed to be preprocessed to reduce system noise and enhance the spectral features [27]. Multiplicative signal correction (MSC) is a type of widely applied preprocessing method for light-scattering correction, which is used to regulate the addition and multiplication effects in the spectrum on the foundation of different particle sizes [28]. Similar to MSC, the standard normal variate (SNV) is another mathematical transformation method for light-scattering correction; it is mainly used to eliminate the noise caused by optical path differences and uneven solid particles [29]. The first derivative (1st) and second derivative (2nd) can effectively eliminate the baseline shift and rotation error of the spectra [30]. However, the derivative transform promoted an emphasis on the level of noise while emphasizing the features of the spectral data, so it is often used in conjunction with a smoothing filter method, such as a Savitzky-Golay (SG) smoothing filter [25].

Multivariate Analysis
Tea samples were divided into two groups: three-quarters of the samples were used as the calibration set to develop DA and PLS models, and the remaining quarter of the samples was used as the prediction set to evaluate the performance of the models [31]. To avoid bias in sample selection, the sample data set was split following two principles: (1) regardless of whether the calibration set or prediction set was used, samples from every origin were included; (2) the samples containing extreme values for TP, FAA, and TP/FAA had to be included in the calibration set.

Discrimination Analysis (DA)
The discrimination between Lushan Yunwu tea samples (LY) and non-Lushan Yunwu tea samples (NLY) was carried out using a discrimination analysis (DA) model. DA is a supervised classification technique with predefined groups. An unknown sample was assigned by calculating its Mahalanobis distance from the center of gravity of each group [25,32]. The greater the Mahalanobis distance between two given groups, the greater the spectral difference between them.

Partial Least Squares (PLS)
The contents of TP, FAA, and TP/FAA in tea samples were predicted using the partial least squares (PLS) model. PLS projects the predictor variables and the observed variables into a new space and then captures several latent variables (LVs) that can represent the majority of raw information, using them to establish a linear regression model. The optimal number of factors was selected according to the residual sum of squares (PRESS) during the Leave One Out Cross-Validation (LOOCV) of the calibration models [32]. The factors where the first minimum value of PRESS appears is the optimal number of factors. The computed correlation coefficients in the calibration sets (R C ) and prediction sets (R P ), and their corresponding root mean square error of calibration (RMSEC) and prediction (RMSEP), along with the residual predictive deviation (RPD), were used to estimate and verify the accuracy of the developed models [33,34]. The RPD was calculated using the ratio between the standard deviation of the reference values and the RMSEP in the prediction set. In addition, the root mean square error of cross-validation (RMSECV) was used as a diagnostic value for model robustness. Good models have higher R (close to 1) and RPD (>2) but lower RMSE, and the RMSEP is lower than the RMSECV [35][36][37].

Synergy Interval Partial Least Squares (siPLS)
The siPLS model is an effective characteristic variable selection algorithm proposed on the basis of interval partial least squares (iPLS) [38]. The entire spectra are equally divided into several equal intervals, then the different intervals are combined with each other. Then, the interval combination with the highest correlation coefficient and the lowest root mean square error of cross-validation (RMSECV) is selected. The combined intervals represent those spectral variables of the full spectra that are particularly relevant to the target parameter in the analysis.

Backward Interval Partial Least Squares (biPLS)
The biPLS model is another characteristic variable selection method developed on the basis of iPLS [39]. The data set is divided into a given number of interval intervals of equal length and calculates RMSECV with each interval left out, leaving out one interval at a time [40]. After this process, the most relevant intervals are left behind.

Software
The spectral preprocessing and DA and PLS models were developed using the TQ Analyst (Thermo Nicolet Corporation, Madison, WI, USA). The siPLS and biPLS algorithms were carried out using the MATLAB R2020a software (MathWorks Inc., Natick, MA, USA).

Spectrum Description
The raw NIR spectra of tea samples in the 12,000-4000 cm −1 frequency regions are shown in Figure 1a. The hydrogen-containing groups of organic matter in tea (such as C-H, O-H, S-H, and N-H, etc.) can produce multiple-frequency and combined-frequency absorptions in the near-infrared region, which are mainly related to the water and polysaccharides, polyphenols, amino acids, caffeine, and proteins in the tea [41]. The two weak peaks at 4258 cm −1 and 4327 cm −1 may be related to the second overtone of C-H bending and -CH 2 bending, respectively [42]. The first overtone of C-H stretching was likely to appear near to 5780 cm −1 . The absorption peak around 4650 cm −1 was related to the combination of N-H bending and C=O stretching [42]. Another absorption peak appeared at around 6700 cm −1 , corresponding to the presence of the N-H group, which was attributed to free amino acids [41]. A wide and weak peak that appeared at 8600-8000 cm −1 may be associated with the C-H stretching of CH 2 and CH 3 groups, which was related to the presence of tea polyphenols [41]. In addition, since dry tea leaves generally contained 4-7% (w/w) of moisture, the absorption bands near 5160 cm −1 and 7000 cm −1 were also apparently displayed, which arose from the combination of O-H stretching and H-O-H deformation [25]. For some other wavenumber ranges (e.g., 12,000-10,000 cm −1 ), the spectrogram resembled noise and contained little spectral information on the active components. Figure 1a showed similar trends in all tea samples over the entire wavenumber range. Therefore, data preprocessing methods were generally used to eliminate noise and make the spectral features apparent after pretreatments, for example, in the case of the MSC (Figure 1b).

Authentication of Lushan Yunwu Tea
The representative images of tea samples are presented in Figure S1a of the Supplementary Materials, and it can be seen that it is impossible to clearly distinguish between the LY and NLY samples. The differences in geographic location cause differences in the chemical composition of the tea leaves. Figure S1b showed the descriptive statistics for the taste quality indicator of tea samples. It showed that LY samples contained more polyphenols and lower numbers of free amino acids than the NLY samples. However, no statistically significant differences were found in TP and FAA contents between LY and NLY samples (p > 0.05). Although TP/FAA showed significant differences between LY and NLY (p < 0.05), it was still difficult to distinguish LY and NLY samples solely by chemical composition analysis because of some overlapping ranges.
In recent years, NIR spectroscopy has been widely used in the field of food authenticity identification because of its simplicity and convenience, so we used NIR technology to differentiate the samples of LY and NLY. To visualize the NIR data, a PCA model was used to analyze the natural distribution of the samples (Figure 2a). It was found that most of the LY and NLY samples could be distinguished in this unsupervised model. To further distinguish the two groups and to achieve authenticity prediction, the DA model was used. Models with high classification accuracy were the most suitable for discrimination. It could be seen that models discriminating green tea origins with a degree of accuracy ranged from 62.79% to 100% when the DA models were developed using full-spectra data ( Table 1). The first 9 PCs were utilized in the DA model, which covered the most variations (>62.49% of the total variance) contained in the spectral data. Interestingly, the application of spectral preprocessing did not improve the prediction accuracy and stability of the DA model. Generally, this result could be due to too many latent variables being included and may lead to overfitting of the calibration [25]. Thus, the wavenumber ranges needed to be selected in this study. A spectral wavenumber range from 8000 to 4000 cm −1 was manually selected for the models (range 1), which has been reported to include most of the spectral information that reflects the metabolite components in tea [25]. Another manual selection of regions was conducted, based on the derivative transformed spectra, including 9700-8600 cm −1 , 7400-6800 cm −1 , and 5600-4000 cm −1 (range 2). The performance of the DA models, established with different selected wavenumber ranges and preprocessing methods, is shown in Table 1.
Compared to the DA models built using full spectral wavenumbers, DA models with manual wavenumber selections were more efficient in terms of accuracy and robustness, especially when based on the wavenumber range 2. The DA models with MSC and SNV pretreatment performed best using the wavenumber ranges of 9700-8600, 7400-6800, and 5600-4000 cm −1 . The Mahalanobis distance plot of every sample to the center of gravity of two classes, established by a DA model with MSC pretreatment, is shown in Figure 2b. The LY samples were completely separated from the NLY tea group, which indicated that NIR, combined with the DA model, may be a potential method for the authenticity discrimination of Lushan Yunwu tea with sufficient sensitivity. As far as we know, this was the first time that near-infrared technology was used to identify and predict the authenticity of Lushan Yunwu tea and obtained satisfactory results. It was worth noting that the derivative transform pretreatment did not play a helpful role in optimizing this model, which may be because the derivative transform promoted an emphasis on the level of noise while emphasizing the features of the spectral data.

PLS Models for TP, FAA, TP/FAA Prediction
TP contributes to the essential astringency in green tea, while FAA mainly contributes to the mellowness and umami of green tea; these two components are often used as important indicators to evaluate the quality of tea [23,43]. TP/AA reflects the relationship between the umami taste and astringency and is used to reflect the quality of tea in previous studies [33]. In this study, the three indicators of TP, FAA, and TP/FAA were predicted using NIR technology. PLS models were developed with raw and preprocessed spectral data to predict TP, FAA, and TP/FAA in different tea samples (Table 2). For TP prediction, the raw-PLS model with the full-spectra data achieved an acceptable performance, with R C = 0.9303 and RMSEC = 8.05 in the calibration set, and R P = 0.8546, RMSEP = 14.2 in the prediction set. Compared with the performance of the original data, the performance of the models after MSC and SNV spectral pretreatment had been improved, with higher R P and lower RMSEP. However, other methods of spectral preprocessing did not play an effective role. In the case of FAA, the performance of all PLS models was relatively poor (R C , R P < 0.9), indicating that all PLS models based on full-spectra data were not ideal for predicting free amino acid content. For TP/FAA, the MSC-PLS model performed best, with the highest R P = 0.8430, RPD = 1.88 and the lowest RMSEP = 0.593. It showed that the prediction of TP/FAA by the PLS model, based on the full-spectra data, was also unsatisfactory. As in the DA models, the derivative transformation preprocessing did not play an effective role in the optimization of PLS models for prediction in tea samples. Compared with polyphenols, the values of amino acid contents were much lower; this may explain why the prediction accuracy of FAA and TP/FAA was low [15]. In addition, the full spectrum contained redundant information that had nothing to do with amino acids, which could reduce the prediction accuracy [15]. Therefore, a variable filter was needed to optimize the PLS models.

Prediction Models Based on the Manual Selected Wavenumber Range
In order to remove redundant information that had nothing to do with the tea flavor components, the PLS models were optimized by manually selected wavenumbers. As noted in Section 3.2, two manually selected wavenumber ranges were applied ( Table 2). The models established with manually selected wavenumber ranges achieved better results than those that were performed based on the full-spectra region. However, regardless of the wavenumber range and the spectral preprocessing method that was used, the PLS models concerning FAA did not achieve a good performance, with R P values lower than 0.9 and RPD values lower than 2.0. These results indicated that the wavenumber selection range needs to be optimized further.

Prediction Models Based on the siPLS and biPLS Algorithms
The siPLS and biPLS algorithms were used to select wavenumber variables that were closely related to the tea taste quality indicators. The optimal model results for the TP, FAA, and TP/FAA of tea can be seen in Table 3. Firstly, the full-spectra data were equally split into 10 or 20 sub-intervals, as in the siPLS modeling. The siPLS algorithm selected optimal sub-interval combinations for TP, FAA, and TP/FAA with the lowest RMSECV values (Figure 3a-c). The wavenumber ranges of 7204.87-6807.60 cm −1 and 4798.10-3999.70 cm −1 that were selected for TP corresponded to the C-H stretching and C-H deformation combinations [44]. The optimal sub-intervals were 6402.61-5604. 21 (Table 3 and Figure S2 in the Supplementary Materials). After using the siPLS algorithm to filter wavenumbers, the R P for TP, FAA, and TP/FAA prediction increased from 0.8546, 0.8490, and 0.8089 to 0.9407, 0.9110, and 0.9377, and the RPD values improved from 1.91, 1,62, and 1.73 to 3.00, 2.21, and 2.90, respectively. These proved the effectiveness of the siPLS model in selecting a few spectral variables from the optimized combination of sub-intervals, to determine the target of interest [45]. Table 2. The performance of partial least squares (PLS) models with different spectral preprocessing approaches for the prediction of total polyphenols content (TP), free amino acids content (FAA), and polyphenols-to-amino acids ratio (TP/FAA), based on different wavenumber ranges. Abbreviations: R C , correlation coefficients of calibration; (R P ) correlation coefficients of prediction; RMSEC, root mean square error of calibration; RMSECV, root mean square error of cross validation; RMSEP, root mean square error of prediction; RPD, residual predictive deviation. Table 3. The performance of PLS models, with different characteristic wavenumber selection procedures for the prediction of polyphenols, free amino acids content, and the polyphenols-to-amino acids ratio.  Compared with siPLS, biPLS is also a method for the joint modeling of several subintervals for screening correlations, but the biPLS algorithm has the advantage of filtering only the characteristic wavenumber sub-intervals in the backward direction; it rejects the sub-intervals with the worst correlation each time [40]. Removed permanently, the last sub-intervals that are left are the spectral variables that have the highest correlation with the tea flavor components. The performance of biPLS models for TP, FAA, and TP/FAA are shown in Table 3 and Figure S3 (Figure 3e); the model prediction accuracy was significantly higher than the original PLS model prediction accuracy. The R P (from 0.7619 to 0.9492) and the RPD (from 1.62 to 2.07) increased, and the RMSEP decreased from 6.79 to 5.31. The biPLS algorithm selected 1013 relevant variables to predict the TP/FAA (Figure 3f). Compared with the original PLS model, based on the full-spectra data, the correlation coefficient R P of the model in the prediction set was significantly improved (from 0.8089 to 0.9303). However, the RMSECV values of the biPLS models were all greater than those in the corresponding siPLS models, indicating that the siPLS models were more robust for predicting TP, FAA, and TP/FAA. The spectral regions selected by the siPLS and biPLS methods had an abundant overlap. The region that was referred to by the wavenumbers of 4798.10-3999.70 cm −1 was selected for TP in both algorithms, which included a combination of C-H stretching and C-H deformation, and the second overtone of C-H deformation. This spectral region was associated with the reported presence of tea polyphenols and caffeine and contained an important spectral region for identifying special-grade green tea from other grades of green tea [46]. Two regions, 6402.61-5604.21 and 5199.23-4801.96 cm −1 , were important for building the FAA prediction model. The 5199.23-4801.96 cm −1 region reflected the -NH group co-frequency information and 6402.61-5604.21 cm −1 contained overtones of both the -CH 2 (5750 cm −1 ) and -CH 3 (5800 cm −1 ) [47], which were related to the presence of free amino acids. One finding of note was that the model performance of FAA was worse than that of TP and TP/FAA in all PLS models. Wang et al. also found that the prediction performance of the PLS model for the theanine content in tea was not satisfactory [15]. This may be because the relationship between free amino acids and near-infrared spectroscopy may be non-linear; the non-linear algorithms may be more appropriate to improve the model prediction accuracy of NIR spectroscopy for free amino acids in future studies.
Compared with the PLS model established by the full-spectra data or manually selected wavenumber range, the prediction performance of the PLS model, established after using the siPLS and biPLS algorithms to select the characteristic wavenumber range, had been greatly improved, especially for FAA and TP/FAA prediction. As observed in this study, the combination of siPLS and biPLS with NIR could greatly improve the prediction of flavor components in Yunwu green tea.

Conclusions
In this study, the general prediction models for TP, FAA, and TP/FAA in Lushan Yunwu tea and non-Lushan Yunwu tea were developed using NIR, combined with multivariate analysis. The NIR spectra of LY and NLY could be distinguished according to geographical origin, while the correct discrimination rate of DA models could reach 100%. The effects of different spectral preprocessing and wavenumber selection on the performance of PLS prediction models for TP, FAA, and TP/FAA were compared. The siPLS model achieved satisfactory performance for TP and TP/FAA predictions, with RP values higher than 0.9 and RPD values higher than 2.0. The results showed that the combination of NIR and intelligent variable selection algorithms (such as biPLS and siPLS) could achieve the rapid prediction of taste quality indicators in tea. Therefore, NIR spectroscopy may be a green analysis tool that can predict the taste quality indicators in tea while identifying the authenticity of the tea.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/foods11192976/s1, Figure S1. Boxplots of TP, FAA, and TP/FAA contents in the LY and NLY samples; Figure S2. Correlation between the measured and predicted values of siPLS models; Figure S3. Correlation between the measured and predicted values of biPLS models; Table S1. A summary of the tested samples.