Digital Prediction of the Purchase Price of Fresh Tea Leaves of Enshi Yulu Based on Near-Infrared Spectroscopy Combined with Multivariate Analysis

In this study, near-infrared spectroscopy (NIRS) combined with several chemometric methods was used to establish a fast, non-destructive prediction model for the purchase price of fresh tea leaves. First, a paired t-test was conducted on the quality index (QI) of fresh tea samples of seven quality grades, and all differences were statistically significant (p < 0.05). In addition, there was a good linear relationship between the QI, quality grade, and purchase price of the fresh tea samples, with all coefficients of determination greater than 0.99. The original near-infrared spectra of the fresh tea samples were then acquired and preprocessed, with the combination of standard normal variate (SNV) and second derivative (SD) as the optimal preprocessing method. Four spectral intervals closely related to fresh tea prices were screened using synergy interval partial least squares (si-PLS), namely 4377.62–4751.74 cm⁻¹, 4755.63–5129.75 cm⁻¹, 6262.70–6633.93 cm⁻¹, and 7386–7756.32 cm⁻¹. A genetic algorithm (GA) was applied to extract 70 and 33 feature spectral data points from the whole denoised spectral data (DSD) and the four characteristic spectral intervals data (FSD), respectively. Principal component analysis (PCA) was applied to each set of selected data points, and the cumulative contribution rates of the first three PCs were 99.856% and 99.852%, respectively. Finally, a back propagation artificial neural network (BP-ANN) model with a 3-5-1 structure was calibrated with the first three PCs. The best results were obtained with the logistic transfer function and the 33 feature spectral data points (Rp² = 0.985, RMSEP = 6.732 RMB/kg). When the best BP-ANN model was tested on 14 external samples, R² = 0.987 and RMSEP = 6.670 RMB/kg. These results show that NIRS can provide real-time, non-destructive, and accurate evaluation and digital display of the purchase price of fresh tea samples.


Introduction
Enshi Yulu is a famous historical tea produced in Enshi City, Hubei Province. It is the only steamed green tea and is a national geographical indication protection product in China [1]. Its brand is well known at home and abroad, and the tea is often used to entertain foreign guests at state events. The Enshi Yulu Nature Reserve lies at 450-850 m above sea level. The climate is warm and humid all year round, and the clouds and mist that persist day and night produce more diffuse light and short-wave ultraviolet light, making the fresh tea buds and leaves more tender and rich in protein, amino acids, and alkaloids, which lays a solid material foundation for the excellent quality of Enshi Yulu. The quality of fresh tea leaves is the basis of tea quality [2]. For example, there are clear regulations on the quality of fresh tea leaves before Enshi Yulu is processed [3]: special grade one Enshi Yulu requires more than 95% of the fresh tea leaves to be single buds, so that the raw materials are fresh and uniform, without red bud leaves, purple bud leaves, diseased or insect-damaged bud leaves, or rain leaves. Because of these high quality requirements, the purchase price of Enshi Yulu fresh tea leaves is higher than that of ordinary fresh tea leaves.
Tea gardens are usually planted and managed by tea farmers, while tea processing factories are responsible for purchasing and processing fresh tea leaves. When purchasing, the purchasing personnel usually determine the quality grade of fresh tea leaves based on their own senses (smell, vision, and touch) and personal experience, and then give the corresponding purchase price. For example, Zhang Jun [4] has established a relationship between the quality and price of jasmine tea. However, the sensitivity of human senses is easily affected by work experience, physiological condition at the time, and external conditions (such as the surrounding environment, weather, temperature, and humidity), and is highly subjective. As a result, the purchaser and the tea farmer often cannot agree on the purchase price of fresh tea leaves, which leads to many conflicts. To reduce the limitations of sensory quality analysis and compensate for the shortcomings of sensory evaluation (e.g., subjectivity, unpredictability, and inconsistency), researchers have developed various instrumental analytical techniques to evaluate the quality of fresh tea leaves based on their physical and chemical profiles, such as liquid chromatography-mass spectrometry [5], gas chromatography-mass spectrometry [6], and high-performance liquid chromatography [7]. The quality index (QI) of fresh tea leaves was then computed from selected characteristic ingredients [8] in order to evaluate the purchase price based on quality. Although chemical methods have high detection accuracy, the process is extremely cumbersome and requires many chemical reagents. In addition, the detection process is time-consuming and laborious and cannot deliver timely results, so it still cannot achieve rapid evaluation of the purchase price of fresh tea leaves. Therefore, in order to alleviate the distrust between tea factories and tea farmers and increase their mutual trust, it is urgent to establish an objective, fair, and rapid method for digitally evaluating the purchase price of Enshi Yulu fresh tea leaves.
Near-infrared spectroscopy (NIRS) uses electromagnetic radiation in the wavelength range of 780-2526 nm and mainly reflects the absorption of X-H chemical bonds. It has the advantages of rapid and non-destructive analysis and has been widely used in agriculture [9–11], the petrochemical industry, the textile industry, and the pharmaceutical industry [12,13]. NIRS combined with the CARS-PLS and si-PLS methods has been broadly used to predict the amounts of polyphenols, caffeine [14], and other components in tea [15], to assess the quality of fresh tea leaves using QI values [16], and to discriminate tea varieties [17]. However, there are currently few reports on the application of near-infrared spectroscopy to evaluate the purchase price of fresh tea leaves, and further research is needed.
In this paper, fresh tea leaves of the Entaizao tea variety from the Enshi Yulu Nature Reserve were used as the research objects, the quality index (QI) of the different quality grades (QG) was calculated, and the relationship between QI, QG, and the purchase price was clarified. Near-infrared spectra were then acquired, and spectral noise was removed using various preprocessing methods. The characteristic spectral data were extracted using the synergy interval partial least squares (si-PLS) method and a genetic algorithm (GA), principal component analysis (PCA) was applied to compress and extract this characteristic spectral information, and finally the backpropagation artificial neural network (BP-ANN) method combined with three transfer functions was used to establish a NIRS digital model of the purchase price. The practical performance of the model was tested using external samples. This study can provide a convenient new method for the rapid, non-destructive, objective, and digital evaluation of the purchase price of Enshi Yulu fresh tea leaves, helping to overcome subjective factors and laying a scientific foundation for the subsequent development of a portable near-infrared spectrometer for pricing fresh tea leaves.

Samples and Classification of Fresh Tea Leaves
Fresh tea leaves of the Entaizao tea variety were picked from March to May 2022. The picking standards were one bud, one bud and the first leaf, one bud and the first two leaves, and one bud and the first three leaves. Fresh tea processing samples of seven quality grades were then prepared by blending these fresh tea leaves [3] (Table 1); grade 1 samples have the best quality, while grade 7 samples have the worst quality. Each grade had 8 samples, and each sample weighed approximately 100 g. To ensure the fairness of fresh leaf prices, the price in this study was set by a team of three people: one fresh leaf purchaser, one tea expert, and one tea farmer. The prices of fresh tea leaves are set based on their quality, which is closely related to tenderness, integrity, uniformity, and purity. The more tender the fresh leaves, the fresher they are, which is beneficial for kneading and shaping during processing. Good integrity indicates that the buds and leaves are not separated, so the finished tea contains less broken tea. High uniformity indicates that the picking standards are consistent, which makes it easy to use the same processing parameters and improves the utilization rate of fresh leaves. Good purity indicates that the fresh leaves contain fewer impurities, such as grass leaves and other leaves. Therefore, the better the quality of fresh leaves, the higher their price. In addition, taking into account market demand and local price levels at the time of purchase, the average purchase price of fresh leaves from grade 7 to grade 1 ranged from 30 RMB/kg to 220 RMB/kg in this research. Of the 56 samples, 42 were selected across the quality grades for the calibration set and 14 were used for validation, a ratio of 3:1. Additionally, 14 samples purchased from the local market were used to test the effectiveness of the price calibration model.

Spectral Collection
Near-infrared spectroscopy (NIRS) data were acquired using a Thermo Antaris II Fourier transform (FT) NIR spectrometer (Thermofisher Scientific, Waltham, MA, USA) in reflectance mode, equipped with an InGaAs detector and an integrating sphere accessory. To obtain the spectral data, 10 g of each fresh tea sample was placed in a sample cup that rotated 360° during scanning. The spectral range was between 10,000 cm⁻¹ and 4000 cm⁻¹, at intervals of 3.857 cm⁻¹. Each sample was scanned three times, and the average spectrum of the three scans was used for subsequent analysis (Figure 1).

Spectral Data Analysis
Because the distance between adjacent spectral data points is 3.857 cm⁻¹, the near-infrared spectrum of each sample contains 1557 pairs (x, y) of data points (x is the wavenumber, y is the absorbance), which were analyzed using TQ Analyst 9.4.45 software (Thermofisher Scientific, Waltham, MA, USA). The data point pairs were saved in Excel sheets. A PLS model was then built to select the best pretreatment method using OPUS 7.0 software (Bruker Optik GmbH., Saarbrucken, Germany). The feature spectral intervals used to build the price NIRS models were selected with the synergy interval partial least squares (si-PLS) method on the Matlab 2012a software platform (MathWorks, Natick, MA, USA).
Before modeling, in order to eliminate extraneous background and noise information and enhance model performance, various spectral preprocessing techniques were applied to the original spectra, including no preprocessing (none), standard normal variate (SNV), first derivative (FD), second derivative (SD), multiplicative scatter correction (MSC), and their combinations [18]. The optimal preprocessing method was determined by comparing the results.
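As an illustration of the combined (SNV + SD) pretreatment, the following minimal Python sketch applies SNV to a single spectrum and then takes a second derivative. It is not the implementation used in OPUS: the central-difference derivative (rather than, for example, a smoothed Savitzky-Golay derivative), the function names, and the absorbance values are all assumptions made for illustration.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and sample standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / std for a in spectrum]

def second_derivative(spectrum, step):
    """Central-difference second derivative (an assumed, unsmoothed scheme).
    The two end points are dropped because they lack a neighbour."""
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]

def snv_sd(spectrum, step=3.857):
    """SNV followed by a second derivative; step is the point spacing in cm^-1."""
    return second_derivative(snv(spectrum), step)

# Hypothetical absorbance values, not measured data.
raw = [0.42, 0.45, 0.51, 0.60, 0.66, 0.63, 0.55, 0.50]
processed = snv_sd(raw)
```

Because SNV uses only the statistics of the spectrum being corrected, each spectrum can be processed independently of the rest of the data set.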

Synergy Interval Partial Least Squares (si-PLS) Method
The si-PLS method divides the whole spectral data set into a number of intervals (10–25), and PLS models are calculated for all possible combinations of two, three, or four intervals [19]. The spectral regions most relevant to the price are selected according to the root mean square error of cross-validation (RMSECV) in the calibration set. The intervals used by the si-PLS model with the lowest RMSECV are the selected feature spectral intervals, which contain the NIR information specific to the price of fresh tea samples.
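The size of the si-PLS search can be sketched in a few lines of Python. The sketch only enumerates the candidate interval combinations; fitting a PLS model to each combination and comparing RMSECV values is not shown. The rule for distributing the remainder points over the first intervals is an assumption, as are the function names.

```python
from itertools import combinations

def interval_sizes(n_points, n_intervals):
    """Split n_points into n_intervals nearly equal intervals.
    Assumption: the remainder is given to the first intervals."""
    base, extra = divmod(n_points, n_intervals)
    return [base + 1 if i < extra else base for i in range(n_intervals)]

def candidate_combinations(n_intervals, sizes=(2, 3, 4)):
    """Yield every combination of 2, 3, or 4 intervals (1-based labels)."""
    for k in sizes:
        yield from combinations(range(1, n_intervals + 1), k)

# 1557 data points per spectrum, split into 16 intervals.
sizes = interval_sizes(1557, 16)
n_models = sum(1 for _ in candidate_combinations(16))

# Number of data points covered by a combination such as intervals 2, 3, 7, 10.
selected = (2, 3, 7, 10)
n_selected_points = sum(sizes[i - 1] for i in selected)
```

Under these assumptions, 16 intervals give 2500 candidate PLS models, and the four intervals in the example cover 390 data points.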
The RMSECV was calculated as follows:

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$$

where $n$ is the number of samples in the calibration set, $y_i$ is the true value for sample $i$, and $\hat{y}_i$ is the value predicted for sample $i$ from the calibration set.

Genetic Algorithm (GA)
The GA imitates natural selection and genetic mechanisms in the biological world [20]. It uses operators such as selection, exchange, and mutation. With continuous genetic iteration, the variables with better objective function values are retained and the poor variables are eliminated, until the optimal result is reached. In this paper, the GA is applied to obtain the optimal NIRS data points, namely those for which the RMSECV is at its minimum.

Principal Component Analysis (PCA) and Backpropagation Artificial Neural Network (BP-ANN) Method
PCA [21] was performed on the best spectral intervals and data points obtained by the si-PLS method and the GA method, respectively. It compresses and extracts the spectral information and yields the contribution rate of each principal component; the principal components are then used to establish a BP-ANN model.
The BP-ANN [22] has emerged as a research hotspot in the field of artificial intelligence in recent years. It abstracts the neural network of the human brain from the perspective of information processing, forms different networks according to different connection modes, and is composed of a large number of interconnected neurons. Each node represents a specific transfer function. By establishing the connection between input data and output data, a prediction model can be established. The results were evaluated based on the coefficient of determination of cross-validation (Rc²), the coefficient of determination of prediction (Rp²), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP). A higher R² and a lower RMSEP indicate better prediction performance. The equations used to calculate RMSEP and R² are given below.
The RMSEP was calculated as follows:

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$$

where $n$ is the number of samples in the prediction set, $y_i$ is the true value of sample $i$, and $\hat{y}_i$ is the predicted value of sample $i$ in the prediction set. The R² was calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $y_i$ and $\hat{y}_i$ are the true value and predicted value of sample $i$, respectively, and $\bar{y}$ is the average true value of all samples.
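The two evaluation metrics can be written as short Python functions. This is a minimal sketch: the function names are illustrative, and the price values are hypothetical rather than results from this study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error; used as RMSECV or RMSEP depending on whether
    the predictions come from cross-validation or from the prediction set."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and predicted purchase prices in RMB/kg.
prices_true = [220.0, 180.0, 150.0, 110.0, 80.0, 50.0, 30.0]
prices_pred = [214.0, 186.0, 147.0, 115.0, 76.0, 55.0, 33.0]

print(f"RMSE = {rmse(prices_true, prices_pred):.3f} RMB/kg")
print(f"R2   = {r_squared(prices_true, prices_pred):.4f}")
```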

Results and Discussion
The Relationships between Quality Grade, Quality Index, and Purchase Price of Fresh Tea Samples
According to the formula for calculating the quality index (QI) (QI = (humidity content × total nitrogen content) ÷ crude fiber content) [23], the QI values of the seven quality grades of fresh tea samples were obtained and a paired t-test was conducted. The results are shown in Table 2.
Table 2 shows that the t-value for grade 1 versus grade 2, t(1v2), was the smallest at 3.25, but it was still greater than the critical value of 2.365 (t0.05(7) = 2.365). Therefore, the difference between the first and second QG of fresh tea samples was statistically significant (p < 0.05). The t-values for all other QG comparisons were greater than the critical value of 3.499 (t0.01(7) = 3.499), so the differences among the other QG were highly significant (p < 0.01). Table 2 therefore confirms that the classification of the fresh tea samples was sound, laying a foundation for the subsequent study of the purchase prices of fresh tea samples of different quality grades.
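The comparison against the critical values can be sketched as follows. With 8 samples per grade, a paired test has 7 degrees of freedom, which matches the critical values t0.05(7) and t0.01(7) quoted above. The QI values in the sketch are hypothetical, and the function name is illustrative.

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equally long samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Two-sided critical values for 7 degrees of freedom, as quoted in the text.
T_CRIT_05 = 2.365
T_CRIT_01 = 3.499

# Hypothetical QI values for 8 samples of two adjacent grades.
grade_a = [0.56, 0.55, 0.54, 0.56, 0.55, 0.55, 0.56, 0.55]
grade_b = [0.53, 0.54, 0.52, 0.53, 0.53, 0.52, 0.54, 0.52]

t_value, dof = paired_t(grade_a, grade_b)
if abs(t_value) > T_CRIT_01:
    verdict = "p < 0.01"
elif abs(t_value) > T_CRIT_05:
    verdict = "p < 0.05"
else:
    verdict = "not significant"
print(f"t({dof}) = {t_value:.2f}, {verdict}")
```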
Figure 2 shows that as the quality of the fresh tea samples decreased from grade 1 to grade 7, their QI values also decreased gradually. The QI of grade 1 fresh tea leaves was the highest at 0.552, while that of grade 7 fresh tea leaves was the lowest at 0.383. From grade 1 to grade 7, the QI decreased by 30.6%, indicating that the blending of the fresh tea samples was reasonable, and there was a good linear relationship between the quality grade and the quality index, with an R² of 0.9974. Figure 2 also shows that as the quality of the fresh tea samples decreased, their average purchase prices fell rapidly from 220 RMB/kg for grade 1 to 30 RMB/kg for grade 7. A linear fit of the purchase price against the quality grade gave an R² as high as 0.9954. Therefore, there was a good linear relationship between the quality grade and the purchase price of fresh tea samples.
Figure 3 shows the linear fit between the average QI values of the seven grades of fresh tea samples and their corresponding average purchase prices. There was also an excellent linear relationship between the two factors, with an R² as high as 0.9973. Therefore, because the purchase price closely follows the quality of fresh tea samples, it should be feasible and convenient to predict the purchase price of fresh tea leaves quickly and non-destructively using NIRS technology.
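The linear relationships reported in this section can be checked with an ordinary least-squares fit. In the sketch below, only the grade 1 and grade 7 QI values (0.552 and 0.383) come from the text; the intermediate values are hypothetical placeholders, so the printed R² is illustrative and is not the reported 0.9974.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus its R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

grades = [1, 2, 3, 4, 5, 6, 7]
# End points from the text; grades 2 to 6 are hypothetical placeholders.
qi = [0.552, 0.525, 0.497, 0.468, 0.440, 0.411, 0.383]

slope, intercept, r2 = linear_fit(grades, qi)
relative_drop = (qi[0] - qi[-1]) / qi[0] * 100.0  # decrease from grade 1 to grade 7
print(f"slope = {slope:.4f}, R2 = {r2:.4f}, QI decrease = {relative_drop:.1f}%")
```

The same function applies unchanged to the fits of purchase price against quality grade and of purchase price against QI.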

Comparison of Pre-Processing Methods for Spectral Data
Figure 1 shows that the spectra exhibit multiple absorption peaks in the long-wave band (4000-7000 cm⁻¹), primarily due to the −OH groups of water [25] and the NIRS absorption of various components whose contents differ with quality in the fresh tea samples. Prior to model building, nine spectral preprocessing methods were used to pretreat the NIR spectra of fresh tea samples of different quality and price. Subsequently, PLS was used to construct NIRS models. The performance of the models was evaluated using RMSECV and Rc², with a higher Rc² and a lower RMSECV indicating a better pretreatment method. The results of all the pretreatment models are presented in Figure 4.

In Figure 4, among the nine models, the NIRS model built with the original spectra yielded the worst results (Rc² = 0.526, RMSECV = 32.501 RMB/kg). Among the models with a single preprocessing method, the MSC pretreatment showed better results (Rc² = 0.685, RMSECV = 25.742 RMB/kg), but these were still inferior to those obtained with combined pretreatment methods. The NIRS model built using the combined (SNV + SD) method produced the best results (Rc² = 0.732, RMSECV = 24.817 RMB/kg), representing a 39.16% increase in Rc² and a 23.64% decrease in RMSECV compared with the original-spectra NIR model. Therefore, it is crucial to pretreat the original spectra before building NIRS models, which is consistent with previous findings [26]. In this study, the best spectral pretreatment method was the combination of (SNV + SD), and the Rc² and RMSECV of the best calibration model built using the PLS method were 0.732 and 24.817 RMB/kg, respectively. This is because the SNV preprocessing method corrects spectral errors caused by scattering between samples, and each spectrum is corrected individually; its correction ability is superior to that of the MSC method. A derivative pretreatment can eliminate spectral baseline drift, enhance spectral band characteristics, and reduce spectral band overlap. Because NIRS contains little information that directly reflects the purchase price, a second derivative pretreatment can highlight the NIRS information that does reflect it. Therefore, the best spectral pretreatment method obtained in this experiment was the combined (SNV + SD) method [27]. However, the performance was still unsatisfactory, leaving ample room for improvement.

Results of si-PLS Model
As seen from Table 3, all spectra were divided into 10–25 spectral sub-intervals. As the number of spectral intervals increased, the RMSECV of the si-PLS models first decreased gradually and then increased slowly, whereas the number of PLS factors changed little, ranging from 7 to 10. With 16 sub-intervals and 8 factors, the calibration model performed best, with the lowest RMSECV of 15.340 RMB/kg. The selected spectral regions were the four intervals [2 3 7 10], corresponding to the wavenumber ranges 4377.62–4751.74 cm⁻¹, 4755.63–5129.75 cm⁻¹, 6262.70–6633.93 cm⁻¹, and 7386–7756.32 cm⁻¹. The Rc² of the calibration model was 0.783. When the prediction samples were used to test the NIRS calibration model, the Rp² and RMSEP were 0.746 and 17.252 RMB/kg, respectively. Although the characteristic spectral intervals accounted for only 25% of the full-spectrum data, the si-PLS prediction results were better than those of the PLS model. In practical applications, however, the accuracy of the si-PLS model still needs to be improved. Therefore, nonlinear methods were applied to establish a prediction model for the purchase prices of fresh tea leaves.
In order to further improve the prediction accuracy of the NIRS model for purchase prices, the GA was applied to extract the spectral data points that reflect the purchase prices from the whole denoised spectral data (DSD) and from the four characteristic spectral intervals data (FSD) selected by the si-PLS method, respectively. The RMSECV and the corresponding selected feature spectral data points are shown in Figures 5 and 6. Figures 5 and 6 show that, for both DSD and FSD, as the number of feature spectral data points increased, the RMSECV decreased rapidly to a minimum and then increased gradually. For DSD, the minimum RMSECV of 24.602 RMB/kg was obtained with 70 optimal spectral data points. For FSD, the minimum RMSECV of 15.106 RMB/kg was obtained with 33 optimal data points. Compared with the results in Figure 4, the 70 data points extracted by the GA amounted to only about 4.5% of all spectral data points (1557 data points), yet the RMSECV was lower than that of the full-wavelength PLS model (24.817 RMB/kg), indicating a slight improvement in prediction accuracy. Similarly, comparing Figure 6 with Table 3, the four selected feature spectral intervals in Table 3 contained a total of 390 spectral data points. After further extraction, 33 feature spectral data points were obtained, accounting for only about 8.5% of the FSD. However, the RMSECV (15.106 RMB/kg) was lower than that of the si-PLS model (15.340 RMB/kg), again indicating a slight improvement in prediction accuracy. These results suggest that the GA had good spectral information extraction ability: it not only eliminated noise unrelated to purchase prices but also reduced the amount of data input to the model, which helped to improve prediction ability [28]. The extracted feature spectral data points are shown in Table 4. Table 4 shows that the characteristic spectral data points of DSD were distributed across the full wavelength range, indicating that these data points represent the DSD information well. Among the feature spectral data points extracted from FSD, 23 were in the range of 4377.62–5129.75 cm⁻¹, 9 were in the range of 6262.70–6633.93 cm⁻¹, and only 1 was in the range of 7386–7756.32 cm⁻¹. This was because the near-infrared spectral information of fresh tea leaves was mainly reflected in the long-wavelength range, which contained more spectral information related to the price of fresh tea leaves, so models built on it also perform better [29].
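The GA-based selection of data points can be sketched as a search over boolean masks that mark which data points are kept. The sketch below is a generic GA with tournament selection, single-point crossover (exchange), mutation, and elitism; it is not the authors' implementation. In particular, the fitness function is a toy stand-in for the RMSECV of a PLS model, and the set of "informative" indices, the parameter values, and all names are hypothetical.

```python
import random

def run_ga(n_vars, fitness, pop_size=30, generations=40,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Minimise `fitness` over boolean masks of length n_vars."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]

    def tournament(population):
        # Selection: the better of two randomly drawn individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_vars)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy stand-in for RMSECV: penalise missed informative points and extra points.
INFORMATIVE = {2, 5, 11}  # hypothetical indices

def toy_fitness(mask):
    missed = sum(1 for i in INFORMATIVE if not mask[i])
    extra = sum(1 for i, g in enumerate(mask) if g and i not in INFORMATIVE)
    return missed + 0.1 * extra

best_mask, history = run_ga(20, toy_fitness)
selected_points = [i for i, g in enumerate(best_mask) if g]
```

Because the best mask of each generation is copied into the next one unchanged, the best fitness recorded in `history` never gets worse from one generation to the next.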

Principal Component Analysis (PCA)
PCA was applied to the extracted feature spectral data points to compress the spectral information. The results of the PCA were as follows.
From Table 5, it can be concluded that the contribution rates of PC1 for the 70 DSD data points and the 33 FSD data points were both high, reaching 94.828% and 93.101%, respectively. Because the contribution of PC2 for the FSD data points exceeded that for the DSD data points (a net increase of 2.506%), the cumulative contribution rate of PC (1-2) for the FSD data points was higher than that for the DSD data points. However, the cumulative contribution rates of the first three PCs were very close, at 99.856% and 99.852%, respectively, and both were extremely close to 100%. According to the PCA principle [30], the first three PCs can fully represent the extracted spectral data point information, further demonstrating the strong information extraction ability of the genetic algorithm. Figure 7 shows that, within the same spatial range, the vast majority of the samples in Figure 7a were distributed within the ranges of −1.0 < score1 < 1.5 and −0.3 < score2 < 0.2, while the vast majority of the samples in Figure 7b were distributed within the ranges of −1.0 < score1 < 1.0 and −0.25 < score2 < 0.2. The distribution space of the samples in Figure 7b was smaller, and the clustering effect was more prominent. This lays a good foundation for establishing the BP-ANN model in the next step.
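The cumulative contribution rate discussed above is the running sum of the PCA eigenvalues divided by their total. The following sketch shows that calculation only. The eigenvalues are hypothetical, the 99% threshold is an assumption chosen for illustration, and the eigenvalues themselves would come from a separate PCA routine that is not shown.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative contribution rates (%) of the principal components,
    ordered from the largest eigenvalue to the smallest."""
    total = sum(eigenvalues)
    rates, running = [], 0.0
    for value in sorted(eigenvalues, reverse=True):
        running += value
        rates.append(100.0 * running / total)
    return rates

def components_needed(eigenvalues, threshold=99.0):
    """Smallest number of PCs whose cumulative contribution reaches threshold (%)."""
    for k, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, not the values behind Table 5.
eigenvalues = [90.0, 9.5, 0.4, 0.1]
rates = cumulative_contribution(eigenvalues)
n_pcs = components_needed(eigenvalues)
```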

BP-ANN Model
According to the PCA results, the first three PCs were used as input values and the prices as output values, and the BP-ANN method was used to establish the NIRS models using the DSD data points and the FSD data points, respectively. During the process of establishing the model, the number of hidden layers was continuously adjusted to obtain the best prediction model. After repeated adjustments, a three-layer BP-ANN prediction model with a 3-5-1 structure was finally obtained using three different transfer functions. This study compared the price NIRS models built with the three transfer functions, namely the linear [−1, 1] function, the logistic function, and the tanh function, and the results of the BP-ANN models were as follows.
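To make the 3-5-1 structure concrete, the sketch below trains a small back-propagation network with three inputs, five hidden units, and one linear output by full-batch gradient descent, once for each of the three transfer functions. It is only an illustration under several assumptions: the data are synthetic, the target is scaled to roughly the unit interval, the "linear [−1, 1]" function is read as a linear unit clipped to [−1, 1], and the learning rate, epoch count, and names (`train_bpann`, `ACTIVATIONS`, `rmse`) are hypothetical rather than taken from the study.

```python
import numpy as np

# Each entry: (transfer function, its derivative written in terms of the output a).
# "linear" is read here as a linear unit clipped to [-1, 1]; this reading is an assumption.
ACTIVATIONS = {
    "linear": (lambda z: np.clip(z, -1.0, 1.0),
               lambda a: ((a > -1.0) & (a < 1.0)).astype(float)),
    "logistic": (lambda z: 1.0 / (1.0 + np.exp(-z)),
                 lambda a: a * (1.0 - a)),
    "tanh": (np.tanh, lambda a: 1.0 - a ** 2),
}

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def train_bpann(X, y, activation, hidden=5, lr=0.05, epochs=2000, seed=0):
    """Train a 3-hidden-1 network; return a predict function and the loss history."""
    f, df = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        H = f(X @ W1 + b1)                      # hidden activations, shape (n, hidden)
        pred = (H @ W2 + b2).ravel()            # linear output unit
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        g_out = (2.0 / len(y)) * err[:, None]   # gradient of MSE w.r.t. the output
        g_hid = (g_out @ W2.T) * df(H)          # error propagated back to hidden layer
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(X_new):
        return (f(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses

# Synthetic stand-in: three PC scores as inputs, a scaled price as the target.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(70, 3))
y_all = 0.5 + 0.2 * np.tanh(X_all[:, 0]) + 0.1 * X_all[:, 1] + 0.01 * rng.normal(size=70)
X_cal, y_cal = X_all[:56], y_all[:56]           # calibration set
X_pred, y_pred_ref = X_all[56:], y_all[56:]     # 14 held-out prediction samples

results = {}
for name in ACTIVATIONS:
    predict, losses = train_bpann(X_cal, y_cal, name)
    results[name] = (rmse(y_pred_ref, predict(X_pred)), losses)
    print(f"{name:8s} RMSEP on synthetic data: {results[name][0]:.4f}")
```

The numbers printed by this sketch come from synthetic data and are not comparable to the RMSEP values reported for the tea samples; the point is only the layout of the network and the way the transfer function is swapped.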
Table 6 shows that the Rc2 and RMSECV of the calibration price NIRS model established by the linear [−1, 1] transfer function were 0.845 and 11.164 RMB/kg, respectively. When 14 prediction samples were used to verify the robustness, Rp2 and RMSEP were 0.812 and 14.014 RMB/kg, respectively. The Rc2 and RMSECV of the calibration price NIRS model established by the tanh transfer function were 0.883 and 10.135 RMB/kg, respectively. When the robustness was verified by 14 prediction samples, the Rp2 and RMSEP were 0.857 and 10.875 RMB/kg, respectively. The Rc2 and RMSECV of the calibration price NIRS

Figure 1. Near-infrared spectra of fresh tea samples of seven quality grades.

Figure 2. Relationships between average QI values, average purchase prices, and seven QGs of fresh tea samples.

Figure 3. The relationship between the average QI values and the average prices of the samples of seven grades.

Foods 2023, 16
Figure 4. The results of the purchase price PLS models using different pretreatment methods.

Figure 5. Correspondence between RMSECV and the best spectral data points of DSD.

Figure 6. Correspondence between RMSECV and the best spectral data points of FSD.

Figure 7. PC1 vs. PC2 distribution of fresh tea samples of different prices. Note: (a) PC1 vs. PC2 distribution of the DSD 70 data points; (b) PC1 vs. PC2 distribution of the FSD 33 data points.

Table 1. Composition of fresh tea leaf samples of seven grades (fresh weight ratio/%).

Table 2. The t values between the QI of fresh tea samples of seven quality grades.

Table 3. Results of the si-PLS calibration models for different selected spectral regions.

Table 4. Feature spectral data points extracted by GA.

Table 5. Cumulative contribution rate of the first six principal components.