An Effective Prediction Approach for Moisture Content of Tea Leaves Based on Discrete Wavelet Transforms and Bootstrap Soft Shrinkage Algorithm

Abstract: The traditional method used to determine the moisture content of tea leaves is time consuming and destructive. To address this problem, an effective and non-destructive prediction method based on near-infrared spectroscopy (NIRS) is proposed in this paper. This new method combines discrete wavelet transforms (DWT) with the bootstrap soft shrinkage algorithm (BOSS). To eliminate uninformative or interfering variables, DWT is applied to remove the noise in the spectral data by decomposing the original spectrum into six layers. BOSS is used to select informative variables by reducing the dimensions of the sub-layers' reconstruction spectrum. After selecting the effective variables using DWT and BOSS, a prediction model based on partial least squares (PLS) is built. To validate the effectiveness and stability of the prediction model, full-spectrum PLS, genetic algorithm PLS (GA-PLS), and interval PLS (iPLS) were compared with the proposed method. The experimental results illustrate that the proposed prediction model outperforms the other classical models considered in this study and shows promise for the prediction of the moisture content in Yinghong No. 9 tea leaves.


Introduction
Processed leaves and leaf buds of the tea tree are used to produce tea, which is popular in many parts of the world [1]. Traditional tea making is complicated; the drying of fresh leaves is the primary and indispensable stage of this process [2] and moisture content is a key index in the drying process [3]. However, improper handling may lead to inaccurate measurements when determining moisture content. Therefore, an accurate and rapid detection approach would be indispensable for determining the moisture content of tea leaves during tea making [4].
Many attempts have been made to determine moisture based on near-infrared spectroscopy (NIRS). Moisture measurements are commonly recorded by detecting mass loss after heating to evaporate moisture. However, this procedure damages the samples and is time consuming. In contrast, direct determination of moisture by NIRS is fast, only requiring the acquisition of the sample's reflection spectrum [5]. However, the disadvantages of NIRS include broad, overlapping absorption bands that are difficult to attribute and interpret, and noise [6]. The effective selection of wave bands is used to address these problems. It is difficult to select effective variables when peaks are unresolved and important features cannot be recognized [7]. Therefore, it is crucial to eliminate noise without losing spectral details in a spectral prediction model for determining the moisture content.
The current common spectral denoising methods include moving average, Savitzky-Golay filtering, and median filtering [8]. Xie et al. proposed a technique for tailoring the noise frequency spectrum based on the Savitzky-Golay filter and obtained a satisfactory result [9]. Morgan et al. used a moving weighting algorithm to correct the spectrum bias when estimating soil organic carbon content [10]. Although these methods can remove the noise in spectral data, useful signals may be lost during the process of denoising. To avoid losing effective variables, the discrete wavelet transform (DWT) of spectral signals was developed. The moisture content (MC), soluble solids content (SSC), pH, and hardness of Gala apple samples were tested non-destructively within 350-2500 nm using the wavelet transform pretreatment of raw spectral data [11]. The use of DWT further simplified the genetic algorithm-partial least squares (GA-PLS) model by reducing variables by 40-44% without reducing the prediction accuracy [12]. Other experimental results [13] showed that the DWT-support vector regression (DWT-SVR) multivariate regression model has good robustness and can measure protein, starch, and fat contents in corn simultaneously, demonstrating that DWT can effectively remove noise from corn NIRS spectral data. However, an unsolved problem is that DWT cannot reduce the dimensions of large data sets, which leads to redundancy in the data volume during model building. In summary, wavelet decomposition is an effective method for removing noise, but it does not reduce the data dimension.
As NIRS produces a large amount of data, considerable residual noise and irrelevant data remain after spectral denoising. Therefore, variables must be selected before building a prediction model. The benefits of variable selection can be summarized into three aspects: (1) eliminating uninformative or interfering variables, (2) selecting informative variables, and (3) reducing the dimensions of the data [14]. The common selection methods can be divided into three types: (1) single variable selection, which uses variable ranking criteria such as regression coefficients and variance analysis [9]; (2) random variable selection such as uninformative variable elimination (UVE) [15], genetic algorithm (GA) [16], random forest (RF) [17], etc.; and (3) interval variable selection such as interval partial least squares (iPLS) [18] and synergy iPLS (SiPLS) [19]. A new variable selection method called the bootstrap soft shrinkage algorithm (BOSS) was proposed, which was derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA) [20]. In BOSS, WBS is used to generate sub-models based on the weights, and MPA is used to analyze the sub-models and update the weights of the variables [21,22]. Yan et al. used the BOSS method with mid-infrared (MIR) spectroscopy to determine chlorantraniliprole in abamectin, and obtained the highest coefficient of determination of cross-validation ($R^2_{cv}$ = 0.9998) and coefficient of determination of the test set ($R^2_p$ = 0.9989) [23]. Zhang et al. showed that BOSS can improve prediction performance and markedly reduce the number of features, and that it had the best accuracy in calibration and prediction, with a calibration coefficient of determination ($R^2_c$) of 0.9907, a root-mean-square error of calibration (RMSEC) of 0.4257 mg/kg, an $R^2_p$ of 0.9821, and a root-mean-square error of prediction (RMSEP) of 0.6461 mg/kg [24].
From the above research, the BOSS algorithm not only improves the prediction accuracy of the model but also effectively reduces the number of variables to speed up the calculation of the model. However, BOSS directly processes the original spectral data, which includes processing irrelevant noise information. Therefore, noise elimination steps must be added.
In this study, we constructed a novel variable selection method based on DWT and BOSS. The proposed method was compared with GA [25] and iPLS [26,27], which are categorized as a random variable selection method and an interval variable selection method, respectively [28,29]. Many studies have used these classical methods to select variables and improve prediction ability. Jiang monitored yeast concentrations of Saccharomyces cerevisiae cultivations with NIRS and compared the results of different variable selection methods. The GA model was built on fewer data points than the full-spectrum model (71 instead of 1557 points), with $R^2_p$ changing from 0.9777 to 0.9806 [30]. Sousa Sampaio optimized rice amylose determination using NIRS with the iPLS method. The full spectrum was split into 10, 20, 25, and 50 intervals, and the optimal model was obtained with Savitzky-Golay filtering ($R^2_p$ = 0.92 and RMSEP = 2.133), which was better than the full-spectrum PLS model [7]. Yang et al. used different regression methods such as PLS, iPLS, and SiPLS with multiple pretreatment methods. The Al$_2$O$_3$ models obtained using the iPLS algorithm had $R^2_p$ values of 0.8273 to 0.9196 [31]. PLS, whose prediction ability can be improved by selecting informative variables or eliminating uninformative ones, was used to build the prediction model in this paper [32-34]. DWT and BOSS were combined as a new variable selection method. After variable selection, DWT-BOSS-PLS, GA-PLS, and iPLS models were established from the NIRS data and the moisture content. The three variable selection methods are then compared to choose the best one.

Trial Introduction
The trial used tea leaves of the Yinghong No. 9 variety and was carried out on 4 December 2019 at the Yingde Yinghong No. 9 base of the Tea Research Institute of Guangdong Academy of Agricultural Sciences (Yingde, Qingyuan, Guangdong, China). The tea leaves were picked randomly within the tea garden. At 12:00 p.m., 100 kg of tea leaves were picked and spread in a withering trough in a layer about 4 cm thick. Samples were taken from the withering trough every hour over the normal withering time of 15 h. Five samples were taken from the fresh tea leaves, and 10 samples were taken in each of the other 14 h. In total, 145 samples were obtained in this test.
The reflectance spectra of the tea leaves were measured using a Thermo Antaris II Fourier transform near-infrared (FT-NIR) spectrometer (Thermo Scientific Co., Waltham, MA, USA) in integrating-sphere diffuse reflection mode over a spectral range of 12,000-3800 cm⁻¹ (833-2630 nm). The resolution was 4 cm⁻¹ and the diameter of the sample cup rotator was 20 cm. Each sample was scanned 64 times (the cup can rotate through a full circle). For each sample, 25 g of leaves was placed over the integrating sphere. Three spectra were taken from each sample, and their average was used as the spectrum of that sample.
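The wavelength limits quoted in parentheses follow from the standard relation λ (nm) = 10⁷ / ν (cm⁻¹). A minimal sketch of the conversion is given below; the function name is ours and is not part of the instrument software.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1.0e7 / wavenumber_cm1


# Limits of the measured spectral range.
for nu in (12000.0, 3800.0):
    print(f"{nu:.0f} cm^-1 -> {wavenumber_to_wavelength_nm(nu):.1f} nm")
```

The two limits come out at roughly 833 nm and 2632 nm, which agrees with the rounded range quoted above.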

Moisture Content Acquisition
Tea leaf samples were tested for moisture content immediately after the spectral measurements, according to the Chinese national standard GB/T 8304-2013. From 12:00 p.m., the moisture content was recorded every hour. Each spectrum corresponds to one moisture content value, so there were 145 moisture content values. The average moisture content per hour over the 15 h is shown in Figure 1. As the withering time increases, the moisture content gradually decreases. The moisture content was calculated as

$$w = \frac{m_1 - m_2}{m_1} \times 100\%$$

where $w$ is the moisture content, $m_1$ is the fresh weight of the leaves, and $m_2$ is the dry weight of the leaves.

Sample Set Partitioning Based on Joint X-Y Distance (SPXY)
The SPXY algorithm is a method of dividing the sample set considering both X and Y variables. It evolved from the Kennard-Stone (KS) algorithm and divides the samples into training and test sets by calculating the distance between samples [35]. In this paper, X indicates tea leaves' spectral data and Y indicates moisture content.
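As an illustration of the partitioning idea, the sketch below follows the usual description of SPXY: the pairwise distances in X and in Y are each scaled by their maximum and summed, and samples are then picked Kennard-Stone style. It is not the code used in this study; the function name `spxy_split` and the toy data are assumptions, and `n_train` is assumed to be at least 2.

```python
import numpy as np


def spxy_split(X, y, n_train):
    """Return (train_idx, test_idx) chosen with a joint X-Y distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)
    n = X.shape[0]

    # Pairwise Euclidean distances in X, computed via the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    dx = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Pairwise absolute differences in y.
    dy = np.abs(y[:, None] - y[None, :])
    # Joint distance: each part is scaled by its own maximum.
    d = dx / dx.max() + dy / dy.max()

    # Start with the two most distant samples.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_train:
        min_dist = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


# Toy example: four samples with one spectral variable each.
X_demo = [[0.0], [1.0], [2.0], [10.0]]
y_demo = [0.0, 1.0, 2.0, 10.0]
train_idx, test_idx = spxy_split(X_demo, y_demo, n_train=3)
```

In the toy example the two extreme samples are selected first, followed by the sample that is farthest from both, so the training set spans the range of both spectra and moisture values.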

Analysis of PLS Model
Partial least squares (PLS) integrates modeling and exploratory analysis: regression modeling (multiple linear regression), data structure simplification (principal component analysis), and correlation analysis between two sets of variables can be performed simultaneously [36]. In this study, the prediction model was built with the PLS algorithm.
In this study, the following parameters were selected to evaluate the accuracy of the model: the coefficient of determination ($R^2$), the root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP). The larger the $R^2$, the more accurate the predictive ability of the model; the RMSE represents the stability of the model [37]. The lower these three RMSE values, the higher the reliability of the model. The correlation coefficient (R) is used to measure the correlation between two variables; the closer R is to 1, the higher the correlation. In this paper, when R > 0.8, the corresponding variables are defined as strongly correlated variables.
In the definitions of these parameters, $\hat{y}_i$ is the value predicted by the calibration model, $y_i$ is the reference value, $\hat{y}_i^*$ is the value predicted by the cross-validation model, $\bar{y}$ is the mean of the reference values, $\bar{\hat{y}}$ is the mean of the predicted values, $n$ is the number of samples in the calibration or validation steps, and $m$ is the number of predicted samples.
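The original equations for these parameters are not reproduced here, so the sketch below uses the conventional definitions as an assumption: RMSE is the square root of the mean squared difference between predicted and reference values, and $R^2$ is one minus the ratio of the residual to the total sum of squares. Some chemometrics papers instead report the squared correlation between predicted and reference values, so the exact $R^2$ variant should be checked against the source. RMSEC, RMSECV, and RMSEP differ only in which predictions are supplied (calibration, cross-validation, or test set).

```python
import numpy as np


def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))


def r_squared(y_ref, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up reference and predicted values, for illustration only.
y_ref_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_ref_demo, y_pred_demo), r_squared(y_ref_demo, y_pred_demo))
```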

DWT and BOSS Coupling Algorithm
DWT produces a multi-scale representation of digital signals using a series of high- and low-pass filters to separate signals according to their frequencies in the wavelength space of the spectrum [38-40]. In this study, the wavelet decomposition coefficients were extracted first, and then separated into the wavelet high-frequency coefficients (HC) and the wavelet low-frequency coefficients (LC). Finally, the LC and the zeroed HC were combined to establish the wavelet reconstruction matrix.
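The reconstruction step can be illustrated with a deliberately simplified sketch. The study used the db4 wavelet in MATLAB; to keep the example self-contained, the code below substitutes the Haar wavelet, for which keeping the low-frequency coefficients and zeroing all high-frequency coefficients is easy to write out by hand. The function name is hypothetical, and the numerical output will differ from a db4 reconstruction.

```python
import numpy as np


def haar_lowpass_reconstruction(signal, level):
    """Decompose with the Haar wavelet, zero all detail (high-frequency)
    coefficients, and reconstruct from the approximation only."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    # Pad by repeating the last value so the length divides 2**level.
    pad = (-n) % (2 ** level)
    approx = np.pad(x, (0, pad), mode="edge")

    # Forward transform: keep only the low-pass branch at each level.
    for _ in range(level):
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)

    # Inverse transform with zero details: each coefficient maps to two
    # equal samples, a / sqrt(2).
    recon = approx
    for _ in range(level):
        recon = np.repeat(recon, 2) / np.sqrt(2.0)
    return recon[:n]


demo = [1.0, 3.0, 5.0, 7.0]
layer1 = haar_lowpass_reconstruction(demo, level=1)
layer2 = haar_lowpass_reconstruction(demo, level=2)
```

With Haar, each additional level averages over blocks twice as long, which mirrors the progressive smoothing seen when more layers are decomposed.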
BOSS is a method that uses collinearity to select effective features and uses the information in the regression coefficients to flexibly shrink the variable space toward the information of interest. The BOSS algorithm uses bootstrap sampling (BBS) and weighted bootstrap sampling (WBS) to generate random combinations of variables and sub-models, and combines model population analysis (MPA) with the PLS algorithm to extract effective information from the sub-models [12].
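Two of the building blocks named above can be sketched in a few lines. The code is based on our reading of the published description of BOSS and is an assumption rather than the implementation used in this study: the regression vectors of the best sub-models are taken in absolute value, scaled to unit length, and summed to give new variable weights, and the next variable subsets are drawn by sampling with replacement according to those weights. Fitting the PLS sub-models and ranking them by RMSECV are omitted, and both function names are hypothetical.

```python
import numpy as np


def update_weights(best_coefs):
    """best_coefs: (n_models, n_vars) regression vectors of the best
    sub-models, with zeros for variables not used by a sub-model."""
    b = np.abs(np.asarray(best_coefs, dtype=float))
    norms = np.linalg.norm(b, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against all-zero vectors
    summed = (b / norms).sum(axis=0)
    return summed / summed.sum()     # weights sum to 1


def weighted_bootstrap(weights, n_draws, rng):
    """Draw variables with replacement and keep the unique indices."""
    draws = rng.choice(len(weights), size=n_draws, replace=True, p=weights)
    return np.unique(draws)


# Two made-up regression vectors over three variables.
w_demo = update_weights([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]])
subset_demo = weighted_bootstrap(w_demo, n_draws=5,
                                 rng=np.random.default_rng(0))
```

Because duplicates are discarded after each draw, variables with small weights tend to drop out over successive iterations, which is the "soft shrinkage" behaviour.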
In this paper, the DWT-BOSS algorithm is proposed by coupling DWT with BOSS to obtain the optimal band to establish the PLS prediction model. The process is as follows (shown in Figure 2).
(1) Obtain the maximum decomposition layer (L(Max)) of the wavelet transform. First observe the trend in the spectral curves after decomposition, then select the maximum number of layers according to the correlation coefficients.
(2) Use the BOSS algorithm to optimize the effective variables of each reconstructed spectrum from L1 to L(Max) (L1-L(Max)); the optimal variable set is obtained by superimposing the preferable variables of L1-L(Max). L1 is defined as the first decomposition layer.

Correlation Coefficient (R) Method
The correlation coefficient method involves obtaining the correlation coefficient from the unknown sample and the reference sample to judge whether the unknown sample and the reference sample are consistent for a certain property. The higher the similarity, the closer the R value is to 1. The formula is as follows:

$$R = \frac{\mathrm{Cov}(y_1, y_2)}{\delta_{y_1}\,\delta_{y_2}}$$

where $y_1$ is the absorbance corresponding to each wavelength point, $y_2$ is the water content, $\delta_{y_1}$ is the standard deviation of $y_1$, $\delta_{y_2}$ is the standard deviation of $y_2$, and Cov is the covariance.
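Applied to a spectral matrix, this coefficient is computed once per wavelength point and then compared with a threshold. The sketch below illustrates that step under the assumption that rows are samples and columns are wavelength points; the function names and the toy data are ours.

```python
import numpy as np


def wavelength_correlations(spectra, moisture):
    """Correlation coefficient between each wavelength column and moisture."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(moisture, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    cov = Xc.T @ yc / (len(y) - 1)   # sample covariance per column
    sx = X.std(axis=0, ddof=1)       # a constant column would give sx = 0
    sy = y.std(ddof=1)
    return cov / (sx * sy)


def count_above(r, threshold=0.8):
    """Number of wavelength points whose R exceeds the threshold."""
    return int(np.sum(np.asarray(r) > threshold))


# Toy data: column 0 follows y, column 1 mirrors it, column 2 is unrelated.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
X_toy = np.column_stack([y_toy, -y_toy, [1.0, -1.0, -1.0, 1.0]])
r_toy = wavelength_correlations(X_toy, y_toy)
```

Only the first column passes a threshold of 0.8 here, since the comparison uses R itself rather than its absolute value, as in the text.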

Genetic Algorithm
The GA is a global optimization method that can solve problems efficiently for which there are many possible solutions, such as variable selection. The core steps of GA are analogous to the process of Darwinian evolution, in which individuals are selected for the next generation through crossover, mutation, and survival of the fittest until a specific stopping criterion is reached [16]. The main GA parameters were set as follows: population size of 32, window width of 10, maximum generation of 100, and mutation rate of 0.005 in MATLAB R2016a (MathWorks, Natick, MA, USA).

iPLS
The iPLS method is a wavelength interval selection method. The method functions by dividing the whole spectrum into several intervals, and then expanding or decreasing the wavelength variables by the center of the interval [18]. The modeling setting of the iPLS method was as follows: the number of intervals was set to 20 in MATLAB R2016a (MathWorks, Natick, MA, USA).
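The interval search can be sketched as follows. This is not the MATLAB routine used in the study: it only covers the first stage (one local PLS model per interval, ranked by RMSECV), uses a compact NIPALS PLS1 written for the example, and assigns cross-validation folds in a simple interleaved pattern. All function names, the fold scheme, and the synthetic data are assumptions.

```python
import numpy as np


def pls1_coefficients(X, y, n_comp):
    """NIPALS PLS1 on mean-centered data; returns the regression vector."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:             # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        c = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - c * t              # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def rmsecv(X, y, n_comp, n_folds=5):
    """Cross-validated RMSE with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    pred = np.empty(len(y))
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        b = pls1_coefficients(X[tr], y[tr], n_comp)
        pred[te] = (X[te] - X[tr].mean(axis=0)) @ b + y[tr].mean()
    return float(np.sqrt(np.mean((pred - y) ** 2)))


def ipls(X, y, n_intervals, n_comp):
    """Fit one local PLS model per interval and return their RMSECV values."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    errors = [rmsecv(X[:, idx], y, min(n_comp, len(idx))) for idx in intervals]
    return intervals, np.array(errors)


# Synthetic data: y depends only on columns 5 and 6 (second interval).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(40, 20))
y_syn = X_syn[:, 5] + 0.5 * X_syn[:, 6]
intervals, errors = ipls(X_syn, y_syn, n_intervals=5, n_comp=4)
best_interval = int(np.argmin(errors))
```

On the synthetic data the interval containing the informative columns has by far the lowest RMSECV, which is the criterion used to mark preferred intervals.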

Wavelet Transform and Maximum Decomposition Layer
In this study, the db4 wavelet-generating function was used in MATLAB R2016a (MathWorks, Natick, MA, USA) to decompose the original spectrum into eight layers. The reconstructed signals of layers 1 to 8 are defined as L1 to L8 (L1-L8), respectively. L0 indicates the original spectrum.
Significant moisture absorption peaks appear around 1800 and 2400 nm, and weak peaks around 1200 and 2600 nm. There are three distinct areas of noise in L0, which were most obvious around 1600 (defined as noise1), 2200 (defined as noise2), and 2400 nm (defined as noise3) (Figure 3). Small burrs occurred in noise1, noise2, and noise3. Figure 4 shows the spectrum around 1600 nm, where L0 is visibly disordered. As the decomposition level increased, more of the high-frequency signal was removed and the noise weakened. As shown in Figure 4, when the original spectrum was decomposed into the fifth sub-layer, the spectral curves became smoother. The spectral details were gradually removed and the spectral curve gradually became smooth, so some absorption peaks representing the moisture characteristics of tea leaves disappeared. When the spectrum was decomposed into seven layers, the spectral curve was almost a straight line; in this case, the spectral data deviated considerably from the original data and large amounts of effective information were lost. The loss of effective information was more serious at L8. L7 and L8 therefore deviated markedly from L0-L6.

To further determine the appropriate maximum decomposition layer, the correlation coefficient method was used to measure the correlation between the spectral absorption and moisture characteristics of each wavelength point in the spectral matrix of L1-L8. By comparing the measured value (defined as R) with the threshold value, the preferable number of wavelength points was determined. In this study, the threshold value was set to 0.8. According to Figure 5, 259 points in L1 and L2 exceeded the threshold, 257 points in L3 and L4, 260 points in L5, 262 in L6, 175 in L7, and 199 in L8. Figure 5 shows that the numbers of points in L1-L6 passing the threshold were similar, stable at 260 ± 3. About 30% fewer points in L7 and L8 passed the threshold than in L1-L6, indicating that the moisture characteristics of the spectrum were gradually weakened. To ensure that enough moisture characteristics are preserved after the wavelet transform, the sixth decomposition layer was taken as the maximum decomposition layer: the reconstruction spectra of L1-L6 lost as few spectral details as possible while noise was removed relatively thoroughly. In the following, we used L1-L6 to replace the original spectrum.

An Optimal Variable Set Applicable to Moisture Characteristics of Tea Leaves
The BOSS method was used to optimize the variables of the different layers of the wavelet reconstruction matrix. As shown in Figure 6, the optimal set of each layer was roughly distributed around the moisture absorption peaks (1200, 1400, 2400, and 2600 nm). Due to the randomness of the variable selection, BOSS was repeated 30 times to reduce statistical errors. After these 30 runs, the 10 variables with the highest occurrence frequency in each layer were taken as the preferable variables of that layer. Combining the preferable variables of L1-L6 gave the optimal variable set V. As the number of decomposition layers increased, some moisture features were optimized and some irrelevant information was eliminated. Superimposing the preferable variables of L1-L6 gave 55 optimal variables in the regions of 800-1000, 1100-1400, 1500-1700, 1900-2000, and 2300-2600 nm. A considerable number of these variables lie in the ranges of 800-1100, 1200-1400, and 1700-2000 nm. These wavelength ranges are mainly associated with the fundamental frequency vibration of the free -OH group, as well as the combination and overtone vibration absorption.
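The aggregation described above (count how often each variable is selected across repeated runs, keep the most frequent ones per layer, then take the union over layers) can be sketched as follows. The function names, the tie-breaking rule (smaller index first), and the toy runs are assumptions made for illustration.

```python
from collections import Counter


def preferable_variables(runs, top_k=10):
    """Variables selected most often across repeated runs of one layer."""
    counts = Counter(v for run in runs for v in set(run))
    # Sort by descending frequency, then by index to break ties.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:top_k]


def optimal_variable_set(per_layer_runs, top_k=10):
    """Union of the preferable variables of every layer."""
    chosen = set()
    for runs in per_layer_runs:
        chosen.update(preferable_variables(runs, top_k))
    return sorted(chosen)


# Toy example: two layers, three runs each, keeping the top 2 per layer.
runs_l1 = [[1, 2, 3], [2, 3], [3]]
runs_l2 = [[3, 4], [4, 5], [4]]
v_demo = optimal_variable_set([runs_l1, runs_l2], top_k=2)
```

Because the same variable can be preferred in more than one layer, the union is smaller than the sum of the per-layer lists; this is presumably why six layers with 10 variables each yield 55 rather than 60 variables.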

Figure 6. The variables selected by the DWT-BOSS coupling algorithm. Note: L1 refers to the reconstructed spectral average after one-layer wavelet transform, L2 to L6 are analogous to L1. V is the optimal variable set. Different graphs represent different levels of decomposition.

Establishment and Verification of the PLS Model Based on an Optimal Variable Set
The optimal variable set selected by DWT-BOSS was the independent variable of the tea leaves' moisture content prediction model, and the corresponding moisture content was the dependent variable. A moisture content prediction model was constructed for each layer (defined as the L(i)-BOSS-PLS model, i = 1-6). Because the procedure involves random numbers, each model was run 30 times to verify its reliability; in other words, 30 models were obtained for each layer. The optimal variable sets of L1-L6 were modeled separately, and the full-spectrum L0 model was introduced for comparison. The V-PLS model, built on the optimal variable set V, was constructed to explore the model's accuracy and stability. Comparing the L(i)-BOSS-PLS models with the V-PLS model, we concluded that the accuracy and stability of the PLS models were improved. According to Table 1, the V-PLS model has the highest accuracy, with an $R^2_c$ of 0.9410, RMSEC of 0.2404, $R^2_{cv}$ of 0.9171, RMSECV of 0.2851, $R^2_p$ of 0.9513, and RMSEP of 0.2236. In general, the L(i)-BOSS-PLS models performed better than the L0-PLS model, with higher prediction accuracy and better model stability. The results obtained by running the program 30 times were within a reasonable range. DWT-BOSS considerably reduces the amount of modeling computation and effectively improves the prediction ability of the model. In the V-PLS model established with the optimal variable set, 55 variables were selected from 3112 variables for modeling, which greatly reduced the modeling time and improved the model accuracy. The method provides a reference for selecting key bands in the near-infrared spectrum of Yinghong No. 9 tea leaves and for the inversion of moisture content in other tea leaves.

Two Classical Methods Introduced to Establish PLS Models
To validate the prediction accuracy and stability of the prediction model, two classical algorithms for selecting variables from the near-infrared spectrum were introduced for comparison with the proposed DWT-BOSS selection algorithm. The two classical variable selection methods are GA and iPLS.

GA-PLS Prediction Model Built for Comparison with the Proposed Model
The main GA parameters were set as follows: population size of 32, window width of 10, maximum of 100 generations, and mutation rate of 0.005. Because the GA is stochastic, the modeling was repeated 30 times and the best result was retained. As shown in Figure 7, the bands above the red dotted line were selected, for a total of 870 bands. As shown in Table 2, $R_c^2$ was 0.9318, RMSEC was 0.2617, $R_{cv}^2$ was 0.8908, RMSECV was 0.3287, $R_p^2$ was 0.9420, and RMSEP was 0.2421. The full-spectrum data are complex and contain redundant information and noise; by applying the survival-of-the-fittest rule, the GA retained the bands more closely related to the moisture characteristics, and the result improved on that of the original spectrum. However, compared with the DWT-BOSS algorithm, the GA result still contained redundant wavebands; the ratio of the numbers of selected variables was about 1:16 (V:GA).

The iPLS method was configured with a moving window and an interval size of 20. In total, 280 bands were selected from 14 intervals, which were located near 850, 1200, 1350, 1600, 1800, 2200, 2400, 2500, and 2600 nm. As shown in Figure 8, the 280 selected variables were used to establish the iPLS model, for which $R_c^2$ was 0.9294, RMSEC was 0.2713, $R_{cv}^2$ was 0.9021, RMSECV was 0.3088, $R_p^2$ was 0.9232, and RMSEP was 0.2838. By selecting intervals, the bands more strongly correlated with the moisture characteristics were obtained, which increased the accuracy and stability of the model. However, the iPLS model performed slightly worse than the DWT-BOSS-PLS model because the selected variables still included redundant bands.

Figure 8. Cross-validation prediction errors (RMSECV) of several interval models (bars) and full-spectrum models (solid lines). Note: The preferred interval is given in black.

As shown in Table 2, among the three wavelength selection methods, the DWT-BOSS algorithm performed the best. The PLS model established using the DWT-BOSS algorithm not only had the best stability and prediction ability but also used the fewest wavelength points. In summary, the ranking by number of selected variables was DWT-BOSS < iPLS < GA, whereas the ranking by prediction accuracy was DWT-BOSS > GA > iPLS.
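The interval selection idea behind iPLS can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes contiguous, non-overlapping intervals, a PLS model with a single latent variable, a simple k-fold split, and synthetic data in place of the tea spectra. All function names (`split_intervals`, `pls1_predict`, `interval_rmsecv`, `rank_intervals`) are hypothetical.

```python
import math
import random


def split_intervals(n_vars, width):
    """Return (start, stop) index pairs covering n_vars in chunks of `width`."""
    return [(s, min(s + width, n_vars)) for s in range(0, n_vars, width)]


def pls1_predict(X_train, y_train, X_test):
    """Fit a PLS1 model with ONE latent variable (a simplification) and predict."""
    n, p = len(X_train), len(X_train[0])
    x_mean = [sum(row[j] for row in X_train) / n for j in range(p)]
    y_mean = sum(y_train) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X_train]
    yc = [v - y_mean for v in y_train]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores of the training samples and regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)
    return [
        y_mean + q * sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        for row in X_test
    ]


def interval_rmsecv(X, y, start, stop, n_folds=5):
    """Cross-validated RMSE of a model built only on columns start..stop-1."""
    n = len(y)
    squared_error = 0.0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i][start:stop] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i][start:stop] for i in test_idx]
        preds = pls1_predict(X_tr, y_tr, X_te)
        squared_error += sum((pred - y[i]) ** 2 for pred, i in zip(preds, test_idx))
    return math.sqrt(squared_error / n)


def rank_intervals(X, y, width=20, n_folds=5):
    """Return (RMSECV, (start, stop)) pairs sorted from best to worst."""
    return sorted(
        (interval_rmsecv(X, y, s, e, n_folds), (s, e))
        for s, e in split_intervals(len(X[0]), width)
    )


# Synthetic demonstration: only columns 20..39 carry the "moisture" signal.
rng = random.Random(0)
moisture = [rng.uniform(3.0, 7.0) for _ in range(40)]
spectra = []
for m in moisture:
    row = [rng.gauss(0.0, 1.0) for _ in range(60)]
    for j in range(20, 40):
        row[j] = m + rng.gauss(0.0, 0.05)
    spectra.append(row)

ranking = rank_intervals(spectra, moisture, width=20)
best_rmsecv, best_interval = ranking[0]
print("best interval:", best_interval, "RMSECV:", round(best_rmsecv, 3))
```

In this toy setting the interval with the lowest RMSECV is the one that contains the informative columns, which mirrors how the preferred intervals in Figure 8 are identified. A real application would tune the number of latent variables and could combine several intervals.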

Conclusions
In this study, a novel variable selection algorithm based on DWT and BOSS was employed to select the optimal variable set for predicting the moisture content of tea leaves of the Yinghong No. 9 variety. After the optimal variables were selected, a PLS prediction model was built, and the predictive performance of this algorithm for the moisture content of tea leaves was explored. The conclusions and contributions of this research are summarized as follows:
(1) In the DWT process, much of the noise was removed. The maximum decomposition level was determined by the correlation coefficient method and was found to be six. In general, the moisture-related spectrum of L6 was denoised but retained effective information.
(2) BOSS selected the effective information from the spectrum denoised by DWT. By superimposing layers L1-L6 optimized by the BOSS algorithm, 55 optimal variables were obtained as the optimal variable set V. A considerable number of these variables fell within the ranges of 800-1100, 1200-1400, and 1700-2000 nm. These wavelength ranges are mainly associated with -OH groups and with the vibrational absorption of combination bands and overtones. The number of variables dropped from 3112 to 55 using BOSS.
(3) Compared with full-spectrum modeling, DWT-BOSS-PLS had higher modeling and prediction accuracy, with $R_c^2$ of 0.9410, RMSEC of 0.2404, $R_{cv}^2$ of 0.9171, RMSECV of 0.2851, $R_p^2$ of 0.9513, and RMSEP of 0.2236. The GA and iPLS algorithms were also compared with the proposed DWT-BOSS method; the DWT-BOSS results had higher stability and accuracy while using fewer bands.
(4) We proposed a novel prediction model that is robust and effective for forecasting the moisture content of Yinghong No. 9 tea leaves.
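The figures of merit quoted in item (3) can be computed from measured and predicted moisture values with a short sketch. It assumes the common definitions, root mean square error and the coefficient of determination $1 - SS_{res}/SS_{tot}$, which may differ in detail from the formulas used in this study. The function names and the sample values are hypothetical and for illustration only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not data from this study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print("RMSE:", round(rmse(measured, predicted), 4))
print("R^2:", round(r_squared(measured, predicted), 4))
```

Applying the same two functions to the calibration, cross-validation, and prediction sets would give the three pairs of statistics (RMSEC, RMSECV, and RMSEP with their corresponding $R^2$ values).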
However, several obstacles still hinder the wide application of NIRS technology in tea making, such as the high cost of machinery and equipment, the training required to use NIRS technology, and production line design. Thus, NIRS technology needs to be popularized in tea processing factories. Furthermore, spectral equipment needs to be suitably designed for production use, and entrepreneurs need to be willing to pay for the technological transformation.