Quantification of Water, Protein and Soluble Sugar in Mulberry Leaves Using a Handheld Near-Infrared Spectrometer and Multivariate Analysis

Mulberry (Morus alba L.) leaves are used not only as the main feed for silkworms (Bombyx mori) but also as a supplementary feed for livestock and poultry. To rapidly select high-quality mulberry leaves, a handheld near-infrared (NIR) spectrometer combined with partial least squares (PLS) regression and wavelength optimization methods was used to establish predictive models for the quantitative determination of water content in fresh mulberry leaves, as well as crude protein and soluble sugar in dried mulberry leaves. For the water content in fresh mulberry leaves, the R-square values of the calibration set (R²C), cross-validation set (R²CV) and prediction set (R²P) are 0.93, 0.90 and 0.91, respectively, and the corresponding root mean square errors of calibration (RMSEC), cross-validation (RMSECV) and prediction (RMSEP) are 0.96%, 1.13%, and 1.18%, respectively. The R²C, R²CV and R²P of the crude protein prediction model are 0.91, 0.83 and 0.92, respectively, and the corresponding RMSEC, RMSECV and RMSEP are 0.71%, 0.97% and 0.61%, respectively. The soluble sugar prediction model has R²C, R²CV, and R²P of 0.64, 0.51, and 0.71, respectively, and the corresponding RMSEC, RMSECV, and RMSEP are 2.33%, 2.73%, and 2.36%, respectively. Therefore, a handheld NIR spectrometer combined with wavelength optimization can rapidly determine the water content in fresh mulberry leaves and the crude protein in dried mulberry leaves. However, the predictive performance for soluble sugar in mulberry leaves is somewhat lower.


Introduction
Mulberry is a perennial woody plant, and its leaves contain a variety of nutrients, such as proteins, soluble sugars, and fat, which are essential for the growth and development of silkworms [1]. The amino acids (the building blocks of proteins) in mulberry leaves are abundant and well balanced: essential and semi-essential amino acids account for more than half of the total amino acids, and the contents of methionine and lysine are higher than those of conventional feed. In addition, mulberry leaves are palatable, highly digestible, and can be fed without difficulty. They also contain a variety of biologically active substances that improve immunity and have anti-inflammatory and antioxidant effects [2,3].
In the traditional sericulture industry, mulberry leaves are mainly used as feed for silkworms [4]. In recent years, their use has been extended from silkworm feed to livestock feed. Islam [5] confirmed that adding mulberry leaf meal to broiler feed could lower the birds' cholesterol and reduce production costs. Zhu [6] added 15% mulberry leaf powder to the diet of finishing pigs and found that it changed the muscle fiber properties, enhancing antioxidant capacity and increasing intramuscular fat, which improved meat quality. At present, the demand for mulberry leaves in the livestock industry is increasing [7]. However, differences in growth environment and field management can lead to differences in mulberry leaf quality [8]. To obtain high-quality mulberry leaves, a fast, simple, and effective method for determining their composition is urgently needed to replace conventional wet-chemistry methods, which are time-consuming, costly, and inconvenient.
NIR spectroscopy is widely used in food analysis because of its rapid detection, low analysis cost, and excellent reproducibility [9][10][11]. Toledo-Martín et al. [12] used NIR spectroscopy and modified partial least squares (MPLS) to establish a regression model for the fast determination of total phenolic content (TPC) and total carotenoid content (TCC) in blackberry. The ratio of the standard deviation to the standard error of prediction (RPD) and the ratio of the range to the standard error of prediction (RER) of the TPC model were 1.52 and 5.92, respectively, and the RPD and RER of the TCC model were 1.82 and 8.63, respectively. The results show that NIR spectroscopy can be used to detect these substances in blackberries. In recent years, NIR spectroscopy has also been used in the feed industry [13][14][15]. Swart et al. [16] successfully used NIR spectroscopy to determine dry matter (DM), ash, crude protein (CP), ether extract (EE), crude fiber (CF), acid detergent fibre (ADF), neutral detergent fibre (NDF), gross energy (GE), calcium (Ca), phosphorus (P), etc., achieving a rapid, non-destructive quantitative analysis of nutrients in mixed ostrich rations. Tahir et al. [17] used NIR reflectance spectroscopy to achieve accurate estimates of total and phytate phosphorus in poultry feed.
The recent progress in miniaturization, which has taken advantage of new micro-technologies such as micro-electro-mechanical systems (MEMS), micro-opto-electro-mechanical systems (MOEMS) and micro-mirror arrays or linear variable filters (LVFs), has led to a drastic reduction of spectrometer size while maintaining excellent performance owing to the high-precision implementation of essential elements in the final device, which greatly facilitates on-site and real-time detection [18]. Neve et al. [19] applied handheld NIR instruments to record the NIR spectra of a variety of pasta sauce blends and established six models for nutritional parameters: energy, protein, fat, carbohydrate, sugar and fiber. The experimental results show the feasibility of using handheld NIR spectroscopy to predict dietary nutrition parameters.
Thus, based on a handheld NIR spectrometer, this study established rapid analysis methods for the determination of water content in fresh mulberry leaves, and of crude protein and soluble sugar in dry mulberry leaves. Three wavelength optimization methods, namely uninformative variable elimination (UVE) [14], competitive adaptive reweighted sampling (CARS) [20], and random frog (RF) [21], were used to select highly informative wavelength variables and improve the determination accuracy.

Spectral Characteristics
The raw NIR spectra of fresh mulberry leaves and dry mulberry leaves are shown in Figure 1; the overall trend of the spectral curves is similar. The spectra show a weak overtone absorption peak of the C-H bond at 1190 nm and a distinct -OH absorption peak at 1440 nm. The -OH absorption peak is stronger in the spectra of fresh mulberry leaves than in those of dry mulberry leaves, mainly because of the higher water content of fresh leaves.

Reference Values
The statistics of water content, crude protein, and soluble sugar in mulberry leaves are shown in Table 1. The ranges of water, crude protein, and soluble sugar in mulberry leaves were 60.44~78.46%, 11.10~23.50%, and 8.47~31.01%, and the averages ± standard deviations were 68.24 ± 3.75%, 17.41 ± 2.27%, and 19.97 ± 3.92%, respectively. The range, average, and standard deviation of the calibration set and the unknown sample set are close, indicating that the samples are representative, so the constructed model should be applicable in practice.

Spectral Pretreatment
Different methods were used to pretreat the spectral data. The results are shown in Table 2; the pretreatment significantly affected the prediction accuracy of the models. The combination of first-order derivative (1st Der) + standard normal variate (SNV) + autoscaling gave the best modeling results (Figure 2). As shown in Table 2, for the water content, with the optimal number of factors of seven, the R²C and R²CV are 0.92 and 0.90, respectively, and the corresponding RMSEC and RMSECV are 1.00% and 1.17%, respectively. With eight factors for the protein content, the R²C and R²CV are 0.90 and 0.83, respectively, and the corresponding RMSEC and RMSECV are 0.74% and 0.97%, respectively. For soluble sugars, with the optimal number of factors of seven, the R²C and R²CV are 0.60 and 0.45, respectively, and the RMSEC and RMSECV are 2.45% and 2.90%, respectively. Pretreating the raw spectra improves the prediction accuracy of the model because SNV corrects the scattering caused by sample roughness and uneven particle size, the first derivative removes baseline drift and background interference and thereby improves resolution, and autoscaling enhances the differences between spectral data [15].

Wavelength Optimization
Figure 3 shows diagrams of the wavelength variable screening for the water content of mulberry leaves. For CARS (Figure 3a), the first graph shows the number of selected wavelength variables as a function of the number of sampling runs. As the number of sampling runs increases, the number of selected wavelength variables decreases, quickly at first and then slowly. The second graph shows the change in RMSECV. Up to the seventh sampling run, RMSECV gradually decreased, indicating that wavelength variables unrelated to the moisture content of mulberry leaves were being eliminated. After the seventh run, RMSECV gradually increased, indicating that essential wavelength variables related to the moisture content of mulberry leaves were being eliminated. The third graph shows the trend of the regression coefficient of each wavelength variable during the screening process. The position of "*" in the figure corresponds to the minimum value of RMSECV. The colored lines indicate the trend of the regression coefficient of each wavelength variable, which increases as the number of sampling runs increases.
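The fast-then-slow decline in the number of retained variables can be illustrated with the exponentially decreasing function commonly described for CARS. The sketch below is a minimal illustration, not the libPLS implementation used in this work. The function names and the choice of 50 sampling runs are assumptions; 125 is the number of full-spectrum wavelength variables reported in this study. Only the forced-reduction schedule is shown, and the adaptive reweighted sampling and PLS fitting steps are omitted.

```python
import math


def cars_keep_ratio(run, n_runs, n_vars):
    """Fraction of variables kept at sampling run `run` (1-based).

    Exponentially decreasing function r_i = a * exp(-k * i), with a and k
    chosen so that all variables are kept at run 1 and two at the last run.
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return a * math.exp(-k * run)


def cars_schedule(n_runs, n_vars):
    """Number of variables retained after each sampling run."""
    return [
        max(2, round(n_vars * cars_keep_ratio(i, n_runs, n_vars)))
        for i in range(1, n_runs + 1)
    ]


# 125 variables as in this study; 50 runs is an assumed setting.
schedule = cars_schedule(n_runs=50, n_vars=125)
print(schedule[:5], "...", schedule[-5:])
```

In a full CARS run, the RMSECV of a PLS model would be evaluated on the retained subset at every run, and the subset with the lowest RMSECV would be kept.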

For UVE wavelength selection (Figure 3b), the left part of the abscissa holds the actual spectral wavelength variables, and the right part holds the noise variables generated by random noise simulation. The ordinate gives the stability of each variable, and the two horizontal dashed lines mark the stability thresholds used to select the actual spectral wavelength variables. Wavelength variables whose stability values lie within the threshold range did not participate in PLS modeling. Variables whose stability values lie outside the threshold range were considered informative for the water content of mulberry leaves and were selected for PLS modeling.
For RF (Figure 3c), the ordinate is the selection probability of each wavelength variable. The 50 wavelength variables with the highest selection probabilities were selected for PLS modeling.
The selected wavelength variables are shown in Figure 4. For the determination of water content in fresh mulberry leaves, it is interesting that the selected variables do not lie on the water absorption peak itself; instead, variables on its shoulder were selected. For protein, only a few variables were selected. For soluble sugar, the selected variables are similar to those for water, possibly because soluble sugars contain many -OH groups.
The number of factors has a significant impact on the predictive ability of the models. With too few factors, the model does not capture the characteristics of the substance, which leads to low prediction accuracy. Too many factors lead to over-fitting: the calibration accuracy appears high, but the prediction performance on unknown samples is weak. In this work, leave-one-out cross-validation was applied to obtain the optimal number of factors. The results are shown in Table 3.
For the water content in fresh leaves, the RF method performed best in selecting wavelength variables, and the number of wavelength variables was reduced from 125 to 50 (Figure 4a). The RMSEC and RMSECV are 0.96% and 1.13%, respectively, which are 4.00% and 3.42% lower, in relative terms, than those of the model built with all wavelength variables. The corresponding R²C and R²CV are 0.93 and 0.90, respectively, and the cross-validation relative analysis error (RPDCV) is 3.25. The low RMSEC and RMSECV values and the high R²C and R²CV values indicate the high prediction accuracy of the model. Furthermore, the calibration and cross-validation values are similar, which demonstrates that the model is robust. Therefore, the model can be used accurately and reliably to predict the water content of unknown mulberry leaves.
Molecules 2019, 24, 4439
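The relative reductions quoted for the water model can be checked with a short calculation. The sketch below assumes the percentages are computed relative to the full-spectrum values reported in the Spectral Pretreatment results (RMSEC 1.00% and RMSECV 1.17%); the function name is illustrative.

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Full-spectrum model vs. RF-selected model for water content.
rmsec_drop = relative_reduction(1.00, 0.96)
rmsecv_drop = relative_reduction(1.17, 1.13)
print(f"RMSEC reduced by {rmsec_drop:.2f}%, RMSECV reduced by {rmsecv_drop:.2f}%")
```

Under this assumption the calculation reproduces the reported 4.00% and 3.42%.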
For crude protein, the CARS, UVE, and RF methods all improved the prediction accuracy of the model, among which CARS was the best (Figure 4b). With the optimal number of factors of 9, the R²C and R²CV are 0.91 and 0.83, respectively, the corresponding RMSEC and RMSECV are 0.71% and 0.97%, respectively, and the RPDCV is 2.43, indicating that the model can predict the crude protein of mulberry leaves. However, the difference between the RMSEC and RMSECV values indicates that the model is less robust.
For soluble sugar, 60 spectral wavelength variables were selected to establish the PLS model (Figure 4c). With the optimal number of factors of 8, the RMSEC and RMSECV are 2.33% and 2.73%, respectively, which are 4.90% and 5.86% lower, in relative terms, than those of the model built with all wavelength variables. The R²C and R²CV are 0.64 and 0.51, respectively, increases of 6.25% and 13.33%, and the RPDCV is 1.43. Because RPDCV < 2.5, the predictive ability of this model is limited, and it can only be used for rough evaluation. Moreover, RMSEC and RMSECV, as well as R²C and R²CV, differ greatly from each other, reflecting the instability of the model.

Validation for Unknown Samples
The results of validating the models with unknown samples are shown in Table 3. Twenty-seven unknown samples were collected to verify the predictive power of the moisture model for mulberry leaves. The R²P and RMSEP were 0.91 and 1.18%, respectively, and the RPDP and RER were 3.43 and 15.21, respectively. The R²P is high and close to R²C and R²CV, the RMSEP is low and similar to RMSEC and RMSECV, RPDP > 3, and RER > 10, indicating that the model is accurate and robust. The absolute error range is −2.58~2.16%, and the relative error range is −3.28~3.16% (Table 4), indicating that the model predicts accurately. Figure 5a is a scatter plot of measured versus predicted values of mulberry leaf water content. The points lie close to the regression line, which indicates that the model has high prediction accuracy. Ni et al. [22] applied NIR spectroscopy and a stacked autoencoder combined with support vector regression to establish a prediction model for the moisture content of Masson pine seedling leaves. The R²C and R²P were 0.9946 and 0.9621, respectively, and the RMSEC and RMSEP were 0.1636 and 0.4249, respectively. That calibration outperforms the one in this study, mainly because the instrument used was a Fourier-transform NIR spectrometer with a wide spectral range. However, this kind of instrument is expensive and difficult to deploy widely.
Twenty-four samples were collected to assess the predictive performance of the crude protein model. The R²P and RMSEP are 0.92 and 0.61%, respectively, and the RPDP and RER are 3.34 and 16.11, respectively. The absolute error range is −1.31~1.36%, and the relative error range is −6.55~7.30% (Table 4). In Figure 5b, the measured and predicted values are close to the regression line. These results show that the established model can predict the crude protein of mulberry leaves, although its robustness and prediction accuracy are not as good as those of the water content model.
A test set of 22 samples was collected to validate the determination of soluble sugar in mulberry leaves. The R²P is 0.71, the RMSEP is 2.36%, and the RPDP and RER are 1.29 and 4.93, respectively (RPDP < 2.5, RER < 10). The absolute and relative error ranges between predicted and measured values are −3.04~3.33% and −14.81~21.09%, respectively (Table 4). In the scatter plot of measured versus predicted sugar content (Figure 5c), the points are not close to the regression line, which indicates that the predictive ability of the model is relatively weak, so it is difficult to predict the soluble sugar content of mulberry leaves accurately in practice. Quentin et al. [23] established a PLS prediction model for soluble sugar in Eucalyptus globulus leaves by NIR spectroscopy. The R² was 0.70 and the RMSEP > 2.3%, which is basically consistent with the results achieved in this work.
In this work, the models for the water content of fresh mulberry leaves are highly accurate, whereas the prediction accuracy for crude protein is not as high, because NIR spectra are particularly sensitive to water [18]. The prediction model for soluble sugar is not very effective, possibly because soluble sugars are chemically similar to other carbohydrates, such as polysaccharides and cellulose, which interfere with their NIR spectra.

Conclusions
A handheld NIR spectrometer combined with chemometric methods can quickly determine the moisture in fresh mulberry leaves, as well as the crude protein and soluble sugar content in dried mulberry leaves. The detection accuracy for water and protein content was high: the R²P values are 0.91 and 0.92, the RPDs are 3.43 and 3.34, and the RERs are 15.21 and 16.11, respectively. However, the accuracy for soluble sugar content is lower, with R²P, RPD, and RER of 0.71, 1.29, and 4.93, respectively. The developed method should be of great value for improving the quality of mulberry leaves used in animal feed.

Mulberry Leaves
Fresh mulberry leaves were collected from the Mulberry Resource Center of the Sericultural Research Institute, Chinese Academy of Agricultural Sciences (Zhenjiang, Jiangsu, China). Whole leaves at the seventh or eighth position on the mulberry branch were plucked for the calibration sets. The calibration sets for water content, crude protein, and soluble sugar contained 83, 77, and 80 samples, respectively.

NIR Spectra Collection
The NIR transflective spectra of fresh mulberry leaves were collected with a handheld NIR spectrometer (MicroNIR1700, JDSU, Santa Rosa, CA, USA). Spectra were collected from four points on each of the two superimposed samples (Figure 6a). Each point was measured three times, with the spectrometer rotated by 120° between measurements. A total of 12 spectra were averaged to give the final spectrum of a sample.
Fresh mulberry leaves were dried in an oven to constant weight at 60 °C, then pulverized and passed through a 60-mesh sieve to obtain mulberry leaf powder. As shown in Figure 6b, mulberry leaf powder was poured into the drum (the bottom of which is the window of the NIR spectrometer) to a height of 1.5~2 cm to collect the NIR diffuse reflectance spectrum. Each sample was measured three times, with the sample rotated by 120° between measurements, and the average of the three spectra was used as the final spectrum of the sample.
For spectral acquisition, the spectral range was 950~1650 nm, the spectral resolution was 12.5 nm (at 1000 nm), the number of scans was 50, and the integration time was 15 ms. A 99% Spectralon reflection standard (Labsphere, Inc., North Sutton, NH, USA) was used as the reference. All measurements were performed at room temperature and a relative humidity of 35-40%.

Reference Determination
The water content of fresh mulberry leaves was determined by the drying method at 105 °C. The crude protein and soluble sugar in dry mulberry leaves were determined by the Kjeldahl method [24] and the anthrone-sulfuric acid colorimetric method [25], respectively. Each component was subjected to three parallel determinations, and the average value was used as the final result.

Spectra Pretreatment
Collected NIR spectra contain not only the component information of the sample but also interference information such as stray light, baseline drift, background noise, etc., which can reduce the reliability and stability of the spectral model. This may be due to the rough surface of leaves, an abundance of veins on fresh mulberry leaves, and uneven mulberry leaf particles. In this work, spectral data were pretreated using different combinations of the 1st Der, SNV, mean center, and autoscaling to eliminate interfering information and to highlight spectral information.
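The best-performing pretreatment chain (1st Der + SNV + autoscaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the routine used in this work: the derivative is taken as a simple first difference (the study does not state how the derivative was computed), the steps are applied in the order listed, and the function names and toy absorbance values are hypothetical.

```python
import statistics


def first_derivative(spectrum):
    """First difference between adjacent wavelength points (assumed form)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def snv(spectrum):
    """Standard normal variate: center and scale each spectrum individually."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def autoscale(spectra):
    """Center each wavelength variable across samples and scale to unit SD."""
    columns = list(zip(*spectra))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in spectra
    ]


def pretreat(spectra):
    """Apply 1st derivative, then SNV per spectrum, then autoscaling."""
    return autoscale([snv(first_derivative(s)) for s in spectra])


# Toy absorbance values for three samples at five wavelengths (hypothetical).
raw = [
    [0.10, 0.12, 0.18, 0.30, 0.28],
    [0.20, 0.23, 0.31, 0.45, 0.41],
    [0.15, 0.16, 0.25, 0.33, 0.36],
]
treated = pretreat(raw)
```

Note that SNV acts on each spectrum (row), whereas autoscaling acts on each wavelength variable (column), so the means and standard deviations used for autoscaling have to be taken from the calibration set and reused for unknown samples.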

Wavelength Selection
Raw NIR spectra generally contain redundant information. Therefore, a prediction model built with all wavelength variables may be less accurate. Wavelength optimization extracts the characteristic wavelength variables of a component to establish a more reliable prediction model [26]. Commonly used characteristic wavelength screening algorithms include genetic algorithms [27], CARS, UVE, moving window, and RF. In this work, the UVE, CARS, and RF methods were used to improve the reliability and accuracy of the prediction model.
CARS selects wavelength variables based on the simple and effective "survival of the fittest" principle, retaining the combination of wavelength variables with the largest absolute regression coefficients in PLS regression [28]. UVE is a wavelength optimization method that uses the PLS regression coefficient b to eliminate uninformative variables from the spectral data [14]. RF is a more recent wavelength optimization method that iteratively builds models from a small number of wavelength variables [29,30]. The algorithm calculates the probability that each variable is selected, and wavelengths are ranked by this probability.
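The UVE selection rule can be sketched from the description given for Figure 3b: each variable receives a stability value, and real variables are kept only if their stability lies outside the band spanned by the artificial noise variables. The sketch below assumes the usual definition of stability as the mean of a variable's PLS regression coefficient over repeated cross-validation fits divided by its standard deviation. The coefficient matrix is hypothetical, and the PLS fitting and noise generation steps are omitted; this is not the toolbox routine used in this work.

```python
import statistics


def uve_select(coef_runs, n_real):
    """Select real variables whose stability exceeds the noise threshold.

    coef_runs: one row of regression coefficients per cross-validation fit;
               the first n_real columns are real wavelength variables and
               the remaining columns are appended noise variables.
    """
    columns = list(zip(*coef_runs))
    stability = [statistics.mean(c) / statistics.stdev(c) for c in columns]
    # Threshold: largest absolute stability among the noise variables.
    threshold = max(abs(s) for s in stability[n_real:])
    selected = [j for j in range(n_real) if abs(stability[j]) > threshold]
    return selected, stability, threshold


# Hypothetical coefficients: 3 real variables followed by 2 noise variables.
coef_runs = [
    [0.50, 0.02, -0.40, 0.01, -0.02],
    [0.52, -0.03, -0.38, -0.02, 0.01],
    [0.48, 0.01, -0.42, 0.02, 0.02],
    [0.51, -0.01, -0.41, 0.01, -0.02],
]
selected, stability, threshold = uve_select(coef_runs, n_real=3)
print(selected)
```

In this toy case the second real variable has a coefficient that fluctuates around zero, so its stability falls inside the noise band and it is discarded, while the first and third variables are retained.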

PLS Calibration
PLS is a linear regression modeling method for multiple independent variables versus multiple dependent variables [31][32][33]. It was used in calibration in this work.
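For a single response such as water content, PLS regression reduces to the PLS1 case, which can be sketched with the NIPALS algorithm. The code below is a minimal pure-Python illustration together with a leave-one-out RMSECV helper for choosing the number of factors. It is not the MATLAB routine used in this work, and the function names and toy data are assumptions.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS; stores means and per-factor (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                # y residual
    factors = []
    for _ in range(n_factors):
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        if tt < 1e-12:
            break
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "factors": factors}


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["factors"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat


def rmsecv_loo(X, y, n_factors):
    """Leave-one-out cross-validated RMSE for a given number of factors."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(X))


# Toy data: two variables and an exactly linear response (hypothetical).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1 + 2 * a + 3 * b for a, b in X]
for k in (1, 2):
    print(k, rmsecv_loo(X, y, k))
```

Comparing `rmsecv_loo` across candidate numbers of factors and taking the minimum mirrors the leave-one-out procedure described for selecting the optimal number of factors.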

Evaluation Method
The evaluation indicators for the model mainly include RMSEC, RMSECV, R²C, and R²CV. The smaller the values of RMSEC and RMSECV and the closer they are to each other, the higher the prediction accuracy and stability of the model. R² describes the correlation between two groups of variables. In the prediction model, the R² between the predicted and measured values ranges from 0 to 1; the closer R² is to 1, the closer the predicted values are to the actual values. RPD is the ratio of the SD to the RMSE for the prediction set [34]. The higher the RPD value, the better the predictive ability of the established model. RPD ≥ 3 indicates that the prediction model performs well and can be used for rapid analysis and detection of unknown samples. 2.5 < RPD < 3 indicates that the prediction model has general analytical ability and that its prediction accuracy needs to be improved. RPD < 2.5 indicates that the prediction model is difficult to use for rapid detection and analysis of unknown samples. RER is the ratio of the reference range of the prediction set to the RMSEP; it is similar in nature to RPD, and a value higher than 10 indicates that the prediction model is reliable.
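The indicators defined above can be written out as a short sketch. Several details are assumptions rather than statements from this work: R² is computed as the coefficient of determination (1 − SS_res/SS_tot), the SD is the sample standard deviation of the measured values, and an RPD of exactly 2.5, which the stated thresholds do not cover, is placed in the lowest band. The function names and example values are illustrative.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean = statistics.mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Ratio of the SD of the measured values to the RMSE."""
    return statistics.stdev(measured) / rmse(measured, predicted)


def rer(measured, predicted):
    """Ratio of the range of the measured values to the RMSE."""
    return (max(measured) - min(measured)) / rmse(measured, predicted)


def rpd_rating(value):
    """Map an RPD value to the three bands described in the text."""
    if value >= 3:
        return "good"
    if value > 2.5:
        return "general"
    return "difficult"


# Hypothetical measured and predicted water contents (%).
measured = [60.0, 65.0, 70.0, 75.0]
predicted = [61.0, 64.0, 71.0, 74.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
print(rpd(measured, predicted), rer(measured, predicted))
```

Applied to the values reported in this work, `rpd_rating(3.43)` places the water model in the "good" band and `rpd_rating(1.29)` places the soluble sugar model in the "difficult" band.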

Validation With Unknown Samples
Unknown mulberry leaves were collected to validate the prediction capability of built models for the water content, crude protein, and soluble sugar.

Software
UVE was run with the Chemoactbx toolbox, and the CARS and RF algorithms were run with the libPLS toolbox (http://www.libpls.net/) [28]; all computations were performed in MATLAB R2009 (MathWorks, Natick, MA, USA).