Improving the Selection of Vegetation Index Characteristic Wavelengths by Using the PROSPECT Model for Leaf Water Content Estimation

Abstract: Equivalent water thickness (EWT) is a major indicator for indirect monitoring of leaf water content in remote sensing. Many vegetation indices (VIs) have been proposed to estimate EWT based on passive or active reflectance spectra. However, the selection of the characteristic wavelengths of VIs is mainly based on statistical analysis for specific vegetation species. In this study, a characteristic wavelength selection algorithm based on the PROSPECT-5 model was proposed to obtain characteristic wavelengths of leaf biochemical parameters (leaf structure parameter (N), chlorophyll a + b content (Cab), carotenoid content (Car), EWT, and dry matter content (LMA)). The effect of combined characteristic wavelengths of EWT and different biochemical parameters on the accuracy of EWT estimation is discussed. Results demonstrate that the characteristic wavelengths of leaf structure parameter N exhibited the greatest influence on EWT estimation. Then, two optimal characteristic wavelengths (1089 and 1398 nm) are selected to build a new ratio VI (nRVI = R1089/R1398) for EWT estimation. Subsequently, the performance of the built nRVI and four optimal published VIs for EWT estimation is discussed by using two simulation datasets and three in situ datasets. Results demonstrated that the built nRVI exhibited better performance (R² = 0.9284, 0.8938, and 0.7766, and RMSE = 0.0013 cm, 0.0022 cm, and 0.0030 cm for the ANGERS, Leaf Optical Properties Experiment (LOPEX), and JR datasets, respectively) than the published VIs for EWT estimation. It is demonstrated that the built nRVI based on the characteristic wavelengths selected using the physical model exhibits desirable universality and stability in EWT estimation.


Introduction
Leaf water content (LWC) is a significant variable involved in physiological processes and drought stress of plants and is an influencing factor on short-term risk of fire [1][2][3][4][5]. Hence, LWC variation is a significant factor in estimating plant growth status and in providing guidance for agricultural water management [6][7][8].
Traditional methods for monitoring LWC are time-consuming, laborious, and destructive [9]. Remote sensing technology can be an effective method for quantitative assessment of a crop's water content from the leaf scale to the canopy scale [10]. In addition, accurately obtaining LWC is the foundation of evaluating canopy water content [11]. Because water absorbs at characteristic wavelengths in the near-infrared (NIR) (750-1300 nm) and short-wave infrared (SWIR) (1300-2500 nm), changes in LWC produce differences in the reflectance spectra. The reflectance spectrum can therefore be used to estimate LWC [12][13][14].
two optimal characteristic wavelengths to calculate the new ratio VI for EWT estimation; (3) to analyze the robustness of the new ratio VI and the four published VIs; (4) to validate the performance of the new ratio VI for EWT estimation by using two simulation datasets (without and with 2% random Gaussian noise) and three in situ datasets (ANGERS, Leaf Optical Properties Experiment (LOPEX), and JR) by comparing with the published VIs based on the GPR model.

Datasets
In this study, two simulation datasets generated by the PROSPECT model and three publicly available measured datasets (ANGERS, LOPEX, and JR) were used.

Simulation Datasets
The PROSPECT model is a radiative transfer model, developed from the plate model, that simulates the hemispherical reflectance and transmittance of broadleaf leaves in the 400-2500 nm spectral range [21]. It assumes that the leaf consists of N stacked homogeneous layers, with the upper layer denoted as the leaf surface, and that light is non-diffuse at the leaf surface and diffuse inside the leaf [23]. The optical properties of vegetation leaves depend mainly on the biochemical components of the leaves. The PROSPECT model is widely used for the simulation of leaf spectral information and corresponding biochemical components [7,38]. Different plant species and growth states under different environmental conditions can be simulated by varying the input parameters of the PROSPECT model within their ranges. The PROSPECT model includes five main input parameters: four biochemical parameters (Cab, Car, EWT, and LMA) and one leaf structure parameter (N). N represents the number of leaf layers and is generally entered as a continuous value to account for subtle variations in leaf structure.
The variation ranges of all input parameters in the PROSPECT model were set according to Féret et al. [32] and Sun et al. [39] (Table 1). To avoid unrealistic combinations of leaf biochemical parameters, the covariance between leaf traits should be considered when generating the simulated datasets. The combinations of leaf biochemical parameters were therefore drawn from a Gaussian distribution that takes the correlation matrix between the parameters into account [32]. Then, based on the PROSPECT forward model and the generated parameter combinations, two simulated datasets, together containing 2000 leaf spectra in the 400-2500 nm spectral range with 1 nm bandwidth, were generated: one synthetic dataset simulating random errors with 2% random Gaussian noise (n = 1000) [23] and another simulating the ideal state without noise (n = 1000).
Because synthetic datasets lack factors that arise in field measurements, experimental datasets were also used. As we wished to evaluate VIs for different types of datasets, three experimental datasets were utilized. The first dataset was the ANGERS Leaf Optical Properties dataset, which was built in Angers, France, in June 2003 by S. Jacquemoud et al. [23]. It comprises a total of 276 leaf samples from 43 different plant species. The second dataset was the Leaf Optical Properties Experiment (LOPEX) dataset, the first publicly available experimental dataset, established by the Joint Research Center of the European Commission in 1993 [40]. This dataset comprises a total of 320 leaf samples from 45 different plant species, which were sampled multiple times throughout their growth, and each sample consisted of five individual leaves. The last dataset was JR, which was measured in Jasper Ridge, California [41]. This dataset contains only 30 leaf samples after removing the spectral errors that occurred during the measurement.
For the JR dataset, reflectance measurements were averaged over approximately ten leaf samples from a single plant. In addition, different leaf samples were used for reflectance measurements and biochemical analysis [42]. All these factors may weaken the correlation between EWT and the reflectance spectra in the JR dataset and reduce its estimation accuracy. The spectral ranges of the ANGERS, LOPEX, and JR datasets are 400-2450 nm, 400-2500 nm, and 400-2498 nm, respectively. The leaf spectral sampling interval is 1 nm for the ANGERS and LOPEX datasets and 2 nm for the JR dataset (ANGERS and LOPEX datasets can be found at http://opticleaf.ipgp.fr/index.php?page=database (accessed on 9 December 2020)).
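For the simulated datasets described above, the step of drawing correlated parameter combinations can be illustrated with a minimal sketch. It assumes a multivariate Gaussian built from a Cholesky factor of the correlation matrix, with out-of-range draws rejected. All numbers (`MEANS`, `STDS`, `BOUNDS`, `CORR`) and the function names are hypothetical placeholders and are not the values of Table 1 or of Féret et al. [32]; the PROSPECT forward run itself is not shown.

```python
import math
import random

# Parameter order: N, Cab, Car, EWT, LMA.
# Illustrative placeholder values only; they are NOT taken from Table 1.
MEANS = [1.6, 40.0, 10.0, 0.030, 0.015]
STDS = [0.3, 15.0, 4.0, 0.012, 0.006]
BOUNDS = [(1.0, 3.0), (5.0, 100.0), (1.0, 25.0), (0.005, 0.07), (0.005, 0.03)]
# Hypothetical correlation matrix (Cab-Car and EWT-LMA positively correlated).
CORR = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.8, 0.0, 0.0],
    [0.0, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_leaf_parameters(n_samples, means, stds, corr, bounds, seed=0):
    """Draw correlated Gaussian parameter sets; draws outside the bounds are rejected."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    dim = len(means)
    samples = []
    while len(samples) < n_samples:
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlate the standard normal draws, then rescale to each parameter.
        draw = [
            means[i] + stds[i] * sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(dim)
        ]
        if all(lo <= value <= hi for value, (lo, hi) in zip(draw, bounds)):
            samples.append(draw)
    return samples


parameter_sets = sample_leaf_parameters(1000, MEANS, STDS, CORR, BOUNDS)
```

Each row of `parameter_sets` would then be passed to a PROSPECT forward model to obtain one simulated spectrum. Rejection keeps every draw inside its range at the cost of slightly distorting the Gaussian tails.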

Characteristic Wavelength Selection Algorithm
Characteristic wavelength selection is based on sensitivity analysis (SA) and correlation analysis between different bands. In this study, SA was applied to calculate the contributions of the different leaf biochemical parameters in the PROSPECT-5 model to leaf reflectance. Sobol's global SA was performed with the GSAT software tool in MATLAB R2015b (MathWorks, Inc.) [43,44]. For each model parameter, a ranking list of wavelengths in the range of 400-2500 nm was derived according to the sensitivity of reflectance at each wavelength to that parameter. In addition, the information redundancy between selected wavelengths was reduced by correlation analysis to improve the information validity of the selected wavelengths. The characteristic wavelength selection used the simulation dataset without noise [39].
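For reference, the standard Sobol indices for an input parameter $X_i$ and the reflectance $R_\lambda$ at wavelength $\lambda$ are given below. The text does not state whether the first-order or the total-order index was used for the ranking, so both are shown; the notation ($X_{\sim i}$ for all inputs except $X_i$) is introduced here for illustration.

```latex
S_i(\lambda) = \frac{\operatorname{Var}_{X_i}\!\left( \mathbb{E}_{X_{\sim i}}\!\left[ R_\lambda \mid X_i \right] \right)}{\operatorname{Var}(R_\lambda)},
\qquad
S_{T_i}(\lambda) = 1 - \frac{\operatorname{Var}_{X_{\sim i}}\!\left( \mathbb{E}_{X_i}\!\left[ R_\lambda \mid X_{\sim i} \right] \right)}{\operatorname{Var}(R_\lambda)}
```

The first-order index measures the share of reflectance variance explained by $X_i$ alone, whereas the total-order index also includes its interactions with the other parameters.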
The workflow of the characteristic wavelength selection algorithm is shown in Figure 1, and the details are as follows.
Step (1) A ranking list of the contribution rates/sensitivities (S) of the PROSPECT-5 input parameters (N, Cab, Car, EWT, and LMA) to leaf reflectance over the 400-2500 nm spectral range was obtained by SA, and a correlation coefficient matrix (C) between bands over the same spectral range was obtained by correlation analysis (Figure S1).
Step (2) The wavelength with the highest sensitivity was selected as the initial wavelength (W1) for each model parameter. The first n wavelengths in the ranked sensitivity list of each parameter were taken as the candidate wavelength set (Wc). The n value was set according to the sensitivity of each parameter to leaf reflectance (N: n = 400; Cab: n = 100; Car: n = 50; EWT: n = 300; LMA: n = 50).
Step (3) The wavelength in the candidate set with the smallest correlation coefficient with the initial wavelength was then selected (W2).
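Steps (1) to (3) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name, the toy sensitivities, and the toy correlation matrix are hypothetical, and taking the absolute value of the correlation coefficient in Step (3) is an assumption, since the text only says "smallest correlation coefficient".

```python
def select_characteristic_wavelengths(wavelengths, sensitivity, corr, n_candidates):
    """Return (W1, W2) for one model parameter.

    wavelengths:  band centres in nm
    sensitivity:  sensitivity of reflectance to the parameter, one value per band
    corr:         square matrix, corr[i][j] = correlation between bands i and j
    n_candidates: number of top-ranked bands forming the candidate set Wc
    """
    # Step 1: rank band indices by decreasing sensitivity.
    ranking = sorted(range(len(wavelengths)), key=lambda i: sensitivity[i], reverse=True)
    # Step 2: the most sensitive band is W1; the top-n bands form the candidate set.
    first = ranking[0]
    candidates = [i for i in ranking[:n_candidates] if i != first]
    if not candidates:
        raise ValueError("n_candidates must be at least 2")
    # Step 3: W2 is the candidate least correlated with W1 (absolute value assumed).
    second = min(candidates, key=lambda i: abs(corr[first][i]))
    return wavelengths[first], wavelengths[second]


# Toy five-band example with made-up numbers.
toy_wavelengths = [1000, 1100, 1200, 1300, 1400]
toy_sensitivity = [0.1, 0.9, 0.8, 0.7, 0.2]
toy_corr = [
    [1.00, 0.50, 0.40, 0.30, 0.20],
    [0.50, 1.00, 0.95, 0.60, 0.10],
    [0.40, 0.95, 1.00, 0.70, 0.30],
    [0.30, 0.60, 0.70, 1.00, 0.50],
    [0.20, 0.10, 0.30, 0.50, 1.00],
]
w1, w2 = select_characteristic_wavelengths(toy_wavelengths, toy_sensitivity, toy_corr, 3)
```

In the toy example, the size of the candidate set changes the outcome: a small set keeps W2 among highly sensitive bands, whereas a larger set admits less sensitive but less redundant bands. This is the trade-off controlled by n in Step (2).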

VIs Correlation Analysis and Robustness Analysis
Many VIs have been proposed and utilized to monitor leaf biochemical parameters in previous literature [45][46][47]. In this study, the new ratio VI (nRVI) was constructed by selecting the two characteristic wavelengths that are suitable for EWT estimation based on the characteristic wavelength selection algorithm. To compare the performance of nRVI for EWT estimation, four optimal published VIs were selected based on different datasets by using the statistical model with the highest coefficient of determination (R²) (Table 2). To evaluate the correlation between the selected published VIs, nRVI, and EWT, correlation analysis was performed using the five datasets. Constructing a suitable VI for EWT estimation requires not only a strong correlation with the target parameter EWT but also robustness to other disturbances [48,49]. To assess the robustness of the VIs to interfering factors, the effects of wide-range variations of N, LMA, and spectral noise on the EWT estimated by the VIs were simulated separately using the PROSPECT model.

Gaussian Process Regression
Different machine learning regression algorithms (MLRAs) have been applied to RTM-based estimation, for instance, artificial neural network (ANN), support vector regression (SVR), random forest (RF), and Gaussian process regression (GPR) [50][51][52][53]. Several studies have shown that GPR is the most accurate among these MLRAs [50,54,55]. GPR is a non-parametric model that places a Gaussian process prior on the regression function and assumes that the training samples are drawn from this Gaussian process. GPR is essentially solved by Bayesian inference, and the kernel function strongly influences the model estimates. Indeed, the kernel function in the GPR model is the covariance function that describes the correlation between training samples. In this study, the squared exponential kernel function was used. For further details about GPR, see Verrelst et al. and Camps-Valls [50,56,57].
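For orientation, the squared exponential kernel and the resulting GPR predictive equations are written out below in their textbook form, assuming a zero-mean prior and independent Gaussian observation noise. The symbols ($\sigma_f$ for signal amplitude, $\ell$ for length scale, $\sigma_n$ for noise standard deviation) are introduced here and are not taken from the study, which does not report its hyperparameters.

```latex
k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\!\left( -\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2 \ell^2} \right)

\bar{f}_* = \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
\operatorname{var}(f_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*
```

Here $K$ is the kernel matrix of the training inputs, $\mathbf{k}_*$ is the vector of kernel values between the test input $\mathbf{x}_*$ and the training inputs, and $\mathbf{y}$ holds the training EWT values. The dependence of $\bar{f}_*$ on $K$ and $\mathbf{y}$ shows why the predictions are tied to the particular training samples.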

Statistical Analysis
In this paper, the synthetic datasets were randomly divided into two parts: 70% for training and 30% for validation. Because of their small sample sizes, the in situ datasets were evaluated through three-fold cross-validation. Because GPR predictions depend strongly on the training samples, the GPR process was repeated 100 times for all datasets to ensure stable results. To evaluate the predictive capability of nRVI and the selected published VIs for EWT estimation, the R², root mean square error (RMSE), and normalized RMSE (NRMSE) were used.
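The evaluation protocol above can be sketched with the standard library alone. Two points are assumptions because the text does not define them: R² is computed here as one minus the ratio of residual to total sum of squares, and NRMSE is normalized by the range of the observed values. The function names are illustrative.

```python
import math
import random


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (cm for EWT)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nrmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


def train_validation_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


# Toy EWT values in cm, for illustration only.
ewt_observed = [0.010, 0.020, 0.030, 0.040]
ewt_predicted = [0.011, 0.019, 0.031, 0.039]
train_idx, val_idx = train_validation_split(1000)
```

Repeating the split with different seeds and averaging the metrics mirrors the 100 repetitions described above.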

Selection of Characteristic Wavelengths and nRVI Construction
The characteristic wavelengths of the different leaf components were obtained based on the synthetic dataset without noise (n = 1000). Table 3 lists the selected characteristic wavelengths. This study focused on the estimation of EWT by using an nRVI constructed from the selected characteristic wavelengths. Thus, the first problem to be solved is how to determine the wavelengths of the VI from the selected characteristic wavelengths of the different biochemical parameters. To this end, the effect of combining the characteristic wavelengths of EWT with those of the other biochemical parameters on the accuracy of EWT estimation was analyzed. Table 4 presents this effect for the five datasets. The results showed that the combination of the characteristic wavelengths of EWT and N is optimal for EWT estimation, with higher R² values and lower RMSE and NRMSE values than the other combinations. However, the two characteristic wavelengths of EWT alone do not contain enough information to estimate EWT. Adding the characteristic wavelengths of N improved the accuracy of EWT estimation more than adding the characteristic wavelengths of the other biochemical parameters (Cab, Car, and LMA). Thus, the characteristic wavelengths of EWT + N (746, 1089, 1398, and 1906 nm) were extracted and used for further analysis.
Two characteristic wavelengths were selected for each biochemical parameter, which raises the question of whether the number of wavelengths influences the precision of EWT estimation. The effect of the number of wavelengths on EWT retrieval was therefore examined, and the results are listed in Table 5. Table 5 shows that the accuracy of EWT estimation based on GPR is not directly proportional to the number of bands. The estimation results of the full wavelength range (400-2500 nm) are clearly inferior to those of the full set of characteristic wavelengths (10) and of the characteristic wavelengths of EWT + N (4).
For the full wavelength range (400-2500 nm), the reflectance values of adjacent bands are highly correlated and collinear. This can impair the convergence and efficiency of regression algorithms when all bands serve as input variables to train the statistical model [58]. However, the difference between the performance of the full set of characteristic wavelengths (10) and that of the characteristic wavelengths of EWT + N (4) for EWT estimation is not clear, except for the synthetic dataset with 2% random Gaussian noise. For the synthetic data with 2% random Gaussian noise, a possible interpretation is that the Gaussian noise differs from the actual measurement error, and such a difference will influence the analysis results. In addition, the four characteristic wavelengths of EWT + N outperformed the two characteristic wavelengths of EWT for EWT estimation across the five datasets. Thus, the spectral information in the selected characteristic wavelengths of EWT + N (4) is adequate for EWT estimation.
Tables 4 and 5 demonstrate that the characteristic wavelengths of N exhibited the greatest influence on EWT estimation among these biochemical parameters. Therefore, two of the characteristic wavelengths of N and EWT were selected to construct the new ratio VI (nRVI) for EWT estimation. The two optimal wavelengths were selected based on the SA of the spectrum by using the PROSPECT model and on the correlation between the reflectance at the characteristic wavelengths of N + EWT and the EWT parameter. The positions of the four characteristic wavelengths of N and EWT are shown on the SA results of the biochemical parameters (Figure 2). The correlation between the reflectance at the characteristic wavelengths of N + EWT and the EWT parameter is shown in Figure 3.
Figure 2 demonstrates that the region of the EWT characteristic wavelengths was influenced by other biochemical parameters. Thus, the influence of other parameters needs to be eliminated to improve the accuracy of EWT monitoring. The characteristic wavelength at 746 nm is located in the red-edge region, which is a transition band. The reflectance at 746 nm was affected by Cab and LMA. By contrast, the reflectance at 1089 nm was less affected by pigments than that at 746 nm. Thus, the characteristic wavelength of N at 1089 nm was selected as one band for constructing nRVI.
For the characteristic wavelengths of EWT (1398 and 1906 nm), the major influencing factors include N, EWT, and their interaction. The contribution ratio of the interaction at 1906 nm is higher than that at 1398 nm based on SA. In addition, the correlation between the reflectance at 1398 nm and the EWT parameter is higher than that at 1906 nm (Figure 3). The characteristic wavelengths of N at 746 and 1089 nm exhibited low correlations with the EWT parameter. Hence, the characteristic wavelength at 1398 nm was selected as the other band to construct nRVI. Therefore, two characteristic wavelengths (1089 and 1398 nm) were selected to build the nRVI (nRVI = R1089/R1398) in this study.
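Computing the index from a measured spectrum is a direct application of nRVI = R1089/R1398. The sketch below is illustrative: the function names and the toy spectrum are hypothetical, and the linear interpolation is an assumption added for spectra whose sampling grid does not contain 1089 nm (for example, a 2 nm grid starting at 400 nm); the text does not say how such cases were handled.

```python
from bisect import bisect_left


def reflectance_at(wavelengths, reflectance, target_nm):
    """Reflectance at target_nm by linear interpolation (wavelengths ascending)."""
    if not wavelengths[0] <= target_nm <= wavelengths[-1]:
        raise ValueError("target wavelength outside the measured range")
    i = bisect_left(wavelengths, target_nm)
    if wavelengths[i] == target_nm:
        return reflectance[i]
    w0, w1 = wavelengths[i - 1], wavelengths[i]
    r0, r1 = reflectance[i - 1], reflectance[i]
    return r0 + (r1 - r0) * (target_nm - w0) / (w1 - w0)


def nrvi(wavelengths, reflectance):
    """New ratio vegetation index, nRVI = R1089 / R1398."""
    r1089 = reflectance_at(wavelengths, reflectance, 1089)
    r1398 = reflectance_at(wavelengths, reflectance, 1398)
    return r1089 / r1398


# Toy spectrum fragment on a 2 nm grid (made-up reflectance values).
toy_wavelengths = [1088, 1090, 1396, 1398, 1400]
toy_reflectance = [0.44, 0.46, 0.30, 0.30, 0.28]
toy_index = nrvi(toy_wavelengths, toy_reflectance)
```

Because water absorption lowers R1398 while R1089 is comparatively insensitive to water, a larger index value corresponds to a higher EWT, which agrees with the positive relationship reported for nRVI.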

Correlation Analysis Between the nRVI, Selected Published VIs, and EWT
The correlations between the four selected published VIs and EWT for the five datasets are provided in Figure 4. The correlations between the nRVI and EWT for all five datasets are shown in Figure 5.
Figure 5 shows that nRVI formed a significant positive relationship with EWT for all five datasets. The correlation between nRVI and EWT was stronger than that of the selected VIs for the synthetic dataset with 2% random Gaussian noise and for the ANGERS, LOPEX, and JR datasets. For the synthetic dataset with 2% random Gaussian noise and the LOPEX and JR datasets, the correlations between nRVI and EWT (R² = 0.782, 0.907, and 0.813; RMSE = 0.311%, 0.209%, and 0.27%; Figure 5b,d,e) were better than the correlations between WBI, NDII, SR, and EWT (R² = 0.681, 0.828, and 0.701; RMSE = 0.376%, 0.284%, and 0.34%; Figure 4b,d,e). For the synthetic dataset without noise, only minor differences were observed between the correlation of WBI with EWT (R² = 0.952, RMSE = 0.147%, Figure 4a) and that of nRVI with EWT (R² = 0.948, RMSE = 0.152%, Figure 5a). These results suggest that the nRVI built from the characteristic wavelengths selected using a physical model can be applied to estimate EWT.

Robustness Analysis
The results of Section 3.2 show that the nRVI was more likely to achieve high-precision EWT estimation than the other VIs. However, the robustness of nRVI to interference factors, for instance, N, LMA, and spectral noise, should be explored in depth for EWT estimation. The results of the robustness analysis are as follows.

Robustness to the Change of N
The influence of N on nRVI and the other VIs is shown in Figure 6. For the five VIs, the value of N affected the relationships between the VIs and EWT, especially at high EWT contents. Among them, the influence of N on SR was relatively large for the lowest value of N (N = 1) and high EWT values (0.045-0.07 cm). Compared to SR, the other VIs showed less variability and more stability as N changed.
Figure 7 shows the performance of the five VIs for EWT estimation across various N values based on the synthetic dataset with 2% Gaussian noise. WBI performed worst over the whole range of N in Figure 7. Because Figure 6 showed that the effect of N on WBI was small, the poor performance of WBI may be caused by the artificial noise in the synthetic dataset. SR had the worst R² for EWT estimation only at a low N value (N = 1), which was consistent with Figure 6. The performance of MSI and NDII for EWT estimation was almost the same. Hence, the robustness of the VIs to N changes was related only to the bands involved and was independent of the form of the indices. Compared to the other VIs, nRVI had the highest accuracy and the best stability for EWT estimation across various N values.

Robustness to the Change of LMA
The influence of LMA on nRVI and the published VIs is shown in Figure 8. For nRVI, WBI, and SR, the effect of LMA content on the relationship between EWT and the VIs was smaller than for MSI and NDII, especially at low EWT values (0.005-0.035 cm). nRVI, SR, and WBI all showed lower variability and higher resistance to changing LMA content at low EWT values (0.005-0.035 cm). For MSI and NDII, the effects of LMA (0.005-0.03 g/cm²) were greater throughout the range of EWT variation (0.005-0.07 cm).
Figure 9 shows the performance of the five VIs for EWT estimation across various LMA contents based on the synthetic dataset with 2% Gaussian noise. As with the N changes, WBI still performed worst over the whole range of LMA values (Figures 7 and 9), and the possible reason was the artificial noise added to the synthetic dataset. The precision of MSI and NDII was again almost the same. When LMA = 0.005 g/cm², the R² values of MSI, NDII, and SR were 0.6816, 0.6816, and 0.6932, whereas when LMA = 0.03 g/cm², they decreased to 0.3375, 0.3350, and 0.5439. This indicated that the accuracy of MSI, NDII, and SR for EWT estimation decreased markedly as LMA content increased. The nRVI still performed best regardless of how the LMA content changed.
Figure 10 shows the performance of nRVI, WBI, MSI, SR, and NDII for EWT estimation based on synthetic datasets with various levels of random Gaussian noise. According to the results, all VIs achieved their best accuracy without added noise. The higher the level of added noise, the greater the impact on the VIs, but the five VIs were affected to varying degrees. WBI had no resistance to spectral noise regardless of the level of added noise, which was consistent with its performance in Figures 7 and 9. The worst performance of WBI in Figures 7 and 9 was therefore due to the use of synthetic datasets with added artificial noise.
SR had poor accuracy and great instability when the level of added noise was high (>2%). Therefore, WBI and SR may not be appropriate indices for EWT estimation because of their sensitivity to spectral noise. The accuracy of MSI and NDII for EWT estimation decreased sharply as the noise level increased. By contrast, nRVI showed the best resistance to spectral noise among the VIs. Therefore, nRVI had the highest robustness to interference factors for EWT estimation.
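How spectral noise propagates into a ratio index can be illustrated with a small Monte Carlo sketch. The noise model is an assumption: "x% random Gaussian noise" is interpreted here as independent multiplicative noise with a standard deviation of x% of the reflectance in each band, which the text does not specify. The function names and the reflectance values are hypothetical.

```python
import random
import statistics


def add_gaussian_noise(reflectance, level, rng):
    """Multiply each band by (1 + e), with e ~ N(0, level); assumed noise model."""
    return [r * (1.0 + rng.gauss(0.0, level)) for r in reflectance]


def ratio_index_spread(r_num, r_den, level, n_trials=2000, seed=0):
    """Standard deviation of the ratio r_num / r_den over noisy realizations."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        noisy_num, noisy_den = add_gaussian_noise([r_num, r_den], level, rng)
        values.append(noisy_num / noisy_den)
    return statistics.pstdev(values)


# Spread of a ratio of 1.5 (made-up reflectances) at increasing noise levels.
spread_by_level = {
    level: ratio_index_spread(0.45, 0.30, level) for level in (0.0, 0.02, 0.05)
}
```

Under this noise model the relative spread of any two-band ratio is about the noise level times the square root of two, independent of the bands chosen. Differences in noise resistance between indices therefore come from how steeply each index responds to EWT relative to that spread, which is consistent with band choice driving robustness.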

Validation of the Performance of nRVI and Selected VIs for EWT Estimation
The performance of nRVI and the four selected VIs (WBI, MSI, SR, and NDII) for EWT estimation was validated based on the GPR model using the two synthetic and three public experimental datasets. The retrieval results are shown in Table 6. All indices exhibited the best performance for the synthetic dataset without noise, followed by ANGERS, and the poorest performance was observed for JR. A possible explanation is that the synthetic dataset without noise was ideal, whereas the ANGERS, LOPEX, and JR datasets contained different species and measurement errors; the JR dataset in particular included too few samples. For the synthetic data with noise, the added noise significantly reduced the accuracy of EWT estimation for the five VIs compared to the synthetic dataset without noise, but the noise had the least effect on the nRVI.
Among these VIs, nRVI exhibited better performance in EWT estimation than the other VIs across the five datasets. Owing to its sensitivity to spectral noise, WBI performed worst in EWT estimation for the synthetic data with noise. As in the robustness analysis, MSI and NDII showed similar results for EWT estimation in all datasets. For the LOPEX dataset, they outperformed WBI and SR and were nearly on par with nRVI. For the other datasets, however, MSI and NDII did not consistently perform as well as nRVI. Overall, nRVI possesses a better capability to estimate EWT than the other VIs and provides satisfactory accuracy in EWT estimation for different datasets.
The results presented in Table 6 suggest that the nRVI performed well in EWT estimation for synthetic and in situ datasets. Thus, nRVI, which was built based on the selected characteristic wavelengths by using the physical model, exhibited better universality and stability in the evaluation of EWT than the four selected optimal VIs based on the statistical model.

Discussion
In this study, nRVI was built for EWT estimation based on two selected optimal characteristic wavelengths (1089 and 1398 nm) by using a physical model. Then, the correlation between the nRVI and EWT and that between the four published VIs and EWT were analyzed. Results showed that the nRVI has a strong positive correlation with EWT (Figure 5). Additionally, the nRVI and the four optimal published VIs were evaluated for their sensitivity to EWT and their insensitivity to interfering factors (e.g., N, LMA, and spectral noise). Last, the performance of nRVI and the published VIs for EWT estimation based on the GPR model was analyzed and compared using five datasets (one synthetic dataset without noise, one synthetic dataset with 2% random Gaussian noise, and the ANGERS, LOPEX, and JR datasets).
In previous studies, numerous researchers have investigated the selection of sensitive wavelengths for leaf biochemical parameter estimation based on the PROSPECT model. Zarco-Tejada et al. [59] indicated that, in the inversion of a canopy model coupled to PROSPECT, utilizing a red-edge spectral index (e.g., 750 and 710 nm) in function minimization outperformed utilizing all single spectral reflectance bands from hyperspectral images. Song et al. [60] used multivariate analysis and the correlation between adjacent bands to obtain major wavelengths; four selected narrow bands (552, 675, 705, and 776 nm) were subsequently used to monitor nitrogen stress in paddy rice, and wavelengths of 1158, 1378, and 1965 nm were selected and applied to estimate irrigation stress. He et al. [61] proposed an angular insensitivity vegetation index (AIVI) based on green, blue, and red-edge bands; AIVI exhibited the highest association with leaf nitrogen concentration compared with traditional VIs. Sun et al. [39] proposed a method of wavelength selection by the PROSPECT model and established band combinations to evaluate leaf Cab and EWT. These studies showed the feasibility of wavelength selection for leaf trait estimation. According to Table 5, the precision of EWT estimation using characteristic wavelengths was superior to that using the full wavelength range based on GPR. Compared with the full wavelength range, the characteristic wavelengths can be efficient and timesaving in the estimation of leaf biochemical parameters. However, previous studies rarely focused on the application of characteristic wavelengths for biochemical parameter estimation based on a physical model. Furthermore, most of the published VIs were established based on water absorption wavelengths selected by a statistical model, for instance, 970 nm for WBI [9], 1450 nm for SR [46], and 1600 nm for NDII and MSI [11,62].
The published VIs for EWT estimation are limited to certain datasets or plant species. Moreover, the selection of wavelengths for the construction of these VIs is based on statistical models for specific vegetation species and lacks a physical mechanism and universality. Considering these restrictions of wavelength selection and of published VIs based on a statistical model, characteristic wavelengths were selected by using the physical model PROSPECT based on the SA. Then, two optimal characteristic wavelengths (1089 and 1398 nm) were selected to build the nRVI (nRVI = R1089/R1398) for EWT estimation. Additionally, the robustness of nRVI and the four published VIs to N, LMA, and spectral noise was analyzed. The anti-interference ability of the four published VIs was weak, whereas nRVI exhibited better anti-interference ability and sensitivity to the target parameter. Meanwhile, based on the results shown in Table 6, the nRVI exhibited better accuracy and robustness in EWT estimation than the four selected optimal VIs based on the statistical model. Furthermore, nRVI can partly decouple the influence of N and LMA on EWT estimation. Hence, the nRVI built from the characteristic wavelengths selected by using the physical model exhibited better universality and stability in EWT estimation than the published VIs, which can guide agricultural water management.
The results in Sections 3.3 and 3.4 indicated that MSI and NDII performed very similarly in the robustness analysis and in EWT estimation based on GPR. MSI and NDII involve the same wavelengths and differ only in the form of the index. This may indicate that the wavelengths selected for a VI are more critical than the form of the index in the estimation of EWT or other biochemical parameters.
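The similarity of MSI and NDII can be made explicit under the assumption that they take their common published forms, a simple ratio and a normalized difference of the same two bands. The exact forms and wavelengths used in this study are those of Table 2, which is not reproduced here, so $R_{\mathrm{NIR}}$ and $R_{\mathrm{SWIR}}$ below are generic placeholders.

```latex
\mathrm{MSI} = \frac{R_{\mathrm{SWIR}}}{R_{\mathrm{NIR}}},
\qquad
\mathrm{NDII} = \frac{R_{\mathrm{NIR}} - R_{\mathrm{SWIR}}}{R_{\mathrm{NIR}} + R_{\mathrm{SWIR}}}
= \frac{1 - \mathrm{MSI}}{1 + \mathrm{MSI}},
\qquad
\frac{d\,\mathrm{NDII}}{d\,\mathrm{MSI}} = -\frac{2}{(1 + \mathrm{MSI})^{2}} < 0
```

Under these assumed forms, NDII is a strictly decreasing function of MSI, so the two indices carry the same information about EWT, and a flexible nonlinear regressor such as GPR can be expected to yield nearly identical accuracy for both.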
Different methods have been used for the estimation of leaf biochemical parameters. Colombo et al. [18] used inverse ordinary least squares (OLS) and reduced major axis (RMA) regression methods to estimate EWT from different spectral indices, and RMA obtained the best regression results with spectral indices related to the continuum-removed area at 1200 nm, with 61% explained variance and 6.6% prediction error. Féret et al. [32] used two regression methods, spectral indices and partial least squares (PLS), for the retrieval of leaf biochemical parameters, and the RMSEs of the two methods in EWT estimation of experimental data were 0.0037 cm and 0.0025 cm, respectively. Sun et al. [39] performed the inversion of Cab and EWT by the PROSPECT model using selected characteristic wavelengths; the RMSEs of EWT estimation for the ANGERS, LOPEX, and JR datasets were 0.0023 cm, 0.0029 cm, and 0.0059 cm, respectively. Verrelst et al. [37] estimated Cab from hyperspectral reflectance data using GPR, and the GPR regression model was well validated for measured data. In this study, the constructed new ratio VI (nRVI) performed well in the EWT estimation of simulated and in situ datasets through the GPR model. Compared with the traditional methods for estimating leaf biochemical parameters, machine learning methods are more robust and stable, and their application in leaf and canopy scale parameter retrieval is becoming more widespread.
Although this study yielded significant findings, its limitations require further discussion. First, to determine the optimal wavelengths of VIs, two characteristic wavelengths were selected for each leaf biochemical parameter through the PROSPECT-5 model. The effect of the number of characteristic wavelengths on the performance of the built nRVI for EWT estimation should be examined in future research. In addition, the optimal characteristic wavelengths of nRVI were determined through a physical model, and universality was verified by using large quantities of synthetic and experimental data. However, the effect of plant species and different growth stages on EWT estimation has not yet been discussed. Hence, more vegetation species and measured field datasets should be included in future work. Although nRVI has performed well on different datasets at the leaf scale, it has not yet been studied at the canopy, airborne, or spaceborne scale. Canopy water concentration is an important variable for the evaluation of plant water status. The effect of leaf area index (LAI) on nRVI performance needs to be considered at the canopy scale, as changes in LAI can be confused with the spectral features of water, which can reduce the accuracy of canopy water concentration estimation [63]. In addition to LAI, canopy water content estimation using nRVI also needs to consider the effects of soil background, leaf inclination, and canopy structure. These factors affect not only the extension of nRVI to the canopy scale, but also the extension of all leaf-scale biochemical parameter retrieval methods to the canopy scale [61]. To address the scale issue, a model retrieval method for estimating leaf reflectance spectra from canopy remote sensing spectral data has been proposed [64,65]. Using this top-down inversion model, leaf reflectance spectra can be obtained from remote sensing data, thus allowing nRVI to be used for canopy water content estimation.

Conclusions
This study extracted the characteristic wavelengths of leaf biochemical parameters (N, Cab, Car, EWT, and LMA) by using the proposed characteristic wavelength selection algorithm based on the PROSPECT model. The effect of combined characteristic wavelengths of EWT and different biochemical parameters on the accuracy of EWT estimation was analyzed. Results demonstrated that the characteristic wavelengths of N exhibited the greatest effect on EWT estimation, and two characteristic wavelengths (1089 and 1398 nm) were selected to build nRVI (nRVI = R1089/R1398) for EWT estimation. The robustness of nRVI and the four published VIs to N, LMA, and spectral noise was analyzed, and nRVI exhibited better anti-interference ability and sensitivity to the target parameter than the other VIs. Furthermore, the performance of nRVI for EWT estimation was validated by using two simulation datasets (without and with 2% random Gaussian noise) and three in situ datasets (ANGERS, LOPEX, and JR) based on the GPR model. The nRVI exhibited higher accuracy and robustness in EWT estimation for different types of datasets than the published VIs. The characteristic wavelengths of nRVI were selected based on a physical model and could be applied to monitor crop water deficiency stress accurately, which can provide guidance for agricultural water management. In addition, the proposed method shows promising potential for monitoring water deficiency stress in various kinds of plants in different environments.