Model-Based Optimization of Spectral Sampling for the Retrieval of Crop Variables with the PROSAIL Model

Satellite hyperspectral Earth observation missions have strong potential to support sustainable agriculture by providing accurate spatial and temporal information on important vegetation biophysical and biochemical variables. To meet this goal, possible error sources in the modelling approaches should be minimized. Thus, the capability of a model to reproduce the measured spectral signals has to be tested before applying any retrieval algorithm. For an exemplary demonstration, the coupled PROSPECT-D and SAIL radiative transfer models (PROSAIL) were employed to emulate the setup of future hyperspectral sensors in the visible and near-infrared (VNIR) spectral regions with a 6.5 nm spectral sampling distance. Model uncertainties were determined to subsequently exclude those wavelengths with the highest mean absolute error (MAE) between model simulation and spectral measurement. The largest mismatch was found in the green visible and red edge regions, which can be explained by complex interactions of several biochemical and structural variables in these spectral domains. For leaf area index (LAI, m²·m⁻²) retrieval, results indicated only a small improvement when using optimized spectral samplings. However, a significant increase in accuracy for leaf chlorophyll content (LCC, µg·cm⁻²) estimations could be obtained, with the relative root mean square error (RMSE) decreasing from 26% (full VNIR range) to 15% (optimized VNIR) for maize and from 77% to 29% for soybean. We therefore recommend applying a specific model-error threshold (MAE of ~0.01) to stabilize the retrieval of crop biochemical variables.


Introduction
Worldwide, there is an increasing interest in and need to optimize agricultural management systems to enhance yields while minimizing environmental impacts [1]. Earth observation (EO) supports precision agriculture techniques by providing accurate spatial and temporal information on vegetation biophysical and biochemical variables, such as leaf area index (LAI, in m²·m⁻²) and leaf chlorophyll content (LCC, in µg·cm⁻²) [2]. As several satellite hyperspectral EO missions are due to launch, these vegetation quantities will soon be monitored over large areas with potentially higher accuracies than is possible with today's multispectral systems [3]. Planned missions for the near future include, for instance, the Italian Prototype Research Instruments and Space Mission Technology Advancement (PRISMA) [4] and the US Hyperspectral Infrared Imager (HyspIRI) [5], as well as the two German missions, the DLR Earth Sensing Imaging Spectrometer (DESIS) [6] and the Environmental Mapping and Analysis Program (EnMAP) [7].
The expected global availability of hyperspectral data requires the development of generic and transferable retrieval techniques [8,9]. Considering the high spectral information content that will be provided by the new sensors, physically based retrieval techniques combined with flexible and computationally efficient machine-learning regression algorithms (MLRAs) can strongly support the effective exploitation of the full spectral signals. This can be realized, for instance, by employing radiative transfer models (RTMs) such as the well-known and widely used canopy bidirectional reflectance model 4SAIL [10] and the PROSPECT-D leaf optical properties model [11]. The combined model version, commonly called PROSAIL [12], constitutes a good compromise between model complexity and computational efficiency, in particular for the processing of large datasets and especially for agricultural crops [13]. To retrieve the sought biophysical or biochemical variable via RTM inversion, look-up table (LUT) approaches [14-17] are most commonly applied. Such LUTs store a defined number (e.g., 100,000) of canopy parameter realizations and their corresponding simulated spectral reflectance. LUTs also serve as training databases for MLRAs within hybrid retrieval schemes [8].
Compared to the estimation of structural variables such as LAI, the estimation of biochemical leaf variables from the canopy signals has often been found to be difficult [18,19]. Besides the use of hyperspectral data, the implementation of model constraints can enhance the estimation accuracy of those quantities [20,21]. Useful constraints include, for instance:
• spectral constraints: limit the analysis to predefined spectral samplings or spectral regions where the variable of interest dominates the spectral signal (e.g., [22]);
• sampling constraints: predefine the distribution of the variables in the LUT, such as uniform or Gaussian based on a priori information, either from field sites or derived from growth models (e.g., [20,22]);
• spatial constraints: take neighborhood information into account, resulting in object-based retrieval (e.g., [23]);
• temporal constraints: use temporal autocorrelation to modify the cost function (e.g., [17,20]).
Such constraints, however, can only support accurate product retrieval if the prerequisite of having error-free measurements and model simulations is fulfilled. Indeed, a drawback of the physically based approaches, especially in the context of hyperspectral data, is their sensitivity to corrupted or less well-modelled (and measured) wavelengths, which bias the retrieval [24,25]. This implies the need to test the models' capability of reproducing the reflectance signal of the vegetation of interest before applying RTMs for the retrieval of biophysical and biochemical variables. Even small discrepancies between modelled and measured reflectance may otherwise affect the final retrieval accuracy. In a study by Atzberger et al. [24], this was exemplified for grasslands. Using a fully automated feature-selection (FS) algorithm, poorly modelled wavebands were sequentially identified and discarded, leading to more accurate variable estimations. A study analyzing different crops and a multispectral dataset [23] had to discard three of six crops (including maize) since the forward modelling, parameterized by field-measured LAI, did not match the observed signatures. The mismatch led to a failure of the LUT-based inversion. The future availability of hyperspectral data will allow for a more specific identification of modelled and measured spectral uncertainties. Feature-selection algorithms, as proposed by Atzberger et al. [24], can be implemented to improve retrieval. In view of the complex and varying canopy structures, such tests should be repeated on various crop types at different sites and with actual sensors throughout the growing season.
In this context, the objective of the present study was to test the capability of the PROSAIL model to reproduce hyperspectral measurements in the visible to near-infrared (VNIR) region and to analyze the impact of erroneous bands, and their subsequent removal, on the retrieval of LAI and LCC for agricultural crops. We thus aim to improve the usability of PROSAIL simulations for two important crop types (maize and soybean) as a prerequisite for the application of the model within a retrieval scheme for biophysical and biochemical variables from future hyperspectral sensor missions' data.

Data Collection and Preparation
Spectroscopic data were collected concurrently with biophysical and biochemical variables at the University of Nebraska-Lincoln Agricultural Research and Development Center (UNL-ARDC), which takes part in the Carbon Sequestration Program. Spectroscopic measurements in the 400-1000 nm domain were carried out during multiple years at three agricultural fields (irrigated maize and irrigated maize-soybean rotation at 41°09′54.2″N, 96°28′35.9″W, 361 m above sea level (a.s.l.); and rain-fed maize-soybean rotation at 41°10′46.8″N, 96°26′22.7″W, 362 m a.s.l.) using a dual-fiber optics system (i.e., two Ocean Optics USB2000 radiometers) mounted on an all-terrain sensor platform. Spectral reflectance sampling areas were associated with intensive measurement zones (IMZ). At each IMZ, green leaf area index (LAI, m²·m⁻²) and leaf chlorophyll content (LCC, µg·cm⁻²) measurements were carried out destructively from June 2001 to August 2008 [26,27].
Since a spectral sampling distance of 6.5 nm represents a good compromise between several upcoming hyperspectral sensors, ranging from 2.55 nm (DESIS) over 6.5 nm (EnMAP) to 12 nm (PRISMA), EnMAP band-specific spectral response functions with 6.5 nm spectral sampling distance were applied to the field spectrometer data in the VNIR domain (N_samples = 73 bands in the spectral range 423-863 nm). Due to the presence of noise in the spectrometer data above 870 nm, we decided to exclude the central wavebands from 871 nm to 975 nm. In this way, the available spectral information is reduced while still providing a valid hyperspectral database for the exploitation of optimal LAI and LCC retrieval. Currently, there are no spaceborne sensors and only very few airborne sensors that are suitable to 'emulate' the future satellites with the appropriate spectral resolution, signal-to-noise ratio (SNR) and sampling (especially the temporal sampling to cover different phenological phases), with field sampling activities running in parallel. Although there may be some spectral information missing, a field spectroscopy database is still ideal to simulate VNIR bands from future hyperspectral spaceborne sensors.
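The band conversion described above can be illustrated with a minimal Python sketch. It is not the authors' processing chain: the actual EnMAP spectral response functions are replaced here by Gaussian curves, and the band centres, the full width at half maximum (`fwhm`) and the 1 nm input grid are assumptions chosen only for illustration (real EnMAP band centres are not evenly spaced).

```python
import numpy as np

def gaussian_srf(wl, center, fwhm):
    """Normalized Gaussian response (assumed stand-in for a real sensor SRF)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    weights = np.exp(-0.5 * ((wl - center) / sigma) ** 2)
    return weights / weights.sum()

def resample(wl, reflectance, centers, fwhm):
    """Weighted average of a fine-resolution spectrum for each target band."""
    return np.array([np.dot(gaussian_srf(wl, c, fwhm), reflectance) for c in centers])

# Hypothetical inputs: 1 nm field spectrum and 73 evenly spaced band centres.
wl = np.arange(400.0, 1001.0, 1.0)
field_spectrum = np.full_like(wl, 0.3)      # flat test spectrum
centers = np.linspace(423.0, 863.0, 73)     # assumption, not the EnMAP table
bands = resample(wl, field_spectrum, centers, fwhm=6.5)
```

Because each response curve is normalized to unit sum, a spectrally flat input is returned unchanged, which is a simple sanity check for any resampling routine of this kind.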
Note that the converted measurements were assumed to be error-free. However, the inclusion of measurement uncertainties in the inversion process may lead to improved retrieval, for instance, when using Bayesian techniques [28]. Moreover, spatial resolution and atmospheric correction issues go beyond the scope of the current study.
For the present study, 169 measurements of LAI (m²·m⁻²) and LCC (µg·cm⁻²) were available for maize and 68 samples for soybean. A summary of the variable distribution is presented in Table 1.
Table 1. Summarized statistics of leaf area index and leaf chlorophyll content measured at the intensive measurement zones (IMZ) of maize and soybean at the Nebraska-Lincoln study site: total number (N), range (Min-Max), mean, standard deviation (SD), and coefficient of variation (CV).

Spectral Feature Selection
For each crop type, 20 sample spectra (corresponding to 20 IMZs) were selected to assess the agreement between the model-simulated and measured spectra and to identify and eliminate poorly simulated spectral bands. The 20 sample spectra were extracted from different dates to represent the range of phenological stages that occur during the course of a growth cycle. Following the procedure outlined in Atzberger et al. [24], for each IMZ, an individual look-up table (LUT_ind) was established with the PROSAIL model. LAI was fixed at its measured value, while all other model parameters were allowed to vary freely within a uniform distribution. This decision is based on the dominant influence of canopy structure (i.e., LAI) on the overall spectral signal. Soil reflectance and measurement geometry were specified according to actual conditions. Table 2 gives an overview of the specific individual LUT parameterization (LUT_ind). For each LUT_ind, spectral signals were simulated for 10,000 randomly selected parameter combinations. Simulated PROSAIL reflectance values were converted to VNIR bands (N_bands = 73) by applying the actual band-specific response functions of the future EnMAP hyperspectral imager. Due to the narrow spectral range of the field spectrometer used, the simulation was limited to the VNIR part of the spectrum. The best-fitting LUT spectrum was selected by means of the root mean square error (RMSE), calculated across the 73 selected bands. The mean absolute error (MAE) was calculated for each of the 20 individual IMZs by subtracting the best-fitting LUT spectrum from the corresponding measured spectrum. After calculating the mean value of the resulting 20 MAEs, the least accurately modelled band was deleted and the procedure was repeated, running the LUT_ind from N_samples = 73 bands to N_samples = 1 band. The resulting band combinations were applied to retrieve LAI (in m²·m⁻²) and LCC (in µg·cm⁻²) from the validation dataset (i.e., N = 169 for maize and N = 68 for soybean).
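The band-removal loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function names are hypothetical, the LUTs are passed in as plain arrays rather than generated with PROSAIL, and the best-fitting spectrum is assumed to be re-selected on the remaining bands after every removal.

```python
import numpy as np

def best_fit(measured, lut, bands):
    """Return the LUT spectrum with the lowest RMSE over the retained bands."""
    rmse = np.sqrt(np.mean((lut[:, bands] - measured[bands]) ** 2, axis=1))
    return lut[np.argmin(rmse)]

def backward_band_elimination(measured_set, lut_set):
    """measured_set: (n_imz, n_bands) array; lut_set: one (n_members, n_bands)
    array per IMZ. Returns tuples (n_bands_used, mean_MAE, removed_band)."""
    bands = np.arange(measured_set.shape[1])
    history = []
    while bands.size > 1:
        # Absolute error per band between measurement and best-fitting member.
        abs_err = np.array([
            np.abs(best_fit(m, lut, bands)[bands] - m[bands])
            for m, lut in zip(measured_set, lut_set)
        ])
        per_band = abs_err.mean(axis=0)          # average over the IMZs
        worst = int(np.argmax(per_band))         # least accurately modelled band
        history.append((int(bands.size), float(per_band.mean()), int(bands[worst])))
        bands = np.delete(bands, worst)
    return history

# Toy data: constant LUT spectra; the measurement deviates only at band 3.
lut = np.array([[level] * 6 for level in (0.1, 0.2, 0.3, 0.4)])
measured = np.full(6, 0.2)
measured[3] = 0.26
history = backward_band_elimination(np.stack([measured, measured]), [lut, lut])
```

In the toy case, the only band that the LUT cannot reproduce is removed first, after which the mean error of the remaining bands drops to zero.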
To demonstrate the influence of the different parameters on the output reflectance in the analyzed wavelength range (400-865 nm), a global sensitivity analysis (GSA) was carried out. A GSA informs about the contribution of each input parameter to the total variability of the output signal. It further provides information about parameter interactions [32,33]. The GSA was performed for PROSAIL with input parameter ranges as presented in Table 2. A MATLAB software tool (GSAT) [34] was applied for this purpose, including both Fourier amplitude sensitivity testing (FAST) analysis and Sobol's method to calculate first-order sensitivity coefficients. For more details, see also the work of Wang et al. [33].
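To make the notion of a first-order sensitivity coefficient concrete, the following Python sketch estimates first-order Sobol indices with a Monte Carlo estimator of the Saltelli type. It is neither the GSAT tool nor the FAST method used in the study, and the toy model stands in for PROSAIL; inputs are assumed to be independent and uniform on [0, 1], so real parameter ranges would first have to be rescaled.

```python
import numpy as np

def first_order_sobol(model, n_params, n_samples, rng):
    """Monte Carlo estimate of first-order Sobol indices for inputs in [0, 1]."""
    a = rng.random((n_samples, n_params))
    b = rng.random((n_samples, n_params))
    f_a, f_b = model(a), model(b)
    var_y = np.var(np.concatenate([f_a, f_b]))
    indices = np.empty(n_params)
    for i in range(n_params):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]          # column i taken from B, the rest from A
        indices[i] = np.mean(f_b * (model(ab_i) - f_a)) / var_y
    return indices

def toy_model(x):
    """Hypothetical additive model: y = x1 + 2*x2."""
    return x[:, 0] + 2.0 * x[:, 1]

sobol = first_order_sobol(toy_model, n_params=2, n_samples=200_000,
                          rng=np.random.default_rng(0))
```

For this additive toy model, the analytical first-order indices are 0.2 and 0.8, and they sum to one because there are no interactions. In a model with interactions, such as a coupled leaf and canopy model, the first-order indices sum to less than one and the remainder is attributed to parameter interactions.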

Constrained LUT Inversion
For the LUT-based inversion of the PROSAIL model, we followed standard approaches, e.g., [19,30]. The input parameters of the coupled leaf (PROSPECT-D) and canopy reflectance model (SAIL) are displayed in Table 2. Sun zenith angle (SZA) was set to 30°. For the background soil reflectance (ρ_soil), the means of different bare soil spectra were calculated and implemented in the model. For leaf chlorophyll content estimations, different constraints were tested to improve the estimates. This included the Gaussian sampling of LCC in the LUT according to current growth stages (maize: mean value (µ) of 50 µg·cm⁻², standard deviation (σ) of 20 µg·cm⁻²; soybean: µ = 30 µg·cm⁻² and σ = 20 µg·cm⁻²). Such information is not commonly available, but in our case, weekly acquisitions could be used to delimit the variables in the LUT according to realistic actual values. Other parameter ranges were set according to Table 2, following uniform distributions. The total size of the LUT was set to 20,000 combinations of input parameters (members) and the corresponding bidirectional reflectance was calculated with the PROSAIL model applying the EnMAP spectral response functions. To select the solution for the inverse problem of a defined measured spectrum, the RMSE between the measured and modelled spectra was calculated, and the best-fitting 50 spectra were selected as the final solution. We chose the best-fitting 50 spectra following the results of Danner et al. [35], who found that the median of the 50 (and 100) best fits, using an RMSE cost function, led to the lowest relative RMSE (rRMSE) for LAI retrieval. According to a study of Darvishzadeh et al. [19], no significant difference was found between the use of the median and the mean. Therefore, we decided to use the mean value, as applied in our previous studies, e.g., [30].
For both crops, less accurately modelled bands (i.e., with systematic bias) are situated in the spectral regions between 500 nm and 600 nm and between 700 nm and 800 nm. For maize, we generally observe larger and more frequent discrepancies than for soybean. For instance, the spectral domain from 600 nm to 650 nm was much better simulated for soybean than for maize crops. The similar pattern in Figure 1a,b indicates that PROSAIL has general difficulties in simulating certain spectral regions independent of the crop type. This has also been confirmed by other studies, for instance, analyzing grassland with the PROSAIL model [19], where a similar pattern for the 500-600 nm region was identified. Thus, a more in-depth analysis of the phenomenon is required here: a global sensitivity analysis of the PROSAIL model is used to demonstrate the effects of the different input parameters for each wavelength region (Figure 2).
Figure 2. Results of Fourier amplitude sensitivity testing (FAST) first-order sensitivity coefficients and interactions of canopy reflectance: global sensitivity analysis (GSA) of the PROSAIL model. N is the leaf structural parameter, LCC is the leaf chlorophyll content, C_cx is the leaf carotenoid content, C_Anth is the leaf anthocyanin content, C_w is the equivalent water thickness (not present), C_m is the leaf dry matter content, LAI is the leaf area index, ALIA is the average leaf inclination, and α_soil is soil brightness. The brownish-red area corresponds to parameter interactions. Applied units and parameter ranges for the GSA can be found in Table 2.
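The constrained LUT inversion described in this section (Gaussian sampling of LCC, RMSE cost function, mean of the 50 best fits) can be sketched in Python as follows. This is an illustration under stated assumptions only: `toy_forward` is a hypothetical linear stand-in for PROSAIL, the LAI and LCC bounds are placeholders rather than the values of Table 2, and the Gaussian is truncated by simple rejection sampling.

```python
import numpy as np

def sample_truncated_normal(rng, n, mu, sigma, lo, hi):
    """Gaussian sampling restricted to [lo, hi] by rejection."""
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(mu, sigma, size=n)
        out = np.concatenate([out, draw[(draw >= lo) & (draw <= hi)]])
    return out[:n]

def invert_lut(measured, lut_spectra, lut_params, n_best=50):
    """Mean parameter vector of the n_best LUT members with the lowest RMSE."""
    rmse = np.sqrt(np.mean((lut_spectra - measured) ** 2, axis=1))
    best = np.argsort(rmse)[:n_best]
    return lut_params[best].mean(axis=0)

def toy_forward(lai, lcc):
    """Hypothetical 10-band forward model (NOT PROSAIL), linear in LAI and LCC."""
    a = np.linspace(0.01, 0.05, 10)
    b = np.linspace(0.004, 0.001, 10)
    return np.outer(lai, a) + np.outer(lcc, b)

rng = np.random.default_rng(42)
n_members = 20_000
lai = rng.uniform(0.0, 7.0, n_members)                      # placeholder range
lcc = sample_truncated_normal(rng, n_members, mu=50.0, sigma=20.0,
                              lo=0.0, hi=100.0)             # maize-like prior
lut_params = np.column_stack([lai, lcc])
lut_spectra = toy_forward(lai, lcc)

measured = toy_forward(np.array([3.0]), np.array([50.0]))[0]
lai_hat, lcc_hat = invert_lut(measured, lut_spectra, lut_params)
```

Averaging over several best fits, rather than taking the single best member, is a common way to damp the ill-posedness of the inversion; swapping `mean` for `np.median` gives the alternative statistic mentioned in the text.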

Model Suitability Test and Feature Selection
It becomes evident that precisely in the less well-modelled wavelength range between 500 and 550 nm (green visible (VIS) region), the two pigments carotenoids (C_cx) and anthocyanins (C_Anth) gain influence besides LCC. The total contribution of carotenoids is even stronger than that of chlorophyll content. This is therefore the critical wavelength region in which to decouple the effects of these pigments. Moreover, leaf dry matter content (C_m) and the structural SAIL parameters LAI and average leaf inclination angle (ALIA) as well as the soil background contribute to the reflectance signal between 500 nm and 550 nm (Figure 2). Whereas LAI has a quasi-continuous (and largest) influence along the spectral domains, the contribution of ALIA becomes more pronounced from 700 nm onwards. In the so-called red edge region, usually situated between 680 and 740 nm of the vegetation spectrum, there are also two pronounced peaks of LCC with abrupt leaf reflectance changes. This is caused by the combined effects of strong LCC absorption and leaf internal scattering [36]. The high volatility of parameter contributions in these two spectral regions (green VIS and red edge) may lead to biased reflectance simulations if no proper parameterization has been applied. The question here is whether the leaf optical or the structural canopy model is responsible for the errors. The PROSPECT-D model was largely improved in comparison to its previous versions. This was emphasized by the results shown in the corresponding paper [11], indicating a high accordance of simulated versus measured leaf spectral reflectance (although this analysis was mainly carried out for tree species). Since the signal of the structural canopy parameters is dominant, the SAIL model or its parameterization may play a larger role in the error identification. The interested reader is referred to the paper of Danner et al. [37], in preparation, which discusses the special role of ALIA within the PROSAIL model environment. In summary, the combined effects of all these parameters may explain the discrepancies between simulated and measured reflectance in the two waveband regions: to obtain an optimal model result, input parameters should be defined according to thoroughly measured field data of all required parameters, which is a difficult task rarely accomplished.
On the other hand, we also observe subtle differences between the crop types. This phenomenon can be attributed to crop-specific reflectance behavior, which also changes throughout the growing season. The rather simple parameterization of PROSAIL may not be sufficient to describe these canopy structure changes correctly [38]. The study of Atzberger and Richter [23] discarded three of six crop types, amongst them maize, from their analyses as the forward modelling did not match the observed signatures. The authors explained this as a result of the strong row effects not taken into account by the 1D turbid medium model SAIL. Similar findings and explanations were revealed by other studies analyzing maize [30], but in their work, the authors also assumed that the early growth stage with a high (heterogeneous) soil fraction observable by the sensor caused the lower LAI retrieval accuracies compared to sugar beet crops.
The results of the feature-selection experiment are summarized in Figure 3. Initially, the mean absolute error of the simulated versus observed spectral profiles decreases strongly in parallel with the decreasing number of bands; see Figure 3a. This effect is observed for maize at a higher level (max MAE = 0.0108) than for soybean reflectance (max MAE = 0.0103). Interestingly, the curves of both crops converge at a certain point (MAE = 0.009) and then decrease linearly until MAE = 0.
Applying the corresponding band combinations for variable retrieval, the patterns shown for LAI in Figure 3b and LCC in Figure 3c are obtained. In the case of LAI retrieval, the RMSE of maize increases strongly until band combination 68 is reached, similar to the course of the MAE in the band removal procedure, as illustrated in Figure 3a. From this point onwards, the removal of more bands increases the estimation accuracy until a minimum RMSE of 0.54 m²·m⁻² (rRMSE = 0.14) is reached with 42 of 73 spectral bands; 31 bands (mostly in the spectral regions of 500-600 nm and 700-800 nm) were deleted to obtain the optimal accuracy. In the case of soybean LAI, the minimum RMSE of 0.55 m²·m⁻² (rRMSE = 0.21) is already reached with 67 of 73 spectral bands. The algorithm deleted only the six least well-simulated bands in the 550 nm and 700 nm regions. For both crops, the further removal of bands (i.e., N_bands < 42/67) again led to an increase in RMSE, indicating that more and more informative bands are removed. The trends for LCC retrieval, demonstrated in Figure 3c, are different: the deletion of the least well-performing bands leads to a very strong increase of retrieval accuracy. After reaching the optimum feature set, accuracy remains stable (soybean) or again slightly decreases (maize).
Note that the minimum RMSE occurs at different spectral samplings for each crop and variable type. Hence, an independent generalization can only be achieved through a compromise threshold, where the spectral match is appropriate and, at the same time, the retrieval performance of the estimated variables is satisfactory. Looking at Figure 3a-c, this would be the case at around spectral band sampling N_bands ≈ 62, with a mean absolute error in the VNIR domain smaller than 0.01. In the study by Atzberger et al. [24], a threshold of MAE ≤ 0.02 was proposed. However, this finding was based on the full spectrum (HyMap sensor) with the short-wave infrared (SWIR) domain exhibiting the highest inaccuracies between the simulated and measured reflectance. Note that our study provides results from the VNIR region only, which includes the absorption spectra of leaf pigments (e.g., LCC) and the NIR shoulder that is very sensitive to canopy structure effects (i.e., LAI).
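The compromise threshold discussed in this section amounts to choosing the largest spectral sampling whose mean model error stays below a given MAE. A minimal Python sketch of that selection rule is given below; the function name is hypothetical, and apart from the 73-band maize value quoted in the text, the MAE values in the dictionary are invented purely for illustration.

```python
def largest_sampling_below(mae_by_n_bands, threshold=0.01):
    """Largest number of bands whose mean MAE is below the threshold, or None."""
    admissible = [n for n, mae in mae_by_n_bands.items() if mae < threshold]
    return max(admissible) if admissible else None

# Illustrative values only (not read from Figure 3a).
mae_by_n_bands = {73: 0.0108, 68: 0.0102, 62: 0.0098, 42: 0.0060}
n_selected = largest_sampling_below(mae_by_n_bands, threshold=0.01)
```

Keeping the largest admissible sampling retains as many informative bands as possible while still excluding the wavelengths that the model reproduces least well.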

Biophysical and Biochemical Variable Estimations
For LAI, an estimation accuracy of RMSE = 0.55 m²·m⁻² was obtained for maize using all available VNIR bands (N_bands = 73). Applying the model-based optimized spectral sampling (N_bands = 42) resulted in no significant retrieval improvement, with RMSE = 0.54 m²·m⁻². For soybean LAI, similar results were obtained with RMSE = 0.57 m²·m⁻² using N_bands = 73 and RMSE = 0.56 m²·m⁻² using optimized sampling (N_bands = 67). Figure 3b indicates both RMSE minima in the respective spectral sampling as explained in Section 3.1.
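For reference, the accuracy measures used throughout this section can be written compactly in Python. The sketch assumes that the relative RMSE is the RMSE divided by the mean of the measured values; the text does not spell out the normalization, so this choice is an assumption, and the sample values are made up.

```python
import numpy as np

def rmse(estimated, measured):
    """Root mean square error between estimated and measured values."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rrmse(estimated, measured):
    """Relative RMSE, assumed here to be normalized by the measured mean."""
    return rmse(estimated, measured) / float(np.mean(measured))

def mae(estimated, measured):
    """Mean absolute error between estimated and measured values."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up LAI values for illustration.
measured_lai = [2.0, 4.0, 6.0]
estimated_lai = [3.0, 4.0, 5.0]
scores = (rmse(estimated_lai, measured_lai),
          rrmse(estimated_lai, measured_lai),
          mae(estimated_lai, measured_lai))
```

Under this normalization, an RMSE is directly comparable across variables with different units and value ranges, which is why LAI and LCC results are contrasted via rRMSE.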
Generally, the LAI retrieval results are consistent with or slightly worse than those of similar studies, which report relative RMSEs from 0.13 to 0.23 [35,39,40]. The application of the 'optimized sampling' only led to a small and nonsignificant improvement compared to the usage of all available bands (maximal rRMSE difference = 0.01). This is due to the fact that the estimation performance does decrease somewhat after deleting a certain number of bands, especially for soybean; see Figure 3b. The 'reduced' spectral information of the Nebraska dataset from 423-863 nm provides valuable information on LAI. An interesting study was conducted by Verrelst et al. [29], who used the same dataset and selected the most important spectral bands for LAI retrieval (406 nm, 746 nm, 792 nm, 794 nm, 798 nm, 858 nm, and 878 nm). The authors also obtained a very high estimation performance for LAI retrieval of RMSE = 0.4, suggesting that this limited spectral range is sufficient for good retrieval. However, the authors [29] used a very different approach, which was based on the target variable, whereas in our study, the feature selection was purely based on the model itself.
A band selection study is highly relevant even if restricted to the VNIR domain. The SWIR region, nonetheless, is a very important spectral region for LAI retrieval as well, as was demonstrated by global sensitivity analysis of the PROSAIL model (for instance, in [29]), and should certainly be considered in another study where corresponding spectral data are available, especially if the focus is on leaf water and/or protein content.
However, LAI shows high sensitivity across the whole spectrum (400-2500 nm). It is therefore possible to obtain stable estimates using only a reduced number of wavebands. This has been shown by numerous studies in recent years, for instance [41-43], even estimating LAI successfully from multispectral information and/or using simple vegetation indices.
Contrary to LAI, the feature selection showed a significant positive impact for leaf chlorophyll content estimation. This was already evident in Figure 3c, and becomes more obvious in Figure 4: whereas the use of all available VNIR spectral bands (N_bands = 73) led to an overestimation and saturation of the LCC retrieval of maize (see Figure 4a), the exclusion of the seven bands with the highest MAE between the model simulation and spectral measurements (i.e., 705 nm, 712 nm, 719 nm, 726 nm, 733 nm, 740 nm and 748 nm) clearly improved the estimation results, from an rRMSE of 0.26 to 0.15 (see Figure 4b).
For soybean, again, a strong LCC overestimation prevails when using all available 73 bands (Figure 4c). When optimizing to N_bands = 53, i.e., excluding the central wavelengths 498 nm, 503 nm, 508 nm, 513 nm, 518 nm, 523 nm, 528 nm, 533 nm, 538 nm, 543 nm, 548 nm, 553 nm, 698 nm, 705 nm, 712 nm, 719 nm, 726 nm, 733 nm, 740 nm, and 748 nm, retrieval results significantly improved (see Figure 4d). In summary, results of LCC (µg·cm⁻²) retrieval clearly show a positive impact of removing spectral bands with an MAE (between simulation and measurement) larger than 0.01 before applying the RTM inversion.
The removed wavebands are situated partly in the same spectral region for both crops (red edge from 700-750 nm), where the absorption features of chlorophyll fade out: in PROSPECT-D, the absorbance of LCC drops below 0.001 beyond 733 nm (0.14% of the maximum factor) and becomes 0 at 781 nm. Moreover, although LCC no longer has any influence there, the wavebands in the near-infrared (NIR) region from 755 to 863 nm were kept for further simulations. Two factors need to be considered. First, the NIR region can still contribute to the LCC retrieval, as was also found and explained by, e.g., Verrelst et al. [29]. They selected the following nine bands for optimal LCC retrieval: 482 nm, 500 nm, 564 nm, 710 nm, 712 nm, 714 nm, 878 nm, 966 nm, and 980 nm. The authors explained this phenomenon by mechanisms of variable covariation: measured leaf and canopy variables of the same target hold some dependency, as vegetation variables are interrelated. Second, as the reflectance in the VIS region is controlled by both LCC and LAI, a good estimation of LAI (from VNIR) certainly helps to model LCC well. Here, the NIR region is used, in effect, to 'normalize' for variations in biomass/LAI. For soybean, even wavelengths from the chlorophyll-sensitive region had to be removed (green visible range 500-550 nm).
In the study of Verrelst et al. [29], the red edge region was found to be important for LCC retrieval using the same dataset, but with a statistical feature-selection method (based on Gaussian process regression, GPR) that was applied directly to the target in situ validation dataset. However, in our study, the feature selection was purely model-based, implying that the selected spectral samplings only relied on spectral fits between the model simulation and reflectance measurements before the retrieval algorithm was applied. Therefore, the in situ measured variables had no impact on the retrieval, and a high MAE leads to the exclusion of that band regardless of whether it is sensitive to the variable. Besides, the usage of spectral bands not located in sensitive regions of the considered variable (e.g., the near-infrared region for chlorophyll content) could potentially add some value to the retrieval due to the mechanisms of variable covariation [29]. We could only partly confirm this with our findings, since the red edge region was removed because PROSAIL failed to simulate it adequately. The different findings (i.e., different optimal spectral samplings) of our study compared to the study of Verrelst et al. [29] imply that results further depend on the retrieval algorithm used.
Statistical methods may achieve the same or even superior variable estimation results with a lower number of bands, but they always rely on the availability of in situ measured variables. This shortcoming can be overcome with a physically based approach. However, the aim of this model error analysis was not to generate the best possible spectral sampling for retrieval of certain variables, but mainly to ensure that the simulated reflectance corresponds well with the measured (field/satellite) spectra. Thus, our feature-selection algorithm could be applied first, and then, once the user has confirmed that the model reproduces the measured spectral bands with satisfactory accuracy (for instance, MAE < 0.01), statistical methods such as a principal component analysis (PCA) could be employed to condense the significant information content, which should further improve the retrieval of the variables. The two approaches therefore have different purposes and effects. With our analysis, we aim to raise the awareness of researchers of the limitations of the applied models.
Even after feature selection, the scatter around the 1:1 line remains high, with rRMSE of 0.15 (maize) and 0.29 (soybean). These values are comparable with other studies, for instance, with rRMSE of 19% and RMSE of 8.4 µg·cm−2 for LCC of the same crops and site [29]. Innovative multispectral EO systems, such as ESA's Sentinel-2 sensors, enable the derivation of vegetation products, as shown by many studies (for a discussion, see [3]). However, hyperspectral sensors equipped with very narrow spectral bands over a continuous spectral range will support the detection of vegetation properties more accurately [3]. This may concern, in particular, diverse biochemical plant components, such as chlorophyll, anthocyanins, and carotenoids, with very close and partly overlapping absorption features not distinguishable by broadband scanners. This advantage of hyperspectral over multispectral data was also found by other studies using hyperspectral field data for LCC as well as for LAI estimations [44][45][46]. Whatever the system used, full uncertainty information should be provided along with the biochemical and biophysical products [47], since even small margins of increased accuracy may significantly reduce uncertainties in an operational processing chain [3].
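For readers who want to reproduce such accuracy figures, the two metrics can be computed as in the following sketch. It assumes that rRMSE is the RMSE divided by the mean of the measured values, which is a common convention but is not defined explicitly in this section; the LCC values are invented.

```python
import math


def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / n)


def rrmse(measured, estimated):
    """RMSE normalized by the mean of the measurements (assumed definition)."""
    return rmse(measured, estimated) / (sum(measured) / len(measured))


# Invented LCC values in µg·cm−2, for illustration only.
lcc_measured = [30.0, 40.0, 50.0]
lcc_estimated = [33.0, 36.0, 55.0]

print(rmse(lcc_measured, lcc_estimated))   # absolute error in µg·cm−2
print(rrmse(lcc_measured, lcc_estimated))  # dimensionless fraction of the mean
```

Under this definition, an rRMSE of 0.15 corresponds to a typical error of 15% of the mean measured LCC.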

Conclusions
Radiative transfer models, such as PROSAIL, have been applied for variable retrieval without testing the model in the forward direction. Researchers often relied on the apparently good experience of previous studies. However, observed spectral reflectance (from field/airborne/spaceborne sensors) can deviate from the forward model simulation in certain spectral regions, further leading to errors in the retrieval whichever method is used. Thus, before inverting these models, suitability tests are recommended [13,24,48]. This implies a need to analyze the model's capability of reproducing the measured (canopy bidirectional) spectral signals when parameterized using field measurements. Spectral bands with a systematic mismatch between measured and simulated spectra should be removed to avoid bias effects leading to suboptimal performance. So far, no publication exists analyzing the capability of PROSAIL to reproduce crop hyperspectral measurements from simulated data of future sensors, such as HyspIRI, DESIS, PRISMA, or EnMAP. We could show with our analyses that the PROSAIL model is not able to reproduce the measured spectral signals with adequate precision in certain wavelength regions (green VIS/red edge). We could further show that these model limitations depend on the crop type. Eliminating these spectral regions by applying a LUT-based feature-selection algorithm, i.e., deleting the wavebands with MAE > 0.01 between the simulated and measured reflectance, improved variable retrieval (depending on the analyzed crop type and the considered variable). This process can be fully automated and thus could be integrated in standard retrieval schemes, which are based on the comparison of modeled and measured reflectance signatures. Note that these findings are not related to the sensitivity of the variable, but rather to the limited model performance in the green visible and red edge spectral regions for these two specific crop types. Depending on the current growth stage and specific canopy architecture, the simulations can represent the actual measurements with varying precision [13]. It must be taken into consideration, however, that the collected field spectra may also contain errors. To minimize impacts of erroneous field measurements, we analyzed data from 40 experiments with carefully measured spectra.
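How such a band mask might be integrated into a LUT-based retrieval is sketched below. The cost function (RMSE over the retained bands) and the choice of a single best-fitting entry are assumptions made for illustration; the function name `invert_lut`, the two-entry look-up table, and all reflectance values are hypothetical.

```python
import math


def invert_lut(measured, lut, keep):
    """Return the variables of the LUT entry that best fits the measurement.

    measured: measured reflectance per band.
    lut: list of (variables, simulated_spectrum) pairs.
    keep: indices of the bands retained after feature selection.
    The fit is judged by the RMSE over the retained bands only (assumption).
    """
    def cost(simulated):
        return math.sqrt(
            sum((measured[i] - simulated[i]) ** 2 for i in keep) / len(keep)
        )

    best_variables, _ = min(lut, key=lambda entry: cost(entry[1]))
    return best_variables


# Toy spectrum with four bands: green, red, red edge, NIR.
measured = [0.08, 0.04, 0.30, 0.45]
lut = [
    ({"LCC": 40}, [0.08, 0.04, 0.20, 0.45]),  # mismatch only in the red edge
    ({"LCC": 60}, [0.10, 0.06, 0.30, 0.47]),  # small mismatch in the other bands
]

all_bands = [0, 1, 2, 3]
without_red_edge = [0, 1, 3]

print(invert_lut(measured, lut, all_bands))
print(invert_lut(measured, lut, without_red_edge))
```

In this constructed case, the poorly simulated red edge band pulls the solution towards the second entry when all bands are used, whereas masking it lets the first entry win. Real look-up tables contain many more entries, and retrieval schemes often average several best-fitting solutions instead of taking a single one.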
In summary, our study has two main outcomes:
• There are two spectral regions in the VNIR region which are less well-modelled by PROSAIL independently of crop type: the green visible and the red edge. This can be explained by complex interactions of several biochemical and structural variables in these specific spectral regions. The green visible wavelength region is characterized by the influence of several pigments, in particular, carotenoids, chlorophylls, and anthocyanins. Moreover, there is an influence of leaf dry matter content, LAI, ALIA, and soil background. Regarding the red edge region, there is also a high variability with two strong peaks of chlorophyll content interacting with structural LAI and ALIA parameters.
• Discarding those wavelengths with a spectral mismatch of MAE > 0.01 leads to improvements of the leaf chlorophyll content retrieval. Such model-based analysis should therefore be carried out before applying any retrieval or data reduction algorithm.
The PROSPECT-D model used for this study has been improved using large validation datasets and outperforms previous model versions [11]. However, another prerequisite of a well-working model is proper parameterization. This may concern even more strongly the structural properties of the SAIL model, especially LAI and ALIA. An extensive study is currently in preparation analyzing the suitability of the ALIA parameter within the PROSAIL model environment ([37], intended for the same Special Issue). The practical implication of our results for users of the PROSAIL model is to parameterize the model as robustly as possible, ideally through the acquisition of field measurements.
Since specific band combinations were identified for each crop type, crop-type maps are crucial for the success of our proposed feature-selection method. Alternatively, trade-offs must be defined. For instance, if the crop types are not known, the proposed model-error threshold of MAE ~0.01, at which the model (PROSAIL) matches the measured spectra in the VNIR region for all crops, should be applied before employing any retrieval algorithm.
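One conceivable trade-off for unknown crop types, sketched here purely as an illustration and not as the procedure of this study, is to exclude every band that failed the MAE threshold for at least one crop. The excluded central wavelengths below are those reported above for maize and soybean; the variable names are hypothetical.

```python
# Excluded central wavelengths (nm) as reported in the text for each crop.
maize_excluded = {705, 712, 719, 726, 733, 740, 748}
soybean_excluded = {
    498, 503, 508, 513, 518, 523, 528, 533, 538, 543, 548, 553,
    698, 705, 712, 719, 726, 733, 740, 748,
}

# Conservative crop-agnostic mask: drop a band if it failed for any crop.
excluded_any_crop = maize_excluded | soybean_excluded

n_vnir_bands = 73  # total number of VNIR bands stated in the text
n_retained = n_vnir_bands - len(excluded_any_crop)

print(sorted(excluded_any_crop))
print(n_retained)
```

Because the maize exclusions are a subset of the soybean exclusions, this conservative mask coincides with the soybean band setting for the two crops studied here; with further crop types the union could grow and remove more bands.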
Upcoming spaceborne imaging spectrometers will lead to vast hyperspectral data streams. This calls for automated and optimized spectral uncertainty reduction techniques to ensure fast and efficient data processing, for instance, for the retrieval of vegetation properties. In conclusion, our proposed feature-selection method can provide an efficient measure to improve radiative transfer model simulations that are to be used within retrieval schemes for biophysical and biochemical variables from future hyperspectral sensor missions. To confirm our findings, we strongly recommend extending the study to other crop types as well as other (ground, airborne, and satellite-based) datasets.

Figure 1 shows the spectral deviations (i.e., MAE) of 20 IMZs between measured and simulated canopy spectra for maize and soybean for known (field-measured) LAI (in m2·m−2).


Figure 1. Mean absolute error between measured and simulated reflectance (R) average values for 20 intensive measurement zones (IMZs) of maize (a) and soybean (b) with known (field-measured) LAI.


Figure 2. Results of Fourier amplitude sensitivity testing (FAST) first-order sensitivity coefficients and interactions of canopy reflectance: global sensitivity analysis (GSA) of the PROSAIL model. N is the leaf structural parameter, LCC is the leaf chlorophyll content, Ccx is the leaf carotenoid content, CAnth is the leaf anthocyanin content, Cw is the equivalent water thickness (not present), Cm is the leaf dry matter content, LAI is the leaf area index, ALIA is the average leaf inclination, and αsoil is soil brightness. The brownish-red area corresponds to parameter interactions. Applied units and parameter ranges for the GSA can be found in Table 2.


Figure 3. PROSAIL behavior when applying sequential band removal, i.e., gradually deleting the least-accurately simulated band during each LUTind run. Mean absolute error (MAE) between measured and simulated spectra (a). Usage of corresponding crop-specific band settings for LAI (m2·m−2) (b), and LCC (µg·cm−2) (c) retrieval, respectively. For the analyses in (b) and (c), the whole dataset with N = 169 maize (brown) and N = 68 soybean (green) samples was used.


Table 2. Parameterization of individual look-up tables (LUTind) with notations, units, range of parameters, and references for the model suitability test. LAI (m2·m−2) was fixed as measured during the field campaigns at the individual maize and soybean intensive measurement zones (IMZs).