Optimal Exploitation of the Sentinel-2 Spectral Capabilities for Crop Leaf Area Index Mapping

The continuously increasing demand for accurate, high-quality quantitative information on land surface properties will be met by a new generation of environmental Earth observation (EO) missions. One current example with a high potential to contribute to meeting this demand is the multi-spectral ESA Sentinel-2 (S2) system. The present study evaluates the spectral information content needed for crop leaf area index (LAI) mapping in view of the future sensors. Data from a field campaign were used to determine the optimal spectral sampling from the available S2 bands by inverting a radiative transfer model (PROSAIL) with look-up table (LUT) and artificial neural network (ANN) approaches. The overall LAI estimation performance of the proposed LUT approach (LUT_N50) was comparable to that of a tested and approved ANN method. Employing seven- and eight-band combinations, the LUT_N50 approach obtained an LAI RMSE of 0.53 and a normalized LAI RMSE of 0.12, which was comparable to the results of the ANN. However, the LUT_N50 method showed higher robustness and insensitivity to different band settings. The most frequently selected wavebands were located in the near infrared and red edge spectral regions. In conclusion, our results emphasize the potential benefits of the Sentinel-2 mission for agricultural applications.

OPEN ACCESS Remote Sens. 2012, 4 562


Introduction
The increasing demand for accurate quantitative information on land surface properties continues to drive the design and launch of innovative Earth observation (EO) missions. High-quality data delivered by new sensors offer the unique opportunity to continue and improve the derivation of biophysical variables, such as leaf area index (LAI), vegetation cover fraction (fCover) or fraction of absorbed photosynthetically active radiation (fAPAR), which drive several key physiological processes, including evapotranspiration and photosynthesis. These variables describe the spatial distribution of vegetation state and dynamics and therefore provide essential input for a wide range of ecological models in numerous fields of application and research [1,2]. This includes, for instance, the monitoring of forest dynamics [3], climate modeling or the assessment of the carbon and nutrient cycle from regional to global scales [4]. One of the main application fields is the agricultural sector, where biophysical variables such as LAI are needed, among other purposes, to support the development of precision farming techniques. This requires their integration into different models for the simulation of crop growth and variability, nutrient demand or irrigation water requirements [5].
The present study was conducted against the background of the future ESA Sentinel-2 (S2) multispectral satellite mission [6], which is embedded in the framework of the Global Monitoring for Environment and Security (GMES). The Sentinel mission's objective is to provide continuity to services that depend on current multi-spectral high-resolution optical observations over global terrestrial surfaces, such as the adequate quantification of geo-biophysical variables, land-cover mapping and land-change detection [7].
The first of the two S2 sensors is scheduled to be launched in 2013. The spectral sampling is inherited from sensors that have been used for vegetation monitoring in recent decades, such as SPOT, Landsat and MODIS. The pair of Sentinel-2 satellites will be equipped with visible, near infrared and shortwave infrared sensors. In total, the sensors will have 13 spectral bands (with central wavebands described in the method section), listed as Bands 1-8, 8a and 9-12 in [7]. With a spatial resolution of 10 m (4 bands), 20 m (6 bands) and 60 m (3 bands), the new sensors will address high to medium resolution applications. The short revisit time of five days (at the equator under cloud-free conditions) or even 2-3 days (at mid-latitudes) will allow the effective monitoring of vegetation status and dynamics. More detailed information and a technical characterization of the mission can be found in the mission requirements document [7] and on the ESA Sentinel-2 website.
Various methods have been proposed, applied and improved in recent decades for the retrieval of biophysical products. Estimation approaches can be pooled into two groups [1,8]. The first group comprises statistical models, which principally rely on learning the relation between the sought variable and the reflectance. Possibly the simplest statistical approach is the use of vegetation indices (VIs), which are among the oldest methods for gaining information about vegetation characteristics from remote sensing data [9,10]. More complex chemometric and statistical techniques have been developed to overcome the widely discussed drawbacks of VIs [11-13]. These include, among others, partial least squares regression (PLSR) [14], stepwise multiple linear regression (SMLR) [15], red-edge inflection point (REIP) [16], spectral unmixing (SUM) [17], artificial neural networks (ANN) [18] or kernel-based methods, such as support vector regression (SVR) [19].
The second group involves physically-based approaches, i.e., radiative transfer models (RTM) in combination with different inversion strategies [20-22]. RTMs of different complexities have been developed to describe the interaction of radiation with vegetation, ranging from the assumption of the canopy as a simple one-dimensional (1-D) turbid medium up to more realistic three-dimensional (3-D) architectures [23]. The choice of the model depends on the kind of application, the vegetation types monitored and the level of accuracy required.
Regarding the inversion of RTMs, different methodologies have been employed. The most traditional approaches are iterative optimization techniques, e.g., [21,24]. Look-up tables (LUT) are also widely applied inversion strategies [24-26]. Other methodologies involve genetic algorithms [27], SVR [28] or Bayesian methods such as Markov chain Monte Carlo (MCMC) [29]. Some of the latter techniques must be regarded as hybrid approaches, since they combine statistical principles with radiative transfer theory. Prominent representatives of these hybrid methods are artificial neural networks (ANNs) [30]. One of the main concerns for the retrieval of variables with these techniques is that model inversion usually does not satisfy Hadamard's postulates of well-posedness [24], meaning that the solution to the problem is not necessarily unique. Different methodologies have been studied to overcome this problem (for an overview see [1]), among others, the use of a priori information on the estimates [24] or increasing the data dimensionality in spectral, spatial or temporal terms. A very promising strategy, for instance, is the use of neighborhood information for regularization, as demonstrated by [31] using data based on the Sentinel-2 sensor configuration.
An extensive discussion about different inversion techniques, the ill-posed nature of model inversion and related issues can be found in [8] or [32].
The definition of the S2 sensors' spectral sampling was based on careful analyses of spectral regions of interest for geo-/biophysical vegetation variables, but also for a range of other applications, constituting the best compromise in terms of user requirements and mission performance [6]. Several studies evaluated spectral regions or band settings of the S2 sensors for vegetation biophysical variable estimation applying empirical (e.g., [33]) or physically-based retrieval methods (e.g., [26]). Though these studies found good-to-optimal results, retrieval accuracy for individual variables may differ when different retrieval methods are employed with diverse band settings. For instance, some retrieval methods, such as vegetation indices, are limited to only a few spectral bands and therefore face serious shortcomings, as discussed in [34]. On the other hand, physically-based retrieval methods, such as RTM inversion, could obtain better accuracies when using a defined spectral sampling compared to the exploitation of the full spectral information available from a sensor [35,36]. Some studies [35,37] explain this with (excessively) high noise levels in some bands of the spectral measurements or with limitations of the model to properly simulate those bands. The reasons for this effect may be even more multifaceted, depending on the sensor used, the field measurement protocol, the applied model and the model configurations. In fact, the study from [35] showed that the use of spectral subsets does not improve the retrieval performance compared to full spectral resolution when the (full) measured spectra are well represented by the model simulations. Nevertheless, several studies have revealed that a few bands from multi- or hyperspectral data sources are sufficient for the characterization of vegetation and the discrimination between different surface properties, such as soil, water or snow [38,39]. Employing as much spectral information as possible would only increase the computation time without further improvement of the results. Therefore, to obtain the most accurate results of vegetation biophysical variables possible, together with reasonable computational demands, a defined spectral sampling should be preferred in many cases over the use of the full spectral information.
Data from the ESA Sentinel-2 sensors will be in high demand for operational agricultural applications. Therefore, the objective of the present study was to define the optimal use (number of bands and spectral regions) of the available Sentinel-2 bands for crop LAI retrieval. For this purpose, data from a field campaign were exploited by means of RTM-based model inversions using look-up table and neural network algorithms.

Test Site and Data Acquisition
The data set investigated in the present study was obtained in the framework of the ESA SPectra bARrax Campaigns (SPARC). These campaigns were carried out in July 2003 (hereafter called SPARC'03) in Barrax, an agricultural test area situated within the La Mancha region in Southern Spain (approx. 39°3′N, 2°6′W). Briefly, the SPARC'03 campaign aimed at supporting calibration and validation activities of existing algorithms and the development of new ones, such as for geo-/biophysical variable retrievals. Details can be found in [40].
Airborne hyperspectral data of the area were acquired by the HyMap imaging system [41] on 13 July 2003 around 11:20 UTC, with flight lines parallel to the principal plane (towards the sun). The sensor recorded spectral reflectance in 126 spectral channels with a ground sampling distance (GSD) of 5 m. Atmospheric correction and radiometric calibration were carried out by the Laboratory for Earth Observation of the University of Valencia using a modified MODTRAN4 code [42]. Ground LAI measurements were collected non-destructively by means of the LICOR LAI-2000 Plant Canopy Analyzer instrument [43]. In total, 70 LAI measurements of alfalfa, maize, sugar beet, garlic and onion, collected concurrently with the HyMap sensor overpass, were analyzed for the present study. A stratified random sampling strategy was applied with a minimum of 12 measurements per Elementary Sampling Unit (ESU). The mean value of these measurements represented the final LAI value for each ESU. The ESUs measured 20 m × 20 m, a compromise between the different spatial resolutions of the various remote sensing acquisitions during the SPARC'03 campaign. A detailed description of the measurements can be found in [44,45].
Clumping of the leaves was only partially accounted for by the instrument and the corresponding software. Moreover, no corrections were applied to account for the influence of non-green plant components, such as stems or senescent leaves. Thus, the term LAI used here for ground measurements corresponds to the effective plant area index (PAI_eff) [46,47]. The error arising from the ground LAI measurements can be up to 10%, depending on the degree of crop heterogeneity. Moreover, other potential sources of uncertainty may originate from the (optical) instrument, such as illumination conditions, saturation effects or instrument simplifications [48]. However, the vegetation surface apparent to a space- or airborne remote sensing instrument corresponds rather to an "effective green area index," since leaf overlapping can lead to saturation of reflectance, in particular for higher LAI values [1]. Therefore, a correction of the clumping effect may not be explicitly necessary when comparing LAI values derived from the LAI-2000 instrument and from a remote sensor [49]. Measurement differences between the two instruments might be largest in the presence of non-green plant components, for instance during flowering or later crop growth stages. For a homogeneous coverage (such as in a middle growth stage with mainly green plant components), differences between the two approaches may be rather marginal.

Radiative Transfer (RT) Model and Inversion Procedures
The widespread PROSPECT-5 [50] and SAIL [51] models, coupled in "PROSAIL", were chosen for the study. Comprehensive descriptions of the models have already been published [52]. Thus, only their main characteristics are briefly sketched here: the SAIL model simulates the bi-directional reflectance of homogeneous canopies as a function of soil reflectance, illumination and viewing geometries, and several structural and biophysical variables, such as LAI, average leaf angle (ALA) and a hot spot parameter, implemented by [53]. Leaf optical properties (reflectance and transmittance) are simulated by the PROSPECT-5 model as a function of a structure parameter N, leaf chlorophyll content (C_ab), dry matter content (C_m), carotenoids (C_ar) and leaf water content (C_w).
For the estimation of the variables from the PROSAIL model, an adequate inversion procedure has to be defined. For this purpose, a look-up table was chosen as the main approach. LUTs are among the simplest inversion strategies. Nevertheless, they provide accurate results if the canopy characteristics are sampled appropriately [1,54]. The LUT method performs a global search for the best solution and thus avoids being trapped in local minima, as can occur with iterative optimization methods [35]. For comparison, an approved artificial neural network inversion approach was selected [37]. ANNs combine two advantages: first, they are computationally very fast, and second, they have the ability to approximate any (non-linear) relationship between different variables. However, unexpected behavior may occur if the training database does not represent the spectral characteristics of the analyzed canopies well [55].
For the setup of both inversion strategies, a synthetic database of 49,152 variable combinations was generated using the PROSAIL model. PROSAIL was configured to simulate the spectral band configuration of the future ESA S2 sensors according to the sensor spectral response functions. Variables and model parameters were randomly sampled within bounds (see Table 1), applying truncated Gaussian distribution laws representative of different world vegetation types as proposed by [20]. Soil background was approximated by extracting and averaging bare soil signatures from different fields of the HyMap imagery. A simple multiplicative soil reflectance factor (α_soil, Table 1) is assumed to mimic variations of reflectance due to changes in superficial soil water content [56].
A stratified sampling scheme was used to ensure that values from each class (N class, Table 1) were combined with values from each other variable class. Illumination and viewing conditions (sun and sensor viewing angle, azimuth between sun and sensor) corresponded to those during the image acquisition. Atmospheric correction and instrumental noise can result in multiplicative and additive uncertainties. Radiometric calibration might be inaccurate, which leads to systematic errors. Moreover, the RTM used may contain errors depending on its (simplified) description of the radiation regime in a vegetation canopy. Thus, to account at least partly for these uncertainties, noise was included in the simulations. Fifty random initializations were generated by adding and multiplying Gaussian white noise (absolute: 0.01 and relative: 4%) to the simulated reflectance. This was done band-dependently and band-independently according to [37], who demonstrated that the combination of all these error terms performed best for variable retrievals:

$$R(\lambda) = R_{sim}(\lambda) \left[ 1 + \frac{\varepsilon(0, \sigma_{rel}(\lambda)) + \varepsilon(0, \sigma_{rel}(all))}{100} \right] + \varepsilon(0, \sigma_{abs}(\lambda)) + \varepsilon(0, \sigma_{abs}(all)) \qquad (1)$$

where R(λ) corresponds to the final reflectance and R_sim(λ) to the reflectance simulated by the RTM, ε(0,σ) represents a normal distribution with zero mean and standard deviation σ, σ_rel(λ) and σ_rel(all) represent the relative uncertainty (in percent) applied to band λ and to all bands, respectively, and σ_abs(λ) and σ_abs(all) characterize the absolute uncertainty added to band λ and to all bands, respectively.
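The noise scheme described above can be illustrated with a minimal sketch. It assumes that the relative terms are expressed in percent and that the same standard deviation (4% relative, 0.01 absolute) is used for the band-dependent and the band-independent terms; the source does not state how the uncertainty is divided between them. The function name `add_noise` and its parameters are hypothetical.

```python
import random


def add_noise(spectrum, rel_sigma=4.0, abs_sigma=0.01, rng=None):
    """Return one noisy copy of a simulated reflectance spectrum.

    spectrum  : list of simulated reflectances, one value per band
    rel_sigma : relative uncertainty in percent (assumed equal for both terms)
    abs_sigma : absolute uncertainty in reflectance units (assumed equal)
    """
    rng = rng or random.Random()
    # Band-independent terms: one draw shared by every band of the spectrum.
    rel_all = rng.gauss(0.0, rel_sigma)
    abs_all = rng.gauss(0.0, abs_sigma)
    noisy = []
    for r_sim in spectrum:
        # Band-dependent terms: a fresh draw for each band.
        rel_band = rng.gauss(0.0, rel_sigma)
        abs_band = rng.gauss(0.0, abs_sigma)
        noisy.append(
            r_sim * (1.0 + (rel_band + rel_all) / 100.0) + abs_band + abs_all
        )
    return noisy


# Fifty noisy realizations of one (made-up) ten-band spectrum.
simulated = [0.04, 0.07, 0.05, 0.12, 0.30, 0.38, 0.40, 0.41, 0.25, 0.14]
realizations = [add_noise(simulated, rng=random.Random(seed)) for seed in range(50)]
```

Sharing one draw across all bands models errors that shift the whole spectrum (for example calibration offsets), while the per-band draws model uncorrelated sensor noise.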
For the LUT, a simple cost function composed of the root mean square error (RMSE) was employed [24,26]. In this way, the spectra providing the closest (radiometric) match with the measured signal were selected. The final solution was obtained in two steps. In the first step, the variable values (i.e., LAI) of all spectra whose RMSE lay within 20% of the lowest RMSE value were averaged. This threshold was chosen according to our own tests and trials. Moreover, the 20% threshold [22,57], or more generally the use of multiple solutions [35], has also been proposed by similar studies. In the second step, the final solution was computed as the mean LAI over fifty random initializations (found to be sufficient by [37] for ANNs) with additive/multiplicative noise (Equation (1)). The results of this procedure are abbreviated as "LUT_N50". For comparison purposes, the LUT retrieval was also performed without step two, i.e., the selection procedure was performed only once, comparing measured and simulated spectra without adding/multiplying noise; this is abbreviated as "LUT_N1".
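The two-step LUT selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "within 20% of the lowest RMSE" is read here as RMSE ≤ 1.2 × the minimum RMSE, and the function names (`rmse`, `invert_lut`, `invert_lut_n50`) are hypothetical.

```python
import math


def rmse(a, b):
    """Root mean square error between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def invert_lut(measured, lut_spectra, lut_lai, tolerance=0.20):
    """Step 1: mean LAI of all LUT entries close to the best radiometric match.

    Assumption: an entry is kept if its RMSE is at most (1 + tolerance)
    times the lowest RMSE found in the table.
    """
    costs = [rmse(measured, simulated) for simulated in lut_spectra]
    best = min(costs)
    selected = [lai for cost, lai in zip(costs, lut_lai)
                if cost <= best * (1.0 + tolerance)]
    return sum(selected) / len(selected)


def invert_lut_n50(measured, noisy_luts, lut_lai, tolerance=0.20):
    """Step 2: average the step-1 solution over several noisy LUT realizations."""
    estimates = [invert_lut(measured, lut, lut_lai, tolerance)
                 for lut in noisy_luts]
    return sum(estimates) / len(estimates)
```

Calling `invert_lut` once on the noise-free table corresponds to the single-pass variant, while `invert_lut_n50` with fifty noisy copies of the table corresponds to the averaged variant.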
For the ANN, a three-layer feed-forward, back-propagation neural network was designed using the neural network toolbox in MATLAB®. Tan-sigmoid transfer functions were implemented in the hidden layer and linear transfer functions in the output layer. A sensitivity analysis showed that five neurons in the hidden layer were optimal (not shown). This setting has also been found by [20,37]. The number of input neurons depended on the number of bands used for the training, while the output layer was composed of only a single neuron for the prediction of LAI. The use of a single neuron in the output layer has been suggested by [20,37] and was moreover found to be optimal in our own tests.
The synthetic database was split into three subsets. The first (50%) was used for updating the weights and biases of the network (training). The second subset (25%) was employed to check the progress of the training algorithm and thus to prevent over-fitting. This implies that these data were not completely independent but an essential part of the training process to select the right model. The third subset (25%) was then used for independent model evaluation and therefore to gain confidence in the final model. Whereas the second subset was part of the iterative training process, the third (validation) subset was used only once. The final LAI solution was then calculated as the average of all 50 networks.
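The 50/25/25 partition and the final averaging can be sketched in a few lines. The random assignment of samples to subsets is an assumption (the source does not say how the subsets were drawn), and `split_database` and `ensemble_mean` are hypothetical helper names; network training itself is not shown.

```python
import random


def split_database(n_samples, fractions=(0.50, 0.25, 0.25), seed=0):
    """Shuffle sample indices and split them into three disjoint subsets.

    Returns (training, training-check, independent-evaluation) index lists.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_check = int(fractions[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_check],
            indices[n_train + n_check:])


def ensemble_mean(predictions):
    """Average the LAI predicted for one pixel by all trained networks."""
    return sum(predictions) / len(predictions)


train_idx, check_idx, eval_idx = split_database(49152)
```

For the database size given in the text, this yields 24,576 training samples and 12,288 samples in each of the other two subsets.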

Band Sensitivity Analysis
In order to identify the optimal spectral sampling, i.e., how many and which of the available S2 bands would be required for the best LAI retrieval performance, a band sensitivity analysis was carried out. For this purpose, the approaches described above were applied to all possible combinations of bands. The synthetic S2 bands were grouped into arrangements, which are defined by the number of bands used. Between two and ten spectral bands may be included in one arrangement. For each arrangement, the maximum number of possible band combinations was calculated (see Table 2). Since the HyMap sensor covers the spectral range of the future S2 sensors, all bands of interest for our study (i.e., 10 out of 13) could be included in the analyses. Hence, the HyMap wavebands closest to the following central S2 wavebands were incorporated: 490 nm, 560 nm, 665 nm, 705 nm, 740 nm, 783 nm, 842 nm, 865 nm, 1,610 nm and 2,190 nm. This approximation may introduce uncertainty; however, HyMap does not provide the high spectral resolution that would be required to apply the S2 sensors' spectral response functions.
In this way, only channels providing a GSD of 10 m or 20 m were considered. This decision was taken because, in the context of GMES land monitoring applications, these bands will mainly serve the mapping of geo-biophysical vegetation variables, land use and land cover, whereas the remaining three bands with a GSD of 60 m (443 nm, 945 nm and 1,375 nm) are foreseen for atmospheric correction [7].
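The number of band combinations per arrangement follows directly from the binomial coefficient "10 choose k". The short sketch below counts and enumerates them for the ten central wavelengths listed above; the names `S2_BANDS_NM` and `combinations_per_arrangement` are hypothetical.

```python
from itertools import combinations
from math import comb

# Central wavelengths (nm) of the ten S2 bands considered in the analysis.
S2_BANDS_NM = (490, 560, 665, 705, 740, 783, 842, 865, 1610, 2190)


def combinations_per_arrangement(bands=S2_BANDS_NM, k_min=2, k_max=10):
    """Number of distinct band subsets for every arrangement size k."""
    return {k: comb(len(bands), k) for k in range(k_min, k_max + 1)}


counts = combinations_per_arrangement()
for k, n in counts.items():
    print(f"{k:2d} bands: {n:3d} combinations")

# Enumerate the actual subsets of one arrangement, e.g., all nine-band cases.
nine_band_sets = list(combinations(S2_BANDS_NM, 9))
```

Summed over all arrangements from two to ten bands, this gives 1,013 combinations, each of which requires its own inversion run per retrieval method.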
Spatial aspects in view of the three different GSDs of the S2 sensors were not considered in the current study. However, to provide comparability of the remotely sensed estimates from the HyMap sensor with the in situ LAI measurements, mean reflectance values of 4 × 4 pixels were extracted. For this purpose, the central coordinates of the LAI ESUs were taken. Comparisons of measurements and simulations are therefore based on a 20 m ground sampling distance.
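Averaging 4 × 4 pixels of 5 m GSD gives one value per 20 m × 20 m ESU. A minimal sketch of this aggregation for a single band is shown below; how the even-sized window was positioned around the ESU centre is not stated in the text, so the function simply takes the top-left pixel index as an assumed input. The name `window_mean` is hypothetical.

```python
def window_mean(band_image, row, col, size=4):
    """Mean reflectance of a size x size pixel window.

    band_image : 2-D list (rows of pixel values) for one spectral band
    row, col   : index of the window's top-left pixel (assumed convention)
    """
    rows = band_image[row:row + size]
    values = [value for line in rows for value in line[col:col + size]]
    if len(values) != size * size:
        raise ValueError("window extends beyond the image")
    return sum(values) / len(values)


# Toy 6 x 6 image whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
esu_reflectance = window_mean(image, 0, 0)
```

Applied band by band, this yields one mean spectrum per ESU that can be compared with the ESU's mean ground LAI.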

Results and Discussion
In this section, outcomes of the spectral band analyses are presented and discussed. Three aspects are considered: first, the distribution of RMSE values between measured and estimated LAI ("LAI RMSE") for all possible band arrangements is analyzed. Second, the importance of the different spectral regions for LAI estimation is addressed. Finally, crop-specific differences are elaborated.

Optimal Number of Bands
The distribution and variation of the resulting LAI RMSE values for each band arrangement can be illustrated well through box plot diagrams (Figure 1). The overall best accuracy (LAI RMSE_min = 0.53, Table 2) was obtained by LUT_N50 with the seven- and eight-band arrangements (Figure 1(a)). The worst results, in contrast, were obtained by LUT_N1 with a combination within a two-band arrangement (LAI RMSE_min = 2.1, Table 2, Figure 1(b)). However, the RMSE_min differences between the ANN and LUT_N50 approaches cannot be regarded as significant. These results rather show that the proposed LUT_N50 approach with implemented noise levels is comparable in terms of retrieval performance with a tested and approved ANN method.
For all three approaches, the size of the boxes (which correspond to 50% of the data) tends to decrease with an increasing number of bands included in the arrangement, meaning that the variability (dispersion) of the results also diminished when higher numbers of bands were used. This is also expressed by the smaller distance between the whiskers for the band arrangements with higher numbers of bands. For the ANN approach, the decrease is steady only until eight bands are included (Figure 1(c)).
An overview of the RMSE_min obtained from the combinations of each investigated band arrangement is presented in Table 2: whereas the ANN and LUT_N1 approaches reached absolute minima with 4 to 6 (ANN) and 5/6 (LUT_N1) band combinations, the LUT_N50 approach obtained the RMSE_min with 7 and 8 bands. Similar, though not identical, results have been found in previous studies: Verger et al. [37] found seven out of 62 bands to be optimal for LAI estimation using Compact High Resolution Imaging Spectrometer (CHRIS)/Proba sensor data and applying an ANN approach. Weiss et al. [58] selected six of nine synthetic bands (simulated with an RTM), obtaining the RMSE_min between measured and simulated LAI using a look-up table approach. Fourty et al. [59] found five to eight wavebands sufficient for accurately estimating different canopy biophysical variables, using multiple linear regression on data simulated with PROSAIL. In another study, exploiting the PROSAIL model with hyperspectral airborne DAIS data, 22 of 30 bands performed best for LAI retrieval. However, the strongest reduction of RMSE was found when using up to eight bands [36]. Whereas the absolute lowest RMSE values are on a similar level for ANN and LUT_N50, the latter provides a lower spread of RMSE_min across band arrangements (i.e., highest RMSE_min − lowest RMSE_min): 0.17 for LUT_N50 compared to 0.25 for the LUT_N1 and 0.33 for the ANN methods. This indicates higher robustness and a lower sensitivity of the LUT_N50 method regarding the optimal number of spectral bands.
As demonstrated, the results of such analyses may depend on the algorithms employed. In general, however, the optimal number of bands for the estimation of LAI is around six to eight. Whereas the use of only a few bands (two to four) aggravates the ill-posed inverse problem and therefore increases the retrieval uncertainty, the use of too many bands (more than eight or nine) may again lead to decreasing accuracy of the estimates. This can be caused by redundant spectral information, noise in the measured reflectance or the inability of the RTM to appropriately simulate certain spectral regions [37].

Optimal Spectral Sampling
In order to identify the spectral bands most often used by the approaches, all band combinations of each arrangement between the RMSE_min and the 0.05 quantile of the RMSE values were selected from the LUT_N50 and ANN approaches. All cases within the 0.05 quantile were included, instead of simply selecting the best band combination, in order to account for some uncertainty. The 0.05 quantile included between one (for the nine-band arrangement) and 13 (for the five-band arrangement) cases. In Figure 2, the frequency of the selected bands is presented. Whereas Figure 2(a) shows the frequency of selection for each single band, Figure 2(b) indicates the most frequent selection grouped per spectral region: visible (VIS, 490 nm, 560 nm and 665 nm), red edge (705 nm, 740 nm and 783 nm), near infrared (NIR, 842 nm and 865 nm) and short wave infrared (SWIR, 1,610 nm and 2,190 nm).
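The counting of band selections can be sketched as follows. The sketch assumes that "within the 0.05 quantile" means keeping the best 5% of combinations of an arrangement, rounded up to at least one case; the exact quantile convention used in the study is not given. The function name `selection_frequency` and the toy RMSE values are hypothetical.

```python
import math
from collections import Counter


def selection_frequency(results, quantile=0.05):
    """Count how often each band occurs among the best band combinations.

    results  : list of (band_tuple, rmse) pairs for one band arrangement
    quantile : fraction of lowest-RMSE combinations to keep (assumed rule:
               rounded up, with at least one combination retained)
    """
    ranked = sorted(results, key=lambda item: item[1])
    n_keep = max(1, math.ceil(quantile * len(ranked)))
    frequency = Counter()
    for bands, _ in ranked[:n_keep]:
        frequency.update(bands)
    return frequency


# Toy example with four two-band combinations and made-up RMSE values.
toy_results = [((490, 842), 0.9), ((705, 842), 0.5),
               ((705, 865), 0.6), ((560, 665), 1.2)]
freq = selection_frequency(toy_results, quantile=0.5)
```

Summing such counters over all arrangements, and then over the bands of each spectral region, gives per-band and per-region frequencies of the kind shown in Figure 2.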
In some aspects, both approaches (i.e., LUT_N50 and ANN) show the same tendency: no band, and thus no spectral region, was completely excluded by the algorithms. However, there are some strong differences, visible in Figure 2(a): the LUT_N50 approach most often selected the red edge band (705 nm), closely followed by the two NIR bands and the red and green visible bands (665 nm and 560 nm). Less often selected bands were located in the blue visible (490 nm), and in particular in the SWIR region. The ANN approach, in contrast, prioritized the two NIR bands, followed by the blue visible, then the red edge, the green visible and the two SWIR bands. The red visible was the least often selected spectral band. Looking at the spectral groups (Figure 2(b)), these differences diminish to a more similar pattern: the most often selected bands were located in the NIR, followed by the red edge and VIS domains or, in the case of the ANN, by the SWIR. To some extent, the same tendency as for the optimal number of bands was found: the LUT_N50 method exhibits lower sensitivity to the selection of bands than the ANN approach, at least from the visible to the NIR domains.
The frequent selection of NIR bands was expected: multiple scattering between the spongy mesophyll cells is very pronounced in this spectral region (e.g., [60]). It is therefore well known that reflectance increases with increasing leaf material, and thus with LAI. The dominance of NIR reflectance in this context has also been found by other studies, for instance [36], where the absolute minimum RMSE for LAI retrieval was reached after selecting the majority of the available NIR bands (15) and four bands in the visible region. In the study of [37], the NIR was also found to be the spectral region of most interest for LAI retrieval, with five bands selected from the NIR and two from the red visible domains.
There are diverging opinions in the literature concerning the importance of the red edge spectral region for LAI estimation (e.g., [33,61]). Our results suggest that the red edge has more influence on LAI retrieval than the visible and SWIR bands. In the study of [58], two of the six selected bands were located in the red edge domain, three in the visible and one in the NIR region.
Spectral bands located in the SWIR range were less frequently chosen than most others, as also found by [36], who selected only two of 21 bands in the SWIR domain for an optimal LAI retrieval. Nevertheless, the SWIR bands also contributed to the (relatively) high retrieval accuracies in our study as well as in others, for instance [59]. Even though only wavelengths from 880 to 2,380 nm were considered in that study, five of the six bands selected for optimal LAI estimation were located in the SWIR [59]. Due to the limited data availability from sensors operating in the SWIR, previous studies could often employ only visible and NIR bands. Since some studies demonstrated that the inclusion of SWIR improved the retrieval accuracy of LAI [61,62], data delivery from Sentinel-2 in this spectral region will certainly be valuable. Moreover, the SWIR bands may play an important role in discriminating the spectral signal of different soil and vegetation variables, such as dry matter or water content [63]. However, further research is still required in this regard.

Crop Specific Differences
The RT model (PROSAIL) applied here is based on a (1-D) turbid medium assumption and thus has a limited capability to simulate complex canopy architectures [57]. Mainly for this reason, retrieval accuracies may vary between crops exhibiting different canopy structures and growth stages, as found by several studies (e.g., [26,31,57]). Therefore, crop-specific accuracies were calculated and depicted in Figure 3 in the form of scatter plots for the LUT_N50 and the ANN approaches. LAI estimation was performed employing the band combinations providing the minimum RMSE (RMSE_min), i.e., an eight-band combination from LUT_N50, including the first eight bands without the SWIR, and a five-band combination from ANN, including the green visible, the two NIR and the two SWIR bands.
The use of a single statistical measure (such as RMSE) provides only limited information on the retrieval performance: differences in absolute number, magnitude, range or spatial patterns of the measured/simulated values can influence the indicators. Thus, to give a valid overview of model performance, a statistical indicator set proposed by [64] was calculated (see Table 3): coefficient of determination (R²), RMSE, normalized RMSE (NRMSE), which is the RMSE divided by the range of the reference measurements, and the Nash-Sutcliffe efficiency index (NSE, [65], Equation (2)). The NSE index, which can range between −∞ and 1, gives a good indication of a model's prediction capability. A value of NSE below 0 indicates that the estimated variable values are less accurate than simply taking the mean of the observed (measured) variables. Therefore, model reliability is only provided for NSE > 0. The index is calculated according to Equation (2):

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (O_i - E_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \qquad (2)$$

where O_i is the observed (measured) variable i and E_i the corresponding estimated value. The mean value of all observed variables is indicated with \bar{O}.
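The four indicators can be computed together, as in the following sketch. RMSE, NRMSE (RMSE divided by the range of the observations) and NSE follow the definitions given in the text; R² is computed here as the squared Pearson correlation, which is an assumption, since the source does not spell out its definition. The function name `evaluation_metrics` is hypothetical.

```python
import math


def evaluation_metrics(observed, estimated):
    """Return R2, RMSE, NRMSE and NSE for paired observed/estimated values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    covariance = sum((o - mean_obs) * (e - mean_est)
                     for o, e in zip(observed, estimated))
    rmse = math.sqrt(ss_res / n)
    return {
        # Assumption: R2 taken as the squared Pearson correlation coefficient.
        "R2": covariance ** 2 / (ss_obs * ss_est),
        "RMSE": rmse,
        # RMSE normalized by the range of the reference measurements.
        "NRMSE": rmse / (max(observed) - min(observed)),
        # Nash-Sutcliffe efficiency: 1 is perfect, below 0 is worse than
        # predicting the mean of the observations.
        "NSE": 1.0 - ss_res / ss_obs,
    }


# A constant bias keeps the correlation perfect but lowers the NSE.
biased = evaluation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example illustrates why the indicators are reported side by side: a systematic offset leaves R² at 1 while RMSE and NSE reveal the error, so a high correlation alone does not guarantee a reliable retrieval.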
In fact, the retrieval performance clearly depends on the crop type (Figure 3, Table 3). Whereas the highest accuracy was found by all statistical indicators for onions and alfalfa using the LUT_N50 approach, the ANN method obtained better results for maize, sugar beet and garlic. For these three crops, however, the spatial patterns (indicated by R²) were better reproduced by the LUT_N50 approach, although the correlations were still low (from R² ≈ 0.1 to 0.4). Moreover, all NSE values are < 0, implying that the mean value of the observed LAI would obtain a higher accuracy than the estimated LAI values. Thus, the models' prediction capabilities are doubtful. This may be due to the above-mentioned limitations of the RT model used: the erectophile canopy of garlic, for instance, may lead to a strong influence of the soil, which is a very critical factor for model inversion [31,37]. Looking at Figure 4, the garlic spectra resemble bare soil signatures (both measured and simulated) due to absent chlorophyll absorption, even though the field measurements indicated an LAI of 0.8. According to the picture taken during the campaign, the garlic was already in a senescent growth stage. The measurement from the LAI-2000 instrument (PAI_eff) is therefore strongly influenced by non-green plant components. This suggests that the LAI estimated by the LUT_N50 approach is closer to the green LAI value than the LAI estimated by the ANN approach (see also Figure 3) or measured in the field. Although the measured and ANN-estimated LAI values of garlic are very close, the LUT_N50 approach seems to give the most accurate interpretation of the spectral signature.
Moreover, the presence of row structures, which are not accounted for by the 1-D RT model, may lead to inaccuracies, as is often the case in maize [26]. However, the maize had already reached LAI values between 3 and 4 and thus almost approached a homogeneous coverage (see also the picture in Figure 4). In fact, seven of ten values are located on the 1:1 line (for both approaches). Moreover, with RMSE values of 0.5 (LUT N50) and 0.47 (ANN), the accuracy is higher than the average. Leaves that are clumped rather than randomly distributed, as assumed by the model, can result in an underestimation of high LAI values [20], as occurred in the actual growth stage of sugar beet for both approaches. However, as discussed in Section 2.1, the optical ground-based instrument also measures the "effective LAI" rather than the true LAI. A proper interpretation of the overestimation is therefore difficult in this case. It could be speculated that the higher LAI values obtained with the LAI-2000 were an effect of beginning leaf senescence, apparently leading to higher LAI values. However, as shown in Figure 4, this effect is not as strong as in the case of garlic. Even when looking at the crop-specific results, it cannot be concluded that one method outperforms the other. Both approaches, LUT N50 and ANN, reveal similar performances with reasonable results. However, problems are still present depending on the architecture and actual growth stages of the crops.
In order to obtain an idea of the model's ability to reproduce the HyMap reflectance data, some exemplary spectra are presented in Figure 4. For this purpose, the simulated spectra obtaining the best radiometric match within the LUT N50 approach were chosen (eight bands). It is clearly visible in Figure 4 that in all cases the HyMap reflectance is appropriately reproduced by the model.
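The selection of best-matching simulated spectra can be sketched as follows. This is an illustration only: the function name `best_matching_spectra` and the array layout are hypothetical, and "within 20% of the lowest RMSE" (the criterion given in the caption of Figure 4) is interpreted here as RMSE ≤ 1.2 × the minimum radiometric RMSE.

```python
import numpy as np

def best_matching_spectra(measured, lut_spectra, tolerance=0.20):
    """Select simulated spectra by radiometric RMSE against one measurement.

    measured    : reflectance per band, shape (n_bands,)
    lut_spectra : simulated reflectances, shape (n_entries, n_bands)
    tolerance   : accepted relative excess over the lowest RMSE (assumed reading)
    Returns the selected LUT indices (best match first) and their RMSE values.
    """
    measured = np.asarray(measured, dtype=float)
    lut = np.asarray(lut_spectra, dtype=float)

    # Radiometric RMSE between the measurement and every simulated spectrum
    rmse = np.sqrt(np.mean((lut - measured) ** 2, axis=1))

    # Keep all entries whose RMSE does not exceed the minimum by more than `tolerance`
    threshold = rmse.min() * (1.0 + tolerance)
    selected = np.flatnonzero(rmse <= threshold)

    # Order the selected entries from best to worst match
    order = selected[np.argsort(rmse[selected])]
    return order, rmse[order]
```

The LAI values stored with the selected entries could then be inspected or aggregated; how the study combined them is not specified in this section.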

Limitations of the Study
In the retrieval of biophysical vegetation variables (such as LAI), various components can influence the estimation accuracy. These include, for instance, the type of remote sensor with its spectral and radiometric characteristics, the crop type monitored, and the applied model and retrieval (inversion) methods. Moreover, the validation of the final estimates is influenced by the instruments and strategy used for acquisition of the in situ reference data. The contribution of each of these components to the overall uncertainty may vary from case to case. Further research efforts are still required to reduce or at least mitigate these uncertainties and errors and to guarantee high retrieval qualities within the context of both current and future satellite missions. Therefore, the question of the ideal number and position of spectral bands for LAI retrieval cannot be answered entirely with the results of our study alone.
Our study was conducted using a part of the extensive database generated during one of the largest agricultural field campaigns in recent years, the SPARC campaign in the Barrax area. Nevertheless, the application of the method to other environmental sites and sensors, and thus confirmation of the algorithms and outcomes, would be desirable.
As in our study from 2009 [26], we can draw the conclusion that the inversion strategies have only a minor influence on the LAI retrieval accuracy when using well-established approaches such as ANN or LUT including noise (i.e., LUT N50). The inversion strategy is of secondary importance in any case, since it cannot compensate for problems related to the choice of the appropriate radiative transfer model. We have again selected the PROSAIL model for our study because it has been widely applied and tested and has proven to be a feasible compromise between accuracy, number of input variables and computation time. However, due to the model's turbid medium assumption, it is also well known (and found in our own studies) that PROSAIL has limitations, especially for crops in particular growth stages, where clumping occurs or the underlying soil and row structures affect the spectral signal.
Moreover, as also shown by our results, optimum spectral sampling can depend on the retrieval method. The employment of other band selection/elimination methods, for instance from SVR [66], may again lead to diverging results. The band setting, however, has a major influence on retrieval accuracy, in contrast to the findings of [26]. This was demonstrated in the present study with a more dedicated band selection process: the box plots provide a reliable indicator of the quantity of spectral information required for biophysical variable (LAI) retrieval. By means of this tool, the optimal number of spectral bands can be detected. On the one hand, this avoids the use of limited spectral information (e.g., with VIs), diminishing the ill-posed inverse problem, while on the other, the employment of too many bands, for instance in RTM inversion schemes, can be prevented, reducing inaccuracies and computation time due to redundant spectral information.
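The box-plot indicator can be sketched as follows, assuming that an LAI RMSE value is already available for every tested band combination. The names `box_plot_summary` and `summarise_by_band_count` are hypothetical. The whisker convention (lowest and highest values within 1.5 times the interquartile range) follows the caption of Figure 1, and the percentile interpolation is NumPy's default, which may differ from the software used in the study.

```python
import numpy as np

def box_plot_summary(rmse_values):
    """Box-plot statistics for the LAI RMSE values of one band arrangement."""
    values = np.sort(np.asarray(rmse_values, dtype=float))
    q25, median, q75 = np.percentile(values, [25, 50, 75])
    iqr = q75 - q25

    # Whiskers end at the most extreme values still inside 1.5 * IQR of the box
    low_fence, high_fence = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    inside = (values >= low_fence) & (values <= high_fence)

    return {
        "q25": q25,
        "median": median,
        "q75": q75,
        "whisker_low": values[inside].min(),
        "whisker_high": values[inside].max(),
        "outliers": values[~inside].tolist(),
    }

def summarise_by_band_count(rmse_by_combination):
    """Group RMSE values by number of bands; keys are tuples of band identifiers."""
    grouped = {}
    for bands, rmse in rmse_by_combination.items():
        grouped.setdefault(len(bands), []).append(rmse)
    return {count: box_plot_summary(v) for count, v in sorted(grouped.items())}
```

Comparing the medians and spreads across band counts then indicates where adding further bands stops reducing the retrieval error.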
Therefore, the application of the presented analyses to other sites as well as further tests of the applied method would enhance the validity of our results. This enhancement is required to ensure high quality biophysical data products from Sentinel-2 sensors.

Conclusions
In this study, spectral issues for the retrieval of leaf area index, one of the major biophysical variables required for agricultural applications, were addressed. We focused on the question of the optimal spectral sampling for LAI retrieval from future Sentinel-2 sensor data using two look-up table approaches and an approved artificial neural network approach. In summary, results from our analyses lead to the following conclusions: though LAI can be roughly estimated using only a few bands (i.e., two or three, as with VI approaches), a high retrieval uncertainty must be taken into account when using this minimum spectral information. Box plots in Figure 1 demonstrate that this uncertainty can be diminished by including a higher number of spectral bands (six to eight), thus adding important information until spectral redundancy may outweigh the information gain. Regarding the band positions, NIR and red edge spectral regions provide the most relevant information for LAI, confirming previous literature findings. Moreover, the proper inclusion (i.e., 50 times) of additive and multiplicative noise, accounting for uncertainties from atmospheric correction, instrument and RT model, into the LUT N50 method significantly improved the retrieval results of this approach.
The best result from the band selection process (RMSE of 0.53 from LUT N50) corresponds to a normalized RMSE of 12% from the LUT N50 approach (13% from ANN, respectively). Results of both approaches are therefore comparable. More importantly, the LUT N50 provided a higher robustness and lower sensitivity to band selection, as indicated by the low variability between the RMSE min values and the more equal selection of bands between visible and NIR domains.
Looking at crop-specific results, NRMSE values were below 16%. A range of 15%-20% is regarded by [67] as the currently achievable accuracy for LAI from remote sensing observations. Hence, the spectral channels planned for the Sentinel-2 sensors offer a valid information basis for LAI retrieval. However, a retrieval accuracy of 10% for LAI is targeted for the mission [7]. Thus, improvements would be desirable. With Sentinel-2 sensor data, such improvements could be achieved by employing spatial [31] and/or temporal information [22,68].
The improved retrieval of biophysical variables may further encourage the development of advanced strategies for the use of Earth observation (EO) data. By these means, the assimilation of remote sensing data into land surface process models [69] may largely contribute to an enhanced application of Sentinel-2 products.

Figure 1. Distribution and variation of the resulting LAI RMSE values for each band arrangement for the LUT based approaches (a): LUT N50 and (b): LUT N1 as well as the ANN approach (c). The bottom and the top of the boxes describe the 25th and the 75th percentiles; the red line within the boxes corresponds to the 50th percentile (or median); lower and upper ends of the whiskers show lowest and highest values within 1.5 times the interquartile range (IQR). Outliers are marked with a red cross.

Figure 2. Frequency of selected bands from RMSE min to the 0.05 quantile, LUT N50 and ANN approaches: (a) per band and (b) per spectral group.

Figure 4. Selected simulated (PROSAIL generated database for LUT and ANN approaches) and HyMap spectra, one for each crop type; band setting of best results (RMSE min) of LUT N50 approach, i.e., first eight S2 bands used in the study (i.e., from 490 nm to 865 nm). All simulated spectra with radiometric RMSE value within 20% of the lowest RMSE are presented. Measured LAI values are indicated in the titles. On the right, the corresponding pictures of each crop field, taken during the campaign, are presented.

Table 1. Variables, number of classes (N class), mean values, bounds (min/max) and standard deviation (SD) as input for the PROSAIL model for the generation of the training database for LUT and ANN approaches.

Table 2. Number of possible band combinations (N) and retrieved minimum RMSE (RMSE min) between measured and estimated LAI of each arrangement for the three approaches: LUT N50, LUT N1 and ANN. SPARC'03 campaign data analyses (best results of each approach in bold).

Table 3. Goodness-of-fit statistics (R², RMSE, NRMSE and NSE) between observed and estimated LAI from SPARC'03 data. Higher performance values between the two approaches are emphasized in bold.