This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Lookup-table (LUT)-based radiative transfer model inversion is considered a physically sound and robust method to retrieve biophysical parameters from Earth observation data, but regularization strategies are needed to mitigate the drawback of ill-posedness. We systematically evaluated various regularization options to improve leaf chlorophyll content (LCC) and leaf area index (LAI) retrievals over agricultural lands, including the role of (1) cost functions (CFs); (2) added noise; and (3) multiple solutions in LUT-based inversion. Three families of CFs were compared:
Leaf area index (LAI) and leaf chlorophyll content (LCC) are essential land biophysical parameters retrievable from optical Earth observation (EO) data [
Over the last few decades, various space-based LAI and LCC retrieval approaches have been proposed (see review in [
Meanwhile, a new generation of high resolution (
Such unprecedented richness of high spectral and spatial resolution data streams makes the availability of robust retrieval methods more important than ever.
To be implemented in an operational processing chain, a method should deliver accurate estimates and be easy to implement in practice. Inversion of physically based canopy radiative transfer models (RTMs) against actual EO data is generally considered one of the most robust approaches to map biophysical parameters over terrestrial surfaces [
Several strategies have been proposed to circumvent the drawback of ill-posedness, including lookup-table (LUT)-based inversion strategies [
LUT-based inversion in its essential form,
First, in the majority of these studies the root mean square error (RMSE) was used as the CF between simulated and measured spectra. However, in the case of outliers and nonlinearity, the residuals are distorted and therefore the key assumption for using RMSE (maximum likelihood estimation with Gaussian noise) is violated [
Second, the majority of these studies focus on a specific vegetation type such as croplands, often identified as a land cover class within an image [
Henceforth, for the sake of realizing robust LCC and LAI retrievals at high spatial resolution, the aim of this work is to invest in a retrieval processing chain that combines LUT-based inversion with different CFs and regularization options. Simulated Sentinel-2 data at 20 m resolution will be used for this exercise, but in principle the inversion schemes can be applied to any optical multispectral EO data. This brings us to the following objectives: to evaluate the role of (1) cost functions; and (2) regularization options in LUT-based inversion strategies, such as: (
The estimation of biophysical parameters from satellite data is hampered by uncertainties whose nature is complex and differs from study to study. In many cases it is difficult to study these errors since their magnitude and distribution are unknown. On this basis, we opted to evaluate multiple cost functions that have been introduced in Leonenko
These distances/metrics come from different fields of mathematics, statistics and physics. They all represent “closeness” between two functions, but the nature of these functions can differ. For this reason, the metrics have been divided into three broad families based on the physical properties of the functions under consideration: information measures, M-estimates and minimum contrast methods. A detailed description of these families can be found in Leonenko
To describe the problem in a statistical way we suppose that
This family of measures, also referred to as “divergence measures”, is based on the minimization of distances between two probability distributions. In this case reflectances are considered as probability distributions, so normalization is required (the sum of probabilities must be 1) prior to numerical application. Note that normalization has been performed on the LUT reflectances as well. This family was first introduced by Kullback and Leibler (KL) [
This measure is called the Kullback-Leibler divergence and it also corresponds to the maximum likelihood distance in probabilistic space:
This measure is called Pearson chi-square:
Squared-Hellinger measure:
Neyman chi-square divergence:
Jeffreys-Kullback-Leibler:
K-divergence of Lin:
L-divergence of Lin is a symmetric version of K-divergence:
The harmonique Toussaint measure:
The negative exponential disparity measure:
Bhattacharyya divergence:
Shannon (1948):
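As a minimal sketch of how some of the information measures named above can be evaluated numerically, the following Python snippet normalizes two spectra and computes four divergences in their common textbook forms. The function names, the argument order (p for the observed spectrum, q for the simulated one) and the exact scaling of each measure are assumptions made for illustration; the definitions used in the original study may differ in detail.

```python
import numpy as np


def normalize(spectrum):
    """Scale a reflectance spectrum so that its bands sum to 1."""
    s = np.asarray(spectrum, dtype=float)
    return s / s.sum()


def kullback_leibler(p, q):
    """Textbook Kullback-Leibler divergence: sum of p * ln(p / q)."""
    return float(np.sum(p * np.log(p / q)))


def pearson_chi_square(p, q):
    """Textbook Pearson chi-square: squared differences weighted by 1 / q."""
    return float(np.sum((p - q) ** 2 / q))


def squared_hellinger(p, q):
    """Squared Hellinger distance (written here without a 1/2 factor)."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def jeffreys(p, q):
    """Jeffreys divergence, the symmetrized Kullback-Leibler divergence."""
    return float(np.sum((p - q) * np.log(p / q)))


# Hypothetical reflectances for three bands (illustrative values only).
observed = normalize([0.05, 0.10, 0.40])
simulated = normalize([0.06, 0.09, 0.45])
divergences = {
    "KL": kullback_leibler(observed, simulated),
    "Pearson": pearson_chi_square(observed, simulated),
    "Hellinger": squared_hellinger(observed, simulated),
    "Jeffreys": jeffreys(observed, simulated),
}
```

All four measures are zero when the two normalized spectra coincide and grow as they diverge, which is what makes them usable as cost functions in a LUT search.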
The classical LSE distance corresponds to the function
This class of estimates considers the spectral domain, and reflectances in this case can be seen as spectral density functions of some stochastic process. It is close to the class of quasi-likelihood estimators, where asymptotic independence is used instead of independence (which does not hold in many cases). Under some sets of conditions the minimum contrast estimators are consistent. More information can be found in [
The basic idea behind it is to minimize the distance (which is also called “contrast” in this case) between a parametric model and a nonparametric spectral density. Since one can interpret satellite observations as measurements in the spectral domain, these distances seem to be a natural choice for analyzing satellite data. For this family of CFs normalization is not required, but it may help improve accuracies.
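To make the idea of a contrast concrete, the sketch below evaluates the four K(x) functions that label the minimum contrast rows in the results tables. It assumes, for illustration only, that the contrast is the plain sum of K over the band-wise ratio of simulated to observed reflectance; the ratio direction, any weighting and any constant offsets in the original formulation are not recoverable from this text, and the names `CONTRAST_FUNCTIONS` and `contrast_distance` are hypothetical.

```python
import numpy as np

# K(x) functions as they are labelled in the results tables.
CONTRAST_FUNCTIONS = {
    "log(x) + 1/x": lambda x: np.log(x) + 1.0 / x,
    "-log(x) + x": lambda x: -np.log(x) + x,
    "log(x)^2": lambda x: np.log(x) ** 2,
    "x*log(x) - x": lambda x: x * np.log(x) - x,
}


def contrast_distance(observed, simulated, name):
    """Sum K over the band-wise ratio simulated / observed.

    Assumption: an unweighted sum over bands; the original definition
    may include weights or offsets.
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ratio = simulated / observed
    return float(np.sum(CONTRAST_FUNCTIONS[name](ratio)))


# Hypothetical reflectances for three bands (illustrative values only).
obs = [0.05, 0.10, 0.40]
sim = [0.06, 0.09, 0.45]
contrasts = {name: contrast_distance(obs, sim, name) for name in CONTRAST_FUNCTIONS}
```

Each of these K(x) functions attains its minimum at x = 1, so under this assumed form the contrast is smallest when the simulated spectrum equals the observed one band by band.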
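The spectral distances used in our research are defined through the following functions K(x), as they are labelled in the results tables: K(x) = log(x) + 1/x; K(x) = −log(x) + x; K(x) = log(x)^{2}; and K(x) = x(log(x)) − x.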
A diverse field dataset, covering various crop types, growing phases, canopy geometries and soil conditions, was collected during SPARC (SPectra bARrax Campaign). The SPARC-2003 and SPARC-2004 campaigns took place in Barrax, La Mancha, Spain (coordinates 39°3′N, 2°6′W, 700 m altitude). The test area has an extent of 5 km × 10 km, and is characterized by a flat morphology and large, uniform land-use units. The region consists of approximately 65% dry land and 35% irrigated land. The annual rainfall average is about 400 mm. In the 2003 campaign (12–14 July) biophysical parameters were measured within a total of 110 Elementary Sampling Units (ESU) among different crops. ESU refers to a plot size of about 20^{2} m^{2}.
LCC was derived by measuring within each ESU about 50 samples with a calibrated CCM-200 Chlorophyll Content Meter. The calibration took place against field values taken from 50 ESUs. A logarithmic function led to the best fit with a
Although the listed CFs can be applied to any EO data, in preparation for the forthcoming Sentinel missions we chose to apply them to simulated Sentinel-2 (S2) data. The S2 satellites capitalize on the technology and the vast experience acquired with Landsat and SPOT. S2 will be a polar-orbiting, superspectral high-resolution imaging mission [
Here, S2 MSI imagery was simulated on the basis of Compact High Resolution Imaging Spectrometer (CHRIS) data because of its high spatial and spectral resolution. CHRIS provides high spatial resolution (up to ∼17 m) hyperspectral data over the VNIR spectrum from 400 to 1,050 nm at 5 different viewing angles. It can operate in different modes, balancing the number of spectral bands, size of the covered area and spatial resolution because of onboard memory storage constraints [
Constrained by the spectral range of CHRIS, we considered the S2 bands from B2 (490 nm) to B8a (865 nm). The bands at a spatial resolution of 10 m have been coarse-grained to 20 m so that in total 8 bands in the visible and NIR at a 20 m resolution are available. S2 bands at 60 m were not considered, as these bands are intended for atmospheric applications such as aerosol correction, water vapor correction and cirrus detection [
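The coarse-graining of 10 m bands to 20 m mentioned above can be illustrated with a simple block average. The text does not state which aggregation method was used, so the 2 × 2 mean below is an assumption, and `coarse_grain` is a hypothetical helper name.

```python
import numpy as np


def coarse_grain(band, factor=2):
    """Average non-overlapping factor x factor pixel blocks of a 2-D band.

    Rows and columns that do not fill a complete block are trimmed.
    """
    band = np.asarray(band, dtype=float)
    rows = band.shape[0] - band.shape[0] % factor
    cols = band.shape[1] - band.shape[1] % factor
    trimmed = band[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# A toy 4 x 4 "10 m" band becomes a 2 x 2 "20 m" band.
band_10m = np.arange(16, dtype=float).reshape(4, 4)
band_20m = coarse_grain(band_10m)
```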
This work was carried out with ARTMO (Automated Radiative Transfer Models Operator) [
From the models available in ARTMO we chose to couple PROSPECT-4 with 4SAIL because they are fast, invertible and represent well homogeneous plant covers on flat surfaces such as those present at Barrax. Both models, commonly referred to together as PROSAIL, have been used extensively over the past few years for a variety of applications [
The bounds and distributions of the PROSAIL variables are depicted in
Two regularization options are commonly applied in LUT-based inversion strategies. First, Gaussian noise is often added to the simulated canopy reflectance to account for uncertainties [
Thus, to summarize, we have:
Various standalone cost functions from three mathematically different families.
Insertion of Gaussian noise on simulated spectra: 0–50%.
Use of multiple sorted best solutions in the inversion: 0–50%.
Impact of normalization for CFs in
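The factors summarized above can be combined in a short sketch of a LUT search: noise is added to the simulated spectra, a cost is computed against the observed spectrum, and the parameters of the best-matching fraction of LUT entries are averaged. This is an illustration under stated assumptions, not the ARTMO implementation: the noise is modelled as a percentage of each reflectance value, the cost is plain RMSE, and the toy LUT comes from a hypothetical linear forward model standing in for PROSAIL.

```python
import numpy as np


def rmse_cost(observed, lut_spectra):
    """RMSE between one observed spectrum and every LUT spectrum."""
    return np.sqrt(np.mean((lut_spectra - observed) ** 2, axis=1))


def add_gaussian_noise(lut_spectra, noise_percent, rng):
    """Perturb each simulated reflectance with zero-mean Gaussian noise.

    Assumption: the noise level is a percentage of the reflectance value.
    """
    sigma = noise_percent / 100.0
    return lut_spectra * (1.0 + sigma * rng.standard_normal(lut_spectra.shape))


def invert_lut(observed, lut_spectra, lut_params, cost=rmse_cost, best_percent=0.0):
    """Average the parameters of the best-matching LUT entries.

    best_percent = 0 keeps only the single best solution.
    Returns the mean parameters and the sorted costs of the kept entries.
    """
    costs = cost(observed, lut_spectra)
    order = np.argsort(costs)
    n_best = max(1, int(round(len(order) * best_percent / 100.0)))
    best = order[:n_best]
    return lut_params[best].mean(axis=0), costs[best]


# Toy LUT: one parameter, three bands, hypothetical linear forward model.
toy_params = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
toy_spectra = toy_params * np.array([1.0, 2.0, 3.0])

rng = np.random.default_rng(0)
noisy_spectra = add_gaussian_noise(toy_spectra, noise_percent=2.0, rng=rng)
estimate, best_costs = invert_lut(
    toy_spectra[4], noisy_spectra, toy_params, best_percent=30.0
)
```

Passing a different `cost` callable with the same signature is one way to swap in any of the other cost functions, which mirrors the comparison carried out in this study.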
Given all these factors, their effects on the robustness of LUT-based inversion have been assessed for the retrieval of LCC and LAI at the 20 m S2 resolution. The retrieved predictions were compared against the ground-based SPARC validation dataset using statistics such as coefficient of determination (
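For reference, the kind of validation statistics reported in the results tables can be computed as in the sketch below. The normalization of RMSE by the range of the measured values is an assumption about how NRMSE is defined here, and `validation_statistics` is a hypothetical helper name.

```python
import numpy as np


def validation_statistics(measured, estimated):
    """Return r2, RMSE and NRMSE (%) for paired measured/estimated values.

    Assumption: NRMSE is RMSE divided by the range of the measured values.
    """
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    rmse = float(np.sqrt(np.mean((estimated - measured) ** 2)))
    nrmse = 100.0 * rmse / float(measured.max() - measured.min())
    r = np.corrcoef(measured, estimated)[0, 1]
    return {"r2": float(r ** 2), "rmse": rmse, "nrmse": nrmse}


# Hypothetical LAI values (illustrative only, not the SPARC data).
stats = validation_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```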
Moreover, these error matrices reveal the interactions between the different cost functions and the regularization options. The matrices suggest that each of the tested CFs responds differently to the regularization options. It is interesting to see that in the majority of them a pattern with optimized performances appears (darkest blue), though the location and shape of this pattern may vary per CF. Not only did CFs that are mathematically more alike lead to similar trends, but similarities also appeared within the same CF family. This is particularly noticeable for the family of
Interestingly, the matrices also revealed that data normalization governs the success of
Introducing noise and mean of multiple solutions considerably improved the LAI accuracies (
The systematic overview of the various inversion performances shows that simultaneously deriving multiple biophysical parameters with a single inversion strategy is not the best choice. When parameters are correlated in a nonlinear way it appears that the optimal cost functions are different for each parameter [
These maps are briefly interpreted. Starting with the
At the same time, parameter-independent information regarding the performance of the inversion process can also be obtained by mapping the residuals of the CFs. They provide another indicator of the inversion certainty. For each pixel, the residual indicates the degree of mismatch between the observed spectrum and the best matching simulated spectra. More than CV, these maps delineate the surfaces where the simulated spectra closely matched the observed spectra and thus where there is a higher probability of achieving a successful inversion. The maps suggest that LAI in particular was retrieved without difficulty, with a perfect match over the vegetated areas. The inversion process had in general more difficulty with LCC, but large differences can be observed, leading to essentially the same pattern observed earlier: a close agreement over vegetated areas as opposed to senescent and non-vegetated lands. On the whole, the uncertainty analysis suggests that the inversion would benefit from a wider LUT with more spectral variation of senescent vegetation and bare soil surfaces.
The upcoming S2 missions open opportunities to implement novel retrieval algorithms in operational processing chains. Specifically, there is a need for retrieval methods that are accurate, robust, and make full use of the new S2 MSI bands. While in related works vegetation indices (e.g., [
To mitigate the drawback of ill-posedness, we opted to exploit the performance of alternative CFs in combination with regularization options instead of introducing prior information. The rationale for evaluating alternative CFs as opposed to the widely used LSE (or RMSE) is that propagated uncertainties and errors, e.g., due to uncertainties in instrument calibration, variations in atmospheric composition or simplifying assumptions in the representation of canopy and soil background, distort the residuals and in many cases violate a key assumption for using LSE, which corresponds to the maximum likelihood estimation with Gaussian distribution of residuals [
The evaluation of various CFs and regularization options led to identified inversion strategies with accuracies of
At the same time, information about the inversion uncertainty on a pixel-by-pixel basis may be as relevant as overall accuracies calculated from a limited set of ground-based validation data. Uncertainty indicators were obtained through mapping of CV and residuals. The CV is an indicator of the uncertainty range around the mean estimate, which tells something about the ill-posedness of the inversion of a retrievable parameter, whereas the residuals tell us how much the observed spectra deviate from the LUT spectra in the inversion scheme. These quality layers allow masking out uncertain estimates.
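A minimal sketch of how such quality layers could be derived per pixel is given below. It assumes that, for every pixel, the parameter values and cost values of the retained best solutions are stored along the last array axis; taking the residual as the mean cost of the retained solutions, the CV threshold, and the name `uncertainty_layers` are illustrative assumptions rather than the procedure used in this study.

```python
import numpy as np


def uncertainty_layers(best_params, best_costs, cv_max=50.0):
    """Derive mean, SD, CV (%) and residual maps from the retained solutions.

    best_params, best_costs: arrays of shape (rows, cols, n_best).
    cv_max: hypothetical CV threshold (%) for flagging reliable pixels.
    """
    mean = best_params.mean(axis=-1)
    sd = best_params.std(axis=-1)
    # CV is undefined where the mean estimate is zero; mark those pixels NaN.
    cv = 100.0 * np.divide(sd, mean, out=np.full_like(mean, np.nan), where=mean != 0)
    # Assumption: residual = mean cost over the retained solutions.
    residual = best_costs.mean(axis=-1)
    reliable = cv <= cv_max  # NaN compares as False, so it is masked out
    return {"mean": mean, "sd": sd, "cv": cv, "residual": residual, "reliable": reliable}


# One-row toy image with two pixels and two retained solutions per pixel.
params = np.array([[[2.0, 4.0], [0.0, 0.0]]])
costs = np.array([[[0.1, 0.3], [0.5, 0.7]]])
layers = uncertainty_layers(params, costs)
```

The boolean `reliable` layer can then be used to mask the mean-estimate map, which corresponds to the masking of uncertain estimates described above.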
Both indicators showed a consistent spatial trend for all parameters: pixels of vegetated surfaces matched closely with the synthetic reflectance database while pixels of non-vegetated surfaces faced more difficulties. Two reasons can be identified for this discrepancy: (1) the inversion strategy was optimized against validation data that was exclusively collected on vegetated areas; and (2) PROSAIL is a canopy reflectance model and thus well able to detect variation in vegetation properties. Hence the generated LUT and final inversion scheme were not optimized to detect variations in dried-out fallow and bare soil lands. For retrievals over full images there is thus a need for regulating the inversion strategy over vegetated targets as well as over non-vegetated targets. Therefore, while inversion over vegetated canopies is resolved, adequate processing of non-vegetated surfaces remains to be optimized. To start with, further efforts are needed in the generation and evaluation of more generic LUTs,
LUT-based inversion is considered a physically sound retrieval method to quantify biophysical parameters from Earth observation imagery, but the full potential of this method has not been consolidated yet. Here, we have systematically compared 18 different cost functions (CFs) originating from three major statistical families:
All evaluated CFs and biophysical parameters benefited from regularization options such as adding some noise and using multiple solutions in the inversion. With proper adjustment, these options can significantly reduce relative errors.
With introduction of multiple solutions and noise
Data normalization appeared to be unsuccessful for retrieving LAI. Here, the classical LSE yielded best results for non-normalized data; NRMSE of 15.3% at 6% multiple solutions and 18% noise, and 16.4% at 16% multiple solutions and 0% noise, respectively.
Systematic analysis identified a different optimized inversion strategy for each biophysical parameter, which was subsequently applied pixel by pixel to Sentinel-2 imagery. This provided maps of mean estimates and associated statistics, and gave insight into the uncertainty of the retrievals (e.g., coefficient of variation and residuals). These indicators showed that inversions were most successful over densely vegetated areas. PROSAIL had more difficulty with processing fallow and non-vegetated lands, but that is expected to be resolved with an adjusted LUT and the addition of SWIR bands in actual Sentinel-2 data.
The bottom line of this work is that, despite common practice, no single inversion strategy was found to be optimal for deriving multiple biophysical parameters. While some general trends with respect to regularization options and CFs were revealed, it is the data distribution that determines the success of the inversion strategy, which is governed by the biophysical parameter, the generated LUT, the spectral bands and the validation data. It is therefore recommended to test different CFs and regularization options before implementing an inversion scheme in a processing chain.
This paper has been partially supported by the Spanish Ministry for Science and Innovation under projects: AYA201021432C0201 and CSD200700018. Three anonymous reviewers are thanked for providing comments that helped with improving the quality of the original manuscript.
The authors declare no conflict of interest.
Normalised RMSE (NRMSE) matrices for LCC retrieval using cost function displaying the impact of % noise (X-axis) against multiple solutions (Y-axis) in LUT-based RTM inversion. * : normalized; ** : non-normalized. The more bluish, the lower the relative errors and thus the better the inversion.
NRMSE matrices for LAI retrieval using cost function displaying the impact of % noise (X-axis) against multiple solutions (Y-axis) in LUT-based RTM inversion. * : normalized; ** : non-normalized. The more bluish, the lower the relative errors and thus the better the inversion.
Mean predictions, standard deviation (SD), coefficient of variation (CV) and residuals for LCC and LAI, using the best evaluated inversion strategy for each parameter (see
Sentinel-2 MSI band settings. Bands used in this study are bolded.
Band #  B1  B9  B10  B11  B12  
Band center (nm)  443  945  1375  1610  2190  
Band width (nm)  20  20  30  90  180  
Spatial resolution (m)  60  60  60  20  20 
Range and distribution of input parameters used to establish the synthetic canopy reflectance database for use in the LUT.


Leaf structure index  unitless  1.3–2.5  Uniform  
LCC  Leaf chlorophyll content  ( 
5–75  Gaussian ( 
Leaf dry matter content  (g/cm^{2})  0.001–0.03  Uniform  
Leaf water content  (cm)  0.002–0.05  Uniform  


LAI  Leaf area index  (m^{2}/m^{2})  0.1–7  Gaussian ( 
Soil scaling factor  unitless  0–1  Uniform  
ALA  Average leaf angle  (°)  40–70  Uniform 
HotS  Hot spot parameter  (m/m)  0.05–0.5  Uniform 
skyl  Diffuse incoming solar radiation  (fraction)  0.05   
Sun zenith angle  (°)  22.3    
View zenith angle  (°)  20.19    
φ  Sun-sensor azimuth angle  (°)  0   
Similar variable ranges/values/distributions were used according to field configurations and related studies [
Statistics (
Kullback leibler *  10  26  0.71  7.34  19.47  47.54 
Chi square *  8  24  0.69  7.57  18.64  48.94 
Generalised Hellinger *  20  36  0.74  7.16  17.63  47.55 
Neyman Chi square *  16  44  0.73  7.16  17.63  46.62 
Jeffreys Kullback leibler *  10  26  0.70  7.41  18.25  48.36 
Kdivergence Lin *  8  26  0.70  7.43  18.30  48.01 
Ldivergence Lin *  10  26  0.70  7.39  18.20  48.25 
Harmonique Toussaint *  10  26  0.71  7.37  18.15  48.36 
Negative exp. disparity *  10  26  0.71  7.31  18.01  47.43 
Bhattacharyya divergence *  10  28  0.70  7.40  18.22  48.18 
Shannon 1948 *  10  26  0.70  7.39  18.20  48.18 
LSE *  20  36  0.74  7.16  17.63  47.55 
LSE **  22  0  0.68  8.23  20.27  46.99 
20  12  0.61  9.14  22.52  45.58  
Geman and McClure *  20  36  0.74  7.16  17.63  47.55 
Geman and McClure **  22  0  0.68  8.24  20.30  46.99 
K(x) = log(x) + 1/x *  50  50  0.68  13.43  33.09  70.96 
K(x) = log(x) + 1/x **  16  50  0.62  13.36  32.91  43.86 
K(x) = −log(x) + x *  2  0  0.70  7.45  18.34  31.82 
K(x) = −log(x) + x **  50  50  0.48  15.76  38.83  83.15 
K(x) = log(x)^{2} *  8  32  0.68  7.76  19.11  46.68 
K(x) = log(x)^{2} **  50  50  0.31  13.87  34.15  44.14 
K(x) = x(log(x)) − x *  6  30  0.66  7.95  19.58  47.19 
K(x) = x(log(x)) − x **  50  50  0.30  15.05  37.06  48.99 
Statistics (
Kullback leibler *  4  50  0.63  1.25  22.74  45.59 
Chi square *  8  50  0.62  1.29  23.53  45.74 
Generalised Hellinger *  2  42  0.62  1.17  21.34  44.59 
Neyman Chi square *  4  50  0.62  1.24  22.48  45.51 
Jeffreys Kullback leibler *  6  50  0.62  1.26  22.93  45.60 
Kdivergence Lin *  6  50  0.62  1.27  23.06  45.67 
Ldivergence Lin *  10  26  0.70  1.26  22.92  45.54 
Harmonique Toussaint *  6  50  0.62  1.26  22.91  45.60 
Negative exp. disparity *  4  50  0.63  1.25  22.72  45.53 
Bhattacharyya divergence *  6  50  0.62  1.26  22.93  45.53 
Shannon 1948 *  6  50  0.62  1.26  22.92  45.54 
LSE *  2  42  0.62  1.17  21.34  44.58 
2  50  0.62  1.22  22.25  44.93  
2  12  0.73  0.91  16.57  25.31  
Geman and McClure *  2  42  0.62  1.17  21.34  44.59 
Geman and McClure **  2  14  0.74  0.85  15.39  25.45 
K(x) = log(x) + 1/x *  2  42  0.63  0.91  16.51  34.02 
K(x) = log(x) + 1/x **  10  50  0.50  1.35  24.57  46.17 
K(x) = −log(x) + x *  50  50  0.52  1.53  27.80  51.69 
K(x) = −log(x) + x **  2  0  0.64  1.09  19.86  51.17 
K(x) = log(x)^{2} *  12  50  0.54  1.40  25.42  45.13 
K(x) = log(x)^{2} **  2  46  0.66  1.13  20.47  37.94 
K(x) = x(log(x)) − x *  6  50  0.63  1.42  25.82  45.11 
K(x) = x(log(x)) − x **  2  40  0.63  1.08  19.56  35.57 