
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

Radiative transfer models predicting the bidirectional reflectance factor (BRF) of leaf canopies are powerful tools that relate biophysical parameters such as leaf area index (LAI), fractional vegetation cover ($f_V$) and the fraction of absorbed photosynthetically active radiation ($f_{APAR}$) to remotely sensed reflectance data. One of the most successful approaches to biophysical parameter estimation is the inversion of detailed radiative transfer models through the construction of Look-Up Tables (LUTs). The solution of the inverse problem requires additional information on canopy structure, soil background and leaf properties, and the relationships between these parameters and the measured reflectance data are often nonlinear. The commonly used approach to optimizing a solution is to minimize the least-squares difference between model and observations (referred to as the cost function or distance; here we also use the terms “statistical distance”, “divergence” and “metric”, which are common in the statistical literature). This paper investigates how least-squares minimization and alternative distances affect the solution to the inverse problem. The paper provides a comprehensive list of different cost functions from the statistical literature, which can be divided into three major classes: information measures, M-estimates and minimum contrast methods.

Biophysical parameters estimated from satellite data are important inputs to ecological models and land-surface models [

Biophysical parameters are estimated from satellite data by inverting a sample of the bidirectional reflectance factor, BRF(

Four approaches can be distinguished to estimate biophysical parameters from satellite data; the advantages and limitations of various approaches of biophysical parameter estimation are discussed in [

The estimation of biophysical parameters from satellite data is hampered by uncertainties and errors that arise from a number of sources. These include uncertainties in instrument calibration, variations in atmospheric composition or simplifying assumptions in the representation of canopy and soil background [

The first class of distance or divergence measures is referred to as

The second class, of

The third class, of

The present paper has two aims. The first is to provide a review of available statistical distances and divergences to date (Sections 3–5 and

The paper is organized as follows. In Section 2 the estimation of biophysical parameters from Earth Observation (EO) data is expressed in a form comparable with statistical distance theories. Sections 3–5 describe the statistical distance and divergence measures that performed best in our study. Section 6 provides a description of the BRF simulations with FLIGHT; this includes a description of the land-surface scenes and the generation of LUTs. In Section 7 the statistical distance and divergence measures are applied to the estimation of LAI, $f_V$ and $f_{APAR}$ by numerical inversion of the LUTs.

We acknowledge that the range of conditions (vegetation type, simulated error distribution, land-surface properties, BRF sampling) is limited; the results of this study can therefore only be used as a guideline.

The present section formulates the BRF in a way that is appropriate for the application of statistical distances and divergence measures. First we represent the following elements in the LUT: _{i}_{1}, ..., _{n}_{V}_{1}, ..., _{n}_{1}, ..., _{k}_{V}_{APAR}_{1}, ..., _{r}^{*}(_{1}, ..., _{n}^{*}, by minimizing a measure that provides the best “closeness” between ^{*}.

Let Γ be a class of measures (distances) Γ(^{*}(_{j}_{i}_{j}^{*}) of the radiative transfer model can be formulated as a semi-parametric problem
^{*} by solving the minimization problem (1) using different statistical distances and divergences between simulated satellite signals (“observations”) and LUTs. We consider the parameters
_{s,i}

We consider alternative statistical distances, which can be divided into three classes. The majority of statistical distances belong to the so-called class of

Finally, to compare the results obtained with different distance measures, we define the mean absolute error in parameter retrieval. For example, to assess the residual error in estimated LAI, the following merit function is used
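The inversion and the merit function described in this section can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the LUT is a plain list of (parameters, spectrum) pairs, the toy reflectance values are hypothetical, and the names `invert_lut`, `squared_error` and `mean_absolute_error` are introduced here for illustration rather than taken from the study.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = Sequence[float]
Distance = Callable[[Spectrum, Spectrum], float]
LutEntry = Tuple[Dict[str, float], Spectrum]


def squared_error(observed: Spectrum, simulated: Spectrum) -> float:
    """Least-squares cost: sum of squared band-wise differences."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def invert_lut(observed: Spectrum, lut: List[LutEntry],
               distance: Distance = squared_error) -> Dict[str, float]:
    """Return the parameters of the LUT entry closest to the observation."""
    best_params, _ = min(lut, key=lambda entry: distance(observed, entry[1]))
    return best_params


def mean_absolute_error(estimates: Sequence[float],
                        truths: Sequence[float]) -> float:
    """Mean absolute difference between retrieved and true values."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


# Hypothetical three-band LUT and two "observations" with known LAI.
toy_lut: List[LutEntry] = [
    ({"LAI": 1.0}, [0.05, 0.08, 0.30]),
    ({"LAI": 3.0}, [0.04, 0.06, 0.40]),
    ({"LAI": 5.0}, [0.03, 0.05, 0.45]),
]
observations = [([0.041, 0.062, 0.39], 3.0), ([0.031, 0.049, 0.46], 4.5)]

retrieved = [invert_lut(obs, toy_lut)["LAI"] for obs, _ in observations]
mae_lai = mean_absolute_error(retrieved, [truth for _, truth in observations])
```

Any other distance with the same two-argument signature can be passed as `distance`, which is how alternative cost functions would be swapped in.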

In the next three sections we discuss the different statistical distances evaluated in the present study. The distances are applied to two reflectance distributions,

Information theory was born in 1948 when Shannon [

Divergence is an important concept in information theory and it is useful in many applications such as multimedia classification, neuroscience, optimization of the performance of density estimation methods, and cluster analysis. Distance measures also support a wide range of tests of whether samples are drawn from the same distribution.

Entropies are defined over the space of distributions that form the bases of independence/dependence concepts. For these reasons, Shannon’s mutual information function has been increasingly utilized in the literature [

Informally, entropy can be understood as “the quantity of surprise one should feel upon reading the result of a measurement”. More formally, we can write: if event

Kullback and Leibler (KL) [
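As a point of reference, the Kullback–Leibler divergence in its standard discrete form, $\sum_i p_i \log(p_i/q_i)$, can be computed as sketched below. Treating normalized reflectance spectra as discrete distributions is an assumption made for this illustration, and the reflectance values and the helper `normalize` are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute zero; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = normalize([0.04, 0.06, 0.40])  # hypothetical "observed" reflectances
q = normalize([0.05, 0.08, 0.30])  # hypothetical LUT entry
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)  # the divergence is not symmetric in general
```

The divergence is zero only when the two distributions coincide, which is the property that makes it usable as a cost function.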

A general class of divergence measures is given by

0 ≤ _{1}, ..._{n}_{1}, ..._{n}

_{F}_{1}, ..., _{n}

For fixed _{f}_{l}_{l}

for a given strictly convex twice differentiable function

Many measures have been added to this class from different areas of science, and new divergences are still being discovered. Here we present some of these f-divergence measures, see [
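A generic f-divergence can be coded once and specialized by choosing the convex function. The sketch below assumes the common convention $D_f(p, q) = \sum_i q_i f(p_i/q_i)$ with $f(1) = 0$; the convention used in the original equations may differ (for example in the order of the arguments), and the function names are illustrative.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(p, q) = sum_i q_i * f(p_i / q_i); every q_i must be positive."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kullback_leibler(t: float) -> float:
    """f(t) = t log t, which yields the Kullback-Leibler divergence."""
    return t * math.log(t) if t > 0 else 0.0


def f_pearson(t: float) -> float:
    """f(t) = (t - 1)^2, which yields the Pearson chi-square divergence."""
    return (t - 1.0) ** 2


def f_hellinger(t: float) -> float:
    """f(t) = (sqrt(t) - 1)^2, which yields sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return (math.sqrt(t) - 1.0) ** 2


p = [0.5, 0.5]
q = [0.25, 0.75]
kl = f_divergence(p, q, f_kullback_leibler)
chi2 = f_divergence(p, q, f_pearson)
hellinger = f_divergence(p, q, f_hellinger)
```

Adding another member of the class then amounts to supplying one more convex function.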

Let

Let

^{α}^{α}

Let

It can be generalized in the following form to give more flexibility on parameter estimation. Assume ^{α}^{1/}^{α}

Let ^{2}^{j}

Power divergence measures [

There are two further important sub-classes based on the divergence between probability distributions that do not belong to the class of

The first of these sub-classes, (

One example of these measures is given here. Additional measures are listed in

Rényi divergence with
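For orientation, the Rényi divergence of order $\alpha$ in its usual form, $\frac{1}{\alpha - 1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha}$, can be sketched as follows. The parameterization in the original equation may differ, so this is an assumption, not a transcription.

```python
import math
from typing import Sequence


def renyi_divergence(p: Sequence[float], q: Sequence[float],
                     alpha: float) -> float:
    """Renyi divergence of order alpha (alpha > 0, alpha != 1).

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 are skipped.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1.0 - alpha)
                for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)


d_alpha2 = renyi_divergence([0.5, 0.5], [0.25, 0.75], alpha=2.0)
```

As $\alpha$ approaches 1 this quantity tends to the Kullback–Leibler divergence, which is why the case $\alpha = 1$ is excluded from the formula itself.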

The second subclass is referred to as entropy measures and can be introduced as follows. Let _{f}

In order to present a systematic way of studying the different entropy measures, Burbea and Rao introduced the so-called

Based on the concavity property of the (

These measures of divergence have been introduced to present systematic ways to study different entropy measures. They are used in applications involving random variables with finite support, such as quantifying genetic diversity between populations, taxonomy in biology, testing the homogeneity of populations in genetics, and the analysis of discriminant techniques.

An example of these measures is given below; additional measures are listed in

Arimoto (1971)

A third group of divergences is referred to as blended divergences. Lindsay [

All blended

To illustrate the theory of blended divergences, we give several examples below.

Blended weighting scheme that generalizes Hellinger distance:
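The blended weight Hellinger distance is usually written as $\frac{1}{2}\sum_i (p_i - q_i)^2 / (\alpha\sqrt{p_i} + (1-\alpha)\sqrt{q_i})^2$ with $0 \le \alpha \le 1$. The sketch below assumes this form; which argument carries the weight $\alpha$ is a convention and may be reversed in the original equation.

```python
import math
from typing import Sequence


def blended_weight_hellinger(p: Sequence[float], q: Sequence[float],
                             alpha: float) -> float:
    """Blended weight Hellinger distance between positive vectors p and q."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = 0.0
    for pi, qi in zip(p, q):
        # Blend of the square roots of the two distributions.
        weight = alpha * math.sqrt(pi) + (1.0 - alpha) * math.sqrt(qi)
        total += (pi - qi) ** 2 / weight ** 2
    return 0.5 * total


p = [0.5, 0.5]
q = [0.25, 0.75]
bwhd_half = blended_weight_hellinger(p, q, alpha=0.5)
bwhd_zero = blended_weight_hellinger(p, q, alpha=0.0)
```

With $\alpha = 1/2$ the expression reduces to $2\sum_i(\sqrt{p_i} - \sqrt{q_i})^2$, a Hellinger-type distance, and with $\alpha = 0$ it reduces to half the Pearson chi-square, which is the sense in which the blend generalizes both.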

Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. Regression analysis seeks to find the relationship between one or more independent variables and a dependent variable. Certain widely used methods of regression, such as LSE, have favorable properties if their underlying assumptions are true, but can give misleading results if those assumptions are violated. In cases where errors are not normally distributed, outliers occur, or ordinary LSE assumptions are violated in some other way, the validity of the regression results is compromised if a non-robust regression technique is used.

We use _{j}_{j}_{j}_{j}^{*}(_{j}_{j}_{j}_{j}

In our case the unknown parameter vector of interest comprises LAI, $f_V$ and $f_{APAR}$

The _{j} j

The classical LSE corresponds to the case ^{2}, _{j}_{j}
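An M-estimate replaces the squared residual by a more general function $\rho$ and minimizes $\sum_j \rho(r_j)$ over the candidate parameters. The sketch below shows three standard choices; the asymmetric check function is the one commonly associated with Koenker and Bassett's quantile regression and is assumed, not confirmed, to correspond to the Koenker–Basset distance used here. The residual symbol $r_j$ and all function names are illustrative.

```python
from typing import Callable, Sequence


def rho_squared(u: float) -> float:
    """rho(u) = u^2, which gives the classical least-squares estimate."""
    return u * u


def rho_absolute(u: float) -> float:
    """rho(u) = |u|, which gives the least absolute deviations estimate."""
    return abs(u)


def rho_koenker_basset(u: float, tau: float = 0.5) -> float:
    """Check function u * (tau - 1[u < 0]) for 0 < tau < 1.

    Positive and negative residuals are weighted by tau and 1 - tau.
    """
    return u * (tau - (1.0 if u < 0 else 0.0))


def m_estimate_cost(observed: Sequence[float], simulated: Sequence[float],
                    rho: Callable[[float], float]) -> float:
    """Sum of rho over the band-wise residuals observed - simulated."""
    return sum(rho(o - s) for o, s in zip(observed, simulated))


cost_lse = m_estimate_cost([1.0, 2.0], [0.0, 4.0], rho_squared)
cost_kb = m_estimate_cost([1.0, 2.0], [0.0, 4.0],
                          lambda u: rho_koenker_basset(u, tau=0.25))
```

With `tau = 0.5` the check function equals half the absolute value, so the asymmetric case only matters when over- and underestimation should be penalized differently.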

The following examples belong to the class of

If errors are normally distributed

The function
_{j}_{j}_{j},η ̄_{c}_{j}

To apply

We adopt the following terminology from time series analysis to interpret our observations as measurements in the spectral domain [_{t}_{θ}^{p}_{θ*}(

To implement this idea we consider the so-called quasi-likelihood method [_{n}_{j}_{j}_{1}, ..., _{n}

This is the so-called periodogram non-parametric estimation of spectral density
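For a time series $x_1, \ldots, x_n$ the periodogram at the Fourier frequencies $\lambda_j = 2\pi j/n$ is commonly defined as $I_n(\lambda_j) = \frac{1}{2\pi n}\left|\sum_t x_t e^{-i t \lambda_j}\right|^2$. The sketch below assumes this normalization, which varies between texts, and uses a direct sum instead of a fast Fourier transform to stay dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def periodogram(x: Sequence[float]) -> List[float]:
    """Periodogram ordinates at the Fourier frequencies 2*pi*j/n, j = 0..n-1."""
    n = len(x)
    ordinates = []
    for j in range(n):
        lam = 2.0 * math.pi * j / n
        # Discrete Fourier transform of the series at frequency lam.
        dft = sum(x_t * cmath.exp(-1j * lam * t) for t, x_t in enumerate(x))
        ordinates.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return ordinates


flat = periodogram([1.0, 1.0, 1.0, 1.0])    # all power at frequency zero
mixed = periodogram([1.0, 2.0, 0.0, -1.0])
```

In the setting of this paper the observed reflectances play the role of the periodogram ordinates, so the function above serves only to make the terminology concrete.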

Under some general conditions [_{j}_{n}_{j}

Thus, one can construct a quasi-likelihood function or its logarithm

The Whittle estimator was also extended to cover correlated signal-plus-noise models, providing a formal asymptotic distribution theory specifically tailored for parameter estimation. This approach was first applied in time series for exponential volatility models; it subsequently attracted attention in financial econometrics and in related fields. These models are able to represent some of the stylized features of financial returns, such as the absence of correlation in levels but strong dependence in squares and log-squares, and the leverage effect.
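The Whittle contrast is usually written, up to constants, as $\sum_j \left[\log f_\theta(\lambda_j) + I_n(\lambda_j)/f_\theta(\lambda_j)\right]$. The sketch below applies it to LUT matching by treating the observed reflectances as the ordinates $I_n(\lambda_j)$ and each LUT spectrum as a candidate $f_\theta$; this mapping, the toy values and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def whittle_contrast(observed: Sequence[float],
                     model: Sequence[float]) -> float:
    """Sum over bands of log f + I / f; all model values must be positive."""
    return sum(math.log(f) + i_obs / f for i_obs, f in zip(observed, model))


def minimum_contrast_index(observed: Sequence[float],
                           candidates: List[Sequence[float]]) -> int:
    """Index of the candidate spectrum with the smallest Whittle contrast."""
    return min(range(len(candidates)),
               key=lambda k: whittle_contrast(observed, candidates[k]))


# Hypothetical three-band candidate spectra and one "observation".
candidates = [
    [0.05, 0.08, 0.30],
    [0.04, 0.06, 0.40],
    [0.03, 0.05, 0.45],
]
observed = [0.041, 0.062, 0.39]
best = minimum_contrast_index(observed, candidates)
```

Each term $\log f + I/f$ is minimized at $f = I$, so a candidate that reproduces the observation exactly always attains the minimum.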

The Whittle estimator belongs to a class of more general estimates known as minimum contrast estimates, see [^{*} ∈ ^{*} is a deterministic function ^{*}, _{θ*}: Θ → ℝ_{+}, which has a unique minimum, ^{*}.

A contrast process for _{θ*} is a sequence of random variables _{n}

The minimum contrast estimator is a value of _{n}

Under some sets of conditions the minimum contrast estimators are consistent, see [_{θ}_{θ}

The following examples of distances _{θ}

One example is given below; the full list of such distances can be found in

Let

Note that function _{1}, ..., _{k}

Additional cost functions from the literature [

The performance of the distance measures in terms of retrieving biophysical parameters (LAI, _{APAR}, _{V}

Simulations were carried out similar to the approach taken by Prieto-Blanco

PROSPECT [_{ab}^{2}); ^{2}); ^{2}). Chlorophyll content (_{ab}_{ab}

FLIGHT [

Radiation was simulated in 15 spectral bands (500, 560, 630, 690, 700, 740, 790, 830, 870, 1,035, 1,200, 1,250, 1,650, 2,100, 2,250 nm). A previous study [

The simulations were carried out on two main types of forest: conifer and broadleaf forests. Three conifer representatives were chosen from the former BOREAS sites [_{ab}

The LUT contains a total of _{V}_{ab}_{APAR}_{APAR}_{a,i}_{i}

We simulated BRF values using the coefficients in _{V}

Canopy reflectance models demonstrate increasing sensitivity to soil reflectance at lower vegetation cover. Soil reflectance is one of the most sensitive parameters in canopy reflectance models [

Our simulations are necessarily restricted and represent only a subset of all parameters that can be varied within PROSPECT and FLIGHT. We believe that these simulations of errors are more realistic than the commonly adopted method of adding Gaussian noise to the spectrum. Errors due to incorrect assumptions on soil or leaf spectral properties will be spectrally correlated.

The statistical distances listed in Sections 3–5 were evaluated as follows. Reflectances in the LUTs were matched to “observed” reflectances. For each case, we matched _{V}_{APAR}
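The evaluation loop can be sketched as follows: every distance is used to invert the same LUT for the same observations, and the mean absolute retrieval error is recorded per distance. The two-entry LUT, the single observation with one outlying band, and the function names are hypothetical and chosen only to show how a distance other than least squares can change the retrieved value.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = Sequence[float]
Distance = Callable[[Spectrum, Spectrum], float]


def lse(observed: Spectrum, simulated: Spectrum) -> float:
    """Sum of squared band-wise differences."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def l1(observed: Spectrum, simulated: Spectrum) -> float:
    """Sum of absolute band-wise differences."""
    return sum(abs(o - s) for o, s in zip(observed, simulated))


def evaluate_distances(lut: List[Tuple[float, Spectrum]],
                       observations: List[Tuple[float, Spectrum]],
                       distances: Dict[str, Distance]) -> Dict[str, float]:
    """Mean absolute retrieval error of one parameter for each distance."""
    scores = {}
    for name, dist in distances.items():
        errors = []
        for truth, obs in observations:
            estimate, _ = min(lut, key=lambda entry: dist(obs, entry[1]))
            errors.append(abs(estimate - truth))
        scores[name] = sum(errors) / len(errors)
    return scores


# Hypothetical LUT of (LAI, spectrum) pairs; the observation has true
# LAI 2.0 but its third band is contaminated by a large error.
toy_lut = [(2.0, [0.10, 0.10, 0.10]), (4.0, [0.20, 0.20, 0.20])]
toy_obs = [(2.0, [0.10, 0.10, 0.40])]
scores = evaluate_distances(toy_lut, toy_obs, {"LSE": lse, "L1": l1})
```

In this toy case the squared error is dominated by the contaminated band and selects the wrong entry, whereas the absolute error selects the correct one; real results depend on the error distribution.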

For the purpose of clarity we present a selection of results that consists of the best performing measures from each of the three classes of statistical distance measures. The results show significant variability in retrieval accuracy depending on the chosen divergence measure. Overall, the optimal measures in each case show an improvement over using LSE, see

The improvement obtained in estimating biophysical parameters for broadleaf forest is further illustrated in

The results summarized in

(1) For the broadleaf forest the best distances were:

for the estimation of

for the estimation of _{V}

for the estimation of _{APAR}

We found that compared with LSE, the Koenker–Basset distances (_{V}_{APAR}

(2) For conifer forest the best distances were:

for the estimation of _{APAR}

for the estimation of _{V}

Similar to the broadleaf case, we found that the Koenker–Basset metric (_{V}_{APAR}

When optimizing parameterized models for which the error distribution (shape and bias) is known, the user can choose an appropriate cost function based on specific physical properties of the model and metrics. When we deal with non-parametric models or non-linear models with many parameters (such as the present case, where we simulate BRF using PROSPECT and FLIGHT), it may be useful to check a range of available distances to identify the optimal cost function. For problems similar to the present study we can provide the following guidance.

For non-symmetric error distributions the recommended cost function is Koenker–Basset (

For symmetric error distributions we can recommend Hellinger (

For heavy-tailed distribution (right) and semi-heavy-tailed distribution (left) (

Special attention should be paid to the class of spectral metrics, since it represents informational distances in the spectral domain. Since transformation to the spectral domain usually makes the observations asymptotically independent, it is plausible that the spectral metric (

For all types of forest we found that for _{V}

In the case when parameters of the model are linearly correlated, we observe consistency in the optimal cost function for these parameters (for example LAI and _{V}_{APAR}

Over 60 statistical distances from three major classes,

For the numerical experiments we use PROSPECT and FLIGHT to simulate “observed” reflectance values in 15 different spectral bands. We generate LUTs for a limited set of land-surface and atmospheric conditions. For the observations, however, we generate reflectance values for a wider range of conditions and thus introduce a mixture of errors caused by variations in angular geometry, $f_V$

We found that the

A caveat of the present study is that we analyzed only a limited subset of a wide range of possibilities, and for different applications it is likely that different cost functions may be more suitable. We are preparing a study where cost functions are used on real observations as opposed to simulated observations [

Alternative divergence measures and distances have been known in the statistical literature for some time and could find an application in many areas besides remote sensing, such as in biology, geography and geophysics.

The approach outlined in the present study can be extended to other applications that use LUT optimization, interpolation, linearization of parameter space,

We thank Ana Prieto for advice and provision of LUT code. This work was funded by the NERC National Centre for Earth Observation (NCEO). Three anonymous reviewers are thanked for their constructive comments to improve the paper.

Additional information about the measures below can be found in [

Let ^{2} and

Let

K-divergence of Lin is used to analyze contingency tables. It corresponds to the function

L-divergence of Lin is a symmetric version of K-divergence. It corresponds to the function

Let ^{α}

We can use a higher order cross-entropy, or the so-called cross-entropy of order

Liese and Vajda [

The harmonic Toussaint measure corresponds to the function

The negative exponential disparity measure is used as an estimator that is asymptotically fully efficient and is robust against outliers and inliers, see [

The Bregman divergences [

The powered Pearson divergence is reasonably efficient and robust [

The Cressie and Read power divergence [

The Sharma and Mittal divergences [

Finally, the

This divergence was introduced by Itakura and Saito for the estimation of short-time speech spectra using an autoregressive model. It became popular in speech and acoustics research and it was applied to denoising and up-mix (mono to stereo conversion) of music.
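In its standard discrete form the Itakura–Saito divergence is $\sum_i \left[p_i/q_i - \log(p_i/q_i) - 1\right]$. A minimal sketch under that assumption, with illustrative names:

```python
import math
from typing import Sequence


def itakura_saito(p: Sequence[float], q: Sequence[float]) -> float:
    """Itakura-Saito divergence between strictly positive vectors p and q."""
    total = 0.0
    for pi, qi in zip(p, q):
        ratio = pi / qi
        total += ratio - math.log(ratio) - 1.0
    return total


d_forward = itakura_saito([1.0, 2.0], [2.0, 2.0])
d_reverse = itakura_saito([2.0, 2.0], [1.0, 2.0])
```

The divergence depends only on the ratios $p_i/q_i$, so it is unchanged when both spectra are multiplied by the same factor, and it is not symmetric in its arguments.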

List of (

Sharma–Mittal divergence [

Bhattacharyya divergence has interesting applications in signal selection and it has the following form
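The Bhattacharyya divergence is usually written as $-\log\sum_i\sqrt{p_i q_i}$ for discrete distributions. A minimal sketch under that assumption:

```python
import math
from typing import Sequence


def bhattacharyya_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Negative log of the Bhattacharyya coefficient sum_i sqrt(p_i * q_i).

    p and q are assumed to be normalized and to share some support;
    otherwise the coefficient is zero and the logarithm is undefined.
    """
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)


d_b = bhattacharyya_divergence([1.0, 0.0], [0.5, 0.5])
```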

Shannon (1948) [

Rényi (1961) [

Varma (1966) [

Varma (1966) [

Havrda and Charvat (1967) [

Sharma and Mittal [

Sharma and Mittal [

Ferreri (1980) [

Kapur (1972) [

Burbea (1984) [
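Three of the entropies named in this list can be sketched in their commonly quoted forms: Shannon, $-\sum_i p_i\log p_i$; Rényi, $\frac{1}{1-\alpha}\log\sum_i p_i^{\alpha}$; and Havrda–Charvát, $\frac{1}{1-\alpha}\left(\sum_i p_i^{\alpha} - 1\right)$. Normalizing constants differ between authors, so these forms are assumptions, not transcriptions of the original equations.

```python
import math
from typing import Sequence


def _check_order(alpha: float) -> None:
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy with natural logarithm; zero terms are skipped."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Renyi entropy of order alpha."""
    _check_order(alpha)
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)


def havrda_charvat_entropy(p: Sequence[float], alpha: float) -> float:
    """Havrda-Charvat entropy of order alpha (unnormalized form)."""
    _check_order(alpha)
    return (sum(pi ** alpha for pi in p if pi > 0) - 1.0) / (1.0 - alpha)


uniform = [0.25, 0.25, 0.25, 0.25]
h_shannon = shannon_entropy(uniform)
h_renyi = renyi_entropy(uniform, alpha=2.0)
h_hc = havrda_charvat_entropy(uniform, alpha=2.0)
```

For a uniform distribution the Shannon and Rényi entropies coincide at the logarithm of the number of outcomes, which gives a quick sanity check on any implementation.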

More information on the measures below can be found in [

Pearson–Neyman blend with corresponding blended divergence

Blended power divergence-variant A. For _{0}_{,β}_{1}_{,β}

Blended power divergence-variant B. For 0 < |

The information on the measures below is based on [

More general estimates with Laplace distribution
_{p}

For positive

For

For the Cauchy distribution ^{2})) we have

Latter influence functions trimmed at 0 <

The Welsh distance has the form

The Geman and McClure function tries to reduce the effect of large errors further, but it also cannot guarantee uniqueness of the solution.

The Tukey function also encounters the problem of uniqueness; it can be written as

The Huber function has the following form
_{c}
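The robust functions named in this list are commonly written as in the sketch below, where `c` is a tuning constant. These are the forms usually quoted in the robust estimation literature and are assumed, not confirmed, to match the original equations; the function names are illustrative.

```python
import math


def rho_huber(x: float, c: float) -> float:
    """Quadratic for |x| <= c, linear beyond."""
    if abs(x) <= c:
        return 0.5 * x * x
    return c * (abs(x) - 0.5 * c)


def rho_tukey(x: float, c: float) -> float:
    """Tukey biweight: constant (c^2 / 6) for |x| > c."""
    if abs(x) <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (x / c) ** 2) ** 3)
    return c * c / 6.0


def rho_welsh(x: float, c: float) -> float:
    """Welsh function: tends smoothly to c^2 / 2 for large |x|."""
    return 0.5 * c * c * (1.0 - math.exp(-((x / c) ** 2)))


def rho_geman_mcclure(x: float) -> float:
    """Geman and McClure function: bounded above by 1/2."""
    return 0.5 * x * x / (1.0 + x * x)


small = [rho(0.1, 1.0) for rho in (rho_huber, rho_tukey, rho_welsh)]
large = [rho(100.0, 1.0) for rho in (rho_huber, rho_tukey, rho_welsh)]
```

All four behave like a scaled squared error for small residuals; the Huber function grows linearly for large residuals, while the other three are bounded, so a single outlier cannot dominate the cost but the cost is no longer convex.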

Detailed information on the following measures can be found in [

let

let ^{2}, then

let

let ^{α}^{2}, where 0

Comparison of residual errors resulting from the application of two different distance measures used to estimate biophysical parameters for broadleaf canopies with a look-up table (LUT). Biophysical parameters include leaf area index (LAI), $f_{APAR}$ and $f_V$.

Comparison of residual errors resulting from the application of two different distance measures used to estimate biophysical parameters for needleleaf canopies with a look-up table (LUT). Biophysical parameters include leaf area index (LAI), $f_{APAR}$ and $f_V$.

PROSPECT input parameters. _{ab}^{2}); ^{2}); and ^{2}).

| |||||
---|---|---|---|---|---|
2.47 | 2.55 | 2.55 | 1.43 1.5 1.6 | 1.61 1.97 2.64 | |
29 19.39 27.56 | 27.07 13.10 24.27 | 21.89 19 29.03 | 44.7 | 65.1 | |
0.04 | 0.01 | 0.03 | 0.02 | 0.008 | |
0.028 | 0.012 | 0.012 | 0.003 | 0.006 |

FLIGHT input parameters. The column “range” represents the minimum and maximum values of the input parameters; column “step” is the increment for the LUT; column “Observed” represents the range over which a random number “r.n.” is selected to generate satellite observed BRF, ^{*}(_{1}, ..., _{n}

Parameter | Range | Step | Observed |
---|---|---|---|
Solar zenith angle | 30°–70° | 10° | r.n. ∈ 30°–70° |
View zenith angle | 0°–60° | 10° | r.n. ∈ 0°–60° |
Relative azimuth angle | 0°–180° | 30° | r.n. ∈ 0°–180° |
Fraction of green leaves | 0.8 | - | same |
Fraction of shoot material | 0.05 | - | same |
Fraction of bark in foliage | 0.15 | - | same |
Leaf angle distribution | Spruce leaf, Spherical | - | same |
Soil roughness index | 0 | - | 0 |
Soil reflectance | sandy loam | | drummer2, jal, lonrina, onaway, talbott |
Frac. cover by trees | 0.1–0.9 | 0.1 | r.n. ∈ 0.0–0.9 |
LAI | 0–7 | 1 | r.n. ∈ 0.0–7.0 |

FLIGHT input parameters: Crown shapes, where “c” represents cone shape and “e” represents ellipsoid shape.

Type of forest | OBS | OJP | YJP | Beech | Oak |
---|---|---|---|---|---|
Crown shape | cone | cone | cone | ellipsoid | ellipsoid |
Crown shape code | “c” | “c” | “c” | “e” | “e” |
Crown radius (m) | 0.45 | 1.3 | 0.85 | 1.2 | 2.6 |
Crown center to top dist (m) | 9 | 7.2 | 4 | 4.2 | 3.2 |
Minimum height to first branch (m) | 0.49 | 6.9 | 0.49 | 6.4 | 7.1 |
Maximum height to first branch (m) | 0.51 | 7.1 | 0.51 | 10.2 | 9.2 |

Summary statistics for the performance of different distance measures in estimating biophysical parameters from ground reflectance data (BRF); reflectances were simulated for broadleaf trees.

_{V} |
_{APAR} | ||||
---|---|---|---|---|---|

inf. meas. | – | 0.63 | 0.25 | 0.10 | |

inf. meas. | 0.26 | ||||

inf. meas. | 0.69 | 0.12 | |||

inf. meas. | 0.66 | 0.26 | 0.10 | ||

inf. meas. | 0.63 | 0.26 | 0.10 | ||

inf. meas. | 0.63 | 0.26 | 0.10 | ||

inf. meas. | 0.62 | 0.26 | 0.10 | ||

M-estim | – | ||||

M-estim | 0.92 | 0.29 | 0.12 | ||

min. cont. meth. | – | 1.01 | 0.26 | 0.16 |

Summary statistics for the estimation of biophysical parameters from ground reflectance values (BRF) for needleleaf canopies.

_{V} |
_{APAR} | ||||
---|---|---|---|---|---|

inf. meas. | 0.74 | 0.21 | 0.08 | ||

^{2} |
inf. meas. | – | 0.69 | 0.22 | |

inf. meas. | 0.22 | ||||

inf. meas. | 0.77 | 0.08 | |||

inf. meas. | 0.70 | 0.22 | 0.078 | ||

inf. meas. | 0.69 | 0.22 | 0.076 | ||

M-estim | – | ||||

M-estim | 1.29 | 0.29 | 0.16 | ||

min. cont. meth. | – | 0.95 | 0.28 | 0.12 |