
*Remote Sensing* **2013**, *5*(3), 1355-1388; doi:10.3390/rs5031355


## Abstract

Radiative transfer models predicting the bidirectional reflectance factor (BRF) of leaf canopies are powerful tools that relate biophysical parameters such as leaf area index (LAI), fractional vegetation cover f_{V} and the fraction of photosynthetically active radiation absorbed by the green parts of the vegetation canopy (f_{APAR}) to remotely sensed reflectance data. One of the most successful approaches to biophysical parameter estimation is the inversion of detailed radiative transfer models through the construction of Look-Up Tables (LUTs). The solution of the inverse problem requires additional information on canopy structure, soil background and leaf properties, and the relationships between these parameters and the measured reflectance data are often nonlinear. The commonly used approach for optimization of a solution is based on minimization of the least squares estimate between model and observations (referred to as cost function or distance; here we will also use the terms “statistical distance” or “divergence” or “metric”, which are common in the statistical literature). This paper investigates how least-squares minimization and alternative distances affect the solution to the inverse problem. The paper provides a comprehensive list of different cost functions from the statistical literature, which can be divided into three major classes: information measures, M-estimates and minimum contrast methods. We found that, for the conditions investigated, Least Square Estimation (LSE) is not an optimal statistical distance for the estimation of biophysical parameters. Our results indicate that other statistical distances, such as the two power measures, Hellinger, Pearson chi-squared measure, Arimoto and Koenker–Basset distances result in better estimates of biophysical parameters than LSE; in some cases the parameter estimation was improved by 15%.

## 1. Introduction

Biophysical parameters estimated from satellite data are important inputs to ecological models and land-surface models [1,2]. Various algorithms have been developed to estimate biophysical parameters from remotely-sensed reflectance data [3,4]. The forward problem, i.e., to predict the reflected radiation and canopy light interactions given the structure and optical properties of the canopy and surface, is well understood and several models exist that produce realistic results [4]. The inverse problem is difficult to solve, however, because the problem is underdetermined. A commonly adopted way to overcome this is to add additional constraints or make a priori assumptions regarding the properties of the land surface, see [5–7].

Biophysical parameters are estimated from satellite data by inverting a sample of the bidirectional reflectance factor, BRF(λ) = f(Angular Geometry, Structural Parameters), where structural parameters (canopy properties, soil background reflectance, etc.) are input parameters and BRF(λ) is the model output (wavelength-dependent reflectance). Numerical solution of this inverse problem adjusts the model parameters such that model-predicted values closely match the measured values [7,8]. The match between model output and data is usually based on minimizing the sum of least squares [9].

Four approaches can be distinguished to estimate biophysical parameters from satellite data; the advantages and limitations of the various approaches are discussed in [10,11]. A first approach is to estimate biophysical parameters from an empirical relationship with a spectral index, see for example [1]. A second approach is to invert an analytical model; this approach puts a high demand on computing resources if the analytical model is complex. A third approach is to use machine learning, for example by training a neural network using the inputs and outputs of a BRF model [12,13]. A fourth approach is to use LUTs. This is an attractive way to estimate biophysical parameters for various reasons: solutions of the model can be constrained to a range of realistic input parameters, optimization is fast, and the complexity of the analytical model is retained [14]. In the present study we adopt a LUT-based inversion using the FLIGHT radiative transfer model [9,15].

The estimation of biophysical parameters from satellite data is hampered by uncertainties and errors that arise from a number of sources. These include uncertainties in instrument calibration, variations in atmospheric composition or simplifying assumptions in the representation of canopy and soil background [16,17]. Errors in the representation of the canopy and soil background are of particular concern in the present study since they have non-zero mean and are not normally distributed. Outliers and nonlinearities distort the residuals, and in these cases a key assumption for using LSE is violated, namely that the residuals are zero-mean white noise. For this reason, we investigate three broad classes of statistical distances (in the remote sensing literature referred to as cost functions, elsewhere also known as metrics, or divergence measures) that are based on different error distributions. These classes are: information measures of divergence [18], M-estimates [19], and minimum contrast methods [20,21].

The first class of distance or divergence measures is referred to as information measures; optimization using these measures is based on minimization of distances between two probability distributions (Section 3). Thus, we need to rewrite the BRF as a probability distribution function to apply these measures. The information measures can be further divided into three sub-classes. The first subclass is referred to as f-divergences, a term introduced by Kullback and Leibler [22]. This class of measures is based on the distance between probability distributions (Section 3.1). The f-divergences are not bounded, i.e., they range between 0 and ∞. A second subclass of divergence measures, referred to as blended measures, allows bounds to be calculated explicitly ([23] and Section 3.3). A third subclass—consisting of generalized (h, f)-divergences or superposition of two functions, see Section 3.2—is a generalization of the f-divergences [24].

The second class, of M-estimates, is a broad class of functions to which, among others, least squares optimization belongs. For this class of measures, the BRF is not considered a probability distribution. Within the class of M-estimates are a large number of functions with robust or resistant properties (Section 4).

The third class, of minimum contrast estimates, considers the spectral domain. We express the BRF as a spectral density function (Section 5) to apply these measures.

The present paper has two aims. The first is to provide a review of available statistical distances and divergences to date (Sections 3–5 and Appendix). Only a few of these measures have been applied to parameter estimation from remote sensing data. The second aim is to apply the distance and divergence measures to the estimation of biophysical parameters from satellite data. The availability of a large number of statistical distances gives a high degree of flexibility, since it allows model optimization for a wide range of error distributions. We illustrate this in the numerical experiments where retrieval of biophysical parameters is tested for simulated needleleaf and broadleaf forests for ground-measured BRF. The present paper tests the use of alternative distance measures on simulated observations. This allows assessment of distance measures in a well-controlled environment with known errors in estimated biophysical parameters for a wide range of simulated land-surface conditions. The method will be tested on real observations in a follow-on study [25].

The paper is organized as follows. In Section 2 the estimation of biophysical parameters from Earth Observation (EO) data is expressed in a form comparable with statistical distance theories. Sections 3–5 describe the statistical distance and divergence measures that performed best in our study. Section 6 provides a description of the BRF simulations with FLIGHT; this includes a description of the land-surface scenes and the generation of LUTs. In Section 7 the statistical distance and divergence measures are applied to the estimation of LAI, f_{V} and f_{APAR} by numerical inversion of the LUTs. The Appendix contains an extensive list of distance measures with references to examples of applications in the peer-reviewed literature.

We acknowledge that the range of conditions (vegetation type, simulated error distribution, land-surface properties, BRF sampling) is limited; the results of this study can therefore only be used as a guideline.

## 2. Statement of the Problem

The present section formulates the BRF in a way that is appropriate for the application of statistical distances and divergence measures. First we represent the following elements in the LUT: R_{i}(λ_{1}, ..., λ_{n}, θ̄) is a realization of the BRF dependent on wavelength, solar zenith angle, relative azimuth and view zenith angle, LAI, f_{V}, leaf area distribution, ground reflectance, etc. In this notation λ is the wavelength and λ_{1}, ..., λ_{n} ∈ Λ, i = 1, ..., N, where N is the number of entries (rows) in the LUT. The vector θ̄ = (η̄, ζ̄) consists of the unknown biophysical parameters η̄ = (η_{1}, ..., η_{k}) of interest to our study (e.g., LAI, f_{V}, f_{APAR}) and of ζ̄ = (ζ_{1}, ..., ζ_{r}), a vector with parameters that we do not need to estimate, either because they are already known (e.g., solar zenith angle, relative azimuth and view zenith angle) or because their value is obtained by other means and is not estimated in the inversion (e.g., crown shape, soil reflectance). Denoting satellite observations by R^{*}(λ_{1}, ..., λ_{n}), we estimate the unknown parameters, the elements of the vector η̄^{*}, by minimizing a measure that provides the best “closeness” between R in the LUT and R^{*}.

Let Γ be a class of measures (distances) Γ(R^{*}(λ_{j}), R_{i}(λ_{j}, η̄, ζ̄)) between two BRF functions: the LUT and the observations. The classical statistical method of inversion (that is, estimation of the required η̄^{*}) of the radiative transfer model can be formulated as a semi-parametric problem

$${\bar{\eta}}^{*}=\underset{\bar{\eta}}{\mathrm{arg\,min}}\;\mathrm{\Gamma}\left({R}^{*}\left({\lambda}_{j}\right),{R}_{i}\left({\lambda}_{j},\bar{\eta},\bar{\zeta}\right)\right)\qquad (1)$$

We estimate η̄^{*} by solving the minimization problem (1) using different statistical distances and divergences between simulated satellite signals (“observations”) and LUTs. We consider the parameters ${\eta}_{s}^{*}$ and η_{s,i}, 1 ≤ i ≤ N, close if $\left|{\eta}_{s}^{*}-{\eta}_{s,i}\right|=\min\left\{\left|{\eta}_{s}^{*}-{\eta}_{s,i}\right|,\ 1\le s\le k\right\}$, 1 ≤ i ≤ N. The classical approach to this minimization problem is known as LSE, which is based on the minimization of the quadratic function

$$\sum _{j=1}^{n}{\left({R}^{*}\left({\lambda}_{j}\right)-{R}_{i}\left({\lambda}_{j},\bar{\eta},\bar{\zeta}\right)\right)}^{2}$$

We consider alternative statistical distances, which can be divided into three classes. The majority of statistical distances belong to the so-called class of information measures. This class considers distances or measures of divergence between two probability distributions. To apply these functions to biophysical parameter estimation, the BRF must be normalized such that the sum of probabilities is 1. The expression for the LUTs becomes

Finally, to compare the results obtained with different distance measures, we define the mean absolute error in parameter retrieval. For example, to assess the residual error in estimated LAI, the following merit function is used

In the next three sections we discuss the different statistical distances evaluated in the present study. The distances are applied to two reflectance distributions: P refers to the LUT reflectances and Q refers to the observed “true” reflectances. To simplify the notation we drop the index i.
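The LUT search described in this section can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the toy reflectances and the LAI values are all hypothetical, and the normalization simply divides each spectrum by its sum so that it can be treated as a probability distribution.

```python
import numpy as np

def normalize(brf):
    """Scale each BRF spectrum so that its bands sum to 1 (probability-like form)."""
    brf = np.asarray(brf, dtype=float)
    return brf / brf.sum(axis=-1, keepdims=True)

def lse(lut_rows, observed):
    """Quadratic (least-squares) distance between every LUT row and the observation."""
    return np.sum((lut_rows - observed) ** 2, axis=-1)

def invert_lut(lut_brf, lut_params, observed, distance=lse):
    """Return the parameter value of the LUT entry closest to the observation."""
    d = distance(np.asarray(lut_brf, dtype=float), np.asarray(observed, dtype=float))
    best = int(np.argmin(d))
    return lut_params[best], best

def mean_absolute_error(estimated, true):
    """Mean absolute error between retrieved and true parameter values."""
    return float(np.mean(np.abs(np.asarray(estimated) - np.asarray(true))))

# Toy LUT (assumption): 3 entries, 4 spectral bands, one LAI value per entry.
lut_brf = np.array([[0.05, 0.08, 0.40, 0.30],
                    [0.04, 0.06, 0.45, 0.28],
                    [0.03, 0.05, 0.50, 0.25]])
lut_lai = np.array([1.0, 3.0, 5.0])
observed = np.array([0.041, 0.061, 0.44, 0.27])

lai_hat, idx = invert_lut(lut_brf, lut_lai, observed)
```

Any of the distances discussed in the following sections could be passed as `distance`, provided it accepts an array of LUT rows and one observation and returns one value per row. For the information measures, both arguments would first be passed through `normalize`.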

## 3. Information Measures

Information theory was born in 1948 when Shannon [26] published his revolutionary paper motivated by the problem of efficiently transmitting information over a noisy channel. Since Mahalanobis [27] introduced the concept of distances between two probability distributions, several other distance measures have been suggested in the statistical literature and these have been referred to as measures of distance between two distributions, measures of separation, measures of discriminatory information and measures of variation-distance. While these measures were not always introduced for the same reason, they all increase when two distributions become “further away” from each other.

Divergence is an important concept in information theory and it is useful in many applications such as multimedia classification, neuroscience, optimization of the performance of density estimation methods, and cluster analysis. Distance measures also allow a wide range of tests to see if samples are from the same distribution.

Entropies are defined over the space of distributions that form the bases of independence/dependence concepts. For these reasons, Shannon’s mutual information function has been increasingly utilized in the literature [28]. Shannon’s relative entropy and almost all other entropies fail to be “metric”, as they violate either symmetry, or the triangular rule, or both. For these reasons, it is more appropriate to refer to these entropies as measures of divergence rather than measures of distance.

Informally, entropy can be understood as “the quantity of surprise one should feel upon reading the result of a measurement”. More formally, we can write: if event A occurs with probability P (A), define the “information” I(A) gained by knowing that A has occurred to be
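In the standard formulation, the information gained from an event of probability P(A) is I(A) = −log P(A), and Shannon's entropy is its expected value. A small sketch follows; the use of the natural logarithm (units of nats) is an assumption, since the base is not stated here.

```python
import math

def self_information(prob):
    """Information gained from observing an event of probability prob (in nats)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must lie in (0, 1]")
    return -math.log(prob)

def shannon_entropy(dist):
    """Expected self-information of a discrete distribution; 0*log(0) is taken as 0."""
    return sum(p * self_information(p) for p in dist if p > 0.0)
```

Rare events carry more information than common ones, and a certain event (probability 1) carries none, which matches the informal reading of entropy as "surprise".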

#### 3.1. f-Divergence Information Measures

Kullback and Leibler (KL) [22] first introduced the concept of information divergences, which are non-symmetric measures between two distributions P and Q. Typically P represents the “true” distribution of the data and Q represents a model or an approximation of P. In information theory, KL divergence can be interpreted as cross Shannon entropy. This class has been extended in many directions since its initial application in decoding schemes and in signal processing. In particular, Rényi proposed a generalization of Shannon entropy [29], one of a family of functionals for quantifying the diversity, uncertainty or randomness of a system. The Rényi entropies are important in ecology and statistics as indices of diversity. Later, KL and Rényi related divergences were included in a broader class of divergences called f-divergences, introduced by Csiszár [30]. This class can be formulated as follows.

A general class of divergence measures Γ_{F}(P, Q) between P = (p_{1}, ..., p_{n}) and Q = (q_{1}, ..., q_{n}) is characterized by the following conditions:

- 0 ≤ p_{1}, ..., p_{n} ≤ 1, 0 ≤ q_{1}, ..., q_{n} ≤ 1, ${\sum}_{l=1}^{n}{q}_{l}=1$ and ${\sum}_{l=1}^{n}{p}_{l}=1$;
- F(p, q) is a strictly convex function of p, so that Γ_{F}(P, Q) is a strictly convex function of 0 ≤ p_{1}, ..., p_{n} ≤ 1;
- for fixed Q, Γ_{f}(P, Q) attains its unconstrained global minimum when p_{l} = q_{l} for all l, i.e., if P = Q.

For a given strictly convex, twice differentiable function f(.) we define

$${\mathrm{\Gamma}}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{q}_{l}f\left(\frac{{p}_{l}}{{q}_{l}}\right)$$

Many measures have been added to this class from different areas of science, and new divergences are still being discovered. Here we present some of these f-divergence measures, see [31]; an additional list can be found in Section A.1 of the Appendix.

Let f(x) = x ln(x) and f(1) = 0. The corresponding measure is

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{p}_{l}\ln\left(\frac{{p}_{l}}{{q}_{l}}\right)$$

Let $f\left(x\right)=\frac{1}{x}$ and f(1) = 1, then

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}\frac{{q}_{l}^{2}}{{p}_{l}}=1+\sum _{l=1}^{n}\frac{{\left({q}_{l}-{p}_{l}\right)}^{2}}{{p}_{l}}$$

The χ^{α} (Vajda) divergence corresponds to the function f(x) = |x − 1|^{α} and f(1) = 0, where α ⩾ 1, then

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{\left|{p}_{l}-{q}_{l}\right|}^{\alpha}{q}_{l}^{1-\alpha}$$

Let $f\left(x\right)={\left(\sqrt{x}-1\right)}^{2}$ and f(1) = 0, then

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{q}_{l}{\left(\sqrt{\frac{{p}_{l}}{{q}_{l}}}-1\right)}^{2}=\sum _{l=1}^{n}{\left(\sqrt{{p}_{l}}-\sqrt{{q}_{l}}\right)}^{2}$$

It can be generalized in the following form to give more flexibility in parameter estimation. Assume f(x) = (x^{α} − 1)^{1/α} with α = 1/(2j) and f(1) = 0, then

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{\left({p}_{l}^{1/2j}-{q}_{l}^{1/2j}\right)}^{2j},\quad j=1,2,3\dots $$

Let f(x) = (1 − x)^{2j}, f(1) = 0, then the power divergence has the form

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{q}_{l}{\left(1-\frac{{p}_{l}}{{q}_{l}}\right)}^{2j},\quad j=1,2,3\dots $$

Power divergence measures [32] have their minimum at zero. This class was introduced to unite efficiency with robust properties; the class is also Fisher consistent:

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{p}_{l}\frac{\left\{{\left[{p}_{l}/{q}_{l}\right]}^{\alpha}-1\right\}}{\alpha \left(\alpha +1\right)}$$
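The closed forms in this section all follow from the generic sum over q_l f(p_l/q_l). The sketch below implements the generic form alongside a few of the closed forms so that they can be checked against each other. Function names are hypothetical, and the code assumes strictly positive, normalized P and Q (zero entries would need separate handling).

```python
import numpy as np

def _as_arrays(p, q):
    return np.asarray(p, dtype=float), np.asarray(q, dtype=float)

def f_divergence(p, q, f):
    """Generic f-divergence: sum_l q_l * f(p_l / q_l)."""
    p, q = _as_arrays(p, q)
    return float(np.sum(q * f(p / q)))

def kullback_leibler(p, q):
    """Closed form for f(x) = x ln(x)."""
    p, q = _as_arrays(p, q)
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Closed form for f(x) = (sqrt(x) - 1)^2."""
    p, q = _as_arrays(p, q)
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def vajda_chi_alpha(p, q, alpha=2.0):
    """Closed form for f(x) = |x - 1|^alpha, alpha >= 1."""
    p, q = _as_arrays(p, q)
    return float(np.sum(np.abs(p - q) ** alpha * q ** (1.0 - alpha)))

def power_divergence(p, q, alpha=1.0):
    """Power divergence with parameter alpha (alpha not in {0, -1})."""
    p, q = _as_arrays(p, q)
    return float(np.sum(p * ((p / q) ** alpha - 1.0)) / (alpha * (alpha + 1.0)))

# Toy distributions (assumption): P plays the role of a normalized LUT entry,
# Q the normalized observation.
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
kl_generic = f_divergence(p, q, lambda x: x * np.log(x))
kl_closed = kullback_leibler(p, q)
```

Writing each measure once through `f_divergence` and once in closed form is a cheap consistency check when adding further measures from the Appendix.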

#### 3.2. Other Divergence Measures between Two Probability Distributions

There are two further important sub-classes based on the divergence between probability distributions that do not belong to the class of f-divergences. These sub-classes are referred to as (h, f)-measures and f-entropy measures.

The first of these sub-classes, (h, f)-measures, was introduced in [33]. It can be written in the following form: ${D}_{f}^{h}\left[P,Q\right]=h\left({D}_{f}\left[P,Q\right]\right)$, where h is a differentiable increasing function mapping from $\left[0,f\left(0\right)+\lim_{t\to \infty}\frac{f\left(t\right)}{t}\right]\to \left[0,\infty \right]$. Under different assumptions, it is shown that the asymptotic distributions of the (h, f)-divergence statistics are either normal or chi-square. These divergences were developed for hypothesis testing on multinomial populations and to test goodness of fit and independence. This class is based on the superposition of two functions and it gives a large degree of flexibility to deal with outliers.
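The composition h(D_f) can be illustrated with the textbook Rényi divergence of order α, which applies a logarithm to an inner sum of f-divergence type. This is a sketch under assumptions: the parametrization below is the common textbook one and may differ from the one used in this paper, and the function names are hypothetical.

```python
import numpy as np

def inner_sum(p, q, alpha):
    """Inner f-type sum with f(x) = x**alpha: sum_l p_l**alpha * q_l**(1 - alpha)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p ** alpha * q ** (1.0 - alpha)))

def renyi_divergence(p, q, alpha=2.0):
    """Outer function h(t) = log(t) / (alpha - 1) applied to the inner sum."""
    if alpha <= 0.0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    return float(np.log(inner_sum(p, q, alpha)) / (alpha - 1.0))

# Toy distributions (assumption), strictly positive and normalized.
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
kl = float(np.sum(np.asarray(p) * np.log(np.asarray(p) / np.asarray(q))))
near_one = renyi_divergence(p, q, alpha=1.0001)
```

As α approaches 1 the value approaches the Kullback–Leibler measure, which shows how the outer function h links this family back to the f-divergences.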

One example of these measures is given here. An additional list can be found in Section A.2 of the Appendix.

Rényi divergence with

The second subclass is referred to as entropy measures and can be introduced as follows. Let X be a random variable with probability distribution P. Shannon’s entropy [26] has the form

D_{f}[P, Q] = H(P) − H(P/Q).

In order to present a systematic way of studying the different entropy measures, Burbea and Rao introduced the so-called f-entropies, by

Based on the concavity property of the (h, f)-entropy, a new generalization was introduced in [34]:

These measures of divergence were introduced to provide a systematic way of studying different entropy measures. They are used in applications involving random variables with finite support, such as measuring genetic diversity between populations, taxonomy in biology, testing whether populations are homogeneous in genetics, and the analysis of discriminant techniques.

An example of these measures is given below; an additional list can be found in Section A.2 of the Appendix.

Arimoto (1971)

#### 3.3. Blended f-Disparities

A third group of divergences is referred to as blended divergences. Lindsay [32] found that inference based on statistics of type f-divergence (obtained by replacing either one or both probability distributions by suitable estimators) requires either bounded differentiability of f or boundedness of f itself. He introduced a new class of divergences by the modification of weights inside the integral expression of Pearson’s chi-squared divergence called “blended weight chi-squared disparity”—BWCS(β) and “blended weight Hellinger disparity”—BWHD(β), β ∈ [0, 1]. In general all these new classes of disparities have the following common property. If the blending parameter is equal to the limiting values β = 0 or β = 1, then the two original divergences on which the blend was based are achieved in the class of blended divergences. Definitions and theorems can be found in [23].

Blended f-disparities have been used in goodness-of-fit tests in medical statistics and are shown to be an excellent compromise between Pearson’s chi-square and the log likelihood ratio tests.

To illustrate the theory of blended divergences, we give several examples below.

Blended weighting scheme that generalizes Hellinger distance:
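As a rough illustration of blending, the sketch below uses a commonly cited Lindsay-type form of the blended weight Hellinger disparity and the blended weight chi-squared disparity. The exact expressions, the factor 1/2 and the assignment of P and Q to the two blended terms are assumptions and should be checked against [23,32]; function names are hypothetical.

```python
import numpy as np

def bwhd(p, q, beta=0.5):
    """Blended weight Hellinger disparity (assumed form):
    sum (p - q)^2 / (2 * (beta*sqrt(p) + (1 - beta)*sqrt(q))^2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    weight = beta * np.sqrt(p) + (1.0 - beta) * np.sqrt(q)
    return float(np.sum((p - q) ** 2 / (2.0 * weight ** 2)))

def bwcs(p, q, beta=0.5):
    """Blended weight chi-squared disparity (assumed form):
    sum (p - q)^2 / (2 * (beta*p + (1 - beta)*q))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    weight = beta * p + (1.0 - beta) * q
    return float(np.sum((p - q) ** 2 / (2.0 * weight)))

# Toy distributions (assumption), strictly positive and normalized.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
hellinger_sum = float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
```

Under this form, β = 1/2 in `bwhd` gives twice the Hellinger sum, and β = 0 or β = 1 in `bwcs` gives half of a chi-squared sum weighted by q or p respectively. This is the limiting behaviour described above for the blending parameter.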

## 4. Nonlinear Regression and M-Estimates

Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. Regression analysis seeks to find the relationship between one or more independent variables and a dependent variable. Certain widely used methods of regression, such as LSE, have favorable properties if their underlying assumptions are true, but can give misleading results if those assumptions are violated. In cases where errors are not normally distributed, outliers occur, or ordinary LSE assumptions are violated in some other way, the validity of the regression results is compromised if a non-robust regression technique is used. M-estimators (see for example [35]) form a broad class of estimators that exhibit certain robust properties. Estimates with robust regression methods can be more stable with respect to anomalous errors.

We apply M-estimates to the estimation of biophysical parameters from reflectances as follows. Let us consider the BRF R(λ_{j}, θ̄) as a nonlinear regression function g(λ_{j}, θ̄) of its parameter set, which is observed at wavelength λ_{j} ∈ Λ with some noise ∊_{j} of a complicated nature. The set of observations R^{*}(λ_{j}) = x_{j} can be represented as follows

$${x}_{j}=g\left({\lambda}_{j},\bar{\theta}\right)+{\epsilon}_{j},\quad j=1,\dots ,n,$$

where E∊_{j} = 0, i.e., the expectation is that ∊_{j} is random noise with zero mean, not Gaussian in general.

In our case we are interested in the unknown parameter vector η̄ = (LAI, f_{V}, f_{APAR}). We use the M-estimates generated by a loss function ρ(x), x ∈ ℝ; see [19,35] for the motivation and properties of these estimates.

The M-estimate of the unknown parameter vector η̄ obtained from the observations x_{j}, j = 1, ..., n, is given by the solution of the minimization problem:

The classical LSE corresponds to the case ρ(x) = x^{2}, ψ(x) = 2x. In this case the random noise terms ∊_{j} are independent, identically distributed random variables (i.i.d.r.v.) with a Gaussian distribution. In general, for nonlinear regression, the ∊_{j} may be independent but not necessarily Gaussian, or even dependent non-Gaussian random variables. It is well known that LSE regression methods are consistent, asymptotically normal and asymptotically efficient. However, when the density function of the errors is non-Gaussian, or even skewed (non-symmetric), LSE estimates are no longer efficient and their application can result in large losses of efficiency. Robust methods replace the sum of squares by more suitable loss functions.
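The effect of swapping the loss function can be seen in a small LUT search. In this sketch the M-estimate is taken as the LUT entry minimizing the sum of ρ over the residuals x_j − g(λ_j, η̄). The absolute-value loss is used only as a familiar robust example, and the toy spectra, parameter values and function names are hypothetical.

```python
import numpy as np

def squared_loss(r):
    """rho(x) = x^2, the classical least-squares loss."""
    return r ** 2

def absolute_loss(r):
    """rho(x) = |x|, a simple loss that is less sensitive to outliers."""
    return np.abs(r)

def m_estimate_lut(lut_brf, lut_params, observed, rho=squared_loss):
    """Pick the LUT entry minimizing sum_j rho(x_j - g(lambda_j, eta_i))."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(lut_brf, dtype=float)
    cost = np.sum(rho(residuals), axis=1)
    return lut_params[int(np.argmin(cost))], cost

# Toy LUT (assumption): two entries; the observation equals the first entry
# except for one band contaminated by a large error.
lut_brf = np.array([[0.10, 0.20, 0.30, 0.40],
                    [0.15, 0.25, 0.35, 0.45]])
lut_lai = np.array([2.0, 4.0])
observed = np.array([0.10, 0.20, 0.30, 0.90])

lai_lse, _ = m_estimate_lut(lut_brf, lut_lai, observed, squared_loss)
lai_abs, _ = m_estimate_lut(lut_brf, lut_lai, observed, absolute_loss)
```

With the squared loss the single contaminated band dominates the cost and the second entry is selected. With the absolute loss the three clean bands outweigh it and the first entry is recovered.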

The following examples belong to the class of M-estimates and can be found in [35–37] and others. The full list of M-estimates can be found in Section A.2 of the Appendix.

If errors are normally distributed, $f\left(x\right)=\left(1/\sqrt{2\pi}\right)\exp\left(-{x}^{2}/2\right)$, then

$$\rho \left(x\right)={x}^{2},\quad \psi \left(x\right)=2x$$

The function

$${\rho}_{c}\left(x\right)=\begin{cases}cx,& x\ge 0\\ \left(c-1\right)x,& x<0\end{cases}$$

is suited to cases where the residuals ∊_{j} = x_{j} − g(λ_{j}, η̄) have some skewness (non-symmetric) property. The empirical quantile function may be defined in terms of solutions to a simple optimization problem. Explicitly,

$${\widehat{Q}}_{\epsilon}\left(c\right)=\inf\left\{y\;\middle|\;\sum _{j=1}^{n}{\rho}_{c}\left({\epsilon}_{j}-y\right)=\min\right\}$$

for the ρ_{c}-function (Equation (26)). One can also interpret this as (E∊_{j} = 0)

$$\begin{cases}P\left\{{\epsilon}_{j}\ge 0\right\}=c\\ P\left\{{\epsilon}_{j}<0\right\}=1-c\end{cases}$$
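The link between the asymmetric loss ρ_c and the empirical quantile function can be checked numerically. The sketch below searches over the sample values only, which is sufficient because the objective is piecewise linear and convex with kinks at the sample points. The function names and the toy residuals are hypothetical.

```python
import numpy as np

def check_loss(x, c):
    """Asymmetric loss: c*x for x >= 0 and (c - 1)*x for x < 0, with 0 < c < 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, c * x, (c - 1.0) * x)

def empirical_quantile(residuals, c):
    """Smallest sample value y that minimizes sum_j rho_c(residual_j - y)."""
    values = np.sort(np.asarray(residuals, dtype=float))
    costs = np.array([check_loss(values - y, c).sum() for y in values])
    return float(values[np.argmin(costs)])

# Toy residuals (assumption).
residuals = [5.0, 1.0, 4.0, 2.0, 3.0]
median = empirical_quantile(residuals, 0.5)
lower = empirical_quantile(residuals, 0.25)
```

For c = 0.5 the loss is symmetric and the minimizer is the sample median. Moving c away from 0.5 penalizes positive and negative residuals differently and shifts the minimizer toward the corresponding tail.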

## 5. Minimum Contrast Estimation

To apply minimum contrast measures to our problem, we interpret the BRF as a spectrum or spectral density of some stochastic process. The basic idea behind a minimum contrast estimator is to minimize the distance (contrast) between a parametric model and a non-parametric spectral density. The estimates obtained are first-order efficient and are also attractive because they have robust properties. This class of estimates is close to the class of quasi-likelihood estimators, where instead of independence (which does not hold for many cases) we use asymptotical independence, as discussed below.

We adopt the following terminology from time series analysis to interpret our observations as measurements in the spectral domain [39]. Let {Z_{t}} be a stationary process with spectral density f(λ) = f_{θ}(λ), λ ∈ Λ ⊆ ℝ, with expectation m, and θ ∈ Θ ⊂ ℝ^{p}. Our aim is to estimate the unknown parameter θ, that is, to identify the true value of the spectrum f_{θ*}(λ).

To implement this idea we consider the so-called quasi-likelihood method [40]. The BRF observations in the spectral domain are written in the form I_{n}(λ_{j}), λ_{j} ∈ Λ = {λ_{1}, ..., λ_{n}}, where

This is the so-called periodogram, a non-parametric estimate of the spectral density f(λ).
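For readers less familiar with the term, the periodogram of a series can be computed with a discrete Fourier transform. The sketch below assumes the common normalization 1/(2πn) and subtracts the sample mean; both choices, and the function name, are assumptions rather than the paper's stated definition.

```python
import numpy as np

def periodogram(z):
    """Return the Fourier frequencies 2*pi*j/n and the periodogram
    |sum_t (z_t - mean) * exp(-i*t*lambda_j)|^2 / (2*pi*n)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    dft = np.fft.fft(z - z.mean())
    freqs = 2.0 * np.pi * np.arange(n) / n
    return freqs, np.abs(dft) ** 2 / (2.0 * np.pi * n)

# Toy series (assumption): a pure cosine with 3 cycles over 16 samples.
t = np.arange(16)
z = np.cos(2.0 * np.pi * 3.0 * t / 16.0)
freqs, spectrum = periodogram(z)
peak = int(np.argmax(spectrum[:8]))
```

For this toy series all of the power is concentrated at the Fourier frequency with index 3 (and its mirror image), which is the behaviour a spectral density estimate is expected to show.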

Under some general conditions [39], at the Fourier frequencies λ_{j} ∈ Λ, the random variables I_{n}(λ_{j}) are asymptotically independent and have an exponential distribution, that is

Thus, one can construct a quasi-likelihood function or its logarithm

The Whittle estimator was also extended to cover correlated signal-plus-noise models, providing a formal asymptotic distribution theory specifically tailored for parameter estimation. This approach was first applied in time series for exponential volatility models; it then caught attention in financial econometrics and in related fields. These models are able to represent some of the stylized features of financial returns, such as uncorrelation in levels but strong dependence in squares and log-squares and leverage effect.

The Whittle estimator belongs to a class of more general estimates known as minimum contrast estimates, see [20,21]. To demonstrate the idea, let us assume that the true value of the unknown parameter θ^{*} ∈ intΘ, the interior of Θ. A contrast function for θ^{*} is a deterministic function F(θ^{*}, θ), F_{θ*}: Θ → ℝ_{+}, which has a unique minimum at θ = θ^{*}.

A contrast process for F_{θ*} is a sequence of random variables U_{n}(θ), n = 1, 2, ..., such that the ergodic-like condition holds for some U(θ) in probability:

The minimum contrast estimator is a value of θ for which the function U_{n}(θ) takes its minimum, or

Under some sets of conditions the minimum contrast estimators are consistent, see [41]. Often the contrast function can be chosen as a distance L(f_{θ}, g) between two spectral densities f_{θ}(λ) and g(λ), which can be written in the form:

The following examples of distances L(f_{θ}, g) are widely used in parametric estimation in time-series analysis in the frequency domain, in particular for autoregressive and moving average models.

One example is given below; the full list of such distances can be found in Section A.3 of the Appendix.

Let $K\left(x\right)=\log x+\frac{1}{x}$. This criterion is equivalent to the quasi-Gaussian maximum likelihood (73) and has the following form

Note that the function K(x) has a unique minimum at x = 1. To obtain a minimum value of zero we need to subtract K(1) from each term under the sum. In practical applications we use the notation from Section 2, i.e., in the above methodology the abstract parameter θ is replaced by the vector parameter of interest η̄ = (η_{1}, ..., η_{k}).
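A discrete version of this contrast can be sketched as follows. The code assumes that K is applied to the ratio f_θ/g band by band and averaged, with K(1) subtracted so that a perfect match scores zero. The discretization, the one-parameter toy model and all names are assumptions made for illustration.

```python
import numpy as np

def K(x):
    """K(x) = log(x) + 1/x, which has its unique minimum at x = 1."""
    return np.log(x) + 1.0 / x

def contrast(f_theta, g):
    """Mean of K(f_theta / g) - K(1) over the bands; zero when f_theta equals g."""
    ratio = np.asarray(f_theta, dtype=float) / np.asarray(g, dtype=float)
    return float(np.mean(K(ratio) - K(1.0)))

def minimum_contrast(candidates, model, g):
    """Return the candidate parameter with the smallest contrast, and all costs."""
    costs = [contrast(model(theta), g) for theta in candidates]
    return candidates[int(np.argmin(costs))], costs

# Toy setup (assumption): a fixed spectral shape scaled by a single parameter.
shape = np.array([0.05, 0.08, 0.40, 0.30])

def model(theta):
    return theta * shape

g = 2.0 * shape                      # "observed" spectrum generated with theta = 2
candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
theta_hat, costs = minimum_contrast(candidates, model, g)
```

In the LUT setting, `candidates` would index the LUT entries and `model` would return the stored BRF for each entry, so the search has the same structure as the least-squares inversion.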

Additional cost functions from the literature [42–48] are summarized in the Appendix.

## 6. Methodology

#### 6.1. Radiative Transfer Modeling

The performance of the distance measures in terms of retrieving biophysical parameters (LAI, f_{APAR}, f_{V}) from reflectance values was tested on simulations. This approach has the advantage that the estimated biophysical parameters can be compared directly with model input parameters and that therefore errors associated with the use of different distance measures can be established with accuracy. Moreover, we can test distance measures on a large number of simulations and this provides a good indication of their robustness.

Simulations were carried out following an approach similar to that of Prieto-Blanco et al. [9]. We used two models in conjunction to simulate a set of ground BRF observations and to generate the LUTs. Simulations were carried out for 3 needleleaf and 2 broadleaf scenes. We used PROSPECT [49] to simulate light scattering and absorption by leaves, and FLIGHT [15] to simulate light scattering and absorption by vegetation canopies. The models used are state of the art and provide realistic simulations of the interaction of solar radiation with the vegetation canopy and the soil.

PROSPECT [49] calculates leaf transmittance and reflectance from 400 to 2,500 nm. In PROSPECT4 each leaf is considered as a stack of N absorbing plates with rough surfaces giving rise to scattering of light. Absorption is calculated as a linear combination of the concentrations of chlorophyll, water, and dry matter, each weighted by its specific absorption coefficient [49]. The PROSPECT input parameters are described in Table 1. The inputs include: N, the leaf structure parameter; C_{ab}, the chlorophyll a + b concentration (μg/cm^{2}); C_{w}, the equivalent water thickness (g/cm^{2}); C_{m}, the dry matter content (g/cm^{2}). Chlorophyll content (C_{ab}) in leaves is linked to the maximum photosynthetic capacity of vegetation and varies with leaf development stage, productivity, stress and nitrogen levels. For the LUTs of conifers, a maximum and minimum value for C_{ab} was entered to reflect a range of conditions.

FLIGHT [15] is a 3D radiative transfer model for light interaction with vegetation canopies using Monte Carlo simulation of photon transport. The original model traced the photons’ trajectories forwards from the source until they were absorbed in the canopy or left the canopy boundary. Subsequent improvements include calculation of paths back from any view direction to the intercepted surface facets [50,51], simulation of fine angular resolution, simulation of photosynthesis and simulation of LiDAR signals [52]. A hybrid representation is used to model the discontinuous nature of the forest canopy. Large-scale structure is represented by geometric primitives defining shapes and positions of the tree crowns and trunks, here estimated from a statistical distribution. Within each crown, foliage is approximated by structural parameters of area density, angular distribution and size and optical properties of reflectance and transmittance. These parameters are approximated as homogeneous within each boundary, but may vary between crowns. Simulation of 3D photon trajectories allows accurate evaluation of multiple scattering within crowns, and between distinct crowns, trunks and ground surface. FLIGHT simulations have previously been compared with other 3D canopy radiative transfer models as part of the Radiation Model Intercomparison (RAMI) project [4]. The recent analysis within RAMI of six selected 3D models, including FLIGHT, showed dispersion within 1% over a large range of canopy descriptions, see [53].

Radiation was simulated in 15 spectral bands (500, 560, 630, 690, 700, 740, 790, 830, 870, 1,035, 1,200, 1,250, 1,650, 2,100, 2,250 nm). A previous study [54] suggested such a selection of bands could provide approximately 90% of the information about the land surface that is provided by a full spectrum, although this study was based on field spectroscopy. The set is chosen here to demonstrate the retrieval method and selection of error metrics, but the method is applicable to any set of bands or potential view directions, where the study should be repeated to determine optimal error metrics.

#### 6.2. Sites

The simulations were carried out on two main types of forest: conifer and broadleaf forests. Three conifer representatives were chosen from the former BOREAS sites [55], each characterized by a different dominant species: the Old Black Spruce (Picea mariana) site (OBS), the Old Jack Pine (Pinus banksiana) site (OJP) and the Young Jack Pine (Pinus banksiana) site (YJP). Vegetation at these sites has a complex structure: needles show a high degree of clumping and there is mutual shadowing by crowns. These sites are therefore known to pose a challenge to biophysical parameter estimation. Detailed crown, leaf and soil background measurements are available for these sites, as they have been extensively studied in [56,57]. Chlorophyll content in coniferous canopies has been estimated in [58]. Changes in leaf chlorophyll produce large differences in leaf reflectance and transmittance spectra; therefore three values of C_{ab} were used to obtain a wide range of possible values [59] (Table 1). Broadleaf simulations were carried out for an oak and a beech forest, since these are among the most important species for European forestry [60,61].

Table 1 shows the leaf optical properties for the conifer and broadleaf forests that were used in PROSPECT; values are based on [62]. Tables 2 and 3 show the vegetation structure parameters and the angular configurations for the FLIGHT model. Five characteristic soil spectra from the Purdue spectral library were selected (Table 2) [63,64].

#### 6.3. Generation of Look-Up Tables

The LUT contains a total of N = 90,404 entries of BRF reflectances. These are reflectance values calculated for parameters obtained at regular intervals of solar zenith angle, view zenith angle, relative azimuth angle, LAI, f_{V} and C_{ab}, see Table 1. Crown shape parameters can be found in Table 3, and f_{APAR} is obtained by summing the individual absorbed fractions at each band, weighted by the fraction of downwelling light within the band:

$$f_{APAR}=\sum_{i}W_{i}\,f_{a,i}$$

where f_{a,i} is the mean fraction of radiant energy absorbed by the canopy at band i, calculated from FLIGHT output, and W_{i} are the weights, see also [9].
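The band-weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `fapar_from_bands`, the sample absorbed fractions and the sample weights are hypothetical, and normalizing the weights so that they sum to one is an assumption.

```python
def fapar_from_bands(absorbed_fraction, weights):
    """Weighted sum of per-band absorbed fractions.

    absorbed_fraction: mean fraction of radiant energy absorbed by the
        canopy in each band (values in [0, 1]).
    weights: relative amount of downwelling light in each band.
        Assumption: weights are rescaled here to sum to one.
    """
    if len(absorbed_fraction) != len(weights):
        raise ValueError("one weight is needed per band")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * a for a, w in zip(absorbed_fraction, weights))


# Hypothetical numbers for four bands (not FLIGHT output).
absorbed = [0.90, 0.85, 0.88, 0.92]
downwelling = [1.0, 1.2, 1.1, 0.9]
fapar = fapar_from_bands(absorbed, downwelling)
```

Because the weights are normalized, the result always lies between the smallest and largest per-band absorbed fraction.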

#### 6.4. Simulation of Observations

We simulated BRF values using the coefficients in Tables 1–3. For each realization the observation tables contain a total of M = 5,000 entries for the three conifer sites (OBS, OJP, YJP) and M = 5,000 for the broadleaf sites (beech and oak). The tables were simulated in the same way as the LUT, but using a random selection from the range of parameters in Table 2, last column (LAI, f_{V}, angular geometry, atmospheric aerosol depth). Note that the parameters from Tables 1 and 3 stay the same. The observations are simulated from a wider range of conditions, including conditions that are not present in the conifer and broadleaf LUTs (soils, leaf area distribution). Further errors due to sensor noise and calibration are not considered here, but have been examined in previous studies [9,65] and should be considered prior to application to particular instruments. The tables of “observed” ground reflectances were constructed to contain complex errors for needleleaf and broadleaf forests. These errors, originating from the mismatch between LUT and “observations”, represent conditions encountered in real applications, where a model representing only a selection of conditions has to be matched with richly varied real-world conditions.

Canopy reflectance models demonstrate increasing sensitivity to soil reflectance at lower vegetation cover, and soil reflectance is one of the most sensitive parameters in canopy reflectance models [17]. The resulting errors are biased and asymmetric, and for this reason we expect some robust distances to perform better than LSE. This effect is more pronounced at lower values of LAI (<3).

Our simulations are necessarily restricted and represent only a subset of all parameters that can be varied within PROSPECT and FLIGHT. We believe that these simulations of errors are more realistic than the commonly adopted method of adding Gaussian noise to the spectrum, because errors due to incorrect assumptions on soil or leaf spectral properties will be spectrally correlated.

## 7. Results

The statistical distances listed in Sections 3–5 were evaluated as follows. Reflectances in the LUTs were matched to “observed” reflectances. For each case, we matched one entry in the “observations” table (M = 5,000 entries, each consisting of 15 spectral bands) with one entry in the LUT for each type of forest. The inversion finds parameters for the nearest angular geometry in the LUT, i.e., the angular geometry is known. Other parameters are assumed to be unknown. The performance of the statistical distances was then assessed on the estimated biophysical parameters (LAI, f_{V}, f_{APAR}). Some statistical distances allow parameters that govern the shape of the error distribution to be varied, so the choice of these parameters is an optimization problem in itself. For these cases we tested a range of parameters and chose the parameter that minimizes the error in the estimated biophysical parameters. All distances were tested by comparing the estimated biophysical parameters with the a priori known parameters.
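The matching step described above can be sketched as a nearest-entry search with an interchangeable cost function. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny three-band LUT, the parameter values and the function names are hypothetical, the Hellinger distance is written in its standard form (Equation (10) is not reproduced in this section and may differ), and spectra are normalized by their band sum before the information measure is applied. The pre-selection of the nearest angular geometry is omitted.

```python
import math


def lse(observed, simulated):
    """Least-squares distance between two spectra."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def normalize(spectrum):
    """Rescale a spectrum so its bands sum to one (assumed normalization)."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def hellinger(observed, simulated):
    """Standard squared Hellinger-type distance on normalized spectra."""
    p, q = normalize(observed), normalize(simulated)
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def invert(observed, lut, distance):
    """Return the parameters of the LUT entry closest to the observation."""
    best = min(lut, key=lambda entry: distance(observed, entry["brf"]))
    return best["params"]


# Hypothetical three-band LUT; a real LUT holds 15 bands per entry.
lut = [
    {"params": {"LAI": 1.0, "fV": 0.3}, "brf": [0.05, 0.08, 0.30]},
    {"params": {"LAI": 3.0, "fV": 0.6}, "brf": [0.04, 0.06, 0.40]},
    {"params": {"LAI": 5.0, "fV": 0.9}, "brf": [0.03, 0.05, 0.45]},
]
observed = [0.041, 0.062, 0.39]

estimate_lse = invert(observed, lut, lse)
estimate_hellinger = invert(observed, lut, hellinger)
```

Passing the distance as a function argument keeps the search itself unchanged when a different cost function is tested, which is the comparison carried out in this section.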

For the purpose of clarity we present a selection of results that consists of the best performing measures from each of the three classes of statistical distance measures. The results show significant variability in retrieval accuracy depending on the chosen divergence measure. Overall, the optimal measures in each case show an improvement over using LSE, see Tables 4 and 5.

Table 4 shows the best distance measures (bold italic) for estimating biophysical parameters on broadleaf forests from BRF as well as results obtained with LSE (bold text). Table 5 shows the same but for estimating biophysical parameters on conifer forest.

The improvement obtained in estimating biophysical parameters for broadleaf forest is further illustrated in Figure 1, where residual errors in broadleaf biophysical parameters obtained from BRF reflectances by using LSE are compared with errors obtained by Hellinger (Equation (10)) and power divergence (Equation (12)). The range of errors (maximum and minimum errors) for these two methods is the same; however, the alternative divergence shows a marked improvement in the error distribution for all biophysical parameters.

Figure 2 provides a further illustration of the reduction in the error in biophysical parameter estimation for needleleaf forests from BRF reflectance data obtained by power divergences in Equations (12) and (13) (see also Table 5).

The results summarized in Tables 4 and 5 and Figures 1 and 2 illustrate that the use of alternative distance measures can significantly improve parameter estimation. We found that the following distance measures perform well for the cases described above:

(1) For the broadleaf forest the best distances were:

for the estimation of LAI: Hellinger (Equation (10)) and Arimoto (Equation (20));

for the estimation of f_{V}: power divergence (Equation (12));

for the estimation of f_{APAR}: generalized Hellinger measure (Equation (10)).

We found that, compared with LSE, the Koenker–Bassett distances (Equations (26) and (35)) gave better results in all cases. The improvement compared with LSE was 15% for LAI, 10% for f_{V} and 11% for f_{APAR}.

(2) For conifer forest the best distances were:

for the estimation of LAI and f_{APAR}: power divergence (Equation (13)) and Pearson chi-square divergence (Equation (8));

for the estimation of f_{V}: power divergence (Equation (12)).

Similar to the broadleaf case, we found that the Koenker–Bassett metric (Equations (26) and (35)) improved estimation in all cases. All biophysical parameters, LAI, f_{V} and f_{APAR}, improved by around 10% in this case.

## 8. Discussion

#### 8.1. Recommendations for Distance Choice

When optimizing parameterized models for which the error distribution (shape and bias) is known, the user can choose an appropriate cost function based on specific physical properties of the model and metrics. When we deal with non-parametric models or non-linear models with many parameters (such as the present case, where we simulate BRF using PROSPECT and FLIGHT), it may be useful to check a range of available distances to find the optimal cost function. For problems similar to the present study we can provide the following guidance.

#### 8.2. Considering the Shape of the Distribution

For non-symmetric error distributions the recommended cost function is Koenker–Bassett (Equation (26)). Such a non-symmetric error distribution may arise, for example, from sub-pixel cloud that goes undetected in pre-processing. Based on the shape of this function, its parameter c is expected to approach one with increasing skewness of the error distribution. We expect skewed error distributions to be common for biophysical parameter estimation from satellite data, especially when there is a mismatch between the soil reflectance specified in the LUT and the “true” soil reflectance.
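The role of the asymmetry parameter c can be illustrated with a short sketch. Equation (26) is not reproduced in this section, so the code below assumes the standard check (or pinball) loss from quantile regression [38], in which a residual u is weighted by c when it is positive and by 1 − c when it is negative; the function names and sample residuals are hypothetical.

```python
def check_loss(residual, c):
    """Standard quantile-regression check loss (assumed form).

    Positive residuals are weighted by c, negative residuals by (1 - c).
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    indicator = 1.0 if residual < 0.0 else 0.0
    return residual * (c - indicator)


def koenker_bassett(observed, simulated, c=0.5):
    """Sum of check losses over all bands of a spectrum."""
    return sum(check_loss(o - s, c) for o, s in zip(observed, simulated))


# With c = 0.5 the cost is symmetric (half the absolute error).
symmetric = koenker_bassett([1.0, 2.0], [2.0, 1.0], c=0.5)

# With c = 0.9 a positive residual costs nine times more than a
# negative residual of the same size.
cost_positive = check_loss(1.0, 0.9)
cost_negative = check_loss(-1.0, 0.9)
```

As c moves towards one, negative residuals are penalized less and less, which is why a value of c close to one suits strongly skewed error distributions.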

For symmetric error distributions we can recommend Hellinger (Equation (10)), Arimoto (Equation (20)), power divergence (Equation (12)) and standard LSE (Equation (25)).

For heavy-tailed (right) and semi-heavy-tailed (left) distributions (i.e., the tails behave as a negative exponential divided by a power function) we can recommend the non-symmetric power divergence (Equation (13)), Vajda (Equation (9)) and Pearson chi-square divergence (Equation (8)) cost functions. As can be seen from Figures 1 and 2, application of specific metrics to the biophysical parameters makes the errors more localized, removes heavy tails and makes the distributions more symmetric. This is due to the non-symmetric nature of the metrics themselves. To the best of our knowledge, these effects are usually not addressed in remote sensing inversion problems.

Special attention should be paid to the class of spectral metrics, since it represents informational distances in the spectral domain. Since informational transformation to the spectral domain usually makes our observations asymptotically independent, it is plausible that the spectral metric (Equation (35)), which corresponds to quasi-maximum likelihood, provides good results. This is also consistent with the statistical theory that the maximum likelihood estimator is asymptotically optimal. For this reason we recommend using this or a similar cost function.

#### 8.3. Considering Properties of the Estimated Parameters

For all types of forest we found that for f_{V} the optimal metric is power divergence (Equation (12)). This symmetric cost function represents a Rényi type entropy that maximizes the entropy distribution using a power law behavior. This gives us a better understanding of the nature of errors in this case.

In the case when parameters of the model are linearly correlated, we observe consistency in the optimal cost function for these parameters (for example LAI and f_{V} and f_{APAR} in our case). Thus, we can recommend using the same cost function for linearly correlated parameters. However, this does not hold if the correlation has a more complicated nature.

## 9. Conclusions

Over 60 statistical distances from three major classes, information measures, M-estimates and minimum contrast methods, were obtained from the mathematical literature. A comprehensive list of these statistical distances was provided. The statistical distances were tested to see whether, compared with LSE, they improved the estimation of biophysical parameters for needleleaf and broadleaf forests. We found that the commonly used LSE distance is not the optimal cost function for the cases studied and that better results can be obtained using alternative cost functions.

For the numerical experiments we use PROSPECT and FLIGHT to simulate “observed” reflectance values in 15 different spectral bands. We generate LUTs for a limited set of land-surface and atmospheric conditions. However, for the observations we generate reflectance values for a wider range of conditions and thus introduce a mixture of errors caused by variations in angular geometry, LAI, f_{V}, soil reflectance and leaf angle distribution. For the biophysical parameter estimation we match the observed reflectances to the reflectances of the LUT with different cost functions. The largest sources of (biased) error, i.e., the mismatch between observations and LUT, are potentially related to soils, since only a limited amount of variability associated with these variables was incorporated in the LUTs. We conclude that our analysis resembles a common problem for the estimation of biophysical parameters from satellite data, i.e., one estimates biophysical parameters assuming a limited set of ground conditions. A cost function that is based on an asymmetric, biased or heavy-tailed error distribution can therefore result in better estimates of biophysical parameters than LSE, which is based on a normal error distribution.

We found that the information measures from Section 3 provide better results when the BRF is normalized; see Equations (3) and (4). This result may not be valid for a smaller number of wavebands.

A caveat of the present study is that we analyzed only a limited subset of a wide range of possibilities, and for different applications it is likely that different cost functions may be more suitable. We are preparing a study where cost functions are used on real observations as opposed to simulated observations [25].

Alternative divergence measures and distances have been known in the statistical literature for some time and could find an application in many areas besides remote sensing, such as in biology, geography and geophysics.

The approach outlined in the present study can be extended to other applications that use LUT optimization, interpolation, linearization of parameter space, etc. It can be used in addition to or as an alternative to data training and machine learning schemes. We believe that the use of alternative statistical measures has great potential for remote sensing applications.

## Acknowledgments

We thank Ana Prieto for advice and provision of LUT code. This work was funded by the NERC National Centre for Earth Observation (NCEO). Three anonymous reviewers are thanked for their constructive comments to improve the paper.

## References

- Sellers, P.; Los, S.O.; Tucker, C.J.; Justice, C.O.; Dazlich, D.A.; Collatz, G.J.; Randall, D.A. A revised land-surface parameterization (SiB2) for atmospheric GCMs. Part 2: The generation of global fields of terrestrial biophysical parameters from satellite data. J. Clim. **1996**, 9, 706–737.
- Jonckheere, I.; Fleck, S.; Nackaerts, K.; Muysa, B.; Coppin, P.; Weiss, M.; Baret, F. Review of methods for in situ leaf area index determination Part I. Theories, sensors and hemispherical photography. Agric. For. Meteorol. **2004**, 212, 19–35.
- Verhoef, W. Light scattering by leaf layers with application to canopy reflectance modeling. The SAIL model. Remote Sens. Environ. **1984**, 16, 125–141.
- Widlowski, J.-L.; Taberner, M.; Pinty, B.; Bruniquel-Pinel, V.; Disney, M.; Fernandes, R.; Gastellu-Etchegorry, J.-P.; Gobron, N.; Kuusk, A.; Lavergne, T.; et al. The third Radiation transfer Model Intercomparison (RAMI) exercise: Documenting progress in canopy reflectance models. J. Geophys. Res. **2007**.
- Iaquinta, J.; Pinty, B.; Privette, J.L. Inversion of a physically based bidirectional reflectance model of vegetation. IEEE Trans. Geosci. Remote Sens. **1997**, 35, 687–698.
- Gao, F.; Jin, Y.F.; Li, X.W.; Schaaf, C.B.; Strahler, A.H. Bidirectional NDVI and atmospherically resistant BRDF inversion for vegetation canopy. IEEE Trans. Geosci. Remote Sens. **2002**, 40, 1269–1278.
- Qi, J.; Kerr, H.; Moran, M.S.; Weltz, M.; Hueye, A.R.; Dorooshian, S.; Bryant, R. Leaf area index estimates using remotely sensed data and BRDF models in a semiarid region. Remote Sens. Environ. **2003**, 73, 18–30.
- Pinty, B.; Verstraete, M.M.; Dickinson, R.E. Physical model of the bi-directional reflectance of vegetation canopies. Inversion and validation. J. Geophys. Res. **1990**, 95, 11767–11775.
- Prieto-Blanco, A.; North, P.R.J.; Barnsley, M.J.; Fox, N. Satellite-driven modelling of Net Primary Productivity (NPP): Theoretical analysis. Remote Sens. Environ. **2009**, 113, 137–147.
- Strahler, A.H. Vegetation canopy reflectance modelling. Recent developments and remote sensing perspectives. Remote Sens. Rev. **1997**, 15, 179–194.
- Qiu, J.; Gao, W.; Lesht, B.M. Inverting optical reflectance to estimate surface properties of vegetation canopies. Int. J. Remote Sens. **1998**, 19, 641–656.
- Verrelst, J.; Monoz, J.; Alonso, L.; Delegido, J.; Rivera, J.P.; Camp-Valls, G.; Moreno, J. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for sentinel-2 and -3. Remote Sens. Environ. **2012**, 118, 127–139.
- Richter, K.; Hank, T.B.; Vuolo, F.; Mauser, W.; D’Urso, G. Optimal exploitation of the sentinel-2 spectral capabilities for crop leaf area index mapping. Remote Sens. **2012**, 4, 561–582.
- Gascon, F.; Gastellu-Etchegorry, J.-P.; Lefevre-Fonollosa, M.-J.; Dufrene, E. Retrieval of forest biophysical variables by inverting a 3-D radiative transfer model and using high and very high resolution imagery. Int. J. Remote Sens. **2004**, 25, 5601–5616.
- North, P.R.J. Three-dimensional forest light interaction model using a Monte Carlo method. IEEE Trans. Geosci. Remote Sens. **1996**, 34, 946–956.
- Dawson, T.P.; Curran, P.J.; North, P.R.J.; Plummer, S.E. The propagation of foliar biochemical absorption features in forest canopy reflectance: A theoretical analysis. Remote Sens. Environ. **1999**, 67, 147–159.
- Privette, J.L.; Myneni, R.B.; Emery, W.J.; Pinty, B. Inversion of a soil bidirectional reflectance model for use with vegetation reflectance models. J. Geophys. Res. **1995**, 100, 497–525.
- Pardo, L. Statistical Inference Based on Divergence Measures. In Statistics: A Series Textbooks and Monographs; Chapman and Hall/CRC: New York, NY, USA, 2006.
- Staudte, R.G.; Sheather, S.J. Robust Estimation and Testing; John Wiley & Sons, Inc: New York, NY, USA, 1990.
- Taniguchi, M. On estimation of parameters on Gaussian stationary processes. J. Appl. Prob. **1981**, 16, 575–591.
- Taniguchi, M. Minimum contrast estimation for spectral densities of stationary processes. J. R. Stat. Soc. **1987**, 9, 315–325.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. **1951**, 22, 79–86.
- Kûs, V. Blended divergences with examples. Kybernetika **2003**, 39, 43–54.
- Salicrú, M.; Menéndez, M.L.; Morales, D.; Pardo, L. Asymptotic distribution of -phi-entropies. Commun. Statist. Theor. Method. **1993**, 22, 2015–2031.
- Leonenko, G.; Los, S.O.; North, P.R.J. Retrieval of leaf area index from MODIS surface reflectance by model inversion using different minimization criteria. Unpublished work. 2013.
- Shannon, C.E. The mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Mahalanobis, P.C. On the generalised distance in statistics. Proc. Natl. Acad. Sci. India **1936**, 2, 49–55.
- Granger, C.; Lin, J.L. Using the mutual information coefficient to identify lags in nonlinear models. J. Time Ser. Anal. **1994**, 15, 371–384.
- Rényi, A. Probability Theory; North-Holland: London, UK, 1970.
- Csiszar, I. Eine informationstheoretische ungleichung und ihre anwendung auf den beweis der ergodizitat von markoffischen ketten. Magyar Tud. Akad. Mat. Kutato Int. Kozl. **1963**, 8, 85–108.
- Kapur, J.N. Maximum-Entropy Models in Science and Engineering; John Wiley & Sons, Inc: New York, NY, USA, 1989.
- Lindsay, B.G. Efficiency versus robustness: The case for minimum hellinger distance and related methods. Ann. Stat. **1994**, 22, 1081–1114.
- Menéndez, M.L.; Morales, D.; Pardo, L.; Salicrú, M. Asymptotic behavior and statistical approach of divergence measures in multinomial populations: A unified study. Stat. Pap. **1995**, 36, 1–29.
- Pardo, L.; Morales, D.; Salicrú, M.; Menéndez, M.L. Statistics in applied categorical data analysis with stratified sampling. Utilitas Math. **1993**, 44, 145–164.
- Huber, P.J. Robust Statistics; Wiley: New York, NY, USA, 1981.
- Hampel, F.R.; Rousseeuw, P.J.; Ronchetti, E.M.; Stahel, W. Robust Statistics. The Approach Based in Influence Functions; Wiley: New York, NY, USA, 1989.
- Edlund, O.; Ekblom, H. Computing the constrained M-estimates for regression. Comput. Stat. Data An. **2005**, 49, 19–32.
- Koenker, R.; Bassett, G. Regression quantiles. Econometrica **1978**, 46, 33–50.
- Brillinger, D.R. The Series Data Analysis and Theory; Hoden Day: San Francisco, CA, USA, 1981.
- Heyde, C.C. Quasi-Likelihood and Its Application. A General Approach to Optimal Parameter Estimation; Springer: New York, NY, USA, 1997.
- Dacunha-Castelle, D.; Duflo, M. Probability and Statistics; Springer: New York, NY, USA, 1986.
- Liese, F.; Vajda, I. Convex Statistical Distances; Teubner: Leipzig, Germany, 1987.
- Bregman, L.M. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. **1967**, 7, 200–217.
- Basu, A.; Park, C.; Lindsay, B.G.; Li, H. Some variants of minimum disparity estimation. Comput. Stat. Data Anal. **2004**, 45, 741–763.
- Read, T.R.C.; Cressie, N.A.C. Goodness of Fit Statistics for Discrete Multivariate Data; Springer-Verlag: New York, NY, USA, 1988.
- Sharma, B.D.; Mittal, D.P. New non-additive measures of relative information. J. Combinatorics Inf. Sys Sci. **1977**, 2, 122–133.
- Itakura, F.; Saito, S. Analysis Synthesis Telephony Based on the Maximum Likelihood Method. Proceedings of the 6th International Congress on Acoustics, Tokyo, Japan, 21–28 August 1968; pp. C17–C20.
- Rényi, A. On Measures of Entropy and Information. Proceedings of the International 4th Berkeley Symposium Mathematical Statistics Probability, Berkeley, CA, USA, 30 June–30 July 1961; 1, pp. 547–561.
- Jacquemoud, S.; Ustin, S.L.; Verdebout, J.; Schmuck, G.; Andreoli, G.; Hosgood, B. Estimating leaf biochemistry using the PROSPECT leaf optical properties model. Remote Sens. Environ. **1996**, 56, 194–202.
- Disney, M.I.; Lewis, P.; North, P.R.J. Monte Carlo ray tracing in optical canopy reflectance modelling. Remote Sens. Rev. **2000**, 18, 197–226.
- Barton, C.V.M.; North, P.R.J. Remote sensing of canopy light use efficiency using the photochemical reflectance index: Model and sensitivity analysis. Remote Sens. Environ. **2001**, 78, 164–273.
- North, P.R.J.; Rosette, J.A.B.; Suárez, J.C.; Los, S.O. A Monte Carlo radiative transfer model of satellite waveform LiDAR. Int. J. Remote Sens. **2010**, 31, 1343–1358.
- Widlowski, J.-L.; Pinty, B.; Disney, M.; Gastellu-Etchegorry, J.-P.; Lavergne, T.; Lewis, P.E.; North, P.R.J.; Pinty, B.; Thompson, R.; Verstraete, M.M. The RAMI on-line model checker (ROMC): A web-based benchmarking facility for canopy reflectance models. Remote Sens. Environ. **2008**, 112, 1144–1150.
- Thenkabail, P.S.; Enclona, E.A.; Ashton, M.S.; van Der, M. Accuracy assessments of hyperspectral waveband performance for vegetation analysis applications. Remote Sens. Environ. **2004**, 91, 354–376.
- Newcomer, J.; Landis, D.; Conrad, S.; Curd, S.; Huemmrich, K.; Knapp, D.; Morrell, A.; Nickeson, J.; Rinker, P.A.D.; Strub, R.; et al. Collected Data for the Boreas Ecosystem-Atmosphere Study; BOREAS Information System NASA/Goddard Space Flight Center: Greenbelt, MD, USA, 2000; (CD-ROM).
- Sellers, P.; Hall, F.; Kelly, R.; Black, A.; Baldocchi, D.; Berry, J.; Ryan, M.; Ranson, J.; Crill, K.; Lettenmaier, D.; et al. BOREAS. Experiment overview, scientific results and future directions. J. Geophys. Res. **1997**, 102, 28731–28770.
- Gamon, J.; Huemmrich, K.; Peddle, D.; Chen, J.; Fuentes, D.; Hall, F.; Kimball, J.S.; Goetz, S.; Gu, J.; McDonald, K.C.; et al. Remote sensing in Boreas: Lessons learned. Remote Sens. Environ. **2004**, 89, 139–162.
- Verrelst, J.; Schaepman, M.E.; Malenovsky, Z.; Clevers, J.G.P.W. Effects of woody elements on simulated canopy reflectance: Implications for forest chlorophyll content retrieval. Remote Sens. Environ. **2010**, 114, 647–656.
- Kempeneers, P.; Zarco-Tejada, P.J.; North, P.R.J.; De Backer, S.; Delalieux, S.; Sepulcre-Cant, G.; Morales, F.; Van Aardt, J.A.N.; Sagardoy, R.; Coppin, P.; Scheunders, P. Model inversion for chlorophyll estimation in open canopies from hyperspectral imagery. Int. J. Remote Sens. **2008**, 29, 5093–5111.
- Rock, J.; Puettmann, K.J.; Gockel, H.A.; Schulte, A. Spatial aspects of the influence of silver birch (Betula pendula L.) on growth and quality of young oaks (Quercus spp.) in central Germany. Forestry **2004**, 77, 235–247.
- Grotem, R.; Reiter, I.M. Competition-dependent modelling of foliage biomass in forest stands. Trees **2004**, 18, 596–607.
- Yang, Z.; Shi, R. Calculation of Mesophyll Structure Parameter and Its Effect on Leaf Spectral Reflectance. Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS ’05), Seoul, Korea, 25–29 July 2005; pp. 1299–1301.
- Baumgardner, M.F.; LeRoy, F.S.; Biehl, L.L.; Stoner, E.R. Reflectance properties of soils. Adv. Agron. **1985**, 38, 1–42.
- North, P.R.J. Estimation of f_{APAR}, LAI, and vegetation fractional cover from ATSR-2 imagery. Remote Sens. Environ. **2002**, 80, 114–121.
- Moses, W.J.; Bowles, J.H.; Lucke, R.L.; Corson, M.R. Impact of signal-to-noise ratio in a hyperspectral sensor on the accuracy of biophysical parameter estimation in case II waters. Opt. Express **2012**, 20, 4309–4330.

## Appendix

#### A.1. List of f-Divergence Information Measures

Additional information about the measures below can be found in [31]. List of measures with minimum at f(1):

Let f(x) = x^{2} and f(1) = 1, then

$$D_{f}\left[P,Q\right]=\sum_{l=1}^{n}\frac{p_{l}^{2}}{q_{l}}=1+\sum_{l=1}^{n}\frac{\left(p_{l}-q_{l}\right)^{2}}{q_{l}}$$

where the second equality holds when P and Q each sum to one.

Let f(x) = (x − 1) ln(x) and f(1) = 0, then

$$D_{f}\left[P,Q\right]=\sum_{l=1}^{n}\left(p_{l}-q_{l}\right)\left(\ln\left(p_{l}\right)-\ln\left(q_{l}\right)\right)$$

The K-divergence of Lin is used in the analysis of contingency tables. It corresponds to the function $f\left(x\right)=x\ln\left(x\right)-x\ln\left(\frac{1+x}{2}\right)$ with f(1) = 0 and takes the form

$$D_{f}\left[P,Q\right]=\sum_{l=1}^{n}p_{l}\ln\left(\frac{2p_{l}}{p_{l}+q_{l}}\right)$$

The L-divergence of Lin is a symmetric version of the K-divergence. It corresponds to the function $f\left(x\right)=x\ln\left(x\right)-\left(1+x\right)\ln\frac{1+x}{2}$ with f(1) = 0, thus

$$D_{f}\left[P,Q\right]=\sum_{l=1}^{n}\left[p_{l}\ln\left(p_{l}\right)+q_{l}\ln\left(q_{l}\right)-\left(p_{l}+q_{l}\right)\ln\left(\frac{p_{l}+q_{l}}{2}\right)\right]$$

Let f(x) = x^{α}, f(1) = 1 and x > 0, then

$$\Gamma_{f}\left[P,Q\right]=\sum_{l=1}^{n}q_{l}\left(\frac{p_{l}}{q_{l}}\right)^{\alpha}=\sum_{l=1}^{n}p_{l}^{\alpha}q_{l}^{1-\alpha}$$

$$D_{f}\left[P,Q\right]=\sum_{l=1}^{n}p_{l}^{\alpha}q_{l}^{1-\alpha}-1\ge 0$$

$$D_{f}\left[P,Q\right]=\frac{\sum_{l=1}^{n}p_{l}^{\alpha}q_{l}^{1-\alpha}}{e^{\alpha-1}-1}$$

We can use a higher order cross-entropy, the so-called cross-entropy of order α:

$$D_{f}\left[P,Q\right]=\frac{1}{\alpha-1}\ln\left(\sum_{l=1}^{n}p_{l}^{\alpha}q_{l}^{1-\alpha}\right),\quad \alpha>0,\quad \alpha\ne 1$$

Liese and Vajda [42] proposed a bounded asymptotic Rényi measure, which has applications in signal detection problems. For all α ≠ 0, 1 we define

$$D_{f}\left[P,Q\right]=\frac{1}{\alpha\left(\alpha-1\right)}\ln\left(\sum_{l=1}^{n}p_{l}^{\alpha}q_{l}^{1-\alpha}\right)$$

The harmonic Toussaint measure corresponds to the function $f\left(x\right)=x\frac{x-1}{x+1}$ with f(1) = 0. Its applications range from measuring the dissimilarity between musical rhythms, music information retrieval and copyright infringement resolution to computational music theory and evolutionary studies of music. The measure has the following form

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}\left({p}_{l}-\frac{2{p}_{l}{q}_{l}}{{p}_{l}+{q}_{l}}\right)$$
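The closed forms listed above can be checked numerically against their generator functions. The sketch below assumes the usual Csiszár form D_f[P,Q] = Σ q_l f(p_l / q_l), which is consistent with the expressions in this list; the function names and the two sample distributions are hypothetical, and all entries must be strictly positive.

```python
import math


def f_divergence(p, q, f):
    """Csiszar f-divergence, sum of q_l * f(p_l / q_l) (assumed form)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_j(x):
    """Generator f(x) = (x - 1) ln(x)."""
    return (x - 1.0) * math.log(x)


def f_k(x):
    """Generator of the K-divergence of Lin."""
    return x * math.log(x) - x * math.log((1.0 + x) / 2.0)


def f_toussaint(x):
    """Generator of the harmonic Toussaint measure."""
    return x * (x - 1.0) / (x + 1.0)


# Hypothetical normalized distributions with strictly positive entries.
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]

# Closed forms as listed in the text.
j_closed = sum((a - b) * (math.log(a) - math.log(b)) for a, b in zip(p, q))
k_closed = sum(a * math.log(2.0 * a / (a + b)) for a, b in zip(p, q))
t_closed = sum(a - 2.0 * a * b / (a + b) for a, b in zip(p, q))

j_generic = f_divergence(p, q, f_j)
k_generic = f_divergence(p, q, f_k)
t_generic = f_divergence(p, q, f_toussaint)
```

Writing each measure as a generator passed to one routine makes it straightforward to add further members of the list without changing the summation code.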

The negative exponential disparity measure is used as an estimator that is asymptotically fully efficient and is robust against outliers and inliers, see [32]:

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}{q}_{l}\left(\mathit{exp}\left(-\frac{{p}_{l}-{q}_{l}}{{q}_{l}}\right)-1\right)$$The Bregman divergences [43] are not full distance measures, because they does not satisfy the triangle inequality and they are not symmetric. Bregman divergences are important for two reasons: Firstly, they generalize squared Euclidean distances to a class of distances that all share similar properties. Secondly, they bear a strong connection to exponential families of distributions.

$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}\left({q}_{l}^{a}+\frac{1}{a-1}{p}_{l}^{a}-\frac{1}{a-1}{p}_{l}{q}_{l}^{a-1}\right),\hspace{0.17em}\hspace{0.17em}a\ne 1$$$${D}_{f}\left[P,Q\right]=\sum _{l=1}^{n}\left({p}_{l}-{q}_{l}\right)\left[{p}_{l}^{a-1}-{q}_{l}^{a-1}\right]$$The powered Pearson divergence is reasonably efficient and robust [44], and has wide applicability in genetics.

$${D}_{f}\left[P,Q\right]=\frac{1}{2{\alpha}^{2}}\sum _{l=1}^{n}{q}_{l}{\left(\frac{\left\{{p}_{l}{}^{\alpha}-{q}_{l}^{\alpha}\right\}}{{q}_{l}^{\alpha}}\right)}^{2},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \in \left(0,1\right]$$The Cressie and Read power divergence [45] results in stable disparity measures and solutions when outliers are added to the data. Under certain general conditions this estimator has asymptotic breakdown points of 50%.

$${D}_{f}\left[P,Q\right]=\frac{1}{\alpha \left(\alpha +1\right)}\left[\sum _{l=1}^{n}{p}_{l}{\left[{p}_{l}/{q}_{l}\right]}^{\alpha}-1\right],\hspace{0.17em}\hspace{0.17em}-\infty <\alpha <\infty $$The Sharma and Mittal divergences [46] are two generalizations of the Kullback–Leibler measure. One is called α-order and β-degree divergence measure and the other is called 1-order and β-degree divergence measure

$${D}_{f}\left[P,Q\right]=\frac{1}{\left(\beta -1\right)}\left[{\left(\sum _{l=1}^{n}{p}_{l}^{\alpha}{q}_{l}^{1-\alpha}\right)}^{\frac{\beta -1}{\alpha -1}}-1\right],\hspace{0.17em}\hspace{0.17em}\alpha ,\beta \ne 1$$$${D}_{f}\left[P,Q\right]=\frac{1}{\left(\beta -1\right)}\left[\mathit{exp}\left(\left(\beta -1\right)\sum _{l=1}^{n}{p}_{l}\mathit{log}\left(\frac{{p}_{l}}{{q}_{l}}\right)\right)-1\right],\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\beta \ne 1$$Finally, the β-divergence [47], is given by

$${D}_{f}\left[P,Q\right]=\{\begin{array}{cc}{\sum}_{l=1}^{n}\left(\frac{1}{\beta \left(\beta -1\right)}\right)\left({p}_{l}^{\beta}+\left(\beta -1\right){q}_{l}^{\beta}-\beta {p}_{l}{q}_{l}^{\beta -1}\right){p}_{l}/{q}_{l}-\mathit{log}\left[{p}_{l}/{q}_{l}\right]-1,& \beta \in \mathbb{R}/\left(0,1\right)\\ {\sum}_{l=1}^{n}{p}_{l}\left(\mathit{log}\left[{p}_{l}\right]-\mathit{log}\left[{q}_{l}\right]\right)+\left({q}_{l}-{p}_{l}\right),& \beta =1\\ {\sum}_{l=1}^{n}{p}_{l}/{q}_{l}-\mathit{log}\left[{p}_{l}/{q}_{l}\right]-1,& \beta =0\end{array}$$

The case β = 0 is the Itakura–Saito divergence, which was introduced for the estimation of short-time speech spectra using an autoregressive model. It became popular in speech and acoustics research and has been applied to denoising and up-mixing (mono-to-stereo conversion) of music.
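As an illustration, the three cases of the β-divergence can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in this study: the function name `beta_divergence` is hypothetical, and the code assumes that all entries of `p` and `q` are strictly positive.

```python
import math


def beta_divergence(p, q, beta):
    """Beta-divergence between two strictly positive sequences of equal length.

    Hypothetical helper for illustration only; p and q play the roles of
    (p_1, ..., p_n) and (q_1, ..., q_n) in the formula above.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pl, ql in zip(p, q):
        if beta == 1:
            # limiting case beta = 1: generalized Kullback-Leibler term
            total += pl * (math.log(pl) - math.log(ql)) + (ql - pl)
        elif beta == 0:
            # limiting case beta = 0: Itakura-Saito term
            ratio = pl / ql
            total += ratio - math.log(ratio) - 1.0
        else:
            # general case, beta not in {0, 1}
            total += (pl ** beta + (beta - 1) * ql ** beta
                      - beta * pl * ql ** (beta - 1)) / (beta * (beta - 1))
    return total
```

For β = 2 the general case reduces to half the sum of squared differences, which gives a quick consistency check against least squares.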

#### A.2. List of the Measures between Two Probability Distributions

List of (h, f)-measures [18]:

Sharma–Mittal divergence [46] with

$$h\left(x\right)=\frac{1}{\left(\beta -1\right)}\left({\left(1+\alpha \left(\alpha -1\right)x\right)}^{\frac{\beta -1}{\alpha -1}}-1\right);\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}f\left(x\right)=\frac{{x}^{\alpha}-\alpha \left(x-1\right)-1}{\alpha \left(\alpha -1\right)};\hspace{0.17em}\hspace{0.17em}\alpha \ne 0,1;$$$${D}_{f}^{h}\left[P,Q\right]=\frac{1}{\left(\beta -1\right)}\left({\left[1+\sum _{l=1}^{n}{q}_{l}{\left({p}_{l}/{q}_{l}\right)}^{\alpha}-\alpha \left({p}_{l}-{q}_{l}\right)-{q}_{l}\right]}^{\frac{\beta -1}{\alpha -1}}-1\right)$$

The Bhattacharyya divergence has interesting applications in signal selection and has the following form:

$$h\left(x\right)=-\mathit{log}\left(-x+1\right);\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}f\left(x\right)=-{x}^{1/2}+\left(1/2\right)\left(x+1\right)$$$${D}_{f}^{h}\left[P,Q\right]=-\mathit{log}\left(1+\sum _{l=1}^{n}\sqrt{{p}_{l}{q}_{l}}-\frac{1}{2}\left({p}_{l}+{q}_{l}\right)\right)$$
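The (h, f) construction can be made concrete with the Bhattacharyya case: the inner sum is the f-divergence term and h is applied to the result. The following is a minimal sketch, assuming non-negative inputs; the function name `bhattacharyya_divergence` is hypothetical.

```python
import math


def bhattacharyya_divergence(p, q):
    """Bhattacharyya divergence written as h(sum_l q_l * f(p_l / q_l)).

    With f(x) = -sqrt(x) + (x + 1) / 2, the term q_l * f(p_l / q_l)
    simplifies to -sqrt(p_l * q_l) + (p_l + q_l) / 2, which avoids the
    division by q_l.
    """
    inner = sum(-math.sqrt(pl * ql) + 0.5 * (pl + ql) for pl, ql in zip(p, q))
    # outer function h(x) = -log(1 - x)
    return -math.log(1.0 - inner)
```

When both inputs sum to one, this equals the negative logarithm of the Bhattacharyya coefficient, the sum of sqrt(p_l q_l).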

Shannon (1948) [26]

$$f\left(x\right)=-x\mathit{log}x,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=x$$$${D}_{f}^{h}\left(P,Q\right)=-\sum _{l=1}^{n}\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\mathit{log}\left(\frac{{p}_{l}+{q}_{l}}{2}\right)+\frac{1}{2}\left(\sum _{l=1}^{n}{p}_{l}\mathit{log}\left({p}_{l}\right)+\sum _{l=1}^{n}{q}_{l}\mathit{log}\left({q}_{l}\right)\right)$$

Rényi (1961) [48]

$$f\left(x\right)={x}^{\alpha},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\left[\frac{1}{\alpha \left(1-\alpha \right)}\mathit{log}x\right],\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \ne 0,1$$$${D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{\alpha \left(1-\alpha \right)}\right)\left[\mathit{log}\left(\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha}\right)-\frac{1}{2}\left(\mathit{log}\sum _{l=1}^{n}{\left({p}_{l}\right)}^{\alpha}+\mathit{log}\sum _{l=1}^{n}{\left({q}_{l}\right)}^{\alpha}\right)\right]$$

Varma (1966) [18]

$$f\left(x\right)={x}^{\alpha -\beta +1},\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\left[\frac{1}{\beta -\alpha}\mathit{log}x\right],\hspace{0.17em}\hspace{0.17em}\beta -1<\alpha <\beta ,\hspace{0.17em}\hspace{0.17em}\beta \geqslant 1$$$${D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{\beta -\alpha}\right)\left[\mathit{log}\left(\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha -\beta +1}\right)-\frac{1}{2}\left(\mathit{log}\sum _{l=1}^{n}{\left({p}_{l}\right)}^{\alpha -\beta +1}+\mathit{log}\sum _{l=1}^{n}{\left({q}_{l}\right)}^{\alpha -\beta +1}\right)\right]$$

Varma (1966) [18]

$$f\left(x\right)={x}^{\alpha /\beta},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\left[\frac{1}{\beta \left(\beta -\alpha \right)}\mathit{log}x\right],\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}0<\alpha <\beta ,\hspace{0.17em}\hspace{0.17em}\beta \geqslant 1$$$${D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{\beta \left(\beta -\alpha \right)}\right)\left[\mathit{log}\left(\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha /\beta}\right)-\frac{1}{2}\left(\mathit{log}\sum _{l=1}^{n}{\left({p}_{l}\right)}^{\alpha /\beta}+\mathit{log}\sum _{l=1}^{n}{\left({q}_{l}\right)}^{\alpha /\beta}\right)\right]$$

Havrda and Charvat (1967) [18]

$$f\left(x\right)=\frac{1}{1-\alpha}\left({x}^{\alpha}-x\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=x,\hspace{0.17em}\hspace{0.17em}\alpha >0,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \ne 1$$$${D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{1-\alpha}\right)\left[\sum _{l=1}^{n}\left({\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha}-\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\right)-\frac{1}{2}\left(\sum _{l=1}^{n}\left({p}_{l}^{\alpha}-{p}_{l}\right)+\sum _{l=1}^{n}\left({q}_{l}^{\alpha}-{q}_{l}\right)\right)\right]$$

Sharma and Mittal [46]

$$f\left(x\right)=x\mathit{log}x,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\frac{\mathit{exp}\left(\left(\alpha -1\right)x\right)-1}{\left(1-\alpha \right)},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha >0,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \ne 1$$$$\begin{array}{l}{D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{1-\alpha}\right)\mathit{exp}\left(\left(\alpha -1\right)\sum _{l=1}^{n}\left(\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\mathit{log}\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\right)\right)\\ -\left(\frac{1}{1-\alpha}\right)\left[\frac{1}{2}\left(\mathit{exp}\left(\left(\alpha -1\right)\sum _{l=1}^{n}\left({p}_{l}\mathit{log}{p}_{l}\right)\right)+\mathit{exp}\left(\left(\alpha -1\right)\sum _{l=1}^{n}\left({q}_{l}\mathit{log}{q}_{l}\right)\right)\right)\right]\end{array}$$

Sharma and Mittal [46]

$$f\left(x\right)={x}^{\beta},\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\frac{1}{\left(1-\alpha \right)}\left({x}^{\frac{\alpha -1}{\beta -1}}-1\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha >0,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\beta >0,\hspace{0.17em}\hspace{0.17em}\alpha ,\beta \ne 1$$$${D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{1-\alpha}\right)\left[{\left(\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\beta}\right)}^{\frac{\alpha -1}{\beta -1}}-\frac{1}{2}\left({\left[\sum _{l=1}^{n}\left({p}_{l}{}^{\beta}\right)\right]}^{\frac{\alpha -1}{\beta -1}}+{\left[\sum _{l=1}^{n}\left({q}_{l}{}^{\beta}\right)\right]}^{\frac{\alpha -1}{\beta -1}}\right)\right]$$

Ferreri (1980) [18]

$$f\left(x\right)=\left(1+\alpha x\right)\mathit{log}\left(1+\alpha x\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=\left(1+\frac{1}{\alpha}\right)\mathit{log}\left(1+\alpha \right)-\frac{x}{\alpha},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha >0$$$$\begin{array}{l}{D}_{f}^{h}\left(P,Q\right)=-\frac{1}{\alpha}\left(\sum _{l=1}^{n}\left(1+\alpha \left[\frac{{p}_{l}+{q}_{l}}{2}\right]\right)\mathit{log}\left(1+\alpha \left[\frac{{p}_{l}+{q}_{l}}{2}\right]\right)\right)\\ +\frac{1}{\alpha}\left[\frac{1}{2}\left(\sum _{l=1}^{n}\left(1+\alpha {p}_{l}\right)\mathit{log}\left(1+\alpha {p}_{l}\right)+\sum _{l=1}^{n}\left(1+\alpha {q}_{l}\right)\mathit{log}\left(1+\alpha {q}_{l}\right)\right)\right]\end{array}$$

Kapur (1972) [18]

$$f\left(x\right)=\frac{{x}^{\alpha}+{\left(1-x\right)}^{\alpha}-1}{\left(1-\alpha \right)},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=x,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \ne 1$$$$\begin{array}{l}{D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{1-\alpha}\right)\left(\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha}+{\left(1-\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\right)}^{\alpha}-1\right)\\ -\left(\frac{1}{1-\alpha}\right)\left[\frac{1}{2}\left(\left[\sum _{l=1}^{n}{p}_{l}^{\alpha}+{\left(1-{p}_{l}\right)}^{\alpha}-1\right]+\left[\sum _{l=1}^{n}{q}_{l}^{\alpha}+{\left(1-{q}_{l}\right)}^{\alpha}-1\right]\right)\right]\end{array}$$

Burbea (1984) [18]

$$f\left(x\right)=\frac{{x}^{\alpha}-{\left(1-x\right)}^{\alpha}+1+{\left(\alpha -1\right)}^{-1}\left({2}^{\alpha}-2\right)x}{\left(\alpha -2\right)},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}h\left(x\right)=x,\hspace{0.17em}\hspace{0.17em}\alpha >0,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\alpha \ne 1$$$$\begin{array}{c}{D}_{f}^{h}\left(P,Q\right)=\left(\frac{1}{\alpha -2}\right)\left[\sum _{l=1}^{n}{\left(\frac{{p}_{l}+{q}_{l}}{2}\right)}^{\alpha}-{\left(1-\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\right)}^{\alpha}+1+{\left(\alpha -1\right)}^{-1}\left({2}^{\alpha}-2\right)\left(\frac{{p}_{l}+{q}_{l}}{2}\right)\right]\\ -\frac{1}{2}\left(\frac{1}{\alpha -2}\right)\left[\sum _{l=1}^{n}{p}_{l}^{\alpha}-{\left(1-{p}_{l}\right)}^{\alpha}+1+{\left(\alpha -1\right)}^{-1}\left({2}^{\alpha}-2\right){p}_{l}\right]\\ -\frac{1}{2}\left(\frac{1}{\alpha -2}\right)\left[\sum _{l=1}^{n}{q}_{l}^{\alpha}-{\left(1-{q}_{l}\right)}^{\alpha}+1+{\left(\alpha -1\right)}^{-1}\left({2}^{\alpha}-2\right){q}_{l}\right]\end{array}$$
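The entries from Shannon (1948) onward share one pattern: h is applied to the sum of f over the midpoint distribution (P + Q)/2, and the average of the same quantity for P and Q is subtracted. A minimal sketch of this pattern follows, assuming strictly positive inputs; the names `jensen_difference`, `shannon` and `renyi` are hypothetical and chosen only for this illustration.

```python
import math


def jensen_difference(p, q, f, h):
    """h(sum f((p+q)/2)) - 0.5 * (h(sum f(p)) + h(sum f(q)))."""
    midpoint = [0.5 * (pl + ql) for pl, ql in zip(p, q)]

    def entropy(r):
        # generalized entropy of one distribution: h applied to the sum of f
        return h(sum(f(x) for x in r))

    return entropy(midpoint) - 0.5 * (entropy(p) + entropy(q))


def shannon(p, q):
    # Shannon entry: f(x) = -x log x, h(x) = x
    return jensen_difference(p, q, lambda x: -x * math.log(x), lambda x: x)


def renyi(p, q, alpha):
    # Renyi entry: f(x) = x^alpha, h(x) = log(x) / (alpha * (1 - alpha))
    scale = 1.0 / (alpha * (1.0 - alpha))
    return jensen_difference(p, q, lambda x: x ** alpha,
                             lambda x: scale * math.log(x))
```

The other entries in the list can be obtained in the same way by passing their f and h to `jensen_difference`.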

#### A.3. List of the Blended f-Disparities

More information on the measures below can be found in [23].

Pearson–Neyman blend with corresponding blended divergence

$${D}_{\beta}\left(P,Q\right):=\frac{1}{2}\sum _{l=1}^{n}\frac{{\left({p}_{l}-{q}_{l}\right)}^{2}}{\beta {p}_{l}+\left(1-\beta \right){q}_{l}}$$$$0\le {D}_{f}\left(P,Q\right)\le \frac{1}{2}\frac{1}{1-\beta}+\frac{1}{2\beta}$$

Blended power divergence-variant A. For a ∈ R − {0, 1} we have

$${D}_{a,\beta}\left(P,Q\right)=\frac{1}{a\left(a-1\right)}\sum _{l=1}^{n}\frac{{p}_{l}^{a}+{q}_{l}^{a}}{{\left(\beta {p}_{l}+\left(1-\beta \right){q}_{l}\right)}^{a-1}}-2,\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}a\ne 0,1$$

The divergences D_{0,β}(P, Q) are unbounded for all β ∈ [0, 1), which corresponds to the reversed Kullback blend. The Kullback-reversed blend D_{1,β}(P, Q) is bounded for all β ∈ (0, 1):

$$0\le {D}_{f}\left(P,Q\right)\le -\text{ln}\left(1-\beta \right)-\text{ln}\beta $$$${D}_{1,\beta}\left(P,Q\right)=\sum _{l=1}^{n}{p}_{l}\text{ln}\left(\frac{{p}_{l}}{\beta {p}_{l}+\left(1-\beta \right){q}_{l}}\right)+{q}_{l}\text{ln}\left(\frac{{q}_{l}}{\beta {p}_{l}+\left(1-\beta \right){q}_{l}}\right)$$

Blended power divergence-variant B. For 0 < |a| < 1 and β ∈ (0, 1)

$${D}_{a,\beta}\left(P,Q\right)=-\mathit{sign}\left(a\right)\sum _{l=1}^{n}\frac{\left(\beta {p}_{l}+\left(1-\beta \right){q}_{l}\right){p}_{l}^{a}+{q}_{l}^{a+1}}{{\left(\beta {p}_{l}+\left(1-\beta \right){q}_{l}\right)}^{a}},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}0<\left|a\right|<1$$
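To make the role of the blending parameter β concrete, the Pearson–Neyman blend from the beginning of this section and its upper bound can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and β is assumed to lie strictly between 0 and 1 so that the denominator stays positive whenever p_l or q_l is positive.

```python
def pearson_neyman_blend(p, q, beta):
    """0.5 * sum (p_l - q_l)^2 / (beta * p_l + (1 - beta) * q_l)."""
    return 0.5 * sum((pl - ql) ** 2 / (beta * pl + (1.0 - beta) * ql)
                     for pl, ql in zip(p, q))


def pearson_neyman_bound(beta):
    """Upper bound 1 / (2 (1 - beta)) + 1 / (2 beta) stated in the text."""
    return 0.5 / (1.0 - beta) + 0.5 / beta
```

For two distributions with disjoint support, for example P = (1, 0) and Q = (0, 1), the blend attains the bound, which illustrates why the blended measure stays finite where the pure Pearson and Neyman forms do not.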

#### A.4. List of M-Estimates

Information on the measures below is based on [35–37].

More general estimates with Laplace distribution

$$\rho \left(x\right)={\left|x\right|}^{p},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)=p\mathit{sign}\left(x\right){\left|x\right|}^{p-1},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}1\le p\le 2$$

These are the L_{p}-estimates. They have been used in the statistics of speech and image data processing, especially when the data are observed in a transform domain such as the wavelet or discrete Fourier transform domain. For example, the overcomplete wavelet transform coefficients of images are found to have sparse distributions, a property that has been extensively exploited in coding and denoising. It appears that p must be fairly moderate to provide a relatively robust estimator or, in other words, an estimator scarcely perturbed by outlying data.

For positive α, the function

$$\rho \left(x\right)=\frac{1}{2\alpha}-\frac{\mathit{exp}\left(-\alpha {x}^{2}\right)}{2\alpha},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)=x\text{exp}\left\{-\alpha {x}^{2}\right\}$$

For ν, s > 0, we can determine the trigonometric and the hyperbolic estimators

$$\rho \left(x\right)=\nu \left(x\mathit{arctan}\left(sx\right)-\frac{\mathit{log}\left({s}^{2}{x}^{2}+1\right)}{2s}\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)=\nu \text{arctan}\left(sx\right)$$$$\rho \left(x\right)=\nu \frac{\mathit{log}\left(\mathit{cosh}\left(sx\right)\right)}{s},\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)=\nu \text{tanh}\left(sx\right)$$

For the Cauchy distribution f(x) = 1/(π(1 + x^{2})) we have

$$\rho \left(x\right)=\frac{{c}^{2}}{2}\mathit{log}\left(\left({x}^{2}/{c}^{2}\right)+1\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)=\frac{2x}{1+{x}^{2}/{c}^{2}}$$

Latter influence functions trimmed at 0 < c < ∞ are presented in the form

$$\psi \left(x\right)=x{\mathit{1}}_{\left(-c,c\right)}\left(x\right),\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\psi \left(x\right)={\mathit{1}}_{\left(-c,c\right)}\left(x\right)\mathit{sign}\left(x\right)$$

The Welsch distance has the form

$$\rho \left(x\right)=\frac{{c}^{2}}{2}\left(1-\mathit{exp}\left(-{x}^{2}/{c}^{2}\right)\right)$$

The Geman and McClure function tries to reduce the effect of large errors further, but it also cannot guarantee uniqueness.

$$\rho \left(x\right)={x}^{2}{\left(1+{x}^{2}\right)}^{-1}$$

The Tukey function also encounters the problem of uniqueness; it can be written as

$$\rho \left(x\right)=\{\begin{array}{cc}\frac{{c}^{2}}{6}\left(3{x}^{2}-3{x}^{4}+{x}^{6}\right),& \left|x\right|\le c\\ {c}^{2}/6& \left|x\right|>c\end{array}$$

The Huber function has the following form

$${\rho}_{c}\left(x\right)=\{\begin{array}{cc}\frac{1}{2}{x}^{2},& \left|x\right|<c\\ c\left|x\right|-\frac{1}{2}{c}^{2},& \left|x\right|\ge c\end{array}$$$${\psi}_{c}\left(x\right)=\text{max}\left\{\text{min}\left(x,c\right),-c\right\},c>0$$

ψ_{c}(x) can be approximated by a twice differentiable score function. Huber’s M-estimation has been applied in GPS positioning and in the modeling of complex technical experiments, where it reduces the effect of outliers. This estimator is so satisfactory that it has been recommended for almost all situations; from time to time, however, difficulties are encountered, which may be related to a lack of stability.
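The Huber function and its influence function translate directly into code. The following minimal sketch uses hypothetical function names, and `huber_cost` simply sums ρ_c over a list of residuals (for example, modelled minus observed reflectances); it is not taken from the implementation used in this study.

```python
def huber_rho(x, c):
    """Huber loss: quadratic for |x| < c, linear for |x| >= c."""
    if abs(x) < c:
        return 0.5 * x * x
    return c * abs(x) - 0.5 * c * c


def huber_psi(x, c):
    """Huber influence function: the residual clipped to [-c, c]."""
    return max(min(x, c), -c)


def huber_cost(residuals, c):
    """Sum of Huber losses over a sequence of residuals."""
    return sum(huber_rho(r, c) for r in residuals)
```

Because ρ_c grows only linearly beyond c, a single large residual contributes far less to the cost than it would under least squares, which is the mechanism by which the effect of outliers is reduced.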

#### A.5. Minimum Contrast Estimation List

Detailed information on the following measures can be found in [20,21].

Let K(x) = −log x + x, then

$$L\left({f}_{\theta},g\right)=\sum _{{\lambda}_{j}\in \mathrm{\Lambda}}\left\{-\mathit{log}\left({f}_{\theta}\left({\lambda}_{j}\right)/g\left({\lambda}_{j}\right)\right)+{f}_{\theta}\left({\lambda}_{j}\right)/g\left({\lambda}_{j}\right)\right\}$$

Let K(x) = (log x)^{2}, then

$$L\left({f}_{\theta},g\right)=\sum _{{\lambda}_{j}\in \mathrm{\Lambda}}{\left\{\mathit{log}{f}_{\theta}\left({\lambda}_{j}\right)-\mathit{log}\left(g\left({\lambda}_{j}\right)\right)\right\}}^{2}$$

Let K(x) = x log x − x, then

$$L\left({f}_{\theta},g\right)=\sum _{{\lambda}_{j}\in \mathrm{\Lambda}}{f}_{\theta}\left({\lambda}_{j}\right)g{\left({\lambda}_{j}\right)}^{-1}\left\{\mathit{log}\left({f}_{\theta}\left({\lambda}_{j}\right)g{\left({\lambda}_{j}\right)}^{-1}\right)-1\right\}$$

Let K(x) = (x^{α} − 1)^{2}, where 0 < α < ∞, then

$$L\left({f}_{\theta},g\right)=\sum _{{\lambda}_{j}\in \mathrm{\Lambda}}{\left\{{\left({f}_{\theta}\left({\lambda}_{j}\right)/g\left({\lambda}_{j}\right)\right)}^{\alpha}-1\right\}}^{2}$$
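All four contrasts above have the same structure: a function K is applied to the ratio f_θ(λ_j)/g(λ_j) and summed over the wavelengths. The sketch below illustrates this under the assumption that the modelled and observed spectra are given as equally long lists of strictly positive values; the names `CONTRASTS`, `power_contrast` and `contrast` are hypothetical.

```python
import math

# The first three choices of K listed above, keyed by descriptive labels.
CONTRASTS = {
    "neg_log_plus_x": lambda x: -math.log(x) + x,
    "log_squared": lambda x: math.log(x) ** 2,
    "x_log_x_minus_x": lambda x: x * math.log(x) - x,
}


def power_contrast(alpha):
    """K(x) = (x^alpha - 1)^2 for a given alpha > 0."""
    return lambda x: (x ** alpha - 1.0) ** 2


def contrast(model, observed, K):
    """L(f_theta, g) = sum_j K(f_theta(lambda_j) / g(lambda_j))."""
    return sum(K(f / g) for f, g in zip(model, observed))
```

Each of these K functions has its minimum at x = 1, so the contrast is smallest when the modelled spectrum matches the observed spectrum at every wavelength.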

**Figure 1.** Comparison of residual errors resulting from the application of two different distance measures used to estimate biophysical parameters for broad leaf canopies with a look-up table (LUT). Biophysical parameters, including leaf area index, LAI, fraction of photosynthetically active radiation absorbed by the canopy, f_{APAR}, and vegetation cover fraction, f_{V}, are estimated from simulated ground reflectance values. The rows from top to bottom show the comparison of residual errors in estimating LAI with the least squares estimate (LSE) and Hellinger Equation (10) (top row), of residual errors in f_{V} with LSE and the power divergence Equation (12) with power 1 and j = 4 (centre row) and of residual errors in f_{APAR} with LSE and the Hellinger equation (bottom row). The columns from left to right show frequency distributions of residual errors using LSE as a cost function (left column) and the residual errors using either a Hellinger Equation (10) for LAI and f_{APAR} or a power divergence Equation (12) with power 1 and j = 4 for f_{V} (centre column). The right column shows the quantile–quantile plots comparing the frequency distributions of the left and centre columns expressed as absolute errors. The errors in biophysical parameters associated with the alternative cost functions are smaller and better behaved (more symmetrical and smaller bias).

**Figure 2.** Comparison of residual errors resulting from the application of two different distance measures used to estimate biophysical parameters for needleleaf canopies with a look-up table (LUT). Biophysical parameters, including leaf area index, LAI, fraction of photosynthetically active radiation absorbed by the canopy, f_{APAR}, and vegetation cover fraction, f_{V}, are estimated from simulated ground reflectance values. The rows from top to bottom show the comparison of residual errors in estimating LAI with the least squares estimate (LSE) and power Equation (13) with power 2 and α = −5 (top row), of residual errors in f_{V} with LSE and the power divergence Equation (12) with power 1 and j = 4 (centre row), and of residual errors in f_{APAR} with LSE and the power Equation (13) with power 2 and α = −5 (bottom row). The columns from left to right show frequency distributions of residual errors using LSE as a cost function (left column) and the residual errors using either a power equation for LAI and f_{APAR} or a power divergence Equation (12) with power 1 and j = 4 for f_{V} (centre column). The right column shows the quantile–quantile plots comparing the frequency distributions of the left and centre columns expressed as absolute errors. The errors in biophysical parameters associated with the alternative cost functions are smaller and better behaved (more symmetrical and smaller bias).

**Table 1.** PROSPECT input parameters. N is the leaf structure parameter; C_{ab} is the chlorophyll a + b concentration (μg/cm^{2}); Cw is the equivalent water thickness (g/cm^{2}); and Cm is the dry matter content (g/cm^{2}).

Parameter | OBS (conifer) | OJP (conifer) | YJP (conifer) | Beech (broadleaf) | Oak (broadleaf) |
---|---|---|---|---|---|
N | 2.47 | 2.55 | 2.55 | 1.43, 1.5, 1.6 | 1.61, 1.97, 2.64 |
Cab | 29, 19.39, 27.56 | 27.07, 13.10, 24.27 | 21.89, 19, 29.03 | 44.7 | 65.1 |
Cw | 0.04 | 0.01 | 0.03 | 0.02 | 0.008 |
Cm | 0.028 | 0.012 | 0.012 | 0.003 | 0.006 |

**Table 2.** FLIGHT input parameters. The column “Range” represents the minimum and maximum values of the input parameters; column “Step” is the increment for the LUT; column “Observed” represents the range over which a random number “r.n.” is selected to generate satellite observed BRF, R^{*}(λ_{1}, ..., λ_{n}). Thus the simulations and LUT are generated from different input parameters; in particular, different values are used for viewing geometry, soil reflectance, fractional cover, and leaf area index (Section 6.4).

Parameter | Range | Step | “Observed” |
---|---|---|---|
Solar zenith angle | 30°–70° | 10° | r.n. ∈ 30°–70° |
View zenith angle | 0°–60° | 10° | r.n. ∈ 0°–60° |
Relative azimuth angle | 0°–180° | 30° | r.n. ∈ 0°–180° |
Fraction of green leaves | 0.8 | - | same |
Fraction of shoot material | 0.05 | - | same |
Fraction of bark in foliage | 0.15 | - | same |
Leaf angle distribution | Spruce leaf, Spherical | - | same |
Soil roughness index | 0 | - | 0 |
Soil reflectance | sandy loam | - | drummer2, jal, lonrina, onaway, talbott |
Frac. cover by trees | 0.1–0.9 | 0.1 | r.n. ∈ 0.0–0.9 |
LAI | 0–7, LAI ≤ 8FC | 1 | r.n. ∈ 0.0–7.0 |
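The LUT grid implied by the “Range” and “Step” columns of Table 2 can be enumerated as in the sketch below. Several points are assumptions made for this illustration only: the entry “LAI ≤ 8FC” is read as the constraint that LAI may not exceed eight times the fractional cover, the remaining parameters are treated as fixed, and the function name `lut_grid` is hypothetical. Fractional cover is handled in integer tenths so that the constraint is not affected by floating-point rounding.

```python
from itertools import product


def lut_grid():
    """Enumerate (solar zenith, view zenith, relative azimuth, cover, LAI)."""
    solar_zenith = range(30, 71, 10)    # 30 to 70 degrees, step 10
    view_zenith = range(0, 61, 10)      # 0 to 60 degrees, step 10
    rel_azimuth = range(0, 181, 30)     # 0 to 180 degrees, step 30
    cover_tenths = range(1, 10)         # fractional cover 0.1 to 0.9, step 0.1
    lai_values = range(0, 8)            # LAI 0 to 7, step 1

    grid = []
    for sza, vza, raa, fc10, lai in product(
            solar_zenith, view_zenith, rel_azimuth, cover_tenths, lai_values):
        # assumed reading of "LAI <= 8FC": LAI <= 8 * fractional cover
        if 10 * lai <= 8 * fc10:
            grid.append((sza, vza, raa, fc10 / 10, lai))
    return grid
```

Under these assumptions each canopy type has a grid of 10,045 geometry, cover and LAI combinations; each entry would be paired with one BRF simulation.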

**Table 3.** FLIGHT input parameters: Crown shapes, where “c” represents cone shape and “e” represents ellipsoid shape.

Parameter | OBS (conifer) | OJP (conifer) | YJP (conifer) | Beech (broadleaf) | Oak (broadleaf) |
---|---|---|---|---|---|
Crown shape | “c” | “c” | “c” | “e” | “e” |
Crown radius (m) | 0.45 | 1.3 | 0.85 | 1.2 | 2.6 |
Crown center to top dist (m) | 9 | 7.2 | 4 | 4.2 | 3.2 |
Minimum height to first branch (m) | 0.49 | 6.9 | 0.49 | 6.4 | 7.1 |
Maximum height to first branch (m) | 0.51 | 7.1 | 0.51 | 10.2 | 9.2 |

**Table 4.** Summary statistics for the performance of different distance measures in estimating biophysical parameters from ground reflectance data (BRF); reflectances were simulated for broadleaf trees.

Distance | Type | Parameters | Error LAI | Error f_{V} | Error f_{APAR} |
---|---|---|---|---|---|
Equation (7) Kullback–Leibler | inf. meas. | – | 0.63 | 0.25 | 0.10 |
Equation (10) Hellinger | inf. meas. | – | 0.61 | 0.26 | 0.09 |
Equation (12) power | inf. meas. | j = 4 | 0.69 | 0.24 | 0.12 |
Equation (11) Gen Hellinger | inf. meas. | j = 2 | 0.66 | 0.26 | 0.10 |
Equation (14) Rényi | inf. meas. | α = 0.5 | 0.63 | 0.26 | 0.10 |
Equation (21) Blended Gen Hellinger | inf. meas. | β = 0.9 | 0.63 | 0.26 | 0.10 |
Equation (20) Arimoto | inf. meas. | α = 0.8 | 0.62 | 0.26 | 0.10 |
Equation (25) LSE | M-estim | – | 1.79 | 0.34 | 0.21 |
Equation (26) Koenker–Basset | M-estim | α = 0.99 | 0.92 | 0.29 | 0.12 |
Equation (35) | min. cont. meth. | – | 1.01 | 0.26 | 0.16 |

**Table 5.** Summary statistics for the estimation of biophysical parameters from ground reflectance values (BRF) for needleleaf canopies.

Distance | Type | Parameters | Error LAI | Error f_{V} | Error f_{APAR} |
---|---|---|---|---|---|
Equation (9) Vajda | inf. meas. | α = 3 | 0.74 | 0.21 | 0.08 |
Equation (8) Pearson χ^{2} | inf. meas. | – | 0.69 | 0.22 | 0.07 |
Equation (13) power | inf. meas. | α = −5 | 0.68 | 0.22 | 0.07 |
Equation (12) power | inf. meas. | j = 4 | 0.77 | 0.19 | 0.08 |
Equation (14) Rényi | inf. meas. | α = 0.5 | 0.70 | 0.22 | 0.078 |
Equation (21) Blend Hellinger | inf. meas. | β = 0.9 | 0.69 | 0.22 | 0.076 |
Equation (25) LSE | M-estim | – | 1.35 | 0.30 | 0.16 |
Equation (26) Koenker–Basset | M-estim | α = 0.2 | 1.29 | 0.29 | 0.16 |
Equation (35) | min. cont. meth. | – | 0.95 | 0.28 | 0.12 |