Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory

Sasi, Sruthy; Natraj, Vijay; Molina García, Víctor; Efremenko, Dmitry S.; Loyola, Diego; Doicu, Adrian

doi:10.3390/rs12223724

Open AccessArticle

Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory

by

Sruthy Sasi

¹

,

Vijay Natraj

^2,*

,

Víctor Molina García

¹

,

Dmitry S. Efremenko

¹

,

Diego Loyola

¹

and

Adrian Doicu

¹

Remote Sensing Technology Institute, German Aerospace Center (DLR), 82234 Oberpfaffenhofen, Germany

²

Jet Propulsion Laboratory (NASA-JPL), California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109, USA

^*

Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(22), 3724; https://doi.org/10.3390/rs12223724

Submission received: 1 October 2020 / Revised: 1 November 2020 / Accepted: 6 November 2020 / Published: 12 November 2020

(This article belongs to the Special Issue Advances of Remote Sensing Inversion)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

The retrieval of aerosol and cloud properties such as their optical thickness and/or layer/top height requires the selection of a model that describes their microphysical properties. We demonstrate that, if there is not enough information for an appropriate microphysical model selection, the solution’s accuracy can be improved if the model uncertainty is taken into account and appropriately quantified. For this purpose, we design a retrieval algorithm accounting for the uncertainty in model selection. The algorithm is based on (i) the computation of each model solution using the iteratively regularized Gauss–Newton method, (ii) the linearization of the forward model around the solution, and (iii) the maximum marginal likelihood estimation and the generalized cross-validation to estimate the optimal model. The algorithm is applied to the retrieval of aerosol optical thickness and aerosol layer height from synthetic measurements corresponding to the Earth Polychromatic Imaging Camera (EPIC) instrument onboard the Deep Space Climate Observatory (DSCOVR) satellite. Our numerical simulations show that the heuristic approach based on the thesolution minimizing the residual, which is frequently used in literature, is completely unrealistic when both the aerosol model and surface albedo are unknown.

Keywords:

model selection; retrieval; DSCOVR/EPIC

Graphical Abstract

1. Introduction

The limited information provided by satellite measurements does not allow for a complete determination of aerosol and cloud properties and, in particular, their microphysical properties. To deal with this problem, a set of models describing the microphysical properties is typically used. For example, the aerosol models implemented in the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol algorithm over land [1] consist of three spherical, fine-dominated types and one spheroidal, coarse-dominated type, while typical cloud models consist of water and ice clouds with a predefined particle shape and size. In this regard, a priori information such as the selection of a microphysical model is an essential part of the retrieval process. Standard retrieval algorithms usually ignore model uncertainty, i.e., a model is chosen from a given set of candidate models, and the retrieval is performed as if the selected model represents the true state. This approach is very risky and may result in large errors in the retrieved parameters if the model does not reflect the real scenario. An efficient way to quantify the uncertainty in model selection is the Bayesian approach and, in particular, Bayesian model selection and model averaging [2]. By model selection, we mean the specific problem of choosing the most suitable model from a given set of candidate models. In general, model selection is not a trivial task because for a given measurement, there will be several models that fit the data equally well. These tools were used, among others, in Refs. [3,4] to study uncertainty quantification in remote sensing of aerosols from Ozone Monitoring Instrument (OMI) data.

The key quantity in Bayesian model selection is relative evidence, which is a measure of how well the model fits the measurement. This is expressed in terms of the marginal likelihood, which is the integral over the state vector space of the a priori density times the likelihood density (see below). An accurate computation of the marginal likelihood is not a trivial task because some parameters characterizing the a priori and likelihood densities, e.g., the data error variance and the a priori state variance, are not precisely known and must be estimated. Moreover, the integral over the state vector space cannot be computed analytically because the likelihood density depends on the nonlinear forward model, for which an analytical representation is not available.

In this paper, we aim to eliminate the drawbacks of the Bayesian approach by using:

an iteratively regularized Gauss–Newton method for computing the solution of each model and estimating the ratio of the data error variance and the a priori state variance;
a linearization of the forward model around the solution for integrating the likelihood density over the state vector;
parameter choice methods from linear regularization theory for estimating the optimal model and the data error variance.

The paper is organized as follows. In Section 2, we derive a scaled data model with white noise in order to simplify the subsequent analysis. Section 3 is devoted to a review of the Bayesian parameter estimation and model selection in order to describe the problem and clarify the nomenclature. In Section 4, we summarize the iteratively regularized Gauss–Newton method and emphasize some special features of this method, including the estimation of the ratio of the data error variance and the a priori state variance. In Section 5, we assume a linearization of the forward model around the solution and, under this assumption, extend some regularization parameter choice methods for linear problems (the maximum marginal likelihood estimation and the generalized cross-validation) for data error variance estimation and model selection. Section 6 summarizes the computational steps of the proposed retrieval algorithm. In Section 7, we apply the algorithm to the retrieval of aerosol optical thickness and aerosol layer height from the Earth Polychromatic Imaging Camera (EPIC)/Deep Space Climate Observatory (DSCOVR) measurements.

2. Data Model

Consider

N_{m}

microphysical models, and let

F_{m} (x) \in R^{M}

be the forward model corresponding to the model m for

m = 1, \dots, N_{m}

. More specifically, in our analysis,

F_{m} (x)

is the vector of the logarithms of the simulated radiances at all wavelengths

λ_{i}

,

i = 1, \dots, M

, i.e.,

{[F_{m} (x)]}_{i} = ln I_{m} (λ_{i}, x)

, and

x \in R^{N}

is the state vector encapsulating the atmospheric parameters to be retrieved, e.g., aerosol optical thickness and layer height.

The nonlinear data model under examination consists of the equations:

y^{δ} = y + δ_{mes},

(1)

and:

y = F_{m} (x) + δ_{aer m}

(2)

where

y^{δ} \in R^{M}

is the measurement vector or the noisy data vector,

y

the exact data vector,

δ_{mes} \in R^{M}

the measurement error vector, and

δ_{aer m} \in R^{M}

the model error vector, i.e., the error due to an inadequate model. Defining the (total) data error vector by:

δ_{m} = δ_{mes} + δ_{aer m},

the data model (1) and (2) becomes:

y^{δ} = F_{m} (x) + δ_{m} .

(3)

In a deterministic setting,

δ_{m}

is characterized by the noise level

Δ_{m}

(defined as an upper bound for

δ_{m}

, i.e.,

‖ δ_{m} ‖ \leq Δ_{m}

),

x

is a deterministic vector, and we are faced with the solution of the nonlinear equation

y^{δ} = F_{m} (x)

. In a stochastic setting,

δ_{m}

and

x

are random vectors, and the data model (3) is solved by means of a Bayesian approach.

Using the prewhitening technique, we transform the data model (3) into a model with white noise. For this purpose, we take:

$δ_{mes}$ as a Gaussian random vector with zero mean and covariance matrix:

$\begin{matrix} C_{mes} & = diag {(σ_{mes i}^{2})}_{M} = σ_{mes}^{2} C_{mes}, \\ C_{mes} & = diag {(\frac{σ_{mes i}^{2}}{σ_{mes}^{2}})}_{M}, σ_{mes}^{2} = \frac{1}{M} \sum_{i = 1}^{M} σ_{mes i}^{2}, \end{matrix}$

and
$δ_{aer m}$ as a Gaussian random vector with zero mean and covariance matrix:

$C_{aer m} = σ_{aer m}^{2} I_{M} .$

If

δ_{mes}

and

δ_{aer m}

are independent random vectors, then

δ_{m}

is a Gaussian random vector with zero mean and covariance matrix:

\begin{matrix} C_{δ m} & = C_{mes} + C_{aer m} = σ_{m}^{2} C_{δ m}, \\ C_{δ m} & = w_{mes m} C_{mes} + (1 - w_{mes m}) I_{M}, \end{matrix}

where:

\begin{matrix} σ_{m}^{2} & = σ_{mes}^{2} + σ_{aer m}^{2} \end{matrix}

is the variance of the data error vector and:

\begin{matrix} w_{mes m} & = σ_{mes}^{2} / σ_{m}^{2} \end{matrix}

is the weighting factor giving the contribution of

C_{mes}

to the covariance matrix

C_{δ m}

. After these preliminary constructions, we introduce the scaled data model:

{\bar{y}}^{δ} = {\bar{F}}_{m} (x) + {\bar{δ}}_{m},

(4)

where:

{\bar{y}}^{δ} = P y^{δ}, {\bar{F}}_{m} (x) = P F_{m} (x), {\bar{δ}}_{m} = P δ_{m},

and:

P = C_{δ m}^{- 1 / 2} = diag {({[1 - w_{mes m} (1 - \frac{σ_{mes i}^{2}}{σ_{mes}^{2}})]}^{- 1 / 2})}_{M}

(5)

is the scaling matrix. Taking into account that:

{\bar{C}}_{δ m} = E {{\bar{δ}}_{m} {\bar{δ}}_{m k}^{T}} = σ_{m}^{2} P C_{δ m} P^{T} = σ_{m}^{2} I_{M},

we see that by means of the prewhitening technique, the Gaussian random vector

δ_{m} \sim N (0, C_{δ m} = σ_{m}^{2} C_{δ m})

is transformed into the white noise

{\bar{δ}}_{m} \sim N (0, {\bar{C}}_{δ m} = σ_{m}^{2} I_{M})

. Here and in the following, the notation

N (x_{mean}, C_{x})

stands for a normal distribution with mean

x_{mean}

and covariance matrix

C_{x}

.

The following comments can be taken into consideration:

In [3], the covariance matrix $C_{aer m}$ was estimated by empirically exploring a set of residuals of model fits to the measurements. Essentially, $C_{aer m}$ is assumed to be in the form:

${[C_{aer m}]}_{i j} = \{\begin{matrix} σ_{1}^{2} exp [- {(λ_{i} - λ_{j})}^{2} / l^{2}], i \neq j \\ σ_{0}^{2} + σ_{1}^{2}, i = j \end{matrix},$

(6)

where $σ_{0}^{2}$ , $σ_{1}^{2}$ , and l (representing the non-spectral diagonal variance, the spectral variance, and the correlation length, respectively) are computed by means of an empirical semivariogram, i.e., the variances of the residual differences are calculated for each wavelength pair with the distance $d = | λ_{i} - λ_{j} |$ , and a theoretical Gaussian variogram model is fitted to these empirical semivariogram values. As compared to Equation (6), in our analysis, we use the diagonal matrix approximation $C_{aer m} \approx σ_{aer m}^{2} I_{M}$ with $σ_{aer m}^{2} = σ_{0}^{2} + σ_{1}^{2}$ .
In principle, the scaling matrix $P$ depends on the model error variance $σ_{aer m}^{2}$ through the weighting factor $w_{mes m}$ . However, for ${[F_{m} (x)]}_{i} = ln I_{m} (λ_{i}, x)$ , we usually have $σ_{mes i}^{2} \approx σ_{mes}^{2}$ for all $i = 1, \dots, M$ . In this case, it follows that $C_{mes}$ , together with $C_{δ m}$ and $P$ , are close to the identity matrix. This result does not mean that in our model, $σ_{aer m}^{2}$ does not play any role; $σ_{aer m}^{2}$ is included in $σ_{m}^{2}$ , which is the subject of an estimation process.

3. Bayesian Approach

In this section, we review the basics of Bayesian parameter estimation and model selection by following the analysis given in Refs. [2,3].

3.1. Bayesian Parameter Estimation

In statistical inversion theory, the state vector

x

is assumed to be a random vector, and in this regard, we take

x \sim N (x_{a}, C_{x})

, where

x_{a}

is the a priori state vector, the best beforehand estimate of the solution, and:

\begin{matrix} C_{x} & = diag {(σ_{x i}^{2})}_{N} = σ_{x}^{2} C_{x}, \\ C_{x} & = diag {(\frac{σ_{x i}^{2}}{σ_{x}^{2}})}_{N}, σ_{x}^{2} = \frac{1}{N} \sum_{i = 1}^{N} σ_{x i}^{2} . \end{matrix}

Furthermore, the matrix

L

is defined through the Cholesky factorization:

C_{x}^{- 1} = L^{T} L, that is, L = diag {(\frac{σ_{x}}{σ_{x i}})}_{N},

and the parameter

α

is defined as:

α = \frac{σ_{m}^{2}}{σ_{x}^{2}} .

(7)

The data model (3) gives a relation between the three random vectors

{\bar{y}}^{δ}

,

x

, and

δ_{m}

, and therefore, their probability densities depend on each other. The following probability densities are relevant in Bayesian parameter estimation: (i) the a priori density

p (x ∣ m)

, which represents the conditional probability density of

x

given the model m before performing the measurement

{\bar{y}}^{δ}

, (ii) the likelihood density

p ({\bar{y}}^{δ} ∣ x, m)

, which represents the conditional probability density of

{\bar{y}}^{δ}

given the state

x

and the model m, and (iii) the a posteriori density

p (x ∣ {\bar{y}}^{δ}, m)

, which represents the conditional probability density of

x

given the data

{\bar{y}}^{δ}

and the model m.

The Bayes theorem of inverse problems relates the a posteriori density to the likelihood density:

\begin{matrix} p (x ∣ {\bar{y}}^{δ}, m) & = \frac{p ({\bar{y}}^{δ} ∣ x, m) p (x, m)}{p ({\bar{y}}^{δ}, m)} \\ = \frac{p ({\bar{y}}^{δ} ∣ x, m) p (x ∣ m)}{p ({\bar{y}}^{δ} ∣ m)} . \end{matrix}

In Bayesian parameter estimation, the marginal likelihood density

p ({\bar{y}}^{δ} ∣ m)

, defined by:

p ({\bar{y}}^{δ} ∣ m) = \int p (x, {\bar{y}}^{δ} ∣ m) d x = \int p ({\bar{y}}^{δ} ∣ x, m) p (x ∣ m) d x,

(8)

plays the role of a normalization constant and is usually ignored. However, as we will see in the next section, this density is of particular importance in Bayesian model selection.

For

x \sim N (x_{a}, C_{x} = σ_{m}^{2} {(α L^{T} L)}^{- 1})

and

{\bar{δ}}_{m} \sim N (0, {\bar{C}}_{δ m} = σ_{m}^{2} I_{M})

, the probability densities

p (x ∣ m)

and

p ({\bar{y}}^{δ} ∣ x, m)

are given by:

\begin{matrix} p (x ∣ m) & = \frac{1}{\sqrt{{(2 π σ_{m}^{2})}^{N} det [{(α L^{T} L)}^{- 1}]}} \\ \times exp [- \frac{α}{2 σ_{m}^{2}} ‖ L (x - x_{a}) ‖^{2}] \end{matrix}

(9)

and:

\begin{matrix} p ({\bar{y}}^{δ} ∣ x, m) & = \frac{1}{\sqrt{{(2 π σ_{m}^{2})}^{M}}} exp [- \frac{1}{2 σ_{m}^{2}} ‖ {\bar{y}}^{δ} - {\bar{F}}_{m} (x) ‖^{2}], \end{matrix}

(10)

respectively. The Bayes formula yields the following expression for the a posteriori density:

p (x ∣ {\bar{y}}^{δ}, m) \propto exp [- \frac{1}{2} V_{α} (x ∣ {\bar{y}}^{δ}, m)],

(11)

where the a posteriori potential

V_{α} (x ∣ {\bar{y}}^{δ}, m)

is defined by:

V_{α} (x ∣ {\bar{y}}^{δ}, m) = \frac{1}{σ_{m}^{2}} [{∥{\bar{y}}^{δ} - {\bar{F}}_{m} (x)∥}^{2} + α {∥L (x - x_{a})∥}^{2}] .

(12)

The maximum a posteriori estimator

{\hat{x}}_{m α}^{δ}

maximizing the conditional probability density

p (x ∣ {\bar{y}}^{δ}, m)

also minimizes the potential

V_{α} (x ∣ {\bar{y}}^{δ}, m)

, i.e.,

{\hat{x}}_{m α}^{δ} = arg min_{x} V_{α} (x ∣ {\bar{y}}^{δ}, m) .

(13)

3.2. Bayesian Model Selection

The relative evidence, also known as the a posteriori probability of the model m given the measurement

y^{δ}

,

p (m ∣ {\bar{y}}^{δ})

, is used for model comparison. By taking into consideration the Bayes theorem, this conditional probability density is defined by:

p (m ∣ {\bar{y}}^{δ}) = \frac{p ({\bar{y}}^{δ} ∣ m) p (m)}{p ({\bar{y}}^{δ})} .

The mean formula:

p ({\bar{y}}^{δ}) = \sum_{m = 1}^{N_{m}} p ({\bar{y}}^{δ} ∣ m) p (m)

and the assumption that all models are equally likely, i.e.,

p (m) = 1 / N_{m}

, yield:

\begin{matrix} p (m ∣ {\bar{y}}^{δ}) & = \frac{p ({\bar{y}}^{δ} ∣ m)}{\sum_{m = 1}^{N_{m}} p ({\bar{y}}^{δ} ∣ m)} . \end{matrix}

(14)

Intuitively, the model with the highest evidence is the best among all the models involved, and in this regard, we define the maximum solution estimate as:

{\hat{x}}_{\max}^{δ} = {\hat{x}}_{m^{☆} α}^{δ}, m^{☆} = arg max_{m} p (m ∣ {\bar{y}}^{δ}) .

(15)

In fact, we can compare the models to see if one of them clearly shows the highest evidence, or if there are several models with comparable values of evidence. When several models can provide equally good fits to the measurement, we can use the Bayesian model averaging technique to combine the individual a posteriori densities weighted by their evidence. Specifically, using the relation:

\begin{matrix} p (x, {\bar{y}}^{δ}) & = \sum_{m = 1}^{N_{m}} p (x, {\bar{y}}^{δ}, m) \\ = \sum_{m = 1}^{N_{m}} p (x ∣ {\bar{y}}^{δ}, m) p (m ∣ {\bar{y}}^{δ}) p ({\bar{y}}^{δ}) \end{matrix}

on the one hand, and

p (x, {\bar{y}}^{δ}) = p (x ∣ {\bar{y}}^{δ}) p ({\bar{y}}^{δ})

, on the other hand, we are led to the Bayesian model averaging formula:

p_{mean} (x ∣ {\bar{y}}^{δ}) = \sum_{m = 1}^{N_{m}} p (x ∣ {\bar{y}}^{δ}, m) p (m ∣ {\bar{y}}^{δ}),

(16)

and, consequently, to the mean solution estimate:

{\hat{x}}_{mean}^{δ} = \sum_{m = 1}^{N_{m}} {\hat{x}}_{m α}^{δ} p (m ∣ {\bar{y}}^{δ}) .

(17)

The above analysis shows that in Bayesian parameter estimation and model selection, we are faced with the following problems:

Problem 1.: From Equations (11)–(13), it is apparent that the computation of the estimator ${\hat{x}}_{m α}^{δ}$ requires the knowledge of the parameter $α$ , i.e., the ratio of the data error variance and the a priori state variance.
Problem 2.: From Equations (15) and (17), we see that the solution estimates ${\hat{x}}_{\max}^{δ}$ and ${\hat{x}}_{mean}^{δ}$ are expressed in terms of relative evidence $p (m ∣ {\bar{y}}^{δ})$ , which in turn, according to Equation (14), is expressed in terms of the marginal likelihood density $p ({\bar{y}}^{δ} ∣ m)$ . In view of Equation (8), the computation of the marginal likelihood density $p ({\bar{y}}^{δ} ∣ m)$ requires the knowledge of the likelihood density $p ({\bar{y}}^{δ} ∣ x, m)$ , and therefore of Equation (10), of the data error variance $σ_{m}^{2}$ .
Problem 3.: The dependency of the likelihood density $p ({\bar{y}}^{δ} ∣ x, m)$ on the nonlinear forward model ${\bar{F}}_{m} (x)$ does not allow an analytical integration in Equation (8).

4. Iteratively Regularized Gauss–Newton Method

In the framework of Tikhonov regularization, a regularized solution

x_{m α}^{δ}

to the nonlinear equation

{\bar{y}}^{δ} = {\bar{F}}_{m} (x)

is computed as the minimizer of the Tikhonov function:

F_{m α} (x) = {∥{\bar{y}}^{δ} - {\bar{F}}_{m} (x)∥}^{2} + α {∥L (x - x_{a})∥}^{2},

(18)

that is,

x_{m α}^{δ} = arg min_{x} F_{m α} (x) .

(19)

In classical regularization theory, the first term on the right-hand side of Equation (18) is the squared residual; the second one is the penalty term; while

α

and

L

are the regularization parameter and the regularization matrix, respectively. From Equations (12) and (18), we see that

V_{α} (x ∣ {\bar{y}}^{δ}, m) = (1 / σ_{m}^{2}) F_{m α} (x)

, and from Equations (13) and (19), we deduce that

{\hat{x}}_{m α}^{δ} = x_{m α}^{δ}

. Thus, the maximum a posteriori estimate coincides with the Tikhonov solution, and therefore, the Bayesian parameter estimation can be regarded as a stochastic version of the method of Tikhonov regularization with an a priori chosen regularization parameter

α

.

In the framework of Tikhonov regularization, the computation of the optimal regularization parameter is a crucial issue. With too little regularization, reconstructions deviate significantly from the a priori, and the solution is said to be under-regularized. With too much regularization, the reconstructions are too close to the a priori, and the solution is said to be over-regularized. In the Bayesian framework, the optimal regularization parameter is identified as the true ratio of the data error variance and the a priori state variance.

Several regularization parameter choice methods were discussed in [5]. These include methods with constant regularization parameters, e.g., the maximum likelihood estimation, the generalized cross-validation, and the nonlinear L-curve method. Unfortunately, at present, there is no fail-safe regularization parameter choice method that guarantees small solution errors in any circumstance, that is for any errors in the data vector.

An improvement of the problems associated with the regularization parameter selection is achieved in the framework of the so-called iterative regularization methods. These approaches are less sensitive to overestimates of the regularization parameter, but require more iteration steps to achieve convergence. A representative iterative approach is the iteratively regularized Gauss–Newton method.

At iteration step k of the iteratively regularized Gauss–Newton method, the forward model

{\bar{F}}_{m} (x)

is linearized around the current iterate

x_{m k}^{δ}

,

{\bar{F}}_{m} (x) \approx {\bar{F}}_{m} (x_{m k}^{δ}) + {\bar{K}}_{m k} (x - x_{m k}^{δ}),

where:

{\bar{K}}_{m k} = \frac{\partial {\bar{F}}_{m}}{\partial x} (x_{m k}^{δ})

is the Jacobian matrix at

x_{m k}^{δ}

, and the nonlinear equation

{\bar{F}}_{m} (x) = {\bar{y}}^{δ}

is replaced by its linearization:

{\bar{K}}_{m k} p = {\bar{y}}_{m k}^{δ},

(20)

where:

p = x - x_{a}

is the step vector with respect to the a priori and:

{\bar{y}}_{m k}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m k}^{δ}) + {\bar{K}}_{m k} (x_{m k}^{δ} - x_{a})

is the linearized data vector at iteration step k.

Since the nonlinear problem is ill-posed, its linearization is also ill-posed. Therefore, the linearized Equation (20) is solved by means of Tikhonov regularization with the penalty term

α_{k} {‖ L p ‖}^{2}

, where the regularization parameters

α_{k}

are the terms of a decreasing (geometric) sequence, i.e.,

α_{k} = q α_{k - 1}

with

q < 1

. Note that in the method of Tikhonov regularization, the regularization parameter

α

is kept constant during the iterative process. The Tikhonov function for the linearized equation takes the form:

F_{l m k} (p) = {∥{\bar{y}}_{m k}^{δ} - {\bar{K}}_{m k} p∥}^{2} + α_{k} {∥L p∥}^{2},

and its minimizer is given by:

p_{m k}^{δ} = {\bar{K}}_{m k}^{†} {\bar{y}}_{m k}^{δ},

where:

{\bar{K}}_{m k}^{†} = {({\bar{K}}_{m k}^{T} {\bar{K}}_{m k} + α_{k} L^{T} L)}^{- 1} {\bar{K}}_{m k}^{T}

is the regularized generalized inverse at

x_{m k}^{δ}

. The new iterate is computed as:

x_{m k + 1}^{δ} = x_{a} + {\bar{K}}_{m k}^{†} {\bar{y}}_{m k}^{δ} .

For the iterative regularization methods, the number of iteration steps k plays the role of the regularization parameter, and the iterative process is stopped after an appropriate number of steps

k^{☆}

in order to avoid an uncontrolled expansion of errors in the data. In fact, a mere minimization of the residual

‖ r_{m k}^{δ} ‖

, where:

r_{m k}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m k}^{δ})

is the residual vector at

x_{m k}^{δ}

, leads to a semi-convergent behavior of the iterated solution: while the error in the residual decreases as the number of iteration steps increases, the error in the solution starts to increase after an initial decay. A widely used a posteriori choice for the stopping index

k^{☆}

is the discrepancy principle. According to this principle, the iterative process is terminated after

k^{☆}

steps such that:

{∥r_{m k^{☆}}^{δ}∥}^{2} \leq η Δ_{m}^{2} < {∥r_{m k}^{δ}∥}^{2}, 0 \leq k < k^{☆},

with

η > 1

; hence, the regularized solution is

x_{m k^{☆}}^{δ}

. Since the noise level cannot be estimated a priori for many practical problems arising in atmospheric remote sensing, we adopt a practical approach. This is based on the observation that the squared residual

‖ r_{m k}^{δ} ‖^{2}

decreases during the iterative process and attains a plateau at approximately

Δ_{m}^{2}

. Thus, if the nonlinear residuals

‖ r_{m k}^{δ} ‖

converge to

‖ r_{m \infty}^{δ} ‖

within a prescribed tolerance, we use the estimate:

Δ_{m}^{2} = ‖ r_{m \infty}^{δ} ‖^{2} .

The above heuristic stopping rule does not have any mathematical justification, but works sufficiently well in practice.

Since the amount of regularization is gradually decreased during the iterative processes, the iteratively regularized Gauss–Newton method can handle problems that practically do not require much regularization.

The numerical experiments performed in [5] showed that at the solution

x_{m k^{☆}}^{δ}

, (i)

α_{k^{☆} - 1}

is close to the optimal regularization parameter, and (ii)

x_{m k^{☆}}^{δ}

is close to the Tikhonov solution corresponding to the optimal regularization parameter. In this regard, we make the following assumptions:

$\hat{α} = α_{k^{☆} - 1}$ is an estimate for the optimal regularization parameter, and
$x_{m \hat{α}}^{δ} = x_{m k^{☆}}^{δ}$ is the minimizer of the Tikhonov function with regularization parameter $\hat{α}$ , $F_{m \hat{α}} (x)$ .

Some consequences of the second assumption are the following. The optimality condition

\nabla F_{m \hat{α}} (x_{m \hat{α}}^{δ}) = 0

, yields:

{\bar{K}}_{m \hat{α}}^{T} [{\bar{F}}_{m} (x_{m \hat{α}}^{δ}) - {\bar{y}}^{δ}] + \hat{α} L L^{T} (x_{m \hat{α}}^{δ} - x_{a}) = 0

and further,

x_{m \hat{α}}^{δ} = x_{a} + {\bar{K}}_{m \hat{α}}^{†} {\bar{y}}_{m \hat{α}}^{δ},

where

{\bar{K}}_{m \hat{α}} = {\bar{K}}_{m k^{☆}}

,

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{y}}_{m k^{☆}}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m \hat{α}}^{δ}) + {\bar{K}}_{m \hat{α}} (x_{m \hat{α}}^{δ} - x_{a}),

(21)

and:

{\bar{K}}_{m \hat{α}}^{†} = {({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L)}^{- 1} {\bar{K}}_{m \hat{α}}^{T} .

(22)

Consequently, we see that:

p_{m \hat{α}}^{δ} = x_{m \hat{α}}^{δ} - x_{a} = {\bar{K}}_{m \hat{α}}^{†} {\bar{y}}_{m \hat{α}}^{δ}

(23)

is the minimizer of the Tikhonov function:

F_{l m \hat{α}} (p) = {∥{\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p∥}^{2} + \hat{α} {∥L p∥}^{2},

and that the residual vector of the linear equation

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p

is equal to the nonlinear residual vector at the solution, i.e.,

{\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m \hat{α}}^{δ}) = r_{m \hat{α}}^{δ} .

(24)

In summary, in the iteratively regularized Gauss–Newton method:

the computation of the regularized solution $x_{m \hat{α}}^{δ}$ depends only on the initial value $α_{1}$ and the ratio q of the geometric sequence $α_{k},$ which determine the rate of convergence, and
the regularization parameter at the solution is an estimate for the optimal regularization parameter, and so, for the ratio of the data error variance and the a priori state variance.

Taking these results into account, we may conclude that the iteratively regularized Gauss–Newton method gives a solution to Problem 1 of the Bayesian parameter estimation.

5. Parameter Estimation and Model Selection

In this section, we address Problems 2 and 3 related to Bayesian model selection.

To solve Problem 3, we suppose that the forward model can be linearized around the solution

x_{m \hat{α}}^{δ}

[6]. In other words, in the first-order Taylor expansion:

{\bar{F}}_{m} (x) = {\bar{F}}_{m} (x_{m \hat{α}}^{δ}) + {\bar{K}}_{m \hat{α}} (x - x_{m \hat{α}}^{δ}) + {\bar{R}}_{m} (x, x_{m \hat{α}}^{δ})

(25)

the remainder term

{\bar{R}}_{m} (x, x_{m \hat{α}}^{δ})

can be neglected. As a result, the nonlinear data model

{\bar{y}}^{δ} = {\bar{F}}_{m} (x) + {\bar{δ}}_{m}

becomes the linear model:

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p + {\bar{δ}}_{m},

(26)

where

{\bar{y}}_{m \hat{α}}^{δ}

is given by Equation (21),

p = x - x_{a} \sim N (0, C_{x} = σ_{m}^{2} {(\hat{α} L^{T} L)}^{- 1})

and

{\bar{δ}}_{m} \sim N (0, {\bar{C}}_{δ m} = σ_{m}^{2} I_{M})

. The direct consequences of dealing with the linear model (26) are the following results:

the a posteriori density $p (x ∣ {\bar{y}}^{δ}, m)$ can be expressed as a Gaussian distribution:

$p (x ∣ {\bar{y}}^{δ}, m) \propto exp [- \frac{1}{2} {(x - x_{m \hat{α}}^{δ})}^{T} {\hat{C}}_{x m}^{- 1} (x - x_{m \hat{α}}^{δ})],$

where:

${\hat{C}}_{x m} = σ_{m}^{2} {({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L)}^{- 1} .$

(27)

is the a posteriori covariance matrix, and
as shown in Appendix A, the marginal likelihood density $p ({\bar{y}}^{δ} ∣ m)$ can be computed analytically; the result is:

$\begin{matrix} p ({\bar{y}}^{δ} ∣ m) & = \sqrt{\frac{\det (I_{M} - {\hat{A}}_{m \hat{α}})}{{(2 π σ_{m}^{2})}^{M}}} \\ \times exp [- \frac{1}{2 σ_{m}^{2}} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}], \end{matrix}$

(28)

where ${\hat{A}}_{m \hat{α}} = {\bar{K}}_{m \hat{α}} {\bar{K}}_{m \hat{α}}^{†}$ is the influence matrix at the solution $x_{m \hat{α}}^{δ}$ .

The replacement of the nonlinear data model by a linear data model will also enable us to deal with Problem 2. More specifically, for this purpose, we will adapt some regularization parameter choice methods for linear problems, i.e., maximum marginal likelihood estimation [7,8,9] and generalized cross-validation [10,11], to model selection. Furthermore, estimates for the data error variance

σ_{m}^{2}

delivered by these methods will be used to compute the marginal likelihood density and, therefore, the relative evidence.

In Appendix B, it is shown that:

in the framework of maximum marginal likelihood estimation, the data error variance can be estimated by:

${\hat{σ}}_{L m}^{2} = \frac{1}{M} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ},$

(29)

and the model with the smallest value of the marginal likelihood function, defined by:

$λ (m) = \frac{{\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}}{\sqrt[M]{\det (I_{M} - {\hat{A}}_{m \hat{α}})}},$

(30)

is optimal.
in the framework of generalized cross-validation, the data error variance can be estimated by:

${\hat{σ}}_{G m}^{2} = \frac{‖ r_{m \hat{α}}^{δ} ‖^{2}}{[trace (I_{M} - {\hat{A}}_{m \hat{α}})]},$

(31)

and the model with the smallest value of the generalized cross-validation function, defined by:

$υ (m) = \frac{‖ r_{m \hat{α}}^{δ} ‖^{2}}{{[trace (I - {\hat{A}}_{m \hat{α}})]}^{2}},$

(32)

is optimal.

Denoting by

p_{L} ({\bar{y}}^{δ} ∣ m)

and

p_{G} ({\bar{y}}^{δ} ∣ m)

the marginal likelihoods corresponding to

{\hat{σ}}_{L m}^{2}

and

{\hat{σ}}_{G m}^{2}

, respectively, we define:

the relative evidence of an approach based on marginal likelihood and the computation of the data error variance in the framework of the Maximum Marginal Likelihood Estimation (MLMMLE) by:

$\begin{matrix} p_{MLMMLE} (m ∣ y^{δ}) & = \frac{p_{L} ({\bar{y}}^{δ} ∣ m)}{\sum_{m = 1}^{N_{m}} p_{L} ({\bar{y}}^{δ} ∣ m)}, \end{matrix}$

(33)
the relative evidence of an approach based on the marginal likelihood and computation of the data error variance in the framework of the Generalized Cross-Validation (MLGCV) by:

$\begin{matrix} p_{MLGCV} (m ∣ y^{δ}) & = \frac{p_{G} ({\bar{y}}_{m}^{δ} ∣ m)}{\sum_{m = 1}^{N_{m}} p_{G} ({\bar{y}}_{m}^{δ} ∣ m)} . \end{matrix}$

(34)

Note that for the data error variance estimate (29), the marginal likelihood density becomes:

p_{L} ({\bar{y}}^{δ} ∣ m) = \frac{c_{M}}{\sqrt{{[λ (m)]}^{M}}},

with:

c_{M} = \sqrt{{(\frac{M}{2 π})}^{M}} exp (- \frac{M}{2}) .

On the other hand, the statements (30) and (32) are equivalent to (cf. Equation (15)):

{\hat{x}}_{\max}^{δ} = x_{m^{☆} \hat{α}}^{δ}, m^{☆} = arg max_{m} \frac{1}{λ (m)},

(35)

and:

{\hat{x}}_{\max}^{δ} = x_{m^{☆} \hat{α}}^{δ}, m^{☆} = arg max_{m} \frac{1}{v (m)},

(36)

respectively. Deviating from a stochastic interpretation of the relative evidence and regarding this quantity in a deterministic setting merely as a merit function characterizing the solution

x_{m \hat{α}}^{δ}

, we define:

the relative evidence of an approach based on Maximum Marginal Likelihood Estimation (MMLE) by:

$p_{MMLE} (m ∣ y^{δ}) = \frac{1 / λ (m)}{\sum_{m = 1}^{N_{m}} 1 / λ (m)},$

(37)
the relative evidence of an approach based on Generalized Cross-Validation (GCV) by:

$p_{GCV} (m ∣ y^{δ}) = \frac{1 / υ (m)}{\sum_{m = 1}^{N_{m}} 1 / υ (m)} .$

(38)

Note that

p_{MMLE} (m ∣ y^{δ})

and

p_{GCV} (m ∣ y^{δ})

do not depend on the data error variance

σ_{m}^{2}

. Note also that

{\hat{σ}}_{G m}^{2}

is defined in terms of the square residual:

\begin{matrix} ‖ r_{m \hat{α}}^{δ} ‖^{2} & = ‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} ‖^{2} = ‖ (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ} ‖^{2}, \end{matrix}

while, according to Equation (29),

{\hat{σ}}_{L m}^{2}

is defined in terms of the quantity

{\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}

. The same difference exists between the numerators of the cross-validation function

v (m)

and the marginal likelihood function

λ (m)

(see Equations (32) and (30)).

6. Algorithm Description

An algorithmic implementation of the iteratively regularized Gauss–Newton method in connection with model selection is as follows:

compute the scaling matrix $P$ by means of Equation (5) and the scaled data vector ${\bar{y}}^{δ} = P y^{δ}$ ;
given the current iterate $x_{m k}^{δ}$ at step k, compute the forward model $F_{m} (x_{m k}^{δ})$ , the Jacobian matrix $K_{m k}$ , and the scaled quantities ${\bar{F}}_{m} (x_{m k}^{δ}) = P F_{m} (x_{m k}^{δ})$ and ${\bar{K}}_{m k} = P K_{m k}$ ;
compute the linearized data vector:

${\bar{y}}_{m k}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m k}^{δ}) + {\bar{K}}_{m k} (x_{m k}^{δ} - x_{a});$
compute the singular value decomposition of the quotient matrix ${\bar{K}}_{m k} L^{- 1} = U Γ V^{T}$ , where $Γ = diag {(γ_{m i})}_{M}$ with $γ_{m i} = 0$ for $i > N$ is a diagonal matrix containing the singular values $γ_{m i}$ in decreasing order and $U = [u_{1}, \dots, u_{M}] \in R^{M \times M}$ and $V = [v_{1}, \dots, v_{N}] \in R^{N \times N}$ are orthogonal matrices containing the left and right singular column vectors $u_{i}$ and $v_{i}$ , respectively;
if $k = 1$ , choose $α_{k} = max (\sqrt{γ_{m 1} γ_{m N}}, α_{min})$ , where $γ_{m 1}$ and $γ_{m N}$ are the largest and the smallest singular values, respectively; otherwise, set $α_{k} = max (q α_{k - 1}, α_{min})$ ;
compute the minimizer of the Tikhonov function for the linearized equation,

$p_{m k}^{δ} = \sum_{i = 1}^{n} \frac{γ_{m i}}{γ_{m i}^{2} + α_{k}} (u_{i}^{T} {\bar{y}}_{m k}^{δ}) v_{i},$

and the new iterate, $x_{m k + 1}^{δ} = p_{m k}^{δ} + x_{a}$ ;
compute the nonlinear residual vector at $x_{m k + 1}^{δ}$ ,

$r_{m k + 1}^{δ} = {\bar{y}}^{δ} - {\bar{F}}_{m} (x_{m k + 1}^{δ}),$

and the residual $r_{k + 1} = ‖ r_{m k + 1}^{δ} ‖^{2}$ ;
compute the condition number $c_{k} = γ_{m 1} / γ_{m N}$ , the scalar quantities:

$\begin{matrix} {\bar{y}}_{k} & = \sum_{i = 1}^{N} \frac{α_{k - 1}}{γ_{m i}^{2} + α_{k - 1}} {(u_{i}^{T} {\bar{y}}_{m k}^{δ})}^{2} + \sum_{i = N + 1}^{M} {(u_{i}^{T} {\bar{y}}_{m k}^{δ})}^{2}, \\ t_{k} & = M - N + \sum_{i = 1}^{N} \frac{α_{k - 1}}{γ_{m i}^{2} + α_{k - 1}}, \\ d_{k} & = \prod_{i = 1}^{N} \frac{α_{k - 1}}{γ_{m i}^{2} + α_{k - 1}}, \end{matrix}$

and the normalized covariance matrix:

$\begin{matrix} {\hat{C}}_{x m k} & = \hat{V} diag {(\frac{1}{γ_{m i}^{2} + α_{k - 1}})}_{N} {\hat{V}}^{T}, \end{matrix}$

where $\hat{V} = L^{- 1} V$ ;
if $r_{k + 1} \geq r_{k}$ , recompute $x_{m k + 1}^{δ}$ by means of a step-length algorithm such that $r_{k + 1} < r_{k}$ ; if the residual cannot be reduced, set $r_{\infty} = r_{k}$ . and go to Step 12;
compute the relative decrease in the residual $▵_{r} = (r_{k} - r_{k + 1}) / r_{k}$ ;
if $▵_{r} > ε_{R}$ , go to Step 1; otherwise set $r_{\infty} = r_{k + 1}$ , and go to Step 12;
determine $k^{☆}$ such that $r_{k^{☆}} \leq η r_{\infty} < r_{k}$ , for all $0 \leq k < k^{☆}$ ;
since $r_{k^{☆}} = ‖ r_{m \hat{α}}^{δ} ‖^{2}$ , ${\bar{y}}_{k^{☆}} = {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}$ , $t_{k^{☆}} = trace (I - {\hat{A}}_{m \hat{α}})$ , and $d_{k^{☆}} = \det (I_{M} - {\hat{A}}_{m \hat{α}})$ , compute the estimates:

$\begin{matrix} λ (m) & = \frac{{\bar{y}}_{k^{☆}}}{\sqrt[M]{d_{k^{☆}}}}, v (m) = \frac{r_{k^{☆}}}{t_{k^{☆}}^{2}}, \\ {\hat{σ}}_{L m}^{2} & = \frac{1}{M} {\bar{y}}_{k^{☆}}, {\hat{σ}}_{G m}^{2} = \frac{r_{k^{☆}}}{t_{k^{☆}}}, \end{matrix}$

the marginal likelihood:

$p_{X} ({\bar{y}}^{δ} ∣ m) = \sqrt{\frac{d_{k^{☆}}}{{(2 π {\hat{σ}}_{Y m}^{2})}^{M}}} exp (- \frac{{\bar{y}}_{k^{☆}}}{2 {\hat{σ}}_{Y m}^{2}}),$

where X stands for the character strings MLMMLE and MLGCV when the character variable Y takes the values L and G, respectively, and the covariance matrix,

${\hat{C}}_{x m} = {\hat{σ}}_{Y m}^{2} {\hat{C}}_{x m k^{☆}};$
compute the relative evidence $p_{X} (m ∣ y^{δ})$ , where X stands for the character strings MLMMLE, MLGCV, MMLE, and GCV by using Equations (33), (34), (37), and (38), respectively, for those situations;
compute the maximum and mean solution estimates ${\hat{x}}_{\max}^{δ}$ and ${\hat{x}}_{mean}^{δ}$ by using Equations (15) and (17), respectively, and the mean a posteriori density $p_{mean} (x ∣ y^{δ})$ by using Equation (16).

The control parameters of the algorithm are: (i) the ratio q of the geometric sequence of regularization parameters, (ii) the minimum acceptable value of the regularization parameter

α_{min}

, (iii) the tolerance

ε_{R}

of the residual test convergence, and (iv) the tolerance

η

of the discrepancy principle stopping rule.

We conclude our theoretical analysis with some comments:

Another estimate for the data error variance is derived in Appendix C; this is given by:

${\hat{σ}}_{R m}^{2} = \frac{1}{M - N} ‖ r_{m \hat{α}}^{δ} ‖^{2} .$

(39)

If $\hat{α} ≪ γ_{m i}^{2}$ for all $i = 1, \dots, N$ , we can approximate (cf. Equation (A17)):

$trace (I - {\hat{A}}_{m \hat{α}}) = M - N + \sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}}, \approx M - N,$

and in view of Equation (31), we deduce that ${\hat{σ}}_{R m}^{2} \approx {\hat{σ}}_{G m}^{2}$ .
If a model is far from reality, it is natural to assume that the model parameter errors are large or, equivalently, that the data error variance is large. This observation suggests that, in a deterministic setting, we may define the relative evidence as:

$p (m ∣ y^{δ}) = \frac{1 / {\hat{σ}}_{Y m}^{2}}{\sum_{m = 1}^{N_{m}} 1 / {\hat{σ}}_{Y m}^{2}},$

(40)

where the character variable Y takes the values R, L, or G. In this case, for ${\hat{σ}}_{R m}^{2} = ‖ r_{m \hat{α}}^{δ} ‖^{2} / (M - N)$ , the model with the smallest value of the squared residual is considered to be optimal.

7. Application to the EPIC Instrument

The Deep Space Climate Observatory (DSCOVR) flies in a Lissajous orbit about the first Earth-Sun Lagrange point (L

_{1}

), which is 1.5 million kilometers from the Earth towards the Sun. The Earth Polychromatic Imaging Camera (EPIC) is one of the two Earth observing instruments on board DSCOVR. This instrument views the entire sunlit side of the Earth from sunrise to sunset in the backscattering direction (scattering angles between 168.5° and 175.5°) with 10 narrowband filters, ranging from 317.5 to 779.5 nm [12]. This unique Earth observing geometry of EPIC compared to other instruments in Sun-synchronous orbits, that rarely view Earth at such large scattering angles [13], makes it a suitable candidate for several climate science applications including the measurement of cloud reflectivity and aerosol optical thickness and layer height [14].

We apply the algorithm proposed in Section 6 to the retrieval of aerosol optical thickness and aerosol layer height by generating synthetic measurements corresponding to the EPIC instrument. Specifically, Channels 7 and 8 in the Oxygen B-band at 680 and 687.75 nm, respectively, and Channels 9 and 10 in the Oxygen A-band at 764 and 779.5 nm, respectively, are used in the retrieval. The linearized radiative transfer model is based on the discrete ordinates method with matrix exponential and uses the correlated k-distribution method in conjunction with the principal component analysis technique to speed up the computations [15,16].

We consider the aerosol models implemented in the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol retrieval algorithm over land [1]. There are three spherical, fine-dominated model types (non-absorbing, moderately absorbing, and absorbing) and one spheroid, coarse-dominated model type (dust). These aerosol models, which depend on location and season, are the result of a cluster analysis of the climatology of almucantar retrievals from global AERONET (AErosol RObotic NETwork) measurements [17]. Each model consists of two log-normal modes (accumulated and coarse). A single log-normal mode is described by the number size distribution:

\frac{d N (r)}{d ln r} = \frac{N_{0}}{\sqrt{2 π} σ} exp [- \frac{{(ln r - ln r_{\mod})}^{2}}{2 σ^{2}}],

where

r_{\mod}

is the modal or median radius of the number size distribution,

σ

is the standard deviation, and:

N_{0} = \int_{0}^{\infty} \frac{d N (r)}{d ln r} d ln r

is the total number of particles per cross-section of the atmospheric column. Table 1 displays the log-normal size parameters and the complex refractive indices

m

for the four aerosol models, where, for each log-normal mode,

r_{v} = r_{\mod} exp (- 3 σ^{2})

is the median radius of the volume size distribution:

\frac{d V (r)}{d ln r} = \frac{V_{0}}{\sqrt{2 π} σ} exp [- \frac{{(ln r - ln r_{v})}^{2}}{2 σ^{2}}],

and:

V_{0} = \int_{0}^{\infty} \frac{4 π r^{3}}{3} \frac{d N (r)}{d ln r} d ln r = N_{0} \frac{4 π r_{\mod}^{3}}{3} exp (4.5 σ^{2})

is the volume of particles per cross-section of the atmospheric column.

In our numerical analysis, the state vector is

x = {[τ, H]}^{T}

, where

τ

is the aerosol optical thickness and H the aerosol layer height. The true aerosol optical thicknesses to be retrieved are

τ_{t} = 0.25

, 0.50, 0.75, 1.0, 1.25, and 1.5, while for the true aerosol layer height, we take

H_{t} = 1.0

, 1.5, 2.0, 2.5, and 3.0 km. The a priori values, which coincide with the initial guesses, are

τ_{a} = 2.0

and

H_{a} = 4

km. A Lambertian surface is assumed, and if not stated otherwise, the surface albedo is

A = 0.06

. We generate synthetic measurements by choosing the moderately absorbing aerosol model (Model 2) as the true or exact model. Specifically, we:

compute the radiances for the moderately absorbing aerosol model and the true values $τ_{t}$ and $H_{t}$ , $I_{modabs} (λ_{i}, τ_{t}, H_{t})$ ;
add the measurement noise $δ_{Imes}$ ,

$I_{modabs}^{δ} (λ_{i}, τ_{t}, H_{t}) = I_{modabs} (λ_{i}, τ_{t}, H_{t}) + {[δ_{Imes}]}_{i},$

where $δ_{Imes} \sim N (0, C_{Imes})$ , $C_{Imes} = diag {(σ_{Imes i}^{2})}_{M}$ , and:

$σ_{Imes i} = \frac{I_{modabs} (λ_{i}, τ_{t}, H_{t})}{SNR} with SNR = 290,$

which is the maximum signal-to-noise ratio according to the EPIC camera specifications and
in view of the approximation:

$\begin{matrix} ln I_{modabs}^{δ} (λ_{i}, τ_{t}, H_{t}) \\ \approx ln I_{modabs} (λ_{i}, τ_{t}, H_{t}) + \frac{{[δ_{Imes}]}_{i}}{I_{modabs} (λ_{i}, τ_{t}, H_{t})}, \end{matrix}$

identify ${[y^{δ}]}_{i} = ln I_{modabs}^{δ} (λ_{i}, τ_{t}, H_{t})$ , ${[y]}_{i} = ln I_{modabs} (λ_{i}, τ_{t}, H_{t})$ , and:

${[δ_{mes}]}_{i} = \frac{{[δ_{Imes}]}_{i}}{I_{modabs} (λ_{i}, τ_{t}, H_{t})},$

implying:

$σ_{mes i} = \frac{σ_{Imes i}}{I_{modabs} (λ_{i}, τ_{t}, H_{t})} = \frac{1}{SNR} .$

The above scheme yields

C_{δ m} = I_{M}

and

P = I_{M}

, showing that the regularized solution does not depend on the scaling matrix. Coming to the regularization matrix, we first note that the choice

σ_{x i} = ε_{x} {[x_{a}]}_{i}

for some

ε_{x}

gives:

L = diag {(\frac{σ_{x}}{σ_{x i}})}_{N} = diag {(\frac{\sqrt{\sum_{i = 1}^{N} {[x_{a}]}_{i}^{2}}}{N {[x_{a}]}_{i}})}_{N} .

However, to have a better control on the amount of regularization that is applied to each component of the state vector, we introduce the weight

w_{x i}

of component

{[x]}_{i}

, and take:

L = diag {(w_{x i} \frac{\sqrt{\sum_{i = 1}^{N} {[x_{a}]}_{i}^{2}}}{N {[x_{a}]}_{i}})}_{N} .

If

w_{x i} \to \infty

, the component

{[x]}_{i}

is close to the a priori, while for

w_{x i} \to 0

, the component

{[x]}_{i}

is practically unconstrained. In our simulations, we used

w_{τ} = w_{H} = 1.0

. What is left is the specification of the control parameters of the algorithm; these are

q = 0.1

,

α_{min} = 10^{- 6} γ_{m N}

, where

γ_{m N}

is the smallest singular value of the quotient matrix

{\bar{K}}_{m k} L^{- 1}

at

k = 1

(the first iteration step),

ε_{R} = 10^{- 3}

, and

η = 1.05

.

To analyze the accuracy of the aerosol retrieval we consider several test examples.

Test Example 1

In the first example, we analyze the efficiency of the aerosol model selection by considering all models, i.e., we take m = 1, 2, 3, 4. Thus, the algorithm has the chance to identify the correct aerosol model. The solution accuracy is characterized by the relative errors:

ε_{mean}^{τ} = \frac{| τ_{mean} - τ_{t} |}{τ_{t}} and ε_{mean}^{H} = \frac{| H_{mean} - H_{t} |}{τ_{t}}

corresponding to (cf. Equation (17))

{\hat{x}}_{mean}^{δ} = [τ_{mean}, H_{mean}]

, and:

ε_{\max}^{τ} = \frac{| τ_{\max} - τ_{t} |}{τ_{t}} and ε_{\max}^{H} = \frac{| H_{\max} - H_{t} |}{τ_{t}}

corresponding to (cf. Equation (15))

{\hat{x}}_{\max}^{δ} = [τ_{\max}, H_{\max}]

. In Figure 1 and Figure 2, we illustrate the variations of the relative errors with respect to

τ_{t}

and

H_{t}

, respectively, while in Table 2, we show the average relative errors:

ε_{mean | τ}^{τ} = \frac{1}{N_{τ}} \sum_{i = 1}^{N_{τ}} ε_{mean i}^{τ} and ε_{mean | τ}^{H} = \frac{1}{N_{τ}} \sum_{i = 1}^{N_{τ}} ε_{mean i}^{H}

over

τ_{t}

for

N_{τ} = 6

, and:

ε_{mean | H}^{τ} = \frac{1}{N_{H}} \sum_{i = 1}^{N_{H}} ε_{mean i}^{τ} and ε_{mean | H}^{H} = \frac{1}{N_{H}} \sum_{i = 1}^{N_{H}} ε_{mean i}^{H}

over

H_{t}

for

N_{H} = 5

. The following conclusions can be drawn:

the relative errors $ε_{mean}^{τ}$ and $ε_{mean}^{H}$ corresponding to the generalized cross-validation (GCV and MLGCV) are in general smaller than those corresponding to the maximum marginal likelihood estimation (MMLE and MLMMLE).
the relative errors $ε_{\max}^{τ}$ and $ε_{\max}^{H}$ are very small, and so, we deduce that the algorithm recognizes the exact aerosol model;
the best method is GCV, in which case the average relative errors are $ε_{mean | τ}^{τ} = 0.009$ , $ε_{mean | τ}^{H} = 0.020$ , $ε_{mean | H}^{τ} = 0.001$ , and $ε_{mean | H}^{H} = 0.024$ ;
the aerosol optical thickness is better retrieved than the aerosol layer height.

Test Example 2

In the second example, we consider all aerosol models m except the exact one, i.e., we take

m = 1, 3, 4

. In this more realistic scenario, the algorithm tries to find an aerosol model that is as close as possible to the exact one. The variations of the relative errors with respect to

τ_{t}

and

H_{t}

are illustrated in Figure 3 and Figure 4, respectively, while the average relative errors are given in Table 3. The following conclusions can be drawn:

the relative errors $ε_{mean}^{τ}$ and $ε_{mean}^{H}$ corresponding to the generalized cross-validation (GCV and MLGCV) are still smaller than those corresponding to the maximum marginal likelihood estimation (MMLE and MLMMLE).
the relative errors $ε_{\max}^{τ}$ and $ε_{\max}^{H}$ are extremely large, and so, we infer that the maximum solution estimate ${\hat{x}}_{\max}^{δ} = [τ_{\max}, H_{\max}]$ is completely unrealistic;
as before, the best method is GCV characterized by the average relative errors $ε_{mean | τ}^{τ} = 0.060$ , $ε_{mean | τ}^{H} = 0.111$ , $ε_{mean | H}^{τ} = 0.022$ , and $ε_{mean | H}^{H} = 0.096$ ;
the relative errors are significantly larger than those corresponding to the case when all four aerosol models are taken into account.

In Figure 5, we show the mean a posteriori density of

τ

and H computed by GCV for Test Examples 1 and 2. In Test Example 1, the a posteriori density is sharply peaked, indicating small errors in the retrieval, while in Test Example 2, the a posteriori density is wide and spread over all the aerosol models.

Test Example 3

In the third series of our numerical experiments, we include the surface albedo in the retrieval. In fact, the surface albedo regarded as a model parameter can be (i) assumed to be known, (ii) included in the retrieval, or (iii) treated as a model uncertainty. The second situation, which leads to more accurate results, is considered here. In this case, however, the aerosol optical thickness and the surface albedo are strongly correlated (the condition number of

{\bar{K}}_{m k} L^{- 1}

at k = 1 is

10^{3}

–

10^{4}

times larger than for the case in which the surface albedo is not part of the retrieval). Since we are not interested in an accurate surface albedo retrieval, we chose the weight controlling the constraint of the surface albedo in the regularization matrix as

w_{A} = 10^{3}

. As

w_{τ} = w_{H} = 1.0

, this means that the surface albedo is tightly constrained to the a priori. We use the a priori and true values

A_{a} = 0.06

and

A_{t} = 0.063

, respectively; thus, the uncertainty in the surface albedo with respect to the a priori is 5%. The synthetic measurements are generated with the true surface albedo of 0.063.

In Figure 6, we illustrate the variations of the relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

with respect to

τ_{t}

and

H_{t}

, when all four aerosol models are taken into account. The average relative errors are given in Table 4. The results show that:

the relative errors $ε_{mean}^{τ}$ and $ε_{mean}^{H}$ corresponding to generalized cross-validation (GCV and MLGCV) are in general larger than those corresponding to maximum marginal likelihood estimation (MMLE and MLMMLE);
the best methods are MMLE and MLMMLE; the average relative errors given by MLMMLE are $ε_{mean | τ}^{τ} = 0.051$ , $ε_{mean | τ}^{H} = 0.060$ , $ε_{mean | H}^{τ} = 0.011$ , and $ε_{mean | H}^{H} = 0.027$ ;
the relative errors are significantly larger than those obtained in the first two test examples (when the surface albedo is known exactly).

Figure 7 illustrates the variations of the relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

with respect to

τ_{t}

and

H_{t}

, when all aerosol models except the exact one are considered. The corresponding average relative errors are given in Table 5. The results show that:

as in the previous scenario, the relative errors $ε_{mean}^{τ}$ and $ε_{mean}^{H}$ corresponding to generalized cross-validation (GCV and MLGCV) are in general larger than those corresponding to maximum marginal likelihood estimation (MMLE and MLMMLE).
the best methods are MMLE and MLMMLE; the average relative errors delivered by MMLE are $ε_{mean | τ}^{τ}$ = 0.101, $ε_{mean | τ}^{H}$ = 0.113, $ε_{mean | H}^{τ}$ = 0.070, and $ε_{mean | H}^{H}$ = 0.204;
the relative errors are the largest among all the test examples.

8. Conclusions

We designed a retrieval algorithm that takes into account uncertainty in model selection. The solution corresponding to a specific model is characterized by a metric called relative evidence, which is a measure of the fit between the model and the measurement. Based on this metric, the maximum solution estimate, corresponding to the model with the highest evidence, and the mean solution estimate, representing a linear combination of solutions weighted by their evidence, are introduced.

The retrieval algorithm is based on:

an application of the prewhitening technique in order to transform the data model into a scaled model with white noise;
a deterministic regularization method, i.e., the iteratively regularized Gauss–Newton method, in order to compute the regularized solution (equivalent to the maximum a posteriori estimate of the solution in a Bayesian framework) and to determine the optimal value of the regularization parameter (equivalent to the ratio of the data error and a priori state variances in a Bayesian framework);
a linearization of the forward model around the solution in order to transform the nonlinear data model into a linear model and, in turn, facilitate an analytical integration of the likelihood density over the state vector;
an extension of maximum marginal likelihood estimation and generalized cross-validation to model selection and data error variance estimation.

Essentially, the algorithm includes four selection models corresponding to:

the two parameter choice methods used (maximum marginal likelihood estimation and generalized cross-validation) and
the two settings in which the relative evidence is treated (stochastic and deterministic).

The algorithm is applied to the retrieval of aerosol optical thickness and aerosol layer height from synthetic measurements corresponding to the EPIC instrument. In the simulations, the aerosol models implemented in the MODIS aerosol algorithm over land are considered, and the surface albedo is either assumed to be known or included in the retrieval. The following conclusions are drawn:

The differences between the results corresponding to the stochastic and deterministic interpretations of the relative evidence are not significant.
If the surface albedo is assumed to be known, generalized cross-validation is superior to maximum marginal likelihood estimation; if the surface albedo is included in the retrieval, the contrary is true.
The errors in the aerosol optical thickness retrieval are smaller than those in the aerosol layer height retrieval. In the most realistic situation, when the exact aerosol model and surface albedo are unknown, the average relative errors in the retrieved aerosol optical thickness are about 10%, while the corresponding errors in the aerosol layer height are about 20%.
The maximum solution estimate is completely unrealistic when both the aerosol model and surface albedo are unknown.

Author Contributions

Conceptualization: V.N. and A.D.; Data curation: S.S.; Formal analysis: S.S. and V.M.G.; Funding acquisition: V.N., D.L. and A.D.; Investigation: S.S.; Methodology: V.N. and A.D.; Project administration: D.L. and A.D.; Supervision: V.N., D.S.E. and A.D.; Validation: V.N. and A.D.; Writing—original draft: S.S.; Writing—review & editing: S.S., V.N., D.S.E., D.L. and A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Aerospace Center (DLR) and the German Academic Exchange Service (DAAD) through the program DLR/DAAD Research Fellowships 2015 (57186656), with Reference Numbers 91613528 and 91627488.

Acknowledgments

A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). V.N. acknowledges support from the NASA Earth Science US Participating Investigator program (Solicitation NNH16ZDA001N-ESUSPI).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this Appendix, we compute the marginal likelihood for the linear data model (26) with

p \sim N (0, C_{x} = σ_{m}^{2} {(\hat{α} L^{T} L)}^{- 1})

and

{\bar{δ}}_{m} \sim N (0, {\bar{C}}_{δ m} = σ_{m}^{2} I_{M})

. In this case, the a priori and likelihood densities

p (p ∣ m)

and

p ({\bar{y}}^{δ} ∣ p, m)

are given by:

\begin{matrix} p (p ∣ m) & = \frac{1}{\sqrt{{(2 π σ_{m}^{2})}^{N} det [{(\hat{α} L^{T} L)}^{- 1}]}} \\ \times exp (- \frac{\hat{α}}{2 σ_{m}^{2}} p^{T} L^{T} Lp) \end{matrix}

and:

\begin{matrix} p ({\bar{y}}^{δ} ∣ p, m) & = \frac{1}{\sqrt{{(2 π σ_{m}^{2})}^{M}}} \\ \times exp [- \frac{1}{2 σ_{m}^{2}} {({\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p)}^{T} ({\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p)], \end{matrix}

respectively. Using the identity:

\begin{matrix} {({\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p)}^{T} ({\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p) + \hat{α} p^{T} L^{T} Lp \\ = (p - p_{m \hat{α}}^{δ})^{T} ({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L) (p - p_{m \hat{α}}^{δ}) \\ + {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}, \end{matrix}

where (cf. Equation (23))

p_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}}^{†} {\bar{y}}_{m \hat{α}}^{δ}

and

{\hat{A}}_{m \hat{α}} = {\bar{K}}_{m \hat{α}} {\bar{K}}_{m \hat{α}}^{†}

, we express the integrand in Equation (8) as:

\begin{matrix} p ({\bar{y}}^{δ} ∣ x, m) p (x ∣ m) \\ = \frac{1}{\sqrt{{(2 π σ_{m}^{2})}^{N + M} det [{(\hat{α} L^{T} L)}^{- 1}]}} \\ \times exp [- \frac{1}{2 σ_{m}^{2}} (p - p_{m \hat{α}}^{δ})^{T} ({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L) (p - p_{m \hat{α}}^{δ})] \\ \times exp [- \frac{1}{2 σ_{m}^{2}} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}] . \end{matrix}

The normalization condition:

\begin{matrix} \int exp \{- \frac{1}{2} {(p - p_{m \hat{α}}^{δ})}^{T} {[σ_{m}^{2} {({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L)}^{- 1}]}^{- 1} (p - p_{m \hat{α}}^{δ})\} d p \\ = \sqrt{{(2 π σ_{m}^{2})}^{N} \det [{({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L)}^{- 1}]}, \end{matrix}

the identities

\det (A^{- 1}) = {[\det (A)]}^{- 1}

and:

{({\bar{K}}_{m \hat{α}}^{T} {\bar{K}}_{m \hat{α}} + \hat{α} L^{T} L)}^{- 1} \hat{α} L^{T} L = I_{N} - A_{m \hat{α}},

where

A_{m \hat{α}} = {\bar{K}}_{m \hat{α}}^{†} {\bar{K}}_{m \hat{α}}

is the averaging kernel matrix, and the relation:

det (I_{N} - A_{m \hat{α}}) = \det (I_{M} - {\hat{A}}_{m \hat{α}}),

yield the following expression for the marginal likelihood density:

\begin{matrix} p ({\bar{y}}^{δ} ∣ m) & = \sqrt{\frac{\det (I_{M} - {\hat{A}}_{m \hat{α}})}{{(2 π σ_{m}^{2})}^{M}}} \\ \times exp [- \frac{1}{2 σ_{m}^{2}} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}] . \end{matrix}

(A1)

Appendix B

In this Appendix, we present a general tool for estimating the data error variance and the optimal model. Consider the linear model (26), i.e.,

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p + {\bar{δ}}_{m}

, and a singular value decomposition of the quotient matrix

{\bar{K}}_{m \hat{α}} L^{- 1} = U Γ V^{T}

, where

Γ = diag {(γ_{i})}_{N}

with the convention

γ_{m i}^{2} = 0

for

i > N

,

U = [u_{1}, \dots, u_{M}]

, and

V = [v_{1}, \dots, v_{N}]

. The covariance matrix of the data

{\bar{y}}_{m \hat{α}}^{δ}

can be computed as:

\begin{matrix} E {{\bar{y}}_{m \hat{α}}^{δ} {\bar{y}}_{m \hat{α}}^{δ T}} & = {\bar{K}}_{m \hat{α}} C_{x} {\bar{K}}_{m \hat{α}}^{T} + {\bar{C}}_{δ m} \\ = \frac{σ_{m}^{2}}{\hat{α}} {\bar{K}}_{m \hat{α}} {(L^{T} L)}^{- 1} {\bar{K}}_{m \hat{α}}^{T} + σ_{m}^{2} I_{M} \\ = U Σ_{\bar{y}} U^{T} \end{matrix}

where:

Σ_{\bar{y}} = diag {[σ_{m}^{2} (\frac{γ_{m i}^{2} + \hat{α}}{\hat{α}})]}_{M} .

(A2)

Define the scaled data:

{\bar{Y}}_{m \hat{α}}^{δ} = U^{T} {\bar{y}}_{m \hat{α}}^{δ}

and note that the covariance matrix of

{\bar{Y}}_{m \hat{α}}^{δ}

is the diagonal matrix

Σ_{\bar{y}}

, i.e.,

E {{\bar{Y}}_{m \hat{α}}^{δ} {\bar{Y}}_{m \hat{α}}^{δ T}} = Σ_{\bar{y}} .

If

σ_{m}^{2}

is the correct data error variance, we must have (cf. Equation (A2)):

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} = σ_{m}^{2} (\frac{γ_{m i}^{2} + \hat{α}}{\hat{α}}), i = 1, \dots, M,

where

{\bar{Y}}_{m \hat{α} i}^{δ} = {[{\bar{Y}}_{m \hat{α}}^{δ}]}_{i}

. If

σ_{m}^{2}

is unknown, we can find the estimate

{\hat{σ}}_{m}^{2}

from the equations:

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} = {\hat{σ}}_{m}^{2} (\frac{γ_{m i}^{2} + \hat{α}}{\hat{α}}), i = 1, \dots, M .

(A3)

However, since only one realization of the random vector

{\bar{Y}}_{m \hat{α}}^{δ}

is known, the calculation of

{\hat{σ}}_{m}^{2}

from Equation (A3) may lead to erroneous results. Therefore, we look for another selection criterion.

Set:

a_{i} (σ_{m}^{2}) = σ_{m}^{2} (\frac{γ_{m i}^{2} + \hat{α}}{\hat{α}})

(A4)

and define the function:

f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2}) = \sum_{i = 1}^{M} \{ψ (a_{i} (σ_{m}^{2})) + ψ^{'} (a_{i} (σ_{m}^{2})) [{\bar{Y}}_{m \hat{α} i}^{δ 2} - a_{i} (σ_{m}^{2})]\}

with

ψ (a)

being a strictly concave function, i.e.,

ψ^{''} (a) < 0

. The expected value of

f (\cdot)

is given by:

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} = \sum_{i = 1}^{M} \{ψ (a_{i} (σ_{m}^{2})) + ψ^{'} (a_{i} (σ_{m}^{2})) [E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} - a_{i} (σ_{m}^{2})]\},

hence, defining the estimate

{\hat{σ}}_{m}^{2}

through the relation (cf. Equations (A3) and (A4)):

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} = a_{i} ({\hat{σ}}_{m}^{2}),

(A5)

we express

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})}

as:

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} = \sum_{i = 1}^{M} \{ψ (a_{i} (σ_{m}^{2})) + ψ^{'} (a_{i} (σ_{m}^{2})) [a_{i} ({\hat{σ}}_{m}^{2}) - a_{i} (σ_{m}^{2})]\} .

Then, we obtain:

\begin{matrix} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} - E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})} & = \sum_{i = 1}^{M} {ψ (a_{i} (σ_{m}^{2})) - ψ (a_{i} ({\hat{σ}}_{m}^{2})) \\ + ψ^{'} (a_{i} (σ_{m}^{2})) [a_{i} ({\hat{σ}}_{m}^{2}) - a_{i} (σ_{m}^{2})]} . \end{matrix}

(A6)

Considering the second-order Taylor expansion around

a_{i} (σ_{m}^{2})

,

\begin{matrix} ψ (a_{i} ({\hat{σ}}_{m}^{2})) & = ψ (a_{i} (σ_{m}^{2})) + ψ^{'} (a_{i} (σ_{m}^{2})) [a_{i} ({\hat{σ}}_{m}^{2}) - a_{i} (σ_{m}^{2})] \\ + \frac{1}{2} ψ^{''} (ξ_{i}) {[a_{i} ({\hat{σ}}_{m}^{2}) - a_{i} (σ_{m}^{2})]}^{2} \end{matrix}

for some

ξ_{i}

between

a_{i} (σ_{m}^{2})

and

a_{i} ({\hat{σ}}_{m}^{2})

and taking into account the fact that

ψ

is strictly concave, we deduce that each term in the sum (A6) is non-negative and vanishes only for

a_{i} (σ_{m}^{2}) = a_{i} ({\hat{σ}}_{m}^{2}) .

Thus, we have:

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} \geq E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}

for all

σ_{m}^{2}

. In conclusion,

{\hat{σ}}_{m}^{2}

, defined through Equation (A5), is the unique global minimizer of

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})}

, i.e.,

{\hat{σ}}_{m}^{2} = \arg min_{σ_{m}^{2}} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} .

(A7)

Coming to the optimal model

\hat{m}

, it seems that a natural choice of

\hat{m}

is:

\hat{m} = \arg min_{m} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})} .

(A8)

However, instead of Equation (A8), we use a more general selection criterion, namely the optimal model

\hat{m}

defined as:

\hat{m} = \arg min_{m} F (E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}),

(A9)

where F is a monotonic increasing function of

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}

.

The generalized cross-validation [10,11] and maximum marginal likelihood estimation [7,8,9] can be obtained by a particular choice of the function

ψ (a)

.

Generalized cross-validation

For the choice:

ψ (a) = 1 - \frac{1}{a},

we obtain:

\begin{matrix} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} & = M + \sum_{i = 1}^{M} [\frac{E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{a_{i}^{2} (σ_{m}^{2})} - \frac{2}{a_{i} (σ_{m}^{2})}] . \end{matrix}

(A10)

Since

{\hat{σ}}_{m}^{2}

is the unique global minimizer of

E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})}

, the condition:

\frac{d}{d σ_{m}^{2}} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})} = 0,

(A11)

yields:

\sum_{i = 1}^{M} [\frac{E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{a_{i}^{2} ({\hat{σ}}_{m}^{2})} - \frac{1}{a_{i} ({\hat{σ}}_{m}^{2})}] = 0 .

(A12)

The above equation together with the relation:

a_{i} ({\hat{σ}}_{m}^{2}) = {\hat{σ}}_{m}^{2} \frac{γ_{m i}^{2} + \hat{α}}{\hat{α}},

(A13)

provide the following estimate for the data error variance:

{\hat{σ}}_{m}^{2} = \frac{\sum_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}}} .

(A14)

On the other hand, from Equations (A10) and (A12), we obtain:

\begin{matrix} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})} & = M - \sum_{i = 1}^{M} \frac{1}{a_{i} ({\hat{σ}}_{m}^{2})}, \end{matrix}

and we define the function

F (\cdot)

by:

\begin{matrix} F (E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}) & = \frac{1}{M - E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}} . \end{matrix}

Taking Equations (A13) and (A14) into account, we find that the function

F (\cdot)

can be calculated as:

\begin{matrix} F (E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}) & = \frac{\sum_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{{(\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2}} . \end{matrix}

In practice, the expectation

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}

cannot be computed since only one realization

{\bar{Y}}_{m \hat{α} i}^{δ} = u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ}

is known. Therefore, we approximate

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} \approx {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}

and consider the so-called generalized cross-validation function:

\begin{matrix} v (m) & = \frac{\sum_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}}{{(\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2}} . \end{matrix}

(A15)

Finally, using the representations:

\begin{matrix} ‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} ‖^{2} & = ‖ r_{m \hat{α}}^{δ} ‖^{2} = \sum_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}, \end{matrix}

(A16)

\begin{matrix} trace (I_{M} - {\hat{A}}_{m \hat{α}}) & = \sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}}, \end{matrix}

(A17)

we express the estimate of the data error variance as:

{\hat{σ}}_{m}^{2} = \frac{‖ r_{m \hat{α}}^{δ} ‖^{2}}{[trace (I_{M} - {\hat{A}}_{m \hat{α}})]},

(A18)

and the generalized cross-validation function as:

v (m) = \frac{‖ r_{m \hat{α}}^{δ} ‖^{2}}{{[trace (I_{M} - {\hat{A}}_{m \hat{α}})]}^{2}} .

(A19)

Maximum marginal likelihood estimation

For the choice:

ψ (a) = ln a \Rightarrow ψ^{'} (a) = \frac{1}{a},

we obtain:

\begin{matrix} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, σ_{m}^{2})} & = - M + \sum_{i = 1}^{M} [\frac{E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{a_{i} (σ_{m}^{2})} + ln a_{i} (σ_{m}^{2})] . \end{matrix}

(A20)

From the minimization condition (A11), we find that

{\hat{σ}}_{m}^{2}

satisfying the equation:

\sum_{i = 1}^{M} \frac{E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{a_{i} ({\hat{σ}}_{m}^{2})} = M,

(A21)

is given by:

{\hat{σ}}_{m}^{2} = \frac{1}{M} \sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}} E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} .

(A22)

Equations (A20) and (A21) yield:

\begin{matrix} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})} & = \sum_{i = 1}^{M} ln a_{i} ({\hat{σ}}_{m}^{2}) \end{matrix}

and we define the function

F (\cdot)

by:

\begin{matrix} F (E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}) & = exp [\frac{1}{M} E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}] \end{matrix}

Furthermore, using the result:

\sum_{i = 1}^{M} ln a_{i} ({\hat{σ}}_{m}^{2}) = ln {(\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}} E {{\bar{Y}}_{m \hat{α} i}^{δ 2}})}^{M} - \sum_{i = 1}^{M} ln (\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}}),

we express

F (\cdot)

as:

\begin{matrix} F (E {f ({\bar{Y}}_{m \hat{α}}^{δ} | m, {\hat{σ}}_{m}^{2})}) & = \frac{\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}} E {{\bar{Y}}_{m \hat{α} i}^{δ 2}}}{\prod_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{1 / M}} . \end{matrix}

Considering the approximation

E {{\bar{Y}}_{m \hat{α} i}^{δ 2}} \approx {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}

, we define the maximum likelihood function

λ (m)

by:

\begin{matrix} λ (m) & = \frac{\sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}} {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}}{\prod_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{1 / M}} \end{matrix}

(A23)

Using the results:

\begin{matrix} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ} & = \sum_{i = 1}^{M} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}} {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}, \end{matrix}

(A24)

\begin{matrix} \det (I_{M} - {\hat{A}}_{m \hat{α}}) & = \prod_{i = 1}^{N} \frac{\hat{α}}{γ_{m i}^{2} + \hat{α}}, \end{matrix}

(A25)

we express the estimate for the data error variance as:

{\hat{σ}}_{L m}^{2} = \frac{1}{M} {\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ},

(A26)

and the maximum likelihood function as:

\begin{matrix} λ (m) & = \frac{{\bar{y}}_{m \hat{α}}^{δ T} (I_{M} - {\hat{A}}_{m \hat{α}}) {\bar{y}}_{m \hat{α}}^{δ}}{\sqrt[M]{\det (I_{M} - {\hat{A}}_{m \hat{α}})}} . \end{matrix}

(A27)

Appendix C

In this Appendix, we derive an estimate for the data error variance by analyzing the residual of the linear equation

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p

. In terms of a singular value decomposition of the quotient matrix

{\bar{K}}_{m \hat{α}} L^{- 1} = U Γ V^{T}

with

Γ = diag {(γ_{i})}_{N}

,

γ_{m i}^{2} = 0

for

i > N

,

U = [u_{1}, \dots, u_{M}]

, and

V = [v_{1}, \dots, v_{N}]

, the squared norm of the residual of the linear equation

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p

is:

\begin{matrix} ‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} ‖^{2} & = \sum_{i = 1}^{M} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} {(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2} . \end{matrix}

(A28)

In the data model (cf. Equation (26))

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{K}}_{m \hat{α}} p + {\bar{δ}}_{m}

, we set

{\bar{y}}_{m \hat{α}} = {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ}

and rewrite this equation as:

{\bar{y}}_{m \hat{α}}^{δ} = {\bar{y}}_{m \hat{α}} + {\bar{δ}}_{m} .

(A29)

Using the relation:

\begin{matrix} E {(u_{i}^{T} {\bar{δ}}_{m}) (u_{j}^{T} {\bar{δ}}_{m})} & = E \{\sum_{k = 1}^{M} \sum_{l = 1}^{M} {[{\bar{δ}}_{m}]}_{k} {[{\bar{δ}}_{m}]}_{l} {[u_{i}]}_{k} {[u_{j}]}_{l}\} \\ = σ_{m}^{2} u_{i}^{T} u_{j} \\ = σ_{m}^{2} δ_{i j}, \end{matrix}

where

δ_{i j}

is the Kronecker delta function, we get:

E {{(u_{i}^{T} {\bar{δ}}_{m})}^{2}} = σ_{m}^{2},

(A30)

so that from Equations (A29) and (A30), we obtain:

\begin{matrix} E {{(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}} & = {(u_{i}^{T} {\bar{y}}_{m \hat{α}})}^{2} + σ_{m}^{2} . \end{matrix}

(A31)

Since

{\bar{y}}_{m \hat{α}} = {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ}

belongs to the range of the matrix operator

{\bar{K}}_{m \hat{α}}

, which, in turn, is spanned by the vectors

{u_{i}}_{i = 1}^{N}

, we have

u_{i}^{T} {\bar{y}}_{m \hat{α}} = 0

for

i > N

, and further (cf. Equation (A31)):

E {{(u_{i}^{T} {\bar{y}}_{m \hat{α}}^{δ})}^{2}} = σ_{m}^{2} for i > N .

(A32)

Taking the expected value of Equation (A28) and using Equations (A31) and (A32), we get:

\begin{matrix} E {‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} | |^{2}} & = (M - N) σ_{m}^{2} + \sum_{i = 1}^{N} {(\frac{\hat{α}}{γ_{m i}^{2} + \hat{α}})}^{2} [{(u_{i}^{T} {\bar{y}}_{m \hat{α}})}^{2} + σ_{m}^{2}] \end{matrix}

Now, if

\hat{α} ≪ γ_{m i}^{2}

for all

i = 1, \dots, N

, we approximate:

E {‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} {| |}^{2}} \approx (M - N) σ_{m}^{2},

and deduce that an estimate for the data error variance is:

\begin{matrix} {\hat{σ}}_{m}^{2} & \approx \frac{1}{M - N} E {‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} ‖^{2}} \\ \approx \frac{1}{M - N} ‖ {\bar{y}}_{m \hat{α}}^{δ} - {\bar{K}}_{m \hat{α}} p_{m \hat{α}}^{δ} ‖^{2} \\ = \frac{1}{M - N} ‖ r_{m \hat{α}}^{δ} ‖^{2} . \end{matrix}

(A33)

References

Levy, R.C.; Remer, L.A.; Dubovik, O. Global aerosol optical properties and application to Moderate Resolution Imaging Spectroradiometer aerosol retrieval over land. J. Geophys. Res. Atmos. 2007, 112. [Google Scholar] [CrossRef] [Green Version]
Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian Model averaging: A tutorial. Stat. Sci. 1999, 14, 382–401. [Google Scholar]
Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. Quantification of uncertainty in aerosol optical thickness retrieval arising from aerosol microphysical model and other sources, applied to Ozone Monitoring Instrument (OMI) measurements. Atmos. Meas. Tech. 2014, 7, 1185–1199. [Google Scholar] [CrossRef]
Kauppi, A.; Kolmonen, P.; Laine, M.; Tamminen, J. Aerosol-type retrieval and uncertainty quantification from OMI data. Atmos. Meas. Tech. 2017, 10, 4079–4098. [Google Scholar] [CrossRef] [Green Version]
Doicu, A.; Trautmann, T.; Schreier, F. Numerical Regularization for Atmospheric Inverse Problems; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar] [CrossRef] [Green Version]
Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2005. [Google Scholar] [CrossRef] [Green Version]
Patterson, H.; Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 1971, 58, 545–554. [Google Scholar] [CrossRef]
Smyth, G.K.; Verbyla, A.P. A conditional likelihood approach to residual maximum likelihood estimation in generalized linear models. J. R. Stat. Soc. Ser. Methodol. 1996, 58, 565–572. [Google Scholar] [CrossRef]
Stuart, A.; Ord, J.; Arnols, S. Kendall’s Advanced Theory of Statistics. Volume 2A: Classical Inference and the Linear Model; Oxford University Press Inc.: Oxford, UK, 1999. [Google Scholar]
Wahba, G. Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 1977, 14, 651–667. [Google Scholar] [CrossRef]
Wahba, G. Spline Models for Observational Data; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1990. [Google Scholar] [CrossRef]
Marshak, A.; Herman, J.; Szabo, A.; Blank, K.; Carn, S.; Cede, A.; Geogdzhayev, I.; Huang, D.; Huang, L.K.; Knyazikhin, Y.; et al. Earth Observations from DSCOVR EPIC Instrument. Bull. Am. Meteorol. Soc. 2018, 99, 1829–1850. [Google Scholar] [CrossRef] [PubMed]
Geogdzhayev, I.V.; Marshak, A. Calibration of the DSCOVR EPIC visible and NIR channels using MODIS Terra and Aqua data and EPIC lunar observations. Atmos. Meas. Tech. 2018, 11, 359–368. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Xu, X.; Wang, J.; Wang, Y.; Zeng, J.; Torres, O.; Reid, J.; Miller, S.; Martins, J.; Remer, L. Detecting layer height of smoke aerosols over vegetated land and water surfaces via oxygen absorption bands: Hourly results from EPIC/DSCOVR in deep space. Atmos. Meas. Tech. 2019, 12, 3269–3288. [Google Scholar] [CrossRef] [Green Version]
García, V.M.; Sasi, S.; Efremenko, D.S.; Doicu, A.; Loyola, D. Radiative transfer models for retrieval of cloud parameters from EPIC/DSCOVR measurements. J. Quant. Spectrosc. Radiat. Transf. 2018, 213, 228–240. [Google Scholar] [CrossRef] [Green Version]
García, V.M.; Sasi, S.; Efremenko, D.S.; Doicu, A.; Loyola, D. Linearized radiative transfer models for retrieval of cloud parameters from EPIC/DSCOVR measurements. J. Quant. Spectrosc. Radiat. Transf. 2018, 213, 241–251. [Google Scholar] [CrossRef] [Green Version]
Holben, B.; Eck, T.; Slutsker, I.; Tanré, D.; Buis, J.; Setzer, A.; Vermote, E.; Reagan, J.; Kaufman, Y.; Nakajima, T.; et al. AERONET—A federated instrument network and data archive for aerosol characterization. Remote Sens. Environ. 1998, 66, 1–16. [Google Scholar] [CrossRef]

Figure 1. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

τ_{t}

for

H_{t}

= 3.0 km. All four aerosol models are involved in the retrieval.

Figure 1. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

τ_{t}

for

H_{t}

= 3.0 km. All four aerosol models are involved in the retrieval.

Figure 2. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

H_{t}

for

τ_{t}

= 1.0. All four aerosol models are involved in the retrieval.

Figure 2. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

H_{t}

for

τ_{t}

= 1.0. All four aerosol models are involved in the retrieval.

Figure 3. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km. All aerosol models except the exact one are involved in the retrieval.

Figure 3. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km. All aerosol models except the exact one are involved in the retrieval.

Figure 4. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

H_{t}

for

τ_{t} = 1.0

. All aerosol models except the exact one are involved in the retrieval.

Figure 4. Relative errors

ε_{mean}^{τ}

,

ε_{\max}^{τ}

,

ε_{mean}^{H}

, and

ε_{\max}^{H}

versus

H_{t}

for

τ_{t} = 1.0

. All aerosol models except the exact one are involved in the retrieval.

Figure 5. Upper panels: mean a posteriori density of

τ

computed by GCV for Test Examples 1 (left) and 2 (right). Lower panels: mean a posteriori density of H computed by GCV for Test Examples 1 (left) and 2 (right). The true values are

τ_{t}

= 1.5 and

H_{t}

= 3.0 km.

Figure 5. Upper panels: mean a posteriori density of

τ

computed by GCV for Test Examples 1 (left) and 2 (right). Lower panels: mean a posteriori density of H computed by GCV for Test Examples 1 (left) and 2 (right). The true values are

τ_{t}

= 1.5 and

H_{t}

= 3.0 km.

Figure 6. Relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km (upper panels) and

H_{t}

for

τ_{t} = 1.0

(lower panels). The surface albedo is included in the retrieval and all four aerosol models are considered.

Figure 6. Relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km (upper panels) and

H_{t}

for

τ_{t} = 1.0

(lower panels). The surface albedo is included in the retrieval and all four aerosol models are considered.

Figure 7. Relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km (upper panels) and

H_{t}

for

τ_{t} = 1.0

(lower panels). The surface albedo is included in the retrieval and all aerosol models excepting the exact one are considered.

Figure 7. Relative errors

ε_{mean}^{τ}

and

ε_{mean}^{H}

versus

τ_{t}

for

H_{t} = 3.0

km (upper panels) and

H_{t}

for

τ_{t} = 1.0

(lower panels). The surface albedo is included in the retrieval and all aerosol models excepting the exact one are considered.

Table 1. Optical properties of the aerosol models. For each model, the first and second lines are related to the accumulated and coarse modes, respectively. The complex refractive index of dust corresponds to the wavelengths 470 (1), 550 (2), 660 (3), and 2100 (4) nm. In the simulations, the values of the refractive index at

λ = 660

nm are assumed.

Table 1. Optical properties of the aerosol models. For each model, the first and second lines are related to the accumulated and coarse modes, respectively. The complex refractive index of dust corresponds to the wavelengths 470 (1), 550 (2), 660 (3), and 2100 (4) nm. In the simulations, the values of the refractive index at

λ = 660

nm are assumed.

Model	$r_{v} (μ m)$	$σ$	$m = (Re (m), Im (m))$	$V_{0} (μ m^{3} / μ m^{2})$
$\begin{matrix} 1 \\ Nonabs . \end{matrix}$	$\begin{matrix} 0.160 + 0.0434 τ \\ 3.325 + 0.1411 τ \end{matrix}$	$\begin{matrix} 0.364 + 0.1529 τ \\ 0.759 + 0.0168 τ \end{matrix}$	$\begin{matrix} (1.42, 0.004 - 0.0015 τ) \\ (1.42, 0.004 - 0.0015 τ) \end{matrix}$	$\begin{matrix} 0.1718 τ^{0.821} \\ 0.0934 τ^{0.639} \end{matrix}$
$\begin{matrix} 2 \\ Modabs . \end{matrix}$	$\begin{matrix} 0.145 + 0.0203 τ \\ 3.101 + 0.3364 τ \end{matrix}$	$\begin{matrix} 0.374 + 0.1365 τ \\ 0.729 + 0.098 τ \end{matrix}$	$\begin{matrix} (1.43, 0.008 - 0.002 τ) \\ (1.43, 0.008 - 0.002 τ) \end{matrix}$	$\begin{matrix} 0.1642 τ^{0.775} \\ 0.1482 τ^{0.684} \end{matrix}$
$\begin{matrix} 3 \\ Abs . \end{matrix}$	$\begin{matrix} 0.134 + 0.0096 τ \\ 3.448 + 0.9489 τ \end{matrix}$	$\begin{matrix} 0.383 + 0.0794 τ \\ 0.743 + 0.0409 τ \end{matrix}$	$\begin{matrix} (1.51, 0.02) \\ (1.51, 0.02) \end{matrix}$	$\begin{matrix} 0.1748 τ^{0.891} \\ 0.1043 τ^{0.682} \end{matrix}$
$\begin{matrix} 4 \\ Dust \end{matrix}$	$\begin{matrix} 0.1416 τ^{- 0.052} \\ 2.2 \end{matrix}$	$\begin{matrix} 0.7561 τ^{0.148} \\ 0.554 τ^{- 0.052} \end{matrix}$	$\begin{matrix} (1.48 τ^{- 0.021}, 0.0025 τ^{0.132}) (1) \\ (1.48 τ^{- 0.021}, 0.002) (2) \\ (1.48 τ^{- 0.021}, 0.0018 τ^{- 0.08}) (3) \\ (1.46 τ^{- 0.040}, 0.0018 τ^{- 0.30}) (4) \\ (1.48 τ^{- 0.021}, 0.0025 τ^{0.132}) (1) \\ (1.48 τ^{- 0.021}, 0.002) (2) \\ (1.48 τ^{- 0.021}, 0.0018 τ^{- 0.08}) (3) \\ (1.46 τ^{- 0.040}, 0.0018 τ^{- 0.30}) (4) \end{matrix}$	$\begin{matrix} 0.0871 τ^{1.026} \\ 0.6786 τ^{1.057} \end{matrix}$

Table 2. Average relative errors for the results plotted in Figure 1 and Figure 2. MMLE, Maximum Marginal Likelihood Estimation; GCV, Generalized-Cross Validation.

Method	Average Relative Error
Method	$ε_{mean τ}^{τ}$	$ε_{mean τ}^{H}$	$ε_{mean H}^{τ}$	$ε_{mean H}^{H}$
MMLE	0.046	0.091	0.027	0.102
MLMMLE	0.009	0.062	0.022	0.059
GCV	0.009	0.020	0.001	0.024
MLGCV	0.009	0.025	0.006	0.042

Table 3. Average relative errors for the results plotted in Figure 3 and Figure 4.

Method	Average Relative Error
Method	$ε_{mean τ}^{τ}$	$ε_{mean τ}^{H}$	$ε_{mean H}^{τ}$	$ε_{mean H}^{H}$
MMLE	0.095	0.206	0.073	0.318
MLMMLE	0.180	0.210	0.139	0.348
GCV	0.060	0.111	0.022	0.096
MLGCV	0.059	0.200	0.076	0.259

Table 4. Average relative errors for the results plotted in Figure 6.

Method	Average Relative Error
Method	$ε_{mean τ}^{τ}$	$ε_{mean τ}^{H}$	$ε_{mean H}^{τ}$	$ε_{mean H}^{H}$
MMLE	0.042	0.046	0.018	0.042
MLMMLE	0.051	0.060	0.011	0.027
GCV	0.101	0.213	0.076	0.237
MLGCV	0.119	0.311	0.100	0.398

Table 5. Average relative errors for the results plotted in Figure 7.

Method	Average Relative Error
Method	$ε_{mean τ}^{τ}$	$ε_{mean τ}^{H}$	$ε_{mean H}^{τ}$	$ε_{mean H}^{H}$
MMLE	0.101	0.113	0.070	0.204
MLMMLE	0.117	0.203	0.057	0.291
GCV	0.136	0.269	0.109	0.274
MLGCV	0.116	0.307	0.147	0.346

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sasi, S.; Natraj, V.; Molina García, V.; Efremenko, D.S.; Loyola, D.; Doicu, A. Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory. Remote Sens. 2020, 12, 3724. https://doi.org/10.3390/rs12223724

AMA Style

Sasi S, Natraj V, Molina García V, Efremenko DS, Loyola D, Doicu A. Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory. Remote Sensing. 2020; 12(22):3724. https://doi.org/10.3390/rs12223724

Chicago/Turabian Style

Sasi, Sruthy, Vijay Natraj, Víctor Molina García, Dmitry S. Efremenko, Diego Loyola, and Adrian Doicu. 2020. "Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory" Remote Sensing 12, no. 22: 3724. https://doi.org/10.3390/rs12223724

APA Style

Sasi, S., Natraj, V., Molina García, V., Efremenko, D. S., Loyola, D., & Doicu, A. (2020). Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory. Remote Sensing, 12(22), 3724. https://doi.org/10.3390/rs12223724

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory

Abstract

1. Introduction

2. Data Model

3. Bayesian Approach

3.1. Bayesian Parameter Estimation

3.2. Bayesian Model Selection

4. Iteratively Regularized Gauss–Newton Method

5. Parameter Estimation and Model Selection

6. Algorithm Description

7. Application to the EPIC Instrument

8. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A

Appendix B

Appendix C

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI