Model Selection in Atmospheric Remote Sensing with Application to Aerosol Retrieval from DSCOVR/EPIC. Part 2: Numerical Analysis

: An algorithm for retrieving aerosol parameters by taking into account the uncertainty in aerosol model selection is applied to the retrieval of aerosol optical thickness and aerosol layer height from synthetic measurements from the EPIC sensor onboard the Deep Space Climate Observatory. The synthetic measurements are generated using aerosol models derived from AERONET measurements at different sites, while other commonly used aerosol models, such as OPAC, GOCART, OMI, and MODIS databases are used in the retrieval. The numerical analysis is focused on the estimation of retrieval errors when the true aerosol model is unknown. We found that the best aerosol model is the one with a value of the asymmetry parameter and an angular variation of the phase function around the viewing direction that is close to the values corresponding to the reference


Introduction
The retrieval of aerosol and cloud optical thickness, as well as aerosol layer height and cloud top height, requires the selection of a model that describes their microphysical properties. If there is insufficient information for an appropriate microphysical model selection, the solution accuracy can be improved if this model uncertainty is taken into account and appropriately quantified.
In Ref. [1], we presented a retrieval algorithm that takes into account uncertainty in model selection. The algorithm is based on (i) the iteratively regularized Gauss-Newton method to compute the solution for each model, (ii) a linearization of the forward model around the solution, and (iii) an extension of maximum marginal likelihood estimation and generalized cross-validation methods to model selection and data error variance estimation. Essentially, the algorithm includes four selection models corresponding to (i) the two parameter choice methods used (maximum marginal likelihood estimation and generalized cross-validation), and (ii) the two settings in which the relative evidence is treated (stochastic and deterministic). In practice, the algorithm can be used for the retrieval of (i) aerosol parameters from different spectral instruments, and (ii) cloud parameters, when the selection of the cloud type and/or the effective radius may play an important role.
The above algorithm was applied to the retrieval of aerosol optical thickness and layer height from synthetic measurements corresponding to the EPIC instrument onboard the Deep Space Climate Observatory [2]. Specifically, Channels 7 and 8 in the Oxygen B-band at 680 and 687.75 nm, respectively, and Channels 9 and 10 in the Oxygen A-band at 764 and 779.5 nm, respectively, were used for this purpose. The simulations utilize aerosol models implemented in the MODIS aerosol algorithm over land [3]; the surface albedo is either assumed to be known or included in the retrieval.
In this paper, we utilize the methodology developed in Ref. [1] to perform retrievals of synthetic measurements from the EPIC instrument. For this purpose, 1. a set of aerosol models derived from Aerosol Robotic Network (AERONET) measurements at different sites [4,5] is used as reference, under the simplified assumption that the surface albedo is known, 2.
The numerical analysis is devoted to the estimation of errors in the retrieval of aerosol optical thickness and layer height, when the true aerosol model is unknown.
The paper is organized as follows. In Section 2, we recapitulate the main features of the aerosol retrieval algorithm. In Section 3, we describe the aerosol models used in this study. We present the results of our numerical analysis in Section 4.

Algorithm Description
For N m microphysical aerosol models, we consider the scaled white-noise data model where F m (x) ∈ R M is the forward model corresponding to the model m, m = 1, . . . , N m , y δ ∈ R M the measurement vector or the noisy data vector, and δ m ∈ R M the data error vector summing the contributions of the measurement and model error vectors. In a stochastic setting, δ m and x are random vectors, and Equation (1) is solved by means of a Bayesian approach. Specifically, for x ∼ N(x a , C x = σ 2 m (αL T L) −1 ) and δ m ∼ N(0, C δm = σ 2 m I M ), the maximum a posteriori estimator x δ mα is computed as where is the a posteriori potential, the notation N(x mean , C x ) stands for a normal distribution with mean x mean and covariance matrix C x , x a is the a priori state vector, α = σ 2 m /σ 2 x is the regularization parameter, i.e., the ratio of data error variance σ 2 m and a priori state variance σ 2 x , and L is the regularization matrix. In a deterministic setting, δ m is characterized by the noise level ∆ m (defined as an upper bound for δ m i.e., ||δ m || ≤ ∆ m ), x is a deterministic vector, and we are faced with the solution of the nonlinear equation y δ = F m (x). In the framework of Tikhonov regularization, a regularized solution x δ mα to the nonlinear equation y δ = F m (x) minimizes the Tikhonov function F mα (x) = σ 2 m V α (x | y δ , m); hence, the maximum a posteriori estimate coincides with the Tikhonov solution, i.e., x δ mα = x δ mα . Since the computation of the regularized solution x δ mα in the framework of the method of Tikhonov regularization requires knowledge of the optimal value of the regularization parameter α, the nonlinear equation y δ = F m (x) is solved by means of the iteratively regularized Gauss-Newton method. In addition to the optimal value of the regularization parameter α, this method also provides the corresponding regularized solution x δ m α .
The key quantity in Bayesian model selection is the relative evidence p(m | y δ ), also known as the a posteriori probability of the model m given the measurement y δ . In terms of this quantity, the mean and maximum solution estimates are defined by and respectively. The relative evidence p(m | y δ ) can be computed in a stochastic or a deterministic setting.

1.
In a stochastic setting, p(m | y δ ) is calculated using the relation where p(y δ | m) is the marginal likelihood density. By assuming a linearization of the forward model around the solution, the marginal likelihood density p(y δ | m) can be computed analytically provided that the data error variance σ 2 m is known. Estimates for σ 2 m can be obtained in the framework of maximum marginal likelihood estimation [9][10][11] and generalized cross-validation methods [12,13].

2.
In a deterministic setting, p(m | y δ ) is regarded as a merit function characterizing the solution x δ m α . More precisely, p(m | y δ ) is defined in terms of the marginal likelihood function or the generalized cross-validation function. In the latter case, we have where is the generalized cross-validation function, r δ m α = y δ − F m (x δ m α ) is the nonlinear residual vector, A m α = K m α K † m α is the influence matrix, K m α is the Jacobian matrix, and K † m α is the generalized inverse at the solution x δ m α . The numerical simulations performed in Ref. [1] show that 1.
the differences between the results corresponding to the stochastic and deterministic settings are not significant, and 2.
the maximum solution estimate x δ max is completely unrealistic.
In the present study, we assume a deterministic setting and compute the mean solution estimate x δ mean and the maximum solution estimate x δ max by means of Equations (3) and (4), respectively, and the relative evidence p(m | y δ ) according to Equations (5) and (6).

Aerosol Models
To describe the wide range of possible compositions, the aerosol particles are modeled as components, each of them representing an internal mixture of all chemical substances that have a similar origin. The size distribution of an aerosol component is assumed to be log-normal, described by the number size distribution where r mod is the modal or median radius of the number size distribution, σ is the standard deviation, and is the total number of particles (per cross-section of atmospheric column). Alternatively, the log-normal model can be described by the volume size distribution where is the median radius of the volume size distribution and is the volume of particles (per cross section of atmospheric column). Thus, the size distribution of an aerosol component is characterized by (i) the modal radius r mod , (ii) the standard deviation σ, and (iii) the total number of particles N 0 . Alternatively, in addition to the standard deviation σ, the median radius of the volume size distribution r v and the volume of particles V 0 can be used to characterize the size distribution. When these parameters together with the wavelength-dependent refractive index m aer are specified, the scattering characteristics of an aerosol component, i.e., 1. the size averaged extinction and scattering cross sections C ext and C sct , 2.
the single scattering albedo ω, and 3.
the coefficients a n of the expansion of the size averaged phase function P(Θ) in terms of Legendre polynomials P n (cos Θ), i.e. P(Θ) = ∑ n≥0 a n P n (cos Θ) can be computed using an electromagnetic scattering model. In particular, for spherical particles, the size averaged quantities are calculated using the formulas a n = r max r min a n (r)p(r) dr, n ≥ 0, where p(r) is the probability density function associated with the number size distribution, for which we have C ext (r), C sct (r), and a n (r) correspond to a spherical particle of radius r, and r min and r max are the lower and upper limits of the size distribution.
The aerosol components can be externally mixed to form aerosol models (classes). External mixture means that there is no physical or chemical interaction between particles of different components. If an aerosol model m consists of N aerosol components, and C exti , C scti , and a ni correspond to the ith aerosol component, the extinction cross section C extm , the scattering cross section C sctm , the single scattering albedo ω m , and the expansion coefficients a mn of the aerosol model are computed using the external mixing formulas where the weight is the number mixing ratio, and N 0i is the total number of particles of the ith aerosol component. For the extinction coefficient, we have the computational formula where is the total number density of the aerosol particles, and n 0i is the number density of the ith aerosol component. Note that relation (22) can be expressed in terms of n 0i as

Aerosol Model Sets
This section briefly describes the aerosol model sets used in this study. The parameters of the particle size distributions (i.e., modal radii and standard deviations) and the refractive index data for each set are summarized in Tables A1-A5 in Appendix A.

Set 1
The first model set includes the aerosol components from the Optical Properties of Aerosols and Clouds (OPAC) database [6]. These are 1.
the water-insoluble part of aerosol particles, consisting mostly of soil particles with a certain amount of organic material; 2.
the water-soluble part of aerosols, originating from gas to particle conversion and consisting of various kinds of sulfates, nitrates, and organic, water-soluble substances; 3.
the soot component, describing the absorbing black carbon (note that carbon is not soluble in water and therefore, the particles are assumed not to grow with increasing relative humidity);

4.
sea-salt (accumulated and coarse) particles, consisting of the various kinds of salt contained in seawater; 5.
mineral aerosol (accumulated and coarse) or desert dust, consisting of a mixture of quartz and clay minerals; 6.
mineral transported, describing the desert dust that is transported over long distances with a reduced amount of large particles; and 7.
the sulfate component, describing the amount of sulfate found in the Antarctic aerosol (also used as the stratospheric background aerosol).
Assuming that the aerosol particles are spherical, we generate a database using Mie scattering theory. Essentially, the database gives the values of C ext , C sct , ω, and a n , for (i) each aerosol component, The aerosol components can be combined to form aerosol models, where for each of them, the number of components N and the corresponding number mixing ratios w i have to be specified. In the present application, we use the 10 aerosol models proposed in the OPAC database [6].

Set 2
The second model set includes the following aerosol components: organic carbon, 4.
As in the previous case, we generate a database that gives the values of C ext , C sct , ω, and a n , for (i) each aerosol component, (ii) a set of wavelengths (61 values in the range 0.250 µm-40 µm), and (iii) a set of relative humidities (36 values: 0.00, 0.05, . . . , 0.80, 0.81, 0.82, . . . , 0.99). The aerosol single scattering properties are calculated based on the formalism presented in Ref. [14]. The following peculiarities of the computational process can be pointed out.

1.
A log-normal distribution with radius between 0.005 µm and 0.3 µm is assumed for black carbon, organic carbon and sulfate, with size parameters as in Table 2 of Ref. [14].

2.
The optical properties of dust and sea salt are computed across each of the five size bins used in the simulation. For dust, a constant number size distribution across the bin is assumed, while for sea salt, a functional form of the particle size distribution taken from Ref. [15] is considered. The dry size bins for dust are between the following radius limits (in µm): 0.1-1, 1-1.6, 1.6-3, 3-6, 6-10. For sea salt, the limits are (in µm): 0.03-0.1, 0.1-0.5, 0.5-1.5, 1.5-5, 5-10.

3.
Dust has no relative humidity induced swelling. Black carbon, organic carbon, sea salt and sulfate have relative humidity dependent growth factors (ratio of wet to dry particle diameter). The growth factors for organic carbon and sulfate are taken from the OPAC database [6], with modifications as in Ref. [14]. The empirical relationship of Gerber [16] is used for obtaining growth factors for sea salt. 4.
The spectral complex refractive index is taken from the OPAC database. The scattering characteristics are computed using the Mie theory, except for dust, where a precomputed database for ellipsoidal particles is considered [17].
For this application, we use the 10 aerosol models (mixtures of sulfate, dust, sea salt, black carbon, and organic carbon) obtained by a cluster analysis using the Goddard Chemistry Aerosol Radiation and Transport (GOCART) model [7]. This uses data from global circulation or chemical transport models employing the sensitivity of multi-angle imaging to natural mixtures of aerosols. Multi-angle imaging is sensitive to aerosol optical depth and aerosol type and this principle is employed here.

Set 3
The third set comprises the aerosol models included in the OMI multiwavelength aerosol retrieval algorithm [8]. There are five major aerosol types, where each type consists of several aerosol models depending on their optical properties and particle size distribution. On a global scale, four main tropospheric aerosol types can be distinguished: (i) urban-industrial aerosols originating from fossil fuel combustion, (ii) carbonaceous aerosols generated from natural and anthropogenic biomass burning, (iii) desert dust aerosols, injected into the atmosphere by winds, and (iv) naturally produced oceanic aerosols. After major volcanic eruptions, the aerosol optical thickness of the stratosphere can be significantly increased for several years. For this reason, a volcanic aerosol type is also included.

Set 4
The fourth set comprises the aerosol models included in the MODIS aerosol retrieval algorithm [3]. These are derived by performing a cluster analysis on the entire time series of almucantur aerosol properties from global AERONET sites. There are three fine-dominated (spherical) and one coarse-dominated (spheroid) aerosol optical models that represent the range of likely and observable global aerosol conditions. The fine-dominated aerosol models differ mainly in their values of single scattering albedo ω m , i.e., moderately absorbing (ω m = 0.90), absorbing (ω m = 0.85), and non-absorbing (ω m = 0.95). Each aerosol model is bilognormal, with dynamic (function of optical depth) size parameters (radius, standard deviation, volume distribution) and complex refractive index.

Set 5
The fifth set includes the aerosol models derived from several AERONET sites [5]. At the selected aerosol sites, with well-known meteorological and environmental conditions, the following aerosol models are identified: (i) urban-industrial from fossil fuel combustion in populated industrial regions, (ii) biomass burning produced by forest and grassland fires, (iii) desert dust blown into the atmosphere by wind, and (iv) aerosol of marine origin. In addition, a mixed aerosol model over the Maldives is included. The parameters of each aerosol model have dynamic variability as a function of optical thickness.

Numerical Simulations
In this section, we apply the algorithm to the retrieval of aerosol optical thickness and layer height by generating synthetic measurements corresponding to the EPIC instrument. The retrieval is performed under the assumption that the surface albedo is known.
The state vector x comprises the aerosol optical thickness τ and layer height H, i.e., x = [τ, H] T . As in Ref. [1], the true aerosol optical thicknesses to be retrieved are τ t = 0.25, 0.50, 0.75, 1.0, 1.25, and 1.5, while for the true aerosol layer height, we take H t = 1.0, 1.5, 2.0, 2.5, and 3.0 km. The a priori values, which coincide with the initial guesses, are τ a = 2.0 and H a = 4 km, while the surface albedo is A = 0.06. The solution accuracy is characterized by the relative errors corresponding to the mean solution estimate (cf. Equation (3)) x δ mean = [τ mean , H mean ], and those corresponding to the maximum solution estimate (cf. Equation (4)) x δ max = [τ max , H max ]. For the mean solution estimate, the average relative errors over τ t for H t = 3.0 km are defined by where N τ = 6, while the average relative errors over H t for τ t = 1.0 are defined by To analyze the accuracy of the aerosol retrieval, we consider three test examples. In the first example, the synthetic measurements are generated by choosing as truth, the urban industrial model derived from AERONET measurements at the Goddard Space Flight Center (GSFC) in Greenbelt, Maryland; in the second example, the true model is the mixed aerosol model derived from AERONET measurements over the Maldives; in the third example, the true model is the biomass burning model derived from AERONET measurements over the African savanna in Zambia. For each test, we consider extended and reduced sets of aerosol models. These are illustrated in Table 1. The reduced sets of aerosol models require less computational time and, in principle, should be sufficient to reproduce the true model. Since the OPAC and GOCART aerosol models depend on the relative humidity U, these models are applied with U = 0.80, 0.90, and 0.95.
The relative errors for the three test examples corresponding to the extended and reduced sets of aerosol models are given in Figures 1-4.  The following inferences can be drawn.

1.
For the first test example, the best results correspond to the OMI aerosol models (the relative Note that aerosol optical thickness errors are smaller, but not significantly smaller, than errors in the layer height.

2.
For the OPAC and GOCART aerosol models, the best fits correspond to a high and unrealistic value of relative humidity, U = 0.95. Note that the annual relative humidity in Greenbelt, Maryland is about 0.64; the corresponding value for Male, Maldives is about 0.79, and that for Lusaka, Zambia is about 0.61 (with a maximum of 0.80 in January).

3.
The results for the extended set of aerosol models are slightly better than those for the reduced set.

4.
From Figure 2, it is apparent that ε    Figure 1 but for the third test example.
In the retrieval, the input parameters of the forward (radiative transfer) model are the single scattering albedo and the expansion coefficients of the phase function. Therefore, in order to explain the above findings, we show in Table 2 the single scattering albedo ω m , the asymmetry parameter g m , and the relative error ε τ mean for a particular retrieval case of the first test example (τ t = 1.0 and H t = 3.0 km). The corresponding phase functions are illustrated in Figure 5. Taking into account the representation of the single scattering radiance, we deduce that the backscattering region of the phase function (the angular domain around the viewing direction) is relevant for our analysis. The results can be summarized as follows.

1.
The best retrievals correspond to the OMI, followed by the GOCART-0.95, aerosol models.

2.
The single scattering albedo ω m for the AERONET aerosol model is best approximated by the scattering albedos associated with the GOCART-0.80 and OPAC-0.90 aerosol models. However, all values of ω m are above 0.95, so that roughly speaking, all aerosol models are nonabsorbing. Therefore, we believe that, for the present application, ω m is not a relevant indicator of the goodness of fit. 3.
The asymmetry parameter g m for the AERONET aerosol model is best approximated by the asymmetry parameters associated with the OMI and GOCART-0.95 aerosol models. Moreover, the phase function in the backscattering region is also accurately reproduced by the same models. Thus, the accuracy of the asymmetry parameter, which is the first-order moment of the phase function, and the phase function in a region around the viewing direction, determine the retrieval errors. 4.
From Figure 5, it is also apparent that (i) GOCART-0.95 is superior to GOCART-0.80 and GOCART-0.90, and (ii) OPAC-0.95 is superior to OPAC-0.80 and OPAC-0.90. Thus, for these aerosol databases, better approximations for the asymmetry parameter and the phase function in the backscattering region correspond to larger values of U. Unfortunately, as previously mentioned, these large values of U are physically unrealistic. The conclusion which can be drawn is that climatologically correct GOCART and OPAC aerosol models are not always the best option for retrievals. Table 2. Single scattering albedo ω m , asymmetry parameter g m , and relative error ε τ mean for τ t = 1.0 and H t = 3.0 km. The results correspond to the first test example and the extended sets of aerosol models.

Conclusions
This paper presents the results of aerosol retrieval using an algorithm that takes into account the uncertainty in aerosol model selection. The solution corresponding to a specific aerosol model is characterized by the relative evidence, which is a measure of how well the aerosol model fits the measurement. Based on this quantity, the following solution estimates are used: the maximum solution estimate, corresponding to the aerosol model with the highest evidence, and the mean solution estimate, representing a linear combination of solutions weighted by their evidences.
We generated synthetic measurements for the EPIC instrument using as reference certain aerosol models derived from AERONET measurements at different sites. In the retrieval, the surface albedo was assumed to be known, and four aerosol models, included in the OPAC, GOCART, OMI, and MODIS databases, were evaluated. The goal of our numerical analysis was to quantify errors in the retrieval of aerosol optical thickness and layer height, when the true aerosol model is unknown. The following conclusions were drawn.

1.
For the first test example, the best results correspond to the OMI aerosol models, while for the second and third test examples, the best results correspond to the MODIS, followed by the OMI, aerosol models.

2.
The maximum solution estimate is less accurate than the mean solution estimate.

3.
On average, the best relative retrieval errors, corresponding to the mean solution estimate, are about 0.20 for both the aerosol optical thickness and layer height.

4.
The best aerosol model is the one with (i) a value of the asymmetry parameter and (ii) an angular variation of the phase function around the viewing direction that are close to the values corresponding to the reference aerosol model. These criteria can also be fulfilled by aerosol models that are climatologically incorrect.
Since the MODIS aerosol models are derived from AERONET measurements, it seems that in practice, the OMI aerosol models are the most suitable option. However, the retrieval errors in the mean solution estimate are large (on average, larger than 0.2). These can be improved if, instead of a single-angle measurement, a multi-angle measurement approach is used. In this case, the weighting factors for each aerosol component are included in the retrieval.

Conflicts of Interest:
There are no conflicts of interest.

Appendix A
In this appendix, we summarize the aerosol model sets used in this study. Table A1. Geometrical and optical properties of the aerosol models included in the OPAC database. The relative refractive index corresponds to the wavelength λ = 750 nm and the relative humidity U = 0.8.