A Proof-of-Concept Algorithm for the Retrieval of Total Column Amount of Trace Gases in a Multi-Dimensional Atmosphere

: An algorithm for the retrieval of total column amount of trace gases in a multi-dimensional atmosphere is designed. The algorithm uses (i) certain differential radiance models with internal and external closures as inversion models, (ii) the iteratively regularized Gauss–Newton method as a regularization tool, and (iii) the spherical harmonics discrete ordinate method (SHDOM) as linearized radiative transfer model. For efﬁciency reasons, SHDOM is equipped with a spectral acceleration approach that combines the correlated k-distribution method with the principal component analysis. The algorithm is used to retrieve the total column amount of nitrogen for two- and three-dimensional cloudy scenes. Although for three-dimensional geometries, the computational time is high, the main concepts of the algorithm are correct and the retrieval results are accurate.


Introduction
The retrieval of trace gas products from UV/VIS spectrometers is strongly affected by the presence of clouds. In general, there are three different cloud contributions: (i) the albedo effect, associated with the enhancement of reflectivity for cloudy scenes compared to cloud-free scenes; (ii) the shielding effect when a part of the trace gas column below the cloud is hidden; and (iii) the increase in absorption, related to multiple scattering inside clouds. The albedo and in-cloud absorption effects increase the detectability of trace gases at and above the cloud-top (in case of low clouds), while the shielding effect normally results in an underestimation of the trace gas column [1].
The majority of retrieval algorithms are based on one-dimensional radiative transfer modeling and use the so-called independent pixel approximation (IPA). The IPA for an atmosphere containing partial cloudiness means to compute separately the radiances for completely cloudy and clear skies, and then to express the partially cloudy radiance as a weighted sum of the separate radiances; the weighting factor being provided by the intensity weighted cloud fraction. The clouds within each pixel are assumed to be planeparallel and homogeneous in both horizontal and vertical directions. The main advantage of the IPA is its computational efficiency, as it requires the solution of only two independent one-dimensional radiative transfer problems. The disadvantage is that for cloudy scenes of small horizontal extent, the errors due to the three-dimensional effects may be very significant [2][3][4]. This is the case of the new generation of atmospheric composition UV-VIS-NIR sensors, such as Sentinel-5P, Sentinel-4, and Sentinel-5. In [5] the importance of three-dimensional effects on computations of air mass factors has been shown.
To analyze the impact of cloud inhomogeneity on the retrieval, multi-dimensional radiative transfer models are used. In particular, for a specific cloudy scene, a spectral signal is simulated by a multi-dimensional radiative transfer model and then included in a one-dimensional retrieval algorithm to derive for example, the total column amount of a trace gas. By comparing the retrieved total column with the true value, a bias due to cloud effects can be deduced, and possible correction strategies can be explored.
In this paper, we go one step further and present the main concepts of an algorithm for retrieving the total column amount of trace gases in a multi-dimensional atmosphere. To achieve this goal we combine inverse models and regularization tools from inversion theory with a fast linearized version of the spherical harmonic discrete ordinate method [6]. These theoretical results are summarized in Section 2, while in Section 3, the accuracy and efficiency of the algorithm, used to retrieve the total column amount of nitrogen dioxide (NO 2 ) for two-and three-dimensional cloudy scenes, are numerically investigated. We conclude our analysis with some recommendations for operational retrievals.

Retrieval Algorithm
For a multi-dimensional geometry, the spectral signal of an instrument that measures the radiance at the top of the atmosphere in direction Ω m at wavelength λ in the spectral interval [λ min , λ max ], is given by where, under the assumption that the distance from the top of the atmosphere to the instrument is large, is the signal integrated over the field of view (FOV) of the instrument at wavelength λ (see, e.g., in [7] for a general description), g(λ) the slit function (defined on all R), s the spectral bandwidth, h(r t ) the characteristic function that takes the value 1/A tm for r t ∈ S tm and 0 otherwise, S tm the footprint of the instrument on the top face S t , and A tm the area of the instrument footprint. Essentially, Equations (1) and (2) state that the mean radiance across the wavelength and the area S t at the top of the atmosphere is measured. The top of atmosphere radiance at the exiting point r t along the characteristic Ω m , I(λ, r t , Ω m ), is computed by using the integral form of the radiative transfer equation × e − s 0 s σ ext (λ,r b +s Ω m )ds ds, where I(λ, r b , Ω m ) is the radiance at the surface point r b , J(λ, r, Ω m ) the source function, and s 0 = |r t − r b | the distance between the entering and exiting points, respectively. Equation (3) can be derived by integrating the radiative transfer equation [8,9]. The first term corresponds to the upwelling radiance at the bottom of the atmosphere and attenuated exponentially as it propagates through the atmosphere. The second term refers to the gain due to sources. The source function J contains only the solar term and the multiple scattering term [10].
In the case of an atmosphere consisting of gas molecules and a cloud, we assume that (i) the optical coefficients of the gas molecules depend on the altitude level and the wavelength, while (ii) the optical coefficients of the cloud depend on the spatial coordinates but not on the wavelength. The second assumption is justified by the fact that a narrow spectral interval is considered in the retrieval. According to the additivity property of optical cross sections [11], in the case of N g gases, the extinction coefficient is computed as where σ cloud ext (r) is the extinction coefficient in the cloud (methods for computing σ cloud ext (r) are summarized in [12]), σ mol sct (λ, z) the molecular scattering coefficient due to Rayleigh scattering [13], σ gas absg (λ, z) the absorption coefficient of gas g, and (x, y, z) the Cartesian coordinates of point r. Considering a discretization of the atmosphere in N z levels, i.e., {z j } N z j=1 , the absorption coefficient on level z j is given by where C absg (λ, z j ) and n g (z j ) are the absorption cross section and the number density of gas j on level z j , respectively. Under the assumption that on a layer j bounded below by z j and above by z j+1 , the number density n g (z) varies linearly with respect to z, the partial and total column of gas g are defined, respectively, by and where ∆z j = z j+1 − z j . The retrieval algorithm is based on the assumption that cloud information are available from co-located imagers, as, for example, the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra and Aqua satellites and the Visible Infrared Imaging Radiometer Suite (VIIRS) on board the Suomi National Polar-Orbiting Partnership spacecraft. The main cloud products delivered by MODIS/VIIRS are the cloud mask (indicator function of the cloud field), cloud optical thickness, cloud-top height (cloud-top pressure), and effective radius of the size distribution. In the MODIS/VIIRS retrieval algorithm, the cloud geometrical thickness is derived under the assumption that the clouds are homogeneous in the vertical direction, or equivalently, that the liquid water content does not change with the altitude. Therefore, it is appropriate to use stochastic cloud models (in particular statistical cloud generators) which operate with the same assumption. For this reason, we will use stochastic cloud models relying on the vertical homogeneity assumption, and in particular, the two-dimensional broken cloud model of Alexandrov et al. [14].
In general, the design of a retrieval algorithm requires the specification of 1. an inverse model, 2.
a solution method for solving the inverse problem, and 3.
a linearized multi-dimensional radiative transfer model for computing the forward model and the Jacobian matrix at each iteration step.

Inverse Models
The retrieval of the total columns X g of N g gases can be performed by using two differential radiance models (see details in [15][16][17] and references therein): 1.
the Differential Radiance Model with Internal closure (DRMI), in which the measured and simulated differential spectral signals are fitted, and 2.
the Differential Radiance Model with External closure (DRME), in which the measured differential spectral signal and a simulated spectral signal with its smooth component extracted are fitted.
Actually, in DRMI we solve the nonlinear equation where R δ mes (λ k ) = ln I δ mes (λ k ) − P mes (λ k , c mes ) (9) and R sim (λ k , X) = ln I sim (λ k , X) − P sim (λ k , c sim (X)) (10) are the measured and simulated differential spectral signals, respectively, while in DRME we solve the nonlinear equation In Equations (8)-(11), X = [X 1 , . . . , X N g ] is the vector comprising the total columns of trace gases; X a an a priori estimate of X; I δ mes (λ k ) the (noisy) spectral signal measured by the instrument at wavelength λ k , where k = 1, . . . N λ , N λ the number of measurement wavelengths; I sim (λ k , X) the simulated spectral signal computed by a multi-dimensional radiative transfer model; S j (λ k , X a ) with j = 1, . . . , N s , the correction spectra including, for example, the polarization correction spectrum and the Ring spectrum, N s the number of correction spectra; and b j with j = 1, . . . , N s , the (wavelength independent) amplitudes of the correction spectra. The polynomials P mes (λ, c mes ), P sim (λ, c sim (X)), and P(λ, c) with degree N p − 1 account for the low-order spectral structure due to the scattering by clouds and aerosols, while the differential spectral signals R sim and R δ mes are mostly due to the gaseous absorption processes (see Ref. [18] for details). Specifically, in DRMI, the coefficients of the smoothing polynomials P mes (λ, c mes ) and P sim (λ, c sim (X)), respectively, are computed as the solutions of the least-squares problems and respectively. Thus, these coefficients, which are uniquely determined by ln I δ mes (λ k ) and ln I sim (λ k , X), are not a part of the retrieval, while in DRMI, the coefficients c = [c 0 , . . . , c N p −1 ] of the smoothing polynomial P(λ, c) are included in the retrieval.
Similar inversion models can be used for tropospheric column retrieval. These models together with the underlying retrieval algorithms are described in Appendix A.
To speed up the calculation, two approximate inverse models can be designed.

1.
In DRME, we assume the linearized model [19] ln I sim (λ k , X) = ln I sim (λ k , X a ) + (14) where the second term in the right-hand side of the equation describes the variations of ln I sim with respect to the total column amounts of trace gases X g with g = 1, . . . , N g , and the third term δR p (λ k ) comprises the contributions of all other atmospheric parameters. Approximating δR p (λ k ) by a smoothing polynomial, i.e., δR p (λ k ) ≈ P p (λ k , c p ), and putting P(λ k , c) − P p (λ k , c p ) → P(λ k , c), we are led to the linear equation which is equivalent with the Differential Optical Absorption Spectroscopy (DOAS) equation [18]. This equivalence is proved in Appendix B.

2.
We adopt an approximate inverse model in which we compute the spectral signal by an one-dimensional radiative transfer model, and (similar to the Ring and polarization correction spectra) introduce a correction spectrum S 3D (λ k , X a ) that accounts on threedimensional effects. To define the correction spectrum, we take into account that, for example, in DRME, we can write where I 3D sim (λ k , X) and I 1D sim (λ k , X), are the spectral signals computed by a three-and an one-dimensional radiative transfer model, respectively, and is a relative correction spectrum. In deriving Equation (16), we used the first-order Taylor approximation ln(1 + δI 3D sim ) ≈ δI 3D sim for δI 3D sim 1, and the decomposition P(λ k , c) = P 3D (λ k , c 3D (X) + P 1D (λ k , c 1D ), in which, the coefficients c 3D (X) are uniquely determined by δI 3D sim (λ k , X) (see below), while the coefficients c 1D are unknown (as the coefficients c are). In view of Equation (16), we assume that in the retrieval, the differential spectral term is the (differential) correction spectrum due to three-dimensional effects, P 3D (λ k , c 3D (X a )) the associated smoothing polynomial, and b 3D the amplitude of the correction spectrum (to be included in the retrieval). From Equation (18), we see that the correction spectrum S 3D (λ k , X a ) reproduces the differential structure of the relative correction spectrum δI 3D sim (λ k , X a ).

Solution Method
The nonlinear Equations (8) and (11) can be written in compact form as where F : R n → R m is the forward model, x ∈ R n the state vector, and y δ ∈ R m the noisy data vector. In DRMI and DRME, the state vectors are x = [X, b] T and x = [X, b, c] T , respectively, where b = [b 1 , . . . , b N s ] is the vector comprising the amplitudes of the correction spectra b j , j = 1, . . . , N s . Furthermore, the number of data points is m = N λ , and the measured differential spectral signals affected by noise R δ mes (λ k ) for k = 1, . . . N λ are the components of the noisy data vector y δ .
Because the nonlinear Equation (19) is usually ill-posed, a regularization method, as, for example, the iteratively regularized Gauss-Newton method, is used to compute a solution with physical meaning [15,20,21]. In this approach, at the iteration step k, we consider a linearization of F(x) around the current iterate x δ k and solve the linearized equation by means of Tikhonov regularization with the penalty term α k ||L(x − x a )|| 2 , where α k is the regularization parameter at the iteration step k, L the regularization matrix, and x a the a priori state vector, the best beforehand estimate of the solution. Thus, the new iterate minimizing the function is given by where is the regularized generalized inverse, the linearized data vector at the iteration step k. The following peculiarities of the iteratively regularized Gauss-Newton method deserve to be mentioned [15].

1.
In contrast to the method of Tikhonov regularization [22], the regularization parameter is not constant during the iterative process. Instead, the regularization parameters α k are the terms of a decreasing (geometric) sequence, i.e., α k = qα k−1 with q < 1. In this way, the amount of regularization is gradually decreased during the iterative process.

2.
For iterative regularization methods, the number of iteration steps k plays the role of the regularization parameter, and the iterative process is stopped after an appropriate number of steps k in order to avoid an uncontrolled expansion of the errors in the data. The stopping rule used in this study is the discrepancy principle [23], according to which, the iterative process is terminated after k steps such that where r δ k = y δ − F(x δ k ) is the residual vector at x δ k , τ > 1 a control parameter, and ∆ the noise level (an upper bound for the noise in the data). Because in practice the noise level cannot be a priori estimated, we adopt a practical approach based on the observation that the residual ||r δ k || decreases during the iterative process and attains a plateau at approximately ∆. Thus, if the nonlinear residuals ||r δ k || converge to ||r δ ∞ || within a prescribed tolerance, we use the estimate ∆ = ||r δ ∞ ||.

3.
The regularization matrix is chosen as a diagonal matrix, that is, the penalty term for DRMI and for DRME. Here, the scalars {w g } give the weight of each component of the state vector into the regularization matrix.
As compared to the method of Tikhonov regularization, the iteratively regularized Gauss-Newton method is less sensitive to overestimations of the regularization parameter, but requires more iteration steps to achieve convergence.
We conclude this section by mentioning that in the case of DRME, one iteration step of the iteratively regularized Gauss-Newton method corresponds to the linear problem described by Equations (11) and (14).

Linearized Radiative Transfer Model
In order to solve the nonlinear Equation (19) by means of the iteratively Gauss-Newton method, the forward model F(x) and the Jacobian matrix K(x) have to be calculated at each iteration step. From Equation (3) we infer that the computation of F(x) requires the computation of the radiance I(λ, r b , Ω) and the source function J(λ, r, Ω) at all surface and grid points, respectively, while the computation of K(x) requires the computation of the partial derivatives of the extinction field ∂σ ext (·, r)/∂X g , the surface radiance ∂I(·, r b , ·)/∂X g , and the source function ∂J(·, r, ·)/∂X g .
For derivative calculations, we employ the same assumption as in the standard DOAS model, that is, the (discrete) number density profile {n g,j } N z j=1 , where n g,j = n g (z j ), is assumed to be a scaled version of an a priori profile {n ag,j } N z j=1 . Consequently, if s g is the scale factor of gas g, i.e., n g,j = s g n ag,j , for all j = 1, . . . , N z , the following equalities, with σ gas absg,j (λ) = σ gas absg (λ, z j ), are valid. Thus, for any value of the total column X g , the absorption coefficient is updated as and the corresponding top of atmosphere radiance I(λ, r t , Ω m ) is determined by means of a radiative transfer calculation. In general, for computing the derivative of a spectral quantity F(λ) with respect to the total column X g , we regard F(λ) as a function of σ gas absg,j (λ), i.e., F = F(σ gas absg,1 , . . . , σ gas absg,N z ), and assume that the partial derivatives ∂F/∂σ gas absg,j , j = 1, . . . , N z , are known. Taking into account the one-to-one correspondence (28), we are led to the computational formula Thus, the radiative transfer model should be linearized with respect to the absorption coefficient, that is, the linearized radiative transfer model should deliver the partial derivatives ∂F/∂σ gas absg,j for j = 1, . . . , N z and g = 1, . . . , N g . The multi-dimensional radiative transfer model used in our retrieval algorithm is the spherical harmonic discrete ordinate method (SHDOM) [6,24]. Linearizations of SHDOM by means of a forward and a forward-adjoint approach have been discussed in [25]. These can be summarized as follows.

1.
The linearized forward approach relies on an analytical computation of the derivatives. The method is accurate and has the advantage that no assumptions rather than those of the forward model have to be made. However, the method is time consuming and memory demanding when the number of parameters to be retrieved is large. The reason is that not only the source function has to be stored as a spherical harmonic series at each grid point, but also its derivatives with respect to the atmospheric parameters of interest.

2.
The linearized forward-adjoint approach relies on the application of the adjoint radiative transfer theory. The method requires less storage for derivatives calculation, is much faster, but relatively less accurate. The main reason for this lower accuracy is that different interpolation schemes are used for radiance and derivative calculations.
Because in our application the number of retrieved quantities is relatively small, we decided in the favor of the linearized forward approach. Specifically, the partial derivatives are computed in an analytical manner by using the chain rule in a sort of backward procedure starting with the output radiance and following the chain of dependencies back to the inputs to the model.
In [26], a simplified approach, leading to a considerable reduction of the computation time, was proposed. The idea is to neglect in Equation (3) the contributions of the partial derivatives of the surface radiance ∂I(·, r b , ·)/∂X g and the source function ∂J(·, r, ·)/∂X g . In other words, when computing the partial derivative of the top of atmosphere radiance ∂I(·, r t , ·)/∂X g , only the contributions of the partial derivatives of the extinction field ∂σ ext (·, r)/∂X g are taken into account.
To speed up the computations, the linearized SHDOM is equipped with a spectral acceleration approach that combines the correlated k-distribution method with dimensionality reduction techniques. This approach was applied for computing the spectral signal in [27], and can be extended to derivative calculations as follows.

1.
Correlated k-distribution method. Consider a discretization of the spectral interval of W equally spaced wavelengths with the discretization step ∆λ, and assume that the transmission within a spectral interval depends only on the distribution of the gas absorption coefficient σ gas abs (λ) within the spectral interval [28]. Letting F = F(σ gas absk ) be the cumulative density function of σ gas abs (λ) in the spectral interval [λ k − ∆λ/2, λ k + ∆λ/2], σ gas absk (F) the inverse distribution function, and {F l , l } N q l=1 a set of N q quadrature points and weights in the interval [0, 1], the spectral signal (1) and its partial derivative with respect to the total column are computed as where λ w = λ k , ω w = ∆λ l , and σ gas abs (λ w ) = σ gas absk (F l ) for w = l + (k − 1)N q , k = 1, . . . , W, l = 1, . . . , N q , and W = W N q . Thus, in the framework of the correlated kdistribution method, W monchromatic radiative transfer calculations are required for computing I FOV (λ w ) and ∂I FOV (λ w )/∂X g , and so, I(λ) and ∂I (λ)/∂X g . A further acceleration can be achieved when I FOV (λ w ) and ∂I FOV (λ w )/∂X g are computed by using dimensionality reduction techniques, as for example, the principal component analysis.

2.
Principal component analysis. At wavelength λ w , the integrated signal I FOV (λ w ) is related to the integrated signal calculated by a simplified (approximate) radiative transfer model I s FOV (λ w ) through the relation i.e., The correction factor f (λ w ) in Equation (32) is actually the quantity that is calculated by means of the principal component analysis. To summarize this approach, we assume that for each wavelength λ w , the spectral variability of the optical parameters can be described by a vector x w ∈ R N , defined by where σ gas absk and σ mol sctk are the optical coefficients in the kth level, N = 2N z + 1, and N z is the number of altitude levels. Denoting by x = (1/W) ∑ W w=1 x w the sample mean of the data, the aim is to find an M-dimensional subspace (M < N) spanned by a set of linear independent vectors {a k } M k=1 , such that the centered data x w − x belong to this subspace, i.e., Furthermore, approximating the correction factor f (x w ) by a second-order Taylor expansion around x, and the gradient and the Hessian of f by central differences, we are led to the computational formula To compute the derivative of the integrated signal with respect to the total column X g , we may use the principal component analysis to calculate the derivative correction factor f X g (λ w ), defined by [29] f X g (λ w ) = ln in which case, we have Alternatively, taking the derivative of Equation (33) with respect to X g , i.e., we may use the principal component analysis to calculate the derivative correction factor f X g (λ w ), defined by

Numerical Simulations
In this section, we analyze the accuracy and efficiency of the algorithms for retrieving total column amount of NO 2 under cloudy conditions. The cloudy scenes considered in our simulations are similar to those described in [27]. Specifically, the geometrical and optical parameters are chosen as follows.

1.
The domain of analysis is a rectangular prism of lengths L x = L y = 15 km and L z = 50 km. The discretization steps along the horizontal directions are ∆x = ∆y = 0.5 km. Along the vertical direction, the atmosphere between 0 and 50 km is discretized with a step of 0.5 km between 0 and 3 km, 0.1 km between 3 and 4 km, 0.5 km between 4 and 10 km, 1.0 km between 10 and 14 km, 2 km between 14 and 30 km, and 5 km between 30 and 50 km.

2.
A homogeneous cloud is placed between 3 and 4 km. The cloud extinction field is given by σ cloud ext (x, y) = σ 0 f (x, y), where σ 0 = 6 km −1 and f (x, y) is the indicator function (note that f (x, y) takes the values 1 and 0 inside and outside the cloud, respectively). The cloud phase function is a Henyey-Greenstein phase function [30] with the asymmetry parameter g = 0.8, and the cloud single-scattering albedo is ω cloud = 0.99. Eight cloudy scenes are generated by a two-dimensional broken cloud model [14] with a cloud fraction of about 0.4. The extinction field σ cloud ext (x, y) is smoothed at the boundary of a cloudy region in order to avoid abrupt changes in the horizontal plane. The indicator functions corresponding to the eight cloudy scenes are illustrated in Figure 1a-h. For two-dimensional geometries, slices at y = 7 km are selected from the cloud fields 1, 2, 3, 5, 6, and 8. The corresponding indicator functions are shown in Figure 2. Note that the slices corresponding to the cloudy scenes 4 and 7 are similar to the slice corresponding to the cloudy scene 2, and are therefore omitted. 3.

4.
A Lambertian reflecting surface with the surface albedo A = 0.2 is considered. 5.
The footprint of the detector is a square of length 2a = 10∆x centered at x 0 = y 0 = L x /2, and z 0 = L z , and a wavelength-dependent slit function corresponding to the TROPOMI instrument is assumed. 6.
In addition to the scattering and absorption by the cloud, molecular Rayleigh scattering and the absorption by NO 2 , ozone (O 3 ), oxygen dimer (O 4 ), and water vapor (H 2 O) are considered. The measurement spectral grid roughly resembles the TROPOMI's spectral resolution and consists of 119 spectral points between 425 nm and 450 nm.
The process of generating synthetic measurement spectra is organized as follows.

1.
For a clean scenario, we use the a priori partial column profile of NO 2 [17] illustrated in Figure 3 to generate the true (exact) partial column profile. In this regard, denoting the a priori partial columns of gas g by x ag,j , we choose the true partial columns as x † g,j = s g x ag,j , j = 1, . . . , N z with s NO 2 = 2.0 and The true total column X † g of gas g is then computed as , we generate the simulated spectral signal I sim (λ k , X † ) by means of SHDOM.

3.
For cubic smoothing polynomials, i.e., N p = 4, we determine the coefficients c sim (X † ) of the polynomial P sim (λ, c sim (X † )) as the solutions of the least-squares problem (13).

4.
We compute the noisy spectral signal as I δ sim (λ k , X † ) = I sim (λ k , X † ) + δ k , where the measurement errors δ k are assumed to be independent Gaussian random variables with zero mean and standard deviation σ k = I sim (λ k , X † )/SNR, where SNR is the signal-to-noise ratio. It should be pointed out that in view of the approximation the error in ln I δ sim (λ k , X † ) is δ ln k = δ k /I sim (λ k , X † ) yielding σ ln k = 1/SNR for all k. In other words, the measurement error vector is white noise with the covariance matrix (1/SNR 2 )I m , where I m is the identity matrix. Because in our simulations we are mainly interested in multi-dimensional effects, we take SNR = 10 4 , that is, we assume an almost perfect instrument and neglect the forward model errors.

5.
We include the Ring correction spectrum S R (λ k , X a ) illustrated in Figure 4 in the retrieval, and choose the a priori and true Ring amplitudes as b aR = 5 × 10 −2 and b † R = 2b aR , respectively. Note that the inelastic scattering is described by a first-order Rayleigh scattering model, i.e., by applying a first-order iteration scheme to the one-dimensional radiative transfer equation for inelastic scattering [31]. 6.
For DRMI, we compute the measured differential spectral signal as while for DRME, we choose c † = 0.5 c sim (X † ) and compute the measured differential spectral signal as SHDOM is run with a solution accuracy of 10 −4 and by using 1. an adaptive grid with a splitting accuracy of 10 −4 , 2.
the principal component analysis with M = 4, and the derivative correction factor f X g (λ w ) as in Equation (39), 3. periodic boundary conditions, 4.
the delta-M approximation [32], and the untruncated phase function single-scattering solution (TMS correction of Nakajima and Tanaka [33].
Some parameters characterizing the iteratively regularized Gauss-Newton algorithm are chosen as follows.

1.
The regularization parameters α k are the terms of a geometric sequence with ratio q = 0.1 and initial value α 0 = 10 −3 . Thus, at the first iteration step, the regularization parameter is α 1 = 10 −4 .

2.
The weighting factors specifying the contribution of each component of the state vector into the regularization matrix are w NO 2 = 1.0, w O 3 = w O 4 = w H 2 O = 10 2 , w R = 10 −3 , and w cp = 1.0 for all p = 0, . . . , N p − 1. By this choice, the total columns of the auxiliary gases are stronger constrained to the a priori than the total column of NO 2 and the Ring correction spectrum. 3.
The control parameter τ in the discrepancy principle Equation (24) Figure 1. Indicator function f (x i , y j ) with x i = i∆x and y j = j∆y for the 8 cloudy scenes (cases (a-h)); i and j refer to the pixel index in the xy plane, while each pixel corresponds to a 500 m × 500 m area.

Test Example 1
In the first test example we consider a two-dimensional geometry and perform the retrieval by using the DRMI and DRME inverse models. It may in general be noted that 1.
the inverse problem is severely ill-posed (for the DRMI model, the condition number of the Jacobian matrix at the initial guess is κ DRMI = 2.92 × 10 7 ) and 2.
there is a strong correlation between the NO 2 and Ring effect signatures (when the Ring correction spectrum is not included in the retrieval, the condition number is κ DRMI = 2.16 × 10 4 ; thus, κ DRMI decreases by three order of magnitude). The histories of the residuals with respect to the iteration index are shown in Figure 5. The following conclusions can be drawn.

1.
At the initial guess, the residual corresponding to DRMI is much smaller than that corresponding to DRME. This occurs because the discrepancies between the differential spectra are usually small.

2.
In DRMI, the residual decreases very fast at the first iteration step and then more steadily, while in DRME, the residual gradually decreases.
The relative errors and the computation times for DRMI and DRME are given in the first two columns of Tables 1 and 2, respectively. Because the SNR is high, in both cases, the relative errors are smaller than 0.01%. On average, the computation times for DRMI and DRME are of about 40 and 35 CPU minutes, respectively. The higher computation time for DRMI is due to the fact that, according to Equation (13), the partial derivative of c sim (X) with respect to X needs also to be computed.
In the second two columns of Table 1, we list the relative errors and the computation times for DRMI when the partial derivative of the top of atmosphere radiance ∂I(·, r t , ·)/∂X g is computed by taking into account only the contributions of the partial derivatives of the extinction field ∂σ ext (·, r)/∂X g . The reduction of the computation time by about 25% does not justify the large relative errors. Thus, the simplified approach suggested in [26] for the retrieval of cloud extinction field, seems not to work in the case of total column trace gas retrieval.
In the second two columns of Table 2, we list the relative errors and the computation times for DRME when the iterative process is stopped after one iteration, that is, when the linear model (14) is considered. A reasonable accuracy can be obtained when the regularization parameter α is optimally chosen. Some hints regarding the selection of the optimal value of the regularization parameter α opt can be found by analyzing the history of the residual curve in Figure 5. From this curve, we infer that for α = 10 −4 the residual is of about 10 −5 , for α = 10 −5 of about 10 −6 , and for α = 10 −6 of about 10 −8 . Therefore, by choosing α in the range 10 −6 , . . . , 10 −5 we expect good reconstruction errors. However, as it can be seen in Table 2, α opt depends on the cloud field. Moreover, for some cloudy scenes, α opt should be precisely determined (for example, in the case of the cloudy scene 8, the relative errors are 8.30 × 10 −2 , 1.87 × 10 −2 , and 9.02 × 10 −2 when α takes the values 1 × 10 −5 , 2 × 10 −5 , and 3 × 10 −5 , respectively). In this regard, it should be mentioned that although less inefficient, the iteratively regularized Gauss-Newton method is the best option, i.e., when the initial value of regularization parameter α 0 is chosen sufficiently large, accurate results will always be found in comparatively small number of iteration steps (which depends on the initial value α 0 and the ratio q).

Test Example 2
For the cloudy scenes considered in Figure 1, we illustrate, in Table 3, the relative errors and the computation times for DRME with several and one iteration steps. The solution accuracies are comparable to those corresponding to two-dimensional geometries. However, the computation time is extremely high; on average, it is of about 14 h and 15 min for DRME with several iteration steps, and 2 h and 30 min for DRME with one iteration step.
By taking a closer look at the cloudy scenes depicted in Figure 1, we see that the footprint of the detector is almost cloud-free. In this regard, we may try to retrieve the total column amount by using an one-dimensional radiative transfer model for a cloud-free atmosphere. The results given in the first two columns of Tables 4 and 5 show that in this case, the relative error is about 15%. In these simulations, the spectral signal measured by the instrument is computed by a three-dimensional radiative transfer model, while in the retrieval, the simulated spectral signal is computed by an one-dimensional radiative transfer model. The relative errors can be approximately halved when the differential correction spectrum due to three-dimensional effects (18) is included in the inverse model and its amplitude is retrieved. This is shown in the second two columns of Tables 4 and 5. Note that the computation time given in these tables does not take into account the time for computing the correction spectrum by a three-dimensional radiative transfer model. As shown in [27], this computation time is of about 1 h and 20 min, so that roughly speaking, this is the time for retrieving the total column amount by an one-dimensional inverse model that accounts on three-dimensional effects. The correction spectra included in the inverse model are shown in Figure 6, and it is apparent that they depend on the cloud field (some similitude can be observed for the cloudy scenes 2 and 7).

Discussion
The main concepts of an algorithm for the retrieval of the total column amount of trace gases in a multi-dimensional atmosphere have been presented. The algorithm uses (i) as inverse models, certain differential radiance models with internal and external closures; (ii) as regularization tool, the iteratively regularized Gauss-Newton method; and (iii) as linearized radiative transfer model, the spherical harmonics discrete ordinate method with a spectral acceleration approach that combines the correlated k-distribution method with the principal component analysis.
The algorithm has been applied to the retrieval of the total column amount of NO 2 under cloudy conditions. For two-dimensional geometries, we come to the following findings.

1.
The differential radiance models with internal and external closures yield accurate results with reasonable computation times (of about 35-40 min).

2.
An inverse model based on an approximate computation of the partial derivative leads to a reduction of the computation time by about 25%, but to large relative errors.

3.
Provided that the regularization parameter is optimally chosen, reasonable accurate results with a computation time of about 6 min can be obtained when the iterative process, corresponding to the differential radiance model with external closure, is stopped after one iteration step. The fact that the optimal value of the regularization parameter depends on the cloudy scene, makes it more difficult to apply this one-step retrieval algorithm, or equivalently, DOAS-type models.
For three-dimensional geometries, we come to the following results.

1.
Although accurate, the retrievals based on the differential radiance model with external closures are inefficient; the computation time is of about 14 h and 15 min for a full-step retrieval algorithm, and 2 h and 30 min for an one-step retrieval algorithm. The one-step retrieval algorithm is less accurate than in the case of two-dimensional geometries, but the results are still acceptable.

2.
The application of a fast one-dimensional radiative transfer model to retrieve the NO 2 column amount for a three-dimensional cloudy scene leads to relative errors up to 15%. These errors can be reduced when a differential correction spectrum due to three-dimensional effects is included in the retrieval.
The main conclusion of our analysis is that although for three-dimensional geometries the computational time is high, the main concepts of the algorithm are correct and the retrieval results are accurate.
For operational retrievals, we recommend 1.
to construct a database for the spectral signal I sim (λ k , X a ) and its derivatives ∂ ln I sim (λ k , X a )/∂X g , and to use these spectra in an one-step, three-dimensional retrieval algorithm, or 2.
to construct a database for the differential correction spectra accounting on threedimensional effects S 3D (λ k , X a ), and to use these spectra in a full-step, one-dimensional retrieval algorithm.
The databases should be built up for different cloud scenarios, surface albedo, and solar and satellite geometries. The main problem which has to be solved is the design of a classification algorithm for broken clouds. One option is to use a nearest neighbor classifier based on the two-dimensional principal component analysis. Another option is to use a classification algorithm in which only the distributions of the cloud fields in the two-dimensional domains around the azimuth planes of the solar and viewing directions are of interest. The width of these domains corresponds to the instrument footprint. Note that these domains are also relevant in the tilted independent pixel approximation [34]: the direct solar beam is computed at all grid points which belong to the vertical columns inside the instrument footprint and for all characteristics within the entrance (solar direction) domain, while the measured radiance is computed by integrating the source function along all characteristics within the exit (viewing direction) domain. This will be the topic of a forthcoming paper.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
In this appendix we describe two inversion models for tropospheric column retrieval. Assuming that the troposphere extends from level 1 to N t , and the stratosphere from level N t to N z , the tropospheric and stratospheric total columns are given, respectively, by and Obviously, we have X g = X tg + X sg . The retrieval of the tropospheric column of a gas g (specifically, NO 2 ) is performed under the assumption that we have some a priori knowledge about the stratospheric column. In other words, we suppose that X sg can be approximated by X sg ≈ X sg , where X sg is delivered for example, by the reference sector method. In this method, the stratospheric NO 2 columns is estimated from measurements over the remote regions, under the assumptions of low longitudinal variations of stratospheric NO 2 , and negligible tropospheric NO 2 [35][36][37]. Denoting by X −g the set of all total columns excepting X g , i.e., X = {X g } ∪ X −g , a tropospheric column retrieval algorithm can be summarized as follows [17].

1.
Solve the nonlinear Equation (8) of the DRMI inversion model for x = [X, b] T , or the nonlinear Equation (11) of the DRME inversion model for x = [X, b, c] T .

2.
With X sg delivered by the reference sector method, and X −g and b determined at Step 1, solve the nonlinear equation of the DRMI inversion model for x = [X tg ], or the nonlinear equation R δ mes (λ k ) = ln I sim (λ k , X tg , X sg , of the DRME inversion model for x = [X tg , c] T .

Appendix B
In this appendix we present a simplified derivation of the standard DOAS equation. For a more rigorous treatment, we refer to the work in [19].
We start with the linearized model (14) written as ln I sim (λ k , X) = N g ∑ g=1 X g W g (λ k , X a ) + ln I sim (λ k , X a ) − N g ∑ g=1 X ag W g (λ k , X a ) + P p (λ k , c p ), where W g (λ k , X a ) = ∂ ln I sim ∂X g (λ k , X a ).
In general, ln I sim (λ k , 0) can be approximated by a polynomial, but ln I sim (λ k , 0) not. However, if ε lin (λ k , −X a ) is negligible, then we have ln I sim (λ k , 0) ≈ ln I sim (λ k , 0); thus, ln I sim (λ k , 0) can be approximated by a polynomial, which can be included in P p (λ k , c p ). Consequently, we obtain ln I sim (λ k , X) = N g ∑ g=1 X g W g (λ k , X a ) + P p (λ k , c p ), so that by putting P(λ k , c) − P p (λ k , c p ) → P(λ k , c) we are led to the following representation for the DRME model (11), In view of the computational formula (cf. Equation (29)) W g (λ k , X a ) = ∂ ln I sim ∂X g (λ k , X a ) = 1 X ag N z ∑ j=1 C absg (λ k , z j )n ag,j ∂ ln I sim ∂σ gas absg,j (λ k , X a ) (A14) we define C absg (λ k ) = ∑ N z j=1 C absg (λ k , z j )n ag,j ∂ ln I sim ∂σ gas absg,j (λ k , X a ) ∑ N z j=1 n ag,j ∂ ln I sim ∂σ gas absg,j and rewrite Equation (A13) as where is the air mass factor. Equation (A16) represents the standard DOAS equation. Note that if the computation of the partial derivatives in Equation (A15) is an expensive process, we may define C absg (λ k ) by the relation C absg (λ k ) = ∑ N z j=1 C absg (λ k , z j )n ag,j ∑ N z j=1 n ag,j . (A18)