Insight into Construction of Tikhonov-Type Regularization for Atmospheric Retrievals

In atmospheric science we are confronted with inverse problems arising in applications associated with retrievals of geophysical parameters. A nonlinear mapping from geophysical quantities (e.g., atmospheric properties) to spectral measurements can be represented by a forward model. An inversion often suffers from the lack of stability and its stabilization introduced by proper approaches, however, can be treated with sufficient generality. In principle, regularization can enforce uniqueness of the solution when additional information is incorporated into the inversion process. In this paper, we analyze different forms of the regularization matrix L in the framework of Tikhonov regularization: the identity matrix L0, discrete approximations of the first and second order derivative operators L1 and L2, respectively, and the Cholesky factor of the a priori profile covariance matrix LC. Each form of L has its intrinsic pro/cons and thus may lead to different performance of inverse algorithms. An extensive comparison of different matrices is conducted with two applications using synthetic data from airborne and satellite sensors: retrieving atmospheric temperature profiles from microwave spectral measurements, and deriving aerosol properties from near infrared spectral measurements. The regularized solution obtained with L0 possesses a reasonable magnitude, but its smoothness is not always assured. The retrieval using L1 and L2 produces a solution in favor of the smoothness, and the impact of the a priori knowledge is less critical on the retrieval using L1. The retrieval performance of LC is affected by the accuracy of the a priori knowledge.


Introduction
An inverse problem is the process of using a given set of data to infer the parameters in a model that can characterize the given data. Rank-deficiency (insufficient information contained in the given data) and ill-posedness (no solution or a non-unique solution, and/or an unstable solution procedure) are often encountered when dealing with practical inverse problems in science and engineering, e.g., astronomy, geophysics, remote sensing, signal processing, etc. Both situations are characterized by a function which is ill-conditioned, i.e., the condition number of this function is very large. This leads to the primary difficulty that these inverse problems are practically underdetermined. Therefore, it is essential to incorporate additional information about the desired solution in order to stabilize the problem-solving process and to obtain a meaningful solution.
where the kernel K represents the underlying model that describes the relationship between g and f .
To solve such problems, discretization is often needed to obtain a numerical solution. The so-called Picard condition [1,2] puts requirements on the right-hand term g so that the solution f is square integrable. As a matter of fact, the solution f is extremely sensitive to perturbations of the right-hand term g, i.e., even a very small random perturbation to g may result in a significant deviation in f , and in this sense the inverse problem of inferring f from g is ill-posed. In atmospheric science, solving an inverse problem is a process of estimating geophysical quantities based on a given set of spectral measurements recorded by an instrument on various platforms [3][4][5]. A retrieval is often done by iteratively adjusting atmospheric state parameters associated with radiative transfer to mimic those spectral measurements; however, the atmospheric radiative transfer modeling can be a challenging task, since a number of assumptions have to be made and the computational process needs to be optimized using different numerical approaches [6][7][8].
The radiative transfer model cannot be inverted analytically. Instead, the inverse problem is an exercise in optimization, i.e., minimization of an objective function. A physically meaningful solution could be difficult to find because the governing equations tend to be mathematically ill-posed. The regularization procedure aims to stabilize the solution-finding process by introducing additional information into the objective function. In many numerical regularization methods, this is achieved by adding a term with constraints (so-called penalty term), such as restrictions for smoothness or bounds on the normed vector space (see Section 2.2 for mathematical expressions). Modern and sophisticated regularization approaches for computing regularized solutions to inverse problems in geoscience can be found in a number of textbooks, e.g., [1,[9][10][11][12].
The majority of deployed retrieval algorithms use classical methods e.g., Tikhonov regularization (TR) approaches [13,14], and statistical methods based on Bayes' theorem [15], e.g., the optimal estimation method (OEM) [5]. The principle difference between OEM and TR is the formulation of the penalty term: OEM introduces the statistical information of natural variability, whereas TR enforces the smoothness and/or impacts the magnitude of the solution. Eriksson [16] compared both methods theoretically and discovered no fundamental differences in the retrieval output. However, to find an optimal regularized solution to the underlying problem, the penalty term needs to be properly constructed.
The regularization used in atmospheric retrievals (of trace gas concentrations, temperature, etc.) reduces the noise effect with a cost in the reduction in effective resolution in terms of degrees of freedom for signal. TR-based methods need to deal with the selection of a proper regularization parameter which is frequently done on an ad hoc basis. Typical parameter selection approaches were outlined in [17,18]. Recently, new methods for satellite retrievals were proposed in [19,20]. Additionally, Koner and Drummond [21] analyzed the impact of the regularization strength using the regularized trust-region method.
The major objective of this study is to address the impact of the penalty term on the retrieval outcome using TR-based methods and to explore appropriate strategies in coping with different retrieval problems. This topic has not been heavily discussed, as in atmospheric remote sensing OEM has become a standard retrieval method by taking into account an estimate of the covariances associated with the spectral measurements and parameters of interest. Few studies focused on the choice of regularization matrix from a practical point of view. This study is accordingly intended to offer potential solutions to these problems. This paper is based on an analysis of the numerical performance of the TR-based retrieval methods with applications to the estimation of atmospheric properties. The main focus is on the regularization matrix and its impact on the retrieval outcome. The remainder of this paper is structured as follows: Section 2 presents theoretical concepts addressing forward and inverse problems in atmospheric remote sensing, for a TR-based inversion framework various designs of the regularization matrix are specifically described. In Section 3, the numerical performance of solving ill-posed inverse problems using different regularization matrices is investigated based on retrievals of temperature profiles and aerosol properties from an airborne microwave radiometer and a satellite hyperspectral sensor, respectively. Section 4 concludes with a summary of our findings.

Theory
This section deals with the theoretical concepts underlying atmospheric inversion. As an application of physics-based inversion algorithms, a retrieval of atmospheric state typically requires inverting the radiative transfer equation to derive geophysical variables of interest from spectral measurements.

Forward Problem
The attenuation of radiation through the atmosphere dI is proportional to the incoming radiation, the differential path distance ds in the direction Ω, and the composition of the atmospheric medium. The radiation detected by a sensor is described by the theory of radiative transfer. Generally, the radiative transfer equation can be formulated as: where I(r, Ω) is the diffuse radiance at position r and in the direction Ω = (µ, θ) characterized by the cosine of the zenith angle µ and the azimuth angle θ, σ ext (r) is the extinction coefficient, J(r, Ω) is the source function. For practical purposes, the radiative transfer can be treated with different techniques in different spectral regions: it is justified to neglect the thermal emission across the ultraviolet (UV), visible (VIS), and near infrared (NIR), while the scattering processes are usually neglected or simplified in the IR/microwave. Extensive discussions of the radiative transfer theory can be found in many textbooks, e.g., [22][23][24][25][26].

Radiative Transfer in the UV-VIS-NIR
A common way to compute the diffuse radiance in a spherical shell atmosphere is the pseudo-spherical model [27] that treats the multiple scattering in a plane-parallel atmosphere and the solar beam attenuation in a curved atmosphere. Thus, we have σ ext (r) = σ ext (r) and I(r, Ω) = I(r, Ω), Equation (2) is then rewritten as where the single and multiple scattering source terms are computed by and J ms = σ sct (r) 4π 4π p(r, Ω, Ω )I(r, Ω ) dΩ , respectively. σ ext and σ sct are the extinction and scattering coefficients (of gas molecules and aerosols), respectively; F sun stands for the solar irradiance at the top of atmosphere (TOA) before attenuation by the atmospheric medium; the incident solar direction is defined as Ω sun = (−µ sun , θ sun ) with µ sun > 0; τ sun ext (r, r TOA ) is the solar optical depth between point r and r TOA ; p(r, Ω, Ω ) is the effective scattering phase function with Ω and Ω being unit vector specifying the incident and scattering directions, respectively. Note that the thermal emission is neglected in this spectral domain. The dependence of p on the angle Θ between the incident and scattering directions allows for an expansion in terms of normalized Legendre functions P n : where cos Θ = Ω · Ω . p needs to take into account Rayleigh scattering by molecules (small particles) and Mie scattering by aerosols (large particles). Aerosol microphysical properties need to be taken into account and can be extracted from an aerosol model with a set of aerosol types. In practice, the discrete ordinate approach [28][29][30] has been widely employed to determine the solution to the radiative transfer equation. In particular, the range of zenith angles µ is divided to a number of discrete angular intervals, and I and p are replaced by a discrete set of direction vectors. By using the matrix exponential technique in [31] or the eigenvalue approach in [30], the integral in Equation (5) becomes then a sum, while the radiative transfer equation is converted into a set of coupled linear ordinary differential equations that are later equipped with corresponding boundary conditions.

Radiative Transfer in the IR/Microwave
In the IR and microwave spectral range, assuming a non-scattering medium in the state of local thermodynamic equilibrium (LTE), the radiance I at frequency f received by an instrument at position r can be represented by the Schwarzschild equation: where I( f ; r s ) is the background contribution at position r s , and B( f , T(s)) is the Planck function at temperature T. T stands for the monochromatic transmission and is computed via A simulation of high-resolution spectroscopic measurements usually requires line-by-line computations. The absorption coefficient σ abs is determined by the product of molecular absorption cross sections and number densities summed over the molecules; the absorption cross section k for each line l is obtained by summing over the contributions from many lines: where S l (T), g, andf l represent the line strength at temperature T, a normalized line shape function, and central frequency of transition l, respectively. The monochromatic frequency/wavenumber grid point spacing related to the half-widths of the spectral lines is commonly very fine so that a large number of spectral grid points can be needed for an accurate modeling of the spectrum. Furthermore, the Voigt line profile is conventionally used to simulate the combined effect of the pressure and Doppler broadening mechanisms throughout the atmosphere. Solving the Voigt and complex error function accurately and efficiently is challenging and requires an optimized implementation of numerical approximations [32,33].

Modeling of Instrumental Effect
The signal observed by an instrument is characterized by the associated instrument parameters. For any spectral domain, the limited spectral resolution of an instrument produces a smearing effect on the incoming spectrum. The instrument characteristics can be described by a spectral response function, and an accurate modeling requires the convolution of the monochromatic spectrum with this function.
The final aperture of a spectrometer leads to an additional widening of the ideal, infinitesimal narrow beam ("pencil beam") along the line of sight. This can be simulated by convolving the "pencil-beam" spectra with a field-of-view function.

Inverse Problem
As can be inferred from Section 2.1, the radiative transfer processes are nonlinear and rather complex, especially over strong absorption or scattering regimes, making it difficult to approximate them by using a particular class of functions, such as convex, logarithmic, quadratic, etc. In general, an inverse problem in atmospheric remote sensing can be formulated as follows: Let x ∈ R n be the state vector estimated from the "noisy" measurement vector y δ = y + δ ∈ R m , where y and δ represent the "exact" measurement vector and the noise vector, respectively. The objective function can be formulated as where . is the two-norm, the vector-valued forward model F : R n → R m is used to interpret the signal through the atmosphere measured by the instrument. On the right-hand side of Equation (10), the first term F(x) − y δ is the residual term that quantifies the goodness of fit. Hereafter, we define the residual vector R(x) as R(x) = F(x) − y δ . The residual term R(x) should be sufficiently small, albeit not smaller than the average size of the errors in y δ , otherwise only the noise in the data is fitted. The second term L(x − x a ) 2 is the penalty term that quantifies the regularity of the solution. The a priori vector x a is generally used as an estimate of the state before the retrieval is done. x a can be the same as the first guess x λ,0 , but is not always the case. L and λ denote the regularization matrix and parameter, respectively, which are the two distinct parts of regularization. λ determines the relative weight of the penalty term L(x − x a ) and the residual term R(x) ; while L determines the magnitude or smoothness of the solution.

Nonlinear Optimization
In principle, the minimization of Equation (10) can be reformulated as min and the corresponding Jacobian matrix K λ (x) is given by with K(x) being the derivative of F at x. For a physical interpretation of K(x), see examples from the two retrieval applications in Section 3.
We perform nonlinear optimizations by Newton-type methods employing a quadratic model M to approximate the objective function F around the current iterate: with d being the search direction. Here, the gradient G and the Hessian H of F are defined by and respectively, where Q is the second-order derivative term with H k as the Hessian of [R λ ] k . The optimization scheme is adapted from a trust-region approach [34], which seeks an optimal search direction d at the step i, yielding the new iterate x λ,i+1 = x λ,i + d λ,i . Let Γ i be the trust-region radius, the computation of d can be essentially formulated as an constrained minimization problem where M i is the quadratic model at the current iterate x λ,i . In case that the solution to the problem (17) is found on the boundary of the constraint region, the Lagrangian function is defined by With the first-order optimality conditions for Equation (18), we have where γ is the Lagrange multiplier for d 2 = Γ 2 i . With Equation (14) and a Gauss-Newton approximation to Equation (15), Equation (19) can be rewritten as The solution to Equation (20) is obtained by solving the least squares problem where d λ,i can be as a damped intermediate between the Newton direction (λ → 0) and the steepest descent direction −G(x λ,i ) (λ → +∞). A trust-region based method determines the acceptance of the current trial step d λ,i based on two quantities, that are • the estimated change in the quadratic model ∆ est In practice, the size of the next trust-region radius Γ i+1 can be updated using the ratio of ∆ λ,i to ∆ est λ,i .
The stopping tolerances are always required so that the iterative process can be terminated when a favorable "convergence" is reached, for instance, the regularized solution x λ is close to a local minimizer x * or F (x λ ) reaches an optimal F (x * ). Thus, the optimization scheme adopts two convergence tests: The X-convergence test checks whether the sequence x λ,i is converging and the change in x satisfies a predefined criteria; the relative-function-convergence test checks whether the relative change of F is within a predefined value.

Regularization Matrix
The design of TR is associated with the choice of λ and construction of L. The classical TR uses a constant regularization parameter λ and needs to determine a proper choice of λ in advance. Xu et al. [18] compared various parameter selection strategies for choosing λ and further advocated the usage of iterative regularization approaches (e.g., IRGN -the iteratively regularized Gauss-Newton method) by using a monotonically decreasing sequence of λ and an a posteriori stopping criterion. In this study, we focus on the impact of L on the retrieval outcome, and the influence of the λ choice will be not further discussed.
The formulation of L and its variants could have significant impact on the quality of the regularized solution. In practice, L imposes smoothing, ad hoc constraints, or climatological information. The state vector x includes the target parameter, and auxiliary parameter, if necessary.
The standard form of L is the identity matrix L 0 = I, aiming to control the magnitude of the solution. This is usually a suitable choice when nothing is known about the solution. However, L in this form frequently results in a lack of smoothness in the solution, which may not be preferable in atmospheric retrieval applications.
Alternatively, L can be a discrete approximation to first-order derivative operator, that is or Likewise, we can construct L using an approximation to the second-order derivative operator, i.e., or L 1 in Equation (22) and L 2 in Equation (24) impose regularization only when the null space of L does not overlap with the null space of K. The regularization matrices in Equations (23) and (25) are not singular and possess a regularizing effect in spite of the null space of K. L 1 and L 2 are therefore referred to as smoothing regularization matrices.
When some statistical information about the magnitude of the solution is available, the Cholesky factor of a given covariance matrix for the solution can be used: with the a priori profile covariance matrix S x where σ xi is the profile standard deviation and l i is the correlation length between the parameter at different altitudes z i . Considering an equidistant altitude grid and assuming l i = l for all i = 1, . . . , n, we find that L → L 0 as l → 0, and that L → L 1 as l → ∞.
Eriksson et al. [35] suggested a modified form based on a Gaussian correlation function In this form, the penalty term of TR is equivalent to the Bayesian approach (OEM). The OEM method is more sensitive to the state of the a priori information. Nevertheless, no fundamental differences in the solutions are expected between these two methods [16]. More focus is being placed on tackling an inverse problem with an optimal solution, when the simplicity is needed and the background information (e.g., a priori) is limited.

Results and Discussion
As pointed out previously, radiative transfer calculations are not easily approximated. It can be very hard to theoretically determine which retrieval algorithm performs better from real data in which an exact quantification of the relevant errors (the forward model and instrument model errors in the state space) becomes problematic. The retrieval experiments in this section were performed using synthetic measurements simulated with realistic measurement conditions and instrument parameters. In this case, measurement error correlations can be neglected.
The retrieval quality is essentially associated with the accuracy, the goodness of fit, and the useful information content extracted from measurements. The accuracy is characterized by the error between the retrieved and true states, while the goodness of fit can be represented by the residual.
Two retrieval experiments were carried out for analyzing different forms of L in the penalty term of the Tikhonov-type objective function. The main goal of the retrieval experiments is to find an optimal construction of the penalty term under measurement conditions. This can help us to optimize the retrieval configuration under realistic conditions and to better understand the instrument capabilities.

Temperature Profile Retrieval
At microwave frequencies, there are a variety of detectable oxygen lines around 60 GHz. Thermal emissions of oxygen molecules (O 2 ) in this particular spectral range are frequently utilized to derive atmospheric temperature profile from satellite and ground-based measurements, e.g., [36][37][38][39]. An airborne Microwave Temperature Profiler (MTP) [40,41] was initially developed at NASA's Jet Propulsion Laboratory (JPL), which has participated in several campaigns and the data has been used in a number of studies, e.g., [42][43][44]. The German Aerospace Center (DLR) has acquired a copy and designed as a wing-canister instrument that has been deployed on the High Altitude and Long Range Research Aircraft (HALO) [45]. For more details about measurement characteristics of DLR-MTP, see [46].
The DLR-MTP instrument used three microwindows covering strong oxygen absorption lines around 60 GHz. A complete measurement cycle was carried out with 10 viewing angles (five above the horizon, four underneath, and one towards the horizon). The MTP instrument parameters and measurement characteristics (LO frequencies, viewing angles, etc.) are listed in Table 1. The forward model F is described in Section 2.1.2. The state vector x comprised the vertical profile of atmospheric temperature, whereas the measurement vector y δ comprised 30 elements. The aircraft altitude was 10 km. The vertical altitude grid was discretized with constant steps of 400 m between 6 km and 14 km, 500 m between 5.5 km and 6 km and between 14 km and 14.5 km. Here, the Jacobian matrix K(x) contained the partial derivatives of the radiance vector with respect to the temperature at different altitude levels, i.e., K(x) = ∂F/∂x, with x = [T 1 , T 2 , . . . , T N ] and N being the number of altitude levels. In addition to O 2 , H 2 O, CO 2 , O 3 , CH 4 , N 2 O, CO, and CH 4 were considered in the forward model. The relevant molecular spectroscopic parameters were taken from the HITRAN (HIgh-resolution TRANsmission molecular absorption database) [47] compilation. The atmospheric profiles were derived from the Air Force Geophysics Laboratory (AFGL) model [48].
The temperature retrieval experiment was performed for the following two cases: Case 1 the true profile used as the a priori profile; Case 2 the scaled true profile used as the a priori profile.
The inverse problem in this case was considered to be severely ill-conditioned as the condition number was estimated to be larger than 10 6 . The least squares solution can be very sensitive to the perturbations of the measurements, and therefore, both the form of the penalty term (L and x a ) and the regularization strength (λ) appeared to be important. The IRGN method as a variant of TR was used in this experiment due to its superiority over the classical TR in selecting an optimal λ. Figure 1 shows the temperature retrieval performance using four regularization matrices, including the retrieved profile and the difference with respect to the true profile (the AFGL U.S. standard profile). In this case, the a priori profile was assumed to be perfect and the initial guess x 0 was set to a constant vertical profile of 220 K: x 0 = 220 , where x t is the true profile.   Table 2 lists the degree of freedom for signal (DOFS) that was interpreted as a measure of the effective information content. Apparently, the noisier the measurement was, the less effective information content the retrieval could obtain. Although the DOFS for L C was slightly lower than that for the other regularization matrices, there was no significant discrepancy between them. For these retrievals, the corresponding averaging kernels (not shown here) behaved in a very similar way: the kernels captured the peak at altitudes close to the observer and broadened out at altitudes far from the observer. Using the noise-free measurements, the retrieval using all of these regularization matrices reproduced the truth very well. It can be realized from Figure 1 that the retrieved profiles nearly overlapped with the true profile and their differences approached zero. When dealing with the noise-contaminated measurements, except for L 2 , the difference between the retrieved and true profiles was up to 3 K. Despite the lack of smoothness, the identity matrix L 0 demonstrated the ability of approximating the magnitude of the solution. On the contrary, L 1 and L 2 brought about more smoothness, but lower accuracy due to the measurement noise being evident in the retrieval using both matrices. The maximum difference of up to 5 K was produced by the retrieval using L 2 . As expected, the retrieval using L C produced the best result, indicating that the retrieval was not strongly influenced by the measurement noise if the a priori information was reliable.
In practice, reasonable a priori knowledge was often not easy to find. Two scenarios corresponding to two temperature profiles were tested. Figure 2 shows the temperature retrieval performance using four regularization matrices when the a priori profile and initial guess were assumed to be a scaled version of the true profile: Here, only the noise-free measurements were considered so that the retrieval error could largely reflect the impact from the regularization itself. The retrieved temperature profiles using L 0 approximated the true profiles at altitudes that were close to the aircraft altitude (∼10 km) and approached the a priori profile elsewhere. In both scenarios, the retrievals using L 1 and L 2 matrices generated smoother temperature profiles, and a better retrieval accuracy was given by L 1 . Similar to L 0 , L C could only capture the temperature information within a limited range from 8 to 12 km, but strong oscillations were found at lower and higher altitudes. Table 3 lists the corresponding root mean square error (RMSE) and DOFS. In agreement with Figure 2, the RMSE for L 1 and L 2 was smaller. Larger values of RMSE for L 0 and L C corresponded with the oscillating patterns observed at lower and higher altitudes. The DOFS did not show any distinguishable difference between the four matrices. Similarly, the averaging kernels peaked between 8 and 12 km where the retrieval relied on the measurement information. Less useful information content from the measurement could be extracted at lower altitudes.   As can be seen from both comparison cases, the retrieved temperature information at altitudes close to the aircraft altitude was more sensitive to the measurement information and less sensitive to the a priori information, while the retrieval at lower and higher altitudes (except for L 1 ) appeared to be insensitive to the measurement information and the a priori error turned out to be dominant. This limitation is associated with instrument characteristics, as confirmed in [46]. The difference from the retrieved profile to the true profile was larger at altitudes where the vertical distance to the aircraft was larger with a lower retrieval sensitivity.
The retrieval using L 0 performed very well when the a priori knowledge was reliable, but the intrinsic property of L 0 could make the resulting profile with unphysical oscillations. Likewise, the impact of the a priori knowledge was significant for a retrieval using L C that resulted in unconvincing retrieval in the case that the a priori error was large. Compared to L 0 and L C , both L 1 and L 2 imposed sufficient smoothness on the retrieved profiles. Concerning a retrieval without reliable a priori knowledge, the results of L 1 and L 2 seemed still reasonable, although the influence of the a priori error seemed more noticeable on the retrieval using L 2 . Overall, this retrieval experiment suggests that L 1 outperformed other choices.

Aerosol Retrieval
A satellite hyperspectral spectrometer called TROPOspheric Monitoring Instrument (TROPOMI) was developed to measure global distributions of trace gases, aerosols, and clouds. The instrument comprised four spectrometer modules that covers the UV-VIS (270-495 nm) and NIR (675-775 nm), and the shortwave IR spectral range (2305-2385 nm). The O 2 A-band (758-770 nm) was particularly designed to derive the height information of aerosols and clouds. Here, we carried out an experiment of aerosol properties retrieval from synthetic TROPOMI O 2 A-measurements. The relevant setup of aerosol microphysical properties and geometry/surface conditions is given in Table 4. The synthetic TROPOMI data were generated with a set of aerosol properties, i.e., the aerosol layer height (ALH) and optical depth (AOD). The forward model F is described in Section 2.1.1. In contrast to the profile retrieval in Section 3.1, the state vector x in this case had two components, i.e., ALH and AOD. In this case, the Jacobian matrix K(x) depicts the contribution of the aerosol properties to the radiance, i.e., K(x) = ∂F/∂x, with x = [X ALH , X AOD ]. The components of x in this section were different from those for the temperature profile retrieval in Section 3.1. Thus, L C calculated from the a priori profile covariance matrix was not available for comparison. The initial guess x 0 and a priori x a were assumed to be identical with 3 km and 1 corresponding to ALH and AOD, respectively. The inverse problem in the aerosol retrieval was considered to be mildly/moderately ill-posed as the order of magnitude of the condition number was about 2. A suitable form of the penalty term (L) was expected to be influential as compared to a cautiously-chosen regularization strength (λ). In this experiment, the classical TR with three values of λ (0.0001, 0.001, and 0.01) was employed. A better retrieval performance with a smaller λ would be anticipated.
The errors of ALH retrieval using the three regularization matrices are shown in Figure 3 in terms of the relative difference to the true value. In this figure and the following ones, the (x, y) coordinates represent the pairs of true values of ALH and AOD. The best retrieval result was obtained with the lowest λ and the relative error was essentially within 5%. The retrieval results using L 0 and L 1 were reasonable and the error became larger in the case of the measurements with lower AODs. Both were significantly better than that of L 2 with the larger λ. Underestimations were frequently found in the retrievals using all three forms of L when λ = 0.01, except for the measurements with lower ALHs. The retrieval error reached up to −60% from the measurements with larger ALHs. Besides the too strong regularization, this was caused by unreliable a priori information. The relative errors of AOD retrieval are displayed in Figure 4. Similar to the ALH retrieval, the retrieval performance of L 0 and L 1 was comparable and better than that of L 2 . Nevertheless, the difference of the AOD retrieval between the three forms of L was not recognizable compared to that of the ALH retrieval. When λ = 0.0001, the retrieval error of L 0 and L 1 was within 3% and smaller than that of L 2 (∼5%); when λ = 0.01, an error of up to 20% could be seen in the case using L 2 . The retrieved AODs were particularly overestimated from the measurements with lower AODs and higher ALHs. These results indicated a possible correlation between the retrieved AOD and ALH. In the case of higher ALHs, a negative correlation between ALH and AOD was noticed and turned out to be stronger for L 2 . For smaller values of λ, this correlation became weaker. According to the retrieval output in Figures 3 and 4, the retrieved quantities using L 0 and L 1 approximated the true state in most cases. The retrieval trended to be worse using larger values of λ as the a priori error was also notable. Regarding the ALH retrieval, the retrieval using L 1 was slightly better than that using L 0 , whereas L 0 marginally outperformed L 1 in the case of the AOD retrieval. The retrieval using L 2 achieved the acceptable accuracy with λ = 0.0001 but could produce the larger error with a factor of 2-3 with an inappropriate λ.
In addition to the retrieved aerosol parameters, Figure 5 depicts the corresponding residual after convergence. Unsurprisingly, for all the forms of L, the residual decreased while the λ decreased. As the ALH increased, the residual corresponding to L 2 considerably increased, whereas only a gradual increase was found in the residuals corresponding to the other two matrices. This increasing trend between the residual and AOD did not seem to be linear, especially in the case of lower AODs. In other words, the spectrum in the spectral range of interest seemed to be more influenced by the contribution of ALH. The residuals corresponding to L 0 and L 1 were comparable and apparently better than that corresponding to L 2 . In contrast to the temperature retrieval in Section 3.1, the aerosol retrieval dealt with two independent properties. The retrieval performance of L 0 was greatly improved as it managed to approximate the true magnitude of the solution. The retrieval using L 1 also produced satisfactory results even when the a priori state was far from the true state. However, the retrieval performance of L 2 was not convincing in the case that suboptimal choice of λ was used.

Conclusions
The penalty term of the Tikhonov-type objective function plays an important role in solving atmospheric retrieval problems. The goal of this paper was to give an insight into the impact of the regularization matrix. The design of this matrix can crucially affect the retrieval performance and greatly degrade the inversion accuracy.
We have performed retrievals of atmospheric temperature profile and aerosol properties using the identity matrix L 0 , discrete approximations of the first and second order derivative operators L 1 and L 2 , respectively. For the temperature profile retrieval, the Cholesky factor of the a priori profile covariance matrix L C has also been included. The pros and cons of each regularization form have been analyzed.
The temperature retrieval from an airborne microwave radiometer demonstrates that the inversion algorithms enable reliable retrieval only within a limited altitude range. This is mainly because of the measurement characteristics. Although reliable a priori information appears to be important, the retrieval performance of L 1 and L 2 is superior to that of L 0 and L C in case of large a priori errors.
The aerosol properties retrieval from a satellite hyperspectral spectrometer shows that the performance of L 0 and L 1 is evidently better than that of L 2 . The retrieval errors are within 5% while imposing weak regularization strength.
Based on the numerical experiments, we may draw the following conclusions: • L 0 has a good control on the magnitude of the solution; • L 1 imposes the smoothness of the solution with satisfactory retrieval accuracy even when the a priori information is not reliable; • L 2 also tends to produce a smoothed solution, but the retrieval accuracy can be degraded when the a priori information is not reliable; • L C associates with the a priori information.
Although the observation geometry and spectral range are drastically different, L 1 turns out to be favorable due to its "robustness" in both experiments. Nevertheless, a definite choice for the universal form of L may not exist, as the best one could always be problem dependent. From a practical point of view, the application of L 1 (and sometimes L 2 ) is beneficial when the desired solution (e.g., profile retrieval) is expected to be smooth. L 0 is a preferable choice when the desired solution prefers the magnitude.
Author Contributions: J.X. designed the study and prepared the original draft. J.X., F.S., D.S.E., and A.D. were involved in the development of the retrieval algorithms and radiative transfer models. J.X. and L.R. carried out the retrieval simulations. F.S., D.S.E., A.D., and T.T. contributed in many scientific discussions on the entire progress of this work. All co-authors read the manuscript and provided comments and suggestions. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the DLR programmatic [Nachwuchsgruppe "Retrieval der nächsten Generation", 2 472 469]. The contribution of L. Rao was funded by the Chinese Government Scholarship.