Fast Hyper-Spectral Radiative Transfer Model Based on the Double Cluster Low-Streams Regression Method

: Fast radiative transfer models (RTMs) are required to process a great amount of satellite-based atmospheric composition data. Speciﬁcally designed acceleration techniques can be incorporated in RTMs to simulate the reﬂected radiances with a ﬁne spectral resolution, avoiding time-consuming computations on a ﬁne resolution grid. In particular, in the cluster low-streams regression (CLSR) method, the computations on a ﬁne resolution grid are performed by using the fast two-stream RTM, and then the spectra are corrected by using regression models between the two-stream and multi-stream RTMs. The performance enhancement due to such a scheme can be of about two orders of magnitude. In this paper, we consider a modiﬁcation of the CLSR method (which is referred to as the double CLSR method), in which the single-scattering approximation is used for the computations on a ﬁne resolution grid, while the two-stream spectra are computed by using the regression model between the two-stream RTM and the single-scattering approximation. Once the two-stream spectra are known, the CLSR method is applied the second time to restore the multi-stream spectra. Through a numerical analysis, it is shown that the double CLSR method yields an acceleration factor of about three orders of magnitude as compared to the reference multi-stream ﬁne-resolution computations. The error of such an approach is below 0.05%. In addition, it is analysed how the CLSR method can be adopted for efﬁcient computations for atmospheric scenarios containing aerosols. In particular, it is discussed how the precomputed data for clear sky conditions can be reused for computing the aerosol spectra in the framework of the CLSR method. The simulations are performed for the Hartley–Huggins, O 2 A-, water vapour and CO 2 weak absorption bands and ﬁve aerosol models from the optical properties of aerosols and clouds (OPAC) database.


Introduction
The operational processing of remote sensing data requires fast radiative transfer models (RTMs), which simulate the radiance field scattered in the atmosphere. The high spectral resolution simulation in the gaseous absorption bands is a demanding task. As the gaseous absorption coefficient changes rapidly with wavelength, the accurate computations based on a fine resolution grid may require thousands of calls to monochromatic RTMs. To accelerate these computations, several techniques have been developed over the years. For instance, the correlated-k models [1][2][3][4] take into account that the mean radiance across a spectral range depends more on the distribution of the absorption coefficient than on its variation in the spectral range. The state-of-the-art acceleration techniques are based on predicting the spectrum by using a fast two-stream RTM (instead of relatively more time consuming multi-stream RTMs) and then refining the result by introducing a correction function. The latter can be estimated in the original basis of optical parameters, as in the low-stream interpolation (LSI) method [5], or in the reduced basis, as in some principal component analysis (PCA)-based RTMs [6][7][8][9][10][11][12][13]. Recently, we introduced the cluster lowstreams regression (CLSR) method, in which such a correction function is found in the space of radiances computed with a two-stream RTM [14]. This approach was applied to the simulations in the O 2 A and the weak CO 2 bands for different atmospheric scenarios, including aerosols and clouds. It was shown that the error of the CLSR method did not exceed 0.1%, while providing the speedup factor of about two orders of magnitude compared to the line-by-line (LBL) model. Here, we refer to LBL computations as the computations of the radiance field in a fine resolution grid. Therefore, to alleviate the computational burden of the fine resolution radiances, a reduction in the number of radiative transfer simulations is performed. Significant time can be saved by using the LBL model to precompute hyperspectral radiances in different atmospheric conditions, to store them in look-up-tables (LUT) for further use in the development of retrieval algorithms. In this sense, LUT are implemented for monochromatic radiances while other applications precompute the monochromatic transmittances or absorption cross sections (e.g., [15]).
Additional performance can be achieved by combining several acceleration techniques or by using an acceleration method twice. For example, in the double k-approach, the integration is performed across the total absorption optical depth and the absorption optical depth from the top of the atmosphere to the scattering layer [16]. The dimensionality reduction scheme based on PCA can be utilized twice, first, to the atmospheric optical properties and then to the radiance datasets. Such a technique has been applied for simulations in the UV range [12] and for the solar spectral range (the spectral data compression (SDCOMP) method [17]). In Molina García et al. [11], the correlated-k method was used in conjunction with the PCA-based RTM. A similar approach but for three dimensional computations was applied in Doicu et al. [18]. In Kopparla et al. [19], the PCA-based RTM was combined with a sort of spectral binning. Since the performance bottleneck of the CLSR method is the two-stream RTM used for the LBL computations, our intention is to examine the effect of the double application of the CLSR method.
In this paper, two modifications of the CLSR method are proposed. In the first modification, the two-stream spectra are computed by using the CLSR method and the single-scattering RTM as an approximate RTM. Thus, the CLSR method is applied twice. Such a scheme is referred to as double CLSR. In the second modification, the influence of the aerosols is modelled as a perturbation of the clear sky spectra. In this regard, a spectrum for actual aerosol conditions is estimated from a spectrum computed for clear sky conditions in the framework of the CLSR method.
The rest of the paper is organized as follows. In Section 2, we briefly outline the CLSR method and describe the double CLSR method, as well as an improvement for the aerosol scenarios. In Section 3, we present the results of the simulations and comparisons between the single CLSR and double CLSR methods in terms of accuracy and computation time. The obtained acceleration factors are compared with the state-of-the-art acceleration techniques. In addition, the efficiency of the improved aerosol treatment in the single and double CLSR method is analysed.

Data Overview
To check the efficiency of the proposed modifications of the CLSR method, we consider high spectral resolution computations of the top-of-the-atmosphere (TOA) radiances in the Hartley-Huggins (315 nm), O 2 A-band (760 nm), water vapour (885 nm) and CO 2 band (1610 nm), as in our previous work [12,14]. The information about absorption bands is summarized in Table 1. The corresponding absorption coefficients in the O 2 A-, water vapour and CO 2 bands are computed with the LBL model Py4CAtS [20], while the ozone absorption cross-sections in the Hartley-Huggins band are taken from the HITRAN 2016 database [21]. Hence, the absorption coefficients are pre-computed and stored. The Rayleigh scattering coefficients are modelled as proposed in [22]. The atmosphere assumed for the tests is the mid-latitude summer reference atmospheric model profile [23]. In our simulations, the atmosphere is discretized with a step of 1 km between 0 and 25 km, and a step of 2.5 km between 25 km and 50 km, resulting in 35 layers. The chosen altitude grid is used for the purpose of comparing the computational speed of different models. The boundary conditions at the bottom are defined by the Lambertian surface with an albedo of 0.3. The solar zenith angle, the viewing zenith angle and the relative azimuth angle are 45 • , 35 • and 90 • , respectively. The radiative transfer solver used in the study is based on the discrete ordinates with matrix exponential (DOME) method [24,25]. The number of discrete ordinates per hemisphere (N do ), often referred to as streams, controls the accuracy and performance of the computations. The RTM is called multi-stream when N do ≥ 2. For the specific case of N do = 1, the model is called two-stream. In the single-scattering approximation, the multiple scattering term of the radiative transfer equation is neglected and the solution can be derived analytically without using the discrete ordinate method. In calculations involving aerosol single-scattering phase functions, the delta-M scaling [26] and the TMScorrection [27] procedures are applied. The boundary value problem for the multilayer atmosphere is solved by using the matrix operator method [28], which merges layers into a single layer. The radiance along a viewing direction is computed by using the false discrete ordinate method [8,29].
The optical properties of aerosols are computed by using the optical properties of aerosols and clouds (OPAC) database [30]. The following aerosol types are considered: tropospheric, continental clean, urban, desert and continental polluted. For this study, clouds are not taken into account. The values of the aerosol optical depth (AOD), single scattering albedo (SSA) and asymmetry factor (g) are summarized in Figure 1. For this study, the atmospheric composition affects the accuracy results for the different aerosol types. Therefore, we have included several types of aerosols with different optical properties in order to test as many cases as possible.  The idea of the CLSR method is to obtain the spectrum corresponding to the reference RTM by using the approximate RTM and the regression model between approximate and reference RTMs [14]. The method can be summarized as follows: We consider a LBL spectrum {I TS (λ i )} N i=1 computed at N spectral points {λ i } N i=1 by means of the two-stream (TS) RTM. Then, the radiances are sorted in ascending order, obtaining the set {Î TS (λ i )} N i=1 , which is split into C clusters. In each cluster containing N C = N/C radiance points, we select n equidistant radiance points, and for the corresponding wavelengths, we compute the radiances {I c MS,q } n q=1 by using the multi-stream (MS) RTM. Assuming a regression model between TS and MS radiances within each cluster c, we obtain where α c , β c and γ c are the regression coefficients of the c-th cluster andT c i is the corresponding direct transmittance (T = exp(−τ aer ) with τ aer being the total AOD). Equation (1) can also be written in matrix form as follows: where The regression coefficients are found as a solution of the following least squares problem: (4) Finally, the MS radiances {Ĩ MS,i } N i=1 are restored at full spectral resolution using the TS radiances and the regression coefficients. The number of calls to the MS RTM equals to nC. Note that the TS spectra are computed in a LBL framework, imposing a performance bottleneck.

Double Cluster Low-Streams Regression Method
In the double CLSR method, the TS spectra are computed also by applying the CLSR technique. In this case, as an approximate model, we use the single-scattering (SS) RTM. Thus, the algorithm can be described as follows: Step 1: We compute the LBL spectra {I SS (λ i )} N i=1 by using the SS RTM and apply sorting and clustering to the space of SS radiances. Assuming a regression model between SS and TS radiances within each cluster z, we obtain with number of radiance points N Z = N/Z and cluster index z. The regression coefficients [a z , b z , d z ] are found as a solution of the following least squares problem where A = [a z , b z , d z ] and Y = Î z TS,i . By knowing the regression coefficients, the TS spectra at high spectral resolution.
Step 2: We apply the CLSR method as described in the previous section (Section 2.2.1) using the TS spectra computed in Step 1. Figure 2 shows a schematic representation of the CLSR and double CLSR methods. Note that the TS spectra derived at Step 1 by using the CLSR method differ from those computed by the TS RTM in a LBL manner. However, the possible bias obtained in the double CLSR method at Step 1 is removed by the regression model at Step 2.

CLSR Method: Improvement to Aerosol Scheme
It is important to include aerosols in the simulations, using a numerically efficient RTM to quantify the impact of aerosol scattering [31]. To compute the spectrum in the case of the atmosphere with aerosol, the CLSR and double CLSR methods can be applied. For this case, the regression model (Equation (2)) is used, in which and the original X-matrix is substituted by where the upper index 'aer' explicitly says that the computations are performed for the aerosol case. However, as shown in Figure 3, the spectra for the clear sky atmosphere (i.e., Rayleigh atmosphere) with and without aerosols have a similar spectral behaviour, i.e., the aerosol spectra depend almost linearly on the clear sky spectra. In this regard, alternative formulations of the CLSR method can be considered. For instance, taking the X-matrix as we obtain a method, which converts the MS clear sky spectra into spectra corresponding to the aerosol conditions. The upper index 'clear' indicates that the computations are performed for the clear sky case. The possible benefit of such a scheme is that the clear sky spectra can be precomputed and stored in LUTs and perform the computations offline, while the computations for actual aerosol properties can be performed online. Alternatively, we consider the X-matrix in the following form: In this case, the regression model is supplied with the first order perturbation computed by using the TS RTM. As a matter of fact, in this case, we do not expect performance enhancement compared to the X 0 -scheme. The question is if the error can be reduced by involving precomputed LUTs for clear sky cases as compared to the original X 0 -scheme. Note that for all these cases, the MS RTM for the aerosol scenarios is called for a few spectral points.

Single CLSR vs. Double CLSR: Accuracy Results
In this section, we compare the single and double application of the CLSR method in terms of accuracy. To assess it, we compute the relative error (residual) with respect to the continuum at each spectral point in an LBL manner, where I CLSR is the radiance calculated with either the CLSR method or the double CLSR method, I ref MS is the reference radiance with absorption, while I cont MS is the radiance in the continuum, i.e., without absorption, which is used to avoid values close to zero in the denominator, as in [19]. Other metrics used to estimate the accuracy of the simulations are the mean absolute relative error with respect to the continuum, the probability density functions and the cumulated probability. Figure 4 shows the probability density function of the spectral residuals ∆I res for both methods and the four spectral bands. The main conclusions that can be drawn from the figure are the following:

•
More than 70% and 60% of the residuals are below 0.01% for the single and double CLSR methods, respectively, for all bands, with the exception of the water vapour band. • The residuals of the water vapour band present a wider distribution in comparison with the other spectral bands. • The probability densities are almost indistinguishable for both acceleration methods, demonstrating that both techniques provide accurate results among the different spectral bands.
It can be seen that the accuracy of the CLSR method is slightly higher than that of the double CLSR method. This result can be expected, since the TS spectra used in the double CLSR method are approximate and obtained from the SS spectrum (see Figure 2).  Figure 5 shows the cumulated probability functions of the CLSR and double CLSR methods for all spectral bands. Over 90% residuals are less than 0.05% in the case of the Hartley-Huggins band and 0.01% in the case of the CO 2 band. Higher differences can be seen between the CLSR and double CLSR methods for the O 2 A-and water vapour bands. For these bands, over 90% of the CLSR residuals are less than 0.025%. Meanwhile, the double CLSR provides slightly larger errors: over 60% and 80% of the residuals are less than 0.05% for the O 2 A-and water vapour band, respectively. However, these residual values are still low and of the same order as those obtained by using the PCA-based RTMs. For instance, in Liu et al. [17] PCA was applied to optical parameters and spectral radiances yielding an error lower than 0.2% in the solar region (775-920 nm). Kopparla et al. [19] combined the PCA technique for optical parameters and the spectral binning for accurate computations in the case of aerosols. The residuals were below 0.01% for the O 2 A-band, which are of the same order as our results.
In our previous paper on the CLSR method [14], the convolved spectra were computed, showing errors of the same order as the non-convolved spectra. Here, the same conclusions apply for the double CLSR method.

Single CLSR vs. Double CLSR: Computational Performance
In this section, we analyse the computational performance of the single and double CLSR methods. Tables 2-4 show the number of calls to the SS, TS and MS RTMs, computational times and corresponding acceleration factors with respect to the MS LBL simulations. We recall that, as the reference model, the MS LBL RTM with 32 streams is used.
For the single CLSR method, 5 clusters and 4 regression points per cluster are used. Hence, computations for each absorption band involve 20 MS RTM calls, while the number of TS RTM calls is equal to the number of spectral points in the high-resolution LBL computations (i.e., 300, 20,000 and 40,000 calls to the TS RTM for the Hartley-Huggins, O 2 A-and CO 2 and water vapour band, respectively). As can be seen in Tables 2-4, the TS RTM imposes the computational burden of the single CLSR method for the O 2 A-, CO 2 and water vapour bands, consuming up to 70% of the whole computation time.   At the first step of the double CLSR method, 8 clusters and 4 regression points are used. Thus, the TS RTM is called for 32 spectral points. In the case of the double CLSR, the SS RTM is utilized for the LBL computations, resulting in an additional performance enhancement by 2 times for the O 2 A-and CO 2 bands and 3 times for the water vapour band. We note that in the double CLSR, the computational burden corresponds to the MS RTM, while the computation times related to the TS and SS RTMs are three and two orders of magnitude lower than those of the MS RTM, respectively. In the case of the Hartley-Huggins band, the computational burden is still due to the MS RTM [12] and the double CLSR does not further improve the performance.
As a final remark, the accuracy is crucial to determine the number of calls needed for the CLSR methods, and we could improve it by increasing the number of calls to the RTM models. However, this would add a computational burden to the simulations, while providing little improvement in the accuracy. Several tests have been performed by increasing the number of calls to the SS RTM or TS RTM but the errors are of the same order of magnitude as the actual values.

Computational Performance: State-of-the-Art Acceleration Techniques
In this section, we compare the CLSR and double CLSR methods against other stateof-the-art acceleration techniques. Table 5 summarizes the spectral bands/regions and the acceleration factors for the selected acceleration techniques, including this study. In this analysis, we consider the studies covering absorption bands in the 280-3000 nm spectral range.
Note that the acceleration factors depend not only on the method used, but also on other aspects, such as the number of discrete ordinates N do used for the reference RTM. In turn, the required number of N do depends on aerosol properties, surface properties, geometry and the required accuracy. Therefore, the acceleration factors are ambiguous, as different numbers of streams or atmospheric parameters are used for the reference RTM in the different studies. Nevertheless, the following conclusions can be made: Table 5. Selected acceleration techniques with the corresponding spectral region or band, acceleration factor (computed with respect to the LBL model) and their reference. Note that the acceleration factors are sometimes given in orders of magnitude compared to the LBL approach. One or two order of magnitude are indicated as 10× and 100×, respectively. The references are ordered chronologically.  (2) the radiance data set. b Nadir observations. c Glint observations. d Computation times are not considered, only the relative computational efficiency with respect to the accurate simulations. e Relative to the k-distribution method. * It is estimated from the information found in the reference, but the value does not appear explicitly.

Acceleration
1. For simulations in the Hartley-Huggins band, PCA techniques, linear embedding methods (LEM) and double CLSR have been applied. The double CLSR does not further improve the performance, since the computational burden is due to the MS RTM computations (see Section 3.2). The highest acceleration factor is provided by the method described in [12], in which PCA is applied to both optical parameters and spectral radiances. The performance enhancement in this case is up to 18 times. 2. There are several studies in which fast RTMs for the O 2 A-and CO 2 bands (either weak or strong) have been designed. In general, all considered techniques provide acceleration factors of about 2-3 orders of magnitude, including those based on artificial neural networks (NN) [37]. 3. The water vapour band represents a challenge for acceleration techniques due to its complicated spectral structure. Therefore, the accuracy of the acceleration techniques is lower than for the O 2 A-band. For this band, the double CLSR method provides an acceleration factor of about 3 orders of magnitude, while the k-distribution [32] and PCA-based RTMs [19] achieve lower acceleration factors, of one order of magnitude.

Further Improvements to Aerosol Schemes
In this section, the efficiency of the CLSR method for various configurations outlined in Section 2.3 is examined for several OPAC aerosol models. The results obtained using X 1 and X 2 (corresponding to Equations (9) and (10), respectively) are compared against the original CLSR method in which the matrix X 0 (Equation (8)) is used. The mean absolute relative errors are shown in Figure 6. In general, mean relative absolute errors are below 0.05% for all aerosol types and bands when using the original X 0 -configuration for the CLSR method. However, these errors are slightly higher for the water vapour band due to its higher spectral complexity. Almost in all cases, the X 1 configuration is less accurate than the original CLSR method. This result can be expected, as X 1 carries no information about the radiances of aerosols. However, for low aerosol load (tropospheric and clean aerosols), the errors are of the same order or below 0.05% compared to the ones from X 0 . The advantage of the X 1 configuration is that it is based on spectra computed for the clear sky atmosphere. The corresponding LUT, therefore, is independent on aerosol properties. By using the CLSR method, the spectrum for actual aerosol conditions can be computed by calling MS RTM at a few spectral points (in our case, 20 spectral points).
The X 2 configuration comprises X 0 , which contains the spectra corresponding to the TS RTM, and X 2 , which corresponds to TS and MS RTMs for clear sky scenarios. An improvement with respect to the original configuration is provided by the X 2 -configuration for the Hartley-Huggins and CO 2 bands, where absolute relative errors are of the same order as for X 0 -configuration or below 0.01%. For the O 2 A-band, there is almost no enhancement compared to the original configuration. We have excluded the transmittance in the numerical simulations for the CO 2 band in order to obtain slightly more accurate results.
In sum, we present two alternative configurations to the original X 0 , in order to obtain more accurate results for: (1) low aerosol loading conditions with the X 1 configuration for all bands; and (2) the Hartley-Huggins, water vapour and CO 2 bands with the X 2 configuration.

Combined Application of the Single CLSR vs. Double CLSR Method for Aerosol Scenarios
Finally, the single and double CLSR methods are tested for the full set of aerosol models and X i -configurations. The probability density functions of the residuals for the tropospheric aerosol model are shown in Figures 7 and 8.  The conclusions drawn from the comparison of the two figures can be summarized as follows: • The residual distributions of the single CLSR method are narrower than those of the double CLSR method, meaning that the single CLSR method is more accurate. However, in general, the residuals are below 0.01% for both methods and all spectral bands, except for the water vapour band, where the residual distributions are slightly wider and still below 0.05%. The distributions are not biased.
• For the Hartley-Huggins, O 2 A-and CO 2 bands, the residuals are below 0.05% for both single and double CLSR methods. Regarding the water vapour band, the residuals are below 0.05% and 0.1% for the single and double CLSR method, respectively. Similar accuracies were achieved in Kopparla et al. [19] for the water vapour band using the PCA-based RTM. • In the case of the low aerosol load, the probability density functions are similar for all X i -configurations. However, as the aerosol load increases, the residual distributions for the X 1 configuration provided by the single and double CLSR methods sometimes become biased for the water vapour band.

Conclusions
In this study, we have proposed two modifications to the CLSR method for fast computations of radiance spectra in absorption bands. In the first modification, the CLSR method is used twice, the first time for assessing the spectra corresponding to the TS RTM, and the second time for computing the spectra corresponding to the MS RTM. Our tests reveal that the double CLSR method further improves the computational performance of the original CLSR method by two (CO 2 and O 2 A-bands) and three times (water vapour band), yet keeping the error below 0.05% for all spectral points on a high resolution grid. The designed approach has been compared with the state-of-the-art techniques found in the literature. Although the acceleration factors are ambiguous, in our simulations, the double CLSR method seems to provide a slightly higher performance than the PCA-based RTMs.
The second modification of the CLSR method is proposed for the atmospheric scenarios involving aerosols. The LBL computations performed for clear sky scenarios can be reused in the CLSR method as an approximate spectrum instead of the TS RTM, which is then corrected by calling the RTM for the atmospheric model with aerosol, but in a few spectral points. Therefore, by using the configuration of the CLSR method only with the information of the clear sky spectra, we can reproduce the LBL spectra with low aerosol load. We showed that in the case of low aerosol load, such a configuration provides absolute relative errors below 0.05% for all spectral bands, while enhancing the computational performance by three orders of magnitude. Low aerosol loading conditions occur when the atmosphere is in practice mainly clean. As the aerosol load increases (e.g., for polluted aerosol, with AOD equal to 5), the error also increases and reaches ∼0.2% for water vapour band and ∼0.1% in the O 2 A-and CO 2 band. The new configurations have been tested for both CLSR and double CLSR methods, revealing similar probability density functions. These results have the potential to be applied in near-real-time applications of aerosol computations or in aerosol retrieval algorithms considering an appropriate aerosol model selection [38,39]. It also seems beneficial to implement the CLSR method in conjunction with gradient-based LUT generators [40,41], to reduce the size of LUTs and interpolation errors. Besides, the CLSR method provides an accurate alternative to the LUT interpolation, which combines the regression and online RTM simulations. Therefore, the single and double CLSR methods could potentially be incorporated into the generation of LUT radiance fields in a fine resolution grid.
Our future work will be focused on coupling the CLSR method and machine learning approaches. Recent studies have shown that the accuracy of hyperspectral radiative transfer simulations can be improved by applying machine learning techniques (e.g., see [37]). Essentially, NNs are used as universal approximators which replace time-consuming RTMs. Further studies will be undertaken to replace the regression model with an NN, within the framework of the CLSR method. By doing this, we expect to reduce the number of clusters and spectral points for which the MS RTM is called.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: