A GPU-Accelerated Radiation Transfer Model Using the Lattice Boltzmann Method

: A prototype of a three-dimensional (3-D) radiation model is developed using the lattice Boltzmann method (LBM) and implemented on a graphical processing unit (GPU) to accelerate the model’s computational speed. This radiative transfer-lattice Boltzmann model (RT-LBM) results from a discretization of the radiative transfer equation in time, space, and solid angle. The collision and streaming computation algorithm, widely used in LBM for ﬂuid ﬂow modeling, is applied to speed up the RT-LBM computation on the GPU platform. The isotropic scattering is assumed in this study. The accuracy is evaluated using Monte Carlo method (MCM) simulations, showing RT-LBM is quite accurate when typical atmospheric coefﬁcients of scattering and absorption are used. RT-LBM runs about 10 times faster than the MCM in a same CPU. When implemented on a NVidia Tesla V100 GPU in simulation with a large number of computation grid points, for example, RT-LBM runs ~120 times faster than running on a single CPU. The test results indicate RT-LBM is an accurate and fast model and is viable for simulating radiative transfer in the atmosphere with ranges for the isotropic atmosphere radiative parameters of albedo scattering (0.1~0.9) and optical depth (0.1~12).


Introduction
Radiative energy transfer plays an important role in many areas of science and technology.Accurate modeling of incoming solar radiation and outgoing infrared radiation from the Earth's surface and their interactions with atmospheric constituents has been among the most important tasks for the atmospheric sciences and remote sensing of environments.Compared with cloud-free and large-scale atmospherics, in which radiative transfer is usually treated one-dimensionally by assuming horizontal homogeneity, radiation transfer near the ground is a complex three-dimensional (3-D) phenomenon due to interactions with the Earth's surface features, such as terrains, buildings, soil, and vegetation.The mathematical description of electromagnetic radiation propagation is the radiative transfer equation (RTE) with associated emission, absorption, and scattering parameters and complex boundary conditions.Analytical solutions for real atmospheric conditions are not available.Due to the complexity and difficulties in solving the RTE, many numerical methods, such as the Monte Carlo method (MCM) [1], finite volume method (FVM) [2], and discrete ordinates method (DOM) [3], to name a few, have been developed.Proper choice of numerical method is dependent on the problem parameters and boundary conditions.The MCM is considered as a versatile and accurate method that can handle complex situations and is free from numerical errors such as ray effects, numerical smearing, and false scattering [1,3,4].Therefore, the MCM is often used as a benchmark tool for radiative transfer modeling.The drawback with the MCM is it requires a very large number of photons to be released to avoid statistical noise; therefore, it is computationally expensive.FVM and DOM have the advantages of ease and efficiency to set up the radiative transfer problem but have the problems of ray effects and slow convergence in small optical thickness situations.All the above numerical methods for solving the RTE demand a great deal of computation power for a large computational domain.Moreover, variations of scattering and absorption coefficients depend on atmosphere constituents in different wavelengths, which require multiple computations for different wavelengths, thus making the computation load even more formidable.
In recent years, a novel and powerful method, the lattice Boltzmann method (LBM), has been developed for solving the RTE.One of the motives for using the LBM in solving the RTE is to improve the computation speed of radiative transfer modeling.The LBM was first discovered and developed in the fluid mechanics community [5][6][7][8][9][10][11] and has become one of the most effective methods for fluid flow and heat transfer simulations.The LBM is based on the kinetic theory of statistical mechanics and solves the Boltzmann equation that governs the probability distribution of fluid particles.The LBM solves the Boltzmann equation for a particle at each grid point by performing collision and propagation calculations of the particle's probability distribution function (PDF) over a discrete and symmetric lattice mesh with certain fixed directions.The macroscopic variables, such as fluid density and velocity, are computed from the statistical moments of the particle PDF.The major advantages of the LBM in fluid modeling are its intrinsic parallelism and the ease with which it treats complex boundary conditions.Flow modeling using the LBM is as accurate as FV or finite elements (FE) methods, and has a computation speed several hundred times faster than FV and FE methods.
It is natural to explore the LBM as a computational method to accelerate the computation speed in solving the RTE because radiative transfer is one of most computationally intensive tasks in atmospheric modeling.The LBM can be considered a direct discretization of the Boltzmann equation [7].The earliest work of solving the RTE using the LBM by Geist et al. [12] was to investigate lighting in computer graphics with angular discretization with 19 directions and using a graphics processing unit (GPU) for computation [13].Because of the similarity between the RTE and Boltzmann equation [14], earlier research and development of the LBM for the RTE started with direct discretization of the RTE with respect of space, time, and angular direction, which is a valid approach [7,14].Asinari et al. and Mishra et al. [15,16] developed a two-dimensional (2-D) LBM for radiative transfer modeling in a particular medium.Ma et al. [17] derived an LBM used for a one-dimensional (1-D) radiation problem that compared well with an analytical solution.Bindra and Patil [18] and McCulloch and Bindra [19] also developed a 2-D LBM for the RTE for a simulation of the conjugated radiative and convective heat transfer problem.A general review on modeling neutron and photon transport using LBM is provided in [20].The LBM was also used in a non-equilibrium radiation transfer problem [21].Zhang et al. [22] and Yi et al. [23] derived a 2-D LBM using the Chapman-Enskog expansion for a steady-state radiative transfer problem that can deal with both thin and high optical depths.The LBM was used in a model for astronomical radiation transfer by Weih et al. [24].For a better treatment of the radiation source term, a multi-relaxation time LBM was developed by Liu et al. [25].McHardy et al. [26,27] developed a 3-D LBM model using a direct discretization of the RTE and the model produced accurate results for the ballistic radiation condition in which the medium scattering albedo is less than 0.7.An anisotropic case of Mie scattering was also computed and compared well with the LBM method [26].Mink et al. [28,29] developed a 3-D LBM method for high optical thickness situations based on the Chapman-Enskog expansion and a steady-state RTE was approximated by the Helmholtz equation and solved with the LBM.
The LBM with a GPU has shown to be very effective in numerical simulation of turbulent flow in urban environments with at least a 200 to 500 times speed-up (CPU/GPU time ratio) depending on the GPU type [30,31].Since radiative transfer is a very important component of energy transfer in the atmospheric boundary layer and the computation is very challenging, it is advantageous to exploit the LBM method with a GPU when solving the RTE.It is also beneficial to have the same computational methodology and grids set up for coupling our LBM flow model and the LBM radiative transfer model.
The objective of this study is to evaluate the accuracy and computation capability in a newly developed radiative transfer model using the lattice Boltzmann method, called RT-LBM.Specifically, we focus on RT-LBM's accuracy in simulating direct solar radiation with different incoming boundary conditions.The computation speeds using a GPU and a CPU are compared for different sizes of computational grid setups.The organization of this work is as follows: The second section describes the derivation of RT-LBM, radiation parameters, boundary conditions, and its computation method.The Monte Carlo (MC) radiative transfer model used for the comparison study is also described in this section.The third section presents the results of RT-LBM simulations of radiative transfer around buildings and compares the model results using the well-established MCM.The computation speeds of RT-LBM on a GPU are described and compared with CPU implementation.The final section gives a summary and discussion of applications of RT-LBM.

The Lattice Boltzmann Model for Radiative Transfer
Spectral radiance propagation in a scattering and absorbing medium is described by the following RTE: where L(x, n, t) is the radiance at spatial point x and time t that travels along unit vector n into the solid angle Ω with the speed of light c.µ a and µ s are the absorption and scattering coefficients of the medium, respectively; L b is the blackbody radiance of the medium; and Φ is the scattering phase function of the medium.S is other radiation source such as radiation from ground, road, and buildings in the atmospheric boundary layer.This term is epically important in the atmospheric boundary layer.It is also a very complex term which deserved a very careful and thorough study.Since this paper is focused on the solar radiation transfer, we neglected the source term hereafter in this paper.The integral term represents the radiation scattered from the other directions onto the volume surface.The spectral dependence is omitted since a participating medium with a specific wavelength band is considered in this paper.
According to a kinetic theory of radiative transport [14], the RTE can be written as the Boltzmann equation form using a probability distribution function (PDF), f of a virtual radiative particle or a photon [26,29].The relation between the PDF at a direction i ( f i (x, t)) of a virtual particle or photon and the radiance is expressed as where w i are the weights corresponding to the lattice directions (Figure 1).Neglecting the medium blackbody radiation source term for a much smaller magnitude in a clear atmospheric boundary layer, the RTE of Equation ( 1) can be written in following form: where c is the speed of light and c i = cn i in the finite directions.The Boltzmann form of the RTE can be discretized in space in specific lattice directions, i (Figure 1), and time, t, as follows [7,26]: where Φ is the discrete scattering matrix describing the probability that a photon is scattered from the i to j direction, and  are the weighting factors corresponding to the direction i.This function can be used for describing the anisotropic scattering by prescribing the elements of Φ .For the isotropic scattering considered in this work, Φ = 1.The computation domain is first divided into structured cubic grids.For each grid point (0 point in Figure 1), there are 26 lattice directions and neighbor points.The computational algorithm for RT-LBM takes typical collision and streaming operations for each time step.The collision operation is computed in the terms on the right hand of Equation ( 4), where the interactions, the scattering and absorption, of the photon with medium particles in every lattice direction are accounted for.The equilibrium PDF is computed as in Equation ( 9).In the streaming operation, the probability  ( +   ∆,  + ∆) in a grid point is propagated in every direction to neighbor grid points (1 to 26) for the next time step.The macroscopic radiative variables are computed from Equations ( 5) and (6).The time step ∆t is related to the lattice length ∆x and c, c = ∆x ∆t .With the above definitions, the macroscopic radiation quantities, I (radiative intensity) and J (radiation flux vector), are computed from the statistical moments of the particle PDF, f, which are resulted from following integral form equations providing the (2) as the connection.
It is important to point out that the equilibrium function f eq i in the collision term has a different mechanism in radiative transfer than in fluid flow.The f eq i in radiative transfer represents the interaction of photons with the surrounding medium rather than the equilibrium PDF of fluid particle collision in the LBM for fluid modeling.The f eq i is defined as follows: where Φ ij is the discrete scattering matrix describing the probability that a photon is scattered from the i to j direction, and w i are the weighting factors corresponding to the direction i.This function can be used for describing the anisotropic scattering by prescribing the elements of Φ ij .For the isotropic scattering considered in this work, The computation domain is first divided into structured cubic grids.For each grid point (0 point in Figure 1), there are 26 lattice directions and neighbor points.The computational algorithm for RT-LBM takes typical collision and streaming operations for each time step.The collision operation is computed in the terms on the right hand of Equation ( 4), where the interactions, the scattering and absorption, of the photon with medium particles in every lattice direction are accounted for.The equilibrium PDF is computed as in Equation ( 9).In the streaming operation, the probability f i (x + c i ∆t, t + ∆t) in a grid point is propagated in every direction to neighbor grid points (1 to 26) for the next time step.The macroscopic radiative variables are computed from Equations ( 5) and (6).
To keep the model non-dimensional for the comparisons and applications, the medium's scattering albedo, a, and optical depth, b, (non-dimensional parameters) are used instead of the coefficients of absorption and scattering.The characteristic length scale for the photon is l c = (µ a + µ s ) −1 , representing the length of a photon's free path between two consecutive scattering events.The relationship between these parameters is expressed as where l phy = 1 is a modeled normalized physical domain length.

The Monte Carlo Model of Solar Radiation
A MC model is used to evaluate RT-LBM.It tracks a plentiful luminosity packet (referred to hereafter as MC "photons") first and then counts them statistically for distribution of radiative intensity as a function of location, direction, and frequency.Each package carries energy L ∆t/N, where N is the number of MC photons.As a result, each MC photon represents L ∆t/(N hν) real photons, where hν denotes the energy of a real photon.
The MC model emits plentiful MC photons to mimic a radiation source.Each photon travels a distance s and then is scattered, absorbed, or re-emitted.The distance s is determined by where ξ is a random number between 0 and 1.After traveling the distance s, the photon is scattered if a new random number, ξ, is below a; otherwise, the photon is absorbed.The direction of scattering photon is described by the zenith angle θ and the azimuth angle φ.Since scattering is assumed to be isotropic in the model, θ and φ are chosen as [32] φ = 2πξ (13) The MC model uses (10) and (11) to simulate solar radiation that penetrates the model top downward.It emits 5 × 10 9 MC photons to mimic the incoming solar radiation and then tracks them in the atmosphere individually.Its statistical results are eventually used to obtain the distribution of radiative intensity.

The Computation Domains Setup and Boundary Conditions
The design of the 3-D modeling domains is shown in Figure 2. All three cubic domains have the same number of computational grid points (101 × 101 × 101) in the x, y, and z directions.In the GPU computation speed test (Section 3.3), two setups of computational grid points were made much more dense, 501 × 501 × 201, to evaluate the effect of the number of grid points on computation speed.

Results
RT-LBM is evaluated with the MC models, since high-density 3-D radiation field data for these kinds of simulation are not available for comparison.Although the MC model generally requires much more computation power, it has been proven to be a versatile All the incoming solar beam radiation is from the top boundary.The first is the incoming boundary which includes the entire top plane of the computational domain (Figure 2a), the second is the center window incoming boundary condition of the top boundary (Figure 2b), and the third (Figure 2c) is the window incoming boundary with oblique incoming direct solar radiation.A unit radiative intensity at the top surface is prescribed for direct solar radiation, f 6 = 1, f 13,14,17,18,19,22,24,25 = 0, for perpendicular beam f 13 = 1, f 6,14,17,18,19,22,24,25 = 0, for 45 • solar zenith angle beam (15)

Results
RT-LBM is evaluated with the MC models, since high-density 3-D radiation field data for these kinds of simulation are not available for comparison.Although the MC model generally requires much more computation power, it has been proven to be a versatile and accurate method for modeling radiative transfer processes [1,26,29].In the following validation cases, the same computation domain setups, boundary conditions, and radiative parameters were used in the RT-LBM and MC models.In these simulations, we set every variable as non-dimensional, including the unit length of the simulation domain in the x, y, and z directions.Normalized, non-dimensional results provide convenience for application of the simulation results.The model domain is a unit cube, with 101 × 101 × 101 grid points in these simulations except in Section 3.3.The top face of the cubic volume is prescribed with a unit of incoming radiation intensity.The rest of the boundary faces are black walls, i.e., there is no incoming radiation and outgoing radiation freely passes out of the lateral and bottom boundaries.

Direct Solar Beam Radiation Perpendicular to the Entire Top Boundary
Figure 3 shows the simulation results of the plane (Y = 0.5) with RT-LBM (left panel) and the MC model (right panel).In these simulations, the entire top boundary was a prescribed radiation beam with a unit of intensity and the other boundaries were black walls.The simulation parameters were a = 0.9 and b = 12, which is optically very thick as in a clouded atmosphere or atmospheric boundary layer in a forest fire situation [31].The two simulation methods produced similar radiation fields in most areas except the MCM produced slightly greater radiative intensity near the top boundary.Near the side boundaries, the radiative intensity values were smaller due to less scattering of the beam radiation near the black boundaries.This case is also simulated in Mink et al. [29] with their MC model.Figure 4 is a plot of the radiative intensities along the line at the center of the computation domain using these three models.The simulation results from the three methods compare well.First, the results from the two MC models agree well, which validates the correctness of our own MC model.There are small differences near the top boundary between RT-LBM and the MC models.The reason for over-estimation near the incoming boundary area is caused by a small effect of false anisotropic radiative transport in LBM where only the direct beam radiation is specified in the incoming boundary.However, after penetration of two times of the free path lengths, the diffuse radiation becomes dominant and the results are much closer to the MC.Since the optical depth is very high, the radiation intensity from the top boundary to the bottom boundary gradually has a two orders of magnitude reduction.The MC model produced a radiative intensity field that had very little fluctuation in the contour plots (Figure 2), indicating that the 10 9 photons release in this simulation is adequate for removing the statistical noise.

Direct Solar Radiation from a Top Boundary Window
In this case, a perpendicular incoming beam entered a window (0.2 × 0.2) in t dle of the top boundary (Figure 2b).The parameters (a = 0.9, b = 2) of the particu dium are comparable to episodes of heavily polluted atmosphere in some urban ar 35].The LBM simulation was also evaluated with our MC model and other MC [29] results.

Direct Solar Radiation from a Top Boundary Window
In this case, a perpendicular incoming beam entered a window (0.2 × 0.2) in the middle of the top boundary (Figure 2b).The parameters (a = 0.9, b = 2) of the particular medium are comparable to episodes of heavily polluted atmosphere in some urban areas [33][34][35].The LBM simulation was also evaluated with our MC model and other MC model [29] results.
Figure 5 compares our RT-LBM and the MC simulations.The results between the two models matched reasonably well except at the area at the top of the window.Again, the MC model produced slightly larger radiative intensity values near the radiation entrance window.The other area, away from perpendicular to the incoming window, also had much smaller values due to the scattering of the direct beam area for this relatively medium optical depth and large scattering albedo.Some difference between RT-LBM and the MC model was observed in these low-intensity areas.The RT-LBM-simulated slightly smaller values near the incoming radiation boundary are also reported in Mink et al. [29].Figure 6 compares the line samples in the z direction (Y = 0.5; X = 0.5, 0.75, 0.85) for RT-LBM, our MC model, and the other MC model [29] simulations.The simulations compare well in the centerline, excepting slight differences near the window area.The radiation intensity compares reasonably well but there are slightly more differences off the centerline.

Direct Solar Radiation from a Top Boundary Window
In this case, a perpendicular incoming beam entered a window (0.2 × 0.2) in the middle of the top boundary (Figure 2b).The parameters (a = 0.9, b = 2) of the particular medium are comparable to episodes of heavily polluted atmosphere in some urban areas [33][34][35].The LBM simulation was also evaluated with our MC model and other MC model [29] results.
Figure 5 compares our RT-LBM and the MC simulations.The results between the two models matched reasonably well except at the area at the top of the window.Again, the MC model produced slightly larger radiative intensity values near the radiation entrance window.The other area, away from perpendicular to the incoming window, also had much smaller values due to the scattering of the direct beam area for this relatively medium optical depth and large scattering albedo.Some difference between RT-LBM and the MC model was observed in these low-intensity areas.The RT-LBM-simulated slightly smaller values near the incoming radiation boundary are also reported in Mink et al. [29].Figure 6 compares the line samples in the z direction (Y = 0.5; X = 0.5, 0.75, 0.85) for RT-LBM, our MC model, and the other MC model [29] simulations.The simulations compare well in the centerline, excepting slight differences near the window area.The radiation intensity compares reasonably well but there are slightly more differences off the centerline.To analyze the effect of window size on radiative intensity, we conducted multiple runs with different window sizes (0.05 × 0.05, 0.2 × 0.2, and 0.9 × 0.9) but with identical radiation intensities prescribed at the incoming window.Figure 7 displays the X-Z cross section at Y = 0.5.The medium parameters, a = 0.5 and b = 0.1, resemble a typical clean atmosphere [36].In spite of more fluctuations, in the MC simulation results many more photon particles (10 12 ) were released (Figure 7, bottom panel) due to the low optical depth in clean air, and the RT-LBM and MC simulations compare reasonably well.The variation of the centerline radiative intensity values was less than 5% with the larger window size having a slight greater radiative intensity.In these cases of the lower scattering albedo (Figures 7 and 8), the differences between RT-LBM and the MC model are much smaller near the incoming boundary compared with the case of large scattering albedo (Figure 4).This analysis indicates that very clean atmospheric conditions can be parameterized without much error using a single computation given an identical boundary or very similar boundary conditions.To analyze the effect of window size on radiative intensity, we conducted multiple runs with different window sizes (0.05 × 0.05, 0.2 × 0.2, and 0.9 × 0.9) but with identical radiation intensities prescribed at the incoming window.Figure 7 displays the X-Z cross section at Y = 0.5.The medium parameters, a = 0.5 and b = 0.1, resemble a typical clean atmosphere [36].In spite of more fluctuations, in the MC simulation results many more photon particles (10 12 ) were released (Figure 7, bottom panel) due to the low optical depth in clean air, and the RT-LBM and MC simulations compare reasonably well.The variation of the centerline radiative intensity values was less than 5% with the larger window size having a slight greater radiative intensity.In these cases of the lower scattering albedo (Figures 7 and 8), the differences between RT-LBM and the MC model are much smaller near the incoming boundary compared with the case of large scattering albedo (Figure 4).This analysis indicates that very clean atmospheric conditions can be parameterized without much error using a single computation given an identical boundary or very similar boundary conditions.To analyze the effect of window size on radiative intensity, we conducted multiple runs with different window sizes (0.05 × 0.05, 0.2 × 0.2, and 0.9 × 0.9) but with identical radiation intensities prescribed at the incoming window.Figure 7 displays the X-Z cross section at Y = 0.5.The medium parameters, a = 0.5 and b = 0.1, resemble a typical clean atmosphere [36].In spite of more fluctuations, in the MC simulation results many more photon particles (10 12 ) were released (Figure 7, bottom panel) due to the low optical depth in clean air, and the RT-LBM and MC simulations compare reasonably well.The variation of the centerline radiative intensity values was less than 5% with the larger window size having a slight greater radiative intensity.In these cases of the lower scattering albedo (Figures 7 and 8), the differences between RT-LBM and the MC model are much smaller near the incoming boundary compared with the case of large scattering albedo (Figure 4).This analysis indicates that very clean atmospheric conditions can be parameterized without much error using a single computation given an identical boundary or very similar boundary conditions.Another situation, of solar direct beam radiation oblique to the level ground surface, is simulated.The atmospheric optical parameters of a clean air (a = 0.5, b = 0.1) situation were used.The motivation for this simulation was to look into whether direct solar radiation decreases when the solar ray is not perpendicular to the top boundary surface.The incoming solar zenith angle was set to 45° from the west and the incoming direct solar radiative intensity was set to one.The RT-LBM and MC simulations compare reasonably well (Figure 8).The decaying of the solar radiative intensity along the centerline of the perpendicular to window (Figure 7) and oblique to the window at 45° shows small differences between the perpendicular and oblique cases.This comparison analysis indicates that a perpendicular-to-window simulation can be used to approximate the radiative intensity for different solar elevation angles, which has a very significant implication for reducing the radiation computation in a situation where the atmospheric optical parameters are constant and vertically homogenous.Another situation, of solar direct beam radiation oblique to the level ground surface, is simulated.The atmospheric optical parameters of a clean air (a = 0.5, b = 0.1) situation were used.The motivation for this simulation was to look into whether direct solar radiation decreases when the solar ray is not perpendicular to the top boundary surface.The incoming solar zenith angle was set to 45° from the west and the incoming direct solar radiative intensity was set to one.The RT-LBM and MC simulations compare reasonably well (Figure 8).The decaying of the solar radiative intensity along the centerline of the perpendicular to window (Figure 7) and oblique to the window at 45° shows small differences between the perpendicular and oblique cases.This comparison analysis indicates that a perpendicular-to-window simulation can be used to approximate the radiative intensity for different solar elevation angles, which has a very significant implication for reducing the radiation computation in a situation where the atmospheric optical parameters are constant and vertically homogenous.Another situation, of solar direct beam radiation oblique to the level ground surface, is simulated.The atmospheric optical parameters of a clean air (a = 0.5, b = 0.1) situation were used.The motivation for this simulation was to look into whether direct solar radiation decreases when the solar ray is not perpendicular to the top boundary surface.The incoming solar zenith angle was set to 45 • from the west and the incoming direct solar radiative intensity was set to one.The RT-LBM and MC simulations compare reasonably well (Figure 8).The decaying of the solar radiative intensity along the centerline of the perpendicular to window (Figure 7) and oblique to the window at 45 • shows small differences between the perpendicular and oblique cases.This comparison analysis indicates that a perpendicular-to-window simulation can be used to approximate the radiative intensity for different solar elevation angles, which has a very significant implication for reducing the radiation computation in a situation where the atmospheric optical parameters are constant and vertically homogenous.

GPU Implementation of RT-LBM and Computational Speed
In solving the complex RTE, RT-LBM uses the typical collision and streaming steps that the LBM used to solve fluid simulations [30,31].By writing the Equation (4) in the following form, The complex scattering and absorption processes in the RTE are computed within each node (Figure 1) by a photon-medium collision process (Equation ( 16)), while a streaming process of photons is handled by simple linear propagation between the neighbor grid nodes (Equation ( 17)).This makes the LBM very effective within the massive parallelization of the GPU architecture.Modern GPUs have thousands of compute cores and thus are capable of running thousands of threads simultaneously.In the LBM algorithm, the data locality is satisfied and time-stepping is explicit.Each computation grid cell in the domain is assigned to a GPU thread.The entire computation procedure described in Section 2.1 can be summarized in following pseudo-code: 1.
Set up the radiation parameters and computation grids; 2.
For each iteration time step t do; 3.
For each lattice node x do; 4.
If node x is a boundary point then; 5.
Apply the boundary conditions; 6.
The model code is written in NVidia CUDA (Common Unified Device Architecture).In our implementation, the 3-D computation grids are mapped to 1-D memory.In GPUs, threads execute in lockstep in group sets called warps.The threads within each warp need to load memory together in order to use the hardware most effectively.This is called memory coalescing.In our implementation, we manage this by ensuring threads within a warp are accessing consecutive global memory as often as possible.For instance, when calculating the PDF vectors in Equation ( 15), we must load all 26 lattice PDFs per grid cell.We organize the PDFs such that all the values for each specific direction are consecutive in memory.In this way, as the threads of a warp access the same direction across consecutive grid cells, these memory accesses can be coalesced.
A common bottleneck in GPU-dependent applications is transferring data between main memory and GPU memory.In our implementation, we are performing the entire simulation on the GPU and the only time data need to be transferred back to the CPU during the simulation is when we calculate the error norm to check the convergence.In our initial implementation, this step was conducted by first transferring the radiation intensity data for every grid cell to main memory each time step and then calculating the error norm on the CPU.To improve performance, we only check the error norm every 10 time steps.This leads to a 3.5× speedup over checking the error norm every time step for the 101 3 domain case.This scheme is sufficient, but we took it a step further, implementing the error norm calculation itself on the GPU.To achieve this, we implement a parallel reduction to produce a small number of partial sums of the radiation intensity data.It is this array of partial sums that is transferred to main memory instead of the entire volume of radiation intensity data.
On the CPU, we calculate the final sums and complete the error norm calculation.This new implementation only results in a 1.32× speedup (101 3 domain) over the previous scheme of checking only every 10 time steps.However, we no longer need to check the error norm at a reduced frequency to achieve similar performance; checking every 10 time steps is only 0.057× faster (101 3 domain) than checking once a frame using GPU-accelerated calculation.In the tables below, we opted to use the GPU calculation at 10 frames per second but it is comparable to the results of checking every frame.
Tables 1 and 2 list the computational efficiency of our RT-LBM.A computational domain with a direct top beam (Figures 2 and 3) was used for the demonstration.In order to see the domain size effect on computation speed, the computation was carried out for different numbers of the computational nodes (101 × 101 × 101 and 501 × 501 × 201).The RTE is a steady-state equation, and many iterations are required to achieve a steady-state solution.These computations are considered to converge to a steady-state solution when the error norm is less than 10 −6 .The normalized error or error norm ε at iteration time step t is defined as: where I is the radiation intensity at grid nodes, n is the grid node index, and N is the total number of grid points in the entire computation domain.The single-thread CPU computation using a FORTRAN version of the code, which is slightly faster than the code in C, is used for the computation speed comparison.The speed of the RT-LBM model and MC model in a same CPU are compared for the first case only to demonstrate that the MC model is much slower than the RT-LBM.RT-LBM in the CPU is about 10.36 times faster than the MC model from the first domain setup using the CPU.A NVidia Tesla V100 (5120 cores, 32 GB memory) was run to observe the speed-up factors for the GPU over the CPU.The CPU used for the RT-LBM model computation is an Intel CPU (Intel Xeon CPU at 2.3 GHz).For the domain size of 101 × 101 × 101, the Tesla V100 GPU showed a 39.24 times speed-up compared with single CPU processing (Table 1).It is worthwhile noting the speed-up factor of RT-LBM (GPU) over the MC model (CPU) was 406.53 (370/0.91)times if RT-LBM was run on a Tesla V100 GPU.For the much larger domain size, 501 × 501 × 201 grid nodes (Table 2), the RT-LBM in the Tesla V100 GPU had a 120.03 times speed-up compared with the Intel Xeon CPU at 2.3 GHz.These results indicated the GPU is even more effective in speeding up RT-LBM computations when the computational domain is much larger, which is consistent with what we found with the LBM fluid flow modeling [30].We are in the process of extending our RT-LBM implementation to multiple GPUs which will be necessary in order to handle even larger computational domains.
The computational speed-up of RT-LBM using the single GPU over CPU is not as great as in the case of turbulent flow modeling [30], which showed a 200 to 500 speed-up using older NVidia GPU cards.The reason is turbulent flow modeling uses a timemarching transient model, while RT-LBM is a steady-state model, which requires many more iterations to achieve a steady-state solution.Nevertheless, the GPU speed-up of 120 times in RT-LBM is significant for implementing radiative transfer modeling which is computationally expensive.
The model code is also tested for the grid dependency by computing the radiation field in a same domain using three different grid densities.Figure 9 shows the radiation intensities in three different grid densities (101 3 , 201 3 , and 301 3 computation grids).The convergence criteria were set to be 10 −5 for the error norm.The LBM model produced very similar radiation fields that are hard to see the differences visually.This fact indicates that 10 −5 error norm for convergence criteria is probably good enough for this model domain situation.The convergence behaviors of the different densities of computation grids were also recorded at each iteration step and plotted in Figure 10.The three different grid densities showed a similar trend in convergence behavior with respect to iteration time.The iteration time to reach a same small error norm of 10 −6 for denser grids requires many more steps of iterations.In these simulations, the Courant number are all equal to one.Numerical stable algorithm was assured and also observed.The model code is also tested for the grid dependency by computing the radiation field in a same domain using three different grid densities.Figure 9 shows the radiation intensities in three different grid densities (101 3 , 201 3 , and 301 3 computation grids).The convergence criteria were set to be 10 −5 for the error norm.The LBM model produced very similar radiation fields that are hard to see the differences visually.This fact indicates that 10 −5 error norm for convergence criteria is probably good enough for this model domain situation.The convergence behaviors of the different densities of computation grids were also recorded at each iteration step and plotted in Figure 10.The three different grid densities showed a similar trend in convergence behavior with respect to iteration time.The iteration time to reach a same small error norm of 10 −6 for denser grids requires many more steps of iterations.In these simulations, the Courant number are all equal to one.Numerical stable algorithm was assured and also observed.The model code is also tested for the grid dependency by computing the radiation field in a same domain using three different grid densities.Figure 9 shows the radiation intensities in three different grid densities (101 3 , 201 3 , and 301 3 computation grids).The convergence criteria were set to be 10 −5 for the error norm.The LBM model produced very similar radiation fields that are hard to see the differences visually.This fact indicates that 10 −5 error norm for convergence criteria is probably good enough for this model domain situation.The convergence behaviors of the different densities of computation grids were also recorded at each iteration step and plotted in Figure 10.The three different grid densities showed a similar trend in convergence behavior with respect to iteration time.The iteration time to reach a same small error norm of 10 −6 for denser grids requires many more steps of iterations.In these simulations, the Courant number are all equal to one.Numerical stable algorithm was assured and also observed.The unit conversion in LBM fluid modeling is complex [37].However, in this steadystate radiative transfer modeling, the time step is only for the iteration computation and there is no problem to map the non-dimensional variables to variables' units.Since the LBM-RT in this paper is a steady-state problem, only conversions are needed between physical length and non-dimensional length, and the scattering and absorption coefficients and non-dimensional parameters a and b (a scattering albedo, b optical depth) can be transformed using Equations (10) and (11).The radiation intensity can be converted to a physical unit by multiplying the value of incoming boundary intensity with a physical unit.

Discussion and Conclusions
This paper reported a newly developed radiative transfer model using the lattice Boltzmann method, RT-LBM, for applications in atmospheric environments.The test results indicated the new RT-LBM has reasonably accurate results compared with traditional MC models.The model takes advantage of the LBM algorithms of collision and streaming to accelerate the computation speed.The implementation of RT-LBM using the GPU has realized a computation speed-up of 120 times faster than a CPU implementation for a very large domain.RT-LBM also had a 10 times speed-up over the MC model for a same radiative case on the same CPU, which makes a total of a 406 times speed-up for RT-LBM on a GPU over the MC model on a CPU.
The atmospheric environment is a complex composite of many different gases, aerosols, and hydrometers, and the composition is very dynamic.The optical parameters are often very different for different wavelengths of radiation.In atmospheric radiative transfer modeling, many runs for different spectral lengths with different optical parameters must be made to complete the entire radiative energy transfer domain.Since radiative modeling is computationally intensive, the newly developed RT-LBM provides advantages.However, many research areas, such as complex boundary specification, anisotropic scattering by large aerosols, and optical parameters specification, need to be carried out to realize the potential of this new method for specific applications.Some applications, such as for solar energy, are feasible with RT-LBM using broadband optical parameters to reduce the complexity.In this case, solar radiation can be divided into two spectral bands, shortwave and longwave.Two different sets of bulk optical parameters can be used for solar shortwave radiation and longwave radiation from the ground surface.

Figure 1 .
Figure 1.D3Q26 lattice used in RT-LBM.The numbered arrows are the lattice directions of the photon propagation to neighbor lattice nodes.

Figure 1 .
Figure 1.D3Q26 lattice used in RT-LBM.The numbered arrows are the lattice directions of the photon propagation to neighbor lattice nodes.

Atmosphere 2021 , 15 Figure 2 .
Figure 2. Three types of incoming radiation boundaries (a-c) and setups for the simulations.The red vertical planes are the Z-X cross sections at Y = 0.5, which are plotted in the Results section.

Figure 2 .
Figure 2. Three types of incoming radiation boundaries (a-c) and setups for the simulations.The red vertical planes are the Z-X cross sections at Y = 0.5, which are plotted in the Results section.

Figure 3 .
Figure 3.Comparison of the simulation results from RT-LBM (left panel) and the MC mod panel).The X-Z cross sections (Y = 0.5) are from the 3-D radiative intensity fields.The radia parameters are a = 0.9 and b = 12.

Figure 4 .
Figure 4. Comparison of the radiative intensity along the Z lines (X = 0.5, Y = 0.5) for RT-LB MC model, and the MC model from Mink et al. (2020).The radiative parameters are a = 0.9 12.
compares our RT-LBM and the MC simulations.The results between models matched reasonably well except at the area at the top of the window.Ag MC model produced slightly larger radiative intensity values near the radiation e window.The other area, away from perpendicular to the incoming window, a much smaller values due to the scattering of the direct beam area for this relativ dium optical depth and large scattering albedo.Some difference between RT-LB the MC model was observed in these low-intensity areas.The RT-LBM-simulated smaller values near the incoming radiation boundary are also reported inMink et

Figure 3 .
Figure 3.Comparison of the simulation results from RT-LBM (left panel) and the MC model (right panel).The X-Z cross sections (Y = 0.5) are from the 3-D radiative intensity fields.The radiative parameters are a = 0.9 and b = 12.

Figure 3 .
Figure 3.Comparison of the simulation results from RT-LBM (left panel) and the MC model (right panel).The X-Z cross sections (Y = 0.5) are from the 3-D radiative intensity fields.The radiative parameters are a = 0.9 and b = 12.

Figure 4 .
Figure 4. Comparison of the radiative intensity along the Z lines (X = 0.5, Y = 0.5) for RT-LBM, the MC model, and the MC model from Mink et al. (2020).The radiative parameters are a = 0.9 and b = 12.

Figure 4 .
Figure 4. Comparison of the radiative intensity along the Z lines (X = 0.5, Y = 0.5) for RT-LBM, the MC model, and the MC model from Mink et al. (2020).The radiative parameters are a = 0.9 and b = 12.

Figure 5 .
Figure 5. Windowed simulation results from RT-LBM (left panel) and the MC model (right panel).The X-Z cross sections (Y = 0.5) are from the 3-D radiative intensity fields.The radiative parameters are a = 0.9, b = 2.

Figure 5 . 15 Figure 5 .
Figure 5. Windowed simulation results from RT-LBM (left panel) and the MC model (right panel).The X-Z cross sections (Y = 0.5) are from the 3-D radiative intensity fields.The radiative parameters are a = 0.9, b = 2.

Figure 7 .
Figure 7. Different window size effects on the direct solar radiation intensity.The top row are from RT-LBM simulations.The bottom row are from MC model simulations.The radiative parameters are a = 0.5, b = 0.1.

Figure 8 .
Figure 8. Oblique incoming solar direct beam radiation simulation case.Comparison of the radiative intensity at X-Z cross section at Y = 0.5.for RT-LBM and the MC model.The radiative parameters are a = 0.5, b = 0.1.

7 . 15 Figure 7 .
Figure 7. Different window size effects on the direct solar radiation intensity.The top row are from RT-LBM simulations.The bottom row are from MC model simulations.The radiative parameters are a = 0.5, b = 0.1.

Figure 8 .
Figure 8. Oblique incoming solar direct beam radiation simulation case.Comparison of the radiative intensity at X-Z cross section at Y = 0.5.for RT-LBM and the MC model.The radiative parameters are a = 0.5, b = 0.1.

Figure 8 .
Figure 8. Oblique incoming solar direct beam radiation simulation case.Comparison of the radiative intensity at X-Z cross section at Y = 0.5.for RT-LBM and the MC model.The radiative parameters are a = 0.5, b = 0.1.

Figure 9 .
Figure 9.The plots of vertical (Y = 0.5) cross-sections of radiation intensities with different computation grid densities (left: 101 3 , center: 201 3 , and right: 301 3 ).The radiation parameters (a = 0.5, b = 0.1) and the domain sizes are the same for these three runs.

Figure 10 .
Figure 10.The convergence behaviors of the model simulations with three different grid densities.

Figure 9 .
Figure 9.The plots of vertical (Y = 0.5) cross-sections of radiation intensities with different computation grid densities (left: 101 3 , center: 201 3 , and right: 301 3 ).The radiation parameters (a = 0.5, b = 0.1) and the domain sizes are the same for these three runs.

Figure 9 .
Figure 9.The plots of vertical (Y = 0.5) cross-sections of radiation intensities with different computation grid densities (left: 101 3 , center: 201 3 , and right: 301 3 ).The radiation parameters (a = 0.5, b = 0.1) and the domain sizes are the same for these three runs.

Figure 10 .
Figure 10.The convergence behaviors of the model simulations with three different grid densities.Figure 10.The convergence behaviors of the model simulations with three different grid densities.

Figure 10 .
Figure 10.The convergence behaviors of the model simulations with three different grid densities.Figure 10.The convergence behaviors of the model simulations with three different grid densities.