Hybrid Scene Structuring for Accelerating 3D Radiative Transfer Simulations

: Three-dimensional (3D) radiative transfer models are the most accurate remote sensing models. However, presently the application of 3D models to heterogeneous Earth scenes is a computationally intensive task. A common approach to reduce computation time is abstracting the landscape elements into simpler geometries (e.g., ellipsoid), which, however, may introduce biases. Here, a hybrid scene structuring approach is proposed to accelerate the radiative transfer simulations while keeping the scene as realistic as possible. In a ﬁrst step, a 3D description of the Earth landscape with equal-sized voxels is optimized to keep only non-empty voxels (i.e., voxels that contain triangles) and managed using a bounding volume hierarchy (BVH). For any voxel that contains triangles, within-voxel BVHs are created to accelerate the ray–triangle intersection tests. The hybrid scheme is implemented in the Discrete Anisotropic Radiative Transfer (DART) model by integrating the Embree ray-tracing kernels developed at Intel. In this paper, the performance of the hybrid algorithm is compared with the original uniform grid approach implemented in DART for a 3D city scene and a forest scene. Results show that the removal of empty voxels can accelerate urban simulation by 1.4 × ~3.7 × , and that the within-voxel BVH can accelerate forest simulations by up to 258.5 × .


Introduction
Understanding and modeling the radiative behavior at Earth's surface is essential for a wide range of scientific domains including agriculture, forestry and urban microclimate. The applications vary from structural [1] and biophysical parameter retrieval [2] of vegetation, to urban radiation flux studies [3]. For vegetation monitoring, much attention has been paid to radiative transfer modeling and numerous physical models have been developed in the past decades to describe the interactions between solar radiation and complex canopies [4]. Compared to one-dimensional (1D) models (e.g., SAIL [5]), which treat the canopy as a horizontally homogeneous medium, three-dimensional (3D) radiative transfer (RT) models can take into account the highly heterogeneous Earth surfaces. Accurate and efficient 3D models are essential tools for deriving accurate parameters from remote sensing measurements [6], as well as for validating satellite-based geo-physical products [7]. This is notably true as 3D information is more and more available with the development of 3D acquisition technology. For example, LiDAR (Light Detection and Ranging) is more and more used in forest studies for measuring the vegetation 3D attributes [8,9], which are important input parameters for 3D radiative transfer models. However, a drawback of 3D RT models is that they are more difficult to parameterize, especially for some LUT (look up table)-based inversion methods of remote sensing acquisitions. For example, hundreds of thousands of simulations and representatives of different environmental and instrumental conditions may be needed to build an invertible LUT database [10]. Thus, the consideration of computational efficiency becomes crucial in these kinds of applications. In particular, 3D RT models should take advantage of the techniques developed in the computer graphics domain to optimize the computation. The poor efficiency is mainly due to the fact that RT models are designed to simulate remote sensing acquisitions that are very accurate in terms of radiometry, and not simply realistic from the human point of view, conversely to most computer graphics algorithms.
In 3D RT models, the complex landscape elements are described with triangle meshes, geometrical objects and turbid media. Triangle meshes can be used to simulate very precise 3D structures (e.g., position of each leaf), while turbid media can be used to simulate the canopy structure with statistical parameters (e.g., LAD: leaf area density and LOD: leaf orientation distribution). To simulate large and complex scenes, the triangle and turbid elements in 3D space must be organized into efficient computer data structures to avoid unmanageable computer constraints in terms of volume memory and computation time [11], e.g., the algorithm that tests all objects one by one to determine if an intersection occurs is too computationally expensive to simulate complex scenes [12]. Most of the RT models rely on a space subdivision technique to avoid some redundant computations. With space subdivision, objects which are located in a subspace (or voxel) which is not intersected by a ray can be directly excluded from the intersection tests, which can significantly increase the efficiency when handling scenes with a large number of elements. The space subdivision often partitions the space into hierarchical subspaces (or voxels), where the scene elements are located [13]. This very important technique is heavily used in some ray-tracing-based (discrete-ordinate or Monte-Carlo) RT models, for achieving computationally efficient ray-object intersections [14][15][16].
Up until now, various space subdivision approaches have been implemented, including uniform grid [17], BSP (binary space partitioning) tree [18], octree [13] and BVH (bounding volume hierarchies) [19] etc. Because the performances of these approaches usually depend on the specific scene structures, no single technique can be universally the best [20]. The uniform grid is well suited for homogeneous scenes, i.e., landscape elements are relatively uniformly distributed in space, while other approaches, such as BVH, are better for inhomogeneous scenes. These facts lead us to develop a hybrid scene structure which can combine the advantages of the uniform grid and BVH tree, which is also self-adaptive to different scene structures.
In this paper, we present a hybrid scene structuring scheme that is designed to accelerate the radiative transfer modeling in the Discrete Anisotropic Radiative Transfer (DART) model, because the latter one simulates the radiative budget and remote sensing images of Earth landscapes using the uniform grid. DART is one of the most comprehensive physical-based models [6]. It can simulate the 3D radiative transfer budget and various remote sensing acquisitions (e.g., LiDAR, spectroradiometer and solar-induced fluorescence, etc.) over the whole optical domain, including atmosphere. Compared to other models, DART can work on scenes that are simulated with any combinations of triangles and fluids, including the so-called turbid medium that is often used for giving a statistical representation of vegetation. This newly designed structuring scheme combines the advantages of uniform grid and BVH, which enables DART to simulate larger scenes under higher spatial resolutions with lower computational cost. This paper is structured as follows. Section 2 presents the new hybrid scene structuring algorithm and its implementation in DART. In Sections 3 and 4, the hybrid algorithm is compared to the original DART, in terms of accuracy, computer time and memory usage. Section 5 concludes this paper and gives the outlook for future improvements.

Uniform Grid in Discrete Anisotropic Radiative Transfer (DART)
DART uses the discrete-ordinate ray-tracing method to simulate the propagation and interaction of radiation in the "Earth-Atmosphere" scene ( Figure 1), which is represented as a 3D array of rectangular voxels. The tracked radiation is along N directions that discretize the 4π space. Each direction Ω i is associated with a solid angle ∆Ω i with N i=1 ∆Ω i = 4π sr [21]. The 3D scene contains two major parts: the atmosphere and the Earth scene. Its usefulness is illustrated here with its so-called "reflectance mode", where the sun is the only radiation source. At the beginning, the propagation of radiation in the atmosphere is simulated [22], which gives the direct sun irradiance and diffuse atmospheric radiation that reach the voxels of the top of the Earth scene at the bottom of atmosphere (BOA). Then, this so-called BOA radiation illuminates the Earth scene. A typical Earth scene can usually contain landscape elements, such as trees, grass, houses and water. These elements are represented with any combination of triangles, fluids and turbid media.

Uniform Grid in Discrete Anisotropic Radiative Transfer (DART)
DART uses the discrete-ordinate ray-tracing method to simulate the propagation and interaction of radiation in the "Earth-Atmosphere" scene ( Figure 1), which is represented as a 3D array of rectangular voxels. The tracked radiation is along N directions that discretize the 4 space. Each direction is associated with a solid angle with ∑ = 4 =1 sr [21]. The 3D scene contains two major parts: the atmosphere and the Earth scene. Its usefulness is illustrated here with its socalled "reflectance mode", where the sun is the only radiation source. At the beginning, the propagation of radiation in the atmosphere is simulated [22], which gives the direct sun irradiance and diffuse atmospheric radiation that reach the voxels of the top of the Earth scene at the bottom of atmosphere (BOA). Then, this so-called BOA radiation illuminates the Earth scene. A typical Earth scene can usually contain landscape elements, such as trees, grass, houses and water. These elements are represented with any combination of triangles, fluids and turbid media.  Figure 1. Radiative transfer modeling in the "Earth-Atmosphere" scene of the Discrete Anisotropic Radiative Transfer (DART) model. The scene of DART is divided into two parts: Earth and Atmosphere. Earth is represented with regular voxels while Atmosphere is represented with different horizontal layers.
As presented in Figure 1, DART partitions the Earth scene into equal-sized voxels, which are used to organize the landscape elements (triangles, fluids and turbid media). Equal-sized voxels are mandatory for simulating the 3D radiative budget (e.g., absorbed radiation) of Earth's landscapes. For each triangle, a triangle-voxel intersection test [23] is performed to determine the voxel that contains it. The turbid medium is just a set of descriptive statistical parameters (e.g., LAD and LOD for vegetation), and it usually fills the whole voxel. To store this information in memory for radiation tracking, a list of voxels is first created. The length of this list is equal to the number of voxels in the scene. For each voxel, DART stores a list of references of all the triangles which intersect this voxel and a list of turbid media that are in the voxel ( Figure 2).
The radiative transfer with the Earth scene (landscape) starts by tracing rays from the BOA illumination plane to the scene. Figure 2 illustrates the case of a ray that enters a voxel at the entry point A and then crosses five voxels. When a ray enters a voxel, ray-triangle tests are performed sequentially for all the triangles inside the voxel. Intersection occurs for the closest triangle that As presented in Figure 1, DART partitions the Earth scene into equal-sized voxels, which are used to organize the landscape elements (triangles, fluids and turbid media). Equal-sized voxels are mandatory for simulating the 3D radiative budget (e.g., absorbed radiation) of Earth's landscapes. For each triangle, a triangle-voxel intersection test [23] is performed to determine the voxel that contains it. The turbid medium is just a set of descriptive statistical parameters (e.g., LAD and LOD for vegetation), and it usually fills the whole voxel. To store this information in memory for radiation tracking, a list of voxels is first created. The length of this list is equal to the number of voxels in the scene. For each voxel, DART stores a list of references of all the triangles which intersect this voxel and a list of turbid media that are in the voxel ( Figure 2). enter, because the exit face of the current voxel is also the entry face of the next voxel. It is a major advantage of the uniform grid partitioning technique. However, this step-by-step algorithm cannot skip empty spaces (e.g., voxel 4, 8 and 9 in Figure 2), which may lead to many redundant computations in highly heterogeneous scenes with large empty gaps. This uniform voxel approach may also consume a lot of memory, since all the voxels are identical and need to be created, except that some of them store a link to a list of triangles or turbid media.

Voxel-Level Ray Tracking
DART uses the voxel as the basic unit to calculate the transferred energy in the Earth scene. This approach is also adopted in our new hybrid scene approach, since voxels can also support turbid medium, which makes it possible to simulate triangle and turbid medium simultaneously in a single scene. The voxels in the 3D space are categorized into three groups: non-empty voxel, empty voxel, and null voxel ( Figure 3). The non-empty voxel is the voxel that contains triangles or fluids/turbid media. The empty voxel contains nothing, but they are needed to set up the source rays and store radiation that exits the Earth scene during the ray-tracing process. Therefore, they only exist on the side of the scene. For null voxels, they neither contain any triangles or turbid media nor store any energy, thus, they do not need to be created in the scene, which can save computer memory, especially when the 3D scene contains many empty spaces (e.g., landscape with topography). Null voxels only exist inside the scene.
To skip the empty spaces (null voxels) during the ray-tracing method, the empty and non-empty voxels are organized using a BVH tree, which is well adapted to non-uniform geometry distributions. The BVH used in this study is a binary tree that structures the bounding volumes of all the voxels into a hierarchy structure. As shown in Figure 3b, it first creates a root node that represents the bounding volume of the whole scene. Then, all the voxels are divided into two sides around the midpoint along the longest scene axis (e.g., the x axis). For each side, a bounding volume containing the voxels inside it is created. In the BVH tree, these two bounding volumes are the child nodes of the root. This partition process is recursively repeated until the child node only contains a small number of voxels (e.g., 1 or 2).
To track rays in this hybrid scene, the algorithms of uniform grid and BVH are combined. This is illustrated here with a ray that enters a voxel (i.e., voxel 2) on the side of the simulated earth scene The radiative transfer with the Earth scene (landscape) starts by tracing rays from the BOA illumination plane to the scene. Figure 2 illustrates the case of a ray that enters a voxel at the entry point A and then crosses five voxels. When a ray enters a voxel, ray-triangle tests are performed sequentially for all the triangles inside the voxel. Intersection occurs for the closest triangle that intersects the incident ray. Usually, the order of the intersections is not known, thus a sorting algorithm is performed, which may require large computation time if too many triangles are inside the voxel. If the ray intersects a triangle, scattering is computed using the optical properties of this triangle. If present, turbid media is taken into account, both for transmittance and scattering, including within voxel single and multiple scattering. If no triangle intersection is found, the ray crosses the voxel, and an exit point (e.g., point B in Figure 2) will be calculated by a ray-plane intersection algorithm [24]. Then, it is very straightforward to find the next voxel that the ray will enter, because the exit face of the current voxel is also the entry face of the next voxel. It is a major advantage of the uniform grid partitioning technique. However, this step-by-step algorithm cannot skip empty spaces (e.g., voxel 4, 8 and 9 in Figure 2), which may lead to many redundant computations in highly heterogeneous scenes with large empty gaps. This uniform voxel approach may also consume a lot of memory, since all the voxels are identical and need to be created, except that some of them store a link to a list of triangles or turbid media.

Voxel-Level Ray Tracking
DART uses the voxel as the basic unit to calculate the transferred energy in the Earth scene. This approach is also adopted in our new hybrid scene approach, since voxels can also support turbid medium, which makes it possible to simulate triangle and turbid medium simultaneously in a single scene. The voxels in the 3D space are categorized into three groups: non-empty voxel, empty voxel, and null voxel ( Figure 3). The non-empty voxel is the voxel that contains triangles or fluids/turbid media. The empty voxel contains nothing, but they are needed to set up the source rays and store radiation that exits the Earth scene during the ray-tracing process. Therefore, they only exist on the side of the scene. For null voxels, they neither contain any triangles or turbid media nor store any energy, thus, they do not need to be created in the scene, which can save computer memory, especially when the 3D scene contains many empty spaces (e.g., landscape with topography). Null voxels only exist inside the scene.
To skip the empty spaces (null voxels) during the ray-tracing method, the empty and non-empty voxels are organized using a BVH tree, which is well adapted to non-uniform geometry distributions. The BVH used in this study is a binary tree that structures the bounding volumes of all the voxels into a hierarchy structure. As shown in Figure 3b, it first creates a root node that represents the bounding volume of the whole scene. Then, all the voxels are divided into two sides around the midpoint along the longest scene axis (e.g., the x axis). For each side, a bounding volume containing the voxels inside it Remote Sens. 2019, 11, 2637 5 of 15 is created. In the BVH tree, these two bounding volumes are the child nodes of the root. This partition process is recursively repeated until the child node only contains a small number of voxels (e.g., 1 or 2). 21). For that, the BVH algorithm tests if the ray intersects the bounding volume of the root node. If there is an intersection, the algorithm tests the occurrence of the intersection with child nodes. At the leaf node, the ray will test the ray-voxel intersection with all the voxels, one by one, inside the node using a ray-box intersection algorithm [25]. This binary search scheme is very efficient for finding voxels that are far from the current voxel. However, it is less efficient if the next cell is very close, since the BVH top-down algorithm searches from the root node to leaf node for ray. Thus, as long as a ray meets non-null voxels, the grid tracking method is used. This mechanism makes the model to choose the tracking scheme automatically, and it can adapt to different scene structures without user intervention.

Within-Voxel Ray Tracking
In the hybrid structuring scheme, the triangles are not directly managed by BVH but organized by voxels. When a ray hits a voxel, the algorithm tests the intersection with all the triangles one by one. For some applications (e.g., large voxel size), each voxel may contain many triangles (e.g., 100 triangles). Hence, testing triangle intersections one by one may also slow down the radiative transfer process. Thus, if the number of triangles in a voxel exceeds a threshold, a BVH tree (so called withinvoxel BVH) is created inside this voxel to organize all the triangles directly. In that case, during ray tracking, the test of the triangle intersection is calculated with the BVH tree instead of the plain list of triangles. Figure 4 shows a simple 1 m × 1 m scene, which contains only one voxel with N randomly distributed triangles. The number N takes the value 1, 5, 10, 50, 100, 200, 400 or 800. The illumination density is set to 100 rays per square meter, which gives a total number of 100 rays. The total intersection time is measured, which is shown in Figure 4b. It can be seen that the total time increases linearly with the increase of N if the within-voxel BVH algorithm is not used, while it is nearly constant if the within-voxel BVH is used. When N is as low as 10, the time is nearly the same. It explains that the number 10 is used as a threshold to indicate whether a BVH tree should be created in a voxel, which avoids creating BVH trees for voxels that contain only a few triangles (i.e., less than 10). This approach is useful because the BVH tree also consumes memory. Another advantage of the within-voxel BVH is that it can overcome the floating-point limitation of Embree by offsetting triangle coordinates with voxel coordinates, especially when handling very large scenes, which is very important for the application of the 3D radiative transfer model in satellite products' validation. To track rays in this hybrid scene, the algorithms of uniform grid and BVH are combined. This is illustrated here with a ray that enters a voxel (i.e., voxel 2) on the side of the simulated earth scene (Figure 3a). Then, the computation of the exit point gives the next possible voxel (i.e., voxel 8). If the next voxel is a null voxel, the BVH algorithm is used to determine the next non-null voxel (i.e., voxel 21). For that, the BVH algorithm tests if the ray intersects the bounding volume of the root node. If there is an intersection, the algorithm tests the occurrence of the intersection with child nodes. At the leaf node, the ray will test the ray-voxel intersection with all the voxels, one by one, inside the node using a ray-box intersection algorithm [25]. This binary search scheme is very efficient for finding voxels that are far from the current voxel. However, it is less efficient if the next cell is very close, since the BVH top-down algorithm searches from the root node to leaf node for ray. Thus, as long as a ray meets non-null voxels, the grid tracking method is used. This mechanism makes the model to choose the tracking scheme automatically, and it can adapt to different scene structures without user intervention.

Within-Voxel Ray Tracking
In the hybrid structuring scheme, the triangles are not directly managed by BVH but organized by voxels. When a ray hits a voxel, the algorithm tests the intersection with all the triangles one by one. For some applications (e.g., large voxel size), each voxel may contain many triangles (e.g., 100 triangles). Hence, testing triangle intersections one by one may also slow down the radiative transfer process. Thus, if the number of triangles in a voxel exceeds a threshold, a BVH tree (so called within-voxel BVH) is created inside this voxel to organize all the triangles directly. In that case, during ray tracking, the test of the triangle intersection is calculated with the BVH tree instead of the plain list of triangles. Figure 4 shows a simple 1 m × 1 m scene, which contains only one voxel with N randomly distributed triangles. The number N takes the value 1, 5, 10, 50, 100, 200, 400 or 800. The illumination density is set to 100 rays per square meter, which gives a total number of 100 rays. The total intersection time is measured, which is shown in Figure 4b. It can be seen that the total time increases linearly with the increase of N if the within-voxel BVH algorithm is not used, while it is nearly constant if the within-voxel BVH is used. When N is as low as 10, the time is nearly the same. It explains that the number 10 is used as a threshold to indicate whether a BVH tree should be created in a voxel, which avoids creating BVH trees for voxels that contain only a few triangles (i.e., less than 10). This approach is useful because the BVH tree also consumes memory. Another advantage of the within-voxel BVH is that it can overcome the floating-point limitation of Embree by offsetting triangle coordinates with voxel coordinates, especially when handling very large scenes, which is very important for the application of the 3D radiative transfer model in satellite products' validation.

Implementation
The hybrid structuring scheme was implemented in DART using a ray-tracing library named Embree [26], which is a collection of high-performance ray-tracing kernels and is optimized for Intel ® processors with support for streaming SIMD extensions (SSE) and Intel advanced vector extensions (AVX, AVX2, and AVX-512) instructions. To implement the voxel-level ray tracing, a user-defined geometry that represents the voxel needs to be defined in Embree. A voxel can be described by its central position and size and the ray-voxel intersection algorithm should be provided by users to make the geometry able to be recognized by Embree. The implementation details are illustrated in Appendix A.
When the number of triangles within a voxel exceeds a threshold (e.g., 10), a BVH tree which manages all the triangles inside the voxel is built. In DART, a triangle may belong to different voxels. To avoid copying a triangle several times, we store all the triangles from the whole scene in a vertex array, as illustrated in Figure 5. Each voxel only stores a list of references to the triangles contained. Thus, all the within-voxel BVHs can share the same vertex array in memory. Due to floating-point precision problems, a ray scattered by a triangle can be intercepted again by the same triangle. This self-intersection can significantly decrease radiation scattering if it is not well handled. Here, an intersection filtering mechanism is used to avoid this self-intersection

Implementation
The hybrid structuring scheme was implemented in DART using a ray-tracing library named Embree [26], which is a collection of high-performance ray-tracing kernels and is optimized for Intel ® processors with support for streaming SIMD extensions (SSE) and Intel advanced vector extensions (AVX, AVX2, and AVX-512) instructions. To implement the voxel-level ray tracing, a user-defined geometry that represents the voxel needs to be defined in Embree. A voxel can be described by its central position and size and the ray-voxel intersection algorithm should be provided by users to make the geometry able to be recognized by Embree. The implementation details are illustrated in Appendix A.
When the number of triangles within a voxel exceeds a threshold (e.g., 10), a BVH tree which manages all the triangles inside the voxel is built. In DART, a triangle may belong to different voxels. To avoid copying a triangle several times, we store all the triangles from the whole scene in a vertex array, as illustrated in Figure 5. Each voxel only stores a list of references to the triangles contained. Thus, all the within-voxel BVHs can share the same vertex array in memory.

Implementation
The hybrid structuring scheme was implemented in DART using a ray-tracing library named Embree [26], which is a collection of high-performance ray-tracing kernels and is optimized for Intel ® processors with support for streaming SIMD extensions (SSE) and Intel advanced vector extensions (AVX, AVX2, and AVX-512) instructions. To implement the voxel-level ray tracing, a user-defined geometry that represents the voxel needs to be defined in Embree. A voxel can be described by its central position and size and the ray-voxel intersection algorithm should be provided by users to make the geometry able to be recognized by Embree. The implementation details are illustrated in Appendix A.
When the number of triangles within a voxel exceeds a threshold (e.g., 10), a BVH tree which manages all the triangles inside the voxel is built. In DART, a triangle may belong to different voxels. To avoid copying a triangle several times, we store all the triangles from the whole scene in a vertex array, as illustrated in Figure 5. Each voxel only stores a list of references to the triangles contained. Thus, all the within-voxel BVHs can share the same vertex array in memory. Due to floating-point precision problems, a ray scattered by a triangle can be intercepted again by the same triangle. This self-intersection can significantly decrease radiation scattering if it is not well handled. Here, an intersection filtering mechanism is used to avoid this self-intersection problem. Each triangle has a unique ID (Identity Document), if the intersection occurs twice, the Due to floating-point precision problems, a ray scattered by a triangle can be intercepted again by the same triangle. This self-intersection can significantly decrease radiation scattering if it is not well handled. Here, an intersection filtering mechanism is used to avoid this self-intersection problem. Each triangle has a unique ID (Identity Document), if the intersection occurs twice, the second successive intersection will be ignored. This mechanism can be achieved through the intersection filter function provided by Embree and the implementation details can be found in Appendix B.

Radiation Tracking
When a ray enters a non-empty voxel (i.e., a voxel containing triangles or turbid media), scattering may occur. Figure 6 illustrates the scattering process inside non-empty voxels. For a turbid voxel, suppose an incident ray with radiant power W inc (Ω i ) enters the cell from direction Ω i . Then the transmitted radiation which exits the voxel is: where, T(∆l, Ω i ) is the transmittance along segment ∆l within the cell, it is calculated as: where G(Ω i ) = 2π is the normalized leaf angle distribution function, and u f is the foliage volume density. Thus, the total intercepted radiation within the voxel is Scattering events may happen anywhere along the interval [0, ∆l]. To make it simpler, two scattering points (SP) are defined along ∆l: one is for upward scattering, and the other one is for downward scattering. The angular distribution of the scattered radiation is determined by the phase function of medium defined in this cell, which is given by: where, f Ω f , Ω i → Ω s denotes the leaf scattering phase function. For a bi-Lambertian leaf facet, it is determined by the leaf diffuse reflectance (ρ f ront and ρ back for front and back reflectance, respectively) and transmittance ( τ for both sides). f Thus, the scattered radiation in direction . This radiation will be attenuated again during its way from the SP to the border of the cell, which gives the first-order scattering radiation outside the cell W f irst (Ω s ) = W sca (Ω s ) · T(∆l i , Ω s ). When the radiation transfers from the SP to the outside of the cell, it will be scattered again, the total intercepted radiation for each direction Ω s is W Int− f irst (Ω s ) = W sca (Ω s ) · [1 − T(∆l i , Ω s )]. Since each scattering event will produce N new rays, the number of rays becomes unmanageable for higher order scattering events. Therefore, we approximate all higher order (≥2) scatterings with only one scattering point, which is defined as the center of the cell, and then a phase function , which can describe the angular distribution of all the higher scattering is defined. The detailed derivation of this phase function can be found in the Appendix of Reference [27].
After the calculation of first-order and multi-order within-cell scattering, the scattered radiation may exit the cell from any position of the six faces, and they will enter the next non-empty voxel or exit the scene. To track the rays which exit the current voxel, each voxel is divided into n 3 sub-voxels, and each voxel face is divided into n 2 sub-faces. All the J rays that exit the same sub-face along the same direction are grouped as an unique exit point (P E ), which is calculated from all the exit points (P 1 , . . . , P J ) weighted by their respective energy (W j ): where, W j is the flux of the ray exit from P j . This approach greatly reduces the number of rays. For any voxel that contains triangles or parts of triangles, if a ray-triangle intersection point occurs, the scattered radiation is determined by the bidirectional reflectance distribution function (BRDF) of the triangle, which is given by f (Ω i → Ω s ) = ρ π , if the triangle is assumed as a Lambertian surface. Any ray W i,j (Ω v ) that exits the top scene through the voxel V(i, j, k top ) along the direction (Ω v , ∆Ω v ) is stored at the top level of the top cells, in order to create a 2D power distribution (i.e., image creation) per DART upward discrete direction. If the sun is the unique source of radiation, the total power per voxel V(i, j, k top ) is transformed into radiance L i,j (Ω v ) along direction Ω v : where, ∆x and ∆y are the x and y dimensions of the cell, respectively. θ v is the zenith angle of observation. ∆Ω v is the associated solid angle along direction Ω v . For any voxel that contains triangles or parts of triangles, if a ray-triangle intersection point occurs, the scattered radiation is determined by the bidirectional reflectance distribution function (BRDF) of the triangle, which is given by ( → ) = , if the triangle is assumed as a Lambertian surface.
Any ray , (Ω ) that exits the top scene through the voxel ( , , ) along the direction (Ω , ΔΩ ) is stored at the top level of the top cells, in order to create a 2D power distribution (i.e., image creation) per DART upward discrete direction. If the sun is the unique source of radiation, the total power per voxel ( , , ) is transformed into radiance , (Ω ) along direction Ω : where, and are the x and y dimensions of the cell, respectively. is the zenith angle of observation.
is the associated solid angle along direction . Figure 6. Radiation tracking in non-empty voxels.

BASEL City Scene
BASEL is a full 3D city model of the Basel city (Switzerland), which was adapted in the frame of the European Union-funded H2020 project UrbanFluxes (http://urbanfluxes.eu/). It has a size of 7 km × 6.5 km with 1,593,457 triangles. One of the goals of this project was to address the challenge of combating heat in cities by using Earth observation (EO) to identify urban energy budget (UEB) spatial patterns. For that, DART was used to simulate a time series of maps of radiative budget with the help of satellite images and urban geometric databases [3]. The approach required running thousands of DART simulations with changing parameters (e.g., sun direction), which stressed the usefulness of the hybrid algorithm to reduce computation time and memory. Since the city scene mainly consists of houses, which are represented with triangles, there are large empty spaces between houses and inside the houses (Figure 7). This is ideal for testing the acceleration performance of the

BASEL City Scene
BASEL is a full 3D city model of the Basel city (Switzerland), which was adapted in the frame of the European Union-funded H2020 project UrbanFluxes (http://urbanfluxes.eu/). It has a size of 7 km × 6.5 km with 1,593,457 triangles. One of the goals of this project was to address the challenge of combating heat in cities by using Earth observation (EO) to identify urban energy budget (UEB) spatial patterns. For that, DART was used to simulate a time series of maps of radiative budget with the help of satellite images and urban geometric databases [3]. The approach required running thousands of DART simulations with changing parameters (e.g., sun direction), which stressed the usefulness of the hybrid algorithm to reduce computation time and memory. Since the city scene mainly consists of houses, which are represented with triangles, there are large empty spaces between houses and inside the houses (Figure 7). This is ideal for testing the acceleration performance of the new scene structuring scheme. The trees included in this scene are represented as a pure turbid medium, which is described with a set of statistical parameters (LAD, LOD, etc.). Thus, the dimensions of voxels will influence the shapes of the simulated trees. For example, if voxels are too large (e.g., 10 m), trees may be only represented by a single voxel or totally removed from the scene if this tree only occupies less than 20% of a voxel. To get a reasonable representation of trees, the dimensions of a voxel should be relatively small (e.g., 0.5 m). However, the realistic description of scene elements (i.e., very small voxels) and computation time are to some extent contradictory. As illustrated in Figure 7c

RAMI Forest Scene
We consider a heterogeneous forest scene, named Järvselja Birch Stand (Summer), that was used by the RAMI experiment (http://rami-benchmark.jrc.ec.europa.eu/HTML/RAMI-IV/RAMI-IV.php), in order to inter-compare radiation transfer models [28]. This scene contains 7 species of different trees. 1059 individual trees of these species are distributed in a 100 m × 100 m plot. Because these trees are extremely well simulated and due to the limitation of computational resources, a 20 m × 20 m subplot was chosen for testing. Figure 8 shows the 2D and 3D views of this subplot. Since all the elements in this scene (branches and leaves) are converted into triangle mesh, this subplot has more than 51,000,000 triangles. The bidirectional reflectance factor (BRF) image at nadir was simulated under different resolutions (0.1 m, 0.25 m and 0.5 m). The corresponding computation time and

RAMI Forest Scene
We consider a heterogeneous forest scene, named Järvselja Birch Stand (Summer), that was used by the RAMI experiment (http://rami-benchmark.jrc.ec.europa.eu/HTML/RAMI-IV/RAMI-IV.php), in order to inter-compare radiation transfer models [28]. This scene contains 7 species of different trees. 1059 individual trees of these species are distributed in a 100 m × 100 m plot. Because these trees are extremely well simulated and due to the limitation of computational resources, a 20 m × 20 m subplot was chosen for testing. Figure 8 shows the 2D and 3D views of this subplot. Since all the elements in this scene (branches and leaves) are converted into triangle mesh, this subplot has more than 51,000,000 triangles. The bidirectional reflectance factor (BRF) image at nadir was simulated under different resolutions (0.1 m, 0.25 m and 0.5 m). The corresponding computation time and memory (RAM) were also recorded for hybrid and uniform structuring schemes.

Results
Here, simulation accuracy, computation time and memory usage of the hybrid scene structuring approach and the original uniform approach are compared for the city of BASEL (Figure 9) and the RAMI forest scene (Figure 10). In both cases, there is no bias in accuracy at all spatial resolutions. The averaged root mean square error (RMSE) is 0.0001 and 0.0056 for the Basel city scene and the forest scene, respectively. Conversely to the uniform approach, the accuracy of the hybrid approach is nearly independent of the voxel resolution. For Basel city, simulations are accelerated by 1.4×, 1.7× and 3.7× for 1 m, 0.5 m and 0.25 m voxel resolutions, respectively (Table 1).  For RAMI (Table 2), the hybrid approach reduces the computation time from 67 h to 15 minutes (258.5× speedup) when the resolution is set to 0.5 m. When voxel resolution changes from 0.5 m to

Results
Here, simulation accuracy, computation time and memory usage of the hybrid scene structuring approach and the original uniform approach are compared for the city of BASEL ( Figure 9) and the RAMI forest scene ( Figure 10). In both cases, there is no bias in accuracy at all spatial resolutions. The averaged root mean square error (RMSE) is 0.0001 and 0.0056 for the Basel city scene and the forest scene, respectively. Conversely to the uniform approach, the accuracy of the hybrid approach is nearly independent of the voxel resolution. For Basel city, simulations are accelerated by 1.4×, 1.7× and 3.7× for 1 m, 0.5 m and 0.25 m voxel resolutions, respectively (Table 1).

Results
Here, simulation accuracy, computation time and memory usage of the hybrid scene structuring approach and the original uniform approach are compared for the city of BASEL ( Figure 9) and the RAMI forest scene ( Figure 10). In both cases, there is no bias in accuracy at all spatial resolutions. The averaged root mean square error (RMSE) is 0.0001 and 0.0056 for the Basel city scene and the forest scene, respectively. Conversely to the uniform approach, the accuracy of the hybrid approach is nearly independent of the voxel resolution. For Basel city, simulations are accelerated by 1.4×, 1.7× and 3.7× for 1 m, 0.5 m and 0.25 m voxel resolutions, respectively (Table 1).  For RAMI (Table 2), the hybrid approach reduces the computation time from 67 h to 15 minutes (258.5× speedup) when the resolution is set to 0.5 m. When voxel resolution changes from 0.5 m to For RAMI (Table 2), the hybrid approach reduces the computation time from 67 h to 15 minutes (258.5× speedup) when the resolution is set to 0.5 m. When voxel resolution changes from 0.5 m to 0.1 m, the speedup deceases from 258.5× to 4.9×. As for the memory, the reduction due to the removal of null voxels does not overcome the memory increase caused by within-BVH data structures. Actually, 1,181,824,193,580 and 36,081 within-voxel BVHs are built for resolution at 0.1 m, 0.25 m and 0.5 m, respectively. It can be seen that using less within-BVHs is better, in terms of memory usage, than using more BVHs to manage the same number of triangles. It stresses that for triangle-based scenes, the best improvements are obtained for coarser voxel resolutions.
Remote Sens. 2019, 11, x FOR PEER REVIEW 11 of 15 and 0.5 m, respectively. It can be seen that using less within-BVHs is better, in terms of memory usage, than using more BVHs to manage the same number of triangles. It stresses that for trianglebased scenes, the best improvements are obtained for coarser voxel resolutions.

Accuracy
The accuracy of the proposed method was evaluated by a pixel-wise comparison between images simulated by the hybrid approach and the uniform approach. For the BASEL city scene, these two approaches give the same results for all pixels, while some differences existed for the RAMI forest scene. This incompatible result is mainly caused by the floating-point precision of Embree. In the BASEL city scene, the number of triangles was small, and the size of each triangle was very large, while the RAMI forest scene contained a lot of small triangles (e.g., leaves). Therefore, the floatingpoint representation of the vertex may cause many differences, compared with the double representation in original DART, which has a significant impact on the ray-triangle intersection, especially when rays traverse through the border of a triangle. For example, some rays may intersect with a triangle in double representation mode, while they may miss the triangle in floating-point mode. This uncertainty becomes larger when many small triangles exist, such as in the RAMI forest scene.

Memory Usage
One of the advantages of the proposed hybrid scene structuring is to save computation memory by removing null voxels from the 3D scene. This can be confirmed by the BASEL city scene (Table 1), where memory usage reduction due to the removal of the null voxels was observed for all voxel resolutions. When resolution increased, the degree of reduction became larger because of the increased percentage of null voxels. For the RAMI forest scene (Table 2), the memory increased with the increase of voxel resolution, which can be explained by the percentage of null voxels and the presence of within-BVHs. Compared with the BASEL city scene, the RAMI forest scene had more scene elements (e.g., leaves) and a lower percentage of null voxels, which was 57.0% for 0.5 m resolution, while the BASEL city scene had a percentage of 91.3% for the same resolution. On the other hand, the sizes of facets in the RAMI forest scene were much smaller, and each voxel may

Accuracy
The accuracy of the proposed method was evaluated by a pixel-wise comparison between images simulated by the hybrid approach and the uniform approach. For the BASEL city scene, these two approaches give the same results for all pixels, while some differences existed for the RAMI forest scene. This incompatible result is mainly caused by the floating-point precision of Embree. In the BASEL city scene, the number of triangles was small, and the size of each triangle was very large, while the RAMI forest scene contained a lot of small triangles (e.g., leaves). Therefore, the floating-point representation of the vertex may cause many differences, compared with the double representation in original DART, which has a significant impact on the ray-triangle intersection, especially when rays traverse through the border of a triangle. For example, some rays may intersect with a triangle in double representation mode, while they may miss the triangle in floating-point mode. This uncertainty becomes larger when many small triangles exist, such as in the RAMI forest scene.

Memory Usage
One of the advantages of the proposed hybrid scene structuring is to save computation memory by removing null voxels from the 3D scene. This can be confirmed by the BASEL city scene (Table 1), where memory usage reduction due to the removal of the null voxels was observed for all voxel resolutions. When resolution increased, the degree of reduction became larger because of the increased percentage of null voxels. For the RAMI forest scene (Table 2), the memory increased with the increase of voxel resolution, which can be explained by the percentage of null voxels and the presence of within-BVHs. Compared with the BASEL city scene, the RAMI forest scene had more scene elements (e.g., leaves) and a lower percentage of null voxels, which was 57.0% for 0.5 m resolution, while the BASEL city scene had a percentage of 91.3% for the same resolution. On the other hand, the sizes of facets in the RAMI forest scene were much smaller, and each voxel may contain many triangles, which induced the creation of within-voxel BVHs. The increase of memory used by within-voxel BVHs may overcome the reduction of memory caused by the removal of null voxels.

Computation Time
In DART, there are mainly two factors that influence the computation time: voxel-level ray tracking and within-voxel ray-triangle intersection. For the BASEL city scene, the number of triangles is not numerous for each voxel, thus the major factor that influenced computation time was the voxel-level ray tracking. Because of the high percentage of null voxels, the hybrid approach can skip the empty space between non-empty voxels during the ray tracing, thus the improvement of computation time was observed ( Table 1).
As for the RAMI forest scene, a significant improvement of computation time can be noticed ( Figure 10). In this case, the improvement due to removal of null voxels was much weaker than the improvement due to within-voxel BVH. Under coarser resolution (e.g., 0.5 m), there are many triangles in a voxel. The uniform approach tests the intersection for all triangles within a voxel for each incident ray, and then sorts the order according to the distance to the origin of the ray. In hybrid mode, a ray can find the closest intersected triangle directly without sorting, which can significantly accelerate the simulation. With better spatial resolutions, the average number of triangles in each voxel decreases, and the time difference between uniform approach and hybrid approach also decreases. This can be confirmed by the results in Table 2, where the speedup decreased to 4.9× when voxel resolution was equal to 0.1 m.

Conclusions
A hybrid scene structuring scheme has been proposed to accelerate the radiative transfer process in complex landscapes. This approach utilizes the BVH to skip large empty areas between voxels and to accelerate ray-triangle intersection inside a voxel. Compared to the uniform approach, the hybrid approach can usually reduce the computation time by 2×~3× in city scenes, and more than 100× in forest scenes. In terms of accuracy, it is worth noting that a hybrid algorithm allows to avoid an important limitation of the Embree ray-tracing kernels library: all geometrical variables are stored with floating-point numbers which implies important limitations in terms of accuracy for remote sensing applications over large landscapes. The hybrid algorithm overcomes these limitations by using a double BVH algorithm for voxels and for triangles within the voxels.
The hybrid algorithm opens new opportunities for using DART to derive very accurate remote sensing products for many scientific domains such as forestry, agriculture, urbanism, etc. For example, it allows one to use DART to simulate a time series of maps for an urban radiative budget. The inversion of satellite images is another major domain of application, because it usually requires the simulation of hundreds of thousands of simulations.
Finally, it is worth noting that a recent DART improvement allows to simulate three crowns that are filled with triangles, which could be more accurate than turbid medium. With the original uniform approach, this kind of simulation could be very slow due to the ray-triangle intersection within each voxel, while the hybrid approach accelerates the process by removing the null voxels and optimizing the ray-triangle intersection, which are noted for the city and forest scene, respectively.