Efﬁcient Algorithms for Real-Time GPU Volumetric Cloud Rendering with Enhanced Geometry

: This paper presents several new techniques for volumetric cloud rendering using efﬁcient algorithms and data structures based on ray-tracing methods for cumulus generation, achieving an optimum balance between realism and performance. These techniques target applications such as ﬂight simulations, computer games, and educational software, even with conventional graphics hardware. The contours of clouds are deﬁned by implicit mathematical expressions or triangulated structures inside which volumetric rendering is performed. Novel techniques are used to reproduce the asymmetrical nature of clouds and the effects of light-scattering, with low computing costs. The work includes a new method to create randomized fractal clouds using a recursive grammar. The graphical results are comparable to those produced by state-of-the-art, hyper-realistic algorithms. These methods provide real-time performance, and are superior to particle-based systems. These outcomes suggest that our methods offer a good balance between realism and performance, and are suitable for use in the standard graphics industry.


Introduction
Dynamic rendering of realistic clouds has become very valuable for applications such as computer games featuring outdoor scenarios, flight simulation systems, and virtual reality environments.However, physics-based models of clouds need to solve the Navier-Stokes fluid dynamics equations and complex photo-realistic radiometric functions for lighting effects, which can prevent real-time rendering in modern graphics processing units (GPUs) [1,2], causing a lack of realism [3] or the absence of animation effects [4].As an alternative approach, we propose efficient methods for atmospheric cloud geometry generation using pseudo-spheroids and fractals, which produce the stochastic and irregular shapes of natural phenomena, as a baseline for an easy-to-control framework for primitive cloud animation.In addition, our methods allow for approaching, turning around, and passing through gaseous shapes by use of the volumetric rendering technique.The aim of our investigation is to confirm the hypothesis that it is possible to create an efficient, lightweight, and user-friendly model of real-time cloud rendering with an optimum balance between realism and performance, to be applied in the entertainment and educational industries even in situations where the graphical hardware is not very powerful.For this reason, our hypothesis is based on a multi-core processing approach that has arisen in recent years and the possibility of optimizing the complexity of the referenced algorithms with GPU and central processing unit (CPU) programming techniques.
Our methods are performed using the ray-tracing technique, based on the work of Appel [5] and Whitted [6], and the emulation of the density of water vapour inside a cloud by applying our base algorithm to generate a gaseous body inside implicit surfaces.To maximize the performance of the pre-computing phase, an original algorithm is used to prevent duplicate ray-marching inside overlapping volumes, which we have called no duplicate tracing (NDT).This achieves fewer loop iterations by discarding the overlapping parts of spheres and reduces computing times by half.We have also developed new approaches for lighting and shading that were inspired by the paradigms of distinguished authors, with optimizations to reduce the cost of light-scattering calculations.The use of boundary representations based on Smits' algorithm [7] for the ray-marching volume, along with Euclidean equations for ray-spheroid collision, allows for minimal linear complexity when searching for primitives, hence avoiding the use of hard-to-code space partitioning algorithms.An improved cumulus generation algorithm based on Gaussian distributions simulates the stochastic nature of clouds.Additionally, an innovative application of L-system grammar rules, with random parameters to emulate the fractal nature of clouds, is compared to the method used by Kang et al. [8].We also provide an innovative contribution to cloud rendering based on triangulated three-dimensional meshes, obtained from standard editors, to produce clouds in the shapes of recognizable entities.Complexity analysis and benchmarking of the proposed algorithms demonstrate that there are more benefits than disadvantages in comparison to other particle and slow, hyper-realistic implementations.Hence, our algorithms are suitable for application in the standard industry.

Related Works and Main Contributions
Researchers studying computer-generated clouds have developed different approaches to model the natural phenomena.Huang et al. [1] reports on the two main groups of cloud rendering methods: Physics-Based Models and Ontogenetic Models.In the first approach, atmospheric equations of cloud advection, thermodynamics, radiometry, and fluid dynamics are applied [3,9].In the second method, a formal geometric visualization of the cloud in an artistic simulation, using minimal physical characteristics, is employed.The following paragraphs describe the methods used to realize natural-looking clouds along with our contributions to each of them:

•
Texturized primitives: In this option, basic surfaces like skydomes or skyboxes are texturized with an omni-directional true color capture of the sky.Other implementations use procedural textures applied to spheres and ellipsoids, which was the starting point for cloud generation in the early years with the works of Gardner [10] and Max [11].These methods may be considered as obsolete with modern GPUs, however, they are employed in smaller devices that do not have three-dimensional (3D) acceleration.A state-of-the-art contribution to this approach was recently provided by Mukina and Bezgodov [4].The main limitation of these methods is the inability to approach, turn around, or traverse gaseous bodies, and additionally the lack of animation of the different cloud parts.We decided not to use this method due to the aforementioned limitations.

•
Particle systems: Since 2002, researchers have tried to achieve simulations of clouds using basic quads or triangle surfaces with Gaussian textures [1].Harris conceived a new algorithm using particles with an efficient multiple scattering illumination system, which was based on Max equations [12].The resulting rendering was improved with the use of impostors [3,13,14].The use of this technology increases performance and speed.This technique is considered useful for physical workload model simulators.However, the overall realism is not accurate.We decided to use pseudo-spheroids as the basic primitive instead of particles to allow for animation of the cloud with reduced system overhead.The realism of each pseudo-spheroid is better than using a collection of texturized particles.This method is very convenient to produce time-evolving cumuliform clouds.

•
Geometry distortion: The basic idea behind this technique consists of drawing a complex cloud mesh with a group of megaparticles as explained by Ebert [15] and Engel [16].Then, a lighting algorithm, like Phong or Gouraud shading, is used on the geometry.After this step, the cloud map is blurred using a Gaussian filter to distort it.To do this, a quad is placed at the center of every cloud and billboard vertex with respect to the camera, covering the entire cloud from any angle.This quad is rendered to distort the cloud map and sample a two-channel fractal/noise texture to obtain a distortion offset.Afterwards, the blurred cloud map is sampled using these texture coordinates and the distance from the quad to the camera.Optionally, a radial blur may be performed to soften the resulting image.In the final stage, the render target is merged to the back buffer.This method is midway between the particle systems and volumetric rendering techniques, with good performance and easy shading.However, it lacks accurate realism and is not suitable for cloud shapes other than cumulus.We have decided not to use this method because it is difficult to adapt for arbitrary-shaped clouds.

•
Volumetric rendering: This model represents the current state-of-the-art in cloud rendering, and as such it is used in computer games, flight simulations, and real-time computer graphics.The primary previous studies on volumetric rendering are found in References [17,18].The methods proposed in this paper are based on volumetric rendering.This technique is the same as that used to obtain two-dimensional (2D) medical images from 3D magnetic resonance (MRI) or computed tomography (CT) data.Volumetric rendering of clouds still requires a heavy workload for most modern GPUs, but the realism is very accurate and may become standard in the coming years.Later contributions to this approach were made by Yusov [19], Elek et al. [20], Dobashi et al. [21], and Goswami-Neyret [22], while further contributions include the latest cutting-edge findings in light scattering made by Klehm et al. [23] and Peters et al. [24], with very high performance at 1920 × 1080 pixels.We contributed to the state-of-the-art by improving the performance of existing algorithms and developing new efficient algorithms and methods to reduce the physics workload of many slow hyper-realistic implementations, without loss of quality.These methods are well suited for animation and real-time rendering on standard computers.Table 1 summarizes the main features and shortcomings of the mentioned methods.

Materials and Methods
Volumetric rendering applications require extensive computing.Any small improvement in the algorithms produces a substantial overall increase in speed.The following are our contributions and improvements to minimize rendering efforts in a state-of-the-art trend.

Water Vapour Emulation
Fluid and asymmetrical cloud shapes were a serious challenge during this research.An efficient 3D data structure that can hold density information and respond to light and wind advection is very convenient for real-time cloud rendering.However, ray-tracing of three-dimensional points in real time requires hardware support by GPU shader technology that parallelizes the ray-marching of discrete frame buffer pixels using multiple processing elements.
To create an efficient vapour density simulation, a three-dimensional cube of 64 × 64 × 64 single-precision floats is filled with uniform random noise pre-calculated in the host before passing it to the GPU.This cube is used to generate fractal Brownian motion (fBm) noise, as shown in Figure 1.The uniform noise is transformed into a Gaussian-like noise using the fractal Brownian motion (fBm) method (Figures 2 and 3) described on Iñigo Quilez's homepage (http://www.iquilezles.org/www/articles/morenoise/morenoise.htm(6 April 2018)).Basically an fBm can be implemented as a fractal sum of Perlin noise functions [25].The Perlin noise is a pseudo-random mapping of R d into R with an integer d which can be arbitrarily large but which is usually 2, 3, or 4. It is used as a procedural texture in computer graphics to emulate controllable pseudo-random appearances, with the aim of generating a wide variety of object surfaces, for example fire, smoke, and clouds.This noise usually imitates textures in nature due to its stochastic properties.The implementation implies three steps: (1) definition of an n-dimensional grid with random gradient vectors, (2) computation of the dot product between the gradient vectors for distance calculations, and (3) interpolation of the mentioned dot products.The algorithm cost is O(2 n ) where n is the number of dimensions (https://rosettacode.org/wiki/Perlin_noise (6 April 2018)).
Let w be the octave scale factor, s the noise sampling factor, and i the octave index.The fBm equation is then defined as: where we choose w = 1/2 and s = 2.Each iteration is called an octave.In this case, our Algorithm 1 uses five octaves with uniform noise instead of Perlin noise.When the number of octaves is increased above five, the algorithm does not produce better vapour deformation shape and decreases the overall frame performance.To improve performance, an unrolled version of the previous summation is implemented in our fragment shader code.
Other authors (see for example Reference [26]) also make use of fractional octave division in their research with respect to 2D flat animation of clouds.
The fBm noise is calculated for each unit of the frame buffer along the ray-tracing to produce cloud density.Alternatively, the noise may be pre-calculated in the host application before passing it to the shader to achieve higher frames-per-second.If the basic pre-calculated 3D noise is used for the entire scene, a simple ray marching algorithm may be used, as explained on the next page. 11: candidates[k] ← (λ in , λ out , j, i) R ← (0, 0, 0, 0) {Consider alpha-channel} 33: for each (i in B) do 34: (C, n) ← getCandidates(rayOrigin, rayDirection, i) R ← R+ lighting(ρ, rayPos, rayDir, sunDir,

Cloud Tracing
Rendering a cloud involves tracing rays through a volume delimited by a set of implicit surfaces, while sampling the noise texture.The noise texture is only sampled when the ray traverses the volume defined by a set of spheres.The pseudocode scheme in Algorithm 1 is an improved version of Apodaca's code [27] and illustrates the execution over a list of N randomized bounded-surface volumes.The use of spheroids and ellipsoids instead of other surfaces is also cited by Gardner [10], and Elinas and Stürzlinger [28].To discard positions outside the cloud and thus reduce the computational effort, we iterate over the pseudo-spheroids contained in the corresponding bounding box only.In addition, a short list of pseudo-spheroids that intersect the ray is created and sorted with the insertion sort algorithm.This collision method requires the computation of the discriminant of the quadratic equation, resulting from the equality of the implicit formulas for the rays and spheres, and uses reference code based on Shirley's book [29].
For every pixel in the frame buffer, the ray is traced from back to front to calculate the noise, with the condition that the density value is smaller than γ.This variable is modulated as a function of the position inside the pseudo-spheroid with a random component to produce the effect of the cloud surface as shown in Equation ( 2 . ( The previous equation filters the unit volume densities and generates a pseudo-spheroid by scaling the nominal sphere radius by ± k, where rayPos is the Euclidean straight line point of the ray and sphereCenter is the center of the sphere in R 3 .The fBm function serves as a normalized random variable.The distance to the center of the sphere is used to modulate the maximum density that water vapour is allowed to have, as shown in Figure 4. Additionally, an improved level of detail (LOD) equation controls the increment on the Euclidean straight line [30] according to the distance from the center of the cloud, thus executing longer steps when the cloud is close to the camera and smaller steps when it is far.This method balances the higher number of pixels receiving a trace from inside the volume with the lower requirements for detail when traversing the cloud.

The No Duplicate Tracing Algorithm
The purpose of this algorithm is to avoid duplicate or void tracing when spheres or ellipsoids overlap or have gaps.This simple method is implemented in a few lines of CPU code to pre-compute the transmittance from the light source to the target voxel, reducing calculations while preserving accurate cloud rendering, as shown in Algorithm 2. The image and flow chart in Figure 5a,b respectively explain the possible cases that may arise during the execution of a ray-marching.Others research regarding bounding volumes overlapping is the work of Lipuš and Guid [31].

Scene Delimitation
The explained technique eliminates the need to use complex space-partitioning algorithms like K-D trees, Octrees, or Quadtrees for locating primitives, due to the fact that each cloud only needs a small number of spheres to create a cumulus in our model.In addition, the use of bounding boxes to delimit the volume of each cloud also improves performance.The proposed system uses Smits' algorithm [7] and the improvement of Reference [32] to limit ray-marching to the volume inside the cloud's boundaries, as shown in Figure 6.This method increases the frames-per-second (FPS) rate by discarding rays that never hit the boundaries of the rectangular prism containing the cloud in the scene, and avoiding iterations over spheres outside the camera view frustum.As an additional advantage, the bounding box model allows for assigning a separate voxel grid to each cloud, which facilitates the calculation and storage of illumination data.Each spheroid in the candidates' lists has a pointer to its corresponding voxel grid, as seen in Algorithm 1 in Section 3.2.The improved Smits' algorithm is implemented in GPU code in a fragment shader with a very reduced linear cost.

Lighting and Shading
To optimize performance, we used a two-pass drawing algorithm.The first pass computes illumination inside the cloud and is executed only when the location of the light source changes.The second pass renders the projection of the cloud on the frame buffer.The calculation of transmittance and scattering are inspired by the principles exposed by Max [12], and the code optimizations via the approximations of Harris [13] and Tessendorf [33] in our pseudo-spheroid-based volumetric rendering.The main differences with these works are the application of the ray-marching technique over a set of pseudo-spheroids/ellipsoids instead of a particle system, the use of a pre-calculated voxel grid with our NDT algorithm, and the combination of simplifications to generate a new model of code optimization.This is summarized in Algorithm 3. pL ← texture(gridIndex, voxel Index)

Algorithm 3 Lighting and Shading
The mentioned algorithms are illustrated in Figure 7 where the green ray represents the illumination pass pre-computed in the CPU and the red ray represents the rendering pass performed in the GPU.The illumination pass traces a ray l from each voxel v(l) inside the volume of the cloud towards the light source I, covering a distance D. The voxel stores the amount of light received by transmission and scattering in that point of the cloud.This part is pre-calculated in the CPU.The render pass traces a ray s that traverses the entire cloud from 0 to P to calculate the light projected on the frame buffer, taking into account the contributions of emission and reflection.This part is executed in real-time in the GPU.

• Transmittance
This calculation is performed in each pass.The attenuation of light inside the cloud is governed by a property called transmittance that represents the amount of light passing through a given section of the volume.All of the light inside the volume of the cloud, whether it is absorbed or scattered, is affected by transmittance.The amount of light passing along a ray from point s to point s' is expressed as: where τ(s) is the extinction coefficient that represents the amount of area that occludes light in a plane perpendicular to the ray, ρ(s) is the particle density of water vapour at location s, and A is the area occluded by a particle projected on the plane perpendicular to the ray.For the forthcoming expressions we will use the simplified notation T(s, s ).
• Illumination Pass (CPU side) The first pass computes the illumination inside the cloud assuming that the light source is static, and therefore the calculations are performed only when the light source changes location or intensity.The illumination inside the cloud is stored as a grid of voxels using floating point scalars.The light collected in a voxel is the sum of the light transmitted from the light source across the volume plus the light scattered from all the preceding points along the ray between the voxel and the light source.This second term involves a summation of multiple light contributions with their corresponding transmittance.Illumination of a voxel is computed using a back-to-front ray-marching algorithm with the NDT method.
Let L(v) be the light collected at point v inside the cloud, I 0 the intensity of the light source, and D be the depth of v from the surface of the cloud.The light incident on point v is hence: The first term in Equation ( 5) is the light propagated across the volume of the cloud to point v, which is affected by the total transmittance from the surface to the point.The second term is the light scattered in the forward direction along the ray (l) from the surface of the cloud to the destination point.This term is the summation of the light emitted by scattering from each intermediate point along the ray, attenuated by the corresponding transmittance of the segment.
The term C(l) is the approximation of the forward light-scattering density according to Reference [13].This work reports that 50% of the light scatters in the forward direction of the ray over a small solid angle γ that is of about 0.0001 steradians.The remaining 50% is ignored because it requires a large amount of computation.Hence: Equation ( 5) is converted to its discrete form and calculated using a ray-marching algorithm with step size ∆l.Given T i is the transmittance between the target point and point i along the ray, the incident light is: • Render Pass (GPU side) The second pass renders the projection of the cloud on the frame buffer using a ray-marching algorithm that takes into account reflection and scattering.
The light reflected at point s inside the cloud depends on the illumination of this point L(s) and the area occluded by particles τ(s).The reflected light is further affected by the transmittance from point s to the surface of the cloud.
The light scattered at point s is affected by the same properties as the reflected light but also a phase function that approximates the refraction of light depending on the angle between the light source and the observer.The amount of light projected on point x of the frame-buffer is hence expressed as: The first term in Equation ( 8) accounts for reflected light, and the second term accounts for scattered light.P( n s , n l ) is the phase function representing the amount of light from direction n l refracted in the direction n s .Both terms are affected by the transmittance T(0, S)from the surface to point s.
To compute the emitted light using a ray-marching algorithm with step size ∆s, we used the approximation proposed by Reference [33], that converts the continuous integral into a summation of segment integrals: Let T i represents the transmittance from the surface to point i.The phase term is considered constant along the segment integral and can be factored out of the integral: The differential transmittance ∆T i along segment ∆s is denoted as: The exponential function e can be approximated using the first two terms of its McLaurin series expansion as: By employing this result and assuming that L(i∆s + t) is constant inside the segment integral, Equation ( 10) is simplified to: In our implementation, we employ the Henyey-Greenstein phase function to approximate the refraction of light in water droplets: where θ( n s • n l ) is the scattering angle and the parameter g is equal to the asymmetry factor.To improve performance, our implementation passes the voxel grid lighting in a 3D texture to the GPU.This permits the use of hardware accelerated trilinear interpolation when obtaining the lighting values along the marching ray.

Cloud Shape Improvement
This section presents four different methods to define the geometry of the cloud which fulfill different requirements from an artistic point of view.These methods range from purely mathematical random generation to user-defined 3D meshes modeled in an editor.

3D Gaussian Cloud Generation
The proposed algorithm accomplishes cumulus generation by using a three-dimensional Gaussian distribution of basic pseudo-spheroid primitives, as already discussed in Section 3.2.Our implementation uses a Gaussian distribution with clamping to generate the position of the pseudo-spheroids as is done for particles in the method by Huang et al. [1].Unlike their work, the number of primitives to produce a realistic cloud is much smaller than in a particle system.Typically we use 35 pseudo-spheroids while the particle system requires thousands of them.
Let P be the position of the pseudo-spheroid, C the center of the cloud, and N a random variable with normal distribution, the base equation is then: Table 2 shows the typical values, µ (mean) and σ (standard-deviation), of the normal distribution used for each of the axes.The previous values must be added to the center of the cumulus to place the cloud into the scene.

Axis
The radii of the pseudo-spheroids depend on the distance from their center (P) to the center of the cloud (C) according to a Gaussian function, as shown in Figure 8. Two variants of the proposed distribution can be used to compute the magnitude of the radius as described in Equations ( 15) and ( 16): where ε i represents the maximum possible radius of each sphere in the corresponding variant.
The experiment used the constants ε 1 = 15.0 and ε 2 = 2.5 for real cumulus generation as seen in Figures 9 and 10 generated with Equation 16.
We introduced another improvement to hit the performance target with a pre-processing algorithm that reduces the list of spheroids by removing those that are within the range [3σ x /4, σ y /3, 3σ z /4] which creates a sort of hollow cloud.This filter causes the removal of ∼20% of spheroids in the nucleus of the cloud, which yields a ∼30% average improvement in frame rate without affecting the look of the cloud.In addition to the prior filtering, the pre-processing algorithm removes spheres contained inside other spheres by comparing the differences in radii (r i , r j ) with the distance between their centers (x i , y i , z i ) and (x j , y j , z j ): The O(n 2 ) complexity of the previous computation is not a drawback due to the reduced number of spheres required in our Gaussian model.In collaboration with Przemyslay Prusinkiewicz, a large collection of impressive images of nature was formed [35].These systems consist of a set of grammar rules with a defined alphabet which are recursively called, starting from an axiom production.Subsequent calls generate a string of symbols which denote a specific meaning.The proposed research uses L-system grammars to generate a wide variety of cloud shapes.A deep study on L-systems is given in Reference [36] with exhaustive algorithms for clouds and plants.
Our approach differs from that of Kang et al. [8]; in our approach, we include a proportional random factor in the generation of the sphere radius and the distance between primitives in each recursive call, as explained in Figure 11.Another difference from the cited work is the use of volumetric rendering and our own density function, Equation (2).The 3D algorithm for L-system generation of spheres is referred to on the website of the Department of Computing for Neurobiology at Cornell University, which contains a complete MATLAB explanation and sources.Before calling the GLSL image renderer, the proposed solution uses an algorithm coded in C++ running on the host as a tiny interpreter for the recursive generation of spheres according to an axiom and rules.For example, in the call to the following Backus-Naur form (BNF) grammar, our C++ algorithm will generate a string of "X", "Y", "+" and "&" symbols: where "+" means turn left with angle δ using a rotation matrix, and "&" means pitch down with angle δ using a different rotation matrix.
As a result, Figure 12a,b were generated using different angles and iterations.The corresponding lexical analyser must insert a sphere of a random radius and length in the current branch step for each of the X and Y items found, and then apply the rotation angle given by the "+" and "&" operators.Hence, the string produced by the Equation (18) will generate 128 spheres according to the L-system grammar shape and will be ready to transfer to the OpenGL shader for real-time animation.
Our L-system grammar derivative is suitable for generating jet streams and a sort of cloud called Castellanus, as cited by Häckel [34] due to its elongated and crenelated shape.The advantage of this interpreter is that it allows a formal definition of clouds for designers to generate a variety of original cloud shapes for computer games and digital art.

Cumuliforms with Optimized Metaballs
The metaball technique proposed by Blinn [37] for the creation of organic-looking n-dimensional objects can also be applied to generate cumulus formations with very high performance and render quality.Our model differs from the Dobashi approach [38] in that we use optimized equations and a mean calculated in the CPU: where x, y, z is the ray-marching position, r i is the current sphere radius, and x i , y i , z i is the sphere center.Additionally we have: and: Equation ( 20) is pre-calculated in the CPU only once, and Equations ( 19) and ( 21) are used in the GPU shader algorithm.φ is a constant to adjust the fading effect in the edges of the cloud and rayPos is the Euclidean straight line point of the ray in R 3 used to evaluate the fBm noise explained in Section 3.1.It usually has a value of 0.6 in our tests.
Our method has very high performance and achieves a remarkable number of frames-per-second.Besides this, render quality is sufficient, as shown in Figure 13, but lower than that generated by Algorithm 1.Just six spheres were used to render the samples by randomizing the sphere radius.

Cloud Generation from 3D Meshes
This section explains the last geometry innovation for real-time cloud rendering based on using three-dimensional mesh editors like Blender, 3D Studio, or Maya.Making smoke or clouds with known shapes is a complex task that requires algorithms and mathematical optimization in addition to a suitable 3D model before rendering.The user may use the general public license (GPL) Blender application as a geometry editor, and this was employed for several models used in this research.Effective real-time rendering needs a prior decimation of big meshes down to a level of 1000 triangles or less.Bigger quantities require more powerful GPUs and do not pay off since the spatial definition of smoke does not allow high detail.The key idea of our algorithm is replacing triangles with ellipsoids to achieve more real and suggestive cumuliforms.To improve performance, the ellipsoids are pre-calculated in the host and then rendered on the GPU, applying the same techniques explained in Sections 3.1-3.5.Initially an ellipsoid is centered at the barycenter of each triangle.This is achieved by scaling the triangle by a specified amount from the barycenter without translating it.The maximum distance between each scaled vertex and the barycenter is used to calculate a circumscribing ellipsoid with radii (a, b, c).This approach is effective and accurate, as explained in Figure 14.
Let P 1 , P 2 , and P 3 be the vertices of a triangle.The proposed model defines barycenter (B) as: and the scale as: radius a = (B − P 1 ) where P i is the scaled triangle vertices (x, y, z) from 1 to 3.An optimization of the previous algorithm can be made by performing an R 3 rotation of the direction vector, where the maximum radius of the ellipsoid is chosen to align with the maximum distance from the triangle vertices to the barycenter.The solution requires the application of the principles of Rodrigues' Rotation Formula.This solution provides more accurate geometry with the cloud description, and a softer shape for the model.This approach may be considered as a real-time filter for mesh geometry, producing rough results.Therefore, continuations of the previous formulas are: The rotation is hence calculated via the following steps.
(1) Calculate the axis and angle using cross products and dot products: (2) Calculate the rotation matrix using an exponential map: (3) Calculate A, a skew-symmetric matrix corresponding to x: Formulas ( 1)-( 3) are pre-calculated in the CPU side only once, and only the rotation matrix R is passed to the GPU for real-time rendering.Thanks to this algorithm, the overall performance is not hindered even in lower speed GPUs as it is demonstrated in the benchmarks Section 4.2.
Decimation of the mesh to 300-700 triangle faces is required to increase performance and provide better conformance to the cloud shape.For instance, the hand mesh wireframe is decimated to 354 faces before rendering as seen in Figure 15.A 95% decimation still produces good visualization and performance.Lower decimations may be considered depending on the hardware being used.

Results
To confirm the initial hypothesis, benchmarks and measurements were taken using different GPUs to evaluate the balance between performance and realism.Analytical verification of Algorithm 1 that runs on the GPU was also performed to assess its complexity.

Complexity Analysis
Let |B| be the number of clouds, n be the bounding box near-plane resolution area in pixels, s be the number of pseudo-spheroids in the bounding box, c be the size of selected candidates to be rendered, and d be the depth of ray-marching.Then, Algorithm 1 has the following execution times (numeric constants refer to the instruction counts):

•
If we assume no sphere collisions, the insertion sort algorithm will not have any elements to classify in the best-best case, so we will obtain an asymptotic execution time of:

•
In the worst-best case the candidates list is already sorted, so no swap will be done. getCandidates() Ray-marching iterations • The worst-worst case happens when a complete collision is detected and the candidate list is in descending order, so the resulting asymptotic worst-case execution time is: getCandidates() Ray-marching iterations • Regarding the average case we assume a p probability of sphere collision, so an approximate measure of total collisions will be calculated as a percentage of total spheres in scene: getCandidates() Our analysis indicates that, despite the algorithm having quadratic complexity in the worst case for a single-frame buffer pixel, modelling the cumulus with fewer spheres reduces the number of hits, resulting in faster sorting and execution.
Some tests were performed by replacing the insertion sort algorithm with iterative quicksort which contains O(nlog 2 n) instead of O(n 2 ) scaling, expecting that it would provide better performance.However, no improvement in frames-per-second (FPS) was obtained in these experiments.On the contrary, there was a decrease in the overall frames-per-second.The cause of this reduction is an increase in memory access on the GPU caused by the quicksort algorithm, so in spite of our in-line optimizations we could not exceed the speed of insertion sort.Another alternative for the insertion sort algorithm is Batcher's Odd-Even Mergesort as cited by Kipfer and Westermann [39], due to the better low-level parallelization and its worst-case complexity parallel time, represented by O(log 2 (n)).

Benchmark Tests
A set of tests were performed on the algorithm suite with using an nVidia GeForce 8800 GTS (96 cores), a GeForce 1030 GT (Pascal, 384 cores), a GeForce GTX 1050 (Pascal, 640 cores) and a GeForce GTX 970 (1664 cores), running on a 64-bit i-Core 7 CPU 860@2.80GHz (first generation, 2009) with 6 GB random access memory (RAM), see Figure 16.The project was implemented entirely in C++ using the OpenGL and GLM math libraries for the host side and the OpenGL Shading Language (GLSL) for the GPU side.The ray-marching step size was determined by λ in line 50 of Algorithm 1 for the cumulus test and was set to be the constant value 0.1 for the 3D mesh tests.Promising results were obtained when the cloud was far from the camera with the 8800 GTS at 800 × 600 and 640 × 480 pixels, especially with metaballs.
The same algorithm suite performs perfectly in all resolutions using the nVidia GTX 970, GT 1030, and GTX 1050, in particular when rendering clouds derived from 3D meshes.Besides this, a significant 2× frame rate improvement in the nVidia 8800 GTS was achieved in relation to Reference [40] for cumulus rendering.For this benchmark, we used 35 spheres for cumulus generation with the filter explained in Section 3.6.1 and the R rotation for the 3D rabbit mesh.Both tests used a grid size of 20 × 20 × 20 voxels and a uniform hypertexture of 64 3 single-precision floats.The pre-computation in the CPU took under 0.010 s thanks to the no duplicate tracing (NDT) algorithm.Without the NDT algorithm, the CPU execution time would double in all cases.The graphic driver used the factory default configuration in all tests.The CPU load did not exceed 15% and and host memory usage did not exceed 24.5 MB in all tests when running at a resolution of 1920 × 1080 pixels.Regarding the GPU hardware usage, with the nVidia 1030 GT, for instance, the top mark frame-buffer usage was 21% at maximum frames-per-second, the bus interface was 4% at maximum power, and the maximum memory allocation peak was up to 7%.Table 3 summarizes the minimum distance in Full-HD (1920 × 1080 pixels) where the analysed graphic cards reach real-time with our algorithms.Table 3.The table below shows the minimum distance from the cloud at which full high-definition (HD), 1920 × 1080 pixels, rendering reaches 30 frames-per-second (FPS, minimum real-time).This distance is suitable for scenarios where getting close to the surface of the cloud is required.GTS, the performance at 800 × 600 pixels overcomes the limit of the hyper-realistic method shown in Reference [40]; (b) With the GeForce 1030 GT, the performance is optimum in most cases, except when the level of detail (LOD) equation is manually bypassed to force higher quality; (c) Performance in a GeForce GTX 1050 is optimum in 99% of cases; (d) Based on empirical tests in a GeForce GTX 970, the proposed model achieves a geometric frame rate increment in all algorithms.Results are very promising for the both cumulus and 3D mesh tracing algorithms in all resolutions, including full-HD.

Discussions
In comparison with particle system and basic ray-tracing methods, the proposed algorithm results in greater realism than that found in References [1,8,14,[41][42][43], as shown in Figure 17a-c, and higher performance, as shown in Table 4.The obtained realism is comparable to that of off-line and photo-realistic models that use all physical characteristics, as shown in Figure 18, which are not suitable for use in real time due to the long execution times required.A novel non-real time approach is found in Reference [2] which applies the radiance-predicting neural network model (RPNN) to emulate real cumuli.However, it takes ∼12 h to train the network using an nVidia Titan (Pascal) GPU and two Intel Xeon CPUs (24 cores/48 threads in total).The present research goes in the direction of balancing the realism of volumetric rendering and the performance of particle systems to overcome the limitations of the non-real time and photo-realistic methods cited in References [44][45][46].It is also a suitable framework for implementing lighting and shadowing algorithms, with lower GPU overhead as compared to the methods of other works [47][48][49].Our research improves pre-computation times to the millisecond level as opposed to minutes [19,48,50], or hours [2].
With respect to the research by Bouthors [40], our model shows an increase in FPS performance on the same graphics hardware.While not obtaining the hyper-realism of radiometry achieved in that model, our cloud generation method using 3D meshes improves on the geometric accuracy and smoothness obtained by Wither et al. in the rapid sketch modelling of clouds [51].In contrast to the realistic method by Mukhina [4], where the cloud is projected on an hemispherical disc, our volumetric algorithm allows for the navigation and traversal of gaseous mass to observe details.Our method also improves on the overall realism in the model of Elek et al. [20], while maintaining the same performance as the conservative ray-marching version of their implementation when executed at 1920 × 1080 pixels.Despite our efforts, our method lacks the precision of the excellent work by Klehm et al. [23] and Peters et al. [24] with respect to scattering and shadow maps.
None of the algorithms cited above cover the aspect of cloud motion and shape alteration.Our algorithm does not cover this aspect either, but it is well-suited to implement the effect of wind advection over a dynamic system of primitives.

Conclusions and Future Work
We have presented a real-time cloud rendering method with a good balance between realism and performance.The algorithm uses flat uniform noise through fBm with low memory usage (64 × 64 × 64 single-precision floats) to ensure efficient computational costs of ray-marching.In addition, the use of bounding boxes with very few spheroids allows for the application of a simple linear loop which discards clouds outside the camera view.This low number of spheroids is achieved by a new Gaussian equation that covers the most typical cumuli used in video games and virtual reality software.The linear discrimination can perform even better if sphere location is made with space-partitioning algorithms, but it requires longer coding time and complex CPU/GPU coordination.
The use of pre-calculated light stored in a voxel grid and optimizations like the no duplicate tracing algorithm improve the computing speed for light.This algorithm is optimal when the location of the viewer changes faster than the location of the light source or the geometry of the clouds.A limitation

Figure 1 .
Figure 1.The uniform noise in the color scale plot shows the irregular density of water droplets in a cloud hypertexture.In the MATLAB four dimensional plot, it is possible to observe the nature of atmospheric vapour in the color values in the cube.

Figure 2 .Figure 3 .
Figure 2. Bi-dimensional representation of uniform noise (left) and fractal Brownian motion (fBm) noise (right).The base noise on the left is sharper while the fBm is softer due to the octave accumulation.

Figure 4 .
Figure 4. Distance/density relationship.The greater the distance between the ray and the sphere center, the lower the density of water vapour.The exponential approach gives natural realism by softening the cloud borders.

Figure 5 .
Figure 5. Graphical and procedural explanation of the no duplicate tracing algorithm.(a) A basic model that illustrates the zones to sweep.In this case only I1 to O2 and I4 to O4 are traced following the ray; (b) A flow chart illustrating the the no duplicate tracing algorithm (Algorithm 2).

Figure 6 .
Figure 6.Bounding box delimitation of clouds.Smits' algorithm is useful for optimizing ray-marching by calculating the segment of the Euclidean straight line along which rendering must be performed according to the λ value.

Figure 7 .
Figure7.A two-pass algorithm for lighting along ray l and rendering along ray s. n s and n l are the direction vectors of the ray-marching and the light respectively.[0, P] and [0, D] are the processing dimensions of line integral in the cloud for ray-marching and light respectively.x(s) is frame-buffer pixel and v(l) is a point into a voxel.

Figure 8 .
Figure8.A three-dimensional Gaussian distribution.Some cloud shapes resemble a 3D Gaussian normal distribution, specifically the cumuliform ones.The cause is the altitude at which the condensation is produced.Since the condensation level may be conceived as a flat and horizontal surface, the clouds formed at this level have a flat bottom side, as cited by Häckel's book[34].

Figure 9 .
Figure 9. Example of clouds with Gaussian distribution.A real photograph (left) and our generated cumulus (right) using 35 pseudo-spheroids in total.

Figure 10 .
Figure 10.A crepuscular landscape with three Gaussian cumuli executing the volumetric scattering and voxel shading with cloud interaction and occlusion.

Figure 11 .Figure 12 .
Figure 11.After each recursive call, the interpreter generates a new proportional random radius and length for the primitives.This produces more natural and impressive cumuliforms.

Figure 13 .
Figure 13.Different cumulus formations using a Gaussian distribution and optimized metaballs.Just six spheres were used to render the samples by randomizing the sphere radius.

Figure 14 .
Figure 14.(a) Ellipsoid scaling process.The proposed algorithm multiplies each triangle vertex (P 1 , P 2 , P 3 ) by a factor that typically falls in the range (0.1, 2], once the barycenter (B) is calculated.Afterwards, the shader algorithm uses the maximum distance from the barycenter to the scaled triangle vertex to estimate the density in the ellipsoid/ray collision equations; (b) After rotation.As seen in the image above, showing the original ellipsoid in black and the resulting one in green, the previous equations allow overlapping of the R 3 direction vectors.Hence, according to the direction of the larger triangle vertex, the algorithms produce the resulting rotation.

Figure 15 .
Figure 15.Example of 3D meshes converted to cumuliform clouds resembling known shapes.(a)A hand mesh transformed into a soft 3D cloud.The final result is successfully optimized for real-time GPU algorithm animation and rendering; (b) A rabbit mesh with 370 triangles.80% decimation has been performed on this mesh, reducing the number of triangles from 1850 to 370 to achieve a suitable real-time performance.

Figure 17 .Table 4 .Figure 18 .
Figure 17.Differences between our model and other works.(a) A generated cloud using the particle systems of Harris [13], and Huang et al. [1].The contour of the cloud and the overall realism lack accuracy.(b) The cloud modeling method of Montenegro et al. [41] that combines procedural and implicit models.(c) Our ontological volumetric cloud rendering method with lighting.The procedural noise improves cloud edges and fuzzy volume effects.

Table 1 .
Pros and cons of each method.

Table 2 .
Typical parameters for our Gaussian equations where k i , t and m i are scale constants.