1. Introduction
The realistic simulation of physical effects is computationally expensive in some cases, especially when it comes to solving differential equations in a multidimensional space. These simulations are widely used in engineering and science, where accuracy is more important than the time it takes to produce the solution. For example, these methods can be used to simulate complex behaviours in physics [1] and for the analysis of fluids [2].
These physical models are also very convenient for providing realistic experiences in interactive applications, such as computer games and simulators [3]. In this case, real-time execution is more important than accuracy, provided that the result remains believable. However, achieving real-time solving of the equations can be prohibitively expensive, which reduces the addressable market of these applications.
This paper addresses the real-time simulation of 3D atmospheric clouds for computer games, which has been a relevant challenge for decades, as revealed in the surveys of [4,5].
To this end, there are currently two possible approaches for generating these gaseous entities in cloud computer simulations, as stated by [6]: ontogenetics and physically based methods. The first type (ontogenetics) uses a mathematical abstraction that simplifies the complexity of meteorological physics to simulate clouds, which usually works in real time, as reported in the works of [7,8] for computer games. In contrast, the second type (physically based) implements precise physical simulation models of cloud processes to produce accurate and hyper-realistic results at the expense of computational efficiency, which often precludes real-time execution, as in the studies by [9] for computer games and [10] for movies, both of which improved the cloud radiometry. However, both approaches are resource intensive and require graphics processing units (GPUs) to achieve efficiency in environments intended for real-time simulation.
A particular application of differential equation solving in volumetric space is the computer simulation of cloud dynamics. Since air and water vapour behave as fluids, their dynamic behaviour can be modelled using the Navier–Stokes equations (NSEs). Because these equations have no general analytical solution, they are solved by numerical methods over finite-element structures. Despite the numerous improvements intended to increase the simulation speed, particularly execution on GPUs, it is still challenging to run this process on reasonably priced hardware.
An alternative approach involves using the output of multiple fluid simulations to train a surrogate model based on neural networks (NNs), as deep learning efficiently approximates multidimensional nonlinear outcomes. Ref. [11] demonstrates the use of deep learning algorithms with artificial neural networks and shows how they can predict input parameters in computational fluid dynamics, reducing computational costs and improving accuracy. More recent studies demonstrating the application of neural networks and deep learning methods to fluid dynamics simulation include the works of [12,13,14]. This approach is also explored in this work, which replaces the parallel Navier–Stokes fluid solver with a recurrent neural network (RNN) previously trained on multiple simulations in an engine to emulate the dynamics of clouds in real time. Thus, our method employs only such an RNN, as illustrated in the layout summary of the proposal in Figure 1, to animate cumuliforms. Compared with numerical methods for solving the Navier–Stokes equations, this approach provides a substantial speed-up and eliminates the need for spatial grid bounds, freeing overall computing power. While RNNs have been used to simulate blood movement [15], heat propagation [16] and turbulence prediction [17], to our knowledge, they have not been used for simulating cloud dynamics to date.
The scope of our method is not limited to the simulation of cloud dynamics; it could be applied to other similarly complex physical systems, such as fire, tornadoes, waves or flocking birds. The speed-up gained by applying this surrogate model frees resources for additional processing on General-Purpose Graphics Processing Units (GPGPUs). This research proposes an approach that can run on entry-level graphics hardware with low computational cost and minimal energy consumption for flight simulation, virtual reality featuring outdoor scenes, architectural software, digital cultural heritage and nature-based computer games.
Therefore, the contributions of our research to the state of the art of cloud motion are as follows:
A new method that replaces a Navier–Stokes fluid solver with a recurrent neural network.
Better, constant real-time performance of the RNN fluids algorithm compared with the previous literature.
Near-optimal performance in cloud dynamics prediction using deep RNNs.
Natural cumuli behaviour in real time irrespective of the 3D grid dimensions.
A novel approach for simulating other complex physical processes with neural networks.
The remainder of this paper is organised as follows: Section 2 presents related works on cloud dynamics simulations and the methods that have been developed to date, while Section 3 explains the theoretical background for our cloud rendering method and the previous model targeted for replacement of the Navier–Stokes fluid solver. Section 4 presents the proposed deep learning RNN structure and the dataset used for the training phase, and Section 5 describes the two experiments and the performance results obtained during the RNN inferences. Section 6 presents the discussion and current limitations of the proposed model, and finally, Section 7 proposes possible future work and improvements.
2. Related Works
As mentioned in the previous section, there are two methods for cloud simulation: ontogenetics and physically based methods. A summary of the capabilities of each approach is presented in Table 1. Notably, some outstanding works on cloud dynamics have paved the way for the present research. Ref. [18] presented a new method called the coupled map lattice (CML), which extends a cellular automaton to simulate cloud dynamics. The drawbacks of this method are a constrained fluid grid size and rendering time steps from 3 to 30 s. Ref. [19] proposed an improved method for simulating cloud formation on the basis of an efficient computational Navier–Stokes fluid solver; they combined the solver with a model of the natural processes of cloud formation, including buoyancy, relative humidity and condensation. Ref. [20] developed a particle system model to render cumuliform clouds while introducing impostors as a feature to improve the speed. The author also developed a cloud dynamics simulation based on the Euler equations of incompressible fluids, the water continuity equation and the thermodynamic equation. Both [19,20] were limited to a 3D lattice for cloud dynamics, and their rendering methods are no longer considered state of the art. Ref. [21] proposed a simple method for controlling cumuliform cloud simulations that can generate clouds with desired shapes specified by the user. In this method, the cloud formation process is controlled by a feedback controller, and the external forces are calculated from a geometric potential field. This method was constrained by a grid size of 320 × 80 × 100 and rendering time steps starting at 7 s. Ref. [22] proposed an approach similar to that of [23] but accelerated the particle system simulation with multicore and multithread hardware techniques. Currently, particle systems are no longer state of the art for cloud rendering. The work of [8] demonstrated the use of explicit and implicit parallel programming techniques for volumetric cloud rendering and dynamics to achieve an optimum balance between realism and performance. The principle that guides that work is real-time cloud rendering using efficient algorithms that can run on standard computers with modest GPGPUs by conforming the clouds with pseudo-spheroidal primitives. One of its main contributions is a programming framework that can be reused in education or the software industry for real-time cloud simulation of outdoor scenarios. However, for cloud dynamics, it also employs a 3D grid volume that limits the achievable performance. Ref. [24] proposed an efficient, physics-based procedural model for real-time animation and visualisation of cumulus clouds at the landscape scale. The authors coupled a coarse Lagrangian model for air parcels with procedural amplification using volumetric noise. An article by [25] proposed a novel model to simulate thermodynamic systems, such as cloud dynamics, by using a 2D cellular automaton oriented to satellite images; therefore, it cannot be compared with other 3D approaches. Finally, ref. [26] generated cumuli, strati and stratocumuli, as well as realistic formations caused by changes in the atmosphere, to simulate large-scale super-cell clusters of cumulonimbus formations. The model also enables the efficient exploration of stormscapes with a lightweight set of high-level parameters that control cloud formation and dynamics. This method is limited by the grid size and cannot simulate long cumulus transitions across a wide space.
To avoid the fluid grid restrictions of previous works, a new approach is needed. Deep neural networks (DNNs) are considered universal function approximators and can serve as highly accurate approximations of dynamical models [28]. Ref. [29] reviewed neural network frameworks in scientific simulations, highlighting their advantages and limitations and presenting future research opportunities for improving algorithms and applications. As an example of these applications of DNNs in physics simulations, ref. [30] presented a framework that uses DNNs to learn accurate constitutive models of complex fluids, enabling rapid soft material design and engineering by predicting fluid properties in multidimensional simulations. This problem is very similar to the one we address in this paper. Therefore, a trained neural network can be used to model cloud dynamics, which are described by complex equations representing physical processes. The main advantage of this approach is the computation time of the DNN output. Once the network is trained on the cloud dynamics model, the inference times are fixed and low. Thus, the execution time of the DNN is predictable and does not require special computing hardware to calculate the output during the inference process. Additionally, the realistic/natural behaviour of the simulation during the iterative execution of the DNN model can be evaluated according to the precision metric used in the DNN training process.
Currently, and to the authors' knowledge, there is no related work that employs a DNN combined with fluid dynamics for cloud movement simulation in computer graphics. The most similar work is the research published in [31], which accomplishes cloud animation at the landscape scale by employing machine learning. The authors utilised a deep convolutional generative adversarial network (DCGAN) trained with captured cloud videos to generate interactive cloud maps in a real-time 3D application, limiting the input images to a low resolution and applying preprocessing. This approach reduces the training time while producing detailed animations without physics simulation and was validated through human perceptual evaluation, producing realistic results with minimal computational overhead. However, this method has weak volumetric shading and cannot simulate long cumulus transitions across space. A similar approach, investigated at Disney Laboratories, is a non-real-time method for the hyper-realistic rendering of clouds inspired by [10], which utilises the radiance-predicting neural network model (RPNN) to emulate real cumuli.
Table 2 presents the main characteristics used to compare the different approaches. With respect to the learned models associated with each method, only this research and [32], which focused on cloud dynamics, provide a general framework for cloud movement simulations; however, in [32], real-time inference cannot be applied in a low-cost environment. In the case of [10], the radiance function is learned by a single NN architecture (a Multi-Layer Perceptron, MLP) with no memory, as is true for the CNN in the case of [32]. Unlike RNNs, NN architectures with no memory are unsuitable for models that need to remember past information, which in this case includes older cloud movements. The use of a CNN or MLP implies frame-by-frame predictions, whereas an RNN allows several frames to be generated simultaneously, resulting in better performance. Given these considerations, the work presented here enables the development of a general framework for cloud dynamics in an end-to-end performance environment using neural network inference. These features make it a better-performing approach than the other architectures presented in this section. The present work extends the research of [8] to include a new cloud dynamics simulation approach based on deep learning methods and recent artificial intelligence (AI) techniques, avoiding volume lattice space constraints and increasing the speed of real-time fluid dynamics computations in computer games.
4. Proposed Method
In this work, we aim to replace the cloud dynamics method proposed by [8,23] with an RNN-based approach to improve the computational efficiency and address the limitations of the 3D spatial grid during real-time execution. In this method, a cloud is modelled as a set of spheres. Each sphere in the cloud is characterised by its position in three-dimensional space, given by the coordinates $(x, y, z)$, and its velocity vector, which is also three-dimensional, $(v_x, v_y, v_z)$. The velocities of the spheres are crucial, as they determine the future positions of the spheres and are thus the primary variables of interest in our study. The neural network's objective is to predict the velocity $(v_x, v_y, v_z)$ of each sphere at the next time step, given the current state of the system, which includes the velocities of all spheres at the current time step. Working with velocities instead of coordinates makes the method invariant to spatial translations. The rationale for using a neural network is that it facilitates the creation of a model that can learn complex interactions and dynamics from data, potentially capturing nonlinearities and dependencies that might be challenging to model explicitly. The motivation behind this configuration is to leverage machine learning to model and predict the complex dynamics of the cloud of spheres. By learning from simulation data generated by solving the Navier–Stokes equations, the neural network can potentially serve as a faster surrogate model, enabling rapid predictions without the need to solve the equations directly at each time step. In particular, the neural network learns and replaces the procedures presented in Lines 2 to 9 of Algorithm 2. To train the RNN, we used a spatial domain of 30 × 7 × 7 as the grid size in our learning prototype, and we applied as inputs the constants 0.4 for the solver time step, 0.00001 for the atmosphere viscosity ($\nu$) and 0.2 for the wind force ($F$). The wind force was set to a constant value and direction for all of the cells in the grid before starting the fluid simulation, as explained in Algorithm 2.
The following subsections first explain how the dataset for training was produced, and then, the details regarding the RNN architecture are presented. Finally, the training and inference details are described.
4.1. Dataset
We built a cloud dynamics dataset by using the simulation provided by the fluid engine proposed in [8,23]. To this end, we carried out a series of manual executions and data extractions from the method mentioned above, randomly modifying key simulation parameters such as the wind force and the number of spheres, which define the cumulus form, up to a total of 1100 different simulations. Each of these simulations was composed of a sequence of 1000 iterations. Then, we applied the sliding window method to those sequences, with a window size set to 10 (see Figure 4 and Figure 5), as sketched in the code below. The window size was determined using a random search hyperparameter procedure over the range of 1 to 50 steps. Metrics associated with training (accuracy, etc.) and execution times were used as the criteria for selecting the sliding window size; the best value fulfilling both criteria was 10. For other cloud structures, the generalisation of the method is immediate, since this is a standard hyperparameter search technique that depends on the training data used, not on the cloud structure. We considered that a larger window would confer greater stability and robustness to the prediction despite worsening the model's initialisation and execution time. Therefore, as the execution time is a key point to consider, we limited this parameter to meet our expectations regarding computational efficiency.
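The following minimal sketch illustrates the sliding-window extraction under the settings stated above; the function name and array layout are illustrative, not taken from the original implementation:

```python
import numpy as np

def sliding_windows(run: np.ndarray, window: int = 10):
    """Split one simulation run of shape (n_iterations, n_features)
    into (10-step input window, next-iteration target) pairs."""
    xs, ys = [], []
    for t in range(run.shape[0] - window):
        xs.append(run[t:t + window])  # ten consecutive iterations
        ys.append(run[t + window])    # the next iteration to predict
    return np.stack(xs), np.stack(ys)

# Example: one run of 1000 iterations, 105 features (35 spheres x 3 axes)
x, y = sliding_windows(np.random.rand(1000, 105).astype(np.float32))
print(x.shape, y.shape)  # (990, 10, 105) (990, 105)
```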
After the sliding window method was applied, the resulting dataset consisted of 1,013,312 training samples and 120,000 test samples. Each sample is a sequence of length 10 (the window size) containing the velocities along the three axes of motion $(v_x, v_y, v_z)$ for each sphere at every time step, meaning that each instance in the dataset contains 105 attributes per timestep. The maximum number of spheres used during data collection was 35, which results in 105 velocity coordinates per timestep, and samples with fewer spheres were zero-padded to maintain size consistency. The target to predict is the next iteration in every case. A visual scheme of these processes can be seen in Figure 4 and Figure 5. The rationale for using 35 spheres is that it has the advantage of generating most types of cumuli by changing the radii of the pseudo-spheres following our parametrised Gaussian density equation, as explained in [8,33].
Neural networks use a fixed-size input equal to the number of neurons in the input layer. Therefore, the variability in the number of spheres that make up the cloud to be simulated creates an issue regarding this requirement. To solve this problem, the zero-padding technique was used on samples with fewer than 35 spheres, which was set as the maximum. This technique consists of filling the necessary positions in the input vector with zeros while maintaining size consistency with the input layer, occupying the vacancies of the missing spheres up to the defined maximum number. A post-padding scheme is used, filling with zeros at the end of every sequence (training sample) when needed; a minimal sketch is given below. This technique ensures that contextual information at the end of the sequence is preserved during neural network training and that the initial state of the RNN is always the same. In addition, padding is only used in the first 10 sequences and the last 10, which represents an impact of 20 sequences out of the 1,013,312 used for training. In other words, 0.002% of padding data is added, whose effect is negligible relative to the information contained in the training set. The effect is thus practically null in the creation of the model, and hence in the inference and the generated output.
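The following sketch shows one way to implement this zero-padding, assuming velocities are stored sphere by sphere along the feature axis; the helper name and layout are our own illustration:

```python
import numpy as np

MAX_SPHERES = 35  # maximum number of spheres per cloud

def pad_sample(sample: np.ndarray) -> np.ndarray:
    """Post-pad a (window, 3 * n_spheres) sample with zeros so that
    every instance exposes 3 * MAX_SPHERES = 105 features."""
    window, n_features = sample.shape
    padded = np.zeros((window, 3 * MAX_SPHERES), dtype=sample.dtype)
    padded[:, :n_features] = sample  # zeros occupy the missing spheres
    return padded

# A cloud with 20 spheres (60 features) becomes a 105-feature sample.
print(pad_sample(np.ones((10, 60), dtype=np.float32)).shape)  # (10, 105)
```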
4.2. Deep Neural Network Architecture
Notably, once the data are stored and processed, there are multiple ways to feed them into neural models. First, we selected the architecture that best suits this specific scenario. In the present work, the problem exhibits an iterative behaviour in which the desired output forms a variable-length sequence, which may be as short or as long as desired. Furthermore, as in any dynamic process, the previous stages, which can be understood as a trajectory, provide the necessary information to infer the successive stages of the sequence being emulated. These elements are often problematic for feedforward neural networks, which require fixed input and output sizes. These problems are solved by RNNs, whose architectures are specifically designed to process time-dependent data streams.
Within the family of recurrent neural networks, several types of recurrent units perform this computation through time. Among them, the most important are the multi-layer Elman RNN [42], the Long Short-Term Memory (LSTM) [43] and the gated recurrent unit (GRU) [44]. An illustrative example of the working scheme of these models is presented in Figure 6 for comparison.
The Elman RNN [42] is one of the simplest RNN models, which makes it fast; however, it has learning issues due to vanishing gradients. For each layer $l$ of the network at timestep $t$, each hidden unit of that layer computes the value shown in Equation (12):

$$h_t^{(l)} = \tanh\!\left(W_{ih}^{(l)} x_t^{(l)} + b_{ih}^{(l)} + W_{hh}^{(l)} h_{t-1}^{(l)} + b_{hh}^{(l)}\right) \qquad (12)$$

where $\tanh$ is the hyperbolic tangent function, i.e., $\tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$; $x_t^{(l)}$ are the input data; $W_{ih}^{(l)}$ is the learnable weight matrix for the input data; $b_{ih}^{(l)}$ is the bias of the input data; $h_{t-1}^{(l)}$ is the output of the layer on the previous timestep; $W_{hh}^{(l)}$ is the learnable weight matrix for the output of the previous timestep; and $b_{hh}^{(l)}$ is the bias of the output of the previous timestep. Importantly, when $l > 1$, $x_t^{(l)}$ is equal to $h_t^{(l-1)}$, i.e., the input is the output of the previous layer.
LSTM is a more complex model designed to overcome the problem of vanishing and exploding gradients, which is achieved by means of memory cells and gates that control the flow of information through the network [43]. However, this additional computation makes its learning slower than that of other alternatives. The three main components of the LSTM unit are as follows:
Memory cell $c_t$, which is responsible for storing long-term information.
Hidden state $h_t$, which represents the output of the LSTM on each timestep.
Gates that control the information flow. There are three gates: forget $f_t$, input $i_t$ and output $o_t$.
For each layer of the LSTM, Equations (13)–(18) are computed for each unit:

$$i_t = \sigma\!\left(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}\right) \qquad (13)$$
$$f_t = \sigma\!\left(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}\right) \qquad (14)$$
$$g_t = \tanh\!\left(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}\right) \qquad (15)$$
$$o_t = \sigma\!\left(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}\right) \qquad (16)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \qquad (17)$$
$$h_t = o_t \odot \tanh(c_t) \qquad (18)$$

where $\sigma$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$, and $W_{ii}$, $W_{if}$, $W_{io}$ and $W_{ig}$ are the learnable weights of the input data for the input, forget and output gates and memory cell, respectively. $W_{hi}$, $W_{hf}$, $W_{ho}$ and $W_{hg}$ are the learnable weights for the hidden-to-hidden connection, $b_{i\ast}$ and $b_{h\ast}$ are the biases of these connections, and $\odot$ represents the elementwise multiplication operation.
Finally, the GRU is similar to the LSTM in that it aims to maintain long-term information retention, avoiding vanishing and exploding gradients while reducing the number of learnable parameters [44]. This objective is achieved by using only two gates: update $z_t$ and reset $r_t$, which control the amount of information that should be kept or forgotten, respectively. For each layer of the GRU, each unit computes the operations presented in Equations (19)–(22):

$$r_t = \sigma\!\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \qquad (19)$$
$$z_t = \sigma\!\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \qquad (20)$$
$$n_t = \tanh\!\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \qquad (21)$$
$$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1} \qquad (22)$$

where $W_{ir}$, $W_{iz}$ and $W_{in}$ represent the learnable weights for the input data for the reset and update gates and the candidate output, respectively, and $W_{hr}$, $W_{hz}$ and $W_{hn}$ are the learnable weights for the hidden-to-hidden connection.
For this work, the proposed neural network architecture is as follows: an input layer followed by five stacked hidden recurrent layers with 350 hidden units each and a final dense layer. The rationale behind this architecture is that the stacked RNN layers provide sufficient depth for the network to accurately learn the cloud dynamics. Additionally, each layer consists of 350 units, which is the maximum number of spheres multiplied by the number of timesteps; the idea is that each unit specialises in a specific axis of a sphere at a specific timestep. Finally, the output dense layer performs a linear transformation of the output of the last recurrent layer to produce the final output: the velocity of each sphere of the cloud along the three axes of motion. For each layer, a dropout strategy with a value of 0.2, the most commonly recommended value for this learning approach, is employed to avoid overfitting during training. Importantly, the hidden units of the recurrent layers can be Elman, LSTM or GRU units, the selection of which is determined by the experimental study presented in Section 5. A schematic of the final RNN architecture is shown in Figure 7. The implementation was carried out using the PyTorch 1.9 library for the Python 3.x programming language [45].
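A minimal PyTorch sketch of this architecture is given below; the class and argument names are ours, and the recurrent cell type is left selectable, as in the experimental study:

```python
import torch
from torch import nn

class CloudDynamicsRNN(nn.Module):
    """Sketch of the Section 4.2 architecture: five stacked recurrent
    layers of 350 units each, plus a final dense output layer."""

    def __init__(self, n_spheres: int = 35, hidden: int = 350,
                 layers: int = 5, cell: str = "lstm"):
        super().__init__()
        n_features = 3 * n_spheres  # (vx, vy, vz) per sphere = 105
        rnn_cls = {"elman": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}[cell]
        self.rnn = rnn_cls(input_size=n_features, hidden_size=hidden,
                           num_layers=layers, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, n_features)  # dense output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, 105) -> next-step velocities: (batch, 105)
        out, _ = self.rnn(x)
        return self.head(out[:, -1, :])  # linear map of the last timestep
```

For instance, `CloudDynamicsRNN(cell="gru")` would instantiate the GRU variant compared in Section 5.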
4.3. Training Details
The training process is specifically designed to teach the RNN to function as a surrogate for the traditional Navier–Stokes solver. The core objective is to minimize the discrepancy between the sphere velocities predicted by the network and the ground-truth velocities generated by the physics-based fluid simulation. To achieve this, we defined a clear training methodology, detailed as follows:
Loss Function: We employed the Mean Squared Error (MSE) as the loss function between the predicted velocity values for each sphere on each axis of motion and the actual values, as shown in Figure 5 and defined in Equation (23):

$$\mathrm{MSE}(\hat{y}, y) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2} \qquad (23)$$

where $\hat{y}$ contains the predicted velocity values on each axis of motion for each sphere, $y$ is a vector that contains the actual velocity values on each axis of motion for each sphere, and $\hat{y}_i$ and $y_i$ represent the velocity components $(v_x, v_y, v_z)$ of sphere $i$ for the predicted values and the actual values, respectively. Finally, $n$ represents the number of spheres. This metric is ideal for this regression task as it quantifies the average squared difference between the predicted and actual velocity vectors, directly measuring the model's prediction accuracy.
Hyperparameter Tuning: The network’s architecture and training parameters were established through empirical evaluation to optimize performance. The final configuration consists of five stacked recurrent layers with 350 hidden units each, a dropout rate of 0.2 applied to each layer to mitigate overfitting, and a batch size of 64.
Training Convergence: The model was trained using the ADAM optimizer [46] with $\beta_1 = 0.9$ and $\beta_2 = 0.999$ for the exponential decay rates of the moment estimates and $\epsilon = 10^{-8}$; these are the default values recommended for the ADAM function in the PyTorch library. The initial learning rate was progressively reduced using an exponential decay scheduler, which lowered this parameter as the training progressed, as expressed in Equation (24):

$$\eta_{e} = \eta_{0}\,\gamma^{\,e} \qquad (24)$$

where $\eta_e$ is the learning rate at epoch $e$ and $\eta_0$ is its initial value. For this particular case, we set the multiplicative factor of the learning rate decay ($\gamma$) to 0.95. This optimisation strategy aims to avoid overfitting. We monitored the MSE on a validation set to track convergence and prevent overfitting. As illustrated by the learning curves in Figure 8, the training process was concluded based on an early stopping policy, which terminated the training if the validation loss failed to show improvement over 20 consecutive epochs, i.e., over 10% of the total epochs.
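A condensed training-loop sketch under these settings is given below; `CloudDynamicsRNN` is the sketch class from Section 4.2, while `train_loader`, `val_loader` and the initial learning rate of 1e-3 are illustrative assumptions, not values taken from the paper:

```python
import torch
from torch import nn

model = CloudDynamicsRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial lr assumed
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
criterion = nn.MSELoss()  # Equation (23)

def validation_mse(model, loader, criterion):
    """Mean validation loss over all batches."""
    model.eval()
    with torch.no_grad():
        losses = [criterion(model(x), y).item() for x, y in loader]
    return sum(losses) / len(losses)

best_val, stale_epochs = float("inf"), 0
for epoch in range(200):  # roughly 200 epochs were used in practice
    model.train()
    for x, y in train_loader:  # mini-batches of size 64
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # exponential learning rate decay, Equation (24)

    val = validation_mse(model, val_loader, criterion)
    if val < best_val:
        best_val, stale_epochs = val, 0
    else:
        stale_epochs += 1
        if stale_epochs >= 20:  # early stopping: 20 epochs without improvement
            break
```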
The experiments were performed using a laptop equipped with an Nvidia GeForce RTX 2060 GPU (Turing, 1920 cores) and a 64-bit Intel Core i7-10750H CPU at 2.60 GHz (10th generation, 2020) with 16 GB of random access memory (RAM).
Importantly, since this is a novel approach and the authors have not found any method similar to the one proposed in this paper, autoregressive integrated moving average (ARIMA) models [47] were developed to determine the velocity of each sphere along the three axes of motion, as they represent one of the most widely employed methods for time series forecasting. These models served as a baseline for comparing the performance of the proposed RNN method. The parameters of each individual ARIMA model for each variable were obtained by means of the auto.arima function of the forecast R package [48].
4.4. Inference Details
The inference process is the focus of the present work. At this point in the research, the neural network has been tuned and is ready to make predictions regarding cloud dynamics. This stage is directly related to the training phase, since the inputs must have the same characteristics as those shown to the network during training. With this in mind, the model was implemented in a function that initially receives ten iterations generated by the Navier–Stokes fluid solver. The network accepts this input and predicts the variation in the positions of the spheres (their velocities) for the following time instant (see Figure 1). Then, in an iterative process, the network again accepts a sequence of size ten as input, formed by the newly predicted value and the previous nine. This process is repeated as many times as necessary.
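The following sketch captures this autoregressive loop; the function name is ours, and the model is assumed to map a (1, 10, 105) window to the next 105 velocity values:

```python
import torch

@torch.no_grad()
def rollout(model, seed_window: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Autoregressive inference: `seed_window` holds the first ten
    solver iterations, shape (1, 10, 105). Each prediction is appended
    and the window is shifted by one timestep."""
    window = seed_window.clone()
    predictions = []
    for _ in range(n_steps):
        next_step = model(window)                      # (1, 105)
        predictions.append(next_step.squeeze(0))
        window = torch.cat([window[:, 1:, :],          # drop the oldest step
                            next_step.unsqueeze(1)], dim=1)
    return torch.stack(predictions)                    # (n_steps, 105)
```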
4.5. Computational Complexity Analysis
To address the impact of the model's components on its computational performance, we present a theoretical complexity analysis of the inference step, as execution time is a critical factor for real-time simulation. The computational complexity of a single inference step for a stacked RNN is primarily determined by the matrix multiplications within its recurrent layers. In this study, only the time complexity of the LSTM layer is addressed, as it is the most complex model. The complexity of the LSTM model is $O(h^{2} + h d)$ per timestep [43], where $h$ is the number of hidden units and $d$ is the dimension of the input data, i.e., the number of timesteps and the number of features, which in this work is the number of spheres multiplied by 3. As we are stacking multiple layers of this network, the final computational complexity is

$$O\!\left(L \cdot \left(h^{2} + h d\right)\right)$$

where $L$ is the number of stacked recurrent layers. The final linear layer adds a smaller term of $O(h \cdot d_{\mathrm{out}})$, with $d_{\mathrm{out}}$ denoting the output dimension. This formulation allows us to analyse the performance impact of each component:
Window Size and Number of Layers ($L$): The complexity scales linearly with both the window size and the number of layers. This means that doubling the number of layers or the window size will roughly double the inference time. This justifies our choice of moderately sized values (a window size of 10 and $L = 5$) to maintain real-time performance.
Number of Hidden Units ($h$): The complexity scales quadratically with the number of hidden units. This makes $h$ the most critical hyperparameter for computational performance. Our choice of $h = 350$ was empirically determined to provide sufficient model capacity without being prohibitively expensive.
Type of Recurrent Unit: The choice between the Elman RNN, GRU and LSTM primarily affects the constant factor hidden by the Big O notation. An LSTM unit involves more internal calculations (four gates) than a GRU (two gates and a candidate state) or an Elman RNN (one hidden state calculation). Consequently, for the same set of hyperparameters ($h$, $d$, $L$), an LSTM-based network is computationally more intensive than a GRU or Elman RNN. This presents a trade-off between the unit's expressive power and its computational cost.
Number of Spheres ($i$): The number of spheres directly influences the input feature size ($d = 3i$ features per timestep). The complexity therefore scales linearly with $i$. This theoretical result is strongly supported by our empirical findings in Section 5. As shown in Table 4 and Table 5, the mean inference time scales almost perfectly linearly with the number of cumuli (and thus the total number of spheres) being simulated.
This analysis highlights the trade-offs made in designing the network and confirms that the number of hidden units ($h$) is the most sensitive parameter for performance, while the total number of spheres ($i$) results in a predictable, linear increase in computation time.
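As a back-of-the-envelope check of these scalings (our own illustration, not a measurement from the paper), the per-timestep multiply count of the stacked LSTM can be estimated as follows:

```python
def lstm_step_multiplies(h: int, d: int, layers: int) -> int:
    """Rough multiply count for one LSTM timestep: four gate products
    of size h x input per layer, plus four h x h recurrent products."""
    first = 4 * (h * d + h * h)                  # layer 1 sees d input features
    deeper = (layers - 1) * 4 * (h * h + h * h)  # deeper layers see h inputs
    return first + deeper

# h = 350 hidden units, 35 spheres (d = 105 features), L = 5 layers:
print(lstm_step_multiplies(350, 105, 5))  # ~4.6 million multiplies per step
```

This confirms the dominant quadratic dependence on $h$: the four deeper layers, whose cost is governed by the $h \times h$ products, account for most of the total.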
6. Discussion and Limitations
The advantage of our method over the CUDA fluid dynamics simulation method based on the works of [8,21,39] is its ability to emulate cloud movement with constant performance regardless of the size of the three-dimensional grid. Thus, the clouds can be animated over an infinite scenario without computational spatial bounds or other data structures requiring memory. The ability to conform our cumuli by using pseudo-spheres or pseudo-ellipsoids, as shown in [49,50,51], along with the RNN, is another feature that enables fine control over the cumuliform gaseous resemblance that other real-time methods based on meteorological datasets or meshes lack [9]. Furthermore, the sphere positions were useful as weights for our RNN input during training and inference.
In our tests, we improved the average computation time for each simulation time step reported in [32] from 0.031 s on an Nvidia Titan X GPU to 0.016 s on a GTX 1070 non-Ti (first experiment) and 0.006 s on a GTX 1070 laptop GPU (second experiment) for the same number of cumuli. We also outperformed the best case in [26], which used an Nvidia GTX 1080 and required 0.040 s per frame at its reported grid size. With respect to the work in [31], which used a 4 GHz CPU and an Nvidia RTX 2060 GPU (about 11% faster in effective speed than an Nvidia GTX 1070; https://gpu.userbenchmark.com/Compare/Nvidia-RTX-2060-vs-Nvidia-GTX-1070/4034vs3609, accessed on 22 August 2025), we obtained very similar real-time performance and better cumuliform rendering quality even when using older equipment with higher screen resolutions.
The previous comparison is qualitative because the source code of the mentioned work is not publicly available, so no objective measurement could be made. While the Goswami rendering method is more efficient than ours in terms of transforming the cloud maps into a unique 3D hypertexture, we achieve quite similar performance in arranging a set of spheres as cloud primitives in the scene bounding box. The Goswami inference method is based on Deep Convolutional Generative Adversarial Networks (DCGANs), which were trained to generate synthetic images. Goswami then employed this technique to create a cloud image for each frame without considering the dynamics of cloud movement in the neural network. These dynamics were incorporated into the input parameters for the inference made by the DCGAN to generate a sequence of cloud images. Therefore, the DCGAN does not model the dynamics of cloud movement but rather the specific image sequences that it has learned from the input sequences. According to the authors, these sequences are very limited in number (13 videos). From the previous description, it becomes evident that the proposed models are very different, and the structures of the neural network models are not comparable. However, they can be compared by using the time metrics associated with the computational generation of the models and the image sequences. As a general rule, DCGANs are more complex to train than RNNs, and in fact, the authors indicate that they required 2000 to 3000 epochs to train the two networks used, which they call T_DCGAN and R_DCGAN; each training epoch took between 6.52 and 9.07 s depending on the encoding (grayscale or RGB) used for the images obtained from the video sequences. According to the authors, these images have a resolution of 256 × 256 pixels, which is inadequate for simulations that render reasonable image quality. In the case of the model proposed in this paper, approximately 200 epochs were used for network training. However, the training time of each epoch was greater, since the amount of data considered was much larger than that used by Goswami, as described in Section 4.1. This divergence again indicates that the models are different due to the datasets used in each neural network type.
With respect to the inference times of the networks for image generation, Goswami indicates that "The total per-frame execution time of the proposed method is an average of 3.02 ms". In the case of the RNN described in this paper, the average result is 2.60 ms per inference step at Full HD (1920 × 1080) resolution without landscape rendering. This advantage is shown in Table 6 in bold text, where we obtain 2.60 ms as the best-case time based on the benchmarks reported in Table 4 in bold text.
Additionally, our method reduces the simulation time step compared with previous methods, e.g., [21]. A comparison with the current state-of-the-art studies is detailed in Table 6.
To evaluate the overall system performance in frames per second (FPS) on a standard computer, we used the same equipment as in the first experiment in Section 5.2.1. The resulting measurement exceeds the minimum required real-time threshold in most cases. Table 7 depicts the FPS at medium and Full-HD screen resolutions for one and ten cumuli, respectively, with 35 spheres each.
The visual quality of the cumuli rendered by our method must sustain optimum real-time performance above 30 FPS while running the fluid dynamics RNN inference within the required time in most cases. The use of pseudo-spheroidal primitives to conform the clouds has many advantages for AI but imposes constraints on the time complexity and the calculations performed by the rendering shader algorithm. However, the visual quality of our rendering method outperforms a significant number of previous and present related methods. The plausibility of the resulting clouds can be evaluated empirically by comparing our rendering method (Figure 15a) with a real photograph (Figure 15b). To validate the realism, we applied the quantitative Universal Quality Image Index (UQI) metric of [52] to the aforementioned images, obtaining a score of 0.89, where 1 indicates a perfect match.
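For reference, the UQI of [52] for two images $x$ and $y$ combines correlation, mean luminance and contrast in a single score in $[-1, 1]$; its standard definition (restated here, not quoted from the paper) is

$$Q = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{\left(\sigma_x^{2} + \sigma_y^{2}\right)\left(\bar{x}^{2} + \bar{y}^{2}\right)}$$

where $\bar{x}$ and $\bar{y}$ are the mean intensities, $\sigma_x^{2}$ and $\sigma_y^{2}$ the variances, and $\sigma_{xy}$ the covariance of the two images.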
For further comparison between our method and the referenced state-of-the-art real-time cloud dynamics works, we include snapshots of these clouds in Figure 16.
According to the results shown in Figure 9, Figure 10 and Figure 11, our method can be applied in computer games, flight simulation systems, atmospheric simulations, educational tools for climate awareness, etc., without losing performance. This is possible because of the better efficiency of the RNN compared with other types of fluid simulation techniques. Furthermore, this method can be applied in the computer game industry when rendering outdoor scenarios at medium screen resolutions to improve both computing efficiency and user experience. The realism and plausibility of the cumulus atmospheric behaviour are sufficiently accurate, along with the various forms that the RNN randomises, avoiding expensive computational overhead at resolutions lower than 1920 × 1080. Higher realism in aspects such as rendering quality implies a severe reduction in overall real-time performance that would affect users with basic hardware.
It is important to keep in mind that outputs produced by neural networks are approximations of the real solution to the problem. Even if we use a very large sample and greatly reduce the error between the predicted value and the real value, there may always be traces of this approximation. Notably, the data used to train the network were derived from the results calculated by the solver of the incompressible NSE, and these results are already approximations of the mathematical solutions to the equations. Therefore, the solution provided by the neural network will always be an approximation with very high accuracy regarding the calculated data from the NSE, but it will never have better accuracy than the solution to the original incompressible NSE. However, the accuracy of the neural network can be tuned to obtain a more precise model. As the accuracy approaches 100%, the model more closely reproduces the output of the original equation solver presented in Algorithm 2. Therefore, under our iterative method, where the previously predicted values are used to predict the next values, these small errors can be expected to propagate and increase as the number of time steps increases. The use of neural networks also involves an additional problem: the dependence of their performance on the training data used. Therefore, proper performance requires training with the correct data. For the same reason, when scaling the method to a larger number of spheres and thus creating a new network that can handle this increased size, the network parameters must be retrained. This retraining action should not be a problem in terms of the method’s performance, as by following the same training process, the new network should converge back to the correct operation. However, this process can be tedious when the problem requires a very long training time.
7. Conclusions and Future Work
This paper presents a new real-time realistic method for simulating cumuliform fluid dynamics with RNNs. By using a GPU pseudosphere-based approach, we achieve natural-looking movement of cumuli with a diversity of forms after RNN deep learning. Additionally, we achieve better overall system efficiency at resolutions lower than 1920 × 1080 and overcome the need for spatial grid bounds by using an RNN trained with the results of the atmospheric physical simulation previously executed in parallel on the GPGPU. The proposed method consists of training different types of RNNs, such as the LSTM, GRU or Elman RNN, to iteratively obtain a cloud dynamics predictor, similar to a multidimensional time series forecasting problem. Therefore, during real-time execution, the neural model solves the fluid physics equations normally calculated by the software engine more efficiently. The empirical results demonstrate constant and high real-time performance, which implies the low energy consumption of our RNN method compared with other, more limited or computationally expensive fluid dynamics models. Furthermore, scalability is also a relevant advantage of our method, since its complexity increases linearly with the number of spheres. Thus, the training and implementation of a neural network capable of moving an arbitrary number of spheres are straightforward with this method, as shown in the two different experiments, which demonstrate that the proposed method achieves much better computational efficiency than the alternatives proposed in the literature. Under these premises, we can confirm the initial hypothesis and the valuable application of our algorithm in computer simulations of natural phenomena.
In addition, this work may open the door to exploring these advantages in other similar processes in which computational efficiency is a crucial issue and some loss of precision in the simulation results is acceptable. We refer here to the context of simulation tools in which the dynamics of the bodies involved constitute a computationally complex process; this method could be applied in the same way to reduce the computational time in those contexts. Additionally, the method can be used to obtain neural network approximation functions for more comprehensive models that incorporate features such as water vapour, cloud density, phase transitions, and buoyancy while maintaining the computational efficiency necessary for integration into realistic scenarios such as those presented in this paper. Regarding future activities, we intend to work on the correlation between these characteristics and data from the associated physical model simulations. To incorporate these new physical parameters, new equations of state must be added to the solver. For example, in the case of temperature, pressure, and cloud density in [53], a new equation of state is added that correlates these three characteristics with the fluid model. For each set of characteristics, an equation of state can therefore be added to the Navier–Stokes fluid solver. The data generation procedure for training will then use this new solver, modified by adding the state equations.
Under the current assumption, if a different LSTM model is run to control each cloud present in the simulation, the execution time increases linearly with the number of clouds since these models are executed sequentially. However, we believe that this method has the potential to support parallelisation of these processes on the GPU, thus achieving the minimum time regardless of the number of clouds.
On the other hand, in the present work, we have implemented an LSTM network configured to predict the motion of up to 60 spheres, intrinsically limited to this number by the architecture of this particular network. As mentioned above, this method has the advantage of being fully scalable by simply increasing the number of input neurons to the desired value, and thus, when trained with appropriate data, one can extend the results of the present method to clouds composed of an arbitrary number of spheres. In future work, we propose scaling the size, increasing the cloud diversity and experimentally corroborating that the computational efficiency is also maintained for this case. In addition, an exhaustive study of recurrent neural network types and associated metrics will be conducted. The objective will be to provide different types of networks (RNN, LSTM, GRU, and Transformers) as a selection framework based on these metrics, computational performance, and the realism of the images produced.