Reinforcement-Learning-Based Optimization of Convective Fluxes for High-CFL Finite-Volume Schemes

Rozhkov, Andrey; Kozelkov, Andrey; Kurulin, Vadim; Shishlenin, Maxim

doi:10.3390/computation14040075

¹ Nizhny Novgorod State Technical University n.a. R.E. Alekseev, Nizhny Novgorod 603155, Russia

² Russian Federal Nuclear Center—All-Russian Scientific Research Institute of Experimental Physics (RFNC-VNIIEF), Sarov 607188, Russia

³ Institute of Computational Mathematics and Mathematical Geophysics, Novosibirsk 630090, Russia

* Author to whom correspondence should be addressed.

Computation 2026, 14(4), 75; https://doi.org/10.3390/computation14040075

Submission received: 15 February 2026 / Revised: 16 March 2026 / Accepted: 17 March 2026 / Published: 24 March 2026

(This article belongs to the Section Computational Engineering)

Abstract

In this article, we explore the possibility of using reinforcement learning to create convective flow approximation schemes that maintain accuracy and stability at high Courant-Friedrichs-Lewy (CFL) numbers in the finite-volume discretization of advection equations. Unlike most existing data-driven discretization methods, which primarily concentrate on spatial grid refinement, this work emphasizes increasing the allowable time step without compromising solution accuracy. This approach reduces the total number of time integration steps, thereby enabling faster computation. A neural network is used as a surrogate model for reconstructing the convective flow, which takes as input local information about the flow, scalars, and geometry and predicts scalar values at node points. Reinforcement learning is used for training and is formulated as a policy optimization problem, where the long-term reward is defined as the difference between the numerical and reference solutions over the entire simulation period. Both the genetic algorithm and the Deep Deterministic Policy Gradient (DDPG) method are investigated. The effectiveness of the approach is evaluated using a one-dimensional nonlinear advection problem with a constant velocity field. Despite the simplicity of the test case, the results demonstrate that the trained convective flux approximation scheme achieves accuracy comparable to or better than the classical second-order linear upwind (LUD) scheme, while operating at CFL numbers 2–50 times higher than the optimal CFL for LUD, thereby reducing the simulation time by the same factor. This allows for a wider range of stability and accuracy in the finite-volume method and the use of larger time steps without compromising the quality of the solution. The study is intentionally limited to a single spatial dimension and serves as a basic analysis of the method’s applicability. 
The results demonstrate that reinforcement learning can find convective flux approximation schemes that are more efficient at high CFL numbers than conventional explicit second-order schemes, establishing a framework that our follow-up work extends to improved training methods and three-dimensional complex transport problems. The proposed method improves the spatial discretization of convective fluxes, which is independent of the choice of time integration scheme. Therefore, the neural reconstruction can in principle be used in both explicit and implicit finite-volume solvers.

Keywords:

convective term; approximation scheme; neural network; reinforcement learning

1. Introduction

When designing, analyzing, optimizing, or solving direct or inverse problems in aerodynamics, flow fields are typically modeled using CFD solvers. However, CFD simulations are usually computationally expensive, requiring significant memory and long simulation times. These limitations restrict the ability to explore the design space and prevent interactive design. In recent years, the application of deep learning [1,2,3,4] and data-driven methods has attracted considerable interest due to their potential for faster flow prediction and reduced computational costs compared to traditional CFD methods [5,6], especially for inverse problems [7].

An approximation model for the Navier–Stokes equations in aerodynamics was proposed for real-time prediction of non-uniform steady laminar flows using convolutional neural networks (CNNs) [8]. Machine learning was combined with flux limiting for property-preserving subgrid-scale modeling within flux-limited finite-volume methods for the one-dimensional shallow-water equations. Numerical fluxes of a conservative target scheme were fitted to coarse-mesh averages of a monotone fine-grid discretization, with a neural network parametrizing the subgrid-scale components.

Numerical simulations were conducted for a laminar mixed-convection problem in a lid-driven square cavity containing two internal rectangular blocks, oriented vertically or horizontally [9]. CFD results were then used to train and test an artificial neural network (ANN) to predict new thermal behavior cases due to mixed convection, reducing computational time.

Physics-informed neural networks (PINNs), which integrate data-driven and physics-driven modules for numerical simulation modeling, including diffusion, flow, and phase-transition problems, were investigated in [10]. This study explored the underlying physical laws embedded in data and extended PINN applications to multi-physics coupled systems, addressing inverse problems for governing equations of phase, temperature, and flow fields, thereby enabling parameter inversion under multi-physical conditions.

The Large Language Model Meta AI (Llama) 3 was investigated for predicting fluid flows with varying dynamical complexities [11]. Results demonstrated that the Llama 3 model can be applied to fluid dynamics problems with minimal engineering adaptation and without fine-tuning pre-trained weights, achieving improved accuracy and robustness compared to conventional fully connected or recurrent neural networks with comparable capacity and training data.

It has been shown that strict error bounds exist when approximating the incompressible Navier–Stokes equations with PINNs [12], and that the underlying PDE residual can be made arbitrarily small using tanh neural networks with two hidden layers. The total error is estimated based on the training error, network size, and number of quadrature points.

A review in [13] explored the integration of CFD and artificial intelligence (AI) for modeling multiphase flows and thermochemical systems, which involve nonlinear interactions, complex geometries, and high computational costs. Despite this promise, AI-enhanced CFD still faces challenges, as many AI models rely heavily on empirical data rather than physics-based simulations, limiting generalizability and physical consistency.

CNNs have been applied to design linear parameter-varying approximations of incompressible Navier–Stokes equations [14]. Considering potentially low-dimensional parameterizations, the use of deep neural networks (DNNs) in a semi-discrete PDE context was discussed and compared to approaches based on proper orthogonal decomposition.

A neural-network-based method for obtaining analytical-function solutions of the Navier–Stokes equations was proposed in [15], consisting of two parts: the first satisfies the boundary conditions with no adjustable parameters, and the second ensures that the governing equations are satisfied within the domain while the boundary conditions remain intact. The second part involves a neural network whose parameters are determined to minimize the resulting approximation error.

Integrated computational heuristics were applied to heat transfer and thermal radiation problems in two-phase magnetohydrodynamic flows with nanoparticles, combining neural networks for accurate approximation with global optimization via genetic algorithms and local refinement using sequential quadratic programming [16].

A reduced-order model exploiting PINNs for solving inverse problems in Navier–Stokes equations was presented in [17]. In [18], a numerical method was developed coupling PDE solutions via machine-learning approaches: PDEs are solved in subdomains, while ANNs are trained to couple solutions across interfaces, yielding full-domain solutions.

A combination of machine learning and flux limiting for subgrid-scale modeling in flux-limited finite-volume methods for the 1D shallow-water equations was also proposed [19], fitting conservative scheme fluxes to coarse-mesh averages of monotone fine-grid discretizations.

Simulations of turbulent flows remain limited by the inability of heuristics and supervised learning to model near-wall dynamics [20]. Scientific multi-agent reinforcement learning (SciMARL) was introduced for discovering wall models for LES. In SciMARL, discretization points act as cooperating agents that learn LES closure models from limited data, generalizing to extreme Reynolds numbers and unseen geometries. This approach reduces computational cost by several orders of magnitude while reproducing key flow quantities.

A diffusion model utilizing high-fidelity training data was proposed in [21], enabling reconstruction of high-fidelity fields from low-fidelity or randomly sampled inputs. Physics-informed conditioning based on known PDEs further enhances accuracy when available.

Accurate approximation of the convective term in nonlinear advection equations is critical, as it strongly affects solution quality [22,23]. This is especially important for finite-volume discretizations on arbitrary unstructured meshes, where only second-order schemes are typically achievable, often highly dissipative due to cell irregularity [24,25].

Modern CAE software suites, including LOGOS-Aerohydro [26,27], employ finite-volume discretizations for advection equations. Therefore, developing neural-network-based convective term approximation schemes using one-dimensional problems as a testbed is a reasonable starting point.

Research on basic convective flow discretization schemes on unstructured meshes is presented in [22,24,25]. Dissipative properties of second-order central-difference and upwind schemes on hexahedral and tetrahedral meshes are investigated in [24], showing that central-difference schemes with artificial viscosity are required for stability. In [22], a blended central-upwind scheme is described, activating central differences where stable and necessary for resolving large-scale vortices, while first-order upwind is used near walls or in vortex-free regions.

A survey of high-order, low-dissipation schemes is presented in [28], including variations in NVD (Normalized Variable Diagram) schemes constrained by the CBC criterion [29]. Generalizations of NVD schemes for unstructured meshes are described in [30]. While CBC ensures monotonicity, it introduces additional diffusion [29].

Analysis shows that no existing approximation scheme guarantees accuracy across all conditions. Neural-network-based deep reinforcement learning schemes appear promising. Research applying deep machine learning to CFD has increased recently [31], focusing on turbulent-flow modeling [32,33,34], accuracy improvement on coarse grids [35,36], and overall simulation optimization [37,38,39].

Reinforcement learning (RL), including Deep Reinforcement Learning (DRL), offers a framework to optimize numerical schemes by maximizing long-term accuracy metrics over simulation trajectories [40,41]. This approach can, in principle, discover new dependencies between key dynamic parameters such as the Courant number, the transported scalar gradient, and the scalar itself, enabling acceleration of simulations by increasing the admissible time step.

The novelty of this work is the use of deep reinforcement learning for local flux reconstruction in a conservative finite-volume solver, rather than learning global time-stepping operators as in Bar-Sinai et al. (2019) [42] and Kochkov et al. (2021) [43]. This approach preserves discrete conservation, enables stable high-CFL operation, and integrates seamlessly with standard finite-volume methods, providing a pathway for extension to complex geometries. Our goal is to increase the time step while preserving or improving solution accuracy, thereby reducing the number of integration steps and computational time. We demonstrate the method on a scalar transport problem, comparing the neural-network scheme against classical first-order (UD) and second-order (LUD) finite-volume schemes as well as the analytical solution, across a range of Courant numbers. Results show that the RL-based scheme can improve accuracy by factors of 2–50, even at higher CFL numbers.

Regarding the “Warp-DG” work [44], we share a strong interest in hybrid methods. However, our goal is to advance the classical Finite Volume Method (FVM), which remains the cornerstone of industrial CAE simulation. The proposed approach enables the integration of intelligent approximations into existing solvers without altering their core architecture, ensuring practical applicability. Therefore, our work does not compete with the Discontinuous Galerkin (DG) method but rather offers an alternative pathway for enhancing the efficiency of FVM, particularly for long-duration simulations.

This study is intentionally restricted to a one-dimensional benchmark to establish a baseline feasibility framework. Extensions to multidimensional flows and more complex transport problems are reserved for future work, as they require additional model development and training.

2. Main Equations and Discretization Schemes for Convective Flow

The simplified form of the advection equation without the diffusion term, sources, or sinks is written as follows [45]:

\frac{\partial φ}{\partial t} + \frac{\partial}{\partial x_{j}} (φ u_{j}) = 0

(1)

Discretization of Equation (1) on an arbitrary unstructured mesh is most appropriately performed using the finite volume method [46], which is optimal for the numerical solution of computational fluid dynamics problems [26,27]. A simplified view of the interaction between mesh cells is shown in Figure 1.

Here, k is the set of faces of cell P consisting of the set of internal faces $k_{i n t}$ and the set of external faces $k_{s}$ . The neighboring cell across the internal face $k_{i n t}$ is denoted as M. $S_{i, k}$ is the area vector of face k, where i is the component index of the vector. The vector drawn from the center of cell P to the center of cell M across the face $k_{i n t}$ is denoted as $d_{i, P M} = r_{i, M} - r_{i, P}$ , and the vector drawn from the center of P to the center of face $k_{s}$ as $d_{i, P k_{s}} = r_{k_{s}} - r_{P}$ , where $r_{i}$ is the radius vector. The flow through the face is denoted by f and its direction is indicated by an arrow.

Time discretization of Equation (1) (the first term) can be performed using any available scheme, either explicit or implicit. In what follows, for simplicity, the Euler scheme is used for time discretization, and time indices are omitted where their values are evident. To perform the spatial discretization of Equation (1), we integrate it over the volume of cell P and convert the volume integral of the convective term (the second term in (1)) into a surface integral:

\int_{V_{P}} \frac{ρ^{j + 1} φ^{j + 1} - ρ^{j} φ^{j}}{Δ t} d V + \oint_{S_{P}} ρ φ u_{j} d S_{j} = 0

(2)

To approximate the convective term on an arbitrary unstructured mesh using the finite-volume discretization of the governing equations, it is written as follows:

\oint_{S_{P}} ρ φ u_{j} d S_{j} \approx \sum_{k} ρ_{k} φ_{k} u_{j, k} S_{j, k} \approx \sum_{k} ρ_{k} φ_{k} F_{k}

(3)

where $F_{k}$ is the volumetric flux through face k. The value of the transported quantity on the face, $φ_{k}$, is determined by the applied convective term discretization scheme, which is discussed below.

The numerical dissipation of the discretization scheme is affected the most by the scheme used specifically for the convective terms [24].

Choosing the optimal discretization scheme for the convective term is among the key challenges in modeling viscous incompressible fluid flows using unstructured meshes. The scheme should, on the one hand, have low dissipation, i.e., generate as little numerical diffusion as possible, and on the other hand, ensure stable computation. There are many discretization schemes applicable to arbitrary unstructured meshes [22,24,28,29,30,45,47]. Among them, several schemes can be shortlisted as having the “highest applicability rating” for solving practical problems, namely Upwind Differences (UD) [45], Linear Upwind Differences (LUD) [45], the QUICK scheme [24], Central Differences (CD) [22,28,47], the NVD (Normalized Variable Diagram) schemes [24,29,30], and hybrid schemes (combinations of the above with the upwind scheme to increase monotonicity).

These schemes differ in the way the value of the transported quantity is reconstructed onto the face, and therefore, their dissipative properties differ as well. The reconstruction algorithm for the presented schemes is approximately the same. A brief description of the most commonly used first- and second-order schemes, namely UD, LUD, and CD, will be given below.

The UD scheme is a first-order scheme proving stable on unstructured meshes while demonstrating high numerical diffusion [45]:

φ_{k, ud} = \begin{cases} φ_{P}, & f > 0 \\ φ_{M}, & f < 0 \end{cases}

(4)

Here, $φ_{k}$ is the scalar value interpolated onto the common face between cells P and M.

The LUD scheme is similar to the UD scheme but uses linear interpolation to reconstruct the value onto the face [45]:

φ_{k, lud} = \begin{cases} φ_{P} + \nabla φ_{P} \cdot r_{P, k}, & f > 0 \\ φ_{M} + \nabla φ_{M} \cdot r_{M, k}, & f < 0 \end{cases}

(5)

Here, $r_{P, k}$ and $r_{M, k}$ are the vectors from the centers of cells P and M to the center of the common face k. The LUD scheme also suffers from numerical diffusion, but to a much lesser extent than the UD scheme. Its main disadvantage is the occurrence of nonphysical oscillations in areas with sharp gradients.
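
As a minimal illustration of Equations (4) and (5), the two reconstructions can be sketched in Python for a one-dimensional mesh, where the gradients and the offsets to the face reduce to scalars. The function names and the argument order are assumptions made for this sketch, and the handling of a zero flux (which the equations do not cover) is an arbitrary choice.

```python
def face_value_ud(phi_P, phi_M, flux):
    """Eq. (4): first-order upwind, take the value of the upwind cell."""
    # flux == 0 is not covered by Eq. (4); it is sent to the M branch here.
    return phi_P if flux > 0 else phi_M


def face_value_lud(phi_P, phi_M, grad_P, grad_M, r_Pk, r_Mk, flux):
    """Eq. (5) in 1D: upwind value plus gradient times signed offset to the face."""
    if flux > 0:
        return phi_P + grad_P * r_Pk
    return phi_M + grad_M * r_Mk


# Linear profile phi = 2x, cell centers at x = 0 (P) and x = 1 (M), face at x = 0.5.
ud_value = face_value_ud(0.0, 2.0, flux=1.0)
lud_value = face_value_lud(0.0, 2.0, 2.0, 2.0, 0.5, -0.5, flux=1.0)
```

For this linear profile the LUD reconstruction returns the exact face value from either side, whereas UD returns the upwind cell value.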

The CD scheme is the least dissipative of the three, but it is unconditionally unstable [22,28,47]: in the presence of large gradients, it produces oscillations in the solution field. This is especially true for finite-volume discretization of the Navier–Stokes equations on arbitrary unstructured meshes consisting of randomly shaped polyhedra, where designing a scheme with an accuracy above second order is essentially impossible. Figure 2 shows an example of how the three numerical schemes differ in their numerical diffusion.

Hybrid schemes are linear combinations of high- and low-order schemes [45], which leads to increased monotonicity of the solution. A hybrid scheme, for example, for the CD scheme, can be written as follows:

φ_{k, hyb} = γ φ_{k, cd} + (1 - γ) φ_{k, ud}

(6)

where $γ$ is the blending factor. The scope of the present paper is limited to the schemes where the blending factor stays constant throughout the whole computational domain. Hybrid schemes partially mitigate the disadvantages of first- and second-order schemes but do not eliminate them in principle, and their effectiveness depends on manually selected coefficients and conditions of a specific task.
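
The blending in Equation (6) can be sketched as follows. The text does not write out the CD formula, so the sketch assumes a uniform one-dimensional mesh, on which central differencing reduces to the arithmetic mean of the two cell values; the function names are hypothetical.

```python
def face_value_cd(phi_P, phi_M):
    """Central differences on a uniform mesh (assumed): mean of the two cells."""
    return 0.5 * (phi_P + phi_M)


def face_value_hybrid(phi_P, phi_M, flux, gamma):
    """Eq. (6): blend of the CD value and the upwind (UD) value."""
    phi_ud = phi_P if flux > 0 else phi_M
    return gamma * face_value_cd(phi_P, phi_M) + (1.0 - gamma) * phi_ud


# gamma = 0 recovers UD, gamma = 1 recovers CD, values in between blend the two.
blended = face_value_hybrid(1.0, 3.0, flux=1.0, gamma=0.5)
```

With a constant blending factor, as considered in the paper, `gamma` is simply a fixed argument for every face.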

The results of calculations using the aforementioned schemes depend on both the dynamic parameters of transport and the geometric parameters of mesh cells, and the more complex the scheme, the higher the number of these parameters and their interdependencies. Taking this into account, one may attempt to construct an interpolation scheme based on a trained neural network that calculates the interpolated value based on the optimal relationship between the dynamic and geometric parameters of neighboring cells. The neural network can be trained to predict such values on the faces that converge the numerical solution of the problem to the reference one (at each computational step). The reference solution can be either an analytical solution or a numerical solution obtained using a classical discretization scheme. To accelerate the training process, the neural network can be pre-trained to determine the values on the faces based on a solution obtained using a classical approximation scheme. Here, the LUD scheme will be used.

An important aspect of the proposed approach is the preservation of the discrete conservation property inherent to the finite-volume formulation. In the present method, the neural network is not used to directly predict fluxes independently for each control volume. Instead, the network reconstructs the value of the transported scalar at the cell face, denoted as $ϕ_{f}$, based on the local information from the neighboring cells. The convective flux through the face is then computed using the standard finite-volume expression $F_{f} = (u \cdot n_{f}) ϕ_{f}$. For internal faces shared by two adjacent cells P and M, the same reconstructed face value $ϕ_{f}$ is used in the flux computation for both cells. As a result, the flux contribution appears in the discrete balance equations with opposite signs, $F_{P, f} = - F_{M, f}$, which ensures that the amount of scalar leaving one control volume through a face exactly enters the neighboring control volume through the same face. Consequently, the local and global conservation properties of the finite-volume method are preserved.

From this perspective, the neural network replaces only the interpolation procedure used to estimate the scalar value at the face, while the conservative flux balance structure of the finite-volume discretization remains unchanged. This is conceptually similar to classical convective schemes such as UD or LUD, where the face value is reconstructed from neighboring cell values, but the conservative formulation of the numerical flux is retained.

To verify this property in practice, the total scalar quantity in the computational domain was monitored during the simulations. The results show that the total scalar mass remains constant up to numerical precision, confirming that the neural reconstruction does not violate the discrete conservation law.
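
The conservation argument can be checked with a small sketch. It assumes a uniform one-dimensional mesh with periodic boundaries and a positive velocity (the paper's channel setup may differ), an explicit Euler step, and an arbitrary nonlinear function standing in for the trained network; all names are hypothetical.

```python
import math


def advance(phi, cfl, face_value):
    """One explicit Euler step on a uniform periodic 1D mesh with u > 0.

    Face k lies between cell k (P) and cell (k + 1) % n (M). The same
    reconstructed face value is used in the balance of both neighbors.
    """
    n = len(phi)
    phi_face = [face_value(phi[k], phi[(k + 1) % n]) for k in range(n)]
    # Cell i loses scalar through face i and gains it through face i - 1.
    return [phi[i] - cfl * (phi_face[i] - phi_face[i - 1]) for i in range(n)]


def surrogate_face_value(phi_P, phi_M):
    # Arbitrary nonlinear stand-in for a trained network (not the paper's model).
    return phi_P + 0.3 * math.sin(phi_M - phi_P)


phi = [math.sin(0.5 * i) for i in range(100)]
total_before = math.fsum(phi)
for _ in range(20):
    phi = advance(phi, 0.2, surrogate_face_value)
total_after = math.fsum(phi)
```

Because every face value enters two neighboring cell balances with opposite signs, `total_after` matches `total_before` up to round-off regardless of how the face value is reconstructed.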

It should be emphasized that the term high-CFL in the present work refers to the admissible Courant number range within explicit finite-volume discretizations. The proposed neural network modifies only the spatial reconstruction of the convective flux, while the time integration scheme remains explicit. Consequently, the goal of the method is not to remove CFL limitations entirely, as is often achieved with implicit time-integration schemes, but rather to extend the practical CFL range at which explicit convective discretizations maintain acceptable accuracy and stability.

Since the neural network acts as a face-value reconstruction operator, similar to classical schemes such as UD or LUD, it is in principle independent of the choice of time-integration method. Therefore, the proposed reconstruction could potentially be incorporated into both explicit and implicit finite-volume solvers, although the present study focuses on the explicit case as a proof-of-concept.

3. Constructing the Convective Scheme Using Deep Reinforcement Learning

According to (4) and (5), the following data are involved in interpolation using a neural network in convective term approximation schemes: the scalar values $φ_{P}$ and $φ_{M}$ in neighboring cells P and M, respectively, as well as the gradient of the scalar quantity, $\nabla φ_{M}$ or $\nabla φ_{P}$, depending on the flow direction.

In addition to the parameters mentioned above, the accuracy of convective term discretization depends on the Courant number. Therefore, Equations (4) and (5) can be refined by introducing additional parameters, namely the time step Δt and the Courant number CFL in cells P and M, respectively. Although Δt and CFL are mathematically related through the local flow velocity and cell size, in the present study both parameters were provided separately as inputs to the neural network. This choice was made as an initial experimental simplification to allow the network to independently learn the sensitivity of the convective term to both the absolute time step and the non-dimensional Courant number. In other words, this setup enables the network to capture potential nonlinear interactions between the step size and CFL that may affect stability and accuracy.

We acknowledge that a more rigorous formulation could use only one of these parameters, computing the other internally; however, the current approach served as a practical proof-of-concept and facilitated faster convergence during early-stage training. Future work will refine the input parameterization to reduce redundancy and improve theoretical consistency while preserving the observed generalization and performance gains.

Thus, based on the defined input data, the input layer of the neural network contains six neurons. The neural network predicts a single value, the scalar value on the face $φ_{k}$, so the output layer contains only one neuron (see Figure 3).

According to theory [48], a network with one hidden layer whose neurons apply a continuously differentiable activation function can approximate any dependency. Moreover, a deep architecture is not always better than a single-layer one: a more complex model can lead to overfitting, poorer generalization, and excessive computational cost, whereas for a simple dependency a single hidden layer is often enough for a stable and accurate result. Therefore, to simplify the structure, we use a neural network with one hidden layer. The number of neurons in the hidden layer should be greater than the number of input neurons and several times smaller than the number of training examples [48]; the training dataset here contains about 2250 examples.

During numerical experiments, networks with different numbers of neurons were tested, starting from 10 neurons up to 25, with a step of 5 neurons. It was shown that networks with 20 or more neurons have acceptable training accuracy. Therefore, for further numerical experiments, we will use a network consisting of 20 neurons.

Thus, the neural network used is a perceptron (6; 20; 1) with one hidden layer [49]—6 neurons in the input layer, 20 neurons in the hidden layer, and 1 neuron in the output layer.

It is believed that a neural network using the activation function sin(x) is better at extrapolating data beyond the training range [50,51].
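
A forward pass through a (6; 20; 1) perceptron can be sketched as below. The sketch assumes that sin(x) is applied in the hidden layer and that the output neuron is linear; the input ordering, the weight initialization, and all names are hypothetical and are not taken from the authors' C++ implementation.

```python
import math
import random


def make_network(n_in=6, n_hidden=20, seed=0):
    """Randomly initialized weights of a perceptron with one hidden layer."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def predict_face_value(net, inputs):
    """Hidden layer with sin activation (assumed), linear output neuron."""
    hidden = [math.sin(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


net = make_network()
# Assumed input order: phi_P, phi_M, upwind gradient, dt, CFL_P, CFL_M.
phi_k = predict_face_value(net, [0.1, 0.2, 0.05, 0.01, 0.3, 0.3])
```

Such a network has 6·20 + 20 + 20 + 1 = 161 trainable parameters, which is the vector that a genetic algorithm or an actor update would modify.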

A standard Multilayer Perceptron (MLP) requires supervised learning, which necessitates a pre-existing dataset of “input parameters → target flux values.” However, in the problem we address, such reference data are a priori unavailable, as there is no single “correct” value for the convective flux at an arbitrary boundary face. The objective of our work is not to approximate a pre-defined function but to find an optimal discretization scheme that ensures the physical consistency of the solution throughout the entire simulation. This constitutes a policy optimization problem, which can only be solved using reinforcement learning methods. These methods allow for the maximization of a long-term reward, such as numerical stability and accuracy at large time steps. We employ reinforcement learning to train the neural network, leveraging the fact that the values of the passive scalar in the grid cells are known, which enables the formulation of a scoring function.

Reinforcement learning is performed using the Deep Deterministic Policy Gradient (DDPG) algorithm [49,52] and, separately, the genetic algorithm (GA) [53]. In both methods the trained neural network is the same model: it plays the role of an individual in the genetic algorithm and of the actor in the DDPG algorithm.

The training incorporates simulation of a series of problems with different characteristic CFL numbers.

To evaluate the fitness of the model, the following function, applicable to both implemented learning algorithms, is introduced:

R(\mathrm{agent}) = \sum_{Δ t} \sum_{t = 0, Δ t, 2 Δ t, \ldots}^{T} \sum_{k = 0}^{n} \mathrm{reward}(k, t)

(7)

Here, t is the simulation time, k is the index of the internal face, n is the number of internal faces, T is the final simulation time, and $Δ t$ is the time step; the outer sum runs over the time steps used in the series of training problems.

To form local rewards, we consider the pointwise convergence of the simulated scalar values $φ$ in the cells to the corresponding reference values $φ^{*}$. The reward should increase as the absolute difference $|φ - φ^{*}|$ decreases, so this difference enters the reward with a negative sign: $- |φ - φ^{*}|$.

The face for which the neural network determines $φ_{k}$ is shared by two cells, P and M, so the average difference over these two cells is calculated as

- \frac{1}{2} \cdot (|φ_{M} - φ_{M}^{*}| + |φ_{P} - φ_{P}^{*}|),

where $φ_{P}$ and $φ_{M}$ are the values in cells P and M, respectively.

Thus, the reward function $\mathrm{reward}(k, t)$ for internal face k at time t is calculated as follows:

\mathrm{reward}(k, t) = - \frac{1}{2} [|φ_{M} (k, t) - φ_{M}^{*} (k, t)| + |φ_{P} (k, t) - φ_{P}^{*} (k, t)|]

(8)

As the neural network is trained, the value of the reward function $R(\mathrm{agent})$ should increase and tend toward 0, and the numerical solution of the problem obtained with the neural scheme will approach the desired $φ^{*}(t)$.
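
Equations (7) and (8) can be sketched for a one-dimensional mesh in which internal face k separates cells k and k + 1. The data layout (one list of snapshots per time step size, each snapshot holding the simulated and reference cell values) and the function names are assumptions of this sketch.

```python
def face_reward(phi_P, phi_M, ref_P, ref_M):
    """Eq. (8): negative mean absolute error of the two cells sharing a face."""
    return -0.5 * (abs(phi_M - ref_M) + abs(phi_P - ref_P))


def total_reward(runs):
    """Eq. (7): sum of local rewards over time step sizes, time levels, and faces.

    runs: one entry per time step size; each entry is a list of
    (phi, phi_ref) snapshots, where both items are lists of cell values.
    """
    total = 0.0
    for snapshots in runs:
        for phi, phi_ref in snapshots:
            for k in range(len(phi) - 1):  # internal faces only
                total += face_reward(phi[k], phi[k + 1],
                                     phi_ref[k], phi_ref[k + 1])
    return total


exact_run = [([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])]
biased_run = [([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])]
score = total_reward([exact_run, biased_run])
```

A perfect match with the reference gives a total of zero, and every deviation makes the total more negative, so maximizing it drives the solution toward the reference.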

The neural network class and the training methods were implemented by the authors in C++.

Problem Description

Let us consider the advection problem for a scalar quantity $φ$ transported by a constant velocity field $u_{0} = 1$ m/s along a one-dimensional channel of length $L = 101$ m. The initial distribution of $φ$ in the channel is described by the function $φ(x) = \sin(0.5 \cdot x)$ (see Figure 4).

For advection Equation (1) with the initial distribution of the passive scalar $φ$ defined above, an analytical solution exists:

φ (x, t) = φ_{0} \sin (γ \cdot x - ω \cdot t)

(9)

Here, $γ = \frac{2 π}{λ} = 0.5$ is the wave number, $ω = \frac{2 π \cdot u_{0}}{λ} = 0.5$ is the angular frequency, and $φ_{0} = 1$ is the amplitude. The computational domain is divided into cells of size $Δ x \approx 0.08 λ \approx 1$ m.

Let us introduce the time $t_{0}$ over which the scalar distribution shifts by one cell along the direction of velocity $u_{0}$, i.e., $t_{0} = \frac{Δ x}{u_{0}}$.

The artificial neural network used as the convective term approximation scheme for solving Equation (1) was trained by simulating the problem up to the time $T_{train} = 5 \cdot t_{0}$. Training was also performed at different Courant numbers $\mathrm{CFL} = u_{0} \cdot \frac{Δ t}{Δ x}$ (i.e., for different time steps of the problem), ranging from 0.01 to 0.5 with a step of 0.01.
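
The numbers of this test case can be reproduced with a short sketch. The variable names are hypothetical, and the cell size is rounded to 1 m as in the text.

```python
import math

u0 = 1.0      # transport velocity, m/s
gamma = 0.5   # wave number, 1/m
phi0 = 1.0    # amplitude
dx = 1.0      # cell size, m (rounded value from the text)

wavelength = 2.0 * math.pi / gamma   # about 12.57 m, so dx is about 0.08 wavelengths
omega = gamma * u0                   # angular frequency, 1/s
t0 = dx / u0                         # time to shift the profile by one cell
T_train = 5.0 * t0

# Training Courant numbers 0.01, 0.02, ..., 0.5 and the matching time steps.
cfl_values = [round(0.01 * i, 2) for i in range(1, 51)]
time_steps = [cfl * dx / u0 for cfl in cfl_values]


def phi_exact(x, t):
    """Analytical solution, Eq. (9): the initial sine wave shifted by u0 * t."""
    return phi0 * math.sin(gamma * x - omega * t)
```

With these values the training set spans 50 Courant numbers, and a single training run takes between 10 steps (CFL = 0.5) and 500 steps (CFL = 0.01) to reach `T_train`.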

The following sections present a comparative study of the numerical solution of the problem obtained with the neural network as the approximation scheme against the first-order UD scheme and the second-order LUD scheme. The analysis is carried out for various parameters $(u_{0}, \mathrm{CFL}, Δ x, Δ t, φ_{0})$ falling outside the range of the training data.

Notably, the scalar transport time $T_{validation}$ at which the spatial distribution is evaluated exceeds the value used during the neural network training: $T_{validation} = 3 T_{train} = 15 t_{0}$. This allows us to assess the stability and adaptability of the proposed approach when solving problems with extended temporal characteristics.

4. Numerical Experiments

This section presents the results of a comprehensive study on the effectiveness of “neural” schemes for approximating the convective term in the scalar transport equation. The focus is on comparing two approaches to training artificial neural networks, the genetic algorithm and the deep reinforcement learning algorithm DDPG, and on evaluating the accuracy of the obtained solutions.

A series of numerical experiments is conducted to assess the potential of the neural network approach. A key component is the verification against an analytical solution, enabling a quantitative evaluation of the “neural” scheme’s accuracy against the first-order UD and second-order LUD schemes. The experiments further analyze the generalization ability and robustness of the trained networks to input parameter variations. The obtained results provide a foundation for conclusions on the scheme’s prospects and help identify directions for further research.

4.1. Genetic Algorithm

The parameters of the algorithm are as follows:

Each generation consists of a population of 20 individuals;
The selection operator chooses the top 5 individuals of the current generation and also retains the top 5 individuals of all time;
Crossover occurs between the top 5 individuals of the current generation and the top 5 of all time;
Mutation is applied to every weight of all individuals, where each weight is modified by adding a random number drawn from a normal distribution. The standard deviation of the distribution decreases by a factor of 1.01 each time the fitness of a new individual differs by 10% or more from the best fitness of all time and increases by a factor of 1.01 otherwise.
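The parameters above can be sketched as a small evolutionary loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitness is a toy quadratic standing in for R(GA_agent), the crossover is uniform (the text does not name the operator), and the initial mutation spread and weight range are arbitrary.

```python
import random

POP_SIZE = 20         # individuals per generation (from the text)
N_ELITE = 5           # top individuals kept per generation and of all time
SIGMA_FACTOR = 1.01   # multiplicative change of the mutation spread
TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum of the toy problem


def fitness(weights):
    """Toy stand-in for R(GA_agent): always <= 0, with 0 the ideal value."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def crossover(a, b, rng):
    """Uniform crossover (assumed; the text does not name the operator)."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def run_ga(generations=100, seed=0):
    rng = random.Random(seed)
    sigma = 0.5  # initial mutation standard deviation (assumed)
    population = [[rng.uniform(-3.0, 3.0) for _ in TARGET]
                  for _ in range(POP_SIZE)]
    hall_of_fame = []  # best individuals of all time as (fitness, weights)
    best_history = []
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if hall_of_fame:
            best_ever = hall_of_fame[0][0]
            for fit, _weights in scored:
                # Shrink sigma when a new individual is far (>= 10%) from the
                # all-time best fitness, grow it otherwise.
                if abs(fit - best_ever) >= 0.1 * abs(best_ever):
                    sigma /= SIGMA_FACTOR
                else:
                    sigma *= SIGMA_FACTOR
        elite_now = scored[:N_ELITE]
        hall_of_fame = sorted(hall_of_fame + elite_now,
                              key=lambda pair: pair[0], reverse=True)[:N_ELITE]
        best_history.append(hall_of_fame[0][0])
        # Children: crossover between a current elite and an all-time elite,
        # followed by Gaussian mutation of every weight.
        population = []
        for _ in range(POP_SIZE):
            parent_a = rng.choice(elite_now)[1]
            parent_b = rng.choice(hall_of_fame)[1]
            child = crossover(parent_a, parent_b, rng)
            population.append([w + rng.gauss(0.0, sigma) for w in child])
    return best_history


history = run_ga()
print(history[0], history[-1])
```

Because the all-time elite is retained, the best fitness recorded in `history` can never decrease from one generation to the next, which matches the monotone "best so far" curve described for the Learning plot.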

The training of the neural network continues until a specified accuracy is reached or the maximum number of generations is exceeded. As the fitness function (7) approaches its target value R(GA_agent) = 0, the magnitude of the changes in the neural network parameters decreases, which slows down training. For the problem considered here, where the artificial neural network is used as a convective term approximation scheme for solving Equation (1), training was stopped after 5000 generations because the fitness value remained within ±1% of its average over the last 10 generations (see Figure 5). This stopping criterion strikes an acceptable balance between solution accuracy and computational cost.
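The plateau criterion described here can be written as a short check. The function name and the sample histories are hypothetical; only the window of 10 generations and the 1% band come from the text.

```python
def has_plateaued(history, window=10, tolerance=0.01):
    """True if the last `window` values all lie within +/- tolerance of their mean."""
    if len(history) < window:
        return False
    recent = history[-window:]
    mean = sum(recent) / window
    return all(abs(value - mean) <= tolerance * abs(mean) for value in recent)


# Hypothetical fitness histories, for illustration only.
flat = [-1300.0] * 10
still_improving = [-2000.0] + [-1300.0] * 9
print(has_plateaued(flat), has_plateaued(still_improving))
```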

The Learning plot in Figure 5 shows the fitness value R(GA_agent) as a function of the genetic algorithm step; at each step, the plotted value is the fitness of the best neural network found up to that step. The fitness function (7) of the best agent reached R(GA_agent) = −878.

Similarly, the Validation plot shows the fitness value of the best neural network from the Learning plot when it solves problems with Courant numbers CFL = 0.5, 0.6, 0.7, 0.8, 0.9. Of these, only CFL = 0.5 was used in training; it was retained because it is the largest value in the training dataset and lies on its boundary. This plot reflects the degree of overfitting of the neural network [48,49], i.e., the phenomenon of the trained neural network losing its generalization ability [48,54] and being unable to approximate data outside the training set.

The third plot, Learning + Validation, obtained by aggregating the first two, is also presented. This final plot was used to select the best-performing neural network.

The fitness function (7) of the best-performing agent reached a value of R(GA_agent) = −1326 on the Learning plot (according to Equation (8)) at the 739th step of genetic algorithm training. Considering that each generation includes 20 agents (neural networks) and each agent solves the problem

\sum_{Δ t = 0.01, 0.02, \ldots}^{0.50} 1 = 50

times per generation, the total number of problem solutions is 739 × 20 × 50 = 739,000.

For comparison, if we evaluate the solution obtained using the LUD scheme, the fitness function for this approximation scheme is approximately R(LUD_agent) = −11,000. Neural network training slows down as the fitness function (7) increases, while the number of problem solutions grows.

The time required to solve the problem using classical convective term approximation schemes in the numerical experiment coincides with the time required to solve the same problem using the neural network-based approximation scheme.

However, training the neural network takes about 7 h on a single Intel Core i7@2.50 GHz CPU (Intel Corporation, Santa Clara, CA, USA). This significant time cost is due to the nature of the genetic algorithm, in which the training time of the neural network is orders of magnitude greater than the time required to solve the original problem, which turns out to be the primary shortcoming of the method.

The neural scheme was implemented in C++ (in-house code) and compiled using Microsoft Visual Studio 2022. The choice of a CPU-based implementation was deliberate and motivated by the need to ensure compatibility with widely used industrial CFD software packages, such as ANSYS Fluent, LOGOS-Aerohydro, STAR-CCM+, and OpenFOAM, which predominantly rely on CPU-based high-performance computing architectures. Although GPU acceleration has recently been introduced in some of these tools, CPU-based implementations remain the de facto standard in industrial and legacy CFD workflows. Since our goal is the eventual integration of this method into existing production-level solvers that lack native GPU support, it was methodologically essential to develop and validate our approach within the same computational environment. This ensures that the performance gains we demonstrate are directly relevant and transferable to real-world engineering applications.

4.2. DDPG Training

As a deterministic policy gradient algorithm, DDPG learns a policy that maps states to specific, precise actions from a continuous space. This contrasts with stochastic policy methods (e.g., PPO), which learn a probability distribution over actions. DDPG combines ideas from value-based (DQL [49,55]) and policy-based (DPG [49,56]) methods, belonging to the actor-critic class [49].

The agent’s action is selected by an artificial neural network called the actor. The actor follows a policy-based approach and learns to act by directly estimating the optimal policy and maximizing the reward through gradient ascent.

On the other hand, the chosen action can be evaluated by a second neural network—the critic. The critic uses a value-based approach and learns to assess the value of different state-action pairs.

As a result of combining the actor and critic, we use two separate neural networks. The role of the actor network is to determine the optimal action in a given state. The critic, by evaluating the expected return, assesses the action generated by the actor.

The algorithm works as follows:

1. The actor performs actions (the neural network predicts scalar values on internal faces);
2. The problem is solved using the selected actions (step 1 is repeated at each time step of the simulation);
3. Based on the complete solution of the problem, the actor’s actions are evaluated (the actor’s actions are rewarded, and evaluation functions Q(a,s) are derived based on the rewards for each action a selected in state s at the first time step; the overall solution evaluation is performed as well);
4. Based on the evaluations obtained, the critic network is trained via supervised learning, performing one step of gradient descent [48];
5. The gradient of the critic’s evaluation function with respect to the action is calculated. This gradient is used to train the actor network by performing one step of gradient ascent (in the direction that increases the evaluation function of the action);
6. The DDPG algorithm cycle (steps 1–5) is repeated until the optimal solution is obtained.
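The cycle above can be illustrated with a toy deterministic actor-critic loop. This is only a sketch in the spirit of the listed steps, not the authors' DDPG set-up: the "problem" is a hypothetical one-step task whose best action is a = 0.5·s, the actor is a single weight, the critic is linear in quadratic features, and the exploration noise is deliberately large so that the toy critic is well conditioned. The step numbers in the comments follow the order of the list.

```python
import random


def features(s, a):
    """Quadratic features so the toy critic can represent the reward exactly."""
    return [a * a, a * s, s * s]


def reward(s, a):
    """Hypothetical evaluation: best action is a = 0.5 * s, maximum reward 0."""
    return -(a - 0.5 * s) ** 2


def train(iterations=5000, lr_critic=0.05, lr_actor=0.01, seed=0):
    rng = random.Random(seed)
    w = 0.0                  # actor: a = w * s
    theta = [0.0, 0.0, 0.0]  # critic: Q(a, s) = theta . features(s, a)
    for _ in range(iterations):
        s = rng.uniform(-1.0, 1.0)           # observe a state
        a = w * s + rng.uniform(-1.0, 1.0)   # step 1: act, with exploration noise
        q_target = reward(s, a)              # steps 2-3: solve and evaluate
        phi = features(s, a)
        q_pred = sum(t * f for t, f in zip(theta, phi))
        # Step 4: one gradient-descent step on the critic's squared error.
        theta = [t + lr_critic * (q_target - q_pred) * f
                 for t, f in zip(theta, phi)]
        # Step 5: dQ/da at the actor's own action, then one ascent step on w.
        a_actor = w * s
        dq_da = 2.0 * theta[0] * a_actor + theta[1] * s
        w += lr_actor * dq_da * s            # chain rule: da/dw = s
    return w, theta                          # step 6 is the loop itself


w, theta = train()
print(w, theta)
```

In this toy setting the actor weight should approach 0.5 once the critic has learned the shape of the reward; in the CFD problem the reward is far less smooth, which is one reason the same loop is much harder to tune there.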

The content of Figure 6 is as follows: the environment state s (a vector of input parameters) is fed into both the actor and critic networks. The actor network outputs an action a, which, together with s, is input into the critic network. The critic then outputs a value Q(a,s), evaluating the choice of action a for the environment state s.

The idea of DDPG training is that the critic learns to predict the evaluation of different actions for different environment states, i.e., it becomes a continuous function of the action whose maximum corresponds to the highest evaluation for each state. Since Q(a,s) depends on a, gradient ascent can be used to find the actions a that maximize Q(a,s). Then, knowing the correct actions a, it is easy to train the actor network to output these same actions a.
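A minimal sketch of this idea, assuming a hypothetical smooth critic whose maximum lies at a = 2·s: gradient ascent over the action recovers the maximizing action for a given state, which could then serve as a regression target for the actor. The critic, step size and iteration count are illustrative choices.

```python
def critic_q(a, s):
    """Hypothetical smooth critic with its maximum at a = 2 * s."""
    return -(a - 2.0 * s) ** 2


def dq_da(a, s, eps=1e-6):
    """Central-difference estimate of the critic's gradient w.r.t. the action."""
    return (critic_q(a + eps, s) - critic_q(a - eps, s)) / (2.0 * eps)


def best_action(s, a0=0.0, lr=0.1, iters=200):
    """Gradient ascent on Q(a, s) over the action a for a fixed state s."""
    a = a0
    for _ in range(iters):
        a += lr * dq_da(a, s)
    return a


print(best_action(1.5))
```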

The direct training of the critic is performed by using the fitness functions R(agent) defined earlier as reference evaluations Q*(a,s), which we want the critic to learn. Although the same evaluation Q*(a,s) = R(agent) is used for different actions and states, this does not affect the training process, since all these actions and states correspond to the same configuration of the actor network.

Based on the information available about the face (6 input neurons), which represents the environment state s, the actor network determines the scalar value on that face. The same information, together with the predicted value, is used by the critic network to produce an evaluation of that prediction. Thus, the critic captures how the evaluation depends on the actor network configuration (its weights), while also acting as the criterion for a correct solution of the problem.

This method demonstrates high sensitivity to the fine-tuning of algorithm parameters. Here, the effectiveness of training largely depends on the quality of the critic’s approximation of the dependence of the evaluation function on action, since the gradient used to guide the actor toward the optimal strategy is derived from this exact dependence. Thus, one should recognize a significant contribution of the exploration noise parameter, which is a relatively small random value added to the actor’s prediction (not exceeding 10% of said prediction value).
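The exploration noise described above can be sketched as follows. The uniform distribution is an assumption, since the text only states that the perturbation does not exceed 10% of the predicted value; the function name is hypothetical.

```python
import random


def add_exploration_noise(prediction, rng, max_fraction=0.10):
    """Perturb the actor output by at most max_fraction of its magnitude."""
    return prediction + rng.uniform(-max_fraction, max_fraction) * abs(prediction)


rng = random.Random(0)
samples = [add_exploration_noise(2.0, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Note that a purely relative perturbation vanishes when the prediction is zero, so a practical implementation may need a small absolute floor.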

It was not possible to achieve optimal tuning of the algorithm parameters during the timeframe of the present paper, which led to slower training of the actor network. As a result, the actor failed to reach the optimal evaluation R(GA_agent) = −1326 achieved by the genetic algorithm.

Despite the mentioned difficulties, the DDPG training process showed a positive trend in terms of the neural network parameter updates as illustrated in Figure 7. The total training duration was about 2000 generations, with the problem being solved 50 times per generation considering time steps, resulting in approximately 2000 × 50 = 100,000 problem solutions. The fitness function value (7) for the actor reached R(DDPG_agent) = −4690.

One key reason for this shortfall is that DDPG, in its classical formulation, relies on continuous action spaces and gradient-based policy updates, which assume relatively smooth and well-behaved reward landscapes. In CFD problems, especially when optimizing discretization schemes, the reward function (e.g., accuracy of the numerical solution) is highly nonlinear and often discontinuous due to numerical instabilities and abrupt changes in solution error with small variations in the scheme parameters. This leads to poor gradient estimates and unstable training.

Additionally, DDPG requires dense feedback from the environment to propagate meaningful gradients. In our setup, computing the reward involves running a full numerical solver, which is computationally expensive and produces noisy feedback due to the discrete nature of grid resolution and time-stepping. As a result, the algorithm struggles to converge within a reasonable number of training episodes.

Finally, CFD discretization problems inherently involve global constraints (e.g., CFL conditions, stability limits) that cannot be easily encoded in the standard DDPG framework. While DDPG excels in control tasks with continuous, bounded actions, enforcing stability constraints in a purely gradient-driven manner is challenging, often resulting in actions that violate physical feasibility or lead to divergence of the numerical solution.

These limitations justify the choice of a genetic algorithm in the present study. Evolutionary methods are inherently more robust to noisy, discontinuous, and constrained objective functions, allowing effective training of the neural convective term approximation even in early proof-of-concept 1D cases. Future work will explore modifications to the DDPG algorithm that address these challenges, including hybrid gradient-evolutionary approaches and physics-informed reward shaping.

4.3. Comparison of Simulation Results with the Analytical Solution

The most effective neural network trained using the genetic algorithm was selected as the reference “neural” scheme. Training with both methods was carried out under fixed parameters: computational domain size l = 101 m, final advection time T = 5 s, cell size Δx = 1 m, advection rate u₀ = 1 m/s, amplitude φ₀ = 1, and Courant numbers CFL ∈ [0.01, 0.5] with a step of 0.01. To ensure the reliability of the results, the “neural” scheme was then tested under the following values of these parameters:

T = 15 s; Δx = 0.5 m, 1.0 m, and 1.98 m; CFL = 0.001, 0.01, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95; u₀ = 0.5 m/s, 1 m/s, and 2 m/s; φ₀ = 0.5, 1.0, 2.0.

The analytical solution (9) is compared with the results obtained using the LUD scheme, the UD scheme, and the “neural” scheme. To illustrate the features of scalar quantity distribution along the one-dimensional channel, graphical dependencies are presented (see Figure 8) for various Courant numbers: CFL = 0.01, 0.1, 0.5, and 0.7 under fixed parameters Δx = 1 m, u₀ = 1 m/s, and φ₀ = 1.0. Since the scalar distribution is periodic, the plots only show the distribution near the center of the channel over a width equal to the wavelength of the distribution λ.

An analysis of the graphs shows that the neural scheme yields the best results for Courant numbers above approximately 0.5, as reflected in all tables starting from Table 1. The LUD scheme shows a noticeable error: its solutions begin to deviate significantly from the analytical solution at CFL ≥ 0.1, as demonstrated in Table 2. The UD scheme works adequately for the stated problem only around CFL = 1.0, owing to the explicit nature of the solution, and deviates significantly at larger or smaller CFL values, so it is excluded from further comparison. The neural network solution also starts to deviate at CFL ≥ 0.5, but does so several times more slowly than the LUD scheme, as demonstrated in Table 1, Table 2 and Table 3.

To quantify the accuracy of the solutions, the integral error for each approximation method was calculated using Equation (10).

The results presented are obtained at fixed flow velocity u₀ = 1 m/s and oscillation amplitude φ₀ = 1 and are summarized in Table 1, Table 2 and Table 3.

Δ φ = \sum_{i=1}^{n} \left| \frac{φ_{i} - φ_{i}^{*}}{n \cdot φ_{0}} \right|,

(10)

where n is the number of cells.
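To make Equation (10) concrete, the sketch below evaluates it for a first-order upwind scheme on a toy problem. Several choices are assumptions made for illustration and are not taken from the paper: a periodic domain of 100 cells, explicit Euler time stepping, a sine-shaped profile, and a wavelength of 12.5Δx (consistent with Δx ≈ 0.08λ) so that eight full waves fit the domain.

```python
import math


def integral_error(phi_num, phi_ref, phi0):
    """Equation (10): sum over cells of |(phi_i - phi_i*) / (n * phi0)|."""
    n = len(phi_num)
    return sum(abs((p - r) / (n * phi0)) for p, r in zip(phi_num, phi_ref))


def sine_profile(n, dx, wavelength, phi0, shift):
    """Cell-centre values of a sine wave displaced by `shift` (assumed shape)."""
    k = 2.0 * math.pi / wavelength
    return [phi0 * math.sin(k * ((i + 0.5) * dx - shift)) for i in range(n)]


def upwind_explicit(phi, cfl, n_steps):
    """First-order upwind, explicit Euler, periodic domain (all assumed)."""
    for _ in range(n_steps):
        # phi[i - 1] wraps around at i = 0, which gives the periodic boundary.
        phi = [phi[i] - cfl * (phi[i] - phi[i - 1]) for i in range(len(phi))]
    return phi


n, dx, u0, phi0, T = 100, 1.0, 1.0, 1.0, 15.0
wavelength = 12.5 * dx  # eight full waves fit the periodic toy domain
reference = sine_profile(n, dx, wavelength, phi0, u0 * T)

errors = {}
for cfl in (0.5, 1.0):
    n_steps = round(T * u0 / (cfl * dx))
    initial = sine_profile(n, dx, wavelength, phi0, 0.0)
    numerical = upwind_explicit(initial, cfl, n_steps)
    errors[cfl] = integral_error(numerical, reference, phi0)

print(errors)
```

In this toy setting the explicit upwind update at CFL = 1 reduces to an exact one-cell shift, so its error is at round-off level, whereas at CFL = 0.5 numerical diffusion damps the wave and the integral error is large.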

What is of interest here is the behavior of the neural scheme at higher Courant number values, where the LUD scheme fails with an error Δφ above 10%. This value range for the discussed problem is CFL > 0.1.

The analysis of the results presented in Table 1, Table 2 and Table 3 shows that the neural network-based solutions converge more closely to the analytical solution than the LUD scheme at higher Courant numbers.

However, Table 1 and Figure 9 show that the LUD scheme yields a smaller integral error Δφ than the neural scheme on finer grids and at lower Courant numbers.

Although the LUD scheme shows better accuracy in some cases at low CFLs (particularly, on finer meshes not used in training), this is not critical, as the primary goal of the neural network is to improve simulation efficiency. The neural scheme makes it possible to maintain solution accuracy at higher CFLs, enabling simulations with larger time steps or coarser meshes. The possibility of using reinforcement learning to improve scalar transport modeling accuracy is confirmed by the obtained neural scheme, which can be considered an efficient implementation of this concept as supported by the data in Table 1, Table 2 and Table 3.

Next, we investigate the sensitivity of the neural network to flow velocity and oscillation amplitude values falling outside the training data range.

Let us first obtain the results for the varying amplitude values

φ_{0} \in \{0.5, 2.0\}

at the following parameter settings: Δx = 1 m, u₀ = 1 m/s,

\mathrm{CFL} \in \{0.1, 0.5, 0.7, 0.9\}

. Here, the results for

φ_{0} = 1.0

can be seen in Table 2 above.

It can be seen that increasing the amplitude of the scalar quantity distribution leads to higher integral errors in the neural scheme. The amplitude variations cause changes to scalar values in the cells and the respective gradients acting as inputs for the neural network. Since these values differ significantly from the ones used in training, the neural network’s response to these values turns out to be unpredictable, which is to be expected. However, it can be seen from Table 4, Table 5 and Table 6 that the convergence is maintained upon reducing the CFLs, which means that the generalization capability has still developed. One may attempt to alleviate this shortcoming via normalization of scalar distribution values.

Let us then obtain the results for the varying advection rates

u_{0} \in \{0.5, 1.0, 2.0\}

at the following parameter settings: Δx = 1 m,

φ_{0} = 1

,

\mathrm{CFL} \in \{0.1, 0.5, 0.7, 0.9\}

.

It can be seen from Table 7 that the neural network-based scheme still outperforms LUD in terms of approximation. However, we can also see that the error increases when the simulation is performed with the data falling outside the training range used for neural network fine-tuning. Still, the analysis of the results shows that the neural network-based convection term approximation scheme demonstrates generalization ability even with these datasets.

Notably, the neural scheme ensures higher accuracy compared to the conventional method at higher Courant numbers (CFL > 0.1), even with flow velocity and oscillation amplitude deviating significantly from the training data ranges.

A formal Fourier or von Neumann analysis was not performed in the present study because the neural reconstruction introduces a nonlinear and state-dependent discretization operator. Instead, dispersive effects were evaluated qualitatively through long-time advection tests of periodic scalar distributions.

5. Conclusions

The goal of this paper was to investigate the feasibility of using neural networks, trained via reinforcement learning, to improve the numerical solution of the scalar transport equation using the finite-volume method. Unlike prior approaches focused on coarse-grid accuracy, our method aims to increase the time step while preserving or improving solution accuracy, thereby reducing the total number of integration steps and accelerating computations.

Experimental results show that the neural-network-based convective term approximation outperforms the classical second-order LUD scheme in both accuracy and computational efficiency. Specifically, for CFL numbers above 0.1, the neural scheme maintains high accuracy, while LUD requires much smaller time steps to achieve comparable results. In some cases, the neural scheme at CFL = 0.5 achieves accuracy that LUD cannot reach even at CFL = 0.001, demonstrating a potential speedup of orders of magnitude.

Among the reinforcement learning strategies explored, the genetic algorithm was effective, whereas the classical DDPG formulation did not converge to satisfactory solutions. This limitation is likely due to the sparse and delayed reward structure of the convective term optimization problem, which challenges standard actor-critic methods and necessitates further investigation into RL design and hyperparameter tuning.

Overall, this study establishes a proof of concept for neural-network-based convective term approximation, demonstrating that combining RL-trained neural networks with classical finite-volume schemes can enable larger time steps, improve accuracy, and accelerate CFD simulations. Future work will focus on extending the approach to higher-dimensional problems and optimizing both training and inference efficiency.

Author Contributions

Conceptualization, V.K. and A.R.; methodology, V.K. and A.R.; software, A.R.; validation, A.R., V.K. and A.K.; formal analysis, M.S.; investigation, A.R.; resources, A.K.; data curation, A.R.; writing—original draft preparation, A.R., A.K. and M.S.; writing—review and editing, A.R.; visualization, A.R.; supervision, A.K.; project administration, V.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education of the Russian Federation (project No. FSWE-2024-0001 (research topic: «Developing numerical methods, models and algorithms to describe liquid and gas flows in natural environment, and in the context of industrial objects’ operation in standard and emergency conditions on mainframes with exa- and zeta capacity»)). The work of A. Kozelkov and M. Shishlenin was supported by the Russian Science Foundation, project 25-61-00027 “Wave tomography: supercomputer modeling, machine learning and experiment”.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Matsuo, Y.; LeCun, Y.; Sahani, M.; Precup, D.; Silver, D.; Sugiyama, M.; Uchibe, E.; Morimoto, J. Deep learning, reinforcement learning, and world models. Neural Netw. 2022, 152, 267–275.
2. Eivazi, H.; Tahani, M.; Schlatter, P.; Vinuesa, R. Physics-informed neural networks for solving Reynolds-averaged Navier–Stokes equations. Phys. Fluids 2022, 34, 075117.
3. Xu, Q.; Zhuang, Z.; Pan, Y.; Wen, B. Super-resolution reconstruction of turbulent flows with a transformer-based deep learning framework. Phys. Fluids 2023, 35, 055130.
4. Yousif, M.Z.; Yu, L.; Hoyas, S.; Vinuesa, R.; Lim, H. A deep-learning approach for reconstructing 3D turbulent flows from 2D observation data. Sci. Rep. 2023, 13, 2529.
5. Shishlenin, M.; Kozelkov, A.; Novikov, N. Nonlinear Medical Ultrasound Tomography: 3D Modeling of Sound Wave Propagation in Human Tissues. Mathematics 2024, 12, 212.
6. Korotkov, A.V.; Kozelkov, A.S.; Kurulin, V.V.; Shishlenin, M.A. Applying A Synthetic Turbulence Generator to An Unmatched RANS-LES Interface. J. Comput. Appl. Math. 2025, 475, 116996.
7. Klyuchinskiy, D.V.; Novikov, N.S.; Shishlenin, M.A. CPU-time and RAM memory optimization for solving dynamic inverse problems using gradient-based approach. J. Comput. Phys. 2021, 439, 110374.
8. Guo, X.; Li, W.; Iorio, F. Convolutional Neural Networks for Steady Flow Approximation. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); Association for Computing Machinery: New York, NY, USA, 2016; pp. 481–490.
9. Filali, A.; Khezzar, L.; Semmari, H.; Matar, O. Application of artificial neural network for mixed convection in a square lid-driven cavity with double vertical or horizontal oriented rectangular blocks. Int. Commun. Heat Mass Transf. 2021, 129, 105644.
10. Zhao, B.R.; Sun, D.K.; Wu, H.; Qin, C.J.; Fei, Q.G. Physics-informed neural networks for solving inverse problems in phase field models. Neural Netw. 2025, 190, 107665.
11. Wang, G.; Cheng, S. Can foundation language models predict fluid dynamics? Eng. Appl. Artif. Intell. 2025, 158, 111427.
12. De Ryck, T.; Jagtap, A.D.; Mishra, S. Error estimates for physics-informed neural networks approximating the Navier–Stokes equations. IMA J. Numer. Anal. 2024, 44, 83–119.
13. Zhang, D.; Anjum, T.; Chu, Z.; Cross, J.S.; Ji, G. Simulation of multiphase flow with thermochemical reactions: A review of computational fluid dynamics (CFD) theory to AI integration. Renew. Sustain. Energy Rev. 2025, 221, 115895.
14. Heiland, J.; Benner, P.; Bahmani, R. Convolutional Neural Networks for Very Low-Dimensional LPV Approximations of Incompressible Navier-Stokes Equations. Front. Appl. Math. Stat. 2022, 8, 879140.
15. Baymani, M.; Effati, S.; Niazmand, H.; Kerayechian, A. Artificial neural network method for solving the Navier–Stokes equations. Neural Comput. Appl. 2015, 26, 765–773.
16. Raja, M.A.Z.; Mehmood, A.; Khan, A.A.; Zameer, A. Integrated intelligent computing for heat transfer and thermal radiation-based two-phase MHD nanofluid flow model. Neural Comput. Appl. 2020, 32, 2845–2877.
17. Hijazi, S.; Freitag, M.; Landwehr, N. POD-Galerkin reduced order models and physics-informed neural networks for solving inverse problems for the Navier–Stokes equations. Adv. Model. Simul. Eng. Sci. 2023, 10, 5.
18. Tang, H.S.; Li, L.; Grossberg, M.; Liu, Y.J.; Jia, Y.M.; Li, S.S.; Dong, W.B. An exploratory study on machine learning to couple numerical solutions of partial differential equations. Commun. Nonlinear Sci. Numer. Simul. 2021, 97, 105729.
19. Timofeyev, I.; Schwarzmann, A.; Kuzmin, D. Application of machine learning and convex limiting to subgrid flux modeling in the shallow-water equations. Math. Comput. Simul. 2025, 238, 163–178.
20. Bae, H.J.; Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat. Commun. 2022, 13, 1443.
21. Shu, D.; Li, Z.; Farimani, A.B. A physics-informed diffusion model for high-fidelity flow field reconstruction. J. Comput. Phys. 2023, 478, 111972.
22. Kozelkov, A.S.; Kurulin, V.V. Eddy resolving numerical scheme for simulation of turbulent incompressible flows. Comput. Math. Math. Phys. 2015, 55, 1255–1266.
23. Weinman, K.A.; Valentino, M. Comparison of Hybrid RANS-LES Calculations within the Framework of Compressible and Incompressible Unstructured Solvers. In Progress in Hybrid RANS-LES Modelling; Springer: Berlin/Heidelberg, Germany, 2010; pp. 329–338.
24. Kozelkov, A.; Kurulin, V.; Emelyanov, V.; Tyatyushkina, E.; Volkov, K. Comparison of convective flux discretization schemes in detached-eddy simulation of turbulent flows on unstructured meshes. J. Sci. Comput. 2016, 67, 176–191.
25. Tyatyushkina, E.S.; Kozelkov, A.S.; Kurkin, A.A.; Kurulin, V.V.; Efremov, V.R.; Utkin, D.A. Evaluation of numerical diffusion of the finite volume method in surface wave modeling. J. Comput. Technol. 2019, 24, 106–119.
26. Kozelkov, A.; Kurkin, A.; Kurulin, V.; Plygunova, K.; Krutyakova, O. Validation of the LOGOS Software Package Methods for the Numerical Simulation of Cavitational Flows. Fluids 2023, 8, 104.
27. Tyatyushkina, E.S.; Kozelkov, A.S.; Kurkin, A.A.; Pelinovsky, E.N.; Kurulin, V.V.; Plygunova, K.S.; Utkin, D.A. Verification of the LOGOS Software Package for Tsunami Simulations. Geosciences 2020, 10, 385.
28. Moser, R.D.; Kim, J.; Mansour, N.N. DNS of Turbulent Channel Flow. Phys. Fluids 1999, 11, 943–945.
29. Gaskell, P.H. Curvature-compensated convective-transport—SMART, A new boundedness-preserving transport algorithm. Int. J. Numer. Methods Fluids 1988, 8, 617–641.
30. Jasak, H.; Weller, H.G.; Gosman, A.D. High resolution NVD differencing scheme for arbitrarily unstructured meshes. Int. J. Numer. Methods Fluids 1999, 31, 431–449.
31. Vinuesa, R.; Brunton, S.L. Enhancing computational fluid dynamics with machine learning. Nat. Comput. Sci. 2022, 2, 358–366.
32. Duraisamy, K.; Iaccarino, G.; Xiao, H. Turbulence modeling in the age of data. Annu. Rev. Fluid Mech. 2019, 51, 357–377.
33. Guastoni, L.; Güemes, A.; Ianiro, A.; Discetti, S.; Schlatter, P.; Azizpour, H.; Vinuesa, R. Convolutional-network models to predict wall-bounded turbulence from wall quantities. J. Fluid Mech. 2021, 928, A27.
34. Ling, J.; Kurzawski, A.; Templeton, J. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 2016, 807, 155–166.
35. Gonzalez-Sieiro, J.; Pardo, D.; Nava, V.; Calo, V.M.; Towara, M. Reducing spatial discretization error on coarse CFD simulations using an OpenFOAM-embedded deep learning framework. Eng. Comput. 2025, 41, 1699–1720.
36. Zhuang, J.; Kochkov, D.; Bar-Sinai, Y.; Brenner, M.P.; Hoyer, S. Learned discretizations for passive scalar advection in a two-dimensional turbulent flow. Phys. Rev. Fluids 2021, 6, 064605.
37. Illarramendi, E.A.; Alguacil, A.; Bauerheim, M.; Misdariis, A.; Cuenot, B.; Benazera, E. Towards a hybrid computational strategy based on deep learning for incompressible flows. In Proceedings of the AIAA Aviation Forum, Virtual, 15–19 June 2020.
38. Jeon, J.; Lee, J.; Vinuesa, R.; Kim, S.J. Residual-based physics-informed transfer learning: A hybrid method for accelerating long-term CFD simulations via deep learning. Int. J. Heat Mass Transf. 2024, 220, 124900.
39. Obiols-Sales, O.; Vishnu, A.; Malaya, N.; Chandramowlishwaran, A. CFDNet: A deep learning-based accelerator for fluid simulations. In Proceedings of the 34th ACM International Conference on Supercomputing; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–12.
40. Kim, J.; Kim, H.; Kim, J.; Lee, C. Deep reinforcement learning for large-eddy simulation modeling in wall-bounded turbulence. Phys. Fluids 2022, 34, 105132.
41. Watkins, C.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
42. Bar-Sinai, Y.; Hoyer, S.; Hickey, J.; Brenner, M.P. Learning data-driven discretizations for partial differential equations. Proc. Natl. Acad. Sci. USA 2019, 116, 15344–15349.
43. Kochkov, D.; Smith, J.A.; Alieva, A.; Wang, Q.; Brenner, M.P.; Hoyer, S. Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA 2021, 118, e2101784118.
44. Somasekharan, N.; Pan, S. Warp-DG: A Differentiable Discontinuous Galerkin Solver for Compressible Flows. In Proceedings of the 76th Annual Meeting of the APS Division of Fluid Dynamics, Washington, DC, USA, 19–21 November 2023; Available online: https://meetings.aps.org/Meeting/DFD23/Session/J15.4 (accessed on 14 February 2026).
45. Jasak, H. Error Analysis and Estimation for the Finite Volume Method with Applications to Fluid Flows. Doctoral Thesis, Imperial College of Science, London, UK, 1996.
46. Hirt, C.W.; Nichols, B.D. Volume of fluid (VOF) method for the dynamics of free boundaries. J. Comput. Phys. 1981, 39, 201–225.
47. Leonard, B.P. A stable and accurate convective modeling procedure based on quadratic upstream interpolation. Comput. Methods Appl. Mech. Eng. 1979, 19, 59–98.
48. Yasnitskiy, L.N. Introduction to Artificial Intelligence: A Textbook for Students of Higher Educational Institutions, 2nd ed.; Publishing Center Academy: London, UK, 2008; 176p.
49. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017; 688p.
50. Parascandolo, G.; Huttunen, H.; Virtanen, T. Taming the waves: Sine as activation function in deep neural networks. In Proceedings of the ICLR 2017, Toulon, France, 24–26 April 2017.
51. Sitzmann, V.; Martel, J.N.P.; Bergman, A.W.; Lindell, D.B.; Wetzstein, G. Implicit neural representations with periodic activation functions. In Neural Information Processing Systems 33 (NeurIPS 2020); NeurIPS: San Diego, CA, USA, 2020.
52. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In ICML-14 Proceedings of the 31st International Conference on International Conference on Machine Learning; JMLR: New York, NY, USA, 2014; pp. 387–395.
53. Yemelyanov, V.V.; Kureichik, V.V.; Kureichik, V.M. Theory and Practice of Evolutionary Modeling; Fizmatlit: Moscow, Russia, 2003; 432p, ISBN 5-9221-0337-7.
54. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and Theory of Brain Mechanisms; Mir: Moscow, Russia, 1965; 480p.
55. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
56. Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning; JMLR: New York, NY, USA, 2015; Volume 37, pp. 1889–1897.

Figure 1. Interaction of two neighboring cells of the computational mesh.

Figure 2. Examples of UD and CD schemes being used.

Figure 3. The neural network used.

Figure 4. Initial distribution of the scalar quantity φ in the channel.

Figure 5. Training curves of the neural network trained with the genetic algorithm.

Figure 6. Diagrams of the two neural networks: actor and critic.

Figure 7. Neural network training plot using the DDPG algorithm.

Figure 8. Distribution of φ(x,t) at Δx = 1 m, u₀ = 1 m/s, and φ₀ = 1 for the analytical solution ("Analitic"), the neural network scheme (NN), and the LUD and UD schemes.

Figure 9. Distribution of φ(x,t) at Δx = 0.5 m, u₀ = 1 m/s, φ₀ = 1, and CFL = 0.001 for the LUD and neural schemes.

Table 1. Integral errors for the solutions obtained using different schemes given the characteristic cell size Δx = 0.5 m at u₀ = 1 m/s and φ₀ = 1.

Δx = 0.5 m
Courant Number	Δφ Neural Scheme	Δφ LUD
0.001	0.15	0.07
0.01	0.15	0.07
0.1	0.16	0.09
0.5	0.17	0.36
0.6	0.19	0.47
0.7	0.23	0.63
0.8	0.33	0.88
0.9	0.54	1.30
0.95	0.69	1.51

Table 2. Integral errors for the solutions obtained using different schemes given the characteristic cell size Δx = 1 m at u₀ = 1 m/s and φ₀ = 1.

Δx = 1 m
Courant Number	Δφ Neural Scheme	Δφ LUD
0.001	0.03	0.18
0.01	0.03	0.18
0.1	0.02	0.25
0.5	0.02	0.90
0.6	0.08	1.13
0.7	0.27	1.43
0.8	0.54	1.77
0.9	1.22	2.17
0.95	1.72	2.36

Table 3. Integral errors for the solutions obtained using different schemes given the characteristic cell size Δx = 1.98 m at u₀ = 1 m/s and φ₀ = 1.

Δx = 1.98 m
Courant Number	Δφ Neural Scheme	Δφ LUD
0.001	0.51	0.68
0.01	0.51	0.68
0.1	0.50	0.79
0.5	0.15	2.01
0.6	0.22	2.57
0.7	0.97	3.17
0.8	2.19	3.86
0.9	3.35	4.63
0.95	3.74	4.61
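The values in Tables 1–3 can be compared programmatically. The following is a minimal sketch, not part of the original study: it transcribes the tabulated errors and reports, for each cell size, the lowest tabulated Courant number at which the neural scheme has a smaller error than LUD, and the largest tabulated Courant number at which the neural error does not exceed the LUD error at a reference Courant number. The reference value of 0.1 and the helper names `first_cfl_where_neural_wins` and `max_cfl_within` are assumptions chosen for illustration; the reference value is not the "optimal CFL for LUD" mentioned in the abstract.

```python
# Courant numbers and integral errors transcribed from Tables 1-3
# (u0 = 1 m/s, phi0 = 1). Keys of TABLES are the cell sizes dx in metres.
CFL = [0.001, 0.01, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
TABLES = {
    0.5: {"nn": [0.15, 0.15, 0.16, 0.17, 0.19, 0.23, 0.33, 0.54, 0.69],
          "lud": [0.07, 0.07, 0.09, 0.36, 0.47, 0.63, 0.88, 1.30, 1.51]},
    1.0: {"nn": [0.03, 0.03, 0.02, 0.02, 0.08, 0.27, 0.54, 1.22, 1.72],
          "lud": [0.18, 0.18, 0.25, 0.90, 1.13, 1.43, 1.77, 2.17, 2.36]},
    1.98: {"nn": [0.51, 0.51, 0.50, 0.15, 0.22, 0.97, 2.19, 3.35, 3.74],
           "lud": [0.68, 0.68, 0.79, 2.01, 2.57, 3.17, 3.86, 4.63, 4.61]},
}


def first_cfl_where_neural_wins(nn_err, lud_err):
    """Lowest tabulated CFL at which the neural error is below the LUD error."""
    wins = [c for c, n, l in zip(CFL, nn_err, lud_err) if n < l]
    return min(wins) if wins else None


def max_cfl_within(nn_err, lud_err, ref_cfl):
    """Largest tabulated CFL whose neural error does not exceed
    the LUD error at ref_cfl (ref_cfl is an illustrative choice)."""
    target = lud_err[CFL.index(ref_cfl)]
    ok = [c for c, n in zip(CFL, nn_err) if n <= target]
    return max(ok) if ok else None


for dx, t in TABLES.items():
    print(dx,
          first_cfl_where_neural_wins(t["nn"], t["lud"]),
          max_cfl_within(t["nn"], t["lud"], ref_cfl=0.1))
# prints:
# 0.5 0.5 None
# 1.0 0.001 0.6
# 1.98 0.001 0.6
```

Under these assumptions, for Δx = 1 m and Δx = 1.98 m the neural scheme at a Courant number of 0.6 stays within the error that LUD reaches at 0.1, while for Δx = 0.5 m the neural scheme is more accurate than LUD only from a Courant number of 0.5 upward.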

Table 4. Integral errors for the solutions obtained using different schemes given the oscillation amplitude φ₀ = 0.5 at Δx = 1 m and u₀ = 1 m/s.

φ₀ = 0.5
Courant Number	Δφ Neural Scheme	Δφ LUD
0.1	0.03	0.25
0.5	0.03	0.90
0.7	0.23	1.43
0.9	0.90	2.17

Table 5. Integral errors for the solutions obtained using different schemes given the oscillation amplitude φ₀ = 2.0 at Δx = 1 m and u₀ = 1 m/s.

φ₀ = 2.0
Courant Number	Δφ Neural Scheme	Δφ LUD
0.1	0.21	0.25
0.5	0.27	0.90
0.7	0.70	1.43
0.9	2.13	2.17

Table 6. Integral errors for the solutions obtained using different schemes given the oscillation amplitude values φ₀ = 0.5, φ₀ = 1.0, and φ₀ = 2.0 at Δx = 1 m and u₀ = 1 m/s.

Courant Number	Δφ Neural Scheme (φ₀ = 0.5)	Δφ Neural Scheme (φ₀ = 1.0)	Δφ Neural Scheme (φ₀ = 2.0)	Δφ LUD (φ₀ = 0.5)	Δφ LUD (φ₀ = 1.0)	Δφ LUD (φ₀ = 2.0)
0.1	0.03	0.02	0.21	0.25	0.25	0.25
0.5	0.03	0.02	0.27	0.90	0.90	0.90
0.7	0.23	0.27	0.70	1.43	1.43	1.43
0.9	0.90	1.22	2.13	2.16	2.16	2.16

Table 7. Integral errors for the solutions obtained using different schemes at different flow velocities u₀, given Δx = 1 m, φ₀ = 1, and u₀ · T = 15 m, where T varies in inverse proportion to the flow velocity.

Courant Number	Δφ Neural Scheme (u₀ = 0.5 m/s)	Δφ Neural Scheme (u₀ = 1.0 m/s)	Δφ Neural Scheme (u₀ = 2.0 m/s)	Δφ LUD (u₀ = 0.5 m/s)	Δφ LUD (u₀ = 1.0 m/s)	Δφ LUD (u₀ = 2.0 m/s)
0.1	0.03	0.02	0.03	0.25	0.25	0.25
0.5	0.10	0.02	0.08	0.37	0.90	0.90
0.7	0.56	0.27	0.20	1.43	1.43	1.43
0.9	2.26	1.22	0.80	2.17	2.17	2.17

